Introduction

Wearable devices are gaining momentum in disease diagnostic confirmation, treatment evaluation, and healthy aging1,2,3,4. For those who suffer from congenital choking5,6 or neck cancer7,8,9, post-surgical rehabilitation of the human throat often requires the clinician’s continuous monitoring and evaluation of swallowing ability10, vocal-fold motion11, oral intake of liquid12, and others13,14,15. Monitoring and diagnosis of these behaviors can also prevent secondary injuries that often occur in patients with dysphagia during normal daily activities. Currently, the commercial devices used to track laryngeal signatures, including the lip-closing force gauge16, non-invasive belts17,18,19, thin-film pressure sensors20, and others21, are rigid, bulky, and tethered. Their poor device/skin contact also results in attenuated signals that are susceptible to motion artifacts. Therefore, soft on-throat devices are urgently needed to continuously monitor laryngeal activities for diagnosis and rehabilitation evaluation.

Diverse laryngeal activities with serial muscle movements typically cause laryngeal muscle motions and local inertial vibrations22. Therefore, they are often used to evaluate the laryngeal health condition of patients. For instance, swallowing amplitude and rhythm reflect the intake ability of food and water23. However, features obtained by traditional wearable devices such as a single force sensor on the throat20 can only provide limited information about the patient’s health condition. Monitoring of both vibrations and muscle activities is mainly achieved by using separate inertial devices and force sensors on the skin. Efforts to address this challenge have led to the development of a wearable accelerometer for neck voice disorders24 and flexible surface electromyogram (sEMG) electrodes with a strain sensor for oropharyngeal swallowing disorders23,25. However, these flexible sensors suffer from limited stretchability of less than 16%23,25, poor system integration with only sensors and no functional circuit board21,26, and severe skin inflammation or allergy during use over several hours owing to their low permeability27,28,29. It is important to integrate soft wearable electronics2,30,31 with data processing/transmission units32,33,34 to realize the full potential of the wide range of electro-mechanical signatures35,36. A recent development of a skin-mounted mechano-acoustic sensing system provides a promising way to track the activity at the suprasternal notch37,38. However, high-quality sEMG signals unaffected by motion artifacts have yet to be integrated into the device platform, and cloud-based data analysis with advanced machine learning algorithms is still needed for remote monitoring and evaluation.

Machine learning-based diagnosis is of great interest in the development of smart medicine, especially when integrated with soft electronics (to classify the dysphagia severity)37,39,40. Early efforts include the use of one-dimensional (1D) convolutional neural network (CNN)-based deep learning for rehabilitation monitoring after orthopedic surgery41 and for predicting knee joint postures42. Despite their high recognition accuracy, these CNN models with individual 1D data sources offer only limited feature information during the learning processes for target predictions. Without memory functions and adaptive capabilities, these models also suffer from low prediction accuracy on data from new subjects, a critical limitation in practical applications.

Herein, this work presents a fully integrated standalone stretchable device platform that can wirelessly measure and analyze diverse vibrations and muscle activities directly from the human skin. The modified composite hydrogel electrode interface is designed to maintain robust contact to the throat with low contact impedance for improved signal quality during motion and low adhesion for easy removal. Besides sEMG signals, the triaxial broad-bandwidth accelerometer integrated into the patch can also monitor large body movements (e.g., walking and jumping) and subtle physiological activities (e.g., heartbeats and respiration). With a 2D class sequence feature extractor based on the CNN algorithm, 13 general features from fourteen healthy human subjects and two patients (one with myasthenia gravis and the other with laryngeal cancer) can be classified with a high accuracy of 98.2%. More importantly, the fully connected neurons of the 2D-like sequential feature-extracting model can allow the device system to adapt for use with noise from motion artifacts and on new subjects with a high prediction accuracy of 92%. A wireless user interface further enables remote monitoring and real-time evaluation of laryngeal activities on the cloud server, paving the way for the next-generation standalone stretchable device platform for laryngeal rehabilitation management and diagnosis and treatment evaluation of various diseases.

Results and discussion

Design of the laryngeal patch

The standalone stretchable device platform, consisting of hydrogel electrodes and functional electronic components with signal-processing units interconnected by the coplanar serpentine Cu network (Fig. 1a), can directly adhere to the human skin (Fig. 1b). The processed and wirelessly transmitted signals from the inertial triaxial accelerometer and hydrogel electrodes (Fig. 1c and Fig. S1) provide continuous and non-invasive monitoring of local vibrations and muscle activities from the larynx and other locations on the human body. Various activities can also be distinguished with an efficient convolutional neural network, and the data processed on a cloud server further facilitate remote rehabilitation and disease diagnosis (Fig. 1d). The standalone stretchable device platform fabricated from low-cost processes (Figs. S2–5) exhibits robust electromechanical performance upon various mechanical deformations (e.g., stretching, bending, and twisting) as verified by both finite element analysis (FEA) and experiments (Figs. 1e, S6, and movies S1–3).

Fig. 1: Overview and design of the standalone stretchable laryngeal patch.

a Exploded diagram of the integrated device system. b Optical images of the standalone stretchable device patch attached to the laryngeal skin (top) and forearm (bottom). c Functional block diagram showing the processing steps of the acceleration and surface electromyography (sEMG) signals, including signal processing, controlling, communication, and display. d Schematic showing the use of a machine learning network and the standalone stretchable patch in laryngeal post-surgical rehabilitation. e Finite element analysis (FEA) and corresponding experimental results of the patch under mechanical deformations: uniaxial stretching of 30%, bending around a cylinder with a radius of 1 cm, conforming to a sphere with a radius of 4 cm, and twisting with a torsional angle of 90°.

Design and characterization of the composite hydrogel interface

The composite hydrogel mainly consists of monomer ([2-(Methacryloyloxy)ethyl]dimethyl-(3-sulfopropyl) ammonium hydroxide, DMAPS), crosslinker, photo-initiator, and ionic salt (see fabrication in Methods). Without a need to strictly balance positive and negative charges during the copolymerization process, a wide range of monomer ratios, concentrations, and ionic strengths can be used to synthesize this zwitterionic-type hydrogel43. To avoid oxidation of the copper-based electrode array and improve the contact quality to the skin for enhanced signal acquisition, a stretchable and highly conductive ionic hydrogel interface is designed by doping Ag nanowires (AgNWs) in the ionic composite with a polydimethylsiloxane (PDMS) skeleton (Fig. 2a). The design with the PDMS skeleton exhibits a reduced peak strain of 0.86 for a uniaxial stretching of 50%, compared to that of 3.13 in the one without (Fig. 2b). The stretchable PDMS with a higher modulus (~200 kPa) and optimized number of vertical beams (Fig. S7) provides the modified hydrogel composite with improved load and strain bearing capabilities (14 kPa/200% for the tensile stress/strain) (Figs. 2c and S8). The AgNWs with optimized concentration (0.7 wt%, Figs. S9–10) in the hydrogel not only provide high conductivity, but also result in lower contact impedance than that of the commercial gel electrode (Figs. 2d and S11) for the high-quality acquisition of sEMG signals (Fig. 2e). The lower contact impedance results from the high conductivity at the Cu/hydrogel interface (Fig. S12) and the improved hydrogel/skin contact quality as observed on the skin replica (Fig. 2f). The increased contact impedance from drying can recover by applying pure water (Fig. S13). Meanwhile, it is interesting to note that the composite hydrogel presents a lower tensile strain (Fig. 2g) and ca. eight times smaller peeling force (Fig. 2h) compared with the commercial gel electrodes, which facilitates easy removal, especially from the skin of infants or the elderly. The composite hydrogel that can protect the skin from UV and IR radiation (Fig. S14) also exhibits high cell viability (>80% of epithelial cells) and biocompatibility (Fig. 2i) for long-term monitoring. The modified hydrogel with excellent contact impedance and other properties suitable for integration with flexible circuits in the standalone device platform for health monitoring is superior compared to the previously reported hydrogels27,44,45,46,47,48,4a and movies S4–6), with a wide frequency spectrum from 0 to 400 Hz (Fig. 4b). The simultaneously measured acceleration data along three different directions allow the integrated system to distinguish multiple motions separately (e.g., talking while walking, drinking water while coughing, drinking water while swallowing) (Figs. S21 and S22), as well as capture the swallowing process from a patient with myasthenia gravis (Fig. S23). In the laryngeal events, the talking and swallowing signals follow the vibrational movements along the normal direction of the skin (z-axis), whereas walking and jumping signals stem from body motions along the throat skin from the neck to the head direction (y-axis). The time-frequency contour of the signals from the short-time Fourier transform confirms the high-frequency (up to 200 Hz) responses with a lower amplitude (less than 0.4 g) of swallowing and talking (Fig. 4c, d). In contrast, walking and jumping show the opposite behavior, with low frequencies and large acceleration magnitudes (Fig. S24). These features are highly consistent with the behavior of adults38. In addition, periodic weak vibrations of laryngeal bones from cardiac and lung dilatation processes provide routes to measure heart rate (HR) and respiration rate (RR) from recorded acceleration along the z and y axes, respectively. With digital filtering and peak-detection (Fig. 4e), the HR and RR of 19.2 and 50.4 times per minute can be decoupled from the original data (Fig. 4f). As the mounting position of the patch moves up the laryngeal skin, the measured cardiac behavior remains unchanged, but the respiration amplitude decreases (Fig. S25) due to significantly reduced movements (along the neck direction) farther away from the chest cavity. Moreover, the swallowing signature can be clearly observed by placing the integrated device at the throat area with comparable performance to the suprahyoid area, whereas the signal at the suprasternal notch area is almost inconspicuous (Fig. S26), leading to the choice of the middle throat area for laryngeal detection. More importantly, the combination of the acceleration data with sEMG for muscular activities during talking (Fig. S27) can further decouple speaking for speech recognition and more precise diagnosis and evaluation of various diseases in clinical practice.
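The filtering and peak-detection route to HR and RR described above can be illustrated with a minimal sketch. The moving-average filter, the mean-level threshold, and the names `moving_average`, `find_peaks`, and `rate_per_minute` are assumptions made for illustration only; the text does not specify the filter design or the detection rule used on the device.

```python
import math


def moving_average(x, width):
    """Centered moving average: a crude low-pass filter (assumed stand-in
    for the digital filter, whose design is not given in the text)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def find_peaks(x, min_distance):
    """Indices of local maxima above the signal mean, kept only if they are
    at least min_distance samples after the previously kept peak."""
    mean = sum(x) / len(x)
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i] > mean and x[i] > x[i - 1] and x[i] >= x[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def rate_per_minute(x, fs, smooth_s, min_period_s):
    """Events per minute from the mean spacing of detected peaks.

    x: one acceleration axis, fs: sampling rate in Hz,
    smooth_s: smoothing window in seconds (hypothetical parameter),
    min_period_s: shortest plausible period in seconds (hypothetical).
    """
    smoothed = moving_average(x, max(1, round(smooth_s * fs)))
    peaks = find_peaks(smoothed, max(1, round(min_period_s * fs)))
    if len(peaks) < 2:
        return 0.0
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs
    return 60.0 / mean_interval_s
```

In this sketch the same routine would be applied to each axis with different window settings, for example a short smoothing window for the cardiac component and a longer one for the slower respiratory component.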

Fig. 4: Wireless recording of physiological processes and body motions.

a Long-term continuous recording of various activities (e.g., sitting still, talking, swallowing, walking, and jumping). b Frequency spectrum of various biophysical activities of a healthy human. Error bars represent standard deviations, n = 3 independent samples. Data in (b) are presented as mean values ±SEM. c, d Representative data and their corresponding frequency performance via the short-time Fourier transform (STFT). e Schematic block diagram of the algorithm and f the decoupled heart rate (HR) and respiration rate (RR) from inertial acceleration signals in real-time. The term (a.u.) in (f) represents the arbitrary unit.

The machine learning model for rehabilitative evaluation

The clinical evaluation for laryngeal rehabilitation usually focuses on typical events (i.e., talking, swallowing, volume, and viscosity swallowing with modified safety and effectiveness indicators) with long-term and extensive efforts77. To help automatically evaluate the laryngeal condition of new patients and healthy individuals, a CNN-based 2D-like sequential feature extractor (2D-SFE) is explored to classify and infer pathological status based on the classification of physiological events (Fig. 5a). Comparison in the overall accuracy between this work and previous reports indicates the advantage of integrating two distinct signal inputs in adaptive machine learning34,71,73,76,78,79,80,81,82 (Table S3). The collected 1D data (acceleration and sEMG) are first transferred to a 2D vector similar to an image matrix for processing by the CNN-based 2D-SFE that contains 62 filtering layers (Fig. S28) and 2 classifying layers (Fig. S29). In the training model with the above-related activities, the processing vectors are iterated from the convolutional layer to the pooling layer and then to the activation layer to achieve dimensionality reduction of the feature vectors (FVs). The extracted feature is further classified into specific targets by evaluating the consistency of the softmax function in the fully connected layer. In the proof-of-concept demonstration, five Chinese pinyin and five vowels are chosen as the acoustic states, together with swallowing, drinking water, and coughing behaviors as feature states from fourteen healthy human subjects and two patients (one with myasthenia gravis and the other with laryngeal cancer, sampling rate of 333 Hz). The acceleration and sEMG data in 2D-like vector contours (Fig. 5b and SI Note1) also include noise from other motion actions (e.g., drinking water while coughing) to mimic real-world situations. Randomly dividing each 2D data of the laryngeal feature into 100 sequences with a fixed length of 1000 data points ensures data reliability (covering the entire test data of each feature). The triplet loss and cross-entropy loss are used as the objective functions for feature extraction and classification, respectively (Fig. 5c). Finally, the classified feature ultimately corresponds to the tested states, forming the confusion matrix

Methods

Ethics declaration

All human subject studies were approved by the Institutional Review Board of the First Affiliated Hospital of the Air Force Medical University (protocol: KY20222259-C-1), and the volunteers gave informed consent. The authors affirm that human research participants provided informed consent for publication of the images in Figs. 1b, 5b, and 6b.

Fabrication of the stretchable patch, hydrogel interface, and LTCC antenna

The fabrication of the stretchable patch primarily comprises (i) the engraving and transfer of the conductive serpentine traces, (ii) the low-temperature reflow soldering of components, and (iii) the encapsulation with Ecoflex (Fig. S2). First, the mixed PDMS precursor with a weight ratio of 10:1 for the base to curing agent (Dow Corning, Sylgard 184A to B) was spin-coated on a clear glass plate at 600 rpm for 10 s, followed by curing at 70 °C for 1.5 h to form the adhesive substrate. Next, a copper (Cu) foil with a thickness of 8 μm (Red copper, T1100), coated on polyimide (PI, 3 μm, HanKe New materials Co., LTD.), was laminated on the PDMS film at a pressure of 100 kPa for 1 h. Following the designed CAD file, a 355 nm UV laser (Yuanlu corporation, Wuhan) was used to engrave the serpentine Cu structure at a pulse frequency of 100 kHz and a speed of 300 mm s−1 with four repeated cuts. Peeling off the residual material left the conductive Cu network on the PDMS. The serpentine trace was then transferred to the uncured Ecoflex elastomer (Smooth-on, USA) mixed at a weight ratio of 1:1 (A: B) by a water-soluble tape (AQUASOL). Applying deionized water for 3 h dissolved the water-soluble tape. After placing laser-engraved rectangular PDMS isolators at the designed chip locations, a thin layer of solder (AL656, Abond) was printed on the connecting pads through a laser-engraved PET mask. After placing all chips and components (smaller than 0.5 cm2, Fig. S36), the entire patch was heated in a solder pot (ZB2520HL, HuaQi zhengbang) at 138 °C for 30 min. Next, the water-soluble tape was applied to the copper mesh electrode, and an insulating oil (PVB, Langyi Chemical, Zhongshan) was sprayed on the circuits (except the water-soluble tape), followed by curing at 50 °C for 3 h. Finally, spin-coating silicone elastomer on the bare circuit at 500 rpm for 10 s and curing at 40 °C for 1.5 h completed the fabrication. The high reproducibility of the fabrication method is highlighted by the batch production of the integrated device platform (Fig. S37).

The fabrication of the hydrogel electrode started with sequentially mixing the monomer (DMAPS, Macklin), crosslinker (Methylene-Bis-Acrylamide, MBA, Macklin), photo-initiator (α-Ketoglutaric acid, KGA, Macklin), ionic salt (LiCl, Macklin), and deionized water at a weight ratio of 1833:2:1:400:3333. All materials were purchased from Aladdin. Next, 2.83 mL of AgNW solution (5 mg mL−1 in isopropyl alcohol, Hengqiu Tech.) was added to the obtained hydrogel precursor, followed by continuous stirring for 2 h. The PDMS mesh was prepared by engraving a 100 μm-thick PDMS film with the UV laser at a pulse frequency of 50 kHz and a speed of 300 mm s−1 with 20 repeated cuts. After dissolving the water-soluble tape to expose the copper electrode, the PDMS mesh was placed and the hydrogel precursor solution was poured, followed by curing in the UV pot at 30 W for 120 min, to form hydrogel-interfaced electrodes.

The fabrication of the LTCC antenna started with punching holes (radius = 0.07 mm) in the ceramic germinal substrate with a punching machine (XT0800X) at a speed of 1000 holes min−1 (Fig. S15). Next, the conductive copper slurry (General Research Institute for Non-ferrous Metals) dispersed in these holes was sintered on the heater at 150 °C for 3 h. The top and bottom antennas were then printed on the germinal substrate through a laser-engraved mask. After peeling off the mask, the antenna was sintered at 150 °C for 1 h, followed by blading the top/bottom packaged ceramic and sintering at 870 °C for 1.5 h.

Design of signal processing and transmission circuits

As illustrated in the detailed circuit schematic (Fig. S38), the signal processing unit included a low-power 8-bit processor (Atmel, ATmega328P), an inertial accelerometer (Analog Devices, ADXL345), and a biological signal detection chip (NeuroSky, BMD101). The power management chip HT7133 (HOLTEK) was chosen for its ability to convert the voltage from 3.7 to 3.3 V. The Bluetooth module PW02 (Phangwei Link) provided serial port transmission. The integrated device can maintain continuous operation for approximately 5 to 6 h on a 35 mA h Li battery.
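As a rough consistency check on the reported runtime, and assuming that the full rated capacity Q is usable and that the current draw is constant over the runtime t (neither assumption is stated in the text), the implied average current draw of the system is

$$\bar{I}=\frac{Q}{t},\qquad \frac{35\,\mathrm{mA\,h}}{6\,\mathrm{h}}\approx 5.8\,\mathrm{mA}\ \le\ \bar{I}\ \le\ \frac{35\,\mathrm{mA\,h}}{5\,\mathrm{h}}=7.0\,\mathrm{mA}.$$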

FEA of mechanics and electromagnetics

The FEA was carried out to study the mechanical behaviors of the hydrogel/patch under diverse deformations and the EM properties of the LTCC antenna. All material properties were assigned according to the material source in the FEA software. The Young’s moduli of the modified hydrogel, copper, Ecoflex, polyimide, and skin replica were set to 50 kPa, 80 GPa, 60 kPa, 800 MPa, and 200 kPa, respectively.

Characterizations of mechanical/electrical and structural properties

All tensile tests were conducted by a mechanical testing machine (ZQ-990B, Zhi Qu). Material conductivity was measured with an LCR digital bridge meter (IM2536, HIOKI). A pair of electrodes with a size of 5.5 mm × 32 mm (or diameter of 18.36 mm for the commercial gel electrodes) separated by a distance of 40 mm on the forearm were used for the contact impedance test. The return loss of the LTCC antenna was measured by a vector network analyzer (SVA 1032X, SIGLENT). The electrophysiological signals were acquired by a multichannel tester (BL-420, TECHMAN, Chengdu). The morphologies of the materials were observed by the scanning electron microscope (HiVac, Apreo). The cell viability was tested by the MTT Assay microplate reader. The light transmittance was obtained by the solar film transmission meter (LS101, LinShang). HR and RR were obtained as the subject performed the instructed behaviors (e.g., holding the breath or sitting still).

Machine learning algorithms for classification and prediction

Continuous measurements from fourteen healthy human subjects and two patients (one with myasthenia gravis and the other with laryngeal cancer) were used as the training data, and the measurements from another two volunteers (a male and a female) were used as testing data. First, all measured data were labeled by feature type and normalized to unify data dimensions. Randomly dividing each 2D data of the laryngeal feature into 100 sequences with a fixed length of 1000 data points (covering the entire test data of each feature such as swallowing, drinking, speaking, and coughing) ensured data reliability. Meanwhile, every label was encoded via one-hot encoding83. Next, the pretreated data were extracted through 8-weight blocks with 62 processing layers and fed into the classifier. The batch size in the learning model was set to 64 for feature extraction. The iterative learning rate (LR) was calculated as

$$\mathrm{LR}=\left(1+0.5\cos \left(x+\pi /\mathrm{epochs}\right)\right)\times \left(1-\mathrm{iLR}\right), \tag{1}$$

where x is the index of the current training iteration, epochs is the total number of training iterations, and iLR is the initial LR (1e−4). The iterative attenuation rate (AR), with its initial value set to 0.9, was calculated from

$$\mathrm{AR}_{t}=\mathrm{AR}_{t-1}-\mathrm{LR}\times \left(\frac{m_{t}}{1-\beta_{1}^{t}}\right)\Big/\left(\varepsilon+\sqrt{\frac{v_{t}}{1-\beta_{2}^{t}}}\right), \tag{2}$$

where β1 and β2 are the first and second-order attenuation coefficients, respectively, mt is the biased first-moment estimate, vt is the biased second raw moment estimate, and ε is the relative weight constant. Equation (2) was based on the optimized machine learning algorithm (i.e., Adaptive moment estimation, Adam). The motion/movement artifacts (e.g., from drinking while coughing) were also accounted for in the testing data from the two new human subjects (see details in SI Note1).
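The data preparation and the update rule of Eq. (2) described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`random_windows`, `one_hot`, `adam_step`), the fixed random seed, the moment updates for m and v (taken from the standard Adam algorithm, since the text gives only the bias-corrected step), and the default values of β1, β2, and ε are all assumptions. Note also that randomly placed windows do not by themselves guarantee coverage of the entire recording.

```python
import math
import random


def random_windows(channels, n_windows=100, length=1000, seed=0):
    """Cut n_windows randomly placed, equal-length segments from a 2D record.

    channels: list of equally long 1D lists (e.g., three acceleration axes
    plus sEMG). Every window keeps all channels time-aligned.
    """
    n = len(channels[0])
    if n < length:
        raise ValueError("record is shorter than the window length")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    windows = []
    for _ in range(n_windows):
        start = rng.randint(0, n - length)  # inclusive bounds
        windows.append([ch[start:start + length] for ch in channels])
    return windows


def one_hot(index, n_classes=13):
    """One-hot label vector; 13 matches the number of features in the text."""
    return [1 if i == index else 0 for i in range(n_classes)]


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update in the form of Eq. (2).

    The moment updates below follow the standard Adam algorithm; the
    default beta1, beta2, and eps are common choices, not values from
    the text. t is the 1-based step count.
    """
    m = beta1 * m + (1 - beta1) * grad          # biased first moment
    v = beta2 * v + (1 - beta2) * grad * grad   # biased second raw moment
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (eps + math.sqrt(v_hat))
    return param, m, v
```

On the first step the bias correction makes the update magnitude close to the learning rate itself, regardless of the gradient scale, which is the main reason the corrected moments are used in Eq. (2).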

Development of the APP and cloud-served interface

The app was programmed in Java on the Android Studio platform. The cloud-served interface was designed using the Internet of Things development tools in Alibaba Cloud. The pathological degree at the cloud server interface was sorted into eight levels (from the best state I to the worst state VIII) according to the rehabilitative conditions of three processes: swallowing (S), drinking water (D), and talking (T). During laryngeal rehabilitation, each of these behaviors is assessed as either normal or abnormal, so the combined evaluation results of the three representative behaviors form eight evaluation states (Table S4). Although this 8-degree rating is not currently used in clinical evaluation, it could provide insights into patients’ conditions to help guide individualized rehabilitation in the future.
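The count of eight states follows from three binary assessments (2 × 2 × 2 = 8), which a short sketch can enumerate. The ordering used here, by the number of abnormal behaviors, is an assumption chosen only so that state I is all-normal and state VIII is all-abnormal; the actual assignment of combinations to levels II through VII is defined in Table S4 and is not reproduced here. The function name `evaluation_states` is hypothetical.

```python
from itertools import product

BEHAVIORS = ("S", "D", "T")  # swallowing, drinking water, talking
NUMERALS = ("I", "II", "III", "IV", "V", "VI", "VII", "VIII")


def evaluation_states():
    """Map each level to a normal/abnormal combination (True = normal).

    Combinations are sorted by the number of abnormal behaviors; the order
    within a group is an assumption, not the mapping of Table S4.
    """
    combos = sorted(product((True, False), repeat=len(BEHAVIORS)),
                    key=lambda combo: combo.count(False))
    return {level: dict(zip(BEHAVIORS, combo))
            for level, combo in zip(NUMERALS, combos)}


if __name__ == "__main__":
    for level, state in evaluation_states().items():
        print(level, state)
```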

Statistics and reproducibility

No data were excluded from the analyses. No statistical method was used to predetermine the sample size. The results presented in Figs. 2b–e, h, i, 3a, b, f–h, 4a–d, f, 6g–j, and Supporting Figs. 9, 11–13, 21, 22, 25–27 were obtained after three independent experiments with similar results.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.