Four-Channel ECG as a Single Source for Early Diagnosis of Cardiac Hypertrophy and Dilation — A Deep Learning Approach
Abstract
Background
The electrocardiogram (ECG) remains the most commonly used screening tool for cardiac diseases. Although cardiac hypertrophy, dilation, and enlargement are important causes of heart failure and sudden death, they are mainly diagnosed via echocardiography after symptom onset, owing to the low sensitivity of human ECG interpretation. This study challenges the mainstream diagnostic methodology by implementing a reduced-channel deep learning–based model that uses the ECG (a standard 12-lead ECG or a four-channel ECG) as a single data source for early diagnosis.
Methods
We constructed a large-scale database comprising 90,895 ECGs from 74,562 patients, selected from a total of 2,386,886 ECGs and 988,257 echocardiograms recorded at Tongji Hospital, Wuhan, China, between January 1, 2012, and July 17, 2021. A multi-label deep learning–based model using the ECG as a single input was created, with echocardiography as the gold standard during model training. Four distinct datasets were used for testing. Furthermore, we computed an aggregated attribution score for each lead, based on the expected gradients of the model, to identify the most representative leads.
Results
The average sensitivity increased from 0.270 for six participating ECG physicians (with 0 to 24 years of experience) to 0.586 with the proposed model, a more than twofold increase. This suggests that more than half of patients with cardiac hypertrophy, dilation, and enlargement could potentially be detected during routine ECG monitoring. The calculated attribution scores identified the four highest-performing leads: I, aVR, V1, and V5. The performance of the reduced-channel model trained with leads I, aVR, V1, and V5 is comparable to that of the 12-channel model, which supports the feasibility of wearable devices as an alternative to echocardiography.
Conclusions
ECGs can serve as a viable method for early diagnosis of cardiac hypertrophy, dilation, and enlargement through routine monitoring. The four representative leads can assist human ECG annotation and inform the design of portable devices with fewer embedded channels. Using a large-scale database of 90,895 ECGs from 74,562 patients who underwent ECG and echocardiography during a single visit or within a short time frame, we more than doubled the average sensitivity across all cardiac chambers, suggesting that more than half of patients with these conditions could potentially be detected during routine ECG monitoring. The four highest-performing leads (I, aVR, V1, and V5) support the potential for efficient diagnosis with fewer embedded channels. (Funded by the National Natural Science Foundation of China and others.)
Introduction
Cardiac hypertrophy, dilation, and enlargement are important causes of heart failure, sudden death, and arrhythmia, with a high prevalence in the general population.1 The conventional diagnostic methods of echocardiography and cardiovascular magnetic resonance imaging can be cost-prohibitive and difficult to access, leaving some individuals undiagnosed.2-4 There is an urgent need to increase the detection rate of these conditions and associated diseases by shifting from the current passive detection strategy, in which conditions are detected after symptom onset, to an active detection strategy based on routine examination and daily health monitoring. Electrocardiography has been used as a supplementary modality for detecting cardiac hypertrophy, dilation, and enlargement.5 Unlike the heart rhythm and rate abnormalities commonly observed in arrhythmia, electrocardiographic indicators of atrial enlargement and ventricular hypertrophy/dilation are substantially more complex.6,7 Cardiac hypertrophy, dilation, and enlargement alter the electrocardiogram (ECG) waveform in various ways, including increased duration/width or amplitude of P-waves and the QRS complex; alterations in ST segments and T-waves; pathological deep and narrow Q-waves; and conduction abnormalities, such as bundle branch block.8-10 Based on these indicators, several criteria have been derived for different hypertrophy/enlargement locations, including left ventricular hypertrophy and/or dilation (LVH and/or LVD), right ventricular hypertrophy and/or dilation (RVH and/or RVD), right atrial enlargement (RAE), and left atrial enlargement (LAE). Most of these criteria are linear or simple nonlinear combinations of wave amplitudes and wave/complex intervals; they are not applied uniformly worldwide and remain unsatisfactory owing to inadequate sensitivity and/or specificity.
Deep learning approaches (a subset of artificial intelligence [AI] and machine learning) have been applied to ECGs, but primarily to detect arrhythmia, myocardial ischemia, and myocardial infarction.11-18
Literature on identifying cardiac hypertrophy, dilation, and enlargement in both the atria and ventricles is lacking because of the following challenges. First, it is extremely difficult to construct a reliable, large-scale ECG dataset from patients with cardiac hypertrophy, dilation, and enlargement who underwent both ECG (model input) and echocardiography (gold standard/model label) during the same hospitalization period or within a short time frame. Second, ECGs of patients with cardiac hypertrophy, dilation, and enlargement show morphological changes that are more complicated than those of arrhythmias, particularly small P-wave depressions, wider P-waves, double-peaked P-waves, and ST segment and T-wave changes. For machine learning models, morphological features distributed across leads and spatial locations of ECG recordings are considerably more difficult to learn than rhythm abnormalities. Furthermore, previous studies have mainly used 12-channel ECGs, which provide the model with comprehensive information on ECG changes but increase the time required for data processing, application, and storage, and present barriers for future wearable/portable devices.19,20 It is therefore crucial to identify the leads that contribute most to the diagnostic results. In addition, previous deep learning models have focused specifically on LAE, LVH, or RVH21-25 and have a limited ability to reflect the entire spectrum of cardiac hypertrophy and enlargement. In clinical practice, the high prevalence of complex bilateral abnormalities (e.g., patients with both right- and left-sided atrial and ventricular hypertrophy) turns the diagnosis into a multi-label classification task, which further complicates model training, particularly with a limited channel format.
Methods
We constructed a large-scale database on cardiac hypertrophy, dilation, and enlargement of the atria and ventricles, containing 90,895 ECGs from 74,562 inpatients and outpatients at Tongji Hospital, China, who underwent both ECG and echocardiography during a single visit or within a short time frame (70% within 3 days and 86% within 2 weeks). As shown in Figure 1B, we built a multi-label, multi-representation deep learning–based model named ECG-Cor-Net, which aims to learn hidden representations for detecting the morphological waveform changes associated with cardiac hypertrophy, dilation, and enlargement. The performance of the model is compared with that of six ECG physicians. An external dataset from another hospital and a public dataset are used to assess generalization.
Figure 1
ECG and Echocardiography Data
The ECG data are standard 10 s, 12-lead recordings sampled at 500 Hz, collected from three campuses (Main Campus, Optical Valley Campus, and Sino-French New City Campus) of Tongji Hospital. Most data were acquired using General Electric (GE) Marquette MAC 3500 or 5500 ECG machines (GE and Marquette University, WI, United States) and Philips PageWriter TC30 and TC50 machines (Philips, Shenzhen Goldway Industrial Inc., Shenzhen, China), while a minority were acquired using the mecg1000 (MedeX Technologies Limited, Beijing, China). The ECG data from the First People’s Hospital of Jiangxia District, Wuhan, China (abbreviated as Jiangxia Hospital), were acquired using different machine brands, including a GE Marquette MAC 800 ECG machine (GE and Marquette University, WI, United States) and the mecg300 and mecg200 (MedeX Technologies Limited, Beijing, China). Echocardiography data were acquired using a GE Vivid E9 ultrasound scanner (GE Vingmed Ultrasound, Horten, Norway) for all patients. Offline measurements were performed by an experienced cardiologist using EchoPAC (version 203.66; GE Vingmed Ultrasound, Horten, Norway). Diastolic interventricular septum (IVS) thickness, left ventricle (LV) posterior wall (PW) thickness, LV end-diastolic dimension, right ventricle (RV) end-diastolic dimension, and left atrium (LA) end-systolic dimension were measured from the parasternal long-axis view, and the right atrium (RA) systolic dimension was measured from the apical four-chamber view. According to the guidelines, LVH/LVD was defined as an LV diameter greater than 53 mm in men or greater than 51 mm in women and/or an IVS and LV PW thickness greater than 10 mm; RVH/RVD as an RV diameter greater than 33 mm in both men and women; LAE as an LA diameter greater than 37 mm in men or greater than 35 mm in women; and RAE as an RA diameter greater than 45 mm in both men and women.26,27
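The echocardiographic criteria above translate directly into multi-label training targets. The following Python sketch illustrates one way to encode them. It is an illustration under stated assumptions rather than the study's labeling code: the `EchoMeasurements` container and the `echo_labels` function are hypothetical names, all measurements are assumed to be in millimeters, a missing value is assumed never to trigger a label, and the LV wall-thickness rule follows a literal reading of the text (both IVS and LV PW above 10 mm).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class EchoMeasurements:
    """Hypothetical container for one echocardiogram (all values in mm)."""
    sex: str                 # "M" or "F"
    lv_edd: Optional[float]  # LV end-diastolic dimension
    ivs: Optional[float]     # diastolic interventricular septum thickness
    lv_pw: Optional[float]   # LV posterior wall thickness
    rv_edd: Optional[float]  # RV end-diastolic dimension
    la_esd: Optional[float]  # LA end-systolic dimension
    ra_sd: Optional[float]   # RA systolic dimension


def _above(value: Optional[float], threshold: float) -> bool:
    # Assumption: a missing measurement never triggers a label.
    return value is not None and value > threshold


def echo_labels(m: EchoMeasurements) -> Set[str]:
    """Map measurements to multi-label targets using the thresholds in the text."""
    male = m.sex == "M"
    labels: Set[str] = set()
    lv_dilated = _above(m.lv_edd, 53 if male else 51)
    # Literal reading of "IVS and LV PW ... greater than 10 mm" (both walls).
    lv_thick = _above(m.ivs, 10) and _above(m.lv_pw, 10)
    if lv_dilated or lv_thick:
        labels.add("LVH/LVD")
    if _above(m.rv_edd, 33):
        labels.add("RVH/RVD")
    if _above(m.la_esd, 37 if male else 35):
        labels.add("LAE")
    if _above(m.ra_sd, 45):
        labels.add("RAE")
    # No abnormal label -> normal class (assumes other findings are normal).
    return labels or {"Normal"}
```

For example, `echo_labels(EchoMeasurements("F", 52, 9, 9, 30, 36, 40))` yields LVH/LVD and LAE, whereas the same measurements for a man yield Normal. This illustrates how the sex-specific cut-offs alone can change the label set.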
Annotation Strategy
In our dataset, each ECG record is labeled using its corresponding echocardiography record as the gold standard. For example, a patient with a normal ECG can be labeled as having LVH based on the echocardiography results, whereas a patient diagnosed with LVH alone by ECG may also have RVH according to the echocardiography results. Only one patient identifier (ID) was assigned to each patient during a visit, which ensured that the patient had the same ID for both the ECG and echocardiography tests.
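Because matching relies on a shared patient ID and a short interval between examinations, the pairing step can be sketched as a nearest-in-time lookup per patient. This is an illustrative sketch rather than the pipeline used in the study. The tuple layout, the function name `match_ecgs_to_echos`, and the 14-day default window are all assumptions: the text reports that 86% of pairs fell within 2 weeks but does not state a hard cut-off.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple

Exam = Tuple[str, str, date]  # (exam_id, patient_id, exam_date); hypothetical layout


def match_ecgs_to_echos(ecgs: List[Exam], echos: List[Exam],
                        max_gap_days: int = 14) -> Dict[str, Optional[str]]:
    """Map each ECG to the nearest-in-time echo of the same patient ID, or None."""
    # Index each patient's echocardiograms by date (as ordinal days), sorted.
    grouped = defaultdict(list)
    for echo_id, pid, day in echos:
        grouped[pid].append((day.toordinal(), echo_id))
    index = {}
    for pid, items in grouped.items():
        items.sort()
        index[pid] = ([d for d, _ in items], [e for _, e in items])

    matches: Dict[str, Optional[str]] = {}
    for ecg_id, pid, day in ecgs:
        days, echo_ids = index.get(pid, ([], []))
        t = day.toordinal()
        i = bisect_left(days, t)
        best: Optional[Tuple[int, str]] = None
        # Only the echo just before and the echo at/after the ECG date can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(days):
                gap = abs(days[j] - t)
                if gap <= max_gap_days and (best is None or gap < best[0]):
                    best = (gap, echo_ids[j])
        matches[ecg_id] = best[1] if best is not None else None
    return matches
```

Using a single patient ID per visit is what makes this grouping valid. Without it, the same patient could appear under different keys and lose their matches.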
Public Dataset Preparation
We use an open-access dataset (PTB-XL) to examine the generalizability of the proposed approach.28 PTB-XL contains 21,837 clinical 10 s, 12-lead ECGs from 18,885 patients. Its diagnoses are grouped into five superclasses (normal ECG, myocardial infarction, ST/T change, conduction disturbance, and hypertrophy), each containing multiple subclasses. For hypertrophy, PTB-XL provides LVH, left atrial overload/enlargement (LAO/LAE), RVH, right atrial overload/enlargement (RAO/RAE), and septal hypertrophy (SEHYP). For our tests, we focus on records labeled under the normal ECG and hypertrophy superclasses and use the provided standard communications protocol for computer-assisted electrocardiography (SCP-ECG) statements as the gold standard. The SCP-ECG statements assign each subclass a likelihood value. Because no further annotation is available, we selected only records labeled with a value of “100.” To be consistent with our multi-label five-class model, ECGs with SEHYP labels are classified as LVH. Our preprocessing yields a test dataset of 8081 12-lead ECGs, including 151 LAOs/LAEs (the LAE group), 776 LVHs and SEHYPs (the LVH and/or LVD group), 33 RAOs/RAEs (the RAE group), 28 RVHs (the RVH and/or RVD group), and 7182 normal ECGs.
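In the public PTB-XL release, the `scp_codes` field of each record is stored as a string-encoded dictionary that maps SCP-ECG codes to likelihood values. A minimal sketch of the filtering and class mapping described above could look like the following. `ptbxl_labels` is a hypothetical helper, and treating the code “100” as a likelihood of 100 is our reading of the text. The text does not specify how records that also carry statements from other superclasses were handled, so this sketch simply ignores those statements.

```python
import ast
from typing import Dict, Optional, Set

# PTB-XL SCP codes of the normal and hypertrophy superclasses -> the five classes.
PTBXL_TO_CLASS: Dict[str, str] = {
    "NORM": "Normal",
    "LVH": "LVH/LVD",
    "SEHYP": "LVH/LVD",   # septal hypertrophy merged into LVH, as in the text
    "LAO/LAE": "LAE",
    "RVH": "RVH/RVD",
    "RAO/RAE": "RAE",
}


def ptbxl_labels(scp_codes: str, required_likelihood: float = 100.0) -> Optional[Set[str]]:
    """Parse a PTB-XL `scp_codes` string (e.g. "{'NORM': 100.0}") into class labels.

    Keeps only statements whose likelihood equals `required_likelihood` and
    returns None when no retained statement maps to one of the five classes.
    """
    codes: Dict[str, float] = ast.literal_eval(scp_codes)
    labels = {
        PTBXL_TO_CLASS[code]
        for code, likelihood in codes.items()
        if code in PTBXL_TO_CLASS and likelihood == required_likelihood
    }
    return labels or None
```

Records for which the helper returns None would be excluded from the test set under this reading.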
Physician Evaluation
Three ECG physicians from the cardiology department of Tongji Hospital and three from the cardiology department of Jiangxia Hospital were invited to interpret the ECGs so that the performance of the deep learning model could be compared with theirs. We divided the physicians into two groups according to their experience in reading ECGs: 0 to 6 years and 6 to 24 years. Table S1 in the Supplementary Appendix shows the number of physicians and the mean age, sex distribution, and experience level of each group. Each physician received 473 ECGs from Tongji Hospital and was informed that only the five classes included in this study could be selected (either singly or in combination).
Aggregated Attribution Score for Each Lead
To detect whether a patient has one or more of the four abnormalities (LVH and/or LVD, RVH and/or RVD, LAE, and RAE) or none of them (normal), we formulated a multi-label diagnostic model, ECG-Cor-Net (the model architecture and parameters are presented in Fig. S2 and Table S6, respectively), that uses the ECG as a single source. Each ECG input $X$ corresponds to an output vector $\hat{Y} = (\hat{y}_1, \ldots, \hat{y}_5)$, with one entry per class. Our aim is to learn a nonlinear function $f$, with $\hat{Y} = f(X)$, that minimizes the cost function between the ground-truth label $Y$ and the predicted label $\hat{Y}$. We propose a novel aggregated attribution score for evaluating the contribution of ECG data to each class. This score is computed for each lead and time point based on the expected gradients of the ECG-Cor-Net model. It incorporates concepts from integrated gradients (a gradient-based pixel attribution method) and Shapley additive explanations (an additive feature attribution method).29 The binary classification model, denoted $f_c$, is a nonlinear function that diagnoses one abnormality $c$ among LVH and/or LVD, RVH and/or RVD, LAE, and RAE.
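To illustrate how an expected-gradient attribution can be aggregated per lead, the following NumPy sketch replaces the binary ECG-Cor-Net model with a toy logistic classifier whose input gradient is analytic. Everything here is an assumption for illustration. This includes the function names, the Monte Carlo estimator of expected gradients (reference ECGs drawn from a background set and an interpolation coefficient drawn from U(0, 1)), and the aggregation rule (absolute attributions summed over time and normalized across the 12 leads). The exact aggregation used for ECG-Cor-Net is not given in this section.

```python
import numpy as np

# Conventional 12-lead order (assumed; the storage order in the study is not stated).
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_model_and_grad(x, w, b):
    """Hypothetical stand-in for one binary model: f(x) = sigmoid(<w, x> + b).

    Returns the predicted probability and the analytic gradient df/dx,
    which has the same (12, T) shape as the ECG input x.
    """
    p = sigmoid(np.sum(w * x) + b)
    return p, p * (1.0 - p) * w


def expected_gradients(x, background, w, b, n_samples=200, rng=None):
    """Monte Carlo estimate of E[(x - x') * df/dx at x' + a(x - x')].

    x' is drawn from `background` (reference ECGs) and a ~ U(0, 1).
    The result holds one attribution per lead and time point.
    """
    rng = np.random.default_rng(rng)
    attr = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        ref = background[rng.integers(len(background))]
        alpha = rng.uniform()
        _, grad = toy_model_and_grad(ref + alpha * (x - ref), w, b)
        attr += (x - ref) * grad
    return attr / n_samples


def lead_scores(attr):
    """Assumed aggregation: sum |attribution| over time, normalize across leads."""
    per_lead = np.abs(attr).sum(axis=1)
    return per_lead / per_lead.sum()


def top_leads(scores, k=4):
    """Names of the k leads with the highest aggregated scores."""
    return [LEADS[i] for i in np.argsort(scores)[::-1][:k]]
```

With a toy model whose weights are nonzero only in lead V1, `lead_scores` assigns the entire score to V1, as expected. Averaging the per-class score vectors of the four binary models before calling `top_leads` would be one way to obtain a four-class average ranking.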
Multi-Label Diagnostic Model with Electrocardiogram as a Single Source
The proposed ECG-Cor-Net model is a multi-label model composed of a one-dimensional (1D) flow and a two-dimensional (2D) flow. These take the original 10 s time series and the segmented images of the 12-channel input, respectively, as shown in the blue and red halves of Figure S2. Each segmented image of the 12-channel input corresponds to one heartbeat from the 10 s ECG sample. We cropped out the flat periods (the first and last 40 samples) of each segmented waveform.30 The cropped waveform samples were then converted into a 128 × 128 2D image for each lead as the input of the 2D flow (see heartbeat segmentation details in Fig. S2). The ECG-Cor-Net model consists of four stacked modules:

- a 2D image conversion module (in the 2D flow only) that captures the rich morphological features of the image segmentations of the ECG input;
- an embedding module that extracts hidden features in the latent space;
- a correlation module that explores interdependencies between the channels and spatial locations of the extracted hidden features; and
- a multi-label diagnosis module that diagnoses the five classes.

The parameters of each module are listed in Table S6. We chose four leads with high aggregated attribution scores to obtain two simplified models for the detection of cardiac hypertrophy, dilation, and enlargement, and we compare the performance of these four-channel models with that of the 12-channel model. We denote the basic model, trained only on the 1D time series without the 2D images, as ECG-Cor-Net-1D. We compare ECG-Cor-Net with ECG-Cor-Net-1D to demonstrate the importance of the 2D images as an additional representation.
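The heartbeat-to-image step can be sketched as a simple rasterization of each cropped single-lead beat onto a 128 × 128 grid. This is only one plausible conversion, assumed for illustration. The function names `beat_to_image` and `beat_to_images` are hypothetical, and the paper's actual segmentation and rendering (Fig. S2) may differ, for example in amplitude scaling or line thickness.

```python
import numpy as np


def beat_to_image(beat, crop=40, size=128):
    """Rasterize one single-lead heartbeat into a size x size binary image.

    Assumption: plain line-plot rasterization with min-max amplitude scaling.
    """
    beat = np.asarray(beat, dtype=float)[crop:-crop]   # drop the flat ends
    # Resample the cropped beat to `size` columns.
    t_old = np.linspace(0.0, 1.0, len(beat))
    t_new = np.linspace(0.0, 1.0, size)
    y = np.interp(t_new, t_old, beat)
    # Min-max scale amplitudes to row indices (row 0 = highest amplitude).
    span = y.max() - y.min()
    scaled = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    rows = (size - 1) - np.round(scaled * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    for col in range(size):
        # Fill the vertical run between consecutive samples so the trace is continuous.
        prev = rows[col - 1] if col > 0 else rows[col]
        lo, hi = sorted((prev, rows[col]))
        img[lo:hi + 1, col] = 1
    return img


def beat_to_images(beat_12xN, crop=40, size=128):
    """Apply beat_to_image to each lead -> array of shape (n_leads, size, size)."""
    return np.stack([beat_to_image(lead, crop, size) for lead in beat_12xN])
```

Rendering each beat as an image makes slopes and widths explicit spatial patterns for the 2D flow. The 1D flow keeps the full-resolution timing.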
Statistical Analysis
We use multi- and single-label validation metrics to verify the performance of the proposed ECG-Cor-Net model. For multi-label validation, five diagnostic metrics are used: subset accuracy,31 Hamming loss,32 average F1-score, average recall, and average precision. Subset accuracy measures the exact-match ratio for the multi-label diagnosis problem, that is, the proportion of ECGs for which all five labels are predicted correctly. Hamming loss, another multi-label metric, is the normalized Hamming distance between the predicted and true labels, which penalizes each individual label discrepancy. Single-label metrics are calculated for each class, including the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), sensitivity, specificity, F1-score, and accuracy. The calculation formulas for each metric are given in Section S4.
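As a reference for the multi-label metrics, the following sketch computes subset accuracy, Hamming loss, and per-class and averaged sensitivity (recall), precision, specificity, and F1-score from binary label matrices. The formulas in Section S4 are not reproduced here. The macro averaging over classes and the convention of scoring an undefined ratio (e.g., a class with no positive samples) as 0 are therefore assumptions.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Multi- and single-label metrics for binary arrays of shape (n_ecgs, n_classes)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Exact-match ratio: an ECG counts only if all of its labels are correct.
    subset_accuracy = np.mean(np.all(y_true == y_pred, axis=1))
    # Fraction of individual (ECG, class) labels that are wrong.
    hamming_loss = np.mean(y_true != y_pred)

    # Per-class confusion counts.
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    tn = np.sum(~y_true & ~y_pred, axis=0)

    # Undefined ratios (zero denominators) are scored as 0 (assumption).
    with np.errstate(divide="ignore", invalid="ignore"):
        sensitivity = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        specificity = np.where(tn + fp > 0, tn / (tn + fp), 0.0)
        f1 = np.where(precision + sensitivity > 0,
                      2 * precision * sensitivity / (precision + sensitivity), 0.0)

    return {
        "subset_accuracy": float(subset_accuracy),
        "hamming_loss": float(hamming_loss),
        "avg_recall": float(sensitivity.mean()),   # macro average (assumption)
        "avg_precision": float(precision.mean()),
        "avg_f1": float(f1.mean()),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

The contrast between the two multi-label metrics is visible on a small example. If one of three ECGs misses a single label out of five, subset accuracy drops to 2/3, while Hamming loss is only 1/15.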
Results
ECG Dataset with Echocardiographic Findings as a Gold Standard
Our cardiac hypertrophy, dilation, and enlargement dataset was based on the complete standard ECG database of the three campuses of Tongji Hospital (Main Campus, Optical Valley Campus, and Sino-French New City Campus) at Huazhong University of Science and Technology, People’s Republic of China. Ethical approval was obtained from Tongji Hospital (No. TJ-IRB20220553). As shown in Figure 2, a total of 2,386,886 ECGs and 988,257 echocardiograms were recorded from January 1, 2012, to July 17, 2021. We identified echocardiograms categorized as cardiac hypertrophy, dilation, enlargement, or normal, and matched them to the corresponding ECGs. To construct a balanced dataset, we identified 90,895 ECGs from 74,562 inpatients and outpatients who underwent both ECG and echocardiography at one visit or within a short time frame. A total of 7377 patients younger than 18 years were excluded, as were 541 ECGs with detached electrodes. After exclusions, 80,007 ECGs from 65,927 patients (57.25±14.38 years, 59% men) remained for cross-validation training (the exclusion flowchart is shown in Fig. 2). The training dataset included all echocardiography-matched ECGs, regardless of ECG abnormalities. For the test datasets, however, we excluded ECGs with depolarization abnormalities (e.g., bundle branch blocks, ventricular pacing), because these ECGs would be reviewed by physicians whose interpretation could be influenced by such abnormalities. We divided the cross-validation dataset into training, validation, and hold-out testing (denoted as the Hold-out Test) sets at a ratio of 8:1:1, based on patient IDs. The label distributions of each class in the cross-validation training dataset are presented in Figure S1A.
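A patient-level split keeps all ECGs from the same patient in a single partition, so no patient contributes to both training and testing. The sketch below mirrors the described scheme: a fixed hold-out set (the last 10% of patient IDs) plus a random 8:1 training/validation split of the remainder in each cross-validation round. Sorting the IDs to define “last” and the function name `split_by_patient` are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_by_patient(patient_ids: Sequence[str], round_seed: int,
                     ratios: Tuple[int, int, int] = (8, 1, 1)
                     ) -> Tuple[List[int], List[int], List[int]]:
    """Return ECG index lists (train, validation, hold-out test) split by patient.

    patient_ids[i] is the patient ID of ECG i. The hold-out patients are the
    last ones in sorted order and stay fixed across rounds; only the
    training/validation split depends on `round_seed`.
    """
    unique = sorted(set(patient_ids))        # assumption: sortable IDs define "last"
    total = sum(ratios)
    n_holdout = len(unique) * ratios[2] // total
    n_val = len(unique) * ratios[1] // total
    holdout = set(unique[len(unique) - n_holdout:])
    rest = unique[:len(unique) - n_holdout]
    random.Random(round_seed).shuffle(rest)  # re-split only the first 90%
    val = set(rest[:n_val])

    train, validation, test = [], [], []
    for i, pid in enumerate(patient_ids):
        if pid in holdout:
            test.append(i)
        elif pid in val:
            validation.append(i)
        else:
            train.append(i)
    return train, validation, test
```

Calling the function with nine different `round_seed` values would produce nine training/validation splits that share the same hold-out patients.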
An external test dataset from Tongji Hospital (denoted as the TJ-Test), containing 473 ECGs from 469 new patients, was constructed for external validation (see the label distributions of each class in Fig. S1B). To assess the generalization ability of the ECG-Cor-Net model across centers, we built another external dataset from Jiangxia Hospital, Wuhan, China (denoted as the JX-Test), containing 177 ECGs from 166 patients. The age and sex information of each class in the cross-validation training, TJ-Test, and JX-Test sets is presented in Table S5. We also use the public PTB-XL dataset (8081 clinical 12-lead ECGs after preprocessing) to examine the generalizability of the proposed approach.
Figure 2
We analyzed the cross-validation training dataset and the TJ-Test set and collected the patients’ electronic health records (EHRs), as summarized in Table S13. The cross-validation dataset contains more inpatients than outpatients, whereas the TJ-Test set contains similar numbers of inpatients and outpatients. Most patients come from various provinces of China, including Hubei, Henan, and Anhui. In terms of etiology, congenital cardiovascular diseases rank first in the RVH and/or RVD (28.1%) and RAE (15.1%) groups, and hypertension ranks first in the LVH and/or LVD group (5.5%). In the LAE group, valvular diseases rank first (7.9%) and hypertension ranks second (5.4%). Regarding comorbidities, coronary heart disease is the most prevalent in patients with LVH and/or LVD (6.8%), RVH and/or RVD (5.7%), and LAE (13.9%), whereas atrial fibrillation ranks first in patients with RAE (8.4%). Furthermore, 24,933 of the 80,007 ECGs (31.2%) in the cross-validation training dataset are labeled LVH and/or LVD, a proportion consistent with the ranges reported in previous studies.33
High Multi-Label Rates in LVH and/or LVD, RVH and/or RVD, LAE, and RAE
All ECGs are classified into five classes according to their echocardiographic findings: normal, LVH and/or LVD, RVH and/or RVD, LAE, and RAE. After exclusion, 5493, 11,023, 24,933, and 37,153 ECG samples with RVH and/or RVD, RAE, LVH and/or LVD, and LAE, respectively, remained for analysis (Fig. 3B). All identified patients (except normal) had at least one abnormality in the left and/or right atria and/or ventricles. A multi-label ECG summary of each class in the cross-validation training dataset is presented in Figure 3 (Fig. S1A shows the overall label distributions). Overall, ECGs labeled LAE account for 46% (37,153/80,007) of all ECGs in the dataset, and ECGs labeled LVH and/or LVD account for almost 31% (24,933/80,007). This indicates that most cardiac hypertrophy, dilation, and enlargement occurs in the left atrium and ventricle. Figure 3B shows that RVH and/or RVD and RAE have high multi-label percentages of 98% and 75%, respectively, indicating that patients with RVH and/or RVD or RAE often have abnormalities in other areas of the heart. LVH and/or LVD and LAE have relatively low multi-label percentages (36% and 31%, respectively). In addition, Figure S1A shows that a large proportion of RVH abnormalities (94.8%; 5209/5493) are also diagnosed as RAE, suggesting that enlargement of the RV is highly correlated with an increase in the size of the RA. Such a high multi-label rate reflects the frequent co-occurrence of LVH and/or LVD, RVH and/or RVD, LAE, and RAE, and it complicates model training. The mean values and standard deviations of the key indicators for diagnosing cardiac hypertrophy, dilation, and enlargement, such as LV, RV, LA, RA, and aortic root diameters, are presented in Tables S2 and S4 for the multi-label and single-label patients, respectively. Missing values for these indicators are summarized in Table S3.
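The multi-label percentages and the conditional co-occurrence above are simple ratios over the per-ECG label sets. For example, 5209/5493 ≈ 0.948 of ECGs labeled RVH and/or RVD are also labeled RAE. A minimal sketch follows; the function names and class strings are hypothetical.

```python
from typing import Dict, Iterable, List, Set

ABNORMAL = ("RVH/RVD", "RAE", "LVH/LVD", "LAE")  # hypothetical class strings


def multilabel_rates(label_sets: Iterable[Set[str]], classes=ABNORMAL) -> Dict[str, float]:
    """Fraction of ECGs carrying class c that also carry at least one other label."""
    sets: List[Set[str]] = list(label_sets)
    rates = {}
    for c in classes:
        with_c = [s for s in sets if c in s]
        rates[c] = sum(len(s) > 1 for s in with_c) / len(with_c) if with_c else 0.0
    return rates


def cooccurrence(label_sets: Iterable[Set[str]], given: str, also: str) -> float:
    """Estimate P(also | given) as a ratio of ECG counts."""
    with_given = [s for s in label_sets if given in s]
    return sum(also in s for s in with_given) / len(with_given) if with_given else 0.0
```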
Figure 3
Multi-Label Performance on the TJ-Test Set
We conducted nine rounds of cross-validation training. In each round, the first 90% of the data (i.e., the 8:1 portion of the 8:1:1 split) was randomly re-split into training and validation sets, yielding nine different models for performance assessment on all test sets. The detection results of the ECG-Cor-Net model on the external TJ-Test set are compared with those of six ECG physicians (9.67±6.26 years of experience). The physicians were divided into two groups according to their working experience: physicians one to three have 0 to 6 years of experience, and physicians four to six have 6 to 24 years of experience. The ROC and precision-recall curves for RVH and/or RVD, RAE, and normal are shown in Figure 4, and the corresponding single-label metrics are listed in Table 1. The results for all five classes are shown in Figures S3 and S4. The details of the participating physicians are provided in Table S1. Consistent with clinical practice, the physician markers in Figure 4A cluster in the lower left corner of the plots for RVH and/or RVD and RAE. This indicates low sensitivity (the probability of a positive diagnosis given that the heart truly has hypertrophy/dilation) and high specificity (the probability of a negative diagnosis given that the heart truly has no hypertrophy/dilation) of clinical ECG readings for cardiac hypertrophy, dilation, and enlargement. In contrast, the ECG-Cor-Net model achieved high AUROC values of 0.837, 0.881, 0.888, 0.898, and 0.929 for LAE, LVH and/or LVD, RAE, RVH and/or RVD, and normal, respectively. Similarly, in Figure 4B, the physician markers lie toward the left of the plots for the abnormal classes, indicating low recall (equivalent to the sensitivity in Fig. 4A) of clinical ECG readings for cardiac hypertrophy, dilation, and enlargement.
Figure 4
Table 1
Metrics | RVH and/or RVD | RAE | LVH and/or LVD | LAE | Normal | Average |
---|---|---|---|---|---|---|
Sensitivity | | | | | | |
Model sensitivity | 0.440±0.052† | 0.417±0.042† | 0.565±0.080† | 0.612±0.064† | 0.898±0.026 | 0.586±0.053† |
0–6 years’ experience mean sensitivity | 0.029±0.013 | 0.018±0.031 | 0.282±0.159 | 0.138±0.230 | 0.908±0.126 | 0.275±0.112 |
6–24 years’ experience mean sensitivity | 0.065±0.038 | 0.036±0.047 | 0.167±0.086 | 0.085±0.025 | 0.974±0.019† | 0.265±0.043 |
Specificity | | | | | | |
Model specificity | 0.977±0.006 | 0.975±0.008 | 0.906±0.027 | 0.843±0.023 | 0.803±0.04† | 0.901±0.021† |
0–6 years’ experience mean specificity | 0.993±0.005† | 0.990±0.017 | 0.944±0.043 | 0.942±0.101 | 0.299±0.258 | 0.834±0.084 |
6–24 years’ experience mean specificity | 0.988±0.010 | 0.990±0.010† | 0.985±0.012† | 0.982±0.007† | 0.181±0.076 | 0.825±0.023 |
F1-score | | | | | | |
Model F1-score | 0.531±0.038† | 0.521±0.039† | 0.614±0.044† | 0.655±0.039† | 0.855±0.005† | 0.635±0.033† |
0–6 years’ experience mean F1-score | 0.053±0.024 | 0.028±0.049 | 0.373±0.144 | 0.167±0.270 | 0.691±0.101 | 0.262±0.089 |
6–24 years’ experience mean F1-score | 0.107±0.049 | 0.059±0.075 | 0.268±0.110 | 0.152±0.041 | 0.692±0.017 | 0.256±0.058 |
Accuracy | | | | | | |
Model accuracy | 0.925±0.005† | 0.909±0.008† | 0.818±0.012† | 0.754±0.017† | 0.850±0.009† | 0.851±0.010† |
0–6 years’ experience mean accuracy | 0.899±0.005 | 0.875±0.011 | 0.770±0.111 | 0.634±0.026 | 0.599±0.070 | 0.755±0.045 |
6–24 years’ experience mean accuracy | 0.899±0.006 | 0.877±0.003 | 0.770±0.011 | 0.638±0.011 | 0.572±0.035 | 0.751±0.013 |

* LAE denotes left atrial enlargement; LVH and/or LVD, left ventricular hypertrophy and/or dilation; RAE, right atrial enlargement; and RVH and/or RVD, right ventricular hypertrophy and/or dilation.
† Indicates the highest score for each metric per category.
We found that both physician groups (0 to 6 and 6 to 24 years of experience) obtained sensitivities below 0.100 for RAE and RVH and/or RVD: 0.018±0.031 and 0.029±0.013, respectively, for the 0 to 6 year group, and 0.036±0.047 and 0.065±0.038, respectively, for the 6 to 24 year group. This confirms that the rate of missed diagnoses in clinical ECG interpretation is extremely high. The ECG-Cor-Net model substantially improves diagnostic performance compared with the physicians, and its AUROCs (Fig. 4A) and AUPRCs (Fig. 4B) are significantly higher than theirs. The F1-score increased from the physicians’ 0.159±0.173 for LAE, 0.321±0.128 for LVH and/or LVD, 0.044±0.059 for RAE, 0.080±0.046 for RVH and/or RVD, and 0.692±0.012 for normal to the model’s 0.655±0.039, 0.614±0.044, 0.521±0.039, 0.531±0.038, and 0.855±0.005, respectively. The sensitivity for the abnormal classes increased from the physicians’ 0.111±0.149 for LAE, 0.224±0.131 for LVH and/or LVD, 0.027±0.037 for RAE, and 0.047±0.032 for RVH and/or RVD to the model’s 0.612±0.064, 0.565±0.080, 0.417±0.042, and 0.440±0.052, respectively. The ECG-Cor-Net model also outperforms the ECG-Cor-Net-1D model (without 2D image input) (Fig. 4, Figs. S3 and S4).
More importantly, we examined the multi-label diagnostic results using five multi-label diagnostic metrics: subset accuracy, Hamming loss, average F1-score, average recall, and average precision. Figure 5 compares the multi-label diagnostic performance of the proposed ECG-Cor-Net with that of the six physicians. Subset accuracy is a stringent metric that requires an exact match between the predicted and true labels and penalizes all mismatched predictions equally, whether they are nearly correct or entirely incorrect. Hamming loss measures the fraction of individual labels that are misclassified. On the TJ-Test set, the ECG-Cor-Net model outperforms all the physicians. We further compared the subset accuracy of the ECG-Cor-Net and ECG-Cor-Net-1D models with that of the physicians (see Fig. S5A). Subset accuracy improved from the physicians’ 0.491±0.017 to ECG-Cor-Net-1D’s 0.569±0.016 and further to ECG-Cor-Net’s 0.592±0.016. We also validated the model’s performance on ECGs from different devices in the test set and report results for the two most common devices (anonymized as manufacturer 1 and manufacturer 2). The model performs similarly on ECGs from both manufacturers in diagnosing the above abnormalities (see Table S7).
Figure 5
Multi-Label Performance on the External JX-Test Set
To assess the performance of the proposed model across hospitals, ECG-Cor-Net was externally tested on the JX-Test dataset, and the results are presented in Table S8. As with the TJ-Test set, we place particular emphasis on the sensitivity and F1-score, because these values tend to be consistently low in real-world clinical diagnosis. The ECG manifestations of cardiac hypertrophy, dilation, and enlargement are limited, which leads to frequent missed diagnoses by ECG physicians. Apart from RVH and/or RVD, for which the record of only one patient was available in the JX-Test set, the experimental findings support the efficacy of the proposed ECG-Cor-Net model in diagnosing LVH and/or LVD, LAE, and RAE on ECGs obtained from another hospital.
Multi-Label Performance on the Hold-Out Test Set of Cross-Validation
During cross-validation training, as shown in Figure 2, the last 10% of the patient IDs were fixed and not used for training and validation, serving as the Hold-out Test set. The average performances of the different validation metrics are presented in Table S9. These results closely align with the test set results of the TJ-Test set in Table 1 and JX-Test set in Table S8, further confirming the appropriateness of our data partitioning approach and the model’s robust generalization capabilities.
Explainable Diagnosis Guides: Lead Significance
Next, we investigate the highest-performing leads for detecting cardiac hypertrophy, dilation, and enlargement. This analysis aims to provide ECG physicians with more definitive guidance, complementing the results of the black-box, deep learning–based ECG-Cor-Net model. To this end, we propose an aggregated attribution score for each lead, l, of each class, based on the expected gradients of the proposed ECG-Cor-Net at each time point. The higher the score, the more important lead l is for the correct prediction of that class. To avoid interference of ECG properties between different types of abnormalities, we use the four trained binary classification models to calculate the aggregated attribution scores.
Figure 6 shows the score of each lead for each class, together with the average over the four classes. The scores suggest that leads I, II, and aVR among the limb leads and V1, V5, and V6 among the precordial leads are more important than the other leads in the diagnosis of RVH and/or RVD. For RAE, the scores for leads I and V1 reach 0.161 and 0.108, respectively, making them the top two leads. For LVH and/or LVD and LAE, the scores for leads I, II, and aVR among the limb leads and V1, V2, and V4 among the precordial leads are all notably high. In Figure 6, the three highest-scoring representative leads are I/II among the limb leads and V1, followed by V5/V6, among the precordial leads.
Figure 6
Performance of Simplified Models Trained with Representative Leads
To further validate the effectiveness of the five representative leads identified (I, II, aVR, V1, and V5), we retrained the model with only four input leads chosen from them. Using the same training dataset, we obtained two simplified models: ECG-Cor-Net-rep-I, with leads I, aVR, V1, and V5 as inputs, and ECG-Cor-Net-rep-II, with leads II, aVR, V1, and V5. We tested the simplified models on the TJ-Test set. Their subset accuracy and 95% confidence intervals (CIs), compared with those of the original ECG-Cor-Net model, are shown in Figure 7. Their AUROC and AUPRC values, compared with those of the ECG-Cor-Net and ECG-Cor-Net-1D models, are shown in Figures S6 and S7, respectively. The subset accuracy of ECG-Cor-Net-rep-I and ECG-Cor-Net-rep-II is slightly lower than that of the original ECG-Cor-Net model, decreasing from 0.592 to 0.551 and 0.534, respectively. However, both simplified models remain above the physicians’ average subset accuracy (0.491). Specifically, ECG-Cor-Net-rep-II shows an average sensitivity of 0.554, specificity of 0.894, F1-score of 0.603, and accuracy of 0.833 in diagnosing heart structure abnormalities and the normal group (details in Table S12). The AUROC and AUPRC scores of the two simplified models are comparable to those of the ECG-Cor-Net and ECG-Cor-Net-1D models.
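Constructing the four-channel inputs amounts to selecting rows from a 12-lead array. The sketch below assumes the conventional lead order (I, II, III, aVR, aVL, aVF, V1–V6) and an array of shape (12, n_samples). The helper name `select_leads` is hypothetical, and the actual storage order of the study data is not stated.

```python
import numpy as np

# Conventional 12-lead order (assumption about how the recordings are stored).
STANDARD_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
                  "V1", "V2", "V3", "V4", "V5", "V6"]
REP_I = ["I", "aVR", "V1", "V5"]    # inputs of ECG-Cor-Net-rep-I
REP_II = ["II", "aVR", "V1", "V5"]  # inputs of ECG-Cor-Net-rep-II


def select_leads(ecg, leads, order=STANDARD_LEADS):
    """Return the rows of a (12, n_samples) ECG array for the named leads, in that order."""
    idx = [order.index(name) for name in leads]  # raises ValueError for unknown names
    return np.asarray(ecg)[idx]
```

Because only the input rows change, the same training pipeline can be reused for the reduced-channel models, with the first layer sized for four channels instead of twelve.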
Figure 7
Discussion
This study established a large and comprehensive dataset with echocardiography results as the gold standard for model training. The large data volume supports the generalization performance of the model. The smallest class, RVH and/or RVD, contained 5,493 samples, and the largest, LVH and/or LVD, contained 37,153. The ratios of sample numbers across the four abnormalities are consistent with their prevalence in the real population. Tables S2 and S4 show the measurement values and standard deviations of key indicators of cardiac hypertrophy, dilation, and enlargement for patients in the multi- and single-label training datasets. These statistics show both correlations and distinctions among the five classes (the four abnormalities and the normal group).
Table S13 also shows descriptive statistics on the care settings, etiologies, and comorbidities of the patients from Tongji Hospital. Regarding etiology, congenital heart diseases account for the highest proportion of RV and RA abnormalities, and hypertension accounts for the highest proportion of LV abnormalities. Valvular heart diseases and hypertension account for a high proportion of LA abnormalities. Among patients with LA and LV abnormalities, a high proportion had comorbid coronary heart disease and cerebrovascular disease. Among those with RA and RV abnormalities, a high proportion had comorbid coronary heart disease and atrial fibrillation. This suggests that our dataset is consistent with the pathophysiological characteristics of atrial and ventricular wall or chamber abnormalities.34,35
The model architecture is shown in Figure S2. In a 2D image representation, morphological changes such as waveform variations, widths, slopes, and hidden spatial context can be observed more easily. The 1D time series, by contrast, is more sensitive to frequency and interval changes. Combining the two modalities can therefore help train an effective model of atrial and ventricular hypertrophy, dilation, and enlargement. Regarding the model's effectiveness, rationality, and generalization, the results show that ECG-Cor-Net has high discriminatory ability in distinguishing abnormal cases from normal controls based on 12-channel ECG data. In terms of subset accuracy, the proposed model outperformed the physicians by about 10 percentage points (0.592 for the model vs. an average of 0.491 for physicians). The comparisons in Figure 4 (complete results in Figs. S3 and S4) and Figure S5 show that, with the aid of the image modality, diagnostic ability improved for all four abnormalities, while diagnostic ability for the normal cohort remained the same. This suggests that ECG-based diagnostic accuracy for these abnormalities is low, even for experienced physicians. We also validated the model on single-label binary classification, obtaining an average accuracy of 0.838, AUROC of 0.914, and F1-score of 0.835. These values are consistent with, and in some cases superior to, those of models reported in the current literature.36
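To make the 1D/2D distinction concrete, the block below rasterizes a single lead into a binary image. This section does not describe how ECG-Cor-Net builds its 2D input, so `rasterize_lead` and its min-max scaling are illustrative assumptions only. The sine wave is a synthetic stand-in for a recorded lead.

```python
import numpy as np


def rasterize_lead(signal, height=64):
    """Render one ECG lead as a binary image of shape (height, len(signal)).

    Amplitudes are min-max scaled to row indices with row 0 at the top, so the
    largest amplitude lands in row 0, as on a printed tracing. One pixel is set
    per time step; a real plotting pipeline would also draw connecting segments.
    """
    signal = np.asarray(signal, dtype=float)
    span = signal.max() - signal.min()
    scaled = (signal - signal.min()) / span if span > 0 else np.zeros_like(signal)
    rows = np.round((1.0 - scaled) * (height - 1)).astype(int)
    image = np.zeros((height, signal.size), dtype=np.uint8)
    image[rows, np.arange(signal.size)] = 1
    return image


# Synthetic sine "lead" in place of a real recording.
lead = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
img = rasterize_lead(lead, height=32)
```

In such an image, widths and slopes become spatial patterns that 2D convolutions can respond to. The original 1D array keeps exact sample timing for interval-related features.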
Sensitivity, an important indicator, was greatly improved compared with that of physicians using ECG as the single input source. With ECG-Cor-Net, the average sensitivity increased from 0.270 for physicians to 0.586, roughly a twofold increase. This indicates that more than half of the cases could potentially be detected during routine monitoring. Furthermore, ECG-Cor-Net showed consistently superior performance on data from another hospital (JX-Test set) and on a public dataset (PTB-XL set), supporting the generalizability of the model. These results were accompanied by superior performance in single-label binary classification of LAE, LVH and/or LVD, RAE, and RVH and/or RVD. Hence, we believe that ECG has the potential to serve as a refined screening approach for the early diagnosis of cardiac hypertrophy, dilation, and enlargement. It would offer a low-cost, reliable, and convenient alternative to more expensive screening methods.
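The average sensitivity, specificity, and F1-score reported here are summaries over classes. The block below computes one-vs-rest metrics for each label column and then an unweighted (macro) average. The averaging scheme and the function names are assumptions, not the study's documented procedure, and the two-column toy labels are hypothetical.

```python
import numpy as np


def per_class_metrics(y_true, y_pred):
    """One-vs-rest sensitivity, specificity, F1, and accuracy for every label column."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred, axis=0)
    tn = np.sum(~y_true & ~y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "sensitivity": tp / (tp + fn),  # NaN if a class has no positives
            "specificity": tn / (tn + fp),
            "f1": 2 * tp / (2 * tp + fp + fn),
            "accuracy": (tp + tn) / len(y_true),
        }


def macro_average(metrics):
    """Unweighted mean over classes, skipping classes where a metric is undefined (NaN)."""
    return {name: float(np.nanmean(values)) for name, values in metrics.items()}


# Two hypothetical label columns, four records.
y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
avg = macro_average(per_class_metrics(y_true, y_pred))
```

Applied to the reported numbers, 0.586 / 0.270 ≈ 2.17. This ratio is the basis for describing the gain in average sensitivity as roughly twofold.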
In addition, an attribution score analysis identified the five highest-performing leads (I, aVR, V1, V5, and V6). This result aligns with established diagnostic criteria and further supports the reliability of ECG-Cor-Net. It also has practical implications for human ECG annotation and for the design of portable devices, because it suggests a more targeted approach with fewer embedded channels. From these leads, we selected four representative leads and trained a simplified model using them as input. This simplified model achieved results comparable to those of the original ECG-Cor-Net, providing technical support for convenient ECG examinations and wearable devices.
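A reduced-channel model first needs the chosen leads sliced out of each 12-lead record. The block below is a minimal sketch. It assumes records are stored as (12, n_samples) arrays in the conventional lead order. That order, the 5000-sample placeholder length (for example, 10 s at 500 Hz), and `select_leads` are all hypothetical.

```python
import numpy as np

# Conventional 12-lead channel order; the order used in the study's files is an assumption.
STANDARD_LEADS = ("I", "II", "III", "aVR", "aVL", "aVF",
                  "V1", "V2", "V3", "V4", "V5", "V6")

REP_I_LEADS = ("I", "aVR", "V1", "V5")    # inputs named for ECG-Cor-Net-rep-I
REP_II_LEADS = ("II", "aVR", "V1", "V5")  # inputs named for ECG-Cor-Net-rep-II


def select_leads(ecg, leads, order=STANDARD_LEADS):
    """Return the rows of a (12, n_samples) ECG array for `leads`, in the order given."""
    ecg = np.asarray(ecg)
    if ecg.shape[0] != len(order):
        raise ValueError(f"expected {len(order)} leads, got {ecg.shape[0]}")
    return ecg[[order.index(name) for name in leads]]


ecg = np.arange(12 * 5000, dtype=float).reshape(12, 5000)  # placeholder 12-lead record
x_rep_i = select_leads(ecg, REP_I_LEADS)    # shape (4, 5000)
x_rep_ii = select_leads(ecg, REP_II_LEADS)  # shape (4, 5000)
```

The network's input layer would then take four channels instead of twelve. A four-channel wearable device could presumably record these leads directly, so no slicing step would be needed.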
This research contributes to the field by proposing a shift in the diagnostic paradigm for cardiac hypertrophy, dilation, and enlargement. It emphasizes the utility of ECG in routine physical examinations and daily health monitoring for both cardiac wall and chamber abnormalities, at both the single- and multi-label levels. The identified representative leads and the proposed deep learning model have the potential to reshape clinical practice by offering a more accessible and cost-effective means of diagnosing hypertrophy, dilation, and enlargement of the atria and ventricles. Consequently, the proposed model could improve the detection rate of atrial enlargement and ventricular hypertrophy/dilation in patients who have undergone only a four-channel ECG (leads I, aVR, V1, and V5), without the need for echocardiography.
Limitations
This study has some limitations. ECGs of patients younger than 18 years were not included, and patients were not grouped by age, owing to differences in diagnostic guidelines. Therefore, the performance of ECG-Cor-Net in children and adolescents remains unknown. In addition, patients were not grouped by etiology, and the pathogenesis of hypertrophy or enlargement was not analyzed; such an analysis could be an important resource for explaining AI-enabled diagnoses. Further research along these lines is possible, because our data include echocardiography-paired ECGs and EHRs that record etiologies. Finally, our study is based on a retrospective cohort dataset. Most patients are from China, and we did not collect racial or ethnic background information for those who were not.
Notes
A data sharing statement provided by the authors is available with the full text of this article.
This work is funded by the National Natural Science Foundation of China (grant number 82100531), the Hubei Science and Technology Project of China (grant number 2017ACB644), and the Horizontal Cooperation Project (grant number 2023100).
Disclosure forms provided by the authors are available with the full text of this article.
Supplementary Material
References
1.
Maron BJ, Gardin JM, Flack JM, Gidding SS, Kurosaki TT, Bild DE. Prevalence of hypertrophic cardiomyopathy in a general population of young adults: echocardiographic analysis of 4111 subjects in the CARDIA study. Circulation 1995;92:785-789.
2.
Maron BJ, McKenna WJ, Danielson GK, et al. American College of Cardiology/European Society of Cardiology clinical expert consensus document on hypertrophic cardiomyopathy: a report of the American College of Cardiology Foundation Task Force on Clinical Expert Consensus Documents and the European Society of Cardiology Committee for Practice Guidelines. J Am Coll Cardiol 2003;42:1687-1713.
3.
Maron MS, Maron BJ, Harrigan C, et al. Hypertrophic cardiomyopathy phenotype revisited after 50 years with cardiovascular magnetic resonance. J Am Coll Cardiol 2009;54:220-228.
4.
Maron BJ, Bonow RO, Cannon RO III, Leon MB, Epstein SE. Hypertrophic cardiomyopathy. N Engl J Med 1987;316:844-852.
5.
Romhilt DW, Estes EH Jr. A point-score system for the ECG diagnosis of left ventricular hypertrophy. Am Heart J 1968;75:752-758.
6.
Kabutoya T, Hoshide S, Kario K. Advances and challenges in the electrocardiographic diagnosis of left ventricular hypertrophy in hypertensive individuals. Am J Hypertens 2020;33:819-821.
7.
Nakamura M, Sadoshima J. Mechanisms of physiological and pathological cardiac hypertrophy. Nat Rev Cardiol 2018;15:387-407.
8.
Maron BJ, Wolfson JK, Ciró E, Spirito P. Relation of electrocardiographic abnormalities and patterns of left ventricular hypertrophy identified by 2-dimensional echocardiography in patients with hypertrophic cardiomyopathy. Am J Cardiol 1983;51:189-194.
9.
Maron BJ, Gottdiener JS, Goldstein RE, Epstein SE. Hypertrophic cardiomyopathy: the great masquerader: clinical conference from the Cardiology Branch of the National Heart, Lung, and Blood Institute, Bethesda, Md. Chest 1978;74:659-670.
10.
Cosio FG, Moro C, Alonso M, de la Calzada CS, Llovet A. The Q waves of hypertrophic cardiomyopathy: an electrophysiologic study. N Engl J Med 1980;302:96-99.
11.
Mousavi S, Afghah F. Inter- and intra-patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE 2019:1308-1312.
12.
Majumdar A, Ward R. Robust greedy deep dictionary learning for ECG arrhythmia classification. 2017 International Joint Conference on Neural Networks (IJCNN). IEEE 2017:4400-4407.
13.
Baloglu UB, Talo M, Yildirim O, San Tan R, Acharya UR. Classification of myocardial infarction with multi-lead ECG signals and deep CNN. Pattern Recognit Lett 2019;122:23-30.
14.
Acharya UR, Fujita H, Oh SL, Hagiwara Y, Tan JH, Adam M. Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf Sci 2017;415:190-198.
15.
Attia ZI, Noseworthy PA, Lopez-Jimenez F, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019;394:861-867.
16.
Khurshid S, Friedman S, Reeder C, et al. ECG-based deep learning and clinical risk factors to predict atrial fibrillation. Circulation 2022;145:122-133.
17.
Zhu H, Cheng C, Yin H, et al. Automatic multi-label electrocardiogram diagnosis of heart rhythm or conduction abnormalities with deep learning: a cohort study. Lancet Digit Health 2020;2:e348-e357.
18.
Ran S, Yang X, Liu M, et al. Homecare-oriented ECG diagnosis with large-scale deep neural network for continuous monitoring on embedded devices. IEEE Trans Instrum Meas 2022;71:1-13.
19.
Ko W-Y, Siontis KC, Attia ZI, et al. Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. J Am Coll Cardiol 2020;75:722-733.
20.
Liu C-W, Wu F-H, Hu Y-L, et al. Left ventricular hypertrophy detection using electrocardiographic signal. Sci Rep 2023;13:2556.
21.
Jiang J, Deng H, Xue Y, Liao H, Wu S. Detection of left atrial enlargement using a convolutional neural network-enabled electrocardiogram. Front Cardiovasc Med 2020;7:609976.
22.
Potter EL, Rodrigues CH, Ascher DB, Abhayaratna WP, Sengupta PP, Marwick TH. Machine learning of ECG waveforms to improve selection for testing for asymptomatic left ventricular dysfunction. JACC Cardiovasc Imaging 2021;14:1904-1915.
23.
Khurshid S, Friedman S, Pirruccello JP, et al. Deep learning to predict cardiac magnetic resonance–derived left ventricular mass and hypertrophy from 12-lead ECGs. Circ Cardiovasc Imaging 2021;14:e012281.
24.
Siontis KC, Liu K, Bos JM, et al. Detection of hypertrophic cardiomyopathy by an artificial intelligence electrocardiogram in children and adolescents. Int J Cardiol 2021;340:42-47.
25.
Lin G-M, Lu HH-S. A 12-lead ECG-based system with physiological parameters and machine learning to identify right ventricular hypertrophy in young adults. IEEE J Transl Eng Health Med 2020;8:1-10.
26.
Deng Y, Xie M, Zhang Q. Chinese medical imaging (Chinese edition). Beijing: People’s Military Medical Publisher, 2011.
27.
Bonow RO, Carabello BA, Chatterjee K, et al. ACC/AHA 2006 guidelines for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to Revise the 1998 Guidelines for the Management of Patients With Valvular Heart Disease) developed in collaboration with the Society of Cardiovascular Anesthesiologists endorsed by the Society for Cardiovascular Angiography and Interventions and the Society of Thoracic Surgeons. J Am Coll Cardiol 2006;48:e1-e148.
28.
Wagner P, Strodthoff N, Bousseljot R-D, et al. PTB-XL, a large publicly available electrocardiography dataset. Sci Data 2020;7:1-15.
29.
Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30:4765-4774.
30.
Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng 1985;32:230-236.
31.
Doppa JR, Yu J, Ma C, Fern A, Tadepalli P. HC-search for multi-label prediction: an empirical study. Proceedings of the AAAI Conference on Artificial Intelligence 2014;28(1).
32.
Babbar R, Schölkopf B. Data scarcity, robustness and extreme multi-label classification. Mach Learn 2019;108:1329-1351.
33.
Somani S, Hughes JW, Ashley EA, Witteles RM, Perez MV. Development and validation of a rapid visual technique for left ventricular hypertrophy detection from the electrocardiogram. Front Cardiovasc Med 2023;10:1251511.
34.
Yildiz M, Oktay AA, Stewart MH, Milani RV, Ventura HO, Lavie CJ. Left ventricular hypertrophy and hypertension. Prog Cardiovasc Dis 2020;63:10-21.
35.
Guihaire J, Haddad F, Mercier O, Murphy DJ, Wu JC, Fadel E. The right heart in congenital heart disease, mechanisms and recent advances. J Clin Exp Cardiol 2012;8:1-11.
36.
Kwon JM, Jeon K-H, Kim HM, et al. Comparing the performance of artificial intelligence and conventional diagnosis criteria for detecting left ventricular hypertrophy using electrocardiography. Europace 2020;22:412-419.
Information
Copyright
Copyright © 2024 Massachusetts Medical Society.
History
Submitted: December 12, 2023
Accepted: August 1, 2024
Published online: September 26, 2024
Published in issue: September 26, 2024
Data Sharing Statement
Python scripts related to this paper, the test set from Tongji Hospital (TJ-Test set), and 10,000 samples of the cross-validation training set (randomly selected from the full cross-validation training set in proportion to the different abnormalities and the normal class) are available at https://github.com/ecg4hypertrophy/DATA. The public PTB-XL set is available at https://physionet.org/content/ptb-xl/1.0.3/. Data use is restricted to noncommercial research purposes and must comply with the relevant laws and regulations of China. Authors who use data or information from this study should cite this paper.