Bridging the Gap: Missing Data Imputation Methods and Their Effect on Dementia Classification Performance

Aracri F.;Bianco M. G.;Quattrone A.;Sarica A.

2025-01-01

Abstract

Background/Objectives: Missing data is a common challenge in neuroscience and neuroimaging studies, especially in the context of neurodegenerative disorders such as Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD). Inadequate handling of missing values can compromise the performance and interpretability of machine learning (ML) models. This study aimed to systematically compare the impact of five imputation methods on classification performance using multimodal data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).

Methods: We analyzed a dataset including clinical, cognitive, and neuroimaging features from ADNI participants diagnosed with MCI or AD. Five imputation techniques were applied: mean, median, k-Nearest Neighbors (kNN), Multiple Imputation by Chained Equations (MICE), and missForest (MF). Classification tasks were performed using Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). Models were trained on the imputed datasets and evaluated on a test set without missing values. The statistical significance of performance differences was assessed using McNemar’s test.

Results: On the test set, MICE imputation yielded the highest accuracy for both RF (0.76) and LR (0.81), while SVM performed best with median imputation (0.81). McNemar’s test revealed significant differences between RF and both LR and SVM (p < 0.01), but not between LR and SVM. Simpler methods such as mean and median imputation performed adequately but were generally outperformed by MICE. The performance of kNN and MF was less consistent.

Conclusions: Overall, the choice of imputation method significantly affects classification accuracy. Selecting strategies tailored to both data structure and classifier is essential for robust predictive modeling in clinical neuroscience.
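The paired classifier comparison described in the Methods can be illustrated with a minimal sketch of McNemar’s test. The abstract does not say which variant of the test the authors used, so the exact (binomial) form below is an assumption, and the function name `mcnemar_exact` and the toy labels are hypothetical and not taken from the study. The test uses only the discordant cases, that is, test-set samples that one classifier labels correctly and the other does not.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test for two classifiers scored on the same test set.

    Returns (b, c, p_value), where b counts samples that A gets right and B gets
    wrong, and c counts samples that A gets wrong and B gets right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two classifiers are indistinguishable here.
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split 50/50: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]  # classifier A: wrong on the last sample
pred_b = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # classifier B: wrong on the first four

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)  # 4 1 0.375
```

With only five discordant samples the difference is not significant, which shows why the test needs a reasonably sized test set before a gap in accuracy between two classifiers can be called significant.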

Year: 2025

Keywords: Alzheimer’s disease; imputation; machine learning; MICE; missForest

Publication type: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/111003
