Imputation of missing clinical, cognitive and neuroimaging data of Dementia using missForest, a Random Forest based algorithm

Aracri, F; Bianco, MG; Quattrone, A; Sarica, A
2023-01-01

Abstract

Missing values are often encountered in international Neuroscience and Neuroimaging databases. Because many statistical methods and Machine Learning (ML) algorithms are not designed to work with missing data, records containing missing values are usually discarded entirely, which loses information and degrades the performance of classifiers for neurodegenerative diseases such as dementia. A reliable alternative is imputation, which substitutes missing values with estimates, for example the feature mean (I-mean), a widely applied approach. Recently, missForest (MF), a Random Forest based algorithm, has become popular for handling missing data in biomedical research. We therefore aimed to assess the reliability of MF in solving the missingness problem in a cohort of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) patients from the international Alzheimer's Disease Neuroimaging Initiative (ADNI) database, with clinical, cognitive and neuroimaging features. First, we amputed the complete dataset with increasing percentages of missing data (from 10% to 80%) under a Missing Completely At Random (MCAR) mechanism. Then, we applied the I-mean and MF approaches to the amputed datasets and compared their imputation errors (RMSE, NRMSE, MAE). When the error averaged over all features was considered, MF performed better than I-mean at every amputation percentage. However, when the error was compared on single features, MF performed slightly worse than I-mean on the cognitive features ADAS, RAVLT and MMSE, regardless of the amputation percentage. We conclude that missForest proved to be a reliable imputation algorithm for handling missing neuroscience data, although it should be used with caution on highly skewed variables, such as cognitive scores.
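The evaluation protocol described in the abstract (MCAR amputation, imputation, error on the amputed entries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than ADNI, the function names are hypothetical, only mean imputation is implemented (missForest itself is not reproduced), and NRMSE is taken as RMSE divided by the standard deviation of the true amputed values, a definition the abstract does not specify.

```python
import numpy as np


def ampute_mcar(X, fraction, rng):
    """Set the same fraction of entries to NaN in every column, chosen uniformly at random (MCAR)."""
    n_rows, n_cols = X.shape
    n_miss = int(round(fraction * n_rows))
    mask = np.zeros(X.shape, dtype=bool)
    for j in range(n_cols):
        rows = rng.choice(n_rows, size=n_miss, replace=False)
        mask[rows, j] = True
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask


def impute_mean(X_miss):
    """Replace each NaN with the mean of the observed values in its column (I-mean)."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)


def imputation_errors(X_true, X_imp, mask):
    """Compute RMSE, NRMSE and MAE on the amputed entries only."""
    diff = X_imp[mask] - X_true[mask]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Assumed normalisation: standard deviation of the true amputed values.
    nrmse = rmse / float(np.std(X_true[mask]))
    mae = float(np.mean(np.abs(diff)))
    return {"rmse": rmse, "nrmse": nrmse, "mae": mae}


# Synthetic stand-in for a complete dataset (200 subjects, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

results = {}
for pct in range(10, 90, 10):  # 10% to 80% missing
    X_miss, mask = ampute_mcar(X, pct / 100, rng)
    X_imp = impute_mean(X_miss)
    results[pct] = imputation_errors(X, X_imp, mask)

for pct, errs in results.items():
    print(pct, {name: round(value, 3) for name, value in errs.items()})
```

A Random Forest based imputer could be substituted for `impute_mean` in the loop and scored with the same `imputation_errors` call, which is the shape of the comparison the abstract reports. Per-feature errors follow by applying the same computation to one column of `mask` at a time.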
Imputation
MissForest algorithm
Mean imputation algorithm
ADNI dataset
Alzheimer's disease

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/90641