Automated Fazekas classification from brain MRI reports: an artificial intelligence approach with GPT-4

Di Gennaro, Gianfranco;
2025-01-01

Abstract

Background: This study evaluates the effectiveness and reliability of GPT-4 in classifying radiological reports according to the Fazekas scale, a critical tool for assessing white matter signal abnormalities in brain MRI. We used synthetic data creation and two specific GPT models, SinteticRMFazekasGPT and FazekasGPT, to generate and analyze 50 synthetic radiological reports. The Fazekas classifications assigned by GPT-4 to these brain MRI reports were compared with the expert judgment of a neuroradiologist. Results: Our analysis included a contingency table and Cohen's Kappa for inter-rater agreement. The significance of the difference between the observed agreement and the agreement expected by chance was tested, with a 5% threshold for Type I error. Agreement between GPT-4 and the neuroradiologist was total (100%) for Fazekas 0, Fazekas 2, and Fazekas 3. Of the 15 reports with Fazekas 1, 13 (86.7%) were correctly classified by GPT-4, while the remaining 2 (13.3%) were classified as Fazekas 2. Overall, the agreement was 96%, compared with an expected chance agreement of 28%. The Cohen's Kappa value was 0.94 (p < 0.001), indicating almost perfect agreement. Conclusions: We report a novel application of GPT-4 to automatically obtain Fazekas classifications from brain MRI reports. The results suggest that GPT-4 is a promising supportive tool for this task.
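As an illustration of how the reported Kappa follows from the two agreement figures in the abstract, the minimal Python sketch below applies the standard Cohen's Kappa formula, kappa = (p_o - p_e) / (1 - p_e). The function names are hypothetical and are not taken from the study; only the observed agreement (96%) and the expected chance agreement (28%) come from the abstract. The full contingency table is not reproduced here, so the table-based helper is shown only as a general form.

```python
from typing import Sequence


def kappa_from_agreement(p_observed: float, p_expected: float) -> float:
    """Cohen's Kappa from observed and chance-expected agreement proportions."""
    if not 0.0 <= p_expected < 1.0:
        raise ValueError("p_expected must be in [0, 1)")
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_from_table(table: Sequence[Sequence[int]]) -> float:
    """Cohen's Kappa from a square contingency table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of reports on the main diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Expected agreement: sum over classes of the product of marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return kappa_from_agreement(p_observed, p_expected)


# Values reported in the abstract: 96% observed, 28% expected by chance.
kappa = kappa_from_agreement(0.96, 0.28)
print(round(kappa, 2))  # 0.94
```

With the abstract's values, (0.96 - 0.28) / (1 - 0.28) is approximately 0.944, consistent with the reported Kappa of 0.94.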
Artificial intelligence
Diagnostic accuracy
Fazekas scale
GPT models
GPTs
Large language models
Neurological imaging
Radiology
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/109741