Word-Embeddings and Grammar Features to Detect Language Disorders in Alzheimer’s Disease Patients

Guerrero-Cristancho, Juan S.; Vásquez-Correa, Juan C.; Orozco-Arroyave, Juan R.

doi:10.22430/22565337.1387

TecnoLógicas

Print version ISSN 0123-7799; On-line version ISSN 2256-5337

Abstract

GUERRERO-CRISTANCHO, Juan S.; VASQUEZ-CORREA, Juan C. and OROZCO-ARROYAVE, Juan R. Word-Embeddings and Grammar Features to Detect Language Disorders in Alzheimer’s Disease Patients. TecnoL. [online]. 2020, vol.23, n.47, pp.63-75. ISSN 0123-7799. https://doi.org/10.22430/22565337.1387.

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects patients' language production and thinking capabilities. Over time, the integrity of the brain is destroyed as interactions between neurons and the associated cells required for normal brain functioning are interrupted. AD involves deterioration of communicative skills, reflected in deficient speech that usually lacks coherent information and shows low idea density and poor grammar. Patients also have difficulty finding appropriate words to structure sentences. Multiple ongoing studies aim to detect the disease from this deterioration of language production, using Natural Language Processing techniques to find patterns that characterize patients' language impairments. This paper covers advances in pattern recognition using word-embedding and word-frequency features, and introduces a new approach based on grammar features. We processed transcripts of 98 AD patients and 98 healthy controls from the Pitt Corpus of the Dementia-Bank database. A total of 1200 word-embedding features, 1408 Term Frequency-Inverse Document Frequency (TF-IDF) features, and 8 grammar features were extracted from the selected transcripts. Three models are proposed, one per feature set, and a fourth model is based on an early fusion of the three feature sets. All models were optimized following a Leave-One-Out cross-validation strategy. The early fusion of the three feature sets achieved accuracies of up to 81.7 %. Furthermore, the small set of grammar features alone yielded accuracies of up to 72.8 %. The results show that such features are suitable for effectively classifying AD patients and healthy controls.
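The early fusion and Leave-One-Out evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the classifier, so a nearest-centroid rule is swapped in as an assumption. The transcripts, labels, and two-value "grammar" vectors are invented toy values, not data from the Pitt Corpus, and every function name is hypothetical. Word embeddings are omitted to keep the example dependency-free.

```python
import math
from collections import Counter


def tfidf_features(docs):
    """Return one TF-IDF vector per document (relative term frequency times log IDF)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append([counts[w] / length * math.log(n_docs / df[w]) for w in vocab])
    return vectors


def early_fusion(*feature_sets):
    """Early fusion: concatenate each sample's vectors from all feature sets."""
    return [sum((list(v) for v in vecs), []) for vecs in zip(*feature_sets)]


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train_x, train_y, x):
    """Nearest-centroid rule (an assumed stand-in for the paper's classifier)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, y in zip(train_x, train_y) if y == label])
        dist = math.dist(c, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Invented toy inputs, for illustration only.
docs = [
    "the boy takes a cookie from the jar",
    "the mother dries the dishes near the sink",
    "uh the the thing is there",
    "and uh that one there uh",
]
labels = ["control", "control", "patient", "patient"]
grammar = [[0.30, 0.15], [0.28, 0.14], [0.10, 0.05], [0.08, 0.02]]  # hypothetical values

# For brevity, IDF is computed on all documents; a stricter setup would
# recompute it inside each fold so the held-out sample is never seen.
tfidf = tfidf_features(docs)
fused = early_fusion(tfidf, grammar)
acc = leave_one_out_accuracy(fused, labels)
```

Concatenating the feature sets before classification is what makes the fusion "early": a single model sees all features at once, rather than separate models being combined afterwards.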

Keywords: Alzheimer's Disease; Natural Language Processing; Text Mining; Classification; Machine Learning.
