PREDICTING ALZHEIMER’S USING MACHINE LEARNING

Semo Edam
May 29, 2020

Alzheimer’s disease is a progressive disorder that causes brain cells to waste away (degenerate) and die. Alzheimer’s disease is the most common cause of dementia — a continuous decline in thinking, behavioral, and social skills that disrupts a person’s ability to function independently.

There is no treatment that cures Alzheimer’s disease or alters the disease process in the brain. The available medications can only temporarily improve symptoms or slow the rate of decline. These treatments can sometimes help people with Alzheimer’s disease maximize function and maintain independence for a period of time.

Dataset:

The dataset used is Longitudinal MRI Data in Nondemented and Demented Older Adults. It consists of a longitudinal collection of 1968 subjects aged between 60 and 98. Each subject was scanned on two or more visits, separated by at least one year, for a total of 2508 imaging sessions.

COLUMNS:
Subject ID : Patient ID
MRI ID : Patient ID + The visit number
Group : The diagnosis (target)
Visit : The visit number
M/F : Gender
Age : Age
EDUC : Years of education
SES : Socioeconomic Status
MMSE : Mini-Mental State Examination
eTIV : Estimated Total Intracranial Volume
nWBV : Normalized Whole Brain Volume
ASF : Atlas Scaling Factor

Data exploration:

After loading the dataset, I checked its description and shape, then established a baseline score to compare the models against.
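The post does not say which baseline was used. A common choice for classification is the majority-class baseline, so here is a minimal sketch under that assumption. The labels below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the "Group" target column (made-up labels, not real data).
group = pd.Series(
    ["Nondemented", "Demented", "Nondemented", "Nondemented", "Demented"],
    name="Group",
)

# Majority-class baseline: the accuracy you would get by always
# predicting the most frequent class.
class_shares = group.value_counts(normalize=True)
majority_class = class_shares.idxmax()
baseline_accuracy = class_shares.max()

print(majority_class, round(baseline_accuracy, 2))
```

Any model worth keeping should beat this number on the validation set.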

The next step is the train/test split. Splitting the dataset into three sets (train, validation, and test) lets us train the model, score it on the validation set, and only then get the final score on the test set.
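A three-way split can be sketched with two calls to scikit-learn's train_test_split. The 60/20/20 proportions, the stratification, and the random data below are my assumptions for illustration; the post does not state the split sizes it used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with a few of the dataset's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(60, 99, size=100),
    "MMSE": rng.integers(15, 31, size=100),
    "Group": rng.choice(["Nondemented", "Demented"], size=100),
})

# First hold out 20% of the rows as the final test set.
train_val, test = train_test_split(
    df, test_size=0.2, stratify=df["Group"], random_state=42
)
# Then carve a validation set out of the rest (25% of 80% = 20% overall).
train, val = train_test_split(
    train_val, test_size=0.25, stratify=train_val["Group"], random_state=42
)

print(len(train), len(val), len(test))
```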

Linear model:

Starting with a linear model, I used Logistic Regression with OneHotEncoder and got a validation score of 0.81. I also examined the coefficients of the features used, which show the relationship between each feature and the target.

Tree-based model:

After the linear model, I tried a tree-based model, the Random Forest Classifier with OrdinalEncoder, and got a better validation score of 0.93 and a test score of 0.91.

Conclusion:

In conclusion, I used a Shapley value plot to check how the model arrived at its prediction of whether a patient is demented or nondemented.

The graph shows the model predicting that the patient is nondemented with a probability of 84%.

The last graph shows the marginal effect that two features (gender and age) have on the predicted outcome of our model. For each age category and gender, it gives the probability of being nondemented. For example, in the group aged 88 to 98 inclusive, males have a 50% chance of being nondemented, and women in the same age category have a 60% chance. In other words, males aged 88 to 98 have a 50% chance of being demented, and women of the same age have a 40% chance.
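A two-feature marginal effect like this is usually computed as a partial dependence. The sketch below uses scikit-learn's partial_dependence function. I am assuming that tool here, since the post does not name the library behind its plot. The data and the rule generating the labels are made up, so the numbers will not match the percentages quoted above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Made-up data: column 0 is gender (1.0 = male, 0.0 = female), column 1 is age.
rng = np.random.default_rng(0)
n = 300
is_male = rng.integers(0, 2, size=n).astype(float)
age = rng.integers(60, 99, size=n).astype(float)
X = np.column_stack([is_male, age])

# Hypothetical rule: the chance of being nondemented falls with age
# and is slightly lower for males. Label 1 = nondemented.
p_nondemented = 0.9 - 0.01 * (age - 60) - 0.1 * is_male
y = (rng.random(n) < p_nondemented).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Joint partial dependence of the predicted probability of class 1
# on gender and age, averaged over the rows of X.
result = partial_dependence(
    model, X, features=[0, 1], grid_resolution=5, kind="average"
)
pd_grid = result["average"][0]  # rows: gender values, columns: age grid points

print(pd_grid.shape)
print(pd_grid.round(2))
```

Each cell of pd_grid is the average predicted probability of being nondemented for one gender and one age grid point, which is the kind of number the plot reports per age category.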
