Journal of Neurological Sciences (Turkish) 2017 , Vol 34 , Num 1
A Novel Convolutional Neural Network Model Based on Voxel-based Morphometry of Imaging Data in Predicting the Prognosis of Patients with Mild Cognitive Impairment
Füsun Çitak-ER1, Dionysis GOULARAS2, Burcu ORMECİ, the Alzheimer's Disease Neuroimaging Initiative*3
1Yeditepe University, Department of Biotechnology, Istanbul, Turkey
2Yeditepe University, Department of Computer Engineering, Istanbul, Turkey
3Yeditepe University Hospital, Department of Neurology, Istanbul, Turkey

Summary

Objective: Nowadays, it is of great interest to identify neuroimaging biomarkers for the early detection of Alzheimer's disease (AD). It is considered that approximately half of patients with a diagnosis of mild cognitive impairment (MCI) eventually develop Alzheimer's disease, and the other half remain stable. In this context, a novel convolutional neural network (CNN) based on voxel-based morphometric analysis is proposed to predict the prognosis of patients with MCI using their baseline structural magnetic resonance (MR) images.

Methods: Two groups of patients were identified among 305 patients with a diagnosis of MCI: those who developed Alzheimer's disease during their follow-up (n=140), and those who remained stable in the MCI state (n=165). The baseline structural MR images of the patients were used for training and evaluating the proposed prediction model. Voxel-based morphometry of the baseline structural MR images was used to obtain significant volumes of interest (VOIs) related to gray matter damage. Then, a convolutional neural network was trained to extract prognostic features from the MR images using a set of convolutional feature detectors acquired by training a patch-based autoencoder.

Results: The proposed model achieved an accuracy of 78.7% in predicting the risk of developing Alzheimer's disease in patients with MCI, more than 4 percentage points higher than that of a reference study.

Conclusion: The results of this study show that the use of a convolutional neural network using significant topographic regions of the brain is successful in predicting the risk of developing Alzheimer's disease for patients with MCI.

Introduction

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder that accounts for up to 75% of all dementia cases (1). Dementia refers to a set of symptoms characterized by a loss of cognitive abilities that is severe enough to disrupt normal daily life (2). Mild cognitive impairment (MCI) is considered a potential prodromal stage of Alzheimer's disease, defined by a set of clinical symptoms related to cognitive abilities, as in dementia, but not severe enough to affect daily life. It is known that patients with MCI have a higher risk of progressing to AD than people without it, even though some patients with MCI remain stable throughout their lifetime (3). Therefore, it is of great interest to identify biomarkers for patients with MCI that indicate the risk of developing AD. At present, there are still no precise biomarkers available to indicate this kind of risk (4).

Neuroimaging is an essential part of the clinical routine in the diagnosis and monitoring of the progress of AD. Many recent neuroimaging studies on AD encourage the use of various neuroimaging modalities for achieving a differential diagnosis of dementias. Structural magnetic resonance imaging (MRI), functional MRI, perfusion computerized tomography (CT), and positron emission tomography (PET) are the most promising neuroimaging modalities that provide biomarkers for the early detection of AD, such as neuroanatomic atrophy, a metabolite profile of the cerebrospinal fluid (CSF), and anisotropy (5-9). Recent MRI studies demonstrated the greater presence of hippocampal atrophy in patients with AD compared with healthy controls or patients with MCI (10). The CSF levels of amyloid-beta peptide, total tau, and 181-Thr-phosphorylated-tau serve as potential diagnostic biomarkers for distinguishing patients with Alzheimer's disease from cognitively normal elderly controls (NC). Furthermore, the combination of the amyloid-beta peptide level with the ratio of the amyloid-beta peptide level to the 181-Thr-phosphorylated-tau level is a promising prognostic biomarker for predicting the progression to Alzheimer's disease within two years (11). Additionally, a decrease in fractional anisotropy is one of the earliest abnormalities observed in the MRIs of cognitively normal people at risk of developing Alzheimer's disease (12).

Several computer-aided diagnosis (CAD) models based on neuroimaging have been proposed in the literature with different diagnostic and prognostic goals for Alzheimer's disease. These models differentiate AD from MCI, AD from NC, and AD from frontotemporal lobar degeneration (FTLD) (13-18). Other models differentiate MCI from NC (22) or identify patients with MCI at risk of developing AD (13,14,17,22-25). In this context, the goal of this study was to build a CAD model for patients with MCI to predict the risk of developing AD.

The development of a CAD system comprises two important stages: feature extraction and classification. The feature extraction stage provides useful information derived from medical images. Then, the extracted features are used in the classification process in order to perform the diagnosis. Based on this approach, some studies proposed the use of voxel intensities of the whole brain. In these studies, MR images were taken as features for the classification procedure (19-21,26). The diagnostic power of the mean cortical thickness of brain regions was assessed and recommended as a biomarker for the early diagnosis of Alzheimer's disease (27,28). The volume of specific brain regions (e.g. hippocampus, lobes, ventricles) was used to represent MR images as a feature set (17,29-31). Gerardin et al. proposed using features of the shape of the hippocampus as a feature set for CAD systems (32). Furthermore, the use of voxel-intensities of tissue density maps was another feature type in this area (25,33,34).

Using the whole MRI volume results in a feature space that is hard to work with because of its very high dimensionality. For this reason, several methods were proposed for reducing the dimensions of the feature space in order to avoid "the curse of dimensionality". Some studies performed tissue segmentation (with or without atlas-based image labeling) in their feature extraction process. For example, the gray matter tissue volume for each brain region based on the atlas of Kabani et al. was used as a feature set by Suk et al. (14,35). The FastICA method was used to convert segmented gray matter and white matter images into a reduced feature space (22). Instead of using predefined VOI-based methods (like atlas-based labeling), a STructural Abnormality iNDex (STAND)-score was proposed to select a subset of AD-specific voxels for dimensionality reduction of tissue maps (21). In another study, a special index (SPARE-AD) was proposed for selecting a set of voxels as features for classifiers (36). In this context of dimensionality reduction, the current study used a convolutional neural network to deal with the high dimensionality of MR images. Moreover, two additional methods were applied for comparison with the proposed model in order to further evaluate this approach: one taking into account the whole gray matter data and another using principal component analysis for reducing the gray matter data.

Methods

Subjects
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD).

A total of 305 patients with a baseline diagnosis of MCI were taken as the subjects of this study; written informed consent was obtained from them by ADNI before the MR scan data were acquired. The subjects were separated into two groups: those who developed Alzheimer's disease during their follow-up (n=140) and those who remained stable in the MCI state (n=165). In this study, these groups were named converted-MCI (C-MCI) and non-converted-MCI (NC-MCI), respectively. Among the 140 subjects in the C-MCI group, 26 were diagnosed as having AD six months after the baseline assessment, 38 after 12 months, 11 after 18 months, 39 after 24 months, 18 after 36 months, five after 48 months, one after 72 months, and two after 96 months. The baseline structural MR images of those patients were used for training and evaluating the proposed prediction model. This study used preprocessed baseline T1-weighted MR images (3 Tesla) acquired with a magnetization-prepared rapid acquisition gradient echo (MP-RAGE) sequence, with data access permission from ADNI. ADNI provides preprocessed images that have undergone specific image correction steps such as Gradwarp, B1 correction, and N3 intensity non-uniformity correction in order to reduce the risk of scanner bias and the effect of heterogeneity of protocols (37).

Methods
In this study, a single-layer convolutional neural network with a novel pooling approach was developed to predict whether a particular patient with MCI would develop Alzheimer's disease. The baseline structural MR images of the subjects were used for training and evaluating the proposed prediction model. First, a voxel-based morphometric analysis of regional gray matter differences between the two groups of patients was used to obtain the significant VOIs related to gray matter damage. Then, an autoencoder was trained using patches sampled from the previously extracted VOIs to generate feature detectors (also known as "filters"). Finally, a convolutional neural network model was constructed using the acquired feature detectors to extract prognostic features.

Voxel-based morphometric analysis
Voxel-based morphometry (VBM) is a statistical method used in neuroimaging analysis to investigate brain tissue abnormalities (38). The normalization and segmentation steps of VBM were performed using the "VBM8" extension of the SPM8 toolbox in Matlab 2010a (The MathWorks, Inc., Natick, Massachusetts, USA) to acquire gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) tissue probability maps of the aligned structural images. The spatially normalized and modulated GM tissue probability maps were smoothed using SPM8 with a Gaussian kernel with a full-width at half-maximum (FWHM) of 8 mm.
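
As a side note on the smoothing step, a Gaussian kernel specified by its FWHM has a standard deviation of FWHM / (2·sqrt(2·ln 2)). The following minimal Python sketch illustrates this conversion. The function name `fwhm_to_sigma` and the voxel size of 1.5 mm are illustrative assumptions and are not taken from the study.

```python
import math

def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert the FWHM of a Gaussian kernel (in mm) to its standard deviation (in mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(8.0)      # 8 mm FWHM, as used for the GM tissue probability maps
voxel_size_mm = 1.5                # assumed voxel size, for illustration only
sigma_voxels = sigma_mm / voxel_size_mm
print(f"sigma = {sigma_mm:.3f} mm = {sigma_voxels:.3f} voxels")
```

The value in voxel units is what a generic Gaussian filter routine would typically expect when it operates on the image array rather than in millimeters.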

Using the GM tissue probability maps (GM-TPM), a t-map was built by performing a voxel-wise two-sample t-test with a one-sided alternative hypothesis stating that the GM concentration of C-MCI was smaller than that of NC-MCI. The selection of the significance level for a statistical test has repercussions for type-I and type-II errors (39). Type-I errors become more likely when a high significance level (α) is selected, which leads to a low t-value threshold, thus including more voxels and consequently increasing the false-positive rate. Type-II errors become more likely when a low significance level is chosen, which results in a high t-value threshold that excludes more voxels and consequently increases the false-negative rate.

After a series of experiments, the statistical significance level (α), corrected for family-wise error (FWE), was set to 0.005 with an extent threshold of 200 voxels. This analysis, which aimed to define the VOIs indicating a significant GM difference, yielded five VOIs of GM tissue. Finally, VOI statistics and regional labeling were performed using WFU Pickatlas (40).
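
The thresholding logic described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the SPM8 procedure: it uses a pooled-variance t-statistic, applies no FWE correction, and labels clusters with a simple 6-connected flood fill. The function names, the synthetic data, and the toy extent threshold of 20 voxels are hypothetical; only the t-threshold of 5.08 is taken from the text.

```python
from collections import deque
import numpy as np

def two_sample_t_map(group_a, group_b):
    """Voxel-wise pooled-variance t-statistic testing mean(group_a) < mean(group_b).

    Both inputs have shape (n_subjects, X, Y, Z).
    """
    na, nb = group_a.shape[0], group_b.shape[0]
    var_a = group_a.var(axis=0, ddof=1)
    var_b = group_b.var(axis=0, ddof=1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    se = np.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (group_b.mean(axis=0) - group_a.mean(axis=0)) / np.maximum(se, 1e-12)

def clusters_above(t_map, t_threshold, min_extent):
    """Boolean masks of 6-connected supra-threshold clusters with >= min_extent voxels."""
    mask = t_map > t_threshold
    visited = np.zeros(mask.shape, dtype=bool)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    clusters = []
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, voxels = deque([seed]), []
        while queue:                      # breadth-first flood fill from the seed voxel
            v = queue.popleft()
            voxels.append(v)
            for d in offsets:
                n = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not visited[n]:
                    visited[n] = True
                    queue.append(n)
        if len(voxels) >= min_extent:     # extent threshold
            voi = np.zeros(mask.shape, dtype=bool)
            voi[tuple(np.array(voxels).T)] = True
            clusters.append(voi)
    return clusters

# Synthetic stand-in data: 20 subjects per group on a 12x12x12 grid.
rng = np.random.default_rng(0)
shape = (12, 12, 12)
c_mci = rng.normal(0.5, 0.05, size=(20, *shape))
nc_mci = rng.normal(0.5, 0.05, size=(20, *shape))
c_mci[:, 2:6, 2:6, 2:6] -= 0.2            # simulated GM loss in a 4x4x4 block

t_map = two_sample_t_map(c_mci, nc_mci)
vois = clusters_above(t_map, t_threshold=5.08, min_extent=20)
print(len(vois), [int(v.sum()) for v in vois])
```

Lowering `t_threshold` in this sketch admits more voxels (more false positives), whereas raising it excludes more voxels (more false negatives), which mirrors the trade-off between type-I and type-II errors discussed above.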

Generation of feature detectors
In this study, a convolutional neural network (CNN) configuration was used in an effort to reduce the dimensionality problem caused by the large number of voxels. The feature detectors of the CNN were defined with the help of an autoencoder. An autoencoder is a neural network trained by a back-propagation algorithm with the aim of producing an output equal to the input. Therefore, it requires that the output and the input have the same number of nodes. In general, the number of hidden units can be greater or smaller than the number of input units. In our case, in order to reduce the data dimensions, a smaller number of hidden units than input units was selected. The hidden-unit activations are thus considered as the latent representation of the original data. Figure 1 shows an illustration of an autoencoder with four input units and two hidden units in one hidden layer. The weights wi1, where i = 1..4, of the first hidden unit h1 are marked in red. In this example, the weights of each hidden unit hk form a matrix Wk of size 2x2, where k = 1, 2. Each such matrix corresponds to a convolutional filter.

Figure 1. An autoencoder with four input units (x1, x2, x3, x4) and two hidden units (h1, h2).

In this work, 1000 patches of size 11x11 were sampled from each of the GM tissue probability maps within the region determined by the VBM analysis to train an autoencoder with 10 hidden units in the hidden layer. The number of hidden units was determined after a series of extensive tests involving different numbers of hidden units, based on the best success rate of the system. The autoencoder was trained on the vectorized representations of the patches with a linearly decreasing learning rate going from 0.9 to 0.25, a stable learning scale of 0.9, a momentum of 0.2, and an L2-weight regularization penalty of 0.0001 over a total of 20 epochs.
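
The training setup described above can be approximated by the following NumPy sketch of a one-hidden-layer autoencoder. It is a minimal illustration, not the implementation used in the study: the sigmoid output layer, the mean squared reconstruction error, the batch size, the weight initialization, and the synthetic patches are all assumptions, and the "learning scale" parameter is omitted because its exact meaning is not specified here. The learning-rate schedule, momentum, L2 penalty, number of epochs, patch size, and number of hidden units follow the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(patches, n_hidden=10, epochs=20, lr_start=0.9, lr_end=0.25,
                      momentum=0.2, l2=1e-4, batch_size=50, seed=0):
    """Train a sigmoid autoencoder on rows of `patches` (values in [0, 1]).

    Returns the encoder weights W1 (n_inputs x n_hidden), the encoder bias b1,
    and the reconstruction error before training and after every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = patches.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]

    def reconstruction_error():
        recon = sigmoid(sigmoid(patches @ W1 + b1) @ W2 + b2)
        return float(np.mean((recon - patches) ** 2))

    losses = [reconstruction_error()]
    for lr in np.linspace(lr_start, lr_end, epochs):   # linearly decreasing learning rate
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            x = patches[order[start:start + batch_size]]
            h = sigmoid(x @ W1 + b1)                   # latent representation
            y = sigmoid(h @ W2 + b2)                   # reconstruction
            # Back-propagate the squared reconstruction error, averaged over the batch.
            dy = (y - x) * y * (1.0 - y) / len(x)
            dh = (dy @ W2.T) * h * (1.0 - h)
            grads = [x.T @ dh + l2 * W1, dh.sum(axis=0),
                     h.T @ dy + l2 * W2, dy.sum(axis=0)]
            for p, v, g in zip(params, velocity, grads):
                v *= momentum
                v -= lr * g
                p += v                                 # in-place momentum update
        losses.append(reconstruction_error())
    return W1, b1, losses

# Synthetic stand-in for 1000 vectorized 11x11 patches (121 values each).
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 3))
basis = rng.normal(size=(3, 121))
patches = sigmoid(0.5 * latent @ basis)

W1, b1, losses = train_autoencoder(patches)
filters = W1.T.reshape(10, 11, 11)   # one 11x11 feature detector per hidden unit
print(filters.shape, round(losses[0], 4), round(losses[-1], 4))
```

Reshaping each column of the encoder weight matrix into an 11x11 array is what turns the hidden units into convolutional feature detectors, in the same way that the 2x2 matrices are formed in the four-input example of Figure 1.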

Generation of prognostic features
A convolutional neural network is a type of feed-forward neural network that is composed of a number of convolutional and subsampling layers. In the configuration used here, the CNN serves as an unsupervised feature learning method and is followed by a supervised classifier. In the convolutional layer, a set of feature maps is generated by convolving the input with each of the feature detectors separately, which results in a number of feature maps equal to the number of feature detectors. Then, a subsampling procedure is applied to the feature maps, resulting in feature vectors. Figure 2 shows a convolutional neural network configuration using an autoencoder for the generation of feature detectors.

Figure 2. A convolutional neural network configuration.

A single-layer convolutional neural network was trained on the GM tissue probability maps to extract the prognostic features from the baseline MRIs of patients with MCI. In the convolutional layer of the CNN, a set of feature maps was generated by convolving each filter with every GM-TPM and applying a sigmoid activation function. We proposed a novel pooling approach based on VBM for the subsampling layer, in which the VOIs resulting from the VBM analysis were used as the pooling volumes. Finally, the mean of each feature map was computed within each of the five VOIs, yielding a five-element feature vector per feature map. In total, the 10 feature maps produced one combined feature vector of 50 elements (10 feature maps x 5 VOIs), which served as the input for a supervised classifier; this is detailed in the next section. Figure 3 illustrates the CNN model with the proposed pooling approach.
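
The feature extraction described above can be sketched as follows in Python with NumPy. The sketch assumes that each 2D filter is applied slice by slice along the last axis with zero padding, and that, as is conventional in CNNs, the filter is applied as a cross-correlation (without flipping). The function names, the random volume, the random filters, and the box-shaped VOI masks are hypothetical placeholders. Only the counts (10 filters of size 11x11, five VOIs, 50 features) follow the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def filter_slices(volume, kernel, bias=0.0):
    """Apply one odd-sized 2D filter to every slice along the last axis.

    Zero padding keeps the output the same shape as `volume`; a sigmoid
    activation is applied to the filtered result.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(volume, ((ph, ph), (pw, pw), (0, 0)), mode="constant")
    out = np.zeros(volume.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            window = padded[i:i + volume.shape[0], j:j + volume.shape[1], :]
            out += kernel[i, j] * window
    return sigmoid(out + bias)

def voi_mean_pooling(gm_map, filters, biases, voi_masks):
    """One feature per (filter, VOI) pair: the mean activation inside the VOI."""
    features = []
    for kernel, bias in zip(filters, biases):
        feature_map = filter_slices(gm_map, kernel, bias)
        features.extend(feature_map[mask].mean() for mask in voi_masks)
    return np.array(features)

# Placeholder inputs: a small random "GM map", 10 random filters, 5 box-shaped VOIs.
rng = np.random.default_rng(0)
gm_map = rng.random((20, 20, 8))
filters = rng.normal(0.0, 0.1, size=(10, 11, 11))
biases = np.zeros(10)
voi_masks = []
for k in range(5):
    mask = np.zeros(gm_map.shape, dtype=bool)
    mask[4 * k:4 * k + 4, 5:15, 2:6] = True
    voi_masks.append(mask)

feature_vector = voi_mean_pooling(gm_map, filters, biases, voi_masks)
print(feature_vector.shape)
```

Compared with grid-shaped pooling regions, the only change in this sketch is that the pooling masks have arbitrary shapes, so the same code would work with VOIs derived from a statistical map.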

Figure 3: A convolutional neural network with the proposed voxel-based morphometry-based pooling approach.

Evaluation of the acquired prognostic values
A support vector machine (SVM) classifier with a polynomial kernel, an SVM classifier with a linear kernel, and a logistic regression classifier were trained on the acquired feature vectors using WEKA software (41). Then, the predictive performances of the models were evaluated in terms of sensitivity, specificity, and accuracy using a ten-fold cross-validation procedure, and by testing different numbers of feature detectors and VOIs (42). We defined the sensitivity as the ratio of the correctly identified patients with MCI who eventually developed AD (TP: true positives) to the total number of C-MCI group patients, and the specificity as the ratio of the correctly identified patients with MCI who remained stable (TN: true negatives) to the total number of NC-MCI group patients. Similarly, we defined the accuracy as the ratio of the sum of TP and TN to the total number of patients.
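
The three measures defined above can be computed as in the following short Python sketch. The function name and the toy labels are hypothetical and serve only to illustrate the definitions; they are not results of the study.

```python
def prognostic_metrics(y_true, y_pred, positive="C-MCI"):
    """Sensitivity, specificity, and accuracy with `positive` as the converter class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),       # TP / all C-MCI patients
        "specificity": tn / (tn + fp),       # TN / all NC-MCI patients
        "accuracy": (tp + tn) / len(pairs),  # (TP + TN) / all patients
    }

# Toy example: 4 converters (3 detected) and 6 stable patients (5 detected).
y_true = ["C-MCI"] * 4 + ["NC-MCI"] * 6
y_pred = ["C-MCI"] * 3 + ["NC-MCI"] + ["NC-MCI"] * 5 + ["C-MCI"]
metrics = prognostic_metrics(y_true, y_pred)
print(metrics)
```

In a ten-fold cross-validation, these counts would typically be accumulated over the held-out fold of every split before the ratios are computed.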

Statistical analysis
The statistical analysis was performed using MATLAB (MATLAB and Statistics Toolbox Release 7.3, The MathWorks, Inc., Natick, Massachusetts, USA). The demographic and clinical information of the C-MCI and NC-MCI groups were analyzed using descriptive statistical methods. Continuous variables are presented as mean, standard deviation, and range, whereas categorical variables are shown as numbers. The Mann-Whitney U test and Pearson χ2 test were used to compare the two groups with a significance level of 0.1%.

Results

In this section, we present the results of the statistical analysis and the performance of the supervised classifier in terms of sensitivity, specificity, and accuracy. The demographic profiles of the two groups at baseline were analyzed for significant differences in sex, age, education, and Mini Mental State Examination (MMSE) score using the methods explained in the statistical analysis section; the results are presented in Table 1. The statistical analysis showed that the two groups did not differ significantly with regard to sex, age, and education (p>0.001). On the other hand, the MMSE scores of patients with NC-MCI were significantly higher than those of patients with C-MCI (p<0.001).

Table 1: Demographic and clinical characteristics of the groups.

Figure 4 shows the 3T structural MR image acquired in the baseline examination of a woman aged 65 years with MCI who subsequently progressed to AD 12 months after the baseline examination. Figure 4a shows the 3T T1-weighted bias-corrected MR scan (170 sagittal slices, slice thickness=1.2 mm, repetition time (TR)/echo time (TE)= 6.8/3.15 ms, flip angle (FA) = 8.0º, matrix 256x256, magnetization-prepared rapid acquisition gradient echo (MP-RAGE) sequence) as acquired from the ADNI dataset. Additionally, the smoothed image with FWHM of 8 mm of the GM-TPM is shown in Figure 4b, which was acquired after applying the VBM methods as described in the voxel-based morphometric analysis section.

Figure 4: 3T structural baseline magnetic resonance imaging data of a woman aged 65 years with mild cognitive impairment who subsequently progressed to Alzheimer's disease. (a) Original scan (b) Acquired gray matter tissue probability map.

Figure 5a shows the t-map of the voxels with lower GM concentration in the C-MCI group than in the NC-MCI group. Figure 5b presents the same t-map superimposed on the Individual Brain Atlases using the SPM 116 atlas (IBASPM-116) of WFU Pickatlas software. Significantly reduced GM in patients with MCI who progressed to AD was determined with a threshold t-value (T=5.08) at a FWE-corrected p-value of 0.005. After the t-map statistics analysis, five significant VOIs were defined according to the VBM analysis. As an extra test, the same procedure was repeated at a FWE-corrected significance level of 0.001, which resulted in seven VOIs, in order to test whether a better success rate could be obtained with seven VOIs instead of five.

Figure 5: Volumes of interest with significantly decreased gray matter tissue in the converted group (a), and the same volumes superimposed on the Individual Brain Atlases using the SPM 116 atlas (b). The color map shows the t-values of the voxels, ranging from red (p=0.5) to white (p=0). The threshold t-value (p<0.005, family-wise error-corrected) is 5.08.

The statistical results of the five and seven VOIs are given in Table 2 and Table 3, respectively. The statistics of the VOIs are presented in terms of the size of the VOIs, which is defined as the number of included voxels. The t-statistics of the VOIs are expressed as the mean ± standard deviation of the t-values of their voxels, the maximum t-value observed within the VOI, and the corresponding labeling of the brain region based on the IBASPM-116 atlas. The right superior temporal gyrus, left amygdala, and right hippocampus were the common peak points in both sets of VOIs.

Table 2: Statistics of the five volumes of interest related to gray matter damage (pFWE=0.005).

Table 3: Statistics of the seven volumes of interest related to cerebral gray matter damage (pFWE=0.001).

Table 4: The prognostic power of the feature vectors.

Figure 6 shows a few patches sampled from the GM-TPMs of the patients. The size of the patches was set to 11x11 and sampled from the significant brain atrophy regions determined by the VBM analysis. These patches were used to train the autoencoder for acquiring feature detectors as explained in the generation of feature detectors section.

Figure 6: Some of the patches sampled from the gray matter tissue probability maps of the patients.

As described in the generation of feature detectors section, 10 feature detectors wk were generated and used for the extraction of feature vectors. These feature detectors are shown in Figure 7, where each image corresponds to a feature detector.

Figure 7: Feature detectors generated with the help of an autoencoder.

Figure 8 presents the 41st axial slice of the GM-TPM of a patient with C-MCI (a) and of a patient with NC-MCI (c), together with the effect of the sigmoid activation function on these slices (b, d), respectively. The images shown in Figure 8(b) and Figure 8(d), also known as feature maps, are the images on which the proposed pooling method operates in order to acquire prognostic features.

Figure 8: The 45th axial slice of the gray matter tissue probability maps of a converted patient (a) and a stable patient (c). Their corresponding sigmoid activation images based on the fourth base are shown in (b) and (d), respectively.

Table 4 presents the classification performances of the prognostic features using the following classifiers: SVM with a polynomial kernel, SVM with a linear kernel, and logistic regression. The classification models achieved accuracies between 73.1% and 78.7%. The best performance was achieved with a configuration of 10 feature detectors and five VOIs (pFWE=0.005).

To evaluate the proposed method further, we compared our results with two other conventional methods: one that takes all the voxels of the five VOIs obtained from the VBM analysis and classifies them with an SVM classifier, and another that takes the same data but performs an additional dimensionality reduction step using principal component analysis (PCA) before applying the same classification. These results were compared with the results of the proposed solution, as presented in Table 5, and showed that the proposed solution performed better.
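
The dimensionality reduction step of the second comparison method can be illustrated with a minimal NumPy sketch of PCA computed through a singular value decomposition. The function name, the random data matrix, and the choice of 10 components are assumptions for illustration only; the number of components used in the comparison is not stated here.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns the projected data and the eigenvalues of the sample covariance
    matrix (in decreasing order).
    """
    X_centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    eigenvalues = singular_values ** 2 / (X.shape[0] - 1)
    scores = X_centered @ Vt[:n_components].T
    return scores, eigenvalues

# Placeholder data: 30 "subjects" described by 500 "voxels" each.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))
scores, eigenvalues = pca_reduce(X, n_components=10)
print(scores.shape, eigenvalues[0], eigenvalues[-1])
```

When there are fewer subjects than voxels, the centered data matrix has rank at most (number of subjects - 1), so the smallest eigenvalue is zero up to numerical precision. This is a general property of such data and explains why voxel-level feature sets are highly redundant.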

Table 5: Performance comparison between the proposed solution and two additional classification methods.

Discussion

Alzheimer's disease is an irreversible and progressive brain disorder occurring primarily in the elderly that damages nerve cells and eventually leads to their death (43). Nowadays, clinical examinations, neuropsychological assessments, and neuroimaging provide a standard way to establish the diagnosis of Alzheimer's disease (44). Routine neuroimaging evaluation includes structural magnetic resonance imaging, which is a powerful tool to detect structural changes that occur in patients with Alzheimer's disease, such as cortical shrinkage, enlarged ventricles, and shrinkage of the hippocampus (4). Many computer-aided diagnosis studies for Alzheimer's disease therefore use magnetic resonance imaging, applying image processing techniques and machine learning algorithms in order to detect these changes and associate them with AD.

Recent studies proposed CAD models to differentiate patients with AD from cognitively normal elderly with high accuracies: 94.5% in Magnin et al., 96% in Klöppel et al., and 89.3% in Vemuri et al. (19-21). However, developing a CAD model for differentiating AD from MCI is a more challenging task because cortical atrophy is a non-specific finding in Alzheimer's disease, and a patient with MCI may have an atrophic pattern similar to that of patients with Alzheimer's disease. To investigate whether a system could make such a differentiation, in this study we focused on the problem of predicting the prognostic type of patients with MCI. Such a prognosis system would be based on the baseline MRI of the patients, which makes this one of the most challenging problems in the field of computer-aided diagnosis of dementias.

For this purpose, we used a convolutional neural network together with an estimation of significant regions based on voxel-based morphometric analysis in order to assess the prognostic value of baseline MRI scans for a potential conversion from MCI to AD. In this specific area, the study of Suk et al. attracted our attention (13). Their work reported a maximal accuracy of 74.58% for the binary classification of negative prognosis vs. stable prognosis and was used as a reference study for the evaluation of our proposed system.

Voxel-based morphometry is a widely used technique in the literature for the investigation of brain tissue abnormalities (38). Posterior cortical atrophy, temporal lobe atrophy, and white matter damage in dementias are some of the biomarkers that can be detected through VBM analysis (42,45,46). Additionally, the statistical correlation of other parameters, such as memory impairment or neuropsychological assessments, with MRI data is investigated using VBM (47,48). In this study, we performed VBM to determine significant brain atrophy regions on the baseline MR images of patients with MCI for two purposes: first, to select patches for training an autoencoder and, second, to determine the pooling VOIs for the subsampling layer of the CNN. For the first part, our patch selection method can be considered similar to the method of Suk et al. in terms of statistical methods (13). On the other hand, for the second part, to the best of our knowledge, VBM has never been employed in the subsampling layer of a CNN for determining pooling areas.

The idea behind the proposed pooling method is that the topographic distribution of brain atrophy in Alzheimer's disease can be better specified using VBM (49). Recent studies showed that medial temporal lobe atrophy (MTA) was an MRI biomarker that predicted AD dementia in patients with MCI (50). The left medial temporal lobe in particular (hippocampus, parahippocampal gyrus) is the most affected region in patients who develop AD (51), and our VBM findings are compatible with this, as shown by the topography of the detected VOIs. Instead of the grid-shaped pooling areas used in the literature, we used the topography of the brain in the subsampling layer.

To assess the success of our proposed method, the study of Suk et al. was used as a reference study (13). Unlike the method of Suk et al., which used two layers, we used only a single-layer latent feature representation via a 2D convolutional neural network, because our results were not improved with a second layer (13). Our approach achieved an accuracy of 78.7%, more than 4 percentage points higher than the 74.58% reported by Suk et al. Another point that we considered when determining the training parameters was the discriminative power of the output of the hidden layer. For each hidden unit, we performed a Mann-Whitney U test comparing its outputs for the C-MCI patches with its outputs for the NC-MCI patches. After the training of the autoencoder, we observed that the outputs of all but one of the 10 hidden units (the 7th hidden unit, with a p-value of 0.8074) differed significantly between the C-MCI and NC-MCI groups (p<0.001).

Additionally, we compared the success of the proposed method with two additional methods: one taking into account the whole gray matter data and another using principal component analysis for reducing the gray matter data (Table 5). The proposed method performed better than the other methods. The results of the principal component analysis of the raw data showed that the data contained significant redundancies, because the ratio between the maximum and minimum eigenvalues was infinite, thus justifying the use of dimensionality reduction for this kind of data.

To date, no treatment that can cure Alzheimer's disease or any other type of dementia has been discovered. Thanks to recent advances in medical technology, drug and non-drug treatments have been developed and validated for routine clinical use in patients with AD, improving their quality of life and slowing the progression of symptoms in the early stages of dementia. However, most of these patients are elderly. Therefore, drug treatment should be considered carefully with regard to adverse effects and interactions with any other medications the patients are taking.

Identifying individuals with MCI who will progress to AD would be very beneficial for the patients in terms of prognosis, because the symptoms of Alzheimer's disease could be delayed or prevented by immediately applying a neuromodulation treatment and a treatment that slows down the progression to AD. Additionally, such a system could help to reduce the use of unnecessary medications in individuals who are predicted to remain stable in MCI.

Conclusion

In summary, the aim of this study was to develop a CAD system for predicting the prognosis of patients with MCI so that medication can be chosen according to the prediction. To achieve this, we implemented a single-layer convolutional neural network that uses feature detectors generated by a patch-based autoencoder. The contribution of this work is a new pooling approach for the subsampling layer of the CNN, based on VBM analysis, that selects regions with a statistically significant difference between converted and non-converted patients with MCI. When comparing our results with those of similar studies that did not use our pooling approach, we observed that our method performed better (78.7%), showing that VBM analysis can be integrated successfully with deep learning approaches for predicting the prognosis of MCI. Such a system could help physicians to choose the appropriate medication for patients according to the predicted prognosis.

Acknowledgements
The current study was supported by TUBITAK BIDEB 2211-C (No: 1649B031402382).

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Received by: 03 September 2016
Revised by: 08 December 2016
Accepted: 02 February 2017

References

1) Qiu C, Kivipelto M, Von Strauss E. Epidemiology of Alzheimer's disease: Occurrence, determinants, and strategies toward intervention. Dialogues Clin Neurosci. 2009;11:111-128.

2) Prince M. Epidemiology of dementia. Psychiatry. 2007;6:488-490.

3) Alzheimer's Association. 2014 Alzheimer's disease facts and figures. Alzheimer's Dement. 2014;10:47-92.

4) Dustin D, Hall BM, Annapragada A, Pautler RG. Neuroimaging in Alzheimer's disease: Preclinical challenges toward clinical efficacy. Transl Res. 2016;175:37-53.

5) Koikkalainen J, Rhodius-Meester H, Tolonen A, et al. Differential diagnosis of neurodegenerative diseases using structural MRI data. Neuroimage Clin. 2016;11:435-449.

6) Dickerson BC. Functional MRI in the early detection of dementias. Rev Neurol (Paris). 2006;162:941-944.

7) Zimny A, Sasiadek M, Leszek J, Czarnecka A, Trypka E, Kiejna A. Does perfusion CT enable differentiating Alzheimer's disease from vascular dementia and mixed dementia? A preliminary report. J Neurol Sci. 2007;257:114-120.

8) Chew J, Silverman DHS. FDG-PET in Early AD Diagnosis. Med Clin North Am. 2013;97:485-494.

9) James JS. A review of neuroimaging biomarkers of Alzheimer's disease. Neurol Asia. 2013;18:239-248.

10) Schroder J, Pantel J. Neuroimaging of hippocampal atrophy in early recognition of Alzheimer's disease--a critical appraisal after two decades of research. Psychiatry Res. 2016;247:71-78. doi:10.1016/j.pscychresns.2015.08.014.

11) Forlenza OV, Radanovic M, Talib LL, et al. Cerebrospinal fluid biomarkers in Alzheimer's disease: Diagnostic accuracy and prediction of dementia. Alzheimer's Dement (Amsterdam, Netherlands). 2015;1:455-463. doi:10.1016/j.dadm.2015.09.003.

12) Kantarci K. Fractional anisotropy of the fornix and hippocampal atrophy in Alzheimer's disease. Front Aging Neurosci. 2014;6:316.

13) Suk H Il, Lee SW, Shen D, Alzheimer's Disease Neuroimaging Initiative. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. Neuroimage. 2014;101:569-582.

14) Suk H Il, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI. Vol 8150 LNCS. 2013:583-590.

15) Zhang D, Shen D. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. Neuroimage. 2012;59:895-907.

16) Zhu X, Suk H Il, Shen D. A novel matrix-similarity based loss function for joint regression and classification in AD diagnosis. Neuroimage. 2014;100:91-105.

17) Schmitter D, Roche A, Maréchal B, et al. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and Alzheimer's disease. Neuroimage Clin. 2014;7:7-17.

18) Cuingnet R, Gérardin E, Tessieras J, et al. Automatic classification of patients with Alzheimer's disease from structural MRI: A comparison of ten methods using the ADNI database. Neuroimage. 2010;56:766-781.

19) Magnin B, Mesrob L, Kinkingnéhun S, et al. Support vector machine-based classification of Alzheimer's disease from whole-brain anatomical MRI. Neuroradiology. 2009;51:73-83.

20) Klöppel S, Stonnington CM, Chu C, et al. Automatic classification of MR scans in Alzheimer's disease. Brain. 2008;131:681-689.

21) Vemuri P, Gunter JL, Senjem ML, et al. Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. Neuroimage. 2008;39:1186-1197.

22) Khedher L, Ramirez J, Gorriz JM, Brahim A. Automatic classification of segmented MRI data combining Independent Component Analysis and Support Vector Machines. Stud Health Technol Inform. 2014.

23) Young J, Modat M, Cardoso MJ, Mendelson A, Cash D, Ourselin S. Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment. Neuroimage Clin. 2013;2:735-745.

24) Shi Y, Suk H, Gao Y, Shen D. Joint Coupled-Feature Representation and Coupled Boosting for AD Diagnosis. IEEE Conf Comput Vis Pattern Recognit. 2014.

25) Davatzikos C, Fan Y, Wu X, Shen D, Resnick SM. Detection of prodromal Alzheimer's disease via pattern classification of magnetic resonance imaging. Neurobiol Aging. 2008;29:514-523.

26) Fan Y, Shen D, Gur RC, Gur RE, Davatzikos C. COMPARE: Classification of morphological patterns using adaptive regional elements. IEEE Trans Med Imaging. 2007;26:93-105.

27) Querbes O, Aubry F, Pariente J, et al. Early diagnosis of Alzheimer's disease using cortical thickness: impact of cognitive reserve. Brain. 2009;132:2036-2047.

28) Yuan L, Wang Y, Thompson PM, Narayan VA, Ye J. Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage. 2012;61:622-632.

29) Colliot O, Chételat G, Chupin M, et al. Discrimination between Alzheimer disease, mild cognitive impairment, and normal aging by using automated segmentation of the hippocampus. Radiology. 2008;248:194-201.

30) Chupin M, Hammers A, Liu RSN, et al. Automatic segmentation of the hippocampus and the amygdala driven by hybrid constraints: Method and validation. Neuroimage. 2009;46:749-761.

31) Chupin M, Gérardin E, Cuingnet R, et al. Fully automatic hippocampus segmentation and classification in Alzheimer's disease and mild cognitive impairment applied on data from ADNI. Hippocampus. 2009;19:579-587.

32) Gerardin E, Chételat G, Chupin M, et al. Multidimensional classification of hippocampal shape features discriminates Alzheimer's disease and mild cognitive impairment from normal aging. Neuroimage. 2009;47:1476-1486.

33) Lao Z, Shen D, Xue Z, Karacali B, Resnick SM, Davatzikos C. Morphological classification of brains via high-dimensional shape transformations and machine learning methods. Neuroimage. 2004;21:46-57. doi:10.1016/j.neuroimage.2003.09.027.

34) Fan Y, Gur RE, Gur RC, et al. Unaffected Family Members and Schizophrenia Patients Share Brain Structure Patterns: A High-Dimensional Pattern Classification Study. Biol Psychiatry. 2008;63:118-124.

35) Kabani N, MacDonald D, Holmes CJ, Evans A. A 3D atlas of the human brain. Neuroimage. 1998;7:S717.

36) Davatzikos C, Da X, Toledo JB, et al. Integration and relative value of biomarkers for prediction of MCI to AD progression: Spatial patterns of brain atrophy, cognitive scores, APOE genotype and CSF biomarkers. Neuroimage Clin. 2014;4:164-173.

37) Jack CR, Bernstein MA, Fox NC, et al. The Alzheimer's Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging. 2008;27:685-691. doi:10.1002/jmri.21049.

38) Ashburner J, Friston KJ. Voxel-based morphometry--the methods. Neuroimage. 2000;11:805-821.

39) Labovitz S. Criteria for Selecting a Significance Level: A Note on the Sacredness of .05. Am Sociol. 1968;3:220-222.

40) Maldjian JA, Laurienti PJ, Kraft RA, Burdette JH. An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage. 2003;19:1233-1239.

41) Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explor Newsl. 2009;11:10-18.

42) Lehmann M, Crutch SJ, Ridgway GR, et al. Cortical thickness and voxel-based morphometry in posterior cortical atrophy and typical Alzheimer's disease. Neurobiol Aging. 2011;32:1466-1476.

43) Querfurth HW, LaFerla FM. Alzheimer's disease. N Engl J Med. 2010;362:329-344. doi:10.1056/NEJMra0909142.

44) Villareal DT, Morris JC. The diagnosis of Alzheimer's disease. J Alzheimers Dis. 1999;1:249-263.

45) Busatto GF, Garrido GEJ, Almeida OP, et al. A voxel-based morphometry study of temporal lobe gray matter reductions in Alzheimer's disease. Neurobiol Aging. 2003;24:221-231.

46) Yoon B, Shim YS, Hong YJ, Hong Y-J, et al. Comparison of diffusion tensor imaging and voxel-based morphometry to detect white matter damage in Alzheimer's disease. J Neurol Sci. 2011;302:89-95.

47) Sokolov A, Vorobyev S, Fokin V, Lupanov I, Efimtcev A, Lobzin V. Morphological and functional investigation of memory impairment in Alzheimer's Disease: Combined fMRI and voxel-based morphometry study. Alzheimer's Dement. 2014;10:55-56.

48) Kanetaka H. Neuropsychological correlates of brain atrophy shown on voxel-based morphometry in Alzheimer's disease. Alzheimer's Dement. 2010;6(4).

49) Mann DM. The topographic distribution of brain atrophy in Alzheimer's disease. Acta Neuropathol. 1991;83:81-86.

50) Korf ESC, Wahlund LO, Visser PJ, Scheltens P. Medial temporal lobe atrophy on MRI predicts dementia in patients with mild cognitive impairment. Neurology. 2004;63:94-100.

51) Ferreira LK, Diniz BS, Forlenza OV, Busatto GF, Zanetti MV. Neurostructural predictors of Alzheimer's disease: A meta-analysis of VBM studies. Neurobiol Aging. 2011;32:1733-1741.