Functional and structural correlates of working memory performance and stability in healthy older adults

Despite the well-described deleterious effects of aging on cognition, some individuals are able to show stability. Here, we aimed to describe the functional and structural brain characteristics of older individuals, particularly focusing on those with stable working memory (WM) performance, as measured with a verbal N-back task across a 2-year follow-up interval. Forty-seven subjects were categorized as stables or decliners based on their WM change. Stables were further subdivided into high performers (SHP) and low performers (SLP), based on their baseline scores. At both time points, magnetic resonance imaging (MRI) data were acquired, including task-based functional MRI (fMRI) and structural T1-MRI. Although there was no significant interaction between overall stables and decliners as regards fMRI patterns, decliners exhibited over-activation in the right superior parietal lobule at follow-up as compared to baseline, while SHP showed reduced activity in this region. Further, at follow-up, decliners exhibited more activity than SHP, but in the left temporo-parietal cortex and posterior cingulate (i.e., non-task-related areas). Also, at the cross-sectional level, SLP showed lower activity than SHP at both time points and less activity than decliners at follow-up. Concerning brain structure, a generalized significant cortical thinning over time was identified for the whole sample. Notwithstanding, the decliners evidenced a greater rate of atrophy comprising the posterior middle and inferior temporal gyrus as compared to the stable group. Overall, fMRI data suggest unsuccessful compensation in the case of decliners, shown as increases in functional recruitment during the task in the context of a loss in WM performance and brain atrophy. On the other hand, among older individuals with WM cognitive stability, differences in baseline performance might determine dissimilar fMRI trajectories. In this vein, the findings in the SHP subgroup support the brain maintenance hypothesis, suggesting that stable and high WM performance in aging is sustained by functional efficiency and maintained brain structure rather than compensatory changes.


Introduction
Working memory (WM) performance results from the interaction between attention, short-term retention and manipulation of information, carried out by the coordinated activation of many brain regions (Eriksson et al. 2015). WM capacity is central for daily life activities and is predictive of a wide range of higher level cognitive measures (Unsworth et al. 2014). The aging process is associated with a broad impact on cognition, with impairments in WM being a well-known disabling phenomenon in advanced age (Park and Reuter-Lorenz 2009; Salthouse 2010). However, previous studies in elderly individuals have shown high interindividual variability in the WM cognitive profile and a great heterogeneity in WM-related cognitive trajectories over time (Wilson et al. 2002; Habib et al. 2007), as well as in the associations between cognition and brain functioning (Persson et al. 2006; Nagel et al. 2009). Hence, it appears that cognitive dysfunction is not a universal phenomenon in aging (Nyberg and Pudas 2019). In this regard, Nyberg et al. (2012) introduced the concept of brain maintenance to reflect that some cognitively stable older adults achieve successful cognitive aging presumably because they exhibit little or no age-related neurochemical, functional and/or structural changes.
Regarding neuroimaging studies, cross-sectional findings in healthy aging using functional magnetic resonance imaging (fMRI) have reported distinct associations between brain activity patterns and their correlates with successful aging, including both increased and decreased activity, which normally coexist (reviewed in Grady 2012; Li et al. 2015; Rajah and D'Esposito 2005; Spreng et al. 2010). These functional adjustments have been explained by way of distinct cognitive hypotheses (Eyler et al. 2011). In general, increased blood-oxygen level-dependent (BOLD) activity has been interpreted as a compensatory mechanism to counteract the lack of functionality of the typical brain resources (Cabeza et al. 2002, 2018; Reuter-Lorenz and Park 2014). Furthermore, compensation stands as a main functional mechanism embedded within the concept of cognitive reserve (CR), which predicts that high-CR individuals are more able to cope with age-related or disease-associated brain changes (Stern 2002, 2009). On the other hand, these functional brain changes are not invariably linked to a successful cognitive profile. In these cases, increased activations have been conceived as attempted compensatory mechanisms (Cabeza and Dennis 2013) or dedifferentiation processes (Park et al. 2004; Carp et al. 2011).
Studies entailing fMRI longitudinal observations overcome some limitations of cross-sectional research, notably as the longitudinal approach is more sensitive in identifying continuous ongoing changes in aging (Fjell et al. 2014a). In this line, Nyberg et al. (2010) provided evidence that elders, even expressing an over-recruitment observed with cross-sectional paradigms, can show age-related BOLD signal reductions when followed over time. In addition, and closely associated with the brain maintenance concept, a more recent longitudinal fMRI study showed functional increases in declarative memory decliners, while no significant effects were observed in cognitively stable individuals. These results highlight that when measured across time, increased brain activity may not necessarily be related to higher performance but rather, it may reflect ongoing cognitive decline.
Besides functional brain changes, aging is also associated with widespread modifications in structural integrity (Salat 2004, 2011; Fjell et al. 2014a, b), and recent efforts have been made to characterize the structure underlying preserved cognition in successful aging. Previous studies focused on hippocampal volumes highlight that there is no clear evidence that different rates of atrophy occur in stables and decliners (Dekhtyar et al. 2017; Pudas et al. 2017). However, some investigations have identified higher cortical thickness (CTh) measures for those older adults exhibiting high (Bartrés-Faz et al. 2019), or above-average to superior cognitive performance, these latter individuals being so-called 'superagers' (Harrison et al. 2012; Gefen et al. 2015; Sun et al. 2016; Cook et al. 2017). Interestingly, some studies have used a multimodal approach to explore the interaction of cognition, brain function, and structure, demonstrating a significant relationship between cognitive performance and structural integrity in aged samples (Burzynska et al. 2013). Also, Burianová et al. (2015) evidenced that a more preserved structure facilitated the recruitment of functional compensatory mechanisms, thereby enabling better WM accuracy.
Age-related functional changes occurring during cognitive demands and their neural substrates are not fully understood, in part due to the scarce number of longitudinal studies. Focusing on WM, only one previous work has explored the maintenance concept and showed that a stable left lateral activation of the prefrontal cortex area over a 4-year follow-up underlies a maintained WM performance. Therefore, in the present investigation, through a 2-year longitudinal design, we aimed to characterize functional brain patterns during a common WM task paradigm (N-back task), as well as the associated underlying structural integrity changes. Based on their longitudinal WM performance, elders showing age-related cognitive decline (decliners) were compared with those showing stability on WM (stables). Further, since some studies have indicated that higher performance is associated with a lower risk of cognitive decline (Habib et al. 2007; Yaffe et al. 2010; Rosano et al. 2012) and because it has been suggested that the maintenance mechanisms would differ depending on the baseline cognitive level (Cabeza et al. 2018), we subdivided the stable group by their performance at the starting point. We hypothesized that, with age, the functional patterns underlying WM would differ between stables and decliners, with the decliners likely to engage non-task-related areas, thus evidencing an unsuccessful compensatory process. In addition, we aimed to characterize the structural progression of these groups and we hypothesized that the age-related cortical atrophy after 2-year follow-up would also be related to task performance and/or stability. Furthermore, within the stable group, only those achieving higher performance scores would probably exhibit a brain profile fitting with the 'maintenance' concept (at functional and structural level).

Subjects
Forty-seven subjects aged 68.40 ± 2.86 years (mean ± standard deviation, SD) at baseline and 70.47 ± 2.97 years at 2-year follow-up, with cognitive and magnetic resonance imaging (MRI) data at both time points, were selected from the control group of a larger cohort of fit community elders recruited to a randomized controlled trial aimed at assessing the effects of walnuts on age-related diseases (Rajaram et al. 2017). Exclusion criteria were illiteracy or inability to understand the protocol or undergo neuropsychological tests, morbid obesity (BMI > 40 kg/m²), uncontrolled diabetes (HbA1c > 8%), uncontrolled hypertension (on-treatment blood pressure ≥ 150/100 mm Hg), prior cerebrovascular accident or major head trauma, any relevant psychiatric illness (including major depression), abnormal cognitive profile according to the normative scores (see next section), dementia, other neurodegenerative diseases (e.g., Parkinson's disease), and any chronic illness expected to shorten survival (e.g., heart failure, chronic liver disease, kidney failure, blood disease, cancer). The study was approved by the Hospital Clínic de Barcelona ethical committee and was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Written informed consent was obtained from each participant prior to enrollment.

Neuropsychological assessment
A comprehensive battery of neuropsychological tests was administered to assess cognitive functioning, as described previously (Rajaram et al. 2017; Vidal-Piñeiro et al. 2014). All participants had normal cognitive profiles with Mini-Mental State Examination (MMSE) scores > 25 (Mitchell 2009) and performances no more than 1.5 SD below normative scores on any of the neuropsychological tests administered, i.e., they did not fulfill criteria for mild cognitive impairment (Petersen and Morris 2005).

MRI acquisition
MRI was acquired in a 3 T Siemens scanner (Magnetom Trio Tim syngo) with 32-channel head coil at the Unitat d'Imatge per Ressonància Magnètica IDIBAPS (Institut d'Investigacions Biomèdiques August Pi i Sunyer) at Hospital Clínic de Barcelona, Barcelona, at baseline and at a 2-year follow-up evaluation. All participants underwent fMRI interleaved acquisitions [T2*-weighted EPI scans, repetition time (TR) = 2000 ms, echo time (TE) = 16 ms, 336 volumes, 40 slices, slice thickness = 3 mm, interslice gap = 25%, field of view (FOV) = 220 mm, matrix size = 128 × 128] during the performance of the N-back task. In addition, a high-resolution T1-weighted structural image was obtained for each subject with a magnetization-prepared rapid acquisition gradient-echo (MPRAGE) three-dimensional protocol (TR = 2300 ms, TE = 3 ms, inversion time = 900 ms, FOV = 244 mm, 1-mm isotropic voxel, matrix size = 256 × 256). For all participants, MRI images were examined by a senior neuroradiologist (N.B.) for any clinically significant pathology (none found). Then, all the acquisitions were visually inspected before analysis by the first author (L.V.-A.) to ensure that they did not contain MRI artifacts or excessive motion. At this level, no images were excluded.
The time average between both MRI visits was 23.17 ± 1.22 months. The average time between the neuropsychological assessment and the MRI acquisition was 27.40 ± 31.29 days at baseline and 12.89 ± 12.49 days at follow-up.

N-back task
Participants performed a letter N-back task with different levels of memory load (from 0 to 3 letters to be retained) inside the MR scanner, as described (Sala-Llonch et al. 2012). Briefly, we used a block-designed task where each N-back condition lasted 26 s, followed by inter-block fixation periods of 13 s. Before any N-back block, an instruction screen informed the subject of the upcoming block. Each stimulus (capital letters A-J) was shown in white in the center of a black screen for 500 ms, with an inter-stimulus interval of 1500 ms. Participants were instructed to press a button when the letter "X" appeared (0-back) or when the letter shown matched the one seen one (1-back), two (2-back) or three (3-back) stimuli before. The individual performance was recorded and scores were calculated using the d prime (d′) measure, which accounts for correct responses and false alarms, computed as Z(hit rate) − Z(false alarm rate), where the function Z(p), p ∈ [0,1], is the inverse of the cumulative distribution function of the standard Gaussian distribution, applied to the hit and false alarm rates.
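As an illustration of the d′ formula above, the following minimal Python sketch computes Z(hit rate) − Z(false alarm rate) from response counts. The function name and the log-linear correction for extreme rates are assumptions made for this example; the text does not state how rates of exactly 0 or 1 were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = Z(hit rate) - Z(false alarm rate).

    Z is the inverse cumulative distribution function of the
    standard Gaussian distribution.
    """
    n_targets = hits + misses
    n_nontargets = false_alarms + correct_rejections
    # Assumption (not stated in the text): a log-linear correction keeps
    # both rates strictly between 0 and 1, where Z would be infinite.
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

A participant who responds identically to targets and non-targets obtains d′ = 0, and better discrimination yields larger positive values.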

Working memory factor (WMf) calculation and group classification
We used principal component analysis (PCA) to create a composite scale which represented the WM factor (WMf), calculated using the d′ scores from the 0-back, 1-back, 2-back and 3-back conditions during the N-back fMRI task. Subsequently, we calculated the change in WMf as the difference between follow-up and baseline and we classified subjects as stables (N = 23) or decliners (N = 24), according to the change in WMf (above/below zero). Stables were further subdivided into high and low performers according to the WMf at baseline using the median as threshold. As a result, 11 out of 23 subjects were classified as stables high performers (SHP) and 12 out of 23 as stables low performers (SLP).
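The classification rule described above can be sketched as follows, taking the WMf scores as already computed (the PCA step is not reproduced here). The function name, the handling of a change of exactly zero, the use of the stable group's own median, and the assignment of ties to the low-performer subgroup are assumptions for illustration; the last choice is at least consistent with the reported 11/12 split of 23 stables.

```python
from statistics import median


def classify(baseline, followup):
    """Map each subject id to 'decliner', 'SHP' or 'SLP' from WMf scores.

    baseline and followup are dicts of subject id -> WMf score.
    """
    change = {s: followup[s] - baseline[s] for s in baseline}
    # Assumption: a change of exactly zero counts as stable.
    stables = [s for s in baseline if change[s] >= 0]
    # Assumption: the median is taken over the stable group's baseline WMf.
    cutoff = median(baseline[s] for s in stables)
    groups = {}
    for s in baseline:
        if change[s] < 0:
            groups[s] = "decliner"
        elif baseline[s] > cutoff:
            groups[s] = "SHP"
        else:
            # Assumption: ties at the median go to the low performers.
            groups[s] = "SLP"
    return groups
```

With an odd number of stables, this rule places the median subject in the SLP subgroup, so the two subgroups differ in size by one.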
The main demographic and cognitive characteristics are shown in the Supplementary Material (Supplementary Table 1). It should be noted that differences in years of education and gender were detected when the sample was divided into three groups (F = 7.723, p < 0.001 and χ² = 8.313, p = 0.016, respectively). The SLP group disclosed a lower educational level than the SHP (t = 3.824, p = 0.001) and decliners (t = 2.876, p = 0.007). Moreover, there were differences regarding gender due to the small representation of females in the SHP compared to SLP (p = 0.039) and decliners (p = 0.011).

Functional MRI (N-back task)
Data were analyzed with the FEAT-FSL software (FMRIB's Software Library version 5.0.10; https://fsl.fmrib.ox.ac.uk/fsl/; Jenkinson et al. 2012). We first performed a preprocessing of all individual fMRI scans, which included nonbrain tissue removal, motion correction, spatial smoothing with a Gaussian kernel of 5 mm of full width at half maximum (FWHM), temporal filtering with a high-pass filter of 160 s and a two-step linear registration to a standard template. Further, the head motion parameters estimated by MCFLIRT (Jenkinson et al. 2002) were included as confounding explanatory variables in our model. Then, at the first-level analysis (Woolrich et al. 2001), data were fit to a general linear model (GLM) containing the task time series with a gamma convolution of the hemodynamic response function. In this GLM, four regressors and their first temporal derivatives were modeled: 0-back, 1-back, 2-back and 3-back. By including the derivatives, we aimed to correct for shifting in the time series as well as for slice timing effects. We defined a single contrast of interest combining the previous four regressors as the difference of brain activity between the highest and lowest loads (3-back > 2-back > 1-back > 0-back), using weights of 0.375, 0.125, −0.125 and −0.375. The results of the first-level analyses were further fit into higher level or group-level statistics, performed using FMRIB's Local Analysis of Mixed Effects (FLAME; Woolrich et al. 2004). We first calculated the group-mean activity maps of the task contrast and the difference between time points for all subjects. Then, we created group GLM designs to evaluate: (1) time-group interactions, (2) patterns of time-related change for each group and (3) differences between groups (stables vs. decliners; SHP vs. SLP; SHP vs. decliners; SLP vs. decliners) at both time points.
Due to the gender differences when the sample was split into three groups, this variable was included as a regressor in the higher level analyses concerning SHP, SLP and decliners. All analyses were performed in the whole brain at a voxel-wise level, and a z > 2.3 was used to define contiguous clusters of activity, then cluster significance levels were estimated and corrected using family-wise error (FWE) correction. The significance threshold was set at a corrected p < 0.05. Finally, to obtain summary statistics of functional imaging data, we computed individual mean BOLD signal values within the significant region of interest (ROI) derived from the functional analyses.
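To make the load contrast described in this section concrete, the sketch below applies the reported weights (0.375, 0.125, −0.125 and −0.375 for 3-back, 2-back, 1-back and 0-back) to a set of per-condition parameter estimates. The dictionary layout and function name are hypothetical and do not reproduce FEAT's internal computation; the point is only that the weights sum to zero and grow linearly with load.

```python
# Contrast weights as reported in the text, keyed by condition.
WEIGHTS = {"0-back": -0.375, "1-back": -0.125, "2-back": 0.125, "3-back": 0.375}


def linear_load_contrast(betas):
    """Weighted sum of per-condition parameter estimates for one voxel.

    betas is a dict with one estimate per N-back condition
    (hypothetical input format, for illustration only).
    """
    return sum(WEIGHTS[cond] * betas[cond] for cond in WEIGHTS)
```

Because the weights sum to zero, a voxel responding equally to all loads yields a contrast of 0, whereas a response that increases with load yields a positive value.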

Cortical thickness (CTh)
Structural T1-weighted images were automatically processed with Freesurfer (version 5.1; https://surfer.nmr.mgh.harvard.edu) using its longitudinal pipeline (Reuter et al. 2012). First, the two time points were processed individually, and the results were inspected visually to ensure the accuracy of registration, skull stripping, segmentation, and cortical surface reconstruction. The first author (L.V.-A.) performed the manual editing to correct the brain extraction step, which was needed in eight subjects at baseline and ten at follow-up. There were no differences between groups regarding the number of corrections required at both time points. One participant (from the SLP group) was excluded due to pial and white matter surface mismatches caused by excessive motion. Then, within-subject template volumes (Reuter and Fischl 2011) and longitudinal files were created for each subject and time point through the longitudinal stream. The CTh maps were first smoothed using a 2D Gaussian kernel of 15 mm FWHM. The symmetrized percent change (spc) was used as the longitudinal measure of CTh. We performed vertex-wise statistics to study the CTh loss after the 2 years (1) for the whole sample, (2) related to group differences, and (3) for each group. As reported in the N-back task section, all the CTh analyses considering the three groups were adjusted by gender. The resulting vertex-wise statistical maps were considered significant at the p < 0.05 level. Maps were further corrected for family-wise error (FWE) using a Monte Carlo Null-Z simulation, with 10,000 repetitions and a cluster p < 0.05. In addition, the global CTh values for each subject were calculated to obtain summary statistics.
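For readers unfamiliar with the spc measure, the following sketch shows the calculation for a single vertex, assuming the definition used in Freesurfer's longitudinal two-stage tools: the yearly rate of change divided by the mean thickness of the two time points, expressed as a percentage. The function and argument names are hypothetical.

```python
def symmetrized_percent_change(thick_t1, thick_t2, years_between):
    """Symmetrized percent change (spc) in thickness, in percent per year.

    Assumed definition: 100 * rate / average, where rate is the change per
    year and average is the mean thickness across the two time points.
    """
    rate = (thick_t2 - thick_t1) / years_between  # mm per year
    average = 0.5 * (thick_t1 + thick_t2)         # mm
    return 100.0 * rate / average
```

Dividing by the average rather than by the baseline value makes the measure symmetric: swapping the two time points only changes its sign.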

Additional statistical analyses
Statistical analyses for non-imaging data were performed using IBM SPSS Statistics (Statistical Package for Social Sciences. Version 24.0. Armonk, NY: IBM Corporation). Demographic and cognitive data were described as mean ± SD (Supplementary Table 1). For categorical data, differences between groups were evaluated using the chi-squared (χ²) test, while Fisher's exact test was used to compare data regarding the subgroups. For quantitative data, we evaluated differences at each time point between groups (decliners and stables) using independent-sample t tests and between the three groups (decliners, SHP, and SLP) using a one-factorial analysis of variance (ANOVA). Following this ANOVA, if there was a significant group effect, independent-sample t tests were conducted to compare groups by pairs at each time point. Furthermore, the differences among group trajectories were investigated using repeated measures ANOVAs. As post hoc pairwise analyses, differences between baseline and follow-up for each group were analyzed using paired-samples t tests. The Wilcoxon signed-rank test was used to explore the differences across time in the whole sample. Statistical significance for all the analyses was set at p < 0.05. The graphical representations were produced using GraphPad Prism (version 6.00, GraphPad Software, La Jolla, CA, USA).

Age-related over-activation at follow-up
We evaluated BOLD activity associated with our contrast of interest in the WM task (3-back > 2-back > 1-back > 0-back) at both time points independently. In all the subjects, we observed task-related activity in brain areas including the bilateral frontal region, paracingulate, anterior cingulate, supramarginal and angular gyrus, precuneus and lateral occipital cortex, and the right insular cortex, among others (Fig. 1a, b). As compared to baseline, follow-up analyses revealed increased task-related activity in right-hemisphere areas comprising the inferior division of the lateral occipital cortex, the superior temporal gyrus and the occipital fusiform gyrus for the whole sample (Fig. 1c).

Functional changes underlying performance stability and decline
There was no significant time-group interaction between stables and decliners. The functional progression for each group and their cross-sectional differences are detailed in Supplementary Material (Supplementary Figs. 4 and 5). Nevertheless, when the stable group was subdivided into SHP and SLP, we identified a time-group interaction between SHP and decliners (Fig. 2). The SHP group showed a reduction of activity at 2-year follow-up, while the decliners exhibited increased activation in a cluster encompassing the right postcentral and supramarginal gyri and the superior parietal lobule. Furthermore, the longitudinal progressions for each group were additionally investigated. Pairwise analysis for the decliners showed increased activation (see Supplementary Fig. 4, in green). Nevertheless, no significant differences were found as regards the SHP subgroup.

Cross-sectional differences considering the three groups (SHP, SLP and decliners)
When assessing the two time points separately, SHP showed more activity than SLP at baseline in the right frontal pole (Fig. 3). After 2 years, the SHP kept showing more activity than the SLP group, but at this time point, in the thalamus bilaterally (Fig. 4, in blue). Also, at follow-up, the decliners showed more activity than both stable subgroups. The difference between decliners and SLP comprised the right basal ganglia (Fig. 4, in green) and compared to SHP, the decliners exhibited more activity mainly comprising the left temporo-parietal cortex and the bilateral posterior cingulate and lingual gyrus (Fig. 4, in red).

Cortical thickness results
We observed widespread CTh atrophy in the entire sample between baseline and the 2-year follow-up (see Supplementary Material, Fig. 6). When the sample was divided into the two main groups, as compared to stables, decliners exhibited more cortical thinning in a cluster comprising the left posterior middle and inferior temporal gyrus and the lateral occipital area (Fig. 5a). Then we calculated the specific maps of cortical atrophy for each group. The stables showed significant atrophy across time over the left caudal middle frontal and ventral precentral gyrus, and right middle temporal and inferior parietal cortex (Fig. 5b). On the other hand, the decliners exhibited a more extended pattern of atrophy in both hemispheres, with three clusters in the left hemisphere over the temporal and lateral occipital areas, rostral middle frontal, and the entorhinal areas, along with two clusters in the right hemisphere partially including the superior temporal and banks of superior temporal sulcus region (Fig. 5c). No significantly different atrophy patterns Lastly, it should be noted that we did not identify any significant correlation between the measures derived from the task-fMRI analyses and the CTh values.

Discussion
In this study, we investigated brain activity changes in healthy elders who underwent an fMRI acquisition during a WM task at baseline and at 2 years of follow-up. Participants were classified as stables or decliners based on their longitudinal measures of WM performance. Further, the stable group was subdivided (i.e., SHP and SLP) on the basis of their WM performance at baseline. Interestingly, a significant interaction between SHP and decliners emerged: decliners exhibited over-activation at follow-up compared to baseline, while SHP reduced their activity. As stated below, these observations may reflect dedifferentiation vs. neural efficiency processes. However, due to the heterogeneity in the stable group, as well as to the resulting reduced sample sizes investigated, the interaction considering the whole stable group and the decliners was non-significant. Moreover, cross-sectional analyses showed differences between groups in fMRI patterns. At baseline, less activity in the SLP compared to the SHP was observed. At follow-up, this difference remained, while additionally the decliner group showed more activity as compared to both stable subgroups.
At the structural level, we observed significant group differences across time as regards atrophy rates over a left temporo-occipital area, where the decliners showed a more noticeable pattern of change than subjects exhibiting WM performance stability, indicating differences in the temporal evolution of brain structure between these groups.

Fig. 3 The fMRI differences between groups. a Significant activity maps for the difference between SHP and SLP (in blue) at baseline. b Plot of baseline mean BOLD signal values at the ROI separated by group through pairwise comparison. Error bars: ± 1 SEM. SHP stables high performers, SLP stables low performers

Fig. 4 The fMRI differences between groups. a Significant activity maps for the difference between SHP and SLP (in blue), SHP and decliners (in red) and SLP and decliners (in green) at follow-up. Plot of mean BOLD signal values after 2 years at the ROI separated by group through pairwise comparison between b SHP vs. SLP, c SHP vs. decliners, and d SLP vs. decliners. Error bars: ± 1 SEM. SHP stables high performers, SLP stables low performers

Differences in working memory performance
Initially, the sample was split into stables and decliners according to cognitive scores, as done in prior studies in the field (Josefsson et al. 2012; Persson et al. 2012; Pudas et al. 2017; Rieckmann et al. 2017). After subdividing the stables by baseline performance, differences regarding years of education and gender emerged between the three groups. The SHP had a larger proportion of males, probably because there was a bias towards highly educated male participants reflecting a generational effect. Remarkably, the SHP and decliners had more years of education compared to SLP. Hence, according to the classification we used, a higher educational level is associated with higher initial WM performance, whereas cognitive stability can occur amongst both high- and low-educated elders (i.e., SHP and SLP differed in years of education and in performance at both time points). These results concur with those of larger longitudinal studies suggesting that education mainly contributes to higher starting cognitive performance, rather than being associated with a slower rate of decline (Vemuri et al. 2014; Wilson et al. 2019).

Distinctive fMRI trajectories underlying stability and decline in WM function
In the whole sample, we found patterns of increased fMRI brain activity in a cluster comprising regions outside the WM-related areas, including the right lateral occipital cortex (Grady et al. 2006; Jamadar et al. 2013; Archer et al. 2018). When the sample was split into stables and decliners, no significant functional trajectory differences were identified. However, this lack of results could be explained in part by the heterogeneity of the stable group, as the following findings suggest. In this vein, the subdivision of the stables based on their performance at baseline allowed the identification of a significant interaction between the SHP subgroup and decliners in a cluster entailing the right superior parietal lobule, indicating an expansion of the typical WM-related areas for the decliners (Pudas et al. 2017). Although the decliner group exhibited increased brain activity at follow-up compared to baseline, this was not accompanied by stable cognition. Thus, this over-recruitment phenomenon, far from reflecting a compensatory mechanism (Cabeza et al. 2002; Grady et al. 2006; Reuter-Lorenz and Cappell 2008), should be interpreted as progressive neural dedifferentiation because this additional neural recruitment was unsuccessful at the cognitive level (Logan et al. 2002; Park et al. 2004; Carp et al. 2011). On the other hand, the previously mentioned interaction revealed an activity reduction for the SHP, which could not be confirmed when studying the specific group functional trajectory, probably due to the small sample size. Therefore, this finding suggested a more efficient fMRI pattern and preservation of neural resources for the SHP group across time (Nyberg et al. 2010). Concurring with the 'brain maintenance' concept (Nyberg et al. 2012), the observed activity reduction, instead of relating to a loss in cognitive performance, was associated with the maintenance of WM.
In sum, our data support the notion that over-activation is a common trait in aging, but one that might be mainly driven by decliner subjects. Further, our data emphasize that optimal cognitive function in older adults depends on successful brain maintenance rather than on triggering compensatory mechanisms to counteract damaged normal functioning (Morcom and Henson 2018).

Cross-sectional fMRI findings
Although the groups' stratification is based on the longitudinal change, we take advantage of the two approaches used (longitudinal and cross-sectional). Even if these results should be considered exploratory given the small sample size, they contribute to understanding the differences between 'hyperactivation' and 'over-activation'. Specifically, at baseline, we found increased activity in SHP compared to SLP ('hyperactivation'), which can be interpreted as a distinctive feature of the high functioning subjects and not as a typical compensatory mechanism ('over-activation') (Nyberg et al. 2010). These results suggested that a higher WM performance would be associated with higher WM load-dependent adaptability of neural activity (Nagel et al. 2008, 2009; Nyberg et al. 2009). While caution is needed, the low activity identified for the SLP at both time points, even compared to decliners at follow-up, supported the hypothesis that a highly demanding task could give rise to a shift to 'under-activation', suggesting a neural resource limitation, or reduced neural capacity (Stern 2009), amongst low-performing elders. In this line, the age-related low activity seems to be partly explained by a low educational level (Archer et al. 2018), which is in accordance with the fact that in our sample the SLP was the least educated group. Another noteworthy finding was that neural activity was also higher for the decliners compared to the SHP at follow-up. However, the neuroanatomical location of these results indicated that the increased activation occurred partially over the left temporo-parietal cortex and posterior cingulate and lingual gyrus, outside the typical WM pattern areas of brain activity but instead overlapping with core default mode network (DMN) regions. Such observations may indicate that a reduced activation balance between the default mode and the task-related pattern activation may be characteristic of a WM decline (Grady et al. 2006; Miller et al. 2008; Sala-Llonch et al. 2012; Spreng et al. 2016).

Decliners showed higher rates of cortical atrophy
We identified a pattern of CTh atrophy after 2 years for all the participants, in accordance with the widespread age-related grey matter reductions described previously (Fjell et al. 2014a, b; Storsve et al. 2014). There was a significant difference between groups as regards atrophy rate. This finding is in line with former reports describing slower rates of cortical atrophy sustaining cognition in successful aging subjects compared to average-agers (Harrison et al. 2012; Cook et al. 2017). Specifically, the interaction cluster comprised the inferior temporal gyrus and the lateral occipital cortex, areas relatively preserved against the typical age-related structural impairment. When considering each group separately, although atrophy clusters were scattered across different regions of the brain, we observed that the decliner group displayed a more generalized pattern of atrophy comprising extended temporal and lateral occipital areas in both hemispheres. Interestingly, this group also exhibited a significant cortical reduction in the left entorhinal cortex, a feature that has been suggested as a marker of incipient Alzheimer's disease pathology (Dickerson et al. 2001). Nevertheless, our study does not allow the observed longitudinal findings to be associated with the absence or presence of impending brain pathology.

Limitations
A first limitation of the present study was the sample size, especially when the stable group was stratified. For this reason, the cluster-wise threshold set at z > 2.3 for the fMRI analyses should be considered lenient (see Eklund et al. 2016). Hence, although the present results are novel in that they offer first evidence of the functional and structural brain correlates associated with WM stability and decline in aging, it is important to highlight that they should be interpreted with caution, and further investigations with larger samples are needed. In this regard, a larger group could also have allowed a better understanding of the role of CR and its putative influence on being a stable or a decliner, as well as stratification of the decliner group into high- and low-performer subgroups. Further, additional time point measurements may be important to better identify subjects with stable trajectories and to discern whether the WM age-related changes follow a linear function. Moreover, the inclusion of a young reference group could have provided a powerful approach to detect successful cognitive aging. Also, the stratification criteria could be considered a limitation, but previous studies do not propose a unified method to identify 'brain stability' using two time point measures. In addition, the cross-sectional findings suggested that the low level of activation in the low-performing group may arise because this group no longer follows a linear response to load as the cognitive demand increases. In this sense, the use of a linear task contrast is a limitation for the analysis of this particular subgroup, and further studies should overcome this constraint using a non-linear approach. Finally, the lack of significant associations between the functional and structural brain measures limited the depth at which we could discuss our results.

Conclusions
Our results provide new evidence on the underlying functional mechanisms and structural characteristics of WM cognitive stability in older age. First, we observed increased activity at follow-up compared to baseline for the whole sample. Nevertheless, this over-activation was only detected for the decliner group, highlighting the importance of conducting longitudinal studies and stratifying the elders according to their trajectories, as an approach to obtain fMRI-based data reflecting distinct cognitive profiles in aging. A novel finding of our study was that, within the cognitively stable individuals, baseline WM performance suggested different age-related trajectories at a functional level. The high-performing and stable older adults showed reduced brain activity across time, while the decliners expressed a longitudinal activity increase, evidencing neural efficiency and attempted compensation mechanisms, respectively. On the other hand, the cross-sectional approach highlighted an 'under-activation' for the SLP, suggesting a disrupted neural load-dependent adaptability. Furthermore, although structural decline occurred across time in the whole sample, the rate of atrophy was higher for the decliner group compared to stables. Finally, it should be noted that these results were found in the context of fewer years of attained education in SLP, probably indicating that these subjects are not able to engage plastic neural responses related to cognitive and/or brain reserve mechanisms, but can maintain their performance (even though at a low level) because they experience low rates of structural atrophy.