A Bayesian joint model for zero‐inflated integers and left‐truncated event times with a time‐varying association: Applications to senior health care

Population aging in most industrialized societies has led to a dramatic increase in emergency medical demand among the elderly. In the context of private health care, an optimal allocation of the medical resources for seniors is commonly done by forecasting their life spans. Accounting for each subject's particularities is therefore indispensable, so the available data must be processed at an individual level. We use a large and unique dataset of insured parties aged 65 and older to appropriately relate the emergency care usage with mortality risk. Longitudinal and time‐to‐event processes are jointly modeled, and their underlying relationship can therefore be assessed. Such an application, however, requires some special features to also be considered. First, longitudinal demand for emergency services exhibits a nonnegative integer response with an excess of zeros due to the very nature of the data. These subject‐specific responses are handled by a zero‐inflated version of the hierarchical negative binomial model. Second, event times must account for the left truncation derived from the fact that policyholders must reach the age of 65 before they may begin to be observed. Consequently, a delayed entry bias arises for those individuals entering the study after this age threshold. Third, and as the main challenge of our analysis, the association parameter between both processes is expected to be age‐dependent, with an unspecified association structure. This is well‐approximated through a flexible functional specification provided by penalized B‐splines. The parameter estimation of the joint model is derived under a Bayesian scheme.

challenges for health care providers. A reasonable strategy for optimal medical resource allocation consists of monitoring each individual's service demand from the age of 65 onward, which in turn reinforces affiliates' loyalty. 1 For this purpose, we need precise data on individuals' health status according to their age, and we also need to assess the role of other demographic risk factors affecting the propensity to seek medical attention, such as gender and socioeconomic group. From a health care insurance provider's standpoint, monitoring the use of critical care services becomes crucial; this is the primary source of information on the deterioration of senior affiliates' health, and at the same time it represents the highest costs for the private sector. Since it is common for health care policies to renew annually, the overall risk assessment period can be properly divided into intervals of one calendar year. Then, individually analyzing critical care may provide a useful surrogate indicator or proxy variable when evaluating the subject's instantaneous mortality hazard. When considering these annually recorded medical responses, the variability in a subject's health status within a specific calendar year is unknown. To deal with this missing information, the time-dependent Cox model assumes that the level of critical care usage over the past calendar year is directly imputed to the following one, along which it remains constant. However, such an unrealistic hypothesis usually leads to severely biased results. 2 Instead, a more logical smooth evolution over time of the annual emergency demand may be considered by incorporating a suitable model with time-independent random effects. In this manner, longitudinal and mortality hazard processes are jointly modeled by assuming that the random terms are shared by both submodels; see Henderson et al 3

and references therein.
For complete overviews in this research field, the reader is referred to Tsiatis and Davidian 4 and Rizopoulos. 5 On the other hand, as a key feature of our postulated shared-parameter (SP) joint model, we assess the presumed relationship between both outcomes through a time-varying association. This ultimately quantifies the impact of the expected critical care demand (together with other secondary factors) on the age-adjusted mortality rate for insured elderly individuals. In summary, measuring the health characteristics of elderly populations helps identify high-risk individuals, 6 and opens new avenues for health providers to hedge against subject-specific longevity risk beyond the age of 65.
Our research is illustrated by a unique and relatively large dataset from the Spanish health insurance sector, henceforth HI data, 7 which provides a survey of emergency care demand for 5470 elderly policyholders. All individuals are residents of the city of Barcelona and are between 65 and 99.5 years old at the initial observation, with a median age of 74.9 years. The demand for emergency services is longitudinally recorded in terms of claim counts at the end of each calendar year throughout the entire study window, from 1 January 2006 to 1 February 2014. The monitoring can be considered sufficiently informative, with roughly six measurements per subject on average, along with a mean and median follow-up period of 5.1 and 6.3 years, respectively. In conjunction with critical care demand, gender is presumed to be an important confounding factor in understanding age-related mortality hazard patterns. Out of all subjects, 3415 (62.4%) are female, who historically have longer life expectancies than male. In addition, to identify the potential risk effect of the socioeconomic profile, the postal code is used to determine which policyholders reside in neighborhoods with a mean income above the overall city average; this group consists of 911 individuals (16.7%). The primary longitudinal response is the annual number of reported claims for ambulance services, hospitalizations, and nonroutine visits; these counts are individually recorded in an aggregated manner by each calendar year within the observation period. Subjects' entry into the study is registered at their first longitudinal response, after which subsequent responses are yearly recorded. Because individual follow-up does not necessarily start at the beginning of a calendar year, the exposure-time effect is considered to correct the first measurement's frequency. Table 1 displays the distribution of the 32 269 longitudinal responses collected over the study period and nested within individuals, where zeros represent more than half of overall measurements. This large number of zeros is due to (i) the highly infrequent nature of the data itself and (ii) the existing interplay in the Spanish health care system between universal coverage, which provides social security services without cost to the individual, and private insurance, which represents an optional extra within the publicly funded universal system. This latter point entails a group of measurements outside of the private health sector, which systematically supply null values in our data.  F I G U R E 1 Age-specific descriptive plots from the time-to-event analysis of the health insurance data. Left: Dynamic variation of the number of subjects at risk, overall and by gender. Right: Average trends for annual emergency demand with 95% confidence regions, separately for event-free subjects and those who die during the study The end of individual follow-up can be due either to death or to noninformative right censoring; dead subjects have a share of 9.3%. Since only the trajectories of policyholders who are at least 65 are monitored, all those not living long enough to reach this fixed truncation age cannot be observed. Life spans are therefore affected by left truncation, and the staggered entry of 4365 individuals (79.8%) into the study after 65 must be incorporated as delayed entries in order to avoid overestimating their event times. These are properly considered by adopting the subject's age above 65 as the time scale, in which the difference between the age of entry and 65 is implicitly considered: t = age.entry − 65. Hence, our particular time zero is set at the age of 65, and from then onward, any time point t > 0 is inextricably linked with the corresponding policyholder's age. This duality makes it equivalent to talk either in terms of time or in terms of age; for instance, a policyholder's response at age 75 is recorded at time t = 10 years. In our empirical data, the within-subject observations may arise at any time point between t min = 0 and t max = 37.5, since 102.5 years old is the maximum age at which observations are collected for any subject. Interestingly, these delayed entries result in the progressive incorporation of policyholders after the study has already begun. This entails an age-related dynamic fluctuation of the number of subjects at risk, as illustrated in the left panel of Figure 1. As expected, the number of insured women at risk is typically higher than insured men at all ages, even though this gap in the at-risk pattern by gender tends to narrow with increasing age. It is also relevant to compare the average emergency demand patterns between event-free profiles and trajectories for which death is recorded, as displayed in the right panel of Figure 1. Both of these patterns change across the above-65 age range, and there appears to be strong linkage between dead subjects and greater demand rates; a yearly average of 0.80 emergency claims is recorded among subjects who do not die during the study period, whereas this increases to 1.50 for those who experience the event. Further details about the information provided by the HI data can be found in Appendix A (Tables S1-S5 and Figure S1).

Joint modeling techniques adapted to health care sector
As stated above, we work within the scope of SP joint models for longitudinal and time-to-event data. Recent advances in joint modeling have been made to address different types of non-Gaussian longitudinal responses, including new approaches to better account for a high frequency of zeros with respect to a Poisson process. [8][9][10] Furthermore, other novel extensions to accommodate left-truncated event times within the scope of SP joint modeling have been also examined. 11,12 However, none of the previous studies have combined longitudinal counts exhibiting both excess zeros and marked heterogeneity with left-truncated event times. Simultaneously, and as the main challenge of this article, we aim to appropriately measure the relationship between the need for intensive care and the corresponding mortality hazard. The functional shape that interrelates both processes is completely unknown, and so a flexible time-varying association structure is hypothesized. Song and Wang 13 proposed a joint model in which time-varying regression coefficients are assumed for the hazard process. More recently, Barrett et al 14 postulated a joint model approach in which the association between survival and random effects can also vary over a discrete set of potential time points. Köhler et al 15 focused on a flexible Bayesian additive joint model capable of integrating strongly nonlinear individual trends and time-varying effects in the association parameter via a penalized B-spline-based approach. They summarized their developments by means of the R package bamlss. 16 Also within the Bayesian framework, Andrinopoulou et al 17 improved the classical SP joint model scheme by expanding the association parameter into penalized B-spline basis functions with quadratic splines of time, and illustrated the benefits in the context of dynamic predictions. Building on their contribution, we propose a joint model to associate zero-inflated discrete responses with left-truncated event times, while also using a time-varying association approximated by penalized cubic B-splines. This modeling framework allows us to incorporate the particular features necessary when analyzing the HI data and provides the tools to properly manage applications beyond the insurance domain.
To appropriately associate the observed emergency claims per year and mortality risk in our empirical data, the main challenges tackled in our analysis are 3-fold: (i) developing an SP joint model to accommodate correlated count rates with an overabundance of zeros and left-truncated event times, (ii) incorporating a flexible time-varying association structure between the longitudinal and time-to-event processes through a penalized B-spline approach with cubic splines, and (iii) demonstrating the validity of the proposed model, both empirically and numerically. The article is developed under a Bayesian paradigm by using Markov chain Monte Carlo (MCMC) algorithms. This approach allows for a clearer and more straightforward coding implementation for the referred tasks in comparison to the maximum likelihood approach. In addition, a full posterior inference for any parameter can be obtained, as well as a wide range of diagnostic tools for checking the model fit. These benefits, however, usually go hand-in-hand with a high computational burden derived from using MCMC sampling, especially for sizable datasets.
The remainder of the paper proceeds as follows. Section 2 specifies the Bayesian formulation for jointly tackling the measurements of longitudinal emergency care demand and the time to death. Section 3 describes the estimation procedure under the Bayesian framework. Section 4 conducts an extensive simulation study to assess the performance of the proposed joint model. Section 5 illustrates the application of our model to the motivating dataset. Section 6 describes the model assessment through posterior predictive checking and residual diagnostics. Lastly, Section 7 provides a discussion of the main results and suggests some topics for future research.

Longitudinal submodel for zero-inflated count rates
Denote y i = {y i (t ij ), i = 1, … , n} as the observed response vector for the ith subject, recorded at a sequence of time points t ij , j = 1, … , n i . For convenience, suppose that every longitudinal count rate can be formally expressed as } is a latent indicator that captures subject's probability Bi (t) to use private health care at a specific time t, while y Ci (t) is the count response derived from an adequately chosen model f C (⋅) such that E{y Ci (t) | b iC } = Ci (t) is its mean response, conditional on the random effects b iC . Specifically, v i (t) = 1 if the longitudinal measurement corresponds to a senior policyholder who uses private services within a specific time interval. In this case, y Ci (t) comes from f C (⋅) and may logically include zeros if the subject's health status is good enough. In contrast, if v i (t) = 0, then y i (t) = 0 systematically. These two zero-generation sources affecting longitudinal count rates are combined into a zero-inflated response as with probability mass function where I(⋅) is the indicator function and b i = (b ⊤ iB , b ⊤ iC ) ⊤ encompasses the common and unobserved subject-level random effects for the binary and count parts, respectively. The corresponding overall conditional mean and variance expressions are Apart from the excess of zeros, the discrete measurements y Ci (t) can also be affected by a marked level of patchiness among subject-specific responses, so the conditional variance exceeds the conditional mean. The hierarchical approach to the standard negative binomial (NB) model, commonly termed as NB2 model, arises as the traditional modeling option for f C (⋅), since this model provides a second-degree dependence of the variance upon the mean. This quadratic relationship, however, can be relaxed by assuming a NB-Power (NBP) distribution: 18,19 Above, the heterogeneous nature of y Ci (t) is explicitly accommodated through a common shape parameter > 0 for all subjects, whereas the ancillary parameter P > 0 allows for a Pth degree mean-variance relationship. The combination of (2) and (4) yields a hierarchical zero-inflated NBP (ZINBP) model, y i (t) ∼ ZINBP{ Bi (t), Ci (t), , P}. From this, two common hierarchical models for discrete data with excess zeros can be straightforwardly derived: the zero-inflated NB (ZINB2) longitudinal model is obtained by setting P = 2, whereas the zero-inflated Poisson (ZIP) longitudinal model is considered if → ∞ for all P.
On the other hand, both Bi (t) and Ci (t) can be, respectively, related to linear mixed predictors, Bi (t) and Ci (t), through prescribed link functions. Specifically, let where x ⊤ Bi (t) and z ⊤ Bi are the ith subject's rows of the fixed-and random-effects design matrices in the binary part, respectively, while x ⊤ Ci (t) and z ⊤ Ci are the corresponding rows of analogous matrices in the count part. These matrices can be defined to summarize the subject-specific evolutions using simple regression patterns, such as linear trends, or to approximate these trajectories through more flexible functional forms, such as splines. Moreover, B = ( B 0 , B 1 , … , B p B ) ⊤ and b iB = (b iB 0 , b iB 1 , … , b iB q B ) ⊤ denote the p B + 1 fixed and q B + 1 random effects for modeling the binary response, while C = ( C 0 , C 1 , … , C p C ) ⊤ and b iC = (b iC 0 , b iC 1 , … , b iC q C ) ⊤ include the p C + 1 fixed and q C + 1 random effects for modeling the count rates. This yields the two-part model Here, e i (t) acts as an offset term which accounts for the existence of an exposure-time effect related to a specific subject interval. Given that count rates are from 1-year periods in our analysis, we have e i (t) = 1 except for the first longitudinal response of those subjects whose trajectory does not start to be observed at the beginning of a calendar year; for these responses, e i (t) ranges from 0.5 to 1. Moreover, the distribution for the random effects is established to be b i ∼ N q (0, D), q = q B + q C + 2, with an unspecified structure for the q × q covariance matrix D. The latter can be decomposed into the symmetric block matrix where D B and D C are the (q B + 1) × (q B + 1) and (q C + 1) × (q C + 1) covariance submatrices for b iB and b iC , respectively, whereas D BC is the (q B + 1) × (q C + 1) cross-covariance submatrix that includes the prior correlations between b iB and b iC . Specifically, the diagonal terms of D correspond to the variances of random effects, 2 kk , for k = 1, … , q, whereas the off-diagonal elements denote the corresponding covariances, k k , for k, = 1, … , q. By combining the overall conditional expectation from (3) with the two equations of (5), we have the globally-expected longitudinal response at time t:

Structural features
Let age beyond 65 be the time scale of the study, and let the nonnegative random variable T * i denote the ith true event time. Time-to-event data may be subject to a random left truncation process, denoted by the variable L i = age.entry i − 65 ≥ 0, as well as to random right censoring, denoted by the variable C i . Suppose further that {L i , T * i , C i } are conditionally independent given the covariate history, and also suppose that L i and C i are not related to the subject's death event. For those individuals with T * i > L i , we observe both the event time T i = min{T * i , C i } and the random event indicator In contrast, individuals with T * i ≤ L i are not observed since they either do not reach 65 within the study window or pass away before reaching this age. Our primary research interest lies in assessing the time-dependent relationship between the expected zero-inflated response i (t), from the longitudinal process, and the hazard for death at time t, from the event history process. These two components are jointly modeled using a shared vector b i of time-independent random effects, while also relating both processes with an unknown functional shape whose trajectory varies over the course of time. Our proposed SP joint model with a time-varying association parameter (JMTV-ZINBP) can be expressed as the instantaneous hazard rate which is conditioned on: the truncation process, the underlying history and a vector w i (t) of exogenous, possibly time-varying, covariates, with related regression parameters w . Additionally, h 0 (⋅) is usually an arbitrary and unspecified function to describe the baseline hazard at the population level, while (⋅) denotes some flexible function of time that assesses the effect of the longitudinal outcome on the conditional hazard rate. Thus the quantity exp{ (t)} evaluates the hazard ratio at time point t for a one-unit increase in the value of i (t), while keeping the rest of variables constant. A typical major concern when dealing with JM techniques is the specification of the unknown baseline hazard as a function of age. This masks underlying mortality risks in the target population, which may vary gradually over time and ultimately determine the distribution of the event times. Consequently, even though in our analysis the original experience data is aggregated over 1-year calendar periods, survivorship needs to be continuously modeled over the entire individual's lifetime. This guarantees that a subject's conditional hazard rate will reasonably and smoothly rise with increased age. Notice that taking natural logarithms in (9) directly leads to a numerically advantageous linear relationship with time, and so any approximation of the baseline hazard function is usually performed with its logarithmic version, log{h 0 (⋅)}. A first alternative for mimicking the baseline hazard consists of approximating its true functional shape through a predefined survival law. The use of a parametric function of age confers two major advantages: the model itself automatically imposes smoothness across the age-related mortality hazard trend, and only a small number of parameters need to be estimated. 20 The most basic law to describe mortality trends in human populations is the Gompertz survival model, in which a continuous exponential increase in death risk over time is hypothesized. Although other more general forms are also extensively considered in the literature, 21 for illustrative purposes we set the Gompertz distribution as the default parametric law for describing the baseline hazard in our empirical data. A common parameterization is h 0 (t) = 0 exp( t t), where 0 > 0 and t ≥ 0 represent the initial mortality intensity and the annual proportional growth rate in mortality, respectively. Caution is advisable if 0 is close to zero, due to the fact that numerical instabilities are likely to affect this parameter estimation. This can be circumvented by setting 0 = exp( ), so that h 0 (t) = exp( + t t). Taking natural logarithms of this equation does not merely linearize it, but also provides an appealing direct relationship between the logarithm of the baseline hazard and the parameters themselves, log{h 0 (⋅)} = + t t .
Despite the benefits of using a suitable law to estimate the baseline hazard's functional shape, previous knowledge of risk behavior is required from the researcher when handling complex data. Otherwise, using a predefined law which relies on few parameters could seriously hinder the analysis. An alternative and more flexible option to model this underlying risk is the use of penalized B-spline basis functions, called P-splines, 22 requiring no precedent for handling this kind of data and allowing as much smoothness as desired. Essentially, the most appealing feature of such a generalized proposal is that the underlying functional shape behind log{h 0 (⋅)}, along with that of (⋅), are properly approximated through expanding each of these time-dependent functions into P-splines, with respective degrees d h 0 and d . For these two P-spline regressions, the corresponding B-spline basis can be immediately calculated as differences of truncated power functions, as suggested by Dierckx; 23 we also refer to Eilers and Marx 24 for computing details and some examples discussed herein. To control smoothness, one can impose discrete penalties on finite differences of the estimated coefficients from adjacent B-splines. Using this strategy, assuming a relatively large number of knots within the time domain is generally recommended, so that potential overfitting problems are circumvented with a penalty that counterbalances the flexibility. This is of primary importance because it solves one of the major questions when approximating an unspecified shape by a smoothing curve. In parallel with the penalty specification, the other cornerstone lies in the adequate selection of the number of knots for each approximation, Q h 0 and Q , respectively. The great advantage of Bayesian P-splines is precisely the fact that choosing the number of knots is not a critical step, since the penalty corrects possible overfitting concerns. This does not mean, however, that in practice one can indefinitely increase the number of knots placed, since a trade-off between enough smoothness in the fitted curve and an acceptable computational burden is necessary. Eilers and Marx 22 suggested including between 15 and 20 knots to avoid thinking about the number of knots needed, especially when dealing with large samples. Along the same lines, Ruppert 25 pointed out that, in the case of functions which are expected to have few oscillations and slow changes, one can barely observe any improvement when using more than 20 knots (he provides an example with a sample size of n = 25 000). Taking these previous references into account, the maximum number of knots considered in our generalized proposal is set at 15 for large datasets in which the time-dependent function is expected to have a smooth trajectory without marked oscillations.
To provide a unified and flexible modeling framework, a Bayesian P-spline approach is the default option utilized in our proposal to examine the functional shapes of log{h 0 (⋅)} and (⋅) within the joint model. Moreover, for comparison, the parameter estimates and the corresponding time-varying profile for a Gompertz approximation of the baseline hazard function are also included in the results section.

Prior setting
Regarding the ZINBP longitudinal approach, the univariate prior distributions for the fixed-effects regression coefficients of the binary and count parts are Here prior variances are allowed to be sufficiently large by assuming small precision parameters, B = C = 0.01, say. Furthermore, the priors assumed for the scale and power parameters are ∼ U(1, 5) and P ∼ U(1, 5).
The prior assignment within the survival approach starts with independent and diffuse normal distributions for the components of vector w , namely N(0, 0.01). For the B-spline approximation of the time-dependent functions log{h 0 (t)} and (t), an equally spaced vector h 0 of Q h 0 knots is placed on the time domain [t min , t max ], t min = h 0 ,1 < … < h 0 ,Q h 0 = t max , while a vector of Q knots is located over [t min , t max ], t min = ,1 < … < ,Q = t max , thus dividing the domain for each case into Q h 0 + 1 and Q + 1 subintervals. Hence, the baseline hazard function on the log-scale can be approximated through a linear combination of R h 0 = (Q h 0 − 1) + d h 0 B-spline functions, while the true correlation between longitudinal responses and event times can be captured via R = (Q − 1) + d B-splines: Here is the corresponding vector of unknown regression coefficients (without biological or physical interpretation). Similarly {B d ,r (t, ),r = 1, … , R } embraces the R B-spline functions to mimic the shape of (t), while = ( ,1 , … , ,R ) ⊤ reports the regression coefficients. The curve-fitting solutions for log{h 0 (t)} and (t) are positively defined over the d h 0 + 2 and d + 2 adjacent knots, respectively, being zero everywhere else. Because of boundary conditions from the B-spline definition, the original support of knots is extended in each approximation by, respectively, adding d h 0 and d knots to the left and right of their border-knots, namely { h 0 ,1 , h 0 ,Q h 0 } and { ,1 , ,Q }. The resulting Q h 0 + 2d h 0 and Q + 2d knots will be used to generate two complete basis with B-splines of degrees d h 0 and d . In this article, we use Bayesian penalized B-splines 26 to obtain a parsimonious parameterization of the baseline hazard and, especially, of the time-varying association parameter. Following such an approach, the joint prior distributions for vectors h 0 and are conditional on the amount of smoothness introduced by the positive-valued hyperparameters h 0 and , which penalize the roughness of log{h 0 (t)} and (t), respectively. Concretely, these overall reference priors are assumed to be yielding the hierarchical multivariate Gaussian where the symmetric and positive-definite precision matrices M h 0 and M in turn play the role of appropriate penalty matrices. Validity of such a priors then yields the global penalties pen Here where Δ u and Δ v are the difference matrices based upon uth and vth order squared finite differences of adjacent B-splines, respectively, while the term I introduces a small "ridge penalty" to avoid a linearly dependent system. This is done by providing a small enough value for , for instance = 10 −6 . We assume here onwards penalties based on second-order differences, yielding rank(M h 0 ) = R h 0 − 2 and rank(M ) = R − 2 in (11). The sets of parameters { h 0 , h 0 } and { , } are in each case simultaneously estimated by assuming a prior distribution which delivers a large positive support for the unknown smoothing hyperparameters h 0 and . For these, the standard prior choice is Gamma(a , b ) with expectation a ∕b and variance a ∕b 2 , where a and b are chosen to be minimally informative. Traditional options advocate for assigning highly diffuse but proper priors, with small values for a = b , 27,28 or, alternatively, combining a = 1 with a small value for b . 17,26,29 Both options are generally proven to supply quite similar results when the sample size is large enough. 30,31 In our approach, we have considered with h 0 ∼ Gamma(1, 0.005) and ∼ Gamma(1, 0.005).
The prior setting concludes with the specification of the normally distributed random effects, which is completed by assigning a proper hyperprior distribution for the q(q + 1)/2 distinct entries of the covariance matrix D. Within the Bayesian framework, the hierarchical covariance matrix of a multivariate normal variable is combined in a natural way with the inverse-Wishart (IW) prior distribution, so D ∼ IW q (S −1 , ). Here S −1 is a q × q positive definite scale matrix and denotes the degrees of freedom, with ≥ q to ensure the properness of the prior. This in turn equates to picking a Wishart prior for the precision covariance matrix, D −1 ∼ W q (S, ). A classical prior setting is usually established as S = I and = q + 1. 32 To construct the full conditional likelihood for the joint model, we assume that, given the random effects of the ith subject, these two processes are independent, as are the subject's n i longitudinal responses. 5 Under this conditional independence assumption, the complete conditional joint likelihood can be factorized as

POSTERIOR INFERENCE
where p y (⋅) is the conditional likelihood function of the zero-inflated count rates and p t (⋅) is the conditional likelihood for the event times. The resulting posterior distribution of parameters Here, the prior distribution for the unknown parameters ( , b i ) can in turn be factorized as where p b (b i | b ) follows a zero-mean multivariate Gaussian distribution. In addition, prior independence is assumed between the components of , so the posterior distribution for the parameters satisfies The resulting expression, however, is analytically intractable, so we approximate it using MCMC algorithms. The specific form presented by p y (⋅), p t (⋅), and p b (⋅) is derived in Appendix B.

General design
A simulation study is conducted to firstly check that the general algorithm proposed to run the JMTV-ZINBP is indeed working, and then to determine the main effects of using more elementary joint model structures to fit data generated from a JMTV-ZINBP. In particular, we begin with a base scenario in which a set of M = 500 independent datasets are both simulated and fitted using our complex joint model. Then, three additional scenarios are considered, using in every case a particular simplified version of our proposed joint model to fit the same 500 datasets generated in the first scenario. Each of these three alternative joint models lacks a noteworthy feature of the JMTV-ZINBP, thus enabling us to examine which aspect of the generated data is not properly characterized. The four distinct scenarios are named according to the joint model assumed therein to fit the common datasets. We therefore have (i) JMTV-ZINBP, (ii) JMTV-NBP, (iii) JMTV-ZIP, and (iv) JM-ZINBP. A replicated dataset reports the longitudinal counts observed over a 7.5-year observation window for n = 250 subjects, whose distribution by age is similar to that observed in the HI data. All subjects are 65 or older at the time of their first measurement, and subsequent measurements are then taken annually. To properly account for this age adjustment in the event time process, the time scale is measured in years above the age of 65, so this fixed truncation age is directly linked to our time zero. Particularly, 30% of individuals are assumed to enter the study at t = 0, while the remaining percentage enter at any time between t = 0 and t = 30 years; such a substantial proportion of left-truncated event times typically arises when the observation of a nonselected cohort of senior individuals is conditioned on a fixed age. Aside of truncation, the event times are potentially affected by a noninformative censoring mechanism, expressed by independent and uniformly distributed random variables C i with a mean of 20 years. This results in about 45% of all subjects being right-censored, so each subject's longitudinal information is gathered through at most n i ≤ 8 observations. Additionally, the baseline hazard function for all scenarios is taken to be Gompertz distributed over time.
The parameter setting for the datasets common to all scenarios is established as follows. The binary part only includes fixed and random baseline terms, whereas the count part is described using cubic natural B-splines of time, with a single internal knot at the median of time measurements related to each dataset, and considering both a fixed and a random baseline effect. Specifically, the two longitudinal predictors are Bi (t) ≡ Bi = B 0 + b iB 0 , and Ci (t) = C 0 + b iC 0 + C 1 B 3,1 (t, ) + C 2 B 3,2 (t, ), where the random intercepts of binary and count parts are assumed to be uncorrelated, that is, BC = 0; this way, the specification for the random effects is reduced to b iB 0 ∼ N(0, 2 B 0 ) and b iC 0 ∼ N(0, 2 C 0 ), with B 0 = 0.500 and C 0 = 0.400. We let B 0 = 1.525, ( C 0 , C 1 , C 2 ) = (0.405, 0.165, 0.070), = 1.950, P = 1.100, and the exposure period is set at 1 for all measurements. This yields count values in the range 0 to 20, with 45% of overall zero counts. The relative risks model in (9) includes a single baseline zero-one indicator group w i (t) ≡ grp i ∼ Bernoulli(0.70), with w = −0.150. Likewise, the baseline risk function is fixed as h 0 (t) = exp(−3.910 + 0.027 t), while the underlying association curve is assumed to be (t) = 2.1610 − 0.0752 t + 0.0005 t 2 .

Computed information
The time-varying association (t) is approximated through a P-spline basis of degree d = 3, placing Q = 8 internal knots between t min = 0 and t max = 37. ∑ M m=1 p 97.5,m ∕M denote the average 2.5th and 97.5th percentiles of , respectively. The estimation procedure is handled using an MCMC algorithm, which sequentially draws random and independent samples from the posterior distribution. We use Gibbs' sampling to fit our model by means of the Bayesian software JAGS v. 4.3.0. 33 For each simulated dataset, we run two parallel chains with 10 000 iterations per chain, while discarding the first 9000 iterations per chain as a burn-in period. Thinning is set to keep 500 posterior samples from each chain, thus using a total of 1000 samples for estimation. Convergence is checked via the visual examination of traceplots, as well as through the potential scale reduction factorR, defined by Gelman and Rubin, 34 and the effective sample size, n eff .

Simulation results
The average posterior mean estimates for the joint model parameters of each scenario are listed in Tables S6 to  S9, provided in Section C1 of Appendix C. Along with the fitting of each dataset under a particular scenario, a new dataset of size n = 250 is simulated from the corresponding posterior parameter distribution. Thereby, an average count distribution can be derived for each of the four scenarios from the 500 replicated datasets generated therein, and these frequency distributions can in turn be compared to the theoretical distribution from the initially generated 500 datasets. Such a comparison is presented in Table S10, which appears in Section C2 of Appendix C.
The average results of scenario (i) suggest that the JMTV-ZINBP algorithm yields little bias in most of the parameter estimates, while the average of the estimated SEs shows a strong concordance with the empirical SE. It is also interesting to note the great similarity between the fitted and theoretical frequency distribution of the count rates. In the remaining three scenarios, however, some significant disparities between the average posterior result and the true magnitude may be consistently observed because of the simplified joint model assumed. In scenario (ii), the the JMTV-NBP does not explicitly account for the excess of zeros, so the additional variability from this source of overdispersion is absorbed by the catch-all parameter . In comparison with its theoretical value when generating the data, this term exhibits an average posterior mean that is nearly half, = 1.950 vs av (t) = 1.058. Hence, the longitudinal part becomes highly efficient in accommodating the overall proportion of zeros, even if these are slightly underestimated. Conversely, lower counts are likely to be overfitted. With respect to the shape of the association parameter, we can observe that its corresponding average estimate is fairly close to the true time-varying shape. In scenario (iii), the specific features of the two-component ZIP response are specifically developed to deal with an excess of zeros. The JMTV-ZIP alternative, however, assumes a Poisson model for its count part, thereby neglecting the nonlinearities coming from unobserved heterogeneity among clustered observations. This lack of flexibility is well-reflected as a significant reduction of the corresponding average posterior mean of the binary part's intercept, B 0 = 1.525 versus B 0 ,av = 0.903, yielding a serious overestimation for the excess of zeros. Nonetheless, the fitted frequency of overall zeros remains in line with the theoretical frequency. A further examination of the fitted responses from this model reveals that the relatively long tail displayed by the original data is not captured; there is definitively more latent variability in the data than the model can adjust for. Regarding the estimation of the trajectory (t), similar results to the previous scenarios are derived, providing the model with a high capability to mimic the true association shape. Scenario (iv) retains the same longitudinal assumptions as the model which generates the data, but the constant association parameter assumed for the JM-ZINBP leads to substantial departures of the longitudinal coefficients from their true corresponding values. This ultimately gives rise to an underprediction of the frequency of zeros, along with an overrepresentation of small and intermediate counts. As a key feature of this scenario, it summarizes the time-varying relationship between longitudinal and event time with a single average posterior mean, av = 1.517; this is actually a particular value within the range covered by the true (t). For illustrative purposes, Figure 2 displays the 500 time-varying correlation profiles under scenario (i), while Figure S2 in Section C3 of Appendix C presents the time-varying profiles derived for each of the four scenarios.

Longitudinal results
We first consider a Bayesian longitudinal model with a ZINBP response to accommodate the observed emergency claims per year from the HI data. Although this is not a necessary step, it provides a reliable guide about the prior F I G U R E 2 Simulation results for the time-varying association under scenario (i). The target shape of (t) is compared to the average posterior mean av (t), which is obtained by averaging over the M = 500 posterior mean profiles from simulations choice for the longitudinal parameters in the subsequent JMTV-ZINBP fit, especially when faced with a relatively large dataset such as ours. For the binary response, we assume that the probability of private medical usage is explicitly dependent on subject-specific behavioral habits, where both fixed and random intercept terms are considered. Such usage is also influenced by two zero-one indicators: the subject's gender, sex i = I(sex i = female), and the household income, inc i = I(inc i > average), which establishes who has a monthly income above the municipal average. By contrast, the count response accounts for nonlinear patterns over time by using natural cubic B-splines, with a random term also assumed at a baseline level. The gender and household income factors are also included in this count part. Following (5), we set D). Correlation between the binary and count responses is defined through BC = Cor(b iB 0 , b iC 0 ), which controls how the subject's need for emergency care at time t relates to the claim intensity at other time points. In this regard, if our general model only accounts for random intercepts in both the binary and count parts (i.e. q = 2), an easy-to-handle alternative to the standard inverse Wishart prior could be based on the fac- 35 See also Neelon et al 36 for a detailed example of such a strategy in the hierarchical zero-inflated setting. The two referred approaches to account for random effects were previously tested and led to similar results. In this article, however, we only follow the procedure of assigning an IW prior to the random effects in order to provide a general overview of the proposed model. The Bayesian estimation is conducted via JAGS using 10 000 iterations for the adaptation, and then running two parallel chains of 25 000 iterations each after a burn-in period of 15 000. We kept 1000 posterior samples from each chain. Mean, SE, and 95% credible interval are sampled for each parameter from the corresponding marginal posterior distribution. The posterior summary statistics for the unknown parameters are provided in Table S11, within Section D1 of Appendix D.

Application of the JMTV-ZINBP
The proposed JMTV-ZINBP for the empirical data is concretized as in which the profiling of age-related functions log{h 0 (t)} and (t) can be approximated using P-splines of the same degree, d h 0 = d = d, while also placing the same quantity Q h 0 = Q = Q of equally spaced knots between t min = 0 and t max = 37.5 years; this indicates that the overall observations occur between the ages of 65 and 102.5 years. We particularly set d = 3 and Q = 15, which result in computing R = 17 regression coefficients within each approximation of the function.
Taking advantage of the previous section's results, the univariate normal prior distributions for } are now centered on their corresponding longitudinal mean estimates, while the distance between the lower and upper support of the uniform priors for and P is reduced. Furthermore, the only survival covariate considered is gender, w i (t) ≡ sex i . The posterior distribution of the parameters, ( , b i |  n ), is approximated by running an MCMC algorithm. Similarly to the longitudinal fitting, the number of iterations required in the adaptation phase is unusually high, at 25 000. The fitting of the JMTV-ZINBP accounts for two parallel chains with 100 000 iterations each after a burn-in of 50 000, and a total of 2000 posterior samples are drawn. The JAGS code within the R environment is given in the supplementary files. Sections D2 and D3 of Appendix D (Tables S12-S15) provide the main posterior statistics of the unknown parameters for the joint model fit with constant and time-varying associations, respectively, while distinguishing for each case whether the log-baseline hazard is approximated using a Gompertz model or a P-spline approach. Further, the traceplots and the posterior densities for the parameters are shown in Section D4 ( Figures S3-S11), from which a correct mixing and convergence of the chains can be concluded. The results of the two fitted JMTV-ZINBP give rise to the shapes for h 0 (t) and (t) depicted for each case in the four-panel plot shown in Figure 3. Looking at this, it appears that the baseline hazard function exhibits an increasing convex trend for the target population, whether a Gompertz model is assumed or its underlying shape is approximated using P-splines. However, with the former approach, the uncertainty around the mean trend is higher. These functional forms seem reasonable, since mortality risk increases as people get older. The top and bottom right panels show hardly any differences and demonstrate that the annual demand for emergency medical care and the mortality hazard are positively correlated over time, while the posterior statistics obtained for the regression coefficients provide evidence that this correlation is higher than zero. Most remarkably, these two plots show that the relationship between these outcomes is not stationary from the age of 65 onward, but instead varies significantly with age. From the age of 65 to the ages around 75, the mortality risk decreases sharply with each one-unit increase in the annual emergency claims rate. In consistency with this, for relatively younger individuals (within the total range of 65-102.5 years), seeking emergency services is likely indicative of a more critical health status because the intensity of use itself is lower and because younger retirees are usually more mobile. Additionally, for 75-to 95-year-olds, medical emergency services are relatively more frequently used but not necessarily indicative of a terminal illness. As a result, the time-varying association trend shows barely any noticeable variation in the 75 to 95 age range. Also recall that the abundance of data for the ages around 85 allows for a reduction of the 95% credible intervals for that age range. Finally, the risk exhibits an increasing trend from the age of 95 years onward, even though emergency service usage typically reduces sharply at very old ages. Here, the survival probability itself is so low that any unusual occurrence leads to an increase in the mortality risk.
The implications of these results are of vital importance within the health care field, giving rise to wider applications that improve resource allocation across different age groups. Indeed, the postulated joint model allows for contouring the age-related hazard ratio HR(t) = exp{ (t)ΔEC} for a specific change in the annual amount of emergency claims, ΔEC. Then, a first option to concisely capture the age effect on mortality risk relies on transforming (t) into a piecewise-constant trend over a predefined age intervals. These can be chosen in accordance with the three distinct trends for (t) shown in Figure 3, 65 to 75, 75 to 95, and 95 to 102.5. Constant association values can be separately derived within each of these three age intervals, namely k , k = 1, 2, 3, so that each value is obtained by averaging (t) over the corresponding interval. In particular, we obtain the following mean values (with their associated 95% CI):  0.553, 1.284). These quantities provide a simple but useful piecewise expression for the hazard ratio, based on the age range in which the age t of a certain policyholder is located.
A more accurate option for communicating the results is through the use of the shaded contour graph shown in Figure 4, which illustrates the hazard ratio estimations for both a specific age t above 65 and a variation ΔEC, regardless of the number of claims observed in the immediately preceding year. The graph is consistent with the shape of the time-varying association in Figure 3, showing that emergency care usage has a greater impact on survival at the lower and very advanced ages among the senior policyholders. For example, let us consider three policyholders aged 65, 85, and 100, each of them having experienced an increase of one emergency claim. From the graph, the corresponding mortality risk projections lead to a HR = 2.51 for the individual aged 65, a HR = 1.69 for the individual aged 85, and a HR = 2.59 for the centenarian. To gain insight into the impact of emergency care usage on survival at different ages, in Section D5 of Appendix D, Table S16 presents the estimated hazard ratios that correspond to different increases in emergency claims per year at the given ages between 65 and 102.5 years.

Posterior predictive checking
A basic level of performance assessment consists of judging how well the posited model captures the main features of the observed longitudinal data y i , that is, examining the plausibility that such data have really been generated by our model. Indeed, the ultimate goal of our model assessment is to inform about possible misspecifications in the model or, more generally, to identify those aspects in which the model may be improved to provide a better data fitting. A validation procedure should, ideally, consider external data which have not been used for estimating the joint model parameters, but such information is not always available. Nevertheless, the model checking can still be accurately carried out using replicated datasets y rep,i of the same size and shape as the observed one, each of these replications comprising a new dataset that would theoretically be obtained from our applied JMTV-ZINBP fitting if assuming the estimated { y , b i } from the HI data for { y , b i }. The replicated datasets are merely simulated from the posterior predictive distribution p(y rep,i | y i ) = ∫ ∫ p(y rep,i | b i , y ) p( y , b i | y i ) d y db i , so such replications can thereby actually be understood as predictions. Hence, it makes sense to consider that the more indistinguishable the replications are when compared to the original data, the more accurate the fitting is. This approach has sometimes been criticized by arguing that the data are used twice, both for fitting the model and for checking how this model mimics any relevant aspect of the data; however, within the Bayesian framework, the posterior predictive p-value can be directly interpreted as a conditional probability statement given the data, so in that sense the data would really only be used once. 37 The degree of agreement between y i and y rep,i may be expressed either quantitatively or qualitatively. The first option relies on naturally extending some of the classic goodness-of-fit methods to the Bayesian framework, thus informing in probability terms about the potential discrepancies between the HI data and its replications. This is achieved through the construction of specific auxiliary test statistics T(⋅), which can depend on the data as well as the parameters. The information provided by these diagnostic quantities can be summarized by a 95% credible interval for T rep,i ≡ T(y rep,i | b i , y ), so a narrow-enough 95% credible interval around the observed quantity T obs ≡ T(y i | b i , y ) contributes to providing strong evidence that the hypothetical replications can suitably account for a particular feature of the HI data. Additionally, a Bayesian P-value P B can be calculated. This, in particular, shows the probability that some of the replicated test statistics exceed the quantity T obs , that is, P B = Pr(T rep,i > T obs | b i , y ) + (1∕2) Pr(T rep,i = T obs | b i , y ), for the count rates. Hence, P B should tend to be close to 0.50 if the fitting of the model is appropriate, due to the uncertainty in { y , b i } from T obs . As an example, Table 2 provides the checking results when analyzing how the model approximates two key aspects of our data distribution: the two types of zeros and the extreme values at the right tail. The results provided on the left-hand side of Table 2 summarize the posterior predictive test performed to assess how the replications y rep,i , i = 1, … , 2000 are able to capture the overall percentage of zeros and the percentage of subjects for which all measurements are zero count rates. In general, the assessment of the number of zeros yields reasonable posterior intervals and Bayesian P-values. On the other hand, as displayed on the right-hand side of Table 2, the predictive checks for the existence of measurements above 20 (the maximum observed value) lead to unexpected near-one p-values. However, although almost all the replications contain some count rate above 20, so this itself is not a problematic result for our model validation since the percentage of such measurements is negligible. Note the agreement between P-values when assessing the percentage of count rates above 20 and those P-values derived from the percentage of subjects with at least one extreme count; it may be concluded that over-20 measurements appear exactly once within each of these subjects. As another numerical check, Table 3 compares the overall distribution of the observed data to the distribution which results from averaging the 2000 replicated datasets from the posterior predictive density.
Alternatively, the comparison between the characteristics of the reference data and its replications can be graphically expressed by plotting some of the checks undertaken. In this regard, there is a broad range of checking plots which are proven to be useful. 32,38 For instance, Figure 5 assesses whether the model is adequate to capture the health care usage by accounting for age and the mortality risk. With this purpose, each subject's measurement is first allocated to one of three possible age categories, namely [65, 75), [75,95), and [95, 102.5] years, and is then assigned to one of two possible subgroups distinguishing between alive and deceased subjects. The observed amount of emergency claims is recorded within each of the six possible groups, and observations are compared to the 2000 simulated distributions of annual emergency claims by age and event status. The position of T obs with respect to the histogram generated from the replicated datasets allows us to illustrate how precisely our model fits the annual emergency demand when stratifying by the aforementioned groups. The more centered the observed test statistic is with respect to its corresponding histogram, the better the model captures the distribution of the yearly count rates in that category.

Residual diagnostics
Beyond posterior predictive methodology for model diagnostics, a qualitative analysis of residuals arises as a primary step to validate the model, especially residual plots. These can in fact be considered a classical type of checking plots within the wide group of graphical posterior predictive tools. Since there are two simultaneous processes considered in our model, the analysis of the subject-specific (conditional) residuals must be separately performed for both approaches. 5

Residuals of the longitudinal submodel
For the longitudinal part, the analysis of the residuals is focused on a count-response model, so normality and homoscedasticity in the residuals is generally not observed. Instead, the corresponding scatterplot of the residuals vs the fitted values is expected to exhibit a nonhomogeneous configuration, with the data being concentrated around as many nearly parallel curves as there are possible integer responses in the observed data y i . An appropriate assessment of longitudinal residuals cannot therefore be directly conducted; a different procedure is necessary to achieve continuous residuals which allow for interpretable results. Going through this idea, Dunn and Smyth 39 define the so-called randomized quantile residuals, which enable the obtainment of continuous residuals in the case of non-Gaussian responses. To obtain this type of residuals, let us first consider the theoretical cumulative distribution of the current data, F{y i (t) | b i , y }, which in turn can be viewed as a continuous random variable distributed as U(0,1). The subject-specific quantile residuals are therefore defined by where Φ −1 refers to the inverse cumulative distribution function of the standard normal model, and r q, if y and b i are consistently estimated. We next introduce a randomization process to derive continuously distributed The above simulation-based strategy now allows us to obtain continuously distributed residuals for each response observation, and thus perform the usual residuals assessment based on the Q-Q plot. Figure S12 in Appendix E (left panel) displays the residuals assessment for our ZINBP longitudinal outcomes, where most of the plotted points tend to be concentrated around the 45-degree reference line, that is, the uniform cumulative distribution function. There are, however, two small groups of data for which the corresponding empirical quantiles are clearly larger than the theoretical ones. Specifically, the arrangement of points at the lower-left corner is probably due to the lack of left tail in the distributions of both our original and replicated datasets when comparing them to the Gaussian ones, whereas the group of points at the upper-right corner suggests a heavy right tail in the aforementioned distributions as a consequence of some extreme values.

Residuals of the relative risk submodel
The residuals diagnostic for the survival part is assessed here through the Cox-Snell residuals: 40 The basic idea behind this type of residuals comes from the principle that if the assumed model fits the data accurately, then and consequently, We can therefore assess the fitting of the survival part by examining if the Cox-Snell residuals approximately follow an exponential distribution of mean one. To properly account for left truncation and right censoring in the residuals' values, we compare the  exp (t) = exp(−t) curve with the Kaplan-Meier (KM) estimates of the survival function associated with the set of residuals r cs,i {t| i (t), t }. In our case, such residuals are obtained from the posterior mean estimates : We can use the Bayesian computational tools to obtain a large number of replicates for the Cox-Snell residuals, thus including their corresponding KM survival curves in the final comparison. The results are shown on the right panel of Figure S12 in Appendix E, where the survival curve estimates are approximated through  KM (t) ≈ a exp(bt) + c, with a, b, c ∈ R.

DISCUSSION
Seeking a better understanding of mortality risk evolution among elderly beneficiaries of private care, we have simultaneously tackled different statistical challenges which may arise within the joint modeling framework. Specifically, our article sets itself apart from previous works by combining different aspects into a single estimation procedure. We have proposed a Bayesian joint model where the relationship between the longitudinal and time-to-event processes is flexibly modeled over time through cubic P-splines, while also accounting for the special features of each process. Assessing the longitudinal response involves tracking data with count rates that are affected by both an abundance of zeros and an inherent heterogeneity among subject-specific measurements. The hierarchical ZINBP regression model has been found to offer an especially optimal level of generality. In the survival process, the usual right censoring is accounted for, along with left truncation. This situation typically arises when individuals are subject to a determined age threshold for study entry, and the subsequent delayed entry bias can be properly addressed by translating each subject's starting point to the corresponding age above the aforementioned threshold. Moreover, an accurate estimation of the baseline hazard function is also proposed using a P-spline-based approach. However, based on the researcher's previous knowledge of the data, this assumption may be eventually simplified to use an adequate survival model which has built-in smoothness.
A number of health care databases include survey information that is ideal for assessment via a joint model which combines a time-varying association with the aforementioned features. The proposed JMTV-ZINBP is exemplified using the relationship between the yearly counts for emergency health care usage and the mortality risk for a relatively large sample of policyholders over 65. Although it is broadly accepted that senior policyholders' morbidity increases with age, precisely determining the impact of health deterioration (measured here by an increase in claims, reflecting a rise in emergency care needs) on mortality risk projections at a specific age over 65 is not something that can be done intuitively. In contrast with the stationary hazard ratio for a one-claim increase provided by a traditional joint model approach, our time-varying joint model quantifies the dynamic nature of the true relationship between emergency demand and risk. The association shape obtained from our empirical data shows it to depart from a stationary relationship, exhibiting significant variation across the ages from 65 to 100. Concretely, the emergency care requirement turns out to be much more defining in the mortality risk for subjects under 75 years old, and not so much for ages around 85, where the aging process itself entails the need for greater medical attention. At ages above 95, the relationship between emergency care requirements and mortality risk is again higher than the constant parameter association obtained under a traditional joint model approach. This means that, as in the 65 to 75 age category, the emergency claims at 95 to 102.5 are more meaningful in terms of risk, albeit for different reasons. In this latter age group, human biology itself establishes immediate limits on the remaining life expectancy, so requiring emergency care services becomes very critical indeed in this age range. To conclude, our joint model allows for the estimation of the specific age-related hazard ratio for any age above 65. This provides a genuine output in private health care, supplying professionals with an initial measure for determining who is likely to require more emergency attention and thus allowing for the optimal assignment of policies over time.
A numerical study has been performed to test both the consistency of our general model and its advantageous features when compared to simpler joint model alternatives. In addition, a Bayesian model assessment has been performed by comparing the predictive distribution to the observed data, along with using residual plots related to both longitudinal and survival submodels. This makes headway in applying the proposed joint model in a wide range of potential studies focused on assessing the time-changing effect of a zero-inflated discrete outcome on the hazard process. Nonetheless, despite the unified modeling framework provided, further research remains on the agenda. Firstly, it would be very interesting to be able to estimate how many policyholders do not reach the age of 65 in our study due to major health problems. These missing subjects might ultimately lead to a bias in the estimates for both the longitudinal and time-to-event parameters, 41 inasmuch as healthier individuals from our target population are more likely to be observed. Although we expect that the number of policyholders who die before 65 is not very high, we do not currently have this information available, so this issue has not been addressed. Secondly, it would be extremely useful to include multiple longitudinal outcomes in the joint model, [42][43][44] in order to allow for discerning between the three analyzed reasons for emergency medical attention. Thus, insurers could then identify the types of emergency claims which more decisively contribute to increases in the hazard risk. Delving deeper into this idea, an individual could then be assigned multiple health behaviors over time based on the amount and type of claims experienced, therefore extending survival techniques to model multistate event history data. More concretely, the observation times between sequentially observed claims could be efficiently modeled using the hazard risk information, through the methodology applied in Richards. 45 A third important extension to this work would be rooted on the idea that our data most likely masks a certain number of unknown subpopulations or latent classes; 46 we could then focus on identifying how many individuals belong to each class in order to optimally fit the data. Implementing the second and third points of improvement would not only entail methodological changes, but would also considerably increase the processing time for the joint model, and these have therefore remained beyond the scope of the present study. In connection with this, several authors have pointed out that the computation time may become the main constraint when applying Bayesian joint models. 47 For relatively large datasets such as ours, a key point of enhancement would be exploring alternative Bayesian procedures due to the associated computational complexity. For instance, in some fields of investigation, one might be interested in analyzing the main longitudinal response over periods of time shorter than a year, using monthly, weekly, or even daily count rates. From a conceptual point of view, there is no methodological constraint that would prevent the use of these interval lengths. In our case, however, this would entail splitting each of the current longitudinal responses into as many measurements as required for the time frame chosen. In practice, this could lead to unfeasible computational times.