Setting priorities in clinical and health services research: Properties of an adapted and updated method

Objectives: The objectives of this study were to review the set of criteria of the Institute of Medicine (IOM) for priority-setting in research, adding new criteria if necessary, and to develop and evaluate the reliability and validity of the final priority score. Methods: Based on the evaluation of 199 research topics, forty-five experts identified additional criteria for priority-setting, rated their relevance, and ranked and weighted them in a three-round modified Delphi technique. A final priority score was developed and evaluated. Internal consistency, test–retest and inter-rater reliability were assessed. Correlation with experts’ overall qualitative topic ratings was assessed as an approximation to validity. Results: All seven original IOM criteria were considered relevant and two new criteria were added (“potential for translation into practice”, and “need for knowledge”). Final ranks and relative weights differed from those of the original IOM criteria: “research impact on health outcomes” was considered the most important criterion (4.23), as opposed to “burden of disease” (3.92). Cronbach's alpha (0.75) and test–retest stability (interclass correlation coefficient = 0.66) for the final set of criteria were acceptable. The area under the receiver operating characteristic curve for overall assessment of priority was 0.66. Conclusions: A reliable instrument for prioritizing topics in clinical and health services research has been developed. Further evaluation of its validity and impact on selecting research topics is required.



Keywords: Health services research, Priority-setting, Research assessment
Methods that identify research gaps in health needs and that help to set priorities for research topics can be helpful in decision making on funding for healthcare research. Prioritization is usually based on the feasibility of carrying out the research, which in turn depends on the methodological strengths of the project and the abilities and experience of the research team. The relevance of the research topic is presumably always implicitly present in decision making on funding for research, but it is rarely taken into account in a transparent and explicit manner. Some countries (12;13;15;21) and health technology assessment programs (1;27;31) have, nevertheless, developed systematic prioritization methods to optimize their investment.
Among them, the first set of criteria was developed by the United States Institute of Medicine (IOM), which recommended seven general criteria for setting priorities for the assessment of healthcare technologies. These criteria included prevalence, burden of illness, cost of managing the problem, variability in practice, potential to change health outcomes, potential to change costs, and potential to inform ethical, legal, or social issues (13). Several years later the IOM Committee on Reviewing Evidence to Identify Highly Effective Clinical Services was set up to recommend an organizational framework for assessing evidence on clinical effectiveness so that consumers, clinicians, professional specialty societies, payers, purchasers, and other decision makers have independent, valid information for making healthcare decisions (14). In a review of methods to prioritize topics, the Committee concluded that no single priority-setting method was obviously superior to others, and it could not find any systematic assessments of the comparative strengths and weaknesses of different approaches to priority setting.
Recently, the IOM Committee on Comparative Effectiveness Research Prioritization was charged with recommending national priorities for comparative effectiveness research. The above-mentioned and other previous IOM reports provided the initial basis for the methodology and criteria used (21). The Committee recognized two levels of criteria to prioritize comparative effectiveness research topics: (i) condition-level criteria (prevalence, mortality, morbidity, cost, and variability), and (ii) priority topic-level criteria (utility for decision making, risk associated with care, information gaps and duplication, and gaps in translation).
In the IOM and other research priority setting methods, burden of disease, variability in clinical practice, and the cost-effectiveness of healthcare interventions are among the most common criteria used to evaluate the relevance of research topics (18;24;25;28). Other aspects that are usually considered but which are more difficult to quantify include the existence of social inequalities in health, translation of new knowledge into clinical practice, and social returns from the research investment (8;19). Although it is agreed that explicit criteria should be used to prioritize funding, no standard method is currently used (20). Consensus strategies (2;22;26), qualitative methods (5-7;33), and group (23) or consultation techniques with key stakeholders (10;17), as well as economic evaluations of the usefulness of the information generated by the research (9), have all been used (16).
Catalonia, an Autonomous Community in northeastern Spain, has held a biennial call for clinical and health services research since 1996. The purpose of the call is to finance the generation and synthesis of evidence aimed at improving the quality and outcomes of healthcare by providing more and better information to support decision-making. The call is primarily financed by the Catalan Health Service, and commissioned by the Catalan Agency for Health Technology Assessment and Research (CAHTA). To identify research needs, a "call-for-research-topics" is publicly announced to healthcare stakeholders. The topics received are then prioritized by a group of experts using the relevance criteria established by the IOM Committee on Priorities for Assessment and Reassessment of Health Care Technologies (13). Criteria weights are based on an adapted version of the method used by the Service of Health Technology Assessment of the Basque Country (OSTEBA) (1). Prioritized research topics are then included in the final call for projects. Project proposals are evaluated in terms of methodology and the research team's experience using a standard peer-review system (Figure 1).
After a decade of experience, the CAHTA decided to review the "call-for-research-topics" process and evaluate its own formal prioritizing instrument. The objectives of this study were to (i) identify new criteria for priority-setting in the call for clinical and health services research topics; (ii) develop a composite priority score; and (iii) evaluate the reliability and validity of the final priority score. After being adapted, the new method should be more appropriate to assess topics in a broader area, such as clinical and health services research, than the original one designed for health technology assessment.

Design
The study was carried out within the CAHTA's 2006 call for research topics, which is described in detail elsewhere (29). A three-round expert consensus technique was added to the usual procedure to identify relevant priority-setting criteria and their relative weights. The metric properties of the instrument (final priority score) were evaluated by experts, who reviewed the topic proposals using a peer-review method, with a subset reviewed twice by the same reviewer (retest, in the third round) 3 months later.

Consensus on Priority-Setting Criteria
A modified Delphi technique was used to develop consensus regarding the criteria to be used in priority-setting of research topics and their relative importance. Participants were CAHTA's collaborators from healthcare services and universities, and CAHTA staff. The first consultation round included the original seven IOM criteria together with three additional criteria identified through a review of the literature and the minutes of previous CAHTA Scientific Committee meetings. Experts were also asked whether they thought further criteria should be included. The second round included the previous criteria plus those proposed in the first round.
In this round, experts were requested to evaluate ("totally in agreement," "in agreement," "in disagreement," "totally in disagreement") the need to include each of the criteria in the prioritization system. As recommended (30), an 80 percent consensus threshold was used to determine whether a criterion should be included or excluded in the new instrument. Experts were also invited to identify the criterion they considered the most important (rank = 1) and the criterion they considered the least important (rank = 11). The sum, means, and standard deviations of importance ranks were calculated.
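The inclusion rule described above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: it assumes that both "totally in agreement" and "in agreement" count as agreement and that the 80 percent threshold is inclusive, neither of which is specified in the text. The function names and example responses are hypothetical.

```python
# Assumption: both agreement categories count toward consensus on inclusion.
AGREE = {"totally in agreement", "in agreement"}

def agreement_share(responses):
    """Fraction of experts who agreed that a criterion should be included."""
    if not responses:
        raise ValueError("no responses to evaluate")
    return sum(r in AGREE for r in responses) / len(responses)

def meets_consensus(responses, threshold=0.80):
    """True when the agreement share reaches the threshold (assumed inclusive)."""
    return agreement_share(responses) >= threshold

# Hypothetical second-round responses for two criteria (not study data).
criterion_a = ["totally in agreement"] * 5 + ["in agreement"] * 3 + ["in disagreement"] * 2
criterion_b = ["in agreement"] * 7 + ["totally in disagreement"] * 3

included_a = meets_consensus(criterion_a)  # share is 8 of 10
included_b = meets_consensus(criterion_b)  # share is 7 of 10
```

The same function could be applied to the disagreement categories to test for consensus on exclusion.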

Weighting of Criteria
During the third round, experts were asked to give a weight to each criterion, from 1 (minimum) to 5 (maximum), taking the criterion previously considered as the least important as reference (weight = 1). Weights for each criterion were estimated through three statistical methods. The first used the arithmetic mean of the weights given by the experts. The second used the slope from a linear regression between the arithmetic mean of the weight and the rank order value of the criterion's importance. The third was a further linear regression that took into account the variability of the scores given to each criterion. In the latter, the reference criterion was given the same value and a minimal variance of 0.05 (instead of 0) in all cases. The scores resulting from the application of the three methods were compared with those obtained by applying the IOM weights through analysis of correlation (Spearman's coefficient) and concordance (kappa statistic). In these analyses, topics were categorized by score quintiles.
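The first two weighting methods can be illustrated as follows. This is a sketch under stated assumptions: the second method is read here as fitting an ordinary least-squares line of mean weight on importance rank and taking the fitted value at each rank as the weight, which is one plausible interpretation of the description. The third (variance-weighted) method is not shown. Criterion names, ratings, and ranks are hypothetical.

```python
from statistics import mean

def mean_weights(ratings):
    """Method 1: arithmetic mean of the 1-5 weights given by the experts."""
    return {criterion: mean(values) for criterion, values in ratings.items()}

def regression_weights(mean_w, rank):
    """Method 2 (one reading): regress mean weight on importance rank by
    ordinary least squares and use the fitted value as the weight."""
    names = list(mean_w)
    xs = [rank[c] for c in names]
    ys = [mean_w[c] for c in names]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return {c: intercept + slope * rank[c] for c in names}

# Hypothetical expert weights (not study data); "ethics" is the reference
# criterion, which every expert scores as 1.
ratings = {
    "outcomes": [5, 4, 5],
    "burden": [4, 4, 5],
    "cost": [3, 4, 3],
    "ethics": [1, 1, 1],
}
rank = {"outcomes": 1, "burden": 2, "cost": 3, "ethics": 4}  # 1 = most important

w_mean = mean_weights(ratings)
w_reg = regression_weights(w_mean, rank)
```

In this toy example the fitted weight of the reference criterion rises above 1, so the distance between it and the other criteria shrinks relative to the plain means.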

Assessment of the Final Priority Score
The same experts evaluated the topics proposed in response to the 2006 CAHTA research call, applying the final consensus set of criteria (each one with a Likert-type scale choice: 1, the lowest value, to 5, the highest value). The final priority score was calculated using the weights obtained through the three different methods described earlier, as the sum of the products of the score given by the reviewer and the weight of the criterion. Reliability and validity were evaluated for the three weighting systems.
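The score calculation described above is a weighted sum, which can be sketched as follows. The weights and ratings shown are hypothetical placeholders, not the values reported in Table 2, and the function name is an assumption made for illustration.

```python
def priority_score(ratings, weights):
    """Sum over criteria of (criterion weight x reviewer's 1-5 rating)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[c] * ratings[c] for c in weights)

# Hypothetical weights and one reviewer's ratings, for illustration only.
weights = {"outcomes": 2.0, "burden": 1.5, "ethics": 1.0}
one_review = {"outcomes": 4, "burden": 3, "ethics": 2}

score = priority_score(one_review, weights)  # 2.0*4 + 1.5*3 + 1.0*2
```

Because each topic was assessed by two experts, a topic-level score would need a further aggregation step (for example, averaging the two reviews), which is not specified here.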
The reliability of the priority score (30) was assessed by examining internal consistency (Cronbach's α), test-retest stability (Pearson's correlation coefficient; CC), and inter-rater agreement assuming random effects (intraclass correlation coefficient; ICC), since two experts evaluated each proposal, as usual. A value of >0.70 was considered satisfactory for all of the reliability indicators. The number of topics to re-evaluate for test-retest reliability was set at 146 assuming a 10 percent standard deviation over the score and a detectable difference of ≥3 points with an alpha risk of 5 percent and a beta risk of 10 percent in a bilateral contrast, and 20 percent loss to follow-up. Test-retest reliability was analyzed using scores for the eight criteria that had two assessments from the same expert.
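For readers unfamiliar with the internal consistency measure, Cronbach's α can be computed from the standard formula α = k/(k−1) × (1 − Σ item variances / variance of the total score), where k is the number of criteria. The sketch below applies that formula to unweighted criterion scores; whether the study used weighted or unweighted scores is not stated, and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of assessments.

    rows: one list of criterion scores per assessment, all of equal length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two criteria and two assessments")
    # Sample variance of each criterion across assessments.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed score across assessments.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 ratings on three criteria for five assessments.
assessments = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(assessments)
```

The result would then be compared with the >0.70 threshold used in the study.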
Construct validity of the new instrument was assessed by correlating the final score with the global assessment of each proposal's priority level, which experts had been asked to evaluate using a four-category scale ("very low," "low," "high," "very high"). The area under the receiver operating characteristic (ROC) curve resulting from the prediction of the "very high" category was also estimated.
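The area under the ROC curve has a direct interpretation that can be computed without plotting the curve: it equals the probability that a randomly chosen "very high" topic receives a higher priority score than a randomly chosen topic from the other categories, with ties counted as one half. The sketch below uses this pairwise formulation; it is not necessarily how the study's software computed the area, and the scores and labels are hypothetical.

```python
def roc_auc(scores, is_very_high):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic
    divided by the number of positive-negative pairs)."""
    pos = [s for s, y in zip(scores, is_very_high) if y]
    neg = [s for s, y in zip(scores, is_very_high) if not y]
    if not pos or not neg:
        raise ValueError("need at least one topic in each group")
    # A win counts 1, a tie counts 0.5, a loss counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical priority scores and global "very high" ratings (not study data).
scores = [14.5, 12.0, 16.0, 9.5, 12.0]
very_high = [True, False, True, False, True]

auc = roc_auc(scores, very_high)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of "very high" topics from the rest.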

Description of the experts participating
A total of fifty-one experts were invited, forty-eight of whom participated in the first round and thirty-three in the third round. Between 72 and 75 percent of the experts (depending on the round) were physicians and the remainder belonged to seven other health-related disciplines. Between 40 and 45 percent were CAHTA staff and the remainder were external collaborators. The average number of topics evaluated by each reviewer was 12.2 for CAHTA staff and 6.8 for the external collaborators.

Priority-Setting Criteria
During the first round of the consensus process, a new criterion was proposed for inclusion concerning "the need for knowledge on the topic." In the second round, consensus was reached (over 80 percent agreement) on the inclusion or exclusion of eight of the criteria (Table 1), two of which were newly added criteria ("potential for translation into practice" and "need for knowledge"). One of the criteria ("possibility of clarifying ethical, legal, or social aspects") achieved a high, although formally insufficient level of agreement (77 percent), but was eventually included because it is also used in the IOM system. It was, however, considered to be the least important criterion. Criteria regarding the "potential to change health outcomes" and "burden and importance of disease" most frequently achieved the highest ranking (mean = 2.40 and 2.89, respectively). The latter criterion also had one of the lowest levels of dispersion (30 percent). The two criteria that did not reach a minimum consensus for inclusion ("financial opportunity" and "political interest") always ranked below fifth place in terms of importance (Table 1, see minimum values), and also had a low variability in score. Thus, nine criteria were included in the priority-setting instrument.

Weight of Priority-Setting Criteria
The calculation of the weights based on mean scores resulted in a gap between the criterion with a weight set at 1 and the remaining criteria (Figure 2). Using weights derived from the regression analyses reduced this gap. The weights obtained by predicting the weight with a linear model that included the order of importance were the most harmonious. The three weighting methods correlated strongly with the scores obtained using IOM weights (Spearman's coefficient >0.90). The kappa statistic for the concordance analysis using score quintiles was approximately 0.61. Table 2 presents the weights for all of the criteria included, together with those used in the IOM and OSTEBA priority-setting systems. The range of values obtained is greater than that obtained in the other two methods, with "potential to change health outcomes" being assigned a weight which was more than twice that of the criterion considered least important. The order of the criteria was also different; the criterion with the highest weight in the IOM method ("burden of illness") was here ranked in second place, and the criterion with the highest weight in the OSTEBA method ("variations in use") ranked sixth in the CAHTA system.

Final Priority Score
In the first appraisal of topic proposals, 398 assessments of 199 topics were obtained from forty-eight reviewers. In the second appraisal (third Delphi round), 113 assessments of 68 topics (74.3 percent of the 152 expected) were obtained from thirty-five reviewers. A total of forty-six topics were assessed twice by the same reviewer (test-retest). The final set of nine criteria showed a sufficiently high internal consistency for measuring priority (Cronbach's α = 0.75) and omitting criteria did not significantly affect the alpha value. The final priority score was slightly lower (−0.40) in the second appraisal (p = .297), with a test-retest correlation (CC) of 0.66. The inter-rater agreement between evaluators was high (ICC = 0.79). The correlation between the final score for each topic and the level of priority assigned overall by experts (construct validity) was moderate (CC = 0.36). The area under the ROC curve, which was used as a measure of the accuracy of predicting a "very high priority proposal," was 0.66.

DISCUSSION
The use of sound and explicit criteria to prioritize between research topics is important for any research funding agency.
In this study, we identified and weighted a set of relevant criteria for prioritizing clinical and health services research topics. The priority system proposed is initially based on the method used by the IOM for setting priorities for the assessment of healthcare technologies. Two new criteria were added and different weights were reached through a consensus process which included a variety of experts from different health science disciplines. The final priority score generated by the instrument demonstrated satisfactory reliability and preliminary evidence of validity. The resulting priority system retains the condition-level criteria (prevalence, burden of illness, cost, and variability) that are present in almost any method for prioritizing either synthesis or generation of evidence and, in addition to maintaining priority topic-level criteria related to potential impact, adds criteria to the latter level (need for knowledge, and translation of knowledge into clinical practice) that have also been recommended, in similar form, for prioritizing effectiveness research (21). Thus, changes made in the method help to extend its usefulness from the health technology assessment field to broader areas of clinical and health services research, which can make a valuable contribution to setting research-led and evidence-based healthcare policy, as already recognized in several countries (3;8;11;21;22). The two new criteria (need for scientific knowledge on a topic, and potential for translation of new knowledge into practice) may make the system more relevant to emerging problems and new situations, as well as helping to reduce gaps in knowledge. They reinforce the recommendation of taking into account the current state of evidence when designing new research (4), together with answering questions relevant to clinical and health policy decision makers (32).
This, in turn, can potentially accelerate the impact of scientific advances by getting research into practice through evidence-based decision making (11). Likewise, the criterion related to the potential impact of research on health outcomes was assigned the greatest weight in the present study, compared with burden of disease (13) or the variability in medical practice (1) in the other studies. Although this difference could in part be due to differences between the studies, such as the availability of current and reliable data on the topics assessed (as recommended by the IOM) or the characteristics of the experts involved, these results may also reflect society's changing priorities, particularly given the increasing speed with which knowledge is produced and technological innovations occur, and their relevance in current biomedical research (34). This method of quantitative evaluation showed acceptable reliability. The good internal consistency and the preliminary predictive validity observed here indicate a coherent measurement system with a good capacity to discriminate between proposals on the basis of their priority levels. A weakness concerning the reliability analysis was that the retest measure was not available for one of the criteria, because it was suggested in the first Delphi round and included only in the second-round questionnaire. We were not able to extend the work with more rounds, but we believe that similar results would be obtained in a new concordance analysis. The agreement reached from two peer-review evaluations is moderate. Results might improve if criteria scoring were categorized using exact values (e.g., for the prevalence criterion the score of "1" may mean less than 5 percent, the score of "2" up to 15 percent, etc.) as in The Netherlands (27) and partially in the United States (13), instead of scoring through a qualitative Likert scale (e.g., where the expert decides whether 15 percent of prevalence should be assigned a score of "2" or a different score).
The concept of priority measured by the instrument was derived from the opinion of policy makers and experts incorporated in the IOM recommendations (13), as well as from Catalan healthcare stakeholders through a consensus technique. The score obtained with the tool correlated moderately with the global assessment of priority that reviewers gave to each topic, providing preliminary evidence of validity. Although experts participating in the consensus came from a wide range of health science disciplines, a limitation of this method is that patients were not represented, in contrast with some other priority setting initiatives (5;33). The validity of the priority score could be further evaluated through investigation of the long-term benefits of the prioritized research or by a controlled experiment in which a subset of topics is prioritized by appraisers with and without the prioritization tool. It would also be interesting to investigate whether and how use of the system actually impacts on health needs, on the effectiveness, safety, and quality of clinical practice, or even on the efficiency and equity of the health system.

POLICY IMPLICATIONS
This work contributes to the increasing debate about strategies for setting research priorities. Despite the limitations mentioned above, the CAHTA method has several advantages, including the use of updated criteria and weights and the fact that it is a relatively agile, low-cost, participatory process that allows for priority-setting over a much wider range of topics. Its reliability was acceptable, and there is preliminary evidence of its validity. The results will be applied in future prioritization procedures in Catalonia in the hope of improving the relevance and impact of the topics which are funded. Although this research priority instrument was developed in a regional context, it might well be applicable to other countries.