Clinical Endocrinology News
Breaking News
Go for the Silver? Using Simulated Randomized Trials
By Mark S. Lesney
July 28, 2010

Although randomized clinical trials remain the “gold” standard for medical knowledge, they share with the precious metal limitations of cost and rarity. With the dramatic push for outcomes-driven medicine and initiatives based on quality of care, there may not be enough “gold” to go around. Is it time for a “silver” standard to supplement the system?

Simulated randomized trials might be that silver standard, said Dr. Eugene H. Blackstone of the Cleveland Clinic Foundation in an interview. He and other researchers are determined to craft an alternative set of trials to supplement randomized clinical trials.

Some randomized trials are too expensive, too lengthy, or too short of potential patient volunteers ever to be accomplished, and simulated trials could fill the resulting gaps in medical knowledge. The first paper dealing with the subject, according to Dr. Blackstone, was a seminal article in the early 1980s by Dr. Paul Rosenbaum of the University of Wisconsin, Madison, and Dr. Donald Rubin at the University of Chicago (Biometrika 1983;70:41-55). However, it was not until the 1990s that interest in using an expanded set of statistical tools for comparisons between nonrandomized patients really took off.

The concept of simulated clinical trials is to utilize powerful statistical methods to analyze clinical databases and completed trials for unmined riches. A propensity score is used to sort and segregate patients in a clinical or trial database into analytical silos, where they can be functionally treated as a series of smaller, randomized trial populations or as matched cases.

How does propensity scoring work? Dr. Blackstone explained it this way: When sorting apples and oranges, the propensity score of each piece of fruit provides an estimate of its propensity toward (probability of) belonging to one group (apples) vs. another (oranges) (J. Thorac. Cardiovasc. Surg. 2002;123:8-15).

This is first done by propensity modeling: picking a set of variables that incorporates “everything recorded that may relate to either systematic bias or simply bad luck.” An example is a prospective, nonrandomized observational cohort study of the benefits of aspirin use in patients with coronary artery disease performed by Dr. Patricia A. Gum, Dr. Blackstone, and their colleagues at the Cleveland Clinic Foundation. In this study, the aspirin-treated patients were older, more likely to be men, and more likely to have a history of hypertension and other comorbidities than were the non–aspirin users. The aspirin users also took more ancillary drugs, such as beta-blockers and angiotensin-converting enzyme inhibitors. All these differences probably resulted from physician bias as to which patients should be treated with aspirin for particular conditions (JAMA 2001; 286:1187-94).

One question for which a randomized clinical trial would be optimal, but is not available, is “which patients with known or suspected coronary artery disease have lower all-cause mortality with aspirin treatment?”

Simple univariate analysis showed no association between aspirin use and mortality. However, adjustment for a wide variety of demographic and clinical variables showed that aspirin use was significantly associated with reduced mortality (P = .002), with a hazard ratio (HR) of .67. Further analysis of the characteristics of patients who most benefited from aspirin could not be conducted in a significant fashion.

Because aspirin use was not randomly assigned in this patient population, the research team knew they had to account for potential confounding factors and selection bias that made the patients appear to be apples and oranges, not easily compared.

Their solution was to create a propensity score for the population, which modeled the likelihood of any particular patient being treated with aspirin. This was done by selecting variables that appeared to be associated with aspirin treatment: age, sex, clinical history, medication use, cardiovascular assessment, and exercise capacity. Once the final model was created, it contained 34 covariates.

Each patient was “run through” the model to determine his or her propensity score (i.e., the likelihood, based on demographic and clinical variables, that the person would have been treated with aspirin, independent of whether actual treatment had occurred). For example, Patient A, a nonsmoking woman in her early 50s with a fairly good exercise capacity, would have a lower likelihood of getting aspirin treatment. Patient O, a man over age 60 and a smoker with poor exercise capacity, would have a high likelihood of getting aspirin treatment. It doesn’t matter to the method that the cohort data show that both patients actually got aspirin treatment. The important thing for propensity scoring is their statistical likelihood of having received it.
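The scoring step can be sketched in a few lines of Python. This is a minimal illustration that assumes a logistic model, a common choice for propensity modeling that the article does not itself specify. The four covariates, the intercept, and every coefficient below are hypothetical; the published model had 34 covariates, and none of these numbers come from the study.

```python
import math

# Hypothetical intercept and coefficients, chosen only for illustration.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_years": 0.05,
    "male": 0.6,
    "smoker": 0.5,
    "poor_exercise_capacity": 0.7,
}


def propensity_score(patient):
    """Return the modeled probability (0 to 1) of being treated with aspirin."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Stand-ins for Patient A and Patient O; the exact ages are assumptions.
patient_a = {"age_years": 52, "male": 0, "smoker": 0, "poor_exercise_capacity": 0}
patient_o = {"age_years": 64, "male": 1, "smoker": 1, "poor_exercise_capacity": 1}

score_a = propensity_score(patient_a)
score_o = propensity_score(patient_o)
print(f"Patient A: {score_a:.2f}, Patient O: {score_o:.2f}")
```

Note that whether a patient actually received aspirin is not an input to the function. The score depends only on the baseline characteristics, which is why two patients who both took aspirin can still receive very different scores.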

Once a propensity score is obtained, comparisons can be made in any of three ways, according to Dr. Blackstone. First is simple matching. Dr. Blackstone explained the approach in the Journal of Thoracic and Cardiovascular Surgery: “A patient is selected from the control group whose propensity score is nearest to that of a patient in the case group. If multiple patients are close in propensity scores, optimal selection among these candidates can be used. Remarkably, problems of matching on multiple variables disappear by compressing ‘everything known about the patient’ into a single score!” (2002;123:8-15).
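As a rough sketch of matching, the Python below pairs each aspirin user with the unused control whose score is nearest. It uses a simple greedy rule, not the optimal selection mentioned in the quotation, and the `caliper` cutoff (the largest score gap accepted for a pair) is an assumption added for illustration. The patient identifiers and scores are invented.

```python
def match_nearest(cases, controls, caliper=0.05):
    """Greedy 1:1 matching on propensity score.

    cases, controls: dicts mapping patient id -> propensity score.
    Returns a list of (case_id, control_id) pairs. Each control is used at
    most once; a case with no control within the caliper stays unmatched.
    """
    available = dict(controls)
    pairs = []
    for case_id, case_score in sorted(cases.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        best_id = min(available, key=lambda cid: abs(available[cid] - case_score))
        if abs(available[best_id] - case_score) <= caliper:
            pairs.append((case_id, best_id))
            del available[best_id]
    return pairs


# Invented scores for three aspirin users and four nonusers.
aspirin_users = {"t1": 0.21, "t2": 0.55, "t3": 0.90}
non_users = {"c1": 0.20, "c2": 0.24, "c3": 0.57, "c4": 0.10}

pairs = match_nearest(aspirin_users, non_users)
print(pairs)
```

In this toy data the third aspirin user has no nonuser with a comparable score and is dropped, which illustrates a cost of matching: patients without a close counterpart do not contribute to the comparison.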

The second way comparisons can be made is stratification, or subclassification into roughly equal-sized groups, according to Dr. Blackstone. In the aspirin example, stratified patients could be divided into five subgroups, or quintiles, based on their calculated propensity scores. In the given scenario, patient A would probably be assigned to the first quintile and patient O to the last. Within each group there would be an important division: patients who actually received aspirin treatment vs. those who did not. But within each quintile, other patient characteristics – from sex to comorbidities to medications – would be the same because of the matched propensity scores, in effect making each quintile a mock randomized trial.
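Stratification can be sketched the same way. The Python below ranks patients by propensity score, cuts the ranking into five equal-sized groups, and then counts aspirin users and nonusers inside each group. The function names and the ten-patient data set are hypothetical and serve only to show the mechanics.

```python
def assign_quintiles(scores):
    """Map each patient id to a quintile (1 to 5) by rank of propensity score."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {pid: (rank * 5) // n + 1 for rank, pid in enumerate(ranked)}


def tabulate(quintiles, treated):
    """Count treated and untreated patients within each quintile."""
    table = {q: {"aspirin": 0, "no_aspirin": 0} for q in range(1, 6)}
    for pid, q in quintiles.items():
        key = "aspirin" if treated[pid] else "no_aspirin"
        table[q][key] += 1
    return table


# Invented data: ten patients with scores 0.0 to 0.9; the six with the
# highest scores are the ones who actually received aspirin.
scores = {f"p{i}": i / 10 for i in range(10)}
treated = {f"p{i}": i >= 4 for i in range(10)}

quintiles = assign_quintiles(scores)
table = tabulate(quintiles, treated)
for q in range(1, 6):
    print(q, table[q])
```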

This method of comparison has one obvious problem, said Dr. Blackstone. Since the propensity score was derived from the likelihood of receiving aspirin, there will obviously be more patients in the higher quintiles who received aspirin than in the lower quintiles. Balance in patient characteristics was obtained at the expense of balance in the number of patients receiving each treatment in each quintile. In the study by the Cleveland Clinic group, there were 113 patients who received aspirin in the first quintile vs. 1,092 who did not, compared with 1,045 patients in the fifth quintile who received aspirin vs. 261 who did not.
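The size of that imbalance is easy to work out from the counts just quoted. The short Python calculation below uses only those four numbers; the dictionary layout and function name are illustrative.

```python
# Counts reported above for the first and fifth quintiles.
counts = {
    1: {"aspirin": 113, "no_aspirin": 1092},
    5: {"aspirin": 1045, "no_aspirin": 261},
}


def share_treated(cell):
    """Fraction of a quintile's patients who received aspirin."""
    return cell["aspirin"] / (cell["aspirin"] + cell["no_aspirin"])


shares = {q: round(share_treated(cell), 3) for q, cell in counts.items()}
print(shares)
```

Roughly 9% of patients in the first quintile received aspirin, compared with about 80% in the fifth.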

According to Dr. Blackstone, this means that although an analysis can treat the patients in each quintile as if they were a randomized population according to their lack of statistically significant differences in pertinent clinical or demographic characteristics, the populations have to be considered unbalanced by size (as if a clinical trial were originally crafted for a randomization of 2:1 of treatment A to treatment B). It is within each of these quintiles that comparison of outcome (in this case, all-cause mortality) for the chosen variable (aspirin use) is made.
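One way to picture the within-quintile comparison is the sketch below, which compares death rates between treated and untreated patients inside each stratum and then pools the differences, weighting each stratum by its size. This is a simplification: the study reported hazard ratios, whereas this example uses a plain difference in risk, and the counts are made up rather than taken from the study.

```python
def stratified_risk_difference(strata):
    """Pool within-stratum differences in death rate, weighted by stratum size.

    Returns treated-minus-untreated risk; a negative value favors treatment.
    """
    total = sum(s["n_treated"] + s["n_untreated"] for s in strata)
    pooled = 0.0
    for s in strata:
        risk_treated = s["deaths_treated"] / s["n_treated"]
        risk_untreated = s["deaths_untreated"] / s["n_untreated"]
        weight = (s["n_treated"] + s["n_untreated"]) / total
        pooled += weight * (risk_treated - risk_untreated)
    return pooled


# Made-up counts for two strata of unequal treatment mix; not study data.
strata = [
    {"n_treated": 100, "deaths_treated": 4,
     "n_untreated": 900, "deaths_untreated": 54},
    {"n_treated": 800, "deaths_treated": 48,
     "n_untreated": 200, "deaths_untreated": 20},
]

pooled = stratified_risk_difference(strata)
print(f"Pooled risk difference: {pooled:.3f}")
```

Each stratum contributes its own comparison even though the arms differ greatly in size, which is the sense in which a quintile behaves like an unevenly randomized trial.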

With such analysis in the original paper, Dr. Gum, Dr. Blackstone, and their colleagues determined that aspirin use was significantly associated with a lower risk of death (HR .56; P < .001).

And the third way propensity scores can be used is in a multivariable analysis of outcomes. “Such an analysis includes both the comparison variable of interest [age, sex, etc.] and the propensity score,” Dr. Blackstone explained. “The propensity score adjusts the apparent influence of the comparison variable of interest for patient selection differences not accounted for by other variables in the analysis.”
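In symbols, one common way to write such a model is shown below. It assumes a proportional hazards form, which fits the hazard ratios reported for the study but is not spelled out in this article. Here the variable of interest is taken to be aspirin use, and the symbols $h_0$, $\beta_A$, $\beta_e$, $\gamma_j$, and $z_j$ are notation introduced only for this illustration.

```latex
% e(x): propensity score, the modeled probability of aspirin treatment
% given baseline characteristics x
e(x) = \Pr(\text{aspirin} = 1 \mid x)

% Hazard of death at time t, adjusted for the propensity score and
% any other covariates z_j kept in the model
h(t \mid x) = h_0(t)\,
  \exp\!\Big( \beta_A \cdot \text{aspirin} + \beta_e \, e(x) + \sum_j \gamma_j z_j \Big)

% Adjusted hazard ratio for aspirin use
\mathrm{HR}_{\text{aspirin}} = \exp(\beta_A)
```

Under this form, a hazard ratio below 1 corresponds to a negative $\beta_A$, meaning lower mortality among aspirin users after adjustment.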

In this way, the Cleveland Clinic team was able to determine that the primary characteristics associated with the greatest aspirin-related reduction in mortality were older age, known coronary artery disease, and impaired exercise capacity – much more clinically relevant information than obtained from the original multivariate analysis.

“How does this differ from what we have always done? Most of the time you find that those two types of analyses [multivariate risk analysis and propensity score analysis] give similar results,” said Dr. Blackstone in the interview. “But 15%-20% of the time they don’t give the same results. We’ve never in the past had a good way to figure out have we been fooled or not fooled [by the results], but now we can do both kinds of analyses just to see if there is consistency.”

Simulated clinical trials based on propensity analysis are not a replacement for randomized trials or traditional hazard analysis, but are an important adjunct for mining of registry and trial data to obtain clinically relevant information that might not otherwise be available, Dr. Blackstone concluded.

He said that he had no conflict of interest related to any of the studies or his comments in the interview. Dr. Gum, Dr. Blackstone, and their colleagues reported no financial conflict of interest in their study.

 

Copyright ©2008 Elsevier/International Medical News Group
Clinical Endocrinology News Network