Radiology has been defined by technologic innovations and advancements focused on disease diagnosis and therapeutic interventions. The conduct and dissemination of research evaluating the utility of imaging modalities and their applications have been, and will continue to be, vital for guiding clinical recommendations and developing practice guidelines. Knowledge of basic statistical concepts helps the practicing radiologist critically evaluate the literature and make informed clinical decisions, the tenets of evidence-based radiology. Similarly, the appropriate use and interpretation of statistical methods are essential for conducting scientifically rigorous studies. However, exposure to research methodology in general, and to statistics in particular, is limited during radiology training. The clinical burden on radiologists, the scarcity of nontechnical resources centered on radiology research, and a lack of willingness to learn greatly impede the attainment and continued development of these skills.
Variables can be classified as either continuous or categoric, which correspond to quantitative and qualitative measures, respectively. Continuous variables are measured on a numeric scale with equal intervals between values, such as height or weight; counts, such as the number of CT examinations performed for a patient during a specific hospital visit, are also quantitative, although strictly speaking they take only whole-number values. In contrast, categoric variables can only take on predefined values, which are known as categories or levels. Categoric variables can be further classified as nominal or ordinal. Ordinal variables have a natural order, such as the increasing degrees of a pain scale. Nominal variables, such as sex, have no such intrinsic order. A binary (or dichotomous) variable is a categoric variable with only two values.
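As a minimal Python sketch with hypothetical variable names, the distinction between nominal and ordinal categoric variables can be mirrored in code: a plain `Enum` defines no ordering among its members, whereas an `IntEnum` supports order comparisons. This illustrates the concepts only and is not a recommendation for how study data should be stored.

```python
from enum import Enum, IntEnum

class Sex(Enum):
    """Nominal (here also binary) variable: categories with no intrinsic order."""
    FEMALE = "female"
    MALE = "male"

class PainScale(IntEnum):
    """Ordinal variable: categories with a natural order (hypothetical 4-level scale)."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3

height_cm = 172.5   # continuous variable
weight_kg = 68.0    # continuous variable

print(PainScale.MILD < PainScale.SEVERE)  # ordinal levels can be ranked
try:
    Sex.FEMALE < Sex.MALE                 # nominal levels cannot be ranked
except TypeError:
    print("nominal categories have no order")
```

One caveat of this design: `IntEnum` also permits arithmetic such as `PainScale.SEVERE - PainScale.MILD`, which implicitly treats the levels as equally spaced, an assumption that ordinal data do not guarantee.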
Sensitivity and specificity are the most commonly used statistical measures for comparing the accuracy of a new imaging modality with a reference standard and with other imaging examinations. Sensitivity is the ability of a test to correctly identify the condition among patients who have it according to the reference standard, that is, TP / (TP + FN), where TP and FN denote true-positive and false-negative results. Specificity is the ability of a test to correctly classify patients without the condition among those classified as nondiseased by the reference standard, that is, TN / (TN + FP), where TN and FP denote true-negative and false-positive results.
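A minimal Python sketch of these two proportions, assuming the counts of true-positive (tp), false-negative (fn), true-negative (tn), and false-positive (fp) results against the reference standard are already tabulated. The counts below are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from counts against a reference standard."""
    sensitivity = tp / (tp + fn)  # diseased by reference standard and called positive
    specificity = tn / (tn + fp)  # nondiseased by reference standard and called negative
    return sensitivity, specificity

# Hypothetical counts: 50 diseased and 100 nondiseased patients
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```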
Analysis of categoric data is most often conducted with the chi-square test, which determines whether proportions differ between groups by comparing the observed number of observations in each cell with the number expected if there were no association. The test is so named because, under the null hypothesis, its test statistic approximately follows a chi-square distribution, which is used to determine the probability of a discrepancy at least as large as the one observed. The method has been generalized to many situations, including comparisons of more than two groups. For small samples (a common rule of thumb is that any cell has an expected count of fewer than five), the Fisher exact test should be used instead. The Fisher exact test calculates the exact probability of the observed table, and of more extreme tables, directly from the hypergeometric distribution rather than relying on a large-sample approximation.
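The sketch below, using only the Python standard library, computes the Pearson chi-square statistic for a 2 × 2 table, checks the smallest expected cell count, and falls back to a two-sided Fisher exact p-value when that count is below five. The table is hypothetical. In practice, statistical software would be used; for example, `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` provide these tests, although `chi2_contingency` applies the Yates continuity correction to 2 × 2 tables by default, so its statistic will differ from the uncorrected one here.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic, its p-value (chi-square with 1 degree of freedom),
    and the smallest expected cell count, used for the rule-of-thumb check.
    """
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat, min_expected = 0.0, float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # expected count under no association
            min_expected = min(min_expected, expected)
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # upper tail of the chi-square distribution with 1 df
    return stat, p_value, min_expected

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value, computed from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Probability that the top-left cell equals x when all margins are fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables no more probable than the observed one (small tolerance for rounding)
    return sum(p for p in map(table_prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))

# Hypothetical table: rows = two groups, columns = outcome present / absent
a, b, c, d = 8, 2, 1, 5
stat, p_chi2, min_expected = chi_square_2x2(a, b, c, d)
if min_expected < 5:
    p_value = fisher_exact_2x2(a, b, c, d)  # small expected counts: use the exact test
else:
    p_value = p_chi2
print(f"min expected = {min_expected:.3f}, p = {p_value:.4f}")
```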
Correlation (most commonly, the Pearson correlation coefficient) measures the strength and direction of a linear association between a continuous exposure and a continuous outcome. The correlation coefficient ranges from −1 to 1, with 0 representing no linear association and −1 and 1 representing perfect negative and positive linear associations, respectively. One of the main limitations of correlation is that it cannot capture nonlinear associations; a strong U-shaped relationship, for example, can have a correlation of 0.
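A minimal Python sketch of the Pearson correlation coefficient illustrates both its scale and its main limitation. The two small data sets are hypothetical: the first is perfectly linear, and the second is a perfect U-shaped (quadratic) relationship whose linear correlation is nonetheless 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of linear association on a -1 to 1 scale."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # perfect positive linear: 1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # perfect U shape, yet 0.0
```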
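For comparing a continuous variable between two groups when the outcome is not normally distributed or the sample size is small, paired and unpaired comparisons can be made using the Wilcoxon signed rank test and the Mann-Whitney U test, respectively. Importantly, these tests do not compare means; they are based on ranks. They are often described as comparing medians, but that interpretation strictly holds only when the distributions being compared have the same shape; more generally, the Mann-Whitney U test assesses whether values in one group tend to be larger than values in the other.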
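A stdlib-only Python sketch of the Mann-Whitney U test (the unpaired case; the Wilcoxon signed rank test is not shown). It counts, over every pair of one value from each group, how often the first group's value is larger, which makes explicit that the test concerns ranks rather than means. The p-value uses a normal approximation without tie or continuity correction, which is an assumption of this sketch; with very small samples, the exact p-values offered by statistical packages are preferable. The data are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x versus sample y, with a two-sided normal-approximation p-value.

    U counts the pairs (one value from each sample) in which the x value is larger;
    ties count one half. No tie correction is applied to the variance.
    """
    u = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0 for xi in x for yi in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2                      # expected U if neither group tends to be larger
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))          # two-sided standard normal tail probability
    return u, p_value

# Hypothetical measurements in two independent groups
group_a = [12, 15, 14, 20, 18, 25]
group_b = [8, 10, 9, 11, 13, 7]
u, p = mann_whitney_u(group_a, group_b)
prob_superiority = u / (len(group_a) * len(group_b))  # estimates P(A > B) + 0.5 * P(A = B)
print(u, round(prob_superiority, 3), round(p, 4))
```

The ratio U / (n1 × n2) is a convenient effect measure here: it estimates the probability that a randomly chosen value from the first group exceeds one from the second.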
The nonparametric counterparts of analysis of variance (ANOVA), the Friedman and Kruskal-Wallis tests, can be used for paired and unpaired measurements, respectively, when more than two groups are compared. Their results are interpreted in the same way as those of the corresponding parametric methods, because each tests the null hypothesis that there is no difference between groups.
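A minimal Python sketch of the Kruskal-Wallis test (the unpaired case; the Friedman test is not shown). To stay within the standard library, it is restricted to exactly three groups, because the chi-square distribution with 2 degrees of freedom has the closed-form upper tail exp(−H/2). The tie correction is omitted, and the chi-square reference is itself a large-sample approximation, so these are simplifying assumptions; the data are hypothetical.

```python
from math import exp

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i + 1 through j + 1
        i = j + 1
    return ranks

def kruskal_wallis_3(group1, group2, group3):
    """Kruskal-Wallis H (no tie correction) and its p-value, for exactly three groups."""
    groups = [group1, group2, group3]
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    sum_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)
    # With three groups, H is compared with a chi-square distribution on 2 df,
    # whose upper-tail probability has the closed form exp(-h / 2)
    p_value = exp(-h / 2)
    return h, p_value

# Hypothetical measurements from three independent groups
h, p = kruskal_wallis_3([2.1, 3.4, 1.9, 2.8], [3.9, 4.2, 3.1, 4.8], [5.0, 4.4, 6.1, 5.7])
print(round(h, 3), round(p, 4))
```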
When comparison of means among more than two groups is of interest, analysis of variance (ANOVA) is most often used. A significant ANOVA result shows only that at least one group mean differs from the others; it does not identify which specific groups differ, because the null hypothesis being tested is that all of the group means are equal. Repeated measures ANOVA can be used for dependent comparisons, and its results are interpreted in a similar manner. To determine which groups differ from one another, post hoc testing between groups is required. However, performing many statistical tests (multiple comparisons) increases the probability of obtaining significant findings through chance alone. In these instances, a corrected significance level, such as one obtained with the Bonferroni correction, should be used.
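The Python sketch below computes the one-way ANOVA F statistic as the ratio of the between-group to the within-group mean square, and then applies a Bonferroni correction to the post hoc pairwise comparisons. Bonferroni is used here only as one common choice (others include the Holm and Tukey procedures), and the p-value for F, which requires the F distribution, is left to statistical software. The group values are hypothetical.

```python
from itertools import combinations

def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def bonferroni_alpha(alpha, n_tests):
    """Per-comparison significance level that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests

groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}  # hypothetical values
f_stat = one_way_anova_f(list(groups.values()))
pairs = list(combinations(groups, 2))             # post hoc pairwise comparisons: AB, AC, BC
alpha_per_test = bonferroni_alpha(0.05, len(pairs))
print(f_stat, pairs, round(alpha_per_test, 4))
```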
Extending the applications of sensitivity and specificity, receiver operating characteristic (ROC) curves are often used to compare the accuracy of diagnostic techniques while accounting for variability among readers (who are generally radiologists in the imaging literature). This variability is often attributable to the subjectivity of individual readers, who differ in how confident they must be before calling a case positive, that is, in their decision threshold. The ROC curve plots sensitivity versus 1 − specificity (the true-positive rate versus the false-positive rate) across all possible thresholds. In this manner, the subjective nature of the readers' thresholds can be taken into account when comparing diagnostic techniques. The ROC curve is summarized by the area under the curve (AUC), with higher values indicating better diagnostic accuracy: an AUC of 0.5 corresponds to discrimination no better than chance, and an AUC of 1.0 to perfect discrimination. Thus, diagnostic tests can be directly compared on the basis of their AUCs.
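A minimal Python sketch of an empirical ROC curve, assuming a reader has scored each case on a hypothetical 5-point confidence scale (1 = definitely negative, 5 = definitely positive). Each distinct score is used as a threshold, giving one (false-positive rate, true-positive rate) point, and the AUC is computed with the trapezoidal rule. The same AUC equals the probability that a randomly chosen diseased case receives a higher score than a nondiseased one (ties counted as one half), which the sketch checks. Methods that formally compare AUCs across multiple readers and cases are beyond this illustration.

```python
def roc_points(diseased, nondiseased):
    """(FPR, TPR) pairs, calling a case positive when its score is >= each threshold."""
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in diseased) / len(diseased)        # sensitivity
        fpr = sum(s >= t for s in nondiseased) / len(nondiseased)  # 1 - specificity
        points.append((fpr, tpr))
    return points

def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(diseased, nondiseased):
    """P(diseased score > nondiseased score), with ties counted as one half."""
    total = sum(1.0 if d > n else 0.5 if d == n else 0.0 for d in diseased for n in nondiseased)
    return total / (len(diseased) * len(nondiseased))

# Hypothetical confidence scores from one reader
diseased = [5, 4, 4, 3, 5]
nondiseased = [1, 2, 3, 2, 1]
points = roc_points(diseased, nondiseased)
auc = auc_trapezoid(points)
print(points)
print(round(auc, 3), round(auc_pairwise(diseased, nondiseased), 3))
```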
There are two different approaches to determining which covariates to include in multivariable models. In an a priori approach, researchers use the existing literature to identify variables that could potentially influence the association between the exposure and outcome; covariate selection is therefore evidence based. In stepwise regression, a data-driven approach to variable selection, the researcher begins with the set of variables collected as part of the study, and inclusion of covariates in the final model is determined by statistical significance, overall model fit, or a combination of the two. Fundamental to interpretation, standard multivariable analyses include only subjects with data recorded for all variables in the model. That is, the final analysis is performed on a subsample of all patients, and this subsample is a random one only if the data are missing completely at random. Individuals with missing covariate values are therefore excluded, which could introduce bias if the omitted subjects differ from those included.
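A small Python sketch of this complete-case (listwise deletion) behavior, using hypothetical records in which `None` marks a missing value. The data are deliberately constructed so that body mass index is missing only for patients who had the outcome; dropping them halves the apparent outcome rate, showing how exclusion can bias results when missingness is related to the outcome.

```python
def complete_cases(records, variables):
    """Keep only records with a non-missing value for every listed variable (listwise deletion)."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

def outcome_rate(records):
    """Proportion of records with the (binary) outcome."""
    return sum(r["outcome"] for r in records) / len(records)

# Hypothetical data: missing BMI values occur only among patients with the outcome
patients = [
    {"age": 54, "bmi": 27.1, "outcome": 1},
    {"age": 61, "bmi": None, "outcome": 1},
    {"age": 47, "bmi": 24.3, "outcome": 0},
    {"age": 70, "bmi": None, "outcome": 1},
    {"age": 58, "bmi": 31.0, "outcome": 0},
    {"age": 66, "bmi": 29.4, "outcome": 0},
]
analyzed = complete_cases(patients, ["age", "bmi", "outcome"])
print(len(patients), len(analyzed))                    # 6 4
print(outcome_rate(patients), outcome_rate(analyzed))  # 0.5 0.25
```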
The most commonly encountered multivariable analysis for evaluating the association between an exposure (continuous or categoric) and a continuous outcome is linear regression. As its name implies, this model assumes a linear relationship between the exposure and the outcome of interest. Linear regression is thus concerned with the form of the best-fitting line (its intercept and slope), not with the strength of the association, which is what correlation measures. Other relationship patterns, such as quadratic, U-shaped, or sigmoid associations, are not well evaluated with this approach.
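A minimal Python sketch of the ordinary least-squares line for a single exposure, using the closed-form slope and intercept. The two hypothetical data sets are both perfectly linear (each has a correlation of 1), yet the fitted lines differ in slope, illustrating that regression describes the form of the relationship rather than its strength. A multivariable model with several covariates would in practice be fit with statistical software.

```python
def least_squares_line(x, y):
    """Intercept and slope of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Two hypothetical, perfectly linear data sets with different slopes
x = [1, 2, 3, 4, 5]
steep = [3, 5, 7, 9, 11]
shallow = [1.5, 2.0, 2.5, 3.0, 3.5]
print(least_squares_line(x, steep))    # (1.0, 2.0)
print(least_squares_line(x, shallow))  # (1.0, 0.5)
```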
When only two categoric variables are evaluated, chi-square analysis is usually performed. When a dichotomous outcome and more than one covariate are considered, logistic regression is the most commonly used multivariable approach; it allows one to consider simultaneously the effects of multiple variables, including mixtures of categoric and continuous variables. Under this scenario, the association between a binary outcome and multiple continuous or categoric exposure variables is evaluated. The point estimate from logistic regression (the exponentiated regression coefficient) is the odds ratio (OR), which is the ratio of the odds of the outcome in the exposed group to the odds of the outcome in the nonexposed group. Probability and odds differ: the probability p of an outcome is the number of subjects with the outcome divided by the total number of subjects, whereas the odds are the number with the outcome divided by the number without it, that is, p / (1 − p). Note that the logistic model assumes a linear relationship between the predictor of interest and the outcome on the log-odds scale.
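A Python sketch of probability, odds, and the odds ratio for a hypothetical 2 × 2 table. It also shows the log-odds scale of the logistic model: with a single binary exposure x, the fitted model log(odds) = b0 + b1 × x reproduces the observed group proportions, so b0 is the log-odds in the nonexposed group and exp(b1) is the OR. The example also shows that the OR (2.25) differs from the ratio of probabilities (2.0) when the outcome is not rare.

```python
from math import exp, log

def odds(p):
    """Odds corresponding to a probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """OR from counts: a, b = exposed with/without outcome; c, d = nonexposed with/without."""
    return (a / b) / (c / d)

# Hypothetical counts
a, b, c, d = 20, 80, 10, 90
p_exposed, p_nonexposed = a / (a + b), c / (c + d)  # probabilities: 0.20 and 0.10
risk_ratio = p_exposed / p_nonexposed              # ratio of probabilities
or_value = odds_ratio(a, b, c, d)                  # ratio of odds: 0.25 / 0.111...

# Logistic model for one binary exposure x: log(odds) = b0 + b1 * x
b0 = log(odds(p_nonexposed))  # log-odds in the nonexposed group
b1 = log(or_value)            # coefficient on the log-odds scale; exp(b1) is the OR
p_fitted_exposed = 1 / (1 + exp(-(b0 + b1 * 1)))  # back-transform to a probability
print(round(risk_ratio, 2), round(or_value, 2), round(p_fitted_exposed, 2))
```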
When the outcome is a count with a skewed distribution (for example, the number of imaging examinations per patient), linear regression could produce misleading results. One solution is to dichotomize the outcome and use a logistic regression model; the disadvantage of this approach is the loss of potentially valuable information in the process of categorizing the outcome. The more appropriate negative binomial and Poisson regression analyses are two of the most commonly used methods for such skewed count outcomes. Both are parametric models, and Poisson regression is a special (limiting) case of negative binomial regression: the Poisson model assumes that the mean and variance of the outcome are equal, whereas the negative binomial model allows the variance to exceed the mean (overdispersion). Detailed discussion regarding the pros and cons of using linear, Poisson, and negative binomial regression for such outcomes can be found elsewhere. The exponentiated coefficients in both negative binomial and Poisson regression are interpreted as incidence rate ratios (IRRs); equivalently, (IRR − 1) × 100 gives the percentage difference in the outcome for a one-unit change in the main exposure of interest. Many extensions of these models are available to accommodate different forms of count data.
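A Python sketch of the incidence rate ratio for the simplest case, assuming a single binary exposure and no other covariates: for a Poisson model with an intercept and one binary exposure, the exponentiated exposure coefficient equals the ratio of the two group mean counts, so it can be computed directly. The sketch also computes the variance-to-mean ratio in each group as a crude check of the Poisson assumption that the mean equals the variance; values well above 1 suggest overdispersion and favor the negative binomial model. The counts are hypothetical, and a formal assessment would use the fitted model.

```python
from statistics import mean, variance

def incidence_rate_ratio(exposed_counts, nonexposed_counts):
    """Ratio of mean counts; equals exp(coefficient) of a Poisson model with one binary exposure."""
    return mean(exposed_counts) / mean(nonexposed_counts)

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)

# Hypothetical numbers of imaging examinations per patient
exposed = [0, 1, 2, 9, 8]
nonexposed = [0, 0, 1, 2, 7]
irr = incidence_rate_ratio(exposed, nonexposed)
percent_difference = (irr - 1) * 100
print(irr, percent_difference)
print(dispersion_ratio(exposed), dispersion_ratio(nonexposed))
```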
Survival analysis, or time-to-event analysis, is commonly used in the literature to evaluate the potential influence of a new imaging modality on patient outcome. For example, multiple studies have evaluated the influence of mammography on patient survival after breast cancer. The outcome of interest for these analyses is a dichotomous variable (did the event occur: yes or no) together with the time to its occurrence. Thus, survival analysis can be conceptualized as an extension of logistic regression that incorporates time. Survival analysis enables simultaneous consideration of the effects of multiple variables on survival time. One complication of using survival time as a variable is that most studies contain censored data, for which the exact survival time of a subject is unknown. This can happen when a subject survives past the end of the study period or is lost to follow-up.
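A minimal Python sketch of the Kaplan-Meier estimator, a standard nonparametric way to estimate survival in the presence of censoring (the multivariable Cox model is not shown). At each event time, survival is multiplied by the proportion of at-risk subjects who did not have the event; censored subjects contribute to the risk set until their censoring time and then leave it without being counted as events. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each event time.

    times: follow-up time for each subject; events: 1 if the event occurred, 0 if censored.
    Subjects censored at a given time are counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:  # subjects sharing this time are contiguous
            j += 1
        deaths = sum(e for _, e in data[i:j])
        if deaths:
            survival *= 1 - deaths / at_risk      # conditional probability of surviving past t
            curve.append((t, survival))
        at_risk -= j - i                          # events and censored subjects leave the risk set
        i = j
    return curve

# Hypothetical follow-up (months); 0 = censored (e.g., lost to follow-up or study ended)
follow_up_months = [2, 3, 3, 5, 6, 8]
event_observed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(follow_up_months, event_observed)
print([(t, round(s, 3)) for t, s in curve])
```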
Secular trends (increasing, decreasing, or no change) in practice patterns are often of interest to resource planners, and several analytical approaches are available to answer such questions. The chi-square test for trend (also known as the Cochran-Armitage test) is the most basic analysis. In this unadjusted analysis, the scientific question is whether there has been a linear trend in proportions over time. A linear trend can also be investigated using multivariable linear regression, which allows adjustment for covariates. However, as explained earlier, assuming that the outcome of interest changes linearly over time may be untenable. Likewise, Poisson or negative binomial regression can be used to evaluate trends in skewed count outcomes.
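A stdlib-only Python sketch of the chi-square test for trend in proportions (Cochran-Armitage form), assuming equally spaced scores for consecutive periods unless other scores are supplied. It returns the 1-degree-of-freedom chi-square statistic and its two-sided p-value from the normal approximation. The yearly counts are hypothetical, and adjusted analyses would instead use a regression model.

```python
from math import erfc, sqrt

def chi_square_trend(events, totals, scores=None):
    """Chi-square test for a linear trend in proportions across ordered groups (e.g., years)."""
    if scores is None:
        scores = list(range(len(events)))  # equally spaced scores for consecutive periods
    n_total = sum(totals)
    p_bar = sum(events) / n_total          # overall proportion under the null hypothesis
    # Score-weighted sum of observed minus expected events
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    s_n = sum(n * s for s, n in zip(scores, totals))
    s_n2 = sum(n * s * s for s, n in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (s_n2 - s_n ** 2 / n_total)
    z = t_stat / sqrt(var_t)
    chi2 = z ** 2                          # chi-square statistic with 1 degree of freedom
    p_value = erfc(abs(z) / sqrt(2))       # two-sided p-value
    return chi2, p_value

# Hypothetical counts of examinations meeting a criterion, per year, over 4 years
events = [10, 15, 20, 25]
totals = [100, 100, 100, 100]
chi2, p = chi_square_trend(events, totals)
print(round(chi2, 3), round(p, 4))
```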
Considering the ubiquity of advanced methodologies in the radiology literature, analytical competency is now more important than ever. Statistical consultation will likely be required when conducting clinical research; therefore, statistical literacy will be essential for collaboratively designing an analytical approach appropriate to the scientific question. To provide continued leadership and excellence in the specialty, radiologists should assume responsibility for showing the utility of imaging technologies; this onus falls largely on well-trained academic radiologists. Without such expertise, poorly conducted research will lead to inappropriate conclusions and clinical recommendations.