Continuous Measurement Scale With a True Zero and Linearity of Variable
In this series, we have been learning about the use of statistics to plan, execute, and analyze our research. This module is designed to help define and categorize data into conventional measures for display and analysis. Display, or visualization, of the data is an important concept and one that is at the root of our understanding of various types of data. Before addressing which types of graphs, presentations, or analyses are useful and appropriate, we need to define exactly what type of data we are analyzing. In our studies, we choose the variables with which to collect the data, and these variables can be divided into two primary types: quantitative or categoric. The quantitative types are continuous (measured) and discrete (counted), and the categoric types are nominal (named) and ordinal (ordered). The following sections define and give examples of each.
Continuous Data
Continuous data are probably the least frequently reported in the radiology literature because our work has traditionally been one of dichotomous interpretation: either an imaging study successfully distinguishes an abnormal finding from a normal one or it does not. Continuous data are those in which the variable of interest exists over a quantifiable range and can take any conceivable value in that range. The degree of precision is based on the technology used for measurement. Some examples are blood pressure (mm Hg), size of a tumor (cm), serum cholesterol (μg/mL), length of an MR imaging sequence (sec), and amount of contrast material (mL). Each of these variables can have a wide range of values whose precision of measurement can vary significantly. Another way to think about continuous data is that a possible value between any two other values always exists. An example would be a patient with a systolic blood pressure of 111.5 mm Hg that lies between two other patients with pressures of 111.2 and 111.9 mm Hg.
Discrete Data
A discrete variable is characterized by having only certain values (usually integers). For example, a patient can have only a whole number of breast tumors; there are never cases of "2.7 tumors detected on a mammogram" (although a group of patients might have a mean of 2.7 tumors). Another example might be, "The study used eight radiographs for archiving the images." In that example, it seems obvious that we use only whole radiographs, not 7.5 or 8.5. The distinction may not always be so apparent: consider the WBC count. Because one counts the number of cells per cubic millimeter, the data (e.g., 33 cells/mm³) look like those of a ratio scale (which is discussed in the next section of this article). Because there are never partial cells, however, the data are defined as discrete.
Comparing Ratio and Interval Scales
Ratio scales of measurement have a constant interval size and a true zero point. If one patient has a 6-cm kidney tumor and a second has a 3-cm tumor, then we can state that the second tumor is half as large as the first. Ratio scales also include capacities (mL), volumes (cm3), rates (mL/min), weights (kg), and lengths of time (min).
Interval scale data are derived from a measurement scale that possesses a uniform interval but no true zero; the centigrade temperature scale (degrees Celsius) is an example. Although the difference between 20°C and 25°C is the same as that between 5°C and 10°C, 50°C cannot be considered twice as hot as 25°C because the zero point is arbitrary. The Kelvin temperature scale, in contrast, is a ratio scale because its zero point is true absolute zero.
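To make the distinction concrete, the following short Python sketch (added here for illustration; it is not from the original article) compares the same two temperatures on the Celsius and Kelvin scales:

```python
# Ratios are meaningful only on a scale with a true zero.
c1, c2 = 50.0, 25.0                 # degrees Celsius (interval scale)
k1, k2 = c1 + 273.15, c2 + 273.15   # kelvins (ratio scale)

print(c1 / c2)  # 2.0, but 50 degrees C is NOT "twice as hot" as 25 degrees C
print(k1 / k2)  # ~1.08, the physically meaningful ratio of absolute temperatures
```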
Nominal Data
Nominal variables often describe characteristics, such as male and female, and are commonly used in radiologic studies. Nominal scales name the values of the nominal variable. For example, a breast tumor type could be classified as benign, malignant, or containing calcifications.
Ordinal Data
This type of data deals with comparisons that are relative, rather than quantitative. Thus, the data consist of an ordering or a ranking of measurements. When one orders the findings, the scale becomes ordinal, even if the steps in the order are of different sizes. An example is the Kurtzke [1] expanded disability scale (0-10) for the neurologic assessment of patients with multiple sclerosis. In this widely used scale, a worsening in patient status of one unit from 1 (minor signs) to 2 (elevated thresholds) is dramatically different from a worsening from 6 (walks with assistance) to 7 (wheelchair bound). A common form used in radiology is to classify image interpretability as poor, moderate, or excellent and perhaps grade it as 1, 2, and 3.
It is also possible to have exactly the same original data portrayed as several different data types. Using an example of examination marks, we can have raw marks of 97, 75, 68, and 51 (discrete data) that can be expressed as the grades A, B, C, and D (ordinal data) or as pass, pass, pass, and fail (nominal data), as sketched below. Although this latter example appears trivial, this exact type of data reduction is common in radiology, in which a complex data set is reduced to presence or absence to facilitate the common 2 × 2 chi-square analysis of diagnostic accuracy. The problem with data reduction is that it can result in a loss of information.
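A minimal Python sketch of this data reduction; the grade and pass boundaries are assumptions chosen only so that the example marks map as described, because the article does not state them:

```python
marks = [97, 75, 68, 51]  # discrete raw data

def to_grade(mark: int) -> str:
    # Assumed boundaries, for illustration only.
    if mark >= 90:
        return "A"
    if mark >= 70:
        return "B"
    if mark >= 60:
        return "C"
    return "D"

grades = [to_grade(m) for m in marks]                      # ordinal data
outcomes = ["pass" if m >= 60 else "fail" for m in marks]  # nominal data
print(grades)    # ['A', 'B', 'C', 'D']
print(outcomes)  # ['pass', 'pass', 'pass', 'fail']
```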
Let us take the different types of measurement in turn and examine exploring, summarizing, and presenting each type.
Continuous data have no discrete divisions between elements apart from those imposed by our measuring technique. Some examples are time, the size of a tumor, and blood pressure. Table 1 lists the time taken for a bolus injection of radiographic contrast material to reach a maximum in the kidney, with a range of 8-28 sec for 30 patients. These data are raw in the sense that they are unadulterated, unmodified, and untransformed. Time is a continuous measurement that can take any value whatsoever, but the precision of its measurement depends on our measurement tool (a wall clock vs a stopwatch accurate to a millisecond). By the established rules of science, a reported time of 21 sec actually represents all times from 20.5 sec up to (but not including) 21.5 sec. The next section illustrates a variety of ways of exploring these data.
TABLE 1 Contrast Agent Transit Time for Maximal Renal Enhancement
A preliminary and easy way to look at continuous (raw) data is to use the "stem-and-leaf" plot. Although likely unfamiliar to the radiologist, it is easy to construct without computerized graphing packages and shows the distribution of the data in a rudimentary way. The common "stem" runs along the left, one row for each decade (0 for units = 0-9, 1 for teens = 10-19, and 2 for twenties = 20-29), and the individual values are listed in increasing order in the second column (Table 2). Most values are in the decade from 10-19, and there are no values exceeding 28 sec. This plot style would clearly identify a highly unusual value (84 sec) among a large number of points, seen as, for example, a value of 8 in the stem and 4 in the leaf. Although a stem-and-leaf plot allows an easy appreciation of a data set, the details of the distribution are missing.
TABLE 2 "Stem-and-Leaf" Plot of Contrast Agent Times
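For readers who want to reproduce the construction, the following Python sketch builds a stem-and-leaf plot. Because Table 1 did not survive reproduction here, the 30 transit times below are hypothetical stand-ins chosen only to resemble the summary statistics reported in the text (range 8-28 sec, mode 16 sec, median 16.5 sec):

```python
from collections import defaultdict

# Hypothetical stand-in for the 30 renal transit times of Table 1 (sec).
times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

stems = defaultdict(list)
for t in sorted(times):
    stems[t // 10].append(t % 10)   # stem = tens digit, leaf = units digit

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
# 0 | 8 9
# 1 | 1 2 2 3 3 4 4 5 5 6 6 6 6 7 7 8 8 9 9
# 2 | 0 0 1 2 3 4 5 6 8
```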
To obtain a more detailed examination of our example of enhancement time, we created a dot plot that shows the frequency of occurrence of each individual data value (Fig. 1). In a way analogous to the stem-and-leaf plot, the dot plot uses a stem value for each unique time value in the whole set. Then, a single dot is plotted for each occurrence of that value in the data set; in our example, one dot each for 8 and 9 sec and four dots for 16 sec. Although possibly also unfamiliar, this method is another way to picture the raw data and is analogous to a histogram for each time point. In this type of plot, each data point simultaneously shows the actual value, occupies space, and represents one counting unit. Compared with the stem-and-leaf plot, the dot plot permits a more detailed appreciation of the variability in the data and is close to a histogram (albeit one that has been stood on its end).
Fig. 1. —Dot plot of transit time data (found in Table 1) shows each asterisk as representing an actual occurrence of the specific time (sec) beside it.
The data set that is organized as a conventional histogram shows the frequency of each data value as a bar (Fig. 2). When the data are scattered or the data intervals are too numerous, it is customary to reduce the number of intervals, remembering that there should be enough intervals or bins to show any relevant pattern. Because the data in Table 1 consist of 30 values, one published rule is to use approximately √n (square root) intervals, where n is the total number of values [2]. With an n value of 30 and a √n value of 5.5, five or six intervals are appropriate. If we choose six intervals, then the resulting histogram shows a maximum in the 15- to 17-sec interval (Fig. 3). Although this reduction of the data by decreasing the number of intervals loses some of the details of the exact measurements seen in Figure 2, the essential character of the data is illustrated, in that the maximal enhancement time values are identified in the 12- to 21-sec area in intervals containing 12-14, 15-17, and 18-21 sec. If the transit time variability is important, then you might prefer Figure 2. Conversely, if showing the typical time were your goal (say, to choose an optimal imaging time), then the expression of data in Figure 3 would be appropriate. Neither choice is artificial; each emphasizes a different aspect of the data.
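A sketch of the √n binning rule using matplotlib, again with the hypothetical stand-in data described earlier:

```python
import math
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Table 1 transit times (sec).
times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

n_bins = round(math.sqrt(len(times)))  # sqrt(30) ~ 5.5; five or six bins are both reasonable
plt.hist(times, bins=n_bins, edgecolor="black")
plt.xlabel("Transit time (sec)")
plt.ylabel("Frequency")
plt.show()
```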
Having plotted our data and appreciated its distribution, we must determine three primary attributes: the center, the dispersion, and the symmetry of the data distribution.
The central tendency is the tendency of the observations to accumulate at a particular value or in a particular category. The three ways of describing this phenomenon are mean, median, and mode.
The most widely used measure of central tendency is the familiar mean, which is calculated simply by adding all the values in the data set and dividing the sum by the number of observations. This procedure yields a mean value of 17.2 sec for our time data. The mean is applicable only to ratio or interval scale data.
Another way to look at these data is to make a cumulative frequency diagram. We first convert the frequency histogram (Fig. 3) to a cumulative frequency table (Table 3) and then plot Table 3 as the final cumulative frequency diagram (Fig. 4). The conversion is started by listing the number of occurrences for each interval under the interval values in Table 3. Then we calculate the cumulative frequency for each interval as the frequency of that interval plus the frequencies of all lower intervals. For example, the cumulative frequency for the interval 18-21 sec is the actual frequency (seven occurrences) plus the total frequency in all smaller intervals (n = 18), to yield 25. It is also possible to convert the raw frequency histogram to the cumulative frequency diagram in an entirely analogous way using the individual data values rather than the intervals.
TABLE 3 Cumulative Frequency of Contrast Agent Transit Times
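The running-total arithmetic is simple enough to show directly. In the sketch below, the interval labels and frequencies are hypothetical stand-ins chosen to match the worked numbers in the paragraph above (18 observations below the 18-21 sec interval, 7 within it, cumulative total 25):

```python
from itertools import accumulate

intervals = ["8-11", "12-14", "15-17", "18-21", "22-25", "26-28"]  # sec
freq = [3, 6, 9, 7, 3, 2]        # hypothetical per-interval frequencies (n = 30)
cum = list(accumulate(freq))     # running total, as in Table 3

for interval, f, c in zip(intervals, freq, cum):
    print(f"{interval:>6} sec: frequency = {f:2d}, cumulative = {c:2d}")
# The 18-21 sec row prints cumulative = 25 (18 below the interval + 7 within it).
```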
Fig. 4. —Graph shows distribution of enhancement data converted to cumulative data (see Table 3). Conversion from histogram format permits easy visualization of quartiles; Q = third quartile, M = median, q = first quartile.
The cumulative frequency diagram provides the investigator with an opportunity to visualize three important measures of the data: the first quartile (q), the median value (M, or the second quartile), and the third quartile (Q). The median is the middle value of the data set. Because there is an even number of observations (n = 30), we take the 15th and 16th values from a list of the data sorted in increasing order (16 and 17 sec here) and average them, which yields 16.5 sec. The median divides the data into two equal parts (by the number of observations); the quartiles divide each of these halves in two, for four parts in total. The values of q, M, and Q can help to show whether the data are symmetric in the interquartile range, which happens if the M - q and Q - M ranges are approximately equal. This determination of interquartile ranges is our first introduction to measures that characterize the dispersion or spread of the observed data.
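A short sketch of these calculations with Python's standard library (hypothetical stand-in data, as before):

```python
import statistics

times = sorted([8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
                17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28])

# With n = 30 (even), the median averages the 15th and 16th sorted values
# (0-based indices 14 and 15).
print((times[14] + times[15]) / 2)   # 16.5
print(statistics.median(times))      # same result, 16.5

q, M, Q = statistics.quantiles(times, n=4)  # first quartile, median, third quartile
print(q, M, Q)
print(M - q, Q - M)   # roughly equal spans suggest a symmetric middle half
```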
Imagine that the histogram illustrated in Figure 3 could be physically weighed instead of merely occupying space in a plot. The median divides the histogram into two equal parts by weight, whereas the mean is the balance point on which the histogram would rest, so the mean is pulled toward extreme values. The median also expresses less information than the mean because the median is based on the rank of the individual data values (not the actual values). When the data set has many values that are low or high compared with the average, the median is less sensitive to these values and may be a preferable way to describe the central tendency. Thus, the median is insensitive to the data extremes. In our example, we can illustrate this insensitivity by exchanging the highest value in our set (28 sec) for a larger data point (100 sec). Although the median value would remain the same (16.5 sec), the new mean is 19.6 sec. Thus, the median retains its ability to identify a value consistent with the spirit of the data, whereas the mean has been inflated by the extreme value.
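The effect is easy to demonstrate (hypothetical stand-in data; with these values the means happen to match the figures quoted above):

```python
import statistics

times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

print(statistics.mean(times), statistics.median(times))  # ~17.2 and 16.5

times[times.index(28)] = 100   # replace the largest value with an extreme one
print(statistics.mean(times), statistics.median(times))  # ~19.6 and still 16.5
# The single extreme value drags the mean upward but leaves the median untouched.
```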
In addition to the quartile divisions mentioned previously, the distribution can also be divided into other parts, such as percentiles (or 100 parts). A representative example of this division is the use of lethal dose 50 (LD50) from pharmacologic studies. The LD50 is actually the dose at which 50% of the experimental animals died, or the 50th percentile of lethal doses, or the median lethal dose. Similarly, q (first quartile) is the 25th percentile and Q (third quartile) is the 75th percentile.
A useful way to depict this type of data is the box-and-whiskers plot (Fig. 5), which is effective in summarizing the properties of a data set. The bottom and top of the box are the 25th and 75th percentiles (q and Q in Fig. 4), the line in the box is the median value (M), and the "whiskers" (looking like error bars) extend to the 10th and 90th percentiles.
Fig. 5. —Box-and-whiskers plot (for interval data found in Table 1) shows median and percentiles as marked. Compare this graph with Figure 4, which expresses the same data with quartiles.
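matplotlib's boxplot draws whiskers at 1.5 interquartile ranges by default, so reproducing the 10th/90th percentile convention described above requires passing the percentile pair explicitly (hypothetical stand-in data):

```python
import matplotlib.pyplot as plt

times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

# whis=(10, 90) places the whiskers at the 10th and 90th percentiles,
# matching the convention described for Figure 5.
plt.boxplot(times, whis=(10, 90), showfliers=False)
plt.ylabel("Transit time (sec)")
plt.show()
```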
The mode is another term used to describe the central tendency of a data set. The mode is defined as the most frequently occurring measurement, which is 16 sec for our enhancement data. It is possible that the data set has more than one mode. Hence, it is possible to see the descriptor "bimodal" for a distribution of data having two modes or two peaks on a plot of the data.
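The standard library computes the mode directly, and multimode covers the bimodal case mentioned above (hypothetical stand-in data):

```python
import statistics

times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

print(statistics.mode(times))       # 16, the most frequently occurring value
print(statistics.multimode(times))  # [16]; a bimodal data set would list two values
```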
As seen in Figure 2, our enhancement maxima do not all occur at the same time but are spread over a substantial range (8-28 sec). We can express this dispersion, or nonuniformity, in the data exactly. The most commonly used measure of dispersion for a single sample of continuous data is the SD, and, like the mean, the SD takes all the data into account. The SD is a statistical measure that expresses the typical amount by which the data values in the set deviate from the mean value: the smaller the differences, the smaller the deviations, and the smaller the SD (and vice versa). For our data set, the mean is 17.2 sec with an SD of 4.7 sec.
If we can assume that the data we collect are normally distributed, the SD has some useful interpretations. For example, 68% of all observations will lie within ± 1 SD of the mean value. Ninety-five percent of the data lie within ± 2 SDs, and 99.7% lie within ± 3 SDs of the mean. Hence, the SD is approximately one sixth of the total data range for a normal distribution.
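A quick check of the SD and the 68% rule on the hypothetical stand-in data (which are only roughly normal, so the agreement is approximate):

```python
import statistics

times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

m = statistics.mean(times)
sd = statistics.stdev(times)   # sample SD
within_1sd = sum(m - sd <= t <= m + sd for t in times) / len(times)
print(m, sd)       # ~17.2 and ~5 for these stand-in values
print(within_1sd)  # ~0.67, close to the 68% expected under normality
```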
The mean and SD of a normally distributed data set tell us about the internal structure or internal proportions. Another term that is often seen is the standard error. The SD of the means of many samples from the same population is called the standard error. The standard error depends on the sample SD, the number of samples, and the proportion of the population in the sample. These three statistical measures—mean, SD, and standard error—are used to determine whether two experimentally determined samples are from different populations. When we compare samples, we are applying a test of significance. "Statistically significant" may not equate to "interesting" or "important."
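For a simple random sample, the standard error of the mean is commonly estimated as SD/√n; a minimal sketch with the same hypothetical data:

```python
import math
import statistics

times = [8, 9, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16,
         17, 17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 25, 26, 28]

sem = statistics.stdev(times) / math.sqrt(len(times))  # standard error of the mean
print(sem)   # ~0.9 sec for these stand-in values
```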
Tables are effective for the presentation of ordinal data. Table 4 illustrates an example of the reporting of vessel conspicuity for different visualization techniques: digital subtraction angiography, contrast-enhanced time-of-flight MR angiography, three-dimensional time-of-flight MR angiography, and dynamic MR angiography. The ordinal scale runs from partial visibility to excellent visibility in four steps, represented in the table by increasing numbers of plus signs in an intuitively obvious way.
TABLE 4 Comparison of Vessels Revealed on Digital Subtraction Angiography (DSA) and MR Imaging Techniques
Proportions and rates are descriptive parameters for a population that can be estimated from a sample. Rate is the occurrence of a particular event in a sample and is given as a percentage. Table 5 shows an example in which the number of events (and the rate as a percentage) is listed for four possible categories of neurologic outcome resulting from carotid artery stenting. "Proportion" is a descriptor that is applicable to categoric data. A stacked bar chart permits visualization of the proportions of three measures in three different patient groups (Fig. 6). In radiology, we frequently use a common statistical test (chi-square) to determine whether the rate or proportion of observations is different in two or more populations.
TABLE 5 Complications Associated with Stenting of the Carotid Artery
Fig. 6. —Bar chart shows proportion of patients in three treatment groups who were found with no change in size of prostate (black bar), enlargement (white bar), or decrease in size of prostate (gray bar). Note proportion of patients in each classification in each of three differently sized groups.
At times, we take two simultaneous measurements of our study population for the purpose of determining whether a relationship exists. In some instances, the measurements are taken to establish a pattern in the data (e.g., body weight and X-ray attenuation) or to search for an easy-to-measure surrogate marker for a hard-to-measure value (e.g., to measure the amount of iodinated contrast agent in a solution using its optical absorbance).
A scatterplot is the first step in examining the relationship between two sets of measures. The correlation coefficient (r) measures how close the relationship between two measurements is to linearity. The extreme values of r are 1 and -1, and the two variables can be positively or negatively correlated. If the two variables show a nonlinear relationship (e.g., parabolic), then r can be near zero even though a strong relationship exists. The two common calculations of the correlation coefficient are Pearson's product moment correlation for normally distributed data and Spearman's rank correlation for ordinal data.
When a correlation coefficient is used, three steps should be adhered to: first, plot the raw data in a scatterplot; second, observe whether a relationship exists between the variables; and third, if the data suggest a linear, but not a curvilinear relationship, then calculate r. The problem with correlation calculations is that correlation can be confused with causality, and caution should be used about such an interpretation. The possibility of an indirect relationship, via a third and unmeasured variable, should be eliminated. It is up to the scientist to prove that these third variables have no effect on the observed correlation. Another caution is that Pearson's correlation coefficient is only dependable when the two compared variables are normally distributed because an outlier point can dominate the correlation.
In interpreting the strength of a correlation coefficient, we found no common consensus on the scale descriptors. A useful published example of descriptors might be: 0.0-0.2, very weak or negligible; 0.2-0.4, weak or low; 0.4-0.7, moderate; 0.7-0.9, strong, high, or marked; 0.9-1.0, very strong or very high [3].
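The three-step workflow and the descriptor scale translate directly into code. In the sketch below, the paired measurements are hypothetical, and scipy is assumed to be available:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical paired measurements for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.3, 6.9, 8.4]

plt.scatter(x, y)   # step 1: always plot the raw data first
plt.show()          # step 2: inspect the plot for a (linear) relationship

# Step 3: if the relationship looks linear, calculate r.
r, p = stats.pearsonr(x, y)        # for normally distributed data
rho, p_s = stats.spearmanr(x, y)   # rank-based, for ordinal or non-normal data

def strength(r: float) -> str:
    """Verbal descriptor following the published scale quoted above [3]."""
    a = abs(r)
    if a < 0.2:
        return "very weak or negligible"
    if a < 0.4:
        return "weak or low"
    if a < 0.7:
        return "moderate"
    if a < 0.9:
        return "strong, high, or marked"
    return "very strong or very high"

print(r, rho, strength(r))
```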
Plotting data sets in scatterplots (Figs. 7A-7F) permits us to evaluate the data visually, and we can predict the outcome of an analysis of the correlation coefficients. The data in Figure 7A would have a good correlation, which is supported by a Pearson's test yielding an r value of 0.864 (strong correlation). Figures 7B and 7C are obviously linear and have r values of 0.99 and -0.99, respectively (very strong correlation). Figure 7D is somewhat ambiguous; however, r is equal to -0.549, and thus a moderate correlation exists. The data in Figure 7E are clearly related, but because the relationship is nonlinear, r is equal to 0.078. Even Figure 7F, with an r value of 0.247, has a higher correlation coefficient than Figure 7E. A look at the correlation values alone would suggest that the data in Figure 7E had no relationship, whereas the data actually have an interesting relationship that is immediately visible in the scatterplot.
Fig. 7A-7F. —Scatterplots for six data sets show different data distributions. Pearson's product moment correlation coefficients for the data sets are 0.864 (A), 0.991 (B), -0.992 (C), -0.549 (D), 0.078 (E), and 0.247 (F).
When the scatterplot of the data for two variables looks as if a linear relationship exists, it is tempting to describe the relationship as linear and to quantify it using linear regression. This approach relates a dependent variable (y) to an independent variable (x), which yields the familiar y = mx + b, where m is the slope of the line and b is the y-intercept (the value of y when x = 0). Our hypothetic example shows the plot of raw data, regression line, and 95% confidence limits (Fig. 8). The difference between correlation and regression is that in a correlation neither variable is fixed, whereas in regression one measurement (y) depends on the other (x). Often, the value of x is assumed to be fixed and observable without error, and y is assumed to be normally distributed about the regression line. Should there be no logical argument to define one variable as dependent and the other as independent, then the solution is to use a calculation of correlation and avoid the concept of dependence altogether. The importance of confidence limits should not be underestimated, either here with regression [4] or elsewhere with statements of sensitivity and specificity [5] or proportions and rates. For example, if we claim no side effects from contrast injections in 20 patients (rate = 0%), the upper 95% confidence limit of the rate of occurrence is actually 19%.
Fig. 8. —Graph shows hypothetic data set (•) with linear regression (solid line) and 95% confidence intervals (dashed lines) plotted. Note that confidence intervals permit appreciation of strength of regression. r² = 0.927, slope m = 1.28, and x-intercept = -0.286.
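A regression sketch with scipy (the data are hypothetical, and the confidence interval shown is for the slope, a standard construction from the slope's standard error rather than anything specific to the article):

```python
from scipy import stats

# Hypothetical (x, y) pairs; y is treated as the dependent measurement.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.1, 3.3, 3.9, 5.2, 5.8, 7.1, 7.9]

res = stats.linregress(x, y)
print(res.slope, res.intercept, res.rvalue ** 2)  # m, b, and r^2 for y = m*x + b

# Approximate 95% CI for the slope: slope +/- t * SE, with n - 2 degrees of freedom.
t = stats.t.ppf(0.975, len(x) - 2)
print(res.slope - t * res.stderr, res.slope + t * res.stderr)
```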
Sensitivity and specificity are ratios fundamental to the radiology discipline. They describe the ability of an imaging technique to reveal disease when present (sensitivity) and to rule out disease when absent (specificity). The numbers are generated using the familiar 2 × 2 table, which we have seen previously in this series [6] for proportions, to compare a diagnostic determination (presence or absence of disease) with a standard of reference. The better the latter (e.g., surgical confirmation), the more valuable and accurate the diagnostic measurement will be. Although the analysis of a 2 × 2 contingency table has been shown previously in this series of articles, we will use the example in Table 6 to calculate these values. Sensitivity is a / (a + c), which is equal to 653/730 or 89%; specificity is d / (b + d), which is equal to 1400/1537 or 91%. Missing from most reports in the radiology literature is the confidence interval based on the binomial distribution [7]. There are a few key questions to consider when evaluating sensitivity and specificity values: Was there an independent and blind comparison with the standard of reference? Was the diagnostic test evaluated in a group of patients appropriate to the target population? Was the standard of reference applied regardless of the diagnostic test result [7]? Both negative and positive predictive values can also be calculated from the 2 × 2 table, as can prevalence, pre- and posttest odds, likelihood ratios, and posttest probability. Usually, these statistical measurements are portrayed in simple tables or in the text of an article. It is useful to show the full 2 × 2 contingency table because the reader can then calculate all these values. Even when the 2 × 2 table is expanded into a receiver operating characteristic analysis (to be described later in the series), the relevant measure (usually the area under the curve) can be expressed in table format with the appropriate confidence intervals.
TABLE 6 Sample Contingency Table
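The calculations, together with exact binomial (Clopper-Pearson) confidence intervals of the kind the text notes are usually missing, are sketched below. The cell values a and d and the column totals are taken from the text; b = 137 and c = 77 follow arithmetically from those totals:

```python
from scipy import stats

# 2 x 2 contingency table:
#                  disease present   disease absent
# test positive        a = 653           b = 137
# test negative        c =  77           d = 1400
a, b, c, d = 653, 137, 77, 1400

sensitivity = a / (a + c)   # 653/730  ~ 0.89
specificity = d / (b + d)   # 1400/1537 ~ 0.91

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact binomial confidence interval for k successes in n trials."""
    lo = stats.beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = stats.beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

print(sensitivity, clopper_pearson(a, a + c))
print(specificity, clopper_pearson(d, b + d))
```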
The purpose of this article was to define the different variables that radiologists routinely use to describe their data. Categoric and continuous data types were identified, and suitable graphs and tables were shown to depict the findings in an informative and succinct manner. Continuous data and measures of central tendency and dispersion were shown. The relationship between two variables was determined by the correlation coefficient with a consideration of the caveat that correlation should not be confused with causality. The familiar 2 × 2 contingency table and derived values were explored. Identifying variable types and choosing their appropriate displays should be a more straightforward task after studying these examples.
Address correspondence to S. J. Karlik.
Series editors: Craig A. Beam, C. Craig Blackmore, Stephen J. Karlik, and Caroline Reinhold.
This is the eighth in the series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site (www.acr.org).
Project Coordinator: Bruce J. Hillman, Chair, ACR Commission on Research and Technology Assessment.
1. Kurtzke JF. On the evaluation of disability in multiple sclerosis. Neurology 1998; 50:1961-1970
2. Clarke GM. Statistics and experimental design. London: Edward Arnold, 1994:7
3. Rowntree D. Statistics without tears. London: Penguin, 1991:170
4. Glanz SA. Primer of biostatistics. New York: McGraw-Hill, 1992:211
5. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ 1999; 318:1322-1323
7. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine. New York: Churchill Livingston, 1997:118-128