Dr. James Manos (MD)
August 5, 2018
Statistics at a Glance
How to read a scientific study
Image (free to use): scatterplot ms. Uploaded by the user Marius xplore (December 15, 2006). Source: Wikipedia. Link: https://commons.wikimedia.org/wiki/File:Scatterplot.jpg
A clinical trial is an experiment performed on humans to evaluate the comparative efficacy of two or more therapies (20).
Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function (41).
Sensitivity (also called the true positive rate or, in some fields, the recall) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition) (41). It relates to the ability of the test to identify positive results. It is equal to the number of true positive subjects / (the number of true positive subjects + the number of false-negative subjects) (1), (3).
Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition) (41). It relates to the ability of the test to identify negative results. It is equal to the number of true negative subjects / (the number of true negative subjects + the number of false-positive subjects) (1), (3).
Sensitivity quantifies avoiding false negatives, as specificity does for false positives (41).
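The two formulas above can be sketched in a few lines of Python. The counts below are hypothetical numbers chosen only for illustration, and the function and variable names are assumptions, not taken from any of the cited sources.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people with the condition whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people without the condition whom the test flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening results (illustrative numbers only).
tp, fn = 90, 10    # 100 people who have the condition
tn, fp = 160, 40   # 200 people who do not have the condition

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"Sensitivity: {sens:.2f}")
print(f"Specificity: {spec:.2f}")
```

With these made-up counts, the test misses 10 of 100 sick people and wrongly flags 40 of 200 healthy people, which is what the two proportions summarize.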
Prevalence or prevalence proportion in epidemiology is the proportion of a population with a condition (typically a disease or a risk factor such as smoking or seatbelt use). It is arrived at by comparing the number of people found to have the condition with the total number of people studied and is expressed as a fraction, as a percentage, or as the number of cases per 10,000 or 100,000 people (35). Prevalence is the total number of disease cases in a given population at a specific time (36).
Point prevalence is the proportion of a population with the condition at a specific time. Period prevalence is the proportion of a population that has the condition at some time during a given period (‘12-month prevalence’ etc.) and includes people who already have the condition at the start of the study period and those who acquire it during that period. Lifetime prevalence (LTP) is the proportion of a population that, at some point in their life (up to the time of assessment), has experienced the condition. Prevalence estimates are used by epidemiologists, healthcare providers, government agencies, and insurers (35).
Incidence measures the risk of developing some new condition within a specified period. Although sometimes loosely expressed simply as the number of new cases during some period, it is better expressed as a proportion or a rate with a denominator (37). Incidence is the extent or frequency of occurrence, especially the number of new disease cases in a population over a period (38).
Incidence proportion (also known as cumulative incidence) is the number of new cases within a specified period divided by the size of the population initially at risk. For example, if a population initially contains 1,000 non-diseased persons and 28 develop a condition over two years of observation, the incidence proportion is 28 cases per 1,000 persons, i.e., 2.8% (37).
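The worked example above (28 new cases among 1,000 initially non-diseased persons) can be reproduced with a short Python sketch. The function name is an assumption made for illustration.

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """New cases in the period divided by the population initially at risk."""
    return new_cases / population_at_risk


# Numbers from the example in the text: 28 new cases among 1,000 persons over two years.
ip = incidence_proportion(28, 1000)
print(f"{ip * 1000:.0f} cases per 1,000 persons ({ip:.1%})")
```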
The mortality rate measures the number of deaths (in general or due to a specific cause) in a population, scaled to the size of that population per unit of time. The mortality rate is typically expressed in units of deaths per 1,000 individuals per year; thus, a mortality rate of 9.5 (out of 1,000) in a population of 1,000 would mean 9.5 deaths per year in that entire population or 0.95% out of the total (39).
In epidemiology, the term morbidity rate can refer to the incidence or prevalence of a disease or medical condition. This measure of sickness is contrasted with the mortality rate of a condition, which is the proportion of people dying during a given time interval (40).
Standard deviation is a widely used measurement of variability or diversity in statistics. It shows the variation or ‘dispersion’ from the average (mean or expected value). A low standard deviation indicates that the data points tend to be very close to the mean. A high standard deviation indicates that the data are spread over a broad range of values (4).
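As a small illustration of low versus high standard deviation, the Python sketch below compares two made-up datasets that share the same mean. It uses the population standard deviation from the standard library; the data values are hypothetical.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

tight_sd = statistics.pstdev(tight)  # population standard deviation
wide_sd = statistics.pstdev(wide)

print(f"tight: mean={statistics.mean(tight)}, SD={tight_sd:.2f}")
print(f"wide:  mean={statistics.mean(wide)}, SD={wide_sd:.2f}")
```

When the data are a sample rather than the whole population, `statistics.stdev` (which divides by n - 1) is the usual choice instead.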
The odds ratio measures effect size, describing the strength of association or non-independence between two binary data values. It is used as a descriptive statistic and is essential in logistic regression. Unlike other measures of association for paired binary data, such as the relative risk, the odds ratio treats the two variables being compared symmetrically and can be estimated using some non-random samples (21).
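A minimal Python sketch of the odds ratio for a 2×2 table follows. The cell counts are hypothetical, and the layout (rows are exposure, columns are outcome) is an assumption made for the example. The second call swaps the roles of the two variables to show the symmetry mentioned above.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]], computed as (a*d) / (b*c)."""
    return (a * d) / (b * c)


# Hypothetical table (illustrative numbers only):
#               outcome   no outcome
# exposed          30         70
# unexposed        10         90
or_exposure = odds_ratio(30, 70, 10, 90)

# Transposing the table (treating the outcome as the row variable) gives the same value.
or_transposed = odds_ratio(30, 10, 70, 90)

print(f"Odds ratio: {or_exposure:.2f}")
print(f"Odds ratio with the variables swapped: {or_transposed:.2f}")
```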
The hazard ratio (HR) in survival analysis is the effect of an explanatory variable on the hazard or risk of an event. For a less technical definition, we may consider the HR to be an estimate of relative risk. The instantaneous hazard rate is the limit of the number of events per unit of time divided by the number at risk as the time interval decreases (29).
Bayes factor in statistics is a Bayesian alternative to classical hypothesis testing. The Bayesian model comparison is a method of model selection based on Bayes factors (30).
A confidence interval (CI) in statistics is an interval of values bounded by confidence limits within which the true value of a population parameter is stated to lie with a specified probability (23). The confidence interval is a kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e., it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient.
A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then, taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that includes the true value of the parameter in the proportion of cases set by the confidence level.
Specifically, the meaning of the term confidence level is that if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the real value of the parameter will approximately match the confidence level; this is guaranteed by the reasoning underlying the construction of confidence intervals. A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data obtained (22).
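The repeated-experiment reading of the confidence level can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the text: a normally distributed population, a known standard deviation (so a simple z-interval with 1.96 applies), and arbitrary values for the mean, sample size, and random seed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 100.0   # hypothetical population mean
SIGMA = 15.0        # population standard deviation, treated as known
N = 25              # sample size per simulated experiment
Z_95 = 1.96         # normal quantile for a two-sided 95% interval
TRIALS = 2000       # number of simulated experiments

half_width = Z_95 * SIGMA / N ** 0.5

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    # Does this experiment's interval contain the true mean?
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95. Each individual interval either contains the true mean or does not; the 95% describes the procedure over many repetitions.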
The relative risk in statistics and epidemiology is the risk of an event (or of developing a disease) relative to exposure. Relative risk is a ratio of the probability of the event occurring in the exposed group versus a non-exposed group (26).
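As a sketch of this ratio, the Python snippet below computes the relative risk from hypothetical counts in an exposed and a non-exposed group. All numbers and names are illustrative assumptions.

```python
def relative_risk(exposed_events: int, exposed_total: int,
                  unexposed_events: int, unexposed_total: int) -> float:
    """Risk in the exposed group divided by risk in the non-exposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 non-exposed develop the disease.
rr = relative_risk(30, 100, 10, 100)
print(f"Relative risk: {rr:.2f}")
```

A relative risk above 1 means the event is more likely in the exposed group; a value of 1 means no difference between the groups.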
The weighted mean is like the arithmetic mean, the most common type of average, where instead of each data point contributing equally to the final average, some data points add more than others. The notion of weighted mean plays a role in descriptive statistics and occurs more generally in several other areas of mathematics.
If all the weights are equal, the weighted mean is the same as the arithmetic mean. While weighted means generally behave similarly to arithmetic means, they do have a few counter-intuitive properties, as captured, for instance, in Simpson’s paradox. The term weighted average usually refers to the weighted arithmetic mean. Still, weighted versions of other means can also be calculated, such as the weighted geometric mean and the weighted harmonic mean (31).
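A minimal sketch of the weighted arithmetic mean, with made-up values and weights, shows both the general case and the equal-weights case described above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: average scores of two classes with 20 and 30 students.
scores = [80.0, 90.0]
class_sizes = [20, 30]

wm = weighted_mean(scores, class_sizes)
equal = weighted_mean(scores, [1, 1])  # equal weights reduce to the arithmetic mean

print(f"Weighted mean: {wm}")
print(f"Equal-weight mean: {equal}")
```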
The weighted mean difference (WMD): In a meta-analysis, information to be pooled can either be dichotomous (how many patients die, say, out of a total number) or continuous (the mean cholesterol was X mmol/L, with some estimate of variance). We need to combine measures for continuous variables, where each group's mean, standard deviation, and sample size are known. The weight given to each study (how much influence each study has on the overall results of the meta-analysis) is determined by the precision of its estimate of effect and, in the statistical software in RevMan and CDSR, is equal to the inverse of the variance.
This method assumes that all the trials have measured the outcome on the same scale. The weighted mean could be calculated for groups before and after an intervention (like blood pressure lowering). The weighted mean difference would be the difference between start and finish values. For this, though, the difference would usually be calculated not as the difference between the overall start value and the overall final value but rather as the sum of the differences in the individual studies, weighted by the individual variances for each study.
Precision is not the only way of calculating a weighted mean or weighted mean difference. Another, more straightforward approach is to weight by the number of participants in the study. This is a defense against giving undue weight to small studies of low variance, where there may have been less than robust data treatment, and people could have cheated (32).
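The two weighting schemes described above can be compared in a short Python sketch. The study values are invented for illustration, and the code is a simplified fixed-effect calculation, not a reproduction of what RevMan or any other software does internally.

```python
# Hypothetical studies: (mean difference in mmHg, variance of that difference, sample size).
studies = [
    (-5.0, 4.0, 50),
    (-3.0, 1.0, 200),
    (-8.0, 16.0, 20),
]

# Inverse-variance weighting: more precise studies (smaller variance) count for more.
iv_weights = [1.0 / var for _, var, _ in studies]
wmd_inverse_variance = (
    sum(w * diff for w, (diff, _, _) in zip(iv_weights, studies)) / sum(iv_weights)
)

# Sample-size weighting: larger studies count for more, regardless of variance.
n_weights = [n for _, _, n in studies]
wmd_sample_size = (
    sum(w * diff for w, (diff, _, _) in zip(n_weights, studies)) / sum(n_weights)
)

print(f"WMD (inverse-variance weights): {wmd_inverse_variance:.2f}")
print(f"WMD (sample-size weights):      {wmd_sample_size:.2f}")
```

In both cases the pooled estimate is pulled toward the second study, which is both the most precise and the largest in this invented data set.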
The control group is a set of items or people that serve as a standard or reference for comparison with an experimental group. A control group is like the experimental group in number and is identical in specified characteristics, such as sex, age, annual income, parity, or other factors, but does not receive the experimental treatment or intervention (14).
A placebo is an inactive substance, such as saline solution, distilled water, or sugar, or a less than effective dose of a harmless substance, such as a water-soluble vitamin, prescribed as if it were an effective dose of needed medication. Placebos are used in experimental drug studies to compare the effects of the inactive substance with those of the investigational drug. They are also prescribed for patients who cannot be given the medication they request or who, in the healthcare provider's judgment, do not need it (15).
Nocebo reaction or response refers to harmful, unpleasant, or undesirable effects the subject manifests after receiving an inert dummy drug or placebo (see above). Nocebo responses are not chemically generated and are due only to the subject’s pessimistic belief and expectation that the inert drug will produce negative consequences. In these cases, there is no ‘real’ drug involved. However, the adverse effects of administering the inert drug, which may be physiological, behavioral, emotional, and/or cognitive, are nonetheless real (17).
Case-control (retrospective) studies are studies in which the study group consists of subjects with the disease (e.g., lung cancer). The control group includes those without the disease. The previous occurrence of the putative cause (e.g., smoking tobacco) is compared between the groups. Case-control studies are retrospective in that they start after the onset of the disease (although cases may be collected prospectively) (2). A retrospective study is an epidemiologic study in which participating individuals are classified as either having some outcome (cases) or lacking it (controls); the outcome may be a specific disease, and the persons’ histories are examined for specific factors associated with that outcome. Cases and controls are often matched with respect to specific demographic or other variables but need not be.
Compared to prospective studies, retrospective studies suffer from drawbacks: certain important statistics cannot be measured, and large biases may be introduced in selecting controls and recalling past exposure to risk factors. The advantages of the retrospective study are its small scale, its usually brief time for completion, and its applicability to rare diseases, which would require a review of very large cohorts in prospective studies (18).
Cohort (prospective) studies are studies in which the study group consists of subjects exposed to the putative cause (e.g., smoking tobacco). The control group includes non-exposed subjects. The incidence of the disease is compared between the groups over time.
A cohort study generates incidence data, whereas a case-control study does not (2).
A prospective study is an epidemiologic study in which the groups of individuals (cohorts) are selected by factors to be examined for possible effects on some outcome. For example, the impact of exposure to a specific risk factor for the eventual development of a particular disease can be studied. The cohorts are then followed over a period to determine the incidence rates of the investigated outcomes as they relate to the primary factors in question (19).
Evidence-based medicine (EBM) aims to apply the best available evidence gained from the scientific method to clinical decision-making. It seeks to assess the strength of evidence of the risks and benefits of treatments (including lack of treatment) and diagnostic tests. This helps clinicians to understand whether a treatment will do more good than harm.
Evidence quality can be assessed based on the source type (from meta-analyses and systematic reviews of triple-blind, randomized, placebo-controlled clinical trials) and other factors, including statistical validity, clinical relevance, currency, and peer-review acceptance. EBM recognizes that many aspects of health care depend on individual factors such as quality and value of life judgments, which are only partially subject to scientific methods. EBM seeks to clarify those parts of medical practice that are, in principle, subject to scientific methods and to apply these methods to ensure the best prediction of outcomes in medical treatment, even as debate continues about which results are desirable (16).
Evidence-based medicine recommendations, according to the US Preventive Services Task Force, can be categorized into the following levels:
· Level A: good scientific evidence suggests that the clinical service's benefits substantially outweigh the potential risks. Clinicians should discuss the service with eligible patients.
· Level B: there is at least fair scientific evidence suggesting that the benefits of the clinical service outweigh the potential risks. Clinicians should discuss the service with eligible patients.
· Level C: there is at least fair scientific evidence suggesting that there are benefits provided by the clinical service. However, the balance between benefits and risks is too close for making general recommendations. Clinicians need not offer it unless there are individual considerations.
· Level D: there is at least good scientific evidence suggesting that the risks of the clinical service outweigh the potential benefits. Clinicians should not routinely offer the service to asymptomatic patients.
· Level I: scientific evidence is lacking, of poor quality, or conflicting, so the balance of risks and benefits cannot be assessed. Clinicians should help patients understand the uncertainty surrounding the clinical service (16).
A randomized controlled trial (RCT) is a type of scientific experiment, a form of clinical trial, most commonly used in testing the safety (or more specifically, information about adverse drug reactions and adverse effects of other treatments) and efficacy or effectiveness of healthcare services, such as medicine or nursing, or health technologies. In an RCT, the study subjects are randomly allocated to receive one or other of the alternative treatments under study after assessment of eligibility and recruitment but before the intervention begins. Random allocation in real trials is complex.
After randomization, the two (or more) groups of subjects are followed up in precisely the same way, and the only differences between the care they receive (for example, regarding procedures, tests, outpatient visits, follow-up calls, etc.) should be those intrinsic to the treatments being compared. The most important advantage of proper randomization is that it minimizes allocation bias, balancing known and unknown prognostic factors in the assignment of treatments. The terms ‘RCT’ and ‘randomized trial’ are often used synonymously. Still, some authors distinguish between ‘RCTs,’ which compare treatment groups with control groups not receiving treatment (as in a placebo-controlled study), and ‘randomized trials,’ which can compare multiple treatment groups.
An RCT may be blinded (see below) by procedures that prevent study participants, caregivers, or outcome assessors from knowing which intervention was received. Blinding is sometimes inappropriate or impossible to perform in an RCT; for example, if an RCT involves a treatment in which active participation of the patient is necessary (e.g., physiotherapy), participants cannot be blinded to the intervention. Traditionally, blinded RCTs have been classified as single-blind, double-blind, or triple-blind. However, these terms may have different meanings for different people. RCTs without blinding are called ‘unblinded,’ ‘open,’ or (if the intervention is a medication) ‘open label’ (8).
Matching. An association between A and B may be due to another factor P. To eliminate this possibility, matching for P is often used in case-control studies. One powerful (but unreliable, if numbers are small) way to do this in clinical trials is to allocate subjects to groups randomly and then check that the important Ps have been distributed evenly between the groups (2).
Overmatching. For example, if unemployment causes low income, and low income causes depression, then matching study and control groups for income would mask the genuine causal link between unemployment and depression. Avoid matching factors that may intervene in the causal chain linking A and B (2).
Blinding. The trial is single-blind if the subject does not know which of the two trial treatments is administered. To further reduce the risk of bias, the experimenter should also not know. This is a double-blind trial. In a proper trial, ‘the blind lead the blind’ (2).
In a single-blind experiment, the subjects do not know whether they are so-called ‘test’ subjects or members of an ‘experimental control’ group. The single-blind experimental design is used either where the experimenters must know the full facts, and so cannot be blind, or where the experimenters will not introduce further bias, and so need not be blind. However, there is a risk that subjects are influenced by interaction with the researchers, which is known as the experimenter’s bias. In a double-blind experiment, neither the individuals nor the researchers know who belongs to the control and experimental groups. Only after all the data have been recorded (and, in some cases, analyzed) do the researchers learn which individuals are which.
Performing an experiment in a double-blind fashion can lessen the influence of prejudices and unintentional physical cues on the results (the placebo effect, the observer bias, and the experimenter’s bias). Random assignment of the subject to the experimental or control group is a critical part of a double-blind research design. A third party keeps the key that identifies the subjects and which group they belong to, and this key is not given to the researchers until the study is over (7).
A triple-blind clinical trial (or other experiment) is one in which neither the subject, nor the person administering treatment, nor the person evaluating the response to treatment knows which subjects are receiving a particular treatment or lack of treatment (9).
Bias (in statistics). A statistic is biased if calculated to be systematically different from the population parameter of interest. In testing a statistical hypothesis, a test is considered unbiased when the probability of rejecting the null hypothesis is less than or equal to the significance level when the null hypothesis is correct.
Some types or aspects of bias which should not be considered mutually exclusive include:
a) Selection bias, also called Berksonian bias, occurs when individuals or groups are more likely to participate in a research project than others, resulting in biased samples.
b) Spectrum bias occurs from evaluating diagnostic tests on biased patient samples, leading to overestimating the test's sensitivity and specificity (see above).
c) The bias of an estimator is the difference between an estimator's expectation and the real value of the estimated parameter.
d) The omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification is incorrect in that it omits an independent variable that should be in the model.
e) Detection bias occurs where a phenomenon is more likely to be observed and/or reported for a particular set of study subjects.
For example, the syndemic (the aggregation of 2 or more diseases in a population in which there is some level of positive biological interaction that exacerbates the adverse health effects of any or all of the conditions (6)) involving obesity and diabetes may mean that doctors are more likely to look for diabetes in obese patients than in less overweight patients, leading to an inflation in diabetes among obese patients because of skewed detection efforts.
f) Funding bias may lead to selecting outcomes, test samples, or test procedures that favor a study’s financial sponsor.
g) Reporting bias involves a skew in data availability, such that observations of a particular kind may be more likely to be published and used in research.
h) Data-snooping bias arises from the misuse of data mining techniques (5).
A meta-analysis (in statistics) combines the results of several studies that address a set of related research hypotheses. In its simplest form, this is usually done by identifying a standard measure of effect size, for which a weighted average might be the output of the meta-analysis. Here the weighting might be related to sample sizes within the individual studies.
More generally, other differences between the studies need to be allowed for. Still, the general aim of a meta-analysis is to estimate more powerfully the real ‘effect size’ as opposed to a less precise ‘effect size’ derived in a single study under a given unique set of assumptions and conditions. Meta-analyses are often, but not always, essential components of a systematic review procedure (see below).
In the Cochrane Collaboration, meta-analysis refers to statistical methods of combining evidence, leaving other aspects of ‘research synthesis’ or ‘evidence synthesis,’ such as combining information from qualitative studies, for the more general context of systematic reviews (10).
Meta-analysis is a systematic method that uses statistical techniques to combine results from different studies to obtain a quantitative estimate of the overall effect of a particular intervention or variable on a defined outcome. It produces a stronger conclusion than can be provided by any individual study (12).
A systematic review is a literature review focused on a research question that tries to identify, appraise, select, and synthesize all high-quality research evidence relevant to that question. Systematic reviews of high-quality RCTs (randomized controlled trials; see above) are crucial to EBM (evidence-based medicine). Understanding systematic reviews and how to implement them in practice is becoming mandatory for all professionals involved in the delivery of healthcare (11).
PubMed is a free database that primarily accesses the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States (US) National Library of Medicine (NLM) at the National Institutes of Health (NIH) maintains the database as part of the Entrez information retrieval system (27).
MEDLINE (Medical Literature Analysis and Retrieval System Online) is a bibliographic database of life sciences and biomedical information. It includes bibliographic information for articles from academic journals covering medicine, nursing, pharmacy, dentistry, veterinary medicine, and healthcare. MEDLINE also covers literature in biology, biochemistry, and molecular evolution. MEDLINE is freely available online and searchable via PubMed and NLM's National Center for Biotechnology Information’s Entrez system (28).
The Cochrane Collaboration is a project coordinated in 10 centers worldwide and carried out by thousands of volunteers, which searches the world medical literature on randomized controlled trials (RCTs; see above) and, together with all the unpublished trials that can be located, publishes the findings in an electronic form (13).
The number needed to treat (NNT) is an epidemiological measure used in assessing the effectiveness of a healthcare intervention, typically a treatment with medication. The NNT is the average number of patients who need to be treated to prevent one additional adverse outcome (i.e., the number of patients who need treatment for one to benefit compared with a control in a clinical trial). It is defined as the inverse of the absolute risk reduction.
The ideal NNT is 1, where everyone improves with treatment, and no one improves with control. The higher the NNT, the less effective the treatment. Variants are sometimes used for more specialized purposes. One example is the number needed to vaccinate. NNT values are time specific. For example, if a study ran for 5 years and the NNT was found to be 100 over this 5-year period, the NNT would have to be multiplied by 5 to obtain the NNT for a one-year period (in this example, the one-year NNT would be 500) (24).
Absolute risk reduction (risk difference) in epidemiology is the decrease in the risk of a given activity or treatment in relation to a control activity or treatment. It is the inverse of the number needed to treat (see above).
For example, we may consider a hypothetical drug that reduces the relative risk of colon cancer by 50% over five years. Even without the medication, colon cancer is relatively rare, maybe 1 in 3,000 in every five-year period. The rate of colon cancer for a five-year treatment with the drug is, therefore, 1/6,000, as by treating 6,000 people with the medication, one can expect to reduce the number of colon cancer cases from 2 to 1. In general, absolute risk reduction is usually computed with respect to two treatments A and B, with A typically a drug and B a placebo (in the example above, A is a 5-year treatment with the hypothetical drug, and B is treatment with a placebo, i.e., no treatment). A defined endpoint must be specified (in the above example: the appearance of colon cancer in the five-year period).
If the probabilities pA and pB of this endpoint under treatments A and B are known, then the absolute risk reduction is computed as (pB – pA). If a clinical endpoint is devastating enough (e.g., death, heart attack), drugs with a low absolute risk reduction may still be indicated in particular situations. If the endpoint is minor, health insurers may decline to reimburse medications with a small absolute risk reduction (25).
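The colon cancer example can be worked through in a few lines of Python. The probabilities come from the example in the text; the variable names are assumptions, and the NNT is computed as the inverse of the absolute risk reduction.

```python
# Five-year probabilities of colon cancer from the example in the text.
p_b = 1 / 3000   # under placebo (treatment B)
p_a = 1 / 6000   # under the hypothetical drug (treatment A)

absolute_risk_reduction = p_b - p_a
relative_risk_reduction = absolute_risk_reduction / p_b
number_needed_to_treat = 1 / absolute_risk_reduction

print(f"Absolute risk reduction: {absolute_risk_reduction:.6f}")
print(f"Relative risk reduction: {relative_risk_reduction:.0%}")
print(f"Number needed to treat:  {number_needed_to_treat:.0f}")
```

The contrast between the two reductions is the point of the example: a 50% relative risk reduction sounds large, yet about 6,000 people must be treated for five years to prevent one case.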
Analysis of variance (ANOVA) in statistics is a collection of statistical models and their associated procedures in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. ANOVA is a statistical test used to examine differences among two or more groups by comparing the variability between the groups with the variability within the groups (34).
In its simplest form, ANOVA provides a statistical test of whether the means of several groups are all equal and therefore generalizes the t-test to more than two groups.
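The comparison of between-group and within-group variability can be sketched as a one-way ANOVA F statistic in plain Python. The three groups below are invented data, and the function name is an assumption; turning F into a p-value would additionally require the F distribution, which is left out here.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical measurements in three groups (illustrative numbers only).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the scatter within the groups would suggest.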
Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. (A t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution if the null hypothesis is true. A type I error, also known as an error of the first kind, an α error, or a false positive, is the error of rejecting a correct null hypothesis (H0).) For this reason, ANOVAs are useful in comparing two, three, or more means (33).
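The inflation of the type I error rate can be illustrated with a rough calculation. The sketch below assumes a significance level of 0.05 and treats the pairwise tests as independent, which is a simplification (pairwise t-tests on the same groups are correlated), so the figures are approximate.

```python
def pairwise_comparisons(num_groups: int) -> int:
    """Number of distinct two-sample comparisons among the groups."""
    return num_groups * (num_groups - 1) // 2


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests


for k in (2, 3, 5):
    m = pairwise_comparisons(k)
    print(f"{k} groups: {m} t-tests, error rate about {familywise_error_rate(m):.3f}")
```

With three groups, the chance of at least one false positive is already roughly 14% under these assumptions, compared with 5% for a single test.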
Thanks for reading!
Reference – Bibliography
1. Epidemiology, p. 665, Longmore M., Wilkinson I.B, Davidson E.D., Foulkes A., Mafi A.R., Oxford Handbook of Clinical Medicine, Oxford Medical Publications, 8th edition, 2010.
2. Screening, p. 487, Collier J., Longmore M., Brinsden M., Oxford Handbook of Clinical Specialties, Oxford Medical Publications, 7th edition, 2006.
Reference – Links
(Retrieved: February 1, 2016)