Who remembers the mammography study that was issued last year, and the firestorm that ensued over its conclusion that mammography for women in their 40s should be an individual decision between women and their doctor, rather than a blanket recommendation for all women in this age group?
There were cries that this was tantamount to rationing care. There were protestations that it would be taking away a valuable screening tool for breast cancer.
But if people went back and read the original study by the U.S. Preventive Services Task Force, they would have seen this wasn’t what the study was about. The study didn’t conclude that mammography was useless and that it should be scrapped. Nor did it address the cost-effectiveness of mammography; it was primarily an epidemiological study designed to measure the clinical benefits of mammography in reducing breast cancer deaths across the population of women in the U.S. And yes, it wasn’t without its limitations.
If there’s going to be an argument over a study, shouldn’t it at least address what the study actually says?
American readers these days seem to be awash in studies. Hardly a day goes by without a new report of better drugs and treatments, or a new recommendation on how to live a more healthful life. But as the USPSTF report on mammography demonstrates, it often takes careful reading and a critical eye to analyze the findings from a study and, more importantly, to understand what the study is about – as well as what it’s not about.
When it comes right down to it, the ability to dissect a study is mostly a learned skill. It’s a matter of knowing what to look for and what questions to ask.
For starters, there’s the size of the study. Did it involve 100 participants or 1,000? Was it conducted in multiple settings or at a single large institution? Numbers aren’t everything, of course; a small, well-designed study can be just as valid as a large study that’s poorly designed – perhaps even more so. Nevertheless, when a study is small or limited in scope, caution is called for in interpreting the results. It can be hard to reliably extrapolate the findings from these studies because there’s always a chance they won’t apply in a larger, more diverse setting. A good example is this study that measured patient preferences for seeing a doctor vs. a mid-level provider. The study was conducted among emergency room patients at three urban teaching hospitals – a reasonably good sample, but not broad enough to indicate whether patients in a non-emergency setting or in a rural setting might feel the same way.
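A rough way to see why sample size calls for caution: the statistical margin of error around a measured proportion shrinks only with the square root of the number of participants. The sketch below (a textbook approximation, not drawn from any particular study) shows how much wider the uncertainty is at 100 participants than at 1,000 or 10,000.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    measured in a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) at three sample sizes
for n in (100, 1000, 10000):
    print(f"n = {n:>6}: +/- {margin_of_error(0.5, n):.1%}")
# n =    100: +/- 9.8%
# n =   1000: +/- 3.1%
# n =  10000: +/- 1.0%
```

Going from 100 to 1,000 participants triples the precision; going from 1,000 to 10,000 triples it again. A finding of "60% of patients preferred X" in a 100-person study is statistically compatible with anything from roughly 50% to 70% – which is part of why small studies are hard to extrapolate.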
Study duration is another consideration. One of the best examples of a long-term study is the Framingham Heart Study, which began in 1948 and has yielded much of what we currently know about heart health and heart disease risk factors in the United States. This study carries considerable heft for both its size and its duration. The majority of studies, however, are fairly short-term, and it becomes correspondingly more difficult to draw conclusions, especially when the study involves a disease or an intervention with long-term implications. This is the weakness of many weight-loss studies, which might track how well the participants are able to lose weight over the course of, say, one year, yet fall short in evaluating how well they’re able to maintain their weight loss beyond the one-year mark.
How were the participants recruited? I blogged last month about a study conducted at Mayo Clinic to evaluate the feasibility and use of online patient care. It was an interesting study but it had one major drawback: The participants were specifically recruited, rather than being randomly selected. Most of them also were Mayo Clinic employees, which might have made them more savvy or more sensitive than the general population about using online care.
How was the information for the study collected? Did it come from medical records? Interviews? Patient or clinician surveys? Many studies are based on survey responses or some form of self-reporting, such as asking participants to keep a food diary. Survey data can sometimes be skewed, both by the questions that were asked and how they were worded. Surveys that rely on the participants to fill out and return a questionnaire tend to get responses from people who are highly motivated to participate, and might not reflect the attitude of the general public. Surveys that rely on self-reporting also can be tricky because self-reporting isn’t always accurate.
Computer modeling has become an increasingly common research tool. It can be valuable for analyzing large amounts of data and drawing conclusions that might otherwise be difficult to measure. These types of studies are highly reliant on number-crunching and mathematical formulas, however, and formulas can sometimes be flawed.
How were the terms and parameters of the study defined? The definitions are important because they help ensure that apples are being compared to apples. If you’re going to evaluate a type of treatment for heart disease, for instance, you might decide to focus your study on participants who haven’t yet had a heart attack so you can ensure some uniformity in the population being studied. Sometimes, though, the terms can be too exclusionary – and this can end up being reflected in the study’s findings. Here’s an example: A study is undertaken to measure physical activity in a community, and the authors decide to define “physical activity” as time spent at a gym or fitness center. This can unintentionally introduce a class bias against individuals who can’t afford to join a gym, or against those whose physically demanding occupations keep them active without one.
In any study, bias is an ever-present risk. Study authors can be biased, sometimes unintentionally so and sometimes because of overt conflicts of interest. The premise or hypothesis of a study can be crafted in such a way as to make the conclusion a foregone certainty. Selection bias can come into play in defining the terms of the study and recruiting the participants. And data dredging – running analysis after analysis on a dataset until something statistically significant turns up – can unwittingly amount to cherry-picking.
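Why data dredging produces false "findings" can be shown with a toy simulation (entirely hypothetical, not modeled on any real study): compare two groups drawn from the exact same distribution, over and over, and a predictable fraction of those comparisons will still cross the usual p < 0.05 significance threshold by chance alone.

```python
import random

random.seed(1)  # fixed seed so the simulation is repeatable

def null_comparison(n=50):
    """Simulate one 'study' comparing two groups drawn from
    the SAME distribution, so any difference is pure noise.
    Returns True if a crude z-test calls the difference
    'significant' at roughly p < 0.05."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_diff = sum(a) / n - sum(b) / n
    se = (2 / n) ** 0.5  # standard error of the difference when both sds = 1
    return abs(mean_diff / se) > 1.96

# Dredge through 1,000 comparisons where nothing real is going on
hits = sum(null_comparison() for _ in range(1000))
print(f"{hits} of 1000 null comparisons looked 'significant'")
```

Roughly 5% of the comparisons – about 50 out of 1,000 – come up "significant" even though there is nothing to find. A researcher (or a headline writer) who reports only the hits is cherry-picking noise.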
The best way to understand a study and how it was conducted is to go straight to the source – the study itself. This isn’t always feasible, though, and quite frankly, many studies are too difficult, boring or incomprehensible for the average reader to slog through. Many folks end up relying on news stories, which might be incomplete or might misinterpret the study’s findings altogether. When reading any news account of a study, there are several additional caveats to keep in mind: Does the language fit the evidence? Does the story hype a theory that’s unproven or premature?
Every study has some limitations – some more than others. This doesn’t necessarily mean it’s a bad study or that no study can be trusted. Studies, after all, are the backbone for much of what we know in medicine – which treatments work in which patients, which screenings are the most effective, which models seem to be the best for patient care, and so on. They can help reinforce theories. Often they inject much-needed facts into the discussion.
Each study, though, needs to be understood within the context of its scope, methods and findings. Studies might offer up information that’s valuable, or controversial, or inconclusive, or preliminary, but they rarely can capture more than one fragment of a very large and often fuzzy picture. And none can contain the final, authoritative word.
Photo: Wikimedia Commons