Why I’m over “Overdiagnosis”

#BadScienceOfTheDay:
Once again, a journal article claiming extremely high rates of breast cancer overdiagnosis (“up to 48%”) makes big news.
 
Here’s the problem: it’s a retrospective cohort study which calculated an overdiagnosis % based on a truly bizarre endpoint: the total number of “advanced” versus “non-advanced” breast cancers.
 
Now in any scientifically valid study, you would use a definition of “advanced” that involves lymph node and/or distant metastases – features that strongly correlate with overall survival. You might even do a secondary analysis with overall survival as an endpoint.
 
Nope. These clowns defined advanced as “radiographic primary tumor size >= 2cm”… then using a completely invalid endpoint they did a bunch of statistics to come up with a completely invalid conclusion. Garbage in, garbage out.
 
As if to illustrate the ridiculousness of their own study, the Danish researchers published several overdiagnosis estimates based on different statistical approaches. The two bottom-line numbers were 24.4% and 48.3%…
 
How can you have any faith in your statistics when they hand you two answers that differ by a factor of two? If I tried to sell you a car by saying that, depending on how you measure it, it either has 244 horsepower or 483 horsepower, you’d call me a fraud!
 
* * *
Here’s the problem with the entire concept of “overdiagnosis”:
 
1) Overdiagnosis is a synthetic endpoint that is effectively a derivative of a derivative. The degree of statistical modeling required to estimate overdiagnosis means that any errors, biases, or design flaws in the starting dataset will be enlarged by orders of magnitude. A tiny change in the statistical assumptions leads to a 2-fold difference in your final “endpoint” (see the sketch after this list).
 
2) Overdiagnosis is highly unreproducible, both within and between studies. If you look at literature reviews of “overdiagnosis in breast cancer”, the published values from top-tier journals range from ~5% to ~50%. That’s such a large range that it’s impossible to apply to real life.
 
3) Overdiagnosis is neither clinically apparent, nor is it provable or disprovable. No one can single out a patient and definitively prove that they were overdiagnosed, or they weren’t overdiagnosed. In my simplistic clinical mindset, that makes it more like faith healing than scientific medicine.
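
To make point 1 concrete, here’s a toy sketch of how the standard excess-incidence calculation magnifies a small change in assumptions. The numbers and the formula are my own illustrations, not the Danish study’s actual model:

```python
# Toy sketch of a generic overdiagnosis calculation (illustrative numbers,
# not the Danish study's actual model). "Expected" is the modeled incidence
# you assume would have occurred without screening -- itself a derived figure.
def overdiagnosis_pct(observed, expected):
    """Percent of detected cancers deemed 'excess' under the model."""
    return 100 * (observed - expected) / observed

observed = 1000  # cancers actually detected in the screened group
for expected in (900, 800, 700):  # three plausible-looking baseline assumptions
    print(f"expected={expected}: {overdiagnosis_pct(observed, expected):.1f}% overdiagnosed")
# expected=900: 10.0%, expected=800: 20.0%, expected=700: 30.0%
# Nudging the assumed baseline by 10-20% doubles or triples the "endpoint".
```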
 
* * *
The conceptual basis of overdiagnosis makes some sense in some cases. It is easy to imagine a bedridden 90-year-old being diagnosed with some tiny little cancer that probably won’t kill them.
 
However, the statistical methods used to estimate overdiagnosis percentages are horribly unreliable, and they do a piss-poor job of helping make decisions in real life.
 
A physician with common sense can avoid scanning or treating the bedridden 90-year-old while still offering care to the rest of the patients, and he doesn’t need to quote an artificial “24% to 48%” number to back up his clinical judgement.

Why Neuropsych Studies are Big Liars

Bad Science Of The Day:

Why Big Liars Often Start Out as Small Ones

I came across this article in the “Science” section of the New York Times. It links to a Nature Neuroscience paper out of University College London, which amazingly enough appears to have free fulltext. Naturally, I pulled up the actual article and spent quite some time trying to make heads or tails of it. Sadly, it wasn’t worth the time.


The original article, as well as the NYT piece, makes the very plausible claim that the human brain desensitizes itself to dishonesty in the same way that you become desensitized to bad smells. So slimy corporate executives, crooked politicians, and hustling street vendors aren’t actually trying to lie and cheat. They’ve just gone nose-blind to the stink of their own deception.

That’s certainly a plausible hypothesis, and it passes the Bayesian common-sense test. The problem is, after reading the Nature Neuroscience article, I have a hard time washing away the stink of their poor methodology. It smells like an Unreproducible Neuropsych Study, suffering from many of their common Bad Habits:

* Very small n
* Really stretching it with experimental design
* Really stretching it with synthetic endpoints
* Running minimally-bothersome trial stimuli on subjects stuck in a highly-bothersome fMRI scanner
* Data-torturing statistical methods
* Shoehorning hard numerical data into a Touchy Feely Narrative

***
First of all, their subjects were 25 college students with an average age of 20. I can understand only having 25 subjects, as it’s not exactly cheap or easy to recruit people into fMRI neuropsych experiments. But they actually scanned 35 kids. Ten of them caught on to the trial design and were excluded.

Really? One-third of their subjects “figured out” the trial and had to be excluded? Actually, it was probably more; only one-third admitted to figuring out the trial design. For a study about deception, the researchers sure were terrible at deceiving their test subjects.

Alanis Morissette would be proud of the irony, as would Iron Deficiency Tony Stark.

***
The experimental design was questionable as well. The researchers used the Advisor-Estimator experiment, a commonly cited psychological model of Conflict of Interest.

Normally an advisor-estimator experiment involves a biased advisor (who is rewarded for higher estimates) assisting an unbiased estimator (who is rewarded for accurate estimates).

This is a great surrogate model for real-world conflicts of interest, like consultants who make more money if you are convinced to buy ancillary services. But it seems like a terrible surrogate for deception. As the experimenters themselves noted, there was no direct personal interaction between the subject and the estimator, no actual monetary stakes involved, and no risk of the subject being caught or punished for lying.
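
For concreteness, here’s a minimal sketch of that incentive structure. The payoff functions and amounts are my own illustrative assumptions, not the paper’s actual reward schedule:

```python
# Toy advisor-estimator payoffs (illustrative assumptions, not the paper's
# actual reward schedule): the advisor profits from inflating the estimate,
# the estimator from getting it right.
def advisor_payoff(reported_estimate, true_value):
    # The biased advisor earns more the further the estimate is skewed upward.
    return max(0.0, reported_estimate - true_value)

def estimator_payoff(reported_estimate, true_value):
    # The estimator is rewarded for accuracy: payoff shrinks with absolute error.
    return max(0.0, 10.0 - abs(reported_estimate - true_value))

true_value = 100.0
print(advisor_payoff(103.0, true_value))    # 3.0 -- a small bonus for skewing the estimate
print(estimator_payoff(103.0, true_value))  # 7.0 -- the estimator absorbs the error
```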

Worse yet, the magnitude of deception involved is incredibly small: skewing an estimate by a few pounds in the hopes of being paid a pound or two. That’s hardly any emotional manipulation of the subjects. I don’t know about British college kids, but I’d be much more emotionally disturbed by the fact that I’m stuck in an fMRI scanner.

Radiographic measurement, like photographic image quality, is all about signal-to-noise ratio. In this case, the emotional “signal” (distress caused by lying) is tiny compared to the ambient emotional “noise”.

***
Things get really silly when you read their composite endpoint, something called “Prediction beta”. It appears to be a statistical mess: a 2nd-order metric divided by a 2nd-order metric and averaged into something that resembles a correlation coefficient but is numerically less than 0.1.

Somehow this was statistically significant at p=0.021. But then you read that the authors also tested a crapload of other brain regions, and none of them were nearly as “predictive” as the amygdala. That’s a textbook case of multiple-comparisons data torturing, and it means that their p-values should have been Bonferroni’d into oblivion. The significance threshold shouldn’t have been 0.05, it should have been much, much lower.
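
Here’s what the Bonferroni arithmetic looks like; the count of tested regions below is a placeholder of mine, not a number from the paper:

```python
# Bonferroni correction: divide the significance threshold by the number of
# comparisons. The region count here is a placeholder, not the paper's figure.
alpha = 0.05
n_regions_tested = 20
corrected_threshold = alpha / n_regions_tested
print(corrected_threshold)           # 0.0025
print(0.021 < corrected_threshold)   # False -- the reported p = 0.021 doesn't survive
```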

***
When all is said and done, the authors should be congratulated for taking a common-sense anecdote (“Small lies lead to bigger ones”) and spending an immense amount of time and money coming up with super-unconvincing scientific data to back it up.

I imagine their next Amazing Rigorous Neuro-Psycho-Radiology trial will demonstrate, after testing twenty hypotheses with thirty different regressions, a borderline-statistically-significant correlation between insufficient parental affection and abusive bullying behavior.

Bullcrap like this is why common-sense driven people are losing their faith in science.

Paging Dr. Hologram: Artificial Intelligence or Stupidity?

 

The Doctor (Star Trek: Voyager)

“Doctors Turn to Artificial Intelligence When They’re Stumped,” reports PBS. A dermatologist uses the Modernizing Medicine app to search for a drug to prescribe. A Microsoft researcher describes electronic health records as “large quarries where there’s lots of gold, and we’re just beginning to mine them”. Vanderbilt pharmacists build a computer system to “predict which patients were likely to need certain medications in the future”. CEOs, venture capitalists, and PhD researchers all agree: artificial intelligence is the future of medicine.

In the article, IBM’s Watson is even described as an “artificially intelligent supercomputer”, which sounds far more brilliant than its intended level of expertise of a “nurse” or “second year med student”. (This makes no sense either. A nurse is way smarter than a 2nd year med student unless your patient desperately needs to know about the Krebs cycle. Unless it’s a brand new nurse.)

A simple read-through of the PBS article might convince you that artificial intelligence really is on the cusp of taking over medicine. By the last few paragraphs, the PBS writers are questioning whether computers might be altogether more intelligent than humans, making “decisions” rather than “recommendations”. You’d be forgiven for believing that electronic health records (EHR) software is on the verge of becoming an Elysium Med-Pod, a Prometheus Auto-Surgeon, or, if you prefer the classics, a Nivenian AutoDoc.

“Machines will be capable, within twenty years, of doing any work that a man can do.”

Herbert A. Simon, The Shape of Automation for Men and Management, 1965

Reading between the lines gives a much clearer picture of the state of electronic clinical decision support (CDS) algorithms:

  • Dr. Kavita Mariwalla, an MD dermatologist treating real patients, uses AI to figure out what drugs to prescribe.
  • Dr. Joshua Denny, a PharmD treating real patients, uses AI to receive prescriptions and to anticipate what drugs may be prescribed.
  • Dr. Eric Horvitz, a PhD computer scientist at Microsoft, talks about mining your medical records for profit. Of course he would do it in a totally privacy-respecting, non-creepy, non-exploitative way.
  • Daniel Cane, an MBA CEO who sells software, suggests that it is easier for physicians to learn “what’s happening in the medical journals” by buying his software. (Because reading medical journals is just too difficult.)
  • Euan Thompson, a partner at a venture capital firm, suggests that artificial intelligence will make “the biggest quality improvements”, but only if people are willing to pay the “tremendous expense” involved.
  • Dr. Peter Szolovits, a PhD computer scientist, is optimistic about computers learning to make medical decisions, and his biggest concern is that the FDA would come down on them “like a ton of bricks” for “claiming to practice medicine.”

It isn’t hard to tell that the clinicians and the non-clinicians have very different views of medical AI.

 


Are Computers Really That Smart?
I’m sorry, Dave. I’m afraid I can’t do that.

The most useful programs in current-day medical practice are pharmacy-related. So when PBS wrote their article about AI, they latched on to two pharmacy examples of direct patient care. Computers can search through vast amounts of information very quickly, telling us the correct dosing for a drug, which second-line drugs you can switch to, or whether a given patient is more likely to have a bleeding event with Plavix based on the data in their EHR.

Even then, computers can sometimes be more of a hassle than a help. Most physicians practicing nowadays have run into annoying pharmacy auto-messages in the vein of, “Mrs. Smith is 81 years old, and you just ordered Benadryl. Patients over the age of 70 are more likely to have adverse effects from Benadryl. Please confirm that you still want to order the Benadryl.” (You can replace “Benadryl” with just about any imaginable medication.)
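
Those auto-messages are just hard-coded rules. Here’s a minimal sketch of the logic; the age cutoff and drug list are illustrative assumptions, not any vendor’s actual rule set:

```python
# Minimal sketch of a rule-based pharmacy alert. The age cutoff and drug list
# are illustrative assumptions, not any vendor's actual rule set.
HIGH_RISK_IN_ELDERLY = {"diphenhydramine (Benadryl)"}
AGE_CUTOFF = 70

def check_order(patient_age, drug):
    """Return a warning message if the order trips the rule, otherwise None."""
    if patient_age > AGE_CUTOFF and drug in HIGH_RISK_IN_ELDERLY:
        return (f"Patient is {patient_age} years old and you ordered {drug}. "
                f"Patients over {AGE_CUTOFF} are more likely to have adverse "
                "effects. Please confirm the order.")
    return None

print(check_order(81, "diphenhydramine (Benadryl)"))
```

There’s no intelligence in there; it’s a lookup table with a confirm button.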

However, one thing that computers definitely can’t do is pick up on subtle cues. The PBS article suggests that a computer could tell that a patient is lying when he says he’s not smoking even though there are “nicotine stains” on his teeth and fingers. A computer would need incredibly good machine vision just to see those stains, and how would it know the teeth weren’t stained by coffee, chronic antibiotic use, or just poor dental care? Same with the fingers: your patient could be a mechanic wearing a Ford dealership hat and coveralls, so how do you know his fingers aren’t stained with motor oil?

For all the recent advances in machine-vision, self-driving cars and all, a computer can only do what it is programmed to do. A Googlemobile can only drive itself because Google has spent years collecting immense amounts of data, correcting errors as they pop up. “Rather than having to figure out what the world looks like and what it means,” Google says, “we tell it what the world is expected to look like when it’s empty. And then the job of the software is to figure out how the world is different from that expectation.”

A wannabe Hologram Doctor can’t rely on having an ultra-precise map of what to expect from a human body, because every single human is different. This is a vastly more difficult problem than figuring out that a slowly moving human-sized object is a pedestrian.


 

The Perils of Excessive Hype
Daisy, Daisy, daisy…

So what’s the harm? If medical-AI researchers want to suggest that computers are on the verge of telling lies from truth, diagnosing complex diseases, and “practicing medicine” like trained professionals, can we really blame them? After all, they’re just hyping up their field.

Well, the fact is that AI publicity has always been the greatest enemy of AI research. Ever since the 1960s, every time an incremental improvement is made in AI, people hype it up to ridiculous levels, and the hype ends up discrediting the actual technology. Real machine-learning technologies have only improved over time (after all, Moore’s Law is still in effect) but the perception of AI has whiplashed back and forth through the decades.

Perception is a very big deal in healthcare, just ask pediatricians about vaccines. If large healthcare institutions implement (or mandate) half-assed AI programs that end up hurting some patients (even if relatively few), the ensuing public mistrust of medical AI may never go away. You can bet your ass that the FDA would turn hostile to AI if that happened.

Machine-learning technology has a lot of potential for improving healthcare, but unless you’re a venture capitalist or software CEO it’s irresponsible to suggest that decision-support software will rapidly change medical decision-making for the better.

What’s even more irresponsible is suggesting that commercial software should replace reading as a way for physicians to keep up with the medical literature. Anyone who’s worked with “Clinical Pathways”-type products knows that they don’t always give you a “board exam safe” answer. While they may hew to some consensus guideline, which guideline they use is entirely up to the MD consultants hired by the software company. It’s our professional responsibility as physicians to go to meetings, keep up with the evidence, and use our brains to decide which papers to believe and which guidelines to follow. If we can’t be trusted with that much, then why do MDs go through 4 years of med school and 3-8+ years of postgraduate training?

As a physician and technophile, I think that EHR and CDS are greatly beneficial when done correctly and when they don’t take away from the physician’s medical judgement. Rushing new medical software into practice, whether to comply with a poorly-thought-out government mandate or to generate free publicity, has the potential to do much more harm than good. Like many other medical advances, it is much better to be right than to be first.


Hacking the Mind Epilogue: Psychosurgery


While we’re on the subject of “hacking the human mind“, it looks like there is renewed interest in psychosurgery. The link goes to an article about deep brain stimulation for alcoholic cravings, PTSD, and depression!

People have been trying to control psychiatric conditions with surgery since the days of the prefrontal lobotomy. Electrical stimulation has the advantages of precision and reversibility. However, as with any neurosurgical procedure, it relies upon localizing an unwanted symptom to a specific location in the brain. For example, deep brain stimulation works for Parkinson’s because the disease is localized to the basal ganglia.

No matter how much funding you throw at electroneurology, it won’t do any good if an unwanted emotion or compulsion is spread out over a large area of the brain. It remains to be seen how well localized things like alcoholism and PTSD are.

Why did the Corpus Callosum cross the road?

Why did the Corpus Callosum cross the road?
To get to the other side.

Why did the beta-amyloid cross the road?
Because… I… it… What was the question again?

Why did the spinothalamic tract cross the road?
The other side was on fire.

Why did the amygdala cross the road?
It was running away from a… OH MY GOD ITS COMING RIGHT FOR US!

Why did the central sulcus cross the road?
You would too, if you were surrounded by creepy homunculus things.

Why did the Wernicke’s aphasia cross the road?
The road a cross dirun like two. Free crosses rodeo why? Arrest and Texas in red, yes happy area.

Why did the Korsakoff syndrome cross the road?
I don’t know, what does it matter to you? I was going to the park. That’s right, I was taking a walk in the park. Now get off my back!

Why did the septum pellucidum cross the road?
I don’t know, but that sounds pretty bad; you’d better start dexamethasone.

Why did the optic chiasm cross the road?
Because it couldn’t see the median.

Why did the cavernous sinus cross the road?
It didn’t.

Why did the Broca’s aphasia cross the road?
Hodor?