Sleep

The Middlemiss Study Tells Us Nothing About Sleep Training, Cry-It-Out, or Infant Stress

Last week, I wrote a post about sleep training and stress, in which I argued that everything we know about stress suggests that sleep training is not harmful.

In response, some people objected that sleep trained babies continue to experience elevated cortisol and significant distress, even after they have stopped crying. In their view, sleep training teaches babies that crying does not help. They haven’t learned to self-soothe or to fall asleep on their own, they’ve simply given up.

What a heartbreaking thought. And one that surely strikes fear in the heart of many parents.

So it’s important to realize that this claim comes from a single small and deeply flawed study of 25 babies, led by Wendy Middlemiss, a researcher at the University of North Texas’s College of Education.

Here is a typical example of how her study is described in the popular press:

“The researchers found high levels of cortisol, a stress hormone, in both the mothers and the babies during the times the babies were crying. After several days, the babies learned to go to sleep without crying. Researchers found that during these quiet nights, the mothers no longer had high cortisol levels but the babies’ cortisol levels remained high. They had merely learned to remain quiet while distressed.

The researchers noted that this was the first time the mothers and babies had not been in sync emotionally. The mothers no longer had high stress levels, not realizing that their babies were still just as upset.”

I can see how many parents would read something like this and swear off sleep training.

Here’s the truth, though: Nothing in her study supports these claims.

To see why, let’s start by briefly reviewing her study’s design. Middlemiss studied 25 mother-infant pairs who were enrolled in a sleep training program at a local hospital. The babies ranged in age from 4 to 10 months.

Mothers spent the day at the hospital with their infants, helped prepare them for sleep at naptime and bedtime, and then retreated to a hallway outside the room where they could hear them, but their infants could no longer see them. The nurses put the infants down in their cribs and let them cry, without soothing, until they fell asleep. This process was repeated for 4 days.

On the 1st and 3rd nights of this program, Middlemiss measured the babies’ and mothers’ cortisol levels, a hormonal marker of stress. She tested their cortisol levels once right before bedtime and then again 20 minutes after the babies had fallen asleep.

So what’s wrong with her study? Well, a lot.

(1) The study lacked a control group. Without a group of control babies who were put to sleep by nurses in the hospital, but who did not experience sleep training, we cannot say whether sleep training affected infants’ cortisol levels, or whether something else about the program, like being put to sleep in an unfamiliar room in a hospital or being put to sleep by a stranger, affected infants’ cortisol levels.

(2) She does not analyze her data correctly. Let’s take her findings one at a time.

Claim #1: Babies’ cortisol levels remained “high” before and after falling asleep on the first and third nights of sleep training, while mothers’ levels dropped after their babies had fallen asleep on the third night.

Problem #1. Middlemiss does not report a baseline cortisol level for the babies. We do not actually know whether infants’ cortisol levels were “high” or “low” or “normal”. She calls them high. But we have no way to know that they’re high; all we can say is that they stayed roughly constant throughout the study.

When Melinda Wenner Moyer, a reporter for Slate, asks why she calls their cortisol levels high, Middlemiss responds that she also assessed the babies’ cortisol levels while at home, and the levels were lower than at the hospital. But she never reports these baseline levels in her paper.

To anyone who has published a scientific paper, this is a baffling response. If you are drawing conclusions based on data you collected, you report that data. That’s the way it works.

Problem #2. She uses the wrong statistical analyses to compare before and after cortisol levels. To me, this is the most egregious problem with her research.

Here is how she describes her analyses:

[Screenshot from the paper showing her statistical analyses, including the sample sizes and cortisol comparisons]

In stats-speak, Middlemiss compared whether the group means before and after sleep training were significantly different from one another. This is the wrong analysis; she should have used a repeated measures analysis, which compares each individual to herself. Simply comparing group means is not just incorrect, it is also considerably less powerful (it fails to take into account that you already know something about the individuals the second time around) and thus more likely to lead to a false conclusion of no change.

Let me illustrate why using an analogy. Imagine you have a group of 25 students who enroll in an SAT prep class. You compare their test scores before the class begins with their test scores after the class ends. The mean test score among students does not increase. Does this imply that the class was worthless?

Well, maybe, and maybe not. What you really want to know is not whether the group mean is higher, but whether on average the students improved. And that is actually a different question. For example, if 90% of the students improved by, say, 50 points, while 10% dropped by 500 points, the group mean would stay roughly the same, despite the vast majority of students improving.
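To make the analogy concrete, here is a quick simulation. The 50-point and 500-point swings come from the example above; the baseline scores and the choice of a sign test as the simple repeated-measures alternative are my own assumptions, purely for illustration:

```python
# Hypothetical SAT-class numbers from the analogy above: 22 of 25
# students gain 50 points, 3 lose 500, so the group mean barely moves.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(1200.0, 100.0, 25)           # assumed baseline scores
changes = np.array([50.0] * 22 + [-500.0] * 3)   # most improve, a few crash
after = before + changes

# Comparing group means (as if the two sets of scores were unrelated):
# the mean change is only about -16 points, so this test sees nothing.
t_ind, p_ind = stats.ttest_ind(before, after)

# But 22 of 25 students improved; a sign test on the paired changes
# detects this easily, because it tracks each student against herself.
p_sign = stats.binomtest(int((changes > 0).sum()), n=25, p=0.5).pvalue

print(f"group-means p = {p_ind:.2f}")    # not significant
print(f"sign-test p   = {p_sign:.5f}")   # highly significant
```

The same logic applies to the cortisol data: the question is whether individual babies and mothers changed, not whether two group averages moved.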

Problem #3. Middlemiss is missing a ton of data, and–you guessed it–she does not handle that issue correctly either.

Look back at how she describes her analyses. Do you see how the number of mothers and infants in each group changes from before to after sleep training? Consider the mothers’ cortisol data on the third night. The pre-sleep group includes cortisol samples from 17 of the 25 mothers. The post-sleep group includes cortisol samples from 12 of the 25 mothers.

This raises some questions: Are these 12 mothers a subset of the first 17? Or do these 12 include mothers not included in the before group? Middlemiss never tells us.

Why is missing data a problem? Let me again use an analogy. Imagine I have a bag of 25 apples. First I pull 17 apples out and weigh them, and then put them back. Next I pull out a second set of 12 apples and weigh those. The second set of 12 apples weighs less, on average, than the first 17. Can I conclude that apples in the bag have lost weight?

Of course not.
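A small simulation makes the point. The apple weights here are made up; nothing in the bag ever changes, yet the two subset means still differ:

```python
# Illustrating the apple analogy: weighing two different subsets of the
# same unchanging bag of 25 apples. The weights are invented numbers.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(150.0, 20.0, 25)   # grams; nothing ever changes

first = rng.choice(weights, size=17, replace=False)   # "before" sample
second = rng.choice(weights, size=12, replace=False)  # "after" sample

# The two means differ simply because different apples were weighed,
# not because any apple gained or lost weight.
print(f"mean of first 17 apples:  {first.mean():.1f} g")
print(f"mean of second 12 apples: {second.mean():.1f} g")
```

Unless we know which mothers appear in both groups, a drop in the group average tells us nothing about any individual mother.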

Now, missing a sample or two is not a huge deal, even in a relatively small study. Large studies can often handle significant amounts of missing data, provided data loss occurs more or less at random. But Middlemiss is missing samples from over half of the mothers on the third night!

At the very least, if you wanted to test whether the women’s cortisol dropped after sleep training, you ought to compare the same mothers before and after their infants fell asleep. But she does not do that.

So, was the babies’ cortisol high? I don’t know. Did the mothers’ cortisol drop? I don’t know. It’s impossible to tell from what she reports.

(Note that this should never be the case in a scientific publication. The whole point of a scientific publication is to make what you did clear enough that someone else could replicate your study and analyses.)

Claim #2: Mothers’ and babies’ cortisol levels were no longer correlated after the third night of sleep training.

This is what she reports:

[Screenshot from the paper showing the reported pre- and post-sleep correlations]

Problem #4. Here again Middlemiss uses the wrong statistical test. She claims that the mothers’ and babies’ cortisol levels were no longer correlated after the third night of sleep training, because the second correlation, r(10) = .422, is not statistically different from zero. This is the wrong test.

She should have tested whether the post-sleep correlation of r = .422 is significantly different from the pre-sleep correlation of r = .582. A significant difference between the two correlations is, after all, what she claims to have found.

And by the way, the two correlations are not significantly different from one another.
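The standard way to compare two correlations from separate samples is Fisher’s r-to-z transform. Here is a sketch; note the sample sizes are my assumptions (25 pre-sleep pairs, and the 12 post-sleep pairs implied by r(10)), and the test assumes independent groups, which the study’s overlapping samples may not satisfy:

```python
# A sketch of the comparison the post argues for: testing whether two
# correlations differ, via Fisher's r-to-z transform. The sample sizes
# (25 pre-sleep, 12 post-sleep) are assumptions for illustration.
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided p-value for H0: the two population correlations are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # SE of z1 - z2
    z = (z1 - z2) / se
    return 2 * norm.sf(abs(z))

p = compare_correlations(0.582, 25, 0.422, 12)
print(f"p = {p:.2f}")   # well above .05: the two correlations are not
                        # significantly different from one another
```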

Problem #5. Even if Middlemiss had performed the correct test, we would still have a major problem, because, yet again, she’s lost over half of her sample! We have no way of knowing whether the pre-sleep mother-infant pairs are the same, largely the same, or completely different from the post-sleep mother-infant pairs. She never tells us.

Problem #6. Her entire argument boils down to 10 mother-infant pairs. That’s too small a number to tell us much of anything. To see why, note that if she had found a correlation of .422 in her entire sample of 25 mother-infant pairs, it would have been significantly different from zero–meaning her whole argument rests not just on the wrong analysis but on a lack of statistical power.
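You can verify the power point directly with the standard t-test for a correlation. Here I take r(10) = .422 to mean 12 pairs, and 25 as the full sample, both as stated in the post:

```python
# Checking the power point: the same r = .422 is non-significant with
# the roughly 12 pairs she had left, but would be significant with 25.
import math
from scipy.stats import t as t_dist

def r_pvalue(r, n):
    """Two-sided p-value for H0: population correlation is zero."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return 2 * t_dist.sf(abs(t), df)

p12 = r_pvalue(0.422, 12)
p25 = r_pvalue(0.422, 25)
print(f"n = 12: p = {p12:.3f}")   # roughly .17, not significant
print(f"n = 25: p = {p25:.3f}")   # roughly .04, significant at .05
```

In other words, the “disappearance” of the correlation is exactly what you would expect from shrinking the sample, even if nothing about the mother-infant relationship changed.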

In sum…

What did Middlemiss and colleagues conclude from these non-findings? 

That infants continue to experience physiological distress, as measured by cortisol, despite being able to soothe themselves to sleep, and that sleep training leads to an “asynchrony” in mother and infant cortisol levels.

What do I conclude?

That this study should never have been published, at least not in its current form. Middlemiss and colleagues had no control group. They performed the wrong statistical analyses. They had huge amounts of missing data, which they did not account for at all. Their key findings relied on fewer than half of their original sample of 25.

These are not minor, nitpicky problems. These are major, glaring problems that make interpretation of her findings impossible.

Warning women against sleep training on the basis of this study is absurd.

11 replies »

  1. 👏🏻👏🏻👏🏻👏🏻

    That is all.

    No, it isn’t. Thank you for taking the time to go over these. Mothers who do cry it out are often accused of being unloving and abusing their babies. Because of that, many mothers are afraid to ever do it. Your posts help bring peace of mind to mothers who feel it is the right move for their baby. They even bring peace of mind to mothers who did cry it out with their babies and worry they may have damaged them somehow. So thank you for the time and analysis.

    • Thanks for sharing, Kiki.

      In my opinion, this post does not address the major problems with Middlemiss’s study–the lack of a control group and problems with missing data and analysis. Instead she just reiterates that the babies remain stressed–and we cannot actually conclude that from this study. For one, we don’t even know if the same babies’ cortisol levels were reported after sleep as before sleep. We have no idea whether they were stressed out by cry-it-out or something else about the situation.

And I completely disagree with her argument that we know the babies’ cortisol levels were high. We cannot say that the babies’ cortisol levels were high based on comparison with other studies, because we have no idea how the assays for cortisol were run in her study–cortisol assays can vary from study to study and from assay to assay. More importantly, we do not know the normal range of stress responses for these babies.

      • Seeing as that post doesn’t actually address the Middlemiss study, I’m curious what you read.

        Re your points above, a few stats issues:

        1) You’re absolutely right to be concerned about a lack of a control group. However, I hope you extend this to mean that we have no reason to believe cry-it-out works as the studies done on that either don’t have a control group or don’t seem to show any effectiveness. Given your big push for this being the largest fault, you should mention it’s an even bigger issue for CIO research that claims effectiveness as the time period lapses are much longer and you are more likely to see changes in the control group. Here, although a control group would be best, the chance of changes over 3 days without intervention are lower (though still present).

        2) The Middlemiss study did use repeated-measures in the form of a within-subjects t-test. You can tell that by the df used in the test.

3) The comparison of r in the synchrony analysis isn’t *wrong*, but rather not complete. Your suggestion of the analysis comparing r’s would be best to include, but doesn’t answer the question they were asking. It would provide necessary information for the reader though and would suggest a possible type-II error in the second analysis (likely due to low n), but we can’t be certain. I also want to add that in no correct way could you run this analysis because the groups you’d be comparing are different. The second is a subset of the first, but to compare correlations you either need the exact same group OR two different groups measured on the same variables. You have neither here.

        In short, there ARE flaws in the Middlemiss study, but it was a PRELIMINARY study that aimed to tell us something that hadn’t been explored yet. It shouldn’t be used as a conclusive study that tells us everything, but it certainly highlights the need for follow-up research.

      • Two things: 1. Yes, the df implies that she used a repeated measures analysis, but everything else in the text suggests that she used a mean comparison. If she did use a repeated measures design, then she should have reported that, and she should have provided the means for the same individuals before and after.

        2. I don’t see how this study in particular suggests the need for more research, or the need for anything. Her results do not support her interpretation of her findings. That’s all there is to it. That’s not science. You can have limitations; you can have remaining questions; but you cannot pretend to have found something you haven’t.

        We all know the perfect sleep training study has yet to be done. And we would all like it to be done, regardless of Middlemiss’s alleged findings. So I don’t see how this study highlights the need for further research. To me, it highlights nothing other than that some people are so ideologically opposed to sleep training that they will contort the evidence to support their agenda.

  2. Would you leave a child to CIO, one screaming at their bedroom door, shouting mum come and get me?
Regardless of this experiment and its flaws, CIO is going to cause high levels of stress to the baby; they are communicating and being ignored.

3. I’m so glad to see that this “scientific” study has been re-evaluated. People who don’t like CIO methods use this study to bully and shame moms who do — a fact which has been MORE irritating and has raised MY cortisol level far greater than sleep-training my two children (only one who truly did CIO) combined. In ANY scientific investigation, your results are going to be meaningless without a control group. And without seeing their moms there, wouldn’t that fact alone stress out those poor little babies? This study is severely flawed!
