How to run a sleep experiment

Find out what affects your sleep, step by step

Published in

Biohacker Blog

12 min read · May 30, 2020

There is one simple idea that I want you to keep in mind:
being a mad scientist without actual science is just being mad.

If we use self-tracking to improve our lifestyle, there are many errors we can make. Let me give you an example:

Last Friday, Bob went to dinner with a friend.
Who’s Bob? I don’t know, just tag along.
So Bob and his friend had some lemon-flavored ice-cream.
They enjoyed the ice cream. Had five servings each. They loved it.

What you don’t know about Bob, though, is that he measures his sleep every single night. Bob uses a fancy tracking device he got as a birthday present from his cousin, Jessica. That was so sweet of you, Jessica. Anyway.
So Bob is curious: how did the ice-cream affect his sleep?
He wakes up the next day and he looks at his data. Bob is in awe!

According to “the data”, Bob had the best sleep after ice cream.
Bob is so happy, he goes on telling everyone:

I had this ice-cream yesterday and my sleep looks much better than average.
This ice-cream must be good for sleep.

Not too fast, Bob! There are a couple of issues here:

Maybe, that day, Bob would’ve had good sleep anyway. It just turned out that “eating ice-cream” happened the same night.
What if it was the fact that “Bob met his friend” that helped him sleep better? This is known as a confounding factor.
And maybe Bob loves the ice cream so much that he wants to prove that it’s healthy. This is one form of what is known as experimenter bias. It’s not that Bob is a liar; subconsciously, Bob just wants to believe that ice cream is good for him.

Note that this is part 1 of a three-part series on How to run a sleep experiment:

This part, part 1, will focus on how to plan the experiment properly.
In the next part, part 2, we will talk about how to be careful in taking measurements for the experiment.
Finally, in part 3, we will talk about how to analyze the results at the end of the experiment.

Now, I know Bob’s case sounds silly. And, thank god, very few people are like Bob. However, anyone who wants to use self-tracking to change their lifestyle should know how to do it well. These biases are present in all of us: we either recognize them or end up fooling ourselves.

I, for example, always believed that drinking chamomile helps me sleep better. I always used to drink it hours before bed. What if I did not drink chamomile? Well, then the emptiness of my cup those nights reflected the emptiness in my heart and soul. So I actually experimented with chamomile for a month: going off chamomile for a while, then back on, then off again. The result? There was no significant difference between the days I drank chamomile before bed and the days I did not.

Do I still drink chamomile? Yes, because I love it, and it’s just part of those routines that I got so accustomed to. But now, if I do not have any, I just do not worry about it, because I know that in the end, it doesn’t really matter.

In this post, I will show you exactly how you can run an experiment on your sleep (or any other measurable aspect of your lifestyle). Remember this piece of wisdom from Richard P. Feynman:

To simplify, here are the steps to follow for planning an experiment:

Step 1 — Select what to experiment on

Simply select a habit, food, or supplement you want to test. But, how?

Personally, the change that has most impacted my sleep is probably avoiding blue light before bed. But there are also strategies that I use on special occasions to get a better night’s sleep. To keep it simple, I use two main ways to decide which supplement or habit to experiment on.

The first is observation. For example, if I took a warm shower before bed and ended up sleeping well, then I would consider experimenting with hot showers.

The second is research. Or trends. For example, if a research article says that supplementing with magnesium can improve deep sleep, I would want to try it out. When it comes to supplements, I usually stick with things that I am convinced are safe and reasonable.

One website that I love to use for finding out information about certain supplements or foods is called Examine. You can simply search for lavender or melatonin and get a good overview.

The main supplements I have tried in the past two years (that seemed to work for me) are magnesium, phosphatidylserine, reishi, valerian, and chrysanthemum. The main habits have been wearing blue-light-blocking glasses, eating dinner earlier, fasting, and taking a hot shower, among others.


Step 2 — Be more specific, formulate a hypothesis

Let us assume that I observed that hot showers help me sleep better. But we cannot simply say that the hypothesis is:

Hot showers help me sleep better

What does “better” actually mean? Does it mean I went to sleep faster? Does it mean I woke up feeling better? Does it mean I slept for a long period without waking up at night? In order to formulate a proper hypothesis, we have to be more specific.

I personally use the Oura ring for tracking sleep. If I observed that I had more deep sleep after a hot shower, then I could make the following hypothesis:

Hot showers increase the amount of deep sleep

What if I don’t have an Oura ring or a Fitbit or any fancy tracking device? Then, I could say: “Hot showers increase the subjective quality of sleep”. The word “subjective” just says that it is based on what I feel like. This would mean that I would have to take personal subjective measurements (like a grade between 1 and 10) on how I feel about my sleep every morning. The point is, we need to choose something we can measure. The next part of this series will go into more detail about measurements.
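“Something we can measure” can be made concrete with a nightly log record. The following is a minimal Python sketch. The class name `NightLog` and its fields are hypothetical and are not part of any tracker’s API. The sketch assumes you record the exposure (hot shower or not) plus at least one outcome: a tracker’s deep-sleep minutes, a 1-to-10 morning score, or both.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class NightLog:
    """One night of data for the hot-shower experiment (hypothetical schema)."""

    night: date
    hot_shower: bool                          # the exposure being tested
    deep_sleep_min: Optional[float] = None    # from a tracker, if you have one
    subjective_score: Optional[int] = None    # morning self-rating, 1 to 10

    def __post_init__(self) -> None:
        # Every night needs at least one measurable outcome.
        if self.deep_sleep_min is None and self.subjective_score is None:
            raise ValueError("log at least one outcome per night")
        # Keep the subjective grade on a fixed 1-10 scale so nights are comparable.
        if self.subjective_score is not None and not 1 <= self.subjective_score <= 10:
            raise ValueError("subjective_score must be between 1 and 10")


# Placeholder values for illustration, not real measurements.
log = [
    NightLog(date(2020, 5, 1), hot_shower=True, subjective_score=7),
    NightLog(date(2020, 5, 2), hot_shower=False, deep_sleep_min=85.0),
]
```

Rejecting entries that have no outcome is a deliberate choice. It catches the nights you forgot to rate while they happen, rather than at analysis time.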

Think of the burden of proof in law, where a defendant is considered innocent until proven guilty. In “science”, the default position in an experiment is that there is no relationship between an exposure (hot showers) and an outcome (deep sleep). We state this default position as its own hypothesis, called the null hypothesis. In our case, it would be: Hot showers have no effect on the amount of deep sleep. We run the experiment assuming that this is true. In a way, this is like saying that hot showers are innocent (have no effect on deep sleep) until proven guilty (have a significant positive or negative effect on deep sleep). The proof part is what we will talk about in part 3 of this series, where we will go deeper into analyzing the results of the experiment.

Step 3 — Finally, plan the experiment

To keep things simple, we will focus here on experiments that can be “controlled”. For instance, we cannot control the weather (well, I assume most of us can’t).

So if we study the impact of rain on sleep, we cannot really design the experiment as we want. We cannot decide: Mondays and Tuesdays we make it rain and then the rest of the week we keep it sunny. Maybe one day we’ll be able to do that, but I don’t see that happening yet. On the other hand, if we want to experiment with drinking some magic potion before bed, we may have the possibility of choosing which days to drink it.

Let us assume I am running an experiment on the hypothesis: “Drinking lavender tea increases the amount of deep sleep.”

In this case, designing the experiment means answering the following:

For how long am I going to run the experiment?
Which days will I drink lavender and which days will I not?
Is it possible to make it so that I cannot differentiate between drinking lavender and not drinking lavender? (this is known as blinding)

If, for example, we run this experiment for one month, it could look something like this picture:

Let’s use A to refer to a time period (in our case, one day) in which we do not drink lavender tea, and B to refer to a day on which we do.

How can we have a “balanced” experiment?

One important requirement for a good experiment is to balance treatment assignments, especially for potential confounding factors, so that the treatments are compared fairly.

In this design, we assign one day where we do not drink lavender (A) followed by one day where we drink lavender (B). Then a day where we drink lavender again (B) then a day where we do not drink lavender (A).

This is an example of a counterbalanced design known as ABBA: the assignments alternate between AB and BA in a way that minimizes possible confounding with time.
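A month-long ABBA plan can be generated by repeating the four-day block. This is a minimal sketch that assumes one treatment per day. The function name `abba_schedule` is hypothetical.

```python
def abba_schedule(n_days: int, start_with: str = "A") -> str:
    """Return a day-by-day plan built from repeated ABBA (or BAAB) blocks.

    A = no lavender tea, B = lavender tea. If n_days is not a multiple of 4,
    the final block is cut short and is no longer balanced for a time trend.
    """
    if start_with not in ("A", "B"):
        raise ValueError("start_with must be 'A' or 'B'")
    other = "B" if start_with == "A" else "A"
    block = start_with + other + other + start_with  # "ABBA" or "BAAB"
    # Repeat enough blocks to cover n_days, then trim to length.
    return (block * (n_days // 4 + 1))[:n_days]


plan = abba_schedule(28)
print(plan)                              # seven ABBA blocks in a row
print(plan.count("A"), plan.count("B"))  # 14 14
```

Choosing a length that is a multiple of 4 (28 days rather than 30) keeps every block complete.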

Here are two simple ways to balance treatments:

Counterbalancing: Deciding on the specific A and B days in order to balance for a known confounding factor. For example, if I know which days I will have a high-intensity workout, I can make sure that the same number of A and B days fall on workout days.
Randomization: This simply means assigning A days and B days at random (each with the same probability). This can be effective when we do not know the possible confounding factors.

Counterbalancing can be more effective at achieving a nearly exact balance for the potential confounding factors that we already know about. Randomization, on the other hand, only achieves balance on average, over a large number of days.

However, we should understand and remember that exact balance cannot usually be achieved.
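The following sketch shows both approaches for the lavender example. It assumes you know in advance which days will have a high-intensity workout.

- `randomized_plan` is plain randomization.
- `counterbalanced_plan` takes one possible route to counterbalancing. Within workout days and within rest days separately, it hands out equal numbers of A and B days in shuffled order. Strictly speaking, that is stratified randomization, a mix of the two ideas.

All function names and the workout calendar are hypothetical.

```python
import random


def randomized_plan(n_days: int, seed=None) -> list:
    """Simple randomization: each day is A or B with probability 1/2."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_days)]


def counterbalanced_plan(workout_days: list, seed=None) -> list:
    """Balance A/B separately within workout days and within rest days.

    workout_days[i] is True if day i has a high-intensity workout. Within each
    group, half the days get A and half get B (one extra A if the group has an
    odd number of days), in shuffled order.
    """
    rng = random.Random(seed)
    plan = [None] * len(workout_days)
    for flag in (True, False):
        idx = [i for i, w in enumerate(workout_days) if w == flag]
        labels = ["A", "B"] * (len(idx) // 2) + ["A"] * (len(idx) % 2)
        rng.shuffle(labels)
        for i, label in zip(idx, labels):
            plan[i] = label
    return plan


def balance_table(plan: list, workout_days: list) -> dict:
    """Count A and B days on workout days and on rest days."""
    counts = {}
    for label, w in zip(plan, workout_days):
        key = ("workout" if w else "rest", label)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical calendar: workouts on 3 fixed days of each week, 28 days total.
workouts = [d % 7 in (1, 3, 5) for d in range(28)]
print(balance_table(counterbalanced_plan(workouts, seed=1), workouts))
print(balance_table(randomized_plan(28, seed=1), workouts))
```

This calendar has 12 workout days and 16 rest days. The counterbalanced plan always splits each group evenly (6/6 and 8/8), while the randomized plan usually does not. That is the trade-off described above.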

In counterbalancing, the reason we usually go for ABBA (or BAAB) instead of ABAB or BABA is that the latter will not be balanced for a time trend. For example, as shown in the picture below:

Let us say that the hours of deep sleep are not related to drinking lavender at all, but they decrease with time, as shown in the graphs above. For the graph on the left (ABAB), if we average the hours of deep sleep, we will find that B has a lower value than A, even though lavender did nothing. But for the one on the right (ABBA), the low value of A at the end compensates, so that A and B end up with almost the same average.
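The effect of a time trend can be checked with a few lines of Python. The numbers are made up for illustration and are not measurements. Lavender is assumed to have no effect at all, and deep sleep is assumed to drop by exactly one minute per night over 28 nights.

```python
def mean(values):
    return sum(values) / len(values)


def arm_means(pattern: str, deep_sleep: list) -> dict:
    """Average deep sleep per label, repeating `pattern` across all days."""
    groups = {"A": [], "B": []}
    for day, minutes in enumerate(deep_sleep):
        groups[pattern[day % len(pattern)]].append(minutes)
    return {label: mean(vals) for label, vals in groups.items()}


# Hypothetical data: lavender has NO effect; deep sleep just drops by
# 1 minute per night, from 120 min on day 0 to 93 min on day 27.
deep_sleep = [120 - day for day in range(28)]

print(arm_means("ABAB", deep_sleep))  # {'A': 107.0, 'B': 106.0}
print(arm_means("ABBA", deep_sleep))  # {'A': 106.5, 'B': 106.5}
```

With ABAB, B looks one minute worse purely because of the trend. With ABBA, the two averages match. The match is exact only for a perfectly linear trend; real data will be noisier.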

So here’s a simple way to choose between randomization and counterbalancing:

How much time should I run the experiment for? One year? One month?
It depends. Personally, I’m not going to spend a year trying something out unless I know it’s worth it.

So, just like in a pilot study, we can try a small experiment first and only design a larger one if we see that it is worth it.
This is especially true if the experiment requires less practical measurements, such as taking blood samples or training to exhaustion to measure VO2max.

This is just my opinion: start small to see if it’s worth the investment.

The other deciding factor is how long I am willing to keep this up and by when I want to see the results.

What is blinding and is it possible?

In my personal case, I am the person designing the experiment, running it, and analyzing it. This means that if I really like lavender tea, it may have a placebo effect on me.

To minimize this, whenever possible, I need to include some form of blinding. In other words, I need to make it so that I do not know when I’m drinking lavender tea and when I’m not. With tea, this is hard or even impossible in practice. I could make some fake lavender-flavored water and have my roommate shuffle the cups, but that’s not super practical.

Even so, we can still log the days only as A and B, and find out which label means lavender once the analysis is done.

However, if we use lavender extract pills for the experiment, we could have pills with the extract and pills without it. Ideally, both kinds of pills would look the same (same color and size) and be kept in two separate places. Even better would be to have another person decide which pill to give us every single day.
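The blinding setup described above can be sketched in a few lines. It assumes a helper (your roommate, say) holds the key. The helper randomly decides which neutral code, `X` or `Y`, means lavender and which means the look-alike placebo pill, and prepares each day’s pill accordingly. You log only the code next to your sleep data. The codes and function names are hypothetical.

```python
import random


def make_blinding_key(seed=None) -> dict:
    """Randomly decide which neutral code stands for which treatment.

    The helper keeps this key and hands out the matching pill each day;
    you only ever see the code.
    """
    rng = random.Random(seed)
    treatments = ["lavender", "placebo"]
    rng.shuffle(treatments)
    return {"X": treatments[0], "Y": treatments[1]}


def unblind(log: list, key: dict) -> dict:
    """Group logged outcomes by real treatment, only after the experiment.

    `log` is a list of (code, deep_sleep_minutes) pairs.
    """
    grouped = {treatment: [] for treatment in key.values()}
    for code, deep_sleep_min in log:
        grouped[key[code]].append(deep_sleep_min)
    return grouped
```

The key stays with the helper until the end. That makes it less likely that your expectations on a given night, or your choices during analysis, lean toward lavender.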

Note that there are experiments for which this would be impossible. For example, it would be hard to add blinding to taking a cold shower. Because the person taking the cold shower would feel it, I assume.

Is there a known possible carryover effect?

For example, if you’re experimenting with the impact of regular coffee vs. decaf on cognition, the first days after you stop coffee will be hard. This is known as a carryover effect: one treatment (coffee) affects the period of the other treatment (decaf). So what do you do then?

One way to solve this is to add what is called a washout period. In the case of coffee vs. decaf, this can be adding one week in between coffee and decaf where you do not have either. This week is called a washout period because you give time for the first treatment’s effect to fade away.
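A washout period can be built into the plan itself. This sketch assumes two-week treatment periods and the one-week washout from the coffee example; both values are illustrative. The function name is hypothetical. Leaving washout days out of the comparison is an assumption about how you would analyze the data.

```python
def schedule_with_washout(treatments: list, period_days: int, washout_days: int) -> list:
    """Expand treatment periods into a day-by-day plan.

    A washout block (no coffee, no decaf) is inserted whenever the treatment
    changes, so the previous treatment's effect has time to fade.
    """
    days = []
    for i, treatment in enumerate(treatments):
        if i > 0 and treatment != treatments[i - 1]:
            days += ["washout"] * washout_days
        days += [treatment] * period_days
    return days


plan = schedule_with_washout(["coffee", "decaf"], period_days=14, washout_days=7)
# Assumption: washout days are not used when comparing coffee vs. decaf.
analyzed = [d for d in plan if d != "washout"]
print(len(plan), len(analyzed))  # 35 28
```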

In the case of coffee, the carryover is easy to spot, but it can also be very subtle. For example, if you’re experimenting with a diet such as the ketogenic diet, you should give your body enough time to adapt. Such a diet has an adaptation phase at the beginning during which it can be difficult to sleep at night. Or, if you’re experimenting with an exercise program, remember that a workout still affects the days that follow. A prudent step in these cases is to understand the treatments very well before starting an experiment. This can help you avoid many mistakes when analyzing the results.

Conclusion

Designing the experiment is the first step. Of course, the list of steps above is not exhaustive. But at least these steps add some much-needed thoughtfulness to the world of quantified self and biohacking.

The next step would be to actually start the experiment and take measurements. But which measurements can you take and how do you do that right? We will cover that in part 2 of this series. Then, in part 3 of the series, we will go through how to analyze the results.

Thank you for your time.


Disclaimer:
This content is for informational purposes only. This content is not intended to be a substitute for professional medical advice, diagnosis, or treatment. You must consult with a health-care practitioner before undertaking any of the exercises, habits, protocols, techniques or otherwise.
This content does not constitute the practice of medicine or any other professional health care services, including providing medical advice. The use of information in this post is at the user’s own risk.