When Life Gives You Lemons
An Everyday Look At A/B Testing 🍋
One of the greatest pleasures in life is a cool glass of lemonade after a hot day laboring over your keyboard. You’ve spent the afternoon anticipating this evening’s party. Deserving a break, you meander over to the fridge.
To your utmost horror, you find that you drank the last of the lemonade yesterday. Everyone’s coming to your party this evening — but now all you have in the fridge is water, sugar, and whole lemons. And the worst thing: you’ve forgotten the ratio of ingredients! 😲
“No worries Andrew, I’ll Google the recipe.”
Not so fast! As an intrepid data scientist, you must design a scientific test which will determine the perfect ratio of water, sugar, and lemons!
Welcome to the first chapter of Everyday Data Science, where you’ll learn ...
🪙 How to make decisions with A/B testing
🎰 How to get the most money from a multi-armed bandit
🛎️ How to model those bandits with the Beta distribution
🤹 How to perform your A/B test rapidly using Thompson sampling
“Fine, let’s do it scientifically ...”
You rack your brain ... it was seven cups total, and one cup of lemon juice ... but was it one or two cups of sugar? We’ll call these candidates Recipe A and Recipe B:
Recipe A: 1 cup lemon juice, 1 cup sugar, and 5 cups of water.
Recipe B: 1 cup lemon juice, 2 cups sugar, and 4 cups of water.
Your job is to discover which of these is best, using an A/B test.
“Hang on, what do you mean, ‘best’?”
A great question! To answer that, let’s look at a more traditional A/B test.
What is A/B testing?
By day, you work as a data scientist for a large shopping website named after a massive rainforest: Congo.com. Your website sells tons of goods every day. However, you believe changes to the product page can increase sales, and profits, even further.
Our current page, shown here as Page B, has the Buy Button (shown here with the $$$ box) before the information about what we are selling. But customers might be better informed, and then end up buying our product, if the button came after the product’s information. That’s shown here as Page A.
We like money, so the “best” page here is the one that gets the most sales. Which page do you think we should use?
“Page A!” / “Page B!” / “I’m not sure ...”
If you picked a page: You’re very confident! But I’m not so sure. We need data to help us decide!
If you weren’t sure: Nor am I! We need data to help us decide!
A/B testing is a way to test our hypothesis and determine for sure which product page is best. There are many ways to run an A/B test, and they can get very complicated!
“Okay, but I’m just an everyday data scientist!”
The simple way
Indeed, as an everyday data scientist, your toolkit looks different. So let’s explore the simplest type of A/B test, before taking a peek at more complicated versions.
Continuing with the product page example, you get the go-ahead from management to run your test. You work with engineering to build a system that routes people to one of the two pages, randomly, based on a coin flip.
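The coin-flip routing can be sketched in a few lines. This is only an illustration: `assign_page` and the fixed seed are hypothetical choices of mine, not part of any real Congo.com system.

```python
import random

def assign_page(rng: random.Random) -> str:
    """Flip a fair coin to decide which page a visitor sees."""
    return "A" if rng.random() < 0.5 else "B"

# A seeded generator keeps the sketch reproducible (an assumption for
# illustration; a live system would not fix the seed).
rng = random.Random(42)
assignments = [assign_page(rng) for _ in range(10_000)]
share_a = assignments.count("A") / len(assignments)
print(f"Share of visitors routed to Page A: {share_a:.3f}")
```

Because the assignment ignores everything about the visitor, any systematic difference between the two groups can only come from chance or from the pages themselves.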
After a while, the data looks like this:
Page A: 0 1 1 1 1 1 1 0
Page B: 0 1 0 1 1 1 0 1
In the lines above, a 1 means that a customer was directed to the page, and then bought the item. A 0 means they left without buying anything.
(We’re making some assumptions about the customer behavior here that are important. For example, we ignore the case where a customer comes back later and buys the item.)
Above, you see 16 customer visits, 8 for each page. Now that we have some data, which page would you say is better?
“Page A is better” / “Page B is better” / “It’s still unclear ...”
If you said it’s still unclear: I think so too! Sure, Page A got 6 purchases, while Page B only got 5. But perhaps Page B just got thriftier visitors!
If you said Page B is better: Page A got 6 purchases out of 8 visits. Page B only got 5 purchases out of 8 visits. So in this little sample, Page B did a bit worse than Page A. But I would say it’s still unclear. Perhaps, through the randomness of the coin flip, Page B just got thriftier visitors!
If you said Page A is better: Well, yes, Page A got 6 purchases, while Page B only got 5. So you could say A performed a bit better. However, I would say it’s still unclear! Perhaps, through the randomness of the coin flip, Page B just got thriftier visitors!
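To put a number on the “thriftier visitors” worry, here is a rough sketch. It assumes the two pages are truly equally good, so the 11 purchases fall at random among the 16 visits, and asks how often Page A would end up with 6 or more of them. This is a one-sided Fisher exact test, a technique swapped in here for illustration; the chapter itself does not use it.

```python
from math import comb

visits_per_page = 8
purchases_a, purchases_b = 6, 5
total_purchases = purchases_a + purchases_b   # 11 purchases overall
total_visits = 2 * visits_per_page            # 16 visits overall

def prob_a_gets(k: int) -> float:
    """Chance that exactly k of the purchases land on Page A's visits,
    if purchases fall on the 16 visits completely at random."""
    ways = comb(visits_per_page, k) * comb(visits_per_page, total_purchases - k)
    return ways / comb(total_visits, total_purchases)

# Probability of Page A doing at least as well as it actually did.
p_value = sum(prob_a_gets(k) for k in range(purchases_a, visits_per_page + 1))
print(f"Chance of A getting 6+ purchases by luck alone: {p_value:.3f}")
```

Under that assumption, Page A comes out at least this far ahead about half the time, so 16 visits settle nothing.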
“Fair enough.”
In other words, the results are not yet statistically significant. We’re gonna need more data! So you get permission from management to run the test for two weeks, after which both pages have 5000 recorded data points. You find that Page A has more 1 values than Page B:
                     0      1      Total
Counts for Page A    1725   3275   5000
Counts for Page B    2198   2802   5000
In the jargon of A/B testing, Page A now has a conversion rate of 66%, while Page B only has a 56% conversion rate.
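The conversion rates can be checked directly from the table (65.5% for Page A, which rounds up to 66%, and 56.04% for Page B). The sketch below also runs a two-proportion z-test with a normal approximation to ask whether a gap this size could plausibly be luck. The z-test is my choice for illustration, not something the chapter prescribes.

```python
from math import erfc, sqrt

# Counts from the table: (no purchase, purchase) for each page.
counts = {"A": (1725, 3275), "B": (2198, 2802)}

def conversion_rate(page: str) -> float:
    """Share of visits to the page that ended in a purchase."""
    misses, hits = counts[page]
    return hits / (misses + hits)

rate_a = conversion_rate("A")
rate_b = conversion_rate("B")

# Two-proportion z-test: pool both pages to estimate the shared rate
# we would expect if the pages were really the same.
n_a, n_b = sum(counts["A"]), sum(counts["B"])
pooled = (counts["A"][1] + counts["B"][1]) / (n_a + n_b)
std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (rate_a - rate_b) / std_err
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value

print(f"Page A: {rate_a:.1%}, Page B: {rate_b:.1%}, z = {z:.2f}, p = {p_value:.2g}")
```

Here z comes out well above 9, so the p-value is far below any usual threshold: a gap this size would be very hard to explain by chance alone.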
“But is it statistically significant yet?” / “I see how this could solve the lemonade dilemma ...”
Excellent! Which of the following are like “pages”, i.e. things we need to decide between?
“Ingredients” / “Recipes” / “Friends at the party”
If you chose ingredients or friends: Not quite. The pages are things we’re trying to decide between. For the party, you’re trying to work out which lemonade recipe is the best, so pages are similar to recipes, like “Lemon, sugar and water in 1:2:4 ratio”.
If you chose recipes: Right! We’re trying to decide between recipes, so the pages are similar to recipes, like “Lemon, sugar and water in 1:2:4 ratio”.
So, serving up Page B to a visitor and then watching whether they purchase is like following Recipe B, and then asking a friend to try the lemonade. You will present a cup to your friend, and they can choose to drink it all (1) or leave it after a sip (0).
“What if they just drink half?”
Then tell them to read the instructions more carefully! Yes, we’re reducing all the subtlety of human aesthetics down to a binary judgement. But this makes it easier to model.
We’ve reached what I call the simple method for A/B testing: wait a long time, for a large number of visitors, so that our result is very likely to be statistically significant. In this case, I recommend calling 10 000 friends, and splitting them into two groups: 5000 will get Recipe A, and the other 5000 will get Recipe B.
“Are you kidding? 10 000 friends? Give half bad lemonade? Before 8pm?”
Hmm ... I see your predicament. Yes, the simple method requires a lot of test subjects. And a lot of time. And many of the test subjects (I mean, friends) will get the bad version. Yes, the simple method has several problems ...
“Let me guess, you’ll tell me there’s a better way.”
Yes, there’s a better way! 🤠 A/B testing can be modeled as a multi-armed bandit problem.
Picture a slot machine with three arms (above). Via an industrial spy, you know how the machine works: the first arm gives $10 when pulled, the second arm gives $1, and the third gives $25. To make the most money in the fewest pulls, which arm should we be pulling?
“Pick number 3, m’lord!”
Yes, I agree: the optimum strategy is to pull arm 3 every time, because it gives out the most money. That was a fairly easy one.
Now picture a different slot machine. As before, it has three arms, each with a fixed dollar value. But this time, you don’t know the values ahead of time. You want to take as few tries as possible to find the best arm, as then you can focus on pulling the best arm as often as you can. How many pulls do you need to make in total, before you know the best arm?
“1 pull” / “2 pulls” / “3 pulls” / “More than 3 pulls”
If you chose anything other than 3 pulls: Actually, I would say we need to pull each arm exactly once, to know the fixed dollar value of each arm. After that, we just continue with whichever value is highest.
If you chose 3 pulls: Right: we pull each arm once, and then continue with whichever gave out the highest dollar amount.
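The “pull each arm once, then stick with the winner” strategy can be sketched as follows. The payouts are hypothetical (they reuse the dollar values from the first machine), and `find_best_arm` and `pull` are names made up for this illustration. The approach only works because each arm always pays the same amount.

```python
from typing import Callable

def find_best_arm(pull: Callable[[int], int], n_arms: int) -> int:
    """Pull every arm once and return the index of the highest payout.
    Only valid when each arm always pays the same fixed amount."""
    payouts = [pull(arm) for arm in range(n_arms)]
    return max(range(n_arms), key=lambda arm: payouts[arm])

# Hypothetical hidden payouts; the player cannot see this list.
hidden_payouts = [10, 1, 25]
pull_log = []   # records which arms were pulled, in order

def pull(arm: int) -> int:
    """Pull one arm of the machine and return what it pays out."""
    pull_log.append(arm)
    return hidden_payouts[arm]

best = find_best_arm(pull, n_arms=3)
print(f"Best arm: number {best + 1}, found after {len(pull_log)} pulls")
```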
Now let’s apply the bandit model to your lemonade dilemma. Each arm is labelled with a possible recipe. When you pull the arm “Recipe B”, it dispenses a cup of that lemonade, you give it to a friend to evaluate, and then you record the result (0 or 1).
If there are just two arms, Recipe A and Recipe B, how many arm pulls will you need to make before you know the best recipe? (Note: “best” here means “most likely to be liked by a random new friend.”)
“Twice” / “More than twice”
If you chose twice: I think that’s a sensible answer if all your friends have the same preferences. You’re using the greedy™ approach: try each arm, then always pull the arm that gives maximum reward.
But real people all have slightly different preferences: the pulling of arms is now probabilistic. What if the first two friends have weird tastes? Then the greedy approach would give every future friend their unrepresentative weird preference!
If you chose more than twice: I agree! Real people all have slightly different preferences, so the pulling of arms is now probabilistic.
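A small simulation shows how often the greedy approach goes wrong once tastes vary. The true preference rates below (70% for Recipe A, 50% for Recipe B) are made up for illustration, and breaking ties with a coin flip is my assumption, since the chapter doesn’t say what greedy does when both friends agree.

```python
import random

def greedy_pick(p_a: float, p_b: float, rng: random.Random) -> str:
    """Serve each recipe to one friend, then commit to whichever was liked.
    If both or neither were liked, break the tie with a coin flip."""
    liked_a = rng.random() < p_a
    liked_b = rng.random() < p_b
    if liked_a != liked_b:
        return "A" if liked_a else "B"
    return rng.choice(["A", "B"])

# Hypothetical true preferences: Recipe A really is the better recipe.
P_A, P_B = 0.7, 0.5
rng = random.Random(0)
trials = 100_000
wrong = sum(greedy_pick(P_A, P_B, rng) == "B" for _ in range(trials))
wrong_share = wrong / trials
print(f"Greedy commits to the worse recipe in {wrong_share:.1%} of parties")
```

With these made-up numbers, the exact chance of committing to the worse recipe is 0.3 × 0.5 + ½ × 0.5 = 0.40, so two tastings are nowhere near enough.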
We now have what’s called a multi-armed bandit. Each pull now has a random payout, but some arms have a higher average payout than other arms, so we want to pull those arms once we find them.
We’ll say that each recipe has some probability of success, p. We’re looking for the recipe with the highest probability p. To get an idea of the probability p for a recipe, we need to take several samples.
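To see why several samples are needed, here is a sketch that treats each tasting as a draw that succeeds with probability p and estimates p as the share of successes. `TRUE_P = 0.6` is a made-up value; in reality it is the unknown we are after.

```python
import random

def estimate_p(true_p: float, n_samples: int, rng: random.Random) -> float:
    """Serve the recipe to n_samples friends and return the share who liked it."""
    likes = sum(rng.random() < true_p for _ in range(n_samples))
    return likes / n_samples

TRUE_P = 0.6   # hypothetical; normally hidden from us
rng = random.Random(1)
estimates = {n: estimate_p(TRUE_P, n, rng) for n in (10, 100, 10_000)}
for n, est in estimates.items():
    print(f"{n:>6} samples -> estimated p = {est:.3f}")
```

Small samples can land far from the true value, while large ones settle close to it, which is exactly the trade-off between accuracy and the number of friends you have to serve.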
“I knew that, Andrew. I still don’t have 10 000 test subjects.”
But you’re finally in a position to see the better method! Enter Thompson sampling. Here’s the idea.
Say you give Recipe A to a friend, and she likes it! 🤩 How would this change the chance that, when the next friend comes along, you’ll give them Recipe A?
“More likely” / “Less likely” / “No change”
If you answered “No change”: That’s reasonable. After all, that was the moral you just learned: perhaps Recipe $A$ is bad, and the first person just has weird taste! Doesn’t that mean you don’t have enough information to make a decision?
However, in the real world, I think you would want to please the next friend, and so you would be a bit more likely to serve Recipe $A$.
If you answered “Less likely”: I’m not your life coach, but perhaps your ‘friendships’ are unhealthy? I think you should want to please the next friend, so you should be a bit more likely to serve Recipe $A$.
If you answered “More likely”: That’s the idea. Recipe $A$ seems good and you want to please the next friend, so you should be a bit more likely to serve Recipe $A$. (But you should still be suspicious: maybe the first person just had weird taste!)
Thompson sampling uses the information you gain during the test to make it more efficient while you are testing. Roughly: you serve the more promising recipes more often, but ensure that you still serve the less promising recipes occasionally, because you still want to confirm that they’re bad.
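To make the rough description above concrete, here is a minimal simulation sketch. Everything named in it is an assumption for illustration: the function `thompson_sampling`, the hypothetical true preference rates `0.7` and `0.4`, and the choice of a Beta distribution with parameters "likes plus one" and "dislikes plus one" to encode the current belief about each recipe.

```python
import random


def thompson_sampling(true_p, n_friends, seed=0):
    """Simulate serving lemonade to n_friends, choosing a recipe each time.

    true_p maps each recipe name to a hypothetical chance that a friend
    likes it. Returns how many times each recipe was served.
    """
    rng = random.Random(seed)
    likes = {name: 0 for name in true_p}     # friends who liked the recipe
    dislikes = {name: 0 for name in true_p}  # friends who did not
    served = {name: 0 for name in true_p}

    for _ in range(n_friends):
        # Draw one plausible "like" probability per recipe from the current
        # belief. Beta(likes + 1, dislikes + 1) is an assumed encoding.
        guesses = {
            name: rng.betavariate(likes[name] + 1, dislikes[name] + 1)
            for name in true_p
        }
        # Serve whichever recipe drew the highest guess.
        choice = max(guesses, key=guesses.get)
        served[choice] += 1
        # Simulate the friend's reaction and record it.
        if rng.random() < true_p[choice]:
            likes[choice] += 1
        else:
            dislikes[choice] += 1
    return served


if __name__ == "__main__":
    print(thompson_sampling({"A": 0.7, "B": 0.4}, n_friends=500))
```

Because each choice is driven by a random draw rather than by the current best estimate, a recipe that looks worse still gets served now and then, which is the "occasionally confirm that they’re bad" behaviour described above.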
This method was first introduced in 1933, but was largely ignored by the academic community for decades. Then, at the beginning of the 2010s, it was shown to have very strong practical applications, which has led to widespread adoption of the method.
The idea can be complicated to understand, but it is simple to implement and incredibly powerful. You’ll impress your friends with the perfect glass of lemonade.
“Okay, but it’s 6pm. People are arriving soon!”
The Bernoulli distribution
Patience! First, you must understand a couple of probability distributions!
In your earlier A/B test, you flipped a coin to determine which page to serve a visitor. In English, you might say “Either the coin lands Heads or Tails. It’s an unfair coin, where the probability is $60\%$ that it lands Heads. I flip the coin, and call its result $C$.” Here’s how you might sketch that:
Mathematicians use a different language to say the same thing: “$C \sim \text{Bernoulli}(p = 0.6)$. Therefore $P(C = 1) = 0.6$, and $P(C = 0) = 1 - 0.6 = 0.4$.”
You can read this aloud as “$C$ follows a Bernoulli distribution, with parameter $p$ equal to $0.6$. Therefore the probability that $C = 1$ is $0.6$, and the probability that $C = 0$ is $0.4$.” The mathematician might sketch this like so:
Now, let’s check your understanding. Say I have a biased coin that lands Tails a full $70\%$ of the time. I flip the coin, and call its result $D$. How would you describe $D$?
“$D \sim \text{Bernoulli}(p = 0.7)$” / “$D \sim \text{Bernoulli}(p = 0.3)$”
If you answered “$D \sim \text{Bernoulli}(p = 0.7)$”: It’s the other way around. $p$ is the probability that $D = 1$, i.e. Heads. If $D$ is $70\%$ likely to be Tails, then it’s $30\%$ likely to be Heads. We denote $30\%$ as $0.3$.
If you answered “$D \sim \text{Bernoulli}(p = 0.3)$”, you’re right: $D$ is $30\%$ likely to be Heads.
Similarly, each lemonade recipe has a certain underlying probability $p$ describing people’s preference for it. Say $R$ is the result of offering a lemonade to a random friend. They will either drink it ($R = 1$) or reject it ($R = 0$). In probability-speak, we can write $R \sim \text{Bernoulli}(p)$. We’re imagining each recipe is a biased coin, and trying to find the recipe with the highest $p$.
Sampling
Once you have a distribution, you can generate data that fits that distribution by sampling. To sample from the distribution below, imagine randomly picking one of the small blue squares. Once you’ve picked a random square, output the number below it.
By following this procedure, you’re sampling from the distribution $\text{Bernoulli}(p = 0.6)$. By doing it repeatedly, you’ll generate a list of $1$s and $0$s where about $60\%$ of the results will be $1$. You can use this technique to sample from any distribution.
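The square-picking procedure can be imitated in a few lines of Python. This is only a sketch: the helper name `sample_bernoulli`, the seed, and the sample size of 10,000 are all choices made for illustration.

```python
import random


def sample_bernoulli(p, rng):
    """Return 1 with probability p, otherwise 0."""
    # rng.random() is uniform on [0, 1), so it falls below p a fraction p of the time.
    return 1 if rng.random() < p else 0


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [sample_bernoulli(0.6, rng) for _ in range(10_000)]
fraction_of_ones = sum(samples) / len(samples)

print(samples[:10])      # the first few 1s and 0s
print(fraction_of_ones)  # should land close to 0.6
```

The fraction of $1$s will rarely be exactly $0.6$, but it drifts closer as the number of samples grows.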
“Cool, but ... I already have the $1$s and $0$s! I need to figure out $p$!”
The Beta distribution
Indeed! Enter the Beta distribution. In a sense, the Beta distribution is the inverse of sampling from a Bernoulli distribution. It lets us go backwards: “given some samples from a Bernoulli distribution, what is its probable value of $p$?”
Let’s build some intuition. Say you make Recipe $Q$, and test it $100$ times, observing $49$ successes and $51$ failures. What do you think Recipe $Q$’s value of $p$ is?
“Close to $1$” / “Close to $0.5$” / “I don’t know”
If you answered “Close to $1$”: If $p$ were close to $1$, that would mean it’s nearly always a success, so you should expect nearly $100$ successes. But you observed only $49$. I think $p$ is closer to $0.5$.
If you answered “Close to $0.5$”: I think so too. It’s possible that, say, $p = 0.01$, but very unlikely. It’s much more likely that $p = 0.5$, or close to that.
If you answered “I don’t know”: Fair enough; you can’t know exactly. But you should have a pretty good idea! If $p = 0.01$, for instance, it would be very unlikely to observe $49$ successes and $51$ failures.
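The claim that $p = 0.01$ makes this observation very unlikely can be checked directly. The sketch below assumes the $100$ tests are independent and uses the standard binomial formula; the function name `prob_of_observation` is made up for this example.

```python
from math import comb


def prob_of_observation(p, successes=49, failures=51):
    """Probability of seeing exactly this many successes and failures,
    assuming independent tests that each succeed with probability p."""
    n = successes + failures
    return comb(n, successes) * p**successes * (1 - p)**failures


print(prob_of_observation(0.5))   # about 0.078
print(prob_of_observation(0.01))  # vanishingly small
```

Neither number is large, since any single exact outcome is fairly improbable, but the gap between them is what matters: the data are overwhelmingly better explained by $p = 0.5$ than by $p = 0.01$.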
If we sketched this intuition, we’d get something like the following:
The above sketch is a probability distribution. Each small blue square is a possible value of $p$. This models our belief about $p$ after $49$ successes and $51$ failures. The value of $p$ is probably around $0.5$. It’s very unlikely to be near $0$ or $1$. But we wouldn’t be too surprised if, say, $p = 0.45$.
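One way to reproduce a sketch like this numerically is to score every candidate value of $p$ on a grid by how well it explains the data, then rescale the scores so they sum to one. This is a rough approximation under two assumptions not stated in the text: every value of $p$ is considered equally plausible before any tasting, and a grid of $101$ points is fine enough. The name `belief_over_p` is hypothetical.

```python
def belief_over_p(successes, failures, steps=100):
    """Approximate our belief about p on an evenly spaced grid from 0 to 1."""
    grid = [i / steps for i in range(steps + 1)]
    # Weight each candidate p by how well it explains the observations.
    weights = [p**successes * (1 - p)**failures for p in grid]
    total = sum(weights)
    return {p: w / total for p, w in zip(grid, weights)}


belief = belief_over_p(49, 51)
most_plausible = max(belief, key=belief.get)
mass_near_half = sum(w for p, w in belief.items() if 0.4 <= p <= 0.6)

print(most_plausible)  # 0.49
print(mass_near_half)  # most of the belief sits between 0.4 and 0.6
print(belief[0.45])    # smaller than the peak, but far from negligible
```

The output matches the intuition in the sketch: values near $0$ or $1$ get essentially no weight, while $p = 0.45$ remains a reasonable candidate.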
Mathematicians call the above distribution $\text{Beta}(50, 52)$. The Beta distribution has two parameters, $\alpha$ and $\beta$, which represent the number of $1$s and $0$s we’ve seen.
“Cool, but hang on ... those numbers look off!”
Well spotted! The parameters $\alpha$ and $\beta$ are the number of $1$s and $0$s we’ve seen ... plus one! So if we’ve seen $49$ successes, we should set $\alpha = 50$, and if we’ve seen 51 \def\A{\purple{\bold{A}}}
\def\B{\blue{\bold{B}}}
\def\a{\purple{a}}
\def\b{\blue{b}}
\def\1{\green{\bold{1}}}
\def\0{\red{\bold{0}}}
\def\al{\green{\alpha}}
\def\be{\red{\beta}}
\def\BETA#1#2{\text{Beta}(\green{#1}, \red{#2})}
\def\BERN#1{\text{Bernoulli}(\blue{#1})}
\def\p{\blue{p}}
51 51 failures, we should set β = 52 \def\A{\purple{\bold{A}}}
\def\B{\blue{\bold{B}}}
\def\a{\purple{a}}
\def\b{\blue{b}}
\def\1{\green{\bold{1}}}
\def\0{\red{\bold{0}}}
\def\al{\green{\alpha}}
\def\be{\red{\beta}}
\def\BETA#1#2{\text{Beta}(\green{#1}, \red{#2})}
\def\BERN#1{\text{Bernoulli}(\blue{#1})}
\def\p{\blue{p}}
\be=52 β = 52 .
How bizarre!
We won’t go into the reasons for the “plus one” here. Let’s just check you understand. You encounter another one-armed bandit. When you pull its arm, it cashes out at some unknown success rate $z$. But you’ve never pulled the arm: you’ve observed zero successes and zero failures. Which distribution models your prior belief about $z$?
$z \sim \text{Beta}(0, 0)$
$z \sim \text{Beta}(1, 1)$
It’s $z \sim \text{Beta}(1, 1)$. Your observation counts are $(0, 0)$, so after adding one to each number, the Beta parameters should be $(1, 1)$.
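The “plus one” rule is small enough to sketch in a few lines of Python. The helper name `beta_params` is made up for this illustration and does not come from any library.

```python
def beta_params(successes, failures):
    """Turn observed counts into Beta parameters using the "plus one" rule."""
    alpha = successes + 1  # number of 1s seen, plus one
    beta = failures + 1    # number of 0s seen, plus one
    return alpha, beta


# 49 successes and 51 failures, as in the earlier example.
print(beta_params(49, 51))

# A bandit whose arm has never been pulled.
print(beta_params(0, 0))
```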
Ahh, yes.
The sketch above is the Beta distribution’s probability density function, $f(x)$. You can draw it more precisely using its equation:
$$f(x) = \frac{x^{\alpha - 1}(1-x)^{\beta - 1}\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}$$
We won’t derive or use this equation here, but here are some example plots:
You’ve seen already that the blue line is $\text{Beta}(50, 52)$. (Note, as before: you should be visualizing each line as representing all the little squares stacked under that line, each square being a possible value. But we only draw the line, so that we can draw multiple distributions in the same plot.)
There are three other lines above, and one of them is $\text{Beta}(2, 3)$. But which line is it? Think about what $\text{Beta}(2, 3)$ represents, and use your intuition instead of calculating.
The red line
The orange line
The green line
It’s the orange line. $\text{Beta}(2, 3)$ represents the probability of each possible $p$ value after seeing $1$ success and $2$ failures.
The red line is completely flat: it claims that all possible values of $p$ are equally likely.
But we know at least that $p \neq 0$, because that would result in only failures (i.e., we would have seen $3$ failures). So it cannot be the red line.
For the same reason, $p = 1$ is also impossible: we would have seen $3$ successes. So the graph of $\text{Beta}(2, 3)$ must also end at $0$. This rules out the green line.
This leaves only the orange line. To confirm, notice that it peaks around $0.3$. A success probability of $30\%$ seems very reasonable after seeing $1$ success and $2$ failures.
By the way, the red line is $\text{Beta}(1, 1)$, and the green line is $\text{Beta}(3, 1)$. If you feel like testing yourself more, here’s a beta distribution calculator.
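If you’d rather check these shapes numerically than by eye, here is a minimal Python sketch. It assumes the standard Beta density, whose normalizing constant is $\Gamma(\alpha + \beta) / (\Gamma(\alpha)\Gamma(\beta))$. The names `beta_pdf` and `peak_location` are invented for this example.

```python
from math import gamma


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 <= x <= 1."""
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def peak_location(alpha, beta, steps=1000):
    """Grid search for the x at which the density is highest."""
    xs = [i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: beta_pdf(x, alpha, beta))


# Beta(1, 1) is flat: the density has the same height everywhere.
print([beta_pdf(x, 1, 1) for x in (0.0, 0.5, 1.0)])

# Beta(2, 3) touches zero at both ends and peaks in between.
print(beta_pdf(0.0, 2, 3), beta_pdf(1.0, 2, 3), peak_location(2, 3))

# Beta(3, 1) keeps rising up to x = 1, so it does not end at zero.
print(beta_pdf(1.0, 3, 1))
```

The grid search is deliberately crude. It is only there to confirm that the peak of $\text{Beta}(2, 3)$ sits near the value you would guess from $1$ success and $2$ failures.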
But seriously, Andrew. My friends are arriving.
Building a Thompson sampler 🧱
Okay, don’t panic: with Thompson sampling, you can test throughout the party, rather than having to do it in advance! Here’s your clipboard to tally up the results:
$$
\begin{matrix}
 & \text{Wins} & \text{Losses} \\
\text{Recipe } A & 0 & 0 \\
\text{Recipe } B & 0 & 0 \\
\end{matrix}
$$
Recipe $A$ has some success rate $a$, and Recipe $B$ has some success rate $b$. We want to figure out which is higher, $a$ or $b$. We can draw their probability distributions with our current (lack of) knowledge:
$$
\begin{align*}
a &\sim \text{Beta}(1, 1) \\
b &\sim \text{Beta}(1, 1)
\end{align*}
$$
Consult the plots above. What does $\text{Beta}(1, 1)$ look like?
A bell curve
A flat line
It’s a flat line. See the earlier example plots: $\text{Beta}(1, 1)$ is the straight horizontal line.
Okay, I see it now.
This distribution means we have no prior belief about the rates $a$ or $b$. Or, rather, we think all success rates between $0$ and $1$ are equally likely.
So, Claire just arrived. Should I give her $A$ or $B$?
Oh, hi Claire! Finally we get to use the Thompson sampling algorithm! It’s a very short algorithm: guess the values of $\boldsymbol{a}$ and $\boldsymbol{b}$, then serve the recipe with the higher guess.
What do you mean, “guess the values”?
Guessing the value of $a$ means sampling from $\boldsymbol{a}$’s probability distribution. Recall the “small blue squares under the line” visualization. To make a guess, we pick a random square, then output its value on the $x$-axis.
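The “pick a random square under the line” picture corresponds to a technique usually called rejection sampling. Here is a rough Python sketch of it, assuming both Beta parameters are at least $1$ so that the curve has a finite peak. The function names are invented for this example. In practice, Python’s standard library already provides `random.betavariate` for the same job.

```python
import random
from math import gamma


def beta_pdf(x, alpha, beta):
    """Height of the Beta(alpha, beta) curve at x."""
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def guess(alpha, beta, rng=random):
    """Sample from Beta(alpha, beta) by picking a random point under its curve.

    Assumes alpha >= 1 and beta >= 1.
    """
    if alpha == 1 and beta == 1:
        peak = 1.0  # the flat line has height 1 everywhere
    else:
        mode = (alpha - 1) / (alpha + beta - 2)
        peak = beta_pdf(mode, alpha, beta)
    while True:
        x = rng.random()         # random position along the x-axis
        y = rng.random() * peak  # random height inside the bounding box
        if y < beta_pdf(x, alpha, beta):
            return x             # the point landed under the curve: keep it


rng = random.Random(0)  # seeded only so the run is repeatable
print(guess(1, 1, rng))
print(guess(2, 3, rng))
```

Points that land above the curve are thrown away and redrawn, so tall, narrow curves waste more attempts than flat ones.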
We need to make two guesses, $g_a$ and $g_b$. Now let’s try out the algorithm on Claire.
Okay Andrew, sample from $\text{Beta}(1, 1)$, please.
Done! Your random sample is: $g_a = 0.76$.
Thanks. Sample $\text{Beta}(1, 1)$ again, please.
Done! Your random sample is: $g_b = 0.63$. Now to check: do you recall what these guesses mean? When we guess that $a = 0.76$, what are we really saying?
Recipe $A$ has a success rate of $76\%$
Recipe $A$ is $76\%$ likely to be best
It’s the first one: we’re guessing the success rate of each recipe, then choosing which to serve based on those guesses. The probability that Recipe $A$ is best is not something we ever calculate in this algorithm.
So now, which recipe should you serve to Claire?
Choices: Recipe A or Recipe B.

If you chose Recipe B: No, you should serve A. Check: $g_a > g_b$, because $0.76 > 0.63$. Thompson says you must serve the variant with the highest guess, which is A.
Right: $g_a > g_b$, so you should serve A.
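The "serve the highest guess" rule fits in a couple of lines of Python. This is only a minimal sketch: the function name `choose_recipe` and the dictionary of guesses are assumptions made for illustration, not code from the lesson.

```python
def choose_recipe(guesses):
    """Return the recipe whose sampled guess is highest."""
    # max() with key=guesses.get compares recipes by their guess values.
    return max(guesses, key=guesses.get)


# The guesses made for Claire: g_a = 0.76 and g_b = 0.63.
print(choose_recipe({"A": 0.76, "B": 0.63}))  # prints A
```

Note that the rule only compares the two guesses. It never asks how confident we are that one recipe is better.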
Okay, she’s trying Recipe A ...
Oh no, Claire hated it! 🤢 She takes herself to the restroom to recover, and you dutifully record her reaction on the clipboard:
$$
\begin{matrix}
& \text{Wins} & \text{Losses} \\
\text{Recipe A} & 0 & \mathbf{1} \\
\text{Recipe B} & 0 & 0 \\
\end{matrix}
$$

Now John is arriving. No time for greetings! You need to make two new guesses. For Recipe A, which distribution do you need to sample from?
Choices: $\text{Beta}(1, 1)$ or $\text{Beta}(1, 2)$.

If you chose $\text{Beta}(1, 1)$: No, that’s what it was last time, but now your wins and losses table has updated. Recipe A has $0$ wins and $1$ loss. We add $1$ to each of these, getting $(1, 2)$.
Right. Recipe A has $0$ wins and $1$ loss. We add $1$ to each of these, getting $(1, 2)$.
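The "add 1 to each count" step can be written down directly. In this small sketch the helper name `beta_params` is made up for illustration; it simply turns one clipboard row into the two parameters of the Beta distribution to sample from.

```python
def beta_params(wins, losses):
    """Map a clipboard row to (alpha, beta) by adding 1 to each count."""
    return wins + 1, losses + 1


print(beta_params(0, 1))  # Recipe A after Claire: (1, 2)
print(beta_params(0, 0))  # Recipe B, still untested: (1, 1)
```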
You get two new guesses, $g_a = 0.31$ and $g_b = 0.59$.
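If you would rather let a computer do the guessing, Python's standard library can sample from a Beta distribution with `random.betavariate`. The sketch below assumes a hypothetical helper, `sample_guess`, that takes a recipe's wins and losses from the clipboard.

```python
import random


def sample_guess(wins, losses, rng=random):
    """Draw one guess for a recipe's success rate from Beta(wins + 1, losses + 1)."""
    return rng.betavariate(wins + 1, losses + 1)


rng = random.Random(0)  # seeded only so that reruns are repeatable
g_a = sample_guess(0, 1, rng)  # Recipe A: 0 wins, 1 loss
g_b = sample_guess(0, 0, rng)  # Recipe B: 0 wins, 0 losses
print(f"g_a = {g_a:.2f}, g_b = {g_b:.2f}")
```

The guesses are random draws, so the printed numbers need not match the $0.31$ and $0.59$ in the story.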
Okay, John’s trying Recipe B.
... aaaand he loves it! 🤩 Once again, you record his reaction:
$$
\begin{matrix}
& \text{Wins} & \text{Losses} \\
\text{Recipe A} & 0 & 1 \\
\text{Recipe B} & 1 & 0 \\
\end{matrix}
$$

Let’s look at the distributions of $a$ and $b$ now:
Lucy’s about to arrive. Looking at the above distributions, which recipe are we more likely to serve her?
Choices: Recipe A or Recipe B.

If you chose Recipe A: Recall again that each line represents the stack of boxes under that line. Most of line $a$’s boxes are close to $0$, but most of line $b$’s boxes are close to $1$. So we’re more likely to get a larger guess for $b$, which would cause us to serve Recipe B.
Right: most possible values of $a$ are close to $0$, while most possible values of $b$ are close to $1$. So we’re more likely to get a larger guess for $b$.
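How much more likely? One way to check is to draw many pairs of guesses and count how often $g_b$ comes out on top. The sketch below does that by simulation; the function name `prob_b_beats_a` and its parameters are assumptions for this example. For these two particular distributions, $\text{Beta}(1, 2)$ for $a$ and $\text{Beta}(2, 1)$ for $b$, the exact answer works out to $5/6$, so the estimate should land near $0.83$.

```python
import random


def prob_b_beats_a(a_counts, b_counts, trials=100_000, seed=0):
    """Estimate how often B's sampled guess exceeds A's.

    Each *_counts argument is a (wins, losses) pair from the clipboard.
    """
    rng = random.Random(seed)
    a_wins, a_losses = a_counts
    b_wins, b_losses = b_counts
    hits = 0
    for _ in range(trials):
        g_a = rng.betavariate(a_wins + 1, a_losses + 1)
        g_b = rng.betavariate(b_wins + 1, b_losses + 1)
        if g_b > g_a:
            hits += 1
    return hits / trials


# A has 0 wins and 1 loss; B has 1 win and 0 losses.
print(prob_b_beats_a((0, 1), (1, 0)))
```

So Lucy will probably be served Recipe B, but roughly one time in six the guesses fall the other way and she gets Recipe A instead. That leftover chance is what keeps the experiment exploring.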
This doesn’t mean B has won yet, however! After testing these two recipes on a few more friends, you end up with:
$$
\begin{matrix}
& \text{Wins} & \text{Losses} \\
\text{Recipe A} & 7 & 1 \\
\text{Recipe B} & 1 & 5 \\
\end{matrix}
$$

It turns out your first two guests were anomalies! Recipe A is now winning. Our algorithm will now suggest lemonade A to our friends more often so they don’t miss out on the tastiest recipe.
As the number of wins for a lemonade gets higher, the guesses drawn for it (the numbers between 0 and 1) tend to be higher as well, making that lemonade more likely to be picked moving forward. The reverse also holds: as its losses pile up, a lemonade becomes less likely to be picked.
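To put numbers on this, the average of a $\text{Beta}(\alpha, \beta)$ distribution is $\alpha / (\alpha + \beta)$. The sketch below applies that formula to the final clipboard; `mean_guess` is a hypothetical helper name.

```python
def mean_guess(wins, losses):
    """Mean of Beta(wins + 1, losses + 1), i.e. alpha / (alpha + beta)."""
    alpha, beta = wins + 1, losses + 1
    return alpha / (alpha + beta)


print(mean_guess(7, 1))  # Recipe A: Beta(8, 2) -> 0.8
print(mean_guess(1, 5))  # Recipe B: Beta(2, 6) -> 0.25
```

On average, Recipe A's guesses now sit well above Recipe B's, which is why A gets served more often.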
Because Thompson sampling updates itself after each new piece of information, we don’t have to wait two weeks for our results. Instead, each person we test improves the model as we go, letting us use just our small friend group rather than $10\,000$ people. It is far more efficient than the naive method we tried first, and our friends get good lemonade much sooner, with less time lost drinking bad lemonade.
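Putting the steps together, the whole party can be simulated in a short loop: guess, serve, record, repeat. Treat the following as a sketch under stated assumptions rather than the lesson's own implementation. The function name `thompson_party` is made up, and the `true_rates` are invented numbers standing in for how much guests really like each recipe, which a real host never gets to see.

```python
import random


def thompson_party(true_rates, guests, seed=0):
    """Simulate serving `guests` people, one at a time.

    true_rates maps each recipe to a made-up probability that a guest
    likes it; it is used only to simulate reactions.
    """
    rng = random.Random(seed)
    # The clipboard: [wins, losses] per recipe, all zeros to start.
    clipboard = {recipe: [0, 0] for recipe in true_rates}
    for _ in range(guests):
        # 1. Guess each recipe's success rate from Beta(wins + 1, losses + 1).
        guesses = {
            recipe: rng.betavariate(wins + 1, losses + 1)
            for recipe, (wins, losses) in clipboard.items()
        }
        # 2. Serve the recipe with the highest guess.
        served = max(guesses, key=guesses.get)
        # 3. Record the guest's reaction on the clipboard.
        liked = rng.random() < true_rates[served]
        clipboard[served][0 if liked else 1] += 1
    return clipboard


print(thompson_party({"A": 0.8, "B": 0.3}, guests=500))
```

With rates this far apart, the printed clipboard should show most guests being served Recipe A, with only a handful of servings spent learning that Recipe B is worse.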
Thanks, Andrew! You’ve saved the party!
Wrapping up 📦
In the end, we have a few lines of decision-making code, based on sound probabilistic principles, that is guaranteed to converge. We can increase the satisfaction of our friends and quickly discover which lemonade is the best. In this case, according to our experiments, the answer is Recipe A: 1 cup lemon juice, 1 cup sugar, and 5 cups of water. We didn’t need to test our method on $10\,000$ friends to be statistically confident. And we come away with a foolproof method that we can apply to any number of culinary needs.
Next in Everyday Data Science: