∫ The reign of expectation terror
Kernel:
- In a certain expectation game, there's a finite convergence strategy.
Humans are tenacious, persistent, and hardly know when to quit, whether it's about when to stop playing games and go to sleep, or when to sell holding stocks to cut loss. In computer science vernacular, it's called a "stopping problem".
I've read a great blog by Samir Khanhttps://patternsofideas.github.io/posts/expectations/. He argued that in some scenarios, when it comes to choose the optimal stopping, not even a great tool such as the expectation of payoff helps.
Imagine a game of sequential bets where at each step we could decide to stop and take all the credit we've collected so far or we could take a chance to double it, only if the chance fails then we lose it all.
We could try to do some math to decide whether should we continue at each step. But if the chance \(p > 0.5\) You will see that the expected payoff favors we take the bet. Yet if there's anything we learn from probability, we know that eventually we would fail if we don't stop. That we will surely get back empty-handed, or continue betting until we do. So expectation fails us! ... or not?
The reason why we use expectation is to marginalize out all uncontrollable factors, and that reason is pretty solid. If we follow the expectation yet the outcome does not making sense. Perhaps your reward function does not truly reflect what you feel?
Let's us consider the scenario where we don't actually play the game. We only input a number, say k , that tells the game how many times we would take the bets, the game itself will then instantly resolve the game and display the outcome to you. When treating the game this way, I feel more comfortable taking a larger \(k\) from the fact that now I've gotten rid of in-game cognitive biases.
Humans have so many cognitive biases. One in active here is called loss aversion. When the stakes are high enough, gaining more might not be more important than preventing loss. Thus we incline to stop to collect what we already have than going back to nothing. This seems like the bias works against us to define what's the best interest.
Still I would not choose \(k = \infty\), despite the fact that it gives the best expected payoff; i.e., there is a slim probability that I would get a handsome reward. Am I wrong? Or are there some other cognitive biases in work here? If not, then may be the expectation fails me?
While I enjoy Khan's premise and statements, I have to disagree with his conclusion. It's never been about expectation alone. It's also about policies, and we have to use the expectation to derive a sensible policy. One that does not involve betting to the infinity to get the reward.
Suppose that at each step, I'll toss a biased coin that goes head with probability \(q\). If it comes out head, I'll continue playing; otherwise I'll stop and the current reward. What is the expected payoff of this policy?
The expected payoff of this random policy goes to infinity when \(q\) approaches \(\frac{1}{2p}\). Meaning sometimes I would secure the reward, sometimes I will continue, sometimes I will get nothing, but in overall the expected payoff is infinity just like the case where I never stop in the first policy. Now we don't leave our fate at the hand of infinitesimal god, but rather the marvel of mathematics.
Now which policy is faster? Faster means I could expectedly get the same reward while enjoy a shorter game. We could analytically show that the expected stopping point of the random policy is always sooner than the first policy at the same expected payoff.
Given an expected payoff of \(xr > 0\), I could derive the parameters of both policies:So expectation has yet to fail. Because we're not only looking at actions but on the scale of policies when considering the expectation. An intelligence agent is one that chooses policies that is most efficient for the best expected payoff.