How do 'luck' calculations work..

Zorba · July 03, 2010, 10:50:34 PM

Quote from: pck on July 03, 2010, 09:19:53 AM
I agree. What I had assumed was that non-perfect equity calculations should at least be internally consistent, that is, the numbers should add up. Obviously that is not the case. About the reason I still wonder.

It's similar to the fact that 1-ply evaluations are not consistent with 0-ply evaluations. Neural nets aren't necessariliy consistent among plies, in practice, they never are on the whole.

QuoteNevertheless this is very important for the consideration of what we call "skill". If we can attribute the term "skill" only to the whole of a bot's algorithm, that is, to its behaviour in all possible match situations, and if it could further happen that algorithm A1 beats A2 (in the long run), A2 beats A3, and A3 beats A1, then that would eliminate the possibility of attribution of skill for these three bots. Skill would only be a partial order on the set of all possible bot-algorithms. To prove that this is or isn't possible should be difficult.

As long as both players (bot or not) in a bg match are imperfect, this partial ordering can definitely exist and very likely does exist. F.i. against GnuBG 0-ply, some players have developed strategies that with a high probability will lead to positions where the bot starts playing very poorly, leading to almost certain wins from that point. These same players couldn't do that to most expert human players. Yet, these expert human players, when not aware of these special "bot-killer" strategies, might themselves lose against GnuBG 0-ply on average.

What we're aiming for with skill assessments is a measure of error against a perfect bot. That's why apart from 0-ply or 2-ply evaluations, many players use 3- or 4-ply for evaluations and better yet, rollouts, to try and remove as much bias as possible from the bots numbers.

What's so interesting about luck evaluations (compared to error evaluations) is that they are, by definition, unbiased, so they can't favour certain game type or strategies more than any other, in the long run. This is not true for error evaluations, which can be (and actually are) biased, as the bots play certain game types much better than others.

Zorba · July 03, 2010, 11:21:03 PM

Quote from: sixty_something on July 03, 2010, 11:32:53 AM
is it safe to say we can drop the word "trivial" from your and Zorba's somewhat dismissive rebuttals of my suggestions?

i think you may be observing some of the same things i did when i first looked at "luck" calculatioins in detail ... like you, i "still wonder" about the reasons why what seems conceptually obvious simply does not add up when put to the test of striking a balance sheet for "luck" calculations

when observed values don't match theory and assumed concepts, perhaps it is time to revisit both with an open mind .. i am

Sorry for being dismissive, but I'm just trying to be clear on this and avoid confusion. Once you settle for the definition of luck as it has been described by some bg theorists (f.i. Zare in his articles) and the calculation of luck as it has been used in many backgammon programs in the past and the present, then some statements are just false, as they are mathematically incorrect.

"Overall luck tends to zero in the long run", regardless of the moves made by any player, is one such result from using this definition of luck. So I just don't think trying to argue this point makes any sense; it's a mathematical result.

What you can argue about, is how it comes that some people feel "something's wrong" with this approach. Maybe you don't agree with the above definition of luck. But in that case, perhaps you should try to come up with an alternative definition of luck. In the process, you might start to appreciate the current definition of luck

I'd like to mention Zare's article once more, he describes it much better than I can:
http://www.bkgm.com/articles/Zare/AMeasureOfLuck.html

I'm all for an open mind, but I feel pretty strongly that it's no use to draw the conclusion from a very small sample (such as my earlier posted luck curve) that "observed values don't match theory". Actually, pck earlier claimed that in his larger sample, luck got very close to zero. If you do very long sessions with bots playing each other, as many people have done, luck will also tend towards zero.

pck · July 04, 2010, 10:14:14 AM

Quote from: Zorba on July 03, 2010, 10:05:18 PM
Just because GnuBG 0-ply plays both sides, does not mean it plays both sides equally well in individual games. Only in the long run will they show equal skill levels. So the outcome of a particular match is not completely luck dependent: one side may have given up much more equity in errors than the other side.

There is one case where gnu (or any bot) should play both sides equally well: If analysis and player plys are set to equal levels, gnu should not find that any player has made any mistake. And indeed it doesn't. Both ERs are always 0. Hence all equity change in the match should be attributed to luck and the luck adjusted result should always be zero.

However, the phenomenon blitzxz and I described persists when ply_analysis = ply_players.

Quote from: Zorba on July 03, 2010, 10:05:18 PM
Furthermore, GnuBG's luck evaluations, just like its error evaluations, are not perfect, so this is another factor contributing to inaccuracies, especially noticeable in the short run.

I assume ply_analysis = ply_players. Analyses may not be perfect, but the hard question is, why can gnu not make the numbers consistent (as opposed to correct) within any of its n-ply worlds, however flawed they may be in an absolute sense. Why is it not possible, with both ERs = 0 and skill out of the picture, to add up all equity changes produced by the dice rolls and get a total change of +50% for the winner and -50% for the loser?

This question is reinforced by the Zare articles you linked to:

Final − Initial = Net Luck + Net Skill in http://www.bkgm.com/articles/Zare/HedgingTowardSkill.html

(A formula like this, which is based on the concept that what is not skill is precisely luck, was actually the reason I posed my question in the first place.)

Quote from: Zorba on July 03, 2010, 10:05:18 PM
BTW, for practical purposes: GnuBG defaults to using 0-ply for its luck calculations. Luck calculations can be considered more difficult than normal evaluations, as 21 different dice rolls and their best plays have to be considered, so it's no surprise that 0-ply luck analysis can give rather inaccurate results generally. Use GnuBG's command line and the command boomslang mentioned to increase the ply level of GnuBG's luck analysis.

With different methods/ply-levels for luck and move evaluations, it is clear that discrepancies can occur (see below).

Quote from: Zorba on July 03, 2010, 10:05:18 PM
Another interesting thing to consider here is that a n-ply luck analysis is closer to a (n+1)-ply error analysis than to a n-ply error analysis, due to the 21 different rolls that have to be analyzed. This is also true for the time it takes to do such an analysis: using 2-ply for luck analysis is about as slow as doing a 3-ply error analysis.

I ran a few tests. First I set luckanalysis and everything else to 2-ply (which took much longer than 3-ply analysis). The discrepancies still showed up but seemed to be lower than before (around 3-6% as opposed to 7-15%). But I could run only very few test matches because evaluation took so long (Intel Core 2 Duo 7300, 2.6 GHz, gnu set to use both cores). Next I tried luckanalysis at 1-ply and everything else at 2-ply (your n -> n+1 suggestion). In my first trial match, the LAR discrepancy was almost zero. But my celebratory mood subsided when in further trials it went back to 5-7%, and in one particular 1-point match I got an LAR as big as 23%.

So in the experiments the problem keeps showing up.

The conceptual question is still open. Luck is that which isn't skill. So why have seperate, independent luck and skill analyses? Why not have only a skill analysis (= best move analysis) and report luck as whatever equity changes are not due to skill? (Or vice versa.) Then the numbers would be consistent. Using two different evaluation modes will break formulas such as Zare's and create confusion. If there are any conceptual reasons for doing this, we have not begun to touch them yet.

pck · July 04, 2010, 10:25:21 AM

Quote from: Zorba on July 03, 2010, 08:06:19 PM
I don't know if it that's true, it seems very hard to find out what kind of distribution of net luck per move really underlies the graph.

Look familiar?

pck · July 04, 2010, 10:44:27 AM

Quote from: Zorba on July 03, 2010, 10:50:34 PM
What we're aiming for with skill assessments is a measure of error against a perfect bot. That's why apart from 0-ply or 2-ply evaluations, many players use 3- or 4-ply for evaluations and better yet, rollouts, to try and remove as much bias as possible from the bots numbers.

What's so interesting about luck evaluations (compared to error evaluations) is that they are, by definition, unbiased, so they can't favour certain game type or strategies more than any other, in the long run. This is not true for error evaluations, which can be (and actually are) biased, as the bots play certain game types much better than others.

This is a (clear, thanks for that) description of what happens, but not a justification for it. I was asking for the latter. The luck / skill evaluations as you describe them will necessarily break formulas such as Zare's, which express fundamental conceptions we have about luck and skill. So the numbers generated by gnu will conceptually be at odds with the understanding of luck and skill we start out with.

Of course it may be worth to pay that price if there are other insights to be gained from proceeding like this. But the discrepancies which the above methods introduce do not give us a measure, or get us closer to, an absolute measure of correctness (of best move or luck evals) - or if they do, we have no way of knowing it (rollouts do not guarantee greater accuracy, although it is of course fair to assume they do in most cases).

Still, I'd rather keep it simpler and have consistent n-ply worlds in which luck and skill add up. It may just be a matter of taste.

pck · July 04, 2010, 11:11:07 AM

Quote from: Zorba on July 03, 2010, 07:37:20 PM
No, but this is not about philosophy. Once you've settled for a definition, like the bots have and which has been explained here and elsewhere, there are correct and false conclusions you can draw from it.

And how do we settle on a definition? Surely not any definition will do. We need conceptual analysis before we can put our thoughts into a formula.
So it is most definitley about philosophy (= conceptual clarification). (But not only about philosophy.)

Quote from: Zorba on July 03, 2010, 07:37:20 PM
I simply mean the theoretical (mathematical) argument that luck will tend towards zero in the long run, regardless of a player's skill. That can be verified quite easily using mathematics.

Your statement above is correct, but it cannot contribute to the understanding of the phenomena of luck and skill.

We would never accept a definition of luck which has luck not tending towards zero in the long run. For such a definition would violate our common, pre-math understanding of luck. The formal definition builds on that understanding, not the other way round. So it does not come as a surprise that mathematical luck tends to zero - it could not be otherwise, since we would change the definition if it did not.
The formal proof that it does tend to zero is a verification of the conceptual validity of the definition. If it failed, we'd srcap our definition and start anew. We would not say "I have proved the most peculiar thing about luck - it doesn't tend to zero in the long run".
The reason we would say that is conceptual in nature, we do not (and cannot) appeal to any further formalisms here.

Quote from: Zorba on July 03, 2010, 07:37:20 PM
Most everything that's being discussed here is described very well in these articles by Douglas Zare:

http://www.bkgm.com/articles/Zare/AMeasureOfLuck.html
http://www.bkgm.com/articles/Zare/HedgingTowardSkill.html

These are good articles, but the reason they make sense is not that they define something, but that what they define is in harmony with our common understanding of luck and skill. If their definitions weren't, we would have no use for them, or at least wouldn't call them definitions of "luck" and/or "skill".

pck · July 04, 2010, 11:33:52 AM

Quote from: Zorba on July 03, 2010, 11:21:03 PM
What you can argue about, is how it comes that some people feel "something's wrong" with this approach. Maybe you don't agree with the above definition of luck. But in that case, perhaps you should try to come up with an alternative definition of luck. In the process, you might start to appreciate the current definition of luck

Zorba recommending conceptual investigation. I never thought I'd see the day. :: D--(

Quote from: Zorba on July 03, 2010, 11:21:03 PM
I'm all for an open mind, but I feel pretty strongly that it's no use to draw the conclusion from a very small sample (such as my earlier posted luck curve) that "observed values don't match theory". Actually, pck earlier claimed that in his larger sample, luck got very close to zero. If you do very long sessions with bots playing each other, as many people have done, luck will also tend towards zero.

I already remarked in a previous posting that luck/skill discrepancies may well cancel each other out over a lot of matches if their noise goes both ways (for which I see no reason why it shouldn't). It is quite possible to arrive at a result which matches the theory by means not justified by the theory.

pck · July 04, 2010, 11:46:28 AM

Quote from: Zorba on July 03, 2010, 10:05:18 PM
Another interesting thing to consider here is that a n-ply luck analysis is closer to a (n+1)-ply error analysis than to a n-ply error analysis, due to the 21 different rolls that have to be analyzed.

I experimented a little more and found a counterexample. The attached match was played and analysed at 2-ply. Its luck analysis ply-level is 1. The luck adjusted result is -23%. If you set luck analysis to 2-ply and re-analyse, the LAR becomes -13%. So there are matches where it gets better instead of worse when you switch from n,n+1 to n,n.

sixty_something · July 04, 2010, 02:30:54 PM

Quote from: socksey on July 02, 2010, 02:10:32 AM
Maybe it's an English thing!

socksey, are pck and Zorba speaking English enough for you?

English is my first and only language and i can barely keep up with reading them, much less replying .. seriously guys, nice work at elaborating on your positions .. i really haven't had time to read it all yet, but look forward to doing so

meanwhile, onward through the fog, err 92% relative humidity and heat, i am off for a morning walk

stiefnu · July 04, 2010, 08:47:02 PM

I got a wee bit lost amongst those calculations but this clip reminds me there's always someone even unluckier than me: http://www.youtube.com/watch?v=zH67LCz9GTY#

socksey · July 04, 2010, 09:02:04 PM

Indeed!

I can relate to that, stiefnu!

socksey

Coffee, chocolate, men..........some things are just better when they're rich! - unknown

diane · July 04, 2010, 10:09:20 PM

It is amazing how many times some people can tell that same story and think you are still interested

pck · July 04, 2010, 10:25:34 PM

Quote from: stiefnu on July 04, 2010, 08:47:02 PM
I got a wee bit lost amongst those calculations but this clip reminds me there's always someone even unluckier than me:

rofl

I admit defeat. This is so much better than any technical discussion could ever be.

boomslang · July 04, 2010, 11:21:25 PM

Quote from: boomslang on June 28, 2010, 03:05:42 PM
I think if you look at the luck of bots (and good players) have during a match or game, then you will see that they are more often 'lucky' (meaning having a positive sum of MWC/equity) than 'unlucky' (meaning having a negative sum of MWC/equity).
[...]
I dont know if anyone can back this up with empirical data though. I might give it a go if I can find some spare time.

... and so I did: GNUbg Expert played 31 times a 64pt match against GNUbg Beginner. In total 42000 moves were made in 763 games. Two histograms of the luck (in eq.) for each roll of both players are plotted in the first graph. Clearly, these are not normally distributed. The huge peaks at 0 are, amongst others, rolls when the player was on the bar against a closed board and rolls when already in a lost/won position.

In the second graph are two kernel density plots of the two players' total EMG luck in each game. Both distributions look bell shaped but are also not normally distributed (Lilliefors test p < 0.0002).

The better player (blue) had a positive luck 57% of the games. This differs significantly from 50% (p = 0.0001). This means that if you consider a bot 'lucky' when it has a positive luck, it not just appears lucky more often, it actually is. However, it was really unlucky quite a few times: the distribution is skewed to the left.

The difference in luck per roll (last graph) shows the same bimodal pattern as Zorba's histogram. This is because for two players, X and O, the following relation holds:

endresult(X) = 50% - error(X) + luck(X) + error(O) - luck(O),

in other words

net luck diff = endresult(X) - 50% + error(X) - error(O).

A net luck of zero will require that the total error of the loser equals the total error of the winner plus 50% (I am talking GWC here). When a good player plays against a weaker player (150pts weaker on average as in Zorba's example, or maybe 500 pts lower as in my simulated example) then apparently the nature of backgammon makes this very unlikely. If the better players makes about 10% error during a game and the weaker player 40%, then the net luck will have two peaks at 100%-50%+10%-40%=20% (for won games) or -80% (for lost games). The 10% and 40% are just guesses though.

boomslang · July 04, 2010, 11:24:06 PM

whoops, 2nd graph should've been this one...

diane · July 05, 2010, 03:40:15 AM

I know how to do this 'simply'...a list of rolls...the same list...one game I will use them to play the best moves possible...then use them in the worst way possible...and see what the numbers tell me on analysis...

I realise the list of numbers wont all get played in one game...but it should be possible to align them quite well...

dorbel · July 05, 2010, 08:58:01 AM

QuoteThe better player (blue) had a positive luck 57% of the games.

This figure coincides with my own shorter studies of matches played between me and gbots on fibs. I have broken the study down into games rather than matches. When you do this you can see that the "luckier" player ALWAYS wins the game. However the better player, which in my study is at least occasionally me, gets the positive luck more often. Of course in matches the better player will also tend to win more points when he wins the game (and fewer when he loses), because of his better cube use. Jokers appear to be equally distributed.

Many congratulations to boomslang for his elegant experiment, even though much of his mathematical language is incomprehensible to this reader! I have believed what he now shows to be true for a long time.

pck · July 05, 2010, 12:30:20 PM

Quote from: dorbel on July 05, 2010, 08:58:01 AM
Many congratulations to boomslang for his elegant experiment, even though much of his mathematical language is incomprehensible to this reader! I have believed what he now shows to be true for a long time.

What exactly have you believed? That skill creates better luck, or that the definition of luck as used by the bots is flawed?

boomslang was careful enough to include a crucial "if":

"This means that if you consider a bot 'lucky' when it has a positive luck..."

"'lucky'" here refers to our intuitive concept of luck, whereas the luck in "positive luck" refers to the mathematical concept embodied by gnu's luck calculations.

pck · July 05, 2010, 01:29:23 PM

First of all a major vote of thanks to boomslang for taking the time to conduct and evaluate this experiment. Very nice work indeed.

Quote from: boomslang on July 04, 2010, 11:21:25 PM
In the second graph are two kernel density plots of the two players' total EMG luck in each game. Both distributions look bell shaped but are also not normally distributed (Lilliefors test p < 0.0002).

The better player (blue) had a positive luck 57% of the games. This differs significantly from 50% (p = 0.0001). This means that if you consider a bot 'lucky' when it has a positive luck, it not just appears lucky more often, it actually is. However, it was really unlucky quite a few times: the distribution is skewed to the left.

Thanks as well for being careful enough to include the crucial "if" here. As I argued in previous postings, we cannot possibly accept the definition of luck as it shows itself through these numbers (barring statistical flukes which I rule out because of the large number of rolls in the experiment). The densities in the second graph should be symmetrical around zero instead of showing the biases they do. As you say, these biases are responsible for the asymmetry in the 3rd graph, since in generating graph 3 from graph 2 the relation

Quote from: boomslang on July 04, 2010, 11:21:25 PM
net luck diff = endresult(X) - 50% + error(X) - error(O).

is enforced.

As I said in #62, with independent and different methods and/or ply-levels for luck and skill evaluations, it is clear that discrepancies can occur which break the above equation ("within gnu", before your evaluation). The luck/skill data gnu puts out (= the data used to produce graph 2) will then be inconsistent.

Contrarily to what I had surmised, gnu's luck-evaluation flaws show systematic bias towards the better player. They don't go both ways so as to cancel out over time. A remarkable result indeed.

Quote from: boomslang on July 04, 2010, 11:21:25 PM
A net luck of zero will require that the total error of the loser equals the total error of the winner plus 50% (I am talking GWC here). When a good player plays against a weaker player (150pts weaker on average as in Zorba's example, or maybe 500 pts lower as in my simulated example) then apparently the nature of backgammon makes this very unlikely.

So what do you think? Have you shown with this that total luck does not tend to zero in the long run if the skill difference between the players is large? Or have you shown that gnu's calculation of luck is defective? You're obviously aware of the problem or you wouldn't have included that "if" above.

vegasvic · July 05, 2010, 03:14:14 PM

what does all this have to do with playing backgammon ?

I never get any luck .... so STFU all of you

hahahahaaaaaaa !!

FIBS Board backgammon forum

News:

How do 'luck' calculations work..

Zorba

Zorba

pck

pck

pck

pck

pck

pck

sixty_something

stiefnu

socksey

diane

pck

boomslang

boomslang

diane

dorbel

pck

pck

vegasvic