
eXtremeGammon's chi-squared tests on FIBS dice

Started by Zorba, April 15, 2010, 08:19:05 PM


pck

Quote from: boomslang on April 17, 2010, 11:55:22 AM
I don't know what you mean by an 'occurrence of certain last roll situations', so I cannot comment on this.

I meant situations like the one you gave as an example (4 checkers on acepoint, probability for a last roll double > 1/6).

Quote from: boomslang on April 17, 2010, 11:55:22 AM
1. "If the starting points of races are distributed randomly..." We know that this is not true: the starting point of race situations cannot lie in the first 8 or so rolls
2. P is not a random subset of S: it is the tail of sequence S, so it for sure will include the last roll (and because of 1. it will for sure not include the opening roll).

More than 1/6th of P will be doubles.

1. I didn't say "distributed evenly", but "distributed randomly", which I admit is confusing (see 2. below for a hopefully clearer explanation).

2. Note that S was supposed to be an infinite sequence of rolls used to play an infinite number of matches, not just one match. Hence what I called P is an infinite collection of tails in your sense. It is a subset of S, the collection of all rolls.

Instead of using my misleading phrase "P is a random subset of S", consider this:

Imagine two infinite dicestreams, both unbiased towards doubles. With the first stream we play all pre-race parts of all our matches. As soon as we reach a race situation, we switch to the dice from the second stream. After a game/match we switch back to stream #1, and so on. We get a resulting, double-unbiased dicestream S over all the rolls of all our matches which is clearly also double-unbiased in the pre-race as well as in the in-race situations (= P) alone. If races were double-biased, this construction wouldn't be possible. But obviously it is.

The fact that it is so trivially possible to do this lends further credence to my claim that the idea of double-biased races is due to a confusion/mixing of two perspectives.
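The construction is easy to simulate. Here is a minimal sketch (the game length and the roll at which races start are arbitrary assumptions, since only the stream-switching matters for the argument):

```python
import random

# pck's two-stream construction: pre-race rolls come from stream #1,
# race rolls from stream #2. Both streams are fair, so the collected
# "race" substream P shows no double bias.
rng1, rng2 = random.Random(1), random.Random(2)

def roll(rng):
    return (rng.randint(1, 6), rng.randint(1, 6))

pre_race, in_race = [], []
for game in range(20000):
    race_starts_at = rng1.randint(8, 20)  # assumption: race starts after roll 8
    for i in range(30):                   # assumption: fixed game length
        if i < race_starts_at:
            pre_race.append(roll(rng1))
        else:
            in_race.append(roll(rng2))

frac = sum(a == b for a, b in in_race) / len(in_race)
print(f"doubles in race substream: {frac:.4f}")  # close to 1/6 ~ 0.1667
```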

ah_clem

If I may be so bold as to weigh in with a few observations...

1) An even distribution of rolls across all 36 possible values is hardly proof that the dice are random - consider a dice generator that merely lists the 36 rolls in order, iterates through the list and starts over at the beginning when it gets to roll #37.  This generator would have a perfect distribution, but it's far from random.
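That cyclic generator is easy to sketch, and its distribution really is perfectly flat:

```python
from collections import Counter
from itertools import product

# ah_clem's hypothetical cyclic "generator": the 36 possible rolls in a
# fixed order, repeated forever. Not random at all, yet the frequency
# count is perfectly uniform.
cycle = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) rolls

def cyclic_dice(n):
    return [cycle[i % 36] for i in range(n)]

counts = Counter(cyclic_dice(36000))
print(min(counts.values()), max(counts.values()))  # 1000 1000: perfectly flat
```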

2) Any sample from actual games is subject to sample bias. My understanding is that FIBS has a single dice generator shared across all ongoing matches - it simply gives the next roll to whoever asks for it. This stream is what we'd like to test for randomness, but all we have at the moment are samples from the stream, and as others have noted those samples are biased: doubles can't be the opening roll; doubles are more likely to occur at the end of games (both via the bearoff and by prompting a cube next roll); big numbers like 6-5 are also more likely to occur at the end of games for the same reason; and small rolls like 2-1 will be the last roll relatively infrequently.

In short, the rolls that show up in a particular game are not a fair sample of the main dice stream.  pck explains this pretty well above.

3) To answer dorbel's question "how good do the dice have to be?", I'd say that pretty much any pseudorandom generator would be good enough, even the lowly rand() in the C library. The problem with mediocre pseudorandom generators is that they tend to repeat patterns after a few hundred thousand or so rolls, but since the only thing exposed to the user is an erratic sample of the stream, it would be difficult to impossible to detect any patterns just by looking at the dice in your own matches. Of course, it's not much harder to use a better pseudorandom generator than rand() (e.g. Mersenne Twister), so there's not much excuse for failing to upgrade, even though it doesn't really matter all that much.
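To illustrate the repetition problem, here's a deliberately weak toy LCG (the parameters are made up for the example; I don't know what FIBS actually uses). With a 16-bit modulus, its entire output repeats after 65,536 rolls:

```python
# A weak linear congruential generator with toy parameters. Hull-Dobell
# conditions hold (c odd, a-1 divisible by 4), so the period is exactly
# the modulus: the whole stream repeats every 2**16 steps.
def lcg(seed, a=5, c=3, m=2**16):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=1)
first = next(gen)
period = 1
for x in gen:
    if x == first:
        break
    period += 1
print(period)  # 65536: from here on the output is an exact repeat
```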

Anybody know what algorithm FIBS uses?

4) I prefer to spend my time and energy on reducing my own errors rather than worrying about the dice. YMMV.


pck

#22
Quote from: ah_clem on April 17, 2010, 04:21:36 PM
1) An even distribution of rolls across all 36 possible values is hardly proof that the dice are random - consider a dice generator that merely lists the 36 rolls in order, iterates through the list and starts over at the beginning when it gets to roll #37.  This generator would have a perfect distribution, but it's far from random.

This distribution would indeed pass the first part of the FIBS "dice test", but not the second, "Distribution of runs of n identical rolls".
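A sketch of how a runs test catches the cyclic generator: counting immediate repeats (runs of length 2 or more) gives exactly zero for the cycle, whereas fair dice repeat the previous roll about once in 36:

```python
import random
from itertools import product

# Cyclic generator: 36 distinct rolls repeated in order, so no two
# consecutive rolls are ever identical (including across the wrap).
cycle = list(product(range(1, 7), repeat=2))
cyclic = [cycle[i % 36] for i in range(36000)]

rng = random.Random(0)
fair = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(36000)]

def repeats(rolls):
    # number of positions where a roll equals the previous roll
    return sum(rolls[i] == rolls[i - 1] for i in range(1, len(rolls)))

print(repeats(cyclic), repeats(fair))  # 0 vs roughly 36000/36 = 1000
```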

Quote from: ah_clem on April 17, 2010, 04:21:36 PM
In short, the rolls that show up in a particular game are not a fair sample of the main dice stream.  pck explains this pretty well above.

If one game does not represent the whole of the dicestream accurately, this would be due to sample size, not because certain rolls occur "unnaturally" frequently at the end of games. I actually argued against that. What I (and dorbel and Zorba) said was that more last-roll doubles than 1 in 6 (in the average game) do not imply more doubles than 1 in 6 overall. The exception is the case of first-roll doubles, which are systematically excluded from gameplay. This can (and must) be fixed by including the rolls which decide who gets the first game-roll.
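A minimal simulation of that fix, assuming the usual opening procedure (each player rolls one die, ties are re-rolled, and the two deciding dice form the opening roll): the opening rolls actually played contain no doubles at all, but once the discarded tied rolls are counted back in, the overall double rate returns to 1 in 6.

```python
import random

rng = random.Random(42)
accepted, all_pairs = [], []
for _ in range(100000):
    while True:
        pair = (rng.randint(1, 6), rng.randint(1, 6))
        all_pairs.append(pair)       # every deciding pair rolled, ties included
        if pair[0] != pair[1]:
            accepted.append(pair)    # the opening roll actually played
            break

frac_accepted = sum(a == b for a, b in accepted) / len(accepted)
frac_all = sum(a == b for a, b in all_pairs) / len(all_pairs)
print(frac_accepted, round(frac_all, 4))  # 0.0 vs ~0.1667
```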

In the discussion above, we did not touch the problem of sample size. That is a different can of worms.

What I tried to explain in #16 was that one can look at the same statistical problem in various ways, and must take good care to construct one's theoretical description of the problem according to the perspective chosen (and also not change perspectives later).

boomslang

Quote from: ah_clem on April 17, 2010, 04:21:36 PM
1) An even distribution of rolls across all 36 possible values is hardly proof that the dice are random - consider a dice generator that merely lists the 36 rolls in order, iterates through the list and starts over at the beginning when it gets to roll #37.  This generator would have a perfect distribution, but it's far from random.
Quote from: pck on April 17, 2010, 05:15:28 PM
This distribution would indeed pass the first part of the fibs "dice test", but not the second, "Distribution of runs of n identical rolls".

It will also fail on XG's Chi2 test as there is no variability in the data (resulting in X2 values equal to zero).
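A quick sketch of why: with the cyclic generator every observed count exactly equals its expected count, so every term of the chi-squared sum is zero, and a statistic of exactly 0 is itself a red flag.

```python
# Chi-squared statistic: sum of (observed - expected)^2 / expected.
def chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

counts = [1000] * 36           # 36000 cyclic rolls: each roll exactly 1000 times
expected = [36000 / 36] * 36   # fair dice also expect 1000 of each
print(chi2(counts, expected))  # 0.0: no variability at all
```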

Quote from: pck
What puzzles me is how it is possible to get good results for the player's as well as the opp's distribution, but a bad one for their combined distribution.

This is possible if the player's and the opponent's dice distributions had the same kind of bias, for example if one die had a face that is not flat. The two separate tests might not reveal the (small) anomaly because the sample size is not large enough; put both groups together and you have doubled the sample size, which could then be big enough to show an effect.

pck

#24
Quote from: boomslang on April 18, 2010, 01:27:45 AM
This is possible if the player's and the opponent's dice distributions had the same kind of bias, for example if one die had a face that is not flat. The two separate tests might not reveal the (small) anomaly because the sample size is not large enough; put both groups together and you have doubled the sample size, which could then be big enough to show an effect.

Good point. I believe this is the proper explanation for the first three results of XG_test.jpg in #6. By mistake, the same set R of 240,000 rolls entered the X2-test twice. R by itself has an X2 (25.48) which is well within acceptable bounds (P = 18.367%, the probability of an unbiased distribution producing an even larger X2 than the one observed). But the combined distribution's X2, being exactly twice as large as R's because it duplicates all of R's deviations from the expected values, is a near-impossibility (P = 0.016%).
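The doubling effect is easy to check numerically. A minimal sketch with made-up illustrative counts (not the actual 240,000-roll data): feeding the same sample in twice doubles every deviation along with the expected counts, so the statistic exactly doubles while the degrees of freedom (35) stay the same, which collapses the p-value.

```python
# Chi-squared statistic for 36 roll counts against a uniform expectation.
def chi2(observed, total):
    expected = total / 36
    return sum((o - expected) ** 2 / expected for o in observed)

# Illustrative counts for 360 rolls: mild deviations from the expected 10 each.
counts = [8, 12, 11, 9, 10, 10] * 6

single = chi2(counts, 360)                     # the sample R on its own
doubled = chi2([2 * c for c in counts], 720)   # R entered twice
print(single, doubled, doubled / single)       # the statistic exactly doubles
```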