Dirty dancing

Started by stiefnu, January 12, 2009, 08:46:09 PM

stiefnu

A while ago, noticing that I seemed to be dancing a lot on 1 point boards, I started to keep count. After a further 500 or so games (many of which have involved no dancing chances), I have the following results, where only one chequer was on the bar and the 6 point stacked:
•   I have come straight in off the bar 20 times
•   My opponents have come in 24 times.
•   I have danced 12 times
•   My opponents have danced just twice

On average, there should be a dance:enter ratio of 1:36.   However, the combined result gives a ratio of only 1:3, with my oppos' ratio at 1:12 and mine at a dismal 1:1.7.   So, after around 60 or so such dancing opportunities, I find I have danced around 7 times more often than my opponents.  And even they could count themselves unlucky!

Are these results normal, with the apparent bad luck simply because of a small sample size?  Would someone better at statistics than I care to comment, please?
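
For reference, the ratios quoted above follow directly from the four counts. A minimal Python sketch of that arithmetic (the function name `dance_enter_ratio` is purely illustrative); it also works out the odds for fair dice against a single point, where only one of the 36 possible rolls dances:

```python
def dance_enter_ratio(dances, enters):
    """Return x such that the dance:enter ratio is 1:x."""
    return enters / dances

# Counts taken from the post above.
mine = dance_enter_ratio(12, 20)
opponents = dance_enter_ratio(2, 24)
combined = dance_enter_ratio(12 + 2, 20 + 24)

# Fair dice: 1 dancing roll (double six) against 35 entering rolls.
fair = dance_enter_ratio(1, 35)

for label, x in [("mine", mine), ("opponents", opponents),
                 ("combined", combined), ("fair dice", fair)]:
    print(f"{label}: 1:{x:.2f}")
```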

UBK

 I am quite sure your counts are incorrect. In about 500 games you only had 58 instances of a single checker on the bar vs a one point board?

dorbel

The "500 or so" is a bit of a giveaway. Not very scientific, is it? UBK's objection is also a killer. I examined 15 games from two matches chosen entirely at random. In these 15 games I saw 25 instances of being on the bar against a one point board! 24 of these entered, incidentally.

stiefnu

Quote from: UBK on January 13, 2009, 01:30:46 AM
I am quite sure your counts are incorrect.
Quote from: dorbel on January 13, 2009, 09:19:59 AM
The "500 or so" is a bit of a giveaway. Not very scientific, is it?
I guess you're both right. The "500 or so" was an estimate and must have been substantially lower than that, but the totals for dances, and hence the ratios derived, were faithfully recorded, though of course it may be that I missed some in the heat of battle. However, I am sure that my recent dancing ratio has been unusually high, bearing in mind that I had already noticed something odd before beginning the count.

I look forward to the pendulum swinging the other way.  :-)

boomslang

Hi Steve,


You danced 12 out of 32 times and that can't be attributed to bad luck alone. In fact, any number of dances above 3 suffices for that. Two possibilities remain:
   
1) FIBS' dice are not fair to you.
   
2)  Something went wrong during your data collection.
   
Given your remark that you might have missed some observations in the heat of the moment, I recommend that you use dorbel's approach: in JavaFIBS/GnuBG/Snowie/BGBlitz open your 20 (or even better, 50) most recent matches and re-count the number of dances/enters for you and your opponents by pressing the 'next' button. This might take a while, I admit, but it would be a) fair to FIBS, and b) interesting to see if your observations were objective. Please post your new findings!
   
Greetings, boomslang

PS Theoretically, the dance:enter ratio is 1:35 because the probabilities are 1/36 vs 35/36...
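
The "12 out of 32" claim above can be checked with a short Python sketch. It assumes a one-sided exact binomial test at the 5% level, which is a guess at the method since the post does not name one; the function names are illustrative. It computes the chance of 12 or more dances in 32 attempts with fair dice, and the smallest dance count that would be significant at that level:

```python
from math import comb

def tail_probability(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def smallest_significant_count(n, p, alpha):
    """Smallest k whose upper-tail probability is at most alpha."""
    for k in range(n + 2):  # k = n + 1 gives an empty sum (0), so this always returns
        if tail_probability(k, n, p) <= alpha:
            return k

P_DANCE = 1 / 36   # only double six dances against a one-point board
ATTEMPTS = 32      # 12 dances + 20 entries, from the first post

p_value = tail_probability(12, ATTEMPTS, P_DANCE)
threshold = smallest_significant_count(ATTEMPTS, P_DANCE, 0.05)

print(f"P(12 or more dances in {ATTEMPTS}) = {p_value:.3g}")
print(f"Smallest significant dance count at 5%: {threshold}")
```

Under these assumptions the threshold agrees with the "above 3" figure in the post.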

socksey

Oh, and don't forget to read this:  http://www.alef.co.uk/fibs/archive/dysfunction.html   :lol:

lewscannon

Stiefnu,

Are you in the middle of a free fall, rating-wise? I've been through periods where I can't win an old spoon (to use an expression my grandmother used), and dancing like Fred Astaire is definitely one of the symptoms. All in all, it evens out over time, but one can encounter this just as easily as when you're hot and a 5 point board can't hold you. I'm still not sold on the equanimity of Fibs dice, despite all the sane arguments made for it (please don't take me up on it again), but I will say that it evens out in the long run. Has anyone considered that perhaps the Fibs dice generator, like HAL in the movie '2001', has acquired consciousness? I was playing a bot this weekend when it asked me "Just what do you think you're doing, lews? This game is too important for me to allow you to jeopardize it."

stiefnu

Quote from: boomslang on January 14, 2009, 01:46:18 PM
I recommend that you use dorbel's approach: in JavaFIBS/GnuBG/Snowie/BGBlitz open your 20 (or even better, 50)
After a further 57 games (addiction is a terrible thing!), the results are much more what one would expect: I have not danced; my oppo has danced once; we have both entered 20 times.

Quote from: socksey on January 14, 2009, 03:03:22 PM
Oh, and don't forget to read this:  http://www.alef.co.uk/fibs/archive/dysfunction.html
Nice one!

Quote from: lewscannon on January 14, 2009, 03:09:14 PM
Are you in the middle of a free fall, rating-wise?
Well, I guess a drop from 1,799 down to 1,554 could indeed be described as free fall! I've recovered a hundred since, so perhaps the pendulum is now swinging the other way.

lewscannon

Quote from: stiefnu on January 14, 2009, 04:27:50 PM
After a further 57 games (addiction is a terrible thing!), the results are much more what one would expect: I have not danced; my oppo has danced once; we have both entered 20 times.
Nice one!
Well, I guess a drop from 1,799 down to 1,554 could indeed be described as free fall! I've recovered a hundred since, so perhaps the pendulum is now swinging the other way.

Fibs freefalls are the worst. I wind up in a foul mood during them, especially when they happen over the course of a league. Every decision you make blows up in your face and you can't get a roll to save your life. And then cowgirl sneers at you while she takes your points. Oh, the pain and humiliation!

playBunny

Ignoring any problems with the data collection, I thought it would be worth posing the question in the DailyGammon forum for any stats propellerheads. And lo, one appeared and gave an answer. I don't claim to fully understand it but I think it's meaningful enough to at least give an indication.

I gave a shorter version of the problem as you gave it and later restated it more simply:
Quote
In essence he just wants to know how big a sample he needs before he can say that a distribution such as he has found can be relied on not to be chance (or rather, too chancy). Ie. if he were rolling two dice and got double six in 14 out of 58 rolls then can he deduce that the dice are biased? What if it's 14,000 double sixes out of 58,000 rolls? Where between those limits can he stop because the results are accurate enough to represent the dice?

And here's what he said:
Quote
I guess it's my duty as a (the?) resident professional stats propellerhead to pipe in.
In classical hypothesis testing, once you've fixed a testing procedure, there's a relationship between four quantities:

1. The size of the sample required.

2. The magnitude of the effect you want to detect.

3. The significance level of the hypothesis test (informally, the probability of false positives you're willing to allow).

4. The Type II error rate (the probability of false negatives you're willing to allow). This last one is usually given as the power of the test (1 minus the Type II error rate).

So, to address the question of how large a sample is required, you need to first choose a testing procedure and then pin down the other three quantities (2-4). A reasonable choice for the testing procedure would be a one-sample binomial test of the null hypothesis that the true dancing rate is 1/36, against the alternative hypothesis that the dancing rate is different from 1/36.

There are fairly widely used conventional choices for the significance level (.05) and the Type II error rate (.20), so let's adopt those for the sake of the example.

So that leaves us with a direct relationship between the magnitude of the effect (HOW different from 1/36 is the dancing rate) and the sample size required to have a good shot at detecting the effect statistically:

If the true dancing rate is 20%, you only need a sample of 15 rolls to be likely to detect it.

If the true dancing rate is 15%, you only need a sample of 27 rolls to be likely to detect it.

For 10%, you need 64 rolls.

For 5%, you need 531 rolls.

In this case, we might not be satisfied with a 5% false positive rate. If we adopt a conservative 0.1% false positive rate instead, the relationship between the true dancing rate and the number of rolls required to be likely to have a statistically significant finding is:

20% -> 27 rolls

15% -> 48 rolls

10% -> 123 rolls

5% -> 1,090 rolls

Posted by thefellswooper at Tue Jan 13 02:49:27 2009

Note that he's talking about the number of rolls because I stated the problem in terms of rolling double-sixes but each roll also represents one instance of being on the bar against a one-point board and failing to get in.

As I understand it, the 20%, 15%, 10% and 5% are the potential double-six rates of the dice and the number of rolls is the number of trials or events required to prove that rate. In reality the double-six rate of the dice is expected to be 1/36, or 2.78%, which means that the number of events would be greater than the 1090 required to prove the 5% double-six rate. I don't know what the calculation is but the curve is shooting up and would suggest that it's maybe a couple of thousand, if not more.
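
The relationship thefellswooper describes (significance level, true dancing rate, sample size and power) can be explored with a rough Python sketch. It assumes a one-sided exact binomial test, which is not necessarily the procedure he used, so the powers it prints need not match his 80% target exactly; the function names are illustrative.

```python
from math import comb

def tail_probability(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest dance count that rejects 'rate = p0' at level alpha."""
    for k in range(n + 2):  # k = n + 1 gives tail 0, so this always returns
        if tail_probability(k, n, p0) <= alpha:
            return k

def power(n, p0, p1, alpha):
    """Chance of rejecting fair dice in n rolls when the true rate is p1."""
    return tail_probability(critical_value(n, p0, alpha), n, p1)

FAIR = 1 / 36
# (true dancing rate, sample size) pairs quoted in the post above
for rate, rolls in [(0.20, 15), (0.15, 27), (0.10, 64), (0.05, 531)]:
    print(f"true rate {rate:.0%}, {rolls} rolls: "
          f"power = {power(rolls, FAIR, rate, 0.05):.2f}")
```

Because dance counts are whole numbers, the power of an exact test does not rise smoothly with the sample size, which is one reason different procedures give slightly different "rolls required" figures.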

playBunny

thefellswooper also added the following, which I give for completeness.
Quote
Let me add that I'm not generally the world's biggest fan of classical hypothesis testing, and that this question might be better suited to a Bayesian framework, in which we could quantify and incorporate our prior beliefs regarding the fairness of the dice. Then, at the end, we could obtain a probability distribution on true dancing rate rather than a less illuminating yes-or-no answer.

Posted by thefellswooper at Tue Jan 13 03:18:00 2009
It sounds great but don't ask me to explain it! :D
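
For anyone curious what the Bayesian version might look like, here is a minimal Python sketch. The Beta(1, 35) prior (centred on the fair rate of 1/36) is purely an assumption for illustration, as are the function names; the data are the 12 dances and 20 entries from the first post. With a Beta prior and dance/enter counts the posterior is again a Beta distribution, so the update is just addition:

```python
from math import exp, log

def posterior_params(prior_a, prior_b, dances, enters):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return prior_a + dances, prior_b + enters

def prob_rate_exceeds(a, b, threshold, steps=20000):
    """Approximate P(rate > threshold) under Beta(a, b) by midpoint integration."""
    total = 0.0
    above = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps  # midpoint of each grid cell, strictly inside (0, 1)
        density = exp((a - 1) * log(x) + (b - 1) * log(1 - x))  # unnormalised
        total += density
        if x > threshold:
            above += density
    return above / total

# Assumed prior: Beta(1, 35), whose mean is 1/36.
a, b = posterior_params(1, 35, dances=12, enters=20)
posterior_mean = a / (a + b)

print(f"posterior: Beta({a}, {b}), mean dancing rate = {posterior_mean:.3f}")
print(f"P(rate > 1/36) = {prob_rate_exceeds(a, b, 1 / 36):.4f}")
```

A more sceptical prior (larger numbers in the same 1:35 proportion) would pull the posterior closer to 1/36, which is how prior belief in the fairness of the dice enters the calculation.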

stiefnu

Quote from: playBunny on January 15, 2009, 12:43:26 PM
I don't know what the calculation is but the curve is shooting up and would suggest that it's maybe a couple of thousand, if not more.

Thanks for posing the question PlayBunny and then posting thefellswooper's answer.  As the ratios between 20% - 10% and 10% - 5% both increase by a factor of around 5, I would guess that from 5% - 2.7% would also be of that order, which suggests that the sample size may have to be nearer to 5,000.  Not sure I'm going to have the time to analyse that many games!  Anyone know how many matches GnuBG can store, before it gets indigestion?

playBunny

Quote from: stiefnu on January 15, 2009, 03:13:47 PM
Thanks for posing the question PlayBunny and then posting thefellswooper's answer.  As the ratios between 20% - 10% and 10% - 5% both increase by a factor of around 5, I would guess that from 5% - 2.7% would also be of that order, which suggests that the sample size may have to be nearer to 5,000.  Not sure I'm going to have the time to analyse that many games!
You're welcome. I thought it an interesting question, especially as it can be applied to other situations.

Yes, it may well be nearer 5000. I drew a very crude graph and stuck a ruler against it but I didn't want to overestimate. ;)

Quote
Anyone know how many matches GnuBG can store, before it gets indigestion?
The actual matches are stored individually in a folder so they're limited by your hard disk storage. The SQLite database of matches (players, error rates, etc) only contains a small record per match and can easily handle more matches than you can play in a lifetime.

boomslang

Quote from: playBunny on January 15, 2009, 12:43:26 PM
    As I understand it, the 20%, 15%, 10% and 5% are the potential double-six rates of the dice and the number of rolls is the number of trials or events required to prove that rate. 

No, it means this: suppose FIBS dice aren't fair to you and your chances of rolling a 66 when you're on the bar are 10% (instead of 2.78%). You will need 64 rolls to have an 80% chance of correctly rejecting the null hypothesis of fair dice (at a significance level of 0.05).  If FIBS gives you a 5% 'dance ratio', then you will need 531 rolls.

You cannot 'prove' the ratio of 10%, 5% etc. though; you can only reject (with 95% confidence in this case) the hypothesis that your dance ratio is 2.78%.