position analysis

Started by boop, January 12, 2014, 06:57:44 PM

boop

hi,

i've been playing for the last few days on bgtrain.com and came across a position where i couldn't believe the top move was correct and not mine!!!!
So I did a rollout at 2 ply then 3 ply and my move was confirmed top. (It now turns out, after talking to the site developer, that my GNU version, along with those of a few other knowledgeable people who also confirmed my move, is far inferior to the up-to-date GNU version for analysis.)

I have 2 questions, but have a look first to see what your best move would be.



Position ID DhAAYNsAgEcKAA:QQmnAEAACAAE

The grey attachment is the GNU analysis with my choice being its top move



the attachment with the blue selection is the better analysis from bgtrain.com


1) how can my GNU rollout be sooo wrong at 2 and 3 ply when there's hardly any game left?

2) Can you see why moving a blot to 6 away rather than 7 away with bar/19 22/21 is better? and what other things are you taking into account to make that move the best?

boop

dorbel

Why do you think that bgtrain's analysis is better? The writer might say that it is but it isn't. Bar/24, 19/13 is clearly correct. Bar/19, 22/21 isn't even second best. Bar/18 is better.
I would be happy to play this as a proposition. Invite him to back his "analysis" with hard cash.

boop

well i've been chatting about it with zorba in shouts and i think i've been confusing evaluations (bgtrain) with rollouts (GNU) in my post above.
... i'm not sure what the difference is though.

I'm happy that you agree with my choice of move though.

dorbel

2-ply evaluation is the bot's best guess. It looks down the line to what the equity might be for each of the candidate plays after you and your opponent have each rolled and played again and it is of course an average of all the possible rolls and the best plays associated with each. Amazingly (to me), this is startlingly accurate. However this evaluation is wildly wrong and the explanation that there is a new version of Gnu that is better than yours is absurd. To accept a Gnu 2-ply analysis as incontrovertible fact is lazy and damaging in any case, even when it is correct! Always test with a rollout, ideally with the superior ExtremeGammon, not free but clearly better and essential for any serious student.
BgTrain is an interesting venture, but probably not very valuable as a learning tool. Understanding why an answer is correct is much more important than finding what the answer is. If you are going to provide wrong answers and no explanation, the total effect will actually be negative!

Analysis is something else again, an attempt to say why the best play is best and give reasons. Looking at this one, it is clear that making the 24pt is a good idea, forcing White to bear off all her checkers in order to win. 19/13 is then the best 6, diversifying the stack and gaining some outfield coverage. If however, an alternate play generated extra shots next turn it might be better, so in-depth analysis requires us to test each candidate to see how many shots it creates next on average. At a quick count bar/24, 19/13 generates 608/36, bar/18 595/36 and bar/19, 22/21 a measly 534/36. No contest.
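For anyone who wants to see what those fractions work out to, here is a small Python sketch. It assumes (my reading of the quick count, not something stated above) that each total is the number of hitting rolls summed over all 36 of White's rolls, so dividing by 36 gives the average number of shots out of 36. The function name is made up for the example.

```python
# Shot totals quoted in the post for each candidate play.
# Assumption: each total is our hitting rolls (out of 36) summed over
# White's 36 possible rolls.
shot_totals = {
    "bar/24, 19/13": 608,
    "bar/18": 595,
    "bar/19, 22/21": 534,
}

def summarize(totals):
    """Return (play, average shots, hit chance) rows, best play first."""
    rows = []
    for play, total in totals.items():
        average = total / 36   # average number of hitting rolls next turn
        chance = average / 36  # share of our 36 rolls that hit
        rows.append((play, average, chance))
    return sorted(rows, key=lambda row: row[1], reverse=True)

for play, average, chance in summarize(shot_totals):
    print(f"{play:15s} {average:5.2f} shots on average, about {chance:.0%} to hit")
```

On that reading, the top play averages roughly two more shots than bar/19, 22/21.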

ah_clem

In case Dorbel's post wasn't clear already, let me re-emphasize that bgtrain's answers come from gnu 3-ply analysis.  None of the answers are rolled out.  So it is to be expected that sometimes they are wrong.

dorbel

Not actually true clem. The table from bgtrain clearly shows that it is 2-ply. Of course evaluation can be wrong and often is, but never in my experience by this margin in such a simple position. I suspect operator error, but why the operator should defend his obvious blunder by saying that he has a version of Gnu that is much stronger than boop's is inexplicable.
I vaguely recall that Gnu 3-ply sometimes did produce a bizarre result, which is why 2-ply tends to be standard for Gnu users, but I have no evidence that that is what is happening here.

boop

well i discussed this in shouts with zorba. He then updated his GNU to the most recent version from gnubg.org (v1.02.000 July 28th, 2013) and found that his evaluation was now the same as bgtrain's. His rollout was the same or very close to mine though. weird!

So it's not operator error

Giorgos from bgtrain emailed this message to me today

"I just did a rollout for this position and you are correct. Note that for the time being I am not doing rollouts, but I am making the evaluation stronger by using a wider filter. In the future I might use rollouts as well."

The position page where comments can be added is here: http://www.bgtrain.com/?pid=DhAAYNsAgEcKAA:QQmnAEAACAAE

i'm putting my rollout in the comment and would like to paste your (quick count) analysis Dorbel but maybe you'd like to add it under your own name?




ah_clem

Quote from: dorbel on January 13, 2014, 01:31:05 PM
Not actually true clem. The table from bgtrain clearly shows that it is 2-ply.

Ok. I was just going by what Giorgos Tzampanakis, the guy behind bgtrain, has said over at rec.games.backgammon. He said the site used gnu 3-ply analysis and I took him at his word. The important thing is that it's not a rollout, regardless of 2-ply vs 3-ply.


Zorba

There are a few different things about the bgtrain website and gnubg that got mixed up here, I think.

While I was playing with the problems on the website, I had some trouble with some of the answers, so I plugged them into my own gnubg. I got results that were always different from the website, sometimes also changing the best move, order of the moves and equity differences. After some back and forth, it turned out the website uses a newer version of gnubg than I do, and that for the first time in 7 years, the newer gnubg version actually includes updated (better trained) neural nets! From the available test data that I found, it appears this new gnubg does indeed play clearly better than the older versions and bridges a large part of the gap that existed with XG2.

For this particular position, however, the newer gnubg actually performs worse. gnubg 1.0 gets it wrong by quite a lot on 2-ply, whereas gnubg 0.90 gets it right on 2-ply. It happens. As neural nets are trained to get some positions better, they will sometimes get other positions worse.

On rollout, this is usually corrected and indeed it seems that both versions of gnubg agree that B/24 19/13 is the best play, and B/19 22/21 clearly is not.

A rollout means the computer simulates a large number of games being played to the end for each candidate move, and tallies the results. This typically takes a while, many minutes or even hours. An n-ply evaluation means it just looks ahead n moves, and uses the neural net's figures at that point. This is usually done almost instantly or in a matter of seconds up to 3-ply; 4-ply might take up to a minute or so.
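To make the difference concrete, here is a toy Python sketch. It is not gnubg's code: it assumes a bare pip race with no checker-play decisions, and static_eval is a made-up stand-in for the neural net. nply_eval averages over all 36 rolls for n turns and then trusts the static guess, while rollout plays games to the end with random dice and counts wins.

```python
import random
from itertools import product

# All 36 equally likely rolls of two dice.
ROLLS = list(product(range(1, 7), repeat=2))

def pips(roll):
    """Pips moved by a roll; doubles are played four times."""
    a, b = roll
    return 4 * a if a == b else a + b

def static_eval(me, opp):
    """Hypothetical stand-in for the neural net: a crude guess at the
    chance that the side on roll (with `me` pips left) wins the race."""
    return min(1.0, max(0.0, 0.5 + (opp - me + 4) / 40))

def nply_eval(me, opp, n):
    """Look ahead n rolls, averaging over all 36 rolls at each step,
    then fall back on the static guess."""
    if n == 0:
        return static_eval(me, opp)
    total = 0.0
    for roll in ROLLS:
        left = me - pips(roll)
        if left <= 0:
            total += 1.0  # side on roll has finished and wins
        else:
            # The opponent is now on roll; our chance is 1 minus theirs.
            total += 1.0 - nply_eval(opp, left, n - 1)
    return total / len(ROLLS)

def rollout(me, opp, games=2000, rng=None):
    """Play many games to the end with random dice and tally the wins
    of the side on roll."""
    rng = rng or random.Random()
    wins = 0
    for _ in range(games):
        left = [me, opp]
        side = 0  # 0 is the side being evaluated
        while True:
            left[side] -= pips(rng.choice(ROLLS))
            if left[side] <= 0:
                if side == 0:
                    wins += 1
                break
            side = 1 - side
    return wins / games

if __name__ == "__main__":
    for n in (0, 1, 2):
        print(f"{n}-ply evaluation:", round(nply_eval(60, 65, n), 3))
    print("rollout:", round(rollout(60, 65, rng=random.Random(1)), 3))
```

The evaluation is only as good as the static guess it ends on, which is why a deeper ply or a rollout can overturn it. A real rollout also has the bot choose a play on every turn, which this toy race leaves out.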

To sum it up: The website uses gnubg 1.0. It seems to use 3-ply evaluations for the top 4 plays, unless it thinks the issue is already decided on 2-ply, which unfortunately it thought to be the case here (3-ply still gets it wrong though, but with smaller differences). Here's where the site could improve a bit I think; it might be better to use 2-ply for all moves or if feasible 3-ply for all moves, rather than the mixed approach with 0-, 2- and 3-ply evaluations it uses now.

Even so, as the gentlemen above me already pointed out, gnubg's evaluations are very good overall, but far from perfect. So it will get things wrong, especially when it's being fed tricky positions as is probably the goal in this quiz. This quiz problem is a good example of how gnubg can go wrong on evaluation.

This particular position might seem relatively easy for us humans, but it does have some unusual features, such as the big stack on the opponent's six point and the perfect four-point board, which has the highest two points open though. To win from here, there could be a lot of tricky moves along the way, trying to close the board backwards, and perhaps even a remote chance of hitting a second checker. So I think this position is actually not so easy for the bot. In general, my rule of thumb with gnubg is that when one side has 6 men back or more, it's unreliable.

As for gnubg evaluation settings, up to version 0.90: 4-ply checker play is clearly better than 3-ply checker play, which is only slightly better than 2-ply checker play. For cube decisions though, 3-ply is a bad idea. So recommended settings are the fast preset "supremo" for 2-ply checker and cube, or make your own slower but slightly better setting with "grandmaster" 3-ply checker play, but "world class"/"supremo" 2-ply cube. And if you're patient, a 4-ply overall setting is always a very good choice, both for cube and checker play, but it does take much longer.

The v1.0 neural nets have reduced the odd/even-ply effect gnubg suffered from, meaning that 3-ply cube is now simply better than 2-ply cube. So with the new gnubg, it's simple: 4-ply is best (but slow), then "grandmaster" 3-ply (reasonably fast), then "supremo" 2-ply (very fast).

To give a rough idea of how much gnubg 1.0 improves over gnubg 0.9: tests suggest its error rate is reduced by about 50%. Another interesting thing to note is that the new gnubg 2-ply plays as well as or even better than the old gnubg 4-ply (in other words, the new version can play as well as the old version at some setting, but much faster).

You can get the new gnubg here: http://www.gnubg.org/index.php?itemid=22

Test data is here:
http://www.capp-sysware.com/downloads/gnubg/gnubg_prelim_results-1_00.txt
http://www.extremegammon.com/studies.aspx