There are a few different things about the bgtrain website and gnubg that got mixed up here, I think.
While I was playing with the problems on the website, I had trouble with some of the answers, so I plugged them into my own gnubg. My results were always different from the website's, sometimes even changing the best move, the order of the moves and the equity differences. After some back and forth, it turned out the website uses a newer version of gnubg than I do, and that for the first time in 7 years, the newer gnubg version actually includes updated (better trained) neural nets! From the available test data that I found, it appears this new gnubg does indeed play clearly better than the older versions and bridges a large part of the gap that existed with XG2.
For this particular position, however, the newer gnubg actually performs worse. gnubg 1.0 gets it wrong by quite a lot on 2-ply, whereas gnubg 0.90 gets it right on 2-ply. It happens. As neural nets are trained to get some positions better, they will sometimes get other positions worse.
On rollout, this is usually corrected and indeed it seems that both versions of gnubg agree that B/24 19/13 is the best play, and B/19 22/21 clearly is not.
A rollout means the computer simulates a large number of games played to the end for each candidate move and tallies the results. This typically takes a while: many minutes or even hours. An n-ply evaluation means it just looks ahead n moves and uses the neural net's figures at that point. This is usually done almost instantly or in a matter of seconds up to 3-ply; 4-ply might take up to a minute or so.
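To make the difference concrete, here is a minimal sketch in Python. It does not use gnubg at all: the "game" is a made-up one-die race (each side subtracts a die roll from its pip count, first to reach zero wins), and static_eval is an invented crude formula standing in for the neural net. All names are my own, purely for illustration. Real rollouts also have the bot choose the moves for both sides along the way, which this toy has no need for.

```python
import random


def static_eval(me, opp):
    """Stand-in for the neural net: crude guess that the side on roll wins."""
    return 0.5 + (opp - me) / (2 * (opp + me))


def n_ply(me, opp, plies):
    """Look ahead `plies` rolls, then fall back on static_eval.

    Returns the estimated win probability for the side on roll.
    """
    if plies == 0:
        return static_eval(me, opp)
    total = 0.0
    for d in range(1, 7):  # average over all six die rolls
        if me - d <= 0:
            total += 1.0  # the side on roll finishes immediately
        else:
            # The opponent is now on roll; their chance is our complement.
            total += 1.0 - n_ply(opp, me - d, plies - 1)
    return total / 6


def rollout(me, opp, trials, rng):
    """Play `trials` random games to the end and tally wins for the side on roll."""
    wins = 0
    for _ in range(trials):
        pips = [me, opp]
        side = 0  # 0 = the side on roll at the start
        while True:
            pips[side] -= rng.randint(1, 6)
            if pips[side] <= 0:
                break
            side = 1 - side
        wins += side == 0
    return wins / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for plies in (0, 1, 2, 3):
        print(f"{plies}-ply:", round(n_ply(8, 8, plies), 3))
    print("rollout:", round(rollout(8, 8, 20000, rng), 3))
```

The n-ply numbers depend on how good static_eval is at the point where the lookahead stops, while the rollout only depends on how many games you are willing to play. That is the same trade-off as with gnubg, just on a much smaller scale.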
To sum it up: the website uses gnubg 1.0. It seems to use 3-ply evaluations for the top 4 plays, unless it thinks the issue is already decided on 2-ply, which unfortunately it thought to be the case here (3-ply still gets it wrong, though with smaller differences). Here's where the site could improve a bit, I think: it might be better to use 2-ply for all moves, or if feasible 3-ply for all moves, rather than the mixed approach with 0-, 2- and 3-ply evaluations it uses now.
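As a rough illustration of why a mixed approach can lock in a wrong answer, here is a small Python sketch of a staged evaluation. This is only my guess at the general shape of such a scheme, not the website's actual code; the function name, the keep and decided_margin parameters and the equities in TOY are all invented for the example.

```python
def staged_pick(moves, evaluate, keep=4, decided_margin=0.08):
    """Return (best_move, ply_used_per_move) for a 0-/2-/3-ply staged scheme."""
    # Stage 1: quick 0-ply score for every candidate.
    scores = {m: evaluate(m, 0) for m in moves}
    ply_used = {m: 0 for m in moves}
    ranked = sorted(moves, key=scores.get, reverse=True)

    # Stage 2: re-score only the leading candidates at 2-ply.
    top = ranked[:keep]
    for m in top:
        scores[m], ply_used[m] = evaluate(m, 2), 2
    top.sort(key=scores.get, reverse=True)

    # Stage 3: go to 3-ply only if 2-ply leaves the decision close.
    if len(top) > 1 and scores[top[0]] - scores[top[1]] < decided_margin:
        for m in top:
            scores[m], ply_used[m] = evaluate(m, 3), 3
        top.sort(key=scores.get, reverse=True)
    return top[0], ply_used


# Invented equities per ply for three hypothetical plays A, B and C.
TOY = {
    "A": {0: 0.10, 2: -0.05, 3: 0.02},
    "B": {0: 0.05, 2: 0.10, 3: 0.01},
    "C": {0: -0.30, 2: -0.28, 3: -0.29},
}


def toy_eval(move, ply):
    return TOY[move][ply]


if __name__ == "__main__":
    print(staged_pick(list(TOY), toy_eval, keep=2))
    print(staged_pick(list(TOY), toy_eval, keep=2, decided_margin=0.5))
```

With the default margin, the 2-ply gap between B and A looks decisive, so the deeper evaluation is never run and B is picked; with a wider margin the 3-ply stage runs and A comes out on top. Note also that C keeps its 0-ply number, so the final list mixes equities from different plies.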
Even so, as the gentlemen above me already pointed out, gnubg's evaluations are very good overall, but far from perfect. So it will get things wrong, especially when it's being fed tricky positions as is probably the goal in this quiz. This quiz problem is a good example of how gnubg can go wrong on evaluation.
This particular position might seem relatively easy for us humans, but it does have some unusual features, such as the big stack on the opponent's six-point and the perfect four-point board, which has the highest two points open though. To win from here, there could be a lot of tricky moves along the way, trying to close the board backwards, with perhaps even a remote chance of hitting a second checker. So I think this position is actually not so easy for the bot. In general, my rule of thumb with gnubg is that when one side has 6 men back or more, it's unreliable.
As for gnubg evaluation settings, up to version 0.90: 4-ply checker play is clearly better than 3-ply checker play, which is only slightly better than 2-ply checker play. For cube decisions though, 3-ply is a bad idea. So recommended settings are the fast preset "supremo" for 2-ply checker and cube, or make your own slower but slightly better setting with "grandmaster" 3-ply checker play, but "world class"/"supremo" 2-ply cube. And if you're patient, a 4-ply overall setting is always a very good choice, both for cube and checker play, but it does take much longer.
The v1.0 neural nets have reduced the odd/even-ply effect gnubg suffered from, meaning that 3-ply cube is now simply better than 2-ply cube. So with the new gnubg, it's simple: 4-ply is best (but slow), then "grandmaster" 3-ply (reasonably fast), then "supremo" 2-ply (very fast).
To give a rough idea of how much gnubg 1.0 improves over gnubg 0.90: tests suggest its error rate is about 50% lower. Another interesting thing to note is that the new gnubg on 2-ply plays as well as or even better than the old gnubg on 4-ply (in other words, the new version can match the old version's best setting, but much faster).
You can get the new gnubg here: http://www.gnubg.org/index.php?itemid=22
Test data is here: http://www.capp-sysware.com/downloads/gnubg/gnubg_prelim_results-1_00.txt and http://www.extremegammon.com/studies.aspx