A score for unfinished games

Started by boomslang, April 01, 2010, 11:24:41 PM

boomslang

Hi,

I compiled a simple metric that reflects a player's share of unfinished / expired games relative to the overall average.  It can easily be implemented in RepBot (or any other bot that is logged in 24/7) and updated in real time.

Note: Table deleted at request of boomslang.  It can be restored if required.  Diane.

dorbel

Erm, yes, I seeeeeeeeeeee, but what does it all mean?

ah_clem

I think it's just S/N where S is the number of saved games for that player and N is the average number of saved games for all players.

A value of S/N == 1 would mean the player has an average number of saved games. S/N == 0 would imply that the player has no saved games.

I don't see that it's much more useful than the actual saved count.


diane

I still would like to come off the list, I am not comfortable sitting there!!  :blink:  :laugh: :laugh:
Never give up on the things that make you smile

diane

You say based on stats from 2006-2009...have you been spying on me this whole time?  :unsure: How else do you know expired saved games over that time frame?

boomslang

I guess a bit more info would've been good...

Quote from: ah_clem on April 02, 2010, 07:48:42 PM
I think it's just S/N where S is the number of saved games for that player and N is the average number of saved games for all players.


No...  See below.

Quote from: ah_clem on April 02, 2010, 07:48:42 PM

I don't see that it's much more useful than the actual saved count.


The actual saved count has two drawbacks:

1) The count only gives the current number of saved games, not matches that are already expired.  If a notorious dropper doesn't log in for a while, his unfinished matches expire and he starts with a clean sheet;

2) the count is not corrected for activity. I would say that it is OK for an active player to have more saved matches than an inactive player.


During the time my bots were online, MonteCarlo had 313 of its matches expired; TURKIYE only had four.  Does this mean TURKIYE is more likely to finish a match?  No.  MonteCarlo started 73201 matches, so only 0.43% of them were unfinished.  TURKIYE started 35 matches and had four of them unfinished (11%). 

The total number of unfinished games (of all the players in the table) was 1877 out of 185900 (1.01%).  TURKIYE's "rate" of unfinished matches is therefore 11.45 times the average and MonteCarlo's rate is only 0.42 times the average.  I guess it is safe to state that MonteCarlo is NOT a dropper!  TURKIYE on the other hand has the odds against him.

A score below 1 is BY DEFINITION a good score: it means they finish their matches more often than average.  Looking at the table, I guess that any score below 2.5-3 or so would be totally OK, but that is open to debate...
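
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `dropper_score` is made up here, and the inputs are the figures quoted in this post (the table itself is no longer shown).

```python
def dropper_score(unfinished, started, total_unfinished, total_started):
    """Player's unfinished rate divided by the overall unfinished rate."""
    if started == 0 or total_started == 0 or total_unfinished == 0:
        raise ValueError("need started matches and at least one unfinished match overall")
    player_rate = unfinished / started
    overall_rate = total_unfinished / total_started
    return player_rate / overall_rate

# Totals quoted in the post: 1877 unfinished out of 185900 started.
TOTAL_UNFINISHED, TOTAL_STARTED = 1877, 185900

montecarlo = dropper_score(313, 73201, TOTAL_UNFINISHED, TOTAL_STARTED)
turkiye = dropper_score(4, 35, TOTAL_UNFINISHED, TOTAL_STARTED)

print(f"MonteCarlo: {montecarlo:.2f}")
print(f"TURKIYE:    {turkiye:.2f}")
```

A result below 1 means fewer unfinished matches than average. The printed values need not match the quoted scores to the last digit, since those presumably came from the original table.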


Quote from: diane on April 02, 2010, 09:03:34 PM
You say based on stats from 2006-2009...have you been spying on me this whole time?  :unsure: How else do you know expired saved games over that time frame?

The data used for the scores are nothing other than the 'toggle report' results: public info that is monitored by Repbot and tobgol all the time.


maria

No comment, but a question:

Where did my first post go?


diane

Quote from: maria on April 03, 2010, 08:24:54 AM
Where did my first post go?

There is no evidence that there was ever a first post...did you check it loaded up and didn't time out with the intermittent server issue?

maria

Quote from: diane on April 03, 2010, 09:24:58 AM
There is no evidence that there was ever a first post...did you check it loaded up and didn't time out with the intermittent server issue?

Okay maybe it was early in the day before I had my breakfast.  I was thinking of my reply to a post by Zorba at the backgammon haiku topic.
<whacking myself upside the head>

inim

The idea is good and would fit in as an automatically derived metric, see http://www.fibsboard.com/repbot/repbot-formula-and-parameter-changes/msg15026/#msg15026.

The one question I have is how you determine which games expired.
This space is available for rent by advertisers. Call 0900-INIMITE today, and see your sales skyrocketing in no time! New customers receive free Vl@9rå and a penis enlargement set as a bonus! We support banners, flash banners, and scrollers. Discrete handling by our HQ on the Dutch Antilles.

boomslang

In the long run, there is no distinction between 'expired games' and 'unfinished games': expired games are simply unfinished games.  Since this metric only looks at unfinished matches on an aggregate level, there's no need to tell which match was played to an end and which match wasn't.

For all players in the database you add two counters:

A) number of started matches, and

B) number of finished matches.

Whenever you see something like

   "AAAA and BBBB start an n point match"

you increase A for players AAAA and BBBB by 1. (Of course unlimited matches should be skipped.)  Whenever you see

   "CCCC wins an n point match against DDDD"

you increase B for players CCCC and DDDD by 1.  The numbers shown in the table above are [(A-B) / A] / [(sum(all A)-sum(all B)) / sum(all A)].
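
A minimal sketch of this counting scheme in Python, assuming the report lines look exactly like the two examples quoted above (the real FIBS wording may differ). The player names in the sample lines and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Line formats as quoted in the post; treat them as assumptions.
START_RE = re.compile(r"^(\S+) and (\S+) start an? (\d+) point match")
WIN_RE = re.compile(r"^(\S+) wins an? (\d+) point match against (\S+)")

started = defaultdict(int)   # counter A per player
finished = defaultdict(int)  # counter B per player

def process_line(line):
    """Update the counters for one line of 'toggle report' output."""
    line = line.strip()
    m = START_RE.match(line)
    if m:
        for player in (m.group(1), m.group(2)):
            started[player] += 1
        return
    m = WIN_RE.match(line)
    if m:
        for player in (m.group(1), m.group(3)):
            finished[player] += 1

def score(player):
    """[(A-B)/A] / [(sum A - sum B) / sum A], or None if undefined."""
    a, b = started.get(player, 0), finished.get(player, 0)
    total_a, total_b = sum(started.values()), sum(finished.values())
    if a == 0 or total_a == total_b:
        return None
    return ((a - b) / a) / ((total_a - total_b) / total_a)

# Hypothetical sample lines; the unlimited match matches neither pattern.
for sample in [
    "alice and bob start a 5 point match",
    "carol and dave start a 3 point match",
    "alice and carol start an unlimited match",
    "alice wins a 5 point match against bob",
]:
    process_line(sample)

print(score("alice"), score("carol"))
```

Unlimited matches are skipped simply because both patterns require a numeric match length. If the bot misses some start lines, B can exceed A and the score goes negative, which a real implementation would have to decide how to handle.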

The idea is that Repbot issues a warning whenever the score of a player is above a certain level. I think with a little bit of mathematics, this level can even be player specific (based on his total number of started matches), so that NIHILIST's score of 33 would NOT be considered too high (only 3 started matches after all) but TURKIYE's score of 11 would.
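
One possible way to make the warning level depend on the number of started matches is a one-sided binomial test: warn only if a player with the overall average drop rate would rarely collect this many unfinished matches by chance. This is a swapped-in technique, not something specified in the thread; the cutoff `alpha=0.01` and the function names are assumptions, and the NIHILIST figure of 1 unfinished match is inferred from "score of 33" and "3 started matches".

```python
from math import comb

def tail_probability(unfinished, started, overall_rate):
    """P(X >= unfinished) for X ~ Binomial(started, overall_rate)."""
    below = sum(
        comb(started, k) * overall_rate**k * (1 - overall_rate) ** (started - k)
        for k in range(unfinished)
    )
    return 1.0 - below

def should_warn(unfinished, started, overall_rate, alpha=0.01):
    """Warn only if an average player would rarely look this bad by chance."""
    return tail_probability(unfinished, started, overall_rate) < alpha

OVERALL_RATE = 1877 / 185900  # about 1.01%, from the totals quoted earlier

# TURKIYE: 4 unfinished out of 35 started (figures from the earlier post).
print(should_warn(4, 35, OVERALL_RATE))  # True
# NIHILIST: 3 started; 1 unfinished is an inference, not a quoted figure.
print(should_warn(1, 3, OVERALL_RATE))   # False
```

With these assumptions the sketch warns about TURKIYE but not about NIHILIST, because one dropped match out of three is not strong evidence either way.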

I would say "RepBot, start logging!"


PS I presented the table NOT for naming and shaming reasons but to give an impression of whether or not this metric works. Maybe it is better if stog or socksey deletes it from FIBSboard.

ah_clem

To summarize: right now players can only see the number of current saved matches.  A player who starts 100 matches and drops half of them can go away for a month (or use another nick for a month), and come back to a zero savedcount.

Your proposed statistic keeps a historical record of saved matches, so even after the match expires from the server, it still counts in the user's reputation.

This sounds reasonable to me.  If you can get inim to implement it, I'd use it to decide whether to play someone.  I'm not sure if normalizing it (i.e. dividing by [(sum(all A)-sum(all B)) / sum(all A)]) makes it easier to understand, but I can work with the stat either way.

One obvious problem is that Repbot is not always online, so it would miss counting some matches - perhaps even resulting in some players having more completed matches than started matches.  I don't think this is a showstopper, but something to consider.

inim

Quote from: ah_clem on April 11, 2010, 03:44:05 PM
Your proposed statistic keeps a historical record of saved matches, so even after the match expires from the server, it still counts in the user's reputation.

Repbot already detects match events, namely the start and both the regular and irregular end of matches. The events are stored in a MySQL table. This is needed to enforce Patti's rule that opinions can only be cast 24h after the last match event.
Code:
http://openfibs.svn.sourceforge.net/viewvc/openfibs/trunk/modules/repbot/src/main/java/net/sf/repbot/linelistener/MatchEventListener.java?revision=50&view=markup


Currently these events are garbage collected after 24h, because they no longer can possibly matter for the 24h rule.

The started-vs-finished matches metric boomslang proposes can be implemented as a rather simple extension to this table. First, it would need to distinguish between the different match events: start, end, and user-logout-during-match. The latter is already supported by RepBot and can be re-used. The event types are here:
http://openfibs.svn.sourceforge.net/viewvc/openfibs/trunk/modules/repbot/src/main/java/net/sf/repbot/linelistener/Monitor.java?revision=50&view=markup

The rationale for garbage collection was to keep the DB small, because it is backed up daily and transferred to a backup location. I consider this too important to break, or to bloat the DB with years of historic match start/end event data. So we need a refinement of the algorithm which aggregates the counts without the need to keep an arbitrarily long history. A possibility would be to keep the last, say, 6 months, and after that just keep an integer counter for match start and match end events.

That means that A and B must be calculated from 2 inputs: the detailed event table for the last 6 months and the aggregated historical data older than 6 months. Given that events from years back may be irrelevant, using a sliding window may actually be even better. That means the quotient boomslang proposes is only calculated over match events for the last, say, 6 months. That should keep the table at a constant size and is still good enough in my view.
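
The two-input idea can be sketched as follows. This is not RepBot's actual schema or code: the class `PlayerStats`, its field names, and the 183-day window are assumptions, and an in-memory queue stands in for the event table.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_SECONDS = 183 * 24 * 3600  # roughly six months; an arbitrary choice

@dataclass
class PlayerStats:
    events: deque = field(default_factory=deque)  # (timestamp, kind), oldest first
    old_started: int = 0   # starts that have aged out of the window
    old_finished: int = 0  # finishes that have aged out of the window

    def record(self, timestamp, kind):
        """kind is 'start' or 'end'; timestamps must be non-decreasing."""
        if kind not in ("start", "end"):
            raise ValueError(kind)
        self.events.append((timestamp, kind))

    def roll_up(self, now):
        """Move events older than the window into the integer counters."""
        while self.events and self.events[0][0] < now - WINDOW_SECONDS:
            _, kind = self.events.popleft()
            if kind == "start":
                self.old_started += 1
            else:
                self.old_finished += 1

    def counts(self, now, window_only=False):
        """Return (A, B) for the window alone or for the whole history."""
        self.roll_up(now)
        a = sum(1 for _, kind in self.events if kind == "start")
        b = len(self.events) - a
        if window_only:
            return a, b
        return a + self.old_started, b + self.old_finished
```

Calling `counts(now, window_only=True)` gives the pure sliding-window variant, while the default adds the aggregated counters for everything older, so both options discussed above fall out of the same data.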

Another alternative, if you guys convince me that we really need an arbitrarily long history, is to exclude the match event table from the backup. Local disk space is a non-issue, of course. The rationale here is that if I am struck by lightning tomorrow, anybody must be able to pick up the source and DB backup and continue RepBot. That wouldn't be the case if RepBot relied on private files only available on my server.

Quote from: ah_clem on April 11, 2010, 03:44:05 PM
One obvious problem is that Repbot is not always online, so it would miss counting some matches - perhaps even resulting in some players having more completed matches than started matches.  I don't think this is a showstopper, but something to consider.

Repbot has an uptime of 99%; it basically only goes offline for a minute if the server kernel or repbot itself is updated. Recovery after fibsquakes is good: repbot typically is one of the first bots back online after a server hiccup. I am with Patti here: this is a game server and not a financial industry application. If we lose a few events, I couldn't care less :)

Implementation-wise, each of you is welcome to add the functionality to RepBot. It is open source. My only condition is that it is done cleanly within the existing architecture. Avi Kivity is a genius coder and created excellent, easily maintainable code with a clean OO architecture. I intend to keep RepBot in that shape and will refuse to incorporate dirty hacks. That said, 80% of the code needed is already there for the 24h rule, so carefully augmenting it shouldn't break the architecture.

The other issue is how to present the new metric. I cannot simply change the output syntax of any command, because legacy bots and clients parse RepBot output and can break. So the new functionality must come as a new command. I would like to see an implementation of my weighted-sum approach here; otherwise we need a new command for each new idea, rendering the command interface hopelessly cluttered in a few years. Always keep in mind that the commands are an API which 3rd party software outside my control hardcodes. So I am very conservative here and insist on a clean command line interface design which can stand the test of time.

Any takers?

ah_clem

I am also with you and Patti here with regard to this being "just" a game server.  I will lose zero sleep worrying about Repbot's occasional downtime.  (c:

But, to take this approach further, since it's not a financial server that needs to log every transaction for auditing purposes, why not just keep it simple (as boomslang originally proposed) and just keep track of two integers: number of matches started and number of matches finished. Increment as the event occurs, rather than logging each and then calculating sums at query time.  This is lightweight from a db perspective.

Regarding the sliding window, while I agree that really old events should "age out", since we're computing a ratio the old events will become negligible due to swamping.  The same way that all those matches I lost at first don't matter to my rating today.  The problem I see with a "forever" count is that a long time user could go on a drop-fest and still have a good metric after dropping hundreds of matches.

So, were I to design the Cadillac metric, it would consist of two numbers: one derived from the incremented integers (forever) and one derived from the data for the last six months.  I'd present them as
100 X B/A and call it finish percentage:

ah_clam has finished 78.3% of matches in the last six months and 84.3% overall.

I think expressing it as a percentage will be more easily understandable to most people than boomslang's normalized ratio.
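
A rough sketch of how such a two-number report could be formatted, assuming the (started, finished) pairs for the six-month window and for the full history are already available. The function names and the sample counts are hypothetical.

```python
def finish_percentage(started, finished):
    """100 x B/A, or None when the player has no started matches."""
    if started == 0:
        return None
    return 100.0 * finished / started

def describe(player, recent, overall):
    """recent and overall are (started, finished) pairs."""
    recent_pct = finish_percentage(*recent)
    overall_pct = finish_percentage(*overall)
    if overall_pct is None:
        return f"{player} has no recorded matches."
    if recent_pct is None:
        return f"{player} has finished {overall_pct:.1f}% of matches overall."
    return (f"{player} has finished {recent_pct:.1f}% of matches "
            f"in the last six months and {overall_pct:.1f}% overall.")

# Hypothetical counts chosen only to illustrate the output format.
print(describe("some_player", recent=(40, 30), overall=(200, 170)))
```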

Anyway, that's my $.02.   I have no plans to do any implementation, though, so my opinion doesn't count nearly as much as whoever it is who volunteers to do the actual work.

boomslang

Quote from: inim on April 12, 2010, 11:18:44 AM

The rationale for garbage collection was to keep the DB small, because it is backed up daily and transferred to a backup location. I consider this too important to break, or to bloat the DB with years of historic match start/end event data. So we need a refinement of the algorithm which aggregates the counts without the need to keep an arbitrarily long history. A possibility would be to keep the last, say, 6 months, and after that just keep an integer counter for match start and match end events.



For my proposed metric there is no need to keep a log of individual events.  The only things needed are two counters per player (started and finished matches). They should be backed up though, but that is not an issue.

It is also not necessary to know whether a player logs out during a match (although it might provide extra information). "Toggle report" information is the only input needed.

I think the implementation is therefore quite straightforward, don't you think, inim?

Quote from: ah_clem on April 12, 2010, 02:58:52 PM


ah_clam has finished 78.3% of matches in the last six months and 84.3% overall.

I think expressing it as a percentage will be more easily understandable to most people than boomslang's normalized ratio.


My suggestion would be to issue just a simple warning when the ratio is too high (similar to RepBot now deciding that a reputation below 0 is BAD and above 0 is GOOD) and don't bother the user with numbers.
(If a user needs to see a number, then it should be normalized and it should be based on the player's UNFINISHED percentage and not its FINISHED percentage. 99.5% finished and 98.5% finished doesn't look like a big difference, but the latter is three times as high when it comes to unfinished percentage.)
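
The 99.5% versus 98.5% comparison can be checked with a tiny sketch; the helper name `unfinished_ratio` is made up for illustration.

```python
def unfinished_ratio(finished_pct_a, finished_pct_b):
    """How many times larger b's unfinished percentage is than a's."""
    unfinished_a = 100.0 - finished_pct_a
    unfinished_b = 100.0 - finished_pct_b
    if unfinished_a == 0:
        raise ValueError("player a has no unfinished matches; ratio undefined")
    return unfinished_b / unfinished_a

# 0.5% unfinished versus 1.5% unfinished.
print(unfinished_ratio(99.5, 98.5))  # 3.0
```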

I also don't think the metric should use a sliding window.  The more data about unfinished matches, the better the statistics get.  Long-time users are not likely to suddenly start dropping matches.  After all, they have put a lot of effort into their accounts.


Maybe inim can tell what is needed (Eclipse, mySQL, ...) for adapting RepBot's code?