Repbot formula and parameter changes

inim · June 20, 2008, 01:53:35 PM

RepBot has been updated to version 2.0.1. The only change is that weights can now be assigned to bot vouchers and complaints at runtime. Currently the weights are set to voucher=0 and complaint=1.

The new policy change became necessary to work around reputation inflation caused by some bots' policy of vouching for every player who finished a match. This of course allows any player to build a reputation of 100K within only ten games against those bots, rendering reputation meaningless as a criterion to base decisions on.

Please note that no opinion has been deleted and bots continue to be able to give opinions. Their opinions are readily available with the RepBot "list" command. However, the numerical score currently only considers them with the given weights.
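To illustrate how runtime weights can neutralise bot vouchers without deleting any opinion, here is a minimal Python sketch. The real RepBot formula is not given in this post; the `Opinion` record, the plus/minus-one scoring and the `weighted_score` function are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    author: str
    is_bot: bool    # hypothetical flag, e.g. taken from an admin-maintained bot list
    is_vouch: bool  # True = voucher, False = complaint


def weighted_score(opinions, bot_vouch_weight=0.0, bot_complaint_weight=1.0):
    """Sum +1 per voucher and -1 per complaint, scaling bot opinions by runtime weights.

    Assumption: human opinions always count with weight 1.
    This is NOT the real RepBot formula, only a toy model of the weighting idea.
    """
    total = 0.0
    for op in opinions:
        value = 1.0 if op.is_vouch else -1.0
        if op.is_bot:
            value *= bot_vouch_weight if op.is_vouch else bot_complaint_weight
        total += value
    return total


# All opinions stay stored; only their contribution to the number changes.
opinions = [
    Opinion("human_a", is_bot=False, is_vouch=True),
    Opinion("bot_x", is_bot=True, is_vouch=True),
    Opinion("bot_y", is_bot=True, is_vouch=True),
    Opinion("bot_z", is_bot=True, is_vouch=False),
]

print(weighted_score(opinions))            # voucher=0, complaint=1
print(weighted_score(opinions, 1.0, 1.0))  # bots counted like humans
```

With voucher=0 and complaint=1 the two bot vouchers contribute nothing while the bot complaint still counts, so the toy score is 0.0 instead of 2.0.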

The formula is subject to future revision and parameterization. The actual opinion data stored in the database allows for a variety of calculations, and adding some more database fields would allow even more. I am aware of several models worth further investigation, and am also considering supporting multiple metrics with a new command. So please don't be worried about changes to the numerical value: all ratings will be affected in the same way and their relative values will remain comparable, hopefully even better than today.

The main criteria for any changes are:

1) RepBot rating must not encourage the creation of sock puppets, i.e. inexperienced accounts must have only limited influence.

2) RepBot must be able to cope with bot opinions. Bots create many opinions very quickly and thereby easily render our lazy and slow human voices meaningless.

3) The default and main reputation must be expressed as a signed integer, because many clients (GUI and other bots) rely on this format to base their decisions on. As many of those clients are unmaintained, no incompatible change is possible.

4) The value range and sign of the reputation must allow the RepBot rating to be used as "a tool to help avoiding droppers and abusive players", just as the help text has said for ages.

Whatever is required to improve behaviour with respect to these 4 criteria is welcome as a suggestion.

Mookie · June 20, 2008, 08:27:16 PM

In order to make this system work as well as possible, I strongly suggest that you play, and vouch, MOOKIE.

mookie

inim · June 22, 2008, 11:13:52 AM

Here is my basic mindset on the modeling of a better reputation formula, quick shot so others can comment.

A) Categories
There may be different categories of reputation, which are not compatible with each other, i.e. you may find different reps for different views. For example a nice guy who is a notorious bot dropper should have a very bad reputation with bots, but a good one with humans. Bots should refuse invites, humans should not. Most current abuse patterns, criticism, and other problems stem from the IMO hopeless attempt to mix categories in a one-size-fits-all approach.

So which categories are relevant for FIBS? I have found these so far:

A.1) Subjective human opinion
Is this a nice person without too much flaming, dice whining etc.? I.e. the classic popularity contest.

A.2) Reliability in bot games
Does the person lose against known bots like a man, or is it a bot dropper?

A.3) Reliability in human games
Does that person finish matches, play sufficiently fast, and otherwise resume them in time?

A.4) Turing test
Is this person communicating like a black hole and thus suspected of being a bot-driven sock puppet? How much confidence do we have that we are actually playing a human when we play this account? Are rating and strength reasonably correlated?

These four categories have incompatible reputation values, as they are rather orthogonal. I.e. you can be bad in any of them and good in any other.

B) Inputs and Parameters

B.1.1) The best input for A.1 is the classic RepBot vouch/tell without time window after matches. If you dislike the guy for any reason, you should be able to share that. Bots need to be excluded from this category, of course.

B.2.1) The best input for A.2 is vouch/tell with time limit for matches for bots. Humans need to be excluded from this category.

B.3.1) One input for A.3 is vouch/tell with time limit for matches for humans. Known bots need to be excluded from this category.

B.3.2) Another input for A.3 can be derived automatically, by simply watching match events, pairings, match length and scores. We have some reason to assume a person is reliable if they play, say, three 5pt matches against a human and finish them. Dropper suspicion is fed by a player having a lot more game start events than match end events. Another interesting input would be the number of games which expire for that user, but that is not accessible via CLIP. The basic idea is to find one (or more) machine-maintainable score(s) derived from CLIP events, details pending. Known bots, humans, and even sock puppets can likely all be assessed using the same CLIP events, verification of this claim pending.
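The start/finish comparison from B.3.2 could be sketched roughly as follows in Python. The `(kind, nick)` event pairs are a made-up stand-in for whatever RepBot would parse from CLIP; no real CLIP message format is implied, and the function name `completion_ratio` is hypothetical.

```python
from collections import Counter


def completion_ratio(events, nick):
    """Fraction of the matches `nick` started that were also finished.

    `events` is an iterable of (kind, nick) pairs, where kind is "start" or "end".
    Returns None when the player has no recorded starts, so that unknown
    accounts are not mistaken for reliable ones.
    """
    counts = Counter(kind for kind, who in events if who == nick)
    starts = counts["start"]
    if starts == 0:
        return None
    # Cap at 1.0 in case an "end" was seen without its "start" being observed.
    return min(counts["end"], starts) / starts


events = [
    ("start", "alice"), ("end", "alice"),
    ("start", "alice"), ("end", "alice"),
    ("start", "bob"), ("start", "bob"), ("start", "bob"), ("end", "bob"),
]

print(completion_ratio(events, "alice"))  # every started match was finished
print(completion_ratio(events, "bob"))    # many more starts than ends
```

A low ratio only feeds suspicion; resumed matches and observation gaps would need extra handling before such a number could be used as a score.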

B.4.1) A.4 is the hardest criterion, as many human players are simply silent by choice. A simple approach with few false positives would be e.g. counting shouts. A person active in shout is less likely to be a sock puppet. The problem here is that such a count may encourage trash shouts by sock puppets to manipulate it. Bots openly playing as such are of course excluded.

Another test is to let gnubg determine match evaluations, so a 1450 account with 5000 experience playing at 2100 strength would be pretty suspect. However, computational cost of this is very high. I so far have no really workable idea for this input, so it probably will be dropped from a first implementation.
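The rating-versus-strength comparison could look like the Python sketch below, assuming an estimated playing strength on the rating scale is already available from some offline analysis. How to obtain that estimate is exactly the expensive part and is not covered here. The function name and both thresholds are invented for illustration.

```python
def looks_suspicious(rating, experience, estimated_strength,
                     min_experience=1000, max_gap=400):
    """Flag accounts that play far above their rating despite ample experience.

    Assumptions: `estimated_strength` is expressed on the same scale as the
    rating, and the thresholds are arbitrary example values, not tuned ones.
    """
    experienced = experience >= min_experience
    plays_above_rating = (estimated_strength - rating) > max_gap
    return experienced and plays_above_rating


# The example from the post: rated 1450 after 5000 experience, playing at 2100.
print(looks_suspicious(1450, 5000, 2100))
# A newcomer whose rating simply has not caught up yet is not flagged.
print(looks_suspicious(1450, 50, 2100))
```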

Note: RepBot has had a function isBot(nick) for a long time, which is maintained as a database flag by the admin. A function isSockPupet(nick) is out of reach, unless we design a reliable Turing test.

C) Calculation

Now that we have several categories and thus reps, we can lump them together into the classic single-number rep. However, 10 fibsters have 16 opinions on how to do that. So, why not let them decide themselves what matters to them and what does not?

Given weights WBn.m we can create the classic "default" reputation simply as:

Rep = CEILING(WB1.1*B1.1 + WB2.1*B2.1 + WB3.1*B3.1 + WB3.2*B3.2 + WB4.1*B4.1)

When RepBot allows users to set weights to arbitrary values, users can not only ignore or boost factors (weight=0, or >1), but can also scale the numeric value by applying a common factor to all weights (i.e. WBn.m := 0.1*WBn.m for all n, m divides the result by ten, before rounding). With a set of default weights, reps remain comparable for use in e.g. shout or private exchanges about them.
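The weighted sum above can be tried out with a few lines of Python. The input scores are invented numbers and all default weights are assumed to be 1.0; only the CEILING-of-weighted-sum shape is taken from the formula.

```python
import math

# Assumed system defaults: every input counts fully.
DEFAULT_WEIGHTS = {"B1.1": 1.0, "B2.1": 1.0, "B3.1": 1.0, "B3.2": 1.0, "B4.1": 1.0}


def reputation(inputs, weights=None):
    """Rep = CEILING(sum over all inputs Bn.m of WBn.m * Bn.m)."""
    if weights is None:
        weights = DEFAULT_WEIGHTS
    return math.ceil(sum(weights[key] * value for key, value in inputs.items()))


# Invented example scores for one player.
inputs = {"B1.1": 40.0, "B2.1": -25.0, "B3.1": 10.0, "B3.2": 7.5, "B4.1": 0.0}

# A user who does not care about bot games sets that weight to 0.
no_bots = dict(DEFAULT_WEIGHTS, **{"B2.1": 0.0})

# A common factor on all weights rescales the result.
scaled = {key: 0.1 * weight for key, weight in DEFAULT_WEIGHTS.items()}

print(reputation(inputs))           # 33
print(reputation(inputs, no_bots))  # 58
print(reputation(inputs, scaled))   # 4
```

Because of the CEILING step, scaling by 0.1 gives 4 here rather than exactly 3.3, so scaled reps are only approximately a tenth of the default ones.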

Scaling the formula's graph on the Y-axis and moving reps along the X-axis (i.e. determining the sign of the function value) can be done with high school math, so there is no need to consider it at this stage. It's an implementation detail which, given a concrete data set, should be applied to suit expectations such as that "bad guys" should have a negative sign.

D) Implementation

There need to be a few new commands to implement this new behaviour, but no rocket science.

D1) tell repbot ask <player> <category_token>*

Without a category, the default weighted sum is given. With one or more categories given, the returned value considers only those categories, adjusted by the weights.

D2) tell repbot set weight <category_token> <float number>
Set the user-defined weight for a given category. *_default tokens cannot be set; they are global for the system.

D3) tell repbot get weight <category_token>
Return the user defined weight as a float value for a given category

<category_token> ::= human_opinion, bot_reliability, human_reliability, auto_reliablity, auto_turingtest, human_opinion_default, bot_reliability_default, human_reliability_default, auto_reliablity_default, auto_turingtest_default

D4) All sorts of batch input and probably some sort of summary and info commands will add the syntactic sugar needed for convenience. However, D1-D3 are a minimal algebra and would already be sufficient.
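A minimal Python sketch of how D1-D3 might behave is given below. Nothing here is actual RepBot code: the class name, the in-memory dictionaries and the "ok"/"error" replies are assumptions. The handler receives only the text after "tell repbot". It is also assumed that an ask without categories sums all five categories using the caller's own weights, falling back to the system defaults where none were set.

```python
import math

# Category tokens spelled as in the post; each also has a read-only "<token>_default".
CATEGORIES = ("human_opinion", "bot_reliability", "human_reliability",
              "auto_reliablity", "auto_turingtest")
SUFFIX = "_default"


class RepBotSketch:
    def __init__(self, scores):
        self.scores = scores                          # {player: {category: score}}
        self.defaults = {c: 1.0 for c in CATEGORIES}  # system-wide weights
        self.user_weights = {}                        # {user: {category: weight}}

    def _weight(self, user, token):
        """Return (category, weight) for a token; raise KeyError if unknown."""
        base = token[:-len(SUFFIX)] if token.endswith(SUFFIX) else token
        if base not in CATEGORIES:
            raise KeyError(token)
        if base != token:  # "*_default" token: ignore the user's own setting
            return base, self.defaults[base]
        return base, self.user_weights.get(user, {}).get(base, self.defaults[base])

    def handle(self, user, line):
        words = line.split()
        try:
            if words[:2] == ["set", "weight"] and len(words) == 4:
                if words[2] not in CATEGORIES:  # unknown or *_default token
                    return "error"
                self.user_weights.setdefault(user, {})[words[2]] = float(words[3])
                return "ok"
            if words[:2] == ["get", "weight"] and len(words) == 3:
                return str(self._weight(user, words[2])[1])
            if words[:1] == ["ask"] and len(words) >= 2:
                player_scores = self.scores[words[1]]
                total = 0.0
                for token in words[2:] or CATEGORIES:
                    base, weight = self._weight(user, token)
                    total += weight * player_scores.get(base, 0.0)
                return str(math.ceil(total))
        except (KeyError, ValueError, OverflowError):
            pass
        return "error"


bot = RepBotSketch({"alice": {"human_opinion": 10.0,
                              "bot_reliability": -4.0,
                              "human_reliability": 2.5}})
print(bot.handle("me", "ask alice"))                     # 9
print(bot.handle("me", "set weight bot_reliability 0"))  # ok
print(bot.handle("me", "ask alice"))                     # 13
print(bot.handle("other", "ask alice"))                  # 9
```

Weights are stored per asking user, so one player's settings never change the number another player sees, and the *_default tokens give everyone a common value to quote in shout.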

OK folks, do you like it?

donzaemon · June 22, 2008, 12:13:00 PM

I like it a lot, but it might be much more information than the average fibster wants. As long as "tell repbot ask <player>" returns a normalized value then everyone will have what they want from it, I think. The people who want to break it down will be able to, but for the most part you can just look at one rep value like we have gotten used to doing ....

Tom · June 22, 2008, 01:44:04 PM

B.3.2)

We can also time the matches and, if a match is over quickly, weight the complaint only slightly.

If someone plays fast and then gets a complaint, it must be for something else (or no reason at all).

I think a bigger question is what kind of complaints do we really want to track?

Patti · June 22, 2008, 05:47:53 PM

I think that it runs the risk of confusing casual users into not using RepBot at all, and that the additional information probably has minimal utility.

inim · November 12, 2008, 10:01:33 PM

Quote from: inim on June 20, 2008, 01:53:35 PM
RepBot has been updated to version 2.0.1. The only change is that weights can now be assigned to bot vouchers and complaints at runtime. Currently the weights are set to voucher=0 and complaint=1.

I have just changed the parameters to voucher=0 and complaint=0.

Rationale is this:

1) FIBS rules changed when Patti introduced her infamous "bot dropper punishment script", so bots now have a much better defense. They no longer need to use RepBot for punishment, Patti does it for them now.
2) Many of the opinions in the RepBot-DB were cast by the now banned GBot* family, and players have no way to get rid of them as the bot accounts are banned.
3) Requests concerning bot opinions (and Gbot opinions in particular) make up a significant share of my workload maintaining Repbot.

Please accept the new policy, I lack the time to discuss it at length. I just did what I think is appropriate, you may or may not disagree. Thx.

Standard disclaimer: No bot complaint or voucher was deleted, a "tell repbot list NICK" still has them all. If the Gbots come back before they expire and are garbage collected by RepBot, I may change the weight again because (a) people then can bother the bot maintainer about the complaints and (b) they may be able to get rid of them again.

FIBS Board backgammon forum
