ELO system for bbo?
#1
Posted 2021-January-17, 03:59
#2
Posted 2021-January-17, 04:33
Bridge is similar to all other equilibrium games such as Chess and Go, but the big dissimilarity is the requirement for people to play as a pair.
This means that if you want a rating system, there would need to be a rating system for pairs rather than individuals.
Unfortunately, many Bridge partnerships are as evanescent as steam from a boiling kettle (I could meander more through that metaphor but you get the idea).
There is a workaround that I have used. You can download an Elo calculator from the web and then play in the regular massive daylongs, where the number of competitors is often greater than 1000 every day. The quality is very mixed, so it can be characterised as an international tournament every day.
Map your results from these robot daylongs, and you will gain a rough idea of your equivalency to Chess ratings.
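For anyone who would rather not download a calculator, here is a minimal sketch of what one does. The expected-score formula is the standard Elo one. Treating a whole daylong as a single game against an average field, with your matchpoint percentage as the score, is only an assumption made for illustration, as are the field rating of 1500 and the K-factor of 16.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score (between 0 and 1) against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, field_rating, score, k=16.0):
    """Return the new rating after one result.

    score is the result as a fraction (0.0 to 1.0). Assumption: a daylong
    matchpoint percentage divided by 100 is used as the score, and the whole
    field is treated as one opponent rated field_rating.
    """
    return rating + k * (score - expected_score(rating, field_rating))


if __name__ == "__main__":
    rating = 1500.0
    # Hypothetical daylong results, as matchpoint percentages.
    for percentage in (56.2, 48.9, 61.0):
        rating = update_rating(rating, 1500.0, percentage / 100.0)
        print(round(rating, 1))
```

A score above 50% against a field you are rated level with nudges the rating up, and a score below 50% nudges it down. How far it moves is set entirely by K, which is a tuning choice.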
Clearly, stars, masterpoints and success in previous tournaments - all of which contribute to overall ranking in Bridge - are of little value in determining a player's instantaneous skill level.
This is because masterpoints accumulate like barnacles and exist to make money for Bridge organisations who then provide the tournaments that we all enjoy.
There is an equivalent problem in Chess in that titles such as International Master and Grandmaster are awarded based on achieving excellent results in high-quality tournaments.
This is why I suggest using large daylongs if you really want to map Elo (his name was Arpad Elo by the way) rankings onto Bridge players.
Of course, there is the little problem that not everyone regards robot Bridge with the same affection as I do (litotes), but them's the breaks.
#3
Posted 2021-January-17, 05:13
A lot of members focused on their rating and would not play with weaker partners or opponents: even though the rating system protected them from weaker players, very few believed this and even fewer were willing to take the risk. This caused a lot of bad feeling when players were booted by the table host shortly after sitting down.
I believe that BBO deliberately avoided a ratings system initially to avoid this behaviour.
It is interesting to see that the English Bridge Union, and perhaps others, now have national rating schemes and the players are more accepting. Perhaps because there are more events targeted at them.
#4
Posted 2021-January-17, 05:29
#5
Posted 2021-January-17, 05:34
#6
Posted 2021-January-17, 06:21
lmhk, on 2021-January-17, 05:34, said:
The ride isn't worth the cost of the ticket
Introducing rating systems introduces an enormous range of problems.
The most direct are a series of incredibly painful social interactions when people are either
A. Trying to protect their ratings
B. Trying to blame other people because they just damaged their ratings
Running a close second is a series of incredibly painful interactions trying to explain to people that the rating system says that they are bad bridge players.
And of course, there is the never-ending joy of trying to explain "How the ratings system works" and "No, the ratings system is not broken" to people who are mathematically illiterate and technophobes.
As was already discussed, Fred made a very conscious decision not to implement a ratings system after seeing how destructive this was to the OKB playing environment.
I think that he made the right choice at the time. Right now, I think that the arguments against creating a ratings system are somewhat less strong (I think that there are good enough machine learning algorithms to do a good job at creating a ratings system. 18 years back, I'm less sure). However, the critical issue is still the impacts on the social dynamics of the site. And here, I think that ratings systems are still rank poison.
#7
Posted 2021-January-17, 09:27
hrothgar, on 2021-January-17, 06:21, said:
Not sure we even need machine learning. As previously discussed, the EBU NGS ranking scheme seems to work and is designed to see beyond pairs. It could be adapted to the more mainstream BBO tournaments without problems I would expect.
As for the social issues, you may be right. I have played on another online card game site where the rating system poisons all social interactions, for the reasons you mention. But I would be curious to know if there are counter-experiences. Does Elo create chronic social problems in online chess, which is considerably more popular than bridge? If not, could it be because the rating is accepted as site-independent and true?
#8
Posted 2021-January-17, 09:31
pescetom, on 2021-January-17, 09:27, said:
"seems to work" is mighty thin soup
I can point out any number of flaws with the NGS system
If you are going to do this, do it right.
#10
Posted 2021-January-17, 10:21
pescetom, on 2021-January-17, 09:40, said:
Have you done so to the NGS Working Group, or on BridgeWinners which would be a better place than here?
Yes I have.
They don't care.
They are happy with their archaic little scheme and aren't interested in considering other approaches.
In terms of the flaw, at the most basic the NGS scheme calculates results on a session-by-session basis. It is unable to adjust for the way in which differences in individual boards impact your results. Some boards are (naturally) going to be flat. Others have a lot more room for player skill to impact results. If you're unlucky enough to play flat boards versus weak pairs and complicated boards against strong pairs you're going to have a crappy session.
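A toy illustration of that last point, not a description of how NGS or any real scheme works. The model is an assumption invented here: each board has a "swing" factor (0 for a dead flat board), and the expected matchpoint percentage on a board is 50 plus swing times the strength difference between the two pairs.

```python
def expected_board_score(own, opponent, swing):
    """Toy model: expected matchpoint % on one board.

    swing = 0.0 means a flat board where skill cannot change the result.
    """
    return 50.0 + swing * (own - opponent)


def session_average(own, boards):
    """Average expected % over a session; boards is a list of (opponent, swing)."""
    total = sum(expected_board_score(own, opp, swing) for opp, swing in boards)
    return total / len(boards)


if __name__ == "__main__":
    own = 55.0  # hypothetical strength of our pair
    # Same opponents in both sessions: two boards against a 45 pair and
    # two against a 65 pair. Only the allocation of flat boards differs.
    unlucky = [(45.0, 0.0), (45.0, 0.0), (65.0, 1.0), (65.0, 1.0)]
    lucky = [(45.0, 1.0), (45.0, 1.0), (65.0, 0.0), (65.0, 0.0)]
    print(session_average(own, unlucky))
    print(session_average(own, lucky))
```

In both sessions the average opposing strength equals our own, so a calculation that only sees the session total and the strength of the field would expect 50% either way. Under this toy model the two sessions come out ten points apart.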
NGS, ELO and the like are based on methods that are 40+ years old.
In the world of machine learning and AI, 5 years is a lifetime.
The EBU's attitude seems to be
We came up with the following.
We think it's good enough
There's no reason for us to improve anything
We aren't even interested in doing a bake-off to evaluate other schemes or compare accuracy
#11
Posted 2021-January-17, 14:35
pescetom, on 2021-January-17, 09:27, said:
As for the social issues, you may be right. I have played on another online card game site where the rating system poisons all social interactions, for the reasons you mention. But I would be curious to know if there are counter-experiences. Does Elo create chronic social problems in online chess, which is considerably more popular than bridge? If not, could it be because the rating is accepted as site-independent and true?
Some of my partners have EBU NGS ratings. NGS ratings are crude and simple but they're easy to understand and friends generally like them. Rating schemes can cause problems and BBO has been set against them from the beginning. In the unlikely event that BBO adopts such a scheme, IMO, BBO should allow dissenting players to opt out and revert to the current daft self-rating system.
#12
Posted 2021-January-17, 14:55
Colorado Springs Power Ratings
I can't vouch for their accuracy, but I did find a local cheating pair based on an extraordinarily high online rating.
#13
Posted 2021-January-17, 15:35
hrothgar, on 2021-January-17, 10:21, said:
They don't care.
They are happy with their archaic little scheme and aren't interested in considering other approaches.
In terms of the flaw, at the most basic the NGS scheme calculates results on a session-by-session basis. It is unable to adjust for the way in which differences in individual boards impact your results. Some boards are (naturally) going to be flat. Others have a lot more room for player skill to impact results. If you're unlucky enough to play flat boards versus weak pairs and complicated boards against strong pairs you're going to have a crappy session.
NGS, ELO and the like are based on methods that are 40+ years old.
In the world of machine learning and AI, 5 years is a lifetime.
The EBU's attitude seems to be
We came up with the following.
We think it's good enough
There's no reason for us to improve anything
We aren't even interested in doing a bake-off to evaluate other schemes or compare accuracy
Thanks. Flat boards versus weak pairs and complicated boards against strong pairs is one reason we need to play so many boards at pairs: but if we do and the algorithm evaluates hundreds of such tournaments it's not obvious to me why the rating should be crappy in predicting the outcome of a pair of such tournaments. Many archaic things work. The NGS team say "we expect the standard deviation of the error in your current grade to be around 2%, provided you have a typical mix of partners". Are they way off? Is ELO?
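One way to put numbers on "are they way off?" would be to have each scheme predict a pair's session percentage before the event and then measure how far off it was afterwards. A minimal sketch using root-mean-square error follows. The function name and every figure in it are made up for illustration, and no real NGS or Elo data is involved.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual session percentages."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical results for one pair over three sessions (matchpoint %).
    actual = [54.0, 47.0, 51.0]
    # Hypothetical pre-session predictions from two rating schemes.
    scheme_a = [52.0, 49.0, 51.0]
    scheme_b = [50.0, 50.0, 50.0]
    print("scheme A error:", round(rmse(scheme_a, actual), 2))
    print("scheme B error:", round(rmse(scheme_b, actual), 2))
```

The scheme with the lower error on sessions it has not yet seen is the better predictor. A fair comparison would need the same large set of real results fed to every scheme, which is exactly the data that would have to be released.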
#14
Posted 2021-January-17, 15:56
Here the scheme caused real problems. See, the other club was of a lower standard but the initial data had no way to reflect that. So they would come to our club and their rating would drop. Once they realised that, these players would simply stop coming to "protect" their rating. The effect was so pronounced that the club simply dropped the scheme altogether after two years.
#15
Posted 2021-January-17, 16:12
sfi, on 2021-January-17, 15:56, said:
Here the scheme caused real problems. See, the other club was of a lower standard but the initial data had no way to reflect that. So they would come to our club and their rating would drop. Once they realised that, these players would simply stop coming to "protect" their rating. The effect was so pronounced that the club simply dropped the scheme altogether after two years.
Sure, the diffusion problem is real, and not just in terms of clubs: just think about two players who only ever partner each other. The system cannot distinguish their individual strengths. But there are ways of seeding diffusion and over a few years things should work out anyway - would Meckstroth really put up with me for years, does nobody ever leave that bad club? I figure that if there is so little interplay between clubs that they remain isolated for years then any kind of national or international rating is superfluous to them anyway.
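The isolation part of this is easy to check for in data. A rough sketch, assuming all you have is a list of "these two players have met at the table" links (the player names below are invented): groups that share no link at all cannot be compared by any rating scheme, however clever.

```python
from collections import defaultdict


def connected_groups(links):
    """Group players into sets connected by at least one chain of links.

    links is an iterable of (player_a, player_b) pairs.
    Returns a list of sorted player lists, one per isolated group.
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from start.
        group = set()
        stack = [start]
        while stack:
            player = stack.pop()
            if player in group:
                continue
            group.add(player)
            stack.extend(graph[player] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    club_links = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
    print(connected_groups(club_links))
    # One visit between the clubs is enough to join the two groups.
    print(connected_groups(club_links + [("cat", "dan")]))
```

A single link joins the groups on paper, but ratings on either side of it will only settle once a fair number of results have flowed across, which is the "few years" above.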
People from a weak club visiting a strong club will finish near bottom more often than not. That will discourage them, with or without a rating system.
#16
Posted 2021-January-17, 16:26
pescetom, on 2021-January-17, 16:12, said:
Yes, it would have worked itself out eventually if people continued playing at both clubs. But the short-term impact was damaging enough both to the players and to the club that the end point was never reached, and the fact that players stopped playing at the other club meant that this point would take longer to reach. Another factor is that people don't like being told they're not as good as they think they are. And this would have been a long-term impact of this rating once it eventually started to reflect reality.
The bottom line was that the scheme was measurably hurting table numbers and not bringing any perceived benefit to either the players or the club. So it was dropped.
#17
Posted 2021-January-17, 17:06
pescetom, on 2021-January-17, 15:35, said:
Given that the ACBL and the EBU refuse to release data sets or compare the accuracy of the algorithms that are being used, who the ***** can tell...
The big issue here is that groups like the EBU refuse to do appropriate due diligence
They have a system
They claim it works
#18
Posted 2021-January-17, 18:36
Judging other players, or comparing your own bridge to that of other players, is something you should avoid, though. It would create lots of social issues:
- players blaming their partners for ruining their rating
- players leaving mid-hand when they are on their way to a result that would ruin their rating
- players selecting partners and opponents based on (mostly misguided) beliefs about how the choice influences their rating
- players avoiding playing when they are tired or distressed because of fear of ruining their own rating
- players accusing each other of cheating
- forum discussions being dominated by conspiracy theories about how the rating system is biased in favour of certain players
- players creating new accounts to start with a fresh rating (which in turn leads other players to be prejudiced against new accounts and thereby makes it difficult for genuine newbies to get into the community)
Fred and Uday had seen this and/or similar disasters happening on other sites so they rightly chose not to implement a rating system on BBO.
#19
Posted 2021-January-17, 21:19
helene_t, on 2021-January-17, 18:36, said:
Judging other players, or comparing your own bridge to that of other players, is something you should avoid, though. It would create lots of social issues:
- players blaming their partners for ruining their rating
- players leaving mid-hand when they are on their way to a result that would ruin their rating
- players selecting partners and opponents based on (mostly misguided) beliefs about how the choice influences their rating
- players avoiding playing when they are tired or distressed because of fear of ruining their own rating
- players accusing each other of cheating
- forum discussions being dominated by conspiracy theories about how the rating system is biased in favour of certain players
- players creating new accounts to start with a fresh rating (which in turn leads other players to be prejudiced against new accounts and thereby makes it difficult for genuine newbies to get into the community)
Fred and Uday had seen this and/or similar disasters happening on other sites so they rightly chose not to implement a rating system on BBO.
If you play only robot games, it shouldn't matter that the robots get better, or worse, over time. You are playing against other human players so as long as the average field stays the same, your score should vary according to your own skill level trends. If you are an improving player, your scores should go up accordingly.
As for the social issues, I don't think a rating system affects the playing environment nearly as much as some people think.
E.g. BBO implemented a self-rating system. Many BBO tables (players) state they don't want beginners/novices, or maybe just expert and above. So, many players overrate themselves so they can play in advanced games. Only problem is that it doesn't take more than a hand or two before they are exposed as beginners to intermediate players. Then they'll get bounced from the table. Having a data driven rating system won't make this worse.
As for most of the other bad behavior, you see players randomly jumping to 7NT, redoubling for down many, leaving mid-hand whether or not the hand is going to be a disaster, actually cheating (whether for fun or for real-life masterpoints), getting banned and creating new accounts, etc. One "solution" would be to "fine" players a rating point or two for egregious bad behavior.
#20
Posted 2021-January-17, 22:04
Academics are constantly trying to devise new rankings so that they look better than other academics.
citations=likes
number of publications = masterpoints, and don't get me started on the H-index.