Wednesday, December 12, 2007

The Arena Rating Reset

One thing that's always puzzled me about Blizzard's Arena system is that it resets ratings at the beginning of every season. Most other rating systems go out of their way to avoid situations like that.

The concept behind ratings is that each person has a "true skill level". Ideally, her rating accurately reflects that true skill, and as her true skill rises and falls, so does the rating. The problem is that when a person first enters the rating system, her rating does not match her true skill; only after winning and losing many games do the two eventually match.
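To make "the two will eventually match" concrete, here is a minimal sketch of a generic Elo-style update. Everything in it is an assumption for illustration, not Blizzard's or any federation's actual formula: the standard Elo logistic curve with a 400-point scale, a fixed K-factor of 24, and opponents who are all correctly rated at 1500. To keep the run deterministic, each game scores the player's expected result rather than a random win or loss.

```python
# Sketch: an Elo-style rating drifting toward a player's "true skill".
# Assumptions (hypothetical, not from the post): standard Elo logistic
# curve with a 400-point scale, fixed K = 24, and opponents all correctly
# rated at 1500. Each game scores the player's *expected* result instead
# of a random win/loss, so the run is deterministic.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def converge(true_skill: float, start: float, games: int, k: float = 24.0) -> list:
    opponent = 1500.0
    # How often the player really wins, given her true skill.
    true_win_rate = expected_score(true_skill, opponent)
    rating = start
    history = [rating]
    for _ in range(games):
        # Move the rating by how far reality beat its prediction.
        rating += k * (true_win_rate - expected_score(rating, opponent))
        history.append(rating)
    return history


history = converge(true_skill=1800.0, start=1500.0, games=300)
for n in (0, 25, 100, 300):
    print(f"after {n:3d} games: {history[n]:.0f}")
```

The rating closes most of the gap early, then creeps up more slowly as it nears 1800, since each step shrinks as the prediction gets closer to reality.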

As such, most rating systems treat that entry point a little differently. In chess, new players are often given a "provisional" rating, which lasts for roughly 25 games. While a rating is provisional, rating changes are calculated differently. Microsoft's TrueSkill system pairs each rating with a second value that measures how "confident" the system is that the rating actually reflects the player's true skill. As you play more and more games, the system becomes more and more confident that your rating is correct.
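One simple way to express a provisional period is to let the rating move in bigger steps for the first games. The sketch below is hypothetical: the 25-game window echoes the paragraph above, but the two K values are made-up illustration numbers, and real federations use their own (often more elaborate) provisional formulas. TrueSkill takes a different route, tracking a mean and an uncertainty for each player, and is not modelled here.

```python
# Sketch of a "provisional" period: the rating moves faster for early games.
# PROVISIONAL_GAMES echoes the post's "about 25"; both K values are
# hypothetical illustration values, not any federation's published rules.
PROVISIONAL_GAMES = 25
K_PROVISIONAL = 40.0
K_ESTABLISHED = 16.0


def k_factor(games_played: int) -> float:
    """Large steps while the rating is provisional, small ones afterwards."""
    return K_PROVISIONAL if games_played < PROVISIONAL_GAMES else K_ESTABLISHED


def update(rating: float, games_played: int, opponent: float, score: float) -> float:
    """One Elo-style update; score is 1.0 for a win, 0.5 a draw, 0.0 a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k_factor(games_played) * (score - expected)


print(update(1500.0, games_played=0, opponent=1500.0, score=1.0))   # 1520.0
print(update(1500.0, games_played=30, opponent=1500.0, score=1.0))  # 1508.0
```

The same win is worth 20 points to a newcomer but only 8 to an established player, so a new rating can catch up to true skill quickly without making established ratings jumpy.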

Blizzard's rating system is very unusual in that it returns everyone to that low-confidence state every so often. The usual reason given is that it gives everyone a fresh start. But in reality, your true skill level doesn't change that much that quickly. All the reset does is cause teams of wildly differing skill levels to be matched against each other. A team that should be rated 2000 is now rated 1500, and it will steamroll most of the teams it encounters at the start of the season. This continues until the ratings shake out and teams are restored to their true skill levels.

I think there's a different reason Blizzard resets the ratings. The Arena rating system is meant to be a zero-sum system: if my team gains 20 points, your team loses 20 points. However, in its current incarnation, the Arena system is vulnerable to rating inflation. What happens is that a low-ranked team (say, rated 1200) gets tired, dissolves, and reforms as a new team. The new team enters at 1500, meaning that 300 points are added to the system. The new team will probably fall back to 1200 eventually, and the process may begin anew.
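The inflation arithmetic can be checked with a small simulation. This is a sketch under assumptions: a generic zero-sum Elo exchange with a hypothetical K of 32, three made-up teams, and a hypothetical `reform` helper standing in for "dissolve and re-register". Only the 1500 entry rating comes from the post.

```python
# Sketch: zero-sum games keep the total constant, but disbanding a
# low-rated team and reforming it at the entry rating injects points.
# K, the team names and the helper functions are hypothetical.
ENTRY_RATING = 1500.0  # rating every new Arena team starts at (per the post)
K = 32.0               # hypothetical K-factor, for illustration only


def expected(ra: float, rb: float) -> float:
    """Standard Elo win probability of a team rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def play(ratings: dict[str, float], winner: str, loser: str) -> None:
    """Zero-sum update: whatever the winner gains, the loser loses."""
    delta = K * (1.0 - expected(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def reform(ratings: dict[str, float], old: str, new: str) -> float:
    """Team `old` disbands and re-registers as `new`; returns points injected."""
    injected = ENTRY_RATING - ratings.pop(old)
    ratings[new] = ENTRY_RATING
    return injected


ratings = {"Alpha": ENTRY_RATING, "Bravo": ENTRY_RATING, "Charlie": ENTRY_RATING}
start_total = sum(ratings.values())

# Charlie keeps losing (alternately to Alpha and Bravo) until it sinks to 1200.
games = 0
while ratings["Charlie"] > 1200:
    play(ratings, "Alpha" if games % 2 == 0 else "Bravo", "Charlie")
    games += 1
after_games = sum(ratings.values())

injected = reform(ratings, "Charlie", "Charlie II")
after_reform = sum(ratings.values())

print(f"after {games} games: total {after_games:.1f} (start {start_total:.1f})")
print(f"reform injected {injected:.1f} points; total now {after_reform:.1f}")
```

No number of ordinary games changes the total; the single reform adds at least 300 points, exactly the leak the paragraph describes.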

By resetting the ratings, Blizzard clears out the excess rating added to the system and restores its zero-sum balance. Unfortunately, this has the side effect of ensuring that ratings don't match the true skill of the teams for a few weeks after the reset. It also causes heavy server load and long queue times as the higher-ranked teams try to restore their correct ratings.
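Both effects of a reset fit in a few lines. The sketch below uses made-up team names and ratings (not real Arena data) and a hypothetical `reset` helper: an average above the 1500 entry rating signals injected points, the reset brings it back to 1500, and the strongest team is left far below where it belongs.

```python
# Sketch of what a reset does to an inflated pool. Team names and
# ratings are made-up illustration values, not real Arena data.
ENTRY_RATING = 1500.0


def average(ratings: dict) -> float:
    return sum(ratings.values()) / len(ratings)


def reset(ratings: dict) -> dict:
    """Season reset (hypothetical helper): every team returns to the entry rating."""
    return {team: ENTRY_RATING for team in ratings}


before = {"Gladiators": 2200.0, "Midpack": 1600.0, "Reformed": 1500.0, "Fresh": 1500.0}
after = reset(before)

print(f"average before reset: {average(before):.0f}")  # 1700: inflated
print(f"average after reset:  {average(after):.0f}")   # 1500: balanced again
# The side effect: the strongest team now sits far below where it belongs.
print(f"Gladiators must climb back {before['Gladiators'] - after['Gladiators']:.0f} points")
```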

It's sort of amusing, but chess actually has the same problem, only in the opposite direction. I have a friend who is heavily involved with the Chess Federation of Canada. According to him, one of the main problems with their rating system is the existence of chess schools or camps for youth. What happens is that during the summer, the kids play constantly against each other and end up being pretty good because of the practice and training (not Grandmaster-good, but better than average).

Then, at the end of the summer, they play in a couple of rated tournaments. Because their entry ratings are lower than their true skill levels, they take a lot of rating points from the other players. However, after that summer they stop playing tournament-level chess, taking those points with them, and the chess rating system suffers from point deflation.
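The deflation mechanism is the mirror image of Arena inflation, and a short simulation shows it. All figures are hypothetical: a junior whose true strength is 1600 enters at 1200, plays 12 rated games against four regulars correctly rated 1500, then quits. The exchange is a generic zero-sum Elo update with an assumed K of 32, scoring each game by the junior's expected result so the run is deterministic.

```python
# Sketch of point deflation from underrated juniors. Hypothetical figures:
# true strength 1600, entry rating 1200, 12 games vs regulars rated 1500,
# K = 32. Each game scores the junior's *expected* result (deterministic).
K = 32.0


def expected(ra: float, rb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def summer_tournaments(true_skill: float, entry: float, regulars: dict, games: int) -> float:
    """Play the junior against the regulars in turn; return the junior's final rating."""
    junior = entry
    names = list(regulars)
    for i in range(games):
        opponent = names[i % len(names)]
        score = expected(true_skill, regulars[opponent])  # how often the junior really wins
        delta = K * (score - expected(junior, regulars[opponent]))
        junior += delta              # zero-sum: the junior's gain ...
        regulars[opponent] -= delta  # ... is exactly the regular's loss
    return junior


regulars = {"Ann": 1500.0, "Ben": 1500.0, "Cy": 1500.0, "Dee": 1500.0}
pool_before = sum(regulars.values())
junior_final = summer_tournaments(1600.0, 1200.0, regulars, games=12)
pool_after = sum(regulars.values())

# The junior now quits, so the points they won leave the system for good.
print(f"junior gained {junior_final - 1200.0:.1f} points and left")
print(f"regulars' pool shrank from {pool_before:.1f} to {pool_after:.1f}")
```

Every point the junior walks away with is a point the remaining players no longer have.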

To combat this, the CFC's equations governing rating changes include a very small bonus term, which adds rating to the system, hopefully restoring the balance and keeping the total amount of rating in the system constant.
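The post doesn't give the CFC's actual equations, so the following is only a hypothetical illustration of the idea: a generic zero-sum Elo exchange plus a small `BONUS` added to both players' changes. The K of 32, the bonus size and where it is applied are all assumptions, not the CFC's formula.

```python
# Sketch of a small inflationary bonus. BONUS and its placement are
# hypothetical: this is NOT the CFC's actual formula, only an illustration
# of how a tiny per-game term can add rating to the pool.
K = 32.0
BONUS = 0.5  # hypothetical points added to each player's change every game


def expected(ra: float, rb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def rate_game(ra: float, rb: float, score_a: float) -> tuple:
    """Zero-sum Elo exchange plus BONUS for both players."""
    ea = expected(ra, rb)
    transfer = K * (score_a - ea)
    return ra + transfer + BONUS, rb - transfer + BONUS


a, b = rate_game(1500.0, 1500.0, score_a=1.0)
print(a, b, a + b)  # 1516.5 1484.5 3001.0
```

Each game now adds 2 × `BONUS` points to the pool regardless of the result; in principle, a bonus sized to match what departing juniors take away would keep the total roughly constant.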

It's an amusing parallel to the situation Blizzard faces because it chose team-based ratings and allowed teams to be dissolved and rebuilt. A personal rating system, such as the one being introduced with Season 3, tends to be more robust because it cannot be reset as easily, and it is less vulnerable to inflation.

13 comments:

  1. Quick question (I should know the answer to this): Are arena points awarded now based solely on personal ratings? I know that you must have a minimum personal rating for certain season three gear, but I think I remember reading that actual spending points are awarded based on team points.

    Great blog. Keep up the paladin postings!

  2. @zaen:

    No, they are given according to team ratings. However, items that require rating are based both on personal and team ratings.

  3. I feel a comparison to chess is a bit off. Arena is team-based. Chess is 1 vs 1 (and probably one of the most even playing fields one can get, with luck playing no part). Teams are subject to change (players in them, respecs, gear changes). I think a comparison with another team game (soccer, hockey, etc.) would be more appropriate. But then you will notice that all team sports (that I can think of) work with seasons and resets.

    If the seasons didn't reset automatically, every team below 1500 would probably disband and reform at some point, even more than now.

    The concept of ratings is that a person has a "true skill level". Ideally, her rating accurately reflects her true skill.
    I think you kind of made it clear why PvP gear should be 'more easily available'. The only way to show true skill is by starting every match on equal ground. The bigger the gap in gear between teams, the more the rating reflects gear difference rather than skill. Everyone having the exact same choices in gear would be ideal for this (like in FPS games); however, the drive to compete would be a lot less if there wasn't anything shiny waiting at the end, not just a fancy title. This is where PvPers and PvEers are basically the same. Even though both say they don't do it for the Epixx, they both still want them :)

  4. Do arena rankings really measure player skill? I don't think that skill (as in "I'm better at pushing buttons than you") is directly measured by the ranking. Do arena rankings measure gear difference? I don't think they measure gear difference directly either. They obviously measure both at the same time.

    Now, players tend to get caught up in the idea that "my ranking is better than yours" and use it as a comparison for power, progression, etc., but that use by players is entirely incidental to what the arena ratings are actually used for by the game.

    A basic principle of PvP is simply this: if I lose every battle, I quit. Even if you reward me with a few points or some gear, most people don't like the feeling of losing and will go somewhere else to get the feeling of winning, be it raiding, soloing, making a new alt, etc.

    The arena rankings are entirely set up to make sure that, in the worst-case scenario, I always play at the level of my own suck and my win percentage is at least 50/50. For the low- and mid-level players, the arena rankings are a retention system.

    The resets occur, in my opinion, because for hardcore PvP (just like for the hardcore raiding community) you periodically have to introduce new gear and raids, both to keep interest and to open the system up so that some people in the low/mid game have a chance of jumping into the big leagues. At the highest level of arena ratings you may very well be directly measuring player skill, and you have to periodically reset everyone to prevent permanent advantages. Hardcore PvPers compete for prestige. This is why the ratings are reset. No one gets the titles or rating permanently. You have to continually stay on top and earn it. WoW gives you the epic flyer as a long-term badge, but the everyday prestige is in the titles and ratings.

    How does the rating reset affect the low/mid players? Usually not much. I may experience a few weeks at 0-10 wins/losses because of bad luck in drawing against the gladiators, but most of us losers would just switch to the battlegrounds for a few weeks until the ratings sort out a bit. For the casual PvP player, the arena plus the battlegrounds is an excellent player retention mechanism.

    For those people worried about the "welfare" epics and so on, you are not really looking at things from the proper perspective. You are looking at it through the eyes of a raider or PvPer, which is entirely about the short term and relative social status. The developers know that with Wrath of the Lich King all this gear is reset anyway. They only have to retain players until then. How do you retain casual players? You give them lots of stuff. How do you retain hardcore PvPers? You keep the battle interesting by changing the fights, rules, and ratings. How do you keep the hardcore raiders? You give them new raids.

    Everything in patch 2.3 was about player retention. The alts, the gearing, the ZA raid zone, and the PvP season resets and changes.

  5. You also have to consider that a highly ranked gladiator team would only have to play 20 games total in the next season if the ratings were to remain (assuming personal ratings were set to your team rating when introduced, and are not reset in successive seasons). Twenty games on the team and 20% participation are all you need. This leaves very few opportunities to actually fight that team, which may play all its games for the season in the first week and simply wait for the rewards to roll in, or play them all at the end of the season in the early morning. Granted, most people at this level of PvP would rather play more than that, but not resetting ratings leads to loopholes like the one above, which is probably part of the reason ratings are reset. In particular, it would be ridiculously easy to purchase gladiator titles if ratings were not reset.

  6. @Daemion

    I find it interesting that you think the comparison between the rating systems of the CFC and Arena is improper because of the nature of 1v1 vs. team play. Both use what is supposed to be a zero-sum method. One suffers from good players who come in underrated, perform well, take rating points from the people they beat, and then stop playing, effectively removing those points from the total. The other suffers from the opposite: players coming back in with 300 more points to lose after they already lost them, which means more points in the total pot.

    As for the resetting, it is in a way a no-win situation: stagnation as a result of no resets, or games where a freshly minted team gets turned into charred ground beef because they had the unlucky draw of getting last season's champs.
    Personally, I would prefer a situation where my team, with 2 games to its name, doesn't go up against a team that will not only beat us but do so in a very humiliating way. Maybe a solution would be a system where, for the first couple of weeks, the average personal rating of the team members is used as a modifier to the team rating for matchmaking (if personal ratings did not reset from season to season). Or simply use last season's personal ratings at the beginning.

  7. Damn, no edit... In my post above (I'm Tego), I advocated using the personal ratings in the first few weeks. To clarify: the team rating would still reset, but matchmaking would start out with the "better" (previously higher-ranked) teams more likely to face each other, which would help reduce the number of very quick wipes some teams face. The better teams would still have to work their way up the rankings, so there would be no guaranteed win, while less experienced teams wouldn't get taken apart quite so quickly in the beginning.

  8. I love reading your blogs.
    I can't put my thoughts into words.

  9. Daemion, from the point of view of the rating system, Arena *is* 1v1: one team with a rating vs. another team with a rating. So the comparison to the Elo chess system is actually very strong.

    As for team sports, most sports have a fixed schedule and measure performance by win percentage. The relative skill of your opponents does not matter. So that system does not match Arena at all.

    As for the 20-game requirement, if that's too low, we could increase it to 100 or so. If an Arena season is 4 months long, that's a bit more than playing the minimum games for half the season. And if you can retain your rating after 100 games, odds are that is what your rating is meant to be.

    Remember that the other purpose of rating is to match teams against teams of similar skill. It's not just 20 or 100 games, it's 20 or 100 games against teams ranked 2K+.

    As for stagnation, a rating system isn't supposed to have wild swings. If you get better, your rating will steadily get better. If you get worse, your rating will steadily decrease. That's not stagnation, that's the correct situation.

  10. I'm not so sure one can lump 1v1 and team-vs-team into the same boat. A person cannot drop out of the chess system and re-enter as a new individual. Teams, however, can reform with the exact same members.

    Secondly, any arena team worth noting relies on a great deal of team cooperation. By resetting the arena ratings, Blizzard can really test the durability of this team unity. Imagine a middle-of-the-road team getting steamrolled 10 times; that'll really test whether they have what it takes to dethrone the previous top teams. If it were just a single person, there would be no need, as an individual always has 100% say in what he/she wants/represents.

    Finally, chess does not reward players with ever more lethal pieces, yet this is true for PvP. If there were nothing past S3 gear, I don't see why they would reset the ratings. Incidentally, this is borne out by the fact that each season only ends when new gear becomes available. If you assume that the end of a season and the availability of new gear always go together, rating resets start to make sense.

  11. Arena rating isn't a measure of skill (well, it is, but not a long-term, lifetime measure of skill). Instead of thinking of it like a chess player's rating, think of it more like a chess tournament ranking.

    Everyone enters the tournament at the same level (0-0 in the case of chess, 1500 in the case of arenas.) You play the tournament/arena season, win some and lose some, and end up with a final ranking. Based on that ranking you get rewards and a new tournament starts again.

    There is no equivalent chess rating in WoW. There is no calculation that takes skill and gear into account to say how "good" your team is, or what your expected rating in a particular arena tournament season will be. Everyone is assumed to be equal at the start of the season, and you'll see 1500 awesome teams going up against 1500 crappy teams in the first week.

  12. I was a little imprecise in my definition of rating. Rating is not exactly skill. It represents "probability of defeating the other team". In most games, that's skill, but in WoW, that's skill + gear.

    An 1800 team is more likely to beat a 1500 team.

    However, I don't think that just because a new season starts this fact changes. Both teams will get gear, but the 1800 team generally will be better geared after, and they will still have their skill. The 1800 team is still more likely to beat the 1500 team.

    So if the statement is still valid, why have the reset? If the statement is not valid, teams will fall and rise in rating in the natural order of things. The reset is unnecessary and sub-optimal from an accurate rating point-of-view.

  13. I've a bit of ADD, so I apologize if I missed this in either the article or the comments.

    I think the rating reset is great because it gives every team the chance to go up against top-tier teams. If a team wants to improve but mostly gets to play against teams of similar skill, it's a bit difficult to advance beyond that barrier; with a rating reset, you get to play top-tier teams and see how they do things. Maybe it's positioning, maybe it's target swapping, maybe it's how they control your team; there's almost always something to be learned from an encounter with a gladiator team when you're a non-glad team.

    Plus, new gear is nice too. :D

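Commenter 12's definition of rating as the "probability of defeating the other team" can be made concrete. This sketch assumes the standard Elo logistic curve with a 400-point scale; Blizzard's exact Arena formula is not given in the post or the comments, so treat the number as illustrative.

```python
# Sketch: turning a rating gap into a win probability. Assumes the
# standard Elo logistic curve with a 400-point scale; this is not
# claimed to be Blizzard's exact Arena formula.


def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that a team rated rating_a beats a team rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


p = win_probability(1800.0, 1500.0)
print(f"1800 beats 1500 about {p:.0%} of the time")  # about 85%
```

Under this model the probability depends only on the rating gap, so resetting both teams to 1500 makes it predict 50% for a match the 1800 team would usually win; that is the information commenter 12 argues the reset throws away.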