

5.1 The calculation

The calculation of new Felo ratings is a three-step process:

  1. The result value for the fencer is calculated. This is the fraction of single hits that he won. For example, if the fencer won 15:10, his result value is 15/(15+10)=0.6.
  2. The difference between the two fencers' Felo ratings yields the expected result value. The formula for this can be found e.g. in the Wikipedia article about the Elo rating system.
  3. Let us call the difference between the result value and the expected result value the surprise. The surprise is multiplied by the total number of points in the bout and by the so-called k factor (see List of all parameters). The result is added to the old Felo rating. This yields the new one.
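
The three steps can be sketched in Python as follows. This is only an illustration, not Felo's actual code: the function names are made up for this example, and the expected result value uses the standard Elo formula with a scale of 400 as given in the Wikipedia article, which Felo may parametrise differently.

```python
def result_value(hits_won, hits_lost):
    """Step 1: fraction of single hits won by the fencer."""
    return hits_won / (hits_won + hits_lost)


def expected_result_value(own_rating, opponent_rating, scale=400.0):
    """Step 2: standard Elo expectation (assumption: scale of 400)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - own_rating) / scale))


def new_rating(own_rating, opponent_rating, hits_won, hits_lost, k_factor):
    """Step 3: old rating + surprise * total points * k factor."""
    surprise = (result_value(hits_won, hits_lost)
                - expected_result_value(own_rating, opponent_rating))
    total_points = hits_won + hits_lost
    return own_rating + surprise * total_points * k_factor
```

For two fencers with equal ratings the expected result value is 0.5, so a 15:10 win gives a surprise of 0.1. With a purely hypothetical k factor of 1, `new_rating(1500, 1500, 15, 10, 1.0)` raises the winner's rating by about 0.1 × 25 × 1 = 2.5 points.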

This k factor is some sort of damping parameter. If it is too high, the Felo ratings will oscillate too heavily. If it is too low, the Felo ratings will converge too slowly.
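
The damping effect can be made visible with a small deterministic experiment. The sketch below is built on assumptions, not on Felo's code: the fencer always wins 15:10 against an opponent whose rating is held fixed at 1500, the expected result value follows the standard Elo formula with a scale of 400, and the k factors used are arbitrary illustration values.

```python
def expected_result_value(rating_difference, scale=400.0):
    """Standard Elo expectation (assumption: scale of 400)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_difference / scale))


def rating_history(k_factor, bouts=8, start=1500.0, opponent=1500.0):
    """Ratings after each of several identical 15:10 wins."""
    rating = start
    history = [rating]
    for _ in range(bouts):
        # Result value is always 15 / 25 = 0.6 in this experiment.
        surprise = 0.6 - expected_result_value(rating - opponent)
        rating += surprise * 25 * k_factor
        history.append(rating)
    return history


for k in (1.0, 50.0):  # hypothetical k factors
    print(k, [round(r, 1) for r in rating_history(k)])
```

Under these assumptions the rating that exactly explains a 0.6 result value lies roughly 70 points above the opponent. With the small k factor the rating creeps towards that value from below; with the large one it overshoots after the first bout and then swings back and forth around it.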

Why does the Felo rating take the score into account?

Some find it unusual or even irritating that the Felo ratings use the complete score rather than just win or loss.

The rationale for this is very simple: It makes Felo ratings converge very quickly, i.e., they find their true value much earlier. This is advantageous when a fencing group starts with Felo ratings, so that everybody has a real Felo rating quite soon. Similarly, it means that the Felo ratings reflect changes in your abilities or condition rather accurately. If Felo counted only wins and losses, such subtle developments would not be noticeable.

There is another mathematical issue with pure win/loss ratings. Normally, you have bouts with 5, 10, and 15 win points. Sometimes, even other values are possible. Unfortunately, they are incompatible because the win/loss probability in a 15 point bout is more extreme than that in a 5 point bout. Consequently, ratings calculated from 5 point bouts only are much closer together than ratings calculated from longer bouts. Putting them all into one rating would be like measuring some people's height to the top of the head and others' to the shoulders, and then calculating an average height from the whole set of values. This is obviously absurd.
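
How much more extreme the longer bout is can be checked numerically. The following sketch assumes that every single hit is won independently with the same probability, which is a simplification, and the function name is invented for this example. It adds up the probabilities of all scores in which the fencer reaches the win points first.

```python
from math import comb


def bout_win_probability(p_hit, win_points):
    """Probability of reaching win_points hits before the opponent,
    assuming each hit is won independently with probability p_hit."""
    q = 1.0 - p_hit
    return sum(
        # The fencer scores the final hit; the opponent's 'lost' hits
        # can be placed anywhere among the hits before it.
        comb(win_points - 1 + lost, lost) * p_hit ** win_points * q ** lost
        for lost in range(win_points)
    )


for n in (5, 10, 15):
    print(n, round(bout_win_probability(0.6, n), 3))
```

For a fencer who wins 60% of the single hits, the chance of winning a 5 point bout is about 73%, and it grows further for 10 and 15 point bouts. The same difference in ability thus shows up as a different win/loss record depending on the bout length.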

Thus, you'd have to make the results compatible first. Mathematically, this works. However, it means that you estimate the single-hit probability for every bout, so you end up with the single hit again, just with less accuracy. You gain little (if anything at all) and lose much.

The Felo ratings, as they are, are a good combination of rapid adaptation and accuracy.