“The difference between the perplexity of a (language) model and the true perplexity of the language is an indication of the quality of the model.

Why do we use perplexity instead of the entropy?

If we think of perplexity as a branching factor (the weighted average number of choices a random variable has), then that number is easier to understand than the entropy. I found this surprising because I thought there would be a more profound reason. I asked Dr. Zettlemoyer if there is any reason other than easy interpretability. His answer was “I think that is it! It is largely historical, since lots of other metrics would be reasonable to use also!””
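The branching-factor reading can be made concrete with a tiny sketch: perplexity is 2 raised to the entropy (in bits), so a uniform distribution over 8 outcomes has entropy 3 and perplexity exactly 8, the number of equally likely choices. The function names below are just illustrative.

```python
import math

def entropy(probs):
    # Shannon entropy in bits of a discrete distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    # Perplexity is 2 raised to the entropy in bits
    return 2 ** entropy(probs)

# Uniform distribution over 8 outcomes: entropy is 3 bits,
# so perplexity is 8 -- the branching factor, read off directly
uniform = [1 / 8] * 8
print(entropy(uniform))     # 3.0
print(perplexity(uniform))  # 8.0

# A skewed distribution has lower entropy, hence a smaller
# effective number of choices than its raw outcome count
skewed = [0.7, 0.1, 0.1, 0.05, 0.05]
print(perplexity(skewed))
```

For the skewed distribution the perplexity falls well below 5, which is what makes it a more intuitive summary than the raw entropy value.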

Perplexity Intuition (and Derivation) – Towards Data Science
