**Zipf's law**.

As a small digression, or "fun fact", I told you today about Zipf's law. This is a probability mass function such that $P(X=i)\propto 1/i^s$, $i=1,2,\dotsc,m$, for some $s > 0$, typically $s=1$. It's certainly mysterious why this distribution should be a good fit to frequencies with which words are used in a language. According to the Wikipedia article linked above, "Zipf himself proposed that neither speakers nor hearers using a given language want to work any harder than necessary to reach understanding, and the process that results in approximately equal distribution of effort leads to the observed Zipf distribution". My guess is thinking that he is thinking that to retrieve the $i$th most popular word from your memory has a cost proportional to $i$ (like you were digging down into a list), and that the average workload that each word puts on the brain should be constant, so $ip(i)$ should constant. But this is clearly hand-waving. Maybe you can think of a better explanation

In queueing theory and other theories of congestion, delays and long waiting times can be caused by variability in service times. Suppose customers enter a queue at a constant rate of 1 every 5 minutes. They are served one at a time, service times are independent, taking $1,2,\dotsc$ minutes with probabilities $p_1,p_2,\dotsc$, respectively

An interesting fact (researched extensively during the past 30 years) is that there are large qualitative difference between how such queues behave when $p_i$ decays geometrically or exponentially (as do the Poisson, or geometric distribution, which has $p_i= q^ip$), and when it decays according to a power law, like $p_i\propto 1/i$ or $\propto 1/i^{0.9}$.