I spoke about the notion of information entropy: $H(X) = -\sum_i p_i \log p_i$. This has already featured on Examples sheet 1, #16. There we were to consider the multinomial distribution for dice rolls,
$$\phi(k_1,\ldots,k_6) = \frac{n!}{k_1!\cdots k_6!}\,\frac{1}{6^n}, \qquad \text{where } \sum_i k_i = n \text{ and } \sum_i i\,k_i = \rho n.$$
Using Stirling's formula and setting $k_i = n p_i$, we see that $\log\phi \approx n\bigl(-\sum_i p_i \log p_i - \log 6\bigr)$, so $\phi$ is maximized by maximizing the entropy $-\sum_i p_i \log p_i$, subject to $\sum_i p_i = 1$ and $\sum_i i\,p_i = \rho$. You'll find something like this at the start of the Wikipedia article on Large deviations theory.
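To see the Stirling step numerically, here is a minimal Python sketch (my own, not part of the notes; the helper names `entropy` and `log_phi` and the particular frequencies `p` are chosen purely for illustration). It compares the exact value of $\log\phi$ with the approximation $n(H(p) - \log 6)$.

```python
import math

def entropy(p):
    """Shannon entropy H = -sum_i p_i log p_i (natural logs), skipping zero terms."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_phi(ks):
    """Exact log of phi(k_1,...,k_6) = n!/(k_1! ... k_6!) * 6^{-n}, via log-gamma."""
    n = sum(ks)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(k + 1) for k in ks)
            - n * math.log(6))

# Illustrative frequencies p_i with sum_i p_i = 1 and mean rho = sum_i i p_i = 2.95.
n = 600
p = [0.25, 0.20, 0.20, 0.15, 0.10, 0.10]
ks = [round(n * pi) for pi in p]        # k_i = n p_i (these happen to be integers here)

print(log_phi(ks))                      # exact log phi
print(n * (entropy(p) - math.log(6)))   # Stirling approximation n (H(p) - log 6)
```

The two values agree to leading order in $n$; the discrepancy coming from the lower-order Stirling terms grows only logarithmically in $n$, so increasing $n$ tightens the relative agreement.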
Entropy is an important concept in many fields, particularly in communication/information theory. Shannon's source coding theorem says, informally,
"N i.i.d. random variables each with entropy H(X) can be compressed into more than NH(X) bits with negligible risk of information loss, as N tends to infinity; but conversely, if they are compressed into fewer than NH(X) bits it is virtually certain that information will be lost."
You can learn about this in the Part II course Coding and Cryptography.
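As a rough illustration of why $NH(X)$ bits is the right yardstick, here is a small Python simulation (again my own sketch, not from the course), using logarithms to base 2 so that entropy is measured in bits. By the law of large numbers, $-\frac{1}{N}\log_2 P(X_1,\ldots,X_N)$ converges to $H(X)$, and this concentration is what lies behind the theorem. The source distribution `p` below is an arbitrary choice for illustration.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy in bits: H(X) = -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# An arbitrary source distribution on 4 symbols (illustrative only).
p = [0.5, 0.25, 0.125, 0.125]
H = entropy_bits(p)                      # exactly 1.75 bits per symbol

random.seed(0)
N = 100_000
sample = random.choices(range(len(p)), weights=p, k=N)

# -(1/N) log2 P(X_1,...,X_N) should be close to H(X) for large N.
log2_prob = sum(math.log2(p[x]) for x in sample)
print(f"H(X)            = {H:.4f} bits/symbol")
print(f"-(1/N) log2 P   = {-log2_prob / N:.4f} bits/symbol")
```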