## Wednesday, 11 March 2015

### Lecture 24

The overhead slides I used for our last lecture are here (best downloaded and viewed with a pdf reader, rather than as a web page).

I showed you a bit of Large deviation theory and Chernoff's upper bound. The lower bound  was quoted without proof (except that in Example 24.1 we did the lower bound for the special case of $B(1,p)$). If perhaps some time mid-summer you have a moment and are curious to see the lower bound proved (which you have learned enough to understand), and more about large deviations, see these slides from a talk I once gave, pages 5–9. This note on the Cramer-Chernoff theorem is also good. Interestingly, Chernoff says in an interview with John Bather that the Chernoff bound should really be named after someone else!

I have used large deviation theory in my research to analyse buffer overflows in queues. But I know almost nothing about the subject of Random matrices. I prepared the content for Section 24.2 because I was intrigued by the results, wanted to learn, and thought it would be something entertaining with which to conclude. I did some of my reading in Chapter 2 of An Introduction to Random Matrices, by Anderson, Guiomet and Zeotouni, and then simplified their treatment to make it suitable for IA.

I thought it was nice that in showing you these two advanced topics, I could bring into play so many of the ideas we have had in our course: Markov and Chebyshev inequalities, moment generating function, sums of Bernoulli r.vs, Stirling’s formula, normal distribution, gambler’s ruin, Dyke words, generating functions, and the Central limit theorem.

So I hope you will consider the applicable courses in your next year of study. In Appendix H I have written some things about applicable mathematics courses in IB.

## Monday, 9 March 2015

### Lecture 23

Lévy's continuity theorem is the same thing as the continuity theorem given in today's lecture, but for characteristic functions.

Here is a little more history about the Central Limit Theorem.

Henk Tijms writes in his book, Understanding Probability: Chance Rules in Everyday Life, Cambridge: Cambridge University Press, 2004,

"The Central Limit Theorem for Bernoulli trials was first proved by Abraham de Moivre and appeared in his book, The Doctrine of Chances, first published in 1718. He used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This finding was far ahead of its time, and was nearly forgotten until the famous French mathematician Pierre-Simon Laplace rescued it from obscurity in his monumental work Théorie Analytique des Probabilités, which was published in 1812.

De Moivre spend his years from age 18 to 21 in prison in France because of his Protestant background. When he was released he left France for England, where he worked as a tutor to the sons of noblemen. Newton had presented a copy of his Principia Mathematica to the Earl of Devonshire. The story goes that, while de Moivre was tutoring at the Earl's house, he came upon Newton's work and found that it was beyond him. It is said that he then bought a copy of his own and tore it into separate pages, learning it page by page as he walked around London to his tutoring jobs. De Moivre frequented the coffeehouses in London, where he started his probability work by calculating odds for gamblers. He also met Newton at such a coffeehouse and they became fast friends. De Moivre dedicated his book to Newton."

The Wikipedia article on the Central limit theorem mentions two things that would be suitable for the television programme QI.

1. There is a quite interesting explanation of why the term "Central" is used.

The actual term "central limit theorem" (in German: "zentraler Grenzwertsatz") was first used by George Pólya in 1920 in the title of a paper. Pólya referred to the theorem as "central" due to its importance in probability theory. According to Le Cam, the French school of probability interprets the word central in the sense that "it describes the behaviour of the centre of the distribution as opposed to its tails".

Personally, I had always thought it was the second of these, but the first is also plausible.

2. There is a quite interesting Cambridge connection.

A proof of a result similar to the 1922 Lindeberg CLT was the subject of Alan Turing's 1934 Fellowship Dissertation for King's College at the University of Cambridge. Only after submitting the work did Turing learn it had already been proved. Consequently, Turing's dissertation was never published.

## Friday, 6 March 2015

### Lecture 22

You might like to experiment with the bivariate normal. Here is the Mathematica code that I used to plot the joint density function. You can copy and paste this code into Mathematica and they play with viewing from different angles or changing the value of the correlation, r.

S = ParallelTable[
Plot3D[PDF[
MultinormalDistribution[{1, 1}, {{1, r }, {r , 1}}], {x, y}],
{x, -3, 4}, {y, -3, 4}, PlotRange -> All,
MeshFunctions -> {#3 &}, PlotPoints -> 50], {r, {0, 0.6}}]


You can also get the code here, and for Kelly betting here.

### Inversion of p.g.f and m.g.f.

For discrete random variables working with the p.g.f. is easy. We have

Theorem 12.2The distribution of $X$ is uniquely determined by the p.g.f. $p(z)$.

The proof is trivial because the distribution of $X$ is contained in coefficients of the power series $p(z)=p_0+p_1z+p_2z^2+\cdots$. It is also easy to tell if any given power series is a p.g.f. All we need is check is that the coefficients are nonnegative and sum to 1.

However, things are not so easy for moment generating functions. Given a power series $m(\theta)=1+m_1\theta + m_2\frac{1}{2}\theta^2 +\cdots$
it is not easy to tell if there exists a r.v. that has moments $m_1,m_2,\dotsc$.
Also we had without proof the theorem analogous to Theorem 12.2,

Theorem 21.4 The moment generating function determines the distribution of $X$, provided $m(\theta)$ is finite for all $\theta$ in some interval containing the origin.

To prove Theorem 21.4 we would need to show how to recover the p.d.f. from the m.g.f.
In fact, it is easier to use the characteristic function. Let's take a small look at that.

In Lecture 22 we saw that the moment generating function of $N(0,1)$ is $m(\theta)=e^{\theta^2/2}$. This looks a lot like the p.d.f., which is $f(x)=e^{-x^2/2}/\sqrt{2\pi}$.

The characteristic function (c.f.) $\chi$ is obtained by replacing $\theta$ with $it$, $t$ real,

$\chi(t) = \int_{-\infty}^\infty f(x)e^{itx} dx$. This is $\chi(t)=e^{-t^2/2}$ for $N(0,1)$.

We can compute nearly the same integral of $\chi$ to recover $f$:

$\int_{-\infty}^\infty \chi(t)e^{-itx} dt=2\pi e^{-x^2/2}=2\pi f(x)$.

In general, given the c.f. we can recover the p.d.f. using the inversion formula

$f(x)=\frac{1}{2\pi}\int_{-\infty}^\infty \chi(t)e^{-itx} dt$.

The m.g.f. exists only if all moments are finite. However, the c.f. exists even when moments do not exist, since $e^{itX}$ is just some point on a unit circle in the complex plane, and hence nothing is going off to infinity. The Cauchy distribution is an example of a distribution that has no m.g.f, but does have a c.f., namely $e^{-|t|}$.

### Feedback

On Friday March 6, I distributed the usual feedback forms. Alternatively, you can complete this equivalent on-line form, or print the paper form here and hand it to me. If you prefer, you may wait until closer to the last lecture to complete the on-line form or to hand me a paper form.

The on-line version will be sent to my email anonymously (unless you choose to sign your name). After reading the responses, I will forward printouts to the Faculty Office.

### Examples Sheet 3

Most of you will have now done Examples Sheet 3. Here are a few notes.

#4. You'll want to compare this with Examples sheet 4 #5.

#5. What would be the analogous question to this one, posed for continuous r.vs?

#8. You should know how to answer this two ways: by using p.g.fs, and also directly via that tower property of conditional expectation.

#16. How do we know that the answer is $a_z=a^z$, where $a$ the smallest positive root of $a=pa^3+1-p$? We have such a theorem for the extinction probability in a branching processes. How can this coin tossing problem be viewed through the lens of some branching process?

#17. If you need a hint, here is one: $P(|X+Y|>\epsilon)\leq P(|X|\geq \epsilon/2)+P(|Y|\geq \epsilon/2)$.

#18. The hint is for the third part. But it also useful in answering the second part: "Could you load the die so that the totals {2,3,4,5,6,7} are obtained with equal probabilities?"

### Planet Zog problems

Here are two challenging tripos questions from recent years. The first is a problem in geometric probability (as in Lecture 18). To do this question it will help to know that the surface area of a spherical cap is $2\pi R h$, where $R$ is the radius of the sphere and $h$ is the height of the cap. The surface area of the entire sphere is $4\pi R^2$.

2003/II/12
Planet Zog is a ball with centre $O$. Three spaceships $A$, $B$ and $C$ land at random on its surface, their positions being independent and each uniformly distributed on its surface. Calculate the probability density function of the angle $AOB$ formed by the lines $OA$ and $OB$.

Spaceships $A$ and $B$ can communicate directly by radio if  $\angle AOB < \pi /2$, and similarly for spaceships $B$ and $C$ and spaceships $A$ and $C$. Given angle $\angle AOB =\gamma < \pi /2$, calculate the probability that $C$ can communicate directly with either $A$ or $B$. Given $\angle AOB =\gamma > \pi /2$, calculate the probability that $C$ can communicate directly with both $A$ and $B$. Hence, or otherwise, show that the probability that all three spaceships can keep in in touch (with, for example, $A$ communicating with $B$ via $C$ if necessary) is $( \pi +2)/(4\pi )$.

The second question is rather different. Here you are being tested on the idea that a sample space can be partitioned into disjoint events.

2004/II/12
Let $A_1,A_2,\dotsc,A_r$ be events such that $A_i\cap A_j = \emptyset$; for $i \neq j$. Show that the number $N$ of events that occur satisfies

$P(N = 0) = 1 −\sum_{i=1}^rP(A_i) .$

Planet Zog is a sphere with centre $O$. A number $N$ of spaceships land at random on its surface, their positions being independent, each uniformly distributed over the surface. A spaceship at $A$ is in direct radio contact with another point $B$ on the surface if $\angle AOB < \pi/2$. Calculate the probability that every point on the surface of the planet is in direct radio contact with at least one of the $N$ spaceships.

[Hint: The intersection of the surface of a sphere with a plane through the centre of the sphere is called a great circle. You may find it helpful to use the fact that $N$ random great circles partition the surface of a sphere into $N(N − 1) + 2$ disjoint regions with probability one.]

## Wednesday, 4 March 2015

### Lecture 21

I mentioned the beta distribution today. There is more about that distribution in Appendix D. I became much better acquainted with this distribution myself when I was doing some work for a reinsurance company in 2013.

## Monday, 2 March 2015

### Lecture 20

Today we saw how to transform random variables. By this means we can discover further interesting relationships between distributions.

There is a particularly strange fact in Examples sheet 4, #19.

By the way, if (like me) you have ever been puzzled why it is that the absolute value of the Jacobian, $|J|$, gives the volume change of a $n$-dimensional parallelepiped under an linear transformation in $\mathbb{R}^n$, you might enjoy reading this essay A short thing about determinants, or “an attempt to explain volume forms, determinants and adjugates by staying in 3-d”, by Gareth Taylor.

I have also now added to the notes an Appendix C, in which I explain how under a linear transformation a sphere is mapped into an ellipsoid of a volume that greater by a factor $|J|$. I think that in previous years lecturers have said at this point "you already know this from Vector Calculus". But I am not sure that's that case, and so decided it would be nice to try to remove some of the mystery.

The convolution we need to add two Cauchy random variables is tricky. On page 81, the partial fractions method of computing this integral is to write

$\frac{1}{(1+x^2)(1+(z-x)^2)}=\frac{A+Bx}{(1+x^2)}+\frac{C+D(x-z)}{(1+(z-x)^2)}$

for $(A,B,C,D)=\frac{1}{4+z^2}(1,2/z,1,2/z)$. Upon integrating w.r.t. $x$ the terms with $B$ and $D$ give $0$ (by symmetry either side of $0$), and the terms with $A$ and $C$ provide the claimed result of $\frac{2}{\pi(4+z^2)}$.

Using the ideas in today's lectures, one can show that if $X,Y$ are independent $N(0,1)$ then $W=X^2+Y^2$ and $T=\tan^{-1}(Y/X)$ are independent $\mathscr{E}(1/2)$ and $U[-\pi/2,\pi/2]$. We can use this result in the reverse direction to generate normal random variables. Make two independent samples from $U[0,1]$, say $U$ and $V$. Let $R^2=-2\log U$ (which is $\mathscr{E}(1/2)$) and let $\Theta=(V-\frac{1}{2})\pi$. Then $X=R\cos\Theta$ and $Y=R\sin\Theta$ are independent $N(0,1)$.