The tower property of conditional expectation (also called the law of iterated expectations, or the law of total expectation) is akin to the law of total probability that we met in Section 6.3. A special case is
$E[X]=\sum_{i:P(A_i)>0} E[X \mid A_i]P(A_i)$,
where $A_1,\dotsc,A_n$ are events giving a partition of the sample space $\Omega$. Compare the above to
$E[X]=E\bigl[E[X \mid Y]\bigr]=\sum_{y:P(Y=y)>0} E[X \mid Y=y]P(Y=y)$.
Sometimes we might write $E\Bigl[E[X \mid Y]\Bigr]$ as $E_Y\Bigl[E[X \mid Y]\Bigr]$, just to remind ourselves that the outer expectation is taken over the distribution of $Y$. Of course it is implicit in all this that $E[X]$ exists.
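As a small numerical check of the identity above, here is a minimal sketch in Python. The joint distribution `joint` and all function names are hypothetical, chosen only for illustration; exact fractions are used so the two sides can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), made up for this illustration.
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}

def marginal_y(joint):
    """Return the marginal pmf of Y as a dict {y: P(Y = y)}."""
    p_y = {}
    for (_, y), pr in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + pr
    return p_y

def cond_exp_x_given_y(joint, y, p_y):
    """Return E[X | Y = y]; only meaningful when P(Y = y) > 0."""
    return sum(x * pr for (x, yy), pr in joint.items() if yy == y) / p_y[y]

def expectation_direct(joint):
    """Return E[X] computed directly from the joint pmf."""
    return sum(x * pr for (x, _), pr in joint.items())

def expectation_tower(joint):
    """Return E[E[X | Y]], summing only over y with P(Y = y) > 0."""
    p_y = marginal_y(joint)
    return sum(
        cond_exp_x_given_y(joint, y, p_y) * p
        for y, p in p_y.items()
        if p > 0
    )

if __name__ == "__main__":
    print("E[X] directly:   ", expectation_direct(joint))
    print("E[E[X | Y]]:     ", expectation_tower(joint))
```

For this particular distribution, $E[X \mid Y=0]=3/4$ and $E[X \mid Y=1]=1$, each weighted by $P(Y=y)=1/2$, so both computations give $7/8$.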
Remember that $P(A \mid B)$ is only defined for $P(B)>0$. If you want to see a cute illustration of this, read this nice description of Borel's paradox. The paradox arises from a mistaken step in conditioning on an event of probability $0$: such an event can be reached as the limit of a sequence of events in different ways, and different sequences can give different answers.