Proof that the single-variable linear-regression predictor derived using
the general matrix-based multiple-regression algorithm gives the same
results as the original Pyret implementation.
Given: a set of \(n\) inputs \(\left\{ x,\ldots \right\}\) and their
corresponding outputs \(\left\{ y,\ldots \right\}\).
\[X = \begin{pmatrix}
1 & x \\
\cdot & \cdot \\
\cdot & \cdot \\
\cdot & \cdot
\end{pmatrix},\quad Y = \begin{pmatrix}
y \\
\cdot \\
\cdot \\
\cdot
\end{pmatrix}\]
Using the multiple-regression algorithm, we get
\[B = \begin{pmatrix}
\alpha \\
\beta
\end{pmatrix} = \left( X^{T}X \right)^{- 1}X^{T}Y,\quad\quad(1)\]
and the predictor function is \(y = \alpha + \beta x\).
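The Pyret code itself is not reproduced here, but for concreteness, this matrix route can be sketched in Python with NumPy (the function name fit_linear is illustrative, not from the original implementation):

```python
import numpy as np

def fit_linear(xs, ys):
    """Fit y = alpha + beta*x by solving the normal equations of (1)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Design matrix X: a column of ones for the intercept, then the inputs.
    X = np.column_stack([np.ones_like(xs), xs])
    # Solve (X^T X) B = X^T Y directly; equivalent to forming the
    # inverse as in (1), but numerically better behaved.
    alpha, beta = np.linalg.solve(X.T @ X, X.T @ ys)
    return alpha, beta
```

For example, fit_linear([1, 2, 3], [2, 4, 6]) returns approximately (0.0, 2.0), since the points lie exactly on \(y = 2x\).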
To evaluate (1), first write out \(X^{T}\):
\[X^{T} = \begin{pmatrix}
1 & \cdot & \cdot & \cdot \\
x & \cdot & \cdot & \cdot
\end{pmatrix}\]
\[\therefore X^{T}X = \begin{pmatrix}
1 & \cdot & \cdot & \cdot \\
x & \cdot & \cdot & \cdot
\end{pmatrix}\begin{pmatrix}
1 & x \\
\cdot & \cdot \\
\cdot & \cdot \\
\cdot & \cdot
\end{pmatrix} = \begin{pmatrix}
n & \Sigma x \\
\Sigma x & \Sigma x^{2}
\end{pmatrix}\]
\[\begin{aligned}
\text{ det }X^{T}X & = n\Sigma x^{2} - (\Sigma x)^{2} = \Delta\text{ (say) } \\
\text{and }\quad\quad\text{ cof }X^{T}X & = \begin{pmatrix}
\Sigma x^{2} & - \Sigma x \\
- \Sigma x & n
\end{pmatrix}
\end{aligned}\]
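For reference, the determinant and cofactor matrix here follow the standard rules for a general \(2 \times 2\) matrix:
\[\det\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} = ad - bc,\quad\quad\text{cof}\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} = \begin{pmatrix}
d & - c \\
- b & a
\end{pmatrix}\]
Applying these with \(a = n\), \(b = c = \Sigma x\), and \(d = \Sigma x^{2}\) gives the expressions above.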
The adjoint of a matrix is the transpose of its cofactor matrix:
\[\text{ adj }X^{T}X = \left( \text{cof }X^{T}X \right)^{T}\]
But \(\text{cof }X^{T}X\) is symmetric, so it equals its own
transpose:
\[\text{ adj }X^{T}X = \text{ cof }X^{T}X\]
The inverse of a matrix is its adjoint divided by its determinant, so
\[\left( X^{T}X \right)^{- 1} = \frac{\text{adj }X^{T}X}{\Delta} = \left( \frac{1}{\Delta} \right)\begin{pmatrix}
\Sigma x^{2} & - \Sigma x \\
- \Sigma x & n
\end{pmatrix}\]
Substituting all of this into (1), we have
\[\begin{aligned}
B & = \left( \frac{1}{\Delta} \right)\begin{pmatrix}
\Sigma x^{2} & - \Sigma x \\
- \Sigma x & n
\end{pmatrix}\begin{pmatrix}
1 & \cdot & \cdot & \cdot \\
x & \cdot & \cdot & \cdot
\end{pmatrix}\begin{pmatrix}
y \\
\cdot \\
\cdot \\
\cdot
\end{pmatrix} \\
& = \left( \frac{1}{\Delta} \right)\begin{pmatrix}
\Sigma x^{2} & - \Sigma x \\
- \Sigma x & n
\end{pmatrix}\begin{pmatrix}
\Sigma y \\
\Sigma xy
\end{pmatrix} \\
& = \left( \frac{1}{\Delta} \right)\begin{pmatrix}
\Sigma x^{2}\Sigma y - \Sigma x\Sigma xy \\
- \Sigma x\Sigma y + n\Sigma xy
\end{pmatrix}
\end{aligned}\]
\[\begin{aligned}
\therefore\alpha & = \frac{\Sigma x^{2}\Sigma y - \Sigma x\Sigma xy}{n\Sigma x^{2} - (\Sigma x)^{2}} \\
\text{and }\beta & = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}}\quad\quad(2)
\end{aligned}\]
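Note that formulas (2) need only \(n\) and four running sums, so they translate directly into code. A minimal sketch in plain Python (the name fit_linear_sums is illustrative):

```python
def fit_linear_sums(xs, ys):
    """Compute alpha and beta from formulas (2), using only running sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_x2 - sum_x ** 2  # det of X^T X, as derived above
    alpha = (sum_x2 * sum_y - sum_x * sum_xy) / delta
    beta = (n * sum_xy - sum_x * sum_y) / delta
    return alpha, beta
```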
Returning to the original Pyret implementation, there we have
\[\begin{aligned}
\beta & = \frac{\Sigma xy - \frac{\Sigma x\Sigma y}{n}}{\Sigma x^{2} - \frac{(\Sigma x)^{2}}{n}} \\
& = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}}\quad\text{(multiplying numerator and denominator by }n\text{)}
\end{aligned}\]
\[\begin{aligned}
\text{ and }\quad\quad\alpha & = \overline{y} - \beta\overline{x} \\
& = \left( \frac{\Sigma y}{n} \right) - \left( \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}} \right)\left( \frac{\Sigma x}{n} \right) \\
& = \left( \frac{\Sigma y}{n} \right) - \left( \frac{n\Sigma x\Sigma xy - (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \right) \\
& = \frac{\Sigma y\left( n\Sigma x^{2} - (\Sigma x)^{2} \right) - n\Sigma x\Sigma xy + (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\
& = \frac{n\Sigma x^{2}\Sigma y - (\Sigma x)^{2}\Sigma y - n\Sigma x\Sigma xy + (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\
& = \frac{n\Sigma x^{2}\Sigma y - n\Sigma x\Sigma xy}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\
& = \frac{\Sigma x^{2}\Sigma y - \Sigma x\Sigma xy}{n\Sigma x^{2} - (\Sigma x)^{2}}
\end{aligned}\]
But these are exactly the values of \(\alpha\) and \(\beta\) in
(2). \(\quad\quad\) QED.
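The agreement can also be spot-checked numerically. A minimal self-contained sketch (the data are arbitrary example values):

```python
import numpy as np

xs = [1.0, 2.0, 3.0, 4.0]  # arbitrary example inputs
ys = [2.1, 4.2, 5.9, 8.1]  # arbitrary example outputs

# Matrix route: solve (X^T X) B = X^T Y, i.e. equation (1).
X = np.column_stack([np.ones(len(xs)), xs])
alpha_m, beta_m = np.linalg.solve(X.T @ X, X.T @ np.asarray(ys))

# Sum route: the closed forms in (2) / the Pyret-style formulas.
n, sx, sy = len(xs), sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
delta = n * sxx - sx ** 2
alpha_s = (sxx * sy - sx * sxy) / delta
beta_s = (n * sxy - sx * sy) / delta

# Both routes agree up to floating-point roundoff.
assert np.isclose(alpha_m, alpha_s) and np.isclose(beta_m, beta_s)
```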