Proof that the single-variable linear-regression predictor derived using the general matrix-based multiple-regression algorithm gives the same results as the original Pyret implementation.

Given: a set of \(n\) inputs \(\left\{ x,\ldots \right\}\) and their \(n\) corresponding outputs \(\left\{ y,\ldots \right\}\).

Let

\[X = \begin{pmatrix} 1 & x \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{pmatrix},\quad Y = \begin{pmatrix} y \\ \cdot \\ \cdot \\ \cdot \end{pmatrix}\]

where each of the \(n\) rows of \(X\) is \((1, x)\) for one data point, each row of \(Y\) is the corresponding output, and the dots stand for the remaining rows.

Using the multiple-regression algorithm, we get

\[B = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \left( X^{T}X \right)^{- 1}X^{T}Y\quad\quad(1)\]

and the predictor function is \(y = \alpha + \beta x\).
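For concreteness, here is a minimal sketch of this computation in Python with NumPy (an illustrative transliteration, not the actual Pyret code; the names `xs`, `ys`, and `multiple_regression` are invented here):

```python
import numpy as np

def multiple_regression(xs, ys):
    """Compute B = (X^T X)^{-1} X^T Y -- equation (1) -- for one input variable."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    X = np.column_stack([np.ones_like(xs), xs])  # each row is (1, x)
    B = np.linalg.inv(X.T @ X) @ X.T @ ys        # equation (1) verbatim
    return B[0], B[1]                            # (alpha, beta)
```

(In practice one would use `np.linalg.solve` or `np.linalg.lstsq` rather than forming an explicit inverse, but the inverse mirrors equation (1).) The remainder of the proof expands equation (1) symbolically.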

We have

\[X^{T} = \begin{pmatrix} 1 & \cdot & \cdot & \cdot \\ x & \cdot & \cdot & \cdot \end{pmatrix}\]
\[\therefore X^{T}X = \begin{pmatrix} 1 & \cdot & \cdot & \cdot \\ x & \cdot & \cdot & \cdot \end{pmatrix}\begin{pmatrix} 1 & x \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{pmatrix} = \begin{pmatrix} n & \Sigma x \\ \Sigma x & \Sigma x^{2} \end{pmatrix}\]

We then have

\[\begin{aligned} \text{ det }X^{T}X & = n\Sigma x^{2} - (\Sigma x)^{2} = \Delta\text{ (say) } \\ \text{and }\quad\quad\text{ cof }X^{T}X & = \begin{pmatrix} \Sigma x^{2} & - \Sigma x \\ - \Sigma x & n \end{pmatrix} \end{aligned}\]

The adjoint (adjugate) of a matrix is the transpose of its cofactor matrix. So

\[\text{ adj }X^{T}X = \left( \text{cof }X^{T}X \right)^{T}\]

But \(\text{cof }X^{T}X\) is symmetric, so it is its own transpose. So

\[\text{ adj }X^{T}X = \text{ cof }X^{T}X\]

The inverse of a matrix is its adjoint divided by its determinant. So

\[\left( X^{T}X \right)^{- 1} = \frac{\text{adj }X^{T}X}{\Delta} = \left( \frac{1}{\Delta} \right)\begin{pmatrix} \Sigma x^{2} & - \Sigma x \\ - \Sigma x & n \end{pmatrix}\]
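As a sanity check, this is just the familiar closed form for the inverse of a \(2 \times 2\) matrix:

\[\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{- 1} = \frac{1}{ad - bc}\begin{pmatrix} d & - b \\ - c & a \end{pmatrix}\]

with \(a = n\), \(b = c = \Sigma x\), \(d = \Sigma x^{2}\), and \(ad - bc = \Delta\).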

Putting all this in (1), we have

\[\begin{aligned} B & = \left( \frac{1}{\Delta} \right)\begin{pmatrix} \Sigma x^{2} & - \Sigma x \\ - \Sigma x & n \end{pmatrix}\begin{pmatrix} 1 & \cdot & \cdot & \cdot \\ x & \cdot & \cdot & \cdot \end{pmatrix}\begin{pmatrix} y \\ \cdot \\ \cdot \\ \cdot \end{pmatrix} \\ & = \left( \frac{1}{\Delta} \right)\begin{pmatrix} \Sigma x^{2} & - \Sigma x \\ - \Sigma x & n \end{pmatrix}\begin{pmatrix} \Sigma y \\ \Sigma xy \end{pmatrix} \\ & = \left( \frac{1}{\Delta} \right)\begin{pmatrix} \Sigma x^{2}\Sigma y - \Sigma x\Sigma xy \\ - \Sigma x\Sigma y + n\Sigma xy \end{pmatrix} \end{aligned}\]
\[\begin{aligned} \therefore\alpha & = \frac{\Sigma x^{2}\Sigma y - \Sigma x\Sigma xy}{n\Sigma x^{2} - (\Sigma x)^{2}} \\ \text{and }\beta & = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}}\quad\quad(2) \end{aligned}\]

Now back to the original Pyret implementation, where we have

\[\begin{aligned} \beta & = \frac{\Sigma xy - \frac{\Sigma x\Sigma y}{n}}{\Sigma x^{2} - \frac{(\Sigma x)^{2}}{n}} \\ & = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}} \end{aligned}\]
\[\begin{aligned} \text{ and }\quad\quad\alpha & = \overline{y} - \beta\overline{x} \\ & = \left( \frac{\Sigma y}{n} \right) - \left( \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^{2} - (\Sigma x)^{2}} \right)\left( \frac{\Sigma x}{n} \right) \\ & = \left( \frac{\Sigma y}{n} \right) - \left( \frac{n\Sigma x\Sigma xy - (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \right) \\ & = \frac{\Sigma y\left( n\Sigma x^{2} - (\Sigma x)^{2} \right) - n\Sigma x\Sigma xy + (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\ & = \frac{n\Sigma x^{2}\Sigma y - (\Sigma x)^{2}\Sigma y - n\Sigma x\Sigma xy + (\Sigma x)^{2}\Sigma y}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\ & = \frac{n\Sigma x^{2}\Sigma y - n\Sigma x\Sigma xy}{n\left( n\Sigma x^{2} - (\Sigma x)^{2} \right)} \\ & = \frac{\Sigma x^{2}\Sigma y - \Sigma x\Sigma xy}{n\Sigma x^{2} - (\Sigma x)^{2}} \end{aligned}\]

These are exactly the values of \(\alpha\) and \(\beta\) in (2). \(\quad\quad\) QED.
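For readers who want a numerical spot check, the Pyret-side formulas transliterate the same way. The sketch below (again Python, not the actual Pyret source; the data is made up for illustration) reuses `multiple_regression` from above and confirms that both routes produce the same \(\alpha\) and \(\beta\):

```python
def simple_regression(xs, ys):
    """Slope and intercept via the summation formulas of the Pyret implementation."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    alpha = (sum_y / n) - beta * (sum_x / n)   # alpha = ybar - beta * xbar
    return alpha, beta

# Illustrative data, chosen arbitrarily.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a1, b1 = multiple_regression(xs, ys)   # matrix route, equation (1)
a2, b2 = simple_regression(xs, ys)     # summation route, equation (2)
assert abs(a1 - a2) < 1e-9 and abs(b1 - b2) < 1e-9
```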