1. Introduction
This note introduces derivatives of some real-valued functions of a matrix.
2. Some derivatives
For a real-valued function \(f: \mathbb{R}^{m \times n} \to \mathbb{R}\), we can define the derivative of \(f(X)\) with respect to \(X \in \mathbb{R}^{m \times n}\) as: \[\nabla_{X} f(X) := \Big[ \frac{\partial f(X)}{\partial X_{ij}} \Big]_{m \times n}\]
That is, the matrix derivative of \(f\) is the matrix of element-wise partial derivatives. Let’s get straight to some examples.
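As a sanity check on this definition, the element-wise partial derivatives can be approximated numerically. The following is a minimal sketch, not part of the original note: the helper `numerical_gradient`, the step size `h`, and the test function `sum_of_squares` are all assumptions chosen for illustration, and matrices are stored as plain lists of rows.

```python
def numerical_gradient(f, X, h=1e-6):
    """Estimate the matrix [df/dX_ij] by central differences.

    X is a list of m rows, each a list of n floats; f maps such a matrix to a float.
    """
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]   # copy of X with X_ij nudged up
            minus = [row[:] for row in X]  # copy of X with X_ij nudged down
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def sum_of_squares(X):
    # Illustrative f(X) = sum_ij X_ij^2; its partial derivatives are 2 * X_ij.
    return sum(x * x for row in X for x in row)


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # a 2 x 3 input
G = numerical_gradient(sum_of_squares, X)  # same 2 x 3 shape as X
```

The result `G` has the same shape as `X`, and each entry should be close to `2 * X[i][j]`, which is what the definition gives for this particular \(f\).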
\(\nabla_{X} \text{tr}(X_{m \times n} A_{n \times m}) = A^T\)
Since \(\text{tr}(XA) = \sum_{\ell = 1}^{m} \sum_{k = 1}^{n} X_{\ell k} A_{k \ell}\), we obtain that: \[\nabla_{X}\text{tr}(XA) = \Big[\frac{\partial}{\partial X_{ij}} \sum_{\ell = 1}^{m} \sum_{k = 1}^{n} X_{\ell k} A_{k \ell} \Big]_{m \times n}\]
To isolate the single term that contains \(X_{ij}\), split off \(k = j\) in the inner sum and then \(\ell = i\) in the outer sum:
\[\begin{align*} \sum_{\ell = 1}^{m} \sum_{k = 1}^{n} X_{\ell k} A_{k \ell} &= \sum_{\ell = 1}^{m} \Big[ X_{\ell j} A_{j \ell} + \sum_{k \neq j} X_{\ell k} A_{k \ell} \Big] \\ &= \Big[ X_{i j} A_{j i} + \sum_{k \neq j} X_{i k} A_{k i} \Big] + \sum_{\ell \neq i} \Big[ X_{\ell j} A_{j \ell} + \sum_{k \neq j} X_{\ell k} A_{k \ell} \Big] \end{align*}\]
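Only the term \(X_{ij} A_{ji}\) depends on \(X_{ij}\); every other term is constant with respect to it. Thus: \[\frac{\partial}{\partial X_{ij}} \sum_{\ell = 1}^{m} \sum_{k = 1}^{n} X_{\ell k} A_{k \ell} = A_{ji}\]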
Hence: \[\nabla_{X} \text{tr}(XA) = [A_{ji}]_{m \times n} = \Big[ [A_{ij}]_{n \times m} \Big]^T = A^T\]
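The identity can be checked numerically on a small non-square example. This is only a sketch under stated assumptions: the matrices `X` and `A`, the helper names, and the finite-difference step `h` are illustrative choices, not part of the derivation.

```python
def trace_of_product(X, A):
    # tr(XA) = sum_l sum_k X[l][k] * A[k][l], with X of shape m x n and A of shape n x m
    m, n = len(X), len(X[0])
    return sum(X[l][k] * A[k][l] for l in range(m) for k in range(n))


def numerical_gradient(f, X, h=1e-6):
    # Central-difference estimate of the matrix of partial derivatives df/dX_ij.
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # m x n = 2 x 3
A = [[0.5, -1.0], [2.0, 0.0], [1.5, 3.0]]  # n x m = 3 x 2

G = numerical_gradient(lambda M: trace_of_product(M, A), X)

# Transpose of A, of shape 2 x 3: A_T[l][k] = A[k][l]
A_T = [[A[k][l] for k in range(len(A))] for l in range(len(A[0]))]
```

Because \(\text{tr}(XA)\) is linear in \(X\), the estimate `G` should agree with `A_T` up to floating-point rounding, regardless of the values in `X`.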
\(\nabla_X \det(X_{n \times n}) = \det(X) (X^{-1})^T\)
Recall some definitions from linear algebra:
Define the minor of \(X_{n \times n}\) at the \((i, j)\)th entry as \[\text{minor}(X)_{ij} := \det(X_{-i, -j})\] where \(X_{-i, -j}\) is the matrix \(X\) with its \(i\)th row and \(j\)th column removed.
Define the cofactor of \(X\) at the \((i, j)\)th entry as \[\text{cof}(X)_{ij} := (-1)^{i + j} \text{minor}(X)_{ij}\]
The determinant of \(X\) is defined by a cofactor expansion of \(X\) along a column: \[\det(X) := \sum_{i = 1}^{n} X_{ij} \text{cof}(X)_{ij}\] where \(j\) can be any fixed number in \(\{1, \dots, n \}\). It is called “a” cofactor expansion because expanding along a row gives the same value: \[\det(X) = \sum_{i = 1}^{n} X_{ij} \text{cof}(X)_{ij} = \sum_{j = 1}^{n} X_{ij} \text{cof}(X)_{ij}\] where \(j\) is fixed in the first sum and \(i\) is fixed in the second. That is, the cofactor expansions along any row or any column of \(X\) all agree.
The cofactor matrix of \(X\) is defined as: \[\text{cof}(X) := \big[ \text{cof}(X)_{ij} \big]_{n \times n}\]
Lastly, when \(\det(X) \neq 0\), the inverse of \(X\) is given by: \[X^{-1} := \frac{\text{cof}(X)^T}{\det(X)} = \Big[ \frac{\text{cof}(X)_{ji}}{\det(X)} \Big]_{n \times n}\]
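The definitions above translate almost directly into a recursive procedure. The sketch below is illustrative only: the function names and the example matrix are assumptions, indices are 0-based (which leaves the sign \((-1)^{i + j}\) unchanged), the determinant of the empty matrix is taken to be 1 as the base case, and the recursion is far too slow for anything beyond small \(n\).

```python
def minor(X, i, j):
    # Determinant of X with row i and column j removed.
    sub = [row[:j] + row[j + 1:] for r, row in enumerate(X) if r != i]
    return det(sub)


def cofactor(X, i, j):
    return (-1) ** (i + j) * minor(X, i, j)


def det(X):
    # Cofactor expansion along the first row; the empty matrix has determinant 1.
    n = len(X)
    if n == 0:
        return 1.0
    return sum(X[0][j] * cofactor(X, 0, j) for j in range(n))


def inverse(X):
    # Transposed cofactor matrix divided by the determinant.
    n = len(X)
    d = det(X)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[cofactor(X, j, i) / d for j in range(n)] for i in range(n)]


X = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
n = len(X)

# Expansions along every row and every column should all give det(X).
row_expansions = [sum(X[i][j] * cofactor(X, i, j) for j in range(n)) for i in range(n)]
col_expansions = [sum(X[i][j] * cofactor(X, i, j) for i in range(n)) for j in range(n)]

# X times its inverse should be the identity matrix.
X_inv = inverse(X)
product = [[sum(X[i][k] * X_inv[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
```

For this example matrix the first-row expansion is \(2 \cdot 11 + 1 \cdot (-4) + 0 \cdot 1 = 18\), and the other row and column expansions should return the same value.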
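Expand \(\det(X)\) along the \(i\)th row. Each cofactor \(\text{cof}(X)_{ik}\) is computed from \(X\) with its \(i\)th row removed, so it does not depend on \(X_{ij}\), and only the \(k = j\) term contributes to the partial derivative. So we can write \(\nabla_X \det(X_{n \times n})\) as: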
\[\begin{align*} \nabla_X \det(X_{n \times n}) &= \Big[ \frac{\partial}{\partial X_{ij}} \sum_{k = 1}^{n} X_{ik} \text{cof}(X)_{ik} \Big]_{n \times n} \\ &= \Big[ \text{cof}(X)_{ij} \Big]_{n \times n} \\ &= \det(X) \Big[ \frac{\text{cof}(X)_{ij}}{\det(X)} \Big]_{n \times n} \\ &= \det(X) \Big[ \frac{\text{cof}(X)_{ji}}{\det(X)} \Big]^T_{n \times n} \\ &= \det(X) (X^{-1})^T \end{align*}\]
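A numerical check of this result, as a sketch only: the example matrix, the helper names, and the step size `h` are assumptions made for illustration. The finite-difference gradient of the determinant is compared with the cofactor matrix, and, as a second check that does not reuse the cofactor formula, \(X\) is multiplied by the transposed gradient divided by \(\det(X)\), which should give the identity if the gradient equals \(\det(X) (X^{-1})^T\).

```python
def cofactor(X, i, j):
    # Signed determinant of X with row i and column j removed (0-based indices).
    sub = [row[:j] + row[j + 1:] for r, row in enumerate(X) if r != i]
    return (-1) ** (i + j) * det(sub)


def det(X):
    # Cofactor expansion along the first row; the empty matrix has determinant 1.
    n = len(X)
    if n == 0:
        return 1.0
    return sum(X[0][j] * cofactor(X, 0, j) for j in range(n))


def numerical_gradient(f, X, h=1e-6):
    # Central-difference estimate of the matrix of partial derivatives df/dX_ij.
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
n = len(X)
d = det(X)

G = numerical_gradient(det, X)                                    # estimated gradient of det
C = [[cofactor(X, i, j) for j in range(n)] for i in range(n)]     # cofactor matrix

# X @ (G^T / det(X)): entry (i, j) is sum_k X[i][k] * G[j][k] / d
check = [[sum(X[i][k] * G[j][k] for k in range(n)) / d for j in range(n)] for i in range(n)]
```

Since the determinant is linear in each individual entry, the central difference is exact up to rounding, so `G` and `C` should agree closely and `check` should be near the identity.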
\(\nabla_{X} \log( \det(X_{n \times n})) = (X^{-1})^T\)
Assume \(\det(X) > 0\) so that the logarithm is defined. By the chain rule, together with the fact that \(\text{cof}(X)_{ik}\) does not depend on \(X_{ij}\):
\[\begin{align*} \nabla_X \log( \det(X_{n \times n})) &= \Big[ \frac{\partial}{\partial X_{ij}} \log \Big( \sum_{k = 1}^{n} X_{ik} \text{cof}(X)_{ik} \Big) \Big]_{n \times n} \\ &= \Big[ \frac{\text{cof}(X)_{ij}}{\sum_{k = 1}^{n} X_{ik} \text{cof}(X)_{ik}} \Big]_{n \times n} \\ &= \Big[ \frac{\text{cof}(X)_{ij}}{\det(X)} \Big]_{n \times n} \\ &= \Big[ \frac{\text{cof}(X)_{ji}}{\det(X)} \Big]^T_{n \times n} \\ &= (X^{-1})^T \end{align*}\]
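The same kind of numerical check applies here. This is a sketch under assumptions: the example matrix is chosen to have a positive determinant so that the logarithm is defined, and the helper names and step size `h` are illustrative. The finite-difference gradient of \(\log(\det(X))\) is compared with the cofactor matrix divided by \(\det(X)\), and \(X\) is also multiplied by the transposed gradient, which should give the identity if the gradient is \((X^{-1})^T\).

```python
import math


def cofactor(X, i, j):
    # Signed determinant of X with row i and column j removed (0-based indices).
    sub = [row[:j] + row[j + 1:] for r, row in enumerate(X) if r != i]
    return (-1) ** (i + j) * det(sub)


def det(X):
    # Cofactor expansion along the first row; the empty matrix has determinant 1.
    n = len(X)
    if n == 0:
        return 1.0
    return sum(X[0][j] * cofactor(X, 0, j) for j in range(n))


def log_det(X):
    return math.log(det(X))  # requires det(X) > 0


def numerical_gradient(f, X, h=1e-6):
    # Central-difference estimate of the matrix of partial derivatives df/dX_ij.
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]  # det(X) = 18 > 0
n = len(X)
d = det(X)

G = numerical_gradient(log_det, X)
expected = [[cofactor(X, i, j) / d for j in range(n)] for i in range(n)]

# X @ G^T: entry (i, j) is sum_k X[i][k] * G[j][k]
check = [[sum(X[i][k] * G[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
```

Unlike the determinant itself, \(\log(\det(X))\) is not linear in each entry, so the agreement between `G` and `expected` holds only up to the finite-difference error, which is small for this step size.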