Matrix Derivative
The derivative of a matrix with respect to a scalar or vector variable is computed element-wise: each entry of the matrix is differentiated just as an ordinary scalar function would be.
Below are the main cases with practical examples:
Derivative of a Matrix with Respect to a Scalar Variable
The derivative of a matrix \( A(t) \), whose elements depend on a scalar variable \( t \), is a new matrix where each element is obtained by differentiating the corresponding element of \( A(t) \) with respect to \( t \).
A Practical Example
Let’s consider this \( 2 \times 2 \) matrix:
$$ \mathbf{A}(t) = \begin{bmatrix} t^2 & \sin(t) \\ e^t & t + 1 \end{bmatrix} $$
This is a matrix function \( \mathbf{A}(t) \) that depends on the scalar variable \( t \).
To find the derivative of \( \mathbf{A}(t) \) with respect to \( t \), I differentiate each element of the matrix separately with respect to \( t \).
The derivatives of each element are:
- \( \frac{d}{dt} \left( t^2 \right) = 2t \)
- \( \frac{d}{dt} \left( \sin(t) \right) = \cos(t) \)
- \( \frac{d}{dt} \left( e^t \right) = e^t \)
- \( \frac{d}{dt} \left( t + 1 \right) = 1 \)
Therefore, the derivative of \( \mathbf{A}(t) \) with respect to \( t \) is:
$$ \frac{d\mathbf{A}(t)}{dt} = \begin{bmatrix} 2t & \cos(t) \\ e^t & 1 \end{bmatrix} $$
This illustrates a matrix derivative where the elements depend on a scalar variable \( t \).
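The element-wise computation above can be checked symbolically. As a sketch, sympy's `Matrix.diff` applies the scalar derivative to every entry at once:

```python
import sympy as sp

t = sp.symbols('t')

# The matrix A(t) from the example above
A = sp.Matrix([[t**2, sp.sin(t)],
               [sp.exp(t), t + 1]])

# Differentiating a sympy Matrix differentiates each element with respect to t
dA = A.diff(t)
print(dA)  # Matrix([[2*t, cos(t)], [exp(t), 1]])
```

The printed result matches the matrix of element-wise derivatives computed by hand.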
Derivative of a Matrix with Respect to a Vector
The derivative of a matrix \( A(\mathbf{x}) \), whose elements depend on a vector \( \mathbf{x} = [x_1, x_2, \ldots, x_n]^T \), is a third-order tensor: for each component \( x_k \) of \( \mathbf{x} \), we obtain one matrix slice holding the partial derivatives of every element of \( A(\mathbf{x}) \) with respect to \( x_k \).
A Practical Example
Consider a matrix \( \mathbf{B}(\mathbf{x}) \) dependent on a vector \( \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \).
$$ \mathbf{B}(\mathbf{x}) = \begin{bmatrix} x_1^2 & x_1 x_2 \\ x_1 + x_2 & x_2^2 \end{bmatrix} $$
To find the derivative of matrix \( \mathbf{B} \) with respect to vector \( \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \), I calculate the partial derivatives of each matrix element with respect to each vector component \( x_1 \) and \( x_2 \).
The partial derivatives with respect to \( x_1 \) are as follows:
- \( \frac{\partial}{\partial x_1} (x_1^2) = 2x_1 \)
- \( \frac{\partial}{\partial x_1} (x_1 x_2) = x_2 \)
- \( \frac{\partial}{\partial x_1} (x_1 + x_2) = 1 \)
- \( \frac{\partial}{\partial x_1} (x_2^2) = 0 \)
Thus, the matrix of partial derivatives with respect to \( x_1 \) is:
$$ \frac{\partial \mathbf{B}}{\partial x_1} = \begin{bmatrix} 2x_1 & x_2 \\ 1 & 0 \end{bmatrix} $$
The partial derivatives with respect to \( x_2 \) are:
- \( \frac{\partial}{\partial x_2} (x_1^2) = 0 \)
- \( \frac{\partial}{\partial x_2} (x_1 x_2) = x_1 \)
- \( \frac{\partial}{\partial x_2} (x_1 + x_2) = 1 \)
- \( \frac{\partial}{\partial x_2} (x_2^2) = 2x_2 \)
So, the matrix of partial derivatives with respect to \( x_2 \) is:
$$ \frac{\partial \mathbf{B}}{\partial x_2} = \begin{bmatrix} 0 & x_1 \\ 1 & 2x_2 \end{bmatrix} $$
Collecting these two matrices gives the full derivative of \( \mathbf{B} \) with respect to \( \mathbf{x} \): a three-dimensional array (a third-order tensor) whose slices are the two matrices above. By analogy with the vector case, it is often still called the Jacobian.
$$ \mathbf{J}(\mathbf{x}) = \begin{bmatrix} \frac{\partial \mathbf{B}}{\partial x_1}, \frac{\partial \mathbf{B}}{\partial x_2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} 2x_1 & x_2 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & x_1 \\ 1 & 2x_2 \end{bmatrix} \end{bmatrix} $$
This tensor plays the role of the Jacobian for the matrix-valued function \( \mathbf{B} \). Since \( \mathbf{B} \) has dimensions \( 2 \times 2 \) and \( \mathbf{x} \) has dimensions \( 2 \times 1 \), the derivative is a tensor of size \( 2 \times 2 \times 2 \).
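The two slices and the stacked tensor can be reproduced symbolically. As a sketch using sympy, `diff` gives each slice, and `derive_by_array` stacks them into the \( 2 \times 2 \times 2 \) object described above:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# The matrix B(x) from the example above
B = sp.Matrix([[x1**2, x1*x2],
               [x1 + x2, x2**2]])

# One matrix slice per component of the vector x
dB_dx1 = B.diff(x1)
dB_dx2 = B.diff(x2)

# Stacking both slices yields the full third-order derivative tensor
J = sp.derive_by_array(B, [x1, x2])
print(J.shape)  # (2, 2, 2)
```

Each slice of `J` agrees with the matrices of partial derivatives computed element by element above.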
Example 2
Consider a function mapping a vector \( \mathbf{x} = [x, y]^T \) to a matrix \( A(\mathbf{x}) = \begin{bmatrix} xy & x^2 \\ y^2 & xy \end{bmatrix} \).
To compute the derivative of \( A \) with respect to \( \mathbf{x} \), I find all the partial derivatives:
- Derivative with respect to \( x \): $$ \frac{\partial A}{\partial x} = \begin{bmatrix} y & 2x \\ 0 & y \end{bmatrix} $$
- Derivative with respect to \( y \): $$ \frac{\partial A}{\partial y} = \begin{bmatrix} x & 0 \\ 2y & x \end{bmatrix} $$
These matrices form the Jacobian of the matrix function with respect to the vector \( \mathbf{x} \).
$$ \mathbf{J}(\mathbf{x}) = \begin{bmatrix} \frac{\partial A}{\partial x} & \frac{\partial A}{\partial y} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} y & 2x \\ 0 & y \end{bmatrix} & \begin{bmatrix} x & 0 \\ 2y & x \end{bmatrix} \end{bmatrix} $$
As in the previous example, stacking these two slices yields a derivative tensor of size \( 2 \times 2 \times 2 \).
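Symbolic results like these can also be sanity-checked numerically. As a sketch, central finite differences approximate each slice of the derivative of \( A(\mathbf{x}) \) from Example 2 and should agree with the analytic matrices to high precision (the test point `(1.5, -0.7)` is an arbitrary choice):

```python
import numpy as np

def A(x, y):
    """The matrix function A(x, y) from Example 2."""
    return np.array([[x * y, x**2],
                     [y**2,  x * y]])

# Central finite differences: one matrix slice per vector component
h = 1e-6
x, y = 1.5, -0.7
dA_dx = (A(x + h, y) - A(x - h, y)) / (2 * h)
dA_dy = (A(x, y + h) - A(x, y - h)) / (2 * h)

# Analytic slices derived above, evaluated at the same point
expected_dx = np.array([[y, 2 * x], [0.0, y]])
expected_dy = np.array([[x, 0.0], [2 * y, x]])

print(np.allclose(dA_dx, expected_dx, atol=1e-5))  # True
print(np.allclose(dA_dy, expected_dy, atol=1e-5))  # True
```

This kind of finite-difference check is a useful habit whenever matrix derivatives are computed by hand.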