Trace is the derivative of determinant

A question I always had when learning linear algebra is, “what does the trace of a matrix mean conceptually?” For example, the determinant of a matrix is, roughly speaking, the factor by which the matrix expands the volume. The conceptual meaning of trace is not as straightforward, but one way to think about it is

trace is the derivative of determinant at the identity.

Roughly you can think of this in the following way. If you start at the identity matrix and move a tiny step in the direction of M, say M\epsilon where \epsilon is a tiny number, then the determinant changes approximately by \text{tr}(M) times \epsilon. In other words, \det(1 + M\epsilon) \approx 1 + \text{tr}(M)\epsilon. Here 1 stands for the identity matrix.
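The approximation above can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original argument: the helper names `det`, `trace`, and `identity_plus` and the example matrix `M` are chosen here purely for illustration.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not m:
        return 1  # determinant of the empty 0 x 0 matrix
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def identity_plus(m, eps):
    """Return the matrix 1 + eps * m."""
    n = len(m)
    return [[(1 if i == j else 0) + eps * m[i][j] for j in range(n)] for i in range(n)]

# Arbitrary example matrix (an assumption for illustration only).
M = [[2, -1, 0], [3, 5, 1], [4, 0, -3]]

for k in (2, 4, 6):
    eps = Fraction(1, 10 ** k)
    # Difference quotient (det(1 + eps*M) - 1) / eps, computed exactly.
    slope = (det(identity_plus(M, eps)) - 1) / eps
    print(f"eps = 1e-{k}: slope = {float(slope):.8f}, tr(M) = {trace(M)}")
```

As \epsilon shrinks, the difference quotient should approach \text{tr}(M), with an error proportional to \epsilon.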

One can be very precise about what it means to take the “derivative” of the determinant, so let me do some setup. Let K be either \mathbb{R} or \mathbb{C} (so we are working with real or complex Lie groups; but of course, everything makes sense for algebraic groups over arbitrary fields). Then there is a morphism of Lie groups called the determinant \det: \text{GL}_n(K) \to K^\times, given by sending a matrix to its determinant. Since we are restricting to invertible matrices, the determinants are nonzero. To check that this is really a morphism of Lie groups (i.e. both a smooth map and a homomorphism of groups), note that the determinant map is a polynomial map in the entries of the matrix (and therefore smooth) and is a group homomorphism by the property that \det(AB)=\det(A)\det(B).

Now, given any smooth map of manifolds f which maps a point p \mapsto f(p), there is an induced linear map from the tangent space at p to the tangent space at f(p), called the derivative of f at p. In particular, if f is a Lie group homomorphism, then it maps the identity point to the identity point, and the derivative at the identity is furthermore a homomorphism of Lie algebras. What this means is that, in addition to being a linear map, it preserves the bracket pairing.

In the case of \text{GL}_n, the Lie algebra at the identity matrix is called \mathfrak{gl}_n. We can think of it as consisting of all n \times n matrices, and the bracket operation is defined by [A, B] = AB - BA. The Lie algebra of K^\times at 1 consists of the elements of K; since K^\times is abelian, the bracket is trivial.

The main claim, which I will prove subsequently, is that this map \mathfrak{gl}_n(K) \to K, the derivative of the determinant at the identity, is actually the trace. That is, it sends a matrix to its trace, the sum of the entries on the diagonal. Note that since it is a homomorphism of Lie algebras, it preserves the bracket, and we recover the familiar property of trace \text{tr}(AB - BA) = 0, so \text{tr}(AB)=\text{tr}(BA).

We can find the derivative of a smooth map on \text{GL}_n(K) directly, since it is an open subset of a vector space. Let \phi be a matrix; then the derivative at the identity evaluated at \phi is

    \[\lim_{t \to 0} \frac{\det(1+t\phi) - 1}{t}.\]

\det(1+t\phi) is a polynomial in t, and the number we’re looking for is the coefficient of the t term.

We have

    \[\det(1 + t\phi) (e_1 \wedge \cdots \wedge e_n) = (e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge \cdots \wedge (e_n+\phi(e_n)t).\]

Just to get a concrete idea of what this expands to, let’s look at the case n=2. Then

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)=e_1\wedge e_2 + (\phi(e_1)\wedge e_2 + e_1 \wedge \phi(e_2))t + (\phi(e_1)\wedge \phi(e_2)) t^2.\]

When n=3,

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge(e_3+\phi(e_3)t)\]

    \[=e_1\wedge e_2\wedge e_3\]

    \[+ (\phi(e_1)\wedge e_2 \wedge e_3 + e_1 \wedge \phi(e_2) \wedge e_3 + e_1 \wedge e_2 \wedge \phi(e_3))t\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge e_3 + \phi(e_1)\wedge e_2 \wedge \phi(e_3) + e_1\wedge \phi(e_2) \wedge \phi(e_3)) t^2\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge \phi(e_3)) t^3.\]

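In general, the coefficient of t is \sum_i e_1 \wedge \cdots \wedge \phi(e_i) \wedge \cdots \wedge e_n. Writing \phi(e_i) = \sum_j \phi_{ji} e_j, only the j = i term survives the wedge with the remaining basis vectors, so the i-th summand equals \phi_{ii} (e_1 \wedge \cdots \wedge e_n). In particular, the coefficient of t is \text{tr}(\phi). (In fact, see if you can convince yourself that the coefficient of t^i is \text{tr}(\wedge^i \phi).)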
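The parenthetical claim can be tested on an example, using the standard fact that \text{tr}(\wedge^i \phi) equals the sum of the i \times i principal minors of \phi. The sketch below is only an illustration: the function names and the example matrix `phi` are assumptions made here, and checking finitely many values of t is a sanity check rather than a proof.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not m:
        return 1  # determinant of the empty 0 x 0 matrix
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def principal_minor_sum(phi, i):
    """Sum of the i x i principal minors of phi, i.e. the trace of the i-th exterior power."""
    n = len(phi)
    return sum(
        det([[phi[r][c] for c in idx] for r in idx])
        for idx in combinations(range(n), i)
    )

# Arbitrary example matrix (an assumption for illustration only).
phi = [[2, -1, 0], [3, 5, 1], [4, 0, -3]]
n = len(phi)

# Conjectured coefficients of 1, t, t^2, ..., t^n in det(1 + t*phi).
coeffs = [principal_minor_sum(phi, i) for i in range(n + 1)]

# Two polynomials of degree <= n that agree at more than n points are equal.
for t in range(-3, 4):
    lhs = det([[(1 if r == c else 0) + t * phi[r][c] for c in range(n)] for r in range(n)])
    rhs = sum(c * t ** i for i, c in enumerate(coeffs))
    assert lhs == rhs

print(coeffs)
```

The entry `coeffs[1]` is the trace and `coeffs[n]` is the determinant, which are the two extreme cases of the claim.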

See some discussion of the meaning of trace.

Acknowledgements: Thanks to Ben Wormleighton for originally telling me the slogan “trace is the derivative of determinant”, and for teaching me about Lie groups and Lie algebras.

To add: discussion of Jacobi’s formula, exponential map

Determinant of transpose

An important fact in linear algebra is that, given a matrix A, \det A = \det {}^tA, where {}^tA is the transpose of A. Here I will prove this statement via explicit computation, and I will try to do this as cleanly as possible. We may define the determinant of A by

\det A = \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{\sigma(1), 1} \cdots A_{\sigma(n), n}.

Here S_n is the set of permutations of the set \{ 1, \dots, n \}, and sgn(\sigma) denotes the parity of the permutation \sigma (0 if \sigma is even, 1 if \sigma is odd), so that (-1)^{sgn(\sigma)} is its sign. This formula is derived from the definition of the determinant via exterior algebra. One can check by hand that this gives the familiar expressions for the determinant when n = 2, 3.

Now, since ({}^tA)_{i, j} = A_{j, i}, we have

\det {}^tA = \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{1, \sigma(1)} \cdots A_{n, \sigma(n)}

= \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{\sigma^{-1}(\sigma(1)), \sigma(1)} \cdots A_{\sigma^{-1}(\sigma(n)), \sigma(n)}.

The crucial observation here is that we may rearrange the product inside the summation so that the second indices are increasing. Let b = \sigma(a). Then the product inside the summation is

\prod_{1 \leq a \leq n} A_{\sigma^{-1}(\sigma(a)), \sigma(a)} = \prod_{1 \leq b \leq n} A_{\sigma^{-1}(b), b}

Combining this with the fact that sgn(\sigma) = sgn(\sigma^{-1}), our expression simplifies to

\sum_{\sigma \in S_n} (-1)^{sgn(\sigma^{-1})} A_{\sigma^{-1}(1), 1} \cdots A_{\sigma^{-1}(n), n}.

Since \sigma \mapsto \sigma^{-1} is a bijection from S_n to itself, the sum is unchanged if we replace every \sigma^{-1} with \sigma, and we see that this equals \det A. So \det A = \det {}^tA. \square
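The permutation-sum formula and the two facts used in the proof can be spot-checked by brute force. This is a small sketch under stated assumptions: the exponent sgn(\sigma) is read as the parity of \sigma (0 for even, 1 for odd), and the names `parity`, `leibniz_det`, `inverse`, and the matrix `A` are introduced here for illustration.

```python
from itertools import permutations

def parity(perm):
    """Number of inversions of perm, mod 2 (0 for even permutations, 1 for odd)."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)) % 2

def leibniz_det(a):
    """Determinant as the sum over permutations of signed products A[sigma(j)][j]."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = (-1) ** parity(sigma)
        for col in range(n):
            term *= a[sigma[col]][col]
        total += term
    return total

def transpose(a):
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]

def inverse(perm):
    """Inverse permutation: inverse(perm)[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)

# Arbitrary example matrix (an assumption for illustration only).
A = [[2, -1, 0], [3, 5, 1], [4, 0, -3]]

# The statement being proved, on one example.
assert leibniz_det(A) == leibniz_det(transpose(A))

# The key fact used in the proof: a permutation and its inverse have the same parity.
assert all(parity(s) == parity(inverse(s)) for s in permutations(range(4)))
```

Brute force over S_n takes n! steps, so this is only practical for very small n.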

I wonder if there is a more conceptual proof of this? (By “conceptual”, I mean a proof based on exterior algebra, bilinear pairings, etc…)

More on algebraic numbers

A complex number is algebraic if it is the root of some nonzero polynomial P(x) with rational coefficients. \sqrt{2} is algebraic (it is a root of x^2 - 2); i is algebraic (it is a root of x^2 + 1); \pi and e are not. (A complex number that is not algebraic is called transcendental.)

Previously, I wrote some blog posts (see here and here) which sketched a proof of the fact that the sum and product of algebraic numbers are also algebraic (and more). This is not an obvious fact, and to prove this requires some amount of field theory and linear algebra. Nevertheless, the ideas in the proof lead the way to a better understanding of the structure of the algebraic numbers and towards the theorems of Galois theory. In those posts, I tried to introduce the minimum algebraic machinery necessary in order to state and prove the main result; I don’t think I entirely succeeded.

However, there is a more direct approach, one which also allows us to find a polynomial that has \alpha + \beta (or \alpha\beta) as a root, for algebraic numbers \alpha and \beta. That is the subject of this post. Instead of trying to formally prove the result, I will illustrate the approach for a specific example: showing \sqrt{2} + \sqrt{3} is algebraic.

This post will assume familiarity with the characteristic polynomial of a matrix, and not much more. (In particular, none of the algebra from the previous posts)

A case study

Define the set \mathbb{Q}(\sqrt{2}, \sqrt{3}) = \{a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6} \ | \ a, b, c, d \in \mathbb{Q} \}. We will think of this as a four-dimensional vector space, where the scalars are elements of \mathbb{Q}, and the basis is 1, \sqrt{2}, \sqrt{3}, \sqrt{6}. Every element can be uniquely expressed as a(1) + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}, for a, b, c, d \in \mathbb{Q}.

We’re trying to prove \sqrt{2} + \sqrt{3} is algebraic. Consider the linear transformation T on \mathbb{Q}(\sqrt{2}, \sqrt{3}) defined as “multiply by \sqrt{2} + \sqrt{3}”. In other words, consider the linear map T: \mathbb{Q}(\sqrt{2}, \sqrt{3}) \to \mathbb{Q}(\sqrt{2}, \sqrt{3}) which maps v \mapsto (\sqrt{2} + \sqrt{3})v. This is definitely a linear map, since it satisfies T(v + w) = Tv + Tw and T(cv) = c(Tv). In particular, we should be able to represent it by a matrix.

What is the matrix of T? Well, T(1) = \sqrt{2} + \sqrt{3}, T(\sqrt{2}) = 2 + \sqrt{6}, T(\sqrt{3}) = 3 + \sqrt{6}, and T(\sqrt{6}) = 3\sqrt{2} + 2\sqrt{3}. Thus we can represent T by the matrix

\begin{bmatrix}0 & 2 & 3 & 0 \\1 & 0 & 0 & 3 \\1 & 0 & 0 & 2 \\0 & 1 & 1 & 0\end{bmatrix}.

Now, the characteristic polynomial \chi_T(x) of this matrix, which is defined as \text{det}(T-xI), is x^4 - 10x^2 + 1, which has \sqrt{2} + \sqrt{3} as a root. Thus \sqrt{2} + \sqrt{3} is indeed algebraic.
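The characteristic polynomial can also be computed mechanically. The sketch below uses the Faddeev-LeVerrier recursion, which is a technique swapped in here for convenience and not something the post relies on; the function names are illustrative. It computes \det(xI - T), which agrees with \det(T - xI) here because the matrix has even size.

```python
from fractions import Fraction
from math import sqrt

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly(a):
    """Coefficients of det(xI - a), highest degree first, via Faddeev-LeVerrier."""
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = a * M_{k-1} + (previous coefficient) * I
        m = matmul(a, m)
        for i in range(n):
            m[i][i] += coeffs[-1]
        # Next coefficient is -tr(a * M_k) / k.
        am = matmul(a, m)
        coeffs.append(-sum(am[i][i] for i in range(n)) / k)
    return coeffs

# Matrix of "multiply by sqrt(2) + sqrt(3)" in the basis 1, sqrt(2), sqrt(3), sqrt(6).
T = [
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
]

coeffs = char_poly(T)
print([int(c) for c in coeffs])

# Evaluate the polynomial at sqrt(2) + sqrt(3) in floating point (Horner's rule).
alpha = sqrt(2) + sqrt(3)
value = 0.0
for c in coeffs:
    value = value * alpha + float(c)
print(abs(value))
```

The floating-point evaluation is only a numerical sanity check; the exact statement is the identity of coefficients.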

Why it works

The basic reason is the Cayley-Hamilton theorem. It tells us that T should satisfy the characteristic polynomial: T^4 - 10T^2 + I is the zero matrix. But the matrix we get when plugging T into \chi_T(x) should correspond to multiplication by \chi_T(\sqrt{2} + \sqrt{3}); thus \chi_T(\sqrt{2} + \sqrt{3}) = 0.
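Both halves of this argument can be checked directly with integer arithmetic. The following is a minimal sketch (helper names are illustrative): it verifies that T^4 - 10T^2 + I is the zero matrix, and that the first column of T^2 records the coordinates of (\sqrt{2} + \sqrt{3})^2 = 5 + 2\sqrt{6} in the basis 1, \sqrt{2}, \sqrt{3}, \sqrt{6}.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Matrix of "multiply by sqrt(2) + sqrt(3)" in the basis 1, sqrt(2), sqrt(3), sqrt(6).
T = [
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
]
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

T2 = matmul(T, T)
T4 = matmul(T2, T2)

# Cayley-Hamilton for this matrix: T^4 - 10 T^2 + I should vanish entrywise.
residual = [[T4[i][j] - 10 * T2[i][j] + I[i][j] for j in range(4)] for i in range(4)]
assert all(entry == 0 for row in residual for entry in row)

# The first column of T^2 is T^2 applied to the basis vector 1,
# i.e. the coordinates of (sqrt(2) + sqrt(3))^2.
first_col_T2 = [row[0] for row in T2]
print(first_col_T2)
```

The same pattern holds for every power: T^k applied to the basis vector 1 gives the coordinates of (\sqrt{2} + \sqrt{3})^k.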

Note that I chose \sqrt{2} + \sqrt{3} arbitrarily. I could have chosen any element of \mathbb{Q}(\sqrt{2}, \sqrt{3}) and used this method to find a polynomial with rational coefficients having that element as a root.

At the end of the day, to prove that such a method always works requires the field theory we have glossed over: what is \mathbb{Q}(\alpha, \beta) in general, why is it finite-dimensional, etc. This constructive method, which assumes the Cayley-Hamilton theorem, only replaces the non-constructive “linear dependence” argument in Proposition 4 of the original post.

Two proofs complex matrices have eigenvalues

Today I will briefly discuss two proofs that every matrix T over the complex numbers (or more generally, over an algebraically closed field) has an eigenvalue. Notice that this is equivalent to finding a complex number \lambda such that T - \lambda I has nontrivial kernel. The first proof uses facts about “linear dependence” and the second uses determinants and the characteristic polynomial. The first proof is drawn from Axler’s textbook [1]; the second is the standard proof.

Proof by linear dependence

Let p(x) = a_nx^n + \dots + a_1x + a_0 be a polynomial with complex coefficients. If T is a linear map, p(T) = a_nT^n + \dots + a_1 T + a_0I. We think of this as “p evaluated at T”.

Exercise: Show (pq)(T) = p(T)q(T).

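Proof: Let V be the n-dimensional complex vector space on which T acts, and pick any nonzero vector v \in V. Consider the vectors v, Tv, T^2v, \dots, T^nv. This is a list of n+1 vectors in an n-dimensional space, so they must be linearly dependent. Thus there exist constants a_0, \dots, a_n \in \mathbb{C}, not all zero, such that a_nT^nv + a_{n-1}T^{n-1}v + \dots + a_1Tv + a_0v = (a_nT^n + a_{n-1}T^{n-1} + \dots + a_1T + a_0I)v = 0.

Define p(x) = a_nx^n + \dots + a_1x + a_0, and let m be the largest index with a_m \neq 0. We must have m \geq 1, since otherwise a_0v = 0 with a_0 \neq 0 and v \neq 0. Then we can factor p(x) = a_m(x-\lambda_1)\dots (x-\lambda_m). By the Exercise, this implies a_m(T - \lambda_1I)\dots (T-\lambda_mI)v = 0. If every map T - \lambda_iI were injective, their product would be injective too, contradicting v \neq 0. So at least one of the maps T - \lambda_iI has a nontrivial kernel, and T has an eigenvalue. \square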
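To see the argument in action, here is a small worked example, offered as an illustration and not as part of the proof. The matrix `T` (rotation by 90 degrees) and the starting vector `v` are choices made here; the helper `apply` is hypothetical.

```python
def apply(matrix, vec):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(matrix[i][j] * vec[j] for j in range(len(vec))) for i in range(len(matrix))]

T = [[0, -1], [1, 0]]  # rotation by 90 degrees; it has no real eigenvalues
v = [1, 0]             # any nonzero starting vector would do

Tv = apply(T, v)       # T v
T2v = apply(T, Tv)     # T^2 v

# The three vectors v, Tv, T^2 v in a 2-dimensional space are dependent:
# here T^2 v + v = 0, so p(x) = x^2 + 1 = (x - i)(x + i).
assert [a + b for a, b in zip(T2v, v)] == [0, 0]

# So (T - iI)(T + iI) v = 0. Put w = (T + iI) v.
w = [tv + 1j * x for tv, x in zip(Tv, v)]
assert any(x != 0 for x in w)

# Since w is nonzero and (T - iI) w = 0, w is an eigenvector with eigenvalue i.
Tw = apply(T, w)
assert Tw == [1j * x for x in w]
print(w)
```

Which factor ends up having a kernel depends on v; the proof only guarantees that at least one of them does.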

Proof by the characteristic polynomial

Proof: We want to show that there exists some \lambda such that T - \lambda I has nontrivial kernel: in other words, that T - \lambda I is singular. A matrix is singular if and only if its determinant is zero. So, let \chi_T(x) = \det(T - xI); this is a polynomial in x of degree n, called the characteristic polynomial of T. Now, every nonconstant polynomial has a complex root, so \chi_T has a root, say \lambda. This implies \det(T - \lambda I) = 0, so T - \lambda I is singular and T has an eigenvalue. \square


To me, it seems like the determinant-based proof is more straightforward, although it requires more machinery. Also, the determinant-based proof is “constructive”, in that we can actually find all the eigenvalues by factoring the characteristic polynomial. On the subject of determinant-based vs determinant-free approaches to linear algebra, see Axler’s article “Down With Determinants!” [3].
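For a 2 \times 2 matrix the "factor the characteristic polynomial" step is just the quadratic formula, since \det(T - xI) = x^2 - \text{tr}(T)x + \det(T). The sketch below illustrates this under that assumption; the function names and the example matrix are chosen here for illustration.

```python
import cmath

def eigenvalues_2x2(t):
    """Roots of x^2 - tr(t) x + det(t), the characteristic polynomial of a 2 x 2 matrix."""
    tr = t[0][0] + t[1][1]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)  # complex square root of the discriminant
    return (tr + root) / 2, (tr - root) / 2

def det_shifted(t, lam):
    """det(t - lam * I) for a 2 x 2 matrix t."""
    return (t[0][0] - lam) * (t[1][1] - lam) - t[0][1] * t[1][0]

T = [[0, -1], [1, 0]]  # rotation by 90 degrees; its eigenvalues are not real

lam1, lam2 = eigenvalues_2x2(T)
print(lam1, lam2)

# Each root makes T - lam*I singular, up to floating-point error.
assert abs(det_shifted(T, lam1)) < 1e-12
assert abs(det_shifted(T, lam2)) < 1e-12
```

For larger matrices there is no closed formula in general, so in practice "factoring" means approximating the roots numerically.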

There is a similar situation for the problem of showing that the sum (or product) of two algebraic numbers is algebraic. Here there is a non-constructive proof using “linear dependence” (which I attempted to describe in a previous post) and a constructive proof using the characteristic polynomial (which will hopefully be the subject of a future blog post). A further advantage of the determinant-based proof is that it can be used more generally to show that the sum and product of integral elements over a ring are integral. In this more general context, we no longer have linear dependence available.


  1. Sheldon Axler, Linear algebra done right. Springer 2017
  2. Evan Chen, An Infinitely Large Napkin, available online
  3. Sheldon Axler. Down with Determinants! The American Mathematical Monthly, 102(2), 139, 1995. doi:10.2307/2975348, available online

Algebro-geometric proof of Cayley-Hamilton

Here is a sketch of a proof of the Cayley-Hamilton theorem via classical algebraic geometry.

The set of n x n matrices over an algebraically closed field can be identified with the affine space \mathbb{A}^{n^2}. Let V be the subset of matrices that satisfy their own characteristic polynomial. We will prove that V is in fact all of \mathbb{A}^{n^2}. Since affine space is irreducible, it suffices to show that V is closed and V contains a non-empty open set.

Fix a matrix M. First, observe that the coefficients of the characteristic polynomial are polynomials in the entries in M. In particular, the condition that a matrix satisfy its own characteristic polynomial amounts to a collection of polynomials in the entries of M vanishing. This establishes that V is closed.

Let U be the set of matrices that have n distinct eigenvalues. A matrix has n distinct eigenvalues if and only if its characteristic polynomial has no double roots when it splits. This occurs if and only if the discriminant of the characteristic polynomial is nonzero. The discriminant is a polynomial in the coefficients of the characteristic polynomial. Thus the condition that a matrix have n distinct eigenvalues amounts to a polynomial in the entries of M not vanishing. Thus U is open.

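Finally, we have to show that U is non-empty and U \subseteq V. An algebraically closed field is infinite, so there is a diagonal matrix with n distinct diagonal entries, and U is non-empty. It is easy to check that a diagonal matrix satisfies its own characteristic polynomial. A matrix with n distinct eigenvalues is diagonalizable, so the general result follows from the fact that the determinant, and thus the characteristic polynomial, is basis-invariant.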
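The last step can be illustrated on a concrete diagonalizable matrix. This is a sketch under assumptions made here: the diagonal matrix `D`, the change of basis `P` (chosen with an integer inverse so that all arithmetic is exact), and the helper names are all illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def shifted(m, lam):
    """Return m - lam * I."""
    n = len(m)
    return [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

eigenvalues = [2, -1, 5]  # n distinct eigenvalues
D = [[eigenvalues[i] if i == j else 0 for j in range(3)] for i in range(3)]

# A change of basis with an integer inverse.
P = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
P_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(P, P_inv) == I

# M is similar to D, so it has the same characteristic polynomial,
# namely (x - 2)(x + 1)(x - 5) up to sign.
M = matmul(matmul(P, D), P_inv)

# Plugging M into that polynomial means multiplying the factors M - lam * I.
product = I
for lam in eigenvalues:
    product = matmul(product, shifted(M, lam))

zero = [[0] * 3 for _ in range(3)]
assert product == zero
print(M)
```

For the diagonal matrix D itself the product vanishes because each factor D - \lambda_i I has a zero in the i-th diagonal position; conjugating by P carries this over to M.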

I learned this from