Trace is the derivative of determinant

A question I always had when learning linear algebra is, “what does the trace of a matrix mean conceptually?” For example, the determinant of a matrix is, roughly speaking, the factor by which the matrix expands the volume. The conceptual meaning of trace is not as straightforward, but one way to think about it is

trace is the derivative of determinant at the identity.

Roughly you can think of this in the following way. If you start at the identity matrix and move a tiny step in the direction of M, say M\epsilon where \epsilon is a tiny number, then the determinant changes approximately by \text{tr}(M) times \epsilon. In other words, \det(1 + M\epsilon) \approx 1 + \text{tr}(M)\epsilon. Here 1 stands for the identity matrix.
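Before setting this up precisely, here is a quick numerical sanity check of the approximation (a sketch in NumPy; the particular matrix, seed, and \epsilon are arbitrary choices, not part of the argument):

```python
import numpy as np

# Check that det(I + eps*M) is close to 1 + tr(M)*eps for a random M and small eps.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
eps = 1e-6

lhs = np.linalg.det(np.eye(4) + eps * M)
rhs = 1 + np.trace(M) * eps
print(lhs, rhs)          # agree up to terms of order eps**2
print(abs(lhs - rhs))    # roughly of size eps**2
```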

One can be very precise about what it means to take the “derivative” of the determinant, so let me do some setup. Let K be either \mathbb{R} or \mathbb{C} (so we are working with real or complex Lie groups; but of course, everything makes sense for algebraic groups over arbitrary fields). Then there is a morphism of Lie groups called the determinant \det: \text{GL}_n(K) \to K^\times, given by sending a matrix to its determinant. Since we are restricting to invertible matrices, the determinants are nonzero. To check that this is really a morphism of Lie groups (i.e. both a smooth map and a homomorphism of groups), note that the determinant map is a polynomial map in the entries of the matrix (and therefore smooth) and is a group homomorphism by the property that \det(AB)=\det(A)\det(B).

Now, given any smooth map of manifolds f which maps a point p \mapsto f(p), there is an induced linear map from the tangent space at p to the tangent space at f(p), called the derivative of f at p. In particular, if f is a Lie group homomorphism, then it maps the identity point to the identity point, and the derivative at the identity is furthermore a homomorphism of Lie algebras. What this means is that, in addition to being a linear map, it preserves the bracket pairing.

In the case of \text{GL}_n, the Lie algebra at the identity matrix is called \mathfrak{gl}_n. We can think of it as consisting of all n \times n matrices, and the bracket operation is defined by [A, B] = AB - BA. The Lie algebra of K^\times at 1 consists of the elements of K; since K^\times is abelian, the bracket is trivial.

The main claim, which I will prove subsequently, is that this map \mathfrak{gl}_n(K) \to K, the derivative of the determinant at the identity, is actually the trace. That is, it sends a matrix to its trace, the sum of the entries on the diagonal. Note that since it is a homomorphism of Lie algebras, it preserves the bracket, and we recover the familiar property of trace \text{tr}(AB - BA) = 0, so \text{tr}(AB)=\text{tr}(BA).

We can find the derivative of a smooth map on \text{GL}_n(K) directly, since it is an open subset of a vector space. Let \phi be a matrix; then the derivative at the identity evaluated at \phi is

    \[\lim_{t \to 0} \frac{\det(1+t\phi) - 1}{t}.\]


\det(1+t\phi) is a polynomial in t, and the number we’re looking for is the coefficient of the t term.

We have

    \[\det(1 + t\phi) (e_1 \wedge \cdots \wedge e_n) = (e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge \cdots \wedge (e_n+\phi(e_n)t).\]

Just to get a concrete idea of what this expands to, let’s look when n=2. Then

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)=e_1\wedge e_2 + (\phi(e_1)\wedge e_2 + e_1 \wedge \phi(e_2))t + (\phi(e_1)\wedge \phi(e_2)) t^2.\]

When n=3,

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge(e_3+\phi(e_3)t)\]

    \[=e_1\wedge e_2\wedge e_3\]

    \[+ (\phi(e_1)\wedge e_2 \wedge e_3 + e_1 \wedge \phi(e_2) \wedge e_3 + e_1 \wedge e_2 \wedge \phi(e_3))t\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge e_3 + \phi(e_1)\wedge e_2 \wedge \phi(e_3) + e_1\wedge \phi(e_2) \wedge \phi(e_3)) t^2\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge \phi(e_3)) t^3.\]

In particular, the coefficient of t is \text{tr}(\phi). (In fact, see if you can convince yourself that the coefficient of t^i is \text{tr}(\wedge^i \phi).)
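If you would rather not expand wedge products by hand, here is a small symbolic check (a sketch in SymPy; the generic 3 \times 3 matrix is just an illustrative choice) that the coefficient of t is the trace and the coefficient of t^2 is the sum of the principal 2 \times 2 minors, i.e. \text{tr}(\wedge^2 \phi):

```python
import sympy as sp

# Symbolic check of the expansion above for a generic 3x3 matrix phi:
# det(1 + t*phi) = 1 + tr(phi) t + tr(wedge^2 phi) t^2 + det(phi) t^3,
# where tr(wedge^2 phi) equals the sum of the principal 2x2 minors.
t = sp.symbols('t')
phi = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'a{i}{j}'))

expr = sp.expand((sp.eye(3) + t * phi).det())
c1 = expr.coeff(t, 1)
c2 = expr.coeff(t, 2)

print(sp.simplify(c1 - phi.trace()) == 0)   # True: coefficient of t is the trace
minors2 = sum(phi.extract([i, j], [i, j]).det() for i, j in [(0, 1), (0, 2), (1, 2)])
print(sp.simplify(c2 - minors2) == 0)       # True: coefficient of t^2 is tr(wedge^2 phi)
```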

See some discussion of the meaning of trace.

Acknowledgements: Thanks to Ben Wormleighton for originally telling me the slogan “trace is the derivative of determinant”, and for teaching me about Lie groups and Lie algebras.

To add: discussion of Jacobi’s formula, exponential map

On quotient rings

In this post I will talk about how to compute the product and tensor product of quotient rings R/I and R/J. This sort of thing is usually left as an exercise (especially the first Corollary) and not proved in full generality in algebra courses, although it is not hard.

In all that follows R is a commutative ring with identity and I and J are ideals of R.

Lemma: If I \subseteq J, there is a natural map R/I \to R/J.

Proposition: The natural map R/I \otimes R/J \to R/(I+J) is an isomorphism of R-algebras.

Proof: To see surjectivity, notice that 1 generates R/(I+J) as an R-module, and 1 \mapsto 1, since the map is a ring homomorphism. To see injectivity, notice that every element

    \[\sum_{i=1}^n [a_i]\otimes[b_i] \in R/I \otimes R/J \ \ (a_i, b_i \in R)\]

is equal to the pure tensor [c]\otimes 1 = 1 \otimes [c], where c =\sum a_ib_i. If [c] \otimes 1 \mapsto 0, then c \in I+J, so c=i+j with i\in I, j\in J. Then [c]\otimes1=[i]\otimes1+[j]\otimes1=0+1\otimes[j]=0+0=0.

Corollary: \mathbb{Z}/m\otimes\mathbb{Z}/n\cong\mathbb{Z}/(\gcd(m,n)).

Proposition (Chinese Remainder Theorem): The natural map R/(I\cap J) \to R/I\times R/J is injective. If I+J=(1), it is also surjective, and thus an isomorphism.

Proof: To see injectivity, notice that if [c]\mapsto(0,0), then c\in I and c\in J, so c\in I\cap J, so [c]=0\in R/(I\cap J). To see surjectivity, consider ([a], [b]) \in R/I\times R/J. Since I+J=(1), we can write b-a=i+j for some i\in I, j\in J. Then a+i=b-j, so a+i\equiv a \pmod{I} and a+i\equiv b \pmod{J}; that is, a+i \mapsto ([a],[b]).

Corollary: If \gcd(m,n)=1, then \mathbb{Z}/m\times \mathbb{Z}/n\cong\mathbb{Z}/mn. In particular, by applying this repeatedly, for distinct primes p_1, \dots, p_n we have \mathbb{Z}/p_1\dots p_n \cong \mathbb{Z}/p_1 \times \cdots \times \mathbb{Z}/p_n.
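As a brute-force illustration of the corollary (not a proof), one can check for small moduli that the map r \mapsto (r \bmod m, r \bmod n) is a bijection exactly when \gcd(m,n)=1; here is a quick Python sketch:

```python
from math import gcd

# When gcd(m, n) = 1, the map r -> (r mod m, r mod n) from Z/mn to Z/m x Z/n is a bijection.
def crt_map_is_bijective(m, n):
    images = {(r % m, r % n) for r in range(m * n)}
    return len(images) == m * n

for m, n in [(4, 9), (5, 12), (6, 10)]:
    print(m, n, gcd(m, n), crt_map_is_bijective(m, n))
# (4, 9) and (5, 12) are coprime, so the map is bijective; (6, 10) is not, and it fails.
```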

Also, note the following fact:

Proposition: If I + J = (1), then I \cap J = IJ.

Proof: Clearly IJ \subseteq I \cap J. To go the other way, note that I + J = (1) means that there exist i \in I, j \in J, and m, n \in R such that mi + nj = 1. So, consider an element a \in I \cap J. Then we have a = a(mi+nj) = m(ia) + n(aj). Since a \in J, m(ia) \in IJ, and since a \in I, n(aj) \in IJ. So a \in IJ.

Remark: One can also think of this in terms of Tor: \text{Tor}_1(R/I, R/J) = (I\cap J)/IJ, and when I+J = (1) this Tor group vanishes.

Determinant of transpose

An important fact in linear algebra is that, given a matrix A, \det A = \det {}^tA, where {}^tA is the transpose of A. Here I will prove this statement via explicit computation, and I will try to do this as cleanly as possible. We may define the determinant of A by

    \[\det A = \sum_{\sigma \in S_n} \text{sgn}(\sigma)\, A_{\sigma(1), 1} \cdots A_{\sigma(n), n}.\]

Here S_n is the set of permutations of the set \{ 1, \dots n \}, and sgn(\sigma) is the sign of the permutation \sigma. This formula is derived from the definition of the determinant via exterior algebra. One can check by hand that this gives the familiar expressions for the determinant when n = 2, 3.

Now, since ({}^tA)_{i, j} = A_{j, i}, we have

    \[\det {}^tA = \sum_{\sigma \in S_n} \text{sgn}(\sigma)\, A_{1, \sigma(1)} \cdots A_{n, \sigma(n)} = \sum_{\sigma \in S_n} \text{sgn}(\sigma)\, A_{\sigma^{-1}(\sigma(1)), \sigma(1)} \cdots A_{\sigma^{-1}(\sigma(n)), \sigma(n)}.\]

The crucial observation here is that we may rearrange the product inside the summation so that the second indices are increasing. Let b = \sigma(a). Then the product inside the summation is

    \[\prod_{1 \leq a \leq n} A_{\sigma^{-1}(\sigma(a)), \sigma(a)} = \prod_{1 \leq b \leq n} A_{\sigma^{-1}(b), b}.\]

Combining this with the fact that sgn(\sigma) = sgn(\sigma^{-1}), our expression simplifies to

    \[\sum_{\sigma \in S_n} \text{sgn}(\sigma^{-1})\, A_{\sigma^{-1}(1), 1} \cdots A_{\sigma^{-1}(n), n}.\]

Since \sigma \mapsto \sigma^{-1} is a bijection of S_n, summing over all \sigma is the same as summing over all \sigma^{-1}, so this last expression equals \det A. So \det A = \det {}^tA. \square
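For the skeptical, here is a brute-force check (a Python sketch; it enumerates all n! permutations, so it is only sensible for small n) that the permutation-sum formula above agrees with the usual determinant and is unchanged by transposing:

```python
from itertools import permutations
import numpy as np

# The sign of a permutation is computed from its inversion count.
def sign(perm):
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

# Determinant via the permutation-sum formula: sum over sigma of sgn(sigma) * prod A[sigma(i), i].
def det_by_permutation_sum(A):
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[i], i] for i in range(n)])
               for p in permutations(range(n)))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
print(det_by_permutation_sum(A), det_by_permutation_sum(A.T), np.linalg.det(A))  # all three agree
```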

I wonder if there is a more conceptual proof of this? (By “conceptual”, I mean a proof based on exterior algebra, bilinear pairings, etc…)

A special case of Krull’s intersection theorem

In this post I will prove a special case of Krull’s intersection theorem, which can be proved without invoking the Artin-Rees lemma. This result is useful in the study of discrete valuation rings. The following proof is from Serre’s Local Fields.

Proposition: Let R, \frak{m} be a noetherian local domain where \frak{m} = (\pi). Then \bigcap_{n \geq 0} \frak{m}^n = 0.

Proof: Suppose y \in \bigcap_{n \geq 0} \frak{m}^n. Then for each n \geq 0 we can write y = \pi^n x_n for some x_n \in R. So for all n, \pi^nx_n = \pi^{n+1}x_{n+1}, and since R is a domain we may cancel \pi^n to get x_n = \pi x_{n+1}.

Now consider the ascending chain (x_0) \subseteq (x_1) \subseteq \cdots. Since R is noetherian, this eventually stabilizes, so for some n, x_{n+1} = cx_n. Thus x_{n+1} = c\pi x_{n+1}, so (1-c\pi)x_{n+1} = 0. But 1-c\pi is a unit (since c\pi \in \frak{m} and R is local), so x_{n+1} = 0, and therefore y = \pi^{n+1}x_{n+1} = 0. \square

This theorem holds more generally even if R is not assumed to be a domain, but the proof is more complicated (though still along the same lines).

Proposition: Let R,  \frak{m} be a noetherian local ring where \frak{m} = (\pi) and \pi is not nilpotent. Then \bigcap_{n \geq 0} \frak{m}^n = 0.

Proof: Let \frak{u} be the ideal of elements that kill some power of \pi. We will use variables u_1, u_2, \dots to refer to elements of \frak{u}. Since R is noetherian, \frak{u} must be finitely generated, so all elements of \frak{u} kill \pi^N for some fixed N.

Now suppose y \in \bigcap_{n \geq 0} \frak{m}^n, and write y = \pi^n x_n for each n. Then \pi^nx_n = \pi^{n+1}x_{n+1}, so \pi^n(x_n - \pi x_{n+1}) = 0. Thus x_n - \pi x_{n+1} \in \frak{u}.

Consider the ascending chain \frak{u} + (x_0) \subseteq \frak{u} + (x_1) \subseteq \cdots. Since R is noetherian it must eventually stabilize, so for some n, x_{n+1} can be written as u_1 + cx_n. But recall that x_n = u_2 + \pi x_{n+1}. So x_{n+1} = u_1 + c(u_2 + \pi x_{n+1}) = u_3 + c\pi x_{n+1}, so (1-c\pi)x_{n+1} = u_3. Since \frak{m} = (\pi) and R is local, 1-c\pi is a unit, so x_{n+1} \in \frak{u}. Taking n large enough that n+1 \geq N, we get \pi^{n+1}x_{n+1} = 0, so y = 0. \square

More on algebraic numbers

A complex number is algebraic if it is the root of some polynomial P(x) with rational coefficients. \sqrt{2} is algebraic (e.g. the polynomial x^2 -2); i is algebraic (e.g. the polynomial x^2 + 1); \pi and e are not. (A complex number that is not algebraic is called transcendental.)

Previously, I wrote some blog posts (see here and here) which sketched a proof of the fact that the sum and product of algebraic numbers are also algebraic (and more). This is not an obvious fact, and proving it requires some amount of field theory and linear algebra. Nevertheless, the ideas in the proof lead the way to a better understanding of the structure of the algebraic numbers and towards the theorems of Galois theory. In that post, I tried to introduce the minimum algebraic machinery necessary in order to state and prove the main result; I don’t think I entirely succeeded.

However, there is a more direct approach, one which also allows us to find a polynomial that has \alpha + \beta (or \alpha\beta) as a root, for algebraic numbers \alpha and \beta. That is the subject of this post. Instead of trying to formally prove the result, I will illustrate the approach for a specific example: showing \sqrt{2} + \sqrt{3} is algebraic.

This post will assume familiarity with the characteristic polynomial of a matrix, and not much more. (In particular, none of the algebra from the previous posts)

A case study

Define the set \mathbb{Q}(\sqrt{2}, \sqrt{3}) = \{a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6} \ | \ a, b, c, d \in \mathbb{Q} \}. We will think of this as a four-dimensional vector space, where the scalars are elements of \mathbb{Q}, and the basis is 1, \sqrt{2}, \sqrt{3}, \sqrt{6}. Every element can be uniquely expressed as a(1) + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}, for a, b, c, d \in \mathbb{Q}.

We’re trying to prove \sqrt{2} + \sqrt{3} is algebraic. Consider the linear transformation T on \mathbb{Q}(\sqrt{2}, \sqrt{3}) defined as “multiply by \sqrt{2} + \sqrt{3}“. In other words, consider the linear map T: \mathbb{Q}(\sqrt{2}, \sqrt{3}) \to \mathbb{Q}(\sqrt{2}, \sqrt{3}) which maps v \mapsto (\sqrt{2} + \sqrt{3})v. This is definitely a linear map, since it satisfies T(v + w) = Tv + Tw and T(cv) = c(Tv). In particular, we should be able to represent it by a matrix.

What is the matrix of T? Well, T(1) = \sqrt{2} + \sqrt{3}, T(\sqrt{2}) = 2 + \sqrt{6}, T(\sqrt{3}) = 3 + \sqrt{6}, and T(\sqrt{6}) = 3\sqrt{2} + 2\sqrt{3}. Thus we can represent T by the matrix

\begin{bmatrix}0 & 2 & 3 & 0 \\1 & 0 & 0 & 3 \\1 & 0 & 0 & 2 \\0 & 1 & 1 & 0\end{bmatrix}.

Now, the characteristic polynomial \chi_T(x) of this matrix, which is defined as \text{det}(T-xI), is x^4 - 10x^2 + 1, which has \sqrt{2} + \sqrt{3} as a root. Thus \sqrt{2} + \sqrt{3} is indeed algebraic.
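Here is the same computation done in SymPy (a sketch; the matrix is the one written above, entered by hand):

```python
import sympy as sp

# The matrix of "multiply by sqrt(2)+sqrt(3)" in the basis 1, sqrt(2), sqrt(3), sqrt(6),
# its characteristic polynomial, and a check that sqrt(2)+sqrt(3) is a root.
x = sp.symbols('x')
T = sp.Matrix([
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
])
chi = T.charpoly(x).as_expr()
print(chi)                                            # x**4 - 10*x**2 + 1
print(sp.expand(chi.subs(x, sp.sqrt(2) + sp.sqrt(3))))  # 0
```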

Why it works

The basic reason is the Cayley-Hamilton theorem. It tells us that T should satisfy the characteristic polynomial: T^4 - 10T^2 + I is the zero matrix. But the matrix we get when plugging T into \chi_T(x) should correspond to multiplication by \chi_T(\sqrt{2} + \sqrt{3}); thus \chi_T(\sqrt{2} + \sqrt{3}) = 0.

Note that I chose \sqrt{2} + \sqrt{3} randomly. I could have chosen any element of \mathbb{Q}(\sqrt{2}, \sqrt{3}) and used this method to find a polynomial with rational coefficients having that element as a root.

At the end of the day, to prove that such a method always works requires the field theory we have glossed over: what is \mathbb{Q}(\alpha, \beta) in general, why is it finite-dimensional, etc. This constructive method, which assumes the Cayley-Hamilton theorem, only replaces the non-constructive “linear dependence” argument in Proposition 4 of the original post.

Two proofs complex matrices have eigenvalues

Today I will briefly discuss two proofs that every matrix T over the complex numbers (or more generally, over an algebraically closed field) has an eigenvalue. Notice that this is equivalent to finding a complex number \lambda such that T - \lambda I has nontrivial kernel. The first proof uses facts about “linear dependence” and the second uses determinants and the characteristic polynomial. The first proof is drawn from Axler’s textbook [1]; the second is the standard proof.

Proof by linear dependence

Let p(x) = a_nx^n + \dots a_1x + a_0 be a polynomial with complex coefficients. If T is a linear map, p(T) = a_nT^n + \dots a_1 T + a_0I. We think of this as “p evaluated at T”.

Exercise: Show (pq)(T) = p(T)q(T).

Proof: Pick a nonzero vector v \in V, where V is the n-dimensional space on which T acts. Consider the sequence of vectors v, Tv, T^2v, \dots T^nv. This is a list of n+1 vectors, so they must be linearly dependent. Thus there exist constants a_0, \dots a_n \in \mathbb{C}, not all zero, such that a_nT^nv + a_{n-1}T^{n-1}v + \dots a_1Tv + a_0v = (a_nT^n + a_{n-1}T^{n-1} + \dots a_1T + a_0I)v = 0.

Define p(x) = a_nx^n + \dots a_1x + a_0. This polynomial is not identically zero, and since v \neq 0 it cannot be a nonzero constant either, so it has some positive degree m; we can factor p(x) = a_m(x-\lambda_1)\dots (x-\lambda_m) with a_m \neq 0. By the Exercise, this implies a_m(T - \lambda_1I)\dots (T-\lambda_mI)v = 0. So at least one of the maps T - \lambda_iI has a nontrivial kernel, so T has an eigenvalue. \square
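Here is a numerical sketch of this argument in NumPy (an illustration, not Axler's argument verbatim: the random matrix, the SVD-based way of finding a dependence, and the singular-value check at the end are my own choices):

```python
import numpy as np

# Take v, Tv, ..., T^n v, find a nontrivial linear dependence, factor the resulting
# polynomial, and check that one of its roots lambda makes T - lambda*I singular.
rng = np.random.default_rng(2)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n)

# Columns are v, Tv, ..., T^n v: n+1 vectors in an n-dimensional space, hence dependent.
K = np.column_stack([np.linalg.matrix_power(T, k) @ v for k in range(n + 1)])
_, _, Vh = np.linalg.svd(K)
a = Vh[-1].conj()              # coefficients a_0, ..., a_n of a dependence: K @ a is ~0

roots = np.roots(a[::-1])      # roots of a_n x^n + ... + a_1 x + a_0
# At least one root is an eigenvalue: the smallest singular value of T - lambda*I is ~0.
print(min(np.linalg.svd(T - lam * np.eye(n), compute_uv=False)[-1] for lam in roots))
```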

Proof by the characteristic polynomial

Proof: We want to show that there exists some \lambda such that T - \lambda I has nontrivial kernel: in other words, that T - \lambda I is singular. A matrix is singular if and only if its determinant is zero. So, let \chi_T(x) = \det(T - xI); this is a nonconstant polynomial in x, called the characteristic polynomial of T. Now, every nonconstant polynomial has a complex root, say \lambda. This implies \det(T - \lambda I) = 0, so T - \lambda I is singular, and T has an eigenvalue. \square
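The determinant-based proof translates almost line-by-line into a computation; here is a short NumPy sketch (the random matrix is again an arbitrary choice):

```python
import numpy as np

# Form the characteristic polynomial, take any complex root, and check that
# T - lambda*I is (numerically) singular.
rng = np.random.default_rng(3)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

char_coeffs = np.poly(T)          # coefficients of det(x*I - T)
lam = np.roots(char_coeffs)[0]    # any root is an eigenvalue
print(abs(np.linalg.det(T - lam * np.eye(n))))   # ~0
```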

Thoughts

To me, it seems like the determinant-based proof is more straightforward, although it requires more machinery. Also, the determinant-based proof is “constructive”, in that we can actually find all the eigenvalues by factoring the characteristic polynomial. On the subject of determinant-based vs determinant-free approaches to linear algebra, see Axler’s article “Down with Determinants!” [3].

There is a similar situation for the problem of showing that the sum (or product) of two algebraic numbers is algebraic. Here there is a non-constructive proof using “linear dependence” (which I attempted to describe in a previous post) and a constructive proof using the characteristic polynomial (which will hopefully be the subject of a future blog post). A further advantage of the determinant-based proof is that it can be used more generally to show that the sum and product of integral elements over a ring are integral. In this more general context, we no longer have linear dependence available.

References

  1. Sheldon Axler, Linear Algebra Done Right, Springer, 2017.
  2. Evan Chen, An Infinitely Large Napkin, available online.
  3. Sheldon Axler, “Down with Determinants!”, The American Mathematical Monthly, 102(2), p. 139, 1995. doi:10.2307/2975348, available online.

Hilbert’s Basis Theorem

Here is a proof of Hilbert’s Basis Theorem I thought of last night.

Let R be a noetherian ring. Consider an ideal J in R[x]. Let I_i be the ideal in R generated by the leading coefficients of the polynomials of degree i in J. Notice that I_i \subseteq I_{i+1}: if P \in J has degree i and leading coefficient a, then xP \in J has degree i+1 and the same leading coefficient a. Thus we have an ascending chain I_0 \subseteq I_1 \subseteq I_2 \subseteq \dots, which must stabilize, since R is noetherian. Suppose it stabilizes at i = n, so I_n = I_{n+1} = \dots.

Now for each I_i (for i = 0, \dots, n) choose a finite set of generators (which we can do since R is noetherian). For each generator, choose a representative polynomial in J of degree i with that leading coefficient. This gives us a finite collection of polynomials: define J_i to be the ideal of R[x] generated by these polynomials.

Let J' = J_0 + J_1 + \dots + J_n. I claim J = J'. Assume for the sake of contradiction that there is a polynomial P of minimal degree (say i) which is in J but not in J'. If i \leq n, there is an element P' \in J' of degree i with the same leading coefficient as P, so P - P' lies in J but not in J' and has degree smaller than i: contradiction. If i > n, then the leading coefficient of P lies in I_n, so there is an element P' of J_n of degree n with the same leading coefficient as P. Thus P - x^{i-n}P' lies in J but not in J' and has degree smaller than i: contradiction.

Thus J = J'. Since J is therefore finitely generated, this proves R[x] is noetherian.

Algebro-geometric proof of Cayley-Hamilton

Here is a sketch of proof of the Cayley-Hamilton theorem via classical algebraic geometry.

The set of n \times n matrices over an algebraically closed field can be identified with the affine space \mathbb{A}^{n^2}. Let V be the subset of matrices that satisfy their own characteristic polynomial. We will prove that V is in fact all of \mathbb{A}^{n^2}. Since affine space is irreducible, it suffices to show that V is closed and V contains a non-empty open set.

Fix a matrix M. First, observe that the coefficients of the characteristic polynomial are polynomials in the entries in M. In particular, the condition that a matrix satisfy its own characteristic polynomial amounts to a collection of polynomials in the entries of M vanishing. This establishes that V is closed.

Let U be the set of matrices that have n distinct eigenvalues. A matrix has n distinct eigenvalues if and only if its characteristic polynomial has no double roots when it splits. This occurs if and only if the discriminant of the characteristic polynomial is nonzero. The discriminant is a polynomial in the coefficients of the characteristic polynomial. Thus the condition that a matrix have n distinct eigenvalues amounts to a polynomial in the entries of M not vanishing. Thus U is open.

Finally, we have to show U \subseteq V (note that U is non-empty: take, for example, a diagonal matrix with distinct diagonal entries). It is easy to check the claim for diagonal matrices. Any matrix in U has n distinct eigenvalues and is therefore diagonalizable, so the general case follows from the fact that the characteristic polynomial (like the determinant) is invariant under change of basis.
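As a sanity check of the statement (not of the proof), here is a small SymPy computation plugging a sample matrix into its own characteristic polynomial; the particular matrix is an arbitrary choice:

```python
import sympy as sp

# Cayley-Hamilton check: evaluating the characteristic polynomial at M gives the zero matrix.
x = sp.symbols('x')
M = sp.Matrix([[1, 2, 0], [3, 1, 4], [0, 1, 1]])
coeffs = M.charpoly(x).all_coeffs()       # leading coefficient first
result = sp.zeros(3, 3)
for c in coeffs:
    result = result * M + c * sp.eye(3)   # Horner evaluation of the polynomial at M
print(result)                             # the 3x3 zero matrix
```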

I learned this from https://aergus.net/blog/posts/using-zariski-topology-to-prove-the-cayley-hamilton-theorem.html/

On algebraic numbers

A complex number is algebraic if it is the root of some polynomial P(x) with rational coefficients. \sqrt{2} is algebraic (e.g. the polynomial x^2 -2); i is algebraic (e.g. the polynomial x^2 + 1); \pi and e are not. A complex number that is not algebraic is called transcendental.

Is the sum of algebraic numbers always algebraic? What about the product of algebraic numbers? For example, given that \sqrt{2} and \sqrt{3} are algebraic, how do we know \alpha = \sqrt{2} + \sqrt{3} is also algebraic?

We can try to repeatedly square the equation x = \sqrt{2} + \sqrt{3}. This gives us x^2 = 2 + 3 + 2\sqrt{6}. Then isolating the radical, we have x^2 - 5 = 2\sqrt{6}. Squaring again, we get x^4 - 10x^2 + 25 = 24, so \alpha is a root of x^4 - 10x^2 + 1. This is in fact the unique monic polynomial of minimum degree that has \alpha as a root (called the minimal polynomial of \alpha) which shows \alpha is algebraic. But a sum like \sqrt{2} + \sqrt{3} + \sqrt{5} + i would probably have a minimal polynomial of much greater degree, and it would be much more complicated to construct this polynomial and verify the number is algebraic.
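For what it's worth, computer algebra systems can carry out this kind of computation; here is a short SymPy sketch reproducing the polynomial above (and, if you are patient, handling the longer sum as well):

```python
import sympy as sp

# The minimal polynomial of sqrt(2) + sqrt(3), as computed above by repeated squaring.
x = sp.symbols('x')
print(sp.minimal_polynomial(sp.sqrt(2) + sp.sqrt(3), x))   # x**4 - 10*x**2 + 1

# The same function handles the longer sum mentioned above (this takes a moment,
# and the resulting polynomial has degree 16).
print(sp.minimal_polynomial(sp.sqrt(2) + sp.sqrt(3) + sp.sqrt(5) + sp.I, x))
```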

It is also increasingly difficult to construct polynomials for numbers like \sqrt{2} + \sqrt[3]{3}. And, apparently there also exist algebraic numbers that cannot be expressed as radicals at all. This further complicates the problem.

In fact, the sum and product of algebraic numbers are always algebraic – indeed, any number which can be obtained by adding, subtracting, multiplying and dividing algebraic numbers is algebraic. This means that a number like

    \[\frac{\sqrt{2} + i\sqrt[3]{25} - 2 + \sqrt{3}}{15 - 3i + 4e^{5i\pi / 12}}\]

is algebraic – there is some polynomial which has it as a root. But we will prove this result non-constructively; that is, we will prove that such a number must be algebraic, without providing an explicit process to actually obtain the polynomial. To establish this result, we will try to look for a deeper structure in the algebraic numbers, which is elucidated, perhaps surprisingly, using the tools of linear algebra.

Definition: Let S be a set of complex numbers that contains 0. We say S is an abelian group if for all x, y \in S, x + y \in S and x - y \in S. In other words, S is closed under addition and subtraction.

Some examples: \mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}, the Gaussian integers \mathbb{Z}[i] (i.e. the set of expressions a+bi where a and b are integers)

Definition: An abelian group S is a field if it contains 1 and for all x, y \in S, xy \in S, and if y \neq 0, x/y \in S. In other words, S is closed under multiplication and division.

We generally use the letters k, F, E for arbitrary fields. Of the previous examples, only \mathbb{Q}, \mathbb{R}, \mathbb{C} are fields.

Exercise: Show that if k is a field, \mathbb{Q} \subseteq k.

Exercise: Show that the set \{a + b\sqrt{2} \ | \ a, b \in \mathbb{Q} \} is a field. We call this field \mathbb{Q}(\sqrt{2}). (Hint: rationalize the denominator).

Exercise: Describe the smallest field which contains both \sqrt{2} and \sqrt{3} (this is called \mathbb{Q}(\sqrt{2}, \sqrt{3})).

Generally if k is a field and x_1, \dots x_n some complex numbers, we will denote the smallest field that contains all of k and the elements x_1, \dots x_n as k(x_1, \dots x_n).

Definition: Let k be a field. A k-vector space is an abelian group V such that if c \in k, x \in V, then cx \in V. Intuitively, the elements of V are closed under scaling by k. (We also sometimes use the phrase “V is a vector space over k “)

Examples: \mathbb{Q}(\sqrt{2}, \sqrt{3}) and \mathbb{Q}(\sqrt{2}) are both \mathbb{Q}-vector spaces. In fact, \mathbb{Q}(\sqrt{2}, \sqrt{3}) is also a \mathbb{Q}(\sqrt{2}) vector space.

With this language in place, we can state the main goal of this post as follows:

Theorem: If \alpha_1, \dots \alpha_n are algebraic numbers and x \in \mathbb{Q}(\alpha_1, \dots \alpha_n), then x is algebraic.

Note how this encompasses our previous claim that any number obtained by adding, subtracting, multiplying and dividing algebraic numbers is algebraic. For example, we can deduce the giant fraction above is algebraic by applying the theorem to \mathbb{Q}(\sqrt{2}, i, \sqrt[3]{25}, \sqrt{3}, e^{5i\pi / 12}).

Definition: A field extension of a field k is just a field F that contains k. We will write this as “F \supseteq k is a field extension”.

Exercise: If F \supseteq k is a field extension, then F is a k-vector space

Consider a field like \mathbb{Q}(\sqrt{2}). Not only is it a field, but it is also a \mathbb{Q}-vector space – its elements can be scaled by rational numbers. We want to adapt concepts from ordinary linear algebra to this setting: we think of \mathbb{Q}(\sqrt{2}) as a two-dimensional vector space, with basis 1 and \sqrt{2} – every element can be uniquely written as a \mathbb{Q}-linear combination of 1 and \sqrt{2}. Similarly, we would like to think of \mathbb{Q}(\sqrt{2}, \sqrt{3}) as a \mathbb{Q}-vector space with basis 1, \sqrt{2}, \sqrt{3}, \sqrt{6}. Note that if we regard \mathbb{Q}(\sqrt{2}, \sqrt{3}) as a \mathbb{Q}(\sqrt{2})-vector space, its basis is 1, \sqrt{3}. This is because if an element is written as a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}, where the coefficients are rational, we can rewrite it as (a + b\sqrt{2})1 + (c + d\sqrt{2})\sqrt{3}, now with coefficients in \mathbb{Q}(\sqrt{2}).

On the other hand, consider \mathbb{Q}(\pi). Since \pi is not algebraic, no linear combination (with rational coefficients) of the numbers 1, \pi, \pi^2, \pi^3, \dots equals 0 without all the coefficients being zero. This makes them linearly independent – and makes \mathbb{Q}(\pi) an infinite-dimensional vector space.

This finite-versus-infinite dimensionality difference will lie at the crux of our argument. Here is a rough summary of the argument to prove the Theorem:

  1. If we add an algebraic number to a field, we get a finite-dimensional vector space over that field.
  2. By repeatedly adding algebraic numbers to \mathbb{Q}, the resulting field will still be a finite-dimensional \mathbb{Q}-vector space.
  3. Any element of a field which is a finite-dimensional \mathbb{Q}-vector space must be algebraic.

From here on, we will assume that the reader has some familiarity with ideas from linear algebra, such as the notions of linear combination, linear independence, and basis. We will only sketch proofs (maybe I will add details later). We would like to warn the reader that the proof sketches have many gaps and may be difficult to follow.

Definition: A field extension F \supseteq k is finite if there is a finite set of elements e_1, \dots e_n \in F such that every element of F can be represented as c_1e_1 + \dots c_ne_n, where all the c_i \in k. We say the elements e_1, \dots e_n generate (aka span) F over k. If, furthermore, every element can be represented uniquely in this form, then we say e_1, \dots e_n is a k-basis (or just basis) for F.

Examples: \mathbb{Q}(\sqrt{2})/\mathbb{Q} and \mathbb{Q}(\sqrt{2}, \sqrt{3})/\mathbb{Q} are finite extensions, while \mathbb{Q}(\pi)/\mathbb{Q} is not.

Exercise: Show \mathbb{C}/\mathbb{Q} is not a finite extension. (Hint: \mathbb{C} is uncountable)

Proposition 1: Every finite field extension F/k has a basis.

Proof sketch: This is a special case of a famous theorem in linear algebra. Assume e_1, \dots e_n generate F. Start with the last element. If e_n can be represented as a linear combination of e_1 \dots e_{n-1}, remove it from the list. If we removed e_n, we now have e_1 \dots e_{n-1}, and we can repeat the process with e_{n-1}. Continue this process until we cannot remove any more elements. Then (it can be checked that) we obtain a set whose elements are linearly independent. Then (it can be checked that) these form a k-basis for F. \square

For the next theorem, keep in mind the example of the extensions \mathbb{Q}(\sqrt{2})/\mathbb{Q} and \mathbb{Q}(\sqrt{2}, \sqrt{3})/\mathbb{Q}(\sqrt{2}).

Proposition 2: Suppose F \supseteq k and E \supseteq F are finite field extensions. Then E \supseteq k is a finite extension.

Proof: By Proposition 1, both these field extensions have bases. Label the F-basis of E as e_1, \dots e_n, and the k-basis of F as e'_1, \dots e'_m. Then every element of E can be uniquely represented as c_1e_1 + \dots + c_ne_n, for c_i \in F. Furthermore, each c_i can be written uniquely as c_i = a_{i, 1}e'_1 + \dots + a_{i, m}e'_m for a_{i, j} \in k. Plugging these in for the c_i shows that the mn elements of the form e_ie'_j form a k-basis for E. \square

Proposition 3: If \alpha is algebraic, then k(\alpha) \supseteq k is a finite extension.

Proof sketch: First we will show every element of k(\alpha) is a polynomial in \alpha with coefficients in k. k(\alpha) consists of all numbers that can be obtained by repeatedly adding, subtracting, multiplying and dividing \alpha and elements of k. By performing some algebraic manipulations and combining fractions, we can see that every element can be written in the form p(\alpha)/q(\alpha), where p and q are polynomials with coefficients in k (and q(\alpha) \neq 0).

Now I claim that for any polynomial q where q(\alpha) \neq 0, there exists a polynomial s such that s(\alpha) = 1/q(\alpha). Let m be the minimal polynomial of \alpha. Since q(\alpha) \neq 0 and m(\alpha) = 0, q and m are relatively prime as polynomials. So there exist polynomials r and s such that rq+sm = 1. Plugging in \alpha into the polynomials gives us r(\alpha)q(\alpha) + s(\alpha)m(\alpha) = 1. Since m(\alpha) = 0, s(\alpha) = 1/q(\alpha), as desired.

Thus any element in k(\alpha), written as p(\alpha)/q(\alpha), can be written as p(\alpha)s(\alpha) for an appropriate polynomial s; in other words, every element of k(\alpha) is a polynomial in \alpha with coefficients in k. Now, suppose the minimal polynomial of \alpha is a_nx^n + \dots a_1x + a_0. Then \alpha^n = \frac{1}{a_n}(-a_0 - a_1\alpha \dots - a_{n-1}\alpha^{n-1}). So \alpha^n (and hence all higher powers of \alpha) can be written as k-linear combinations of 1, \alpha, \dots \alpha^{n-1}. Then it can be verified that every element in k(\alpha) can be written uniquely as a k-linear combination of 1, \alpha, \dots \alpha^{n-1}. This establishes that k(\alpha)/k is a finite extension; in particular, it has the basis 1, \alpha, \dots \alpha^{n-1}. \square

Proposition 4: Suppose k \supseteq \mathbb{Q} is a finite extension. Then any x \in k is algebraic.

Proof sketch: Consider the elements 1, x, x^2, \dots. They cannot all be linearly independent, because in a finite-dimensional vector space of dimension n, any set with at least n+1 elements must be linearly dependent. So there must be some nontrivial linear combination of them which equals zero. But this simply means that for some constants a_i, not all zero, and some positive integer n, a_nx^n + \dots a_1x + a_0 = 0. Letting the a_i be the coefficients of a polynomial P, we have P(x) = 0, so x is algebraic. \square

We have developed enough theory now to prove the result.

Proof of main Theorem: Let \alpha_1, \alpha_2, \dots \alpha_n be some collection of algebraic numbers. Then by Proposition 3, \mathbb{Q}(\alpha_1, \dots \alpha_n) \supseteq \mathbb{Q}(\alpha_1, \dots \alpha_{n-1}), \dots, \mathbb{Q}(\alpha_1, \alpha_2) \supseteq \mathbb{Q}(\alpha_1), and \mathbb{Q}(\alpha_1) \supseteq \mathbb{Q} are all finite field extensions. Applying Proposition 2 repeatedly shows that \mathbb{Q}(\alpha_1, \dots \alpha_n) \supseteq \mathbb{Q} is a finite extension. Then by Proposition 4, any element of \mathbb{Q}(\alpha_1, \dots \alpha_n) is algebraic. \square

This theorem essentially says that given a collection of algebraic numbers, by repeatedly adding, subtracting, multiplying and dividing them, we can only obtain algebraic numbers. But what about radicals? For example, we have now established that \sqrt{2} + \sqrt{3} + \sqrt{5} is algebraic. Is \sqrt{\sqrt{2} + \sqrt{3} + \sqrt{5}} algebraic as well?

In fact, we can generalize our result to the following: any number obtained by repeatedly adding, subtracting, multiplying, dividing algebraic numbers, as well as taking m-th roots (for positive integers m) will be algebraic.

This is not too hard now that we have the main theorem. It suffices to show that if x is algebraic, \sqrt[m]{x} is algebraic. If x is algebraic, then there exist constants a_0, \dots a_n such that a_nx^n + \dots a_1x + a_0 = 0. This can be rewritten as a_n(\sqrt[m]{x})^{mn} + \dots a_1(\sqrt[m]{x})^m + a_0 = 0. Thus \sqrt[m]{x} is algebraic.
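Here is a tiny SymPy sketch of this substitution trick, using \sqrt{2} and m = 3 purely as an example:

```python
import sympy as sp

# If p(x) = 0, then q(y) = p(y**m) vanishes at y = x**(1/m).
x, y = sp.symbols('x y')
p = x**2 - 2                              # sqrt(2) is a root of p
m = 3
q = p.subs(x, y**m)                       # y**6 - 2
root = sp.Rational(2)**sp.Rational(1, 6)  # the real sixth root of 2, i.e. sqrt(2)**(1/3)
print(q, q.subs(y, root))                 # y**6 - 2, 0
```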


Acknowledgements: Thanks to Anton Cao and Nagaganesh Jaladanki for reviewing this article.

Odd and even permutations

A standard part of the theory of permutations is the classification of permutations into “odd” and “even” types. In this post I will develop the theory of odd and even permutations, focusing on adjacent transpositions and inversions.

Let X be a finite ordered set. A permutation of X is a rearrangement of X; formally, it is a bijection \sigma: X \to X. For simplicity, we will label the elements of X as 1, 2, \dots n, and we will represent a permutation \sigma by the list \sigma(1)\sigma(2)\dots\sigma(n). Thus if X has five elements, then the permutation that reverses X can be written as 54321.

The set of permutations of X naturally forms a group. That is, given two permutations \sigma_1 and \sigma_2, we can form their product \sigma_2 \cdot \sigma_1 (or \sigma_2\sigma_1 for short), which is defined by performing \sigma_1 first and then \sigma_2, following the convention for composition of functions. For example, the product of the permutations 13425 \cdot 21345 is 31425.

Here are some standard permutations. The identity permutation is the permutation that does nothing. A transposition is a permutation that swaps two elements. If a transposition swaps a and b, we will denote it as (a \ b). An adjacent transposition is a transposition that swaps two adjacent elements (so it is denoted (a \ a+1)).

An important property of a permutation is its inversion number. An inversion of a permutation \sigma is a pair i, j \in \{1, 2, \dots n\} where i < j such that \sigma(i) > \sigma(j). In other words, it is a pair of elements whose relative positions have been switched by the permutation. The inversion number of \sigma is the size of its set of inversions. Thus the inversion number is at most {n \choose 2} (this occurs when the permutation reverses X). A permutation has inversion number 0 if and only if it is the identity permutation. Also, a permutation has inversion number 1 if and only if it is an adjacent transposition.

A permutation is called odd if its inversion number is odd, and even if its inversion number is even. We would like to show that the product of odd and even permutations behaves like addition of odd and even numbers. To show this, we will use the following lemma:

Every permutation is a product of adjacent transpositions.

This can be proved by using insertion sort, an algorithm which effectively writes any permutation as a product of adjacent transpositions.

Here is how it works. Consider, for example, the permutation \sigma given by 23154. Let us try to sort this list. Insertion sort first moves 1 to the front by repeated swaps, first swapping the second and third positions and then the first and second positions. Then we get 12354. Then insertion sort would swap the fourth and fifth positions, and finish with 12345. Each of these swaps amounts to multiplying on the right by an adjacent transposition, so \sigma (2 3)(1 2)(4 5) is the identity; inverting, \sigma = (4 5)(1 2)(2 3). Using this process, insertion sort can write any permutation as a product of adjacent transpositions.
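Here is a short Python sketch of this procedure (the function name, the 1-indexed output, and the parity check at the end are my own choices):

```python
# Sort the one-line notation by adjacent swaps, recording each swap; reading the recorded
# transpositions in reverse writes sigma as their product (rightmost factor applied first).
def adjacent_transpositions(perm):
    """Return adjacent transpositions (i, i+1), 1-indexed, whose product is perm."""
    lst = list(perm)
    swaps = []
    # insertion sort: repeatedly swap adjacent out-of-order entries
    for i in range(1, len(lst)):
        j = i
        while j > 0 and lst[j - 1] > lst[j]:
            lst[j - 1], lst[j] = lst[j], lst[j - 1]
            swaps.append((j, j + 1))     # positions swapped, 1-indexed
            j -= 1
    return swaps[::-1]

sigma = [2, 3, 1, 5, 4]
print(adjacent_transpositions(sigma))    # [(4, 5), (1, 2), (2, 3)] for this example
inversions = sum(1 for i in range(5) for j in range(i + 1, 5) if sigma[i] > sigma[j])
print(len(adjacent_transpositions(sigma)) % 2 == inversions % 2)   # parities agree: True
```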

We need one more lemma:

If \sigma is a permutation and \rho an adjacent transposition, then \sigma and \rho\sigma have opposite parity. That is, one is odd and one is even.

Here is why. After performing \sigma, if we swap the two adjacent values a and a+1 (that is, we form \rho\sigma with \rho = (a \ a+1)), then the only pair whose inversion status can change is a, a+1, since a and a+1 have not moved relative to any other element. So either a, a+1 was an inversion and no longer is, or it was not an inversion and now is. Either way the inversion number increases or decreases by exactly one – so the parity has been switched.

So, if a permutation \sigma is written as a product of n adjacent transpositions, then its parity is equal to the parity of n. This establishes that the product of odd and even permutations behaves like addition of odd and even numbers (formally, what we are saying is that “parity” defines a homomorphism from the permutation group to \mathbb{Z}/2\mathbb{Z}).

Another interesting fact is that every transposition is odd. We can see this (without needing to count inversions) by noticing that we can obtain any transposition (a \ b) (with a < b) by repeatedly swapping adjacent elements until the element in place a is moved to place b, and then repeatedly swapping in the other direction until the element originally in place b is moved to place a. If we write this out we see that (a \ b) = (a \ a+1)(a+1 \ a+2) \dots (b-2 \ b-1)(b-1 \ b)(b-2 \ b-1) \dots (a \ a+1), which has 2(b-a)-1 terms, an odd number.

Thus, no matter how we write a given permutation as a product of transpositions, the number of transpositions used is always odd or always even. Since the identity permutation is clearly even, we have the following remarkable fact: given any list, it is impossible, after performing an odd number of swaps, to obtain the original list.

Acknowledgements: Thanks to Anton Cao for suggestions. Thanks to Jessica Meng for reminding me the correct convention for composition order.