Gluing, descent, sheaves and more

Gluing functions

Suppose \{U_i\} is a cover of a topological space X. What does it take to glue continuous functions f_i on U_i into a function on all of X?

Well, all we need is that they need to agree on intersections. If f_i|U_i \cap U_j = f_j|U_i\cap U_j for all pairs i, j, then they glue uniquely to a function g on X such that g|U_i = f_i.

Let’s reformulate this slightly. Let U be the disjoint union of the U_is. Then we have a surjective “cover” U \to X. Combine the f_is into a function on U called f. Then what we are trying to do is see if the function f on U can be “descended” to X i.e. we are trying to find functions g on X, if any, such that when such a function is pulled back to U it is f. Note here that by “is” we mean the functions are literally identical. We can also reformulate what is means for the f_is to agree on intersections. The fiber product of U with itself over X has two maps, left and right projection, down to U:

Rendered by QuickLaTeX.com

Call them p_1 and p_2. Then for the f_i to “agree on intersections” is equivalent to the statement f \circ p_1 = f \circ p_2 as functions on U \times_{X} U (think about this!). We’ll return to this later. Now, let’s step up one level, from functions to vector bundles.

Gluing vector bundles

Suppose \{U_i\} is a cover of a topological space X. What does it take to glue vector bundles (or sheaves) V_i \to U_i to a vector bundle V \to X? We also know the answer to this: transition maps satisfying the cocyle condition. More precisely, we need for each pair i, j an isomorphism of vector bundles \phi_{i,j}:V_i \to V_j restricted to U_i \cap U_j, such that for every triple i, j, k, \phi_{j,k} \circ \phi_{i,j} = \phi_{i,k} on the triple intersection U_i \cap U_j \cap U_k.

Instead of just needing the “local objects” to agree on intersections, we need data specifying how they agree; furthermore, we need the data to be mutually compatible

Let’s try to reformulate this in terms of the disjoint union of the U_is again. Let’s call U the disjoint union of the U_i and combine the V_i together as a single vector bundle V \to U. What we’re trying to do is “descend” this vector bundle over U to X.

Since we have triple intersections, we’ll need to consider the fiber product U \times_{X} U \times_{X} U. It has three projections down to U \times_{X} U (by omitting any of the three factors).

Rendered by QuickLaTeX.com

Let’s call these projections p_{12}, p_{23}, p_{31}.

Now we are ready to state the answer to the question of when does the vector bundle descend. The isomorphism on intersections is given by an isomorphism \phi: p_1^*V \to p_2^*V of vector bundles, and the cocycle condition translates to p_{12}^*(\phi) \circ p_{23}^*(\phi) = p_{31}^*(\phi)

Should we go up one more level?

Gluing categories of vector bundles

No, I’m not ready yet.

Sheaves are functors that satisfy descent

Let’s fix a base category. Consider the functor F which assigns to every base object the set of (insert adjective) functions on it. Then such a functor (presheaf) is a sheaf if, for every function on U which, along the two projections p_1, p_2: U \times U \to U, is pulled to identical functions, is the unique pullback of a function on X. In other words F(X) is the equalizer F(U) \rightrightarrows F(U \times U).

Each element of F(X) is: an element of F(U), and a truth value on F(U \times U) (the truth value is whether the “functions” agree).

Let’s move one level up. Consider the functor F which assigns to every base object the category of sheaves over it. Then such a functor is a 2-sheaf if for every sheaf on U, there is an isomorphism between the pullbacks along the two projections U \times U \to U, and these isomorphisms, pulled back to U \times U \times U (along the three projections p_{12}, p_{23}, p_{13}: U \times U \times U \to U \times U), satisfy a cocyle condition. If such a thing made sense, we might say F(X) is equivalent, as a category, to the following “2-limit of categories”:

Rendered by QuickLaTeX.com

Each object of F(X) is: an object of F(U), isomorphism of F(U\times U), and truth value on F(U\times U\times U). Each morphism of F(X) is: morphism of F(U), truth value on F(U\times U). (morphism of descent datum)

Consider the functor F which assigns to every base object the 2-category of sheaves of 1-categories over it.

Each object of F(X) is: an object of F(U), isomorphism of F(U\times U), 2-isomorphism of F(U\times U\times U), and a truth value on F(U\times U\times U\times U). A morphism of F(X) is a morphism of F(U), a 2-morphism of F(U\times U), and a truth value in F(U\times U\times U). A 2-morphism of F(X) is a 2-morphism in F(U\times U) and a truth value on F(U \times U \times U).

And so on…

Acknowledgements: Thanks to David Corwin for explaining these ideas to me.

Trace is the derivative of determinant

A question I always had when learning linear algebra is, “what does the trace of a matrix mean conceptually?” For example, the determinant of a matrix is, roughly speaking, the factor by which the matrix expands the volume. The conceptual meaning of trace is not as straightforward, but one way to think about it is

trace is the derivative of determinant at the identity.

Roughly you can think of this in the following way. If you start at the identity matrix and move a tiny step in the direction of M, say M\epsilon where \epsilon is a tiny number, then the determinant changes approximately by \text{tr}(M) times \epsilon. In other words, \det(1 + M\epsilon) \approx 1 + \text{tr}(M)\epsilon. Here 1 stands for the identity matrix.

One can be very precise about what it means to take the “derivative” of the determinant, so let me do some setup. Let K be either \mathbb{R} or \mathbb{C} (so we are working with real or complex Lie groups; but of course, everything makes sense for algebraic groups over arbitrary fields). Then there is a morphism of Lie groups called the determinant \det: \text{GL}_n(K) \to K^\cross, given by sending a matrix to its determinant. Since we are restricting to invertible matrices, the determinants are nonzero. To check that this is really a morphism of Lie groups (i.e. both a smooth map and a homomorphism of groups), note that the determinant map is a polynomial map in the entries of the matrix (and therefore smooth) and is a group homomorphism by the property that \det(AB)=\det(A)\det(B).

Now, given any smooth map of manifolds f which maps point p \mapsto f(p), there is an induced linear map on from the tangent space of p to the tangent space of f(p) called the derivative of f at p. In particular, if f is a Lie group homomorphism, then it maps the identity point to the identity point, and the derivative at the identity is furthermore a homomorphism of Lie algebras. What this means is that, in addition to being a linear map, it preserves the bracket pairing.

In the case of \text{GL}_n, the Lie algebra at the identity matrix is called \mathfrak{gl}_n. We can think of it as consisting of all n \cross n matrices, and the bracket operation is defined by [A, B] = AB - BA. The Lie algebra of K^\cross at 1 consists of the elements of K; since K^\cross is abelian, the bracket is trivial.

The main claim, which I will prove subsequently, is that this map \mathfrak{gl}_n(K) \to K, the derivative of the determinant at the identity, is actually the trace. That is, it sends a matrix to its trace, the sum of the entries on the diagonal. Note that since it is a homomorphism of Lie algebras, it preserves the bracket, and we recover the familiar property of trace \text{tr}(AB - BA) = 0, so \text{tr}(AB)=\text{tr}(BA).

We can find the derivative of a smooth map on \text{GL}_n(K) directly, since it is an open subset of a vector space. Let \phi be a matrix; then the derivative at the identity evaluated at \phi is

    \[\lim_{t \to 0} \frac{\det(1+t\phi) - 1}{t}.\]


\det(1+t\phi) is a polynomial in t, and the number we’re looking for is the coefficient of the t term.

We have

    \[\det(1 + t\phi) (e_1 \wedge \cdots \wedge e_n) = (e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge \cdots \wedge (e_n+\phi(e_n)t).\]

Just to get a concrete idea of what this expands to, let’s look when n=2. Then

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)=e_1\wedge e_2 + (\phi(e_1)\wedge e_2 + e_1 \wedge \phi(e_2))t + (\phi(e_1)\wedge \phi(e_2)) t^2.\]

When n=3,

    \[(e_1+\phi(e_1)t)\wedge(e_2+\phi(e_2)t)\wedge(e_3+\phi(e_3)t)\]

    \[=e_1\wedge e_2\wedge e_3\]

    \[+ (\phi(e_1)\wedge e_2 \wedge e_3 + e_1 \wedge \phi(e_2) \wedge e_3 + e_1 \wedge e_2 \wedge \phi(e_3))t\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge e_3 + \phi(e_1)\wedge e_2) \wedge \phi(e_3) + e_1\wedge \phi(e_2) \wedge \phi(e_3)) t^2\]

    \[+ (\phi(e_1)\wedge \phi(e_2) \wedge \phi(e_3)) t^3.\]

In particular, the coefficient of t is \text{tr}(\phi). (In fact, see if you can convince yourself that the coefficient of t^i is \text{tr}(\wedge^i \phi).)

See some discussion of the meaning of trace.

Acknowledgements: Thanks to Ben Wormleighton for originally telling me the slogan “trace is the derivative of determinant”, and for teaching me about Lie groups and Lie algebras.

To add: discussion of Jacobi’s formula, exponential map

On quotient rings

In this post I will talk about how to compute the product and tensor product of quotient rings R/I and R/J. This sort of thing is usually left as an exercise (especially the first Corollary) and not proved in full generality in algebra courses, although it is not hard.

In all that follows R is a commutative ring with identity and I and J are ideals of R.

Lemma: If I \subseteq J, there is a natural map R/I \to R/J.

Proposition: The natural map R/I \otimes R/J \to R/I+J is an isomorphism of R-algebras.

Proof: To see surjectivity, notice that 1 generates R/(I+J) as an R-module, and 1 \mapsto 1, since the map is a ring homomorphism. To see injectivity, notice that every element

    \[\sum_{i=1}^n [a_i]\otimes[b_i] \in R/I \otimes R/J \ \ (a_i, b_i \in R)\]

is equal to the pure tensor [c]\otimes 1 = 1 \otimes [c] where c =\sum a_ib_i. If [c] \otimes 1 \mapsto 0, then c \in I+J. So c=i+j, i\in I, j\in J. Then [c]\otimes1=[i+j]\otimes1=[i]\otimes1+1\otimes[j]=0+0=0.

Corollary: \mathbb{Z}/m\otimes\mathbb{Z}/n\cong\mathbb{Z}/(\gcd(m,n)).

Proposition (Chinese Remainder Theorem): The natural map R/(I\cap J) \to R/I\times R/J is injective. If I+J=(1), it is also surjective, and thus an isomorphism.

Proof: To see injectivity, notice that if [c]\mapsto(0,0), then c\in I and c\in J, so c\in I\cap J so [c]=0\in R/(I\cap J). To see surjectivity, note that I+J=(1) implies there exist i\in I, j\in J, such that i-j=b-a, for any a,b \in R. Consider ([a], [b]) \in R/I\times R/J, and set i and j. Then a+j=b-j, so a+i \mapsto ([a],[b]).

Corollary: If \gcd(m,n)=1, \mathbb{Z}/m\times \mathbb{Z}/n\cong\mathbb{Z}/mn. In particular, by applying this repeatedly we have \mathbb{Z}/p_1\dots p_n = \mathbb{Z}/p_1 \times \cdots \times \mathbb{Z}/p_n.

Also, note the following fact:

Proposition: If I + J = (1), then I \cap J = IJ.

Proof: Clearly IJ \subseteq I \cap J. To go the other way, note that I + J = (1) means that there exist i \in I, j \in J, and m, n \in R such that mi + nj = 1. So, consider an element a \in I \cap J. Then we have a = a(mi+nj) = m(ia) + n(aj). Since a \in J, m(ia) \in IJ, and since a \in I, n(aj) \in IJ. So a \in IJ.

Remark: One can also think of this in terms of Tor: \text{Tor}_1(R/I, R/J) = (I\cap J)/IJ, and when I+J = (1) this Tor group vanishes.

The Riemann hypothesis for function fields

(These are notes adapted from a talk I gave at the Student Arithmetic Geometry seminar at Berkeley)

Introduction

Probably the most famous open problem in number theory is the Riemann hypothesis. In addition to being worth a million dollars, it is a deep and fundamental problem that has remained intractable since it was first proposed by Bernhard Riemann, in 1859.

The Riemann hypothesis springs out of the field of analytic number theory, which applies complex analysis to problems in number theory, often studying the distribution of prime numbers. The Riemann hypothesis itself has significant implications for the distribution of primes and implies an asymptotic statement about their density (for a precise statement, see here). But the Riemann hypothesis is usually formulated in the language of complex analysis, as a statement about a complex-analytic function, the Riemann zeta function, and its zeroes. This formulation is succint and elegant, and allows the problem to be subsumed into the larger study of the largely conjectural theory of L-functions.

This broader theory allows one to create analogues of the Riemann zeta function and Riemann hypothesis in other contexts. Often these “alternative Riemann hypotheses” are even harder than the original Riemann hypothesis, but there is a famous case where this is fortunately not true.

In the 1940’s, André Weil proved an analogue of the Riemann hypothesis: not for the Riemann zeta function, but for a different zeta function. Here’s one way to describe it: very roughly speaking, the Riemann zeta function is based on the field rational numbers \mathbb{Q} (it can be defined as an Euler product over the primes of \mathbb{Q}). Our zeta function will constructed analogously, but instead be based on the field \mathbb{F}_q(t) (the field of rational functions with coefficients in the finite field \mathbb{F}_q). So instead of the number field \mathbb{Q}, we have swapped it out and replaced it with a function field.

Actually, what Weil proved, and what we will prove today, is the analogue of the Riemann hypothesis for global function fields. This work represents the greatest progress we have towards the original Riemann hypothesis, and serves as tantalizing evidence for it.

There is a general pattern in number theory which looks something like the following: start with a problem in number theory. Adapt the problem from the number field setting to the function field setting. Then interpret the function field as the function field of a curve (usually), and then use techniques of algebraic geometry (for example, \mathbb{F}_q(t) is the function field of a line over \mathbb{F}_q). That is exactly what we will do here: it will therefore look less like complex analysis and more like algebraic geometry

Math (somewhat rushed)

Let C be a smooth projective curve over a finite field \mathbb{F}_q. Let N_r be the set of \mathbb{F}_{q^r} points of C. Then the zeta function of C is defined by

    \[Z(C, T) = \exp \bigg(\sum_{r=1}^{\infty} N_r \frac{T^r}{r} \bigg).\]

Here, we are using T as a change of variables: if we plug in q^{-s} for T, then we obtain an exactly analogous zeta function to the Riemann zeta function, except with respect to the function field of C instead of the field \mathbb{Q}. There are three important properties that we would like Z(C, T) to have: (1) rationality, (2) satisfies a functional equation, and (3) satisfies an analogue of the Riemann hypothesis. Part (3) was proved by André Weil in the 1940’s; parts (1) and (2) were proved much earlier. In this post, I will present a proof of the analogue of the Riemann hypothesis assuming (1) and (2), along the lines of Weil’s original proof using intersection theory. All this material and much more is in an expository paper by James Milne called “The Riemann Hypothesis over Finite Fields: From Weil to the Present Day”. A useful reference is Appendix C in Hartshorne’s Algebraic Geometry; some material also comes from section V.1 on surfaces.

Let g be the genus of C. Then (1) says that Z(C, T) is a rational function of T. The specific function equation of (2) is the following:

    \[Z(C, \frac{1}{qT}) = q^{1-g}T^{2-2g} Z(C, T)\]

It turns out that we can write out Z explicitly: there exist constants \alpha_i for 1 \leq i \leq 2g such that

    \[Z(C, T) = \frac{(1- \alpha_1T) \cdots (1 - \alpha_{2g}T)}{(1-T)(1-qT)}\]

and the functional equation implies that the constants \alpha_i can be rearranged if necessary so that

\alpha_i\alpha_{2g-i} = q.

Now, the analogue of the Riemann hypothesis states the following:

|\alpha_i| = \sqrt{q}.

(To see the connection between this statement and the ordinary Riemann hypothesis, check out this blog post by Anton Hilado)

Notice that, assuming rationality and the functional equation, the Riemann hypothesis will follow from simply the inequality |\alpha_i| \leq \sqrt{q}.

We will prove the Riemann hypothesis via the Hasse-Weil inequality, which is an inequality that puts an explicit bound on N_r. The Hasse-Weil inequality states that

|N_r - (1 + q^r)| \leq 2g \sqrt{q^r}

which is actually a pretty good bound. Why does the Hasse-Weil inequality imply the Riemann hypothesis? Well, if we take the logarithm of Z(C, T) and use the power series for \log(1-x), regrouping terms gives us

N_r = 1 + q^r - \sum_{i = 1}^{2g} \alpha_i^r; so |\alpha_1^r + \cdots \alpha_{2g}^r| < 2g \sqrt{q^r}.

In other words,

\left|\left(\frac{\alpha_1}{\sqrt{q}}\right)^r + \cdots + \left(\frac{\alpha_1}{\sqrt{q}}\right)^r \right| is bounded.

Letting r \to \infty, we have

\max \left| \frac{\alpha_i}{\sqrt{q}} \right| \leq 1, so \alpha_i \leq \sqrt{q} for all i

as desired. (check this works, even if \alpha_i are not distinct)

Proof of the Hasse-Weil inequality

We will prove the Hasse-Weil inequality using intersection theory. First, we will consider C as a curve over \overline{\mathbb{F}_q}. Then there is the Frobenius map \text{Frob}_r: C \to C. If we embed C into projective space, then \text{Frob}_r sends [x_0 : \cdots : x_n] \mapsto [x_0^{q_r} : \cdots : x_n^{q^r}]. We can interpret N_r as the size of the set of fixed points of \text{Frob}_r. Our plan then to use inequalities from intersection theory to bound the intersection of \Gamma_{\text{Frob}_r} and \Delta (the diagonal) in C \times C.

First, let us set up the intersection theory we need. This material is from Chapter V.1 of Hartshorne, on surfaces.

Intersection pairing on a surface: Let X be a surface. There exists a symmetric bilinear pairing \text{Pic }X \times \text{Pic }X \to \mathbb{Z} (where the product of divisors C and D is denoted C.D) such that if C, D are smooth curves intersecting transversely, then

C.D = |C \cap D|.

Furthermore, another theorem we’ll need is the Hodge index theorem:

Let H be an ample divisor on X and D a nonzero divisor, with D.H = 0. Then D^2 \leq 0. (D^2 denotes D.D)

Now let us begin with some general set up. Let C_1 and C_2 be two curves, and let X = C_1 \times C_2. Identify C_1 with C_1 \times * and C_2 with * \times C_2. Notice that C_1.C_1 = C_2.C_2 = 0 and C_1.C_2 = 1. Thus (C_1 + C_2)^2 = 2 > 0.

Let D be a divisor on X. Let d_1 = D.C_1 and d_2 = D.C_2; also, (D - d_2C_1 - d_1C_2).(C_1 + C_2) = 0 (expand it out). The Hodge index theorem implies then that (D - d_2C_1 - d_1C_2)^2 \leq 0. Expanding this out yields D^2 \leq 2d_1d_2. This fundamental inequality is called the Castelnuovo-Severi inequality. We may define \text{def}(D) := 2d_1d_2 - D^2 \geq 0.

Next, let us prove the following inequality: if D and D' are divisors, then

| D.D' - d_1d_2' - d_2d_1' | \leq \sqrt{\text{def}(D)\text{def}(D')} .

Proof (fill in details): Expand out \text{def}(mD + nD') \geq 0, for m, n \in \mathbb{Z}. We can let \frac{m}{n} become arbitrarily close to \sqrt{\frac{\text{def}(D')}{\text{def(D)}}, yielding the inequality. \square

Here’s another lemma we will need: Consider a map f: C_1 \to C_2. If \Gamma_f is the graph of f on C_1 \times C_2, then \text{def}(\Gamma_f) = 2g_2 \text{deg}(f) (where g_2 is the genus of C_2).

Proof (fill in details): Rearrange adjunction formula. \square

Now we have what we need: we will do intersection theory on C \times C. The Frobenius map f = \text{Frob}_r: C \to C is a map of degree q^r, so \text{def}(\Gamma_f) = 2gq^r. We might as well think of \Delta as the graph of the identity map, so \text{def}(\Delta) = 2g. Finally, d_2' = d_2 = d_1' = 1 and d_1 = q^r. Plugging it into the inequality, we get

| \Gamma_f . \Delta - q^r - 1 | \leq \sqrt{(2gq^r)(2g)}

yielding the Hasse-Weil inequality

|N_r - (1 + q^r)| \leq 2g \sqrt{q^r}.

This proves the Riemann hypothesis for function fields, or equivalently the Riemann hypothesis curves over finite fields.

The Weil conjectures

After Weil proved this result, he speculated whether analogous statements were true for not only curves over finite fields, but higher-dimensional algebraic varieties over finite fields. He proposed as conjectures that the zeta functions for such varieties should also satisfy (1) rationality, (2) a functional equation, and (3) an analogoue of the Riemann hypothesis.

Weil also speculated a connection with algebraic topology. In our work above, the genus g was crucial. But the genus can alternatively be defined topologically, by taking the equations that define the curve, looking at the locus they cut out when graphed over the complex numbers, and counting how many holes the resulting shape has. Weil suggested that for arbitrary varieties, topological Betti numbers should play this role: that is, the zeta function of the variety over the finite field should be closely connected with the topology of the analogous variety over the complex numbers.

There’s an interesting blog post that discusses this idea, in our context of curves. But the rest is history. The story of the Weil conjectures is one of the most famous in all of mathematics: the effort to prove them revolutionized algebraic geometry and number theory forever. The key innovation was the theory of étale cohomology, which is an analogue of classical singular cohomology for algebraic varieties over arbitrary fields.

Determinant of transpose

An important fact in linear algebra is that, given a matrix A, \det A = \det {}^tA, where {}^tA is the transpose of A. Here I will prove this statement via explciit computation, and I will try to do this as cleanly as possible. We may define the determinant of A by

\det A = \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{\sigma(1), 1} \cdots A_{\sigma(n), n}.

Here S_n is the set of permutations of the set \{ 1, \dots n \}, and sgn(\sigma) is the sign of the permutation \sigma. This formula is derived from the definition of the determinant via exterior algebra. One can check by hand that this gives the familiar expressions for the determinant when n = 2, 3.

Now, since ({}^tA)_{i, j} = A_{j, i}, we have

\det {}^tA = \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{1, \sigma(1)} \cdots A_{n, \sigma(n)}

= \sum_{\sigma \in S_n} (-1)^{sgn(\sigma)} A_{\sigma^{-1}(\sigma(1)), \sigma(1)} \cdots A_{\sigma^{-1}(\sigma(n)), \sigma(n)}.

The crucial observation here is that we may rearrange the product inside the summation so that the second indices are increasing. Let b = \sigma(a). Then the product inside the summation is

\prod_{1 \leq a \leq n} A_{\sigma^{-1}(\sigma(a)), \sigma(a)} = \prod_{1 \leq b \leq n} A_{\sigma^{-1}(b), b}

Combining this with the fact that sgn(\sigma) = sgn(\sigma^{-1}), our expression simplifies to

\sum_{\sigma \in S_n} (-1)^{sgn(\sigma^{-1})} A_{\sigma^{-1}(1), 1} \cdots A_{\sigma^{-1}(n), n}.

Noticing that the sum is the same sum if we replace all \sigma^{-1}s with \sigmas, we see that this equals \det A. So \det A = \det {}^tA. \square

I wonder if there is a more conceptual proof of this? (By “conceptual”, I mean a proof based on exterior algebra, bilinear pairings, etc…)

Archimedean absolute values

In the previous post we discussed the Archimedean property for an ordered field. Today I’ll discuss the Archimedean property for valued fields, that is, fields equipped with an absolute value.

Recall that an absolute value on a field K is a function | \ |: K \to \mathbb{R}_{\geq 0} satisfying the following axioms:

  1. |a| = 0 if and only if a = 0
  2. |ab| = |a||b|
  3. |a+b| \leq |a| + |b| (triangle inequality)

for all a, b \in K.

Here is an intuitive, analogous definition for the Archimedean property:

Definition: The absolute value | \ | is Archimedean if, for x, y \in K, x \neq 0, |nx| > |y| for some natural number n.

Clearly the standard absolute value (which is defined on \mathbb{C} and \mathbb{R}, and therefore \mathbb{Q}) is Archimedean. But wait: since we assumed x \neq 0, we can divide both sides by |x| to obtain |n| > |y/x|. In other words, we can write the definition equivalently as:

Equivalent Definition: The absolute value | \ | is Archimedean if, for all a \in K, |n| > |a| for some natural number n.

Here a takes the place of y/x. The important thing here is that a can be any element of K. So what this is saying is that, given any element of the field, there is some natural number that beats it.

Now, let us assume that the absolute value is nontrivial. (The trivial absolute value has |a| = 1 for all nonzero a). Thus, for some a, |a| \neq 1. So, either |a| > 1 or |1/a| = 1/|a| > 1. Thus by taking arbitrarily high powers of a or 1/a, we can obtain arbitrarily high absolute values. So we can reformulate the definition as follows:

Equivalent Definition: | \ | is Archimedean if the set \{ |n| \ | \ n \in \mathbb{N}\} contains arbitrarily large elements.

In other words, the set is unbounded. So, | \ | is non-Archimedean if the sequence |1|, |2|, |3|, \dots is bounded. However, if any |n| > 1, then taking arbitrarily high powers of n can give us arbitrarily high absolute values. So

Equivalent Definition: | \ | is non-Archimedean if |n| \leq 1 for n \in \mathbb{N}.

Finally, I will present another very useful characterization of the (non)Archimedean property.

Theorem/Equivalent Definition: | \ | is non-Archimedean if |a+b| \leq \text{max}(|a|, |b|).

Proof: (to be added)

The Archimedean property

If a and b are positive real numbers, if you add a to itself enough times, eventually you will surpass b. This is called the Archimedean property, and it is one of the fundamental properties of the system of real numbers. Informally, what this property says is that no numbers are infinitely larger than others. We can formally define this property as follows:

Let F be an ordered field. We say F is Archimedean if, for x, y \in F where x, y > 0, there exists a natural number n such that nx > y.

An example of a non-Archimedean number system is the hyperreal numbers. Hyperreal numbers are an enlargement of the real numbers that also contain “infinite” and “infinitesimal” quantities. The hyperreal numbers are used to give an alternative formulation of calculus in the subject of non-standard analysis, where instead of using limits, one computes with actual infinitesimals.

More familiar examples of non-Archimedean fields are function fields. For example, consider the field of rational functions (on \mathbb{R}), denoted \mathbb{R}(x). We can order rational functions by declaring that

p > q if p(x) > q(x) as x \to \infty

for any p, q \in \mathbb{R}(x). In other words, we order rational functions by looking at their asymptotic behavior. One can check that this satisfies the axioms, making \mathbb{R}(X) an ordered field.

Exercise: Show that a rational function

    \[p(x) = \frac{f(x)}{g(x)} = \frac{a_nx^n + \cdots + a_1x + a_0}{b_mx^m + \cdots + b_1x + a_0}\]

is positive with respect to the order (i.e. p > 0) if and only if a_n/b_m > 0.

Now one can see that the field of rational functions is clearly not Archimedean. For example, if we consider p(x) = 1/x, no matter how many times we add it to itself, it will never surpass q(x) = 1: the function np(x) = \frac{n}{x} is eventually surpassed by q(x) = 1, no matter how great n is.

Exercise: Define the degree of a rational function to be the degree of its numerator minus the degree of its denominator. For rational functions p, q > 0, show there exists an integer n such that np > q if and only if degree(p) \geq degree(q).

Thus the basic idea of the Archimedean property is at the core of asymptotic analysis. In defining big-O notation, we write f(x) = O(g(x)) if some multiple of f surpasses g as x goes off to infinity.

In the next post, I will discuss the Archimedean property for valued fields (as opposed to ordered fields), and how this applies to number theory.

15 triangles in a web of cubics

Consider a homogeneous cubic form in three variables X, Y, and Z, such as

X^2Y - X^2Z + XY^2 -XYZ - Y^2Z + Z^3

Sometimes a cubic form can be factored. In this case, we are lucky: it factors as

(X-Z)(Y-Z)(X+Y+Z),

but in general we will not be so lucky. It is pretty rare for a random cubic form to be factorable. From the perspective of projective algebraic geometry, a homogeneous cubic form cuts out an algebraic curve in the projective plane:

and a cubic that factors into three linear forms will cut out three lines: a “degenerate” plane cubic. Such a collection of three lines is called a “triangle”.

The space of homogeneous cubic forms is a 10-dimensional vector space with basis X^3, X^2Y, X^2Z, Y^3, Y^2X, Y^2Z, Z^3, Z^2X, Z^2Y, XYZ. However, given a cubic form F, the scaled form cF corresponds to the same curve. Furthermore, if all the coefficients are zero, then the form doesn’t correspond to the curve at all. Thus the space of plane cubics is nine-dimensional projective space.

In this post we will prove the following enumerative result:

A general three-dimensional family of plane cubics contains exactly 15 triangles.

By a three-dimensional family (aka “web”), we mean some embedded copy of \mathbb{P}^3 in this space \mathbb{P}^9 of plane cubics. This geometric result corresponds to the following purely algebraic fact:

A general four-dimensional subspace of the ten-dimensional space of cubic forms contains exactly 15 forms which factor into three linear forms.

(I should probably say something about the term “general”. The statement “a general x \in S satisfies property P” this means that P(x) says that the subset of S for which P(x) holds is dense in S.)

This material is drawn from 3264 And All That by Eisenbud and Harris

The strategy

Let us consider the space of (ordered) triples of lines, or nonzero linear forms up to scaling. This is \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2. We can construct a morphism

\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2 \to \mathbb{P}^9

which sends a triple of linear forms F, G, H to their product FGH. This map is (in general) 6 to 1, since there are 6 permutations of three (distinct) linear forms.

We will do intersection theory in \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2: specifically, we will pull back the class of a \mathbb{P}^3 in \mathbb{P}^9 to \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2, and count how many points it consists of (in other words, how many triples of linear forms correspond to cubic surfaces in a general web). Then we will divide this number by 6, to count the number of triangles in the family.

Computation in the Chow ring

The morphism

\phi: \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2 \to \mathbb{P}^9

induces a map of Chow rings:

\phi^*: A(\mathbb{P}^9) \to A(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2).

Now, A(\mathbb{P}^9) \cong \mathbb{Z}[x]/(x^{10}) and A(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2) \cong \mathbb{Z}[a, b, c]/(a^3, b^3, c^3). The class of any \mathbb{P}^3 in A(\mathbb{P}^9) is x^6. Furthermore, \phi^*(x) = a+b+c. So, \phi^*(x^6) = (a+b+c)^6. If we expand this out, removing every term that has any variable to a power of three or greater, we see that every term except the monomial term of a^2b^2c^2 vanishes, and its coefficient is the {6 \choose 2, 2, 2} = \frac{6!}{2!2!2!} = 90. 90 is the the number of ordered triples of linear forms which correspond to cubic forms contained in a general web. Dividing by six, since six ordered triples correspond to one unordered triple (i.e. one distinct triangle), we obtain our answer of 15.

(I will add more details to this…)

A special case of Krull’s intersection theorem

In this post I will prove a special case of Krull’s intersection theorem, which can be proved without invoking the Artin-Rees lemma. This result is useful in the study of discrete valuation rings. The following proof is from Serre’s Local Fields.

Proposition: Let R, \frak{m} be a noetherian local domain where \frak{m} = (\pi). Then \bigcap_{n \geq 0} \frak{m}^n = 0.

Proof: Suppose y \in \bigcap_{n \geq 0} \frak{m}^n. Then y = \pi^n x_n for all n \geq 0. So for all n \pi^nx_n = \pi^{n+1}x_{n+1}. Since R is a domain, x_n = \pi x_{n+1}.

Therefore consider the ascending chain (x_0) \subseteq (x_1) \subseteq \cdots. This eventually stabilizes for high enough n since R is noetherian, so for some n, x_{n+1} = cx_n. Thus x_{n+1} = c\pi x_{n+1}, so (1-c\pi)x_{n+1} = 0. But 1-c\pi is a unit, so x_{n+1} = 0, so y = 0. \square

This theorem holds more generally even if R is not assumed to be a domain, but the proof is more complicated (but still among the same lines).

Proposition: Let R,  \frak{m} be a noetherian local ring where \frak{m} = (\pi) and \pi is not nilpotent. Then \bigcap_{n \geq 0} \frak{m}^n = 0.

Proof: Let \frak{u} be the ideal of elements that kill some power of \pi. We will use variables u_1, u_2, \dots to refer to elements of \frak{u}. Since R is noetherian, \frak{u} must be finitely generated, so all elements of \frak{u} kill \pi^N for some fixed N.

Now suppose y \in \bigcap_{n \geq 0} \frak{m}^n. \pi^nx_n = \pi^{n+1}x_{n+1}, so \pi^n(x_n - \pi x_{n+1}). Thus x_n - \pi x_{n+1} \in \frak{u}.

Consider the ascending chain \frak{u} + (x_0) \subseteq \frak{u} + (x_1) \subseteq \cdots. Since R is noetherian it must eventually stablize, so for some n, x_{n+1} can be written as u_1 + cx_n. But recall that x_n = u_2 + \pi x_{n+1}. So x_{n+1} = u_1 + c(u_2 + \pi  x_{n+1}) = u_3 + c\pi x_{n+1} so (1-c\pi)x_{n+1} = u_3. 1-c\pi is a unit since \frak{m} = (\pi), and R is local, so x_{n+1} \in \frak{u}. If we force n to be large enough to surpass N, then \pi^{n+1}x_{n+1} = 0, so y = 0. \square

More on algebraic numbers

A complex number is algebraic if it is the root of some polynomial P(x) with rational coefficients. \sqrt{2} is algebraic (e.g. the polynomial x^2 -2); i is algebraic (e.g. the polynomial x^2 + 1); \pi and e are not. (A complex number that is not algebraic is called transcendental)

Previously, I wrote some blog posts (see here and here) which sketched a proof of the fact that the sum and product of algebraic numbers is also algebraic (and more). This is not an obvious fact, and to prove this requires some amount of field theory and linear algebra. Nevertheless, the ideas in the proof lead the way to a better understanding of the structure of the algebraic numbers and towards the theorems of Galois theory. In that post, I tried to introduce the minimum algebraic machinery necessary in order to state and prove the main result; I don’t think I entirely succeeded.

However, there is a more direct approach, one which also allows us find a polynomial that has \alpha + \beta (or \alpha\beta) as a root, for algebraic numbers \alpha and \beta. That is the subject of this post. Instead of trying to formally prove the result, I will illustrate the approach for a specific example: showing \sqrt{2} + \sqrt{3} is algebraic.

This post will assume familiarity with the characteristic polynomial of a matrix, and not much more. (In particular, none of the algebra from the previous posts)

A case study

Define the set \mathbb{Q}(\sqrt{2}, \sqrt{3}) = \{a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6} \ | \ a, b, c, d \in \mathbb{Q} \}. We will think of this as a four-dimensional vector space, where the scalars are elements of \mathbb{Q}, and the basis is 1, \sqrt{2}, \sqrt{3}, \sqrt{6}. Every element can be uniquely expressed as a(1) + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}, for a, b, c, d \in \mathbb{Q}.

We’re trying to prove \sqrt{2} + \sqrt{3} is algebraic. Consider the linear transformation T on \mathbb{Q}(\sqrt{2}, \sqrt{3}) defined as “multiply by \sqrt{2} + \sqrt{3}“. In other words, consider the linear map T: \mathbb{Q}(\sqrt{2}, \sqrt{3}) \to \mathbb{Q}(\sqrt{2}, \sqrt{3}) which maps v \mapsto (\sqrt{2} + \sqrt{3})v. This is definitely a linear map, since it satisfies T(v + w) = Tv + Tw and T(cv) = c(Tv). In particular, we should be able to represent it by a matrix.

What is the matrix of T? Well, T(1) = \sqrt{2} + \sqrt{3}, T(\sqrt{2}) = 2 + \sqrt{6}, T(\sqrt{3}) = 3 + \sqrt{6}, and T(\sqrt{6}) = 3\sqrt{2} + 2\sqrt{3}. Thus we can represent T by the matrix

\begin{bmatrix}0 & 2 & 3 & 0 \\1 & 0 & 0 & 3 \\1 & 0 & 0 & 2 \\0 & 1 & 1 & 0\end{bmatrix}.

Now, the characteristic polynomial \chi_T(x) of this matrix, which is defined as \text{det}(T-xI), is x^4 - 10x^2 + 1, which has \sqrt{2} + \sqrt{3} as a root. Thus \sqrt{2} + \sqrt{3} is indeed algebraic.

Why it works

The basic reason is the Cayley-Hamilton theorem. It tells us that T should satisfy the characteristic polynomial: T^4 - 10T^2 + I is the zero matrix. But the matrix we get when plugging T into \chi_T(x) should correspond to multiplication by \chi_T(\sqrt{2} + \sqrt{3}); thus \chi_T(\sqrt{2} + \sqrt{3}) = 0.

Note that I chose \sqrt{2} + \sqrt{3} randomly. I could have chosen any element of \mathbb{Q}(\sqrt{2}, \sqrt{3}) and used this method to find a polynomial with rational coefficients having that element as a root.

At the end of the day, to prove that such a method always works requires the field theory we have glossed over: what is \mathbb{Q}(\alpha, \beta) in general, why is it finite-dimensional, etc. This constructive method, which assumes the Cayley-Hamilton theorem, only replaces the non-constructive “linear dependence” argument in Proposition 4 of the original post.