The Riemann hypothesis for function fields

(These are notes adapted from a talk I gave at the Student Arithmetic Geometry seminar at Berkeley)


Probably the most famous open problem in number theory is the Riemann hypothesis. In addition to being worth a million dollars, it is a deep and fundamental problem that has remained intractable since it was first proposed by Bernhard Riemann, in 1859.

The Riemann hypothesis springs out of the field of analytic number theory, which applies complex analysis to problems in number theory, often studying the distribution of prime numbers. The Riemann hypothesis itself has significant implications for the distribution of primes and implies an asymptotic statement about their density (for a precise statement, see here). But the Riemann hypothesis is usually formulated in the language of complex analysis, as a statement about a complex-analytic function, the Riemann zeta function, and its zeroes. This formulation is succinct and elegant, and allows the problem to be subsumed into the broader, largely conjectural theory of L-functions.

This broader theory allows one to create analogues of the Riemann zeta function and Riemann hypothesis in other contexts. Often these “alternative Riemann hypotheses” are even harder than the original Riemann hypothesis, but there is a famous case where this is fortunately not true.

In the 1940’s, André Weil proved an analogue of the Riemann hypothesis: not for the Riemann zeta function, but for a different zeta function. Here’s one way to describe it: very roughly speaking, the Riemann zeta function is based on the field of rational numbers \mathbb{Q} (it can be defined as an Euler product over the primes of \mathbb{Q}). Our zeta function will be constructed analogously, but will instead be based on the field \mathbb{F}_q(t) (the field of rational functions with coefficients in the finite field \mathbb{F}_q). So we have swapped out the number field \mathbb{Q} and replaced it with a function field.

Actually, what Weil proved, and what we will prove today, is the analogue of the Riemann hypothesis for global function fields. This work represents the greatest progress we have towards the original Riemann hypothesis, and serves as tantalizing evidence for it.

There is a general pattern in number theory which looks something like the following: start with a problem in number theory. Adapt the problem from the number field setting to the function field setting. Then interpret the function field as the function field of a curve (usually; for example, \mathbb{F}_q(t) is the function field of a line over \mathbb{F}_q), and use techniques of algebraic geometry. That is exactly what we will do here: it will therefore look less like complex analysis and more like algebraic geometry.

Math (somewhat rushed)

Let C be a smooth projective curve over a finite field \mathbb{F}_q. Let N_r be the number of \mathbb{F}_{q^r}-points of C. Then the zeta function of C is defined by

    \[Z(C, T) = \exp \bigg(\sum_{r=1}^{\infty} N_r \frac{T^r}{r} \bigg).\]

Here, we are using T as a change of variables: if we plug in q^{-s} for T, then we obtain a zeta function exactly analogous to the Riemann zeta function, except with respect to the function field of C instead of the field \mathbb{Q}. There are three important properties that we would like Z(C, T) to have: (1) it is rational, (2) it satisfies a functional equation, and (3) it satisfies an analogue of the Riemann hypothesis. Part (3) was proved by André Weil in the 1940’s; parts (1) and (2) were proved much earlier. In this post, I will present a proof of the analogue of the Riemann hypothesis assuming (1) and (2), along the lines of Weil’s original proof using intersection theory. All this material and much more is in an expository paper by James Milne called “The Riemann Hypothesis over Finite Fields: From Weil to the Present Day”. A useful reference is Appendix C in Hartshorne’s Algebraic Geometry; some material also comes from section V.1 on surfaces.

Let g be the genus of C. Then (1) says that Z(C, T) is a rational function of T. The specific functional equation of (2) is the following:

    \[Z(C, \frac{1}{qT}) = q^{1-g}T^{2-2g} Z(C, T)\]

It turns out that we can write out Z explicitly: there exist constants \alpha_i for 1 \leq i \leq 2g such that

    \[Z(C, T) = \frac{(1- \alpha_1T) \cdots (1 - \alpha_{2g}T)}{(1-T)(1-qT)}\]

and the functional equation implies that the constants \alpha_i can be rearranged if necessary so that

\alpha_i\alpha_{2g+1-i} = q.

Now, the analogue of the Riemann hypothesis states the following:

|\alpha_i| = \sqrt{q}.

(To see the connection between this statement and the ordinary Riemann hypothesis, check out this blog post by Anton Hilado)

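Notice that, assuming rationality and the functional equation, the Riemann hypothesis will follow from simply the inequality |\alpha_i| \leq \sqrt{q} for all i: each \alpha_i is paired with some \alpha_j such that \alpha_i\alpha_j = q, so |\alpha_i| = q/|\alpha_j| \geq \sqrt{q} as well.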

We will prove the Riemann hypothesis via the Hasse-Weil inequality, which is an inequality that puts an explicit bound on N_r. The Hasse-Weil inequality states that

|N_r - (1 + q^r)| \leq 2g \sqrt{q^r}

which is actually a pretty good bound. Why does the Hasse-Weil inequality imply the Riemann hypothesis? Well, if we take the logarithm of Z(C, T) and use the power series for \log(1-x), regrouping terms gives us

N_r = 1 + q^r - \sum_{i = 1}^{2g} \alpha_i^r; so |\alpha_1^r + \cdots + \alpha_{2g}^r| \leq 2g \sqrt{q^r}.

In other words,

\left|\left(\frac{\alpha_1}{\sqrt{q}}\right)^r + \cdots + \left(\frac{\alpha_{2g}}{\sqrt{q}}\right)^r \right| is bounded (by 2g) for all r.

Letting r \to \infty, we have

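\max \left| \frac{\alpha_i}{\sqrt{q}} \right| \leq 1, so |\alpha_i| \leq \sqrt{q} for all i

as desired. (This works even if the \alpha_i are not distinct: writing \beta_i = \alpha_i/\sqrt{q}, the power series \sum_{r \geq 1} (\beta_1^r + \cdots + \beta_{2g}^r) T^r = \sum_i \frac{\beta_iT}{1 - \beta_iT} has bounded coefficients, so it converges for |T| < 1. Its poles are at the points T = 1/\beta_i, and repeated \beta_i add their residues rather than cancelling, so every |\beta_i| \leq 1.)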
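To make the relationship between N_r, the \alpha_i and the Hasse-Weil bound concrete, here is a small numerical sketch in Python. The curve y^2 = x^3 + x + 1 over \mathbb{F}_7 is my own choice of example (genus 1, so there are just two \alpha_i), and the helper names are hypothetical. It counts points by brute force over \mathbb{F}_7 and \mathbb{F}_{49}, and compares with the counts predicted by N_r = 1 + q^r - (\alpha_1^r + \alpha_2^r).

```python
def count_points(p, a, b):
    """Brute-force N_1 for y^2 = x^3 + a*x + b over F_p (p an odd prime)."""
    sqrt_count = [0] * p                 # sqrt_count[c] = number of y with y^2 = c
    for y in range(p):
        sqrt_count[y * y % p] += 1
    affine = sum(sqrt_count[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1                    # plus the single point at infinity


def count_points_fp2(p, a, b):
    """Brute-force N_2 over F_{p^2} = F_p[i]/(i^2 + 1); assumes p % 4 == 3."""
    def mul(z, w):                       # (z0 + z1*i)(w0 + w1*i) with i^2 = -1
        return ((z[0] * w[0] - z[1] * w[1]) % p, (z[0] * w[1] + z[1] * w[0]) % p)

    elems = [(u, v) for u in range(p) for v in range(p)]
    sqrt_count = {z: 0 for z in elems}
    for y in elems:
        sqrt_count[mul(y, y)] += 1
    total = 1                            # point at infinity
    for x in elems:
        x3 = mul(x, mul(x, x))
        rhs = ((x3[0] + a * x[0] + b) % p, (x3[1] + a * x[1]) % p)
        total += sqrt_count[rhs]
    return total


def predicted_counts(p, n1, r_max):
    """N_1..N_{r_max} for a genus-1 curve, computed from N_1 alone."""
    trace = p + 1 - n1                   # alpha_1 + alpha_2 (and alpha_1 * alpha_2 = p)
    s_prev, s_cur = 2, trace             # power sums alpha_1^r + alpha_2^r for r = 0, 1
    counts = [1 + p - s_cur]
    for r in range(2, r_max + 1):
        s_prev, s_cur = s_cur, trace * s_cur - p * s_prev
        counts.append(1 + p**r - s_cur)
    return counts


p, a, b = 7, 1, 1                        # example curve, chosen for illustration
n1 = count_points(p, a, b)
counts = predicted_counts(p, n1, 6)
print(n1, counts[1], count_points_fp2(p, a, b))   # 5 55 55
# Hasse-Weil with g = 1, checked in exact integer arithmetic.
print(all((n - (1 + p**r)) ** 2 <= 4 * p**r for r, n in enumerate(counts, start=1)))
```

The recurrence in predicted_counts is just Newton's identity for the power sums of the two roots of T^2 - (\alpha_1 + \alpha_2)T + q; the brute-force count over \mathbb{F}_{49} is there only as an independent check of N_2.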

Proof of the Hasse-Weil inequality

We will prove the Hasse-Weil inequality using intersection theory. First, we will consider C as a curve over \overline{\mathbb{F}_q}. Then there is the Frobenius map \text{Frob}_r: C \to C. If we embed C into projective space, then \text{Frob}_r sends [x_0 : \cdots : x_n] \mapsto [x_0^{q^r} : \cdots : x_n^{q^r}]. We can interpret N_r as the size of the set of fixed points of \text{Frob}_r. Our plan is then to use inequalities from intersection theory to bound the intersection of \Gamma_{\text{Frob}_r} (the graph of \text{Frob}_r) and \Delta (the diagonal) in C \times C.

First, let us set up the intersection theory we need. This material is from Chapter V.1 of Hartshorne, on surfaces.

Intersection pairing on a surface: Let X be a surface. There exists a symmetric bilinear pairing \text{Pic }X \times \text{Pic }X \to \mathbb{Z} (where the product of divisors C and D is denoted C.D) such that if C, D are smooth curves intersecting transversely, then

C.D = |C \cap D|.

We will also need the Hodge index theorem:

Let H be an ample divisor on X and D a nonzero divisor, with D.H = 0. Then D^2 \leq 0. (D^2 denotes D.D)

Now let us begin with some general set up. Let C_1 and C_2 be two curves, and let X = C_1 \times C_2. Identify C_1 with C_1 \times * and C_2 with * \times C_2. Notice that C_1.C_1 = C_2.C_2 = 0 and C_1.C_2 = 1. Thus (C_1 + C_2)^2 = 2 > 0.

Let D be a divisor on X, and let d_1 = D.C_1 and d_2 = D.C_2. Then (D - d_2C_1 - d_1C_2).(C_1 + C_2) = 0 (expand it out). Since C_1 + C_2 is ample, the Hodge index theorem then implies that (D - d_2C_1 - d_1C_2)^2 \leq 0. Expanding this out yields D^2 \leq 2d_1d_2. This fundamental inequality is called the Castelnuovo-Severi inequality. We may define \text{def}(D) := 2d_1d_2 - D^2 \geq 0.
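For readers who want the two "expand it out" steps written down, here is the computation, using only C_1.C_1 = C_2.C_2 = 0, C_1.C_2 = 1, D.C_1 = d_1 and D.C_2 = d_2 from above:

```latex
\begin{align*}
(D - d_2C_1 - d_1C_2).(C_1 + C_2)
  &= (d_1 + d_2) - d_2\,(C_1.C_1 + C_1.C_2) - d_1\,(C_2.C_1 + C_2.C_2) \\
  &= (d_1 + d_2) - d_2 - d_1 = 0, \\
(D - d_2C_1 - d_1C_2)^2
  &= D^2 - 2d_2\,(D.C_1) - 2d_1\,(D.C_2) + 2d_1d_2\,(C_1.C_2) + d_2^2\,C_1^2 + d_1^2\,C_2^2 \\
  &= D^2 - 2d_1d_2 - 2d_1d_2 + 2d_1d_2 = D^2 - 2d_1d_2 .
\end{align*}
```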

Next, let us prove the following inequality: if D and D' are divisors, then

| D.D' - d_1d_2' - d_2d_1' | \leq \sqrt{\text{def}(D)\text{def}(D')} .

Proof (fill in details): Expand out \text{def}(mD + nD') \geq 0, for m, n \in \mathbb{Z}; the result is a quadratic form in m and n. We can let \frac{m}{n} become arbitrarily close to \pm\sqrt{\frac{\text{def}(D')}{\text{def}(D)}}, yielding the inequality. \square
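Here is one way the details might be filled in (a sketch, not necessarily the argument the proof above has in mind). Since intersection numbers are bilinear, mD + nD' has intersection numbers md_1 + nd_1' and md_2 + nd_2' with C_1 and C_2, so

```latex
\begin{align*}
\text{def}(mD + nD')
  &= 2(md_1 + nd_1')(md_2 + nd_2') - (mD + nD')^2 \\
  &= m^2\,\text{def}(D) + n^2\,\text{def}(D') + 2mn\,\bigl(d_1d_2' + d_2d_1' - D.D'\bigr) \;\geq\; 0 .
\end{align*}
```

This is a quadratic form in (m, n) that is non-negative on \mathbb{Z}^2, hence on \mathbb{Q}^2 by homogeneity and on \mathbb{R}^2 by continuity. A non-negative binary quadratic form am^2 + 2cmn + bn^2 satisfies c^2 \leq ab, which is exactly the claimed inequality.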

Here’s another lemma we will need: Consider a map f: C_1 \to C_2. If \Gamma_f is the graph of f on C_1 \times C_2, then \text{def}(\Gamma_f) = 2g_2 \text{deg}(f) (where g_2 is the genus of C_2).

Proof (fill in details): Apply the adjunction formula to \Gamma_f (which is isomorphic to C_1) and rearrange. \square
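A sketch of how the rearrangement goes, under the following assumptions (none of which are spelled out above): g_1 is the genus of C_1, the canonical class of C_1 \times C_2 is numerically (2g_1 - 2)C_2 + (2g_2 - 2)C_1 in the notation where C_1 = C_1 \times * and C_2 = * \times C_2, and \Gamma_f is isomorphic to C_1.

```latex
\begin{align*}
K_X &\equiv (2g_1 - 2)\,C_2 + (2g_2 - 2)\,C_1, \qquad
  \Gamma_f.C_1 = \deg(f), \quad \Gamma_f.C_2 = 1, \\
2g_1 - 2 &= \Gamma_f.(\Gamma_f + K_X)
  = \Gamma_f^2 + (2g_1 - 2) + (2g_2 - 2)\deg(f), \\
\Gamma_f^2 &= (2 - 2g_2)\deg(f), \\
\text{def}(\Gamma_f) &= 2\deg(f)\cdot 1 - (2 - 2g_2)\deg(f) = 2g_2\deg(f).
\end{align*}
```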

Now we have what we need: we will do intersection theory on C \times C. The Frobenius map f = \text{Frob}_r: C \to C is a map of degree q^r, so \text{def}(\Gamma_f) = 2gq^r. We might as well think of \Delta as the graph of the identity map, so \text{def}(\Delta) = 2g. Finally, taking D = \Gamma_f and D' = \Delta, we have d_2' = d_2 = d_1' = 1 and d_1 = q^r. Plugging these into the inequality, we get

| \Gamma_f . \Delta - q^r - 1 | \leq \sqrt{(2gq^r)(2g)}

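and since \Gamma_f and \Delta meet transversely, exactly at the fixed points of f, we have \Gamma_f . \Delta = N_r, yielding the Hasse-Weil inequality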

|N_r - (1 + q^r)| \leq 2g \sqrt{q^r}.

This proves the Riemann hypothesis for function fields, or equivalently the Riemann hypothesis for curves over finite fields.

The Weil conjectures

After Weil proved this result, he asked whether analogous statements were true not only for curves over finite fields, but also for higher-dimensional algebraic varieties over finite fields. He proposed as conjectures that the zeta functions for such varieties should also satisfy (1) rationality, (2) a functional equation, and (3) an analogue of the Riemann hypothesis.

Weil also speculated a connection with algebraic topology. In our work above, the genus g was crucial. But the genus can alternatively be defined topologically, by taking the equations that define the curve, looking at the locus they cut out when graphed over the complex numbers, and counting how many holes the resulting shape has. Weil suggested that for arbitrary varieties, topological Betti numbers should play this role: that is, the zeta function of the variety over the finite field should be closely connected with the topology of the analogous variety over the complex numbers.

There’s an interesting blog post that discusses this idea, in our context of curves. But the rest is history. The story of the Weil conjectures is one of the most famous in all of mathematics: the effort to prove them revolutionized algebraic geometry and number theory forever. The key innovation was the theory of étale cohomology, which is an analogue of classical singular cohomology for algebraic varieties over arbitrary fields.

15 triangles in a web of cubics

Consider a homogeneous cubic form in three variables X, Y, and Z, such as

X^2Y - X^2Z + XY^2 -XYZ - Y^2Z + Z^3

Sometimes a cubic form can be factored. In this case, we are lucky: it factors as

(X - Z)(Y - Z)(X + Y + Z)

but in general we will not be so lucky. It is pretty rare for a random cubic form to be factorable. From the perspective of projective algebraic geometry, a homogeneous cubic form cuts out an algebraic curve in the projective plane, and a cubic that factors into three linear forms will cut out three lines: a “degenerate” plane cubic. Such a collection of three lines is called a “triangle”.
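As a sanity check, one can verify by machine that the three linear forms X - Z, Y - Z and X + Y + Z multiply out to the cubic form above. The sketch below stores a polynomial in X, Y, Z as a dictionary from exponent triples to coefficients; this representation and the helper name multiply are just choices made for illustration.

```python
from collections import defaultdict


def multiply(p, q):
    """Multiply polynomials in X, Y, Z stored as {(i, j, k): coefficient}."""
    out = defaultdict(int)
    for (i1, j1, k1), c1 in p.items():
        for (i2, j2, k2), c2 in q.items():
            out[(i1 + i2, j1 + j2, k1 + k2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


X, Y, Z = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# X^2Y - X^2Z + XY^2 - XYZ - Y^2Z + Z^3
cubic = {(2, 1, 0): 1, (2, 0, 1): -1, (1, 2, 0): 1,
         (1, 1, 1): -1, (0, 2, 1): -1, (0, 0, 3): 1}

# The three lines of the triangle: X - Z, Y - Z, X + Y + Z.
lines = [{X: 1, Z: -1}, {Y: 1, Z: -1}, {X: 1, Y: 1, Z: 1}]

product = multiply(multiply(lines[0], lines[1]), lines[2])
print(product == cubic)   # True
```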

The space of homogeneous cubic forms is a 10-dimensional vector space with basis X^3, X^2Y, X^2Z, Y^3, Y^2X, Y^2Z, Z^3, Z^2X, Z^2Y, XYZ. However, given a cubic form F, the scaled form cF (for c nonzero) corresponds to the same curve. Furthermore, if all the coefficients are zero, then the form doesn’t correspond to a curve at all. Thus the space of plane cubics is the nine-dimensional projective space \mathbb{P}^9.

In this post we will prove the following enumerative result:

A general three-dimensional family of plane cubics contains exactly 15 triangles.

By a three-dimensional family (aka “web”), we mean some linearly embedded copy of \mathbb{P}^3 in this space \mathbb{P}^9 of plane cubics. This geometric result corresponds to the following purely algebraic fact:

A general four-dimensional subspace of the ten-dimensional space of cubic forms contains exactly 15 forms which factor into three linear forms.

(I should probably say something about the term “general”. The statement “a general x \in S satisfies property P” means that the subset of S for which P(x) holds contains a dense open subset of S.)

This material is drawn from 3264 And All That by Eisenbud and Harris.

The strategy

Let us consider the space of (ordered) triples of lines, or nonzero linear forms up to scaling. This is \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2. We can construct a morphism

\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2 \to \mathbb{P}^9

which sends a triple of linear forms F, G, H to their product FGH. This map is (in general) 6 to 1, since there are 6 permutations of three (distinct) linear forms.

We will do intersection theory in \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2: specifically, we will pull back the class of a \mathbb{P}^3 in \mathbb{P}^9 to \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2, and count how many points it consists of (in other words, how many ordered triples of linear forms correspond to cubic curves in a general web). Then we will divide this number by 6, to count the number of triangles in the family.

Computation in the Chow ring

The morphism

\phi: \mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2 \to \mathbb{P}^9

induces a map of Chow rings:

\phi^*: A(\mathbb{P}^9) \to A(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2).

Now, A(\mathbb{P}^9) \cong \mathbb{Z}[x]/(x^{10}) and A(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2) \cong \mathbb{Z}[a, b, c]/(a^3, b^3, c^3). The class of any \mathbb{P}^3 in A(\mathbb{P}^9) is x^6. Furthermore, \phi^*(x) = a+b+c. So, \phi^*(x^6) = (a+b+c)^6. If we expand this out, removing every term that has any variable to a power of three or greater, we see that every term except the a^2b^2c^2 term vanishes, and its coefficient is the multinomial coefficient {6 \choose 2, 2, 2} = \frac{6!}{2!2!2!} = 90. So 90 is the number of ordered triples of linear forms whose products are cubic forms contained in a general web. Dividing by six, since six ordered triples correspond to one unordered triple (i.e. one distinct triangle), we obtain our answer of 15.
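The Chow ring computation is easy to replay by machine. The following Python sketch expands (a + b + c)^6 in \mathbb{Z}[a, b, c]/(a^3, b^3, c^3) by repeated multiplication, discarding any monomial as soon as an exponent reaches 3; the function name pullback_power is made up for this illustration.

```python
from math import factorial


def pullback_power(n=6, cap=3):
    """Expand (a + b + c)^n in Z[a, b, c]/(a^cap, b^cap, c^cap).

    Returns a dict mapping exponent triples (i, j, k) to coefficients.
    """
    poly = {(0, 0, 0): 1}
    for _ in range(n):
        nxt = {}
        for (i, j, k), coeff in poly.items():
            # Multiply the current monomial by a, by b, and by c.
            for mono in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
                if max(mono) < cap:      # a^3 = b^3 = c^3 = 0
                    nxt[mono] = nxt.get(mono, 0) + coeff
        poly = nxt
    return poly


expansion = pullback_power()
ordered = expansion[(2, 2, 2)]
print(expansion)                                      # {(2, 2, 2): 90}
print(ordered == factorial(6) // factorial(2) ** 3)   # True
print(ordered // 6)                                   # 15
```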

(I will add more details to this…)

Algebro-geometric proof of Cayley-Hamilton

Here is a sketch of a proof of the Cayley-Hamilton theorem via classical algebraic geometry.

The set of n x n matrices over an algebraically closed field can be identified with the affine space \mathbb{A}^{n^2}. Let V be the subset of matrices that satisfy their own characteristic polynomial. We will prove that V is in fact all of \mathbb{A}^{n^2}. Since affine space is irreducible, it suffices to show that V is closed and V contains a non-empty open set.

Fix a matrix M. First, observe that the coefficients of the characteristic polynomial are polynomials in the entries in M. In particular, the condition that a matrix satisfy its own characteristic polynomial amounts to a collection of polynomials in the entries of M vanishing. This establishes that V is closed.

Let U be the set of matrices that have n distinct eigenvalues. A matrix has n distinct eigenvalues if and only if its characteristic polynomial has no double roots when it splits. This occurs if and only if the discriminant of the characteristic polynomial is nonzero. The discriminant is a polynomial in the coefficients of the characteristic polynomial. Thus the condition that a matrix have n distinct eigenvalues amounts to a polynomial in the entries of M not vanishing. Thus U is open. It is also non-empty: an algebraically closed field is infinite, so there are diagonal matrices with n distinct diagonal entries.

Finally, we have to show U \subseteq V. It is easy to check that a diagonal matrix satisfies its own characteristic polynomial. A matrix with n distinct eigenvalues is diagonalizable, so the general result follows from the fact that the determinant, and thus the characteristic polynomial, is basis-invariant.
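The theorem itself is easy to test numerically. The sketch below is not the geometric argument above, just a check of the statement: it computes the characteristic polynomial of an integer matrix with the Faddeev-LeVerrier recursion (a technique swapped in here because it needs only exact integer arithmetic) and then evaluates that polynomial at the matrix. All function names are hypothetical.

```python
import random


def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_scalar(A, c):
    """Return A + c*I."""
    n = len(A)
    return [[A[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(x*I - A), via Faddeev-LeVerrier."""
    n = len(A)
    coeffs = [0] * n + [1]               # leading coefficient c_n = 1
    M = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = add_scalar(mat_mul(A, M), coeffs[n - k + 1])
        AM = mat_mul(A, M)
        # The division is exact because the coefficients are integers.
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) // k
    return coeffs


def evaluate_at(coeffs, A):
    """Evaluate the polynomial with these coefficients at the matrix A (Horner)."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = add_scalar(mat_mul(result, A), c)
    return result


random.seed(0)
A = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
zero = [[0] * 4 for _ in range(4)]
print(char_poly([[1, 2], [3, 4]]))            # [-2, -5, 1], i.e. x^2 - 5x - 2
print(evaluate_at(char_poly(A), A) == zero)   # True
```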

I learned this from

“Locally of finite type” is a local property

Previously we showed that morphisms locally of finite type are preserved under base change. We can use this to show that

(*) Given a morphism of schemes p: X \to Y, the preimage of any affine \text{Spec }A \subset Y can be covered by affines such that the corresponding ring maps are of finite type.

Alternatively, if we define a morphism locally of finite type to be one that satisfies (*), then what we are saying is that such a property can be checked on a cover; we can replace “any affine” with “an affine in a cover of affines”.

Let’s try to prove (*). First, we base change to \text{Spec }A. Since the morphism p^{-1}(\text{Spec }A) \to \text{Spec }A is also locally of finite type, we can cover \text{Spec }A by affines \text{Spec }A_i such that their preimages can be covered by the spectra of finitely-generated A_i-algebras B_{ik}. However, we don’t know if these are finitely-generated A-algebras! To fix this, we base change to even smaller affines. Cover each \text{Spec }A_i by basic open sets \text{Spec }A[f^{-1}] of \text{Spec }A (so f \in A and \text{Spec }A[f^{-1}] \subseteq \text{Spec }A_i). This gives us a cover of each \text{Spec }B_{ik} by basic open sets of the form \text{Spec }(A[f^{-1}] \otimes_{A_i} B_{ik}). Since A_i \to B_{ik} is of finite type, A[f^{-1}] \to A[f^{-1}] \otimes_{A_i} B_{ik} is of finite type. Since A \to A[f^{-1}] is clearly of finite type, the composite A \to A[f^{-1}] \otimes_{A_i} B_{ik} is of finite type, giving us the desired cover of p^{-1}(\text{Spec }A). The following diagram may be illustrative (every square is a pullback).

Group schemes and graded rings

In this post we will describe how an action of the multiplicative group scheme \mathbb{G}_m on \text{Spec }R defines a \mathbb{Z}-grading of R. A future post may describe how this relates to projective schemes. (I will do all of this using diagrams, but there may be some easier way using the functors of points). All this was taught to me by Mark Haiman in Math 256B (Algebraic Geometry) at UC Berkeley.

Fix a field k; we will work in the category of k-schemes. Thus R will be a k-algebra, and we will establish a graded k-algebra structure on R. However, none of our arguments change if we just let k be \mathbb{Z}. A group scheme is a group object in the category of k-schemes. A precise definition can be found here. Most importantly, group schemes can act on other schemes. The definition of a group scheme action can be found here. Note that all definitions are given by diagrams (or functor of points). For example, we specify the “identity element” of a group scheme by a map \text{Spec }k \to G, rather than selecting some point in the underlying topological space.

\mathbb{G}_m is defined as \text{Spec }k[s, t]/(st-1) = \text{Spec }k[t, t^{-1}] (for shorthand, we will write k[t, t^{-1}] as k[t^\pm]). As a variety, it can be thought of as k^*, the “punctured affine line”. Its group operation is given by a map \mathbb{G}_m \times_k \mathbb{G}_m \to \mathbb{G}_m which corresponds to the k-algebra map \mu: k[t^\pm] \to k[t^\pm] \otimes_k k[t^\pm] \cong k[t^\pm, u^\pm] defined by t \mapsto tu. The identity is given by a map \text{Spec }k \to \mathbb{G}_m corresponding to i: k[t^\pm] \to k defined by t \mapsto 1.

Suppose \mathbb{G}_m acts on \text{Spec }R. The action map \mathbb{G}_m \times_k \text{Spec }R \to \text{Spec }R corresponds to a k-algebra map \phi: R \to R \otimes_k k[t^\pm] \cong R[t^\pm] such that the following diagrams commute:


\xymatrix{R\ar[r]^{\phi}\ar[d]^{\phi} & {R[t^\pm]} \ar[d]^{id_R\otimes \mu}\\{R[u^\pm]} \ar[r]^{\phi \otimes id_{k[u^\pm]}} & {R[t^\pm, u^\pm]}}


\xymatrix{R\ar[r]^{\phi}\ar[d]^{id_R} & {R[t^\pm]} \ar[dl]^{i}\\ R}

For r \in R, write \phi(r) = \sum_{i=-\infty}^{\infty} r_it^i \in R[t^\pm], where almost all the r_i are zero. Then the first diagram implies that

(*) for every i, \phi(r_i) = r_it^i (in particular, if \phi(r) = r_it^i is just a single monomial, then \phi(r_i) = r_it^i).

This is because, along the top and right arrows, we have r \mapsto \sum r_it^i \mapsto \sum r_it^iu^i and along the left and bottom arrows we have r \mapsto \sum r_iu^i \mapsto \sum \phi(r_i)u^i; comparing the coefficients of u^i gives \phi(r_i) = r_it^i. Furthermore, the second diagram says that

(**) for all r, \sum r_i = r.

Therefore, letting R[t^\pm]_d stand for the degree d homogeneous component of R[t^\pm] (so that it consists of multiples of t^d), let R_d := \phi^{-1}(R[t^\pm]_d). If s \in R_d, then \phi(s) = s't^d for some s' \in R, and the second diagram forces s' = s, so \phi(s) = st^d. The sum of the R_d is therefore direct: if \sum s_d = 0 with s_d \in R_d, then applying \phi gives \sum s_dt^d = 0, so every s_d = 0. Furthermore, for an arbitrary element r, we have r = \sum r_i by (**), and by (*), we have that each r_i \in R_i. Thus R = \bigoplus R_i.

It remains to show that R_mR_n \subseteq R_{m+n}. But this is easy: if r_m \in R_m, r_n \in R_n, then \phi(r_mr_n) = \phi(r_m)\phi(r_n) = r_mr_nt^{m+n} \in R[t^\pm]_{m+n}, so r_mr_n \in R_{m+n} as desired.
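Here is a toy instance of the construction, as a Python sketch. The ring R = k[x, y] and the action with \phi(x) = xt and \phi(y) = yt^{-2} are hypothetical choices made for illustration (any pair of integer weights would do), and polynomials are stored as dictionaries from exponent pairs to coefficients. The code splits an element into its pieces r_d and checks (*), (**) and R_mR_n \subseteq R_{m+n} on an example.

```python
from collections import defaultdict

WEIGHTS = (1, -2)   # hypothetical action: phi(x) = x*t, phi(y) = y*t^(-2)


def coaction(poly):
    """Return {d: r_d} with phi(r) = sum_d r_d t^d, for r given as {(i, j): coeff}."""
    pieces = defaultdict(dict)
    for (i, j), c in poly.items():
        d = WEIGHTS[0] * i + WEIGHTS[1] * j   # power of t attached to x^i y^j
        pieces[d][(i, j)] = c
    return dict(pieces)


def multiply(p, q):
    """Multiply two polynomials in x, y."""
    out = defaultdict(int)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


r = {(2, 0): 3, (0, 1): 5, (4, 3): -1, (1, 0): 7}   # 3x^2 + 5y - x^4y^3 + 7x
pieces = coaction(r)
print(sorted(pieces))                                # [-2, 1, 2]

# (**): the pieces r_d add up to r.
total = {}
for piece in pieces.values():
    total.update(piece)
print(total == r)                                    # True

# (*): each r_d is homogeneous, i.e. phi(r_d) = r_d t^d.
print(all(coaction(piece) == {d: piece} for d, piece in pieces.items()))  # True

# R_2 * R_{-2} lands in R_0.
print(list(coaction(multiply(pieces[2], pieces[-2]))))   # [0]
```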