Math 263x: Computational Techniques in Number Theory and Algebraic Geometry (Fall 2012)

Math 263x is a new “topics class” concentrating on some of the computational tools and techniques that can complement theoretical research in number theory, algebraic geometry, and related fields. We meet Mondays and Wednesdays from 1:00 PM to 2:00 PM in Room 222 of the Science Center.

If you find a mistake, omission, etc., please let me know by e-mail.

September 5: Introduction
September 10: Belyi maps and some of their uses
September 12: Start on computation of Belyi functions
September 17: More on Belyi polynomials etc.
September 19: Resultants
September 24: A cube minus a square
September 28 [sic]: Interlude on sorting and searching
October 1: Using multivariate (and usually p-adic) Newton's method
October 3: Wednesday, Oct. 3: Multivariate p-adic Newton, cont'd
[October 8: No class: University holiday (Columbus Day)]
October 10: positive- [usually 1-]dimensional families
October 15: Curves of genus 0 through 5
October 17: Hyperelliptic curves; curves of genus 1; a Weil-Belyi function on an elliptic curve (and parametrizing 5-torsion etc.)
October 22: Overview of complex reflection groups and their invariant rings (which give rise to highly symmetric curves)
October 24: The Hilbert-Molien series of a group representation; A4, S4, A5 acting on the Riemann sphere
Oct. 29: NO CLASS: Harvard cancelled all classes due to anticipated severe weather
October 31: Covariants of subgroups of GL2 related to A4 and S4
November 5: Covariants of subgroups of GL2 related to A5; overview of the exceptional reflection groups of dimension 3 and higher
November 7: Hurwitz quaternions, the W(D4) lattice, and the W(F4) invariants; Belyi functions parametrizing trinomials with interesting Galois groups
Nov. 12 and 14: NO CLASS: I'll be in Lausanne, Switzerland
November 19: Introduction to (classical (elliptic)) modular curves
[November 21 is the start of Thanksgiving break]
November 26: Fricke and Atkin-Lehner involutions of X0(N) and some of their uses
November 28: Non-Atkin-Lehner elements of the normalizer of Γ0(N)



Wednesday, Sep. 5: Introduction

After outlining the general purpose and spirit of the class, we give an example that illustrates some of our concerns in a context that does not require most of the background that will be freely assumed later in the semester. The example is Fermat's celebrated two-squares theorem: a prime p can be written as a sum of two distinct squares if and only if p≡1 mod 4. The representation is unique up to switching the two summands. So take say p = 31415926535897932384626433832795028841. Fermat promises an essentially unique solution to the Diophantine equation p = x2 + y2.

HOW TO ACTUALLY FIND THIS SOLUTION?

Trying all x < p1/2 works in finite time, but not “finite enough” even with the computer (and if/when the computers catch up I can double the number of digits in p…). One proof of the theorem almost yields an efficient algorithm, using an idea attributed to Cornacchia (1908): x/y is a square root of −1 mod p, and conversely given such a root we recover (x,y) in time ≪logcp by lattice reduction (which in two dimensions is basically the Euclidean algorithm). All the ingredients we used are already implemented in packages such as gp, so the resulting algorithm can be expressed by a one-liner such as

fermat(p) = qflll([lift(sqrt(Mod(-1,p))),p;1,0])[1,]

[Victor Miller 1992, transcribed some time later into the new gp syntax]. So for instance

fermat(p) = qflll([lift(sqrt(Mod(-1,p))),p;1,0])[1,]
#
fermat(31415926535897932384626433832795028841)

returns [4223562448517994405, -3684758713859920604] in about  0 ms. (and this would even be feasible, if arduous, to do by hand).

Why did we write that this analysis “almost yields an efficient algorithm”? Well, how do we find the square root mod p? An embarrassment: it's easy to evaluate the Legendre symbol, but if it's +1 we generally don't know how to get a square root in deterministic polynomial time unless we assume the extended Riemann hypothesis for the Legendre character mod p — though we can do it in “random polynomial time”. However, modular square roots of small numbers can be evaluated in polynomial (albeit not practical) time by using the arithmetic of elliptic curves mod p ! That was the application Schoof gave for his algorithm [ = René Schoof: Elliptic Curves over Finite Fields and the Computation of Square Roots mod p, Math. of Computation 44, pages 483–494 (1985)] for counting rational points on an elliptic curve mod p. In our case we count points on the curve Y2 = X3 − X, which is relevant because it has complex multiplication by a square root of −1: the count is p+1±2x or p+1±2y, from which we recover the two-square representation in determinstic polynomial time.


Monday, Sep. 10: Belyi maps [unramified covers of P1 − {0,1,∞}] and some of their uses

See Serre's Topics in Galois Theory (Boston: Jones & Bartlett 1992) for the application to the inverse Galois problem (perhaps the best-known arithmetic application) and other results concerning Belyi functions. In algebraic geometry, such functions might be most famous for the equality case in the Hurwitz bound of 84(g−1) on the number of automorphisms of a Riemann surface (a.k.a. algebraic curve over C) of genus g>1: if C attains this bound, or more generally has more than 12(g−1) automorphisms, then the quotient map CC/Aut(C) is a Belyi function.

Such functions appear surprisingly often in other contexts; one of these years I might write an article on the ubiquity of Belyi functions. For now, I give references and/or links to some of the places where I've run across Belyi functions over the years:

• ABC implies Mordell, International Math. Research Notices 1991 #7, 99–109 [bound with Duke Math. J. 64 (1991)].
The Klein Quartic in Number Theory (1998, in the MSRI volume The Eightfold Way on Klein's quartic curve x3y + y3z + z3x = 0)
• “slides” from a 1999 talk at MSRI on “Other Arithmetic Manifestations of Branched Covers
Shimura curve computations (1998) [especially the curves associated to groups commensurate with arithmetic triangle groups]
Rational points near curves and small nonzero |x3y2| via lattice reduction (2000) [see the start of Section 4, pages 22–25; some of the other material here will figure later in the course]
• Trinomials ax7+bx+c and ax8+bx+c with Galois Groups of Order 168 and 8·168 (with Nils Bruin), Lecture Notes in Computer Science 2369 (proceedings of ANTS-5, 2002; C.Fieker and D.R.Kohel, eds.), 172–188.
• My HCMR article on “The ABC’s of Number Theory” (2007)


Wednesday, Sep. 12: Start on computation of Belyi functions

A bit more detail on the topology: a Belyi map CP1(C) of degree n is determined by permutations g0, g1, g that satisfy g0g1g=id and generate a transitive group G of permutations of the n sheets. This group is then Galois group of the Galois closure of the function-field extension C(C) / C(t) associated to the cover (where t is a coordinate on P1(C)). Warning: if the cover is defined over a field F that is not algebraically closed then one might have to first take an extension of this ground field before obtaining a function-field extension with Galois group G; this is already seen for the (2-point!) Belyi cover t = zn if n > 2 and F is a field like Q that does not contain the nth roots of unity. Also, the gi are defined only up to conjugation in the normalizer of G in Sn. Distinct solutions might still be algebraically conjugate because the generators of π1(P1(C) − {0,1,∞}) are not canonical. The number of solutions of g0g1g=id given the G-conjugacy classes of the gi can be computed from the character table (see again Serre), though checking whether a given solution actually generates G can be trickier. It has been done for enough examples to show (together with more theory about fields of definition, plus Hilbert's specialization theorem) that every sporadic group except possibly M23 is the Galois group of infinitely many extensions of Q! Some of these extension are so big that we don't expect to ever see them, but for smaller groups like M11 (and interesting non-sporadic groups) we can actually compute the Belyi covers and specialize to find explicit extensions.

Start on computing explicit examples. We'll start with some of the simpler cases, where C is rational g=id is an n-cycle so the map is a polynomial. Some of these cases we have already seen: the two-point covers t = cxn (where g1 is trivial) and t = cxa1(1−x)a2 where a1+a2=n and c is chosen to put the image of the critical point a1/n at t=1. Thus g0 is the product of cycles of length a1 and a2, and g1 is a simple transposition.

Next we might make g0 the product of three cycles, of lengths a1, a2, and a3, so we have t = c xa1 (x−1)a2 (x−w)a3 and must also choose the parameter w. For generic w, there are two critical points other than 0, 1, and w, namely the roots of the quadratic in the numerator of a1/x + a2/(x−1) + a3/(xw). There are two ways to make this a Belyi function: either the roots coincide, in which case g1 is a 3-cycle, or they are distinct but mapped to the same t, making g1 a double transposition. The former is simpler, so we'll start with that case. The roots coincide if and only if the discriminant of the quadratic in x vanishes. We calculate that this discriminant is (a1+a2)2 w2 − 2(a1na2a3) w + (a1+a3)2. So there are two solutions of w, and indeed we can see directly that there are two solutions of g0g1g=id up to conjugacy in the specified conjugacy classes, provided the ai are distinct. If not, there's a single solution, but we might not be able to put the zeros at x=0, 1, w because two (or all three) might be algebraic conjugates. Example: n=5, a1=3, a2=a3=1 yields

x3 (x2+15x+60) = (x+6)3 (x2−3x+6) − 64
so −x3(x2+15x+60)/64 [or equivalently x3(x2−15x+60)/64] is a quintic Belyi polynomial with group A5. In any case the solutions of the quadratic in w cannot be rational, or even real, beacuse the discriminant of that quadratic is −16a1a2a3n < 0.

In general if g0 is the product of m cycles and g1 is an (m+1)-cycle (with the other nm−1 points fixed) we expect m! distinct Belyi maps assuming the cycle lengths are pairwise distinct, but a unique map if all but one of the cycle lengths is the same. Exercise: Show how to find the corresponding unique map algebraically; what happens if m|n and all m cycles are of the same length?


Monday, Sep. 17: More on Belyi polynomials etc.

POSTSCRIPT to the three-cycle analysis last time: one could also see directly that such a polynomial cannot exist over R, because its logarithmic derivative a1/x + a2/(x−1) + a3/(xw) would be a decreasing function of x and thus couldn't have a real critical point. On the other hand, it is possible to have real, and even rational, solutions of “−16a1a2a3n = square” if we allow some ai to be negative, and this yields Belyi functions of a different kind, where g1 is still a 3-cycle but each of g0 and g is a product of two cycles, say of lengths a1, a2 and b1, b2 where a1+a2 = b1+b2 = degree of the rational function. The Diophatine equation

a1+a2 = b1+b2,       a1a2b1b2 = d2
gives a double cover of P2 that is a rational Del Pezzo surface; for instance, we may choose any rational numbers for r:=a1/a2, s:=b1/b2 subject to rs=square, and then solve a1+a2=b1+b2. The first few nontrivial solutions give three Belyi functions of degree 10, with {a1, a2} and {b1, b2} any two of {1,9}, {2,8}, and {5,5}. For instance, if we choose {1,9} and {2,8} then our function has the form P(x) = x9(x+w)/(x+1)2 for some w, and then the condition that P’/P have a double root yields w=2 or w=50/49. [NB in each case w and w−1 are S-units for S={2,3,5,7}, as expected by Beckmann.] Exercise: Verify these values of w, and check that in the first case we obtain an identity
x9(x+2) + (39/28) (x+1)2 = (2x+3)3 · septic(x)
where the septic has S-unit discriminant (indeed a {2,3}-unit in this case). Obtain the analogous identity for w=50/49, and/or for one or more of the functions with b1=b2=5.   END POSTSCRIPT

Before proceeding to the case that g1 is a double transposition, consider the generalization where g1 is an (m+1)-cycle and g0 is the product of m+1 cycles of lengths a1, a2, …, am+1. That is, P has m+1 distinct roots of multiplicities a1, a2, …, am+1, and P−1 has a root of multiplicity m. As before, this is equivalent (modulo scaling P) to the condition that the logarithmic derivative P’/P have a root of multiplicitly m (note that the numerator of P’/P is a polynomial of degree m).

Suppose first that the ai are distinct. Then the roots of P are in the field of coefficients of P. As before, once m>1 this field cannot be Q, or indeed any subfield of R, because P’/P is monotone decreasing; but we can still ask to compute those roots as algebraic numbers. There are m! possibilities: starting with the n-cycle, we must choose m+1 of its vertices to divide the circumference into segments of lengths ai in any order, and there are m! choices up to rotation along the cycle. (If the points moved by g1 were not in cyclic order on g then g0 would have fewer than m+1 cycles and the covering curve would have positive genus.  Cf. the extensive literature on Grothendieck's “dessins d’enfant”.) So we write

P(x) = xa1 Π1≤im (x+wi)ai+1
for some distinct nonzero wi. [Note that we do not insist on scaling these to put w1 at 1, to retain the symmetry among the roots of P, which is parametrized by the point (w1 : w2 : … : wm) in projective (m−1)-space; permutations of the roots act on this space by projective linear transformations, and the subgroup that fixes the first root acts by coordinate permutations.] Then the numerator of P’/P is a homogeneous polynomial of degree m in x and the wi. It soon follows that the condition that this numerator be an m-th power amounts to m−1 homogeneous equations in the wi, of degrees 2, 3, …, m. Since we already know to expect m! = 2 · 3 · · · m solutions, these solutions must constitute the complete intersection of the corresponding m−1 hypersurfaces in Pm−1. [Interlude about resultants, elimination theory, Gröbner bases, etc. went here; more about resultants next time.]

Recall that we postponed till later the case that g0 is the product of only three cycles, so t = c xa1 (x−1)a2 (x−w)a3 for some w (where a1, a2, and a3 are the cycle lengths), but g1 is a double transposition rather than a 3-cycle. In this case the count of solutions of g0g1g=id grows with the degree n even without increasing the number of cycles in g0, so we expect that typically w and thus P will have to be in ever larger number fields, but can still ask how to compute them.

In this setting the critical points x1, x2 at the roots of a1/x + a2/(x−1) + a3/(xw) are distinct but satisfy P(x1) = P(x2) [and we can normalize the common value c to 1 by multiplying P by c−1 to put the third branch point at t=1]. Let Q(x) be the quadratic polynomial with roots x1, x2. We could solve Q(x)=0 to find x1, x2 as algebraic functions of w, and then work out what P(x1) = P(x2) means as an equation for w; equivalently, and less laboriously, we could ask that the remainder of P mod Q be a constant polynomial: the linear coefficient is some function of w, which we would set to zero. But this would still yield an unnecessarily complicated equation, because the values of w that make x1=x2 would arise as spurious solutions. Instead we exploit the fact that P must be congruent to a constant not just mod Q but even mod Q2, and this condition would fail for x1=x2 (when P is congruent to c only mod Q3/2). So we'll use for our equation the vanishing of the x3 coefficient of P mod Q2. Exercise: is the vanishing of this coefficient already enough to assure that P mod Q2 is a constant polynomial, or might we have to then impose the further conditions that the x2 and x coefficients vanish as well?

Consider for example the case that n=6 and (a1, a2, a3) = (4,1,1). As we already saw, the coincidence a2=a3 lets us simplfy P to the form Cx4(x2+ax+b) for some nonzero C and parameters a, b determined up to scaling to λa, λ2b. Even so, we expect two inequivalent solutions [corresponding to the two isomers of cyclohexadiene]. This is one of the simplest cases with a double transposition. Here it turns out that for each solution the Galois group is properly contained in S6 (though it cannot be the alternating group A6 or a subgroup of A6 , because g0 and g are odd permutations). When the two “double bonds” are adjacent, we get GL2(F5) (a.k.a. the (3-)transitive copy of S5 in S6, a.k.a. the image of the points stabilizer in S6 under an outer automorphism of S6); when the two “double bonds” are opposite, we have the imprimitive 48-element subgroup of S6, isomorphic to the symmetrics of the octahedron acting on its six vertices as the stabilizer of the partition into three opposite pairs.

Indeed we find that Q(x) = 6x2 + 5ax + 4b, and then that the x3 coefficient of P mod Q2 is 144ab − 100a3. Thus either a = 0 or 36b = 25a2. The solution a = 0 makes P a polynomial in x2, which corresponds to the imprimitive solution. Thus the other case must be the GL2(F5) cover. A convenient choice of scaling is (a, b) = (6,25), giving the identity 27x4(x2+6x+25) = (3x2−12x+20) (3x2+15x+50)2 − 50000.

Exercises:
i) Since we just got a sextic cover with Galois group S5, there must also be a Belyi map of degree 5 giving the same Galois closure. The cycle structures are 32 / 41 / 221 (corresponding to the S6 cycle structures 6 / 411 / 2211). Find the Belyi map.
ii) What happens for the Belyi polynomials of degree 7 for which g1 is a double transposition and g0 has shape 331 or 421?


Wednesday, Sep. 19: Resultants

POSTSCRIPT on some Belyi polynomials of degree at most 7 for which g1 is a double transposition: the examples of degree 6 aren't quite the smallest, but we've in effect seen all those of smaller degree, with g0 and g1 interchanged (which switches P with 1−P). Thus the only degree-4 possibility is 4 / 211 / 22, where g0 is a simple transposition, and in degree 5 one of the two possibilities is 5 / 311 / 221, where g0 is a 3-cycle. This leaves only 5 / 221 / 221, but when g0 and g1 are both involutions G must be dihedral and we get a cover equivalent to a Čebyšev polynomial. Here we can calculate directly the identity x (x2+5x+5)2 + 4 = (x+4) (x2+3x+1)2; translating x by 2 yields the equivalent form (x−2) (x2+x−1)2 + 4 = (x+2) (x2x−1)2; and then replacing x by 2x and dividing P by 2 yields the fifth Čebyšev polynomial 16x5 − 20x3 + 5x which is ramified above ±1, each of which has one simple and two double preimages.
As for degrees 6 and 7…
• In degree 6, besides the cases with cycle structures 6 / 411 / 2211, there's also 6 / 222 / 2211 and 6 / 321 / 2211. The former again has G dihedral (and also imprimitive in this case, since 6 is composite), giving rise to a Čebyšev polynomial. For the latter, there are three solutions, all generating the alternating group A6: writing P(x) = cx3 (x+1)2 (x+w), we compute that w must be one of the three roots of 25w3 − 12w2 − 24w − 16, an irreducible polynomial that happens(?) to give the field Q(21/3); explicitly w = 24/3 / (3−21/3). [gp's nfdisc reported that disc(Q(w)/Q) = −108, which sufficed to identify the field with Q(21/3); there are various ways then to find w as an element of that cubic field, and then it's known that for any cubic extension K/F any two elements of K\F are related by a unique fractional F-linear transformation that can be found by linear algebra.]
• 7 / 331 / 22111: here G is necessarily the 168-element group; the polynomial is defined over Q((−7)1/2). The Galois closure is the Klein quartic, and x is a rational coordinate on the quotient of this quartic by one of the two kinds of index-7 subgroups of G (both isomorphic with S4 and switched by an outer automorphism of G). While the Klein quartic can be defined over Q, its automorphism group cannot. Again I refer to
my article on the Klein quartic.
• 7 / 421 / 22111: here
P(x) = cx4 (x+1)2 (x+w) and there are four possibilities for w, but they are not all conjugate: two are the roots of 2w2+w+1, which again generate Q((−7)1/2), and the others are roots of 27w2−18w−25 and generate Q(211/2). Indeed there are two possible Galois groups, G168 and the full alternating group A7; which one corresponds to which pair of w's? [...]
END POSTSCRIPT

Motivation, definition, and properties of resultants of univariate polynomials, which we'll use to eliminate one of two variables when we've brought one of these calculations down to solving two simultaneous nonlinear equations. The Sylvester matrix of P and Q has corank equal the degree of gcd(P, Q), as can be seen by identifying the row kernel with {(A, B) : deg(A) < deg(Q),  deg(B) < deg(P),  AP+BQ = 0}. If ξ is a common zero of P and Q then the column vector (ξn−1, ξn−2, …, ξ2, ξ, 1) (where n = deg(P) + deg(Q) is the matrix size) is in the kernel. This fully accounts for the kernel if gcd(P, Q) has distinct roots. What happens if there are some roots of multiplicity 2 or greater?

Exercise: Use this to find the Belyi polynomials of degree 11 with cycle structures 3312 and 2413 (a.k.a. 33311 and 2222111) above 0 and 1. [Start by writing P(x) = C (x3+ax+b)3 (x2+cx+d) ≡ constant mod Q2 where Q(x) = P’(x) / (x3+ax+b)2.] What's going on?


Monday, Sep. 24: A cube minus a square

POSTSCRIPT re degree-11 Belyi polynomials: for 11 / 3312 / 2413 there are 10 possibilities, in two Galois orbits of sizes 2 and 8. The larger gives G = A11, but the pair gives functions defined over Q((−11)1/2) for which G is the simple group of 660 elements, isomorphic with PSL2(F11). [This is the last of Galois' list of subgroups of index p in PSL2(Fp), here isomorphic with A5; we already saw the S4 and A4 cases, for p=7 and p=5 respectively. There's also the case of the index-6 subgroup of PSL2(F9) ≅ A6.] Curiously a very similar thing happens if we change the cycle structure of g0 to 11 / 4213 / 2413, as Andrew Sutherland tried: again 10=2+8 possibilities with the larger orbit giving G = A11, and the smaller orbit defined over Q((−11)1/2), but here the quadratic orbit gives G = M11, the smallest Mathieu group! Both of these appear in the list of Galois groups of polynomials in Müller's paper: see the first item under (ii) and the second under (iii) in the statement of the main theorem (between pages 3 and 4). The appearance of Q((−11)1/2) in both cases is no coincidence: it ultimately comes down to the fact that in both PSL2(F11) and M11 the 11-cycle g is conjugate with its k-th power iff k is a quadratic residue of 11, even if we consider conjugacy in the normalizer of the group in S11. The quadratic extension Q((−11)1/2) arises as the subfield of the 11th cyclotomic field Q(μ11) fixed by the squares in Gal(Q(μ11) / Q) = (Z/11Z)*.
END POSTSCRIPT

We begin our consideration of the important special case where g0 and g1 are 3- and 2-cycles with few or no fixed points, corresponding to A + B = C where B and C are nearly a square and nearly a cube (“nearly” = within low-degree factors). Such identities arise in connection with Hurwitz curves (and other highly symmetric Riemann surfaces), and also in connection with Hall's conjecture (which asserts that for integers x, y either x3 = y2 or |x3y2|ε x½−ε). Rather more structurally, because the discriminant of a cubic X3+pX+q is −(4p3+27q2), pairs (P, Q) of polynomials for which P3Q2 has many repeated factors often appear in connection with elliptic and modular curves and elliptic surfaces.(*) Sometimes these connections overlap: the identity (x2+10x+5)3 − 1728x = (x2+4x−1)2 (x2+22x+125) yields (via a “Pell equation”) the only known infinite family of solutions of 0 ≤ |x3y2|x½ [Danilov 1982]; the associated Belyi map, with exponents 51 / 33 / 2211, also gives the cover X0(5) → X(1) of modular curves (a connection noted in [NDE 1998]: if x = 125(η(5τ) / η(τ))6 = w5(η(τ) / η(5τ))6 then j(τ) = (x2+10x+5)3/1728x); and the Galois closure X(5) is the icosahedral Galois cover P1P1/A5P1.

(*) Recall that any cubic polynomial X3+aX2+bX+c can be brought into the form X3+pX+q using the change of variable XXa/3, though sometimes another choice of translate is more useful. For example, Hall's identity with deg(P)=8, deg(Q)=12, and deg(P3Q2)=5 has P(x) = 4(x8 − 2x7 + 7x6 − 6x5 + 11x4 + 4x3 + 12x + 1) [without the factor of 4 the polynomial Q would not quite be in Z[x]; Hall actually used P(x+1), which has larger coefficients]. Then Q is the truncation of the Laurent expansion of P3/2 about x = ∞ (the xk coefficients in this expansion vanish for 1 ≤ k ≤ 6). But it's much simpler to write the corresponding cubic as X3 + (x4x3+3x2+1) X2 − 2(x3x2+2x) X + (x2x+1), and this actually reflects a natural approach to finding these P and Q.


Friday, Sep. 28: Interlude on sorting and searching

[There was no class Wednesday the 26th due to the Yom Kippur fast, so I offered to lecture Friday instead; lacking a quorum to proceed with Newton's method, I substituted a tangential lecture on some applications of sort-and-search in computational number theory]

A basic problem: given a list xi (1<i<N), find all coincidences xi = xj with i ≠ j. This seems to require (N 2N)/2 pairwise tests, but in fact can be done in time O(N log N) by first sorting the list and then looking for consecutive matches. The key fact that sorting takes time only O(N log N) is nontrivial but has been known for some time (see e.g. the beginning of Volume III of Knuth's The Art of Computer Programming). So is the fact that O(N log N) is optimal if each step can only compare two list elements (because it takes at least log2(N!) comparisons to distinguish all N! possible orderings; see “bucket sort” and, rather more facetiously, “spaghetti sort” for models of computation that allow for O(N) sorting). This reduction from order N2 to N log N makes sorting a powerful computational tool in various contexts, despite the additional space cost (usually all N of the xi must be stored at once, even when xi is given by a function of i, when it takes practically no space to test xi = xj for each pair {i, j}). A typical application is finding all solutions of f (i) = f (j) with 1≤i<jN, or all solutions of f (x)=g(y) with x and y in different sets. Unix utilities such as sort, uniq, and comm can be very helpful here because they implement the relevant algorithms in an efficient and often convenient form; when the files aren't in quite the right format, grep, sed, and (for more complicated tasks) editors like vi and emacs, can provide the needed pre/inter/post-processing. For example, it is easy to find in a wordlist two common words that are the same except that one begins with C and the other with U (and are more familiar and longer than crease/urease). [Here's the solution.]

Besides its application to this kind of wordplay, fast sorting finds use in some computational problems in number theory. A typical example is the search for elliptic curves E/Q with many small integral points (“small” meaning comparable with the coefficients of E; that is, xH2 and yH3 if the coefficients ai are O(Hi)). In this ANTS-6 (2004) paper with Mark Watkins we did this as a heuristic proxy for large Mordell-Weil rank. The factorization trick to (in essence) find y, y and B given x, x and A in x3 + Ax + By2 = x3 + Ax’ + By2 = 0 will recur later this term when/if we reach parametrizations of elliptic surfaces with two sections. Another kind of application is to searching for solutions of Diophantine equations of the form  f (x, y) = g(z, t). When  f  is itself of the form  f (x, y) = f1(x) + f2(y), and the same is true of g, there are further nontrivial improvements that reduce the space requirement from the expected order of N 2 down to N; see for instance Dan Bernstein's sortedsums page. (It still takes N 2 time, though there's no factor of log N because the lists of f1, f2, g1, g2 values can be pre-sorted in time only N log N.)


Monday, Oct. 1: Using multivariate (and usually p-adic) Newton's method

For a paradigm we shall use Hall's identity, already exhibited at the end of last Monday's notes: deg(P3Q2)=5, where P is the octic P(x) = 4(x8 − 2x7 + 7x6 − 6x5 + 11x4 + 4x3 + 12x + 1), and Q is the truncation of the Laurent expansion of P3/2 about x = ∞, whose xk coefficients in this expansion vanish for 1 ≤ k ≤ 6. This is the first primitive example of an ABC identity deg(P3Q2) = m+1 with deg(P) = 2m and deg(Q) = 3m; this can also be seen combinatorially from the structure of permutations g0, g1, g of 6m objects that satisfy g0g1g=id and have cycle structures 32m, 23m, 15m+1(5m−1); they correspond to trees on 2m vertices (the 3-cycles), each of degree 1 or 3, embedded in the plane (so each cubic vertex has an orientation, and therefore not quite the same as the azanes Nm−1Hm+1, which do not come with a planar embedding — and much different from boranes, which turn out to be much more complicated than ball-and-stick chemistry would suggest). Hall's example has m=4; see below for m=1,2,3.

For any m, we can normalize P and Q to be monic, and require the Laurent expansion of P3/2 to match Q to within O(x1−2m); that is, the xk coefficient must vanish for 1 ≤ k ≤ 2m−2. Each coefficient is a polynomial in the 2m non-leading coefficients of P, call them ci (i = 0, 1, 2, … 2m−1), so it might seem we have 2 equations too few; but in fact there are only 2m−2 parameters for P once we account for the two-dimensional group of affine linear transformations (the ax+b group”). We can choose a canonical translation by setting the x2m−1 coefficient of P, and thus also the x3m−1 coefficient of Q, to zero (these are the next-to-leading coefficients; for m=1 this is the familiar “completing the square”). This leaves one dimension of equivalences, scaling x to ax for some nonzero a; this multiplies each ci by a2mi. We can thus interpret (c2m−2 : c2m−3 : c2m−4 : … : c1 : c0) as a point in (2m−2)-dimensional weighted projective space with weights (2, 3, 4, …, 2m−1, 2m), and are looking for a point in the intersection of (2m−2) hypersurfaces, with the vanishing of the xk coefficient corresponding to a hypersurface of weighted degree 2m+k.

We encountered such a problem before (see the end of the Sep.12 notes), where the number of solutions was given by Bézout's product formula. But here such a formula would grossly overestimate the number of solutions because there's an (m−2)-dimensional variety of spurious solutions (P, Q) = (R 2, R 3). Indeed it is not immediately obvious that there are non-spurious solutions; their existence follows from the topological construction of Belyi functions, but that still leaves the question of computing the polynomials explicitly. This is easy for the first few m, especially since the first few equations must be linear in the first few coefficients (counting from c0), which lets us solve for the first m/3+O(1) coefficients as rational functions of the remaining 5m/3 or so. But then we're left with a complicated system of nonlinear equations. m=4 is the first case where resultants fail us: the first equation has weight 13, so is linear in c0 (weight 8), but the next equation (weight 14) is already quadratic in c1 (weight 7), and there are still four more dimensions to go. What to do? [Since we're using this example as a paradigm, we pretend that we don't know how to exploit the special “cube minus a square” structure; but even those tricks give out before long, leaving us with even more complicated simultaneous equations in four or more variables.]

We'll exploit our foreknowledge that the solutions must be rational numbers. (We shall soon see how to adapt this method to solutions that are not necessarily rational but still algebraic of small degree.) We do not know in advance how complicated these rational numbers must be (in the present case we at least pretend not to know…), but we hope that they're simple enough that we can approximate them closely enough to guess the numerator and denominator, and then confirm the guess by checking that we have an exact solution of our system of equations. The complexity of a rational number c is measured by its height H(c): if c=r/s in lowest terms then H(c) = max(|r|,|s|). The number of solutions of H(c) ≤ M is asymptotically proportional to H2, so we need enough precision to tell at least that many numbers apart, i.e. more than 2 log H(c) digits; equivalently, c must be known to within O(H−2). Fortunately that necessary level of accuracy is also enough to recover r and s, and not just in theory but practically, using continued fractions, a.k.a. the Euclidean algorithm. This is a common enough application that it is often implemented as a pre-existing routine, see e.g. bestappr in gp.

If we have an approximation that's close but not good enough for this purpose, we can repeatedly improve it using Newton's method. Suppose we want to find a zero x* of a differentiable function F on d-dimensional space, and have an approximation x0 that's within ε of x*. Then, as in the familiar case d=1, we expect to get a better approximation by setting x1 = x0 − (F’(x0))−1F(x0). Note that F is a matrix-valued function. If this function is continuous, and F’(x*) is invertible, then x1 is within O2) of x* provided that ε was small enough. Iterating then gives us errors of order ε4, ε8, ε16, etc., doubling the precision at each step; and soon we reach enough precision to recover the rational coordinates of x*.

In practice we won't actually compute F’(x*), because F will typically be quite complicated (in our example there are rational functions of moderately high degree and a square root), making the formulas for its partial derivatives horrendous. Fortunately it's good enough to approximate them by the “difference quotient”: for each unit vector ej, replace the column F’(x0) ej of F’(x0) by ε−1(F(x0+ej)−F(x0)). This still requires d2 function evaluations, but they're usually much simpler functions than the entries of the formal derivative of F, and we're now spared the chore of working out this formal derivative.

But we still don't know how to find a close enough initial approximation x0 — and indeed it's hard to estimate just how close we must be since we have no handle on how close F’(x*) and its nearby values F’(x) are to being singular, and how fast F changes near x*. We often circumvent this problem by remembering a ubiquitous slogan of modern number theory: treat all completions of a number field on an equal footing to the extent possible. Here instead of the Archimedean completion R we'll work with a p-adic completion Qp, where numerical analysis is much simpler thanks to the non-Archimedean norm. Better yet, finding the initial approximation usually reduces to an exhaustive search mod p, which is often feasible past the range of Gröbner-basis methods (though this method too must fail eventually, being exponential in the number of variables). As long as F’(x*) is not just invertible but invertible mod p, Newton will work starting from (an arbitrary lift to Zp of) x* mod p, so that once we've found the solution mod p by exhaustive search, we approximate x* to p-adic precision 2N in N Newton steps.

Warning: even though P had quite small coefficients (absolute value at most 12), the weighted projective coordinates (c2m−2 : c2m−3 : c2m−4 : … : c1 : c0) are more complicated, e.g. even scaling ci by 4mi leaves us with (84 : 176 : 2366 : 13536 : 26884 : 218864 : 268777); and we cannot solve for individual ci, only for weight-zero expressions like c8 / (c2c6), which are more complicated yet. Fortunately it's easy to get enough p-adic precision. In hairier settings, we might first recognize a relatively simple weight-zero expression, which would give us a start on choosing a good scaling (in our illustrative example, c23 / (c32) = 9261/484 = 213/222 suggests (c2, c3) = (21, 22) as a good starting normalization). At the end it can still be something of an art to massage P to a more appealing form (12 vs. 268777).

As promised, here are the polynomials for m=1,2,3. They all have imprimitive Galois groups. In general if the Galois group of a cover CP1 is imprimitive then the cover factors as CC0P1. If the original cover was ramified only above 0, 1, ∞ then the same is true of C0P1, and this Belyi map may be of interest in its own right. In our setting the other map CC0 can always be taken to be the squaring or cubing map, which is possible if m+1 a multiple of 2 or 3 respectively.
• For m=1, the permutations generate a subgroup of S6 that fixes a partition of the six objects into three pairs. On the polynomial side, we can normalize the quadratic P to be monic with no linear coefficient, say P(x) = x2 + 2a for some nonzero a. Then Q(x) = x3 + 3ax and P3Q2 = 3a2x2 + 8a3. All these solutions are equivalent over C, but the equivalence class over Q depends on a mod (Q*)2. The three pairs permuted by the Galois group are pairs x} of roots of P3/Q2 = t. Writing P3/Q2 in terms of X:=x2 yields the degree-3 Belyi function (X+2a)3 / (X (X+2a)2) with cycle structures 3/21/21, equivalent to the cover of modular curves X0(2) → X(1).
• For m=2, the permutations fix a partition of 12 objects into four triples. Once we normalize the quartic polynomial P by translating x to kill the x3 coefficient, the quadratic and constant coefficients must vanish as well (check this directly!), leaving a polynomial we can write as a multiple of P(x) = x4 + 4ax. Then Q(x) = x6 + 6ax3 + 6a2 and P3Q2 = −(8a3x3 + 36a4). All these solutions are cubic twists of each other. Writing P3/Q2 as a function of x3 yields a degree-4 Belyi function with cycle structures 31/31/22, equivalent to the cover of modular curves X0(3) → X(1).
• Finally, for m=3, the group fixes a partition of the 6m=18 objects into nine pairs. The polynomisls were obtained by Birch in 1961: up to twist and scaling, P(x) = 36x6 + 24x4 + 10x2 + 1,   Q(x) = 216x9 + 216x7 + 126x5 + 35x3 + 21x/4, and P3Q2 = 9x4/2 + 39x2/16 + 1. Writing P3/Q2 as a function of x3 yields a degree-9 Belyi function with cycle structures 22221/333/711; since the corresponding permutations have orders 2, 3, 7, the Galois closure is a Hurwitz curve. It turns out to be the Fricke-Macbeath curve of genus 7, second-smallest after the Klein quartic, with 504 automorphisms that constitute the simple group SL2(F8). [While the Fricke-Macbeath curve can be defined over Q, the automorphisms can only be defined over the cyclic cubic extension Q(2 cos(2π/7)); the group Gal(Q(2 cos(2π/7)) /Q) acts on SL2(F8) via Aut(F8), giving a Galois extension of Q(t) with group ΣL2(F8) = SL2(F8) : Aut(F8).]


Wednesday, Oct. 3: Multivariate p-adic Newton, cont'd

Q: what if the coefficients we solve for weren't rational? Remember we've seen a few examples over fields like Q((−7)1/2), Q((−11)1/2), and Q(21/3).

A0: If we know the quadratic imaginary field in advance, just work over C rather than R, and use continued fractions to approximate the real and imaginary parts separately. If the field is unknown, but still quadratic imaginary, we can recover the real part and the square of the imaginary part from a close enough approximation, and then we're done.

BUT that doesn't help us for real quadratic or more complicated irrationalities…

(Added Oct.10) A1: OK, so find all the algebraic conjugates — which in effect is what we already did in the imaginary quadratic case (when an approximation to c automatically gives us an approximation to its complex conjugate). Then their elementary symmetric functions are rational numbers, and we've reduced to a previous problem. This is useful in other contexts too, as for computing “CM points on modular curves”, such as j-invariants of elliptic curves with complex multiplication (CM). For example, consider the j-invariant of an elliptic curve with CM by Z[(−58)1/2]. This is known to be an algebraic integer, here of degree 2 (the class number of Z[(−58)1/2]). Using the classical series j = q−1 + 744 + 196884q + … (or an existing implementation, e.g. ellj(sqrt(-58)) in gp), we find that one conjugate is j0 = 604729957825300085503.99999217152685675585…, but this is still not enough accuracy to recover the quadratic equation satisfied by j0. We could double the precision and try again, but it's easier to use the algebraic conjugate ellj(sqrt(-29/2)), which we compute as j1 = 24591258496.000007828473…: we now recognize the first elementary symmetric function j0+j1 as the integer 604729957849891344000 (NB we knew in advance that it's in Z, so there's no question of whether our precision is adequate), and that lets us bootstrap the accuracy of j1 to compute the product j0j1 as 14871070713157137145512000000000 [which happily factors smoothly as it should by Gross-Zagier: 212365953314931733], and thus solve for both CM values as 432000 (1399837865393267 ± 291/2 259943365786104). [Alternatively, knowing a bit more of the CM theory we could have predicted that j0j1 is a multiple of 291/2, or at any rate of the square root of some factor of 58, at which point even less precision is needed. Yes, there's good reason why j0, and thus also q−1=exp(581/2π) and j1, are individually much closer to integers than one would expect by chance: the pair (j0, j1) constitutes an integral CM point on the modular curve X0(2)/w2.]

BUT we're actually going to work p-adically, and then it's usually feasible to approximate one conjugate (see below) but not all of them unless the algebraic degree is very small (or we get very lucky).

We need a more powerful technique.

Overview of lattice basis reduction, accomplished in polynomial time by the Lenstra-Lenstra-Lovász algorithm, which for us is a tool for finding integer relations and similar Diophantine-approximation tasks. For us, we need it to recognize an algebraic number c of degree at most d from a good approximation, which is tantamount to finding an integer relation in 1, c, c2, …, cd, or in c and 1, a, a2, …, ad−1 if we already know that c is in the field generated by the algebraic number a of degree d. These algorithms are usually presented over R, but (at least in their incarnation as lattice-reduction techniques) work equally well (i.e. via Euclidean lattices of comparable parameters) over Qp, and indeed gp's lindep and algdep work for p-adic as well as real (or complex) numbers. If we know approximations to more than one algebraic conjugate of c, say δ conjugates, then we can use them all together, looking for integer relations on vectors of length δ. (The degree d can't be too large, but if it's so large that we cannot find c with a few thousand digits of precision then we usually didn't care much about this c as an algebraic number in the first place…)

Lattice reduction in dimension >2 can give rise to useful practical improvements even for problems that we already know how to solve using 2-dimensional lattice reduction (a.k.a. Euclid). For example, if we're trying to recover a point (c0 : c1 : c2 : … : ck) of height at most H in (unweighted) projective space Pk(Q), we need to know the point to within about 1 / H2 to apply continued fractions to (say) each ratio ci / c0, but in fact an H−(k+1)/k approximation suffices if we use all the ci together and apply LLL in dimension k+1. [NB halving the necessary precision can gain a factor of about 4 in the computing time. See my ANTS-4 paper for further improvements when we know a proper subvariety of Pk(Q) that contains our point.] Likewise for an element of (say) Q(i), where the real and imaginary parts usually have the same denominator to within a small factor; indeed in this case it's even better to search for a representation (a+bi) / (c+di) with a, b, c, d small.

Finally (and you may have been wondering about this for a while now), how do we know that our solution will have even one conjugate defined over Qp? Well in general we don't, and can claim only a technique that often works well in practice, not a provably efficient algorithm. (For “provably efficient” we'd also need an upper bound on the height of the solution.) Still, we expect — based first on a naïve probabilistic argument, and then on its corroboration by Čebotarev's density theorem — that for random p the expected number of conjugates defined over Qp should equal 1, so (barring bad luck) we should find one before p gets so large that we run out of patience or computing time. In any case, once we've found an algebraic solution by hook or by crook we can prove it is correct by simply checking that the desired identity holds — at least if an identity is all we're looking for, with no additional condition like the identification of G when it is not determined uniquely by the cycle structures and other available data.


Wednesday, Oct. 10: positive- [usually 1-]dimensional families

So far we've developed some techniques for finding (maps described by) isolated identities. But there's also considerable interest in identities that vary in positive-dimensional families. Our techniques adapt at least to low- (preferably 1-)dimensional families; some of the techniques we'll develop later this term also lend themselves to families of higher dimension.

Two examples of problems that naturally lead to a one-dimensional family of identities: covers of P1 branched at four points with a varying cross-ratio, or pairs E, E of elliptic curves related by a cyclic isogeny of degree N. The first example naturally generalizes to d-dimensional families of covers branched at d+3 points (and likewise covers of higher-genus curves, e.g. another kind of one-dimensional family involves covers of an elliptic curve branched above only one point). The d-dimensional moduli space of such a family need not be rational, and may be of interest itself. Indeed the moduli curves X0(N) of cyclic N-isogenies are a special case: we may assume (by composing by translation in the group law) that the isogeny E’→E takes the origin of E to the origin of E; then it is known that the isogeny commutes with the multiplication-by-(−1) maps on the two curves, and thus descends to a map E’/{±1}→E/{±1}. But that's a rational map P1P1 branched above the four ramified points of EE/{±1} (coming from the 2-torsion of E), whose Galois group is dihedral of order 2N (semidirect product of Z/NZ with {±1}), and conversely such a four-point cover lifts to a cyclic N-isogeny between elliptic curves. If N is odd, say N=2k+1, then each of the four monodromy generators has cycle structure 1·2k. (This modular curve also comes naturally equipped with a Belyi map j/1728, of degree roughly N — the exact formula is N Πp|N(1+p−1), the product running over prime factors of N without multiplicity.)

Now when we have an isolated identity we can try to find it by using Newton to bootstrap from an approximate solution. That doesn't work in a positive-dimensional family: it might actually be easier to find an initial approximation, but with fewer equations than unknowns it doesn't make sense to speak of to define the exact solution that it approximates. Even if we do know a special exact solution (e.g. the 13-isogeny 3+2i from the elliptic curve y2 = x3x to itself), that's not enough to describe the full family.

[...]


Monday, Oct. 15: Curves of genus 0 through 5

So far we've made sure that all our Belyi covers are rational curves; but that's not always the case, nor the only interesting case. Before giving some examples of Belyi maps on non-rational curves, we need to give (or recall) enough of a description of such curves to understand the form of explicit equations defining the curves and rational functions on them. We'll stop at genus 5, which is the last case that a generic curve is a complete intersection in projective space, namely the zero-locus of a three-dimensional space of quadrics (homogeneous polynomials of degree 2) in P4. For genus 4, it's the intersection of a quadric and a cubic in P3; for genus 3, a quartic curve in P2. In each of these cases the only non-generic exceptions are hyperelliptic curves; in genus 2 all curves are hyperelliptic. It's actually the more familiar cases of genus 0 and 1 that can get tricky if we want to work over fields like Q or (for families of curves) C(t) that are not algebraically closed. [This and most of what follows is standard (neo)classical algebraic geometry; standard references include several books co-authored by Joe Harris, and at Harvard it may be even more convenient to ask a student of Joe Harris. ]

genus 0: over C it's just the projective line (a.k.a. Riemann sphere), but over a field that's not algebraically closed (nor finite) even genus-zero curves needn't be trivial. Such a curve is always a smooth conic in P2 (embedded by the space Γ(−K) of anticanonical sections, which has dimension 3 by Riemann-Roch); thus the curve is rational iff it has a rational point ("if" by the usual slope parametrization, and the converse is trivial). More generally a genus-zero curve is rational iff it has a divisor D of odd degree: "if" because some D+cK has degree 1, and is effective by Riemann-Roch; and again "only if" is trivial. All our genus-zero covers so far came with such a D, indeed a distinguished point, which could be described as the unique multiplicity-m preimage of t for some m≥1 and t in {0,1,∞} (being careful that we're not in a case where we've had to make two or all three branch points algebraic conjugates). But that need not be the case in general. For example, there's a unique action defined over Q of the symmetric group S4 on a curve of genus zero; but the curve cannot be rational over Q, or even over R, because then (by the usual averaging argument) S4 would be contained in O2(R)/{±1}, and thus would contain a cyclic group with index at most 2. [This could also be checked by calculating the Schur indicator of the 2-dimensional irreducible representation of the relevant central extension 2S4, or indeed of its subgroup isomorphic with the 8-element quaternion group.] So the genus-zero curve must be "pointless"; explicitly it is the conic x2+y2+z2=0, with S4 acting by signed coordinate permutations. [... quaternion algebras like Hurwitz {2,∞}, Br2, ...]

genus 1: again, over an algebraically closed field such as C we have a familiar picture, this time an elliptic curve C, since there must be a rational point P0. More generally, any divisor D of positive degree is still effective by Riemann-Roch, and if deg(D)=1 then D∼(P0) for some rational point P0. We can then use the sections of 3P0 to embed C in P2 as a cubic in Weierstrass form y2 + a1xy + a3y = x3 + a2x2 + a4x + a6, calculating the coefficients ai by comparing Laurent expansions about P0 as usual. Sometimes (especially when C arises as a modular curve such as X0(11)) it is more convenient to start from a degree-2 function x on C and a holomorphic differential ω, and then set z = dx/ω, which is anti-invariant under the involution of C satisfying x ι = x and is regular away from the poles of x, and thus satisfies an equation z2 = P(x) for some polynomial P of degree 3 or 4 according as x has one double pole or two simple poles. On modular forms, the q-expansions of modular forms often give a convenient handle on rational functions and holomorphic functions. Here we may tell Sage:

ModularForms(11,prec=14).echelon_basis()
to get the q-expansions of a basis of the modular forms on X0(11) to within O(q14), and get the result
[
1 + 12*q^2 + 12*q^3 + 12*q^4 + 12*q^5 + 24*q^6 + 24*q^7 + 36*q^8 + 36*q^9 + 48*q^10 + 72*q^12 + 24*q^13 + O(q^14),
q - 2*q^2 - q^3 + 2*q^4 + q^5 + 2*q^6 - 2*q^7 - 2*q^9 - 2*q^10 + q^11 - 2*q^12 + 4*q^13 + O(q^14)
]
in which the second generator, call it φ1, is a cusp form and thus yields a holomorphic differential ω = φ1dq/q. The ratio φ0/φ1 then gives us a rational function x = q−1 + 2 + 17q + 46q2 + 116q3 + 252q4 + 533q5 + 1034q6 + 1961q7 + 3540q8 + 6253q9 + 10654q10 + 17897q11 + O(q12), and we compute z = q (dx/dq) / φ1 = −q−2 −2q−1 + 12 + 116q + 597q2 + 2298q3 + ···. Comparing coefficients (or doing something like
Z=subst(z^2,q,serreverse(1/x)); subst(truncate(Z),q,1/X)
in gp) then gives us the equation z2 = x4 − 4x3 − 88x2 − 300x − 304 = (x+4) (x3−8x2−56x−76) for X0(11). We can then project the rational zero x = −4 to infinity and normalize the leading coefficient of the resulting cubic (i.e., replace x by (−11/x)−4 and absorb the factor (22/x2)2 into z2) to get a Weierstrass model y2 = x3 + 14x2 + 55x + 121/4; the standard form y2 + y = x3x2 − 10x − 20 is then recovered by translating x by −5 and y by ½. (We'll hopefully come back to questions such as where the q-expansions of φ0 and φ1 come from, and how the curve X0(11) actually parametrizes 11-isogenies.)

But what if there is no divisor of degree 1? Any genus-1 curve C comes with a divisor D of some positive degree d, and Γ(D) has dimension d, giving a map to Pd−1 that is an embedding as an “elliptic normal curve ” for d≥3 (for d=2 it's a 2:1 map, giving a model of C of the form y2 = quartic(x)). As with curves of increasing genus, elliptic normal curves of increasing degree d get increasingly complicated, and are complete intersections only for the smallest few cases; here these are d=3 (a plane cubic) and d=4 (the intersection of two quadrics). Fortunately the d≤4 cases are most if not all the ones we'll encounter. Still even those cases are much more complicated than the plane conics that are as tricky as genus-zero curves get. Two famous examples already for d=3 are x3+py3+p2z3=0 for p prime (no p-adic solution) and 3x3+4y3+5z3=0 (Selmer's example of a cubic with no rational points even though there is no local obstruction). [To be continued Wednesday…]

genus 2: Once g≥1, the curve C is of general type (the canonical divisor is positive). The space of holomorphic differentials gives a map, the canonical map, to Pg−1, which is either an embedding or a 2:1 map. In the latter case C is hyperelliptic, which is the case for all curves of genus 2 but only for special curves once g≥3. At least if g is even (or if the ground field is algebraically closed, or finite, or more generally is a field with trivial Br[2]), a hyperelliptic curve of genus g has the form y2 = P(x) where P is a polynomial of degree 2g+2 without repeated roots. For g=2, the degree-2 function x on C is simply the ratio, say ω1/ω2, of generators of the 2-dimensional space of holomorphic forms. The hyperelliptic involution ι then takes any (x, y) to (x, −y), and takes each ωi to ωi (as can be seen either from the explicit formulas (ω1, ω2) = (x dx/y, dx/y) or by observing that a ι-invariant holomorphic differentials descends to a holomorphic differential on P1, and is thus zero). Thus we can construct y as dx / ω2, and then find the hyperelliptic defining equation for C that equates y2 with a sextic polynomial in x (or a quintic if there's a rational Weierstrass point and ω2 was chosen to vanish at that point).

Again we give an example of a modular curve, this time X1(13), which is the first X1(N) of genus >1. [For N≤12 the curve is rational, except for N=11 when it is a curve of genus 1 that you should by now know how to compute; we shall see later this term how to describe the elliptic curves with N-torsion points that are parametrized by these curves X1(N).] This time we need the two-dimensional space of cuspforms for Γ1(13), which in Sage we can get from

CuspForms(Gamma1(13),prec=14).echelon_basis()
to get
[
q - 4*q^3 - q^4 + 3*q^5 + 6*q^6 - 3*q^8 + q^9 - 6*q^10 - 2*q^12 + 2*q^13 + O(q^14),
q^2 - 2*q^3 - q^4 + 2*q^5 + 2*q^6 - 2*q^8 + q^9 - 3*q^10 + 3*q^13 + O(q^14)
]
We call these φ1 and φ2, and set ω1 = φ1 dq/q and ω2 = φ2 dq/q so that x = ω1/ω2 has a pole at the cusp q = 0. It's more convenient to subtract 1, taking x = q−1 + 1 + q + q2 + q4q6 − 2q8 + O(q11) rather than q−1 + 2 + q + q2 + q4q6 − 2q8 + O(q11). This doesn't change y, but simplifies the equation from y2 = x6 − 8x5 + 26x4 − 46x3 + 53x2 − 42x + 17 to y2 = x6 − 2x5 + x4 − 2x3 + 6x2 − 4x + 1, which is the well(?)-known formula up to changing x to −x (which makes all the signs positive).

genus 3 and higher: Here if C is not hyperelliptic then the holomorphic differentials (= sections of the canonical divisor) embed C as a curve of degree 2g−2 in Pg−1. For g=3,4,5 this curve is a complete intersection, as described at the beginning of today's notes; given generators for the holomorphic differentials, and thus homogeneous coordinates on Pg−1, the defining equations are linear relations in the monomials of appropriate degree, and can be found by linear algebra. [Next time we'll see what to do if C turns out to be hyperelliptic.]


Wednesday, Oct. 17: Hyperelliptic curves; curves of genus 1; a Weil-Belyi function on an elliptic curve (and parametrizing 5-torsion etc.)

A curve C of genus g>1 is hyperelliptic if and only if the canonical map CPg−1 is not embedding; in this case the map is 2:1 to its image, which is a curve of genus 0 and degree g−1, call it C0. Thus if g is even the quotient curve C0 is rational (the hyperplane section is a divisor of odd degree), and then C has the familiar form y2 = P(x) for some polynomial P of degree 2g+2 without repeated roots. The holomorphic differentials are then A(x) dx/y where A is an arbitrary polynomial of degree at most g−1. If g is odd, C0 could be a conic Q(x0, x1, x2) = 0, and then C can be written as the double cover y2 = P(x0, x1, x2) for some homogeneous polynomial P of degree g+1 such that the curve P = 0 meets the conic in 2g+2 distinct points.

Given just C and the holomorphic differentials, we can recognize the hyperelliptic curves as those for which the differentials satisfy too many quadratic relations, (g−1)(g−2) / 2 as opposed to the generic (g−2)(g−3) / 2. If we also have a rational point p on C then we can easily generalize our approach to genus-2 curves to find a hyperelliptic equation for C. Choose a basis for the holomorphic differentials whose i-th element ωi (1 ≤ i ≤ g) vanishes to order exactly i−1 at p. We'll use only the last two basis elements, and write x = ωg−1/ωg, a degree-1 function on C0. Then the function field of C is generated by x and y = dx/ωg. We again give a modular example, this time the curve X0(41) of genus 3. (Modular curves often come with involutions, and are thus hyperelliptic much more commonly than one might expect “at random”.) Here the Sage output of CuspForms(41,prec=20).echelon_basis() is


[
q + q^4 - q^5 - 2*q^6 + 2*q^7 - 2*q^8 - 3*q^10 - 2*q^12 + 2*q^14 + 2*q^15 + 3*q^16 - 2*q^17 + 3*q^18 + 2*q^19 + O(q^20),
q^2 - 2*q^4 - q^5 + 3*q^8 + q^9 + q^10 - 2*q^11 - 2*q^12 + 2*q^13 + 2*q^14 - 4*q^16 - 2*q^18 + 2*q^19 + O(q^20),
q^3 - 2*q^4 + q^6 - q^7 + 2*q^8 + 2*q^10 - 3*q^11 - q^12 + 2*q^13 - q^14 - 2*q^15 - 2*q^18 + 3*q^19 + O(q^20)
]

(you may have surmised by now that CuspForms(41,prec=20) is actually an abbreviation for CuspForms(Gamma0(41),prec=20), which works as well). Call these φ1, φ2, φ3 respectively. There's a rational point at the cusp q = 0, and we take that for our base point p, so each ωi is the corresponding φi dq/q. Here we needn't explicitly set up a linear system to check for a quadratic relation, because such a relation must write φ1φ3φ22 as a linear combination of φ2φ3 and φ32 and we can peel off the coefficients one at a time; here we find that φ1φ3φ22 = −2φ2φ3 + O(q21), which is more then enough q-adic precision to prove that the identity holds exactly: a nonzero section of 2K has only 8 zeros with multiplicity, so at most O(q10) dq2. [What would happen if we didn't check for quadratic relations and just routinely set out to find a quartic equation satisfied by the φi?] So we take x = ω2/ω3 − 1 = φ2/φ3 − 1 = q−1 + 1 + 2q + 2q2 + 3q3 + 4q4 + 7q5 + 8q6 + 11q7 + O(q8) and y = (q dx/dq) / ω3 = −q−4 − 2q−3 − 2q−2 + q−1 + 12 + 42 q + 120 q2 + … and find the hyperelliptic equation y2 = x8 − 4x7 − 8x6 + 10x5 + 20x4 + 8x3 − 15x2 − 20x − 8 for X0(41).
As before, this octic polynomial has distinct roots modulo all primes other than 2 and factors of the level (here 41: the discriminant is −216416), reflecting the curve's good reduction at all primes not dividing the level — the bad reduction at 2 is an artifact that can be removed by “uncompleting the square”.

For a general hyperelliptic curve C of genus 3, the genus-zero quotient curve C0 might not be rational but is always given by the unique quadratic equation satisfied by the holomorphic differentials ωi. We can then choose the ratio of any two, say x = ω2/ω3, and construct an ι-anti-invariant function y = dx / ω3 (NB same denominator), which has double poles above each pole of x (= each zero of ω3) and nowhere else. Thus we can write (ω32y)2 as a homogeneous polynomial of degree 4 in the three ωi, giving a hyperelliptic equation for C.

[elliptic normal curves and their Jacobians]

Here's an example of some of the new considerations that arise when we deal with Belyi functions on curves of positive genus. We'll find the unique such function f : EP1 with cycle structures 5, 5, 221. By Riemann-Hurwitz E has genus 1, and since it has at least one obvious divisor of degree 1 (the simple preimage of the 221 point) E is an elliptic curve. It might not be clear a priori that the two quintuple points are distinguishable, but for now we put them at f = 0 and f = ∞, and choose the quintuple pole as the origin O for the group law on E. Call the quintuple zero T, so f  has divisor ( f ) = ( f )0 − ( f ) = 5(T) − 5(O), and write the divisor ( f )1 as P+2D where P has degree 1 and D has degree 2. Now in genus zero any two divisors of the same degree are linearly equivalent, but here 5(T) − 5(O) ∼ 0 is a nontrivial condition, telling us that T is a 5-torsion point on E, and f  is the associated Weil function. [T cannot be a trivial torsion point, because f (T) ≠ f (O) implies T ≠ O. In general if n(T) ∼ n(O) then T is m-torsion for some factor m of n, and if 1<m<n then the function with divisor n(T) − n(O) is an imprimitive cover of P1, being an (n/m)-th power of a Weil function of degree m. Here n=5 is prime so there is no imprimitive case to consider.]

Curiously we can also predict the simple preimage P of the third branch point 1: it must be −2T. This exploits a trick that must have been rediscovered many times, though to my surprise I've found no explicit mention of it earlier than my ABC⇒Mordell paper of 1991. The idea (which applies to branched covers of any positive genus) is that once we know all the ramification of f, we know the divisor of its differential df, and the fact that this divisor is canonical gives us additional information (an extra equation in the Jacobian) on the preimages of the branch points. It is more convenient to work with the logarithmic differential df / f, which has a simple pole at each zero or pole of f, and a zero of multiplicity m−1 wherever f = t has a zero of multiplicity m for some t other than 0 and ∞. Here this means the logarithmic differential has divisor D−(O)−(T), so D ∼ (O)+(T); since also (P)+2D ∼ 5(O), we can eliminate D to find (P)+2(T) ∼ 3(O), whence P = −2T in the group law, as claimed. Thus we can start from any Weil function w with divisor 5(T)−5(O) (i.e. any multiple of f ) and recover f  as w / w(−2T).

The next step is to parametrize pairs (E, T) where E is an elliptic curve and T is a 5-torsion point on E (NB: this is much better than starting from a random E and then choosing one of its 24 nontrivial 5-torsion points). The following procedure for parametrizing elliptic curves with a torsion point of low order goes at least back to Tate (see the formula for a general curve with a 7-torsion point the end of §7 of his paper The Arithmetic of Elliptic Curves (Inventiones Math. 1974)). Suppose E has extended Weierstrass form with coefficients (a1, a2, a3, a4, a6), that is

y2 + a1xy + a3y = x3 + a2x2 + a4x + a6.

Let T be any point other than the group-law origin O, and translate x and y to put T at (0,0); this makes a6=0. The tangent to E at T has slope a4/a3, so T is 2-torsion iff a3=0. Otherwise, we may translate y by (a4/a3) x, keeping T at (0,0) but making a4 = 0 (equivalently: making the tangent to E at P horizontal). At this point we've used up all the available changes of variable except multiplying (x, y) by (λ2, λ3) for some nonzero λ, which multiplies each ai by λi; thus we have parametrized the space of elliptic curves E together with a nonzero, non-2-torsion rational point T by an open set in (1, 2, 3)-weighted projective space — not quite the entire projective space, because we must exclude (a1 : a2 : a3) that make E singular. In particular, a3 must not vanish lest E be singular at T. Moreover, T is 3-torsion iff the (horizontal) tangent at T meets E with multiplicity 3 at T, which is the case iff a2 = 0. Hence if T is not 3-torsion then neither a2 nor a3 is zero, so we may choose the unique λ that makes a2 = a3 = a for some nonzero a.

Now it's easy to describe, for small N >3, the pairs (a1, a) that make T an N-torsion point. We illustrate with the case N=5 that motivated this excursion. We write the condition 5T = 0 as 3T = −2T, which (since T ≠ O) is equivalent to the condition that 2T and 3T have the same x coordinate. [These coordinates can be computed in gp with   ellpow([a1,a,a,0,0], [0,0], 2)   and   ellpow([a1,a,a,0,0], [0,0], 3)   (look, Ma, no ellinit!), though here the group-law computations are simple enough to be done unaided.] We find that 2T = (−a, a1aa) and 3T = (1−a1, a1a−1), so 5T = 0 iff a1 = a+1. Therefore the general 5-torsion point on an elliptic curve is equivalent to the point (0, 0) on the curve with coefficients (a+1, a, a, 0, 0). Exercise: Find the corresponding formulas for 4T = 0, 6T = 0, and (recovering the formula in Tate's paper) 7T = 0.

Next step is to find a Weil function w. Since w is a section of 5(O), it is a linear combination of xy, x2, y, x, and 1. There's a one-dimensional space of combinations that vanish to order at least 4 at T, and then the fifth zero is automatically at T as well because T is 5-torsion. One way to find these combinations is to expand y in a Taylor series about x=0 near T; we find y = x2x3 + x4 + (a−1−1) x5 + O(x6) , so w = x2 − y − xy works. An alternative approach, which can be used even for Weil functions of really high degree, is to write w as a product of powers of linear forms. Here the functions x and y on E have divisors (T) + (−T) − 2(O) and 2(T) + (−2T) − 3(O) respectively, so xy2 has divisor 5(T) + (−T) + 2(−2T) − 8(O), and we need only divide by the equation of the line through T and −2T, which is tangent to E at −2T. This gives w = xy2 / (x+y+a). Rationalizing the denominator and removing a common factor y simplifies this to (x+1)y − x2, same up to sign as our previous answer.

We are finally ready to find the value of a, and thus the curve E, for which f  = w / w(−2T) = (x2 − y − xy) / a2 is a (5, 5, 221) Belyi function. There are several ways to go about this. A simple one is to locate the x-coordinate of the zeros of of f −1 by computing the resulting with respect of y of f −1 with the defining equation of the curve. This yields a quintic in x, one of whose roots is x(−2T) = −a, and the other four must come in two equal pairs; that is, the quintic must be c (x+a) times the square of a quadratic polynomial, for some constant c. We find that the resultant is −(x+a) (x4ax3 + a2x2 + 3a2x + a2a3), so the last factor must be a square. As usual we solve for a by comparing with the Laurent expansion of the square root about x = ∞ (which here is x2ax/2 + 3a2/8 + O(1/x)). We find that a = −8, and check that this indeed makes f a Belyi function with the desired cycle structures.

The standard model of E has coordinates (a1, a2, a3, a4, a6) = (1, 1, 1, 22, −9) and can be obtained for instance by telling gp E = [-7,-8,-8,0,0]; R = ellglobalred(ellinit(E)); ellchangecurve(E, R[2]) which also shows that the curve has conductor 50, small enough that it already appears in Tingley's 40-year-old Antwerp Tables (which include all curves of conductor at most 200). I usually advocate against forcing equations into such forms, which can make the equations more unwieldy and hide features such as the point (0, 0); but the ellglobalred form does have the advantage of being a canonical reduced form, which one can use to tell whether two curves are isomorphic or to compile tables for future reference. [Once the genus exceeds 1 it can be much harder to detect and find isomorphisms between two given curves.] Here we learn from the table that E has not just a rational 5-torsion point, but also a rational 3-isogeny; indeed it was already known in 1972 that this curve and the 3-isogenous one with coefficients (1, 1, 1, −3, 1) are the only elliptic curves over Q with both a rational 5-torsion point and a rational 3-isogeny. These curves' appearance here is related with the fact that the Galois closure of our Belyi function is the Bring curve of genus 4 with automorphism group S5, which has maximal size for a genus-4 curve in characteristic zero; I hope I'll have the time to say more about this in a few weeks.


Monday, Oct. 22: Overview of complex reflection groups and their invariant rings (which give rise to highly symmetric curves)

[motivation: Klein, Fermat etc., and Bring curves]

Recall that a complex reflection is a linear transformation g of a finite-dimensional projective space V such that g is of finite order and fixes a subspace of dimension g−1 (i.e. 1−g has rank 1); equivalently, det(g) is a root of unity ζ≠1, and g has matrix diag(ζ, 1, 1, …, 1) with respect to some basis of V (a g-eigenbasis as usual). A complex reflection group is then a finite subgroup G of GLn(C) that is generated by the complex reflections it contains. Examples include: the Euclidean reflection groups An, BCn, Dn, En that arise in the theory of Lie groups; any finite subgroup of GL1(C)=C*; and the group G1×G2 of block-diagonal square matrices of order n1+n2, if each Gi is a complex reflection group in GLni(C).

A subgroup G of GLn(C) acts on the polynomial ring C[x1, x2, …, xn] by linear substitutions. G is said to have a polynomial invariant ring if the subring (C[x1, x2, …, xn])G is C1, Φ2, …, Φm] for some algebraically independent homogeneous polynomials Φi; if G is finite then necessarily m = n (and then |G| = Πi deg(Φi); if G is infinite then m<n, e.g. if G is GLn(C) itself then m is zero!) Groups with polynomial invariant rings include the finite subgroups μm of GL1(C), with Φ1=xm; a block-diagonal group G1×G2 as above, provided each Gi itself has polynomial invariant ring; and the symmetric group Sn of n×n permutation matrices, for which Φi can be the i-th elementary symmetric function or power sum of the x's (indeed symmetric polynomials are the paradigm for a reflection group; BTW this example illustrates that the generators Φi are generally far from unique — though a few may be determined up to scalar multiple, and all the degrees remain the same — and the most convenient choice may depend on the application, in which case we also want to know the formulas for going between different generating sets).

The overlap between the two lists of examples is no coincidence:

Theorem (Shephard-Todd 1954): A finite subgroup G of GLn(C) has polynomial invariant ring if and only if G is a complex reflection group.

Part of the original proof (G.C. Shephard and J.A. Todd, “Finite unitary reflection groups”, Canad. J. Math. 6 (1954), 274–304) required obtaining the full list of irreducible reflection group (Table VII on p.301 of the Shephard-Todd paper). A more conceptual proof followed (Chevalley, later extended by Serre), but the classification of complex reflection groups remains a very useful tool and source for highly symmetric algebraic curves and other geometric structures.

There are three infinite families of irreducible complex reflection groups, plus 34 exceptional cases, in dimensions ranging from 2 to 8. Most (19) of the 34 exceptional complex reflection groups are in dimension 2; each is a variation of one of the three special groups A4, S4, and A5 of rotations of the Riemann sphere). The counts in dimensions 3, 4, 5, 6, 7, 8 are respectively 5, 5, 1, 2, 1, 1. We begin with the infinite families, then describe the exceptional reflection groups and their invariant rings.

We have encountered already most of the infinite families. The simplest one (though it happens to be listed third in the Shephard-Todd list) consists of the cyclic groups μm in GL1(C), with the invariant ring generated by Φ1 = xm.

The next series (and the first in the Shephard-Todd table) is the symmetric group Sn+1 acting on an n-dimensional space, the zero-sum hyperplane of the permutation representation. [...]

The final infinite series generalizes the group that arises in the symmetries of Fermat curves. For any n>1 and m>1, the μm-signed n×n permutation matrices constitute a complex reflection group G(m, 1, n) of order n!mn with invariant degrees m, 2m, …, n m, because the invariant polynomials are symmetric functions in the m-th powers of the coordinate variables (we exclude n=1, which we have seen already, and m=1, which yields the not-quite-irreducible group Sn of permutation matrices). We obtain a homomorphism Gμm by mapping each matrix in G to the product of its nonzero elements. For each factor q of m, the preimage of μq is again a complex reflection group, which Shephard and Todd call G(m, p, n) where pq=m. This group has order n!mn/p, and (again assuming m>1) acts irreducibly except for G(2,2,2) which is a Klein 4-group. The invariant degrees are m, 2m, …, (n−1)m, and nq. Recall that the invariant polynomials for G(m, 1, n) are the symmetric functions in the m-th powers of the coordinate variables; if we generate these by the elementary symmetric functions, then Φn is the m-th power of the product of the variables, and we get generators of the G(m, p, n) invariants by replacing this m-th power by the q-th power of the same product.

The real (a.k.a. Euclidean) reflection groups in these families are: μ2 = {±1} = A1 (and trivially μ1 = {1}) in GL1(R); the symmetric group Sn+1 = An in GLn(R); the groups G(2, 1, n) = BCn and G(2, 2, n) = Dn, again in GLn(R); and less obviously (because this requires a complex change of basis) G(m, m, 2) which is the 2m-element dihedral group in GL2(R).

It is not quite correct to say the three infinite families plus the 34 exceptional cases fully account for the irreducible complex reflection groups. There's also the missing group G(2,2,2), and several coincidences where the same small group appears more than once on the list. This is already familiar from the classification of simple Lie groups, where in addition to the exceptional groups G2, F4, E6, E7, E8 we have the missing D2 (which is not simple, decomposing as two A1's) and the coincidence A3=D3. Since Lie groups correspond to Euclidean reflection groups, these irregularities appear in our list: the 34 exceptions include F4, E6, E7, E8; the reducible D2 is the reducible G(2,2,2); and A3=D3 is the coincidence of S4 with G(2,2,3). [This last one is related with the exact sequence 1→(Z/2Z)2S4S3→1; note how the invariant degrees 2,3,4 manage to match up!] There are two further coincidences involving the dihedral groups: G(3,3,2) is also S3, and G(4,4,2) is also G(2,1,2). These correspond to the Lie groups A2 and B2; note that G2 is not exceptional as a reflection group, because like A2 and BC2 it's just one of the dihedral groups (though distinguished by being “crystallographic”).

[In positive characteristic, it is known that a representation of a finite group that has polynomial invariant ring must still be generated by g such that 1−g has rank 1, but there are several new complications. For one, such g could also be a “transvection”, with all eigenvalues 1 but one 2×2 Jordan block. Dickson already gave the example of SLn(Fq), which has no reflections (why?) but is generated by transvections, and has an invariant polynomial ring with generators of degrees qn − qi for 0 < i < n and (qn − 1) / (q − 1). Dickson also showed that GLn(Fq) has invariant polynomial ring with degrees qnqi (0 ≤ i < n). The full list of such groups is known, but this is a more recent result, and here it turns out that not all reflection/transvection groups actually have polynomial invariant rings.]


Wednesday, Oct. 24: The Hilbert-Molien series of a group representation; A4, S4, A5 acting on the Riemann sphere

Recall that if U is a graded vector space with degrees 0, 1, 2, …, and finite-dimensional graded pieces U0, U1, U2, …, then the Hilbert series for U is the generating function Σn≥0 dim(Un) · t n.   If U = (C[V])G, the algebra of invariants for a finite group G acting on a finite-dimensional space V, then this generating function is called the “(Hilbert-)Molien series” of the representation. If the invariant ring is polynomial with generators of degrees di then the series is 1 / Πi (1 − qdi). In characteristic zero, the fixed subspace of any finite-dimensional representation of G has dimension |G|−1 ΣgG tr(g). This yields the formula |G|−1 ΣgG (1 / det(1− tg)) for the Molien series. (This formula figures in one approach to the classification of complex reflection groups, which is one way that the problem becomes much harder in positive characteristic.) For example, if V = C and G = μm we get m−1Σζm=1 1 / (1− ζ t), which simplifies to 1 / (1− t m) as expected; if V = C2 and G = {±1}, we get   (1/2) [(1−t)−2 + (1+t)−2] = (1+t 2) / (1−t 2)2, so (as we already knew) the invariant ring is not polynomial, though here it's still a “complete intersection ring”, with three generators x2, xy2, y2 in degree 2 and one relation x2y2=(xy)2 in degree 4, corresponding to the factorization (1−t 4) / (1−t 2)3 of the Molien series. [...]

We next work out in some detail the identities related with exceptional finite subgroups of GL2(C), which give rise to some of beautiful mathematics (mostly classical but with various modern links) that should be [i.e. that I wish were] better known. For starters, suppose G is a finite subgroup (not necessarily a complex reflection group) of GL2(C), and let D be its normal subgroup of diagonal matrices. Recall that any complex representation of a finite group G fixes a positive-definite Hermitian pairing (obtained by averaging, a simple case of the “unitarian trick”), and thus map G to the unitary group of that pairing. [This is why Shephard and Todd can title their paper “finite unitary reflection groups” and still get a description of all finite complex reflection groups.] So here G is a subgroup of U2(C). It follows that the induced action of G0 := G / D on P1(C) is an injection into PU2(C), which is the group SO3(R) of Euclidean rotations of the Riemann sphere P1(C). Now it is known that a discrete subgroup of SO3(R) is cyclic, dihedral, or one of the three exceptional groups A4, S4, A5. The cyclic subgroups yield reducible representations, and dihedral cases yield reflection groups G(m, p, 2). We next consider the exceptional cases, in which G0 is the group of orientation-preserving symmetries of the regular tetrahedron, octahedron (dually: cube), or icosahedron (dually: dodecahedron) inscribed in the Riemann sphere.

For any finite subgroup G0 of PU2(C) (or even PGL2(C)) its preimage in SL2(C) is in the middle of a short exact sequence 1→{±1}→G1G0→1. When G0 is one of the three exceptional groups, or more generally any group containing an involution, the short exact sequence cannot split, because in SL2(C) contains no involution other than the central element −1. We next describe in each case polynomials that are invariant or at least “covariant” under the action of these groups 2A4, 2S4, 2A5. (We say P is “covariant” under an action of G when there's a homomorphism GC* such that gP = χ(g)P for all group elements g.) Note that G1 is never a reflection group, because it contains no reflections (a complex reflection cannot have determinant 1); but for most of our purposes we need only the projective action, and also once we know the covariant polynomials we can easily describe the invariant rings of each of the reflection groups with the same image in PGL2(C).

A nonzero polynomial P is covariant for G1 iff its zero divisor (which is just a finite multiset in the Riemann sphere) is invariant under G0. Now each of our G0 is the group of rotations of a regular polyhedron with N triangles meeting at each vertex, and acts freely on the Riemann sphere except for the vertices, face centers, and edge centers of the polyhedron, whose stabilizers are cyclic of order N, 3, 2 respectively. Here are the familiar counts:
  N   G0 polyhedron |G0|  V F E
  3   A4 tetrahedron 12  4  4  6
  4   S4 octahedron 24  6  8 12
  5   A5  icosahedron   60   12   20   30
Now the Euler relation E = V + F − 2 = (V−1) + (F−1) means that once we know the polynomials of degrees V and F we can obtain the third polynomial as the Jacobian determinant of the first two. (The Jacobian cannot vanish because the polynomials are algebraically independent.) Also, since our polyhedron has triangular faces we have F = 3E/2, which together with Euler's formula implies F = 2(V−2); thus we can get the polynomial of degree F as the Hessian (determinant of the matrix of second partial derivatives) of the degree-V polynomial. So it remains to find the G0-orbit of smallest size V in the Riemann sphere. In each case there is a linear relation in degree |G0| between the N-th power, cube, and square of the covariants of degree V, F, E respectively. The ratio of these powers gives the quotient map P1(C) → P1(C) / G = P1(C), with the target P1(C) arising naturally as a line in P2(C), and intersecting the three coordinate lines of that P2(C) at the three branch points of the target P1(C). In each case this quotient map can also be identified as the covering of modular curves X(N) → X(1), with the branch points of order 2, 3, and N at j = 1728 = 123,  j = 0, and j = ∞ respectively.


Monday, Oct. 31: Covariants of subgroups of GL2 related to A4 and S4

We next give explicit formulas in each case, using x and y has homogeneous coordinates and z = x/y.

N=3: We put the four vertices of the tetrahedron at z = ∞ and the cube roots of unity (note that this choice is not consistent with the usual picture of the Riemann sphere with the equator on the unit circle; we shall see that the equator ends up being |z| = 2½). Thus we may take for the first polynomial A = x3yy4. The Hessian of A, divided by −9, is B = x4 + 8xy3, with roots at z = 0, −2, and 1±(−3)½. Dividing the Jacobian derivative ∂(A,B) / ∂(x,y)by −4 yields C = x6 − 20x3y3 − 8y6, with relation 64A3B3 + C2 = 0.

G0 clearly contains the 3-cycle z→ζz where ζ is a cube root of unity. [Sorry, no \mapsto in HTML.] This 3-cycle lifts to the pair of linear substitutions (x, y) → ±(ζ−1x, ζy) in G1, which multiply A by ζ and B by ζ−1, leaving C fixed. G0 also contains a Klein 4-group, because any four-point subset of P1 determines a Klein 4-group that permutes the set freely and transitively (i.e. sharply 1-transitively, a consequence of the fact that PGL2 acts sharply 3-transitively on P1). For example, G0 contains the involution that takes 1↔∞ and ζ↔ζ−1, which is z ↔ (z+2) / (z−1). [In general, a fractional linear transformation z → (az+b) / (cz+d) is an involution iff a + d = 0, i.e. iff the trace of the corresponding 2×2 matrix vanishes.] This involution, together with z→ζz, generates G0. The involution lifts to the 4-cycles (x, y) → ±(−3)−½(x+2y, xy) in G1, which leave A, B, C all invariant. Thus A, B, C are covariants of G1 with characters that take our 3-cycle (x, y) → (ζ−1x, ζy) to ζ, ζ−1, 1 respectively. Call the first of these characters χ. We get the smallest exceptional reflection group (#4 in the Shephard-Todd table) by replacing each element g of G1 by χ(g)g. The resulting subgroup of GL2(C) is a double cover of the same G0, and is abstractly isomorphic with G1, but contains complex reflections such as (x, y) → (x, ζ−1y). Its ring of invariants is generated by B and C, with degrees 4 and 6. The other three reflection groups mapping to G0 are obtained from this one by extending the center from {±1} to μ4, μ6, and μ12; their invariant degrees are respectively (4, 12), (6, 12), and (12, 12): change C to C2,  B to B3, or both. These are Shephard and Todd's groups 6, 5[sic], and 7.

Besides the A4 cover P1P1 (a.k.a. X(3)→X(1)), these polynomials with tetrahedral symmetry, especially quartics like A and B, arise in the construction of symmetric higher-genus curves and other objects. (See below for sextics like C which have octahedral symmetry.) Consider first the elliptic curve w2 = A(x, y). For any homogeneous quartic f(x, y) with distinct roots, the Klein 4-group of symmetries of the roots lifts to the elliptic curve w2 = f (x, y), giving translation by the 2-torsion points of the curve. For f = A the curve also has a 3-cycle that is not translation by a torsion point (because it has fixed points), so we get a curve with j-invariant 0. (This is also clear from our formula for A; e.g. dehomogenizing by setting y=1 yields w2 = x3 − 1.) Likewise a quartic such as x4y4 with dihedral symmetry yields an elliptic curve w2 = f (x, y) with j-invariant 1728. Going beyond genus 1, the quartic plane curve w4 = f (x, y) has 48 symmetries, forming a reducible complex reflection group in PGL3(C) (and a dihedral  f  yields the Fermat quartic, with 96 symmetries not all of which preserve the map to the (x : y) line). Finally (for now), consider the smooth quartic surface A(x, y) = A(v, w) in P3(C). Schur observed in 1882 that the 12 symmetries of the tetrahedron yield 64 lines on this quartic, four for each symmetry plus 42=16 joining a root of A(x, y) to a root of A(v, w); this is more than the 48 of the Fermat quartic surface, though the Fermat quartic surface has more symmetries. Segre showed in 1943 that 64 lines is maximal for a smooth quartic surface over C (in small positive characteristic there can be extra lines).

The μ4 case (#6) also arises in coding theory; this is the reason I chose χ(g)g rather than χ−1(g)g, which is equivalent and has quartic invariant A rather than the “larger” B. Let K be a linear code of length n a finite field of q elements. [Normally one uses not K but C for Code, but this could get confusing here…] Recall that the (Hamming) weight enumerator WK(x, y) is the homogeneous polynomial of degree n whose xnwyw coefficient is the number of codewords of weight w (each w in [0,n]). The weight enumerator of any linear code K is related with the weight enumerator of its dual code K via the MacWilliams identity WK (x, y) = |K|−1WK(x + (q−1) y, x − y).  Suppose now that K is a “Type III code”, i.e. that q = 3 and K is self-dual. The first example with n > 0 is the “tetracode”, generated by (1, 1, 1, 0) and (1, −1, 0, 1), with weight enumerator x4 + 8xy3 (all eight nonzero words have weight 3). This looks familiar for good reason! The condition K=K implies that |K| = 3n/2 (in general a self-dual code of length n has dimension n/2) and that every word has weight divisible by 3 (compute the pairing of any word with itself). The former property, together with MacWilliams, implies that the weight enumerator WK is invariant under (x, y) ↔ 3−½(x+2y, xy); the latter, that WK is invariant under (x, y) → (x, ζy). Therefore WK is invariant under the subgroup of GL2(C) generated by these two linear transformations, which is reflection group #6 with center μ4 = {±1, ±i} (note that the scaling coefficient in the MacWilliams identity is 3−½, not (−3)−½). This yields Gleason's theorem for Type III codes: the weight enumerator is a polynomial in B = x4 + 8xy3 and C2, or equivalently in B and A3 = y3(x3y3)3. In particular, 4 | n, which was not obvious (though it can be proved by more direct means). Also, any Type III code of length 8 has weight enumerator B2, and a Type III code of length 12 with no words of weight 3 must have weight enumerator B3 − 24A3 = x12 + 264x6x6 + 440x3x9 + 24y12. It is known that such a code exists, and is unique up to isomorphism, namely the extended ternary Golay code, which is also a natural route to the sporadic Mathieu group M12, and especially its double cover 2.M12; e.g. the 132 pairs of words of weight 6 are supported on the blocks of the (5,6,12) Steiner system, and the 12 pairs of words of maximal weight form the unique Hadamard matrix of order 12.

N=4: We have two natural choices here. One is to note that the edge-centers of a regular tetrahedron are the vertices of a regular octahedron, while the vertices of the tetrahedron and its dual constitute the eight vertices of the octahedron's dual cube. We may thus use the above C and AB = x7y + 7x4y4 − 8xy7 as our sextic and octic polynomials with octahedral symmetry. (As we know, we could also construct AB as a multiple of the Hessian of C; as it happens it's the Hessian divided by −3600.) Then D = ∂(C, AB) / ∂(x,y) = x12 + 88x9y3 + 704x3y9 − 64y12 is the covariant dodecic(?), with C4 + 256(AB)3D2 = 0. [What would happen if we instead took the Hessian of AB to get a covariant polynomial of degree 2(8−2)=12?] In this picture the symmetry group S4 consists of the A4 and its composition with the involution z ↔ −2/z that switches the roots of A and B. The polynomials AB, C, and D are invariant under 2A4, but the action of 2S4 multiplies C by the nontrivial character 2S4→{±1} coming from the sign character of S4, while fixing AB and D. This can be seen directly from the action of the determinant-1 lifts of z ↔ −2/z, which are (x, y) ↔ (±2½x, ∓2−½y). It follows that if we lift even permutations from S4 to SL2(C), but odd permutations to matrices of determinant −1 in GL2(C), we get another double cover of S4 that does have a polynomial invariant ring, generated by AB and C; this is Shepard and Todd's complex reflection group #12 (where #8 through #11 are larger groups that contain the to SL2(C) lift of S4).

It is often more convenient to start from P = xy (x4y4), whose roots z = 0, ∞, ±1, ±i are vertices of an octahedron inscribed in the sphere with obvious fourfold symmetry z → iz (and with the equator restored to |z| = 1). Dividing the Hessian of P by −25 yields Q = x8 + 14x4y4 + y8, with roots at the vertices of the cube dual to that octahedron. Dividing the Jacobian derivative ∂(P,Q) / ∂(x,y)by −8 then yields R = x12 − 33x8y4 − 33x4y8 + y12, which has its twelve roots at the eight points μ4(1±2½) and the four primitive 8th roots of unity 2−½(±1±i ). The identity relating these covariants is 108P4Q3 + R2 = 0.

Here the symmetry group is generated by z → iz together with the involution z ↔ (z+1) / (z−1), which switches 1 ↔ ∞,  0 ↔ −1, and i ↔ −i. The determinant of the associated linear map (x, y) → (x+y, xy) is −2, so its lifts to SL2 are obtained by dividing by the square roots of −2. [...]


Monday, Nov. 5: Covariants of subgroups of GL2 related to A5; overview of the exceptional reflection groups of dimension 3 and higher

N=5:


Wednesday, Nov. 7: Hurwitz quaternions, the W(D4) lattice, and the W(F4) invariants; Belyi functions parametrizing trinomials with interesting Galois groups


Monday, Nov. 19: Introduction to (classical (elliptic)) modular curves

Review of the complex-analytic picture of the modular curves Y(1) and X(1) (affine and projective j-line), which parametrize C-isomorphism classes of elliptic curves C (or, for j=∞, degenerate elliptic curves C / Zω = C*), and their covers [Y(N), Y1(N), Y0(N) and] X(N), X1(N), X0(N), which respectively parametrize curves with full level-N structure, an N-torsion point, or a cyclic N-isogeny. These curves are all Belyi covers of the projective j-line, branched only above the cusp j = ∞ and the elliptic points j = 0, 1728 of order 3 and 2 respectively (the latter two parametrizing elliptic curves y2 = x3 + a6,   y2 = x3 + a4x with triangular and square period lattices and extra automorphisms).

The covers X(N) → X(1) are Galois. For small N these curves are well known: rational for N = 2, 3, 4, 5, with SL2(Z/NZ) / {±1} acting as S3, A4, S4, A5; elliptic for N = 6, with equation y2 = x3 + 1; the Klein quartic for N = 7; and an unramified double cover of the Fermat quartic for N = 8. For large N, the genus of X(N) grows roughly as N3/24, so we probably don't want to work directly with these curves past N=11 or so, when the genus is 26, though there are still nice models in projective spaces of dimension about N/2 (far from complete intersections once N>8, but retaining all the modular automorphisms of X(N)). But X1(N) and especially X0(N) are feasible to work with even for considerably larger N, since their genus grows only quadratically and linearly or so with N, and the modular struture and the availability of q-expansions of modular functions and forms makes such calculations feasible well beyond the range of “random” Belyi functions. For example, we can compute the genus-16 curve X0(191), and write j as an explicit degree-192 Belyi function on this curve. We've used these formulas to exhibit the j-invariants of a pair of Galois-conjugate elliptic curves defined over a quadratic number field that are related by an isogeny of degree 191.

We're not up to computing with X0(191) yet, but we'll show how to get rational coordinates and j on X0(N) for smome smaller but still not entirely trivial N, including 13 and 25, which are respectively the last prime and the last integer for which X0(N) has genus 0. In general, if N is prime then the Belyi map X0(N) → X(1) has degree N+1. [In the modular-curves literature one often denotes the level by N, not p, even when it is assumed prime.] The cusp j = ∞ has two preimages (the cusps of X0(N)), a simple pole at τ = ∞ and a pole of order N at τ = 0. The zeros of j are all triple except for one simple zero for N=3 and a pair of simple zeros if N is 1 mod 3. Likewise, the zeros of j-1728 are all double except for one simple zero for N=2 and a pair of simple zeros if N is 1 mod 4. (This can be seen by identifying the sheets of the cover X0(N) → X(1) with the N+1 points of a projective line mod N, namely the line associated to the 2-dimensional (Z/NZ)-vector space of N-torsion points on an elliptic curve; the monodromy generators of the three branch points are the reductions mod N of the stabilizers of τ = ∞, ρ, i in Γ(1) = SL2(Z) / {±1}, namely [1,1; 0,1], [0,1; −1,0], [0,−1; 1,1], the first of which is indeed the product of the other two.) It then follows by Riemann Hurwitz that X0(N) has genus g if N = 12g + {−1, 5, 7, 13} (and genus zero for N=2 and N=3). In particular, X0(N) is rational iff N is one of 2, 3, 5, 7, 13. It is no coincidence that these are precisely the primes N for which N−1 is a factor of 24; we shall give a uniform construction of a function h of degree 1 on X0(N) for each such N (which works also for the composite cases N=4, 9, 25, albeit not quite as nicely because for these N the curve has more than two cusps). We'll then write j as a rational function of this h, and verify that it is a Belyi function with the desired cycle structures.

As usual there's a PGL2 choice for the rational coordinate h on a rational curve, which we exploit (and reduce) by putting a convenient point at infinity. Here we require that the pole of h be at the cusp τ = ∞, i.e. that the Laurent expansion of h in terms of q = eiτ begin with a 1/q term; and we scale h so that h = q−1 + O(1). This determines h up to an additive constant. [Such a normalized rational coordinate on a modular curve is classically called a “Hauptmodul” (plural “Hauptmoduln”) for the curve.] We could pin down h completely by requiring that the consant term of the q-expansion vanish (that is, h = q−1 + O(q)); but this is usually not natural, and we prefer to use this freedom to put the zero of h at some convenient place (e.g. we usually work with j, not j−744). Here we put the zero at the other cusp of X0(N), so that h takes the values 0 and ∞ at τ = 0 and ∞ respectively.

For N=2 we easily construct such h as Δ(τ) / Δ(2τ), where Δ(τ) = 12−3 (E4(τ)3E6(τ)2) = q Πn≥1 (1−qn)24 is the cusp form of weight 12 for Γ(1), which is nonzero everywhere except for the simple zero at the cusp of X(1). Then j is a rational function of degree 3 in this h, with a simple pole at h = ∞ and a double pole at h = 0; thus it is a linear combination of h, 1, h−1, h−2, and by comparing coefficients (or using serreverse) we find the coefficients 1, 768, 196608, 16777216 = 1, 3 · 28, 3 · 48, 88,  whence j = (h+256)3 / h2 = 1728 + (h+64) (h−512)2 / h2. [Check: The general elliptic curve with a 2-isogeny is the same as the general curve y2 = x3 + ax2 + bx with a 2-torsion point. This curve has j = 256 (3ba2)3 / (b2 (4ba2)).   Equating this to (h+256)3 / h2 yields a cubic in h with one rational root,h = 256 b / (a2−4b);   equivalently, (b : a2) = (h : h+256). This gives an explicit parametrization by X0(2) of elliptic curves with a 2-isogeny (up to quadratic twist, which the modular curve cannot detect).]

For prime N>2, we can likewise construct the rational function Δ(τ) / Δ(Nτ) on X0(N) with a pole only at the cusp ∞ and a zero only at the cusp 0, but the pole and zero have order N−1 > 1, so we would have to take an (N−1)-st root to get a Hauptmodul, and this is possible only when N−1 is a factor of 24. [In general, we can only take a root of order gcd(N−1,24), and find that the divisor (0) − (∞) on X0(N) yields a torsion point of order (N−1) / gcd(N−1, 24) on the Jacobian J0(N) of the modular curve.] This is suggested by the 24th powers in the product formula for Δ. Indeed it is known that the 24th root

η(τ) = q1/24 Πn≥1 (1−qn) = q1/24q25/24q49/24 + q121/24 + q169/24 − − + + …

is a modular cusp form of weight 1/2 (with μ24-valued multipliers) for Γ(1). [I prefer this writing of the q-expansion to the usual one that factors out q1/24 to get “pentagonal-number” exponents, because the squares 1, 25, 49, 121, 169, … show we're dealing with a twisted theta function.] Thus h := (η(τ) / η(Nτ))24/(N−1) is a modular function for some congruence subgroup of Γ0(N), and since it is invariant under the stabilizer of each cusp, it is a modular function for all of of Γ0(N) by Ligozat's characterization of modular η products (“Courbes Modulaires de Genre 1”, Mémoires de la Soc. Math. de France 43 (1975)), and we have obtained our Hauptmodul. The factorizations of j and j −1728 are then as follows:
N=3:   j  =  (h+27) (h+243)3 / h3  =  1728 + (h2 − 406h − 39)2 / h3;
N=5:   j  =  (h2 + 250h + 55)3 / h5  =  1728 + (h2 + 22h + 125) (h2 − 500h − 56)2 / h5;
N=7:   j  =  (h2 + 13h + 49) (h2 + 245h + 74)3 / h7  =  1728 + (h4 − 10 · 72 h3 − 9 · 74 h2 − 2 · 76 h − 77)2 / h7;
N=13:   j  =  (h2 + 5h + 13) (h4 + 19 · 13 h3 + 20 · 132 h2 + 7 · 133 h + 134)3 / h13
     =  1728 + (h2 + 6h + 13) (h6 − 38 · 13 h5 − 122 · 132 h4 − 108 · 133 h3 − 46 · 134 h2 − 10 · 135 h − 136)2 / h13.

We can already see some patterns and arithmetical artifacts here, some of which we'll soon be able to explain. For starters, note that the pair of simple roots of j (when N is 1 mod 3), or of j −1728 (when N is 1 mod 4), is always in K=Q((−3)1/2) or Q(i) respectively. This is because they correspond to endomorphisms of degree N of an elliptic curve E of j-invariant 0 or 1728, that is, N-isogenies from E to itself; in each case the ring of endomorphisms is isomorphic with the ring of integers in K, and is defined over the same field K: for y2 = x3 + a6, the endomorphism ring ia generated by (x, y) → (ρx, y) where ρ is a cube root of unity; and for y2 = x3 + a4x,  a generator is (x, y) → (−x, iy). We get a cyclic N-isogeny by choosing one of the two primes of K above the rational prime N.

When N−1 is a factor of 24 but N is not prime, we can still use the formula h := (η(τ) / η(Nτ))24/(N−1) to obtain a Hauptmodul on X0(N). There are three such N, namely N=4, 9, 25, and each is the square of a prime; thus X0(N) has cusps other than ∞ and 0, but each of the new cusps has a factor of exactly N1/2 in the denominator of τ, and at such a cusp η(τ) and η(Nτ) vanish to the same order, so h has neither zero nor pole. (NB we must check that the local order of vanishing is integral even to verify Ligozat's criterion.) These other cusps still figure as poles in the formula for j as a rational function of h. This makes it a bit harder to recover that function from the q-expansions, but still routine with the computer: if j = A(h) / B(h) for some polynomials A, B of bounded degree, then B(h) j = A(h) yields a system of linear equations in the coefficients of A and B. [Naturally this is closely related with our earlier technique of solving for the coefficients of an algebraic relation between two rational functions on the curve once we have enough rational points on it; indeed it's the special case where the relation is of degree 1 in one variable and the sample points are “infinitely close”. This is tantamount to an algorithm requiring ~N3 operations, assuming the coefficients are small enough that it's reasonable to assign unit time to each arithmetic operation. There are subtler techniques that reduce the run time significantly, but ~N3 is small enough that it's usually not worth implementing such techniques unless they're already built into the package we're using, as with using serreverse to expand j in powers of 1/h.] Here we find:
N=4:   j  =  (h2 + 28h + 212)3 / ((h + 16) h4)  =  1728 + (h + 32)2 (h2 − 29h − 213)2 / ((h + 16) h4);
N=9:   j  =  (h + 9) (h3 + 35h2 + 37h + 38)3 / ((h2 + 9h + 27) h9)
     =  1728 + (h6 − 2 · 35 h5 − 11 · 37 h4 − 56 · 38 h3 − 5 · 312 h2 − 2 · 314 h − 315)2 / ((h2 + 9h + 27) h9);
N=25:   j  =  (h10 + 2 · 53 h9 + 7 · 54 h8 + 56 · 54 h7 + 57 · 55 h6 + 202 · 55 h5 + 21 · 57 h4 + 8 · 58 h3 + 11 · 58 h2 + 2 · 59 h + 59)3
          / ((h4 + 5h3 + 15h2 + 25h + 25) h25)
     =  1728 + (h2 + 2h + 5) (h4 + 10h3 + 45h2 + 100h + 125)2
          · (h10 − 4 · 53 h9 − 29 · 54 h8 − 262 · 54 h7 − 279 · 55 h6 − 1004 · 55 h5 − 21 · 58 h4 − 8 · 59 h3 − 11 · 59 h2 − 2 · 510 h − 510)2
          / ((h4 + 5h3 + 15h2 + 25h + 25) h25).

If you carried out this calculation you may have noticed that many of the coefficients of the q-expansions of h vanish: for N=4, we have h = q−1 − 8 + 20q − 62q3 + 216q5 − 641q7 + − …, so h+8 is an odd function of q; likewise for N=9, we have h = q−1 − 3 + 5q2 − 7q5 + 3q8 + 15q11 − 32q14 + 9q17 + − + …, so h+3 is q−1 times a power series in q3. We shall explain this next week. (Also for N=25 the qn coefficient of h+1 vanishes unless n is ±1 mod 5, which takes a bit more work to account for.)


Monday, Nov. 26: Fricke and Atkin-Lehner involutions of X0(N) and some of their uses

A key feature of the curves X0(N), for both theory and computation, is maps between them that reflect transformations of the isogenies they parametrize. For example, if M is a proper factor of N then there are several natural maps X0(N) → X0(M), the easiest one coming from the inclusion of Γ0(N) into Γ0(M); we'll see later that all these maps come from GL2(Q)-conjugates of Γ0(N) in Γ0(M). In the special case M = N > 1, there are nontrivial conjugations of Γ0(N) to itself. Today we consider the most important of these: the Fricke involution and more generally the group of Atkin-Lehner involutions.

Recall that to any isogeny φ: EE is associated a dual isogeny φ*: E’ → E of the same degree, with φ** = φ and φ*φ = deg(φ) [the multiplication-by-deg(φ) map on E]. If φ is cyclic, then so is φ*. Since the modular curve X0(N) parametrizes cyclic N-isogenies, we obtain a map wN : Y0(N) → Y0(N) as follows: if P is a point on Y0(N) parametrizing an isogeny φ, then wN(P) is the point parametrizing φ*. Since φ** = φ, this map wN is an involution. Extending to X0(N) yields the Fricke involution of X0(N).  Over C, this involution is represented by the fractional linear transformtion of the (extended) upper half-plane taking τ to −1 / (Nτ), which (for N>1) is not in SL2(Z), but does belong to the normalizer of Γ0(N) in PGL2(Q)+ [the “+” means positive determinant — else we'd have a fractional linear transformation that switches the upper and lower half-planes]; indeed it had better normalize Γ0(N) to give a well-defined automorphism of X0(N). The group generated by Γ0(N) and wN is sometimes called Γ0(N)+, and the corresponding modular curve X0(N)/wN can then denoted by X0(N)+; for a field k, the k-rational points of X0(N)+ parametrize (j-invariants of) cyclically N-isogenous pairs E, E of “k-curves”, with E, E (and the isogeny between them) defined over some field K containing k with degree at most 2, with E isomorphic with Eσ over the algebraic closure of k. Here σ is the Galois involution of K/k, or the identity if K=k. Warning: In general there might not be any twist over K that makes E’ ≅ Eσ over the same field K.]

By the Riemann-Hurwitz formula, the genus of X0(N)+ can be computed from the genus of X0(N) together with the number of fixed points of wN. This number can be expressed in terms of the class numbers of binary quadratic forms of discriminants N (if that's congruent to 0 or 1 mod 4) and −4N; it is even by Riemann-Hurwitz, and nonzero because τ = i / N½ is always a fixed point. The total count grows as N (1/2)±ε. Thus as N→∞ the genus of X0(N)+ is asymptotically half the genus of X0(N). It's never more than half that genus, and can be substantially less for small (or even some not-so-small) N.  Indeed X0(N)+ has genus 0 for some N as large as 71, famously including all the prime factors of the size of the Monster group (see “Monstrous Moonshine”); each of these curves is thus rational, because X0(N), and thus any quotient of X0(N), has at least one rational point, the cusp τ = i ∞ . Also, the genus of X0(N)+ is 1 for some N as large as 131 (and at least when N is prime this elliptic curve has rank 1 over Q, so for each of these N there are infinitely many Q-curves” over quadratic extensions of Q that are N-isogenous with their Galois conjugates. The largest N for which X0(N)+ has genus 2 (genus 3) is 191 (resp. 239). This and much more information about the genera of these curves and their spaces of holomorphic differentials (a.k.a. weight-2 cuspforms) can be read off tables such as Antwerp Table 5. We have seen already how to use the q-expansions of modular forms on such curves to compute explicit models.

For the seven values of N>1 with (N−1) | 24, for which we constructed an X0(N) Hauptmodul h as the 24 / (N−1) power of  η(τ) / η(Nτ), the involution wN takes hN12 / (N−1) / h.  Note that the constant N12 / (N−1) is integral even for the two values N=9 and N=25 for which the exponent 12 / (N−1) is half-integral. The two fixed points h = ±N 6 / (N−1) of wN give elliptic curves E/C with a cyclic N-isogeny to E itself. Thus E has complex multiplication (CM). This is clear for the fixed point at τ = i / N½, for which j(E) = j(i N½) and thus End(E) is the imaginary quadratic order Z[(−N)½] of discriminant −4N. At this τ, our h is clearly positive, so h = +N 6 / (N−1).  For N = 2, 3, 4, 7, we get the integers h = 64, 27, 16, 7 respectively, and can then use last week's formulas for j as a rational function of h to compute j = 203, 2·303, 663, 2553 respectively. In this case the negative root h = −N 6 / (N−1) yields the third cusp of X0(N) for N=4, and simpler CM values j = 123, 0, −153 of discriminant −4, −3, −7 for N = 2, 3, 7 respectively. [If we had known in advance that these “singular moduli” are in Z then we could have just used a few terms of the q-expansions for j, or better yet for E4 and Δ (easier to bound the coefficients), and then rounded to the nearest integer. But this approach requires developing (or taking on faith) a substantial chunk of the theory of complex multiplication.] If N is one of the remaining three values 5, 9, 25, then the two square roots of N12 / (N−1) yield the two Galois-conjugate j-invariants of curves with CM by Z[(−N)½].

[Added after class: each of the pairs (X0(N), wN) yields several further CM values of j, coming from cyclic N-isogenies φ: EE with φ* ≠ −φ, coming from pairs of solutions h of j(h) = j(wN(h)) other than the fixed points of wN. For N=2 this gives h2 + 47h + 212 = 0 and j = −153 which we have already seen, where the endomorphism ring has discriminant −7 and contains the elements (±1±(−7)½) / 2 of norm 2). For N=3 the new CM values have h2 + 46h + 36 = 0 and h2 − 10h + 36 = 0; the former repeats the CM value j = 203 seen already, but the latter gives the new j = −323 with discriminant −11.   Exercise: Do the analogous computation for N=5, 7, 13; can you identify the CM discriminant of each of the new “singular moduli” found?]

The Fricke involution wN of X0(N) generalizes to an Atkin-Lehner involution wN1 of the same curve X0(N) for any factorization N = N1N2 with gcd(N1, N2) = 1; that is, with N1, N2 being complementary “unitary divisors” of N. These wN1 form an abelian group of exponent 2 (assuming N≠1) and rank equal to the number of distinct prime factors of N, the identity element being the trivial “involution” w1. To construct wN1, factor the typical isogeny parametrized by X0(N) in two ways as the product of cyclic isogenies of degrees N1 and N2, namely EE’ → E’’’ and EE’’ → E’’’ where the isogenies EE and E’’ → E’’’ have degree N1, and the isogenies EE’’ and E’ → E’’’ have degree N2. That is, points P on Y0(N) parametrize commutative diagrams of cyclic isogenies of degrees N1 and N2. Replacing each of the isogenies of degree N1 by its dual yields another such diagram, which corresponds to another point P on Y0(N); we define wN1 to be the map that takes P to P, completed to a map X0(N) → X0(N), which again is an involution because φ** = φ. For N1=N this definition recovers wN, as it should to justify the notation; and as is true for wN, the involution wN1 comes from a matrix with integer entries and determinant N1 in the normalizer of Γ0(N) in PGL2(Q)+. The product of two Atkin-Lehner involutions wN1 and wN2 is wN3 where N3 is the unitary divisor of N determined uniquely by the condition that N1N2N3 be a square. Unlike the special case N1=N, for general unitary divisors N1 of N we cannot expect to be able to lift wN1 to an involution in PGL2(Q)+. It sometimes happens that an involution wN1 has no fixed points at all; this first happens for (N, N1) = (14, 2) and (15, 3) When wN1 does have fixed points, they can again be counted via class numbers of quadratic forms; the general formulas can get annoying to figure out, but fortunately this is not necessary for any N small enough that we might actually compute formulas for X0(N) or its quotient by some subgroup of the Atkin-Lehner group: by Riemann-Hurwitz, this is equivalent to finding, for each choice of ± signs, the dimension of the (±1, ±1, …, ±1) eigenspace of the action of the wN1's on the holomorphic differentials on X0(N), or equivalently on the weight-2 cuspforms for Γ0(N); and these dimensions have been extensively tabulated (already the fifth Antwerp table contains this information for all N≤300), and computer packages will compute the eigenspaces, and thus a fortiori their dimensions, on the fly.


Wednesday, Nov. 28: Non-Atkin-Lehner elements of the normalizer of Γ0(N)