Lecture notes for a lecture on Newton iteration for simultaneous algebraic equations delivered July 31, 2006, at the Clay Mathematics Institute's Summer School on arithmetic geometry at Goettingen Noam D. Elkies A basic problem in number theory (2000+ BCE -- 2000+ CE): _Solve_ equations in rational or integral variables. NB: 1) nowadays we also want to "study solutions of...", but the problem of actually solving remains of interest. 2) Classically "rational or integral" means Q or Z. Nowadays also K and O_K with [K:Q] finite, or function fields over Q, K, F_q, etc. Few variables and equations: arithmetic complicates the problem (e.g. y^2 = x^3 + Ax + B much easier to solve over C -- though to be sure already there the solutions have nontrivial structure...) Many variables and equations: arithmetic might actually help! Why do we care? Often, variables = coefficients in _polynomial identities_ (arise naturally in Dioph. eqns. over function fields, and elsewhere) Example (from abstract): 1) fix elliptic curve E: y^2 = x^3 + Ax + B = P(x). E[N] = N-torsion points <--> Weil function A(x) + y B(x) <--> A^2 - P B^2 = c (x-x0)^N with gcd(A,B) = 1 2) X_1(N) <--> same thing with P varying as well. In this case the problem can be solved for quite large N using division polynomials and q-expansions; but these powerful tools are very specific to the structure of elliptic and modular curves, and in general we do not have or expect such tools and need a good general technique. Surprisingly often such problems amount to computing Covers of P^1 with few (but >=3) ramified points e.g. E[N] can also be obtained from dihedral covers of P^1 ramified above 4 points (namely x=infinity and the roots of P(x)); the map j : X_1(N) --> P^1 realizes X_1 as a cover of P^1 ramified above j=infinity, 0, 1728. Other examples: Inverse Galois (see Serre's Lectures) Shimura modular curves Belyi functions (3 branch points) -- ABC conjecture Elliptic surfaces of large Neron-Severi rank (with finite Mordell-Weil group) But not all interesting applications come from branched covers of P^1; e.g. Elliptic surfaces with large Neron-Severi rank some of which comes from Mordell-Weil (one of many cases of this is the topic of the exercises I put together for this evening), or families of elliptic curves with a large integral point (with M.Watkins). Now the first thing that we might think of when faced with many algebraic equations of degree d_i > 1 in many variables is Bezout's theorem, which promises D = \prod_i d_i solutions. This product is usually large, which is *not* good, because we then expect D solutions conjugate over a huge (degree-D) number field, which may not even be feasible to write down in any reasonable way. Fortunately, for many applications Bezout can wildly overestimate the number of solutions! That's because the solutions we seek are not *complete* intersections of the hypersurfaces specified by our simultaneous equations: typically there are many extraneous solutions. For instance, in the E[N] example, there will be solutions with gcd(A,B) nontrivial, corresponding to torsion points of lower order; when N is composite, there will also be solutions corresponding to M-torsion points for proper factors M of N. Sometimes there are even positive-dimensional varieties of extraneous solutions. The solutions of interest are often quite few in number, and thus defined over Q or a tractable extension of Q, even when the Bezout estimate is frightfully large. In many cases enough arithmetic information is available to predict in advance the number of solutions and sometimes even additional information such as fields of definition or primes of bad reduction. So, how to actually find those solutions? The problems range from nearly trivial to totally intractable -- though the boundaries keep shifting as technique and software improve (and computers become faster). Being clever in choosing variables, transformations, etc., always helps -- but again only raises the bar and still leaves us to apply some standard techniques at the end. These techniques are: 1) "inspection" e.g. find a map f: P^1 --> P^1 of degree n, branched only above 0, 1, infinity, with cycle structures (n-1, 1), (2, 1^{n-2}), (1^n). Solution: Put preimage of infinity at infinity, so f is a polynomial of degree n; put (n-1)-fold preimage at 0, so f(x) = a*x^n + b*x^{n-1}; then solve the calculus exercise to put the nonzero root of f'(x) at a point where f(x)=1. (The resulting f is uniquely determined modulo changing coordinates from x to cx for some nonzero c.) When we have more variables than equations, this yields a rational moduli space; e.g. the general f: P^1 --> P^1 branched only above 4 points, with cycle structure (n-1, 1) at zero and (1^n) at infinity, is x^{n-2} times a polynomial of degree 2 -- one parameter up to scaling f and x. 2) Reduce to 1 equation in v>0 variables; solve it (if v=1), or parametrize the resulting curve (v=2), surface (v=3), etc. using tools of algebraic geometry. e.g.: as before but (n-2, 1^2), (3, 1^{n-3}), 1^n. So now f(x) = a*x^n + b*x^{n-1} + c*x^{n-2} -- looks like 3 parameters, but only 1 up to scaling. Adjust it so that f(x) has the same value at the nonzero roots of f'(x) -- that is, so that the remainder of f(x) modulo n*a*x^2 + (n-1)*b*x + (n-2)*c has x-coefficient zero. 3) Reduce to 2 equations: eliminate one with _resultants_ (and usually eliminate extraneous solutions before finding the one(s) of interest); then proceed as in #2. Example -- see exercises. 4) Reduce to a few more variables (around 4?): eliminate with _Gr\"obner bases_ (in effect, generalized resultants; a great topic for a computational talk, but not a talk I'm prepared to give). And if one's best efforts still yield equations in too many variables for even Gr\"obner bases to be feasible? 5) Give up. That's a legitimate technique! There are plenty of computational problems out there, and we use our time and effort more efficiently working on tractable ones than breaking our head on intractable problems. Still, before using this technique (and perhaps even before embarking on #4), Try also p-adic Newton! That is: i) Choose p More about this choice soon. ii) Find solution(s) mod p (besides methods 1-4, arithmetic and exhaustive search my be of use) iii) Lift to a good approximation in Q_p: start with any lift, which is within O(p) of the answer, then apply multivariate Newton r times to improve to O(p^(2^r))... This is multivariate Newton, the natural generalization of the familiar univariate case, and likewise based on the calculus "theorem" that "any" function has an affine-linear approximation with quadratic error. For one variable, each step of the iteration to solve f(x)=0 replaces x by x - f(x)/f'(x), provided f' is nonzero at the nearby solution; if f is a function from and to n-dimensional space, use the same formula but interpret f' as a matrix (a.k.a. linear transformation) to be inverted, again provided it is invertible at the desired solution. In particular, ...if f'(x_0) is invertible mod p (in which case Newton converges to the unique solution in Z_p that reduces to x_0 mod p -- recall Hensel's lemma). [Even if f'(x_0) is not quite invertible mod p, we might still converge to a p-adic solution albeit more slowly.] Since f may be quite complicated to evaluate -- e.g. the coordinates of f(x) may be resultants or discriminants of polynomials whose coefficients are rational functions of x_1,...,x_n -- it could get quite daunting to actually compute f'. Fortunately, we don't have to compute f'(x), only f(x + \epsilon e_i - x) (i=1,...,n) ! e_i being the unit vectors. The point is that by evaluating f at n+1 points to within O(|\epsilon|^2), we get an approximation of f' to within O(|\epsilon|); if |f(x)| << |\epsilon|, that approximation suffices to compute x - f(x)/f'(x) to the desired accuracy of O(|\epsilon|^2). iv) Recover solutions in Q, or in small-degree extensions of Q, using lattice basis reduction (a.k.a. integer relation finding; in gp, algdep(x,n)) [which works even when x is a p-adic rather than Archimedean approximation; if x is known to within O(p^d) then we seek an integer relation among 1,x,x^2,...,x^n,p^d with all but the last coefficient small.] WARNING: this might not find all solutions, because... @ "bad reduction": -- no solution mod p; -- solution doesn't lift to char.0 (in which case f' mod p is not invertible, see above); -- solution does lift, but to an unramified extension of Q_p (ditto); --> change p (larger p less likely to have bad reduction) or, possibly exploit bad reduction! (if the arithmetic of the reduction mod p is understood well enough to allow this; I don't have any example of this but maybe it's been done) @ Good reduction, but no solution mod p because no prime of residue field F_p in the field of definition of the solution (of course this never happens for a rational solution; for any number field K, even if the solution is defined only over K there are of course *some* p that work, but we might have been unlucky in choosing p, and anyway the smallest p might be too large) --> change p (or maybe try F_{p^2}, F_{p^3} if p is really small) @ Solution lifts alright, but cannot be recognized as algebraic. Sometimes the field of definition is of such a large degree, and/or the solutions have such a large height, that lattice reduction does not suffice to recognize the variables as algebraic numbers before we reach our limits in computing p-adic approximations. But then maybe we didn't want to see those solutions anyway... --> use Technique 5 above ;-( @ Too many possibilities for exhaustive search --> reduce p; if not possible, but the problem is important enough, maybe get enough computers working on it in parallel; if not, go back to Technique 5 @ Even when it succeeds, the method might not find _all_ solutions... some may be hidden by bad reduction, ramification, or a larger residue field, as discussed above ... and even when we do find them all we may not be able to prove it. (unless we know in advance the number of solutions, or for some arithmetic reason we're interested only in ones defined over Q, etc.) Still, for all that, the p-adic Newton technique yields solutions where none could be found otherwise. Sometimes we can even prove we have them all. Note that even if we don't know in advance a bound on the height, and thus can't tell how much p-adic precision is necessary, once we've used lattice reduction to guess an algebraic solution we can confirm it by simply substituting back into the equations (and when the solution has f' invertible mod p we can then cite Hensel to prove that our solution must be the right one). Sometimes we get lucky and there's a prime p at which the problem has good reduction but nevertheless simplifies. See my paper on the computation Shimura-Belyi covers associated with congruence subgroups triangle groups (http://arXiv.org/abs/math/0409020), recently published in ANTS-7 proceedings. UPDATE -- during the ANTS meeting I succeeded in computing such a cover of degree 5^3 + 1 = 126 associated to the (2,3,7) triangle group: there are explicit rational functions A,B,C on the genus-2 curve y^2 = -16*x^6 + 172*x^5 - 407*x^4 - 334*x^3 + 31*x^2 + 50*x + 5 such that A^7 + B^3 + C^2; then B^3/A^7 is the degree-126 map, with geometric Galois group PSL_2(F_{125}). [The ANTS paper is for the cover of degree 3^3 + 1 = 28; simpler covers, of degree 8 and 9, were already known in the 19th century, and the three covers of degree 14, all defined over Q(cos(2*Pi/7)), can also be computed, albeit not as easily, without resorting to p-adic Newton trickery.] Example: computing a K3 surface of Neron-Severi rank 20. [This is the maximum for a K3 surface in characteristic zero; such K3's are called "singular", in analogy with the equally misleading traditional terminology for CM elliptic curves; these notions are related: if E1 and E2 are elliptic curves then the Kummer surface obtained by blowing up (E1xE2)/{1,-1} at its 16 double points is "singular" if and only if E1 and E2 are isogenous CM curves.] Specifically, an elliptic K3 surface of N-S rank 20, entirely accounted for by two reducible bad fibers, both multiplicative, of types I_3 and I_{17}. That is to say: an elliptic surface y^2 = x^3 - 3A(t) x + 2B(t) (deg(A)=8, deg(B)=12) for which the discriminant, a scalar multiple of A^3 - B^2 has a triple and a 17-fold root. [This is maximal by the ABC conjecture for polynomials; the rational function A^3/B^2, related to the j-invariant of the curve, is then a degree-24 Belyi function with ramification (3^8) at 0, (2^{12}) at infinity, and (17 3 1^4) at 1. Unfortunately the resulting Galois extension is "boring", with group A_{24} rather than a more interesting transitive subgroup of A_{24}...] The following derivation uses some not-quite-trivial optimizations, but purposefully withholds one or two so as to illustrate the p-adic Newton method at the end without getting too caught up in algebraic manipulation. -- Put the I_3 and I_{17} fibers at t=0 and infinity. So, we want A^3 - B^2 = t^3 * quartic (24 - 17 = 7 = 3 + 4) A = a^2 + 2b, deg(a)=4, deg(b)<=3 => B = a^3 + 3ab + c, deg(c)<=2 => A^3 - B^2 = a^2(3b^2-2ac) - 6abc + 8b^3 - c^2 degree: <=14 9 9 4 so: 3b^2 - 2ac = linear. -- We can nicely parametrize this by adopting the following neat trick [which also appears in other guises related to near-cancellation in A^3 - B^2 (Hall's conjecture) and elsewhere (Don Zagier told me of this in another context)]: use the automorphism group of the quadratic form Q(a,b,c) := 3b^2 - 2ac, which is isomorphic with PSL_2. In more down-to-earth terms: generically c is quadratic, and the remainder b' of b mod c is linear. Then Q(a,b,c)=Q(a',b',c) for some constant a'. So start from constant a', linear b', and quadratic c such that Q(a',b',c) -- which is easy to parametrize, for instance by normalizing a' = 1, choosing the linear polynomial b' arbitrarily, and then choosing c to kill the t^2 coefficient of 3b'^2 - 2c. Now choose some linear polynomial d, set b=b'+2cd, and find a=a'+6d(b'+cd). This brings us down to A^3-B^2 of degree at most 9; we eliminate the t^9 and t^8 coefficients by solving for the remaining coefficients of c, normalize by making d monic, say d=t+d0, and finally obtain: b' = b1 t + b0; c = (3 b1^2/2) t^2 + (18 b0 - b1) b1 t/6 + (9 b1^2 d0 + (9 b0 - b1)^2) / 54; b = b' + 2cd; a = 1 + 6d(b'+cd) This parametrizes (an open set in) all elliptic K3 surfaces y^2 = x^3 - 3A(t) x + 2B(t) with an I_{17} fiber at t=infinity; there are three parameters b0,b1,d0 but we can kill one of them, for instance by setting d0=0, because translating t by any constant yields an equivalent surface. Aside: This will yield, via formulas in Abhinav Kumar's thesis, the algebraic condition on the Igusa invariants of a curve C of genus 2 that make the Jacobian J(C) an abelian surface with real multiplication by Z[(1+sqrt(17))/2]. [For any genus-2 curve C, the Kummer surface of J(C) is the quotient under a Nikulin involution of a K3 surface whose Neron-Severi lattice contains E8 + E7 + (hyperbolic plane); Kumar obtained the coefficients of this K3 surface explicitly in terms of the Igusa invariants of C. Enlarging this lattice to one of rank 18 and discriminant -17 corresponds to the above real multiplication in J(C).] Further specialization of b0,b1,d0 yields the curve parametrizing surfaces with a I_{18} fiber, and the single surface with a I_{19} fiber. [There are actually two kinds of I_{18} surfaces. One has 3-torsion, but we do not see it in our analysis, because it comes from (A,B)=(a^2+2b,a^3+3ab+c) for which b,c have degrees 2,0 rather than the generic 3,2.] But we want another specialization, the one in which A^3-B^2 has a third-order zero at t=0. This imposes three rather complicated polynomial conditions on our three parameters b0,b1,d0. Solving them algebraically would be quite painful. But it is easy to have the computer try all 13^3 possibilities mod 13. [We chose p=13 because this prime is split in the CM field Q(sqrt(-51)), which guarantees good reduction; if p is inert or ramified prime, S becomes "supersingular" mod p, with Neron-Severi rank 22, which for small p can make the configuration of reducible fibers more complicated.] We find (b0,b1,d0)=(-3,1,2), and after a few iterations of Newton recover the lift (61*3^4/4, 3^9, -19/(4*3^5)). The resulting (A,B) have huge coefficients, e.g. A is (945805746756374804027510784 t^8 + 59999356652038634274168066048 t^7 + 1422814869054215764245396905472 t^6 + 14929432812660380691376861698816 t^5 + 58118397806543759118915292051344 t^4 - 4597406646430468821931783375776 t^3 + 135241402451317669132244024328 t^2 - 1764491496504191928892453032 t + 8625721468351356394552465) / (2^{20} 3^{14}) but that's an artifact of our normalizations, which reduce the variable count but don't always yield the simplest formulas. Multiplying t by a relatively small factor can make the polynomials look much more complicated than they really are... Fortunately it is usually not hard to bring such monstrosities down to size. Here we can simplify considerably by dividing t by 324 and A by 729: A = 2^{-16} * (9 t^8 - 144 t^7 + 612 t^6 + 2142 t^4 - 28560 t^3 + 884 t^2 + 7072 t + 83521) and the coefficients become even smaller if we allow affine linear changes of variable: replace t by 1-4t, to get A = 9 t^8 + 18 t^7 - 9 t^6 - 18 t^5 + 27 t^4 + 12 t^3 - 16 t^2 + 4 t + 1, with the I_3 fiber now at t=1/4. [The fact that this surface is defined over Q can be explained either via the "rigidity" theory for Belyi covers, or using the arithmetic of K3 surfaces, where the key is that while the ring of integers in Q(sqrt(-3*17)) does not have unique factorization, its class group is no larger than genus theory requires. This latter approach also explains some finer arithmetic details, such as behavior modulo small primes, and the fact that while the surface is defined over Q, not all of the components of the reducible fibers can be -- the best one can do is make the I_{17} components all rational, as we have done, and then the two nontrivial I_3 components are conjugate over Q(sqrt(-3)).] Hard exercise: do the same for 7*13 instead of 3*17. Answer: A = t^8 - 2 t^7 + 7 t^6 + 42 t^5 - 133 t^4 + 196 t^3 - 516 t + 409, with I_{13} at infinity and I_7 at t=3/4. Along the way, we get the rational parametrization of I_5 I_{13} surfaces -- and thus, via Kumar's thesis, of genus-2 Jacobians with real multiplication by Z[(1+sqrt(65))/2 -- and of I_6 I_{13} surfaces, which are the starting point of this evening's exercises. ==== What if we have more variables than equations? Suppose n+1 variables in n equations. Then we get (usually) a curve C, but usually in a very complicated model, again including extraneous components, etc., even when C is known or expected to have a simple model with small or even zero genus. Newton cannot "solve" directly (e.g. can't invert a non-square matrix). But we can still adapt the technique: having found one regular point P_0 mod p: @ choose a hopefully simple (i.e. low degree) function f : C --> P^1 @ choose various c in Q that reduce mod p to f(P_0) @ For each one, use Newton to compute a close p-adic approximation to a point in f^{-1}(c) on C -- which we can now do (n equations in only n variables) @ Now use algdep to recognize the coordinates as algebraic numbers. The ratios of coefficients of their defining equations are rational functions of c. @ Once this is done to sufficient accuracy for enough c's, we can guess the coefficients of an equation M(f,g)=0 relating f with some other function g : C --> P^1. This is probably an equation for C; now use algebraic geometry of curves to simplify it. Likewise for 2- or higher-dimensional varieties, though of course birational algebraic geometry gets harder once the dimension exceeds 1. Even for dimension 1 it can take significant work, but I've done it at least once, in the course of parametrizing certain K3 surfaces of Neron-Severi rank 19 that led to the new rank records for elliptic curves over Q(t) and Q. (UPDATE: at least three times by now; the other two are also rank-19 elliptic K3 surfaces, one for a Mordell-Weil rank problem, the other for a specialization to a singular K3 surface defined over Q with Neron-Severi discriminant -660.)