Lecture notes for a lecture on
Newton iteration for simultaneous algebraic equations
delivered July 31, 2006, at the Clay Mathematics Institute's
Summer School on arithmetic geometry at Goettingen
Noam D. Elkies
A basic problem in number theory (2000+ BCE -- 2000+ CE):
_Solve_ equations in rational or integral variables.
NB: 1) nowadays we also want to "study solutions of...",
but the problem of actually solving remains of interest.
2) Classically "rational or integral" means Q or Z.
Nowadays also K and O_K with [K:Q] finite, or function fields
over Q, K, F_q, etc.
Few variables and equations: arithmetic complicates the problem
(e.g. y^2 = x^3 + Ax + B much easier to solve over C -- though
to be sure already there the solutions have nontrivial structure...)
Many variables and equations: arithmetic might actually help!
Why do we care? Often,
variables = coefficients in _polynomial identities_
(arise naturally in Dioph. eqns. over function fields,
and elsewhere)
Example (from abstract):
1) fix elliptic curve E: y^2 = x^3 + Ax + B = P(x).
E[N] = N-torsion points <--> Weil function A(x) + y B(x)
<--> A^2 - P B^2 = c (x-x0)^N with gcd(A,B) = 1
2) X_1(N) <--> same thing with P varying as well.
In this case the problem can be solved for quite large N
using division polynomials and q-expansions; but these
powerful tools are very specific to the structure of
elliptic and modular curves, and in general we do not have
or expect such tools and need a good general technique.
Surprisingly often such problems amount to computing
Covers of P^1 with few (but >=3) ramified points
e.g. E[N] can also be obtained from dihedral covers of P^1
ramified above 4 points (namely x=infinity and the roots of P(x));
the map j : X_1(N) --> P^1 realizes X_1 as a cover of P^1
ramified above j=infinity, 0, 1728. Other examples:
Inverse Galois (see Serre's Lectures)
Shimura modular curves
Belyi functions (3 branch points) -- ABC conjecture
Elliptic surfaces of large Neron-Severi rank
(with finite Mordell-Weil group)
But not all interesting applications come from branched covers of P^1;
e.g. Elliptic surfaces with large Neron-Severi rank some of which
comes from Mordell-Weil (one of many cases of this is the topic
of the exercises I put together for this evening), or families of
elliptic curves with a large integral point (with M.Watkins).
Now the first thing that we might think of when faced with many
algebraic equations of degree d_i > 1 in many variables is
Bezout's theorem, which promises D = \prod_i d_i solutions.
This product is usually large, which is *not* good, because
we then expect D solutions conjugate over a huge (degree-D)
number field, which may not even be feasible to write down
in any reasonable way. Fortunately, for many applications
Bezout can wildly overestimate the number of solutions!
That's because the solutions we seek are not *complete* intersections
of the hypersurfaces specified by our simultaneous equations:
typically there are many extraneous solutions. For instance,
in the E[N] example, there will be solutions with gcd(A,B) nontrivial,
corresponding to torsion points of lower order; when N is composite,
there will also be solutions corresponding to M-torsion points
for proper factors M of N. Sometimes there are even
positive-dimensional varieties of extraneous solutions.
The solutions of interest are often quite few in number,
and thus defined over Q or a tractable extension of Q,
even when the Bezout estimate is frightfully large.
In many cases enough arithmetic information is available
to predict in advance the number of solutions and sometimes
even additional information such as fields of definition
or primes of bad reduction.
So, how to actually find those solutions? The problems
range from nearly trivial to totally intractable -- though
the boundaries keep shifting as technique and software improve
(and computers become faster). Being clever in choosing
variables, transformations, etc., always helps -- but again
only raises the bar and still leaves us to apply some standard
techniques at the end. These techniques are:
1) "inspection"
e.g. find a map f: P^1 --> P^1 of degree n, branched only above
0, 1, infinity, with cycle structures (n-1, 1), (2, 1^{n-2}), (1^n).
Solution: Put preimage of infinity at infinity, so f is a polynomial
of degree n; put (n-1)-fold preimage at 0, so f(x) = a*x^n + b*x^{n-1};
then solve the calculus exercise to put the nonzero root of f'(x)
at a point where f(x)=1. (The resulting f is uniquely determined
modulo changing coordinates from x to cx for some nonzero c.)
When we have more variables than equations, this yields a
rational moduli space; e.g. the general f: P^1 --> P^1 branched
only above 4 points, with cycle structure (n-1, 1) at zero and
(1^n) at infinity, is x^{n-2} times a polynomial of degree 2 --
one parameter up to scaling f and x.
2) Reduce to 1 equation in v>0 variables; solve it (if v=1),
or parametrize the resulting curve (v=2), surface (v=3), etc.
using tools of algebraic geometry.
e.g.: as before but (n-2, 1^2), (3, 1^{n-3}), 1^n.
So now f(x) = a*x^n + b*x^{n-1} + c*x^{n-2} -- looks like
3 parameters, but only 1 up to scaling. Adjust it so that
f(x) has the same value at the nonzero roots of f'(x) --
that is, so that the remainder of f(x) modulo
n*a*x^2 + (n-1)*b*x + (n-2)*c has x-coefficient zero.
3) Reduce to 2 equations: eliminate one with _resultants_
(and usually eliminate extraneous solutions before finding
the one(s) of interest); then proceed as in #2. Example -- see exercises.
4) Reduce to a few more variables (around 4?): eliminate with
_Gr\"obner bases_ (in effect, generalized resultants; a great topic
for a computational talk, but not a talk I'm prepared to give).
And if one's best efforts still yield equations in too many variables
for even Gr\"obner bases to be feasible?
5) Give up.
That's a legitimate technique! There are plenty of computational
problems out there, and we use our time and effort more efficiently
working on tractable ones than breaking our head on intractable problems.
Still, before using this technique (and perhaps even before embarking
on #4),
Try also p-adic Newton! That is:
i) Choose p
More about this choice soon.
ii) Find solution(s) mod p (besides methods 1-4, arithmetic
and exhaustive search my be of use)
iii) Lift to a good approximation in Q_p: start with any lift,
which is within O(p) of the answer, then apply multivariate
Newton r times to improve to O(p^(2^r))...
This is multivariate Newton, the natural generalization of the familiar
univariate case, and likewise based on the calculus "theorem" that
"any" function has an affine-linear approximation with quadratic error.
For one variable, each step of the iteration to solve f(x)=0 replaces
x by x - f(x)/f'(x), provided f' is nonzero at the nearby solution;
if f is a function from and to n-dimensional space, use the same formula
but interpret f' as a matrix (a.k.a. linear transformation) to be
inverted, again provided it is invertible at the desired solution.
In particular,
...if f'(x_0) is invertible mod p (in which case Newton converges
to the unique solution in Z_p that reduces to x_0 mod p --
recall Hensel's lemma). [Even if f'(x_0) is not quite
invertible mod p, we might still converge to a p-adic solution
albeit more slowly.]
Since f may be quite complicated to evaluate -- e.g. the coordinates
of f(x) may be resultants or discriminants of polynomials whose
coefficients are rational functions of x_1,...,x_n -- it could get
quite daunting to actually compute f'. Fortunately,
we don't have to compute f'(x), only f(x + \epsilon e_i - x)
(i=1,...,n) !
e_i being the unit vectors. The point is that by evaluating
f at n+1 points to within O(|\epsilon|^2), we get an approximation
of f' to within O(|\epsilon|); if |f(x)| << |\epsilon|, that
approximation suffices to compute x - f(x)/f'(x) to the desired
accuracy of O(|\epsilon|^2).
iv) Recover solutions in Q, or in small-degree extensions of Q,
using lattice basis reduction (a.k.a. integer relation finding;
in gp, algdep(x,n))
[which works even when x is a p-adic rather than Archimedean approximation;
if x is known to within O(p^d) then we seek an integer relation among
1,x,x^2,...,x^n,p^d with all but the last coefficient small.]
WARNING: this might not find all solutions, because...
@ "bad reduction":
-- no solution mod p;
-- solution doesn't lift to char.0
(in which case f' mod p is not invertible, see above);
-- solution does lift, but to an unramified extension of Q_p
(ditto);
--> change p (larger p less likely to have bad reduction)
or, possibly exploit bad reduction!
(if the arithmetic of the reduction mod p is understood
well enough to allow this; I don't have any example of this
but maybe it's been done)
@ Good reduction, but no solution mod p because no prime
of residue field F_p in the field of definition of the solution
(of course this never happens for a rational solution; for any
number field K, even if the solution is defined only over K
there are of course *some* p that work, but we might have been
unlucky in choosing p, and anyway the smallest p might be too large)
--> change p (or maybe try F_{p^2}, F_{p^3} if p is really small)
@ Solution lifts alright, but cannot be recognized as algebraic.
Sometimes the field of definition is of such a large degree, and/or
the solutions have such a large height, that lattice reduction does not
suffice to recognize the variables as algebraic numbers before we reach
our limits in computing p-adic approximations. But then maybe we
didn't want to see those solutions anyway...
--> use Technique 5 above ;-(
@ Too many possibilities for exhaustive search
--> reduce p; if not possible, but the problem is important enough,
maybe get enough computers working on it in parallel; if not,
go back to Technique 5
@ Even when it succeeds, the method might not find _all_ solutions...
some may be hidden by bad reduction, ramification, or a larger residue
field, as discussed above
... and even when we do find them all we may not be able to prove it.
(unless we know in advance the number of solutions, or for some
arithmetic reason we're interested only in ones defined over Q, etc.)
Still, for all that, the p-adic Newton technique yields solutions where
none could be found otherwise. Sometimes we can even prove we have them all.
Note that even if we don't know in advance a bound on the height,
and thus can't tell how much p-adic precision is necessary, once we've
used lattice reduction to guess an algebraic solution we can confirm it
by simply substituting back into the equations (and when the solution
has f' invertible mod p we can then cite Hensel to prove that our solution
must be the right one).
Sometimes we get lucky and there's a prime p at which the problem
has good reduction but nevertheless simplifies. See my paper on
the computation Shimura-Belyi covers associated with congruence
subgroups triangle groups (http://arXiv.org/abs/math/0409020),
recently published in ANTS-7 proceedings. UPDATE -- during the
ANTS meeting I succeeded in computing such a cover of degree
5^3 + 1 = 126 associated to the (2,3,7) triangle group:
there are explicit rational functions A,B,C on the genus-2 curve
y^2 = -16*x^6 + 172*x^5 - 407*x^4 - 334*x^3 + 31*x^2 + 50*x + 5
such that A^7 + B^3 + C^2; then B^3/A^7 is the degree-126 map,
with geometric Galois group PSL_2(F_{125}). [The ANTS paper
is for the cover of degree 3^3 + 1 = 28; simpler covers,
of degree 8 and 9, were already known in the 19th century,
and the three covers of degree 14, all defined over Q(cos(2*Pi/7)),
can also be computed, albeit not as easily, without resorting
to p-adic Newton trickery.]
Example: computing a K3 surface of Neron-Severi rank 20.
[This is the maximum for a K3 surface in characteristic zero;
such K3's are called "singular", in analogy with the equally
misleading traditional terminology for CM elliptic curves;
these notions are related: if E1 and E2 are elliptic curves
then the Kummer surface obtained by blowing up (E1xE2)/{1,-1}
at its 16 double points is "singular" if and only if
E1 and E2 are isogenous CM curves.] Specifically,
an elliptic K3 surface of N-S rank 20, entirely accounted for by
two reducible bad fibers, both multiplicative, of types I_3 and I_{17}.
That is to say: an elliptic surface
y^2 = x^3 - 3A(t) x + 2B(t) (deg(A)=8, deg(B)=12)
for which the discriminant, a scalar multiple of
A^3 - B^2
has a triple and a 17-fold root. [This is maximal by the
ABC conjecture for polynomials; the rational function A^3/B^2,
related to the j-invariant of the curve, is then a degree-24
Belyi function with ramification (3^8) at 0, (2^{12}) at infinity,
and (17 3 1^4) at 1. Unfortunately the resulting Galois extension
is "boring", with group A_{24} rather than a more interesting
transitive subgroup of A_{24}...]
The following derivation uses some not-quite-trivial optimizations,
but purposefully withholds one or two so as to illustrate the
p-adic Newton method at the end without getting too caught up in
algebraic manipulation.
-- Put the I_3 and I_{17} fibers at t=0 and infinity. So,
we want A^3 - B^2 = t^3 * quartic (24 - 17 = 7 = 3 + 4)
A = a^2 + 2b, deg(a)=4, deg(b)<=3
=> B = a^3 + 3ab + c, deg(c)<=2
=> A^3 - B^2 = a^2(3b^2-2ac) - 6abc + 8b^3 - c^2
degree: <=14 9 9 4
so: 3b^2 - 2ac = linear.
-- We can nicely parametrize this by adopting the following neat trick
[which also appears in other guises related to near-cancellation in
A^3 - B^2 (Hall's conjecture) and elsewhere (Don Zagier told me of this
in another context)]: use the automorphism group of the quadratic form
Q(a,b,c) := 3b^2 - 2ac, which is isomorphic with PSL_2. In more
down-to-earth terms: generically c is quadratic, and the remainder
b' of b mod c is linear. Then Q(a,b,c)=Q(a',b',c) for some constant a'.
So start from constant a', linear b', and quadratic c such that
Q(a',b',c) -- which is easy to parametrize, for instance by normalizing
a' = 1, choosing the linear polynomial b' arbitrarily, and then
choosing c to kill the t^2 coefficient of 3b'^2 - 2c. Now choose
some linear polynomial d, set b=b'+2cd, and find a=a'+6d(b'+cd).
This brings us down to A^3-B^2 of degree at most 9; we eliminate
the t^9 and t^8 coefficients by solving for the remaining coefficients
of c, normalize by making d monic, say d=t+d0, and finally obtain:
b' = b1 t + b0;
c = (3 b1^2/2) t^2
+ (18 b0 - b1) b1 t/6
+ (9 b1^2 d0 + (9 b0 - b1)^2) / 54;
b = b' + 2cd;
a = 1 + 6d(b'+cd)
This parametrizes (an open set in) all elliptic K3 surfaces
y^2 = x^3 - 3A(t) x + 2B(t) with an I_{17} fiber at t=infinity;
there are three parameters b0,b1,d0 but we can kill one of them,
for instance by setting d0=0, because translating t by any constant
yields an equivalent surface.
Aside: This will yield, via formulas in Abhinav Kumar's thesis,
the algebraic condition on the Igusa invariants of a curve
C of genus 2 that make the Jacobian J(C) an abelian surface
with real multiplication by Z[(1+sqrt(17))/2]. [For any
genus-2 curve C, the Kummer surface of J(C) is the quotient
under a Nikulin involution of a K3 surface whose Neron-Severi
lattice contains E8 + E7 + (hyperbolic plane); Kumar obtained
the coefficients of this K3 surface explicitly in terms of
the Igusa invariants of C. Enlarging this lattice to one
of rank 18 and discriminant -17 corresponds to the above
real multiplication in J(C).]
Further specialization of b0,b1,d0 yields the curve parametrizing
surfaces with a I_{18} fiber, and the single surface with a I_{19} fiber.
[There are actually two kinds of I_{18} surfaces. One has 3-torsion,
but we do not see it in our analysis, because it comes from
(A,B)=(a^2+2b,a^3+3ab+c) for which b,c have degrees 2,0
rather than the generic 3,2.] But we want another specialization,
the one in which A^3-B^2 has a third-order zero at t=0. This
imposes three rather complicated polynomial conditions on our
three parameters b0,b1,d0. Solving them algebraically would be
quite painful. But it is easy to have the computer try all 13^3
possibilities mod 13. [We chose p=13 because this prime is split
in the CM field Q(sqrt(-51)), which guarantees good reduction;
if p is inert or ramified prime, S becomes "supersingular" mod p,
with Neron-Severi rank 22, which for small p can make the
configuration of reducible fibers more complicated.]
We find (b0,b1,d0)=(-3,1,2), and after a few iterations of Newton
recover the lift (61*3^4/4, 3^9, -19/(4*3^5)).
The resulting (A,B) have huge coefficients, e.g. A is
(945805746756374804027510784 t^8
+ 59999356652038634274168066048 t^7
+ 1422814869054215764245396905472 t^6
+ 14929432812660380691376861698816 t^5
+ 58118397806543759118915292051344 t^4
- 4597406646430468821931783375776 t^3
+ 135241402451317669132244024328 t^2
- 1764491496504191928892453032 t
+ 8625721468351356394552465) / (2^{20} 3^{14})
but that's an artifact of our normalizations, which reduce
the variable count but don't always yield the simplest formulas.
Multiplying t by a relatively small factor can make the polynomials
look much more complicated than they really are... Fortunately
it is usually not hard to bring such monstrosities down to size.
Here we can simplify considerably by dividing t by 324 and A by 729:
A = 2^{-16} *
(9 t^8 - 144 t^7 + 612 t^6 + 2142 t^4 - 28560 t^3 + 884 t^2 + 7072 t + 83521)
and the coefficients become even smaller if we allow affine linear
changes of variable: replace t by 1-4t, to get
A = 9 t^8 + 18 t^7 - 9 t^6 - 18 t^5 + 27 t^4 + 12 t^3 - 16 t^2 + 4 t + 1,
with the I_3 fiber now at t=1/4. [The fact that this surface is defined
over Q can be explained either via the "rigidity" theory for Belyi covers,
or using the arithmetic of K3 surfaces, where the key is that while the
ring of integers in Q(sqrt(-3*17)) does not have unique factorization,
its class group is no larger than genus theory requires. This latter
approach also explains some finer arithmetic details, such as behavior
modulo small primes, and the fact that while the surface is defined
over Q, not all of the components of the reducible fibers can be --
the best one can do is make the I_{17} components all rational,
as we have done, and then the two nontrivial I_3 components
are conjugate over Q(sqrt(-3)).]
Hard exercise: do the same for 7*13 instead of 3*17. Answer:
A = t^8 - 2 t^7 + 7 t^6 + 42 t^5 - 133 t^4 + 196 t^3 - 516 t + 409,
with I_{13} at infinity and I_7 at t=3/4. Along the way, we get
the rational parametrization of I_5 I_{13} surfaces -- and thus,
via Kumar's thesis, of genus-2 Jacobians with real multiplication
by Z[(1+sqrt(65))/2 -- and of I_6 I_{13} surfaces, which are
the starting point of this evening's exercises.
====
What if we have more variables than equations? Suppose n+1 variables
in n equations. Then we get (usually) a curve C, but usually in a very
complicated model, again including extraneous components, etc., even when
C is known or expected to have a simple model with small or even zero genus.
Newton cannot "solve" directly (e.g. can't invert a non-square matrix).
But we can still adapt the technique: having found one regular point
P_0 mod p:
@ choose a hopefully simple (i.e. low degree) function f : C --> P^1
@ choose various c in Q that reduce mod p to f(P_0)
@ For each one, use Newton to compute a close p-adic approximation
to a point in f^{-1}(c) on C -- which we can now do (n equations
in only n variables)
@ Now use algdep to recognize the coordinates as algebraic numbers.
The ratios of coefficients of their defining equations are rational
functions of c.
@ Once this is done to sufficient accuracy for enough c's, we can
guess the coefficients of an equation M(f,g)=0 relating f with
some other function g : C --> P^1. This is probably an equation
for C; now use algebraic geometry of curves to simplify it.
Likewise for 2- or higher-dimensional varieties, though of course
birational algebraic geometry gets harder once the dimension exceeds 1.
Even for dimension 1 it can take significant work, but I've done it
at least once, in the course of parametrizing certain K3 surfaces
of Neron-Severi rank 19 that led to the new rank records for
elliptic curves over Q(t) and Q. (UPDATE: at least three times
by now; the other two are also rank-19 elliptic K3 surfaces,
one for a Mordell-Weil rank problem, the other for a specialization to
a singular K3 surface defined over Q with Neron-Severi discriminant -660.)