On the computation of reducible invariant tori in

We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons it is a very good option to compute quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method in a parallel computer. In these flows we compute invariant tori of dimensions up to 5, by taking suitable sections.

We assume that x ∈ R n , θ ∈ T d , and f is a smooth diffeomorphism.The components of the frequency vector ω are assumed to be linearly independent over the rationals, that is, Note that, if there exists a k ∈ Z d such that k, ω = 0, then one of the components of θ can be expressed as a linear combination of the remaining ones so that the new expression for (1) has the same dynamics as before but the new set of angles has dimension d − 1.Note that, due to the translation in the variable θ, the dynamical system (1) cannot have fixed points or periodic orbits.The simplest invariant set for this system is an invariant torus parametrized by the angle θ: we will say that (1) has an invariant torus of dimension d, with frequency vector ω, if there exists a smooth injective map u : Systems of the form (1) appear in many applications.For instance, it is quite common that the first (simplified) model of a physical situation is an autonomous system, and that successive improvements are based on adding perturbations that depend on time in a periodic way.If they have incommensurable periods, then the resulting perturbation is quasi-periodic and the model (after a suitable section in the case of a flow) takes the form of (1).Good examples come from Celestial Mechanics, where the effect of perturbing bodies can be modeled as the sum of periodic perturbations with incommensurable periods (see, for instance, [24,25,22,23]).
The knowledge of these solutions (and their stable and unstable manifolds) is a key step in the study of the dynamics of these systems.As mentioned above, they are the simplest solutions, and, in this sense, they play the same role as the equilibrium points in autonomous systems [14].
In this paper we focus on the computation of invariant tori of maps, focusing on the case in which the dimension of the tori is larger than 1.As we will see in the examples, the method can be used to look for invariant tori of flows by means of a suitable Poincaré section.Let us start by introducing the main concepts involved in this problem.
As for fixed points or periodic orbits, it is natural to consider the linearized normal behavior around a torus, since it provides the linear approximation to its stable, unstable, and center manifolds.If h(θ) ∈ R n denotes a small displacement with respect to a point u(θ) on the torus, we have that If we define A(θ) = D x f (u(θ), θ) and we rename h as x, the linearized normal behavior around the torus is described by the following linear skew-product: (3) x = A(θ)x, θ = θ + ω.
Definition 1.1.The system (3) is said to be reducible if and only if there exists a change of variables of the form z = C(θ)y such that (3) becomes x = Bx, θ = θ + ω, where the matrix B = C −1 (θ + ω)A(θ)C(θ) does not depend on θ.The matrix B is called the Floquet matrix, and z = C(θ)y is the Floquet transformation.
Note that the dynamics of reducible systems is easily described by computing the (eigenvalues of the) reduced Floquet matrix.Hence, the linear stability of the torus is completely described by (the eigenvalues of) this Floquet matrix.
Reducible tori are quite common in applications.For instance, it is well known that KAM tori of maximal dimension are always reducible.Most of the lower dimensional tori found by means of KAM schemes are also reducible (see, for instance, [16,41,35]).However, there are some KAM results that do not require reducibility (the first result in this direction is [3]).For a discussion on the role of the reducibility condition, see [4].A discussion about the mechanisms behind a reducibility loss can be found in [27].Some examples of reducible tori in astrodynamics can be found in [9,30,26].
In this work we describe a numerical procedure to compute reducible invariant tori and their Floquet matrix.The method is very suitable for a parallel computer, and we will explain our implementation for a cluster of PCs and how it scales with the number of processors.As mentioned before, this method can also be applied to flows by using a suitable Poincaré section.Note that a suitable section of the flow reduces the dimension of the computed torus and, as we will see, it also significantly reduces the number of computations compared to a similar method for flows.
There are several methods in the literature for the computation of two dimensional invariant tori of flows (or, equivalently, invariant curves for maps); among them we mention [13,14,46,12,40,8,47,9,15,43,44,28,1,42,29].There are also numerical methods to approximate the Floquet matrix and Floquet transformation of linear skew-products; see [30,50].We stress that dealing with tori of dimensions larger than 2 for flows (or larger than 1 for maps) is more difficult, essentially because of the increase of the computational effort with the dimension of the torus.A recent advance in computational power comes from the increase in the number of processors available, either in a single computer or through a fast network.For this reason it is quite natural to focus on parallelization issues.A possibility is to implement the same algorithms as before, but taking advantage of a parallel environment (for an example, see [10]).Here we focus on an algorithm that requires the torus to be reducible and that offers a high degree of parallelism.
Our algorithm is derived from the proof of the main theorem given in [35].Although the results of [35] are for flows (not necessarily Hamiltonian flows), they can be easily translated for maps.In our approach, we represent the parametrization u(θ) of the torus by means of Fourier series, and we apply a Newton method to look for these Fourier coefficients.It turns out that, if the torus is reducible and we know an approximation to the reducing change and the Floquet matrix, the linear system that appears at each step of the Newton process can be written in block diagonal form (or even in diagonal form!), which is a huge reduction in computational effort.Therefore, here we try to compute not only the torus but also the Floquet reduction.
As mentioned above, one of the main issues we have addressed here is parallelization.When computing invariant tori of dimensions larger than 1, computing resources usually becomes the main constraint.The method considered here allows for a high degree of parallelism, especially in the parts with higher computation.This includes the evaluation of the map and its differential and the solution of all the linear systems.We have implemented this method to run on a cluster.The coding has been done in ANSI C, with calls to the PVM library for the communications.We include several examples showing the performance of our implementation.A preliminary version of this work can be found in [31].
One of the main drawbacks of this approach is that not all the tori are reducible (for concrete examples, see [6,30,27]).However, reducibility is quite common in many situations, and, as we show here, taking advantage of it can allow huge savings in computer time and memory.
The main difference between this work and previous papers that deal with the computation of invariant tori is that here we focus on the computation of tori of dimensions larger than 1 for maps (i.e., larger than 2 for flows).In fact, we compute invariant tori of dimensions up to 5 for flows, with accuracies between 10 −9 and 10 −12 , depending on the example.The numerical Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpapproximation of these tori is very difficult, due to the huge number of computations required.There are almost no papers in the literature dealing with the computation of tori of dimensions larger than 2 with this level of accuracy, so it is very difficult to make comparisons with other methods.Among them, we cite [21] (reprinted in [23]), which contains a calculation of a four dimensional torus of a flow with quite low accuracy (about 10 −4 ).Moreover, [11] contains some computations of tori of dimension 2 for maps, although there is no mention of the accuracy.An alternative procedure to compute higher dimensional tori close to a previously computed two dimensional torus of a flow is to compute a high order normal form near this torus (see [19]); the main inconvenience of this procedure is that it is only valid in a neighborhood of the initial torus (the case of invariant tori of dimensions up to 3 for flows around a periodic orbit can be found in [18]).A specific method for close to integrable Hamiltonian systems can be found in [39]; this method reproduces the classical Kolmogorov proof of the KAM theorem, and it is used to approximate a four dimensional torus for the Sun-Jupiter-Saturn system.
In this paper we focus only on nonautonomous systems (2), and we look for invariant tori parametrized by the forcing angle θ.In a forthcoming paper [32] we will show how this method can also be used for autonomous systems, where the frequency of the torus does not appear explicitly.
The paper is organized as follows: Section 2 contains the description of the method, section 3 discusses the implementation in a parallel computer, and section 4 contains some examples to illustrate the method.
2. The main iteration.We will assume that there exists an invariant torus for (1), that is, that there exists a smooth function u : T d → R n verifying (2).The numerical method is based on a Newton iteration, with quadratic convergence.Therefore, it needs an initial approximation to the desired torus.For the present moment, and to simplify the presentation, we will simply assume that such an approximation is available.In the examples we will discuss some possibilities for deriving this initial approximation.
Let .be a norm over R n , and, if u : T d → R n is a smooth function, we use the standard notation u ∞ = sup θ u(θ) .We assume that we have a parametrization x 0 : then y 0 ∞ is small.We will denote this as y 0 ∞ ≈ ε.
We also assume that we know an n × n matrix C 0 (θ) that is also a good approximation to the Floquet transformation.This means that the matrix In one step of the iterative procedure, we want to compute a new approximation x 1 to the torus, a new approximation C 1 to its Floquet change, and a new approximation B 1 to its Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpreduced matrix, such that the new "remainders" are of second order with respect to the previous ones, that is, In a practical implementation of the method we will use truncated Fourier series to represent the approximations to the torus and the Floquet change.For simplicity, we will first describe it using infinite series.The details of the representation in truncated Fourier series are discussed in section 3.1.

On the hypotheses.
As mentioned before, the method is based on a proof of existence of such tori.Here we discuss the role of these hypotheses and how they affect the convergence of the numerical method.
A first hypothesis refers to the smoothness of the map.It is quite common to require analyticity, since it simplifies the proofs and it is enough for most of the applications.For these reasons, we will assume that the map (1) and the functions x 0 (θ) and C 0 (θ) giving the initial approximations to the torus and to the Floquet change are real analytic.
A second hypothesis is a Diophantine condition on the frequency vector ω and the eigenvalues λ 1 , . . ., λ n of the matrix B 0 : we assume that there exist real constants c > 0 and where "i" denotes the complex unit.It is clear that condition (8) is always satisfied if all the eigenvalues λ j have moduli different from 1, and that (9) is always satisfied if all the eigenvalues have different moduli.If there are eigenvalues with modulus 1 and/or eigenvalues with the same modulus, (8) and (9) may still hold (for a general reference, see [37]).The roles of these two conditions are different.When (8) does not hold, in the conservative case the torus is generically destroyed (see [49]).In the dissipative case, the torus suffers a differentiability loss (see [7]).When, during a continuation process, a torus approaches a failure of condition (8), methods based on a Fourier series representation of the torus require more and more Fourier modes to achieve a given accuracy.
Condition (9) has a more technical role: it is required to obtain reducible tori, but its failure is not related to the destruction of the tori.A study of the role of (9) along a family of invariant tori can be found in [4], where the dynamical effects of the failure of this condition are studied.Here, we need condition (9) to compute the Floquet transformation that reduces the linearized flow along the torus to a constant coefficients matrix.So, when ( 9) is nearly violated, the number of Fourier modes used to represent this Floquet transformation increases.
As we will see in the next two sections, the iterative method is based on computing corrections for the approximations to the torus and the Floquet transformation.These corrections are obtained by solving a linear equation that comes from the linearization of the problem.
As usual, this equation is solved by expanding in Fourier series with respect to θ.Then, the coefficients of the Fourier series of these corrections contain the divisors exp( k, ω i) − λ j (for the torus) and λ j exp(i k, ω ) − λ (for the Floquet change).Hence, we not only need that these values are different from zero but that they cannot get too close to zero when |k| increases (k is the index of the Fourier expansion), because this could destroy the convergence of these series.This is the so-called small divisors problem. 1he last hypothesis is a nondegeneracy condition.At each step of the iterative process, the approximation to the reduced Floquet matrix B changes, and so do its eigenvalues.This means that asking for conditions ( 8) and ( 9) at the beginning is enough for the first step of the iterative process, but it does not guarantee that they are satisfied for the following steps.To deal with this situation, it is common to ask that the eigenvalues λ j depend on parameters.Then, playing with the parameters, one can recover properties ( 8) and ( 9).This allows us to show that, for a (Cantor) set of positive Lebesgue measures of values of the parameters, these conditions are satisfied in all the steps and the proof can be completed.
For more details on these topics, we suggest the general works [5,38,45] and references therein.
In principle, to be sure that the method will converge, one should check all the conditions of the theorems, including that the value ε that measures the accuracy of the initial approximation is small enough.As we will apply only a finite number of iterations of the method, we can check for the conditions at each step.In particular, for the range of considered Fourier coefficients (i.e., the range of indices k) we check if the values | exp( k, ω i) − λ j | and |λ j exp(i k, ω ) − λ | are very small.If so, the program stops with a message.The error due to the use of truncated Fourier series is discussed in section 3.2.

Correcting the approximation to the torus.
We assume that we know an approximation to the torus, x 0 , and an approximation to its Floquet transformation, C 0 , and to its Floquet matrix, B 0 .If we define y 0 and Q 0 as in ( 4) and ( 5), then we are assuming that y 0 ∞ and Q 0 ∞ are of order ε (here ε denotes a small parameter).To simplify notation, we write this as y 0 ∞ ≈ ε and Q 0 ∞ ≈ ε.The first step of the iteration is based on the following property.Lemma 2.1.Let λ 1 , . . ., λ n be the eigenvalues of B 0 , and we assume that they satisfy the Diophantine condition (8) for some constants c > 0 and γ > d.Let us define g(θ) = −C −1 0 (θ+ ω)y 0 (θ).Then, there exists a function u : and such that, if x 1 (θ) = x 0 (θ) + C 0 (θ)u(θ) and y 1 is defined as in (6), we have that , we can use the smoothness of f to look for a linear approximation to h as We look for an h satisfying If we can find such an h, with a norm of the same order as y 0 , it is clear that we will have y 1 ∞ ≈ ε 2 .To solve (11), we apply the transformation h(θ) = C 0 (θ)u(θ) to obtain ( 12) where Q 0 has been defined in (5) and g(θ It is easy to check that, to get a remainder y 1 of order ε 2 , it is enough to compute an approximate solution of ( 12) with an error of order ε 2 .Therefore, since it is enough to compute u with an accuracy of order ε, we drop Q 0 and we focus on (10).
To see that (10) has a solution, we expand u and g in Fourier series.Although it is usual in theoretical works to expand in complex series, here we expand in real Fourier series.The reason is that we have used real expansions in the computer programs.Hence, where If we plug these expansions into (10), it is not difficult to check that the unknown Fourier coefficients α k , β k must satisfy the equations The eigenvalues of the matrices 0 is invertible if all the eigenvalues of B 0 are different from exp(± k, ω i).For the convergence of the series we use condition (8) to show the analyticity of u in a subset of the analyticity domain of g in the same way it is done in [35] or [38].

Correcting the approximation of the Floquet transformation. The linearized flow around the initial approximation is given by
We have an approximation for its Floquet change and the reduced Floquet matrix, C 0 (θ) and B 0 , satisfying (5) with Q 0 ∞ ≈ ε.After the previous step, we have a new approximation to the solution, x 1 , and we want to find a new transformation C 1 (θ) and a new reduced matrix Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpB 1 such that ( 7) is satisfied with Q 1 ∞ ≈ ε 2 (this situation has already been considered in several places; see [2,34,33]).Let us introduce the matrices R and B 1 as where Avg (R) denotes the average of the map θ → R(θ), that is, Lemma 2.2.Let λ 1 , . . ., λ n be the eigenvalues of B 1 , and we assume that they satisfy the Diophantine condition (9) for some constants c > 0 and γ > d.Let us define R(θ) = R(θ) − Avg (R).Then, there exists a matrix valued function H : and such that, if C 1 (θ) = C 0 (θ)(Id + H(θ)), the matrix Q 1 defined as in (7) satisfies Proof.We look for a new approximation to the Floquet transformation, ε-close to the previous one, ( 16) where H(θ) denotes a matrix with norm of order ε.The new transformation x = C 1 (θ)z must reduce the linear skew product Let us introduce the matrices R and B 1 as in (14).Applying the transformation x = C 0 (θ)y to (17), we obtain . Now, if we denote a near identity transformation as y = (Id + H(θ)) u Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php(H is supposed to be small), and we apply it to (19), we obtain We are interested in reducing (20) to order ε 2 .Note that (I + H(θ + ω)) −1 can be approximated by with an error of O( H 2 ).This means that (20) can be written as Hence, if H(θ) satisfies (15), it is clear that the transformation ( 16) reduces (17) to the form (18).
To show that (15) has a solution, we expand H and R in complex Fourier series (although for the computations we will use a real expansion, as will be discussed later).Therefore, we denote and we insert these expansions into (15).Equating for each k we have that, for k = 0, it is enough to take H 0 = 0.For k = 0, H k has to satisfy For each k = 0, let us define the linear map L k , acting on the space of constant matrices, as L k (H) = HB 1 exp(i k, ω ) − B 1 H.Without loss of generality, we can assume that B 1 is already in diagonal form, B 1 = diag(λ 1 , . . ., λ n ).Then, it is not difficult to show that ker(L k ) = {0} ⇐⇒ there exist j, such that λ j exp(i k, ω ) − λ = 0.
Taking into account condition (9), it is clear that (21) determines the matrices H k for all k = 0.Moreover, (9) allows us to prove the convergence of these series and to show that H is analytic on a subset of the domain of analyticity of R. As in Lemma 2.1, the details can be found in [35] or [38].
For the computations, we will use real Fourier series.As H 0 = 0, we write Then, it is not difficult to derive the following expressions: For each k, this is a linear system (of dimension 2n 2 ) for the unknowns H Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php2.4.The iterative scheme.This process is repeated until the norm of the remainders, y k ∞ and Q k ∞ , is small enough.The scheme is summarized as follows: We assume we have an approximation x 0 (θ) of a torus such that y 0 (θ) ∞ = x 0 (θ + ω) − f (x 0 (θ)) ∞ ≈ ε and an approximation y = C 0 (θ)x to its Floquet transformation such that it transforms the linear system Each iteration of the process has two steps.The first step consists of the following operations: . This is done by expanding g and u in real Fourier series and using (13).
The second step corrects the approximation to the Floquet matrix and Floquet transformation.It consists of the following operations: . This is done by expanding R and H in real Fourier series; see (22).
. C 1 is the new approximation to the Floquet change.Note that the dimension of the linear systems in the method depends only on the dimension of the phase space, and their number depends only on the number of Fourier modes used in the approximation.As the number of Fourier modes needed to approximate a torus grows exponentially with the dimension of the torus, the method is suitable for computing tori of higher dimensions.

Computer implementation.
In this section, we discuss the computer implementation of the algorithm.As is usual in numerical implementations, we will use truncated Fourier series to represent tori and their Floquet transformations, although for some operations it is better to represent them by a table of values (see section 3.1).The number of Fourier modes considered (or the number of points in the tables) is determined from the level of accuracy, as discussed in section 3.2.Finally, in section 3.3, we discuss some details of the parallel implementation.

Table of values and Fourier coefficients.
Some of the operations in the algorithm are better performed on Fourier series (see (13) and ( 22)), while some others are better done on tables of values of the functions (the computation of the matrices D x f (x j (θ)) and R(θ) and the products involving them).
To perform these transformations, we have used a multidimensional discrete Fourier transform algorithm, as explained in [17].These techniques are implemented in the well-known Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phplibrary FFTW3 (its home page is http://www.fftw.org/),included as an optional package in several GNU/Linux distributions.
We note that the two sets (table of values and truncated Fourier series) have the same size.Hence, truncated series or tables of values will be stored indistinctly in the same array, depending on the step where the algorithm is.

3.2.
The control of the error.We have two main sources of error that affect the result.
(i) The error of the invariance condition on the table of values of the torus (and of the Floquet transformation).This error is easy to control: the iterations explained in section 2 are stopped when the remainders on this mesh are small.
(ii) The interpolation error, due to the substitution of the torus (and the Floquet change) for a mesh of points.To control this second source of errors we first to estimate it, and then to be able to change the mesh when this error is too large.In what follows we will discuss these issues.
Let us denote by A ⊂ T d the set of points corresponding to the tables of values used in the algorithm.Assume that after some iterations we have obtained two sets of values, one for the torus and one for the Floquet transformation, {x(θ i )} θ i ∈A and {C(θ i )} θ i ∈A , and a constant Floquet matrix B satisfying (23) max where δ is a fixed tolerance.Let us define δ 1,2 as Note that the computation of δ 1,2 is a difficult task (especially for d ≥ 2).We will use two different methods to estimate these values.The first method is very fast (it requires almost no work), but the value provided is a rough estimate of the true value.The second method requires a much greater effort, and the result is more reliable.Let us first explain these two methods, and then we will discuss how we use them.
The first method consists in looking at the norm of some of the "last" coefficients of the Fourier series and using it as an estimate for the truncation error of the series.As we are looking for a fast indicator, we have used the last two coefficients along the "lines" k = (0, . . ., 0, k j , 0, . . ., 0) (for j = 1, . . ., d).Once the Newton iteration has converged on a given mesh, we check the size of these coefficients.If one of them is larger than a prescribed threshold, we assume that the interpolation error is too big, and we increase the number of Fourier modes in the direction of these large coefficients (this is equivalent to refining the mesh of points in the direction of the corresponding angles θ j ).Then, we restart the Newton iterations from the previous approximation, with the new Fourier modes set to zero.
The second test is to evaluate the estimation of the error (23) on a set of values Ã ⊂ T d different from the set A used for the computations.A first option is to use a thinner partition Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpÃ to produce a better estimate of the invariance errors δ 1,2 .The main inconvenience is that the number of points in the mesh grows very fast.For instance, if we double the number of points in each direction θ j , j = 1, . . ., d, the total number of points in the mesh grows by a factor of 2 d .Although this test has a natural parallelism, it can take a long time, especially in those cases where the map is given by the Poincaré section of a flow.For this reason, we propose the following alternative: During the iterative procedure, we will use a mesh Ã with the same number of points as A. This mesh is obtained by adding a small vector γ = (γ 1 , . . ., γ d ) to each point of A. The values γ j are one half of the distance between the points of A in the direction θ j .In this way the new mesh Ã is interlaced with the initial mesh A. Then, we check the conditions max If this test is not satisfied, we add some Fourier coefficients (in all the "directions" θ i ), and we go back to the Newton iteration to refine the solution.If it is satisfied, we can either stop the algorithm and accept the solution or check it again with a thinner mesh.The idea is to avoid checking with finer meshes during the computations (it is too costly) and to do a single check at the end to be sure of the accuracy.

Parallelism.
The method has mainly two parallel points: the simultaneous resolution of the different linear systems that appear in ( 13) and ( 22) and the multiple evaluations of the map f and its differential.This last point could be the most computing-intensive part if the evaluation of f and Df requires a lot of operations (think of a Poincaré map of an ODE).These are the places where most of the computational work is done.
Our implementation runs in a cluster of PCs, connected through an Ethernet network, and using the PVM library for the communications (see [20]).The PVM library is freely accessible from the Internet, and it also comes as an option in several GNU/Linux distributions.
The programming model used is the so-called master-slave model: there is a main program (the master) that splits the work into independent pieces and sends them to a set of programs (usually called slaves, but we prefer to refer to them as secondary programs) that do the computations and send the results back to the master (we prefer to call it the primary program).This is one of the simplest paradigms for parallel computing, and it is enough for our situation.
In our implementation, the primary program runs in one of the nodes of the cluster, while the secondary programs run in different nodes, on a CPU-per-program basis.We use three different classes of secondary programs.
(i) Secondary programs of the kind A (SPA).Their task consists in creating and solving the linear systems (13).In this way each SPA receives from the main program a subset of multi-indices k.For each k, the SPA computes the linear system in ( 13) and solves it.Then, it returns to the main program an integer.This integer means that it has finished the task.If there are more systems to be solved, the main program sends the SPA another subset of multi-indices; otherwise it sends an integer to tell SPA that there are no more systems to solve.Then, SPA returns all the computed solutions to the main program.Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php(ii) Secondary programs of the kind B (SPB).They take care of the construction and resolution of the linear systems (22).Similarly to the SPA, each SPB receives from the main program a subset of multi-indices k.Then, for each k, the SPB constructs the linear system in (22) and solves it.After that, it returns to the main program an integer saying that this task has been finished.As before, the main program knows if there are more systems to solve, and, if so, it send two integers more.If all the systems have been solved, the main program sends an integer to indicate to the SPB that the computation is finished, and then the SPB sends all the results to the main program.
(iii) Secondary programs of kind C (SPC).They have the task of evaluating over A the application f and its differential D x f .The SPC also receives from the main program a set of values of the angles θ.The SPC computes the evaluation of f (θ) and D x f (x(θ), θ) for all these θ's and asks the main program for another set.When all the evaluations have been done, The SPC sends all the results to the main program.
The nonparallel part is done by the main program, and it contains the fast Fourier transforms and the error estimates.We have done some tests including a parallelization of these two points, but, for the examples included in this paper, it takes a bit longer.The reason is that both computations are very fast and the overhead added by the communications is too big.However, for much larger examples, this parallelization could be advantageous.

Examples.
In this section we present several examples to show the effectiveness of the method.

A quasi-periodically forced pendulum.
Let us consider the following system of ODEs: ẋ = y, ẏ = −α sin x + εq(θ 0 , . . ., θ d ), (24) θi = ω i , i = 0, . . ., d, where x ∈ R, y ∈ R, and, for i = 0, . . ., d, θ i ∈ T and ω i ∈ R. We will use the following function q: The reason for using this expression and not something simpler like cos θ i is to have a perturbing function with many harmonics of a relevant amplitude.A natural way to get this is to invert a trigonometric polynomial with a zero in the complex plane, not far from the real line (think of the Fourier expansion of the function q in (25)).
Note that, for ε = 0, system (24) has the d-dimensional invariant torus x = y = 0, θ ∈ T d .It is known [35,36] that there exists a Cantor set E of values of ε for which the system (24) has a (d+ 1)-dimensional invariant torus, with frequencies ω 0 , . . ., ω d .Moreover, the Lebesgue measure of the set E ∩ [0, ε 0 ] is exponentially small with ε 0 .When ε is small but outside of the set E, the torus can still exist, but with different stability properties and/or it does not need to be unique.For a discussion of this, see [4].Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Table 1
The torus for the quasi-periodically forced pendulum.The first column shows the value of d; the second column corresponds to the accuracy required to the solution (the value δ in (23)); the third column contains the number of harmonics of the initial approximation; the last column shows the final number of harmonics used to approximate the solution within the accuracy shown in the second column.Here we select a small value for ε, and we will try to compute an invariant torus close to the origin, with frequencies ω 0 , . . ., ω d .For the computations we will use several values for d to produce invariant tori of different dimensions.For d = 1, . . ., 4, the frequency vector ω is taken as ( 26) Finally, we will use the parameters α = 0.8 and ε = 0.15.To apply the methodology described before, let us consider the return map for the section θ 0 = 0 (mod 2π).If we denote a point on this section by z = (x, y) and θ = (θ 1 , . . ., θ d ), we represent the associated return map to the section θ 0 = 0 ( mod 2π) as with ω = (ω 1 , . . ., ωd ), where, for i = 1, . . ., d, we define ωi = 2πω i .The Jacobian of ( 27) is obtained by means of the numerical integration of the variational equations of ( 24).We will focus on invariant tori of dimension d, that is, tori that can be parametrized as z 0 : T d → R 2 and that satisfy the equation z 0 (θ + ω) = P (z 0 (θ), θ).They are the simplest invariant objects of a system like (27).As mentioned before, the linearization around such a torus is given by (3), where A(θ) = D z P (z 0 (θ), θ).When ε = 0, the point x = y = 0 is invariant by the flow of (24), and this implies that the map P satisfies P (0) = 0 (note that if we consider the variable θ, this set is a d-dimensional invariant torus for ( 27)).To apply the previous algorithm, we will use z 0 (θ) ≡ 0 as an initial approximation to the torus and the Jacobian of P at z = 0 (for ε = 0) as an approximation to the reduced matrix (note that, for ε = 0, the Jacobian of P does not depend on θ) and the identity matrix as an approximation to the Floquet transformation.In other words, we use the torus for ε = 0 as an approximation to the torus for ε small.
As the number of Fourier terms needed to get an approximation with a prescribed accuracy is not known in advance, we start with a given number, and the algorithm for the control of the error (see section 3.2) takes care of increasing the number of Fourier terms.Table 1 Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Table 3
Computing time for the torus of the quasi-periodically forced pendulum (24).The first column is the value of d, the second column (p) shows the number of processors used, the third column is the total time (clock time, not CPU time) taken by the program, the fourth is the CPU time used by the main program (the nonparallel part), and the last one is the difference between the two previous columns.shows the required number of terms for different values of d and different accuracies.Table 2 displays the eigenvalues of the Floquet matrix of the solutions for d = 1, 2, 3, and 4. Note that all these solutions are linearly stable.Table 3 shows the computing time needed to obtain the solution in four different cases: d = 1 and d = 2 with accuracy 10 −14 , d = 3 with accuracy 10 −12 , and d = 4 with accuracy 10 −10 .The second column shows the number of CPUs used (each is a 2.4 GHz Intel Xeon processor).Each processor runs one SPA, one SPB, and one SPC (see section 3.3).Moreover, one of the processors runs the main program.The third column corresponds to the total time spent by the algorithm (i.e., from the beginning to the end of the run).The time spent by the main program (i.e., the part that has not been parallelized) is shown in the fourth column.The time used in the parallel part is obtained by subtracting the third and fourth columns, and it is shown in the last one.
Figure 1 (left) shows the invariant curve of ( 27) for d = 1 in the (x, y) plane.Figure 1 -  -  (right) shows the torus of ( 27) for d = 2.To make this plot, we have taken a uniform "squared" mesh on T 2 and we have plotted the image of this mesh through the parametrization of the torus to make a "geometric" plot.Another option would be to take a trajectory on T 2 (a straight line with the slope given by the frequencies) and to plot the image of this line in the phase space by means of the parametrization of the torus.Figure 2 shows three slices of the invariant torus for the case d = 3; that is, we fix the value of one of the angles and we draw the resulting two dimensional torus as in the previous figure.Finally, Figure 3 shows six sections of the solution in the case d = 4.Of course, it is possible to use a three dimensional mesh in Figure 2 (or a four dimensional mesh in Figure 2), but the resulting plot is less clear.Another possibility is to visualize the torus making a movie of the section, when the value of the coordinate used for the section varies from 0 to 2π.

A dissipative situation.
Here we consider a dissipative perturbation of ( 24): where the function q is defined in (25) and the frequencies ω i are defined in (26).We also fix ε = 0.15, α = 0.8, and d = 4.As was done in section 4.1, we use the Poincaré section θ 0 ≡ 0, and we will compute invariant tori for the corresponding Poincaré map P .As before, we will obtain the initial approximation for the method from the case ε = 0: the fixed point x = y = 0 is used for the torus, the differential of P at this point for the Floquet matrix, and the identity matrix for the Floquet transformation.The invariant torus for ε = 0.15 has been computed with an accuracy of 10 −12 .The number of Fourier terms required to get this accuracy is 1, 185, 921.The eigenvalues of the Floquet matrix have modulus 4.3213918263772258 × 10 −2 and argument ±1.5161316762139108.We note that, as the divergence of ( 28) is constant and equal to −1 and the matrix A(θ) is obtained by integrating the variational flow for 2π units of time (we recall that ω 0 = 1), then det A(θ) ≡ exp(−2π).Therefore, the product of the modulus of the two computed eigenvalues should be equal to exp(−2π).As an extra test for the level of accuracy, we compute the difference between this product and exp(−2π), and the result is 7.1 × 10 −19 .Figure 4 shows six sections of the solution.

The multicircular model.
Let us assume that the Sun and Jupiter move in a circular orbit around their common center of mass according to the Kepler laws.The restricted three body problem (RTBP) describes the motion of a third particle on the vector field produced by these two primaries (the Sun and Jupiter).It is assumed that the mass of this third particle is so small that it does not affect the motion of the two primaries.It is common to choose units of distance, time, and mass such that the distance between the primaries is 1, the period of the primaries is 2π, and the sum of masses of the Sun and Jupiter is 1.It is also usual to use a rotating coordinate system with origin at the center of mass so that the two primaries are kept fixed on the X axis.In this reference system, there are five equilibrium points for the particle: three of them (L 1 , L 2 , and L 3 ) are on the X axis, and the other two (L 4 and L 5 ) form an equilateral triangle with the primaries (for more details, see, for instance, [48]).
We are interested in finding quasi-periodic solutions near L 5 in models that are a perturbation of the RTBP.More concretely, we consider perturbations obtained by adding planets to the system.We will assume that the planets move in circular orbits around the center of masses of the Sun and Jupiter.In particular, we want to consider the direct gravitational effect on the particle due to Saturn, Uranus, Neptune, and Earth.The equations of motion for the particle are i , and ω i are the frequencies of the perturbing planets, m i are their masses, and a i denotes their semimajor axis, in these coordinates.Moreover, turbing planets).The mass parameter μ (that corresponds to the Sun-Jupiter problem) is 9.5388118036309677 × 10 −4 .Table 4 shows the values of the constants, in adimensional units, that appear in the equation.The index i = 0 corresponds to Uranus, i = 1 to Saturn, i = 2 to Neptune, and i = 3 to Earth.As in the previous examples, we will work on the section θ 0 ≡ 0. The corresponding return map depends on d angles (because the system depends on d + 1 and the section removes one of them).For simplicity, let us abuse the notation and denote by z the vector (x, y, z, p x , p y , p z ) ∈ R 6 .Moreover, we define θ = (θ 1 , . . ., θ d ), and, as before, we call P the return map to the section θ 0 ≡ 0. In this way, the system (29) can be represented by (30) z = P (z, θ), θ = θ + ω, with ω = (ω i ), with ωi = 2πω i /ω 0 for i = 1, . . ., d.The differential of P is obtained by means of the numerical integration of the variational equations of (29).Therefore, if z 0 (θ) is a quasi-periodic solution of (30), then the linear dynamics around it are given by (3), where A(θ) = D z P (z 0 (θ), θ).We are interested in finding an invariant torus of (29) near the equilibria L 5 of the unperturbed system, with the same frequencies as the perturbation.We will use as an initial approximation the coordinates of the point L 5 of the Sun-Jupiter RTBP.The linearization of P at L 5 (again for the Sun-Jupiter RTBP) will be used as the initial guess for the reduced Floquet matrix of the invariant torus of (29) that we want to compute.As for the unperturbed case (the RTBP) the Floquet change is the identity, we will also select the identity as a first approximation for the perturbed situation.
Table 5 shows the number of Fourier terms required to get an approximation of the torus with a given accuracy.In all the cases (d = 1, 2, 3) the six eigenvalues of the Floquet matrix are complex conjugate pairs with modulus 1, so the tori are linearly stable.In Table 6 the positive argument of these eigenvalues is shown.The first row corresponds to the case d = 1, Downloaded 02/05/13 to 161.116.168.89.Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Table 6
Positive argument of the eigenvalues of the Floquet matrix of the tori found for the multicircular model.the second to the case d = 2, and the last row to the case d = 3.As an extra test, we compute the difference between the product of the eigenvalues and 1 (the product has to be 1 due to the Hamiltonian structure).This difference is 4.66 × 10 −15 for d = 1, 2.64 × 10 −15 for d = 2, and 5.14 × 10 −12 for d = 3.Finally, Figures 5 and 6 show the tori for the discrete system (30).The tori for the flow (29) can be easily obtained by means of numerical integrations starting on a mesh of points on the tori found for (30).

Figure 1 .
Figure 1.The picture of the found solution in the case of d = 1 (on left) and d = 2 (on right).

Table 5
Accuracy and number of harmonics: The first row shows the value of d, the second row is the accuracy of the solution, and the third row contains the number of harmonics needed in the approximation of the solution.