Temporal Diffeomorphic Free-Form Deformation: Application to Motion and Strain Estimation from 3D Echocardiography

This paper presents a new registration algorithm, called Temporal Diffeomorphic Free Form Deformation (TDFFD), and its application to motion and strain quantification from a sequence of 3D ultrasound (US) images. The originality of our approach resides in enforcing time consistency by representing the 4D velocity field as the sum of continuous spatiotemporal B-Spline kernels. The spatiotemporal displacement field is then recovered through forward Eulerian integration of the non-stationary velocity field. The strain tensor is computed locally using the spatial derivatives of the reconstructed displacement field. The energy functional considered in this paper weighs two terms: the image similarity and a regularization term. The image similarity metric is the sum of squared differences between the intensities of each frame and a reference one. Any frame in the sequence can be chosen as reference. The regularization term is based on the incompressibility of myocardial tissue. TDFFD was compared to pairwise 3D FFD and 3D+t FFD, both on displacement and velocity fields, on a set of synthetic 3D US images with different noise levels. TDFFD showed increased robustness to noise compared to these two state-of-the-art algorithms. TDFFD also proved to be more resistant to a reduced temporal resolution when decimating this synthetic sequence. Finally, this synthetic dataset was used to determine optimal settings of the TDFFD algorithm. Subsequently, TDFFD was applied to a database of cardiac 3D US images of the left ventricle acquired from 9 healthy volunteers and 13 patients treated by Cardiac Resynchronization Therapy (CRT). On healthy cases, uniform strain patterns were observed over all myocardial segments, as physiologically expected. On all CRT patients, the improvement in synchrony of regional longitudinal strain correlated with CRT clinical outcome as quantified by the reduction of end-systolic left ventricular volume at follow-up (6 and 12 months), showing the potential of the proposed algorithm for the assessment of CRT.


Motion and strain quantification from 3D ultrasound
The quantification of cardiac motion and strain provides insight into cardiac function through the assessment of how a given pathology affects global and local deformation of the myocardium. In clinical routine, ultrasonography (US) is the standard imaging modality for the quantification and the analysis of regional motion and strain. 3D US and Magnetic Resonance Imaging (MRI) offer in clinical practice similar volume rates (about 20-40 frames per heart cycle). Standard 3D US acquisition protocols divide the field of view in sectors, each sector being acquired separately. Within a sector, 3D US provide real-time images, while MRI requires gating. Besides, MRI is most often acquired per-slice and not as a complete volume. Even though MRI has better signal-to-noise ratio (SNR), US remains the most commonly used imaging modality. Indeed, US allows image analysis of follow-ups with implanted devices, has lower cost and higher availability than MRI at clinical sites. With respect to 2D US, 3D US enables motion quantification in the entire myocardium and does not suffer from out-of-plane motion artifacts. Nonetheless, several factors can induce artifacts in US images, such as signal attenuations, shadows, inhomogeneous resolution etc. Thus, 3D US images have relatively low SNR, making their processing more challenging.
Several approaches (Duan et al. (2007); Elen et al. (2008); Kawagishi (2008); Abe et al. (2008); Wang et al. (2010); Andrade et al. (2011)) have been proposed to extend 2D speckle tracking techniques to 3D with the objective of recovering myocardial motion from a 3D US image sequence. Duan et al. (2007) used an optical flow technique to quantify myocardial motion but limited their validation to simulated US data. Elen et al. (2008) applied B-Spline registration technique guided by mutual information to estimate 3D myocardial strain from 3D US sequences. They obtained absolute errors up to 21% for the radial strain and up to 4% for the longitudinal and circumferential strains on simulated US images, and recovered strain curves on in vivo data with physiologically plausible magnitude and dynamics. Kawagishi (2008) described the methods for applying speckle tracking on CRT candidates using 3D block matching and further computed regional strain curves from the resulting sparse displacement field. Radial strain curves obtained from a 3D speckle tracking technique were plotted for a single patient, but the authors did not look at the improvement of the strain pattern over therapy. No validation nor comparison to other techniques was made in their paper. Earlier work by the same authors (Abe et al. (2008)) on 3D speckle tracking only reported strain quantification results on a synthetic left ventricle (LV) model. Andrade et al. (2011) used 3D speckle tracking to quantify LV twist in 50 patients and highlighted statistical differences between 2D and 3D rotation measurements without looking at strain. Finally, Wang et al. (2010) recently proposed an unified Bayesian framework fusing multiple features (speckle patterns, border detection, intensity gradient and motion prediction) to recover 3D motion and strain. As this framework uses a training set to encode motion priors, an important issue is to know whether a single training set can handle all possible pathological cases. Such an approach could also introduce a bias towards deformation patterns that are more predominant among the training population. This approach is still pairwise since it estimates motion at every frame in a sequential manner without optimizing a joint objective function on the entire sequence.

Temporal consistency in image registration
One of the main drawbacks of speckle-tracking techniques based on pairwise registrations is that they do not make use of the temporal information embedded in the 3D US sequences.
In the literature, several image registration approaches, were proposed to exploit the temporal coherence of the input image sequence and they are summarized in Table 1. Such approaches can be organized into two main application fields: inter-subject sequence matching and deformation/motion quantification. Table 1 gives for each approach the application domain, the image modality, and its most distinctive algorithmic feature. It also specifies the input and output dimensionality of the transformation model (d in → d out ). The two last columns of Table 1 specify if the spatiotemporal transformation is represented as a displacement or a velocity field, and if it ensures temporal continuity.
In the context of inter-subject sequence matching, spatiotemporal matching attempts to map the cardiac geometry and dynamics of two patients for statistical atlas construction (Perperidis et al. (2005)) or to compare cardiac function of a patient before and after treatment (Peyrat et al. (2010)). Perperidis et al. (2005) proposed a Free Form Deformation (FFD) (Rueckert et al. (1999)) separable parameterization of the spatiotemporal transformation mapping two subjects. Peyrat et al. (2010) proposed to enforce the spatiotemporal mapping to match a given material point at different time points using trajectory constraints. Sundar et al. (2009) applied an inter-sequence alignment strategy for motion quantification. They created a static 3D+t sequence by replicating the first image of the sequence and then aligned the original one with the static 3D+t sequence. 2 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ In the context of motion quantification, spatiotemporal registration algorithms are based on the simultaneous alignment of all input images in the sequence. Ledesma-Carbayo et al. (2005) were pioneers in introducing a parametric spatiotemporal model, smooth in space and time, to take advantage of the temporal information contained in 2D US image sequences. Their algorithm was recently extended in two different directions. First, Metz et al. (2011) extended the output dimension of the transformation model to 3D+t. They used the sum of intensity variances over time as a groupwise similarity metric. Since uniqueness of the solution was not guaranteed, the authors constrained the average transformation in time to be the identity transformation. Second, Yigitsoy et al. (2011) kept the transformation model unchanged but took the mean squared intensity differences between all possible combinations of warped images as similarity metric. They referred to this metric as accumulated pairwise estimates. These three algorithms (Ledesma-Carbayo et al. (2005); Metz et al. (2011);Yigitsoy et al. (2011)) guarantee to recover smooth transformations in time by representing the displacement as a sum of smooth and continuous kernels. This displacement is expressed in the space of coordinates of a fixed reference frame. Such a representation is Lagrangian if the reference space is interpreted as the space of coordinates of material points. The advantage of this representation is that strain curves can be computed in a Lagrangian way, giving the deformation of one material point over time. The disadvantage, however, is that the motion field is not constructed as the composition of small incremental transformations. The displacement is smooth in time but does not functionally depend on the motion at earlier time instants. As it is expected that the transformation is not only smooth but also correlated between consecutive time points, this means that plausible transformations are restricted to a subset of the optimization space. The convergence of the optimization process can therefore be slowed down because of a suboptimal representation of the motion field.
One way of enforcing the motion to depend on earlier time points is to switch from a representation based on displacement to a representation based on velocity. Although the transformation model does not encode directly the trajectories, the Lagrangian displacement can still be reconstructed from the integration in time of the velocity field. Lagrangian strain or the image-based similarity to any reference frame in the sequence can then be computed from the Lagrangian displacement field. The benefit of basing the transformation representation on velocity rather than displacement is that by construction, motion at every time is obtained as the composition of velocity at all previous times. We refer to this functional dependence as temporal consistency and give experimental evidence in this paper of the resulting gain in accuracy and robustness against image noise.

Temporal diffeomorphic registration
Diffeomorphic registration algorithms (e.g. Beg et al. (2005); Vercauteren et al. (2007); Rueckert et al. (2006); Hernandez et al. (2009);Ashburner (2007); Avants et al. (2008)) ensure a continuous and differentiable correspondence with continuous inverse between the features to register. They are therefore particularly well suited to handle medical image sequences as they preserve topology and orientation of the observed anatomical structures over time. By integrating a velocity field over time, they provide an elegant way of enforcing temporal consistency. This concept was applied by Khan and Beg (2008) to monitor growth processes by extending the Large Deformation Diffeomorphic Metric Mapping (LDDMM) image registration algorithm (Beg et al. (2005)). Velocities were computed using a dense grid, which did not guarantee their spatiotemporal continuity, unlike registration methods that rely on parametric transformation models. A regularization term was added to the image similarity metric and involved a smoothing kernel thus enforcing the spatial continuity of the computed velocities. The temporal continuity of the velocities was then fully conditioned by the implicit assumption that the observed image features preserved their topology along the image sequence. Such an assumption does not hold for noisy image sequences, such as those arising from cardiac echocardiography, as speckle and image artifacts can generate discontinuities from one frame to the next.
The concept of exploiting the diffeomorphic registration framework to enforce temporal consistency was applied to 2D contours and 3D shapes by Durrleman et al. (2009) andQiu et al. (2009). However, while the computational cost of reconstructing dense velocity fields is acceptable for contours, paths or surfaces, its extension to dense volumetric spatiotemporal data remains critical. Moreover, all previous approaches 3 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ extending diffeomorphic image registration to temporal sequences only output a piecewise geodesic mapping (Durrleman et al. (2009)), for which the continuity of velocities at each data time point is not guaranteed. Trouvé and Vialard (2010) recently presented a framework for smoothly interpolating the trajectories of a set of landmarks over time by minimizing the norm of a forcing term in the evolution of the momentum equation. While this approach is suitable for handling landmark data, its computational cost would become prohibitive when applied to 3D images.

Incompressibility of myocardial tissue and application to cardiac motion quantification
Non-rigid image registration algorithms often require the inclusion of some prior information to compensate for the noise in the input data. In the case of cardiac motion quantification, several authors proposed the use of the incompressibility constraint to regularize the registration result.
The rationale behind incompressibility is that the various constituents of myocardial tissue as well as the blood perfusing it are incompressible. Experimental evidence of incompressibility was provided by Tsuiki and Ritman (1980) on a canine LV metabolically supported and monitored using angiography images. Although the volume quantified over the cardiac cycle was found to be globally constant, they reported variations about ±5%. One explanation provided by the authors is the possible change in coronary blood volume between each phase of the cardiac cycle. Since the myocardium is perfused with distinsible vessels, and because of the direct contact between muscles and vessels, one can expect that local changes in stiffness due to tissue contraction would modify myocardial volume (Yin et al. (1996)). Westerhof et al. (2006) reported about mechanical interactions between cardiac muscle and coronary vasculature. In particular, during systole, thickening of the cardiac muscle can take place at the expense of the vascular volume. Muscle contraction is augmented when vascular emptying is facilitated. This corroborates with data showing that vascular volumes decreases with contraction (Spaan et al. (1981)). Yin et al. (1996) experimentally showed the change of vascular volume to vary from 2 to 4 ml/100 g tissue when modifying the perfusion pressure from 0 to 120 mmHg.
In regard to these experimental results, myocardial tissue can be considered as quasi-incompressible, with a slight deviation from the fully incompressible case varying over the cardiac cycle.
In the context of motion/deformation quantification algorithms, incompressibility was first introduced as a constraint combined with the constraint of mass conservation in an optical flow framework (Song and Leahy (1991)). It led to the joint minimization of optical-flow, divergence-free and smoothness constraints. This approach was applied by Song and Leahy (1991) and Gorce et al. (1997) to the estimation of myocardial velocities from Computed Tomography (CT) images. Bistoquet et al. (2007) proposed the use of divergencefree radial basis kernels to represent myocardial displacements, implementing a first-order approximation of incompressibility. The model was validated by comparing the recovered motion obtained from cine MRI to motion information provided by tagged MRI images for healthy subjects and ventricular dyssynchrony patients. Haber and Modersitzki (2004) introduced a volume-preserving non-rigid registration algorithm by constraining locally the determinant of the transformation to equal unity. A specific discretization strategy was combined with a variant of the Sequential Quadratic Programming algorithm for solving the optimization problem. The method was applied to regularize motion quantification from contrast-enhanced data, being robust to contrast uptake registration artifacts. Saddi et al. (2007) also focused on motion quantification from perfusion sequences, but from CT images of the liver. In their approach, the non-rigid displacement field was projected onto the space of divergence-free vector fields using a multigrid solver. Mansi et al. (2010Mansi et al. ( , 2011 extended the diffeomorphic Demons registration method introduced by Vercauteren et al. (2007) by integrating elasticity and incompressibility for soft-tissue tracking. The basic idea in this method, as well as in Hinkle et al. (2009), is that diffeomorphisms parametrized by divergence-free velocity fields are incompressible. The proposed algorithm was applied to cine MRI for myocardial strain quantification. The introduction of the incompressibility constraint improved the strain estimation in cine MRI, taking the strain values computed from tagged MRI as ground truth.

Proposed approach
In this paper, we present a novel temporal diffeomorphic registration algorithm that models the velocities continuously in time and space and can be applied to 3D sequences with a reasonable computational cost.

5
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ Preliminary results on this methodology were presented in De Craene et al. (2010). In this work, we additionally introduce the possibility of using any time point as a reference, as well as the inclusion of quasi-incompressibility as a soft constraint in the optimization problem, and extend the clinical application of our algorithm regarding the analysis and the number of patients. We refer to our approach as Temporal Diffeomorphic Free Form Deformation (TDFFD) algorithm, since we extend the popular parametric FFD registration technique (Rueckert et al. (2006)) by summing spatiotemporal B-Spline kernels to model the 3D+t velocity field. An important contribution of this paper is the enforcement of the continuity of the velocity field by using a continuous parametric representation. As a result, the velocity and displacement fields can be computed at any time within the temporal interval captured by the image sequence. Another advantage of solving for velocity rather than for displacement is to endow the transformation model with temporal consistency. Indeed, if the velocity is changed locally at a given spatiotemporal location, the entire trajectory will be modified at subsequent times. This introduces a coupling between time steps that is expected to improve the robustness of the image registration algorithm against image noise. Although the TDFFD algorithm can be applied to any imaging modality, its clinical applicability is demonstrated here on synthetic and real 3D US sequences. For assessing the robustness of the algorithm with respect to the presence of noise, we generated ground truth data with a known motion field at several Contrast to Noise Ratios (CNRs) levels. The underlying clinical objective is to accurately estimate the impact of Cardiac Resynchronization Therapy (CRT) on 3D myocardial strain. CRT aims to restore synchrony of motion and strain in the four chambers. When looking specifically at the LV, CRT responders are expected to show at follow-up a higher peak strain value at the end of systole ) and an improved synchronization of the strain curves in the different LV segments (Delgado et al. (2008); Bertola et al. (2009)). Our algorithm was applied to a database of 13 CRT patients including both responders and non-responders before and after CRT, and at 6-and 12-months follow-ups, representing a total of 51 image sequences. For each of the 51 sequences, longitudinal strain curves were computed in different segments of the LV. The improvements in amplitude and synchrony were quantified for all patients and related to CRT outcome.
The remainder of this paper proceeds as follows. In Section 2, we describe the algorithmic framework implemented in this paper. We first detail the computation of a continuous spatiotemporal velocity field and how it generates trajectories in the 3D+t image domain. We then focus on the computation of the metric and its derivatives. Subsequently, we describe the computation of strain and briefly detail some implementation aspects. Section 3 is devoted to experiments on synthetic and clinical 3D US datasets, the latter including both volunteers and CRT patients. On synthetic images, we compare the TDFFD algorithm to the state-of-the-art pairwise FFD algorithm and to a 3D+t groupwise algorithm on the 3D+t displacement field, demonstrating an improvement in robustness in case of low CNRs. We then quantify the incidence of the main parameters of our algorithm on registration accuracy. On clinical datasets, we examine for healthy volunteers the uniformity of the strain pattern among different segments of the LV. On CRT patients, we quantify the correlation between strain curves and therapy outcome, as quantified by reverse remodeling. Finally Section 4 and 5 discuss the findings of this paper in regard to the literature and potential improvements for future work.

Registration framework and notations
In this paper, we denote a sequence of N 1 +N 2 +1 images as {I n }, with n ∈ N = {n ∈ Z, −N 1 ≤ n ≤ N 2 }. The image I 0 designates the image taken as reference. All images before the reference image are indexed by n = −N 1 . . . − 1 while all images after the reference are indexed by n = 1 . . . N 2 . Each image, I n , is defined on a spatial domain Ω ⊂ R d where d stands for the spatial dimension. Each image I n is associated to a time stamp t n ∈ T = [t −N 1 , t N 2 ], with t −N 1 , t N 2 ⊂ R.
A flowchart of the overall algorithmic registration framework is given in Fig. 1. The purpose of the registration algorithm is to solve for the diffeomorphic mapping ϕ 0 : Ω × T → R d that transports a material point x ∈ Ω from t = 0 to a target time t ∈ T . In this paper, a continuous velocity field is parameterized using a regular 4D grid and smooth spatiotemporal B-Spline kernels, thus ensuring spatial and temporal continuity. 6 This is a pre-print version Reference image time Figure 1: Overview of the algorithmic framework. Continuous spatiotemporal trajectories are computed from the 3D+t velocity field, parameterized by a 3D+t grid of control points with B-Spline kernels. Image similarity is computed from image intensities for the current set of trajectories. Quasi-incompressibility is added as a biophysical constraint for regularizing the velocity field.
Temporal continuity of the velocity field is not guaranteed when working with temporal diffeomorphic registration algorithms as described in Khan and Beg (2008) and in Durrleman et al. (2009). Recent extensions (Trouvé and Vialard (2010)) do ensure continuity when interpolating sparse trajectories defined on a set of landmarks by performing a variational perturbation on the temporal derivative of the momentum rather than on the velocity. This second-order approach supposes a significant increase in the computational cost. Since this paper addresses the problem of estimating dense trajectories from sequences of 3D images, the introduction of a continuous kernel in time was preferred to the introduction of higher orders.
The inclusion of the velocity in the registration framework adds one intermediate step in comparison with other spatiotemporal registration approaches that directly model the displacement field (Ledesma-Carbayo et al. (2005) and Chandrashekara et al. (2004)). The velocity field v : Ω × T → R d transports a given material point x from its original location at time t = 0 to its mapped location by a continuous trajectory, ϕ 0 (x, t), whose tangent vector at any time is given by v(ϕ 0 (x, t), t). The numerical solution of this transport equation, described in Section 2.3, is obtained by the discretization of the temporal domain T using a non-uniform sampling {t k }.
The velocity field, v, is obtained through an optimization procedure guided by a cost function composed of an image-driven term depending on image intensities and a regularization term representing prior information about the motion field. For the data-driven term, we used the Mean Squared Error (MSE) between the intensities of the reference frame and all other frames. For the regularization term, a constraint imposing incompressibility of the myocardial tissue was chosen. Both of them are described in Section 2.4.

A continuous velocity field in the spatiotemporal domain.
The temporal dimension is introduced into the diffeomorphic registration problem by relating the mapping ϕ 0 at any time t to a time-varying velocity field. In this paper, the velocity field is represented as a sum of spatiotemporal B-Spline kernels. The B-Spline velocity coefficients assigned to all the control points are concatenated in a vector of parameters p, the velocity being then denoted as v(x, t; p) and computed as 7 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ where x = (x, y, z), β(·) is a 1D cubic B-Spline kernel, q i,j,k,l = (q i , q j , q k , q l ) is a sparse grid of 3D+t control points, and ∆ = (∆ i , ∆ j , ∆ k , ∆ l ) is the spacing between control points in each dimension. Hence, the trajectory ϕ 0 that maps a material point of coordinates x at time t = 0, being the time associated to the reference frame, to time t n is obtained by adding to the initial position the integration of the velocity field from time t = 0 to time t = t n , i.e., Registration on 2D echocardiographic sequences may require the use of drift correction techniques due to out-of-plane motion. The clinical modality targeted in this work is 3D US, and taking into account that images are acquired under breath-hold in real time, no significant drift effect is expected. For this reason, drift correction was not included in the transformation model.

Trajectory computation.
The transport equation for computing ϕ 0 in Equation (2) is solved numerically using a forward Euler integration scheme in which the continuous integral is replaced by a discrete summation. The continuous time interval is now split into a collection of t k ∈ T values where the time increment between consecutive time-steps ∆t k is adapted to ensure invertibility as further described. By convention, we will use the subscript index k to refer to the discretization of the temporal dimension for solving the transport equation, whereas the subscript index n will refer to the temporal instants at which imaging data is available. The k-sample scheme is assumed to be denser than the temporal sampling of the image grid to ensure accurate estimation of the mapping ϕ 0 . For the simplicity of notation, we will assume that the n-sample scheme is a subset of the k-sample scheme.
Using this discretization, Equation (2) can be approximated by when n takes a positive value. When n takes a negative value, the same expression can be used but the upper limit of the sum has to be changed to n + 1. If we define x k 0 (p) = ϕ 0 (x, t k ; p) as the transport of the coordinate x from time t = 0 to t k and v k (p) = v(x k 0 (p), t k ; p), we can recursively write x k 0 (p) as follows: Equation (4) is still valid for a time t k < 0 giving a negative sign to ∆t k−1 . It can also be generalized to the transport from any frame in the sequence by taking an initial time different from 0. The integration of Equation (2) using the discrete approximation of Equation (4) requires to select a time-step sufficiently small for ensuring accurate computation and invertibility of the mapping ϕ 0 . In our method, we start off with a uniform sampling of the temporal domain T , arbitrarily chosen as half of the temporal spacing of the image sequence. To ensure invertibility at any time t n , one needs to consider the Jacobian of the mapping x n 0 with respect to the set of parameters p, here denoted as Dx n 0 (p) and computed from Equation (4) using where I stands for the identity matrix. This Jacobian must be positive definite everywhere to ensure invertibility of the transformation. A necessary condition to accomplish this goal is to have det(Dx n 0 (p)) > 0 for all x ∈ Ω. Computing the product over k of all det(∆ϕ k k−1 ) gives the determinant of the Jacobian matrix in Equation (5). When a negative value of det(∆ϕ k k−1 ) is detected, the value of ∆t k is reduced by a factor of 2 until enforcing a positive determinant. Note that the zero threshold could easily be substituted by a small positive value for not hindering the stability of the trajectories computation. 8 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ 2.4. Similarity metric and non-linear optimization.

Image similarity
Tissue and blood pool intensities are globally conserved over the cardiac cycle in 3D US images. Therefore, we use the Mean Squared Error (MSE) metric for capturing the optimal set of B-Spline velocity coefficients p of Equation (1). More complex similarity measures as described by Cohen and Dinstein (2002) have been proposed to take into account the statistical properties of the speckle noise. However, in the case of an independent additive gaussian noise, motion estimation based on a maximum likelihood approach leads to the minimization of the MSE metric.
The similarity metric is computed between an arbitrarily chosen reference frame I 0 in the sequence and all the consecutive time frames according to The framework presented in this paper can be generalized to other similarity metrics such as information theoretic measures as long as they are continuous and differentiable. The derivative will be a function of dx k 0 /dp whose computation is given in Section 2.4.3.

Quasi-incompressibility
Quasi-incompressibility is equivalent, in this framework, to imposing a zero divergence of the velocity field at every spatiotemporal location. Since the divergence is the trace of the Jacobian, the quasiincompressibility constraint can be formulated as where C n (p) is given by Here, Ω M stands for the subset of the Ω domain that delineates the myocardial wall in the reference image I 0 , i.e., the subdomain defined by all the material points in the myocardium undergoing the deformations we are interested in analyzing. In this paper, Ω M is obtained by segmenting the LV in the first frame of the sequence as described in Section 2.5. Since the quasi-incompressibility constraint is computed over a fixed domain in the reference frame, Ω M can be considered as constant when computing derivatives. The domain is propagated using the current estimate of the transformation, thus avoiding the definition of a region of interest at every frame.

Complete metric and derivation
As there is some controversy in the literature (see Section 1.4) on the fully incompressible nature of the myocardial tissue, we opted in this paper for enforcing quasi-incompressibility as soft constraint. This differs from previous approaches (see Mansi et al. (2010Mansi et al. ( , 2011 and Hinkle et al. (2009)), which project the velocity field at every iteration on a divergence-free velocity space.
In this paper, the incompressibility constraint of Equation (7) is added to the image similarity metric using a Lagrange multiplier λ ∈ R. The complete cost function is therefore The choice of an optimal λ is discussed in Section 3.1.4. Since the number of parameters characterizing the transformation is large, and the metric is explicitly differentiable, gradient-based optimization methods are well indicated for minimizing Equation (9). In this paper, the L-BFGS-B method of Byrd et al. (1995), 9 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ which searches the optimum based on the gradient and a low-rank approximation of the Hessian of the metric, was chosen. The total derivative of the MSE metric in Equation (6) with respect to the parameters p of the velocity kernels is equal to where DI n is the spatial gradient of the n th image in the sequence and dx k 0 dp can be obtained by differentiation of Equation (4): Hence, dx k 0 dp can be obtained from the following recursive equation: where the first term propagates the derivative dx k−1 0 dp computed at the previous time step, weighted by the volume change the transformation introduces at the previous time step (factor between brackets).
The derivative of C n (p) in Equation (8) with respect to one element of p is given by The detailed computation of d dp tr Dv n (p) is given in A.

Myocardial strain computation
The myocardial strain tensor is directly estimated using the spatial derivatives of the displacement field from Equation (5). In this paper, strain is computed in the space of coordinates of the first frame, which corresponds to the definition of Lagrangian strain. The strain tensor is then computed as Note that in the cardiac literature (D'hooge et al. (2000)), it is common to define strain as a linear change in length, whereas Equation (14) is quadratic with respect to the spatial gradient of the displacement field. Both expressions are equivalent under the assumption of small displacements, i.e. that both the displacements and the displacements gradient are small compared to unity. In the case of Lagrangian strain, as strain is computed with respect to a fixed reference (end of diastole), the assumption of small displacement is invalid. This justifies why we opted for Equation (14) rather than the linear formulation. However, it is important to keep in mind that the coexistence of the two strain definitions might be responsible for some discrepancies in the strain values reported in the literature. This tensor is projected along a specific direction h using h (x, t n ) = h T · (x, t n ) · h. The set of h directions considered here are the three vectors (radial, longitudinal, circumferential) of a local coordinate system related to the anatomy of the LV (D'hooge et al. (2000)). For computing this set of directions, the LV is segmented in the first frame of the sequence. According to Section 3.1.3, this corresponded to the optimal reference choice. This segmentation also defines Ω M in Equation (8) and is obtained from an active shape and appearance model whose construction and application to image segmentation is described in Butakoff et al. (2007). This algorithm was initialized by defining the aortic valve, the mitral valve and the apex on 10 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ the endocardial side. This provides semi-automated definition of the right ventricle insertion points. The longitudinal direction, l, is defined uniformly by drawing a line from the apex to the mitral valve. These two landmarks are directly stored on the mesh, which makes the long axis computation automatic and robust to any image orientation. The radial direction is obtained from the normal e to the mesh at each node and the l vector using r = e − (e · l) l .
The circumferential direction is then obtained by the cross product of longitudinal and radial directions The LV was divided using the 17-segment model as proposed by the American Heart Association (AHA, Cerqueira et al. (2002)). These 17 regions are directly defined on the surface mesh for computing regional average strain curves. In this paper, strain is computed in a Lagrangian space of coordinates, hence both local coordinate system and AHA segments need to be defined on the first image only.

Implementation
The TDFFD registration algorithm was implemented in C++ within the ITK 1 toolkit. The B-Spline velocity field computation was inspired from the B-Spline deformable transformation object and mainly differ on allowing different dimensions for the input point and the output velocity. The computation of the matrix product and summation in Equation (12) was implemented through sparse matrices. All experiments were run on a 24-node double quad-core Intel Xeon (2.66 GHz CPU, 16 GB RAM) Linux cluster for distributing the load of processing different input image sequences. Regarding the computation times, the FFD algorithm required 20 minutes to process the whole sequence. The TDFFD algorithm, due to its higher complexity in the computation of metric derivatives, took 3h40 min. Interestingly, because of a very slow convergence; the Ledesma-Carbayo et al.'s algorithm took 8h40 min and 10 hours (for the two initialization strategies described in Section 3.1.2). The convergence of the TDFFD algorithm was regular and the objective function value uniformly decreased over the iterations, which explains a shorter execution time.
The code is currently not parallelized and uses a single CPU. Conceptually, every sample from the reference image can be processed independently for computing its contribution to the metric and its derivative. Dividing the set of image samples among different threads is therefore expected to significantly speed-up the computation time.

Experiments
Experiments were performed first on a set of synthetic 3D US sequences to evaluate the accuracy and the robustness of the TDFFD algorithm with respect to known ground truth displacement. Then, the clinical applicability of our method was tested for two populations on clinical 3D US sequences: a population of 9 healthy volunteers and a population of 13 CRT patients including responders and non-responders acquired before and after CRT, as well as 6 and 12 months follow-up.

Registration accuracy on simulated ultrasound data
The evaluation on synthetic US images consisted first in comparing the accuracy of the TDFFD algorithm to two state-of-the-art algorithms. The first one is the pairwise FFD (Rueckert et al. (1999)). The second one is the 3D+t algorithm proposed by Ledesma-Carbayo et al. (2005), guaranteeing a temporally smooth displacement field. In this figure, as in the sequel of this manuscript, the time was normalized by the duration of one cardiac period and is referred to as normalized cardiac time. This comparison was performed both in terms of displacement and strain, considering several imaging noise levels. Then, an optimal set of parameters for the B-Spline grid resolution and the weighting of the incompressibility term were determined by optimizing the error between true and recovered displacements. Elen et al. (2008) simulated 3D US images of the deforming LV for a normal contraction. This dataset had an isotropic voxel size of 0.36 mm. Concerning the temporal resolution, 20 frames per cycle were generated, which is slightly higher than the average number of frames per cycle for the set of healthy volunteers used in Section 3.2.1. In the anatomical model employed to generate the simulated 3D US images, the LV was represented as a thick-walled ellipsoid with end-diastolic dimensions in the physiological range.

Experimental setup
Synthetic US data sets were generated from this mechanical model using a previously described methodology mimicking the whole image formation process (Gao et al. (2009)). In addition, a simplified kinematic model with an ejection fraction of 60 % over a cardiac cycle was developed to generate the ground truth displacement field. LV mechanics was based on a kinematic model as described in Arts et al. (1992). In this model, only transformations based on the changes of the cavity volume and the torsion were considered while the myocardium was assumed to be incompressible. The resulting synthetic displacement field is plotted in Fig. 3 and has an amplitude reaching 15.5 mm in the base at the end of systole. 12 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ This ground truth data is then used in this paper to evaluate the accuracy of the proposed algorithm and compare it to a pairwise registration strategy at different noise levels. Various CNRs were generated by multiplying the blood pool speckle acoustic impedance (i.e. the amplitude of the intensities) by a factor w. In this paper, three values were considered for w: 0.2, 0.5 and 0.7. An increasing w means an increasing noise power, thus a reduced CNR. Snapshots of axial and longitudinal slices for the three levels of noise considered here are plotted in Fig. 4. This set of values for w can be related to CNR values using the equation given in van Wijk and Thijssen (2002): where µ F and µ B are the average intensity values of the foreground and background, σ 2 B and σ 2 F designating the foreground and background intensity variances. Computing µ F , µ B , σ F and σ B for the three values of w listed above returns CNR values of 1.29, 0.60, 0.19, respectively. These values roughly cover the range used in van Wijk and Thijssen (2002) for nominal echo levels going from -6 to 3 dB.

Comparison between TDFFD and state-of-the-art algorithms
In this Section, we compare our TDFFD algorithm to the FFD (Rueckert et al. (1999)) and Ledesma-Carbayo et al. (2005) (LC) algorithms in terms of accuracy for both displacement and strain quantification. All algorithms considered in this section were compared without regularization metric, i.e. without the incompressibility constraint for the TDFFD. This choice was made to avoid biasing the analysis of the relative performance of the different methods under evaluation. 13 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ The horizontal axis is the normalized cardiac time (from 0 to 1). The error is measured in millimeters over the entire myocardium. Three levels of noise are considered (see Fig. 4): w = 0.2 (left), w = 0.5 (center) and w = 0.7 (right). First row: comparison between 3D FFD and 3D+t FFD on the displacement fields. FFD to first refers to applying FFD between the first frame and every frame of the sequence, LC refers to Ledesma-Carbayo et al. (2005) with zero initialization, LC 2 refers the same algorithm initialized by a set of pairwise transformations. Second row: comparison between 3D FFD and 3D+t FFD on velocities. FFD seq. refers to applying FFD sequentially, both for the metric and the transformation, FFD seq./first refers to a sequential representation of the transformation but defining the metric to the first frame.
Displacement accuracy versus noise level. Fig. 5 plots the median magnitude of the difference between the ground truth displacement field for all considered algorithms. Vertical bars show the dispersion on the entire myocardium of this displacement error values by plotting second and third quartiles limits. For all algorithms, the B-Spline grid had a resolution of three control points in the longitudinal direction and five in the two transverse directions. This resolution was the optimal initial resolution of the TDFFD algorithm for this set of images, as described in Section 3.1.3. Two configurations of the FFD algorithm were considered: computing the pairwise transformations between consecutive frames (sequential transformations) and computing them relative to the first frame (fixed reference transformations).
For sequential transformations, two variants for quantifying the image metric were considered. In the first, FFD seq. in Fig. 5, the metric was quantified between consecutive frames. In the second, FFD seq./first in Fig. 5, the metric was quantified with respect to the first frame. We used the mutual information metric as in the original version of the FFD algorithm (Rueckert et al. (1999)  the optimization process. Only the last transformation in the chain, between the previous and current frames, was optimized. This combination of sequential transformations but fixed reference is conceptually the closest choice to the TDFFD approach proposed in this paper.
In Ledesma-Carbayo et al. (2005), the transformation represents the displacement field to the first image as a smooth 3D+t B-Spline field. The sum of squared intensity differences between each image and the first image in the sequence was chosen as similarity measure. Two initialization strategies were compared for the LC method: initializing the transformations with null parameters (LC in Fig. 5) and initializing the 3D+t displacement field by a set of pairwise transformations relative to the first frame (LC 2 in Fig. 5).
Results are plotted in two rows for incrementally evaluating the impact of temporal continuity and temporal consistency as defined in Section 1.2. First, the advantage of using a continuous representation of the displacement field is evaluated by comparing FFD with fixed reference transformations with Ledesma-Carbayo et al. (2005). The purpose of this first comparison is to evaluate the advantage of going from a 3D to a 3D+t representation of the displacement field. Second, sequential FFD is compared to TDFFD, the purpose being to compare a 3D with the respective 3D+t representation of the velocity field. The main features of the different algorithms tested in this comparison are summarized in Table 2.
The top row in Fig. 5 shows that for the lowest amount of noise (w = 0.2), the LC algorithm and FFD perform similarly. In this case, there is no interest in initializing the 3D+t transformation by a set of pairwise transformations (LC 2) as the initialization by the identity transform (LC) provides a better accuracy. On the contrary, when working at higher noise levels, Ledesma-Carbayo et al.'s approach with pairwise initialization gave higher accuracy than the two other algorithms. At the intermediate noise level, the LC algorithm without initialization did not converge and produced higher median error values than FFD and the LC 2 algorithm .
The bottom row in Fig. 5 compares FFD with sequential transformations with the TDFFD algorithm. Applying FFD sequentially for the metric and the transformation resulted in the generation of a significant drift effect and overall divergence. TDFFD and FFD seq./first performed similarly at the lowest noise level. However, at the intermediate and highest noise levels, the TDFFD algorithm showed increased robustness, and produced smaller errors. The dispersion was also clearly reduced when using the TDFFD algorithm for all noise levels, as observable from Fig. 5. For w = 0.7, the upper limit of the third quartile goes from 6.5 mm using pairwise FFD and the LC algorithm to 3.23 mm using TDFFD.
At all levels of noise, the convergence of the LC algorithm was slower and less regular than TDFFD and all variants of FFD. One possible explanation is that the transformation model used in the LC approach does not exploit the high correlation between the displacements at consecutive times. Searching for optimal directions in the optimization space is therefore more challenging than for the sequential FFD and TDFFD algorithms that naturally represent the displacement at a time t + ∆t as a composition of the transformation at time t and an incremental contribution over the ∆t time interval. As previously mentioned, we used mutual information as similarity metric for the FFD algorithm (Rueckert et al. (1999)). While this has the potential of introducing some bias in the comparison between the different approaches, one can observe in the top row of Fig. 5 that FFD did not show worse performance than LC, despite of using a more generic 15 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ metric. This suggests that the choice of mutual information does not alter significantly FFD accuracy in this application.
It can be observed from Fig. 5 that all methods compared in this section produce error curves following cardiac cycle phases since they increase over systole and reach a peak of maximum error at the end-systolic phase, being reduced over diastole. This indicates that for the three methods, the error increases when estimating larger displacements. Myocardial strain accuracy versus noise level. The impact of imaging noise level on radial, longitudinal and circumferential strain curves was evaluated using the same synthetic datasets for the three algorithms (FFD, LC and TDFFD). For each noise level, the configuration of FFD and the LC algorithm giving the best displacement accuracy was taken for comparing the resulting strain outputs. Fig. 6 shows on the top row, the ground truth strain curves. Vertical bars show the dispersion on the entire myocardium of the strain values by plotting second and third quartiles limits. This dispersion models transmural changes from endocardium to epicardium. It can be neglected for longitudinal strain (inferior to 3 %) but becomes substantial (up to 10 %) for radial and circumferential components. The remainder rows show the strain curves as recovered by the three algorithms for the settings described in Section 3.1.1.
At a low level of noise (w = 0.2), TDFFD showed the largest accuracy and smallest dispersion for quantifying longitudinal strain. All three methods underestimated the peak value of radial strain. FFD slightly shifted in time the peak of radial strain. For circumferential strain, FFD and TDFFD performed better (lower dispersion) than the LC algorithm. At intermediate and high noise levels, TDFFD qualitatively preserved better the pattern of the strain curves than the two other algorithms, but at the expense of an underestimation of the peak values and an increased dispersion of strain values. In all three algorithms, the longitudinal strain was the most sensitive to the change in CNR. FFD registration failed to recover meaningful radial and longitudinal strain curves at w = 0.5, and at w = 0.7.
The FFD strain curves at w = 0.5 and w = 0.7 showed a more pronounced drift effect (the curves do not come back to zero) in comparison with TDFFD and LC algorithm at similar noise levels. This seems to indicate that spatiotemporal registration techniques are more robust to drift than pairwise algorithms (Elen et al. (2008)). As incompressibility was not used in this experiment, this drift robustness may come from temporal consistency.
In order to quantify the statistical agreement between the different registration algorithms and the ground truth, we performed a set of Wilcoxon rank sum tests for equality of medians at the lowest imaging noise level (w=0.2). These tests were performed on radial, circumferential and longitudinal strain values averaged per AHA segment at the end of systole. All obtained p-values are given in Fig. 7 together with a box plot giving the quartiles of the distributions under comparison. For all algorithms, longitudinal strain returned the lowest p-values. This mainly stems from the high dispersion of the estimated strain values compared to the ground truth. Regarding radial strain, LC and TDFFD gave similar performance for quantifying radial strain at end of systole. FFD in turn underestimated radial strain. Finally for circumferential strain, LC returned incorrect strain patterns in some regions with positive values. This resulted in a high dispersion compared to the ground truth and a lower p-value than FDD and TDFFD. Displacement accuracy versus time decimation. In addition to the deterioration of spatial CNR, examined in the former experiment, another potential source of image deterioration is the loss of temporal resolution. To evaluate the robustness of all compared algorithms to time resolution of the input data, the synthetic sequence at w = 0.2 was undersampled by using one out of every two frames. We then compared the performance of each algorithm on the undersampled data (red curves in Fig. 8) to two alternative situations: running the same algorithm on the full dataset using 1) one control point per time frame (blue curves) and 2) using one temporal control point every two time frames (green curves). Fig. 8 shows error curves in the same format as in Fig. 5 for the three algorithms. In the case of the TDFFD algorithm, the peak systolic error showed a slight increase (from 1.58 to 1.85 mm) while the average diastolic error went from 0.4 to 0.6 mm. In the case of the LC algorithm, peak systolic error increased from 2.29 to 4 mm. The diastolic error increased in a much higher proportion than for TDFFD (more than a factor 2). For FFD, systolic error was practically unchanged after decimation but the pairwise algorithm failed 16 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ truth data to evaluate the accuracy of the proposed algorithm and compare it to a pairwise registration strategy at di erent noise levels. Various signal to noise ratios were generated by adjusting intensities inside and outside the myocardial wall using a weight w (w = 0 .2, w = 0 .5 and w = 0 .7 in this paper). Fig. 1 shows the median magnitude and dispersion of the di erence between the ground truth displacement field and the ones given by to properly recover diastolic motion at the lower temporal resolution. This can be observed in Fig. 8(c), showing a significant increase of error dispersion in the second half of the heart cycle.

Optimization of TDFFD parameters
In this section, we quantify the influence of several parameters of our algorithm on the displacement accuracy. Median displacement values and their dispersion are plotted as in Section 3.1.

17
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/  Figure 8: Robustness of the 3D displacement quantification to temporal decimation of the input sequence, as measured by the proposed TDFFD, LC and FFD algorithms. Vertical bars show the dispersion of the error values over the entire myocardium by plotting second and third quartiles limits. The horizontal axis is the normalized cardiac time (from 0 to 1). The error is measured in millimeters over the entire myocardium. This experiment was performed for w = 0.2. For FFD and LC algorithms, we retained the configuration that gave the lowest error from Figure 5. The red curves shows the error magnitude after decimation. The error at full temporal resolution is plotted in blue. For LC and TDFFD algorithms, the error obtained by running the registration using a grid at half temporal resolution on the full dataset is also shown (green curve).
Influence of initial resolution of control points. The purpose of this experiment was to determine the optimal setting for the initial resolution of the 3D+t B-Spline grid. For ensuring convergence of the global optimization procedure, it is important to have an optimal setting of the number of control points in the 3D+t B-Spline grid to avoid solving the optimization problem in an unnecessary low or high dimensional space. In the synthetic dataset described in Section 3.1.1, X and Y axis define the short axis planes. Because of the symmetry of the geometry, the motion is expected to have the same magnitude in X and Y . Therefore, the same number of control points was used in these two directions. Fig. 9(a) shows the displacement accuracy when spanning from 3 to 5 control points in the short and long axis directions. The number of control points in the temporal dimension was fixed to the number of frames in the input sequence. From these curves, it 18 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ can be observed that 5 control points in the short axis direction and 3 control points along the long axis directions returned the lowest error over the whole myocardium.
To analyze the temporal resolution of the TDFFD grid, we varied the temporal resolution by decreasing it from the total number of frames (20 in this case) to 18, 16 14, 12, 10 and 5 control points. The resulting accuracy curves on the displacement errors are plotted in Fig. 9(b). This confirms that an accurate quantification of the motion pattern requires an elevated number of control points. Since taking more control points than the number of frames would result in oversampling the velocity field with respect to the temporal resolution of the input data, we kept the number of control points to its maximum, i.e. the number of frames, in all experiments.
Influence of reference choice. In this experiment, the reference image in the sequence was changed from the first to the last image to assess its influence on the registration results. This corresponds to adjusting the position of the image indexed with 0 in the similarity metric computation with Equation (6). Fig. 9(c) shows the averaged displacement accuracy for every reference choice. Taking the first image in the sequence as reference image (dashed red curve in Fig. 9) provides the lowest error considering all time frames. By symmetry, an equivalent accuracy is obtained when taking the last image of the sequence as reference. Obtaining a similar accuracy when taking the first or the last image as reference suggests that errors introduced by the backward integration of the velocity in time do not significantly alter the accuracy of the estimated trajectories.

Evaluation of the quasi-incompressibility constraint
The influence of the incompressibility weight λ (Equation (9)) on the displacement accuracy was quantified at two resolutions of the B-Spline grid. It is expected that the impact of the incompressibility constraint should be higher when the grid resolution increases. Indeed, the higher the dimensionality of the optimization space, the higher the impact of the regularization term.
The value of λ was first modified in a range from 0 to 2000 for a grid of 5×5×3 control points (Resolution 1). This grid size was the optimal grid resolution obtained from the previous experiment. Fig. 10(a) plots the displacement accuracy for every λ value. Choosing λ = 1250 offers a good compromise between reducing systolic and diastolic errors. Indeed, the corresponding curve is the lowest during systole and within the low range of other curves during diastole. The absence of incompressibility constraint gives the worst result in terms of accuracy. During diastole, λ ∈ [500, 1500] gives 30 % less error than λ = 0.

19
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ The impact of extending the grid resolution by a factor 2 in all dimensions was then studied. At the refined resolution, it is expected that the impact of the incompressibility would be stronger. We therefore extended the range of incompressibility weights to allow for higher values than the initial resolution. Fig. 10(b) shows the displacement accuracy obtained at this second resolution, namely using a grid of 10×10×6 control points (Resolution 2). If we assume that 10 % of the total volume is occupied by myocardial tissue, as it is the case in the synthetic dataset used here, then 60 of the 600 cells covered by the control points grid correspond to the myocardium. Since we aim at quantifying strain in each of the 17 segments, it means that each AHA segment is divided in more than 3 cells, corresponding to 16 control points. This can be considered a good trade-off between regularization of the transformation model and required accuracy since for clinical applications, strain will be locally averaged within each segment. At this second resolution, incompressibility was found to have more impact in diastole than in systole. Working without the incompressibility constraint led to worse accuracy at the fine than at the coarse resolution. Giving a weight of 5000 provided a good promise between the reduction of the diastolic error without significantly compromising the end-systolic accuracy.
To verify that the incompressibility constraint is satisfied for the synthetic dataset, we plotted the volume curve (relatively to end of diastole), averaging local volume change on the entire myocardium geometry. The resulting curves at the two consecutive resolutions are given in Fig. 10(c). Averages over time are given for each curve in the caption. These numbers fit in the 5 to 10 % range, in accordance to the volume changes reported by Westerhof et al. (2006); Yin et al. (1996). For further evaluating the effect of the incompressibility constraint, we measured the local volume change of the LV. Fig. 10(d) shows the local volume ratio between end-systole and end-diastole for different incompressibility weights λ at Resolution 2. Light colors indicate small or no volume changes while dark colors correspond to a volume increase (red) or decrease (blue). For incompressible tissue, this value would be equal to 1 on the entire LV. It can be observed how dark color regions fade away when giving more weight to the incompressibility constraint. For instance, dark blue areas (indicating a local compression rate of 50%) are attenuated when introducing the incompressibility constraint.

20
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/

21
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/

Experiments on clinical data
Experiments were performed on two different sets of clinical 3D US sequences. Myocardial strain was first quantified for healthy subjects whose strain pattern and amplitude are known from clinical literature. Then, the TDFFD algorithm was applied to 3D US sequences of CRT candidates and the obtained strain curves were analyzed in conjunction with the clinical outcome of the therapy.

Data acquisition and database
We acquired 3D US sequences in an apical view for two populations, using a GE Healthcare Vivid 7 (Milwaukee, WI, USA) ultrasound system. The first population was composed of 9 healthy volunteers (31 ± 6 years old), and the second population was composed of 13 patients selected as CRT candidates (61 ± 8 years old). All 13 patients underwent CRT at a single tertiary care university Hospital (Hospital Clínic i Provincial de Barcelona (HCPB)), based on current European Society of Cardiology clinical guidelines (a recent update of these guidelines was recently published in Dickstein et al. (2010)): left ventricular ejection fraction (LVEF) < 35%, QRS duration > 120ms, and NYHA classification III-IV or NYHA II who covered less than 500 meters in the 6 minutes walking test.
The average number of images per cardiac cycle was 18 ± 3 for the healthy subjects and 17 ± 3 for CRT candidates. The pixel size was on average of 0.9 × 0.6 × 0.9 mm 3 for the healthy volunteers and 1.0 × 0.7 × 1.0 mm 3 for the CRT candidates.
Among these patients, there were 7 responders and 6 non-responders according to echocardiographic response. Echocardiographic response quantifies the amount of reverse remodeling by computing the changes in LV volume at the end of systole (LVESV) between baseline and follow-up. Using this criterion, a patient is classified as responder in case of reducing more than 15 % the LVESV, conditioned to patient survival (Yu et al. (2005)). All 3D echocardiographic data were acquired at three different stages: before CRT (OFF), 24-72 hours after implantation (ON), at 6 months (M6), and at 12 months (M12) follow-up. No data was available in the hospital record at M12 for Patient # 11. This patient was admitted to intensive care one week prior to the M6 follow-up and was a non-responder to CRT at M6.
All CRT patients had dilated geometry before implantation, thus the LV did not entirely fit in the field of view. The image acquisition protocol then gave priority to the full inclusion of the septum at the center of the 3D US sector. Therefore, for these patients, myocardial strain in the inferior segments was excluded from the analysis because of their partial inclusion or absence of the field of view.
The exact definition of CRT response is known to be difficult since the application of a single threshold cannot properly render the different grades of response. The reader is invited to refer to Bleeker et al. (2006) for the different definitions of CRT response and their discrepancies in a large population of patients. Because of the complexity of defining CRT response, we distinguished between the following categories: large responder for a LVESV reduction higher than 30 %; moderate responder for a reduction between 15 and 30 %; moderate non-responder for a reduction between 10 and 15 %; and large non-responder when either no reduction was observed or the reduction was below 10 %. The LVESV reduction together with the classification between large/moderate responder/non-responder is given in Table 3. Improvements in both the magnitude and the synchronicity of myocardial strain among the different LV segments are expected for CRT responders (Delgado et al. (2008); Bertola et al. (2009)). This is illustrated for one responder (Patient # 1) in Fig. 11, showing a color map of longitudinal strain at the end of systole before and after CRT. In this patient, the average longitudinal peak strain goes from −6 to −11 %. Before CRT, areas of longitudinal stretching in the lateral wall (in red) coincide with shortening areas (in blue) at end of systole. At follow-up, the spatial distribution of strain becomes uniform over all the LV. Fig. 12 shows the recovered longitudinal strain curves for the database of 9 healthy subjects at mid and basal segments of the AHA 12 segments (not counting apical segments). The segments either not totally included in the field of view of the 3D US images or suffering from image artifacts (low visibility of the lateral wall, reflections of surrounding anatomical structures, lower spatial resolution on the lateral sides of the US sector), were excluded from the analysis after visual inspection of the images by clinical experts. On 22 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.   average, 78 % of the segments over the 9 volunteers were conserved. TDFFD parameters were set according to the optimal values found for the synthetic dataset from Section 3.1.3. For incompressibility, the optimal λ value found for the synthetic dataset was rescaled according to the initial value of the MSE metric. This ensures that the relative weight of the incompressibility constraint is conserved, despite linear changes of intensity between real and synthetic images. The recovered strain curves showed a similar pattern in all volunteers, with values in good agreement with clinical literature (Edvardsen et al. (2002)). Typical phases of diastole such as the isovolumetric relaxation and the atrial contraction periods (resulting in an acceleration of myocardial strain at the end 23 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ of the diastole, see red arrows in Fig. 12) periods can be distinguished in cases with higher image quality or improved temporal accuracy such as Volunteer #4, #7 and #8. The observed peak systolic longitudinal strain was, on average, of 16.3 %, with a standard deviation of 4.7 %. Edvardsen et al. (2002) (see Table  1 in that reference) reported an average of −17.5 % and a standard deviation of 4 % from tagged MRI images, when considering all but apical segments. If we assume that the values reported by Edvardsen et al. (2002) are the true average and standard deviation, one can compute a normalized Z ∼ N (0, 1) statistic by subtracting the true mean to the observed average and dividing by the true standard deviation. In this case, we obtained a p value = p(|Z| > |z 0 |) = 0.76. The average value obtained in this paper is also in agreement with the study by Saito et al. (2009) reporting a mean longitudinal strain of −17.4 ± 5.0 % (yielding a p value = 0.83) by speckle tracking on 3D US sequences. The averaged volume change was computed in a similar way as in Section 3.1.4. The average volume change was computed relatively to end of systole over the whole myocardium and is plotted in Fig. 13. It clearly appears that the total volume change exceeds the expected 5 to 10 % of volume change as reported in the literature (Westerhof et al. (2006); Yin et al. (1996)). While volunteers #1, #2, #3 and #5 show an average volume change close to 10 % range, the other four volunteers exhibit an average volume variation in the range of 14-18%. This point is further commented in Section 4.

Strain quantification in CRT candidates before and after implantation
Longitudinal strain curves per segment are reported in Fig. 14 for large echocardiographic responders and in Fig. 15 for large non-responders at OFF, ON, M6 and M12, respectively.
On the patient datasets, TDFFD parameters were set according to the optimal values found for the synthetic dataset from Section 3.1.3, except for the incompressibility weight that was left to 0. This choice was made to avoid local convergence problems close to the border of the imaging domain when computing the incompressibility constraint. This tends to happen more frequently in patients with dilated LV than in volunteers. Moderate responders and non-responders are plotted in Fig. 16 and Fig. 17, respectively. For each patient and each time point, the minimum value of the mean strain and the normalized time corresponding to this minimum are indicated (dashed gray lines). This average value roughly describes the global contraction in the longitudinal direction over the entire LV. Fig. 14 and 15 shows that this value increases at follow-up in all large responders while it stays constant in large non-responders. Fig. 18 plots the same information using a 3D glyph visualization for Patients #8 and #11, using the same AHA segments as in Figs. 14 and 15. Each ellipsoidal glyph represents the eigen values and eigen vectors of the Cauchy-Green (CG) tensor defined as D(x n 0 ) T Dx n 0 . Physically, the CG tensor gives the square of the stretching part of the transformation as the rotational component is compensated when being multiplied by its transposed. A zero deformation corresponds to a CG tensor equal to the identity and a sphere in the glyph representation. The colormap used in Fig. 18 represents longitudinal strain. Negative longitudinal strain is plotted in green, indicating compression, while positive longitudinal strain, indicating expansion, is plotted in red. Patient #8 is a large responder, for which we can observe at follow-up both the reduction of LVESV and a more uniform distribution of longitudinal strain with negative values, compared to baseline. Darker green color also indicates higher contraction at follow-up, reflecting improvements in the cardiac function. Negative and positive values for longitudinal strain coexist at both baseline and follow-up for Patient #11, who is a large non-responder.
Qualitative agreement is observed between the evolution of myocardial strain curves and clinical outcome as quantified by echocardiographic response. As a global trend, it can be observed that responders show an increase between baseline and follow-up both in the amplitude and synchronicity of strain curves among different LV regions. Quantitative measurement of the synchronicity of strain was computed using a covariance index (CI) based on the approach proposed in Silva et al. (2009). The CI between two regional strain curves 1 and 2 is computed as where t −N1 and t N2 represent the limits of the time domain of the input sequence as defined in Section 2.1. A high value of the CI means highly correlated strain curves, while negative or zero value indicates either 25 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ low strain amplitude or that the two strain curves have opposite signs. The CI is sensitive to an overall improvement of strain amplitudes in all segments and to the synchronization of the strain curves among the different segments, which are the two parameters one is interested to quantify after CRT. All possible combinations of segments were considered to generate a set of CI for each input image sequence. The median value and dispersion of the CI are descriptors of the average strain amplitude and homogeneity over all considered segments. The CI was measured at OFF and at M6 and M12. The resulting values are shown using a box plot next to the corresponding strain curves in Fig. 19. The null hypothesis of equal median was computed using the Wilcoxon rank sum test. This non-parametric test was chosen because the assumption of normal distribution was not valid for our population. The test was performed between OFF and M6 and between OFF and M12. Rejections of the null hypothesis, indicating a shift in CI at the 95 % significance level, are indicated in Fig. 19 by an horizontal line between either OFF and M6, either OFF and M12. Interestingly, all responders showed a significant change in median CI at M6 or M12. This corroborates with the fact that all these patients showed reverse remodeling at follow-up, which is a strong indicator of response. Reverse remodeling was quantified using the volumes given in Table 3. In the group of non-responders, no significant shift in CI is observed for all large non-responders (namely Patients #2, #6 and #11). This is in agreement with the fact that these patients did not show any reverse remodeling (LVESV reduction of approximately 2 % for Patients #2 and #6 and a doubling in LVESV at M6 for Patient # 11). The only significant change was detected for Patient #9, who is a moderate non-responder. Hence, CI discriminated in our population large non-responders from responders.

26
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/

Discussion
In this paper, we presented a diffeomorphic registration algorithm that enforces temporal consistency in the recovered displacement field by imposing the temporal continuity of the underlying velocity field. All speckle tracking or registration-based approaches currently in the literature either do not exploit the temporal consistency of myocardial motion over time and treat every frame in a sequential manner (Wang et al. (2010); Kawagishi (2008)), or do not guarantee the transformation to be diffeomorphic (Ledesma-Carbayo et al. (2005)). By enforcing temporal consistency, our algorithm does not require any drift correction as usually applied in sequential approaches (Elen et al. (2008)). With respect to other temporal diffeomorphic registration methods, our approach ensures recovering a velocity field continuous in time, unlike piecewise geodesic solutions given in Khan and Beg (2008); Durrleman et al. (2009). Temporal continuity of the velocity is obtained through regularization by a spatiotemporal kernel while current approaches regularized on the spatial domain only. This avoids solving differential equations at higher orders as proposed in Trouvé and Vialard (2010), which would result in a prohibitive computational cost when applied to dense 3D images.
The image similarity metric referenced in this work uses a fixed reference. Possible extensions to the metric in Equation (6) could consider other frame combinations. For example, taking into account the similarity metric between consecutive frames, or all possible pairwise combinations of frames. The first option can be highly sensitive to drift but prone to capture the motion of speckle patterns that only remain constant for a limited number of frames. The second has the advantage of taking the best of sequential and non-sequential approaches, but is associated to a higher computational cost.
Although it is not the purpose of this paper, we believe that the algorithm we propose has a clear potential for CRT application, namely for improving patient selection. Its improved robustness and accuracy compared to other methods makes it an excellent approach to improve the quantification of mechanical dyssynchrony, currently more based on myocardial velocity timings or in 2D analysis (Fornwalt et al. (2009)). The potential of analyzing the real 3D nature of myocardial mechanics provides an excellent tool to understand and recognize patterns of dyssynchrony that could be eventually corrected with CRT. The link 27 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/  between mechanical dyssynchrony and response to CRT is still a controversial issue (Delgado and Bax (2011); Sung and Foster (2011)), particularly limited if a single parameter (measured strain) is considered in the analysis. Comparing the measured strain to the deformation patterns of specific mechanisms involved in CRT response may be a more relevant strategy (Parsai et al. (2009)) and this could be achieved with the use of 3D strain analysis by using such a tool as the one we present. Experiments were performed on a set of 13 patients with 4 acquisition times each (OFF, ON, M6 and M12), representing a total of 51 sequences. Results showed that strain curves obtained from the TDFFD algorithm were in accordance with CRT outcome at M6 and M12. This was quantified using the CI computed from the longitudinal strain curves recovered by the TDFFD algorithm. In all responders, significant differences between CI at OFF and follow-up were found. In the group of non-responders, the only patient showing a significant CI difference at follow-up was Patient # 9. This patient was classified as moderate non-responder. Although this patient does improve strain-wise, its LVESV reduction is insufficient (11 %) for being classified as responder.
All clinical images used in our experiments were acquired over the last 3 years at HCPB. Data acquisition was not optimized for 3D motion and strain assessment but was representative of a standard, routinely available, image quality in terms of spatiotemporal resolution and SNR. As previously commented, all CRT responders showed an improvement in strain between OFF and follow-up. Improved strain synchronicity between OFF and ON was qualitatively observable in Patients #4, #5, #10 and #13. However, this observation could not be generalized to all responders. This probably indicates that such a differentiation is too challenging for the quality and resolution of datasets currently available in clinical routine. Additionally, it is well established that time-course response to CRT may vary among patients: some of them are acute 29 This is a pre-print version  Figure 19: Ranges of the covariance index for all the patient population. Average differences between OFF and M6 or OFF and M12 are highlighted when significant at the 5% level.

30
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ responders while others do show a later response ). Thus, we would expect the highest amplitude and best synchronicity at M12, as it is the case for Patients #3, #5, #7, #8, #9, #12 and #13. However, image quality at M6 can be higher than the M12 one, and may temper these conclusions. The incidence of the different registration parameters on the myocardial motion and strain accuracy were quantified in Section 3.1.3. On the synthetic dataset, incompressibility had a reduced influence on the registration accuracy in comparison with the choice of the reference frame or the initial resolution of the control points grid. One possible reason is that, on this dataset, signal to noise ratio was similar in radial, circumferential and longitudinal directions. As radial thickening is clearly observable from the input images, radial strain is correctly estimated and myocardial volume is relatively constant over time (see Fig. 13). On healthy volunteers, volume change was found to exceed 11% in 4 volunteers over 9. The reason for this is the underestimation of radial strain by our method. This indicates that the soft constraint scheme used in this paper was insufficient to efficiently impose incompressibility over the myocardium volume. Nonetheless, imposing incompressibility as a hard constraint raises several questions. First, Yin et al. (1996) provided experimental evidence that compliance to the incompressibility constraint is changing over the cardiac cycle. Second, some parts of the myocardial tissue, such as papillary muscles are known to be highly compressible. Third, incompressibility has many local optima since any constant velocity field is incompressible. For all these reasons, we made the choice of imposing incompressibility as a weak constraint. Although a hard constraint would be more accurate, it requires to know at which spatial and temporal locations incompressibility is fully reliable.
As visible from Fig. 9, the end-systolic frame was the poorest choice of reference. This can be interpreted in two ways. First, this frame corresponds to the shortest path on average when computing trajectories over time. It appears from Equation (11) that any control point propagates derivatives with respect to p to posterior time points using the first term of this equation. The longer the temporal path, the stronger this propagation mechanism. Accuracy was increased when taking the first frame as a reference, which suggests that the higher the coupling between time points, the better with respect to the overall accuracy. Second, the end-systolic frame is the most dissimilar to the other frames and therefore represents a poorest reference choice for the metric computation. Hence, a reference that is closer to the average of the sequence, as the end-diastolic frame, gives higher accuracy.
One limitation of using a smooth temporal kernel is to introduce temporal smoothing in the recovered myocardial motion and strain curves. Our current implementation uses a fixed size kernel that introduces the same amount of smoothing in the entire cardiac cycle. One improvement of the proposed method would consist in tuning the width of the temporal kernel when faster changes in myocardial velocity are expected to occur (e.g. end of systole). Another limitation of this study is to focus the clinical observations only on the longitudinal component of strain. This choice was made because all clinical 3D image datasets used in this study were acquired from an apical 4 chambers view. In such a setting, CNR is better in the longitudinal direction, and longitudinal deformation can be more accurately estimated than radial deformation.

Conclusion
We have described a new temporal diffeomorphic registration method called TDFFD and applied it to recover motion and strain from 3D echocardiography images. By introducing in the velocity computation a continuous kernel over time, we enforce temporal consistency in the recovered displacements. This can be particularly useful in image sequences with substantial amount of noise and artifacts. The TDFFD algorithm was applied to the quantification of strain in 3D US using synthetic datasets, and clinical images from healthy subjects and patients undergoing CRT.
Experiments on synthetic 3D US datasets showed an improved robustness and accuracy in the quantification of motion and strain for low values of CNR compared to a classical pairwise approach and one of its extension to recover 3D+t displacement fields. On healthy volunteers, the method provided physiologically meaningful longitudinal strain curves with small dispersion among LV segments. On CRT patients, the quantification performed before and after CRT, including follow-ups at 6 and 12 months showed agreement between the quantified myocardial strain curves and clinical outcome. This agreement was quantified using 31 This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ covariance indexes of average and segmental strain curves. Future extension of the TDFFD algorithm will focus on incorporating similarity metrics adapted to statistical characteristics of US speckle noise (Myronenko et al. (2009)) and its extension to incorporate compounding strategies to improve the limited field-of-view in 3D US sequences of heart failure patients with dilated LV similar to Piella et al. (2011).

A. Derivation of the incompressibility constraint
In this appendix, we detail the computation of d dp i,j,k,l tr Dv n (p) required to compute the derivative of the incompressibility constraint from Equation (13).
If x n 0 (p) = (x, y, z) and x i = x−q i ∆i , y j = y−q j ∆j , z k = z−q j ∆ k , and t l = t−q l ∆ l , the trace of the Jacobian in Equation (13) can be written as tr Dv n (p) = i,j,k,l β ( x i )β( y j )β( z k )β( t l ) p i,j,k,l where β designates the derivative of the 1D kernel defined in Equation (1). The derivative of Equation (19) to compute the derivative of Equation (13) is d dp i,j,k,l tr Dv n (p) =

33
This is a pre-print version The final version can be downloaded from http://www.sciencedirect.com/ Velocity field at spatiotemporal location x, t parameterized by p x k 0 (p) := ϕ0(x, t k , p) Trajectory of a point x from time t = 0 to time t k q i,j,k,l := (q i , q j , q k , q l ) Grid of control points D Jacobian operator on spatial dimensions Mid anterior 8 Mid anteroseptal 9 Mid inferoseptal 10 Mid inferior 11 Mid inferolateral 12 Mid anterolateral