An enhanced uncertainty principle for the Vaserstein distance

We improve some recent results of Sagiv and Steinerberger that quantify the following uncertainty principle: for a function $f$ with mean zero, either the size of the zero set of the function or the cost of transporting the mass of the positive part of $f$ to its negative part must be big. We also provide a sharp upper estimate of the cost of transporting the positive part of an eigenfunction of the Laplacian to its negative part. This proves a conjecture of Steinerberger and provides a lower bound for the size of the nodal set of the eigenfunction.


Introduction
For a continuous function with mean zero, the Vaserstein distance between the measures corresponding to the positive and the negative parts of the function indicates how oscillatory the function is. If this Vaserstein distance is small then the work required to move the positive mass to the negative mass is small and so we expect the positive and the negative parts of the function to be close together. Consequently, we would expect the function to oscillate significantly.
Our main result is an improvement of an uncertainty principle due to Sagiv and Steinerberger [11] showing that the zero set of a mean-zero continuous function and the Vaserstein distance between the positive and negative parts of the function cannot both be small at the same time. We prove this result for a function defined in the unit cube of $\mathbb{R}^d$; it extends to functions defined on a smooth, compact Riemannian manifold $M$ of dimension $d$.
Finally, we obtain an upper estimate for this Vaserstein distance in the case of high-frequency eigenfunctions of the Laplacian in $M$. By the previous uncertainty principle, this indicates that the nodal sets of these eigenfunctions should be large.
Date: October 9, 2020. The last two authors have been partially supported by the Generalitat de Catalunya (grant 2017 SGR 359) and the Spanish Ministerio de Ciencia, Innovación y Universidades (project MTM2017-83499-P).

A continuous function $f$ on the unit cube $Q = [0,1]^d$ in $\mathbb{R}^d$ that has zero mean is decomposed into its positive part $f^+ = \max\{f, 0\}$ and its negative part $f^- = \max\{-f, 0\}$. The interface between the supports of these two functions is the zero set
$$Z(f) = \{x \in Q : f(x) = 0\}.$$
Thinking of $f^+$ as earth that is to be moved and of $-f^-$ as holes that need to be filled, the earth-moving work that is required to fill the holes is the Vaserstein distance between the measures with densities $f^+$ and $f^-$. As mentioned earlier, if the earth mover's distance is small then any earth to be moved ($f^+$) must be close to a hole that needs to be filled ($f^-$), and so the interface between the two must be large. This is the intuition behind the following quantitative result of Steinerberger [12, Theorem 2] in dimension 2. With a minor abuse of notation, we write $W_1(f^+, f^-)$ for the Vaserstein distance between the measures on $Q$ with densities $f^+$ and $f^-$, respectively, relative to Lebesgue measure. We write $\mathcal{H}^{d-1}(Z(f))$ for the $(d-1)$-dimensional Hausdorff measure of the zero set of $f$. Then, in dimension $d = 2$,
$$W_1(f^+, f^-)\,\mathcal{H}^{1}(Z(f)) \ge C\,\frac{\|f\|_{L^1(Q)}^2}{\|f\|_{L^\infty(Q)}}, \qquad (1)$$
where $C > 0$ is an absolute constant. The Vaserstein distance between probability measures $\mu$ and $\nu$ on $Q$ is defined by
$$W_1(\mu,\nu) = \inf_\rho \int_{Q\times Q} |x-y|\,d\rho(x,y), \qquad (2)$$
where the infimum is over all admissible transport plans, that is, over all probability measures $\rho$ on $Q\times Q$ with marginals $\mu$ and $\nu$. Such probability measures $\rho$ are also referred to as couplings of $\mu$ and $\nu$. The monograph Optimal Transport, Old and New by Cédric Villani [15] has become a classic reference on optimal transport and includes a detailed exposition of the Vaserstein distance, also known as the 'earth-mover's distance'. The $p$-Vaserstein distance $W_p(\mu,\nu)$ is defined similarly, replacing $|x-y|$ by $|x-y|^p$ in (2) and taking the $p$-th root of the infimum. The 1-Vaserstein distance has at least two advantages.
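One is that it has an equivalent Monge-Kantorovich dual formulation as
$$W_1(\mu,\nu) = \sup\left\{\int_Q \phi\,d\mu - \int_Q \phi\,d\nu \;:\; \|\phi\|_{\mathrm{Lip}} \le 1\right\}. \qquad (3)$$
Here $\|\phi\|_{\mathrm{Lip}} = \sup_{x\neq y} |\phi(x)-\phi(y)|/|x-y|$ is the Lipschitz constant of $\phi$. The other, and more important, advantage is that the definition does not change if in (2) $d\rho$ is replaced by $d|\rho|$ and $\rho$ is allowed to be a signed measure (a signed transport plan) on $Q\times Q$ with marginals $\mu$ and $\nu$ (see [5]). This extra freedom allows us to construct transport plans that lead to better estimates, specifically in the course of proving Theorem 3.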
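As a sanity check of definition (2) in the simplest setting, here is a minimal sketch in dimension one (an assumption: the paper works on the cube $Q \subset \mathbb{R}^d$ with $d \ge 2$). On the line, $W_1$ between two measures of equal mass is the $L^1$ norm of the difference of their distribution functions, a standard fact; for $f^+\,dx$ and $f^-\,dx$ on $[0,1]$ this difference is $F(x) = \int_0^x f$. The function name `w1_positive_negative_1d` is hypothetical, and the trapezoid rule is our choice of quadrature.

```python
import numpy as np


def w1_positive_negative_1d(f_vals: np.ndarray, x: np.ndarray) -> float:
    """Approximate W_1(f^+, f^-) for a mean-zero f sampled on a grid x of [0, 1].

    Uses W_1 = int_0^1 |F(x)| dx with F(x) = int_0^x f(t) dt, which holds in
    dimension one for two measures of equal mass.
    """
    dx = np.diff(x)
    # Cumulative trapezoid rule: F at every grid point, with F(0) = 0.
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f_vals[1:] + f_vals[:-1]) * dx)))
    abs_F = np.abs(F)
    # Trapezoid rule for the integral of |F| over [0, 1].
    return float(np.sum(0.5 * (abs_F[1:] + abs_F[:-1]) * dx))


x = np.linspace(0.0, 1.0, 100_001)
print(w1_positive_negative_1d(x - 0.5, x))  # close to the exact value 1/12
```

For $f(x) = x - 1/2$ one has $F(x) = (x^2 - x)/2$, hence $W_1(f^+, f^-) = \int_0^1 (x - x^2)/2\,dx = 1/12$, which the printed value approximates.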
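The method of proof that Steinerberger uses to obtain the estimate (1) does not extend to higher dimensions in any obvious way. Using a different method, Sagiv and Steinerberger [11] prove that, in any dimension $d$,
$$W_1(f^+, f^-)\,\mathcal{H}^{d-1}(Z(f)) \ge C_d\left(\frac{\|f\|_{L^1(Q)}}{\|f\|_{L^\infty(Q)}}\right)^{4-1/d}\|f\|_{L^1(Q)}.$$
By a modification of the 'balanced/unbalanced cubes' method of Sagiv and Steinerberger, we can reduce the power from $4 - 1/d$ to $2 - 1/d$.

Theorem 1. Let $f : Q \to \mathbb{R}$ be a continuous function with zero mean. Then there is a constant $C_d > 0$, depending only on $d$, such that
$$W_1(f^+, f^-)\,\mathcal{H}^{d-1}(Z(f)) \ge C_d\left(\frac{\|f\|_{L^1(Q)}}{\|f\|_{L^\infty(Q)}}\right)^{2-1/d}\|f\|_{L^1(Q)}.$$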
The proof is based on a decomposition of the original cube $Q$ into smaller cubes at different scales such that, in each of them, either the mass of $|f|$ is negligible or $\int f^+$ is much larger than $\int f^-$ (or the other way around).
This proof extends to a somewhat more general setting. Let $(M, g)$ be a $d$-dimensional, smooth, compact Riemannian manifold without boundary and let $dV$ denote the volume form associated with $g$. A function $f : M \to \mathbb{R}$ has zero mean if $\int_M f\,dV = 0$.
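In this setting, the Vaserstein distance between two probability measures $\mu$ and $\nu$ on $M$ is
$$W_1(\mu,\nu) = \inf_\rho \int_{M\times M} d(x,y)\,d\rho(x,y),$$
where the infimum is over all admissible transport plans $\rho$ from $\mu$ to $\nu$. Here $d(x,y)$ stands for the distance induced by the metric $g$, and $Z(f) = \{x \in M : f(x) = 0\}$.

Theorem 2. Let $f : M \to \mathbb{R}$ be a continuous function with zero mean. Then there is a constant $C > 0$, depending only on $(M, g)$, such that
$$W_1(f^+, f^-)\,\mathcal{H}^{d-1}(Z(f)) \ge C\left(\frac{\|f\|_{L^1(M)}}{\|f\|_{L^\infty(M)}}\right)^{2-1/d}\|f\|_{L^1(M)}. \qquad (4)$$

We state this result for $M$ compact without boundary because of the application we have in mind (see Theorem 3 below), but it will be clear from the proof that the statement holds equally well for $M$ compact with smooth boundary.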
We also show by means of an example (see Proposition 5) that the power 2 − 1/d in (4) cannot be replaced by any power smaller than 1. In particular, Steinerberger's estimate (1) in dimension 2 is best possible in this sense.
The uncertainty principle in Theorem 2 demonstrates that an upper estimate for the Vaserstein distance $W_1(f^+, f^-)$ implies a lower estimate on the size of the nodal set. In this context, we establish one direction of a conjecture of Steinerberger on the Vaserstein distance between the positive and negative parts of eigenfunctions of the Laplacian. Steinerberger in [14] posed the following conjecture.

Conjecture. Let $(M, g)$ be a smooth, compact Riemannian manifold without boundary. Is it true that if $\varphi$ is an $L^2$-normalised eigenfunction of the Laplacian with eigenvalue $L$, so that $-\Delta\varphi = L\varphi$ on $(M, g)$, then
$$W_p(\varphi^+, \varphi^-) \lesssim_p \frac{1}{\sqrt{L}}\,\|\varphi\|_{L^1(M)}^{1/p}\ ?$$

Steinerberger proves that
$$W_1(\varphi^+, \varphi^-) \lesssim \sqrt{\frac{\log L}{L}}\,\|\varphi\|_{L^1(M)}.$$
We obtain the conjectured upper bound for the case $p = 1$ and for all linear combinations of eigenfunctions with high frequencies. This formalises the intuition that for high-frequency eigenfunctions it is "cheap" to move from the positive to the negative part.

Theorem 3. Let $(M, g)$ be a smooth, compact Riemannian manifold without boundary, let $L > 0$, and let $f$ be a linear combination of eigenfunctions of $-\Delta$ with eigenvalues $\lambda \ge L$. Then
$$W_1(f^+, f^-) \le C\,\frac{\|f\|_{L^1(M)}}{\sqrt{L}},$$
where $C$ depends only on $(M, g)$.

The improvement by the factor $\sqrt{\log L}$ follows from the construction of a (signed) transport plan that is well concentrated on the diagonal.
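A one-dimensional toy check of the scaling $W_1(\varphi^+,\varphi^-) \approx \|\varphi\|_{L^1}/\sqrt{L}$ (the case $p = 1$ of the conjectured bound) may help build intuition. This is our own illustration and lies outside the setting of the conjecture: we assume the interval $[0,1]$ with Neumann boundary conditions, where $\varphi(x) = \cos(\pi k x)$ satisfies $-\varphi'' = L\varphi$ with $L = (\pi k)^2$. In one dimension $W_1(\varphi^+,\varphi^-) = \int_0^1 |\Phi|$ with $\Phi(x) = \sin(\pi k x)/(\pi k)$, and since $\int_0^1 |\sin(\pi k x)|\,dx = \int_0^1 |\cos(\pi k x)|\,dx = 2/\pi$, the ratio $W_1(\varphi^+,\varphi^-)\sqrt{L}/\|\varphi\|_{L^1}$ equals 1 for every $k$. The sketch confirms this numerically; the function names are hypothetical.

```python
import numpy as np


def _trapezoid(g: np.ndarray, h: float) -> float:
    """Composite trapezoid rule on a uniform grid with spacing h."""
    return float(h * (g.sum() - 0.5 * (g[0] + g[-1])))


def neumann_mode_ratio(k: int, n: int = 200_001) -> float:
    """W_1(phi^+, phi^-) * sqrt(L) / ||phi||_1 for phi(x) = cos(pi k x) on [0, 1].

    phi is a Neumann eigenfunction of -d^2/dx^2 with eigenvalue L = (pi k)^2.
    In one dimension W_1(phi^+, phi^-) = int_0^1 |Phi(x)| dx, where
    Phi(x) = int_0^x phi(t) dt = sin(pi k x) / (pi k).
    """
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    eigenvalue = (np.pi * k) ** 2
    w1 = _trapezoid(np.abs(np.sin(np.pi * k * x)) / (np.pi * k), h)
    l1_norm = _trapezoid(np.abs(np.cos(np.pi * k * x)), h)
    return w1 * np.sqrt(eigenvalue) / l1_norm


for k in (1, 4, 16, 64):
    print(k, neumann_mode_ratio(k))  # each ratio is close to 1
```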
There is nothing special about the Laplacian in the context of Theorem 3, in that the result holds for any elliptic operator with smooth coefficients in the manifold M. We only need certain estimates on a Bochner-Riesz type kernel that are known to hold for general elliptic operators, see [9].
Together, Theorem 2 and Theorem 3 show that when $\varphi$ is a linear combination of eigenfunctions of the Laplacian with eigenvalues $\lambda \ge L$,
$$\mathcal{H}^{d-1}(Z(\varphi)) \ge c\,\sqrt{L}\left(\frac{\|\varphi\|_{L^1(M)}}{\|\varphi\|_{L^\infty(M)}}\right)^{2-1/d},$$
with $c > 0$ depending only on $(M, g)$. This is a several-variable generalization of Sturm's theorem on zeros of linear combinations of eigenfunctions, see [1]. As such, it goes in the direction of Yau's conjecture that, in a smooth compact Riemannian manifold without boundary and for an eigenfunction $\varphi$ of the Laplacian with eigenvalue $L$, we have
$$c\sqrt{L} \le \mathcal{H}^{d-1}(Z(\varphi)) \le C\sqrt{L}.$$
The full lower bound in Yau's conjecture, without terms involving the $L^\infty$ and $L^1$ norms of $\varphi$, that is $\mathcal{H}^{d-1}(Z(\varphi)) \gtrsim \sqrt{L}$, has already been proved by Logunov in [4].
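The lower bound for the nodal set is one division away from the two theorems. Assuming Theorem 2 in the form $W_1(f^+,f^-)\,\mathcal{H}^{d-1}(Z(f)) \ge C(\|f\|_1/\|f\|_\infty)^{2-1/d}\|f\|_1$ and Theorem 3 in the form $W_1(f^+,f^-) \le C'\|f\|_1/\sqrt{L}$, and noting that $\varphi$ has zero mean because it is orthogonal to the constant eigenfunction (we also assume $\varphi \not\equiv 0$), a sketch of the step reads:

```latex
C\left(\frac{\|\varphi\|_{1}}{\|\varphi\|_{\infty}}\right)^{2-1/d}\|\varphi\|_{1}
  \le W_1(\varphi^+,\varphi^-)\,\mathcal{H}^{d-1}(Z(\varphi))
  \le \frac{C'\,\|\varphi\|_{1}}{\sqrt{L}}\,\mathcal{H}^{d-1}(Z(\varphi))
\quad\Longrightarrow\quad
\mathcal{H}^{d-1}(Z(\varphi)) \ge \frac{C}{C'}\,\sqrt{L}
  \left(\frac{\|\varphi\|_{1}}{\|\varphi\|_{\infty}}\right)^{2-1/d}.
```

Here $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are the $L^1(M)$ and $L^\infty(M)$ norms, and the common factor $\|\varphi\|_1$ cancels.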
We finally remark that our method seems to provide information only for the Vaserstein distance W 1 . As mentioned, the definition of W 1 (µ, ν) does not change if the transport plan dρ is replaced by d|ρ|, where ρ is a signed transport plan. This fails dramatically for p > 1.
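Proposition 1. Let $p > 1$ and let $\mu, \nu$ be two probability measures on the interval $I = [0, 1]$. We define
$$W_p(\mu,\nu) = \inf_\rho\left(\int_{I\times I}|x-y|^p\,d|\rho|(x,y)\right)^{1/p},$$
where the infimum is taken over all admissible signed transport plans, that is, over all signed measures $\rho$ on $I\times I$ with marginals $\mu$ and $\nu$. Then $W_p(\mu,\nu) = 0$.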
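Proof. Consider first the case $\mu = \delta_0$ and $\nu = \delta_1$. We consider the sequence of transport plans $\rho_n$, which consist of $n$ negative Dirac deltas and $n+1$ positive Dirac deltas located at points of $I\times I$ as in the figure: on the white dots we place a positive Dirac delta and on the black dots a negative Dirac delta. More precisely, we take
$$\rho_n = \delta_{(0,0)} + \sum_{j=1}^{n}\Big(\delta_{(\frac{2j-1}{2n},\,\frac{j}{n})} - \delta_{(\frac{2j-1}{2n},\,\frac{j-1}{n})}\Big).$$
Clearly the marginals of $\rho_n$ are $\delta_0$ and $\delta_1$, since the contributions of the points $\frac{2j-1}{2n}$ to the first marginal, and of the points $\frac{j}{n}$ with $0 \le j \le n-1$ to the second marginal, cancel out. For any of the Dirac deltas, whether positive or negative and located at a point $(x, y)$, we have $|x - y| = 1/(2n)$, except for the Dirac delta at $(0, 0)$. Thus,
$$\int_{I\times I}|x-y|^p\,d|\rho_n|(x,y) = 2n\,(2n)^{-p} = (2n)^{1-p}.$$
Thus, $W_p(\delta_0, \delta_1) \le (2n)^{(1-p)/p}$ for every $n \ge 1$; since $p > 1$, the right-hand side tends to $0$ as $n \to \infty$, and so $W_p(\delta_0, \delta_1) = 0$. This argument can be easily adapted to prove that $W_p(\delta_x, \delta_y) = 0$ for any pair $x, y \in [0, 1]$. Since linear combinations of Dirac deltas are weak*-dense in the space of probability measures, it follows that $W_p(\mu, \nu) = 0$ for any probability measures $\mu, \nu$.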
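The staircase construction can be checked with exact rational arithmetic. The sketch below uses one placement of the $2n+1$ atoms that has the properties stated in the proof ($n+1$ positive and $n$ negative atoms, all at distance $1/(2n)$ from the diagonal except the one at $(0,0)$); it verifies that the marginals are $\delta_0$ and $\delta_1$ and that $\int |x-y|^p\,d|\rho_n| = (2n)^{1-p}$, which tends to $0$ when $p > 1$. The helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def staircase_plan(n: int) -> list[tuple[Fraction, Fraction, int]]:
    """Atoms (x, y, weight) of a signed plan with marginals delta_0 and delta_1."""
    atoms = [(Fraction(0), Fraction(0), 1)]  # positive atom at (0, 0)
    for j in range(1, n + 1):
        x = Fraction(2 * j - 1, 2 * n)
        atoms.append((x, Fraction(j, n), 1))       # positive atom, |x - y| = 1/(2n)
        atoms.append((x, Fraction(j - 1, n), -1))  # negative atom, |x - y| = 1/(2n)
    return atoms


def _nonzero(masses: dict) -> dict:
    """Drop points whose weights cancelled out."""
    return {point: w for point, w in masses.items() if w != 0}


def marginals(atoms: list[tuple[Fraction, Fraction, int]]) -> tuple[dict, dict]:
    """First (x) and second (y) marginals of a finite signed atomic measure."""
    first: dict = defaultdict(int)
    second: dict = defaultdict(int)
    for x, y, w in atoms:
        first[x] += w
        second[y] += w
    return _nonzero(first), _nonzero(second)


def signed_cost(atoms: list[tuple[Fraction, Fraction, int]], p: float) -> float:
    """Integral of |x - y|^p against the total variation measure |rho|."""
    return sum(abs(w) * float(abs(x - y)) ** p for x, y, w in atoms)


for n in (1, 10, 100):
    rho = staircase_plan(n)
    # marginals are delta_0 and delta_1; the cost equals (2n)^(1 - p), here p = 2
    print(n, marginals(rho), signed_cost(rho, 2.0))
```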
Acknowledgements. We are very thankful to Benjamin Jaye for letting us know that there was a gap in an earlier version of the proof of Theorem 1 and for finding the nice fix that he generously lets us use here. The construction of the $Q_j$ in that proof is due to him. We are also thankful to Gian Maria Dall'Ara for helpful discussions and to the referee for a careful reading of the manuscript and for many thought-provoking suggestions that have resulted in a significant improvement of the text.

Proof of Theorem 1
Note that in general $f^+\,dV$ and $f^-\,dV$, where $dV$ is Lebesgue measure in $\mathbb{R}^d$, are not probability measures, which is the usual setting for the Vaserstein distance. However, the distance is well defined for measures with the same total mass. Alternatively, notice that the zero mean condition implies that $2f^+/\|f\|_1\,dV$ and $2f^-/\|f\|_1\,dV$ are probability measures, so we can define
$$W_1(f^+, f^-) = \frac{\|f\|_1}{2}\,W_1\!\left(\frac{2f^+}{\|f\|_1}\,dV,\ \frac{2f^-}{\|f\|_1}\,dV\right).$$
In any case, replacing $f$ by $f/\|f\|_1$ if necessary, we may assume without loss of generality that $\|f\|_1 = 1$ and proceed to prove that there is a constant $C_d > 0$ such that
$$W_1(f^+, f^-)\,\mathcal{H}^{d-1}(Z(f)) \ge \frac{C_d}{\|f\|_\infty^{2-1/d}}.$$
For convenience, we extend $f$ to all of $\mathbb{R}^d$ by setting it equal to $0$ outside $Q$. We continue to denote this function by $f$. We shall use a decomposition of the cube $Q$ into cubes at different scales, defined through a continuous stopping-time argument.
The argument draws on constructions used by Steinerberger [13] and Sagiv and Steinerberger [11]. We need some definitions to describe this decomposition.
For any measurable set A we denote its volume by V (A). The side length of a cube Q is denoted by l(Q), so V (Q) = l(Q) d . We write . Definition 1. We say that a cube Q is unbalanced if either we say that the cube is balanced.
Definition 2. We say that a cube Q is full whenever The empty cubes are those cubes Q for which For every $x \in Q$ such that $f(x) \neq 0$, there exists $l(x) > 0$ such that the open cube $Q_x$ centred at $x$ and of side length $l(x) = l(Q_x)$ is simultaneously balanced and unbalanced. That is, either . This can be achieved by continuity: for $l$ very small the cube centred at $x$ and of side length $l$ is infinitely unbalanced (since $f$ has constant sign near $x$), while for side length $l = 2$ it is balanced, by (9). Hence there must be an intermediate side length $l(x)$ that makes the cube both balanced and unbalanced.
These cubes $Q_x$ cover $Q$ (up to at most a zero-measure set). By the Besicovitch covering theorem [2], one can find finitely many sequences $(x_{i,j})_{i\ge1}$, $j = 1, \dots, 5^d$, such that the cubes $(Q_{x_{i,j}})_{i\ge1}$ are disjoint for each $j$, and together they still cover $Q$. That is,
$$Q \subset \bigcup_{j=1}^{5^d}\bigcup_{i\ge1} Q_{x_{i,j}} \quad \text{up to a set of measure zero.}$$
Since $\int_Q |f| = 1$, there is at least one family of cubes $(Q_{x_{i,j}})_{i\ge1}$ (which, by relabelling, we may assume corresponds to $j = 1$) such that
$$\sum_{i\ge1}\int_{Q_{x_{i,1}}}|f| \ge 5^{-d}.$$
From this particular sequence of cubes we select those that are full, and we relabel the centres of the cubes of this subfamily as $(x_i)_{i\ge1}$ and the cubes themselves as $Q_{x_i} = Q_i$. These cubes are disjoint and, as the proof of Proposition 2 shows, they carry a proportion of the mass of $|f|$ that is bounded below by a constant depending only on $d$.

Proposition 2.
There is a constant c > 0 depending only on the dimension d such that Proof. First let us note that the mass of $f$ in the cubes $Q_{x_{i,1}}$ that are empty cannot be very big: Thus the integral over the full cubes satisfies
$$\sum_{i:\,Q_{x_{i,1}}\ \text{full}}\ \int_{Q_{x_{i,1}}}|f| \ge \frac{9}{10}\,5^{-d}.$$
Denote by $F^+$ the set of indices of the cubes $Q_i$ that are full, balanced, and that are unbalanced in the sense that . Similarly, we denote by $F^-$ the indices corresponding to those cubes $Q_i$ that are full, balanced, and that are unbalanced in the sense that $V^-_f(Q_i)$ dominates $V^+_f(Q_i)$ ((11) holds). Lemma 1. For $i \in F^+$, each of the following estimates holds: Analogous estimates hold for $i \in F^-$.
Proof. If i ∈ F + then, by (6), Since Q i is full we then have These estimates together imply (12). Finally, which leads to (13).
We are now ready to bound from below both the Hausdorff measure of the zero set and the Vaserstein distance between f + and f − . That the Hausdorff measure of the zero set cannot be small comes from the fact that the cubes Q i are balanced. That the Vaserstein distance between f + and f − cannot be small comes from the fact that they are unbalanced. We first estimate from below the Hausdorff measure of Z(f ) in Q.
Proposition 3. We have: Proof. We start the proof by considering only the cubes $Q_i$ that are contained in $Q$. We will deal later with the cubes that intersect the boundary of $Q$. We recall the following relative isoperimetric inequality (see [6][7][8]): for an open cube $Q$ in $\mathbb{R}^d$ and $K \subset Q$,
$$\min\{V(K),\,V(Q\setminus K)\}^{\frac{d-1}{d}} \le C_d\,\mathcal{H}^{d-1}(\partial K \cap Q). \qquad (15)$$
Observe that since $Q_i$ is balanced the volumes in $Q_i$ separated by $Z(f)$ are comparable, up to a factor $\|f\|_\infty$. In fact, if Then, by the relative isoperimetric inequality (15), Since the cubes $Q_i$ are disjoint, This last estimate holds only for cubes $Q_i$ that are fully inside $Q$. There may be others that touch the boundary, but for these we have The first inequality holds because the cubes are disjoint and all intersect $\partial Q$, and the last one because of the relative isoperimetric inequality applied to $Q$ (see (9)). The estimate (14) now follows. Now we are going to estimate the transport realized in each of the full cubes $Q_i$, making use of the fact that they are unbalanced.
Proposition 4. We have the following estimate of the Vaserstein distance between $f^+$ and $f^-$: Proof. By definition
$$W_1(f^+, f^-) = \inf_\rho \int_{Q\times Q}|x-y|\,d\rho(x,y),$$
where $\rho$ is a transport plan between $f^+$ and $f^-$, that is, $\rho$ is a measure supported on $Q\times Q$ such that for any measurable set $A \subset Q$,
$$\rho(A\times Q) = \int_A f^+\,dV, \qquad \rho(Q\times A) = \int_A f^-\,dV.$$
We need a uniform lower bound on the transport required for a general plan $\rho$. We have, Here, $d(x, \partial Q_i)$ is the distance from $x \in Q_i$ to the boundary of the cube $Q_i$. We now estimate the transport for each $Q_i$. Assume $i \in F^+$, the case $i \in F^-$ being completely analogous. Given any transport plan $\rho$, write On the other hand Since, by (12), we deduce, using (13), that Next, writing the integral in terms of the distribution function,
$$\int_{Q_i} d(x,\partial Q_i)\,d\nu(x) = \int_0^\infty \nu\big(\{x \in Q_i : d(x,\partial Q_i) > t\}\big)\,dt.$$
Since $\nu \le f^+\chi_{Q_i}\,dV$ and $f^+$ is bounded, we have that
$$\nu\big(\{x \in Q_i : d(x,\partial Q_i) \le t\}\big) \le C\,\|f\|_\infty\,l(Q_i)^{d-1}\,t$$
for some constant $C$ (depending on the dimension). Then, by (17), The crossover point where $\nu(Q_i)$ dominates being when Going back to (16) and using the estimate (18) gives the estimate which finishes the proof of Proposition 4.
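The bound on $\nu$ near $\partial Q_i$ uses only that $f^+ \le \|f\|_\infty$ and that a boundary layer of a cube is thin: for a cube of side $l$ and $0 \le t \le l/2$, the set of points at distance less than $t$ from the boundary has volume exactly $l^d - (l-2t)^d$, which is at most $2d\,l^{d-1}t$ by the mean value theorem. So a dimensional constant of size $2d$ suffices for this volume estimate (our own computation, not necessarily the constant the authors had in mind). The sketch below compares the exact formula, the bound, and a Monte Carlo estimate; all names are hypothetical.

```python
import numpy as np


def boundary_layer_volume(l: float, t: float, d: int) -> float:
    """Volume of {x in [0, l]^d : dist(x, boundary) < t}, for 0 <= t <= l/2."""
    return l**d - (l - 2.0 * t) ** d


def boundary_layer_bound(l: float, t: float, d: int) -> float:
    """Mean value theorem: l^d - (l - 2t)^d <= 2 d l^(d-1) t."""
    return 2.0 * d * l ** (d - 1) * t


def monte_carlo_layer_volume(l: float, t: float, d: int,
                             samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same volume, as an independent check."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, l, size=(samples, d))
    # Distance to the boundary of the cube: smallest gap to any face.
    dist = np.minimum(pts, l - pts).min(axis=1)
    return float(l**d * np.mean(dist < t))


for d in (2, 3):
    for t in (0.01, 0.05):
        print(d, t, boundary_layer_volume(0.3, t, d),
              boundary_layer_bound(0.3, t, d), monte_carlo_layer_volume(0.3, t, d))
```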
Finally, to conclude the proof of Theorem 1, we use first Proposition 4 and (14) to obtain: By the Cauchy-Schwarz inequality for sums, applied in the opposite direction to usual, and by Proposition 2 the result follows:

Proof of Theorem 2 (Sketch)
Let $d$ be the dimension of the manifold $M$ and let $\rho$ be the injectivity radius of $(M, g)$, that is, the supremum of the values $r > 0$ such that, for every $x \in M$, the exponential map at $x$ defines a diffeomorphism from the ball with centre $0$ and radius $r$ in $T_xM \cong \mathbb{R}^d$ onto its image in $M$. For $x \in M$ and $r > 0$ let $B(x, r)$ denote the ball of centre $x$ and radius $r$ in the distance induced by the metric $g$.
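Assume, as before, that $\|f\|_{L^1(M)} = 1$ and fix $r_0 \le \rho/3$. We start by choosing a ball in $M$ with a substantial part of the $L^1$-norm of $f$: there exist $\epsilon = \epsilon(M) > 0$ and $x_0 \in M$ such that
$$\int_{B(x_0, r_0)}|f|\,dV \ge \epsilon.$$
(For instance, since $M$ is compact it can be covered by finitely many balls of radius $r_0$, say $n(r_0)$ of them, and one can take $\epsilon = 1/n(r_0)$.) Denote $B = B(x_0, r_0)$. To adapt the scheme of the previous proof from $Q$ to $B$ we consider $f$ restricted to $2B = B(x_0, 2r_0)$ and extend it by $0$ outside. We still denote this function by $f$.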
Assume first that B is such that This plays the role of (9) in this proof. The factor ǫ everywhere is just (the bound of) the L 1 -norm of f on B.
Here we call a ball $B = B(x, r)$ balanced if For every $x \in B$, let $B_x = B(x, r(x))$ be the ball centred at $x$ and with radius $r(x)$ chosen so that either . Such a radius $r(x)$ exists and is smaller than the injectivity radius, because $f$ vanishes outside $2B$.
As in the cube case, by the Besicovitch covering theorem there are finitely many families of disjoint balls $(B_{x_{i,j}})_{i\ge1}$ that together cover $B$. We can then select a family, called $(B_{x_{i,1}})_{i\ge1}$, such that From these balls we select those that are full, and we relabel them as $(B_i)_{i\ge1}$. With this family of balls, which plays the role of the family $(Q_i)_i$ in the case of the cube, we can repeat, mutatis mutandis, the arguments that prove the analogues of Propositions 2, 3 and 4, and therefore the inequality in Theorem 2. In the proof of Proposition 3 we separate the balls $B_i$ inside $2B$, which are dealt with as before, from those that intersect the boundary of $2B$. For the latter we use that $\sum_i r_i^{d-1} \lesssim \mathcal{H}^{d-1}(\partial B)$, since the centres of the disjoint balls $B_i$ are always in $B$.
In case B does not satisfy (20) the desired estimate is straightforward. On the one hand, the argument of Proposition 4 applied just to the ball B yields On the other hand, the relative isoperimetric inequality applied to any ball B(x, r 0 ) with x ∈ Z(f ) yields Together these lead to

An example
Next we show that the exponent 2 − 1/d in Theorem 1 cannot be replaced by any power smaller than 1. In particular, Steinerberger's uncertainty principle (1) in dimension 2 is best possible in this sense.
Proposition 5. Let ε > 0. There is a continuous function f ε : Q 0 → R such that does not hold in general for any exponent α < 1.
Proof. The construction is as follows. Take the function $f_\varepsilon(x) = h_\varepsilon(x_d)$, where $x = (x_1, \dots, x_d)$ and the graph of $h_\varepsilon$ is as in the picture. With a similar example one can check that in dimension 2 the inequality (1), that is,
$$W_1(f^+, f^-)\,\mathcal{H}^1(Z(f)) \ge C\,\frac{\|f\|_{L^1(Q)}^2}{\|f\|_{L^\infty(Q)}},$$
cannot hold with a constant $C$ greater than 1. It is an interesting problem to determine the best constant in the inequality (1).

Proof of Theorem 3 on eigenfunctions of the Laplacian
Let d be the dimension of M and denote by V the volume form on M associated to g and normalised so that V (M) = 1.
In order to construct a transport plan between f + and f − we consider an auxiliary kernel. Let a : [0, 1] → R be a smooth decreasing function such that a(t) ≡ 1 in [0, 1/4] and a(t) ≡ 0 in [3/4, 1].
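The text only requires some smooth nonincreasing $a$ with $a \equiv 1$ on $[0,1/4]$ and $a \equiv 0$ on $[3/4,1]$; it does not fix a choice. A minimal sketch of one standard construction, assuming nothing beyond those properties, uses the $C^\infty$ function $\psi(s) = e^{-1/s}$ for $s > 0$ and $\psi(s) = 0$ for $s \le 0$: the quotient $\psi(s)/(\psi(s) + \psi(1-s))$ is a smooth step from 0 to 1 on $[0,1]$, and rescaling it to $[1/4, 3/4]$ gives an admissible $a$. The names `smooth_step` and `cutoff_a` are hypothetical.

```python
import math


def _psi(s: float) -> float:
    """C^infinity function that vanishes for s <= 0 and is positive for s > 0."""
    return math.exp(-1.0 / s) if s > 0 else 0.0


def smooth_step(s: float) -> float:
    """C^infinity, nondecreasing, equal to 0 for s <= 0 and to 1 for s >= 1."""
    # The denominator never vanishes: s > 0 or 1 - s > 0 always holds.
    return _psi(s) / (_psi(s) + _psi(1.0 - s))


def cutoff_a(t: float) -> float:
    """Hypothetical choice of a: 1 on [0, 1/4], 0 on [3/4, 1], smooth, nonincreasing."""
    return 1.0 - smooth_step(2.0 * (t - 0.25))


print([round(cutoff_a(t), 4) for t in (0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0)])
```

The exponential construction is only one option; any smooth nonincreasing $a$ with the stated plateaus would serve the same purpose in the kernel.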
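Let $(\varphi_i)_{i\ge0}$ be an orthonormal basis of $L^2(M)$ consisting of eigenfunctions of $-\Delta$, with eigenvalues $0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots$. Observe that $\varphi_0(x) = 1$ (recall that $V(M) = 1$) and therefore, by orthogonality,
$$\int_M \varphi_i\,dV = 0 \qquad \text{for all } i \ge 1.$$
For any $L > 0$, we write
$$K_L(x, y) = \sum_{\lambda_i \le L} a\!\left(\frac{\lambda_i}{L}\right)\varphi_i(x)\,\varphi_i(y).$$
This is a kernel of Bochner-Riesz type. It is a smoothed-out version of the Bergman kernel that gives the orthogonal projection from $L^2(M)$ onto the span of the eigenfunctions with eigenvalues less than $L$, in the same spirit as the Riesz kernels are a smoothed version of the Dirichlet kernel on trigonometric sums. See [9, 10] for the basic properties of the kernel. It is proved in [9, Lemma 2.1] that the following pointwise estimates hold: for any $N > 0$ there exists $C_N > 0$ such that
$$|K_L(x, y)| \le C_N\,\frac{L^{d/2}}{\big(1 + \sqrt{L}\,d(x, y)\big)^N}, \qquad x, y \in M. \qquad (22)$$
Now we use a slightly different definition of the Vaserstein distance (see [5, Formula (43)]):
$$W_1(\mu, \nu) = \inf_\rho \int_{M\times M} d(x, y)\,d|\rho|(x, y),$$
where $\rho$ now ranges over signed measures on $M\times M$ with marginals $\rho(\cdot, M) = \mu$ and $\rho(M, \cdot) = \nu$. This is justified by the dual expression (3) for the Vaserstein distance: for any signed measure $\rho$ with marginals $\mu$ and $\nu$ and any $\phi$ with $\|\phi\|_{\mathrm{Lip}} \le 1$, a direct estimate yields
$$\int_M \phi\,d\mu - \int_M \phi\,d\nu = \int_{M\times M}\big(\phi(x) - \phi(y)\big)\,d\rho(x, y) \le \int_{M\times M} d(x, y)\,d|\rho|(x, y).$$
The other inequality is trivial, since every transport plan is in particular a signed measure with the same marginals. Let $\sigma$ be the pushforward of the measure $f^-\,dV$ by the diagonal map $x \mapsto (x, x)$, and define the signed measure
$$d\rho_L(x, y) = f(x)\,K_L(x, y)\,dV(x)\,dV(y) + d\sigma(x, y).$$
Since $a(0) = 1$ and $\int_M \varphi_i\,dV = 0$ for $i \ge 1$, we have $\int_M K_L(x, y)\,dV(y) = \varphi_0(x) = 1$ for every $x \in M$. Hence, the marginal of the first term in $\rho_L$ with respect to $y \in M$ is $f(x)\,dV(x)$, and therefore $\rho_L(\cdot, M) = (f + f^-)\,dV = f^+\,dV$. For the other marginal we use the orthogonality of $f$ to all $\varphi_i$ with $\lambda_i < L$ (since $f$ is a linear combination of eigenfunctions of $-\Delta$ with eigenvalues $\lambda_k \ge L$); the terms with $\lambda_i = L$, if any, vanish because $a(1) = 0$. Thus,
$$\int_M f(x)\,K_L(x, y)\,dV(x) = \sum_{\lambda_i \le L} a\!\left(\frac{\lambda_i}{L}\right)\varphi_i(y)\int_M f(x)\,\varphi_i(x)\,dV(x) = 0,$$
and the second marginal of $\rho_L$ reduces to that of $\sigma$, which is $f^-(y)\,dV(y)$. Now that we have checked that $\rho_L$ has the correct marginals, let us prove the inequality in the statement of Theorem 3:
$$W_1(f^+, f^-) \le \int_{M\times M} d(x, y)\,d|\rho_L|(x, y) \le \int_M\int_M d(x, y)\,|f(x)|\,|K_L(x, y)|\,dV(x)\,dV(y) + \int_{M\times M} d(x, y)\,d\sigma(x, y).$$
Since $\sigma$ is supported on the diagonal, it does not contribute to this last integral. Using (22), we are led to
$$W_1(f^+, f^-) \le C_N\,\|f\|_{L^1(M)}\,\sup_{x\in M}\int_M \frac{L^{d/2}\,d(x, y)}{\big(1 + \sqrt{L}\,d(x, y)\big)^N}\,dV(y).$$
We are still free to choose $N$. We pick $N > d + 1$ (the choice $N = d + 2$ works fine) and complete the proof of Theorem 3 by showing that there is a finite constant $C$, independent of $L$, such that
$$\sup_{x\in M}\int_M \frac{L^{d/2}\,d(x, y)}{\big(1 + \sqrt{L}\,d(x, y)\big)^N}\,dV(y) \le \frac{C}{\sqrt{L}}. \qquad (23)$$
Since $M$ is compact, the volume of a geodesic ball $\{y : d(x, y) < s\}$ is at most a (global) constant times $s^d$. Split $M$ into the sets $A_0 = \{y : \sqrt{L}\,d(x, y) < 1\}$ and $A_k = \{y : 2^{k-1} \le \sqrt{L}\,d(x, y) < 2^k\}$, $k \ge 1$. On $A_k$ the integrand is at most $L^{(d-1)/2}\,2^k\,2^{-(k-1)N}$, and $V(A_k) \le C\,2^{kd}L^{-d/2}$. We deduce, finally, that
$$\int_M \frac{L^{d/2}\,d(x, y)}{\big(1 + \sqrt{L}\,d(x, y)\big)^N}\,dV(y) \le \frac{C\,2^N}{\sqrt{L}}\sum_{k\ge0} 2^{k(d+1-N)} \le \frac{C'}{\sqrt{L}},$$
which proves (23) and completes the proof of Theorem 3.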
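A numerical check of the scaling in (23) in a flat model case may be reassuring. Here we assume $M$ is replaced by $\mathbb{R}^d$ and that the kernel bound has the form $C_N L^{d/2}(1+\sqrt{L}\,d(x,y))^{-N}$, so that the relevant integral is $\int L^{d/2}\,d(x,y)\,(1+\sqrt{L}\,d(x,y))^{-N}\,dV(y)$. In polar coordinates, after dropping the area of the unit sphere, the substitution $u = \sqrt{L}\,r$ gives $\sqrt{L}\int_0^\infty L^{d/2} r^d (1+\sqrt{L}r)^{-N}\,dr = \int_0^\infty u^d(1+u)^{-N}\,du = B(d+1, N-d-1)$. This is finite exactly when $N > d+1$ and equals $1/(d+1)$ for $N = d+2$. The sketch integrates in $r$ directly, without the substitution, to confirm that $\sqrt{L}$ times the integral does not depend on $L$. The function name is hypothetical.

```python
import numpy as np


def radial_cost(L: float, d: int, N: int) -> float:
    """Integral over r in (0, inf) of L^(d/2) * r * r^(d-1) / (1 + sqrt(L) r)^N.

    Flat model of the integral in (23), with the area of the unit sphere dropped.
    Trapezoid rule on a geometric grid; the pieces near r = 0 and beyond r = 1e8
    are negligible for the parameters used below.
    """
    r = np.geomspace(1e-10, 1e8, 400_001)
    g = L ** (d / 2) * r**d / (1.0 + np.sqrt(L) * r) ** N
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))


for d in (1, 2, 3):
    N = d + 2
    scaled = [np.sqrt(L) * radial_cost(L, d, N) for L in (1.0, 1e2, 1e4, 1e6)]
    print(d, scaled, 1 / (d + 1))  # the scaled values all agree with 1/(d+1)
```

On a compact manifold the Euclidean volume element is replaced by the bound $V(\{y : d(x,y) < s\}) \le C s^d$, which is all the argument above uses.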