Measurement of an Excess of B ->D(*) Tau Nu Decays and Implications for Charged Higgs Bosons

Based on the full BaBar data sample, we report improved measurements of the ratios R(D(*)) = B(B ->D(*) Tau Nu)/B(B ->D(*) l Nu), where l is either e or mu. These ratios are sensitive to new physics contributions in the form of a charged Higgs boson. We measure R(D) = 0.440 +- 0.058 +- 0.042 and R(D*) = 0.332 +- 0.024 +- 0.018, which exceed the Standard Model expectations by 2.0 sigma and 2.7 sigma, respectively. Taken together, our results disagree with these expectations at the 3.4 sigma level. This excess cannot be explained by a charged Higgs boson in the type II two-Higgs-doublet model. Kinematic distributions presented here exclude large portions of the more general type III two-Higgs-doublet model, but there are solutions within this model compatible with the results.


I. INTRODUCTION
In the Standard Model (SM), semileptonic decays of B mesons proceed via first-order electroweak interactions and are mediated by the W boson [1][2][3]. Decays involving electrons and muons are expected to be insensitive to non-SM contributions and therefore have been the bases of the determination of the Cabibbo-Kobayashi-Maskawa (CKM) matrix elements |V cb | and |V ub | [4]. Decays involving the higher-mass τ lepton provide additional information on SM processes and are sensitive to additional amplitudes, such as those involving an intermediate charged Higgs boson [5][6][7][8][9]. Thus, they offer an excellent opportunity to search for this and other non-SM contributions.
Over the past two decades, the development of heavyquark effective theory (HQET) and precise measurements of B → D ( * ) ℓ − ν ℓ decays [10] at the B factories [11,12] have greatly improved our understanding of exclusive semileptonic decays. The relative rates are independent of the CKM element |V cb | and also, to a large extent, of the parameterization of the hadronic matrix elements. SM expectations [9] for the ratios R(D) and R(D * ) have uncertainties of less than 6% and 2%, respectively. Calculations [5][6][7][8][9] based on two-Higgsdoublet models predict a substantial impact on the ratio R(D), and a smaller effect on R(D * ) due to the spin of the D * meson.
The decay B → D * τ − ν τ was first observed in 2007 by the Belle Collaboration [13]. Since then, both BABAR and Belle have published improved measurements, and have found evidence for B → Dτ − ν τ decays [14][15][16]. Up to now, the measured values for R(D) and R(D * ) have consistently exceeded the SM expectations, though the significance of the excess is low due to the large statistical uncertainties.
We recently presented an update of the earlier measurement [14] based on the full BABAR data sample [17]. This update included improvements to the event reconstruction that increased the signal efficiency by more than a factor of 3. In the following, we describe the analysis in greater detail, present the distributions of some important kinematic variables, and expand the interpretation of the results.
We choose to reconstruct only the purely leptonic decays of the τ lepton, τ − → e − ν e ν τ and τ − → µ − ν µ ν τ , so that B → D ( * ) τ − ν τ and B → D ( * ) ℓ − ν ℓ decays are identified by the same particles in the final state. This leads to the cancellation of various detection efficiencies and the reduction of related uncertainties on the ratios R(D ( * ) ).
Candidate events originating from Υ (4S) → BB decays are selected by fully reconstructing the hadronic decay of one of the B mesons (B tag ), and identifying the semileptonic decay of the other B by a charm meson (charged or neutral D or D * meson), a charged lepton (either e or µ) and the missing momentum and energy in the whole event.
Yields for the signal decays B → D ( * ) τ − ν τ and the normalization decays B → D ( * ) ℓ − ν ℓ are extracted by an unbinned maximum-likelihood fit to the two-dimensional distributions of the invariant mass of the undetected particles m 2 miss = p 2 miss = (p e + e − −p Btag −p D ( * ) −p ℓ ) 2 (where p e + e − , p Btag , p D ( * ) , and p ℓ refer to the four-momenta of the colliding beams, the B tag , the D ( * ) , and the charged lepton, respectively) versus the lepton three-momentum in the B rest frame, |p * ℓ |. The m 2 miss distribution for decays with a single missing neutrino peaks at zero, whereas signal events, which have three missing neutrinos, have a broad m 2 miss distribution that extends to about 9 GeV 2 . The observed lepton in signal events is a secondary particle from the τ decay, so its |p * ℓ | spectrum is softer than for primary leptons in normalization decays.
The principal sources of background originate from BB decays and from continuum events, i.e., e + e − → f f (γ) pair production, where f = u, d, s, c, τ . The yields and distributions of these two background sources are derived from selected data control samples. The background decays that are most difficult to separate from signal decays come from semileptonic decays to higher-mass, excited charm mesons, since they can produce similar m 2 miss and |p * ℓ | values to signal decays and their branching fractions and decay properties are not well known. Thus, their impact on the signal yield is examined in detail.
The choice of the selection criteria and fit configuration are based on samples of simulated and data events. To avoid bias in the determination of the signal yield, the signal region was blinded for data until the analysis procedure was settled. Given that leptons are not affected by quantum chromodynamic (QCD) interactions (see Fig. 1), the matrix element of B → D ( * ) τ − ν τ decays can be factorized in the form [5] M λτ λ D ( * ) (q 2 , θ τ ) = are the leptonic and hadronic currents defined as Here, the indices λ refer to the helicities of the W , D ( * ) , and τ , q = p B −p D ( * ) is the four-momentum of the virtual W , and θ τ is the angle between the τ and the D ( * ) threemomenta measured in the rest frame of the virtual W . The metric factor η in Eq. 2 is η {±,0,s} = {1, 1, −1}, where λ W = ±, 0, and s refer to the four helicity states of the virtual W boson (s is the scalar state which, of course, has helicity 0). The leptonic currents can be calculated analytically with the standard framework of electroweak interactions. In the rest frame of the virtual W (W * ), they take the form [18]: with v = 1 − m 2 τ q 2 , d ± = 1 ± cos θ τ √ 2 , d 0 = sin θ τ . (8) Given that the average q 2 in B → D ( * ) τ − ν τ decays is about 8 GeV 2 , the fraction of τ − leptons with positive helicity is about 30% in the SM. Due to the nonperturbative nature of the QCD interaction at this energy scale, the hadronic currents cannot be calculated analytically. They are expressed in terms of form factors (FF) as functions of q 2 (see Secs. II A 1 and II A 2).
The differential decay rate, integrated over angles, is derived from Eq. 2 and Eqs..5-7 [2]: where |p * D ( * ) | is the three-momentum of the D ( * ) meson in the B rest frame. For simplicity, the helicities of the D ( * ) meson and the q 2 dependence of the hadron helicity amplitudes H ±,0,s have been omitted. The assignment is unambiguous because in B → D * τ − ν τ decays, H ± only receive contributions from λ D * = ±, while H 0,s require λ D * = 0. In B → Dτ − ν τ decays, only λ D = s is possible, which implies H ± = 0.
1. Form factor parameterization of B → D * τ − ντ decays Four independent FFs, V , A 0 , A 1 , and A 2 , describe the non-perturbative QCD interactions in B → D * τ − ν τ decays. Based on the FF convention of Ref. [9], the hadronic currents take the following form: In this analysis, we use an HQET-based parameterizations for the FFs that is expressed in terms of the scalar product of the B and D * four-velocities Its minimum value w min = 1 corresponds to q 2 max = (m B − m D * ) 2 . The maximum value is obtained for the lowest possible value of q 2 , which is the square of the mass of the lepton. Thus, w max = 1.35 for B → D * τ − ν τ decays and w max = 1.51 for B → D * ℓ − ν ℓ decays.
Three of the remaining four FF parameters, R 1 (1), R 2 (1), and ρ 2 D * , have been measured in analyses of B → D * ℓ − ν ℓ decays. The most recent averages by the Heavy Quark Averaging Group (HFAG) [4] and their correlations C are: R 0 (w) affects the decay rate only via the scalar hadronic amplitude H s (q 2 ). The corresponding leptonic amplitude L s (q 2 , θ τ ) is helicity suppressed, i.e., its rate is proportional to the mass of the lepton (Eq. 6). As a result, B → D * ℓ − ν ℓ decays are not sensitive to this FF, and R 0 (w) has not been measured. We therefore rely on a theoretical estimate, R 0 (1) = 1.14 ± 0.07, based on HQET [9].

Form factor parameterization of B → Dτ − ντ decays
The non-perturbative QCD interactions in B → Dτ − ν τ decays are described by two independent FFs, referred to as V 1 and S 1 [8]. The helicity amplitudes take the form: The amplitudes corresponding to the helicities λ W = ± vanish because the D meson has spin 0. For this decay mode, the variable w is defined as in Eq. 11, except that the D * meson mass is replaced by the D meson mass m D .
Taking into account dispersion relations [19], V 1 can be expressed as where V 1 (1) and ρ 2 D are FF parameters. The normalization V 1 (1) cancels in the ratio R(D). Based on B → Dℓ − ν ℓ decays, the average value of the shape parameter is ρ 2 D = 1.186 ± 0.055 [4]. As for B → D * τ − ν τ decays, the scalar hadronic amplitude is helicity suppressed and as a result, S 1 (w) cannot be measured with B → Dℓ − ν ℓ decays. We use instead the following estimate based on HQET [8]: with ∆ = 1 ± 1. We have employed this FF parameterization to generate B → Dτ − ν τ and B → Dℓ − ν ℓ decays, as described in Sec. III C 2. Though we used the same FF definitions and parameters, we found a difference of 1% between the value of R(D) that we obtained by integrating Eq. 9 and the value quoted in Ref. [8].
On the other hand, if we adopt the FF parameters of Ref. [20], we perfectly reproduce the R(D) predictions presented there. The translation of the FF parameterization of Ref. [20] into standard hadronic amplitudes is not straightforward, so we do not use these FFs in the Monte Carlo simulation. Since both parameterizations yield essentially identical q 2 spectra, they are equivalent with respect to Monte Carlo generation, which is not sensitive to differences in normalization.
3. SM calculation of R(D ( * ) ) and q 2 spectrum We determine the SM predictions for the ratios R(D ( * ) ) integrating the expression for the differential decay rate (Eq. 9) as follows: with q 2 max = (m B − m D ( * ) ) 2 . The uncertainty of this calculation is determined by generating one million random sets of values for all the FF parameters assuming Gaussian distributions for the uncertainties and including their correlations. We calculate R(D ( * ) ) with each set of values, and assign the root mean square (RMS) of its distribution as the uncertainty. We apply this procedure for B 0 and B − decays, and for ℓ = e and µ, and average the four results to arrive at the following predictions, Additional uncertainties that have not been taken into account could contribute at the percent level. For instance, some electromagnetic corrections could affect B → D ( * ) ℓ − ν ℓ and B → D ( * ) τ − ν τ decays differently [9].
The experimental uncertainty on R(D ( * ) ) is expected to be considerably larger.
The q 2 spectra for B → D ( * ) τ − ν τ decays in Fig. 2 clearly show the threshold at q 2 min = m 2 τ , while for B → D ( * ) ℓ − ν ℓ decays q 2 min ∼ 0. We take advantage of this difference in the signal selection by imposing q 2 > 4 GeV 2 . The spectra for ℓ = e and µ are almost identical, except for q 2 < m 2 µ = 0.011 GeV 2 .
B. Two-Higgs-Doublet Model Type II As we noted in the introduction, B → D ( * ) τ − ν τ decays are potentially sensitive to new physics (NP) processes. Of particular interest is the two-Higgs-doublet model (2HDM) of type II, which describes the Higgs sector of the Minimal Supersymmetric model at tree level. In this model, one of the two Higgs doublets couples to up-type quarks, while the other doublet couples to downtype quarks and leptons.
The contributions of the charged Higgs to B → D ( * ) τ − ν τ decays can be encapsulated in the scalar helicity amplitude in the following way [5,20]: Here, tanβ is the ratio of the vacuum expectation values of the two Higgs doublets, m H ± is the mass of the charged Higgs, and m c /m b = 0.215 ± 0.027 [21] is the ratio of the c-and b-quark masses at a common mass scale. The negative sign in Eq. 19 applies to B → Dτ − ν τ decays and the positive sign applies to B → D * τ − ν τ decays. This expression is accurate to 1% for m H + larger than 15 GeV. The region for m H + ≤ 15 GeV has already been excluded by B → X s γ measurements [22]. The tanβ/m H + dependence of the ratios R(D ( * ) ) in the type II 2HDM can be studied by substituting H 2HDM s for H SM s in Eq. 9. Given that charged Higgs bosons are not expected to contribute significantly to B → D ( * ) ℓ − ν ℓ decays, R(D ( * ) ) 2HDM can be described by a parabola in TABLE I. Dependence of R(D ( * ) ) on tanβ/m H + in the 2HDM according to Eq. 20 for B → Dτ − ντ and B → D * τ − ντ decays: the values of R(D ( * ) ), the parameters A and B with their uncertainties, and correlations C. Table I lists the values of A D ( * ) and B D ( * ) , which are determined by averaging over B 0 and B − decays. The uncertainty estimation includes the uncertainties on the mass ratio m c /m b and the FF parameters, as well as their correlations.
Due to the destructive interference between the SM and 2HDM amplitudes in Eq. 19, charged Higgs contributions depress the ratios R(D ( * ) ) for low values of tanβ/m H + . For larger values of tanβ/m H + , the Higgs contributions dominate and R(D) and R(D * ) increase rapidly. As the coefficients of Table I show, the 2HDM impact is expected to be larger for R(D) than for R(D * ). This is because charged Higgs contributions only affect the scalar amplitude H 2HDM s , but B → D * τ − ν τ decays also receive contributions from H ± , diluting the effect on the total rate. Figure 3 shows the impact of the 2HDM on the q 2 spectrum. Given that the B and D mesons have spin J = 0, the SM decays B → DW * → Dτ ν proceed via P -wave for J W * = 1, and via S-wave for J W * = 0. For the Pwave decay, which accounts for about 96% of the total amplitude, the decay rate receives an additional factor |p * D | 2 , which suppresses the q 2 spectrum at high values. Since charged Higgs bosons have J H = 0, their contributions proceed via S-wave, and, thus, have a larger average q 2 than the SM contributions. As a result, for low values of tanβ/m H + where the negative interference depresses H 2HDM s , the q 2 spectrum shifts to lower values. For large values of tanβ/m H + , the Higgs contributions dominate the decay rate and the average q 2 significantly exceeds that of the SM.
The situation is different for B → D * τ − ν τ decays because the D * meson has spin J D * = 1. The SM decays can proceed via S, P , or D-waves, while the decay via an intermediate Higgs boson must proceed via P -wave, suppressing the rate at high q 2 .
When searching for charged Higgs contributions, it is important to account for the changes in the q 2 spectrum. This distribution has a significant impact on the analysis  due to the close relation between q 2 and m 2 miss , one of the fit variables.
Charged Higgs contributions also affect the |p * ℓ | distribution. Given the spin 0 of the Higgs boson and the positive helicity (right-handedness) of the anti-neutrino, the decays H − → τ − ν τ always produce τ − leptons with positive helicities (λ τ = +). As a result, the fraction of right-handed τ − leptons produced in B → D ( * ) τ − ν τ decays changes from 30% in the SM, to close to 100% when the 2HDM contributions dominate.
The lepton spectrum of polarized τ ± → ℓ ± ν ℓ ν τ decays is well known [23]. For τ − leptons with λ τ − = −, the ℓ − is emitted preferentially in the τ − direction, while the opposite is true for positive helicities. In the B rest frame, leptons of a certain momentum in the τ − rest frame have larger momentum if they are emitted in the direction of the τ − momentum than in the opposite direction. As a result, the |p * ℓ | spectrum for SM decays is harder than for Higgs dominated decays. For low values of tanβ/m H + for which the destructive interference depresses the B → D ( * ) τ − ν τ rate, the proportion of lefthanded τ − leptons increases, and therefore, the |p * ℓ | spectrum is harder than in the SM. ,

A. Data Sample
This analysis is based on the full data sample recorded with the BABAR detector [24] at the PEP-II energyasymmetric e + e − storage rings [25]. It operated at a center-of-mass (c.m.) energy of 10.58 GeV, equal to the mass of the Υ (4S) resonance. This resonance decays almost exclusively to BB pairs. The collected data sample of 471 million Υ (4S) → BB events (on-peak data), corresponds to an integrated luminosity of 426 fb −1 [26]. To study continuum background, an additional sample of 40 fb −1 (off-peak data) was recorded approximately 40 MeV below the Υ (4S) resonance, i.e., below the threshold for BB production.

B. The BABAR Detector and Single Particle Reconstruction
The BABAR detector and event reconstruction have been described in detail elsewhere [24]. The momentum and angles of charged particles were measured in a tracking system consisting of a 5-layer, double-sided siliconstrip detector (SVT) and a 40-layer, small-cell drift chamber (DCH) filled with a helium-isobutane gas mixture. Charged particles of different masses were distinguished by their ionization energy loss in the tracking devices and by a ring-imaging Cerenkov detector (DIRC). A finely segmented CsI(Tl) calorimeter (EMC) measured the energy and position of electromagnetic showers generated by electrons and photons. The EMC was surrounded by a superconducting solenoid providing a 1.5-T magnetic field and by a segmented flux return with a hexagonal barrel section and two endcaps. The steel of the flux return was instrumented (IFR) with resistive plate chambers and limited streamer tubes to detect particles penetrating the magnet coil and steel.
Within the polar angle acceptance of the SVT and DCH (0.4 < θ lab < 2.6) the efficiency for the reconstruction of charged particles exceeds 99% for momenta above 1 GeV. For low momentum pions, especially from D * + → D 0 π + decays, the efficiency drops to about 90% at 0.4 GeV and to 50% at 0.1 GeV.
The electron and muon identification efficiencies and the probabilities to misidentify a pion, a kaon, or a proton as an electron or a muon are measured as a function of the laboratory momentum and angles using high-purity data samples.
Electrons are separated from charged hadrons primarily on the basis of the ratio of the energy deposited in the EMC to the track momentum. A special algorithm has been developed to identify photons from bremsstrahlung in the inner detector, and to correct the electron momentum for the energy loss. Within the polar angle acceptance, the average electron efficiency for laboratory momenta above 0.5 GeV is 97%, largely independent of momentum. The average pion misidentification rate is less than 0.5%.
Muon identification relies on a new multivariate algorithm that significantly increases the reconstruction efficiency at low muon momenta, |p µ | < 1 GeV. This algorithm combines information on the measured DCH track, the track segments in the IFR, and the energy deposited in the EMC. The average muon efficiency is close to 90% independent of momentum, except in the forward endcap, where it decreases for laboratory momenta below 1 GeV. The average pion misidentification rate is about 2% above 1.2 GeV, rising at lower momenta and reaching a maximum of 9% at 0.8 GeV.
By choosing a fairly loose selection of charged leptons and taking advantage of improved PID algorithms, we increased the lepton efficiencies by 6% for electrons and 50% for muons compared to the previous BABAR analysis [14]. Charged kaons are identified up to 4 GeV on the basis of information from the DIRC, SVT, and DCH. The efficiency exceeds 80% over most of the momentum range and varies with polar angle. The probability that a pion is misidentified as a kaon is close to 2%, varying by about 1% as a function of momentum and polar angle.
The decays K 0 S → π + π − are reconstructed as pairs of tracks of opposite charge originating from a displaced vertex. The invariant mass of the pair m ππ is required to be in the range m ππ ∈ [0.491, 0.506] GeV. No attempt is made to identify interactions of K 0 L in the EMC or IFR. To remove beam-generated background in the EMC and electronic noise, photon candidates are required to have a minimum energy of 30 MeV and a shower shape that is consistent with that of an electromagnetic shower. Neutral pions are reconstructed from pairs of photon candidates with an invariant mass in the range m γγ ∈ [120, 150] MeV.

Simulated Samples
This analysis relies on Monte Carlo (MC) techniques to simulate the production and decay of continuum and BB events. The simulation is based on the EvtGen generator [27]. The qq fragmentation is performed by Jetset [28], and the detector response by Geant4 [29]. Radiative effects such as bremsstrahlung in the detector material and initial-state and final-state radiation [30] are included.
We derive predictions for the distributions and efficiencies of the signal and backgrounds from the simulation. The size of the simulated sample of generic BB events exceeds that of the BB data sample by about a factor of ten, while the sample for qq events corresponds to twice the size of the off-peak data sample. We assume that the Υ (4S) resonance decays exclusively to BB pairs and use recent measurements of branching fractions [12] for all produced particles. The impact of their uncertainties on the final results is assessed as a systematic uncertainty.
Information extracted from studies of selected data control samples is used to improve the accuracy of the simulation. Specifically, we reweight simulated events to account for small differences observed in comparisons of data and simulation (Sec. V).
an HQET-based parameterization [32]. A change to a different FF parameterization is implemented by reweighting the generated events with the weights Here, M(q 2 , θ i ) HQET refers to the matrix element for the FF parameterizations described in Secs. II A 1 and II A 2, and M(q 2 , θ i ) MC is the matrix element employed in the MC generation. The matrix element of decays involving the scalar D meson depends on one angular variable, the lepton helicity angle θ ℓ , with ℓ = e, µ, τ . In addition to θ ℓ , the matrix element of decays involving the vector meson D * is sensitive to two additional angular variables describing the D * decay. The ratio of the branching fractions B MC /B HQET ensures that the sum of all weights equals the number of generated events.
In the SM, this reweighting results in a small shift of the q 2 distribution to higher values, while the changes in the helicity angle θ τ and the τ polarization are negligible. Therefore, the distributions of the secondary charged lepton are not affected.
In the presence of a charged Higgs boson, however, the τ polarization can change substantially, affecting the momentum of the secondary lepton ℓ originating from the τ → ℓν ℓ ν τ decays. We account for the potential presence of a charged Higgs of 2HDM type II by reweighting the simulation with the following weights, where θ i refers again to the angular variables. The second factor represents the ratio of the |p * ℓ | distributions Γ(|p * ℓ |) in the 2HDM parameterization and in the MC simulation. This factorization is necessary because in the MC generation the polarization is handled in a probabilistic manner, so it cannot be corrected on an event-per-event basis. It is only applicable if |p * ℓ | is uncorrelated with q 2 and the angular variables, which is largely the case. In some regions of phase space, the 2HDM weights have a much larger dispersion than the weights applied in the SM reweighting, leading to larger statistical uncertainties for the simulation of the Higgs boson contributions.

Simulation of
By D * * we refer to excited charm resonances heavier than the D * meson. We include in the simulation the B → D * * τ − ν τ and B → D * * ℓ − ν ℓ decays that involve the four D * * states with L = 1 that have been measured [4]. This simulation takes into account their helicities [33] and the following decay modes: are not included in the nominal fit for lack of reliable measurements.
To estimate the rate of B → D * * τ ν τ decays, we rely on ratios of the available phase space Φ, The value of this ratio depends on the mass of the D * * state involved in the B → D * * (τ − /ℓ − )ν decay. We use the largest of the four possible choices, R(D * * ) = 0.18.
Possible contributions from non-resonant B → D ( * ) π(π)ℓ − ν ℓ decays and semileptonic decays involving higher-mass excited charm mesons are not included in the nominal fit, and will be treated as a systematic uncertainty.

IV. EVENT SELECTION
The event selection proceeds in two steps. First, we select BB events in which one of the B mesons, the B tag , is fully reconstructed in a hadronic decay, while the other B meson decays semileptonically. To increase the event selection efficiency compared to earlier analyses, we have added more decay chains to the B tag selection and have chosen a looser charged lepton selection. This leads to significantly higher backgrounds, primarily combinatorial background from BB and continuum events, and charge-crossfeed events. Charge-crossfeed events are B → D ( * ) (τ − /ℓ − )ν decays in which the charge of the reconstructed B tag and D ( * ) mesons are wrong, primarily because of an incorrectly assigned low-momentum π ± .
Semileptonic decays to higher mass charm mesons have a signature similar to that of signal events and their composition is not well measured. This background is fitted in selected control samples that are enriched with these decays.
As the second step in the event selection, we introduce kinematic criteria that increase the fraction of selected signal events with respect to normalization and background decays. We also apply a multivariate algorithm to further improve the signal-to-background ratio.

A. Selection of Events with a Btag and a
Semileptonic B Decay Υ (4S) → BB events are tagged by the hadronic decay of one of the B mesons. We use a semi-exclusive algorithm which includes additional B tag decay chains and enhances the efficiency by a factor of 2 compared to the earlier version employed by BABAR [14]. We look for decays of the type B tag → SX ± , where S refers to a seed meson and X ± is a charged state comprising of up to five hadrons, pions or kaons, among them up to two neutral mesons, π 0 or K 0 S . The seed mesons, D, D * , D s , D * s , and J/ψ , are reconstructed in 56 decay modes. As a result, the B tag is reconstructed in 1,680 different decay chains, which are further subdivided into 2,968 kinematic modes.
To isolate the true tag decays from combinatorial background, we use two kinematic variables: the energy substituted mass m ES = E 2 beam − p 2 tag and the energy difference ∆E = E tag − E beam . Here p tag and E tag refer to the c.m. momentum and energy of the B tag , and E beam is the c.m. energy of a single beam particle. These variables make optimum use of the precisely known energies of the colliding beams. For correctly reconstructed B decays, the m ES distribution is centered at the B-meson mass with a resolution of 2.5 MeV, while ∆E is centered at zero with a resolution of 18 MeV which is dominated by the detector resolution. We require m ES > 5.27 GeV and |∆E| < 0.072 GeV.
For each B tag candidate in a selected event, we look for the signature of the semileptonic decay of the second B meson, a D or D * meson and a charged lepton ℓ. We combine charged B tag candidates with D ( * )0 ℓ − systems and neutral B tag candidates with both D ( * )+ ℓ − and D ( * )− ℓ + systems, where the inclusion of both charge combinations allows for neutral B mixing. We require all charged particles to be associated with the B tag D ( * ) ℓ candidate, but we allow for any number of additional photons in the event.
The laboratory momentum of the electron or muon is required to exceed 300 MeV or 200 MeV, respectively. For D mesons, we reconstruct the following decay modes: The reconstructed invariant mass of D candidates is required to be consistent with the nominal D mass to within four standard deviations (σ). The combined reconstructed branching fractions are 35.8% and 27.3% for D 0 and D + , respectively. We identify D * mesons by their decays D * + → D 0 π + , D + π 0 , and D * 0 → D 0 π 0 , D 0 γ. For these decays, the c.m. momentum of the pion or the c.m. energy of the photon are required to be less than 400 MeV. Furthermore, the mass difference ∆m = m(D * ) − m(D) is required to differ by less than 4σ from the expected value [12].
To further reduce the combinatorial background, we perform a kinematic fit to the event, constraining tracks of secondary charged particles to the appropriate B, D ( * ) , or K 0 S decay vertices. The fit also constrains the reconstructed masses of the D, D * , and K 0 S mesons to their nominal values. The vertex of the Υ (4S) → BB decay has to be compatible with a beam-beam interaction. Candidates for which this fit does not converge are rejected. The m 2 miss resolution improves by about 25% and becomes more symmetric for the remaining candidates.
To select a single BB candidate, we determine E extra = i E γ i , the sum of the energies of all photons that are not associated with the reconstructed BB pair. We only include photons of more than 50 MeV, thereby eliminating about 99% of the beam-generated background. We retain the candidate with the lowest value of E extra , and if more than one candidate survives, we select the one with the smallest |∆E|. This procedure preferentially selects D * ℓ candidates over Dℓ candidates. Thus, we reduce the fraction of misreconstructed events with a D * → D(π/γ) decay for which the pion or photon is not properly assigned to the D * meson.
As a consequence of the rather loose lepton selection criteria and the addition of decay modes with multiple neutral pions and K 0 S for the B tag selection, the number of B tag D ( * ) ℓ candidates per event is very large. To address this problem, we identify the B tag decay modes that contribute primarily to the combinatorial background. Specifically, we determine for each of the 2,968 kinematic modes R tc , the fraction of events for which all charged particles in the B tag final state are correctly reconstructed and associated with the tag decay. This assessment is based on a large sample of simulated BB events equivalent to 700 fb −1 . We observe that for decay chains with low multiplicity final states and no neutral hadrons the signal-to-background ratio (S/B) is very high. For instance, for the this ratio is S/B = 20/145. For this decay mode, typically 3.5 of the 8 B tag final state particles are incorrectly associated with the second B decay in the event or otherwise misidentified. Based on this study, we only retain B tag decay chains with R tc > 0.3. With this criterion, we remove 2100 B tag kinematic modes, eliminate 2/3 of the combinatorial background, and retain 85% of the signal B → D ( * ) τ − ν τ decays. Thanks to this procedure, the average number of candidates per event before single candidate selection is reduced to 1.8 for the D 0 ℓ and D + ℓ samples, and 3.1 and 4.8 for the D * 0 ℓ and D * + ℓ samples, respectively.
B. Selection of the D ( * ) π 0 ℓ Control Samples To constrain the B → D * * (τ − /ℓ − )ν background, we select four D ( * ) π 0 ℓ control samples, identical to the D ( * ) ℓ samples except for an additional reconstructed π 0 . The π 0 is selected in the mass range m γγ ∈ [120, 150] MeV. Decays of the form B → D ( * ) πℓν peak at m 2 miss = 0 in these samples. As a result, we can extract their yields together with the signal and normalization yields by fitting the D ( * ) ℓ and D ( * ) π 0 ℓ samples simultaneously.
More than half of the events in these control samples originate from continuum e + e − → qq(γ) events. Since the fragmentation of light quarks leads to a two-jet event topology, this background is very effectively suppressed by the requirement | cos ∆θ thrust | < 0.8, where ∆θ thrust is the angle between the thrust axes of the B tag and of the rest of the event. Since B mesons originating from Υ (4S) decays are produced just above threshold, their final state particles are emitted almost isotropically, and, Btag charged multiplicity cos ∆θ thrust therefore, the cos ∆θ thrust distribution is uniform. As a result, the loss of B → D * * (τ − /ℓ − )ν decays due to this restriction is significantly smaller than the amount of continuum events rejected.

C. Optimization of the Signal Selection
We introduce criteria that discriminate signal from background, and also differentiate between signal B → D ( * ) τ − ν τ and B → D ( * ) ℓ − ν ℓ decays. For semileptonic decays the minimum momentum transfer is largely determined by the mass of the charged lepton. For decays involving τ leptons, q 2 min = m 2 τ ≃ 3.16 GeV 2 . Thus the selection q 2 > 4 GeV 2 retains 98% of the B → D ( * ) τ − ν τ decays and rejects more that 30% of the B → D ( * ) ℓ − ν ℓ decays. The event sample with q 2 < 4 GeV 2 is dominated by B → D ( * ) ℓ − ν ℓ and serves as a very clean data sample for comparisons with the MC simulation. To reject background from hadronic B decays in which a pion is misidentified as muon, we require |p miss | > 200 MeV, where |p miss | is the missing momentum in the c.m. frame.
To further improve the separation of wellreconstructed signal and normalization decays from various backgrounds, we employ a boosted decision tree (BDT) multivariate method [34]. This method relies on simple classifiers which determine signal and background regions by using binary selections on various input distributions. (f) the mass difference for a D * originating from the B tag , ∆m tag = m(D tag π) − m(D tag ); (g) the charged particle multiplicity of the B tag candidate; and (h) cos ∆θ thrust . The input distributions for one of the BDT selectors are shown in Fig. 4. For the D ( * ) π 0 ℓ samples, we use similar BDT selectors that are trained to reject continuum, D ( * ) (ℓ/τ )ν, and other BB background. After the BDT requirements are applied, the fraction of events attributed to signal in the m 2 miss > 1.5 GeV 2 region, which excludes most of the normalization decays, increases from 2% to 39%. The background remaining in that region is composed of normalization events (10%), continuum (19%), D * * (ℓ/τ )ν events (13%), and other BB events (19%), primarily from B → D ( * ) D ( * )+ s decays with D + s → τ + ν τ .

V. CORRECTION AND VALIDATION OF THE MC SIMULATION
The simulation of the full reconstruction of highmultiplicity events, including the veto of events with extra tracks or higher values of E extra is a rather challenging task. To validate the simulation, we compare simulated distributions with data control samples, and, when necessary, correct the MC simulations for the observed differences. The figures shown in this section combine events from all four channels (D 0 ℓ, D * 0 ℓ, D + ℓ, and D * + ℓ); the observed differences are similar in the individual samples.
The control samples are selected to have little or no contamination from signal decays. Specifically we select, • Continuum events: off-peak data.
• Combinatorial BB and continuum backgrounds: 5.20 < m ES < 5.26 GeV.   The off-peak data sample confirms the m 2 miss distribution of simulated continuum events, but shows discrepancies in the |p * ℓ | spectrum and overall normalization of the simulation [ Fig. 5(a)]. These features are also observed in other control samples, such as on-peak data with high E extra [ Fig. 5(b)]. We correct the simulated |p * ℓ | spectrum and yield of the continuum contribution by reweighting it to match off-peak data, on an event-by-event basis. After this correction, the |p * ℓ | distributions of the expected backgrounds agree well in independent control samples down to low lepton momenta where the misidentification rates are significant [ Fig. 5(c)]. We observe that in the high E extra region, the simulation exceeds data yield by (1.3 ± 0.5)%. This small excess is corrected by decreasing the expected BB background yield by (4.3 ± 1.9)%. After this correction, the simulation provides accurate yield predictions for the backgrounds at intermediate and high E extra . For instance, the ratio of the expected to observed yield of events with m 2 miss > 1.5 GeV 2 is 0.998 ± 0.006. The m 2 miss distributions of the continuum and BB backgrounds are described well in all control samples.
The region of low E extra , which includes the signal region, is more difficult to model, primarily due to low energy photons and K 0 L mesons interacting in the EMC. Figure 5(d) shows that the data in the m ES sideband agree well with the combinatorial background predictions for E extra > 0.5 GeV, but are underestimated for low E extra . This, and small differences in the other BDT input distributions, result in a underestimation of the combinatorial background when the BDT requirements are applied. Based on the 5.20 < m ES < 5.26 GeV sideband, we find scale factors of 1.099 ± 0.019 and 1.047 ± 0.034 for the combinatorial background in the Dℓ and D * ℓ samples, respectively. The uncertainties are given by the statistics of the data and simulated samples. The ratio between the observed and expected m ES distribution is independent of E extra [Figs. 5(e,f)], so we apply these corrections to the continuum and BB backgrounds in the signal region. The same correction is applied to B → D * * (τ − /ℓ − )ν events, which cannot be easily isolated, because their simulated E extra distributions are very similar to those of combinatorial background. These corrections affect the fixed BB and continuum yields in the fit, as well as the relative efficiency of B → D * * (τ − /ℓ − )ν events in the D ( * ) ℓ and D ( * ) π 0 ℓ samples. As a result, these corrections are the source of the dominant systematic uncertainties.
Relying on the q 2 ≤ 4 GeV 2 control sample, where B → D ( * ) ℓ − ν ℓ decays account for 96% of the events, we correct the E extra distribution and an 8.5% overestimation of the simulated normalization events. We apply the same correction to simulated signal events which are expected to have a similar E extra distribution. This procedure does not affect the relative efficiency of signal to normalization events, so it has a very small impact on the R(D ( * ) ) measurements.
We use the same q 2 ≤ 4 GeV 2 control sample to compare and validate the |p * ℓ | distributions of B → D ( * ) ℓ − ν ℓ events. We observe that the m 2 miss resolution of the narrow peaks at m 2 miss = 0 is slightly underestimated by the simulation. This effect is corrected by convolving the simulated distributions with a Gaussian resolution function, for which the width is adjusted by iteration.

A. Overview
We extract the signal and normalization yields from an extended, unbinned maximum-likelihood fit to twodimensional m 2 miss -|p * ℓ | distributions. The fit is performed simultaneously to the four D ( * ) ℓ samples and the four D ( * ) π 0 ℓ samples. The distribution of each D ( * ) ℓ and D ( * ) π 0 ℓ sample is fit to the sum of eight or six contributions, respectively. Each of the 4 × 8 + 4 × 6 = 56 contributions is described by a probability density function (PDF). Their relative scale factor determines the number of events from each source. Tables II and III summarize the contributions to the fit for the four D ( * ) ℓ sample and the four D ( * ) π 0 ℓ samples. These tables also list the relative yield for each contribution as estimated from MC simulation (for SM signal), and specify whether the yield is free, fixed, or constrained in the fit. We introduce the following notation to uniquely identify each contribution to the fit: source ⇒ sample. For instance, D * 0 τ ν ⇒ D * 0 ℓ refers to signal D * 0 τ ν decays that are correctly reconstructed in the D * 0 ℓ sample, while D * 0 τ ν ⇒ D 0 ℓ refers to the same decays, but incorrectly reconstructed in the D 0 ℓ sample. We refer to the latter as feed-down. Contributions of the form D(τ /ℓ)ν ⇒ D * (τ /ℓ) and D ( * ) (τ /ℓ)ν ⇒ D * * (τ /ℓ) are referred to as feed-up.
The contributions from the continuum, BB, and crossfeed backgrounds, with the exception of BB background in the D ( * ) π 0 ℓ samples, are fixed to the yields determined by MC simulation after small adjustments based on data control regions. The yields of the remaining 36 contributions are determined in the fit. Some of these contributions share the same source and therefore the ratio of their yields is constrained to the expected value, e.g., D * 0 τ ν ⇒ D * 0 ℓ and D * 0 τ ν ⇒ D 0 ℓ. Of special importance are the constraints linking the D * * (ℓ/τ )ν yields in the D ( * ) ℓ samples (N D * * ⇒D ( * ) ) to the yields in the D ( * ) π 0 ℓ samples (N D * * ⇒D ( * ) π 0 ), Given that these constraints share the same source, f D * * is equivalent to the ratio of the D * * (ℓ/τ )ν reconstruction efficiencies for the two samples. Taking into account the constraints imposed on event yields from a common source, there are 22 free parameters in the standard fit, as listed in Table IV. In addition, we perform a fit in which we impose the isospin relations R(D 0 ) = R(D + ) ≡ R(D) and R(D * 0 ) = R(D * + ) ≡ R(D * ). We choose not to impose isospin relations for the D ( * ) π 0 ℓ samples. Consequently, this fit has a total of 17 free parameters.
• The fixed charge cross-feed yields are updated based on the deviation of the fitted D ( * ) ℓν yields from the expected values.
• The continuum, BB, and D * * (ℓ/τ )ν background corrections are recalculated. They have a slight dependence on the fitted D ( * ) ℓν events because some of these events extend into the m ES sideband.
• The correction to the m 2 miss resolution of the normalization contributions is readjusted.
• The two feed-down constraints for D * τ ν are updated using the fitted feed-down constraints for the normalization contributions in the following way: The iterations continue until the change on the values of R(D ( * ) ) is less than 0.01%. The update of the feed-down rates has a significant impact on the fits to the D 0 and D + samples because of the large signal feed-down. The other iterative updates have only a marginal impact.

B. Probability Density Functions and Validation
The fit relies on 56 PDFs, which are derived from MC samples of continuum and BB events equivalent to 2 and 9 times the size of the data sample, respectively. The two-dimensional m 2 miss -|p * ℓ | distributions for each of the 56 contributions to the fit are described by smooth nonparametric kernel estimators [35]. These estimators enter a two-dimensional Gaussian function centered at the m 2 miss and |p * ℓ | values of each simulated event. The width of the Gaussian function determines the smoothness of the PDF. We find the optimum level of global smoothing with a cross-validation algorithm [36]. For PDFs that have variations in shape that require more than one level of smoothing, we combine estimators with different TABLE II. Contributions to the four D ( * ) ℓ samples. The expected relative abundance of events in each data sample is represented by fexp. The columns labeled Yield indicate whether the contribution is free in the fit, fixed, or linked to another component through a cross-feed constraint. The charged cross-feed components, marked with Fix./It., are fixed in the fit, but updated in the iterative process.     Gaussian widths in up to four areas in the m 2 miss -|p * ℓ | space. For instance, we use different levels of smoothing in the D * 0 ℓν ⇒ D * 0 ℓ contribution for the narrow peak at m 2 miss = 0 and the smooth m 2 miss tail that extends up to 7 GeV 2 . Figure 6 shows one-dimensional projections of five two-dimensional PDFs. The bands indicate the statistical uncertainty on the PDFs estimated with a bootstrap algorithm [36].
The m 2 miss distributions of signal and normalization are very distinct due to the different number of neutrinos in the final state. The m 2 miss distributions of the backgrounds resemble those of the signal, and therefore these contributions to the fit are either fixed or constrained by the D ( * ) π 0 ℓ samples.
To validate the PDFs and the fit procedure, we divide the large sample of simulated BB events into two: sample A with about 3.3×10 9 BB events, and sample B with 9.4 × 10 8 BB events. We determine the PDFs with sample A, and create histograms by integrating the PDFs in bins of their m 2 miss and |p * ℓ | projections. We compare the resulting histograms with the events in sample A, and derive a χ 2 based on the statistical significance of the difference for each bin. The distribution of the corresponding p values for these PDFs is uniform, as expected for an unbiased estimation. As another test, we extract the signal and normalization yields from fits to the events of sample B, using the PDFs obtained from sample A. Again, the results are compatible with an unbiased fit. Furthermore, we validate the fit procedure based on a large number of pseudo experiments generated from these PDFs. Fits to these samples also show no bias in the extracted signal and normalization yields.

Figures 7 and 8 show the m 2
miss and |p * ℓ | projections of the fits to the D ( * ) ℓ samples. In Fig. 7, the |p * ℓ | projections do not include events with m 2 miss > 1 GeV 2 , i.e., most of the signal events. In Fig. 8, the vertical scale is enlarged and the horizontal axis is extended for the m 2 miss projection to reveal the signal and background contributions. The |p * ℓ | projections emphasize the signal events by excluding events with m 2 miss < 1 GeV 2 . Both figures demonstrate that the fit describes the data well and the observed differences are consistent with the statistical and systematic uncertainties on the PDFs and the background contributions. Figure 9 shows the m 2 miss and |p * ℓ | projections of the fit to the four D ( * ) π 0 ℓ samples. The narrow m 2 miss peak is described well by the fit. It tightly constrains contributions from B → D ( * ) πℓν decays, including the nonresonant D ( * ) π states as well as decays of D * * states, narrow or wide. There appears to be a small excess of events in the data for 1 < m 2 miss < 2 GeV 2 . This might be an indication for an underestimation of the D * * (ℓ/τ )ν background. The impact of this effect is assessed as a systematic uncertainty.
The fit determines, for each signal decay mode, the number of signal events in the data sample, N sig , and the corresponding number of normalization events, N norm . We derive the ratios of branching fractions as where ε sig /ε norm is the ratio of efficiencies (including the τ ± branching fractions) taken from MC simulation. These relative efficiencies are larger for R(D) than for R(D * ), because the q 2 > 4 GeV 2 requirement rejects a larger fraction of B → Dℓ − ν ℓ decays than of B → D * ℓ − ν ℓ decays, while keeping almost 100% of B → D ( * ) τ − ν τ decays.. The results of the fits in terms of the number of events, the efficiency ratios, and R(D ( * ) ) are listed in Table VIII Table V lists the systematic uncertainties considered in this analysis, as well as their correlations in the measurements of R(D) and R(D * ). We distinguish two kinds of uncertainties that affect the measurement of R(D ( * ) ): additive uncertainties which impact the signal and background yields and thereby the significance of the results, and multiplicative uncertainties that affect the ε sig /ε norm ratios and, thus, do not change the significance. The limited size of the simulated signal and background samples impact both additive and multiplicative uncertainties.

A. Additive uncertainties:
Additive uncertainties affect the results of the fit. To asses their impact, we vary the source of uncertainty 1000 times following a given distribution, and repeat the fit for each variation. We adopt as the uncertainty the standard deviation of the distribution of the resulting R(D ( * ) ) values. From this ensemble of fits, we also estimate the correlation between the uncertainties of R(D) and R(D * ).

PDF Estimation
MC statistics: We employ a bootstrap algorithm [36] to estimate the uncertainty due to the limited size of the simulated event samples on which we base the 56 PDFs. We generate 1000 samples of simulated events by sampling the original MC sample with replacement [37]. The PDFs are recalculated with each bootstrapped sample, and the fit is repeated for each set of PDFs. Figure 6 shows the 1σ and 2σ bands for the projections of five selected PDFs. The impact on the final result is 4.4% for R(D) and 2.0% for R(D * ). Form factors for B → D ( * ) (τ − /ℓ − )ν: We estimate the impact on the signal and normalization PDFs due to the uncertainties on the FF parameters, ρ 2 D , ∆, ρ 2 D * , R 0 (1), R 1 (1), and R 2 (1), taking into account their uncertainties and correlations. We recalculate the D ( * ) τ ν and D ( * ) ℓν PDFs with each set of 1000 Gaussian variations of the parameter values, and repeat the fit with each set of PDFs to determine the impact on R(D ( * ) ).
D * * → D ( * ) (π 0 /π ± ) fraction: The simulation of D * * (ℓ/τ )ν decays only includes the two-body decays D * * → D ( * ) π of the four L = 1 charm meson states. The ratio of D * * → D ( * ) π 0 decays to D * * → D ( * ) π ± decays which is fixed by isospin relations has a signifi-cant impact on the PDFs, because D * * → D ( * ) π 0 decays result in a sharply peaked m 2 miss distribution for the D ( * ) π 0 ℓ samples. The measured uncertainty on the π 0 detection efficiency is 3%. We assume a 4% uncertainty to the probability that a low momentum charged pion from D * * → D ( * ) π ± decays is misassigned to the B tag decay. Combining these two uncertainties, we arrive at an uncertainty on the relative proportion of the two-body decays of D * * of 5%. We repeat the fit increasing and decreasing this ratio by 5%, and adopt the largest variation of the isospin-constrained fit results as the systematic uncertainty.
B → D * * ℓ − ν ℓ branching fractions: Since decays to the four D * * states are combined in the B → D ( * ) (τ − /ℓ − )ν samples, the PDFs depend on the relative B → D * * ℓ − ν ℓ branching fractions for the four L = 1 states [4]. The impact of the branching fraction uncertainties is assessed by recalculating the B → D ( * ) (τ − /ℓ − )ν PDFs and adopting the variation of the fit results from the ensemble of PDFs as the uncertainty. B → D * * (τ − /ℓ − )ν branching fractions: As noted above, the sharp peak in the m 2 miss distribution of the D ( * ) π 0 ℓ samples constrains contributions from B → D ( * ) πℓν decays. Events with additional unreconstructed particles contribute to the tail of the m 2 miss distribution and, thus, are more difficult to separate from other backgrounds and signal events. This is the case for B → D * * τ − ν τ decays, which are combined with B → D * * ℓ − ν ℓ decays in the D * * (ℓ/τ )ν PDFs with the relative proportion R(D * * ) PS = 0.18. This value has been derived from the ratio of the available phase space. The same estimate applied to B → D ( * ) ℓ − ν ℓ decays results in R(D) PS = 0.279 and R(D * ) PS = 0.251, values that are 58% and 32% smaller than the measured values. Taking this comparison as guidance for the error on R(D * * ), we increase R(D * * ) by 50%, recalculate the D * * (ℓ/τ )ν PDFs, and repeat the fit. As a result, the values of R(D) and R(D * ) decrease by 1.8% and 1.7%, respectively. The impact is relatively small, because B → D * * τ − ν τ con-tributions are small with respect to signal decays, which have much higher reconstruction efficiencies.
Unmeasured B → D * * (→ D ( * ) ππ)ℓν ℓ decays: To assess the impact of other potential B → D * * ℓ − ν ℓ contributions, we modify the standard fit by adding an additional component. Out of the four contributions listed in Table VI, the three-body decays of the D * * states with L = 1 give the best agreement in the fits to the D ( * ) π 0 ℓ samples. For this decay chain, the m 2 miss distribution has a long tail due to an additional undetected pion. This could account for some of the observed excess at 1 < m 2 miss < 2 GeV 2 in Fig. 9. We assign the observed change in R(D ( * ) ) as a systematic uncertainty.

Cross-feed Constraints
MC statistics: Constraints on the efficiency ratios that link contributions from the same source are taken from MC simulation. The impact of their statistical uncertainty is assessed by varying the simulated event yields assuming Poisson errors.
The ratios f D * * : We assess the uncertainty on f D * * , the constraints linking the D * * (ℓ/τ )ν yields in the D ( * ) ℓ and D ( * ) π 0 ℓ samples, by estimating the relative efficiencies of the selection criteria that differ in the two samples. The main differences in the selection of these samples are due to differences in the D ( * ) ℓ and D ( * ) π 0 ℓ BDTs.
In the D ( * ) ℓ samples, we observed that differences between data and simulation cause a 5%-10% underestimation of the continuum and BB backgrounds after the BDT requirements are applied. Since the D * * (ℓ/τ )ν contributions have similar E extra distributions, and these distributions are the key inputs to the BDTs, we applied the same 5%-10% corrections to these contributions. We conservatively assign 100% of this correction as the systematic uncertainty on the D * * (ℓ/τ )ν efficiency in the D ( * ) ℓ samples.
Since B → D * * (τ − /ℓ − )ν decays are difficult to isolate in samples other than the D ( * ) π 0 ℓ control samples, we estimate the uncertainty on the D * * (ℓ/τ )ν efficiency due to the D ( * ) π 0 ℓ BDT selection by relying on the observed data-MC difference of the BDT selection efficiency for the D ( * ) ℓν sample. We assign the full 8.5% overestimate of the D ( * ) ℓν contribution as the systematic uncertainty on the D * * (ℓ/τ )ν efficiency in the D ( * ) π 0 ℓ samples.
The f D * * constraints also depend on the relative branching fractions of the four B → D * * ℓ − ν ℓ decays that are combined in the D * * (ℓ/τ )ν contributions. We estimate their impact on f D * * from the branching fraction variations observed in the evaluation of the PDF uncertainty. The largest standard deviation for the four f D * * distributions is 1.8%.
By adding the uncertainties on f D * * described above in quadrature, we obtain total uncertainties of 13.2% for the D samples, and 10.0% for the D * samples. Given that there are similarities between the BDT selections applied to the D and D * samples, we adopt a 50% correlation between their uncertainties. With these uncertainties and correlations, we derive the total impact on the results, 5.0% for R(D) and 2.0% for R(D * ).
Feed-down constraints: The feed-down constraints of the signal yields are corrected as part of the iteration of the fit. The uncertainties on these corrections are given by the statistical uncertainty on the ratios of the fitted D * ℓν ⇒ D * ℓ and D * ℓν ⇒ Dℓ yields. They are 2.4% and 4.4% on the D * 0 τ ν and D * + τ ν feed-down constraints, respectively.

Feed-up constraints:
We estimate the uncertainty on the Dτ ν and Dℓν feed-up constraints as 100% of the corrections on the feed-down constraints. This results in 6.8% on the D 0 (ℓ/τ )ν feed-up and 9.9% on the D + (ℓ/τ )ν feed-up. These two effects combined lead to an uncertainty of 1.3% on R(D) and 0.4% on R(D * ).
Isospin constraints: In the isospin-constrained fit, we employ five additional constraints to link the signal and normalization yields of the samples corresponding to B − and B 0 decays. Since we reweight these contributions with the q 2 ≤ 4 GeV 2 control sample, the uncertainty on the isospin constraints is given by the statistical uncertainty on the ratios of the q 2 ≤ 4 GeV 2 yields. This uncertainty is 3.4% in the Dℓ samples and 3.6% in the D * ℓ samples. This translates into uncertainties of 1.2% on R(D) and 0.3% on R(D * ).

Fixed Background Contributions
MC statistics: The yields of the continuum, BB, and cross-feed backgrounds are fixed in the fit. The uncertainty due to the limited size of the MC samples is estimated generating Poisson variations of these yields, and repeating the fit with each set of values. A significant part of this uncertainty is due to the continuum yields, since the size of simulated continuum sample is equivalent to only twice the data sample, Efficiency corrections: To account for the correlations among the various corrections applied to the continuum and BB backgrounds, we follow this multi-step procedure: • We vary the continuum corrections within their statistical uncertainties of 3%-9% , given by the number of events in the off-peak data control samples.
• The branching fractions of the most abundant decays in the BB background are varied within their uncertainties [12].
• The BB correction is reestimated in the high E extra control sample, and varied within the statistical uncertainty of 1.9%.
• The BDT bias corrections are reestimated in the m ES sideband, and varied within their statistical uncertainties, 2.1% in the Dℓ samples and 3.6% in the D * ℓ samples.
• The BB background PDFs are recalculated.
• The fit is repeated for each set of PDF and yield variations. Table VII shows the size of the continuum and BB backgrounds and their uncertainties due to the limited size of the MC samples and the various corrections implemented by comparisons with control samples.

B. Multiplicative Uncertainties
MC statistics: The relative efficiency ε sig /ε norm is estimated as the ratio of expected yields, so the limited size of the MC samples contributes to its uncertainty. We estimate it assuming Poisson errors on the MC yields. Form factors for B → D ( * ) (τ − /ℓ − )ν: The q 2 > 4 GeV 2 requirement introduces some dependence on the FF parameterization. This uncertainty is assessed based on the effect of the FF variations calculated for the uncertainty on the PDFs. π 0 /π ± from D * → Dπ: There is a significant momentum-dependent uncertainty on the reconstruction efficiency of soft pions originating from D * → Dπ decays. However, the momentum spectra of soft pions in signal and normalization decays are rather similar, see Fig. 10. As a result, the uncertainty on R(D ( * ) ) is less than 0.1%.
Detection and Reconstruction: Given that signal and normalization decays are reconstructed by the same particles in the final state, many of the uncertainties that impact their efficiencies cancel in the ratios ε sig /ε norm . Uncertainties due to final-state radiation, soft-pion reconstruction, and others related to the detector performance contribute less than 1%. Similarly, the tagging efficiency for events with signal and normalization decays show only very small differences.

C. Correlations
Even though several of the uncertainties listed in Table V have the same source, their impact on R(D ( * ) ) is largely uncorrelated, i.e., the correlation between uncertainties in different rows of Table V is negligible. However, the correlation between the uncertainties on R(D) and R(D * ) (different columns) is significant, and important for the comparison of these measurements with theoretical predictions.
For most of the additive systematic uncertainties, we estimate the correlation from the two-dimensional R(D)-R(D * ) distribution resulting from the fit variations. This is not possible for the D * * → D ( * ) π 0 /π ± and D * * → D ( * ) ππ uncertainties. These uncertainties affect the size of the D * * (ℓ/τ )ν background in the D ( * ) ℓ samples in the same way that as f D * * does. Thus, we derive their correlations from the f D * * correlations. Since the signal and D * * τ ν PDFs are very similar, we assign a 100% correlation on B(B → D * * τ − ν τ ).
The multiplicative uncertainties on the efficiency due to the MC statistics are uncorrelated. The FFs for B → Dℓ − ν ℓ and B → D * ℓ − ν ℓ decays are measured separately, so their uncertainties are also not correlated. The uncertainty on B(τ − → ℓ −ν ℓ ν τ ) affects all channels equally. We assume that the remaining small uncertainties on the efficiencies due to detector effects are 100% correlated as well.
The uncertainties and their correlations are listed in Table V. We combine these correlations ρ i and the uncertainties by adding their covariance matrices, Here, σ i and σ * i refer to the uncertainties on R(D) and R(D * ), respectively.

A. Stability tests
We have checked the stability of the fit results for different data subsamples and different levels of background suppression.
To look for possible dependence of the results on the data taking periods, we divide the data sample into four periods corresponding to approximately equal luminosity, and fit each sample separately. The results are presented in Fig. 11. The eight measurements each for R(D) and R(D * ), separately for B + and B 0 , are compared to the isospin-constrained fit results obtained from the complete data sample. Based on the values of χ 2 for 7 degrees of freedom, we conclude that the results of these fits are statistically consistent with the fit to the whole data sample.
A similar test is performed for two samples identified by the final state lepton, an electron or a muon. This test includes the uncertainties on the background corrections that affect the electron and muon samples differently. These uncertainties are statistically dominated and, thus, independent for both samples. The results are presented in the bottom panels of Fig. 11. The χ 2 tests confirm the stability of these measurements within the uncertainties.
To assess the sensitivity of the fit results on the purity of the data sample and the BDT selection, we perform fits for samples selected with different BDT requirements. We identify each sample by the relative number of events in the signal region (m 2 miss > 1 GeV 2 ) with respect to the nominal sample, which is labeled as the 100% sample. BDT bias correction and the PDFs are recalculated for each sample. Figure 12 shows the results of fits to the different samples with tighter and looser BDT requirements. We take into account the large correlations between these nested samples and conclude that the results are stable for the very large variations of the BDT requirements.

B. Gaussian Uncertainties
For a maximum likelihood fit with Gaussian uncertainties, the logarithm of the likelihood is described by the parabola P (Y ) = (Y − Y fit ) 2 /2σ 2 fit , where Y fit is the fitted yield and σ fit is the uncertainty on Y fit . Figure  13 compares the likelihood scan of the signal yields for the isospin-constrained fit with the parabola that results from the fitted yields, presented in Table VIII. There is a slight asymmetry in the likelihood function, but good agreement overall. Thus, we conclude that the statistical uncertainty on R(D) and R(D * ) may be considered Gaussian. Figure 14 shows the effect on R(D) and R(D * ) from variations on f D * * , the largest source of systematic uncertainty. The distributions are well described by a Gaussian function. This is also the case for the other major sources of systematic uncertainty.

C. Kinematic Distributions
We further study the results of the fit by comparing the kinematic distributions of data events with the SM expectations. Specifically, we focus on the signal-enriched region with m 2 miss > 1.5 GeV 2 and scale each component in the simulation by the results of the fits. To compare the data and MC distributions we calculate a χ 2 per degree of freedom which only includes the statistical uncertainty of bins with 8 or more events. The number of degrees of freedom is given by the number of bins minus the number of fitted signal yields. Figure 15 shows the E extra distribution of events in the D ( * ) ℓ samples. This variable is key in the BDT selection and overall background suppression. There is a clear enhancement of signal events at E extra = 0 in all four D ( * ) ℓ samples. The background contributions, which are significantly more uniform in E extra than those of signal, appear to be well reproduced. We conclude that the simulation agrees well with the data distribution. Figure 16 also shows clear signal enhancements in the m ES and |p * ℓ | distributions of events in the m 2 miss > 1.5 GeV 2 region. The data and simulation agree well within the limited statistics. Table VIII shows the results of the measurement of R(D) and R(D * ) extracted from the fit without and with isospin constraints linking B + and B 0 decays.

A. Comparison with SM expectations
The B → D ( * ) τ − ν τ branching fractions are calculated from the measured values of R(D ( * ) ), For B − , we use the average branching fractions measured by BABAR [39][40][41], and for B 0 , the corresponding branching fractions related by isospin.
We estimate the statistical significance of the measured signal branching fractions as Σ stat = 2∆(lnL), where ∆(lnL) is the increase in log-likelihood for the nominal fit relative to the no-signal hypothesis. The total significance Σ tot is determined as In this expression, the statistical significance is scaled by the sum of the statistical uncertainty σ stat and the additive systematic uncertainty σ asys . The significance of the B → Dτ − ν τ signal is 6.8σ, the first such measurement exceeding 5σ. We compare the measured R(D ( * ) ) to the calculations based on the SM, R(D) exp = 0.440 ± 0.072 R(D * ) exp = 0.332 ± 0.030, R(D) SM = 0.297 ± 0.017 R(D * ) SM = 0.252 ± 0.003, and observe an excess over the SM predictions for R(D) and R(D * ) of 2.0σ and 2.7σ, respectively. We combine these two measurements in the following way where ∆ ( * ) = R(D ( * ) ) exp − R(D ( * ) ) th , and ρ is the total correlation between the two measurements, ρ(R(D), R(D * )) = −0.27. Since the total uncertainty is dominated by the experimental uncertainty, the expression in Eq. 30 is expected to be distributed as a χ 2 distribution for two degrees of freedom. Figure 17 shows this Events/(50 MeV) TABLE VIII. Results of the isospin-unconstrained (top four rows) and isospin-constrained fits (last two rows). The columns show the signal and normalization yields, the ratio of their efficiencies, R(D ( * ) ), the signal branching fractions, and Σstat and Σtot, the statistical and total significances of the measured signal yields. Where two uncertainties are given, the first is statistical and the second is systematic. The second and third uncertainties on the branching fractions B(B → D ( * ) τ − ντ ) correspond to the systematic uncertainties due to R(D ( * ) ) and B(B → D ( * ) ℓ − ν ℓ ), respectively. The stated branching fractions for the isospin-constrained fit refer to B − decays.

Decay
Nsig  For the assumption that R(D ( * ) ) th = R(D ( * ) ) SM , we obtain χ 2 = 14.6, which corresponds to a probability of 6.9 × 10 −4 . This means that the possibility that the measured R(D) and R(D * ) both agree with the SM predictions is excluded at the 3.4σ level [42]. Recent calculations [7,8,43,44] have resulted in values of R(D) SM that slightly exceed our estimate. For the largest of those values, the significance of the observed excess decreases to 3.2σ.

B. Search for a charged Higgs
To examine whether the excess in R(D ( * ) ) can be explained by contributions from a charged Higgs boson in the type II 2HDM, we study the dependence of the fit results on tanβ/m H + .
For 20 values of tanβ/m H + , equally spaced in the [0.05, 1.00] GeV −1 range, we recalculate the eight signal PDFs, accounting for the charged Higgs contributions as described in Sec. II. Figure 18 shows the m 2 miss and |p * ℓ | projections of the D 0 τ ν ⇒ D 0 ℓ PDF for four values of tanβ/m H + . The impact of charged Higgs contributions on the m 2 miss distribution mirrors those in the q 2 distri- bution, see Fig. 3, because of the relation The changes in the |p * ℓ | distribution are due to the change in the τ polarization.
We recalculate the value of the efficiency ratio ε sig /ε norm as a function of tanβ/m H + (see Fig. 19). The efficiency increases up to 8% for large values of tanβ/m H + , and, as we noted earlier, its uncertainty increases due to the larger dispersion of the weights in the 2HDM reweighting.
The variation of the fitted signal yields as a function of tanβ/m H + is also shown in Fig. 19. The sharp drop in the B → Dτ − ν τ yield at tanβ/m H + ≈ 0.4 GeV −1 is due to the large shift in the m 2 miss distribution which occurs when the Higgs contribution begins to dominate the total rate. This shift is also reflected in the q 2 distribution and, as we will see in the next section, the data do not support it. The change of the B → D * τ − ν τ yield, mostly caused by the correlation with the B → Dτ − ν τ sample, is much smaller. Figure 20 compares the measured values of R(D) and R(D * ) in the context of the type II 2HDM to the theoretical predictions as a function of tanβ/m H + . The increase in the uncertainty on the signal PDFs and the efficiency ratio as a function of tanβ/m H + are taken into account. Other sources of systematic uncertainty are kept constant in relative terms.
The measured values of R(D) and R(D * ) match the predictions of this particular Higgs model for tanβ/m H + = 0.44±0.02 GeV −1 and tanβ/m H + = 0.75± 0.04 GeV −1 , respectively. However, the combination of R(D) and R(D * ) excludes the type II 2HDM charged Higgs boson at 99.8% confidence level for any value of tanβ/m H + , as illustrated in Fig. 21. This calculation is only valid for values of m H + greater than 15 GeV [5,8]. The region for m H + ≤ 15 GeV has already been excluded  by B → X s γ measurements [22], and therefore, the type II 2HDM is excluded in the full tanβ-m H + parameter space.
The excess in both R(D) and R(D * ) can be explained in more general charged Higgs models [44][45][46][47]. The effective Hamiltonian for a type III 2HDM is where S L and S R are independent complex parameters, and P L,R ≡ (1 ∓ γ 5 )/2. This Hamiltonian describes the most general type of 2HDM for which m 2 H + ≫ q 2 . In this context, the ratios R(D ( * ) ) take the form The sign difference arises because B → Dτ − ν τ decays probe scalar operators, while B → D * τ − ν τ decays are sensitive to pseudo-scalar operators. The type II 2HDM corresponds to the subset of the type III 2HDM parameter space for which S R = −m b m τ tan 2 β/m 2 H + and S L = 0. The R(D ( * ) ) measurements in the type II 2HDM context correspond to values of S R ±S L in the range [−7.4, 0]. Given that the amplitude impacted by NP contributions takes the form we can extend the type II results to the full type III parameter space by using the values of R(D ( * ) ) obtained with H s (S R ± S L ) for H s (−S R ∓ S L ). Given the small tanβ/m H + dependence of R(D * ) (Fig. 20), this is a good approximation for B → D * τ − ν τ decays. For B → Dτ − ν τ decays, this is also true when the decay amplitude is dominated either by SM or NP contributions, that is, for small or large values of |S R + S L |. The shift in the m 2 miss and q 2 spectra, which results in the 40% drop on the value of R(D) shown in Fig. 20, occurs in the intermediate region where SM and NP contributions are comparable. In this region, H s (S R + S L ) = H s (−S R − S L ), and, as a result, the large drop in R(D) is somewhat shifted. However, given that the asymptotic values of R(D) are correctly extrapolated, R(D) is monotonous, and the measured value of R(D * ) is fairly constant, the overall picture is well described by the H s (S R ± S L ) ≈ H s (−S R ∓ S L ) extrapolation. Figure 22 shows that for real values of S R and S L , there are four regions in the type III parameter space that can explain the excess in both R(D) and R(D * ). In addition, a range of complex values of the parameters are also compatible with this measurement.
C. Study of the q 2 spectra As shown in Sec. II B, the q 2 spectrum of B → Dτ − ν τ decays could be significantly impacted by charged Higgs contributions. Figure 23 compares the q 2 distribution of background subtracted data, corrected for detector efficiency, with the expectations of three different scenarios. Weighted events/(0.50 GeV Due to the subtraction of the large B → D * τ − ν τ feeddown in the Dℓ samples, the measured q 2 spectrum of B → Dτ − ν τ decays depends on the signal hypothesis. This dependence is very small, however, because the q 2 spectrum of B → D * τ − ν τ decays is largely independent of tanβ/m H + . The measured q 2 spectra agree with the SM expectations within the statistical uncertainties. For B → Dτ − ν τ decays, there might be a small shift to lower values, which is indicated by the increase in the p value for tanβ/m H + = 0.30 GeV −1 . As we showed in Sec. II B, the average q 2 for tanβ/m H + = 0.30 GeV −1 shifts to lower values because the charged Higgs contribution to B → Dτ − ν τ decays, which always proceeds via an Swave, interferes destructively with the SM S-wave. As a result, the decay proceeds via an almost pure P -wave and is suppressed at large q 2 by a factor of p 2 D , thus improving the agreement with data. The negative interference suppresses the expected value of R(D) as well, however, so the region with small tanβ/m H + is excluded by the measured R(D).
The two favored regions in Fig. 22 with S R +S L ∼ −1.5 correspond to tanβ/m H + = 0.45 GeV −1 for B → Dτ − ν τ decays. However, as we saw in Fig. 3, the charged Higgs contributions dominate B → Dτ − ν τ decays for values of tanβ/m H + > 0.4 GeV −1 and the q 2 spectrum shifts significantly to larger values. The data do not appear to support this expected shift to larger values of q 2 .
To quantify the disagreement between the measured and expected q 2 spectra, we conservatively estimate the systematic uncertainties that impact the distributions shown in Fig. 23 (Appendix). Within these uncertainties, we find the variation that minimizes the χ 2 value of those distributions. Table IX shows that, as expected, the con- servative uncertainties give rise to large p values in most cases. However, the p value is only 0.4% for B → Dτ − ν τ decays and tanβ/m H + = 0.45 GeV −1 . Given that this value of tanβ/m H + corresponds to S R + S L ∼ −1.5, we exclude the two solutions at the bottom of Fig. 22 with a significance of at least 2.9σ.
The other two solutions corresponding to S R +S L ∼ 0.4 do not impact the q 2 distributions of B → Dτ − ν τ to the same large degree, and, thus, we cannot exclude them with the current level of uncertainty. However, these solutions also shift the q 2 spectra to larger values due to the S-wave contributions from the charged Higgs boson, so the agreement with the measured spectra is worse than in the case of the SM. This is also true for any other solutions corresponding to complex values of S R and S L .
On the other hand, contributions to B → Dτ − ν τ decays proceeding via P -wave tend to shift the expected q 2 spectra to lower values. Thus, NP processes with spin 1 could simultaneously explain the excess in R(D ( * ) ) [44,45] and improve the agreement with the measured q 2 distributions.

X. CONCLUSIONS
In summary, we have measured the ratios R(D ( * ) ) = B(B → D ( * ) τ − ν τ )/B(B → D ( * ) ℓ − ν ℓ ) based on the full BABAR data sample, resulting in R(D) = 0.440 ± 0.058 ± 0.042, R(D * ) = 0.332 ± 0.024 ± 0.018, where the first uncertainty is statistical and the second is systematic. These results supersede the previous BABAR measurements [14]. Improvements of the event selection have increased the reconstruction efficiency of signal events by more than a factor of 3, and the overall statistical uncertainty has been reduced by more than a factor of 2. Table X shows the results of previous B → D ( * ) τ − ν τ analyses. In 2007 and 2010, the Belle collaboration measured the absolute B → D ( * ) τ − ν τ branching fractions which we translate to R(D ( * ) ) with B(B − → D 0 ℓ − ν ℓ ) = (2.26 ± 0.11)% [12] and B(B 0 → D * + ℓ − ν ℓ ) = (4.59 ± 0.26)% [48]. For the translation of R(D * ), we choose Belle's measurement of the branching fraction, instead of the world average, because of the current large spread of measured values. For Belle 2009, we average the results for B 0 and B − decays.
The values measured in this analysis are compatible with those measured by the Belle Collaboration, as illustrated in Fig. 24.
The results presented here exceed the SM predictions of R(D) SM = 0.297±0.017 and R(D * ) SM = 0.252±0.003 by 2.0σ and 2.7σ, respectively. The combined significance of this disagreement, including the negative correlation between R(D) and R(D * ), is 3.4σ. Together with the measurements by the Belle Collaboration, which also exceed the SM expectations, this could be an indication of NP processes affecting B → D ( * ) τ − ν τ decays.
These results are not compatible with a charged Higgs boson in the type II 2HDM, and, together with B → X s γ measurements, exclude this model in the full tanβ-m H + parameter space. More general charged Higgs models, or NP contributions with nonzero spin, are compatible with the measurements presented here.

Dℓ
Dℓ Dℓ  Center: q 2 distributions for events with m 2 miss < 1.5 GeV 2 scaled to the results of the isospin-constrained fit for the SM. See Fig. 15 for a legend. Right: q 2 dependence of the efficiency. The scale for the efficiency of the normalization decays is chosen so that the maximum value is 1. The efficiency data for the signal are adjusted so that they overlap with the data for normalization decays in the central part of the q 2 range. The signal efficiencies with and without the m 2 miss selection have the same scale.

APPENDIX: SYSTEMATIC UNCERTAINTIES ON THE q 2 SPECTRA
To assess the systematic uncertainty on the measured q 2 distributions of B → D ( * ) τ − ν τ decays, we exam-ine their sensitivity to the estimated contributions from background and normalization events. The q 2 distributions of signal and the various backgrounds are presented in Fig. 25 (left). There is good agreement between the data and the background contributions as derived from the isospin-constrained fit. To further examine the shape of the fixed contributions from BB and continuum background, we show two comparisons with data control samples: one for medium values of E extra in the m ES peak regions without the BDT requirements imposed, and the other for the m ES sidebands with the BDT requirements. While the first sample shows excellent agreement over the full q 2 range, the smaller second sample shows some deviations at low and high q 2 . We approximate the deviation of the data from the simulation by a fourth order polynomial, and we adopt this difference plus the statistical uncertainty of each bin as the overall uncertainty of the BB and continuum backgrounds. We conservatively consider it uniformly distributed between the limits of the band shown in Fig. 25 and uncorrelated between different bins.
The q 2 spectrum of normalization decays, both well reconstructed and cross-feed B → D ( * ) ℓ − ν ℓ decays, is well described by the simulation, see Fig. 26. Given that the normalization decays are well understood theoretically, we adopt the statistical uncertainty of the simulated distributions as the overall uncertainty of this contribution. Except for q 2 < 5 GeV 2 , where the rate of signal decays is highly suppressed, the efficiency and detector effects are very similar for signal and normalization. Thus, we also derive the overall uncertainty from the statistical uncertainty of the simulated signal q 2 distributions.
Since it is not feasible to repeat the m 2 miss -|p * ℓ | fit for each variation of the background contributions, we adopt the following procedure to account for the impact of these changes on the χ 2 : for each of the three q 2 distributions in Fig. 23 and each variation of the background components, we determine the B → Dτ − ν τ and B → D * τ − ν τ yields by a fit that minimizes the χ 2 of those distributions.