Measurement of the Semileptonic Decays B -->D tau- nubar_tau and B -->D* tau- nubar_tau

We present measurements of the semileptonic decays B- -->D0 tau- nubar_tau, B- -->D*0 tau- nubar_tau, B0bar -->D+ tau- nubar_tau, and B0bar -->D*+ tau- nubar_tau, which are sensitive to non--Standard Model amplitudes in certain scenarios. The data sample consists of 232x10^6 Upsilon(4S) -->BBbar decays collected with the BaBar detector at the PEP-II e+e- collider. We select events with a D or D* meson and a light lepton (ell=e or mu) recoiling against a fully reconstructed B meson. We perform a fit to the joint distribution of lepton momentum and missing mass squared to distinguish signal B -->D(*) tau- nubar_tau (tau- -->ell- nubar_ell nu_tau) events from the backgrounds, predominantly B -->D(*) ell- nubar_ell. We measure the branching-fraction ratios R(D) == BR(B -->D tau- nubar_tau) / BR(B -->D ell- nubar_ell) and R(D*) == BR(B -->D* tau- nubar_tau) / BR(B -->D* ell- nubar_ell) and, from a combined fit to B- and B0bar channels, obtain the results R(D)=(41.6 +/- 11.7 +/- 5.2)% and R(D*)=(29.7 +/- 5.6 +/- 1.8)%, where the uncertainties are statistical and systematic. Normalizing to measured B- -->D(*)0 ell- nubar_ell branching fractions, we obtain BR(B -->D tau- nubar_tau)=(0.86 +/- 0.24 +/- 0.11 +/- 0.06)% and BR(B -->D* tau- nubar_tau)=(1.62 +/- 0.31 +/- 0.10 +/- 0.05)%, where the additional third uncertainty is from the normalization mode. We also present, for the first time, distributions of the lepton momentum, p*_ell, and the squared momentum transfer, q^2.


I. INTRODUCTION
Semileptonic decays of B mesons to the τ lepton, the heaviest of the three charged leptons, provide a new source of information on Standard Model (SM) processes [1,2,3], as well as a new window on physics beyond the SM [4,5,6,7,8,9]. In the SM, semileptonic decays occur at tree level and are mediated by the W boson, but the large mass of the τ lepton provides sensitivity to additional amplitudes, such as those mediated by a charged Higgs boson. Experimentally, b → cτ − ν τ decays 1 are challenging to study because the final state contains not just one, but two or three neutrinos as a result of the τ decay.
Theoretical predictions for semileptonic decays to exclusive final states require knowledge of the form factors, which parametrize the hadronic current as functions of q 2 = [p B − p D ( * ) ] 2 . For light leptons ℓ ≡ e, µ, 2 there is effectively one form factor for B → Dℓ − ν ℓ , while there are three for B → D * ℓ − ν ℓ . If a τ lepton is produced instead, one additional form factor enters in each mode. The form factors for B → D ( * ) ℓ − ν ℓ decays 3 involving the light leptons have been measured [10,11,12], providing direct information on four of the six form factors. Heavy quark symmetry (HQS) relations [13] allow one to express the two additional form factors for B → D ( * ) τ − ν τ in terms of the form factors measurable from decays with the light leptons. With sufficient data, one could probe the additional form factors and test the HQS relations.
The first measurements of semileptonic b-hadron decays to τ leptons were performed by the LEP experiments [15] operating at the Z 0 resonance, yielding an average [16] inclusive branching fraction B(b had → Xτ − ν τ ) = (2.48±0.26)%, where b had represents the mixture of b-hadrons produced in Z 0 → bb decays. The Belle experiment has reported B(B 0 → D * + τ − ν τ ) = (2.02 +0.40 −0.37 ± 0.37)% [17]. The BABAR Collaboration has presented a measurement of the branching fractions for B → Dτ − ν τ and B → D * τ − ν τ for both charged and neutral B mesons [18]. In this article, we describe the analysis in greater detail, with particular emphasis on several novel features of the event selection and fit technique. We also present distributions of two important kinematic variables, the lepton momentum, |p * ℓ |, and the squared momentum transfer, q 2 .

A. Analysis overview and strategy
We determine the branching fractions of four exclusive decay modes: B − → D 0 τ − ν τ , B − → D * 0 τ − ν τ , B 0 → D + τ − ν τ , and B 0 → D * + τ − ν τ , each of which is measured as a branching-fraction ratio R relative to the corresponding e and µ modes. To reconstruct the τ , we use the decays τ − → e − ν e ν τ and τ − → µ − ν µ ν τ , which are experimentally the most accessible. The main challenge of the measurement is to distinguish B → D ( * ) τ − ν τ decays, which have three neutrinos, from B → D ( * ) ℓ − ν ℓ decays, which have the same observable final-state particles but only one neutrino.
The analysis strategy is to reconstruct the decays of both B mesons in the Υ (4S) → BB event, providing powerful constraints on unobserved particles. One B meson, denoted B tag , is fully reconstructed in a purely hadronic decay chain. The remaining charged particles and photons are required to be consistent with the products of a b → c semileptonic B decay: the daughter charm meson (either a D or D * ) and a lepton (e or µ). The lepton may be either primary or from τ − → ℓ − ν ℓ ν τ . To distinguish signal events from the normalization modes B → D ( * ) ℓ − ν ℓ , we calculate the missing four-momentum,
\[ p_{\rm miss} = p_{e^+e^-} - p_{\rm tag} - p_{D^{(*)}} - p_{\ell} , \]
of any particles recoiling against the observed B tag + D ( * ) ℓ system. A large peak at zero in m 2 miss = p 2 miss corresponds to semileptonic decays with one neutrino, whereas signal events produce a broad tail out to m 2 miss ∼ 8 (GeV/c 2 ) 2 .
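The missing-mass calculation can be illustrated with a minimal sketch. It assumes four-vectors stored as `(E, px, py, pz)` tuples in GeV with the beam four-momentum given in the center-of-mass frame; the function names and all toy momenta are hypothetical and are not taken from the analysis software. The toy tag-side four-momentum is defined so that the event balances exactly, which lets the one-neutrino and three-neutrino cases be compared directly.

```python
def four_sum(*vectors):
    """Component-wise sum of four-vectors given as (E, px, py, pz) tuples."""
    return tuple(sum(components) for components in zip(*vectors))


def four_sub(a, b):
    """Component-wise difference a - b of two four-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def mass_squared(p):
    """Invariant mass squared with metric (+, -, -, -)."""
    e, px, py, pz = p
    return e * e - (px * px + py * py + pz * pz)


def m2_miss(p_beams, p_tag, p_charm, p_lepton):
    """Missing mass squared recoiling against the tag + charm + lepton system."""
    p_miss = four_sub(p_beams, four_sum(p_tag, p_charm, p_lepton))
    return mass_squared(p_miss)


# Toy inputs (hypothetical numbers, center-of-mass frame, GeV).
P_BEAMS = (10.58, 0.0, 0.0, 0.0)
P_CHARM = (1.93, 0.3, -0.4, 0.0)
P_LEPTON = (1.2, -0.72, 0.0, -0.96)


def toy_m2_miss(neutrinos):
    """Build a balanced toy event with the given invisible particles."""
    invisible = four_sum(*neutrinos)
    # The tag side is whatever is needed to conserve four-momentum in the toy.
    p_tag = four_sub(P_BEAMS, four_sum(P_CHARM, P_LEPTON, invisible))
    return m2_miss(P_BEAMS, p_tag, P_CHARM, P_LEPTON)


one_nu = toy_m2_miss([(1.0, 0.6, 0.0, 0.8)])
three_nu = toy_m2_miss(
    [(1.0, 0.6, 0.0, 0.8), (0.5, 0.0, 0.5, 0.0), (0.5, -0.3, 0.0, -0.4)]
)
print(f"one neutrino:    m2_miss = {one_nu:.3f} GeV^2")
print(f"three neutrinos: m2_miss = {three_nu:.3f} GeV^2")
```

With a single massless neutrino the recoil is light-like and the missing mass squared vanishes up to rounding, while several neutrinos emitted in different directions give a positive value. In real data, detector resolution smears both cases.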
To separate signal and background events, we perform a fit (described in Section VII) to the joint distribution of m 2 miss and the lepton momentum (|p * ℓ |) in the rest frame of the B meson. In signal events, the observed lepton is the daughter of the τ and typically has a soft spectrum; for most background events, this lepton typically has higher momentum. The fit is performed simultaneously in eight channels, with a set of constraints relating the event yields between the channels. The fit is designed to maximize the sensitivity to the B → Dτ − ν τ signals by using events in the D * ℓ − channels to constrain the dominant backgrounds, B → D * τ − ν τ feed-down, in which the final-state D * meson is not completely reconstructed. Similarly, we use a set of D * * control samples to constrain the feed-down background to both the Dτ − ν τ and D * τ − ν τ signals. 4 We perform a relative measurement, extracting both signal B → D ( * ) τ − ν τ and normalization B → D ( * ) ℓ − ν ℓ yields from the fit to obtain the four branching-fraction ratios R(D 0 ), R(D + ), R(D * 0 ), and R(D * + ), where, for example, R(D * 0 ) ≡ B(B − → D * 0 τ − ν τ )/B(B − → D * 0 ℓ − ν ℓ ). In the ratio, many systematic uncertainties cancel, either partially or completely. These ratios are normalized such that ℓ represents only one of e or µ; however, both light lepton species are included in the measurement. We multiply these branching-fraction ratios by previous measurements of B(B → D ( * ) ℓ − ν ℓ ) to derive absolute branching fractions.
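The relation between the ratios and the absolute branching fractions can be cross-checked with the numbers quoted in the abstract. The sketch below assumes that the statistical and systematic uncertainties on R simply scale with the normalization branching fraction, so the relative uncertainties are preserved. The function name is hypothetical, and the third (normalization) uncertainty is not modeled.

```python
def scale_ratio_uncertainties(ratio, stat, syst, branching_fraction):
    """Scale the uncertainties on a ratio R to the absolute branching fraction.

    All arguments are in percent. BR = R * BR_norm, so each uncertainty on R
    is multiplied by the implied normalization factor BR / R.
    """
    scale = branching_fraction / ratio
    return stat * scale, syst * scale


# Values quoted in the abstract (percent).
d_stat, d_syst = scale_ratio_uncertainties(41.6, 11.7, 5.2, 0.86)
dstar_stat, dstar_syst = scale_ratio_uncertainties(29.7, 5.6, 1.8, 1.62)

print(f"B -> D tau nu : +/- {d_stat:.2f} (stat) +/- {d_syst:.2f} (syst) %")
print(f"B -> D* tau nu: +/- {dstar_stat:.2f} (stat) +/- {dstar_syst:.2f} (syst) %")
```

Rounded to two decimal places, the scaled values reproduce the first two uncertainties on each branching fraction quoted in the abstract.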

II. THE BABAR DETECTOR AND DATA SETS
We analyze data collected with the BABAR detector at the PEP-II e + e − storage rings at the Stanford Linear Accelerator Center. PEP-II is an asymmetric-energy B factory, colliding 9.0 GeV e − with 3.1 GeV e + at a center-of-mass energy of 10.58 GeV, corresponding to the Υ (4S) resonance. The data sample used consists of 208.9 fb −1 of integrated luminosity recorded on the Υ (4S) resonance between 1999 and 2004, yielding 232 × 10 6 Υ (4S) → BB decays. This data sample can be divided into two major periods: Runs 1-3, comprising 109.0 fb −1 taken from 1999 to June 2003, and Run 4, comprising 99.9 fb −1 taken from September 2003 to July 2004. The accelerator background conditions were significantly different between Runs 1-3 and Run 4, which could affect missing-energy analyses such as this one; for this reason, the two running periods have been independently validated, and the fraction of signal-like events found in the Run 4 sample is used as a crosscheck of the results, as described in Section X.
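The beam and luminosity figures above can be tied together with a short calculation. This is a sketch that assumes head-on collisions of massless beams and uses the rounded beam energies quoted in the text, so the center-of-mass energy it returns agrees with the quoted 10.58 GeV only to within a few tens of MeV. The function and variable names are hypothetical.

```python
import math

E_ELECTRON = 9.0   # GeV, high-energy beam (from the text)
E_POSITRON = 3.1   # GeV, low-energy beam (from the text)


def cm_energy(e1, e2):
    """Center-of-mass energy for head-on, effectively massless beams."""
    return 2.0 * math.sqrt(e1 * e2)


def boost_beta(e1, e2):
    """Velocity of the center-of-mass system in the lab: total p over total E."""
    return (e1 - e2) / (e1 + e2)


def boost_beta_gamma(e1, e2):
    """Boost factor beta*gamma: total momentum over invariant mass."""
    return (e1 - e2) / cm_energy(e1, e2)


sqrt_s = cm_energy(E_ELECTRON, E_POSITRON)
beta_gamma = boost_beta_gamma(E_ELECTRON, E_POSITRON)

# Luminosity bookkeeping from the text (fb^-1) and the implied BB cross section.
total_lumi = 109.0 + 99.9
n_bb = 232e6
effective_xsec_nb = n_bb / total_lumi / 1e6  # 1 nb = 1e6 fb

print(f"sqrt(s) ~ {sqrt_s:.2f} GeV, beta*gamma ~ {beta_gamma:.2f}")
print(f"total luminosity = {total_lumi:.1f} fb^-1")
print(f"implied BB cross section ~ {effective_xsec_nb:.2f} nb")
```

The two run periods add up to the quoted total luminosity, and dividing the number of BB pairs by that luminosity gives an effective production cross section of roughly 1.1 nb.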
The BABAR detector is a large, general-purpose magnetic spectrometer and is described in detail elsewhere [19]. Charged particle trajectories are measured in a tracking system consisting of a five-layer double-sided silicon strip detector and a 40-layer drift chamber, both of which operate in the 1.5 T magnetic field of a superconducting solenoid. A detector of internally reflected Cherenkov light (DIRC) is used to measure charged particle velocity for particle identification (PID). An electromagnetic calorimeter (EMC), consisting of 6580 CsI(Tl) crystals, is used to reconstruct photons and in electron identification. The steel flux return of the solenoid is segmented and instrumented with resistive plate chambers (IFR) for muon and neutral hadron identification. All detector systems contribute to charged particle identification. Ionization energy losses in the tracking systems and the Cherenkov light signature in the DIRC are used for all charged particle types. Electrons are also identified on the basis of shower shape in the EMC and the ratio of energy deposited in the EMC to the track momentum. Muon identification is based on a minimum-ionization energy deposit in the EMC and on the measured interaction length in the IFR.
4 Throughout this paper, we use the symbol D * * to represent all charm resonances heavier than the D * (2010), as well as nonresonant D ( * ) nπ systems with n ≥ 1.
This analysis relies on measurement of the missing momentum carried off by multiple neutrinos, and the large solid angle coverage (hermeticity) of the detector is therefore crucial. The tracking system, calorimeter, and IFR cover the full azimuthal range and the polar angle range from approximately 0.3 < θ < 2.7 rad in the laboratory frame, corresponding to a Υ (4S) center-of-mass coverage of approximately 90% (the direction θ = 0 corresponds to the direction of the high-energy beam, and therefore to the Υ (4S) boost). The DIRC fiducial volume is slightly smaller, corresponding to a center-of-mass frame coverage of about 84%.
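The quoted acceptance can be roughly reproduced from the polar-angle limits. The sketch below assumes massless particles (so that the angle transformation depends only on the boost velocity), full azimuthal coverage, and a boost along the high-energy beam derived from the rounded beam energies of 9.0 and 3.1 GeV. It ignores all detector details, and the function names are hypothetical.

```python
import math

E_HIGH, E_LOW = 9.0, 3.1          # beam energies in GeV, from the text
THETA_MIN, THETA_MAX = 0.3, 2.7   # lab-frame polar-angle acceptance in rad


def cos_theta_cm(cos_theta_lab, beta):
    """Aberration formula for a massless particle; boost along +z with speed beta."""
    return (cos_theta_lab - beta) / (1.0 - beta * cos_theta_lab)


def solid_angle_fraction(cos_forward, cos_backward):
    """Fraction of 4*pi between two polar-angle limits, assuming full azimuth."""
    return (cos_forward - cos_backward) / 2.0


beta = (E_HIGH - E_LOW) / (E_HIGH + E_LOW)

lab_fraction = solid_angle_fraction(math.cos(THETA_MIN), math.cos(THETA_MAX))
cm_fraction = solid_angle_fraction(
    cos_theta_cm(math.cos(THETA_MIN), beta),
    cos_theta_cm(math.cos(THETA_MAX), beta),
)

print(f"lab-frame coverage: {lab_fraction:.1%}")
print(f"center-of-mass coverage: {cm_fraction:.1%}")
```

Under these assumptions the center-of-mass coverage comes out a little above 90%, in line with the approximate figure given in the text. The forward boost means the forward hole in the lab acceptance removes more center-of-mass solid angle than the backward hole does.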
Within the active detector volume, the efficiency for reconstructing charged tracks and photons is very high, typically greater than 95% over most of the momentum range. At low momenta, however, the reconstruction efficiency drops off, leading to an increased contribution from feed-down processes to which special attention is paid throughout this analysis. Feed-down occurs when the photon from D * → Dγ or the π 0 from D * → Dπ 0 is not reconstructed (in the case of the π 0 , either one or both of the photons from π 0 → γγ may be missed). Care must therefore be taken to avoid confusing D * feed-down events for D signals.
We use a Monte Carlo simulation (MC) of the production and decay of signal and background events based on EvtGen [20]. A sample of simulated inclusive BB events equivalent to about five times the integrated luminosity is used to study backgrounds and to optimize event selection criteria. Large samples of many individual semileptonic B decays (discussed in Section III) are used to parameterize the distributions of variables used in the fit. Final-state radiation is simulated using PHOTOS [21]. Simulation of the detector response is performed with GEANT [22] and the resulting efficiencies and resolutions are validated in multiple data control samples.

III. SEMILEPTONIC DECAY MODELS
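In the SM, the matrix element for a semileptonic B meson decay can be written as
\[ \mathcal{M}(B \to D^{(*)} \ell^- \bar\nu_\ell) = -i \frac{g^2}{8 m_W^2} V_{cb}\, L^\mu H_\mu , \]
where g is the weak coupling constant, m W the W mass, V cb the quark mixing matrix element, and L µ and H µ are the leptonic and hadronic currents, respectively. Here, we have used a simplified form for the W propagator appropriate for energies much less than m W . The leptonic current is exactly known,
\[ L^\mu = \bar u_\ell\, \gamma^\mu (1 - \gamma_5)\, v_\nu , \]
and the hadronic current is given by
\[ H_\mu = \langle D^{(*)} | \bar c\, \gamma_\mu (1 - \gamma_5)\, b | B \rangle . \]
In the case of a B → D transition, the axial-vector part of the current does not contribute to the decay, and we may write the hadronic current in terms of two form factors f + (q 2 ) and f − (q 2 ):
\[ \langle D(p') | V_\mu | B(p) \rangle = f_+(q^2)\,(p + p')_\mu + f_-(q^2)\,(p - p')_\mu , \]
with V µ ≡ cγ µ b and where p and p ′ are the four-momenta of the B and D mesons, respectively.
For the B → D * transition, the axial-vector term contributes to the decay as well, and we write the hadronic current in terms of form factors V (q 2 ), A 1 (q 2 ), A 2 (q 2 ), A 3 (q 2 ), and A 0 (q 2 ):
\[ \langle D^*(p', \varepsilon) | V_\mu - A_\mu | B(p) \rangle = \frac{2i\, V(q^2)}{m_B + m_{D^*}}\, \epsilon_{\mu\nu\alpha\beta}\, \varepsilon^{*\nu} p'^{\alpha} p^{\beta} - (m_B + m_{D^*})\, A_1(q^2)\, \varepsilon^*_\mu + \frac{A_2(q^2)}{m_B + m_{D^*}}\, (\varepsilon^* \cdot q)\,(p + p')_\mu + 2 m_{D^*}\, \frac{\varepsilon^* \cdot q}{q^2}\, q_\mu \left[ A_3(q^2) - A_0(q^2) \right] , \]
where A µ ≡ cγ µ γ 5 b and ε is the D * polarization vector. The form factor A 3 (q 2 ) is related to two other form factors as
\[ A_3(q^2) = \frac{m_B + m_{D^*}}{2 m_{D^*}}\, A_1(q^2) - \frac{m_B - m_{D^*}}{2 m_{D^*}}\, A_2(q^2) , \]
so that there are only four independent form factors.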
In the limit of massless leptons, any terms proportional to q µ ≡ (p−p ′ ) µ vanish when the hadronic current is contracted with the leptonic current. For this reason, the contributions from the form factors f − (q 2 ) and A 0 (q 2 ) are essentially negligible for electrons and muons, as mentioned above.
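The suppression can be made explicit with a short worked step. This is a sketch that assumes a V−A leptonic current of the form $L^\mu = \bar u_\ell \gamma^\mu (1-\gamma_5) v_\nu$, a massless neutrino, and $q = p_\ell + p_\nu$; the spinor normalization is a convention that the text does not fix.

```latex
\begin{align}
q_\mu L^\mu
  &= \bar u_\ell \,(\not{p}_\ell + \not{p}_\nu)\,(1-\gamma_5)\, v_\nu \\
  &= m_\ell\, \bar u_\ell\,(1-\gamma_5)\, v_\nu
     + \bar u_\ell\,(1+\gamma_5)\,\not{p}_\nu\, v_\nu \\
  &= m_\ell\, \bar u_\ell\,(1-\gamma_5)\, v_\nu ,
\end{align}
% using the Dirac equation: \bar u_\ell \not{p}_\ell = m_\ell \bar u_\ell
% and \not{p}_\nu v_\nu = 0 for a massless neutrino.
```

The terms proportional to $q_\mu$ therefore enter the amplitude with a factor $m_\ell$, and the corresponding contributions to the rate scale as $m_\ell^2/q^2$ relative to the other terms. At $q^2$ of a few $({\rm GeV}/c^2)^2$ this factor is tiny for electrons and muons but of order one for the τ.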
Semileptonic decays are simulated using the ISGW2 model [23], except for B → D * ℓ − ν ℓ decays, which use an HQET model with a linear form factor expansion [24], and nonresonant B → D ( * ) πℓ − ν ℓ decays, which use the model of Goity and Roberts [25]. We reweight both signal B → D ( * ) τ − ν τ and normalization B → D ( * ) ℓ − ν ℓ events [26] so that the decay distributions follow the Caprini-Lellouch-Neubert (CLN) form factor model [27] with parameters measured in data. We use ρ 2 + = 1.17 ± 0.18 [28] for B → Dℓ − ν ℓ and B → Dτ − ν τ decays, and we use R 1 = 1.417±0.061±0.044, R 2 = 0.836±0.037±0.022, and ρ 2 A1 = 1.179±0.048±0.028 [11] for B → D * ℓ − ν ℓ and B → D * τ − ν τ decays. 5 Variation of these form factors is taken into account as a systematic uncertainty, including the correlations between the three B → D * form factor parameters. Figures 1-3 show distributions of three kinematic variables important to this analysis, all generated using the CLN form factor parameterization with parameters given above. Figure 1 compares q 2 distributions between the signal and normalization modes. Signal events must satisfy q 2 > m 2 τ , leading to qualitatively different q 2 spectra for signal and normalization events; this feature is exploited in the event selection and in validation studies. Figure 2 shows distributions of lepton energy in the B meson rest frame. While the τ − lepton in signal events typically has high energy (due to its mass), the secondary lepton ℓ − typically has much lower energy than either the τ − or the primary lepton in B → D ( * ) ℓ − ν ℓ events. This low lepton energy leads to a lower reconstruction efficiency for signal leptons than those in the normalization modes. Figure 3 shows distributions of m 2 miss for the two signal modes, which, due to the three neutrinos in these events, forms a broad structure up to very large m 2 miss .

IV. EVENT RECONSTRUCTION AND SELECTION
All event selection requirements (as well as the fit procedure described in Section VII) are defined using simulated events or using control samples in data that exclude the signal region in order to avoid any potential sources of bias. About 60% of the BB MC sample is used in optimizing the event selection, while the remaining 40% is used as an independent validation of the selection and fitting procedures.
5 The parameters R 1 and R 2 are not included in the model of Caprini, Lellouch, and Neubert [27]; to model the B → D * form factors, we adopt the formalism used in [12], Eqs. (13)(14), where the leading terms in these form factor ratio expansions are taken as free parameters. We use independent slope parameters ρ 2 + and ρ 2 A 1 for the B → D and B → D * form factors, respectively, treating the two sets of form factors as uncorrelated.
FIG. 1 (caption, partial): The two curves in each plot show q 2 for the light lepton (dashed) and for the τ (solid). All distributions use the CLN form factor model with experimentally-measured shape parameters. The distributions are normalized to equal areas.
Most of the selection criteria described here are optimized to maximize the quantity S/√(S + B), where S and B are the expected signal and background yields in the large m 2 miss region of our data sample, assuming Standard Model branching fractions for signal decays. The requirement on ∆E of the B tag candidate (defined below) was initially optimized in the same way, but was tightened because fits to MC samples indicated that events at large |∆E| contributed to biases in the signal extraction. The final selection corresponds to a compromise between the statistical S/√(S + B) optimization and the systematic effects due to this bias.
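The figure-of-merit optimization can be sketched as a simple scan over candidate cut values. Everything in this example is illustrative: the yields are invented toy numbers, not values from the analysis, and the function names are hypothetical. The sketch covers only the statistical part of the optimization, not the bias considerations that led to the tighter ∆E requirement.

```python
import math


def significance(signal, background):
    """Figure of merit S / sqrt(S + B); zero if no events are expected."""
    total = signal + background
    return signal / math.sqrt(total) if total > 0 else 0.0


def best_cut(scan):
    """Return the (cut, S, B) row that maximizes S / sqrt(S + B)."""
    return max(scan, key=lambda row: significance(row[1], row[2]))


# Toy scan: (cut value in MeV, expected signal, expected background).
# Loosening the cut gains signal but admits background faster.
toy_scan = [
    (150, 20.0, 10.0),
    (200, 30.0, 20.0),
    (250, 36.0, 45.0),
    (300, 40.0, 90.0),
]

chosen = best_cut(toy_scan)
for cut, s, b in toy_scan:
    marker = "  <-- best" if cut == chosen[0] else ""
    print(f"cut < {cut} MeV: S/sqrt(S+B) = {significance(s, b):.2f}{marker}")
```

In this toy scan the optimum sits at an intermediate cut, because beyond that point the background grows faster than the signal.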

A. Btag Reconstruction
We reconstruct B tag candidates in 1114 final states B tag → D ( * ) Y ± with an algorithm that has been used previously at BABAR for a number of analyses, especially those dependent on measuring missing momentum [29]. These final states arise from the large number of ways to reconstruct the D and D * mesons within the B tag candidate and the possible pion and kaon combinations within the Y ± system. Tag-side D candidates are reconstructed as D 0 tag → K − π + , K − π + π 0 , K − π + π + π − , and K 0 S π + π − , and as D + tag → K − π + π + , K − π + π + π 0 , K 0 S π + , K 0 S π + π − π + , and K 0 S π + π 0 . Tag-side D * candidates are reconstructed as D * 0 tag → D 0 tag π 0 and D 0 tag γ and as D * + tag → D 0 tag π + . The Y ± system may consist of up to six light hadrons (π ± , π 0 , K ± , or K 0 S ). In both the D ( * ) tag and Y ± systems, we reconstruct π 0 → γγ and K 0 S → π + π − and require charged kaon candidates to satisfy PID criteria (loose criteria for D 0 → K − π + , tight for all other modes 6 ). D ( * ) tag candidates are selected within about 2σ (standard deviations) of the nominal mass, with σ depending on the reconstruction mode and typically 5-10 MeV/c 2 for the D tag mass and 1-2 MeV/c 2 for the D * tag − D tag mass difference. We use two kinematic variables to identify B tag candidates,
\[ m_{\rm ES} = \sqrt{s/4 - |\mathbf{p}_{\rm tag}|^2} \quad {\rm and} \quad \Delta E = E_{\rm tag} - \sqrt{s}/2 , \]
where √ s is the total e + e − energy, |p tag | is the magnitude of the B tag momentum, and E tag is the B tag energy, all defined in the e + e − center-of-mass frame. For correctly reconstructed B tag candidates, m ES is equal to the B meson mass, with a resolution of about 2.5 MeV/c 2 , and ∆E is equal to zero, with a resolution of about 18 MeV.
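The two tag-side variables can be computed directly from the center-of-mass quantities. The sketch below assumes the standard definitions m ES = √(s/4 − |p tag |²) and ∆E = E tag − √s/2, with energies and momenta in GeV and c = 1. The function names and the B mass value of 5.279 GeV/c² used for the toy candidate are assumptions made for illustration.

```python
import math

SQRT_S = 10.58   # GeV, total e+e- energy quoted in the text
M_B = 5.279      # GeV, approximate B meson mass (assumed for the toy example)


def m_es(sqrt_s, p_tag_mag):
    """Energy-substituted mass: the beam energy replaces the measured energy."""
    return math.sqrt(max(sqrt_s * sqrt_s / 4.0 - p_tag_mag * p_tag_mag, 0.0))


def delta_e(sqrt_s, e_tag):
    """Difference between the reconstructed energy and the beam energy."""
    return e_tag - sqrt_s / 2.0


# A correctly reconstructed B carries exactly the beam energy.
e_true = SQRT_S / 2.0
p_true = math.sqrt(e_true * e_true - M_B * M_B)
print(f"correct tag: m_ES = {m_es(SQRT_S, p_true):.4f} GeV, "
      f"dE = {delta_e(SQRT_S, e_true) * 1e3:.1f} MeV")

# A candidate that is missing about 150 MeV of energy is shifted in dE.
print(f"incomplete tag: dE = {delta_e(SQRT_S, e_true - 0.150) * 1e3:.1f} MeV")
```

Because m ES uses the beam energy rather than the measured candidate energy, its resolution is driven by the beam-energy spread, which is why it is much narrower than the ∆E resolution.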
For each D ( * ) tag "seed" candidate, we use a recursive algorithm to identify candidate Y ± systems. Light hadrons from the remaining tracks and photons in the event are added to the Y ± system, one at a time. If the resulting values of m ES and ∆E for the D ( * ) tag Y ± candidate are close to the nominal values, the B tag candidate is accepted. If the value of ∆E is too large, the light hadron just added is removed from the Y ± system, since continuing to add particles to this Y ± candidate will increase ∆E further. The algorithm then continues recursively with the remaining particles in the event, adding and removing light hadrons to the Y ± system according to m ES , ∆E, and the Y ± system topology. This algorithm is semi-exclusive, meaning that particles in the Y ± system are not constrained to intermediate resonance states. Because of this, the yield is significantly higher than exclusive B reconstruction, while the purity is somewhat lower. In this analysis, however, since we exclusively reconstruct the second B meson in the event, the purity of our final sample is substantially improved with respect to the raw B tag sample.
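The recursive search can be sketched in a few lines. This is a deliberately simplified illustration, not the BABAR algorithm: it ignores the Y ± topology, charge, and particle-identification requirements, and only keeps the pruning on ∆E and the acceptance windows in m ES and ∆E (using the 72 MeV and 5.27 GeV/c² values quoted in the text). Four-vectors are `(E, px, py, pz)` tuples in the center-of-mass frame, and all names and toy momenta are hypothetical.

```python
import math

SQRT_S = 10.58        # GeV
M_ES_MIN = 5.27       # GeV
DELTA_E_MAX = 0.072   # GeV
MAX_HADRONS = 6       # the Y system holds at most six light hadrons
M_PI = 0.13957        # GeV, charged pion mass


def add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def m_es(p):
    _, px, py, pz = p
    return math.sqrt(max(SQRT_S ** 2 / 4.0 - (px * px + py * py + pz * pz), 0.0))


def delta_e(p):
    return p[0] - SQRT_S / 2.0


def find_tags(current, hadrons, start=0, chosen=()):
    """Return index tuples of hadron sets that complete an accepted tag."""
    accepted = []
    if len(chosen) >= MAX_HADRONS:
        return accepted
    for i in range(start, len(hadrons)):
        candidate = add(current, hadrons[i])
        de = delta_e(candidate)
        if de > DELTA_E_MAX:
            # Too much energy already: drop this hadron, since adding
            # further particles can only increase delta E.
            continue
        picked = chosen + (i,)
        if abs(de) < DELTA_E_MAX and m_es(candidate) > M_ES_MIN:
            accepted.append(picked)
        # Keep building on this partial Y system with the remaining hadrons.
        accepted.extend(find_tags(candidate, hadrons, i + 1, picked))
    return accepted


def pion(px, py, pz):
    return (math.sqrt(px * px + py * py + pz * pz + M_PI ** 2), px, py, pz)


# Toy event: two pions that belong to the tag and one unrelated pion.
hadrons = [pion(1.0, 0.5, 0.2), pion(-0.8, 0.3, -0.5), pion(0.3, -0.4, 0.6)]
e_b = SQRT_S / 2.0
b_true = (e_b, 0.0, 0.0, math.sqrt(e_b ** 2 - 5.279 ** 2))
# Toy seed, defined so that seed + first two pions reproduces the B exactly
# (its invariant mass is not tuned to a real D meson).
seed = tuple(b - h0 - h1 for b, h0, h1 in zip(b_true, hadrons[0], hadrons[1]))

tags = find_tags(seed, hadrons)
print("accepted hadron index sets:", tags)
```

In the toy event only the combination of the first two pions satisfies both windows. Adding the third pion to it overshoots ∆E, so that branch is pruned immediately.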
We require m ES > 5.27 GeV/c 2 and |∆E| < 72 MeV, corresponding to ±4σ in ∆E and −4σ in m ES (the kinematic limit m ES < √ s/2 provides an effective +4σ requirement). We reconstruct B tag candidates with an efficiency of 0.2% to 0.3%. Figure 4 shows distributions of m ES for selected B tag candidates both before and after the signal-side reconstruction. We make no attempt at this stage to select a single B tag among multiple reconstructed candidates: this decision is made after reconstructing the signal side as well.
Electron candidates are required to satisfy tight PID criteria and to have lab-frame momentum |p e | > 300 MeV/c, with an efficiency that rises from 85% at the lowest momenta to 95% for |p e | > 1.0 GeV/c. Muon candidates are required to satisfy tight PID criteria; since muon PID relies on the hit pattern in the IFR, this effectively requires |p µ | ≳ 600 MeV/c, and results in an efficiency of 40%-60% over the allowed momentum range. The energy of electron candidates is corrected for bremsstrahlung energy loss if photons are found close to the electron direction. Lepton candidates of either flavor are required to have at least 12 hits in the drift chamber and to have a laboratory-frame polar angle 0.4 < θ < 2.6 rad (excluding the very forward and very backward regions of the tracking system) in order to ensure a well-measured momentum, since mismeasured lepton momenta distort the m 2 miss distribution and tend to move background events into the signal-like region. Approximately 5% of selected lepton candidates are misidentified, almost all of which are pions misreconstructed as muons.

C. Total-Event and Single-Candidate Selection
We form whole-event candidates by combining B tag candidates with D ( * ) ℓ − candidate systems. We combine charged B tag candidates with D ( * )0 ℓ − systems and neutral B tag candidates with both D ( * )+ ℓ − and D ( * )− ℓ + systems, where the inclusion of both charge combinations allows for neutral B mixing.
FIG. 5 (caption, partial): The gap between the E extra = 0 bin and the remainder of the distribution corresponds to the minimum allowed photon energy, 50 MeV. The normalization is arbitrary. The agreement between the two distributions indicates that the efficiency of a cut on E extra will cancel when we measure the branching-fraction ratio.
In correctly reconstructed signal and normalization events, all of the stable final-state particles, with the exception of the neutrinos, are associated with either the B tag , D ( * ) , or ℓ − candidate. Events with additional particles in the final state must therefore have been misreconstructed, and we suppress these backgrounds with two selection requirements on the "extra" particle content in the event. We require that all observed charged tracks be associated with either the B tag , D ( * ) , or ℓ candidate. We compute E extra , the sum of the energies of all photon candidates not associated with the B tag + D ( * ) ℓ candidate system, and we require E extra < 150-300 MeV, depending on the D ( * ) channel. When considering these extra tracks and extra photons, care is taken to reject track and photon candidates which are likely to be due to accelerator background, electronics noise, or reconstruction software failures; fake photons in the EMC are, to some degree, unavoidable, which is why we cannot simply require E extra = 0. The different D modes have very different levels of combinatorial background, which the E extra cut is particularly effective at rejecting. Figure 5 shows distributions of E extra for simulated signal and normalization events. Excellent agreement is seen in the two distributions, indicating that the efficiency of a cut on E extra will largely cancel when we measure the branching-fraction ratio; we observe the same level of agreement in the four D ( * ) ℓ − channels separately, as well as in the e and µ final states separately.
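The two requirements on extra particle content can be sketched as follows. The example assumes photons are represented as `(id, energy)` pairs in GeV and that a 50 MeV minimum photon energy applies. The channel-dependent threshold is passed in as a parameter because the per-channel values within the 150-300 MeV range are not specified here. All names and toy values are hypothetical.

```python
MIN_PHOTON_ENERGY = 0.050  # GeV, minimum photon energy considered


def e_extra(photons, used_ids):
    """Sum the energies of photon candidates not used in the event candidate."""
    return sum(
        energy
        for photon_id, energy in photons
        if photon_id not in used_ids and energy >= MIN_PHOTON_ENERGY
    )


def passes_extra_selection(n_extra_tracks, e_extra_value, e_extra_max):
    """Require no unassigned tracks and E_extra below the channel threshold."""
    return n_extra_tracks == 0 and e_extra_value < e_extra_max


# Toy event: photon 0 belongs to the reconstructed candidate, photon 2 is
# below the minimum energy, photons 1 and 3 count as extra energy.
toy_photons = [(0, 0.31), (1, 0.12), (2, 0.03), (3, 0.08)]
toy_e_extra = e_extra(toy_photons, used_ids={0})

print(f"E_extra = {toy_e_extra * 1e3:.0f} MeV")
print("passes loose cut (300 MeV):", passes_extra_selection(0, toy_e_extra, 0.300))
print("passes tight cut (150 MeV):", passes_extra_selection(0, toy_e_extra, 0.150))
```

The same toy event passes the loosest threshold and fails the tightest one, which illustrates why the threshold is tuned per channel according to the level of combinatorial background.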
We suppress hadronic events and combinatorial backgrounds by requiring |p miss | > 200 MeV/c, where |p miss | is the magnitude of the missing momentum. This requirement mainly rejects hadronic events such as B → D ( * ) π − , where the π − is misidentified as a µ − . Our selection rejects more than 99% of B → D ( * ) π − background, while rejecting less than 1% of signal and other semileptonic events.
We further suppress background by requiring q 2 > 4 (GeV/c 2 ) 2 , where q 2 is calculated as
\[ q^2 = \left[ p_B - p_{D^{(*)}} \right]^2 = \left[ p_{e^+e^-} - p_{\rm tag} - p_{D^{(*)}} \right]^2 . \tag{10} \]
This requirement preferentially rejects combinatorial backgrounds from two-body B decays such as B → D ( * ) D, where one D meson decays semileptonically (or, in the case of a D + s , leptonically as D + s → τ + ν τ ). Our selection rejects about 25% of these backgrounds, while the signal efficiency is about 98% because signal events automatically satisfy q 2 > m 2 τ ≈ 3.16 (GeV/c 2 ) 2 . For B → Dℓ − ν ℓ decays, the q 2 distribution peaks near zero (see Fig. 1), so this selection has an efficiency of about 60% for this normalization mode; for B → D * ℓ − ν ℓ decays, the q 2 distribution peaks at higher values, so our efficiency is about 70%. The q 2 requirement is the main reason why the reconstruction efficiency is different for signal and normalization modes, as seen below.
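The kinematic limits on q 2 explain why the requirement costs so little signal. The sketch below assumes approximate particle masses in GeV/c² (rounded world-average values, not taken from this paper) and hypothetical function names. It evaluates the allowed q 2 range for each mode, with the lower limit set by the lepton mass and the upper limit by the B and charm-meson masses.

```python
# Approximate masses in GeV/c^2 (assumed, rounded world-average values).
M_B_MINUS = 5.27934
M_D0 = 1.86484
M_DSTAR0 = 2.00685
M_TAU = 1.77686
M_MU = 0.10566

Q2_CUT = 4.0  # (GeV/c^2)^2, selection requirement from the text


def q2_range(m_b, m_charm, m_lepton):
    """Kinematically allowed q^2 range: from m_lepton^2 up to (m_B - m_charm)^2."""
    return m_lepton ** 2, (m_b - m_charm) ** 2


for label, m_charm, m_lepton in [
    ("B- -> D0 tau nu ", M_D0, M_TAU),
    ("B- -> D*0 tau nu", M_DSTAR0, M_TAU),
    ("B- -> D0 mu nu  ", M_D0, M_MU),
    ("B- -> D*0 mu nu ", M_DSTAR0, M_MU),
]:
    q2_min, q2_max = q2_range(M_B_MINUS, m_charm, m_lepton)
    removed = max(Q2_CUT - q2_min, 0.0) / (q2_max - q2_min)
    print(f"{label}: {q2_min:6.3f} < q2 < {q2_max:6.3f}; "
          f"cut removes {removed:.0%} of the allowed range")
```

For the τ modes the cut removes only a thin slice just above the threshold at m 2 τ, whereas for the light-lepton modes it removes a substantial part of the allowed range. The fraction of the range removed is not the same as the fraction of events removed, since that depends on the shape of the q 2 spectrum.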
If multiple candidate systems pass our selection in a given event, we select the one with the lowest value of E extra . This scheme preferentially selects the candidate that is least likely to have lost additional particles. The main effect of this algorithm is that a candidate in one of the D * ℓ − channels will be selected before a candidate in one of the Dℓ − channels when both candidates are present in an event. Because D * → D feed-down is a dominant background while D → D * feed-up is comparatively rare, keeping as many true D * events in the D * ℓ − reconstruction channels helps to increase the sensitivity to the Dτ − ν τ signals.
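The single-candidate choice amounts to taking the minimum of E extra over the surviving candidates. The sketch below uses a hypothetical `Candidate` record and invented toy values. It illustrates why a D * candidate tends to win over the corresponding D candidate: the soft photon or π 0 is absorbed into the D * instead of being left over as extra energy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    channel: str    # reconstruction channel label, e.g. "D*0 l"
    e_extra: float  # unassigned photon energy in GeV


def select_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Keep the candidate with the lowest E_extra, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.e_extra)


# Toy event reconstructed in two channels. In the D0 hypothesis the soft
# photon from the D*0 decay is left unassigned and counts as extra energy.
toy_candidates = [
    Candidate(channel="D0 l", e_extra=0.12),
    Candidate(channel="D*0 l", e_extra=0.0),
]

best = select_best(toy_candidates)
print("selected channel:", best.channel)
```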
To improve the resolution on the missing momentum, we perform a kinematic fit [30] to all Υ (4S) → B tag D ( * ) ℓ − candidates. We constrain charged track daughters of K 0 S , D, and B mesons to originate from common vertices, and we constrain the Υ (4S) → BB vertex to be consistent with the measured BABAR beamspot location. We constrain the mass of the signal D meson (and D * meson, if there is one) to the measured value [16], and the combined momentum of the two B mesons to be consistent with the measured beam energy.

D. D * * Control Sample Selection
We select four control samples to constrain the poorly known B → D * * (ℓ − /τ − )ν background. The selection is identical to that of the signal channels, but we require the presence of a π 0 meson in addition to the B tag + D ( * ) ℓ system. The π 0 candidate must have momentum greater than 400 MeV/c, and the event must satisfy E extra < 500 MeV, where the two photons from π 0 → γγ are excluded from the calculation of E extra .
Most of the D * * background in the four signal channels occurs when the π 0 from D * * → D ( * ) π 0 is not reconstructed, so these control samples provide a direct normalization of the background source. Similar D * * decays in which a π ± is lost contribute very little to the background since they do not have the correct charge correlation between the B tag and D ( * ) candidate, and decays with two missing charged pions, which may have the correct charge correlation, have very low reconstruction efficiencies. The feed-down probabilities for the D * * (ℓ − /τ − )ν background are determined from simulation, with uncertainties in the D * * content treated as a systematic error as described in Sec. IX A 3.
TABLE I: Number of selected data events in the four signal channels, N ev , and in the four D * * control samples, N D * * CS . Here, the large m 2 miss region is taken to be m 2 miss > 2 (GeV/c 2 ) 2 and corresponds to the region with greatest signal sensitivity.

V. SELECTED EVENT SAMPLES
After applying all of the criteria above, we select a total of 3196 data events, 2886 in the four signal channels and 310 in the D * * control samples, as listed in Table I. Since most of the events at large m 2 miss are either Dτ − ν τ or D * τ − ν τ signal events, the third column in the Table gives a first indication of where our sensitivity comes from. There are more events in the two B − channels, D 0 ℓ − and D * 0 ℓ − , due to a larger efficiency to reconstruct charged B tag candidates than neutral ones and, to a lesser extent, a larger efficiency to reconstruct D 0 mesons on the signal side than D + mesons. There are more events in the D channels than the D * channels, particularly at large m 2 miss , because these channels contain both D mesons and D * feed-down. The greatest signal sensitivity therefore comes from the D 0 ℓ − channel. Figure 6 shows distributions of |p * ℓ | versus m 2 miss for the selected data samples. One-dimensional distributions of m 2 miss and |p * ℓ | for these samples are shown when we discuss the signal fit in Section VII. Figure 7 shows distributions of |p * ℓ | versus m 2 miss for several MC samples after applying all event selection criteria. While the composition of the event sample will be discussed in greater detail in the following section, these distributions exhibit the qualitative features of the data sample which are most relevant to our signal extraction. Figure 7(a) shows D 0 ℓ − ν ℓ ⇒ D 0 ℓ − , where we introduce the ⇒ notation to mean that these are true D 0 ℓ − ν ℓ events reconstructed in the D 0 ℓ − channel. The m 2 miss distribution is very narrowly peaked around zero, as expected for one-neutrino events. Figure 7(b) shows D * 0 ℓ − ν ℓ ⇒ D 0 ℓ − , feed-down events where a D * 0 is misreconstructed as a D 0 .
In this case, the center of the m 2 miss distribution is offset from zero, and this offset decreases with increasing |p * ℓ |; this kinematic feature is common to all feed-down processes, and is due to the fact that higher |p * ℓ | correspond to lower D * momenta and therefore to lower momenta for the lost π 0 or γ. The width of the m 2 miss distribution is also observed to decrease with increasing |p * ℓ |, a feature which is also common to most distributions; this narrowing is partly due to the same kinematic effect as before, the reduced D * phase space at large |p * ℓ |, and partly due to the fact that the lepton momentum resolution improves at higher momenta. Figures 7(c) and 7(d) show signal events, in which the large values of m 2 miss due to the three neutrinos are clearly observed. Again, the m 2 miss distributions move towards zero and become narrower at high |p * ℓ |, in this case due to the reduced phase space for the multiple neutrinos, although, in Fig. 7(d), the effect of the lost π 0 or γ can also be seen as a deficit along the lower-left edge of the distribution. Figure 7(e) shows feed-down from B → D * * ℓ − ν ℓ into the D 0 ℓ − channel, where, in addition to the neutrino, one or more π 0 mesons or photons from the D * * decay have been lost. Since π 0 mesons from D * * decay typically have higher momentum than those from D * decay, the m 2 miss distribution is much broader than that in Fig. 7(b). Figure 7(f) shows the feed-up process D 0 ℓ − ν ℓ ⇒ D * 0 ℓ − , where a true D 0 meson is paired with a combinatorial π 0 or γ to fake a D * 0 candidate. In this case, the m 2 miss distribution is shifted in the opposite direction from Fig. 7(b). Figures 7(j) and 7(k) show charge-crossfeed backgrounds: true B → D ( * ) ℓ − ν ℓ events reconstructed with the wrong charge for both the B tag and D ( * ) meson. Typically this occurs when a low-momentum π ± is swapped between the two mesons.
Note that, even though the event is misreconstructed, this particle misassignment does not substantially alter the total missing momentum, so that the m 2 miss distribution still peaks at or near zero. While the events in Fig. 7(k), which are reconstructed in the D * ℓ − channels, are very strongly peaked at m 2 miss = 0, Fig. 7(j) includes a large feed-down component, and therefore exhibits the same sloping behavior seen in Fig. 7(b). Figure 7(l) shows the distribution for combinatorial background for all four signal channels. This background is dominated by hadronic B decays such as B → D ( * ) D ( * ) s that produce a secondary lepton, including events with τ leptons from D s decay.
In our BB MC sample, our criteria select D * * control samples which are 60%-80% pure B → D * * ℓ − ν ℓ events, of which more than 90% involve true D * * → D ( * ) π 0 transitions. The remaining events are split between feed-up from B → D ( * ) ℓ − ν ℓ and combinatorial background. In these control samples, the B → D * * ℓ − ν ℓ component peaks at or near zero in m 2 miss , just as B → D ( * ) ℓ − ν ℓ does in the four signal channels. The qualitative features of the other contributions are similar to what is seen in the signal channels.

VI. KINEMATIC CONTROL SAMPLES
The event selection criteria described in Section IV are more complicated than those used in a typical BABAR analysis, due to the full-event reconstruction of a high-multiplicity final state and the need to veto events with extra tracks and neutral clusters. We use two data control samples to validate our simulation with respect to the observed behavior in data. The control samples are kinematically selected, with no requirement on m 2 miss , to be high purity samples of B → D ( * ) ℓ − ν ℓ events, with little or no contamination from signal decays.
FIG. 7 (caption, partial): ... charge-crossfeed reconstructed in the D * 0 ℓ − and D * + ℓ − channels, and (l) combinatorial background in the four D ( * ) ℓ − channels. The reconstruction channel notation ⇒ and the features of these distributions are discussed in the text.
For the second control sample, we remove the standard q 2 > 4 (GeV/c 2 ) 2 selection and require that events instead satisfy q 2 < 5 (GeV/c 2 ) 2 , with q 2 calculated according to Eq. (10). This control sample has very little overlap with our final event sample, where we require q 2 > 4 (GeV/c 2 ) 2 . Although the two control samples do have some overlap, this q 2 control sample has the advantage over the first of allowing us to examine events with low |p * ℓ |, as expected for signal events. In simulation, approximately 90% of this control sample is B → D ( * ) ℓ − ν ℓ (as in the first sample, the two D * ℓ − channels are approximately 90% B → D * ℓ − ν ℓ , while the two Dℓ − channels include D * feed-down). The remainder of the sample is composed of about 3% B → D * * ℓ − ν ℓ , 3% B → D ( * ) τ − ν τ , and 4%-5% combinatorial backgrounds. Figure 8 shows several data-simulation comparisons in the two control samples. The four D ( * ) ℓ − channels have been combined in these plots as have the two control samples, and this union of the two control samples is responsible for the large steps visible in (a) and (b). We see good agreement between data and simulation in these plots, as well as in similar studies where the two control samples are examined separately, the four D ( * ) ℓ − channels are examined separately, the two lepton types are examined separately, and where the data are split according to BABAR running period. We have examined variables related to B tag reconstruction, signal-side reconstruction, hermeticity and whole-event reconstruction, and missing momentum. In all cases, we observe that the simulation does a reasonable job describing the data. Because of the relative normalization scheme, small differences between simulation and data have no detrimental effect on the analysis.

A. Fit Overview
Signal and background yields are extracted using an extended unbinned maximum likelihood fit to the joint (m^2_miss, |p*_ℓ|) distribution. The fit is performed simultaneously in the four signal channels and the four D** control samples. Two two-dimensional probability density functions (PDFs) are presented in Section VII B; each component in the fit (listed below) is described by one of these two PDFs, with parameters determined from fits to simulated event samples. A set of constraints, described in Section VII C, relates fit components in different reconstruction channels. These constraints are also determined from MC samples, except for parameters describing the amount of D* feed-down into the Dℓ− signal channels, which are determined directly by the fit to data.

FIG. 8 (caption, partial): […] E_extra; (f) m^2_miss in the D* channels, assuming that the soft π/γ had been lost; multiplicity of (g) charged tracks and (h) neutral clusters used to reconstruct the B_tag. In all plots, the points with error bars are the data and the solid histogram is the simulation, scaled to the data luminosity. Good agreement is seen between data and simulation in a variety of variables corresponding to reconstruction, kinematics, and hermeticity requirements. Small differences between data and simulation cancel in the relative measurement and have no detrimental effect on the analysis. The large steps in (a) and (b) are due to the combination of two control samples, as described in the text. The structure in (g) is caused by the larger efficiency to reconstruct charged B_tag candidates (with an odd number of charged tracks) than neutral candidates, while the prominent even-odd structure in (h) is due to the fact that most neutral clusters correspond to the process π0 → γγ and so appear in pairs.
Tables II and III summarize the parameterization of the fit in the four signal channels and the four D * * control samples, respectively. In each of the four signal channels, we describe the data as the sum of seven components: Dτ − ν τ , D * τ − ν τ , Dℓ − ν ℓ , D * ℓ − ν ℓ , D * * (ℓ − /τ − )ν, charge-crossfeed, and combinatorial background. The four D * * control samples are described as the sum of five components: D * * (ℓ − /τ − )ν, D(ℓ − /τ − )ν, D * (ℓ − /τ − )ν, charge-crossfeed, and combinatorial background. Each of these components is described by one of the two PDFs given in Section VII B, with the numerical parameters of the 32 PDFs determined from independent MC samples. The charge-crossfeed components in the two Dℓ − signal channels are described by a single PDF, with common parameters for D 0 ℓ − and D + ℓ − , as are the two D * ℓ − charge-crossfeed components and the four D ( * ) ℓ − π 0 components; the four combinatorial background components in the signal channels are described by a single PDF with common parameters, as are the four in the D * * control samples.
B → D(*)τ−ν_τ events feeding up into the D** control samples are expected to contribute 1.8 ± 0.6 events in the four channels together, so these events are combined with the light lepton contribution. In both the control samples and in the signal channels, B → D**τ−ν_τ events are expected to contribute 3.5%-4.5% of the total D** yield; these events are combined with the light lepton contribution, and the amount of D**τ−ν_τ is varied as a systematic uncertainty.

TABLE II: Components of the signal extraction fit in the signal channels, and their approximate abundances in our BB MC sample. The structure of the fit is identical between the B− and B0 channels. There are seven components in each of the four signal channels. [Surviving column headers: Channel, Source, Abundance in BB MC (%); the table body is not recovered.]
The fit has 18 free parameters: four signal branching-fraction ratios R, one for each D(*) meson; four B → D(*)ℓ−ν_ℓ normalization yields; four B → D**ℓ−ν_ℓ background yields; four combinatorial background yields, one in each of the four D** control samples; and two parameters describing D* ⇒ D feed-down, one for charged B modes and one for neutral B modes. The combinatorial background yields in the four signal channels are fixed in the fit to the expected values from simulation, as are the charge-crossfeed backgrounds in both the signal channels and the D** control samples; variation of these backgrounds is treated as a systematic uncertainty below.
We also perform a second, B−-B0-constrained, fit, by […]

TABLE III: Components of the signal extraction fit in the D** control sample channels, and their approximate abundances in our BB MC sample. The structure of the fit is identical between the B− and B0 channels. There are five components in each of the four D** control sample channels.

B. Probability Density Functions
We construct an empirical model of the two-dimensional (m^2_miss, |p*_ℓ|) PDF as the product of two terms: a one-dimensional function to describe the |p*_ℓ| distribution, discussed in Section VII B 1; and a |p*_ℓ|-dependent "resolution" function to describe the m^2_miss distribution, discussed in Section VII B 2. For processes in which the only missing particle is a single neutrino, the true m^2_miss spectrum is a delta function located at zero and the observed distribution is a pure resolution function. For components with multiple missing particles, the observed m^2_miss distribution is the convolution of the physical m^2_miss spectrum with our detector resolution. The PDFs presented below are used to describe both of these physical cases, with different numerical parameters describing the different behaviors; these two PDFs are flexible enough to describe the variety of physical and resolution processes needed in this analysis.

1. One-dimensional |p*_ℓ| Parameterization
We use a generalized form of a Gaussian to model the |p*_ℓ| distribution. The Gaussian distribution,

\[ G(|p^*_\ell|) \propto \exp\left[ -\frac{(|p^*_\ell| - p_0)^2}{2\sigma^2} \right], \]

has the same general properties as our distributions: it rises smoothly from zero to a peak value and then falls smoothly back to zero again. Here, p_0 represents the value of |p*_ℓ| for which G peaks and σ represents the width of the Gaussian distribution.

This gross agreement is not enough, however, so we define a modified Gaussian function,

\[ H(|p^*_\ell|) \equiv \exp\left[ -\left| \frac{|p^*_\ell| - p_0}{\sigma(|p^*_\ell|)} \right|^{\nu(|p^*_\ell|)} \right], \]

where, for convenience, we have absorbed the constant factor of 2 into the definition of σ(|p*_ℓ|). By allowing the width and exponent of the Gaussian to be functions of |p*_ℓ|, we are able to describe a greater variety of shapes. Specifically, we take ν(|p*_ℓ|) to be a linear function,

\[ \nu(|p^*_\ell|) = \nu_L + (\nu_H - \nu_L)\,\frac{|p^*_\ell|}{2.4\ \mathrm{GeV}/c}, \]

where ν_L and ν_H are the values of the exponential term at the low and high endpoints of |p*_ℓ|, fixed at zero and 2.4 GeV/c, respectively. Similarly, we parameterize σ(|p*_ℓ|) as a bilinear function,

\[ \sigma(|p^*_\ell|) = \begin{cases} \sigma_L + (\sigma_0 - \sigma_L)\,\dfrac{|p^*_\ell|}{p_0}, & |p^*_\ell| \le p_0, \\[2ex] \sigma_0 + (\sigma_H - \sigma_0)\,\dfrac{|p^*_\ell| - p_0}{2.4\ \mathrm{GeV}/c - p_0}, & |p^*_\ell| > p_0, \end{cases} \]

where σ_L, σ_0, and σ_H represent the widths of the Gaussian at |p*_ℓ| = 0, |p*_ℓ| = p_0, and |p*_ℓ| = 2.4 GeV/c, respectively. Even though this parameterization has a discontinuous slope at the point |p*_ℓ| = p_0, the resulting function H(|p*_ℓ|) remains smooth since the numerator in the exponent, (|p*_ℓ| − p_0), goes to zero at the same point. The |p*_ℓ| parameterization therefore has six free parameters: p_0, the peak; ν_L and ν_H, describing the exponential term; and σ_L, σ_0, and σ_H, describing the width. When performing fits using this PDF, we integrate H numerically to compute the normalization.
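A minimal numerical sketch of the modified Gaussian described above may help make the parameterization concrete. It assumes the form H = exp(−|(p − p_0)/σ(p)|^ν(p)) with a linear ν and a piecewise-linear σ, and it uses a simple trapezoidal rule for the normalization; the function names and the parameter values are illustrative assumptions, not the fitted values or the code used in the analysis.

```python
import math

P_MAX = 2.4  # GeV/c, upper endpoint of |p*_l| quoted in the text


def nu(p, nu_l, nu_h):
    """Linear exponent: nu_l at p = 0 and nu_h at p = P_MAX."""
    return nu_l + (nu_h - nu_l) * p / P_MAX


def sigma(p, p0, sig_l, sig_0, sig_h):
    """Piecewise-linear width: sig_l at 0, sig_0 at p0, sig_h at P_MAX.

    Requires 0 < p0 < P_MAX so that neither branch divides by zero.
    """
    if p <= p0:
        return sig_l + (sig_0 - sig_l) * p / p0
    return sig_0 + (sig_h - sig_0) * (p - p0) / (P_MAX - p0)


def h_unnormalized(p, p0, nu_l, nu_h, sig_l, sig_0, sig_h):
    """Modified Gaussian exp(-|(p - p0) / sigma(p)| ** nu(p))."""
    z = abs(p - p0) / sigma(p, p0, sig_l, sig_0, sig_h)
    return math.exp(-(z ** nu(p, nu_l, nu_h)))


def normalization(params, n_steps=2400):
    """Trapezoidal integral of H over [0, P_MAX]."""
    step = P_MAX / n_steps
    total = 0.5 * (h_unnormalized(0.0, *params) + h_unnormalized(P_MAX, *params))
    for i in range(1, n_steps):
        total += h_unnormalized(i * step, *params)
    return total * step


# Illustrative values only: (p0, nu_L, nu_H, sigma_L, sigma_0, sigma_H).
# With nu = 2 and a constant width this reduces to an ordinary Gaussian.
params = (1.5, 2.0, 2.0, 0.5, 0.5, 0.5)
norm = normalization(params)
print(norm)
```

Choosing ν_L ≠ ν_H or unequal widths in `params` skews and reshapes the spectrum, which is the extra freedom the six-parameter form is meant to provide.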

2. Two-dimensional PDF Parameterization
We construct two types of two-dimensional PDF, P 1 (|p * ℓ |, m 2 miss ) and P 2 (|p * ℓ |, m 2 miss ) by multiplying the model of the lepton spectrum above by a "resolution" function in m 2 miss , where the resolution is a function of |p * ℓ |. Allowing the parameters of the resolution function to be functions of |p * ℓ | produces a correlation between the two fit variables, and it is these parameters which allow the PDFs to describe such a wide variety of shapes.
Using the model of the lepton spectrum H(|p*_ℓ|) introduced above, we construct the PDFs as: […] and […] Here, the functions G_1 and G_2 are Gaussians and G_b is a bifurcated Gaussian (a Gaussian with different σ parameters on either side of the mean); all are functions of m^2_miss, with parameters dependent on |p*_ℓ|. The |p*_ℓ| dependence of the various parameters of G_1,2 and G_b is listed in Table IV. The total number of free parameters for P_1 is 18: six for H(|p*_ℓ|), five each for G_1 and G_2, and two for f_1. The total number of free parameters for P_2 is 24: six for H(|p*_ℓ|), five each for G_1 and G_2, four for G_b, and two each for f_1 and f_2.
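The displayed definitions of P_1 and P_2 did not survive in this text. One form that would be consistent with the description and with the parameter counts (18 = 6 + 5 + 5 + 2 and 24 = 6 + 5 + 5 + 4 + 2 + 2) is sketched below; the way the fractions f_1 and f_2 combine the terms is an assumption and may differ from the original equations.

```latex
% Assumed mixture structure; only the ingredients (H, G_1, G_2, G_b, f_1, f_2)
% are taken from the text.
\begin{align}
P_1(|p^*_\ell|, m^2_{\rm miss}) &= H(|p^*_\ell|)
   \left[ f_1\, G_1(m^2_{\rm miss}) + (1 - f_1)\, G_2(m^2_{\rm miss}) \right], \\
P_2(|p^*_\ell|, m^2_{\rm miss}) &= H(|p^*_\ell|)
   \left[ f_1\, G_1(m^2_{\rm miss}) + f_2\, G_b(m^2_{\rm miss})
          + (1 - f_1 - f_2)\, G_2(m^2_{\rm miss}) \right].
\end{align}
```

In this reading f_1 and f_2 are themselves functions of |p*_ℓ| with two parameters each, which is what introduces the correlation between the two fit variables.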
We use the simpler PDF, P_1, to model most of the semileptonic fit components (22 out of 32), as well as the charge-crossfeed and combinatorial backgrounds. For the remaining ten components, however, the more complicated parameterization P_2 is required to adequately describe the m^2_miss tail. Eight of these components are the ones in which the only missing particle is a single neutrino, and the remaining two are components in which a single neutrino and a soft π0 or γ are missing, […]

TABLE IV: |p*_ℓ| dependence of the m^2_miss PDF parameterization. The form of f_2 is chosen to allow the G_b term to contribute at low |p*_ℓ|, but to drive this term rapidly to zero as |p*_ℓ| increases. The form of σ_H is chosen to allow for a long tail towards high m^2_miss at low |p*_ℓ|, but to drive this term rapidly to zero as |p*_ℓ| increases (note that there is no problem having σ approach zero since the amplitude of this term goes to zero as well; the result is finite and well-behaved). N_par gives the number of free parameters for each term separately.

C. Crossfeed Constraints
We apply a number of constraints in the fit, relating the event yields between different reconstruction channels in order to make use of all available information. These constraints help to maximize our sensitivity, particularly to the B → Dτ−ν_τ signals where the dominant backgrounds are due to feed-down. There are 20 such constraints in the fit, corresponding to 20 different ways in which a true B → D/D*/D**ℓ−ν_ℓ event can be reconstructed with the wrong final-state meson, either as feed-down (D* ⇒ D and D** ⇒ D/D*) or as feed-up (D ⇒ D*/D(*)π0 and D* ⇒ D(*)π0). These constraints are implemented in the fit by requiring that the number of events of type j correctly reconstructed in the i-th channel (N_ij) is related to the number of events of type j reconstructed in a crossfeed channel i′ (N_i′j) by

\[ N_{i'j} = f_{i \to i',j}\, N_{ij}, \]

where f_{i→i′,j} is a crossfeed constraint relating the two yields. The crossfeed constraints f_{i→i′,j} are linearly related to the misreconstruction probability. For feed-down processes, in which the probability to lose a low-momentum π0 or γ is high, f_{i→i′,j} typically takes values between 0.2 and 1.0; for feed-up processes, in which the probability to reconstruct a fake π0 or γ in a narrow mass window is low, f_{i→i′,j} typically takes values between 0.01 and 0.1.

[Figure caption, partial] The MC sample is shown as points, and the projection of the fit is shown as a curve. Note the sharp peak at m^2_miss = 0 in (a), while the peak in (b) is somewhat spread out and shifted to larger values of m^2_miss because of the lost π0 or γ from D*0 decay.

The values for most of the f_{i→i′,j} terms are taken from simulation, but, in order to reduce systematic effects, the values of the dominant feed-down components, B → D*ℓ−ν_ℓ reconstructed in the Dℓ− signal channels, are left free in the fit to data.
We also use the floating values of these D * feed-down constraints to apply a small first-order correction to the corresponding signal feed-down constraints describing B → D * τ − ν τ reconstructed in the Dℓ − channels; in this way, we use the high-statistics D * ℓ − ν ℓ samples to improve our knowledge of the signal feed-down probability.
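The crossfeed constraints can be illustrated with a short sketch. It assumes the linear relation between yields described above, and it assumes that the "first-order correction" to the signal feed-down scales the simulated constraint by the data-to-simulation ratio found for the normalization mode; the source does not give that formula, and all names and numbers here are hypothetical.

```python
def crossfeed_yield(n_correct, f_cross):
    """Yield in the crossfeed channel i' for a yield N_ij in channel i."""
    return f_cross * n_correct


def corrected_signal_feed_down(f_sig_mc, f_norm_mc, f_norm_fit):
    """Assumed first-order correction: rescale the simulated signal
    feed-down constraint by the fitted/simulated ratio of the
    normalization-mode constraint."""
    return f_sig_mc * (f_norm_fit / f_norm_mc)


# Hypothetical numbers, chosen only to lie in the ranges quoted in the text.
n_dstar_correct = 1000.0  # D* l nu events reconstructed in the D* l channel
f_down_mc = 0.50          # D* => D feed-down constraint from simulation
f_down_fit = 0.55         # the same constraint when floated in the data fit

n_dstar_in_d = crossfeed_yield(n_dstar_correct, f_down_fit)
f_sig_down = corrected_signal_feed_down(0.40, f_down_mc, f_down_fit)
print(n_dstar_in_d, f_sig_down)
```

Floating `f_down_fit` in data while taking the smaller feed-up constraints from simulation mirrors the choice described in the text: the dominant background to the Dτ−ν_τ signal is constrained by the data themselves.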

VIII. SIGNAL EXTRACTION AND NORMALIZATION
The fit described in Section VII directly measures, for each signal mode, the ratio of the number of signal events in the data sample, N sig , to the number of corresponding normalization events, N norm . We measure the signal branching-fraction ratios R as where the relative efficiency ε sig /ε norm is calculated from signal MC samples as Here, the N gen are the numbers of simulated events, and the N reco are the numbers of reconstructed events, including both correctly reconstructed events and contributions from feed-up or feed-down. Crossfeed is not a large effect, however, because both the numerator and denominator in this relative efficiency receive crossfeed contributions, and the net result tends to cancel (this cancellation is not exact, since the D * momentum spectra are not identical between signal and normalization modes, but these differences are already accounted for in our normalization procedure). Signal efficiencies are given in Table V. The relative efficiencies for the two B → Dτ − ν τ modes are much larger than unity because of the q 2 cut, which is ≈ 98% efficient for signal events but rejects about 50% of the B → Dℓ − ν ℓ normalization events, as seen in Fig. 1(a). The q 2 cut has a similar, but less pronounced, effect on the D * modes, but, due to the lower efficiency for identifying secondary leptons in the signal modes, the resulting relative efficiency is close to unity.
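The two displayed equations in this paragraph did not survive in this text. From the definitions given in the surrounding sentences, the relative efficiency and the conversion from the yield ratio to R would take roughly the following form. This is a reconstruction: the proportionality sign is deliberate, since any additional constant factor in R (for example the τ− → ℓ−ν̄_ℓν_τ branching fraction) cannot be checked against the surviving text.

```latex
\begin{align}
\frac{\varepsilon_{\rm sig}}{\varepsilon_{\rm norm}}
  &= \frac{N^{\rm reco}_{\rm sig} / N^{\rm gen}_{\rm sig}}
          {N^{\rm reco}_{\rm norm} / N^{\rm gen}_{\rm norm}}, \\
R &\propto \frac{N_{\rm sig}}{N_{\rm norm}}
     \left( \frac{\varepsilon_{\rm sig}}{\varepsilon_{\rm norm}} \right)^{-1}.
\end{align}
```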
We describe the individual contributions to the systematic uncertainty below. We divide the systematics into two broad categories: additive and multiplicative. Additive systematic uncertainties are those which affect the fit yields and therefore reduce the significance of the measured signals. Multiplicative uncertainties affect the normalization of the signals and the numerical results but not the significance.

A. Additive Systematic Uncertainties
In order to estimate additive systematic uncertainties, we perform an ensemble of fits to MC event samples. For each source of uncertainty, we perform a number of tests where we modify, as appropriate, the fit shapes, crossfeed constraints, and the combinatorial background yields (all of which are fixed to MC-derived values in the nominal fit) and perform a signal fit. By doing a large number of such tests and studying the distribution of fit results in these ensembles, we are able to estimate the systematic uncertainties. In all of these ensembles, we take the RMS of the observed distribution, relative to the corresponding mean fit value, as the systematic uncertainty.
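As a sketch of how an ensemble of fit results can be reduced to a single systematic uncertainty, the snippet below computes the RMS spread about the mean and divides it by the mean. Reading "relative to the corresponding mean fit value" as this ratio is an assumption, and the toy fit results are synthetic numbers rather than outputs of the analysis.

```python
import math
import random


def relative_rms(fit_results):
    """RMS spread of the fit results about their mean, divided by the mean."""
    n = len(fit_results)
    mean = sum(fit_results) / n
    rms = math.sqrt(sum((x - mean) ** 2 for x in fit_results) / n)
    return rms / mean


# Synthetic ensemble: 500 hypothetical fitted values of a ratio R scattered
# around 0.40 with a 5% spread, standing in for refits with varied inputs.
rng = random.Random(7)
toy_fits = [rng.gauss(0.40, 0.02) for _ in range(500)]
syst = relative_rms(toy_fits)
print(f"relative systematic uncertainty: {syst:.3f}")
```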

Monte Carlo Statistics
In order to study the systematic uncertainties due to limited Monte Carlo statistics, we perform two ensembles of fits. In the first ensemble, we perform a variation of the PDF shapes. Each of the 37 PDFs is independently varied by generating new values for each of its 18 or 24 shape parameters according to the uncertainties in the PDF fit, taking into account correlations between the fitted parameters. In the second ensemble, we vary each of the feed-up and feed-down constraints according to their statistical uncertainties. Figure 15 shows distributions of fit results for the ensemble of PDF shape fits.

Combinatorial Background Modeling
Table VII summarizes the physical sources of combinatorial background considered in this analysis, including their approximate abundances in our BB MC sample after all event selection. In order to study systematic effects, we perform an ensemble of fits, reweighting events from the various combinatorial sources.
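A minimal sketch of this kind of reweighting is given below. It assumes Gaussian-distributed weights of the type described for the individual sources in this subsection: a sampled branching fraction divided by the value used in the simulation for measured modes, and a Gaussian of mean 1 for efficiency-like sources. The clamp at zero, the function names, and all numerical inputs are assumptions made for illustration.

```python
import random


def bf_weight(measured, uncertainty, generated, rng):
    """Weight for a decay mode: Gaussian-sampled branching fraction
    divided by the branching fraction used to generate the simulation."""
    sampled = max(rng.gauss(measured, uncertainty), 0.0)  # assumed clamp at zero
    return sampled / generated


def efficiency_weight(rel_width, rng):
    """Weight for efficiency-like sources: Gaussian with mean 1."""
    return max(rng.gauss(1.0, rel_width), 0.0)


rng = random.Random(12345)

# Hypothetical inputs in percent: (measured, uncertainty, generated).
w_b = bf_weight(1.0, 0.1, 0.9, rng)    # a two-body B decay
w_ds = bf_weight(6.0, 0.5, 6.2, rng)   # the subsequent D_s decay
w_event = w_b * w_ds                   # product of weights for B and D_s decays

w_crossfeed = efficiency_weight(0.10, rng)  # 10% modeling uncertainty
print(w_event, w_crossfeed)
```

Each fit in the ensemble would draw a fresh set of weights, so the spread of the fitted results reflects the combined uncertainty of all reweighted sources.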
In total, the two-body B decays B → D […]

TABLE VI: Contributions to the total systematic uncertainty. The additive systematic uncertainties represent uncertainties on the fit yield, and therefore reduce the statistical significance of the results. The multiplicative systematic uncertainties represent uncertainties on the normalization, so they affect the numerical results but not the statistical significance. The first four columns summarize errors on the individual branching-fraction ratios; the last two columns summarize errors on the B−-B0-constrained measurement. The totals here refer to errors on the branching-fraction ratios R; the errors on B(B → D(*)ℓ−ν_ℓ) (discussed in Section X) only apply to the absolute branching fractions, and are not included in the quoted total error.
Branching fractions of most of the relevant two-body B decays (and some of the three-body decays as well) have previously been measured. These branching fractions are listed in Table VIII, along with relevant branching fractions of the D_s^+ meson. To study systematic uncertainties related to combinatorial background modeling, we perform an ensemble of fits. In each fit, we reweight events in the simulation. For modes listed in Table VIII, we reweight the branching fraction, generating random weights from a Gaussian distribution based on the measured value (for decays involving a D_s^+ meson, the weight is the product of weights for both the B and D_s^+ decays). For charge-crossfeed events (true B → D(*)ℓ−ν_ℓ events where the B_tag and signal D(*) swap a charged particle), the dominant systematic uncertainty is not the branching fraction, but rather the efficiency to reconstruct the B_tag with the wrong charge. We estimate a 10% uncertainty on the modeling of this process, i.e., we generate weights for these events using a Gaussian with a mean of 1 and a width of 0.1. For double-semileptonic events, with both B mesons decaying to D(*)ℓ−ν_ℓ, again, the dominant uncertainty comes from the probability to misreconstruct a B_tag candidate in this event, and we assume a 10% uncertainty on this number as well. For events in which the signal lepton is misidentified, we assign a 10% uncertainty; the typical fake rate measured in data is 2%-3%, with data-simulation discrepancies generally 10% or less in the momentum ranges of interest. For all remaining sources of combinatorial background, including high-multiplicity […]

TABLE VIII (caption, partial): […] [31], except (†) which are taken from [32]. The last column gives the branching fraction used to generate the BABAR MC sample, where each number is shown in the same scale as the corresponding number in the second column.

[…] isospin symmetry between charged and neutral B modes.
For each test, we generate random numbers for the six exclusive modes (D, D * , and the resonant D * * states), independently for B + and B 0 decays. We then saturate the remaining inclusive b → cℓ − ν ℓ rate with the four nonresonant states, maintaining the Monte Carlo ratio of 0.1 : 0.3 : 0.2 : 0.6. Even though we are only interested in the B → D * * ℓ − ν ℓ states, we need to generate distributions of the B → D ( * ) ℓ − ν ℓ branching fractions to allow for sufficient variations in the nonresonant states which are used to saturate the total rate.
For each test, we reweight both the D * * ℓ − ν ℓ PDFs and crossfeed constraints to estimate the systematic uncertainty.
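The saturation step can be sketched as follows: once branching fractions have been drawn for the six exclusive modes, the remainder of the inclusive rate is shared among the four nonresonant states in the fixed Monte Carlo ratio 0.1 : 0.3 : 0.2 : 0.6. The branching-fraction values below are hypothetical, and raising an error when the exclusive modes exceed the inclusive rate is an assumption about how such a draw would be handled.

```python
NONRESONANT_RATIO = (0.1, 0.3, 0.2, 0.6)  # Monte Carlo ratio quoted in the text


def saturate_nonresonant(inclusive, exclusive):
    """Share (inclusive - sum(exclusive)) among the four nonresonant modes,
    keeping their relative proportions fixed."""
    remainder = inclusive - sum(exclusive)
    if remainder < 0.0:
        # Assumed handling: such a draw is rejected rather than truncated.
        raise ValueError("exclusive modes exceed the inclusive rate")
    norm = sum(NONRESONANT_RATIO)
    return [remainder * r / norm for r in NONRESONANT_RATIO]


# Hypothetical branching fractions in percent: an inclusive rate and one
# random draw for the six exclusive modes (D, D*, and four resonant D** states).
inclusive_bf = 10.5
exclusive_draw = [2.0, 5.0, 0.5, 0.8, 0.6, 0.4]
nonresonant = saturate_nonresonant(inclusive_bf, exclusive_draw)
print(nonresonant)
```

Because the nonresonant modes absorb whatever the exclusive draw leaves over, varying the D and D* branching fractions changes the D** composition even though those two modes are not themselves of interest in this study.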

B → D ( * ) Form Factors
We reweight the form factors of both signal B → D(*)τ−ν_τ and normalization B → D(*)ℓ−ν_ℓ decays. In both cases, we use the form factor parameterization of Caprini, Lellouch, and Neubert [27], with numerical parameters given in Section III. We reweight signal and […]

TABLE IX: B → X_cℓ−ν_ℓ branching fractions used in the D** modeling systematic study. The first line, cℓν, represents the inclusive semileptonic branching fraction. For the six lines representing the D, D*, and D** resonant states, the distribution of these branching fractions is taken to be Gaussian with the given mean and width. For the last four lines, representing the nonresonant D** states, the ranges of variation are not shown in this table; their distribution is determined by the inclusive rate and the other exclusive modes, as described in the text. The generated branching fractions, B_gen, are the same for charged and neutral B mesons. All numbers are in %. [Surviving column headers: Mode, B_gen; the table body is not recovered.]

Studies in the two kinematic control samples show acceptable overall agreement between data and simulation for the m^2_miss resolution [see Fig. 8(d)], but suggest that the simulation may underestimate the ratio of the number of events in the large m^2_miss tail region to the number of events near m^2_miss = 0. We estimate that this tail component of the resolution may be underestimated by up to 10%. We study systematic effects related to this by reweighting events at large m^2_miss, greater than 1 (GeV/c^2)^2, up by 10%, modifying the PDF shapes for B → Dℓ−ν_ℓ and B → D(*)ℓ−ν_ℓ. We perform a fit with these modified PDFs and take the difference from the nominal fit as a systematic uncertainty.

π 0 Efficiency and Crossfeed Constraints
While the systematic uncertainties due to detector efficiencies (described in more detail in Section IX B 3) are primarily multiplicative, the efficiencies for π 0 reconstruction have a large impact on the feed-down efficiencies and therefore the fit yields. This effect can be enhanced by the fact that the feed-down constraints are defined as the ratio of the number of events reconstructed in the Dℓ − channel to that in the D * ℓ − channel, which move in opposite directions as the π 0 efficiency is varied.
We generate an ensemble of fits by varying the π 0 efficiency within its uncertainty, 3.0% per π 0 . The resulting changes in the feed-down constraints for both signal and background modes are propagated through the signal fit to estimate the resulting systematic uncertainties.
We assign an additional systematic uncertainty on D * * ℓ − ν ℓ feed-down rates due to the fact that the π 0 mesons involved in feed-down processes typically have low momentum, while the 3.0% systematic uncertainty mentioned above is derived from a control sample with a broad spectrum. Since we float the constraints describing D * ⇒ D feed-down in the fit, D * feed-down processes are insensitive to systematic effects due to the π 0 efficiency at low momentum. The D * * feed-down constraints, however, are taken from simulation and can therefore be affected.
We compare the fitted values of the D * ⇒ D feed-down rates to the simulation to estimate that the efficiency for low-momentum π 0 mesons is correctly modeled to within 10%. We generate an ensemble of fits in which we vary the π 0 reconstruction efficiency ±10% for π 0 mesons with momentum less than 300 MeV/c. We generate new PDFs and feed-down constraints which we propagate through the signal fit to estimate the systematic uncertainties.
We vary the fraction of B → D * * τ − ν τ events in the D * * samples by generating random numbers from a Gaussian distribution with mean 1.0 and width 0.3, equivalent to a ±30% variation. For each test, we generate new PDFs and crossfeed constraints to estimate the systematic uncertainty.

Monte Carlo Statistics
The dominant multiplicative systematic uncertainty is due to limited Monte Carlo statistics. The various MC samples are independent of one another, so that there is no cancellation between the signal and normalization.

Bremsstrahlung and Final-State Radiation
Based on a control sample of identified electrons and studies in MC samples, we estimate the uncertainty on reconstruction efficiency due to Bremsstrahlung and final-state radiation effects to be 2.1%. This uncertainty applies to both signal and normalization modes, however, and so the effect on the relative efficiency is expected to cancel. The fractions of events in which a photon is radiated are nearly the same between signal and normalization modes, within statistical precision of 10%; we therefore treat the uncertainty between signal and normalization modes as 90% correlated to calculate the final systematic uncertainty.
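The effect of a 90% correlation on a ratio of efficiencies can be illustrated with standard error propagation. The sketch below assumes equal 2.1% relative uncertainties on the signal and normalization efficiencies and a simple two-variable correlated propagation; this is an illustration of the cancellation, not necessarily the exact procedure or the value quoted in the systematic table.

```python
import math


def ratio_rel_uncertainty(rel_sig, rel_norm, rho):
    """Relative uncertainty on a ratio of two quantities whose relative
    uncertainties are rel_sig and rel_norm with correlation rho."""
    variance = rel_sig ** 2 + rel_norm ** 2 - 2.0 * rho * rel_sig * rel_norm
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


uncorrelated = ratio_rel_uncertainty(2.1, 2.1, 0.0)  # no cancellation
partial = ratio_rel_uncertainty(2.1, 2.1, 0.9)       # 90% correlated
print(f"{uncorrelated:.2f}% -> {partial:.2f}%")
```

With full correlation the uncertainty on the ratio would vanish; the residual at 90% correlation is what remains after the partial cancellation.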

Detector Efficiencies
We estimate systematic uncertainties related to the detector efficiencies-track and neutral reconstruction and charged particle identification-by studying these efficiencies in several control samples in both data and simulation. We correct the MC efficiencies to match those seen in the data, and we take the statistical precision of these studies as an estimate of the systematic uncertainty on absolute efficiencies.
Since we normalize our signals to B → D ( * ) ℓ − ν ℓ , we calculate systematic uncertainties on the relative efficiency, treating uncertainties on the signal and normalization modes as correlated. The degree of correlation, and therefore, the degree to which the uncertainty cancels, is determined by the kinematics of the two samples. For most of the final state particles, the kinematic distributions are very similar between signal and normalization modes and so the systematic uncertainty cancels almost entirely. For the charged leptons, however, the momentum spectra are very different between signal and normalization (see Fig. 2), and so the associated systematic uncertainty is larger.

Hadronic Daughter Branching Fractions
We reconstruct both signal and normalization modes using the same set of final states, so uncertainties due to the branching fractions of these states very nearly cancel. (The D ( * ) momentum spectra are slightly different between signal and normalization modes, so this cancellation is not perfect.) We take the uncertainty on each of the reconstructed D * , D, K 0 S , and π 0 decay modes from [16] and propagate each of these uncertainties through to the relative efficiency, using the relative abundance of each decay chain in the signal and normalization MC samples to determine the correlation and the degree of cancellation.
Table X also gives the significances of the signal yields. The statistical significance is determined from √(2Δ(ln L)), where Δ(ln L) is the change in log-likelihood between the nominal fit and the no-signal hypothesis. The total significances are determined by including the systematic uncertainties on the fit yields in quadrature with the statistical errors. In the B−-B0-constrained fit, the signal significances are 3.6σ and 6.2σ for R(D) and R(D*), respectively.
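A short sketch of the significance calculation follows. The first function uses the usual likelihood-ratio estimate, the square root of twice the change in ln L. The second shows one common convention for folding in a systematic uncertainty, scaling the significance by the ratio of the statistical error to the quadrature sum; whether the analysis used exactly this scaling is an assumption, and the inputs are hypothetical.

```python
import math


def statistical_significance(delta_lnl):
    """Significance in units of sigma from the change in ln L between
    the nominal fit and the no-signal hypothesis."""
    if delta_lnl < 0.0:
        raise ValueError("delta(ln L) must be non-negative")
    return math.sqrt(2.0 * delta_lnl)


def total_significance(stat_significance, stat_err, syst_err):
    """Assumed convention: reduce the significance by the factor by which
    the quadrature sum of errors exceeds the statistical error alone."""
    return stat_significance * stat_err / math.hypot(stat_err, syst_err)


sig_stat = statistical_significance(8.0)          # hypothetical delta(ln L)
sig_tot = total_significance(sig_stat, 3.0, 4.0)  # hypothetical errors
print(sig_stat, sig_tot)
```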
The statistical correlation between R(D) and R(D*) is −0.51 in the B−-B0-constrained fit. This correlation is due to the fact that most of the events at large m^2_miss are either B → Dτ−ν_τ or B → D*τ−ν_τ signal events, and increasing either of the two signal yields in the fit necessarily decreases the other. The systematic uncertainties have a correlation of −0.03 between R(D) and R(D*); most of the systematic uncertainties have large negative correlations for the same reason that the statistical uncertainty does, but the combinatorial background uncertainty affects both signal yields in a coherent manner and so contributes a large positive correlation. The sum of the two branching fractions, taking all correlations into account, is B(B → D(*)τ−ν_τ) = (2.48 ± 0.28 ± 0.15 ± 0.08)%.

Figures 17 and 18 show the observed q^2 distributions in the four signal channels in the low and high m^2_miss regions, respectively. The histograms in these figures are taken from MC samples of the various components, with each component scaled to match the yield in the B−-B0-constrained fit; since q^2 is not a fit variable, we cannot show a projection of a continuous PDF as was done in Figs. 10-14. As before, we observe good agreement between the data and the expectation from simulation, in both the low and high m^2_miss regions. Since the q^2 distribution is highly dependent on the form factor model, we note that the CLN model describes both normalization and signal events within the available statistics.

Table XI summarizes the results of several crosschecks, including splitting up the sample according to lepton flavor, lepton charge, and data-taking period. We have done these checks by performing "cut-and-count" analyses, both in the data and in simulated event samples. In all cases, the results in data are consistent with our expectations from simulation.
The first row in this table shows the fraction of events with muon candidates in data and simulation, both for the full event sample and for the signal-sensitive region in m^2_miss. Electron identification is more efficient than muon ID, which is why the muon fraction in the final sample is less than 50%, and, at lower momenta (which generally correspond to larger m^2_miss), this efficiency difference is more pronounced; in both cases, however, the muon abundance is well modeled by the simulation. The next row shows the fraction of positively charged lepton candidates (versus negatively charged candidates), and all samples are consistent with the expected 50/50 split. The last row shows the fraction of events recorded during the Run 4 BABAR data-taking period; Run 4 had significantly different accelerator background conditions from Runs 1-3, which could affect missing-energy analyses. The fraction of events in the Run 4 subsample is consistent with expectations: Run 4 makes up 47% of the total luminosity.

TABLE X: Results from fits to data: the signal yield (N_sig), the yield of normalization B → D(*)ℓ−ν_ℓ events (N_norm), the relative systematic error due to the fit yields [(ΔR/R)_fit], the relative systematic error due to the efficiency ratios [(ΔR/R)_ε], the branching-fraction ratio (R), the absolute branching fraction (B), and the total and statistical signal significances (σ_tot and σ_stat). The first two errors on R and B are statistical and systematic, respectively; the third error on B represents the uncertainty on the normalization mode. The last two rows show the results of the fit with the B−-B0 constraint applied, where B is expressed for the B0. The statistical correlation between R(D) and R(D*) in this fit is −0.51. [Surviving column headers: Mode, N_sig; the table body is not recovered.]

TABLE XI: Crosscheck studies, splitting the data according to lepton flavor, lepton charge, and running period. The first row shows the fraction of events with muon candidates for both data and MC samples, for both the full event sample and for the signal-sensitive region m^2_miss > 1 (GeV/c^2)^2. The second row shows fractions of events with positively charged lepton candidates, and the third row shows the fractions of events recorded in Run 4. In all cases, the data are consistent with the simulation and with expectations.
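The combined branching fraction quoted earlier in this section can be cross-checked from the two individual results given in the abstract and the quoted correlations. The sketch below assumes that the correlations stated for R(D) and R(D*) carry over unchanged to the absolute branching fractions and that the two normalization-mode uncertainties are uncorrelated; because the inputs are rounded, agreement is expected only at the level of the last quoted digit.

```python
import math


def combined_error(err_a, err_b, rho):
    """Uncertainty on the sum a + b for errors err_a, err_b with correlation rho."""
    return math.sqrt(err_a ** 2 + err_b ** 2 + 2.0 * rho * err_a * err_b)


# Branching fractions in percent, from the abstract:
# B(D tau nu) = 0.86 +/- 0.24 +/- 0.11 +/- 0.06
# B(D* tau nu) = 1.62 +/- 0.31 +/- 0.10 +/- 0.05
total = 0.86 + 1.62
stat = combined_error(0.24, 0.31, -0.51)  # statistical correlation from the fit
syst = combined_error(0.11, 0.10, -0.03)  # systematic correlation
norm = combined_error(0.06, 0.05, 0.0)    # assumed uncorrelated

print(f"({total:.2f} +/- {stat:.2f} +/- {syst:.2f} +/- {norm:.2f})%")
```

The negative statistical correlation is what makes the uncertainty on the sum smaller than a naive quadrature sum of 0.24 and 0.31 would give.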
We estimate the goodness of fit using an ensemble of simulated experiments. We generate 1000 event samples, using the nominal PDFs for the fit to data and event yields based on the B − -B 0 -constrained fit to data. We fit each of these samples both with and without the B − -B 0 constraints and study the distribution of − log L in these fits. Figure 19 shows the distribution of − log L for the two ensembles of fits. In both cases, the value of − log L obtained in the fit to data is indicated with an arrow, and, in both cases, this value is found within the central part of the Monte Carlo distribution, indicating a good fit. In the unconstrained fit, 11.7% of the simulated experiments have a value of − log L greater than the value observed in data, corresponding to the probability that we expect to observe a fit as bad, or worse, than the one actually observed. This probability is large, indicating an acceptable goodness of fit. The corresponding probability for the B − -B 0 constrained fit is 11.8%, also large.
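The goodness-of-fit probability described above is simply the fraction of simulated experiments whose −log L exceeds the value found in data. A minimal sketch, using a synthetic ensemble in place of real toy fits (the Gaussian shape, its parameters, and the "observed" value are all hypothetical):

```python
import random


def toy_p_value(toy_nll, observed_nll):
    """Fraction of simulated experiments with -log L greater than observed."""
    if not toy_nll:
        raise ValueError("need at least one simulated experiment")
    n_worse = sum(1 for value in toy_nll if value > observed_nll)
    return n_worse / len(toy_nll)


# Synthetic stand-in for 1000 simulated experiments.
rng = random.Random(1)
toys = [rng.gauss(1000.0, 30.0) for _ in range(1000)]
p_value = toy_p_value(toys, 1035.0)
print(f"p = {p_value:.3f}")
```

With 1000 simulated samples, the quoted probabilities of 11.7% and 11.8% correspond to 117 and 118 samples, respectively, lying above the data value.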
From these branching-fraction ratios and known branching fractions of the normalization modes B → D(*)ℓ−ν_ℓ, we derive the absolute branching fractions B(B → Dτ−ν_τ) = (0.86 ± 0.24 ± 0.11 ± 0.06)% and B(B → D*τ−ν_τ) = (1.62 ± 0.31 ± 0.10 ± 0.05)%, where the third uncertainty is from the normalization mode. R(D) and R(D*) are about 1σ higher than the SM predictions but, given the uncertainties, there is still room for a sizeable non-SM contribution.
We have also presented distributions of the lepton momentum |p * ℓ | and the squared momentum transfer q 2 for B → D ( * ) τ − ν τ events. In all cases, these distributions are consistent with expectations based on the SM and the CLN form factor model with measured form factors.
We are grateful for the extraordinary contributions of our PEP-II colleagues in achieving the excellent luminosity and machine conditions that have made this work possible. The success of this project also relies critically on the expertise and dedication of the computing organizations that support BABAR. The collaborating institutions wish to thank SLAC for its support and the kind hospitality extended to them. This work is supported by the US Department of Energy and National Science Foundation, the Natural Sciences and Engineering Research Council (Canada), the Commissariat à l'Energie Atomique and Institut National de Physique Nucléaire et de Physique des Particules (France), the Bundesministerium für Bildung und Forschung and Deutsche Forschungsgemeinschaft (Germany), the Istituto Nazionale di Fisica