A Search for the Rare Decays B -->pi l+l- and B0 -->eta l+l-

We present the results of a search for the rare flavor-changing neutral-current decays B -->pi l+l- (pi=pi+/-, pi0 and l = e,mu) and B0 -->eta l+l- using a sample of e+e- -->Y(4S) BB decays decays corresponding to 428 fb^-1 of integrated luminosity collected by the BABAR detector. No significant signal is observed, and we set an upper limit on the isospin and lepton-flavor averaged branching fraction of BF(B -->pi l+l-)<7.0 x 10^-8 and a lepton-flavor averaged upper limit of BF(B0 -->eta l+l-)<9.2 x 10^-8, both at the 90% confidence level. We also report 90% confidence level branching fraction upper limits for the individual modes B+ -->pi+ e+e-, B0 -->pi0 e+e-, B+ -->pi+ mu+mu-, B0 -->pi0 mu+mu-, B0 -->eta e+e-, and B0 -->eta mu+mu-.


I. INTRODUCTION
In the standard model (SM), the decays B → πℓ + ℓ − (π = π ± , π 0 and ℓ = e, µ) and B 0 → ηℓ + ℓ − proceed through the quark-level flavor-changing neutral current (FCNC) process b → dℓ + ℓ − . Since all FCNC processes are forbidden at tree level in the SM, the lowest order diagrams representing these transitions must involve loops. For b → dℓ + ℓ − , these are the electroweak semileptonic penguin diagrams ( Fig. 1(a)) and the W + W − box diagrams ( Fig. 1(b)). The b → dℓ + ℓ − transition is similar to b → sℓ + ℓ − , but its rate is suppressed by the ratio |V td /V ts | 2 ≈ 0.04 where V td and V ts are elements of the Cabibbo-Kobayashi-Maskawa quark mixing matrix [1,2]. The predicted branching fractions for the B + → π + ℓ + ℓ − decay modes lie in the range of (1.4 -3.3) × 10 −8 , when the dilepton mass regions near the J/ψ and ψ(2S) are excluded in order to remove decays that proceed through the intermediate charmonium resonances. The largest source of uncertainty in these predictions arises from * Now at the University of Tabuk, Tabuk 71491, Saudi Arabia † Also with Università di Perugia, Dipartimento di Fisica, Perugia, Italy ‡ Now at the University of Huddersfield, Huddersfield HD1 3DH, UK § Deceased ¶ Now at University of South Alabama, Mobile, Alabama 36688, USA * * Also with Università di Sassari, Sassari, Italy knowledge of the B → π form factors [3][4][5]. These branching fractions imply that 5-15 events occur for each B → πℓ + ℓ − decay channel in the BABAR data set (471 million BB pairs). The predicted B 0 → ηℓ + ℓ − branching fractions lie in the range (2.5 -3.7) × 10 −8 where again the dominant uncertainty is due to lack of knowledge of the B 0 → η form factors [6].
Many extensions of the SM predict the existence of new, heavy particles which couple to the SM fermions and bosons. The b → dℓ + ℓ − and b → sℓ + ℓ − decays provide a promising avenue in which to search for New Physics (NP). Amplitudes from these NP contributions can interfere with those from the SM, altering physical observables (e.g., decay rates, CP , isospin, and forward-backward asymmetries) from the SM predictions [3,4,[7][8][9][10]. Measurements in the πℓ + ℓ − and ηℓ + ℓ − systems complement and provide independent searches of NP from those in the K ( * ) ℓ + ℓ − channels [11][12][13][14][15][16][17], as physics beyond the SM may have non-trivial flavor couplings [18]. The measurement of observables as a function of the square of the invariant dilepton mass q 2 = m 2 ℓℓ for exclusive b → dℓ + ℓ − decay modes allows for more thorough tests of SM predictions and deeper probes for NP but is currently not possible due to the size of the data set.
Only one b → dℓ + ℓ − decay has been observed to date, with LHCb measuring the B + → π + µ + µ − branching fraction to be (2.4±0.6±0.2)×10 −8 [19]. Both BABAR [20] and Belle [21] have performed searches for B → πℓ + ℓ − decays, but have observed no significant signal. For the πℓ + ℓ − modes, the smallest upper limits from the B factories lie within an order of magnitude of the SM pre- dictions [3][4][5] and are beginning to exclude portions of the NP parameter space. No previous searches for B 0 → ηℓ + ℓ − have been reported. Observation of b → dℓ + ℓ − decays at the B factories is currently limited by the size of the available data sets. Additionally, for B + → π + ℓ + ℓ − , background from B + → K + ℓ + ℓ − decays where the kaon is misidentified as a pion must be treated carefully as K + ℓ + ℓ − can appear very signal-like and occurs at a rate approximately 25 times the expected B + → π + ℓ + ℓ − rate.
In this article we report on our study of the B → πℓ + ℓ − and B 0 → ηℓ + ℓ − decays using the full BABAR data set, presenting branching fraction upper limits for the modes B + → π + e + e − , B 0 → π 0 e + e − , B + → π + µ + µ − , B 0 → π 0 µ + µ − , B 0 → ηe + e − , and B 0 → ηµ + µ − . Charge conjugation is implied throughout unless specified otherwise. We also present upper limits for the lepton-flavor averaged modes B + → π + ℓ + ℓ − , B 0 → π 0 ℓ + ℓ − , and B 0 → ηℓ + ℓ − , where we constrain the e + e − and µ + µ − branching fractions to be equal; for the isospin averaged modes B → πe + e − and B → πµ + µ − , where the B + → π + ℓ + ℓ − decay rate is constrained to be twice the B 0 → π 0 ℓ + ℓ − decay rate; and for the isospin and lepton-flavor averaged mode B → πℓ + ℓ − . For the lepton-flavor averaged measurements, we neglect differences in available phase space due to the difference between the electron and muon masses. The branching fractions are based on signal yields that are extracted through an unbinned, extended maximum likelihood fit to two kinematic variables. All selection criteria are determined before the fit was performed on data, i.e., the analysis is performed "blind". The results of this analysis are based upon a sample of e + e − → Υ (4S) → BB interactions provided by the PEP-II asymmetric-energy storage rings and collected by the BABAR detector located at SLAC National Accelerator Laboratory. The BABAR data sample corresponds to an integrated luminosity of 428 fb −1 containing 471 mil-lion BB decays. This is the full data set collected at the Υ (4S) resonance. A detailed description of the BABAR detector can be found elsewhere [22]. Charged particle momenta are measured with a five-layer, double-sided silicon vertex tracker and a 40-layer drift chamber operated in proportional mode. These two tracking systems are immersed in the 1.5 T magnetic field of a superconducting solenoid. A ring-imaging Cherenkov detector with fused silica radiators, aided by ionization loss dE/dx measurements from the tracking system, provides identification of charged particles. Electromagnetic showers from electrons and photons are detected with an electromagnetic calorimeter (EMC) constructed from a finely segmented array of thallium-doped CsI scintillating crystals. The steel flux return of the solenoid (IFR) was initially instrumented with resistive plate chambers (RPCs) and functions primarily to identify muons. For the later data taking periods, the RPCs of the IFR were replaced with limited streamer tubes and brass to increase absorption.
The BABAR Monte Carlo (MC) simulation utilizes the Geant4 package [23] for detector simulation, and EvtGen [24] and Jetset7.4 [25] for BB and e + e − → qq (q = u, d, s, c) decays, respectively. The BB and continuum (e + e − → qq, q = u, d, s, c) MC samples correspond to an effective luminosity of about ten times the data sample collected at the Υ (4S) resonance. Simulated B → πℓ + ℓ − signal decay samples are generated according to the form-factor model of Ref. [26], with the Wilson coefficients taken from Refs. [27][28][29], and the decay amplitudes calculated in Ref. [10]. The B 0 → ηℓ + ℓ − signal MC samples utilize the same kinematics, Wilson coefficients, and form-factor model as the πℓ + ℓ − modes. The effects of the choice of form-factor model and the values of the Wilson coefficients are considered as sources of systematic uncertainty in the signal efficiency. We also make use of simulated B → K ( * ) ℓ + ℓ − , B → J/ψ X, and B → ψ(2S)X samples. Signal efficiencies, as well as parameters of the fit model, are determined from signal and K ( * ) ℓ + ℓ − MC data sets. The B → J/ψ X and B → ψ(2S)X MC samples allow us to study background from these decays and also serve as the data sets from which we fix the parameters of the fit model used in the B → J/ψ π/η fit validation, as described later.

III. EVENT RECONSTRUCTION AND CANDIDATE SELECTION
Event reconstruction begins by building dilepton candidates from two leptons (e + e − or µ + µ − ). Leptons are selected as charged tracks with momenta in the laboratory reference (Lab) frame greater than 300 MeV/c. Loose particle identification (PID) requirements are placed upon the two leptons. More stringent PID requirements are applied later, and the optimization of these selection criteria is discussed in Section V. The lepton pair is fit to a common vertex to form a dilepton candidate with the requirement that m ℓℓ < 5.0 GeV/c 2 [30]. We also place a loose constraint on the χ 2 probability of the vertex fit by requiring it to be greater than 10 −10 . For electrons, we apply an algorithm which associates photons with electron candidates in an attempt to recover energy lost through bremsstrahlung, allowing at most one photon to be associated with each electron. The photon trajectory is required to lie within a small cone of opening angle 0.035 rad about the initial momentum vector of the electron, and the photon energy in the Lab frame must be greater than 30 MeV. Additionally, we suppress background from photon conversions by requiring that the invariant mass of the electron (or positron) paired with any other oppositely charged track in the event be greater than 30 MeV/c 2 .
Charged pion candidates are charged tracks passing pion PID requirements which retain approximately 90-95% of charged pions and only 2-5% of charged kaons. We reconstruct π 0 candidates from two photons with invariant diphoton mass m γγ lying in the range 115 < m γγ < 150 MeV/c 2 . A minimum value of 50 MeV is required for the Lab energy of each photon. Photons are detected as EMC clusters not associated with a charged track. The clusters are also required to have a lateral shower profile consistent with originating from a photon. We reconstruct η as η → γγ (η γγ ) and η → π + π − π 0 (η 3π ), which constitute 39.3% and 22.7% of the η branching fraction, respectively. As in the case of the π 0 , we require the η γγ photon daughters to have energy greater than 50 MeV in the Lab frame. Additionally, the photon energy asymmetry A γ = |E 1,γ − E 2,γ |/(E 1,γ + E 2,γ ) must be less than 0.8, where E 1,γ and E 2,γ are the energies of the photons in the Lab frame. The invariant diphoton mass must lie in the range 500 < m γγ < 575 MeV/c 2 . For η 3π , the pion candidates are fit to a common vertex to form an η candidate. In the fit the η candidate mass is constrained to the nominal η mass, while the invariant three-pion mass is required to lie in the range 535 < m 3π < 565 MeV/c 2 .
B candidates are reconstructed from a hadron candidate (π + , π 0 , η γγ , or η 3π ) and a dilepton candidate (e + e − or µ + µ − ). The hadron and dilepton candidates are fit to a common vertex, and the entire decay chain is refit. We make use of two kinematic, Lorentz-invariant quantities, m ES and ∆E, defined as where √ s = 2E * beam is the total energy of the e + e − system in the center of mass (CM) frame, q B and q 0 = (E 0 , p 0 ) are the four-vectors representing the momentum of the B candidate and of the e + e − system, respectively, and p B is the three-momentum of the B candidate. In the CM frame, these expressions simplify to where the asterisk indicates evaluation in the CM frame. These variables make use of precisely measured beam quantities. All B candidates are required to have m ES > 5.1 GeV/c 2 and −300 < ∆E < 250 MeV. The distributions of these two variables are later fit to extract the πℓ + ℓ − and ηℓ + ℓ − branching fractions. A large background is present from B → J/ψ X and B → ψ(2S)X decays where J/ψ and ψ(2S) decay to ℓ + ℓ − . Here X represents a hadronic state, typically π, η, ρ, or K ( * ) . These events are removed from our data sample by rejecting any event with a value of m ℓℓ consistent with originating from a J/ψ or ψ(2S) decay. The rejected J/ψ events are useful as they provide a control sample which can be used to test the fit model. We also use these samples to estimate systematic uncertainties and to correct for differences between data and MC selection efficiencies. For the electron modes we reject events in the following regions about the J/ψ mass: 2.90 < m ℓℓ < 3.20 GeV/c 2 , or m ee < 2.90 GeV/c 2 and ∆E < m ee c 2 −2.875 GeV. For the muon modes the region is 3.00 < m µµ < 3.20 GeV/c 2 , or m µµ < 3.00 GeV/c 2 and ∆E < 1.11m µµ c 2 − 3.31 GeV. The rejection region about the ψ(2S) mass is the same for electrons and muons: 3.60 < m ℓℓ < 3.75 GeV/c 2 , or m ℓℓ < 3.60 GeV/c 2 and ∆E < m ℓℓ c 2 − 3.525 GeV. Introducing ∆E dependence into the region boundaries allows us to account for some of the effects of track mismeasurement and energy lost due to bremsstrahlung.
The largest source of background comes from random combinations of particles from continuum events or semileptonic B and D decays in BB events. Continuum events tend to be jet-like as the qq pair is produced back-to-back with relatively large momentum in the CM frame. The topology of BB decays is more isotropic as the B mesons are produced nearly at rest in the Υ (4S) rest frame. Semileptonic decays are characterized by the presence of a neutrino, e.g., missing energy in the event and non-zero total transverse momentum of the event. Due to the differences in these two background types we train separate artificial neutral networks (NNs) to reject each of them. By selecting inputs to the NNs which are independent of the final state we are able to train only one NN for each lepton flavor. We do not train separate NNs for π + , π 0 , and η. This increases the size of the training samples, improving the performance of the NNs. We train four NNs: one to reject BB background in thee + e − modes, one to reject BB background in the µ + µ − modes, one to reject continuum background in the e + e − modes, and one to reject continuum background in the µ + µ − modes.
The signal training samples were assembled from equal portions of correctly reconstructed The size of the training samples, particularly the background training samples, that could be formed from events reconstructed as one of our signal modes was a limiting factor in the performance of the NNs. To increase the available statistics the other events from the ρℓ + ℓ − and ωℓ + ℓ − modes were added to the training samples. The ρ + (ρ 0 ) was reconstructed as π + π 0 (π + π − ) with two-pion invariant mass m ππ in the range 0.455 < m ππ < 1.095 GeV/c 2 (0.475 < m ππ < 1.075 GeV/c 2 ). The ω was reconstructed as π + π − π 0 and required to have a three-pion invaraint mass lying within 50 MeV/c 2 of the nominal ω mass. No η 3π ℓ + ℓ − events were used in the training due to very low statistics for the background BB and continuum samples for these modes. For background we combined the MC data sets, either BB or continuum depending upon the classifier to be trained, from the six modes listed above and randomly select events from this data set to form the training sample. The performances of the NNs trained with samples from several different b → dℓ + ℓ − (global NNs) modes were compared with NNs trained specifically for each of our four B → πℓ + ℓ − modes (single mode NNs). The background rejection of the global NNs at a fixed signal efficiency was found to be similar to that of the single mode NNs.
The input variables to the continuum NNs are related mostly to event topology and include the ratios of Fox-Wolfram moments [31]; the cosine of the polar angle of the thrust axis [32] of the event; the cosine of the polar angle of the thrust axis of the rest-of-the-event (ROE), which consists of all particles in the event not associated with the signal B candidate; the momentum weighted polynomials L j i [33] computed using tracks and EMC clusters; the cosine of the polar angle of the B candidate momentum; and the χ 2 probability of the B candidate vertex fit. The input variables to the BB NNs reflect the effort to reject background from semileptonic B and D decays and include m ES and ∆E constructed from the ROE; total momentum of the event transverse to the beam; missing energy in the event; momentum of the ROE transverse to the beam direction; momentum of the ROE transverse to the thrust axis of the event; cosine of the polar angle of the B candidate momentum; and χ 2 probability of the B candidate and dilepton candidate vertex fits. The NN outputs show only weak correlation with the fit variables m ES and ∆E. works, shows the output of the e + e − BB NN for a sample of signal and BB background π + e + e − MC events. Also shown is the output of the µ + µ − continuum NN for a sample of signal and continuum background π 0 µ + µ − MC events. Requirements on the NN outputs are optimized for each of our eight modes to produce the lowest branching fraction upper limit. A description of the optimization procedure is given in Section V.
Due to their similarity to signal, B → K ( * ) ℓ + ℓ − decays constitute a background that mimics signal by peaking in either one or both m ES and ∆E. The b → sℓ + ℓ − transition occurs at a rate approximately 25 times greater than the SM b → dℓ + ℓ − rate, and due to particle misidentification and event misreconstruction, its contribution is expected to be of the same order as the πℓ + ℓ − signal in the BABAR data sample. In the charged pion modes, B + → K + ℓ + ℓ − peaks in m ES as π + ℓ + ℓ − signal but in ∆E near −70 MeV due to the misidentification of the kaon as a pion. There are also contributions from where one of the pions from the K 0 S decay is missed, and from B → K * (→ K + π)ℓ + ℓ − , where the pion from the K * decay is missed. In the case of K 0 S ℓ + ℓ − , the remaining pion and the two leptons are reconstructed as π + ℓ + ℓ − . For K * (→ K + π)ℓ + ℓ − , the K + is misidentified as π + and reconstructed with the two leptons as π + ℓ + ℓ − . In both cases, the decays peak in m ES like signal but at ∆E < −140 MeV due to the missing pion. For K * ℓ + ℓ − the ∆E peak occurs at even lower values due to the kaon misidentification. For B 0 → π 0 ℓ + ℓ − , there is a similar background from B 0 → K 0 S (→ π 0 π 0 )ℓ + ℓ − decays where one π 0 is reconstructed along with the lepton pair as π 0 ℓ + ℓ − . These events produce a peak in m ES in the same location as π 0 ℓ + ℓ − signal, but peak at smaller values of ∆E due to the missing pion from the decay. In all three cases ( we include a separate component in the fit model to account for the corresponding contribution. For πe + e − and η γγ e + e − , there is an additional background that originates from two-photon events, given by the process e + e − → e + e − γγ → (e + e − )qq where q is a u, , . or s quark. The background is characterized by a small transverse momentum of the pion and a large leptonlepton opening angle θ ℓ + ℓ − . There is also a correlation between the polar angles of the electron, θ e − , and of the positron, θ e + . The e − tends to be in the forward direction while the e + tends to be in the backward direction, consistent with the e + e − beam particles scattering into the detector. Events of this type are rejected using the following requirements. For π + e + e − , π 0 e + e − , and η γγ e + e − we require p * had > 750 MeV/c and N trk > 4 where p * had is the hadron momentum in the CM frame and N trk is the number of charged tracks in the event. Additionally, for π + e + e − we require E 1,neut < 1.75 GeV, cos θ ℓ + ℓ − > −0.95, and θ e − > (0.57 θ e + − 0.7 rad) where E 1,neut is the energy of the highest energy neutral cluster in the event in the Lab frame. Similarly, π 0 e + e − candidates must satisfy θ e − > (0.64 θ e + − 0.8 rad), and η γγ e + e − candidates are required to have θ e − > (0.6 θ e + − 0.55 rad) and cos θ ℓ + ℓ − > −0.95. These criteria were determined by maximizing the quantity ε/ √ N SB , where ε is the signal efficiency and N SB is the number of events lying in the sideband region 5.225 < m ES < 5.26 GeV/c 2 in data. We assume that the two-photon background in the m ES sideband occurs similarly to the two-photon background in the region m ES > 5.26 GeV/c 2 . The optimization was carried out with all other selection criteria applied, including those on the NN outputs.
To guard against possible background from B → Dπ and B → Dη decays where D → Kπ, ππ, or ηπ and the kaon or pions are misidentified as leptons, we assign the lepton candidates either a kaon or pion mass and discard any event with a combination of µ + µ − , µ ± π, or µ ± η with invariant mass in the range (1.83-1.89) GeV/c 2 . The probability of misidentifying a hadron as an electron is negligible, and this requirement is therefore only applied to the µ + µ − modes.
Hadronic decays such as B + → π + π − π + , where two pions are misidentified as muons, peak in both m ES and ∆E similarly to signal due to the relatively small difference between the pion and muon masses. This hadronic peaking background is modeled by a component in the fit. A dedicated data control sample is used to determine its normalization and shape. This sample is constructed from events where one lepton candidate passes the muon identification requirements but the other does not. The events in these samples are weighted with particle misidentification probabilities determined from control samples in BABAR data. Studies of MC samples indicate that this background is consequential only for the πµ + µ − modes.
After applying all selection criteria there are sometimes multiple candidates within a given mode remaining in an event. This occurs for approximately 20-25% (35-40%) of π + e + e − and π 0 e + e − (η γγ e + e − and η 3π e + e − ) candidates, and 5-10% (25-30%) of π + µ + µ − and π 0 µ + µ − (η γγ µ + µ − and η 3π µ + µ − ) candidates. There tend to be more events containing multiple candidates in the e + e − modes due to the bremsstrahlung recovery. For instance, there may be multiple candidates arising from the same π + e + e − combination where the bremsstrahlung photons associated with the e + or e − are different.
To choose the best candidate we construct a ratio L R from the BB and continuum NN classifier output distributions of the signal and background samples. The ratio L R is defined as is the probability that a signal candidate has a BB (continuum) NN output value of x (y). The quantities P bkg BB (x) and P bkg cont (y) are defined analogously for background events. Signal-like candidates have values of L R near 1 while more backgroundlike candidates have values near 0. If multiple candidates are present in an event, we choose the candidate with the greatest value of L R as the best candidate. For events containing multiple candidates, this procedure chooses the correct candidate approximately 90-95% of the time for πℓ + ℓ − and 75-80% of the time for ηℓ + ℓ − . The ratio L R is used only to select a best candidate.

IV. BRANCHING FRACTION MEASUREMENT AND UPPER LIMIT CALCULATION
Branching fractions are extracted through an unbinned extended maximum likelihood fit to m ES and ∆E with the fit region defined as m ES > 5.225 GeV/c 2 and −300 < ∆E < 250 MeV. The probability density functions (PDFs) in the fit model contain several components corresponding to the different contributions in the data set. To model the various components, we use a combination of products of one-dimensional parametric PDFs, two-dimensional histograms, and two-dimensional non-parametric shapes determined by a Gaussian kernel density estimation algorithm (KEYS PDF) [34]. For components that are described by the product of onedimensional PDFs, we are allowed to use such a model because m ES and ∆E are uncorrelated for these components.
A. B + → π + ℓ + ℓ − The π + ℓ + ℓ − fit model involves four components: signal, K + ℓ + ℓ − background, K 0 S /K * ℓ + ℓ − background, and combinatoric background. There is an additional component in B + → π + µ + µ − representing the B + → π + π + π − hadronic peaking background. The K + ℓ + ℓ − background arises from decays where the kaon is misidentified as a pion. The K + misidentification rate is such that the K + ℓ + ℓ − background in π + ℓ + ℓ − is approximately the same size as the expected SM π + ℓ + ℓ − signal. Since the K + misidentification probability is well measured, it is possible to measure this background contribution directly from our data. This is done by simultaneously fitting two data samples, comprised by the B + → π + ℓ + ℓ − candidates and the B + → K + ℓ + ℓ − candidates in our data set. The K + misidentification background to B + → K + ℓ + ℓ − is included in the fit at a level fixed to the B + → K + ℓ + ℓ − yield using the known misidentification probability (which depends on the momentum of the kaon). The B + → K + ℓ + ℓ − branching fraction that is measured from the simultaneous fit of the B + → π + ℓ + ℓ − and B + → K + ℓ + ℓ − data samples provides an additional validation of our procedure, since this branching fraction has been previously measured [37].
The π + ℓ + ℓ − and K + ℓ + ℓ − background m ES and ∆E distributions are modeled by products of one-dimensional PDFs. The π + ℓ + ℓ − signal and K + ℓ + ℓ − background m ES distributions are described by a Crystal Ball function [35]. The π + e + e − ∆E signal distribution is modeled by the sum of a Crystal Ball function and a Gaussian which share a common mean, while the π + µ + µ − signal and both the K + e + e − and K + µ + µ − ∆E distributions are modeled by a modified Gaussian with tail parameters whose functional form is given by where σ L and α L (σ R and α R ) are the width and tail parameters used when ∆E < µ (∆E > µ), respectively. A two-dimensional histogram models the contribution from B → K 0 S /K * ℓ + ℓ − decays. Combinatoric background is described by the product of an ARGUS function [36] in m ES with endpoint fixed to 5.29 GeV/c 2 and a secondorder polynomial in ∆E. The π + µ + µ − hadronic peaking background component is modeled by a two-dimensional KEYS PDF [34].
The PDF fit to the K + ℓ + ℓ − sample contains a similar set of components. Signal K + ℓ + ℓ − distributions are modeled by the product of a Crystal Ball function in m ES and the line shape of Eq. 6 in ∆E. The contribution from other b → sℓ + ℓ − decays is dominated by B → K * (K + π)ℓ + ℓ − where the pion is lost. We use a two-dimensional histogram to model this background. Combinatoric background is modeled by the product of an ARGUS distribution in m ES , and by an exponential function for K + e + e − and a second-order polynomial for K + µ + µ − in ∆E. A KEYS PDF models the hadronic peaking background in K + µ + µ − .
In both the π + ℓ + ℓ − and K + ℓ + ℓ − PDFs, the signal and combinatoric background yields float along with the shapes of the combinatoric background PDFs. The K + ℓ + ℓ − background yield in the π + ℓ + ℓ − sample is constrained so that the B + → K + ℓ + ℓ − branching fractions measured in the π + ℓ + ℓ − and K + ℓ + ℓ − samples are equal. All fixed shapes and yields are determined from exclusive MC samples except for the hadronic peaking background which uses a data control sample. Normalizations of the K 0 S /K * ℓ + ℓ − component of the π + ℓ + ℓ − PDF and of K * ℓ + ℓ − component in the K + ℓ + ℓ − PDF are fixed from efficiencies determined from MC samples and world average branching fractions [37].
The B 0 → π 0 ℓ + ℓ − signal distribution is modeled by the product of a Crystal Ball function in m ES and by the line shape given in Eq. 6 in ∆E. Background from B 0 → K 0 S (→ π 0 π 0 )ℓ + ℓ − decays is modeled by a two-dimensional histogram. The product of an ARGUS shape in m ES with an exponential function in ∆E models the combinatoric background distribution. As in the π + µ + µ − and K + µ + µ − PDFs, there is an additional component in the π 0 µ + µ − fit model devoted to hadronic peaking background which is described by a KEYS PDF.
In the fit, only the signal π 0 ℓ + ℓ − and combinatoric background yields along with the ARGUS slope parameter and argument of the exponential float. The signal and K 0 S (→ π 0 π 0 )ℓ + ℓ − shapes are determined from fits to MC samples, and the K 0 S (→ π 0 π 0 )ℓ + ℓ − normalization comes from efficiencies taken from MC samples and world average branching fractions [37]. The shape and normalization of the peaking hadronic component are determined from a data control sample.
The ηℓ + ℓ − fit model is simple, consisting of only three components, and is the same for all four ηℓ + ℓ − channels. The signal component is modeled by the product of a Crystal Ball function in m ES and the line shape of Eq. 4 in ∆E. We include a component for events containing a signal decay where the signal B is incorrectly reconstructed, which we refer to as self-cross-feed. In these events the signal decay is typically reconstructed as a combination of particles from the B decaying to our signal mode and the other B. In most self-cross-feed events the dilepton pair is correctly reconstructed and the hadron is misreconstructed. The self-cross-feed contribution is represented by a two-dimensional histogram and its normalization is a fixed fraction of the signal yield with the fraction determined from signal MC. The self-cross-feed-to-signal ratio varies from 0.1-0.15 for the η γγ channels to 0.25-0.3 for the η 3π channels. Combinatoric background is described by the product of an ARGUS function in m ES and an exponential function in ∆E. From studies of MC samples, we find no indication of potential peaking background contributions from b → sℓ + ℓ − decays or any other sources. The η γγ ℓ + ℓ − yield and the η 3π ℓ + ℓ − yield are constrained in the fit to be consistent with the same B 0 → ηℓ + ℓ − branching fraction. The signal yield, combinatoric background yield, ARGUS slope and exponential argument float in the fit. All other parameters are fixed from MC samples.

D. Lepton-flavor averaged and isospin averaged fits
In addition to branching fraction measurements and upper limits for the B → πℓ + ℓ − and B 0 → ηℓ + ℓ − modes we also present lepton-flavor averaged, isospin averaged, and lepton-flavor and isospin averaged results. The lepton-flavor averaged measurement of B(B + → π + ℓ + ℓ − ) is the branching fraction obtained from a simultaneous fit to the π + e + e − and π + µ + µ − samples subject to the constraint B(B + → π + e + e − ) = B(B + → π + µ + µ − ). Here we have neglected the difference between the electron and muon masses. The measurements of B(B 0 → π 0 ℓ + ℓ − ) and B(B 0 → ηℓ + ℓ − ) are subject to a similar set of constraints and are determined in an analogous way. The isospin averaged branching fraction B(B → πe + e − ) is the measured value of B(B + → π + e + e − ) after simultaneously fitting the π + e + e − and π 0 e + e − samples subject to the constraint B(B + → π + e + e − ) = (τ B 0 /2τ B + )B(B 0 → π 0 e + e − ) where τ B 0 and τ B + are the mean lifetimes of the neutral and charged B mesons, respectively [37]. An analogous expression is applied for the B(B → πµ + µ − ) measurement. The lepton-flavor and isospin averaged measurement of B(B → πℓ + ℓ − ) is the value of B(B + → π + ℓ + ℓ − ) determined from a simultaneous fit to all four samples subject to both the lepton flavor and isospin constraints listed above.

E. Upper limit calculation
We set upper limits on the branching fractions following a method which utilizes the profile likelihood. Upper limits at the α confidence level (CL) are set by scanning the profile likelihood λ as a function of the signal branching fraction to determine where −2 ln λ changes by α percentile of a χ 2 random variable with one degree of freedom. For α = 0.9 we look for a change in −2 ln λ of 1.642. If the measured branching fraction is negative, we begin our scan from zero rather than the minimum [38]. This is a conservative approach that always produces physical, i.e., non-negative, upper limits, even in the case of low statistics. Systematic uncertainties are incorporated into the limit by convolving the profile likelihood with a Gaussian distribution whose width is equal to the total systematic uncertainty.

V. OPTIMIZATION OF SELECTION
We simultaneously optimize the selection criteria for the two NN outputs and the PID selection criteria for the charged pions and leptons. BABAR employs algorithms which use outputs from one or more multivariate classifiers to identify charged particle species. A few (3)(4)(5)(6) standard selections on the outputs of these algorithms are used to identify particles of a given species with different efficiencies. Greater identification efficiencies typically imply greater misidentification rates. Due to this trade-off, it is not clear a priori which selection is best for a particular analysis. Therefore for each charged particle type (e − , µ − , π + ) we optimize the PID requirements for the leptons and pions along with the NN output criteria.
For the optimization we assume that B → πℓ + ℓ − and B 0 → ηℓ + ℓ − occur near the center of the branching fraction ranges expected in the SM. Under this assumption, no statistically significant signal is expected, and the selection is optimized to produce the smallest branching fraction upper limit. We divide the BB and continuum NN output space into a grid and generate 2,500 parametrically simulated data sets per grid point according to our fit model. Each simulated data set is fit, and a branching fraction upper limit is calculated. The figure of merit (FOM) for each point is the average branching fraction upper limit determined from the 2,500 data sets, and we take the combination of PID and NN output selection producing the smallest FOM as optimal.
The results of the optimization show that the upper limits are rather insensitive to the PID selection. Also, in the two-dimensional NN output space, there is a region about the optimal selection where the FOM changes slowly, giving confidence that our optimization procedure is robust because the expected limits do not depend critically on the NN selection requirements.
The e + e − (µ + µ − ) modes use the same electron (muon) selection, while more efficient charged pion selection is favored for π + e + e − and η 3π e + e − than π + µ + µ − and η 3π µ + µ − . Tighter selection is favored on the continuum NN output than the BB NN output. The optimization favors looser requirements for the ηℓ + ℓ − modes as the size of the background in these channels is much smaller than for πℓ + ℓ − .

VI. FIT VALIDATION
We validate our fit methodology in three ways: (1) generating an ensemble of data sets from our fit model and fitting them with the same model ("pure pseudoexperiments"), (2) generating and fitting an ensemble of data sets with signal events from the BABAR MC simulation embedded into the data set ("embedded pseudoexperiments"), (3) extracting B → J/ψ π and B 0 → J/ψ η branching fractions from the BABAR data sample.
From our studies of both pure and embedded pseudoexperiments, we find no significant source of bias in our fit. Distributions of branching fractions and their errors obtained from fits to these data sets are consistent with expectations.
Measuring the B + → J/ψ π + , B 0 → J/ψ π 0 , B 0 → J/ψ η, and B + → J/ψ K + branching fractions in the control sample of vetoed charmonium events allows us to validate our fit methodology on data. We employ the same fit model to extract these branching fractions as we do for the π + ℓ + ℓ − , π 0 ℓ + ℓ − , K + ℓ + ℓ − , and ηℓ + ℓ − branching fractions. Fixed shape parameters and yields are determined through fits to exclusive MC samples. We find that all measurements are in good agreement with world averages [37].

VII. SYSTEMATIC UNCERTAINTIES
Systematic uncertainties are included in the branching fraction upper limit calculation by convolving the profile likelihood with a Gaussian whose width is equal to the total systematic uncertainty. The systematic uncertainties are divided into "multiplicative" uncertainties, which scale with the true value of the branching fraction, and "additive" uncertainties, which are added to the true value of the branching fraction, independent of its value.

A. Multiplicative uncertainties
We list the sources of multiplicative systematic uncertainty below and their assigned values for each of the πℓ + ℓ − and ηℓ + ℓ − signal modes in Table I. The systematic uncertainty in the measured number of BB pairs is estimated to be 0.6% [39].
The difference between the π 0 reconstruction efficiency in data and MC has been studied in τ + τ − decays where one τ decays via the channel τ ± → e ± νν and the other τ decays via τ ± → π ± ν or τ ± → ρ ± ν with ρ ± reconstructed as π ± π 0 . The τ ± → ρ ± ν yields are roughly proportional to the product of the π ± and π 0 reconstruction efficiencies, while the τ ± → π ± ν yields are proportional to the π ± reconstruction efficiencies. A correction proportional to the ratio of the τ → ρν to τ → πν yields in the data and MC samples is applied to better reproduce the data reconstruction efficiency in MC simulation. The uncertainty due to this correction is estimated as 3.0% per π 0 . We take the uncertainty in the η γγ reconstruction efficiency associated with this correction to also be 3.0% per η γγ .
A correction to the MC tracking efficiency was developed from the study of τ + τ − decays where one τ has a single charged daughter (1-prong decays) allowing the event to be identified as a τ + τ − event and the other τ has three charged daughters (3-prong decays). By measuring the event yields where the 3-prong τ has either two or three tracks reconstructed, the track reconstruction efficiency can be measured. This efficiency can be used to correct the MC to match the efficiency measured in data. The systematic uncertainty associated with this correction is estimated to be 0.3% per charged track taken to be 100% correlated among tracks in the event.
We correct for the difference between the lepton PID selection efficiencies in data and MC by measuring the B + → J/ψ K + yields in data and J/ψ K + MC control samples with and without the PID selection requirements applied to both leptons. The ratios of the yields are used to correct the lepton particle identification selection efficiency derived from MC to match data. The error on the correction is taken as the associated systematic uncertainty, which ranges from 1.3-1.5%. The available statistics in the samples used to calculate the correction determine the size of the error which is associated with it.
In an analogous procedure, we correct for the difference between the charged pion PID selection efficiency obtained by measuring signal yields in high statistics B 0 → J/ψ K * 0 (→ K − π + ) data and exclusive MC control samples with and without pion PID selection criteria applied. A correction is derived and the error on the correction is taken as the associated systematic uncertainty. These uncertainties are approximately 2.5% and 3.5% for e + e − and µ + µ − modes, respectively.
The high statistics of the B + → J/ψ K + data and MC control samples are again exploited to derive a correction for the NN output selection efficiency on MC. The J/ψ K + signal yields were measured with only the BB NN output selection applied, only the continuum NN output selection applied, and with both selections applied. The error on the correction is taken as the associated I: Multiplicative systematic uncertainties for the πℓ + ℓ − and ηℓ + ℓ − modes. The lepton and π ± PID and NN output selection efficiency correction uncertainties are determined using J/ψ K ( * ) control samples, while the tracking and π 0 /ηγγ efficiency correction and B counting uncertainties are taken from dedicated BABAR studies. The total uncertainty is the sum in quadrature of the individual uncertainties.
We conservatively vary the Wilson coefficients C 7 , C 9 , and C 10 from their nominal values of −0.313, 4.344, and −4.669, respectively, by a factor of ±2 (e.g., C 7 is varied to −0.157 and −0.616) and generate new simulated samples with all possible combinations of the varied Wilson coefficients. For each varied sample we apply the full event selection and calculate the efficiency for that set of Wilson coefficients, taking the largest relative difference between the varied Wilson coefficient efficiencies and the efficiency of our default model as the associated systematic uncertainty.
Simulated MC samples using several different formfactor models were generated. Ultimately, we compare the efficiency from our default model with the efficiency calculated from the "Set 2" and "Set 4" form-factor models of Ref. [40]. The maximum relative difference between our default model efficiency and the efficiency obtained with the "Set 2" and "Set 4"form-factor models is taken as the associated systematic uncertainty. There is a large variation in this uncertainty from one mode to another. The source of this effect is due in part to the correlation between the NN output and q 2 . Selection on the NN outputs is mode dependent and therefore changes the q 2 dependence of the efficiency. Also, for the modes π + e + e − , π 0 e + e − , and η γγ e + e − we require that the hadron momentum be greater than 750 MeV/c in the CM frame. The hadron momentum is highly correlated with q 2 . Removing events with small hadron momentum also removes events with large q 2 . Differences at large q 2 between the differential branching fractions calculated using the default and alternative models lead to greater sensitivity to the choice of form-factor model, and therefore larger uncertainties associated with the choice of model.
The uncertainty in the efficiency due to the size of the simulated MC samples is less than 0.1% and is negligible.

B. Additive uncertainties
We consider the following sources of additive systematic uncertainty with their values given in Table II.
The fixed parameters of the πℓ + ℓ − , ηℓ + ℓ − , and K + ℓ + ℓ − signal and the K + ℓ + ℓ − background PDFs are varied individually within the errors obtained from fits to exclusive MC samples, and the data sample is re-fit. For simultaneous fits we additionally vary the efficiencies within their uncertainties, and for ηℓ + ℓ − we vary the self-crossfeed-to-signal ratio by ±10%. The size of the variation is arbitrary but conservative enough since the number of expected self-crossfeed events is at most 0.15. The difference between the branching fraction from this fit and that from the nominal fit is taken as the associated systematic uncertainty. We take the largest change in the branching fraction as the systematic uncertainty associated with each fixed quantity. The uncertainties from individual variations are summed in quadrature.
Non-parametric PDFs include the two-dimensional histograms and KEYS shapes. We vary the binning of the two-dimensional PDFs, increasing and decreasing them by a factor of two. The data sample is re-fit, and we take the largest change in the branching fraction as the associated systematic uncertainty. For the KEYS shapes, we increase and decrease the width of the Gaussian kernel used to generate the shapes and take the largest change in the branching fraction as the associated systematic uncertainty. If there are multiple non-parametric PDFs, we add their associated uncertainties in quadrature.
The hadronic peaking background yields are fixed from the control sample of hadronic decays and are varied within their statistical uncertainties. The data sample is re-fit, and we take the largest change in the branching fraction as the associated systematic uncertainty.
We fix the K 0 S /K * ℓ + ℓ − yield in the π + ℓ + ℓ − PDF, the K * ℓ + ℓ − yield in the K + ℓ + ℓ − PDF, and the K 0 S ℓ + ℓ − yield in the π 0 ℓ + ℓ − PDF. These values are determined from efficiencies taken from exclusive MC samples and II: Additive systematic uncertainties for the B → πℓ + ℓ − and B 0 → ηℓ + ℓ − channels. The total uncertainty is the sum in quadrature of the individual uncertainties. All uncertainties are given in units of 10 −8 .

Mode
π + e + e − π 0 e + e − π + µ + µ − π 0 µ + µ − ηe + e − ηµ + µ − the current world average branching fractions for these modes [37]. We vary the yields according to the errors on their branching fractions and re-fit the data sample. The change in the branching fraction from its nominal value is taken as the associated systematic uncertainty. In Table II, these uncertainties are classified as "Non-hadronic peaking bkg yields".

VIII. RESULTS
We extract branching fractions by fitting the data set with the fit model described in Section IV. Projections of the PDFs and data sets in m ES and ∆E are shown for the isospin averaged B → πe + e − fit, the lepton-flavor averaged B 0 → ηℓ + ℓ − fit, and the isospin and leptonflavor averaged B → πℓ + ℓ − fit in Figs. 3-5, respectively. Figure 6 shows −2 ln λ as a function of the branching fraction for the πℓ + ℓ − , π + ℓ + ℓ − , π 0 ℓ + ℓ − ηℓ + ℓ − , ηe + e − , and ηµ + µ − measurements. Branching fraction measurements and upper limits at 90% CL are given in Table III for each mode.
As a cross-check, we measure the B + → K + ℓ + ℓ − branching fractions and find them consistent with the current world averages [37].