Origin of Metals around Galaxies I: Catalogs of Metal-line Absorption Doublets from High-Resolution Quasar Spectra

We present the first paper of the series Origin of Metals around Galaxies (OMG) aimed to study the origin of the metals observed in the circumgalactic and intergalactic media. In this work we extract and build the catalogs of metal absorbers that will be used in future analyses, and make our results publicly available to the community. We design a fully automatic algorithm to search for absorption metal-line doublets of the species CIV, NV, SiIV and MgII in high-resolution ($R\gtrsim30\,000$) quasar spectra without human intervention, and apply it to the high-resolution and signal-to-noise ratio spectra of 690 quasars, observed with the UVES and HIRES instruments. We obtain $5\,656$ CIV doublets, $7\,919$ doublets of MgII, $2\,258$ of SiIV, and 239 of NV, constituting the largest high-resolution metal-doublet samples to date, and estimate the dependence of their completeness and purity on various doublet parameters such as equivalent width and redshift, using real and artificial quasar spectra. The catalogs include doublets with rest-frame line equivalent widths down to a few ${\rm m\AA}$, all detected at a significance above 3$\sigma$, and covering the redshifts between $1


INTRODUCTION
Metals are precise probes for revealing the physical properties of the medium that they inhabit, ranging from small regions within galaxies out to the largest scales of the intergalactic medium (IGM), and from the present day back to the redshifts of cosmic reionization (e.g., Tytler et al. 1995; Songaila & Cowie 1996; Pettini et al. 2001).
Despite their importance, our knowledge of the origin of metals is restricted to those detected in massive galaxies and their immediate surroundings, where local star formation and feedback processes create and distribute them, respectively. It remains unclear whether the main mechanism polluting the IGM is powerful feedback from distant, massive and bright galaxies, or in situ enrichment by (currently) undetectable small-galaxy populations embedded in that medium (Pettini 2004; Bertone et al. 2005; Bouché et al. 2006; Bouché et al. 2007; Pratt et al. 2017). Adelberger et al. (2003) proposed that galactic superwinds from massive galaxies were the mechanism responsible for the abundance of CIV absorption systems detected around z = 3 Lyman Break Galaxies, but Porciani & Madau (2005) argued that an early metal pollution driven by dwarf galaxies at 6 < z < 12 was a more likely scenario. The latter origin was also supported by the observations of Simcoe (2011), which suggested that half of the intergalactic metals observed at z ∼ 2.4 were produced at higher redshifts, between z ∼ 4.3 and z ∼ 2.4, consistent with the enrichment by small galaxies (with dark matter halos of masses $M_h \lesssim 10^{11}\,M_\odot$) found in the simulations by Wiersma et al. (2010). More recent simulations by Oppenheimer et al. (2012) also favored an IGM enriched by similar low-mass galaxies through supernova feedback, arguing that metals produced in massive systems might not be able to escape the galactic environment (see also Hayward & Hopkins 2017).
A powerful method to probe the origin of metals around galaxies and in the IGM is the study of their clustering signature (e.g., Adelberger et al. 2005; Scannapieco et al. 2006; Wild et al. 2008; Martin et al. 2010; Pérez-Ràfols et al. 2015). In detail, the clustering of cosmic structure follows that of the underlying dark-matter density field as $\xi(r) \simeq b^2\,\xi_{\rm DM}(r)$, where $\xi(r)$ and $\xi_{\rm DM}(r)$ denote the distance-dependent clustering (i.e., the two-point correlation function) of structure and dark matter, respectively, and the term $b$ is the so-called bias factor (Sheth & Tormen 1999; Tinker et al. 2010). This bias depends on the mass of the dark matter halo hosting the structure of interest (Zehavi et al. 2005), the redshift, and the spatial scale, although in the linear regime (large scales) the scale dependence is small (Peebles 1980; Peacock & Dodds 1996). One expects the metals and the galaxy population that produces them to inhabit the same dark matter halos and show the same clustering, and therefore to also have the same bias factor, although the evolution of a population of halos of fixed mass modifies their bias factors if the metals are observed a long time after they were produced. In conclusion, the measurement of the bias factor of the intergalactic metals can be used as an indicator of their progenitor galaxies.
Shedding light on the sources responsible for enriching the IGM through the measurement of the metal bias factor is the main goal of our project Origin of Metals around Galaxies (OMG). We aim to obtain the metal bias by spatially cross-correlating the intergalactic metals found in a specifically built sample of quasar spectra with the Lyman-alpha (Lyα) forest from the quasar spectra in the DR12 catalog (DR12Q, Pâris et al. 2017) of the BOSS/SDSS-III Collaboration (Eisenstein et al. 2011; Dawson et al. 2013; Alam et al. 2015). In principle, we could calculate the auto-correlation function of the metals alone, but the number of detectable metal absorbers is small (a few thousand) and this would result in weak constraints. Since the number of useful pixels in the BOSS quasar spectra denoting the Lyα forest is large (∼ 27 million; Busca et al. 2013), the cross-correlation between the two tracers is a better choice. Overall, this cross-correlation will equal $\xi_{\rm metal-Ly\alpha} \simeq b_{\rm metal}\, b_{\rm Ly\alpha}\, \xi_{\rm DM}$, which is the product of the clustering of the underlying dark matter density field, the bias factor of the Lyα forest, $b_{\rm Ly\alpha}$, and the bias factor of the population of metal absorbers, $b_{\rm metal}$. The first term can be calculated from the theory of structure formation within the Lambda Cold Dark Matter (ΛCDM) cosmological model (e.g., Smith et al. 2003), and the Lyα forest bias has been constrained in the recent works by Blomqvist et al. (2015), Delubac et al. (2015), and Bautista et al. (2017), so that the bias factor of the metal population can be determined from the clustering.
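The relation between the cross-correlation and the bias factors can be inverted for the metal bias once the other two terms are known. The following is a minimal sketch of that inversion at a single separation; the function name and the numerical values in the usage lines are purely illustrative assumptions, not measurements from this or any cited work.

```python
def metal_bias(xi_cross, xi_dm, b_lya):
    """Invert xi_cross ~ b_metal * b_lya * xi_dm for b_metal.

    xi_cross : measured metal-Lya cross-correlation at one separation
    xi_dm    : dark-matter correlation function at the same separation
    b_lya    : bias factor of the Lya forest
    """
    if xi_dm == 0.0 or b_lya == 0.0:
        raise ValueError("xi_dm and b_lya must be non-zero")
    return xi_cross / (b_lya * xi_dm)


# Illustrative (hypothetical) numbers only.
example_bias = metal_bias(xi_cross=-0.06, xi_dm=0.2, b_lya=-0.15)
```

In practice the inversion would be a fit over many separations rather than a single ratio; the sketch only makes the linear dependence explicit.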
For the calculation of the IGM metal bias, we will use weak CIV λλ1548,1550 metal absorption doublets detected in high-resolution quasar spectra, motivated by the following aspects: (i) We need the largest possible number of metal systems spatially overlapping with the Lyα forest data in order to reduce the statistical uncertainties of the measurements and obtain tight constraints on the bias factor. CIV is the optimal ion for this purpose due to the large cosmic abundance of carbon, which typically results in the detection of several CIV doublets in one quasar spectrum, the exact number depending on the signal-to-noise ratio and quasar redshift. Furthermore, the doublet nature of the absorption systems facilitates their identification, and the value of the CIV rest-frame wavelength enables covering a broad redshift range when the doublets are identified in ultraviolet spectra. (ii) The requirement of weak absorption systems arises from the fact that the strength of the absorption lines is broadly correlated with the metal content (e.g., Mas-Ribas et al. 2017) and, in turn, the mass of the host medium. We expect strong absorbers to trace massive galaxies instead of the IGM. In Mas-Ribas et al. (2017), we found a mean rest-frame equivalent width of $W_r \sim 0.43$ Å for the CIV λ1548 line of the carbon doublet by stacking the spectra of ∼ 27 000 Damped Lyman Alpha systems (DLAs; see Wolfe et al. 2005, for a review), a type of object associated with galaxies with dark matter halo masses in the range $8.7 \lesssim \log(M_h/M_\odot) \lesssim 11.8$ (Pérez-Ràfols et al. 2018; see also Font-Ribera et al. 2012). Furthermore, Vikas et al. (2013) argued that CIV systems with $W_r > 0.28$ Å and up to $W_r \sim 5$ Å inhabit dark matter halos with masses of $\log(M_h/M_\odot) \simeq 11.3 - 13.4$, and Gontcho et al. (2017) recently found indications suggesting that weaker absorbers may represent smaller halos when comparing their findings with those by Vikas et al. (2013)$^{10}$. In view of these results, it seems plausible to consider CIV doublets with $W_r \gtrsim 0.3$ Å to be tracers of massive galaxies and not desirable for our purposes, although the exact threshold should be revisited when doing the cross-correlation. (iii) The highest possible purity (> 90%) is needed for obtaining a reliable bias factor value because each false-positive detection contributes to the bias calculation with a null value that, overall, results in an underestimation of the true result. In summary, we need to search for a large number of CIV doublets with small equivalent widths, while ensuring the minimum number of false-positive detections, which is only possible with spectra of very high resolution and signal-to-noise ratio. This metal search is the purpose of this first paper of the OMG series.

Observational Requirements
The first aspect of our calculations is obtaining a large enough sample of high-resolution quasar spectra. In Pérez-Ràfols et al. (2018), we recently cross-correlated ∼ 14 000 DLAs (dataset A) with the Lyman alpha forest sample by Busca et al. (2013), which resulted in one-sigma uncertainties of around 10% for a bias factor value of $b_{\rm DLA} \sim 2$. Assuming a similar bias factor for CIV and DLAs, and that we can detect ten CIV systems in each quasar spectrum on average, consistent with our results presented below, obtaining similar constraints would require searching 1 400 quasar spectra of high resolution and signal-to-noise ratio (14 000 systems at ten systems per spectrum). In practice, however, not all the detected CIV systems will be weak, which results in the need for an even larger number of spectra. As we will show below, our sample consists of 690 quasars with high-resolution ($R \gtrsim 30\,000$) spectra and median signal-to-noise ratios above 10. Although this number of spectra is lower than our rough estimate, it represents the largest compilation of quasar spectra of this resolution to date, followed by the partially overlapping sample of 602 quasars by Mathes et al. (2017), who investigated properties of MgII absorbers. The use of our quasar sample will constitute a strong improvement compared to previous works, which were only able to use a large number of spectra for the case of low resolution ($R \sim 2000$, typical of the BOSS quasar spectra) and low signal-to-noise ratio, and, therefore, could only identify the strong absorbers that probe the vicinity of massive galaxies.
Since our methodology for searching quasar spectra can be applied to metal doublets of different species in a similar fashion, we perform calculations to obtain doublet catalogs of CIV, SiIV, NV, and MgII, and make these catalogs publicly available at https://github.com/lluism/OMG, which will enable a large number of additional metal studies by the community. We emphasize that, given our purpose, we will pay special attention to building catalogs with high purity, and will be less concerned about completeness.
This paper is structured as follows. In § 2 we present the quasar data and, in § 3, we detail the search code designed to find the doublets. The capabilities of the code are tested in § 4, where we estimate the completeness and purity expected from the search. The resulting metal-doublet catalogs are presented in § 5, before concluding in § 6.

10 The aforementioned findings are also supported by studies of MgII absorbers at redshifts $z \lesssim 2$; Mathes et al. (2017) argued that weak and strong MgII absorbers represent different objects given their different evolution with redshift, and Kacprzak et al. (2011) suggested that weak MgII absorbers trace cold filaments that are being accreted onto the galaxies.

QUASAR SPECTRA
Our data are composed of four high-resolution quasar spectra datasets: we use the continuum normalized spec-  et al. 2017) 11 . We also consider 414 and 71 quasars with continuum normalized spectra from observations with the Ultraviolet and Visual Echelle Spectrograph (UVES; Dekker et al. 2000) at the Very Large Telescope (VLT), and the High Resolution Echelle Spectrometer (HIRES; Vogt et al. 1994) at Keck, respectively (Murphy et al. 2003; King et al. 2012; Murphy et al. 2018). Table 1 indicates the number of quasars for each sample, their median quasar redshift, and the median signal-to-noise ratio of all the spectra considering their entire wavelength range. The upper and lower panels of Figure 1 display the quasar redshift and spectral signal-to-noise ratio distributions, respectively, for the four samples, and the dashed vertical lines denote the median of each distribution. The KODIAQ and HIRES samples were both built from studies using the same instrument (i.e., the HIRES spectrograph), but we keep separate sample names because they contain different spectra. Some quasars are observed and contained in more than one sample, but we consider them all for the search, because different observations cover different wavelength ranges, and some parts of the spectrum with no detection or high noise in one sample are clearly detectable in another one. In practice, there are 690 different quasars in the samples. We will look for repeated doublets after the search, and will keep those with the highest detection significance. In all cases, we only use the data outside the Lyα forest, i.e., redward of the Lyα emission line of the quasars.

Quasar Continuum
The continuum (unabsorbed) spectral energy distribution (SED) of each quasar is used to obtain its normalized spectrum as

$$f_\lambda = \frac{F_\lambda}{C_\lambda}, \qquad \sigma_\lambda = \frac{E_\lambda}{C_\lambda},$$

where $F_\lambda$, $E_\lambda$ and $C_\lambda$ are the observed quasar flux, uncertainty, and continuum at a given wavelength (pixel), respectively, and $f_\lambda$ and $\sigma_\lambda$ are the normalized flux and uncertainty.
For the KODIAQ data, we use the continua provided along with the observed spectra and computed as described in O'Meara et al. (2015). Briefly, the spectra are continuum-fitted by hand, using Legendre polynomials of order generally between 4 and 12, depending on the specific characteristics of every spectrum. The continua for the UVES and HIRES spectra were obtained by fitting spline curves to regions that are free from absorption features (as described in Bagdonaite et al. 2014 and Riemer-Sørensen et al. 2015). For the UVES spectra, a 6th to 8th order Chebyshev polynomial was generally used. Because our analysis uses only continua outside the Lyα forest, where absorption lines are rare, we expect the impact of continuum uncertainty to be negligible, but we perform a more quantitative test in § 4.3.

Figure 2. Wavelength range of a quasar spectrum in the UVES sample. The black line denotes the normalized spectrum, and the cyan line is the value 1 − σ_λ at every pixel. The yellow line denotes the Gaussian-convolved flux, and the red lines the candidate absorption features. The regions pertaining to a CIV doublet, between ∼ 7345 Å and ∼ 7366 Å, are represented as shaded red areas, and listed as three doublets in our resulting catalog, although they may belong to the same absorption system. The horizontal dashed line marks the position of the unabsorbed continuum, i.e., a transmission of 100%.

BLIND DOUBLET SEARCH CODE
We design a code to detect metal absorption doublets in quasar spectra, based on the public code by Cooksey et al. (2008, 2010), which has proven to be efficient for this purpose (e.g., Cooksey et al. 2011, 2013; Seyffert et al. 2013). Unlike the Cooksey code, our code is built to be fully automatic, without the need for human intervention, which allows searching a large number of spectra in a short time.
In § 3.1 below, we describe the code focusing on the procedure for the search and identification of CIV λλ1548,1550 doublets, although the code also searches simultaneously for the additional doublets SiIV λλ1393,1402, NV λλ1238,1242 and MgII λλ2796,2803. In § 3.2, we present the requirements that the doublet candidates must satisfy in order to be considered real and included in the final catalogs.

Automatic Metal Doublet Search
The code starts by correcting possible outliers in the spectrum: the normalized pixel fluxes are limited to the values −σ λ ≤ f λ ≤ 1+σ λ . Pixels with flux values outside this range are reset to the upper/lower limit. Additionally, we limit the flux between −1 ≤ f λ ≤ 2 to account for unreasonable flux values usually associated with extremely large uncertainties.
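The two flux limits described above can be sketched as follows. This is a minimal illustration assuming the normalized flux and uncertainty are given as plain Python lists of equal length; the function name is hypothetical and not taken from the released code.

```python
def clip_normalized_flux(flux, sigma):
    """Limit normalized fluxes to [-sigma, 1 + sigma] and then to [-1, 2]."""
    clipped = []
    for f, s in zip(flux, sigma):
        # Noise-based limits: -sigma <= f <= 1 + sigma.
        f = min(max(f, -s), 1.0 + s)
        # Global limits, relevant when the uncertainty is extremely large.
        f = min(max(f, -1.0), 2.0)
        clipped.append(f)
    return clipped
```

For instance, a pixel with flux 5.0 and uncertainty 10.0 passes the first limit unchanged and is then reset to 2.0 by the second one.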
After these corrections, we convolve the spectrum using a Gaussian kernel with a FWHM of three times the FWHM of the instrument, and group adjacent pixels whose Gaussian-convolved flux is $f_G \leq 1 - \sigma_\lambda$, also including the pixels outside this group that reside within one kernel sigma from both group sides. Only groups with a minimum number of pixels between three and five, depending on the sample, are considered. Each of these groups then represents an absorption candidate within a range defined by a minimum and maximum wavelength, $\lambda_l$ and $\lambda_h$, respectively. Figure 2 displays an example of the search described above for an arbitrary wavelength interval and a spectrum from the UVES sample. The black line denotes the normalized flux and the cyan line the value 1 − uncertainty at every pixel ($1 - \sigma_\lambda$). The yellow line represents the Gaussian-convolved flux, and the red line denotes the candidate absorption features whose convolved flux resides below the $1 - \sigma_\lambda$ line.

12 http://guavanator.uhh.hawaii.edu/~kcooksey/igmabsorbers.html
13 http://cvs.ucolick.org/viewcvs.cgi/xidl/HST/CIV/?cvsroot=Prochaska
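The candidate search described above can be sketched as below. This is a simplified illustration, not the released code: it assumes a uniform pixel grid, a kernel FWHM expressed in pixels, and that the minimum-pixel requirement is applied before the group is extended by one kernel sigma. All function and parameter names are hypothetical.

```python
import math


def gaussian_smooth(flux, fwhm_pix):
    """Smooth with a normalized Gaussian kernel; returns (smoothed, kernel sigma)."""
    sigma = fwhm_pix / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    half = max(1, int(math.ceil(4.0 * sigma)))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(flux)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize the kernel at the spectrum edges
                num += w * flux[j]
                den += w
        smoothed.append(num / den)
    return smoothed, sigma


def find_candidates(flux, sigma_flux, fwhm_pix, min_pixels=3):
    """Return (first, last) pixel indices of candidate absorption features."""
    smoothed, kernel_sigma = gaussian_smooth(flux, fwhm_pix)
    below = [g <= 1.0 - s for g, s in zip(smoothed, sigma_flux)]
    pad = max(1, int(round(kernel_sigma)))  # one kernel sigma, in pixels
    n = len(flux)
    groups = []
    i = 0
    while i < n:
        if not below[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and below[j + 1]:
            j += 1
        if j - i + 1 >= min_pixels:
            groups.append((max(0, i - pad), min(n - 1, j + pad)))
        i = j + 1
    return groups
```

Here `fwhm_pix` would be three times the instrumental FWHM in pixels, and the wavelengths of the first and last pixel of each group correspond to the limits of the candidate.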
Under the assumption that these candidates are CIV λ1548 absorption lines (or the shortest wavelength line of the doublet for other species), we calculate their redshift and equivalent width as follows. The absorption redshift is calculated as (Cooksey et al. 2010)

$$1 + z_{\rm abs} = \frac{\sum_i \lambda_i \ln(1/f_{\lambda_i})}{\lambda_r \sum_i \ln(1/f_{\lambda_i})},$$

where $\lambda_r$ is the rest-frame wavelength of the metal line, $\lambda_i$ is the observed wavelength of the pixel $i$, and the sums run over the pixels of the candidate. The logarithmic term denotes the pixel optical depth and acts as a weight, i.e., pixels with large optical depth will contribute more to the redshift calculation. For the calculation of the weights, we limit the minimum flux to $f_\lambda \geq (0.2\,\sigma_\lambda > 0.05)$ to avoid extremely large values of the optical depth where the line saturates (Cooksey et al. 2010). The rest-frame equivalent width and its uncertainty from error propagation are

$$W_r = \frac{1}{1 + z_{\rm abs}} \sum_i (1 - f_{\lambda_i})\,\delta\lambda_i, \qquad \sigma_{W_r} = \frac{1}{1 + z_{\rm abs}} \sqrt{\sum_i \sigma_{\lambda_i}^2\,\delta\lambda_i^2},$$

where $\delta\lambda_i$ is the wavelength interval between pixels. We use the redshift measurements to calculate the expected position of the associated CIV λ1550 lines (or the higher wavelength lines of the other doublets). In practice, the actual position of the associated lines fluctuates around the calculated value due to the effects of blends with other lines and noise, and to the fact that we imposed a flux limit for the weights in the redshift calculation. A match is accepted when the velocity offset between the observed and calculated positions of the associated lines is $|\delta v| \leq 7$ km s$^{-1}$. This value, like those of other parameters, is set to maximize the number of detections while avoiding false positive detections (see § 4).
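The redshift, equivalent width, and velocity-offset calculations described above can be sketched as follows. This is an illustration under stated assumptions: the optical-depth-weighted form of the redshift follows the description in the text, fluxes above 1 are capped at 1 so that their weight is zero (an assumption of this sketch), and the function names are hypothetical.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def absorber_redshift(wave, flux, sigma, rest_wave):
    """Optical-depth-weighted redshift of a candidate line."""
    num = den = 0.0
    for lam, f, s in zip(wave, flux, sigma):
        floor = max(0.2 * s, 0.05)       # flux floor used for the weights
        f_eff = min(max(f, floor), 1.0)  # assumption: cap at 1 (zero weight)
        weight = math.log(1.0 / f_eff)   # pixel optical depth
        num += weight * lam
        den += weight
    if den == 0.0:
        raise ValueError("no absorption in the selected pixels")
    return num / (den * rest_wave) - 1.0


def rest_equivalent_width(flux, sigma, dlam, z_abs):
    """Rest-frame equivalent width and its propagated uncertainty."""
    width = sum((1.0 - f) * d for f, d in zip(flux, dlam))
    variance = sum((s * d) ** 2 for s, d in zip(sigma, dlam))
    return width / (1.0 + z_abs), math.sqrt(variance) / (1.0 + z_abs)


def velocity_offset(lam_observed, lam_expected):
    """Velocity offset in km/s between observed and expected line positions."""
    return C_KMS * (lam_observed - lam_expected) / lam_expected
```

A partner line would then be accepted when `abs(velocity_offset(...))` does not exceed the 7 km s$^{-1}$ threshold quoted in the text.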
Our code also computes the column density of each feature as in Cooksey et al. (2008), using the Apparent Optical Depth (AOD) method by Savage & Sembach (1991). However, we do not present these measurements because this method is valid only when the absorption lines are unsaturated (Fox et al. 2005, although see also Cooksey et al. 2008), which occurs only for ∼ 50 − 60% of the doublets.

Doublet Acceptance Criteria
All the previous candidate doublets are finally considered real if they satisfy all the following requirements (with the corresponding values for each species):
1. The redshift of the metal absorption lines is below that of the quasar, $z_{\rm abs} \leq z_{\rm em}$, where $z_{\rm em}$ is the redshift of the quasar.
2. Both lines in the doublet reside outside the Lyα forest region.
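3. The equivalent width ratio between the doublet lines, $R_W$, is within the range $1 - \sigma_{R_W} \leq R_W \leq 2 + \sigma_{R_W}$, where $\sigma_{R_W}$ is the uncertainty of the ratio. The equivalent width ratio and uncertainty from error propagation are calculated as (Cooksey et al. 2010) $R_W = W_l / W_{rg}$ and $\sigma_{R_W} = R_W \sqrt{(\sigma_{W_l}/W_l)^2 + (\sigma_{W_{rg}}/W_{rg})^2}$, where $l$ and $rg$ denote the lower and higher wavelength lines in the doublet, respectively. These limits denote the extreme cases of completely saturated, $R_W = 1$, and unsaturated lines, $R_W = 2$.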
4. The significance of the detection for the strongest doublet line is W/σ W ≥ 3.
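The four requirements above can be combined into a single acceptance test, sketched below. This is a minimal illustration: the Lyα forest is taken to end at the observed wavelength of the quasar Lyα emission line, with a rest wavelength of 1215.67 Å, and the strongest line is taken to be the one with the larger equivalent width. The function names and argument layout are hypothetical.

```python
import math

LYA_REST = 1215.67  # Lya rest-frame wavelength in Angstrom


def ew_ratio(w_l, sig_l, w_rg, sig_rg):
    """Equivalent-width ratio of the bluer to the redder line, with uncertainty."""
    ratio = w_l / w_rg
    sig = ratio * math.sqrt((sig_l / w_l) ** 2 + (sig_rg / w_rg) ** 2)
    return ratio, sig


def accept_doublet(z_abs, z_em, lam_l_obs, w_l, sig_l, w_rg, sig_rg, min_snr=3.0):
    """Apply the four acceptance criteria to one candidate doublet."""
    if w_l <= 0.0 or w_rg <= 0.0:
        return False
    # 1. Absorber redshift not above the quasar redshift.
    if z_abs > z_em:
        return False
    # 2. Both lines redward of the quasar Lya emission line; checking the
    #    bluer line is sufficient.
    if lam_l_obs <= LYA_REST * (1.0 + z_em):
        return False
    # 3. Ratio between the saturated (1) and unsaturated (2) limits.
    ratio, sig_ratio = ew_ratio(w_l, sig_l, w_rg, sig_rg)
    if not (1.0 - sig_ratio <= ratio <= 2.0 + sig_ratio):
        return False
    # 4. Significance of the strongest line.
    w, s = (w_l, sig_l) if w_l >= w_rg else (w_rg, sig_rg)
    return w / s >= min_snr
```

Returning a single boolean keeps the sketch short; a catalog builder would more likely record which criterion failed.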
The red shaded regions of Figure 2 illustrate three separate CIV doublet features, a central component at z ∼ 3.747, plus two more within a velocity offset ranging between 100 − 200 km s$^{-1}$. We include these features in the final catalog as three different doublets, so that they can be either grouped and treated as a single absorption system or analyzed individually. There is another CIV doublet at the positions ≈ 7320 Å and ≈ 7332 Å (z ∼ 3.728) not identified by the code. The strong absorber centered at ≈ 7335 Å is blended with the longer wavelength line of the doublet to the extent that the code considers them a single feature. This results in the determination of a higher (incorrect) redshift for the redder doublet feature compared to that of the blue one and, more importantly, the candidate fails the requirement on the equivalent width ratio (condition number 3 above).

CODE AND DATA VALIDATION
We present below a series of tests to assess the capabilities of our search code and the validity of our results. The impact of false positive detections on the purity is assessed with the real spectra in § 4.1, and we make use of mock spectra to further test the completeness in § 4.2, and the effects of the quasar continuum placement in § 4.3.

Searching for False Positives in Real Spectra
We run our search code on the four quasar samples setting now the theoretical separation between the two lines of the doublets to a slightly larger value than the true one. This implies that all the doublets identified by the code in this case will be false detections, and will provide a robust estimation for the purity of the results.
We set the distances between the lines of the doublet to values a few Angstrom larger than the theoretical ones, the exact value depending on the species, and avoiding coincidences of these new separations with the values for doublets of other species. Because the separation in wavelength between lines changes with redshift, it is possible that for a specific redshift the separation of one species matches the value of the separation of another species at a different redshift. However, this also happens in the real case, and so we enable this possibility in the analysis. We have tested that changing the separation from a few to a couple of tens of Angstrom produces negligible differences in the test results.
After obtaining the catalogs of false positives, we compute the purity as 1 − false / real, where false and real refer to the number of false detections, and detections in the real search (with the true doublet line separation), respectively. Figure 3 shows the results for every species (from left to right panels), binned in equivalent width, redshift, and significance of the detection (from top to bottom panels), using the values of the lower wavelength lines in the doublets. The colored solid lines denote each of the four samples, and the dashed black line represents the median of the four. The middle panels indicate that there is no correlation between the purity and the redshift of the absorbers, whilst the upper panels show an increase of purity with equivalent width. The equivalent width at which a purity above 90% is reached varies strongly with the species and the sample. The significance of the detection (bottom panels) appears as the most robust observable, since the trends for the different samples present a narrow scatter (dispersion) around the median value. For CIV and MgII, a median purity of > 90% is reached at $W/\sigma_W \sim 32$, and at $W/\sigma_W \sim 10$ it is reduced to ∼ 60%. For SiIV, $W/\sigma_W \gtrsim 100$ is required to reach a median purity above 90%, similar to the case of NV, although in the latter case the number of detections is too small to establish firm conclusions.
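The purity estimate defined above, 1 − false / real, can be evaluated per bin with a few lines of code. The sketch below assumes that the binned quantity (equivalent width, redshift, or significance) is available as a list of values for the real search and for the false-positive search; the names are hypothetical and bins without real detections are returned as `None`.

```python
def purity(n_false, n_real):
    """Purity as 1 - false / real; undefined (None) without real detections."""
    if n_real == 0:
        return None
    return 1.0 - n_false / n_real


def binned_purity(real_values, false_values, edges):
    """Purity in each bin [edges[k], edges[k + 1])."""
    n_bins = len(edges) - 1
    real_counts = [0] * n_bins
    false_counts = [0] * n_bins
    for values, counts in ((real_values, real_counts), (false_values, false_counts)):
        for v in values:
            for k in range(n_bins):
                if edges[k] <= v < edges[k + 1]:
                    counts[k] += 1
                    break
    return [purity(f, r) for f, r in zip(false_counts, real_counts)]
```

With this definition the estimate can become negative in a sparsely populated bin where the shifted search happens to return more detections than the real one, so bins with few detections should be read with care.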

Searching Realistic Mock Spectra
We perform tests creating and analysing mock spectra that precisely reproduce the spectra from our different real-quasar data sets.

Figure 3. The results are presented for the four species (from left to right panels) and equivalent width, redshift, and significance of the detection (from top to bottom panels), using the values of the lower wavelength lines in the doublets. The colored solid lines denote each of the four samples, and the dashed black line represents the median of the four. The number of bins ranges between 15 − 25 for visualization purposes, and the scales are logarithmic for the cases of the equivalent widths (top panels) and significance (bottom panels), and linear for the redshifts (middle panels). The best indicator for the purity is the significance of the detection (bottom panels), showing purities above 90% at $W/\sigma_W \sim 32$ for CIV and MgII.

Generation of Mock Spectra
To create mock spectra, we use the publicly available code QSOSim10$^{14}$ (Milakovic et al. 2017), designed to simulate quasar spectra from the 10th Data Release quasar catalog of the BOSS survey (DR10Q; Pâris et al. 2014). Briefly, the code computes a quasar continuum taking the redshift, spectral slope and magnitude parameters from the DR10 quasar catalog for every object, and adds up to 59 broad emission lines to it. The Lyα forest is then created using statistical models for the redshift and column density distributions of neutral hydrogen, which are derived from studies of high-resolution spectra. Individual metal absorption lines associated with the neutral hydrogen absorption systems are included by means of the photoionization code CLOUDY v13.03 (Ferland et al. 2013). In detail, for every hydrogen absorber at a given redshift and with hydrogen column density $N_{\rm HI} \geq 10^{15}$ cm$^{-2}$, the code reads its average metallicity inferred from absorption spectroscopy studies and introduces these three parameters into CLOUDY. Considering the radiation background at the corresponding redshift, CLOUDY then yields the metal column densities of 30 atomic species and about 500 ionization states. Finally, QSOSim10 creates absorption lines of these column densities with a line width sampled from a normal distribution centered at 6 km s$^{-1}$ and with a standard deviation of 1.5 km s$^{-1}$. Blends between different lines are allowed, but the individual lines are created using Voigt profiles. The resulting mock spectrum is finally convolved accounting for the BOSS spectrograph resolution, and the noise from the observations and instrumentation is included. We refer the reader to Milakovic et al. (2017) for more detailed descriptions of the code and comparisons with the real BOSS spectra.

14 https://github.com/vincentdumont/qsosim
For our purpose, we modify QSOSim10 to create three sets of 1 000 mock quasar spectra each, with different spectral resolution and S/N ratio broadly covering the range of values in our real data samples. We consider spectral resolutions R = 100 000 and R = 50 000, which we name HIres and MDres, respectively, referring to high and medium resolution. Since QSOSim10 computes BOSS noise for each spectrum, we adopt these BOSS values and simply apply a constant reduction factor for our mock samples, i.e., factors 3, 20 and 50, named LOsnr, MDsnr, and HIsnr, respectively, accounting for low, medium and high S/N ratios. Table 2 lists the median S/N ratio values, the resolution, and number of spectra for each mock sample. Figure 4 illustrates the S/N ratio distributions as shaded areas, and overplotted as solid lines are those of the four real quasar samples. The mock HIres − MDsnr sample (blue shaded region) broadly matches the HIRES sample (blue line), and the shaded red and shaded green regions cover the higher and lower tails, respectively, of the distributions of the real samples, which will be useful to determine the dependencies on S/N.

Completeness and Purity in Mock Spectra Searches
We run our search code on the mock spectra to assess the purity and completeness of the results for all the species. We define the completeness as the fraction of mock absorbers, i.e., actual matches, that are detected by our code (considering only detections of at least 3σ significance), and for several equivalent width bins. The purity denotes the ratio between the number of the doublets detected by our code that are real and the total number of detections. In all cases, we only consider the doublets outside the Lyα forest. We have also used these mock spectra to explore and find the optimal search parameters. Figure 5 displays the completeness of the search as a function of the equivalent width of the bluer member of each doublet species, and for the three mock samples. The two highest S/N samples (red and blue lines) show little difference, with HIsnr sample (red lines) presenting a completeness ∼ 5−15% higher than the MDsnr sample (blue lines), owing to the better S/N ratio of the spectra which allows for the detection of weaker features. For the lowest resolution and low S/N ratio sample (green lines), the completeness is, in general, significantly lower than that of the other two samples, especially at small equivalent width values. Given the distributions of S/N ratios for the real spectra, we expect most of these to show completeness curves between the green and red lines in Figure 5.
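The completeness and purity definitions used for the mocks can be sketched as follows. The matching rule between detected and injected absorbers is not specified in the text, so this sketch assumes a match when the redshifts agree within a velocity tolerance; that tolerance, the data layout, and the function names are all assumptions made for illustration.

```python
C_KMS = 299792.458  # speed of light in km/s


def is_match(z_detected, z_true, tol_kms):
    """Assumed matching rule: redshifts agree within tol_kms."""
    return C_KMS * abs(z_detected - z_true) / (1.0 + z_true) <= tol_kms


def completeness_and_purity(injected, detected_z, edges, tol_kms=10.0):
    """injected: list of (z, W_r) mock absorbers; detected_z: detected redshifts.

    Returns the completeness per equivalent-width bin and the overall purity.
    """
    n_bins = len(edges) - 1
    n_injected = [0] * n_bins
    n_recovered = [0] * n_bins
    for z_true, w_r in injected:
        for k in range(n_bins):
            if edges[k] <= w_r < edges[k + 1]:
                n_injected[k] += 1
                if any(is_match(z, z_true, tol_kms) for z in detected_z):
                    n_recovered[k] += 1
                break
    completeness = [r / n if n else None for r, n in zip(n_recovered, n_injected)]
    n_true = sum(
        1 for z in detected_z
        if any(is_match(z, z_true, tol_kms) for z_true, _ in injected)
    )
    purity = n_true / len(detected_z) if detected_z else None
    return completeness, purity
```

The detections passed in would already be restricted to those with at least 3σ significance and outside the Lyα forest, as stated in the text.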
The overall purity for each species and mock sample is indicated in Table 2, and is generally above 90% for the CIV, SiIV and NV species, and $\gtrsim 80$% for MgII. The last four columns show purity values considering only absorbers with equivalent width for the strongest line in the doublet $W_r > 10$ mÅ, reaching values within 99 − 100% in all CIV, SiIV and NV mock samples. For MgII, the lower right panel of Figure 5 indicates additional purity values for absorbers with equivalent widths $W_r > 20$ mÅ and $W_r > 50$ mÅ, with the same color code as the lines in the plot. In general, purity values from the mock catalogs are higher than those computed from real spectra, likely because mocks have lower complexity, especially in the absorption line profiles and in the lack of effects, such as sky lines or outliers, that can contaminate pixels.

Continuum Placement Effects
Obtaining the quasar continuum is particularly difficult in the Lyman-alpha forest region of the spectra, where multiple hydrogen and metal absorption features overlap and reduce the average transmission. Faucher-Giguère et al. (2008) estimated the continuum uncertainty in this region to be of a few percent in spectra with S/N∼ 10, increasing toward lower S/N and higher redshifts. Outside the Lyman-alpha forest, however, most of the spectrum is free of absorption and the continuum can be precisely calculated, its uncertainty being a small fraction of the noise in those regions.
To test the effect of continuum placement on the doublet search, we introduce variations in the continuum for each of our normalized mock spectra as follows: we consider eight evenly-spaced wavelength positions (nodes) in the normalized spectrum, the first and the last nodes corresponding to the first and last pixels, respectively. For every node, we then compute a small perturbation by randomly sampling a Gaussian distribution centered at 1, corresponding to the value of the nominal unabsorbed normalized flux, and standard deviation σ C = 0.01, corresponding to a 1% variation in the flux. Finally, the flux of the eight perturbed nodes is connected using linear interpolation, and the flux of each pixel in the spectrum is multiplied by the value of the interpolation at the corresponding pixel wavelength. Selecting σ C = 0.01 for the region outside the forest is likely an overestimation of the uncertainty in the continuum, especially for the high S/N spectra, but we adopt this value as an upper limit and interpret our results as a worst case. Figure 6 shows the results after searching the 1 000 mock spectra of each sample with variations in the continua (dashed lines) and using the original unperturbed mocks (solid lines), for the case of CIV. The left panel shows the number of detections at every wavelength bin. The low S/N samples (green lines) show no differences between the perturbed and unperturbed cases because the noise dominates the signal and variations of the con-  Even assuming these large continuum variations, the results are unaffected at equivalent widths above W r ∼ 15 mÅ, concluding that the continuum uncertainties are not a concern for our analysis above this threshold. We have not observed apparent effects on the redshift distributions due to changes in the continua in any case.
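The continuum perturbation described above (eight evenly spaced nodes, Gaussian draws centered at 1 with σ_C = 0.01, and linear interpolation between nodes) can be sketched as below. The sketch assumes a strictly increasing wavelength array with at least two pixels; the function name and the optional random generator argument are hypothetical.

```python
import random


def perturb_continuum(wave, flux, n_nodes=8, sigma_c=0.01, rng=None):
    """Multiply a normalized spectrum by a piecewise-linear random continuum."""
    rng = rng or random.Random()
    lo, hi = wave[0], wave[-1]
    step = (hi - lo) / (n_nodes - 1)
    # Node wavelengths: first and last nodes sit on the first and last pixels.
    nodes = [lo + k * step for k in range(n_nodes)]
    # One multiplicative perturbation per node, drawn around the nominal value 1.
    values = [rng.gauss(1.0, sigma_c) for _ in range(n_nodes)]
    perturbed = []
    for lam, f in zip(wave, flux):
        k = min(int((lam - lo) / step), n_nodes - 2)  # segment containing lam
        t = (lam - nodes[k]) / step
        factor = values[k] * (1.0 - t) + values[k + 1] * t
        perturbed.append(f * factor)
    return perturbed
```

Passing a seeded `random.Random` instance makes the perturbation reproducible, which is convenient when comparing searches on perturbed and unperturbed versions of the same mock.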

METAL DOUBLET CATALOGS
We present here the doublet catalogs resulting from our search.
As mentioned in § 2, a fraction of the doublets in the samples are repeated because some spectra are included in various surveys and, in the case of the KODIAQ samples, the same quasar has several spectra from different observational campaigns. We consider two doublets to be the same when they satisfy all of the following requirements: (i) the doublets appear in observations of the same quasar, (ii) the average redshifts of the doublets are offset by ≤ 50 km s$^{-1}$, implying redshift variations of $\Delta z/(1+z) \sim 10^{-4}$, and (iii) the equivalent widths of the corresponding absorption lines in the two doublets differ by $\Delta W < 20\%$, for both lines of the doublet. When two doublets are considered to be the same feature, we keep the one with the highest total detection significance, expressed as $W_l/\sigma_{W_l} + W_{rg}/\sigma_{W_{rg}}$, where $l$ and $rg$ denote the lower and higher wavelength lines in the doublet, respectively. Varying these two threshold values by a factor of two changes the total number of doublets by a few percent.

Figure 7 shows the distributions of rest-frame equivalent width, $W_r$, doublet redshift, $z$, and significance of the detection, $W/\sigma_W$, for the lower wavelength lines in the doublet, and the four species in the final total catalog, including the spectra from all samples. The redshift of the doublets is computed as the weighted mean of the redshifts of the two lines in the doublet, using the significance of the detections as the weighting parameter. Table 3 lists the number of doublets for each species and for each catalog, and in the final catalog after accounting for the repeated doublets. Table 4 illustrates the structure of the final CIV catalog available at https://github.com/lluism/OMG. The name, right ascension and declination of the quasar are quoted in the first to the third columns, respectively, and the average redshift of the doublet is displayed in the fourth column.
The fifth to the tenth columns denote the rest-frame equivalent width and its uncertainty (in mÅ), and the detection significance, for the CIV λ1548 and CIV λ1550 lines, respectively. We list the detection significances computed from the equivalent width and uncertainty values in Angstrom, in the observer frame, using the redshift of each individual line, and before rounding. In some cases, these values therefore differ from the simple ratio of the values quoted in the table. The last column denotes the quasar sample from which the doublet is obtained. The individual catalogs and distribution plots for each species and quasar sample are publicly available at the link indicated above.
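As a rough illustration of the duplicate-removal criteria and the significance-weighted doublet redshift described in this section, the following Python sketch may be useful. It is not the code used to build the catalogs: the `Doublet` class, the function names, the greedy keep-the-most-significant strategy, and the choice to measure the fractional equivalent-width difference relative to the larger of the two values are all assumptions made for the example.

```python
from dataclasses import dataclass

C_KMS = 299_792.458  # speed of light in km/s

@dataclass
class Doublet:
    quasar: str      # quasar name
    z: float         # average doublet redshift
    w_l: float       # equivalent width, lower-wavelength line
    sig_w_l: float   # its uncertainty
    w_rg: float      # equivalent width, higher-wavelength line
    sig_w_rg: float  # its uncertainty

    @property
    def significance(self):
        # Total detection significance: W_l/sigma_Wl + W_rg/sigma_Wrg.
        return self.w_l / self.sig_w_l + self.w_rg / self.sig_w_rg

def doublet_redshift(z_l, z_rg, s_l, s_rg):
    """Mean of the two line redshifts, weighted by detection significance."""
    return (s_l * z_l + s_rg * z_rg) / (s_l + s_rg)

def same_feature(a, b, dv_max=50.0, dw_max=0.20):
    """True if two doublets satisfy criteria (i)-(iii)."""
    if a.quasar != b.quasar:                      # (i) same quasar
        return False
    z_mean = 0.5 * (a.z + b.z)
    dv = C_KMS * abs(a.z - b.z) / (1.0 + z_mean)  # (ii) velocity offset
    if dv > dv_max:
        return False
    # (iii) fractional difference for both lines (normalization is an assumption)
    for wa, wb in ((a.w_l, b.w_l), (a.w_rg, b.w_rg)):
        if abs(wa - wb) / max(wa, wb) >= dw_max:
            return False
    return True

def remove_duplicates(doublets):
    """Greedy pass: visit doublets by decreasing significance, drop repeats."""
    kept = []
    for d in sorted(doublets, key=lambda d: d.significance, reverse=True):
        if not any(same_feature(d, k) for k in kept):
            kept.append(d)
    return kept

# Made-up entries: a and b are the same feature, c and d are distinct.
a = Doublet("Q1", 2.0000, 100.0, 5.0, 60.0, 5.0)
b = Doublet("Q1", 2.0003, 105.0, 3.0, 62.0, 4.0)
c = Doublet("Q1", 2.0100, 40.0, 4.0, 25.0, 5.0)
d = Doublet("Q2", 2.0000, 100.0, 5.0, 60.0, 5.0)
unique = remove_duplicates([a, b, c, d])
```

In this example `a` and `b` are offset by about 30 km s$^{-1}$ and differ by less than 5% in both equivalent widths, so only `b`, the more significant of the two, is retained.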

CONCLUSIONS
In this first paper of the series Origin of Metals around Galaxies (OMG), we build and publicly release the large metal-line doublet catalogs that we will analyse in future papers.
We have designed a metal-doublet search algorithm, based on that by Cooksey et al. (2008), that searches for metal-line doublets of CIV λλ1548,1550, MgII λλ2796,2803, SiIV λλ1393,1402, and NV λλ1238,1242, in quasar spectra in a fully-automatic fashion, i.e., without human intervention. We have tested the purity of the results by looking for false doublets using line separations larger than the true ones in real spectra, and have created three large samples of mock quasar spectra with different resolution and S/N ratio values to estimate the completeness. We have also assessed the impact of variations in the determination of the intrinsic quasar continua on the purity. Finally, we have applied our code to the rest-frame wavelength ranges outside the Lyman-alpha forest of the spectra of 690 quasars, compiled from the KODIAQ datasets and observations with the UVES and HIRES instruments. Our findings can be summarized as follows:
• We build catalogs with 5 656 CIV λλ1548,1550 doublets, 7 919 doublets of MgII λλ2796,2803, 2 258 of SiIV λλ1393,1402, and 239 of NV λλ1238,1242, available at https://github.com/lluism/OMG.
• We present the purity of the individual samples as a function of equivalent width, redshift, and detection significance, the latter being the parameter on which purity depends the most.
• We expect the contamination arising from errors in the calculation of the continuum to be negligible outside the Lyman-alpha forest. In cases of large continuum errors, if any, this effect may introduce false positive detections with small equivalent widths (≲ 15 mÅ) in spectra with high signal-to-noise ratios (S/N ≳ 10).
In upcoming work, we will perform the cross-correlation of the weak CIV absorbers found here with the Lyα forest from BOSS. These calculations will enable us to obtain the bias factor of the intergalactic absorbers at different equivalent widths, and thus place constraints on their progenitor galaxy population.

Figure 7. Rest equivalent width, $W_r$, redshift, $z$, and detection significance, $W/\sigma_W$, distributions for the higher frequency line of the doublets in the total catalogs. The horizontal axes for equivalent width and detection significance are logarithmically spaced to facilitate the visualization.