The Co-ordinated Radio and Infrared Survey for High-Mass Star Formation - II. Source Catalogue

The CORNISH project is the highest-resolution radio continuum survey of the Galactic plane to date. It is the 5 GHz radio continuum part of a series of multi-wavelength surveys that focus on the northern GLIMPSE region (10° < l < 65°), observed by the Spitzer satellite in the mid-infrared. Observations with the Very Large Array in B and BnA configurations have yielded a 1.5″ resolution Stokes I map with a root-mean-square noise level better than 0.4 mJy/beam. Here we describe the data-processing methods and data characteristics, and present a new, uniform catalogue of compact radio emission. This includes an implementation of automatic deconvolution that provides much more reliable imaging than standard CLEANing. A rigorous investigation of the noise characteristics and reliability of source detection has been carried out. We show that the survey is optimised to detect emission on size scales up to 14″ and that, for unresolved sources, the catalogue is more than 90 percent complete at a flux density of 3.9 mJy. We have detected 3,062 sources above a 7σ detection limit and present their ensemble properties. The catalogue is highly reliable away from regions containing poorly sampled extended emission, which comprise less than two percent of the survey area. Imaging problems have been mitigated by down-weighting the shortest spacings, and potential artefacts have been flagged via a rigorous manual inspection with reference to the Spitzer infrared data. We present images of the most common source types found: H II regions, planetary nebulae and radio galaxies. The CORNISH data and catalogue are available online at http://cornish.leeds.ac.uk


Introduction
The observed progression of massive star formation, from cold collapsing core to young OB clusters, is largely understood via observations of discrete examples that have been ordered into an evolutionary sequence. Key to separating objects of different age and type are measurements of their spectral energy distributions (SEDs) at sub-millimetre, infrared and radio wavelengths.
The Spitzer GLIMPSE (Galactic Legacy Infrared Mid-Plane Survey Extraordinaire) programme is the first of a number of sensitive infrared surveys covering the inner Galactic plane at high resolution and in an unbiased manner (Churchwell et al. 2009). The northern half of GLIMPSE covers the region 10° < l < 65°, |b| < 1° at wavelengths from 3.6 µm to 8.0 µm, which preferentially selects warm and dusty embedded sources. The companion Spitzer MIPSGAL survey (Carey et al. 2009) has imaged the same region at 24 µm and 70 µm (where the bulk of the energy from massive young stellar objects is emitted) and is hence sensitive to cooler and more deeply embedded young stellar objects. Most recently, the Herschel Infrared Galactic Plane Survey (Hi-GAL, Molinari et al. 2010) is delivering the most comprehensive survey of embedded objects to date. With observations in five far-infrared bands between 70 µm and 500 µm, Hi-GAL samples the peak of the star-forming SED and covers the northern GLIMPSE region out to l = 60°. Completing the infrared picture of Galactic star formation is the UKIDSS¹ project (UKIRT Infrared Deep Sky Survey, Lawrence et al. 2007). A subset of UKIDSS (the Galactic Plane Survey, Lucas et al. 2008) has observed the northern GLIMPSE region in the near-infrared J, H and K bands and is sensitive to objects down to 18th magnitude. The combined data from these surveys are driving the detailed characterisation of Galactic source populations via their infrared colours (e.g., Robitaille et al. 2007, Arvidsson et al. 2010, Smith et al. 2010, Wright et al. 2010, Mottram et al. 2011). A complementary picture of the molecular and atomic interstellar medium is being provided by the BU-FCRAO Galactic Ring Survey for CO (Jackson et al. 2006) and the VLA Galactic Plane Survey (VGPS) for H I (Stil et al. 2006). Similarly, the ongoing Isaac Newton Telescope Photometric Survey of the Northern Galactic Plane (IPHAS, Drew et al. 2005) probes Hα in emission towards nebulae, and in both absorption and emission towards stars. The UKIRT Wide Field Infrared Survey for H₂ (UWISH2, Froebrich et al. 2011) also covers the same GLIMPSE region in molecular hydrogen (2.122 µm line), highlighting regions of shocked or fluorescently excited molecular gas (T ≈ 2000 K, n(H₂) > 10³ cm⁻³).
Conspicuous by its absence is a comparable radio continuum survey for compact ionised gas. Previous surveys are either targeted at individual sources selected via their infrared colours (e.g., Wood & Churchwell 1989, Kurtz et al. 1994, Urquhart et al. 2009) or are limited in their resolution and sky coverage (e.g., Becker et al. 1994). From a star formation perspective, the presence or absence of free-free emission is vital to distinguish the more evolved ultra-compact H II (UCH II) regions from their younger counterparts with similar thermal SEDs (Urquhart et al. 2009, 2011). The sheer number density of sources in the near- and mid-infrared surveys necessitates complementary data at similarly high resolution if their full scientific potential is to be realised. This is particularly true in highly clustered star-forming regions. It is important that any radio continuum survey for UCH II regions be carried out at relatively high frequencies (≥ 5 GHz), where thermal free-free emission is optically thin with a spectral index of S_ν ∝ ν^−0.1. At lower frequencies the spectrum becomes optically thick, with S_ν ∝ ν². High-frequency observations hence confer a signal-to-noise advantage and probe the structure of the ionised gas at all depths in UCH II regions. We note that even at ν = 5 GHz we will be insensitive to a population of young and compact H II regions: the so-called hyper-compact H II (HCH II) regions (see Sewiło et al. 2011 and references therein). These objects have greater emission measures than UCH II regions, so their turnover from optically thick to optically thin emission occurs at higher frequencies.
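To illustrate why 5 GHz separates these populations, the sketch below evaluates the spectrum of a uniform, isothermal slab using the common approximation for the free-free optical depth, τ_ν ≈ 3.28 × 10⁻⁷ (T_e/10⁴ K)^−1.35 (ν/GHz)^−2.1 (EM/pc cm⁻⁶). The slab geometry, the electron temperature of 10⁴ K and the two emission measures (10⁷ pc cm⁻⁶ as UCH II-like, 10⁹ pc cm⁻⁶ as HCH II-like) are illustrative assumptions, not CORNISH measurements.

```python
import math

TE_K = 1.0e4  # assumed electron temperature (K)

def tau_ff(nu_ghz: float, em_pc_cm6: float, te_k: float = TE_K) -> float:
    """Approximate free-free optical depth of an isothermal slab."""
    return 3.28e-7 * (te_k / 1.0e4) ** -1.35 * nu_ghz ** -2.1 * em_pc_cm6

def turnover_ghz(em_pc_cm6: float, te_k: float = TE_K) -> float:
    """Frequency (GHz) at which tau_ff = 1, i.e. the thick-to-thin turnover."""
    return (3.28e-7 * (te_k / 1.0e4) ** -1.35 * em_pc_cm6) ** (1.0 / 2.1)

def relative_flux(nu_ghz: float, em_pc_cm6: float) -> float:
    """Relative flux density, S_nu ∝ nu^2 (1 - exp(-tau)) (Rayleigh-Jeans, fixed T_e)."""
    return nu_ghz ** 2 * -math.expm1(-tau_ff(nu_ghz, em_pc_cm6))

def spectral_index(nu_ghz: float, em_pc_cm6: float, eps: float = 1e-3) -> float:
    """Local spectral index alpha = dlnS/dlnnu from a symmetric finite difference."""
    lo, hi = nu_ghz * (1.0 - eps), nu_ghz * (1.0 + eps)
    ratio = relative_flux(hi, em_pc_cm6) / relative_flux(lo, em_pc_cm6)
    return math.log(ratio) / math.log(hi / lo)

if __name__ == "__main__":
    for label, em in (("UCH II-like", 1e7), ("HCH II-like", 1e9)):
        print(f"{label}: turnover {turnover_ghz(em):.1f} GHz, "
              f"alpha(5 GHz) = {spectral_index(5.0, em):+.2f}")
```

Under these assumptions the UCH II-like slab turns over below 5 GHz and is close to flat there, tending to α = −0.1 at higher frequencies, whereas the HCH II-like slab is still rising roughly as ν² at 5 GHz.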
No previous radio survey of the Galactic plane has similar resolution and coverage to the Spitzer GLIMPSE survey. A number of single-dish surveys have been conducted at 5 GHz (e.g., Altenhoff et al. 1979); however, their arcminute resolution is coarse compared to the arcsecond resolution of Spitzer. Most interferometric surveys have been carried out at a frequency of 1.4 GHz (e.g., the NRAO VLA Sky Survey, Condon et al. 1998), with the exception of the catalogues of Becker et al. (1994), Giveon et al. (2005) and White et al. (2005), who surveyed the inner Galactic plane (−10° < l < 42°, |b| < 0.4°) at 5 GHz. These three surveys are published as the Multi-Array Galactic Plane Imaging Survey (MAGPIS)². They used the Very Large Array (VLA) in the C and D configurations, which deliver a relatively large beam (4″ × 9″), and the total survey area covers only 26 percent of the northern GLIMPSE region.
The CORNISH (Co-Ordinated Radio 'N' Infrared Survey for High-mass star formation) project delivers a uniform, sensitive and high-resolution radio survey of the northern GLIMPSE region to address key questions in high-mass star formation, as well as many other areas of astrophysics. In addition to UCH II regions, the CORNISH survey detects many other radio-bright objects, including planetary nebulae, ionised winds from evolved massive stars, non-thermal emission from active stars, active galactic nuclei and radio galaxies. The full rationale behind the survey design and the scientific motivation is presented in an accompanying paper (Hoare et al. 2012).

[Table note: The properties of the data differ in the combination of antenna types included in the array, the configuration of the array, the weather experienced and the declination range observed. Unless otherwise noted, the weather during the observations was reasonable.]

Observations
CORNISH covers the 110 square degrees of the northern GLIMPSE region (10° < l < 65°, |b| < 1°) using the VLA in B and BnA configurations at 5 GHz. The combination of array configuration and observing frequency results in a ∼1.5″ synthesised beam within an 8.9′ field of view, corresponding to the full-width half-maximum (FWHM) of the primary beam. With a total integration time of 80 seconds per pointing, the root-mean-square (RMS) noise in the images is better than 0.4 mJy beam⁻¹, sufficient to detect an unresolved UCH II region around a B0 star on the far edge of the Galaxy (16 kpc, Kurtz et al. 1994).
CORNISH observations of the northern GLIMPSE region were conducted using the VLA during the 2006 and 2007/2008 observing seasons. The observations fall naturally into the epochs presented in Table 1, which are distinguished by the combinations of array configuration used, inclusion or exclusion of upgraded EVLA antennas, declination ranges observed and weather conditions experienced. We show later that data from each epoch have unique properties.
To facilitate scheduling, the target area was divided into 42 blocks, each corresponding to eight hours of observations per day. Blocks contain between 180 and 220 fields arranged on a hexagonal pointing grid in rows of equal declination, each row running in right ascension. Individual fields were observed as two 45 s 'snapshots' separated by ∼4 hours in time, maximising the uv-coverage and minimising the elongation of the synthesised beam. The telescope was advanced along each row (∼20 fields), integrating for 45 s on each pointing position, before observing a secondary calibrator (one of 1832-105, 1856+061 or 1925+211) for two minutes and then continuing to the next row. Including overheads, the secondary calibrators were observed with a cadence of twenty minutes. Fields at declinations greater than −15° were observed using the VLA's B configuration, while fields at lower declinations were observed using the BnA configuration, which is designed to compensate for beam distortion at low elevations.
To allow imaging of the widest possible field of view without bandwidth smearing, the observations were carried out in pseudo-spectral-line mode. The two 25 MHz wide spectral windows (also known as intermediate frequencies, or IFs) of the VLA correlator were tuned to adjoining frequency bands centred on 5 GHz. Each window was sampled by eight 3.1 MHz channels, degrading the peak response by only a few percent at the edge of the 8.9′ primary beam. Due to hardware limitations only the RR and LL polarisations were recorded, so linear polarisation information is not available in the CORNISH data.
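The benefit of channelising the band can be estimated by treating bandwidth smearing as a radial convolution of a Gaussian synthesised beam of FWHM θ with a boxcar of width (Δν/ν)ρ, where ρ is the offset from the phase centre. The peak response is then √π/(2√(ln 2) β) erf(√(ln 2) β), with β = (Δν/ν)ρ/θ. The sketch below is an order-of-magnitude check under those assumptions (1.5″ restoring beam, rectangular channel response, ν = 5 GHz, ρ at the half-power radius), not the calculation behind the figure quoted above.

```python
import math

def bandwidth_smearing_peak(chan_hz: float, nu_hz: float,
                            offset_arcsec: float, beam_arcsec: float) -> float:
    """Peak response I/I0 of a point source smeared radially across one channel.

    Convolving a Gaussian beam (FWHM beam_arcsec) with a boxcar of width
    (chan/nu) * offset and taking the peak gives
    sqrt(pi) / (2 sqrt(ln 2) beta) * erf(sqrt(ln 2) beta),
    with beta = (chan/nu) * offset / beam.
    """
    beta = (chan_hz / nu_hz) * offset_arcsec / beam_arcsec
    if beta == 0.0:
        return 1.0  # no smearing for an infinitely narrow channel
    k = math.sqrt(math.log(2.0))
    return math.sqrt(math.pi) / (2.0 * k * beta) * math.erf(k * beta)

NU = 5.0e9                # observing frequency (Hz)
EDGE = 0.5 * 8.9 * 60.0   # half-power radius of the 8.9' primary beam (arcsec)
BEAM = 1.5                # restoring-beam FWHM (arcsec)

channelised = bandwidth_smearing_peak(25e6 / 8, NU, EDGE, BEAM)  # one ~3.1 MHz channel
single_chan = bandwidth_smearing_peak(25e6, NU, EDGE, BEAM)      # whole 25 MHz window
print(f"3.125 MHz channel: {channelised:.3f}; 25 MHz window: {single_chan:.3f}")
```

Under these assumptions the channelised loss at the half-power radius is below one percent, within the few-percent bound quoted above, whereas imaging each 25 MHz window as a single channel would lose roughly 15 percent of the peak response there.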
During both CORNISH observing seasons significant engineering work was underway to upgrade the VLA to the next-generation instrument, the Expanded VLA (EVLA). In 2006 between two and six antennas were missing from the array as they were being refurbished with new receivers and electronics to convert them to the EVLA design. By the start of the second season of CORNISH observations (September 2007), almost half the array comprised EVLA antennas and the instrument was operating in a transition mode. Over the season, VLA antennas were progressively removed from the active array and replaced by EVLA antennas. The EVLA antennas conferred the advantage of enhanced sensitivity, but at the same time were untested and prone to software and hardware problems. Special care was needed to calibrate VLA-EVLA baselines properly and to ensure that the EVLA data were properly flagged. As part of the upgrade, the venerable Modcomp-based VLA control system was also replaced in mid-2007 with new software running under Linux. Taken as a whole, the CORNISH data required close inspection and vigilance during post-processing.

The data reduction pipeline
The raw CORNISH dataset consists of 9,349 pointing positions, each of which was observed twice. A manually guided data-reduction procedure was considered too labour-intensive to use on such a large volume of data, hence a semi-automatic pipeline was developed with the control parameters tuned to the average observation. This approach has the advantage of applying uniform processing over the majority of the survey area, while still allowing manual intervention in a minority of special cases (e.g., fields with complicated emission structures, or very bright sources).
The pipeline was implemented in the Python language and made use of the ObitTalk module to interface directly with the NRAO³ AIPS⁴ and OBIT⁵ data-reduction packages. The CORNISH pipeline used a MySQL database to record meta-data and perform bookkeeping operations during the reduction procedure. Figure 1 illustrates the pipeline logic, which is broken up into calibration and imaging stages. In the following sections we describe each of the stages in detail.

Calibration and flagging
Raw data from the telescope were corrected for atmospheric opacity using phase monitor data and written to an AIPS-format uv-file in spectral-line mode by the AIPS task FILLM. Each eight-hour block of observations was first inspected by eye, and uv-visibilities with large phase scatter, errant amplitudes or system-temperature spikes were flagged out of the dataset. Gross errors in the data, such as bad antennas, IFs or polarisations, were also flagged out at this stage. It was necessary to edit out the first five seconds of data from each pointing to allow for antenna settling time, reducing the on-source integration time from 45 s to 40 s. All manual flagging parameters were written to a master flag list, which was automatically applied upon restarting the pipeline. Care was taken at this stage to ensure that the primary flux calibrators contained only good data.
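Editing the antenna settling time amounts to flagging every visibility recorded within the first 5 s of each pointing. A minimal NumPy sketch, assuming per-visibility timestamps and a scan (pointing) identifier are available as arrays; the function name is hypothetical and this is not the AIPS or OBIT implementation:

```python
import numpy as np

SETTLE_S = 5.0  # antenna settling time removed from the start of each pointing

def settling_time_flags(times_s, scan_ids, settle_s=SETTLE_S):
    """Return True for visibilities recorded within `settle_s` of their scan start."""
    times_s = np.asarray(times_s, dtype=float)
    scan_ids = np.asarray(scan_ids)
    flags = np.zeros(times_s.shape, dtype=bool)
    for sid in np.unique(scan_ids):
        in_scan = scan_ids == sid
        t0 = times_s[in_scan].min()  # first sample of this pointing
        flags |= in_scan & (times_s < t0 + settle_s)
    return flags

if __name__ == "__main__":
    # Two synthetic 45 s snapshots of one field, ~4 h apart, sampled every 1 s.
    t = np.concatenate([np.arange(0.0, 45.0), np.arange(14400.0, 14445.0)])
    scans = np.repeat([0, 1], 45)
    f = settling_time_flags(t, scans)
    print(f"flagged {f.sum()} samples, kept {(~f).sum()} s of data")
```

For the synthetic pair of snapshots this keeps 40 s per snapshot, i.e. the 80 s total integration per pointing quoted earlier.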
The shapes of the VLA and EVLA passbands are different enough that a six percent closure error has been measured on EVLA-VLA baselines in continuum modes using 50 MHz bandwidths⁶. This error is expected to be larger at narrower bandwidths. Because the observations were made in pseudo-spectral-line mode, the issue could be mitigated by performing bandpass calibration (phase and amplitude) immediately after the initial flagging, and before any further calibration. Solutions for the atmospherically and electronically induced changes in phase and amplitude were then calculated using the standard AIPS task CALIB, operating on one of the three secondary calibrators. The data were bootstrapped on to an absolute flux scale by comparing observations of the quasars 1331+305 (3C286) or 0137+331 (3C48) to their C-band models in AIPS. A global calibration table was produced, which could then be applied to the whole block.
After a first pass at calibration, the OBIT flagging tasks AutoFlag and MednFlag were applied to each IF, polarisation and channel of the secondary calibrator observations. AutoFlag edits out bad visibilities based on an absolute maximum allowed value in Stokes I or V. Radio-frequency interference (RFI), e.g., from commercial broadcasting, is often highly polarised, and all visibilities with Stokes V amplitudes greater than 2 Jy were flagged as bad. MednFlag applies a rolling median filter to each IF and spectral channel: visibilities were edited out if their Stokes I values differed from the median by more than five standard deviations, calculated in a ∼50 second time window. A second pass at calibration was then performed before applying a similar flagging procedure to the calibrated science observations on a field-by-field basis. Finally, all calibration and flagging tables were applied to the data and the individual forty-second pointings were split into uv-FITS format files.
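The two automatic flagging criteria can be sketched in NumPy as below. The 2 Jy Stokes V limit, the ∼50 s window and the 5σ threshold come from the text; the centred window, estimating σ from the median absolute deviation (MAD) and the function names are assumptions, and this is not the OBIT implementation.

```python
import numpy as np

def flag_stokes_v(v, limit_jy=2.0):
    """True where |Stokes V| exceeds an absolute limit (RFI is often highly polarised)."""
    return np.abs(np.asarray(v)) > limit_jy

def flag_running_median(times_s, amp, window_s=50.0, nsigma=5.0):
    """True where |amp - running median| > nsigma * sigma within a centred window.

    sigma is estimated robustly as 1.4826 * MAD of the samples in the window
    (an assumption; MednFlag's own estimator may differ).
    """
    times_s = np.asarray(times_s, dtype=float)
    amp = np.asarray(amp, dtype=float)
    flags = np.zeros(amp.shape, dtype=bool)
    for k, t in enumerate(times_s):
        win = amp[np.abs(times_s - t) <= 0.5 * window_s]
        med = np.median(win)
        sigma = 1.4826 * np.median(np.abs(win - med))
        flags[k] = sigma > 0 and abs(amp[k] - med) > nsigma * sigma
    return flags
```

Using a robust σ keeps a single bright RFI spike from inflating the threshold that is meant to catch it.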
Meta-data associated with each observation (e.g., the pointing centre co-ordinates and number of flagged visibilities) were automatically saved in the MySQL database. The imaging procedure subsequently queried this database when building a final mosaiced image.
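The bookkeeping pattern can be sketched with Python's built-in sqlite3 module standing in for MySQL. The table layout, column names and the flagged-fraction cut are hypothetical, chosen only to show how an imaging stage might query per-field meta-data when assembling a tile; RA wrap-around at 0°/360° is not handled in this sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS fields (
    name      TEXT PRIMARY KEY,  -- hypothetical field identifier
    ra_deg    REAL NOT NULL,     -- pointing centre (J2000)
    dec_deg   REAL NOT NULL,
    n_vis     INTEGER NOT NULL,  -- visibilities after splitting
    n_flagged INTEGER NOT NULL   -- visibilities flagged by the pipeline
)"""

def open_db(path=":memory:"):
    """Open (or create) the bookkeeping database."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    return con

def record_field(con, name, ra_deg, dec_deg, n_vis, n_flagged):
    """Insert or update the meta-data for one calibrated field."""
    with con:  # commits on success, rolls back on error
        con.execute("INSERT OR REPLACE INTO fields VALUES (?, ?, ?, ?, ?)",
                    (name, ra_deg, dec_deg, n_vis, n_flagged))

def fields_for_tile(con, ra_min, ra_max, dec_min, dec_max, max_flag_frac=0.5):
    """Names of fields centred inside a box and not mostly flagged."""
    rows = con.execute(
        """SELECT name FROM fields
           WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ?
             AND CAST(n_flagged AS REAL) / n_vis <= ?
           ORDER BY name""",
        (ra_min, ra_max, dec_min, dec_max, max_flag_frac))
    return [r[0] for r in rows]
```

Re-running a field simply overwrites its row, so a restarted pipeline sees the latest meta-data.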

Imaging
Fields were imaged using the OBIT Imager task, which performs imaging and deconvolution in a similar manner to the AIPS IMAGR routine. Imager automatically switches between the standard Cotton-Schwab and SDI deconvolution algorithms (Schwab 1984, Steer et al. 1984), referred to simply as 'CLEAN' in the following discussion. In addition, the task can be instructed to perform both phase and amplitude self-calibration. Imager is a complex task with many important input parameters, which need careful tuning to produce a scientifically useful image. Key amongst these are the maximum residual flux, the threshold at which to begin self-calibration, whether to perform both phase and amplitude self-calibration, and the weighting function used. The 'average' CORNISH field was imaged using a Briggs robustness parameter of zero, which is a compromise between natural (high-sensitivity) and uniform (high-resolution) weighting. A minority of complex fields were treated as special cases and imaged with a custom weighting scheme (see Section 3.2.2, below). Self-calibration was performed on fields containing sources with peak flux densities greater than 30 mJy beam⁻¹. During the deconvolution process the restoring Gaussian beam was forced to be circular, with a full-width half-maximum of 1.5″. The cell size was set at 0.3″, oversampling the synthesised beam. We justify these choices in the following sections.

⁶ See http://www.vla.nrao.edu/astro/guides/evlareturn/ for details.
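For reference, the Briggs robust weight of visibility i falling in uv-cell k is commonly written w_i / (1 + W_k f²), where W_k is the summed natural weight in the cell and f² = (5 × 10^−R)² / (Σ_k W_k² / Σ_i w_i) for robustness R. The NumPy sketch below implements that textbook form on pre-gridded cell indices; OBIT's internal gridding details (cell size, field of view, weight summation) are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def briggs_weights(iu, iv, w_natural, robust=0.0):
    """Briggs (robust) visibility weights.

    iu, iv    : integer uv-grid cell indices of each visibility
    w_natural : natural weights (e.g. 1/sigma^2)
    robust    : Briggs R; large R -> natural, very negative R -> uniform
    """
    iu = np.asarray(iu, dtype=np.int64)
    iv = np.asarray(iv, dtype=np.int64)
    w = np.asarray(w_natural, dtype=float)
    # Collapse (iu, iv) into a single key so each occupied cell gets one index.
    key = (iu - iu.min()) * (iv.max() - iv.min() + 1) + (iv - iv.min())
    _, cell = np.unique(key, return_inverse=True)
    cell = cell.ravel()
    w_cell = np.bincount(cell, weights=w)  # summed natural weight per cell
    f2 = (5.0 * 10.0 ** (-robust)) ** 2 / (np.sum(w_cell ** 2) / np.sum(w))
    return w / (1.0 + w_cell[cell] * f2)
```

With R = 0, as used for the average CORNISH field, crowded cells are down-weighted only partially, which is the natural/uniform compromise described above.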
Imager implements two algorithms which improve the dynamic range and quality of the final image. Firstly, the AutoWindow⁷ function dynamically places small (< 20 pixels in radius) clean windows over regions of emission in the intermediate dirty map. The validity of each window is assessed periodically during the clean cycle and additional windows are created as necessary. The net effect is to CLEAN only real emission and avoid CLEANing noise, resulting in a smaller clean bias. Secondly, the AutoCen⁸ algorithm regrids uv-data containing bright point sources so that their peaks fall on a pixel centre. A point source at the centre of a pixel can be represented as a single delta function, i.e., a single clean component, leading to a significant improvement in dynamic range. In contrast, a bright point source offset from a pixel centre requires multiple clean components, both positive and negative, to model its emission. This more complex model will inevitably suffer from rounding errors on finite-precision computers, leading to flux being scattered into the surrounding sky. EVLA memos 116 and 114 by Bill Cotton contain detailed descriptions of the AutoWindow and AutoCen algorithms.

[Figure caption fragment: Individual fields containing bright or complex structured emission also required higher f_MRF values. Partial rows of fields (in R.A.) with different multipliers compared to their neighbours are due to repeat observations from later epochs. The row at δ = −16° 55′ 03.13″ (13.3° < l < 14.5°) was observed only once and the f_MRF was increased to avoid over-cleaning.]
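The benefit of placing a bright point source on a pixel centre can be seen in a one-dimensional toy model with complete Fourier sampling. A source at an integer pixel images to a single non-zero pixel (one clean component), whereas a fractional offset spreads flux over many pixels; a phase rotation of the visibilities (the Fourier shift theorem) moves the source onto the grid. This illustrates the principle under idealised assumptions only; it is not the OBIT AutoCen code, and the function names are hypothetical.

```python
import numpy as np

N = 64  # pixels in the toy 1-D image (cell size = 1 pixel)

def point_source_vis(x0, flux=1.0, n=N):
    """Fully sampled visibilities of a point source at pixel position x0."""
    k = np.arange(n)
    return flux * np.exp(-2j * np.pi * k * x0 / n)

def dirty_image(vis):
    """Image from fully sampled visibilities (inverse DFT)."""
    return np.fft.ifft(vis).real

def shift_phase_centre(vis, dx, n=N):
    """Phase-rotate visibilities so that every source moves by -dx pixels."""
    k = np.arange(n)
    return vis * np.exp(2j * np.pi * k * dx / n)

on_grid = dirty_image(point_source_vis(10.0))
off_grid = dirty_image(point_source_vis(10.4))
recentred = dirty_image(shift_phase_centre(point_source_vis(10.4), 0.4))
# Number of pixels carrying appreciable signal in each case.
print([int(np.count_nonzero(np.abs(img) > 1e-6))
       for img in (on_grid, off_grid, recentred)])
```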

Controlling the deconvolution algorithm
One of the most critical control parameters for the imaging task is the target maximum residual flux (MRF). The ideal value varies from field to field depending on the weather conditions, individual antenna system temperatures and the structure and strength of emission in the field of view. To estimate the intrinsic sensitivity attainable we imaged each field in Stokes V and measured the RMS noise. Sources with significant circularly polarised emission at 5 GHz are rare (Roberts et al. 1975, Homan & Lister 2006), so the RMS noise measured from an uncleaned Stokes V image is expected to be comparable to the final CLEANed noise level. Because the two polarisation beams of the VLA are not co-aligned on the sky, they give rise to a strong instrumental circular polarisation away from the pointing centre. To compensate for this 'beam squint' (see http://www.aoc.nrao.edu/evla/geninfo/memoseries/evlamemo113.pdf), only the central 2′ diameter portion of each field was imaged and measured. Despite this, we expect the noise in the Stokes V images to be higher than the ideal noise in their Stokes I counterparts. In addition, the 1.5″ restoring beam applied to the Stokes I images is, on average, larger than the unconstrained synthesised beam of the Stokes V images, which effectively smooths the noise in the Stokes I images relative to Stokes V. The target MRF was set equal to RMS_V × f_MRF, where f_MRF is a constant multiplicative factor. The canonical value of f_MRF was determined in two ways. Firstly, artificial point sources were injected into the uv-data of an emission-free field and the OBIT Imager task was applied using a range of values of f_MRF. After each iteration the recovered flux and the RMS noise of the final image were measured. Figure 2 shows the results of this experiment. We found in practice that values of f_MRF = 0.8 to 2.0 recovered more than 96 percent of the flux, within the errors.
Secondly, we chose a representative sample of compact sources with simple morphologies and inspected the deconvolved images for residual sidelobe structure. Values of f_MRF = 0.8 to 1.0 were required to remove sidelobe structure fully from the images. We re-imaged fields using a higher threshold where obvious clean artefacts were present. Figure 3 shows the values of f_MRF used across the survey. Most of the CORNISH area was imaged using f_MRF = 0.8. Fields at lower declinations (including the BnA observations) required higher f_MRF values to avoid producing increased numbers of low-level artefacts in otherwise empty regions. Unresolved weak detections are dominated by the extragalactic population and hence have a flat distribution on the sky (Anglada et al. 1998, see also Sections 5.4 and 6.2.2). Based on an initial pass at the data reduction, we used two further levels of f_MRF = 1.0 and 1.1, chosen to keep the number of 5σ to 6σ point sources roughly constant away from the Galactic mid-plane. However, multiplier values between 0.8 and 1.1 led to imaging artefacts in a minority of fields affected by poor calibration or containing very bright point sources (> 1 Jy) or extended emission. Such fields were inspected and cleaned manually. A small proportion of fields (≪ 1 percent) were found to contain significant circularly polarised flux and were also cleaned by hand. It is likely that this emission is due to the instrumental beam squint, rather than real emission on the sky. The OBIT Imager task does not correct for beam squint; however, tests on selected CORNISH data did not find any believable Stokes V emission after a correction had been applied (B. Cotton, private communication). The manually cleaned fields are mostly confined to high-mass star-formation regions. They appear in Figure 3 as patches of red hexagons clustered around the mid-plane of the Galaxy. Values used in these cases ranged over 1.2 ≤ f_MRF ≤ 10, with a mean of 3.2.
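The target MRF is a simple scaling of the Stokes V noise. A minimal sketch, assuming a square Stokes V image on the 0.3″ grid described above with the pointing centre at the image centre; the RMS is taken as the plain root-mean-square over the central 2′ diameter disc, which may differ in detail from the estimator used by the pipeline, and the function name is hypothetical.

```python
import numpy as np

CELL_ARCSEC = 0.3                 # image cell size quoted in the text
CENTRAL_DIAMETER_ARCSEC = 120.0   # central 2' used to limit beam-squint effects

def target_mrf(stokes_v, f_mrf=0.8, cell_arcsec=CELL_ARCSEC):
    """Return (target MRF, RMS_V) measured in the central 2' of a Stokes V image."""
    ny, nx = stokes_v.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    r = np.hypot(xx - (nx - 1) / 2.0, yy - (ny - 1) / 2.0) * cell_arcsec
    central = r <= CENTRAL_DIAMETER_ARCSEC / 2.0
    rms_v = float(np.sqrt(np.mean(stokes_v[central] ** 2)))
    return f_mrf * rms_v, rms_v
```

The default f_MRF = 0.8 matches the value used over most of the survey; the higher levels (1.0, 1.1 and the manual cases) are passed in explicitly.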
Applying the deconvolution algorithm close to the noise level can increase the so-called 'clean bias', an effect which results in a systematic reduction in object fluxes. It is believed to be caused by inadvertently CLEANing bright sidelobes, leading to a subtraction of real flux from astronomical objects (White et al. 1997, Condon et al. 1998). We measure the clean bias for CORNISH images in Section 5.6.

Imaging extended emission
Baseline lengths on the VLA B and BnA arrays range from approximately 300 kλ to 2 kλ, equivalent to angular scales of 1.5″ to 2′, respectively. These array configurations sample the uv-plane less well at shorter spacings, and the deconvolution algorithm has difficulty reconstructing image structure on scales greater than ∼14″. The imaging procedure tends to produce 'waves' or 'ruffles' in the background of fields containing significant extended emission. Flux is also scattered over the image as the standard CLEAN algorithm attempts to model the emission as a series of delta functions. This can lead to high RMS noise levels and multiple imaging artefacts, especially if self-calibration is allowed to run unchecked. A total of 193 fields (∼2 percent) were found to have poorly imaged extended emission. To combat this problem we imaged these fields using a custom weighting scheme. By default the CORNISH pipeline is configured to use robust weighting (Briggs 1995), which is a compromise between the low thermal noise of natural weighting and the high resolution of uniform weighting. In the case of uniform weighting, the OBIT Imager weights each visibility by the inverse of the sum of the weights present in its uv-plane cell. For fields with extended emission we have instead weighted each visibility by the inverse of the number of visibilities within a radius of ten cells, attenuated by a Gaussian function. For the B and BnA arrays this has the effect of down-weighting the poorly sampled short spacings, similar to applying an inverse taper. We found that the RMS noise and number of artefacts in the imaged fields are reduced at the expense of additional 'missing' flux. The reduction in flux compared to a uniform or robust weighted image is highly dependent on the structure of the emission.
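One reading of the custom scheme is sketched below: each visibility is weighted by the inverse of a Gaussian-weighted count of the visibilities within ten uv-cells of it. Both this interpretation of "attenuated by a Gaussian function" and the Gaussian width `sigma_cells` are assumptions, since neither is specified here, and the function name is hypothetical.

```python
import numpy as np

def local_density_weights(iu, iv, radius=10, sigma_cells=5.0):
    """Weight each visibility by 1 / (Gaussian-weighted neighbour count).

    iu, iv      : integer uv-grid cell indices of each visibility
    radius      : neighbours are counted out to this many cells
    sigma_cells : width of the Gaussian attenuation (assumed value)
    """
    iu = np.asarray(iu, dtype=np.int64)
    iv = np.asarray(iv, dtype=np.int64)
    # Shift indices so the occupied region sits `radius` cells inside the grid.
    gu = iu - iu.min() + radius
    gv = iv - iv.min() + radius
    counts = np.zeros((gu.max() + radius + 1, gv.max() + radius + 1))
    np.add.at(counts, (gu, gv), 1.0)  # number of visibilities per cell
    density = np.zeros_like(counts)
    for du in range(-radius, radius + 1):
        for dv in range(-radius, radius + 1):
            d2 = du * du + dv * dv
            if d2 > radius * radius:
                continue
            g = np.exp(-0.5 * d2 / sigma_cells ** 2)
            # density[a, b] += g * counts[a + du, b + dv]; at occupied cells the
            # shifted index never wraps, thanks to the empty padding.
            density += g * np.roll(counts, (-du, -dv), axis=(0, 1))
    return 1.0 / density[gu, gv]
```

Cells crowded with visibilities receive proportionally smaller weights, while isolated samples keep a weight of one.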
To quantify the effect of the two weighting schemes (robust = 0 and Gaussian), we imaged artificial Gaussian sources of increasing size inserted into the uv-data for a blank field. The peak flux was fixed at 1.0 Jy beam⁻¹ while the full-width half-maximum was increased from 2″ to 26″ in steps of 2″. Figure 4 (top panel) plots the recovered flux as a function of source FWHM for both weighting schemes. When imaging with robust weighting, the fraction of recovered flux drops off for FWHM greater than ∼8″. With the Gaussian-smoothed weighting scheme this drop-off also occurs at FWHM ≥ 8″; however, the fraction of recovered flux falls more rapidly. The bottom panel of Figure 4 shows the RMS noise as a function of injected source size. In this case the flux density was fixed at 1.0 Jy to avoid being dynamic-range limited at higher fluxes. The RMS noise in the Gaussian-weighted images is clearly lower and more stable as a function of FWHM. In practice, dynamic ranges of several thousand are achieved on isolated point sources, falling to several hundred for slightly resolved sources (> 1.8″). Figure 5 shows the effect of the two weighting schemes on the fidelity of the images. The top panel plots the absolute fractional value of the residual flux remaining after the model image is subtracted from the pipeline-imaged data (|S_resid|/S_model). Lower values mean that the image is similar to the model, while higher values indicate significant structural differences. The plot shows that the Gaussian weighting scheme is the most consistent at representing the source morphology, while the robust-weighted images break down between FWHM = 8″ and 10″. To quantify the effect further, we fitted the pipeline-imaged data with a 2D Gaussian using the MIRIAD task imfit. The bottom panel of Figure 5 plots the fitted versus injected FWHM.
It is again clear that the robust-weighted images begin to differ from the model at FWHM ≈ 10″, while the Gaussian weighting scheme preserves structures out to ∼14″. Note that real-world emission with complex morphology will react differently to the Gaussian weighting scheme, depending on its visibility function.
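The two fidelity measures can be sketched as follows. The fractional residual follows the definition |S_resid|/S_model given above; for the size measurement a second-moment estimate is swapped in for the MIRIAD imfit Gaussian fit, which is adequate only for isolated, centred, circular Gaussian sources. The 0.3″ cell matches the imaging set-up; the function names are hypothetical.

```python
import numpy as np

FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))

def gaussian_model(n, fwhm_arcsec, peak=1.0, cell_arcsec=0.3):
    """Circular Gaussian centred in an n x n image (peak in Jy/beam)."""
    sigma_pix = fwhm_arcsec / FWHM_PER_SIGMA / cell_arcsec
    y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
    return peak * np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma_pix ** 2))

def fractional_residual(image, model):
    """|S_resid| / S_model, with S_resid the summed (image - model) flux."""
    return abs(np.sum(image - model)) / np.sum(model)

def moment_fwhm(image, cell_arcsec=0.3):
    """FWHM (arcsec) of a centred circular Gaussian from its second moment along x."""
    n = image.shape[0]
    _, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
    p = image / image.sum()
    return FWHM_PER_SIGMA * np.sqrt(np.sum(p * x ** 2)) * cell_arcsec
```

Because the summed residual can cancel between positive and negative structure, the size comparison is a useful complement to the flux-based measure.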

Mosaicing
All fields were imaged out to a radius of eight arcminutes (∼10 percent of the power pattern) before being linearly mosaiced in the image plane onto 20′ × 20′ tiles oriented in equatorial (J2000) coordinates. In total 1408 tiles cover the survey area, each of which overlaps its neighbours by 1′. Pointing centres sit on a close-packed hexagonal grid adapted from the 1.4 GHz NVSS survey and scaled to 5 GHz. Condon et al. (1998) justify this layout in detail and Hoare et al. (2012) describe the implementation in CORNISH; here we provide a summary for convenience. Adjacent CORNISH pointing centres are separated by 7.4′, compared to the 8.9′ full-width half-maximum of the primary beam at 4.86 GHz. The separation is optimised to maximise the uniformity of the noise pattern without appreciably degrading observing efficiency. At any point in the mosaic the sky brightness B is given by a weighted sum of the individual brightness values b_i contributed by the overlapping snapshots,

$$ B = \frac{\sum_i W_i\, b_i}{\sum_i W_i\, P(\rho_i)}, $$

where b_i is the brightness in snapshot i before primary-beam correction and ρ_i is the offset from its pointing centre. To maximise sensitivity the weighting factor W_i was set to be proportional to P(ρ), the primary beam pattern as a function of offset ρ from the pointing centre. This is necessary because the noise is constant across a raw snapshot image, so each primary-beam-corrected snapshot must be weighted by the square of its signal-to-noise ratio. The weighting method is implemented in the CORNISH pipeline in two steps. Individual fields are first multiplied by P(ρ) and summed onto a blank tile. This image is then divided by a 'weight image' created from the sum of P²(ρ) functions (modelled by Gaussians for the VLA). The resultant data product is a mosaiced image which has been primary beam corrected (i.e., divided by P(ρ)). Figure 6 illustrates an example of a weight image. The minimum weight is ΣP²(ρ) = 0.83, hence the worst-case relative sensitivity is √0.83 ≈ 0.91.

[Figure caption fragment: ... an extended jet and there are several sources brighter than 1 mJy within one arcminute. The image scales have been stretched to show all real emission, but also highlight very low-level imaging artefacts. Only clean components from real emission were used when calibrating the data.]
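Multiplying each field by P(ρ), summing, and dividing by the summed P² weight image amounts to B = Σ_i P(ρ_i) b_i / Σ_i P²(ρ_i). The sketch below checks this numerically under the assumptions stated in the text (Gaussian primary beam of FWHM 8.9′, 7.4′ hexagonal spacing, fields used out to 8′), with a uniform test sky and hypothetical function names. It confirms that the true brightness is recovered and evaluates the weight ΣP² at the centroid of three adjacent pointings, the position farthest from any pointing centre.

```python
import numpy as np

FWHM_ARCMIN = 8.9      # primary-beam FWHM at 4.86 GHz, from the text
SPACING_ARCMIN = 7.4   # separation of adjacent pointing centres
R_MAX_ARCMIN = 8.0     # each field is imaged out to this radius

def primary_beam(rho_arcmin):
    """Gaussian model of the primary-beam power pattern P(rho)."""
    return np.exp(-4.0 * np.log(2.0) * (rho_arcmin / FWHM_ARCMIN) ** 2)

def hex_centres(spacing=SPACING_ARCMIN, n=3):
    """Pointing centres on a close-packed hexagonal lattice around (0, 0)."""
    i, j = np.meshgrid(np.arange(-n, n + 1), np.arange(-n, n + 1))
    x = spacing * (i + 0.5 * j)
    y = spacing * (np.sqrt(3.0) / 2.0) * j
    return np.column_stack([x.ravel(), y.ravel()])

def linear_mosaic(x, y, sky, centres):
    """Return (B, weight) at (x, y), with B = sum(P b) / sum(P^2).

    Each snapshot sees b_i = P(rho_i) * sky(x, y) (no primary-beam
    correction); fields contribute only within R_MAX_ARCMIN of their centre.
    """
    rho = np.hypot(x - centres[:, 0], y - centres[:, 1])
    p = np.where(rho <= R_MAX_ARCMIN, primary_beam(rho), 0.0)
    b = p * sky(x, y)
    weight = np.sum(p ** 2)
    return np.sum(p * b) / weight, weight

centres = hex_centres()
# Centroid of a triangle of adjacent pointings.
xc, yc = SPACING_ARCMIN / 2.0, SPACING_ARCMIN / (2.0 * np.sqrt(3.0))
B, w = linear_mosaic(xc, yc, lambda x, y: 2.0, centres)
print(f"B = {B:.3f}, weight = {w:.3f}, relative sensitivity = {np.sqrt(w):.3f}")
```

With these numbers only the three nearest fields lie within the 8′ imaging radius of the centroid, and their summed P² comes out close to the minimum weight of 0.83 quoted in the text.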

Data quality
Data were reduced and imaged for quality-control purposes immediately after the observations were completed. Bad data were quickly identified, allowing the affected fields to be re-scheduled in the observing queue. The rapid turn-around time meant that we were able to re-observe most fields affected by poor weather in 2006 and by system power glitches in 2007/2008.

Calibration
Three quasars, spaced equally along the plane of the Galaxy, were used as secondary (phase) calibrators for the whole survey. Although initially assumed to be point-like, each was found to exhibit structure at the 0.1 to 2.0 percent level, in the form of radio jets and nearby confusing sources. Using the full complement of data available, we imaged and self-calibrated each secondary calibrator field out to the half-power radius. The resulting clean-component models were used as inputs to the calibration procedure. Images of the secondary calibrators are presented in Figure 7. Quasars 1832-105 and 1856+061 deviate from point sources, exhibiting jets with flux densities peaking at two percent of the main peak. The source 1925+211 shows significant structure within one arcminute of the central source, including an elongated jet and two point sources of 1 mJy and 6 mJy (0.1 and 0.4 percent of the main peak, respectively).
Two primary flux density calibrators were observed, providing a redundant means of flux-calibrating the data: 1331+305 (3C286) was observed at the beginning of an observing block and 0137+331 (3C48) at the end. With a 5 GHz flux density of 7.47 Jy, 1331+305 was the preferred calibrator. However, 0137+331 (S_5GHz = 5.48 Jy) was used if technical or weather-related problems affected the initial data from a block.
The small number of calibrators observed allowed us to check the consistency of our calibration with time. Figure 8 shows the percentage deviation from the median flux densities of the three secondary calibrators and the backup primary calibrator. Each point on the plot represents an 8-hour block of observations. Calibrator flux densities were measured directly from the image data by manually drawing a polygon around each quasar and summing the flux within the polygon. For the secondary calibrators the standard deviation in flux is 2.7 percent, consistent with the accuracy of previous VLA surveys (e.g., Condon et al. 1998, who quote three percent at 1.4 GHz); for the backup calibrator it is 8.9 percent. No variation with time is seen, implying that the calibration is stable over the two observing seasons. The scatter in the backup calibrator is a more appropriate error to quote for snapshot imaging and is adopted as the formal amplitude calibration error for CORNISH data.
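Because the images are in Jy beam⁻¹, the sum of pixels inside the polygon must be divided by the beam area in pixels (πθ²/(4 ln 2) for a Gaussian beam of FWHM θ) to give a flux density in Jy. The NumPy sketch below, with a ray-casting polygon test and hypothetical function names, shows that conversion for the 1.5″ restoring beam and 0.3″ cells, followed by the percentage deviation from the median of the kind plotted per observing block.

```python
import numpy as np

def polygon_mask(shape, vertices):
    """Boolean mask of pixels whose centres lie inside a polygon (ray casting).

    vertices: sequence of (x, y) pixel coordinates, in order around the polygon.
    """
    ny, nx = shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        straddles = (y1 > yy) != (y2 > yy)
        x_cross = x1 + (yy - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (xx < x_cross)
    return inside

def integrated_flux(image, vertices, beam_fwhm_arcsec=1.5, cell_arcsec=0.3):
    """Sum Jy/beam pixels inside the polygon and convert to Jy."""
    beam_area_pix = np.pi * beam_fwhm_arcsec ** 2 / (4.0 * np.log(2.0)) / cell_arcsec ** 2
    return float(image[polygon_mask(image.shape, vertices)].sum() / beam_area_pix)

def percent_deviation(fluxes):
    """Percentage deviation of each block's flux from the median, and its standard deviation."""
    fluxes = np.asarray(fluxes, dtype=float)
    med = np.median(fluxes)
    dev = 100.0 * (fluxes - med) / med
    return dev, float(np.std(dev))
```

A point source with a peak of 1 Jy beam⁻¹ restored with the 1.5″ beam then integrates to 1 Jy, as expected.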
All CORNISH observations are phase-referenced to one of the three secondary calibrators and hence adopt their positional uncertainties. The formal positional uncertainties may be found in the VLA calibrator manual (http://www.vla.nrao.edu/astro/calib/manual/) and are < 150 milliarcseconds (mas) for 1832−105, < 10 mas for 1856+061 and < 2 mas for 1925+211.

Synthesised beam shape
The dual-snapshot observing scheme was designed to deliver the most circular synthesised beam possible, while allowing both snapshots to be taken within a single eight-hour observing block. To minimise the total range of synthesised beam shapes in the survey, each field should ideally be observed at equal hour angles of ±3 hr either side of transit. Scheduling constraints meant that this was not achieved in practice, and a compromise separation of four hours between snapshots was implemented. Hoare et al. (2012) present the parameters of the synthesised beams attained in the final images, which we briefly summarise here.
Within each observing block the beam elongation increases towards lower declinations, while the position angle varies by ∼60 degrees. The distribution of beam minor axes in the survey area separates into two distinct populations, with a small peak at 0.77″ and a large peak at 1.2″. The smaller peak stems from the low-declination fields observed using the BnA array configuration, while the larger one contains the majority of fields observed using the B array. In contrast, the distribution of major-axis values is unimodal, with a median of 1.5″ and a standard deviation of 0.32″. Ninety-eight percent of fields have elongations (major-to-minor axis ratios) less than two and seventy-four percent less than 1.5.
Based on these values, we chose to force a circular restoring beam of FWHM 1.5″, because this greatly simplified the mosaicing operation and meant that the restoring beam shape was constant across every mosaiced image. The value 1.5″ was chosen as the median of the measured major axes from all CORNISH fields. The degree of super-resolution is presented in Figure 9 and is less than 1.5 in ninety-six percent of fields. The restoring beam area is larger than the synthesised beam area for 8,154 fields (87.2 percent) and less than 1.5 times greater in 9,343 fields (99.9 percent).
Figure 10 presents an image of the RMS noise over the full survey area, with each colour-coded hexagon representing a field. The locations of H II region complexes are prominent as clumps of high-noise fields located close to the mid-plane of the Galaxy. Away from such regions the noise level within individual scan rows (scanning in RA) is relatively constant compared to the variation between rows, which is largely weather related. The observation area can be divided into two regions with noticeably different noise properties. At declinations greater than δ = 14.2°, the median RMS noise is significantly lower (RMS_outer = 0.25 mJy beam⁻¹) than in the remainder of the survey area (RMS_inner = 0.35 mJy beam⁻¹). This outer CORNISH region corresponds directly to the epoch-IIIb observations detailed in Table 1. From the 2007 season onwards the VLA made extensive use of the upgraded EVLA antennas, which have more sensitive receivers. In addition, the weather conditions were better in the second season than in 2006, when observations were affected by electrical storms. Observations of approximately the inner 20 degrees of the CORNISH area (δ < −10.5°) also took place during the second season, corresponding to epochs II and IIIa. However, the RMS noise level of these observations is similar to that of the 2006 season, for a number of reasons.
In particular, the inner CORNISH region is seen at relatively low elevations from the VLA site, requiring the telescope to observe through a greater path length of atmosphere. Atmospheric emission raises the system temperature, decreasing the signal-to-noise ratio of the data. In addition, the epoch-II observations utilised fewer EVLA antennas and, because the telescope was at the beginning of the VLA/EVLA transition, required extensive flagging to render the data usable. Figure 11 presents a histogram of the distribution of noise measurements, sampled on 2′ scales, across the whole survey. The division between the inner and outer CORNISH regions is obvious. Both regions exhibit high-noise tails, corresponding to fields containing bright and extended emission.
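The restoring-beam comparison in the synthesised beam discussion rests on the solid angle of an elliptical Gaussian beam, π θ_M θ_m / (4 ln 2). With a forced circular restoring beam of 1.5″, the area ratio therefore reduces to (1.5″)² / (θ_M θ_m). The Python sketch below is a hedged illustration of that arithmetic. It assumes that "elongation" means the major-to-minor axis ratio, and the beam dimensions are hypothetical. The survey's own super-resolution measure, shown in Figure 9, is not reproduced here.

```python
import math

RESTORING_FWHM = 1.5  # arcsec; circular restoring beam adopted for CORNISH


def gaussian_beam_area(major, minor):
    """Solid angle (arcsec^2) of an elliptical Gaussian with FWHM axes `major` and `minor`."""
    return math.pi * major * minor / (4.0 * math.log(2.0))


def restoring_to_synthesised_area_ratio(major, minor, restoring=RESTORING_FWHM):
    """Area of the circular restoring beam divided by the synthesised-beam area."""
    return gaussian_beam_area(restoring, restoring) / gaussian_beam_area(major, minor)


def elongation(major, minor):
    """Axial ratio of the synthesised beam (assumed definition of 'elongation')."""
    return major / minor


# Hypothetical synthesised beam of 1.6" x 1.2"
ratio = restoring_to_synthesised_area_ratio(1.6, 1.2)  # > 1: restoring beam is larger
shape = elongation(1.6, 1.2)
```

A field counts towards the 87.2 percent quoted in the text when this ratio exceeds one, and towards the 99.9 percent when it stays below 1.5.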

Spatial scale of noise
Interferometric data often exhibit non-Gaussian noise statistics, largely due to the non-linear deconvolution process and poorly sampled uv-coverage at large spatial scales. In regions with complex structure on scales greater than ∼14″, the emission is poorly constrained by the uv-coverage of the VLA B and BnA configurations. If only a few short baselines contain most of the flux, a simple fringe pattern is produced on the sky: the flux is not evenly distributed but accumulates at specific spatial scales, depending on the sampling in uv-space. The deconvolution algorithms used here also struggle to model this emission, resulting in some of the flux being scattered onto the surrounding sky (see Section 3.2). It is important to characterise this 'ripple' noise pattern before attempting to search for real emission in the CORNISH data. We have measured the noise characteristics of representative CORNISH data affected by a ripple. The region chosen was centred on α = 18h09m21.96s, δ = −20°19′34.9″, and the RMS noise was measured using both the standard deviation (STDEV) and the median absolute deviation from the median (MADFM). For a dataset $X = \{x_1, x_2, \ldots, x_n\}$, MADFM is given by
$$\mathrm{MADFM}(X) = \operatorname{median}_i\left(\left|x_i - \operatorname{median}_j(x_j)\right|\right),$$
i.e., the median of the absolute deviations from the median value. For a normal distribution, MADFM is related to the standard deviation by σ ≈ K × MADFM, with scale factor K = 1.4826. The advantage of MADFM is that it is insensitive to outliers in the distribution and delivers a robust estimate of the true noise. Measurements were conducted using a range of aperture sizes, varying between 12″ and 240″ in steps of 2.82″. In total, twenty-one positions were measured, offset in declination by 6″ along a line centred on the noise peak. The scatter in the results (expressed in standard deviations σ) for each aperture size is plotted in Figure 12. The plot shows that the scatter in the ensemble of measurements increases as the aperture size decreases.
The MADFM statistic remains stable at smaller spatial scales than the STDEV. At scales less than 2 ′ the scatter in the STDEV measurement slowly rises, compared to MADFM, whose scatter remains less than 0.1 mJy beam −1 until scales of 40 ′′ . Measurements of the global noise-properties of the CORNISH data are therefore best performed using apertures spanning 40 ′′ or larger using the MADFM statistic.
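Why MADFM is preferred can be seen with a small simulation. A handful of bright pixels strongly inflates the standard deviation but barely moves the scaled MADFM. The Python sketch below assumes pure Gaussian noise of 0.3 mJy beam⁻¹ plus twenty hypothetical 20 mJy beam⁻¹ outliers. It uses the K = 1.4826 scale factor from the text and is an illustration, not a reproduction of the CORNISH noise measurements.

```python
import random
import statistics

K = 1.4826  # MADFM-to-sigma scale factor for a normal distribution


def madfm(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def robust_sigma(values):
    """Noise estimate that is insensitive to a small number of bright outliers."""
    return K * madfm(values)


# Simulated pixel values (mJy/beam): Gaussian noise of 0.3 plus a few bright "sources".
rng = random.Random(42)
pixels = [rng.gauss(0.0, 0.3) for _ in range(10_000)] + [20.0] * 20
stdev_estimate = statistics.pstdev(pixels)  # inflated by the outliers
madfm_estimate = robust_sigma(pixels)       # close to the injected 0.3
```

The same contrast, at smaller aperture sizes and with real ripple structure, is what Figure 12 tracks.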

The CORNISH source finder
We have developed an automated source finding procedure with the aim of producing a well-characterised catalogue of 5 GHz emission in the northern Galactic plane. In the following subsections we describe the source-finding and measurement procedures and investigate the limits of the catalogue.

Source detection and photometry
Tiles were automatically searched for emission using a custom procedure based on the OBIT FndSou task. FndSou identifies contiguous islands of emission above a global intensity threshold and attempts to fit one or more 2D Gaussians to each. This approach works well in the simplest case of an image with homogeneous noise properties; in the worst case, however, the RMS noise can change by a factor of a few over a 20′ × 20′ tile. This is especially true of tiles covering the Galactic mid-plane, where massive star-forming complexes are common, and using a single intensity threshold often results in spurious detections or omissions of real sources. To compensate for variable noise levels we ran FndSou on a 9×9 grid of 'patches' within the tile area. Each patch is 800 pixels (4′) on a side and overlaps adjacent patches by 400 pixels in R.A. and Dec. The local RMS noise in each patch was determined using a histogram analysis clipped at 3σ from the median value. With this patch layout a radio source within a 2′ band around the tile edge may be detected in two patches, except at the tile corners, while a source in the interior may be detected in up to four overlapping patches. The maximum fitted Gaussian FWHM was constrained to be ≤ 30″, in keeping with the uv-coverage. Fits within 14″ of a patch edge were deemed invalid, except where the patch abutted a tile edge. Running this patch-based procedure results in a redundant list of sources, with coincident positions derived from overlapping patches. A list of unique Gaussian fits to each tile was produced by searching for duplicates at similar positions (separation < 1″) and with similar peak amplitudes (A_min/A_max > 0.7); of each set of duplicates, the Gaussian fit closest to the centre of its patch was retained.
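The patch layout and the duplicate-removal rule can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the CORNISH implementation. The 0.3″ pixel size is inferred from "800 pixels (4′)". The `Fit` record and its coordinate frame are invented for the example. Duplicates are resolved greedily by visiting fits from most to least central within their patch, which is one simple way to keep the fit closest to its patch centre.

```python
import math
from dataclasses import dataclass

TILE_PIX, PATCH_PIX, STEP_PIX = 4000, 800, 400  # 20', 4' and 2' at an inferred 0.3"/pixel

# Lower-left pixel of each patch along one axis: 9 overlapping patches span the tile.
PATCH_ORIGINS = [i * STEP_PIX for i in range(9)]


@dataclass
class Fit:
    x: float         # arcsec, in a hypothetical tile coordinate frame
    y: float
    amp: float       # fitted peak amplitude (assumed positive)
    patch_cx: float  # centre of the patch that produced this fit
    patch_cy: float

    def dist_to_patch_centre(self):
        return math.hypot(self.x - self.patch_cx, self.y - self.patch_cy)


def is_duplicate(a, b, max_sep=1.0, min_amp_ratio=0.7):
    """Same source seen in two patches: close in position and similar in amplitude."""
    sep = math.hypot(a.x - b.x, a.y - b.y)
    ratio = min(a.amp, b.amp) / max(a.amp, b.amp)
    return sep < max_sep and ratio > min_amp_ratio


def unique_fits(fits):
    """Greedy de-duplication: visit fits from most to least central in their patch,
    keeping a fit only if it does not duplicate one already kept."""
    kept = []
    for fit in sorted(fits, key=Fit.dist_to_patch_centre):
        if not any(is_duplicate(fit, k) for k in kept):
            kept.append(fit)
    return kept


fits = [
    Fit(10.0, 10.0, 5.0, 0.0, 0.0),    # source near the edge of one patch
    Fit(10.3, 10.1, 4.8, 12.0, 12.0),  # same source, near the centre of another patch
    Fit(50.0, 50.0, 3.0, 40.0, 40.0),  # unrelated source
]
result = unique_fits(fits)
```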
Initially, the search was conducted using a 4σ local noise threshold, and aperture photometry was performed to weed out detections with a signal-to-noise ratio (maximum pixel / RMS noise) below 5.0. An elliptical aperture was used to measure the source properties, extending to the 3σ Gaussian major and minor axes (2.548 × FWHM in total length). If the emission were indeed Gaussian in shape, this aperture would encompass ≈98.9 percent of the flux (1 − e^{−9/2}; the often-quoted 99.7 percent applies to a one-dimensional Gaussian within ±3σ). The RMS noise and median background level of the sky were measured from a 20″-wide annulus centred on the source and offset from the measurement aperture by 5″. The annulus width was chosen to sample the local noise pattern without being influenced by ripples or negative bowls (see Section 4.3.1). In crowded regions the sky annulus is likely to contain bright, real sources, so the noise was measured using the robust MADFM statistic. The parameters of the valid Gaussian fits and the photometric measurements were both recorded in the MySQL database, although the Gaussian fits are preferentially used in the default CORNISH catalogue.
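The aperture geometry follows from the Gaussian relation FWHM = 2√(2 ln 2) σ. Reaching ±3σ along an axis therefore needs a full length of 6σ ≈ 2.548 × FWHM. The flux enclosed by such an ellipse differs from the one-dimensional ±3σ figure. The Python sketch below checks both numbers and is a worked illustration only, not pipeline code.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # ≈ 0.4247


def aperture_axis(fwhm, nsigma=3.0):
    """Full length of an aperture axis reaching ±nsigma of a Gaussian with this FWHM."""
    return 2.0 * nsigma * FWHM_TO_SIGMA * fwhm


def enclosed_fraction_2d(nsigma=3.0):
    """Flux fraction of a 2D elliptical Gaussian inside its nsigma ellipse."""
    return 1.0 - math.exp(-0.5 * nsigma ** 2)


def enclosed_fraction_1d(nsigma=3.0):
    """Fraction of a 1D Gaussian within ±nsigma."""
    return math.erf(nsigma / math.sqrt(2.0))
```

For nsigma = 3 the two-dimensional fraction is about 0.989, while the one-dimensional fraction is about 0.997.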

Resolved emission
The source finder determines accurate fluxes for isolated and unresolved sources but decomposes complex structures into multiple overlapping Gaussian fits. It is highly desirable to merge these into a single measurement, to avoid over-interpreting the number counts and properties of sources in the final catalogue. Clusters of Gaussians were identified in the catalogue using a friends-of-friends search: a Gaussian was associated with a cluster if it was within 12″ of any other member. In total, 741 clusters were found, and all were inspected manually. To distinguish between adjacent but unrelated sources and over-resolved emission, the morphology at 5 GHz was compared to that in the Spitzer GLIMPSE mid-infrared images. The most common extended sources in the images are UCH II regions and planetary nebulae, each of which has a distinctive mid-infrared signature; for these types of object the morphology of the 8 µm emission often echoes that of the radio continuum. If a cluster of Gaussian fits was found to trace an over-resolved source, then the fitted parameters were replaced with a single measurement under a polygonal aperture manually drawn around the emission. Figure 13 shows an example of a polygon carefully drawn around the border of a cometary H II region. The flux density is calculated from the sum of the pixels within the source aperture minus the median background level in the vicinity of the source. In addition to the coordinates of the peak emission (from whose l and b the source is named), we also record the geometric and intensity-weighted positions.
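A friends-of-friends search with a 12″ linking length can be sketched with a union-find structure. Any two Gaussians within the linking length are joined, and membership propagates transitively. The Python example below is a hypothetical illustration using flat (x, y) offsets in arcseconds and a brute-force pair loop. A spatial index such as a k-d tree would scale better for a full catalogue. Only groups with more than one member are returned, since those are the candidates that needed manual inspection.

```python
import math


def friends_of_friends(positions, link=12.0):
    """Group (x, y) positions (arcsec): any two members within `link` are linked."""
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (xi, yi), (xj, yj) = positions[i], positions[j]
            if math.hypot(xi - xj, yi - yj) <= link:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(positions)):
        clusters.setdefault(find(i), []).append(i)
    # Only multi-Gaussian groups are candidate clusters for manual inspection.
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical Gaussian centres: a chain of three and one isolated fit.
groups = friends_of_friends([(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (100.0, 100.0)])
```

The first and third positions are 20″ apart but still end up in one cluster through the middle member. This transitive linking is what lets the method follow extended, irregular emission.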

Measurements and uncertainties
Below we explain how the properties of the sources were measured and the uncertainties calculated. The final values are presented in the CORNISH catalogue, including the measurement-error and the absolute uncertainty on each parameter, incorporating the calibration error of 8.9 percent.

Gaussian fits
Uncertainties on the Gaussian fits are calculated using the equations derived by Condon (1997), summarised here for convenience. Noise in interferometric data is correlated on the scale of the synthesised beam FWHM, in this case θ_bm = 1.5″. The effective signal-to-noise ratio ρ of a source with measured peak amplitude A_peak, seen against a background of correlated Gaussian noise, is given by
$$\rho^2 = \frac{\theta_M\,\theta_m}{4\,\theta_{\rm bm}^2}\left[1+\left(\frac{\theta_{\rm bm}}{\theta_M}\right)^2\right]^{\alpha_M}\left[1+\left(\frac{\theta_{\rm bm}}{\theta_m}\right)^2\right]^{\alpha_m}\frac{A_{\rm peak}^2}{\sigma_{\rm sky}^2}, \qquad (3)$$
where θ_M and θ_m are the fitted major and minor axes, respectively, and σ_sky is the RMS noise measured directly from the image. The exponents α_M and α_m were estimated by Condon (1997) via Monte-Carlo simulations: α_M = α_m = 3/2 for the amplitude and flux-density errors; α_M = 5/2, α_m = 1/2 for the error on the major axis; and α_M = 1/2, α_m = 5/2 for the errors on the minor axis, position angle and absolute coordinates. On average, the signal-to-noise ratio is increased by a factor of ≈1.4; for an unresolved source (θ_M = θ_m = θ_bm), Equation 3 gives exactly ρ = √2 A_peak/σ_sky. The positional uncertainties parallel to the major (σ_M) and minor (σ_m) fitted axes are given by
$$\sigma_M^2 = \frac{\theta_M^2}{4\ln 2\,\rho^2}, \qquad \sigma_m^2 = \frac{\theta_m^2}{4\ln 2\,\rho^2},$$
using values of ρ calculated from Equation 3. When the fit is projected onto equatorial axes, the absolute position errors in right ascension (σ_α) and declination (σ_δ) become
$$\sigma_\alpha^2 \approx \epsilon_\alpha^2 + \sigma_M^2\sin^2({\rm P.A.}) + \sigma_m^2\cos^2({\rm P.A.}),$$
$$\sigma_\delta^2 \approx \epsilon_\delta^2 + \sigma_M^2\cos^2({\rm P.A.}) + \sigma_m^2\sin^2({\rm P.A.}),$$
where P.A. is the position angle of the fitted major axis east of north and ε_α = ε_δ ≈ 0.1″ is the systematic positional uncertainty. This value was determined via a comparison between CORNISH and catalogues of quasars whose positions are known to milliarcsecond accuracy. The 15 matching quasars were drawn from the Goddard VLBI astrometric catalogues, the Very Long Baseline Array Galactic Plane Survey (VGaPS, Petrov et al. 2011) and the VLA calibrator manual, and their median offset of 0.1″ was adopted as the systematic positional uncertainty for CORNISH. Errors in the P.A.
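may be calculated from
$$\sigma_{\rm P.A.}^2 = \frac{4}{\rho^2}\left(\frac{\theta_M\,\theta_m}{\theta_M^2-\theta_m^2}\right)^2,$$
although we note that position-angle values are only relevant when one or both axes are significantly resolved. Uncertainties associated with the fitted major and minor Gaussian FWHM are given by
$$\sigma_{\theta_M}^2 = \theta_M^2\left(\frac{2}{\rho^2}+\epsilon_\theta^2\right), \qquad \sigma_{\theta_m}^2 = \theta_m^2\left(\frac{2}{\rho^2}+\epsilon_\theta^2\right),$$
with ρ evaluated using the exponents appropriate to each axis. The fractional calibration uncertainty ε_θ = 0.02 is adopted from the VLA NVSS survey (Condon et al. 1998), which was observed using a similar snapshot mode. A single characteristic measured angular size θ_f may be obtained from the geometric mean of the major and minor axes, θ_f = √(θ_M θ_m), with associated measurement uncertainty
$$\sigma_{\theta_f} = \frac{\theta_f}{2}\sqrt{\frac{\sigma_{\theta_M}^2}{\theta_M^2}+\frac{\sigma_{\theta_m}^2}{\theta_m^2}}.$$
The restoring beam was forced to be a circular Gaussian of FWHM θ_bm = 1.5″ over the whole survey area, so the deconvolved source size θ_s in arcseconds may be found from
$$\theta_s = \sqrt{\theta_f^2-\theta_{\rm bm}^2},$$
although we note that detections with θ_f < 1.8″ are considered unresolved in CORNISH. The fitted amplitude A_peak must be corrected for the clean bias ΔA_cb = −0.94 σ_sky, which we measure in Section 5.6 below, so that
$$A_{\rm peak} \rightarrow A_{\rm peak} - \Delta A_{\rm cb} = A_{\rm peak} + 0.94\,\sigma_{\rm sky}.$$
The uncertainty on the fitted amplitude may be calculated from
$$\sigma_A^2 = A_{\rm peak}^2\left(\frac{2}{\rho^2}+\epsilon_A^2\right),$$
where ε_A = 0.089 is the fractional amplitude calibration error. Finally, the integrated flux density S under a 2D Gaussian is given by
$$S = A_{\rm peak}\,\frac{\theta_M\,\theta_m}{\theta_{\rm bm}^2},$$
and the corresponding uncertainty is
$$\frac{\sigma_S^2}{S^2} = \frac{\sigma_A^2}{A_{\rm peak}^2} + \frac{\theta_{\rm bm}^2}{\theta_M\,\theta_m}\left(\frac{\sigma_{\theta_M}^2}{\theta_M^2}+\frac{\sigma_{\theta_m}^2}{\theta_m^2}\right).$$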
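The Gaussian-fit quantities above can be combined into a short calculator. The Python sketch below assumes the Condon (1997) effective signal-to-noise form with exponents α_M and α_m. It also takes a 0.1″ systematic positional term, a 1.5″ circular restoring beam and a clean bias of −0.94 σ_sky, all as described in the text. Amplitudes are in mJy beam⁻¹ and sizes in arcseconds. The function names are hypothetical, and this is an illustrative sketch, not the CORNISH catalogue code.

```python
import math

THETA_BM = 1.5      # arcsec, circular restoring beam
CLEAN_BIAS = -0.94  # Delta A_cb in units of sigma_sky


def rho(amp, sigma_sky, theta_maj, theta_min, a_maj, a_min, theta_bm=THETA_BM):
    """Condon (1997) effective signal-to-noise ratio for correlated Gaussian noise."""
    r2 = (theta_maj * theta_min / (4.0 * theta_bm ** 2)
          * (1.0 + (theta_bm / theta_maj) ** 2) ** a_maj
          * (1.0 + (theta_bm / theta_min) ** 2) ** a_min
          * (amp / sigma_sky) ** 2)
    return math.sqrt(r2)


def position_errors(amp, sigma_sky, theta_maj, theta_min, pa_deg, eps=0.1):
    """Approximate RA/Dec errors (arcsec); P.A. measured east of north."""
    r = rho(amp, sigma_sky, theta_maj, theta_min, 0.5, 2.5)
    s_maj2 = theta_maj ** 2 / (4.0 * math.log(2.0) * r ** 2)
    s_min2 = theta_min ** 2 / (4.0 * math.log(2.0) * r ** 2)
    pa = math.radians(pa_deg)
    s_ra = math.sqrt(eps ** 2 + s_maj2 * math.sin(pa) ** 2 + s_min2 * math.cos(pa) ** 2)
    s_dec = math.sqrt(eps ** 2 + s_maj2 * math.cos(pa) ** 2 + s_min2 * math.sin(pa) ** 2)
    return s_ra, s_dec


def deconvolved_size(theta_maj, theta_min, theta_bm=THETA_BM):
    """Deconvolved size from the geometric-mean fitted size (0 if unresolved)."""
    theta_f = math.sqrt(theta_maj * theta_min)
    return math.sqrt(max(theta_f ** 2 - theta_bm ** 2, 0.0))


def integrated_flux(amp, sigma_sky, theta_maj, theta_min, theta_bm=THETA_BM):
    """Clean-bias-corrected integrated flux density of a fitted 2D Gaussian."""
    amp_corr = amp - CLEAN_BIAS * sigma_sky
    return amp_corr * theta_maj * theta_min / theta_bm ** 2


point_rho = rho(7.0, 1.0, 1.5, 1.5, 1.5, 1.5)  # unresolved 7-sigma source
```

For an unresolved source the first function returns √2 times the peak signal-to-noise, which is the factor of ≈1.4 mentioned in the text. At very high signal-to-noise the positional errors collapse to the systematic term ε.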

Aperture photometry
The peak amplitude reported for sources measured using aperture photometry is simply the intensity of the brightest pixel within the source aperture, corrected for the clean bias.
The effective signal-to-noise ratio of the source in the presence of correlated Gaussian noise may be determined from a modified version of Equation 3, in which θ_M and θ_m are both replaced with the intensity-weighted diameter
$$\theta_d = 2\sqrt{\ln 2\,\frac{\sum_i A_i\,r_i^2}{\sum_i A_i}}. \qquad (16)$$
For a perfectly circular Gaussian source θ_d ≡ θ_f ≡ FWHM. In Equation 16, r_i is the angular distance from the i-th pixel to the brightness-weighted centre and A_i is its intensity. The sky noise corrected for Gaussian correlation is then
$$\sigma_g^2 = \sigma_{\rm sky}^2\,\frac{4\,\theta_{\rm bm}^2}{\theta_d^2}\left[1+\left(\frac{\theta_{\rm bm}}{\theta_d}\right)^2\right]^{-\alpha},$$
where α = α_M + α_m = 3 for all errors. The uncertainty on the peak amplitude is then derived from σ_g and the calibration error ε_A = 0.089. The integrated flux density S_phot of a source measured using aperture photometry can be written as
$$S_{\rm phot} = \frac{\sum_{i=1}^{N_{\rm src}} A_i - N_{\rm src}\,\bar{B}}{a_{\rm bm}},$$
where Σ A_i is the total flux in the source aperture summed over N_src pixels (pixel values in Jy beam⁻¹), B̄ is the background level estimated from the median of the sky annulus, and a_bm = π θ_bm²/(4 ln 2 Δ_pix²) is the beam area in pixels (28.33 pixels for CORNISH data). S_phot must also be corrected for clean bias. If the source is unresolved, the missing flux is ΔS_cb = ΔA_cb (numerically, in Jy); however, because the clean bias reduces the flux in all clean components (CCs) by a constant factor, the effect on extended emission is difficult to gauge. The minimum number of CCs required to model an extended source can be estimated from the number of beam areas n_beams subtended by the emission. The integrated flux density is then S = S_phot − ΔS_cb × n_beams.
The error on the integrated flux density is computed from the corrected variance σ_g², the numbers of pixels N_src and N_sky in the source and sky apertures, respectively, and the uncertainty σ(Σ A_i) on the sum over the pixels in the source aperture. The uncertainty on the intensity-weighted diameter is likewise derived from sums taken over the pixels in the source aperture. For emission measured using a polygonal aperture, the intensity-weighted position is given by
$$\bar{x} = \frac{\sum_i A_i\,x_i}{\sum_i A_i},$$
where x_i is the right ascension (α) or declination (δ) of the i-th pixel. The corresponding error in x̄ includes the systematic term ε_x = ε_α = ε_δ, the absolute positional error of the associated phase calibrator.
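The aperture-photometry quantities can be sketched directly from their definitions: the beam area in pixels, the background-subtracted aperture sum and the clean-bias correction scaled by n_beams. The sketch also covers the intensity-weighted centre and the second-moment diameter, normalised so that a circular Gaussian returns its FWHM. This Python example is a hedged illustration. It infers the 0.3″ pixel size from "800 pixels (4′)" and assumes pixel values in Jy beam⁻¹. The function names are hypothetical, and the error terms are omitted.

```python
import math
import statistics

PIXEL = 0.3     # arcsec per pixel (inferred from 800 pixels = 4')
THETA_BM = 1.5  # arcsec
BEAM_AREA_PIX = math.pi * THETA_BM ** 2 / (4.0 * math.log(2.0) * PIXEL ** 2)  # ≈ 28.33


def aperture_flux(src_pixels, sky_pixels, sigma_sky, n_beams=1.0, clean_bias=-0.94):
    """Integrated flux density (Jy) from an aperture sum, with clean-bias correction."""
    background = statistics.median(sky_pixels)
    s_phot = (sum(src_pixels) - len(src_pixels) * background) / BEAM_AREA_PIX
    delta_s_cb = clean_bias * sigma_sky  # missing flux per beam area (negative)
    return s_phot - delta_s_cb * n_beams


def intensity_weighted_centre(pixels):
    """pixels: sequence of (x, y, A). Returns the brightness-weighted centre."""
    total = sum(a for _, _, a in pixels)
    return (sum(x * a for x, _, a in pixels) / total,
            sum(y * a for _, y, a in pixels) / total)


def intensity_weighted_diameter(pixels):
    """Second-moment diameter, equal to the FWHM for a circular Gaussian."""
    xc, yc = intensity_weighted_centre(pixels)
    total = sum(a for _, _, a in pixels)
    mean_r2 = sum(((x - xc) ** 2 + (y - yc) ** 2) * a for x, y, a in pixels) / total
    return 2.0 * math.sqrt(math.log(2.0) * mean_r2)
```

Checking the diameter against a finely sampled circular Gaussian of known FWHM is a quick way to confirm the normalisation.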

Spurious sources
We have attempted to estimate the number of spurious sources detected in well-calibrated and well-behaved data by running the source finder on inverted tiles, i.e., tiles where the pixel values have been multiplied by −1. Any detections in the inverted tiles must be false, and allow us to estimate the number of spurious sources as a function of signal-to-noise ratio. Fourteen tiles were selected to be representative of the emission properties across the survey region. They variously contained: no strong emission; one or more point sources with S_5GHz > 50 mJy; weak extended emission; and bright extended sources causing moderately elevated noise levels (0.5 mJy < RMS < 0.8 mJy). For comparison, we ran the source finder using a 3.5σ cutoff on both the inverted and the regular tiles. Figure 14 plots the cumulative counts of detections above a given signal-to-noise ratio, expressed in units of σ. The grey-shaded histogram illustrates the number of spurious detections in the inverted tiles, while the hatched histogram illustrates the detections in the normal tiles, some of which will be real. Note that below 4.5σ the detections are dominated by spurious sources. The fourteen tiles represent 1.23 percent of the survey area, so by scaling the plot by 81 (≈ 1/0.0123) we can estimate the total distribution of spurious sources in CORNISH. In Figure 14 the number of spurious sources found decreases to 81 (i.e., a single detection in the fourteen tiles) at 6.1σ, above which our scaling is too crude to sample.
Fig. 14.- The solid histogram illustrates the cumulative distribution of spurious detections in the CORNISH catalogue as a function of the signal-to-noise ratio, measured from fourteen inverted tiles (axes: signal-to-noise threshold (σ) versus cumulative detections (×81)). The plot has been scaled to the total CORNISH area (∼1300 tile areas) by multiplying by 81. In comparison, the hatched histogram shows the distribution of detections from the same tiles before inversion. See Section 5.4 for further details.
For populations governed by Gaussian statistics, the fraction f(σ) of the population lying within a σ-threshold is given by
$$f(\sigma) = \mathrm{erf}\left(\frac{\sigma}{\sqrt{2}}\right),$$
where erf is the Gaussian error function. The solid black curve in Figure 14 plots f(σ), assuming the total number of possible detections equals the number of synthesised beam areas in CORNISH (≈5.6×10⁸ beams). It is clear that this assumption underestimates the number of spurious sources found, so we fit f(σ) with the total number of sources as a free parameter. The fit is shown by the short-dashed line and is dominated by the large number of sources in the bins with σ < 4. Above 4σ the distribution has a shallower fall-off than expected for Gaussian statistics. An alternative is shown by the long-dashed line, which is fitted only to the bins with σ > 4 and uses the error function of a distribution narrower than purely Gaussian (σ = 0.9 σ_gauss). It is a significantly better match to the high signal-to-noise end of the distribution and predicts fewer than one spurious source above 7σ. Based on this reasoning we have chosen a signal-to-noise cutoff of 7σ for a high-reliability CORNISH catalogue. We caution that data with greater complexity or poor calibration may introduce significant numbers of false sources above this level, so this cutoff does not remove the need to inspect the data manually for artefacts. Sources detected below 7σ are not offered as an official data product, but this low-reliability catalogue will be made available on the CORNISH web page.
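The purely Gaussian expectation (the solid black curve) can be reproduced with the complementary error function. Each of N independent beams has a one-sided probability ½ erfc(σ/√2) of a noise peak above σ. The Python sketch below assumes N ≈ 5.6×10⁸ beams as in the text. The optional `width` parameter is a hypothetical stand-in for rescaling the noise distribution. The free-normalisation fits in Figure 14 are not reproduced here.

```python
import math


def expected_false_detections(threshold, n_beams, width=1.0):
    """Expected number of independent beams with noise above `threshold` (units of sigma),
    for a one-sided Gaussian tail whose standard deviation is `width` sigma."""
    z = threshold / width
    return n_beams * 0.5 * math.erfc(z / math.sqrt(2.0))


N_BEAMS = 5.6e8
at_6p1 = expected_false_detections(6.1, N_BEAMS)  # well below the ~81 found at 6.1 sigma
at_7 = expected_false_detections(7.0, N_BEAMS)
```

The Gaussian prediction at 6.1σ is far below the 81 spurious detections implied by the inverted tiles. This shortfall is why the text fits the normalisation as a free parameter instead of relying on pure Gaussian statistics.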

Density of low signal-to-noise sources
A density map of the CORNISH detections serves to highlight regions containing excessive numbers of weak sources, some of which may be spurious. Figure 15 plots the number density of CORNISH sources above 5σ summed within an eight arcminute radius. The most prominent feature is a line of elevated pixels corresponding to the scan row at δ = −16 • 55 ′ 03.13 ′′ (13.3 • < l < 14.5 • ). This row is unique in the survey as each field has only a single 40 second snapshot observation. Data in the second pass were found to be corrupted and no repeat observations were scheduled due to an array configuration change.
Isolated clumps of pixels with high source counts (e.g., at l = 12°, 31°, 43° and 49°) correspond to molecular cloud complexes forming massive stars. The over-density of weak sources in these regions could be due either to a real increase in source counts or to an increase in spurious sources generated by the deconvolution process. Both scenarios warrant careful inspection of the data.
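A density map like Figure 15 counts the detections above 5σ that fall within 8′ of each map position. The Python sketch below is a hypothetical illustration. It uses a flat-sky distance in Galactic (l, b), which is adequate for the narrow latitude range of the survey, and assumes the input source list has already been cut at 5σ.

```python
import math


def density_map(sources, grid, radius_deg=8.0 / 60.0):
    """Number of sources (l, b in degrees) within `radius_deg` of each grid point,
    using a flat-sky approximation."""
    counts = []
    for gl, gb in grid:
        n = sum(1 for sl, sb in sources if math.hypot(sl - gl, sb - gb) <= radius_deg)
        counts.append(n)
    return counts


# Hypothetical 5-sigma detections and map positions.
detections = [(30.0, 0.0), (30.05, 0.0), (31.0, 0.0)]
counts = density_map(detections, [(30.0, 0.0), (31.0, 0.0), (35.0, 0.0)])
```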

Completeness
To quantify the formal sensitivity limits of the pipeline-reduced data we conducted completeness tests on tiles from 'empty' parts of the sky. The tiles were chosen to have few detections above 5σ, homogeneous noise properties and no obvious imaging artefacts. One hundred artificial point sources were injected into the calibrated uv-data for each tile before creating a mosaiced image. The flux densities of the injected sources were varied randomly between 0.5 and 5.0 mJy, so as to bracket the expected sensitivity limit. Positions were also chosen randomly, but avoided known emission, the tile edges and regions where noise spikes were common. FndSou was then used to find and fit the emission with Gaussians. After twenty iterations of the injection-imaging-fitting routine, the aggregate results were compared to the injected source parameters. Figure 16 plots the percentage of sources recovered as a function of flux density for tiles covering a range of RMS values and epochs. The image parameters are presented in Table 2, alongside the fifty and ninety percent completeness limits. As expected, tiles with lower measured RMS noise levels tend to have lower completeness limits. There is, however, significant variation between tiles, as the local completeness limit ultimately depends on the uniformity of the noise pattern within the tile. At worst (tile 529, epoch I) the CORNISH survey is 90 percent complete to point sources at the 3.9 mJy level.
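Fifty and ninety percent completeness limits like those in Table 2 can be read off a binned recovery curve. The Python sketch below bins injected flux densities, computes the recovered fraction per bin and linearly interpolates to the first crossing of a chosen level. The binning and interpolation scheme is an assumption made for illustration, since the text does not say how the limits were interpolated. The input values are hypothetical.

```python
def completeness_curve(injected, recovered, bin_edges):
    """Recovered fraction per flux bin.
    injected: injected flux densities (mJy); recovered: parallel list of booleans."""
    centres, fractions = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        flags = [r for s, r in zip(injected, recovered) if lo <= s < hi]
        if flags:
            centres.append(0.5 * (lo + hi))
            fractions.append(sum(flags) / len(flags))
    return centres, fractions


def completeness_limit(centres, fractions, level):
    """Flux at which the binned completeness first reaches `level` (linear interpolation).
    Returns None if the curve never reaches `level`."""
    if fractions and fractions[0] >= level:
        return centres[0]
    for i in range(1, len(fractions)):
        f0, f1 = fractions[i - 1], fractions[i]
        if f0 < level <= f1:
            s0, s1 = centres[i - 1], centres[i]
            return s0 + (level - f0) * (s1 - s0) / (f1 - f0)
    return None


# Hypothetical binned curve (flux in mJy, recovered fraction).
curve = ([1.0, 2.0, 3.0, 4.0], [0.0, 0.4, 0.8, 1.0])
s50 = completeness_limit(*curve, 0.5)
s90 = completeness_limit(*curve, 0.9)
```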

Clean bias
When deconvolving the synthesised beam from the images, the flux level at which the CLEAN algorithm halts must be chosen carefully. If the cutoff is set too far above the noise, the residual images will be dominated by sidelobe patterns; if it is too low, CLEAN will inadvertently identify noise spikes and sidelobes as real emission. Both negative and positive clean components on sidelobes will result in flux being subtracted from the positions of real sources and can artificially lower the RMS noise. The AutoWindow function described in Section 3.2 has been shown to reduce the clean bias. However, we chose a relatively low CLEAN cutoff when imaging CORNISH data and need to measure the bias level in order to evaluate the correct flux densities and uncertainties for the CORNISH catalogue. In a test similar to the one presented in Section 5.5, we inserted twenty point sources into the uv-data for each of tiles 43, 273, 529 and 1248, with flux densities set randomly between 2 and 20 mJy. The data were imaged and mosaiced using the CORNISH pipeline, and aperture photometry was used to recover the artificial source fluxes. We found that the clean bias was consistent across all four tiles, despite these tiles being representative of different epochs. We adopt a mean clean bias of ΔA_peak = −0.94 σ_sky (typically 0.33 mJy), indicating a moderate level of over-cleaning compared to NVSS, which quotes ΔA_peak = −0.67σ. We judge that this will not affect the utility of the catalogue.
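The clean-bias measurement reduces to averaging the offset between recovered and injected peak amplitudes in units of the local noise, then subtracting that bias from measured peaks. The Python sketch below illustrates both steps, assuming per-source noise values are available. The numbers are hypothetical, not the CORNISH injection results.

```python
import statistics


def clean_bias_in_sigma(injected, measured, sigma_sky):
    """Mean (measured - injected) peak offset, in units of the local RMS noise."""
    return statistics.mean((m - i) / s for i, m, s in zip(injected, measured, sigma_sky))


def correct_peak(measured_peak, sigma_sky, bias_sigma=-0.94):
    """Remove the clean bias: A_corr = A_peak - Delta A_cb, with Delta A_cb = bias * sigma."""
    return measured_peak - bias_sigma * sigma_sky


# Hypothetical injected/recovered peaks (mJy/beam) with 0.35 mJy/beam local noise.
bias = clean_bias_in_sigma([5.0, 10.0], [4.67, 9.67], [0.35, 0.35])
corrected = correct_peak(3.0, 0.35)
```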

Manual quality control
According to the results of Section 5.4 we do not expect to find significant numbers of spurious sources above 7σ in well-behaved CORNISH data. This statement is not necessarily true of high-noise fields containing bright and extended emission associated with massive star-forming complexes. Occasionally, peaks in the rippled noise pattern may be mistaken for real emission, or calibration errors may conspire to create false sources. To alleviate this problem we visually inspected all high-reliability CORNISH detections (i.e., those peaking above 7σ) to assess them as potential artefacts.
The CORNISH team visually inspected all mosaic tiles and individual sources in the 7σ catalogue. All sources were classified as 'unlikely', 'possibly' or 'likely' to be an artefact based on the criteria above, i.e., whether they were located on a peak in a high-noise ripple region, near a very bright source, or in a region where there appears to be an excessive number of potentially spurious 5-7σ sources (see Figure 15). If a source suspected of being possibly or likely an artefact was found to have a radio or infrared counterpart then, of course, the flag was left as 'unlikely' in the CORNISH database. Smaller UCH II regions lying within the noise radius of much brighter emission often have counterparts in the GLIMPSE IRAC data, while planetary nebulae (PNe) are often seen in the UKIDSS data, confirming them as real detections. Both source types appear in the far-infrared MIPSGAL bands (24 µm and 70 µm). Real extragalactic sources are not likely to have counterparts in the infrared datasets, and so retain their possible-artefact flag in suspect regions.

Results
We found 3,062 sources in the CORNISH data above a 7σ detection threshold. Of these, 2,591 were well fit by model Gaussians and the remaining 471 required measurement using a hand-drawn polygonal aperture. A total of 286 and 138 sources were classified as 'possible' and 'likely' artefacts, respectively, and flagged accordingly in the final catalogue. They remain available in the on-line catalogue, and users will be able to include possible and likely artefacts in their searches. Excluding these 424 flagged sources, below we present the new, high-reliability catalogue of 5 GHz radio emission containing 2,638 sources.

Catalogue format
Isolated and unresolved sources identified by the source finder have two recorded entries taken from fitted Gaussian parameters and aperture photometry measurements. Sources exhibiting structured and extended emission have a single entry, based on aperture photometry performed using a manually drawn polygon. When assembling an aggregate catalogue we favoured the Gaussian fitted values. The photometric measurements are useful for diagnostic purposes.
An excerpt from the final CORNISH catalogue is presented in Table 3. The columns are as follows. Column (1) contains the CORNISH source name, constructed from the Galactic longitude and latitude of the source. The equivalent right ascension (α) and declination (δ) are given in columns (2) and (3), respectively. For sources well fitted by Gaussians the adopted coordinates are simply the peak positions of the fits; the intensity-weighted position is quoted for extended sources measured using a polygonal aperture. The associated positional uncertainties are given in columns (4) and (5). Two uncertainty values are quoted for catalogue entries: the first is the absolute uncertainty, incorporating both measurement and calibration errors, and the second (in brackets) is the error associated with the photometry or Gaussian fit alone. Column (6) presents the peak flux density in units of mJy beam⁻¹. The 5 GHz integrated flux density (S_5GHz) is presented in column (7). Column (8) contains the measured angular scale of the emission θ_f, determined from the geometric mean of the major and minor Gaussian fit axes, or from the intensity-weighted diameter in the case of extended emission. Sources with θ_f > 1.8″ are considered to be resolved in the CORNISH images, and their deconvolved sizes θ_s are presented in column (9). The local RMS noise measured from the photometric sky annulus is recorded in column (10). Column (11) notes how the flux density was measured, either with a polygonal aperture or using the Gaussian fit.
Finally, column (12) contains a range of flags notifying the reader if the source:
• is within 12″ of another source,
• lies in an unusually high-noise region (RMS > 0.45 mJy beam⁻¹),
• was imaged using the smoothed weighting scheme described in Section 3.2.2 (within 4.45′ of a field centre, i.e., half a primary beam FWHM),
• is within 3′ of a bright (> 0.5 Jy) source,
• is within 2′ of the edge of the survey,
• lies within an area containing numerous low signal-to-noise detections likely to be spurious,
• overlaps with another source,
• or has been flagged as a suspected artefact during manual inspection.
The flags are described in more detail in the footnotes to Table 3 and on the project website (http://cornish.leeds.ac.uk). A query-based web interface is also available, which allows the user to retrieve specific catalogue subsets and drill down to the underlying data.
(Figure caption: In both panels the high-reliability catalogue (σ ≥ 7) is plotted using solid shading, while the hatched histogram contains a sub-sample of resolved sources (θ_f > 1.8″).)

Angular size
The measured angular size θ_f quoted in the catalogue is given by the geometric mean of the major and minor fitted Gaussian FWHM axes (√(θ_M θ_m)), or by the intensity-weighted diameter (θ_d) in the case of emission measured using a polygonal aperture. For a bright source with a Gaussian morphology these measurements are equivalent within the errors. Figure 17 plots the distribution of measured angular sizes for the high-reliability catalogue. The distribution begins to flatten at size scales greater than 5″, before tapering off at 30″, where the upper limit for the source fitter is set. Above sizes of 14″ the deconvolution algorithm struggles to model the poorly sampled shorter uv-spacings (see Section 3.2.2), hence the slight increase in counts at that scale: broad sources are artificially truncated at sizes of 14″.
The uncertainty on the angular size of CORNISH sources is better than 0.3″ for 96 percent of compact ($\theta_f < 5''$) catalogue entries. We consider sources with $\theta_f < 1.8''$ (i.e., the restoring beam size plus 0.3″) to be unresolved. Sixty-one percent of the 7σ catalogue fall into this category. Below we examine the differences between the resolved and unresolved populations.

Galactic distribution
Figure 18 illustrates the distribution of CORNISH sources as a function of Galactic latitude (upper panel) and longitude (lower panel). The solid-shaded histogram contains all sources in the high-reliability catalogue, while the hatched histogram contains only the subset of resolved detections (859 sources). Resolved sources account entirely for the broad peak seen in the latitude distribution, with the remaining unresolved detections exhibiting a flat profile. The scale-height of the resolved latitude distribution is 0.47°, consistent with that of UCH II regions, 6.7 GHz methanol masers and other tracers of high-mass star formation (e.g., Green et al. 2009; Urquhart et al. 2007). The supposition that a large fraction of the resolved sources arise in high-mass star-forming regions is lent weight by their Galactic longitude distribution: the number of sources per 2° bin increases gradually towards longitude zero, while the two spikes at l ≈ 43° and l ≈ 50° correspond to the W49 and W51 complexes, respectively. Conversely, unresolved detections (1,719 sources) show a flat distribution with Galactic longitude and are likely to contain significant numbers of active galactic nuclei (AGN) and other extragalactic sources. We note that this is partly by design, as coarse adjustments were made to the deconvolution algorithm in order to keep the number of low-level sources roughly constant (Figure 3). The expected density of extragalactic sources in CORNISH may be calculated from Equation A2 of Anglada et al. (1998), using the 5 GHz source counts of Condon (1984). Assuming median values of RMS noise for the two regions presented in Figure 11, we expect to find ∼2400 extragalactic sources in our 7σ catalogue, consistent with the number of unresolved detections. Most extragalactic sources can be identified because they are not detected in any of the infrared wavebands. Specific catalogues of CORNISH sources identified as UCH II regions, PNe and AGN will be presented in a forthcoming paper.

Flux density and peak flux
Figure 19 shows the distributions of flux densities and peak fluxes for the CORNISH sources. At flux densities of ∼5 mJy or greater the distribution is well fitted by a power law of index −0.81. Below 3 mJy the number of sources begins to decrease as the 7σ detection limit is encountered. The distribution of flux densities for resolved sources turns over at approximately 6 mJy due to the constraints imposed by their selection and the 7σ signal-to-noise cutoff.
The peak flux distributions are identical for the high-reliability and the resolved source catalogues. Above the sensitivity cutoff (∼2.5 mJy beam⁻¹), both are fit by a power law of the form $n_{\rm src} \propto S_{5\,{\rm GHz}}^{-0.91}$.
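The quoted power-law indices come from fitting the source-count histograms above the turnover. The sketch below is an illustrative stand-in, not the authors' fitting code: it assumes logarithmically spaced bins and an unweighted least-squares line in $\log N$ versus $\log S$. The binning choice matters, since for logarithmic bins the slope of counts per bin is one larger than the slope of the differential counts $dN/dS$. A Poisson-weighted or maximum-likelihood fit would be more robust when bins are sparse.

```python
import math
from typing import List, Sequence, Tuple


def log_binned_counts(fluxes: Sequence[float], s_min: float, s_max: float,
                      n_bins: int) -> Tuple[List[float], List[int]]:
    """Histogram fluxes (mJy) into logarithmically spaced bins in [s_min, s_max).

    Returns the geometric bin centres and the number of sources per bin.
    """
    log_lo, log_hi = math.log10(s_min), math.log10(s_max)
    width = (log_hi - log_lo) / n_bins
    counts = [0] * n_bins
    for s in fluxes:
        if s_min <= s < s_max:
            k = int((math.log10(s) - log_lo) / width)
            counts[min(k, n_bins - 1)] += 1  # guard against rounding at the top edge
    centres = [10 ** (log_lo + (k + 0.5) * width) for k in range(n_bins)]
    return centres, counts


def power_law_index(centres: Sequence[float], counts: Sequence[int]) -> float:
    """Least-squares slope of log10(N) against log10(S), skipping empty bins."""
    pts = [(math.log10(s), math.log10(n)) for s, n in zip(centres, counts) if n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

For instance, `power_law_index(*log_binned_counts(fluxes, 5.0, 1000.0, 12))` would estimate the slope for sources brighter than ∼5 mJy, where `fluxes` is a hypothetical list of catalogue flux densities in mJy.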

Comparison to other catalogues
Of all prior observations, the White et al. (2005) VLA survey of the Galactic plane has the most similar observing set-up and sky coverage, so a comparison with that work serves as a useful sanity check on the ensemble properties of the CORNISH catalogue and images. The 5 GHz component of the White et al. survey was observed using the D, DnC and C arrays. These more compact VLA configurations (compared to B and BnA) yield better sensitivity to extended emission than CORNISH, but at a lower resolution (∼6″). White et al. (2005) imaged the Galactic plane between −10° < l < 42° and |b| < 0.4°, of which 25.6 square degrees overlap with the CORNISH target area. The noise in their images is lower, with a median RMS of ∼0.27 mJy beam⁻¹ compared to ∼0.35 mJy beam⁻¹ for CORNISH data. The cutoff for the White et al. source catalogue was chosen to be 5.5σ (∼1.4 mJy beam⁻¹), compared to 7σ (∼2.5 mJy beam⁻¹) in this work. Despite the differences in uv-coverage, we would expect similar flux densities for compact sources (≤6″, see Figure 4) common to both catalogues, so systematic errors present in either catalogue should be obvious in a flux-flux comparison plot.
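Both catalogue cuts are simply a multiple of the local RMS noise. The helper below (names hypothetical) makes that arithmetic explicit, for instance when asking whether a source catalogued by one survey is bright enough to have been detected by the other. It assumes compact sources, for which peak fluxes measured with the two beams should be comparable; for resolved emission the peak flux in the 1.5″ CORNISH beam will be lower than in the ∼6″ White et al. beam, so the test is optimistic.

```python
from typing import Sequence

CORNISH_N_SIGMA = 7.0  # CORNISH catalogue cut
WHITE_N_SIGMA = 5.5    # White et al. (2005) catalogue cut


def detection_threshold(local_rms: float, n_sigma: float) -> float:
    """Catalogue cutoff in mJy/beam for a given local RMS noise (mJy/beam)."""
    return n_sigma * local_rms


def fraction_detectable(peaks: Sequence[float], local_rms: float,
                        n_sigma: float = CORNISH_N_SIGMA) -> float:
    """Fraction of peak fluxes (mJy/beam) at or above n_sigma * local_rms."""
    if not peaks:
        return 0.0
    cut = detection_threshold(local_rms, n_sigma)
    return sum(p >= cut for p in peaks) / len(peaks)
```

With the CORNISH median RMS of ∼0.35 mJy beam⁻¹, `detection_threshold(0.35, CORNISH_N_SIGMA)` gives ≈2.45 mJy beam⁻¹, the ∼2.5 mJy beam⁻¹ cut quoted above. In practice the local, rather than median, RMS at each source position would be used.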
The White et al. 5 GHz catalogue contains 1822 entries in the overlapping area, and we match 558 of these with 521 CORNISH sources using a 5″ search radius. The number of matches diminishes significantly at matching radii greater than 2″; however, a 5″ matching radius was chosen to allow for offsets in the positions assigned to resolved sources in both catalogues. Figure 20 presents a comparison of the measured flux densities for sources successfully cross-matched between the two surveys. The measurements agree, on average, to within 39 percent, with no evidence of systematic differences. A greater fraction of sources in the high flux density bins are resolved, hence the outliers in the plot above ∼100 mJy may be attributed to imaging and measurement differences between the two surveys. Points representing individual sources in Figure 20 are colour-coded to indicate angular size in the CORNISH catalogue: unresolved sources (blue) cluster around the equality line, while the outliers are almost all extended (red).
Fig. 20. Comparison of the 5 GHz flux density measurements for sources common to the CORNISH and White et al. (2005) catalogues. No systematic differences are apparent in the plot; however, the absolute differences between the two catalogues increase with flux density. The points are colour-coded to show the angular size of the CORNISH detections, and it is clear that the most extended sources are responsible for the outliers seen above ∼100 mJy.
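The positional matching described above can be sketched as a nearest-neighbour cone search. This is an illustrative brute-force version, not the authors' code, assuming positions in degrees (Galactic l, b or equatorial RA, Dec) and a haversine great-circle separation; for catalogues of thousands of sources a tree-based search, such as the catalogue-matching methods on astropy's `SkyCoord`, would be far faster. Each White et al. entry is matched to its nearest CORNISH source within the radius, so several entries can share one counterpart, as with the 558 entries matched to 521 CORNISH sources.

```python
import math
from typing import List, Optional, Sequence, Tuple

Position = Tuple[float, float]  # (longitude, latitude) in degrees


def separation_arcsec(p: Position, q: Position) -> float:
    """Great-circle separation of two positions (degrees) via the haversine formula."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def cone_match(targets: Sequence[Position], reference: Sequence[Position],
               radius_arcsec: float = 5.0) -> List[Optional[int]]:
    """Index of the nearest reference source within radius_arcsec for each target, else None.

    Brute force O(N*M); several targets may share the same reference match.
    """
    matches: List[Optional[int]] = []
    for t in targets:
        best: Optional[int] = None
        best_sep = radius_arcsec
        for j, r in enumerate(reference):
            sep = separation_arcsec(t, r)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches
```

The non-`None` entries count matched White et al. sources, while the number of distinct indices counts matched CORNISH sources. Rerunning with a larger `radius_arcsec` (for example 15″) shows how multi-component entries, such as those for G24.799+0.097, would collapse onto a single CORNISH counterpart.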
In total, 1264 sources in the White et al. catalogue remain unmatched using a simple cone search within 5″. Of these, fifty percent lie above our 7σ sensitivity threshold and are sufficiently bright to have been detected in CORNISH. The reasons for the disparity become apparent upon comparing the White et al. and CORNISH images. A significant fraction of the bright, unmatched sources have angular scales greater than ∼20″ in the White et al. data and are simply resolved out by the VLA B and BnA configurations used by CORNISH. Figure 21 (panels i and ii) presents examples of such objects. The central source powering the radio-galaxy shown in panel (i) is detected in both surveys as a 25 mJy point source. The radio lobes have angular scales of ∼30″ in the White et al. image, and the brighter southern lobe has a peak flux density of 14 mJy beam⁻¹. In the corresponding CORNISH image the northern lobe is completely resolved out, while the southern lobe is detected at only the 5.5σ level and is therefore not included in the high-reliability catalogue. Similarly, the cometary H II region G24.799+0.097, shown in panel (ii), is resolved into multiple components by CORNISH. When assembling the CORNISH catalogue we took great care to identify such over-resolved emission as a single source (see Section 5.2), whereas in the White et al. catalogue individual Gaussian fits to complex emission are left separate. G24.799+0.097, for example, has four catalogue entries, and a 15″ matching radius is required to match these correctly to their CORNISH counterpart.
Figure caption (fragment): The green polygon illustrates the aperture used to measure the properties of the 5 GHz emission.
Comparing the two surveys, the CORNISH images generally have more homogeneous noise properties and are of higher quality. The White et al. images contain numerous compact sites of emission not present in the equivalent CORNISH mosaics. These often occur in regions adjacent to very bright sources or poorly imaged extended emission, and are themselves generally unresolved and weak (90 percent < 10 mJy beam⁻¹). Their morphology and location make them likely to be artefacts of the imaging process. Examples of such artefacts are visible in panel (iii) and, to a lesser extent, in panel (ii) of Figure 21. Artefacts in the CORNISH images are limited to a moderate level of ripple and a few noise spikes. By comparison, the White et al. images often contain significant sidelobe structure and a number of spurious emission sources. The advantages of utilising a semi-automatic pipeline are apparent in the high quality of the CORNISH images. Such fields also illustrate the importance of manually inspecting the results of automatic source-finders when dealing with under-sampled interferometric data.
There are 262 CORNISH sources that have no counterpart in the White et al. catalogue, despite peaking well above the nominal 5.5σ detection limit. A small fraction of these are pathological cases, like the two compact sources at the centre of the CORNISH image in panel (ii) of Figure 21. These sources have flux densities of 12 and 20 mJy and should be visible in the White et al. image. A linear discontinuity cuts through the image at this position, so we speculate that the omission derives from a problem with the imaging process used by White et al. (2005). The majority of the remaining unmatched CORNISH sources are detected below the 10σ level and, where present in the White et al. images, are washed out by ripples in the noise or other imaging artefacts.

Example CORNISH data
The CORNISH dataset contains objects of many types, including H II and UCH II regions, PNe, evolved stars, active binaries, radio lobes from external galaxies, and many AGN and quasars. Figure 22 presents sample CORNISH images of known objects alongside their counterpart data from the Spitzer GLIMPSE and MIPSGAL surveys. The first column of images shows the CORNISH data, the second a three-colour image made from the mid-infrared GLIMPSE IRAC bands, and the third the 70 µm MIPSGAL image. The first two rows present examples of resolved H II regions with different morphologies: G013.8726+00.2818 is a classical cometary H II region, while G018.3024−00.3910 is irregularly shaped. For most resolved H II regions in CORNISH, the shape of the 5 GHz continuum emission is echoed and extended in the GLIMPSE three-colour image. The 3.6 µm and 8.0 µm bands (coded blue and red, respectively, in Figure 22) contain broad emission features from polycyclic aromatic hydrocarbons (PAHs), which are excited by the strong ultraviolet radiation field (Peeters et al. 2002). Galactic massive star-forming regions are readily identifiable by the filamentary PAH emission surrounding them. The extended flocculent emission (appearing purple in the GLIMPSE colour coding in Figure 22) traces the clumpy photodissociation region (PDR) at the interface between the ionised gas and the enveloping molecular cloud. The spectral energy distributions of H II regions peak in the far-infrared, so they are easily detected as bright sources in the MIPSGAL 70 µm images.
The third row shows a particularly good example of a resolved planetary nebula (PN). The GLIMPSE colours of PNe are similar to those of H II regions, but their SEDs fall off more steeply in the far-infrared, hence they are noticeably fainter in the MIPSGAL 70 µm band. PNe tend to be isolated objects in the GLIMPSE images, having long since dispersed their natal molecular clouds. The expanding shell of gas surrounding a PN also contains PAHs, which are excited by the ultraviolet photons generated by the central stellar remnant (see Smith & McLean 2008 and references therein). Unlike in H II regions, the PAH emission is confined to the ejected envelope, leading to a simple mid-infrared morphology: G051.5095+00.2686 in Figure 22 exhibits a similar ring shape in both the CORNISH and GLIMPSE images.
The final two rows present examples of radio-galaxies in which the radio lobes have been resolved.
In G057.3066+00.5467 the central driving source is not detected and the lobes are barely resolved, appearing as two teardrop-shaped sources extended towards each other. The central driving source of G060.7862−00.6360 is detected as a point source near the centre of the image, and both radio lobes are well resolved, if weak. Neither radio-galaxy has a counterpart in any of the associated mid- or far-infrared images.

Summary and future work
The CORNISH project has delivered the highest-resolution radio view to date of the northern GLIMPSE region at 5 GHz (6 cm wavelength), complementing the Spitzer infrared data. With a resolution of ∼1.5″ and an RMS noise level of < 0.4 mJy beam⁻¹, the survey is tailored to search for UCH II regions across the Galaxy, but it has also detected a wide range of other radio-bright objects.
We present here a catalogue of 3,062 compact radio sources detected in CORNISH data above a 7σ signal-to-noise threshold. A high-reliability subset (2,638 sources) has been defined by flagging potential spurious detections in poorly imaged uv-data. Fields containing emission extended on scales greater than 14″ are poorly sampled by the uv-coverage of the VLA B configuration, giving rise to a small number of spurious sources. Such fields represent only two percent of the survey area, and a rigorous programme of manual inspection has flagged suspected artefacts; hence, we estimate the catalogue reliability to be better than 99 percent. To date, the CORNISH catalogue is the most uniformly sensitive, homogeneous and complete list of compact radio-emission sources at 5 GHz towards the northern Galactic plane.
Mosaiced images and calibrated uv-data in FITS format are available to download from the CORNISH website (http://cornish.leeds.ac.uk). We have also created a data server, which accepts a list of positions and serves either postage-stamp images or calibrated uv-data. The full CORNISH catalogue is available online via a query-based interface, as a plain-text file, or as a VOTable, and is also accessible through the VizieR service.
Much work remains to be done in order to fully exploit the CORNISH dataset. A future paper in the survey (Purcell et al., in prep) will cross-match the 5 GHz radio emission to the complementary Spitzer GLIMPSE and UKIDSS datasets, allowing the identification of specific source types via their SEDs.
The authors would like to thank the referee, Jim Condon, for his thorough comments, which very much improved the quality of the paper. We would also like to thank the Director and staff of the VLA for their assistance during the preparation of these observations. Thanks also go to James Alison for many helpful discussions. CRP was supported by an STFC postdoctoral grant while at the universities of Manchester and Leeds.
The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.