Photometric Redshifts

Data Release 8 includes photometric redshift estimates for all galaxies. As with previous releases, we provide two alternative versions. This page summarizes the methods used to calculate the photometric redshift estimates, with details to follow in a paper in preparation by Istvan Csabai.

One method is similar to the one used in Data Release 7; following the name used in Csabai et al. (2007), we refer to it as a kd-tree nearest neighbor fit (KF). The KF estimates are stored in the table Photoz. The other technique, described in Carliles et al. (2010), is based on random forests (RF). The RF estimates are stored in the table PhotozRF.

Both methods are empirical in the sense that they use a training set as a reference, then apply machine learning techniques to estimate redshifts. The training sets contain photometric and spectroscopic observations for galaxies. We have chosen to use machine learning techniques with training sets, as opposed to template fitting methods, because of the machine learning techniques' higher overall precision.
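To make the idea of an empirical, training-set-based estimate concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup in color space. It is not the KF or RF method itself: the function name, the two-color training set, and the choice of k are all made up for illustration, and the real estimators use far larger training sets and more elaborate fits.

```python
import math

def knn_redshift(colors, training, k=3):
    """Return (nearest_z, avg_z) from the k training galaxies closest in color space.

    colors   -- tuple of colors for the target galaxy
    training -- list of (colors, spectroscopic_redshift) pairs
    """
    # Brute-force ranking by Euclidean distance; a kd-tree would make this
    # search efficient for a large training set.
    ranked = sorted(training, key=lambda item: math.dist(colors, item[0]))
    neighbors = ranked[:k]
    nearest_z = neighbors[0][1]
    avg_z = sum(z for _, z in neighbors) / len(neighbors)
    return nearest_z, avg_z

# Hypothetical training set: (g-r, r-i) colors with spectroscopic redshifts.
training_set = [
    ((0.40, 0.20), 0.05),
    ((0.45, 0.22), 0.07),
    ((0.90, 0.40), 0.20),
    ((0.95, 0.42), 0.22),
    ((1.40, 0.60), 0.40),
]

nearest_z, avg_z = knn_redshift((0.92, 0.41), training_set, k=2)
print(nearest_z, round(avg_z, 3))
```

The two returned values play roughly the role of a nearest-object redshift and an average neighbor redshift for the target galaxy.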

To infer values of physical parameters of galaxies, such as k-corrections, spectral type, and various spectral features, we extend both the KF and RF methods with a template fitting estimator similar to the one described in Csabai et al. (2003), the approach used in DR5 and DR6. We also used the colors and inclination angle (expAB_r in the PhotoObj table) of each galaxy. Although using the inclination angle does not significantly improve the overall estimation, it does remove a systematic bias, as described in Yip et al. (2011).

The training set contains more than 850,000 galaxies from the DR8 spectroscopic catalog (average r magnitude 17.3), and an additional 14,000 galaxies from other spectroscopic redshift surveys that include deeper (up to a redshift of 1) and fainter (average r magnitude 20.75) galaxies. The RMS estimation errors for the two parts of the training set are 0.018 and 0.103, respectively. The more than fivefold increase in error for the faint subset is mostly due to the larger photometric errors.
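As a small worked illustration of the statistic quoted above, the sketch below computes an RMS estimation error from paired photometric and spectroscopic redshifts, and the ratio of the two quoted RMS values. The four redshift pairs are invented for the example; only 0.018 and 0.103 come from the text.

```python
import math

def rms_error(z_phot, z_spec):
    """Root mean square of the differences between estimated and true redshifts."""
    residuals = [p - s for p, s in zip(z_phot, z_spec)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Made-up sample of four galaxies.
z_spec = [0.10, 0.20, 0.30, 0.40]
z_phot = [0.12, 0.18, 0.33, 0.39]
sample_rms = rms_error(z_phot, z_spec)

# Ratio of the RMS errors quoted for the faint and bright parts of the training set.
ratio = 0.103 / 0.018

print(round(sample_rms, 4), round(ratio, 2))
```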

The error statistics for the reference set are not always a good indicator of the error of the estimated redshifts, especially for objects in distant parts of magnitude space, where the photometric error characteristics differ from those of the reference set. Fortunately, both KF and RF provide an explicit estimate of the redshift error (zErr), and we have found this estimate to be reliable and unbiased. We suggest that users of this catalog take these values into account when using the photometric redshifts in their analyses.
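One possible way to take zErr into account is inverse-variance weighting, sketched below for combining the photometric redshifts of several galaxies into a single mean. This assumes independent, roughly Gaussian errors, which is an assumption of the example rather than a statement from this page; the function name and input values are hypothetical.

```python
def weighted_mean_redshift(z_values, z_errors):
    """Inverse-variance weighted mean redshift and its formal uncertainty."""
    weights = [1.0 / (err * err) for err in z_errors]
    total = sum(weights)
    mean = sum(w * z for w, z in zip(weights, z_values)) / total
    sigma = (1.0 / total) ** 0.5
    return mean, sigma

# Two made-up galaxies: the one with the smaller zErr dominates the mean.
mean_z, sigma_z = weighted_mean_redshift([0.20, 0.30], [0.01, 0.02])
print(round(mean_z, 3), round(sigma_z, 4))
```

Estimates flagged as failed should be removed before any such weighting.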

The KF method provides some additional parameters that can be useful for quality assurance. For each galaxy in the Photoz table, nnCount is the number of nearest neighbors remaining after outliers are removed; a value much smaller than 100 indicates poor training set coverage for that galaxy. The parameter nnInside is another indicator of training set coverage: nnInside=0 means that the galaxy lies outside the bounding box of its nearest neighbors in color space, and so outside the training set coverage. Similarly, a large value of nnVol warns that the reference set is only very sparsely populated around that galaxy. Although the spectroscopic redshift of the nearest object (nnSpecz) and the average redshift of the nearest neighbors (nnAvgZ) are not as good estimators as the fitted redshift (z), values that differ significantly from z may indicate large errors. Note that in all the related tables we use large negative values (<= -1000) instead of NULL to indicate that the estimate could not be computed.
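The quality parameters above can be combined into a simple filter. The following is a minimal sketch operating on rows already retrieved as dictionaries; the thresholds min_neighbors and max_z_err, the function name, and the sample rows are hypothetical choices for illustration, not recommended cuts.

```python
SENTINEL_LIMIT = -1000.0  # values at or below this mark a failed estimate

def is_reliable(row, min_neighbors=80, max_z_err=0.1):
    """Apply illustrative quality cuts to one row of KF photometric redshift data."""
    # Large negative values are used in place of NULL.
    if row["z"] <= SENTINEL_LIMIT or row["zErr"] <= SENTINEL_LIMIT:
        return False
    # nnInside=0: the galaxy is outside the training set coverage.
    if row["nnInside"] == 0:
        return False
    # Far fewer than 100 neighbors suggests poor coverage (threshold is an assumption).
    if row["nnCount"] < min_neighbors:
        return False
    return row["zErr"] <= max_z_err

# Made-up rows for demonstration.
rows = [
    {"objID": 1, "z": 0.15, "zErr": 0.02, "nnCount": 98, "nnInside": 1},
    {"objID": 2, "z": -9999.0, "zErr": -9999.0, "nnCount": 0, "nnInside": 0},
    {"objID": 3, "z": 0.55, "zErr": 0.08, "nnCount": 35, "nnInside": 1},
    {"objID": 4, "z": 0.30, "zErr": 0.04, "nnCount": 95, "nnInside": 0},
]

good_ids = [row["objID"] for row in rows if is_reliable(row)]
print(good_ids)
```

Appropriate thresholds depend on the science case, so the defaults here should be treated as placeholders.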

Once the photometric redshift of a galaxy has been estimated, template fitting is used to estimate the galaxy's k-correction, distance modulus, absolute magnitudes, rest-frame colors, and spectral type. Using non-negative least squares (NNLS), we search for the non-negative linear combination of spectral model templates whose synthetic colors, calculated by convolving the templates with the filter transmission curves, best match the observed colors at the redshift given by the KF or RF estimator. The spectral templates are those described in Maraston (2005). To improve the accuracy of the k-correction and synthetic magnitude estimates, a slightly different set of filter transmission curves was used for each camera column. Where applicable, a cosmology with Omega=0.3 and Lambda=0.7 was assumed, and luminosity distances are given in units of Mpc/h.

The chisq and rnorm values indicate the quality of the NNLS fit, and nTemplates gives the number of spectral templates with which the method could reconstruct the observed photometry of the object. If nTemplates is a small positive value, the coefficients and templates can be checked using the PhotozTemplateCoeff and PhotozRFTemplateCoeff tables for the KF and RF estimates, respectively. Note that nTemplates=0 indicates a failed fit, whereas nTemplates=5 is a sign of overfitting.
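The non-negative fit can be illustrated with a small, self-contained sketch. It assumes the synthetic band fluxes of each template at the estimated redshift have already been computed, and it fits fluxes rather than colors for simplicity. The solver is a plain projected gradient descent chosen to avoid dependencies, not the pipeline's actual implementation, and the three five-band templates are invented numbers, not Maraston (2005) models.

```python
import math

def nnls_projected_gradient(A, b, iterations=5000):
    """Minimize ||A x - b|| subject to x >= 0; returns (x, residual norm)."""
    n_rows, n_cols = len(A), len(A[0])
    # 1 / (sum of squared entries) never exceeds 1 / (largest eigenvalue of A^T A),
    # so every gradient step is stable.
    step = 1.0 / sum(A[i][j] ** 2 for i in range(n_rows) for j in range(n_cols))
    x = [0.0] * n_cols

    def residual(vec):
        return [sum(A[i][j] * vec[j] for j in range(n_cols)) - b[i] for i in range(n_rows)]

    for _ in range(iterations):
        res = residual(x)
        grad = [sum(A[i][j] * res[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]

    rnorm = math.sqrt(sum(r * r for r in residual(x)))
    return x, rnorm

# Hypothetical synthetic fluxes in five bands for three templates.
templates = [
    [1.0, 0.8, 0.5, 0.3, 0.2],
    [0.2, 0.4, 0.6, 0.8, 1.0],
    [0.3, 0.9, 1.0, 0.9, 0.3],
]
# Design matrix: one row per band, one column per template.
A = [[t[band] for t in templates] for band in range(5)]
# Made-up "observed" fluxes built from the first and third templates only.
observed = [2.0 * templates[0][b] + 0.5 * templates[2][b] for b in range(5)]

coeffs, rnorm = nnls_projected_gradient(A, observed)
# Count templates with a non-negligible coefficient (tolerance is an assumption).
n_templates = sum(1 for c in coeffs if c > 1e-6)
print([round(c, 4) for c in coeffs], n_templates)
```

In this toy case the fit recovers coefficients close to 2.0, 0.0, and 0.5, so two templates contribute. In practice a library routine such as scipy.optimize.nnls, which also returns the residual norm, would be the more usual choice.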

An example of how to query for photometric redshifts in DR8 data is shown in SkyServer at Sample Queries: Photometric Redshifts.