
Creating a Large Scale Structure Galaxy Catalog

So you want to create a large scale structure galaxy catalog? If you want to work with the catalogs that were used in the DR9 clustering analysis, they are available from the Value Added Catalogs page. This page describes the files and processes needed to create your own large scale structure galaxy catalog from the raw SDSS-III DR9 files. The first section describes the necessary files and where to get them, and the subsequent sections each describe a step in the procedure.

Necessary files

To follow this procedure, you will need the following files.

Create target photometric catalog

The first step toward producing a uniform large scale structure catalog is to generate a list of objects targeted with the same target selection algorithm. This is complicated by the fact that the final photometry for an object (as given in the photoobj table) may not match the photometry when the targeting algorithm was run for that object. Early chunks used an earlier version of the photometric data, and one must use the correct target photometry when generating a catalog. Also, the target selection algorithms changed slightly after the early chunks (or, in the case of LOWZ, after bugs in target selection were fixed).

To this end, we have created a merged target list, using the correct target photometry for each object, and have created a version of the BOSS_TARGET1 target flag that reflects the final target selection, as applied to the appropriate photometry for each object. The target list file, linked above, includes a bitfield, BOSS_TARGET1_009, that can be used just like BOSS_TARGET1 to select a set of galaxy targets:

Selecting objects with these flags will remove chunks 1-6 for LOWZ (where the target selection bug appeared), and will use the correct version of the photometry to select targets for CMASS, applying the final version of target selection (i.e., with the final brighter magnitude cut). If you want to see the values of the target photometry (e.g., magnitudes) that were used to target each object, the run/rerun/camcol/field/id values in this file refer to the target photometry in photoObj.

Note that some targets fall into both the LOWZ and CMASS selection boxes. This is not a problem if you are only considering objects from one of the samples. If you are performing a combined analysis using both sets of targets, you should assign a redshift cut to separate them. The minimum in their number densities falls around z ~ 0.43.
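As a rough illustration of the flag selection and the redshift split described above, the following Python sketch tests bits in BOSS_TARGET1_009 and assigns doubly selected objects to one sample by redshift. The bit positions (0 for LOWZ, 1 for CMASS) and the function names are assumptions made for this example and should be checked against the bitmask documentation. The sketch applies the cut only to objects selected by both algorithms; applying it to every object is another reasonable choice.

```python
# Assumed bit positions within BOSS_TARGET1_009; verify before use.
GAL_LOZ_BIT = 0
GAL_CMASS_BIT = 1
Z_SPLIT = 0.43  # approximate minimum in the combined number density


def has_bit(flag, bit):
    """Return True if the given bit is set in an integer target flag."""
    return (int(flag) >> bit) & 1 == 1


def assign_sample(flag, z, z_split=Z_SPLIT):
    """Assign a target to 'LOWZ', 'CMASS', or None for a combined analysis."""
    in_lowz = has_bit(flag, GAL_LOZ_BIT)
    in_cmass = has_bit(flag, GAL_CMASS_BIT)
    if in_lowz and in_cmass:
        # Doubly selected objects are separated by redshift.
        return "LOWZ" if z < z_split else "CMASS"
    if in_lowz:
        return "LOWZ"
    if in_cmass:
        return "CMASS"
    return None
```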

Classify objects with redshifts

Objects matched to the target photometry should be classified as one of four types: good redshift from BOSS observations, good redshift from SDSS-I/II ("legacy"), star, redshift failure. These all contribute differently when computing the completeness in each sector on the sky.

Match objects from your target catalog with redshifts from the specObj-dr9 file. Use the PROGRAMNAME field in that file to select "boss" and "legacy" redshifts:


Match targets to their redshifts with RUN, RERUN, CAMCOL, FIELD, ID (in the target list) and TARGETOBJID (in specObj). See the ObjID glossary entry for a description of the relationship between these fields, and how to convert between them. Warning: TARGETOBJID in specObj-dr9.fits is a 22-character string (because of problems with unsigned 64-bit integers in FITS binary tables), so you will have to strip it of whitespace and convert it to an unsigned 64-bit integer before performing the match.
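The conversion and match described above might look like the following minimal Python sketch. The function names are hypothetical, and the bit layout in pack_objid (including the default skyversion of 2) is an assumption that should be confirmed against the ObjID glossary entry before use. Plain Python integers are used here; with numpy you would cast the parsed values to an unsigned 64-bit type.

```python
def parse_targetobjid(raw):
    """Convert a padded TARGETOBJID string to an integer, or None if blank."""
    text = raw.strip()
    return int(text) if text else None


def pack_objid(run, rerun, camcol, field, obj, skyversion=2, first_field=0):
    """Pack photometric identifiers into a 64-bit objID (assumed bit layout)."""
    run, rerun, camcol, field, obj = (
        int(v) for v in (run, rerun, camcol, field, obj))
    return ((skyversion << 59) | (rerun << 48) | (run << 32) |
            (camcol << 29) | (first_field << 28) | (field << 16) | obj)


def match_targets(targets, spec_ids):
    """Return {target index: [spectrum indices]} for targets with spectra.

    targets  : list of (run, rerun, camcol, field, id) tuples
    spec_ids : list of raw TARGETOBJID strings from specObj
    """
    lookup = {}
    for j, raw in enumerate(spec_ids):
        key = parse_targetobjid(raw)
        if key is not None:
            lookup.setdefault(key, []).append(j)
    matches = {}
    for i, target in enumerate(targets):
        key = pack_objid(*target)
        if key in lookup:
            matches[i] = lookup[key]
    return matches
```

A target can match several spectra, which is why each entry is a list; the selection of a single "best" spectrum happens in the next step.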

From this matched target/redshift list, select the individual primary ("best") spectra, to ignore multiple observations of the same object, by applying the following to all of the selections below:

Select "good" redshifts from the "best" observations with:

Select "failed" redshifts from the "best" observations with:

To use the redshift failure correction of Anderson et al. (section 3.5), you need to separate good star redshifts from good galaxy redshifts. Select objects with spectra that are best fit as "star" from "best" redshifts with:


Fiber correction weights

Because of the finite size of the fibers, objects closer than 62" cannot have spectra taken on the same plate. In order to correct for this, one must apply a set of weights to those targets that have "collided" with other targets within that radius. There are a number of different choices for how to correct for fiber collisions, including using the nearest-neighbor redshift (used in Anderson et al. and Parejko et al.), projected-correlation function weights (used in White et al.), and using the sectors covered by multiple plates to directly compute the correlation function of collided fibers (detailed in Guo et al.). See the respective papers for details on these different methods. In addition, Guo et al. provides a comprehensive overview of the pros and cons of each method.
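Whichever correction you choose, the first step is to identify which targets lie within the collision radius of one another. The sketch below is a brute-force illustration of that step only, with hypothetical function names; it is not the method of any of the papers cited above, and a real catalog would need a spatial index rather than an all-pairs loop.

```python
import math

FIBER_COLLISION_ARCSEC = 62.0  # collision radius quoted in the text


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def find_collided_pairs(coords, radius=FIBER_COLLISION_ARCSEC):
    """Return index pairs (i, j), i < j, closer than the collision radius."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if angular_separation_arcsec(*coords[i], *coords[j]) < radius:
                pairs.append((i, j))
    return pairs
```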

Redshift failure weights

As discussed in detail in Section 2.3 of Ross et al. (2012), a small fraction of targeted BOSS galaxies do not obtain a redshift (1.8% for CMASS, 0.4% for LOWZ), and these redshift failures are not uniformly distributed. To remove the resulting spurious fluctuations, the weight of a redshift failure galaxy is applied to its nearest neighbor in the analysis of the DR9 CMASS sample. The LOWZ failure rate is much lower, so LOWZ redshift failures are treated the same way as objects that did not receive a fiber (i.e., they lower the sector completeness).
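A minimal sketch of the nearest-neighbor up-weighting for CMASS follows. It assumes every failed target carries unit weight and ignores any restrictions (for example, on sector or target class) that the published analysis may place on which neighbor receives the weight. The function names are hypothetical.

```python
import math


def separation_deg(p, q):
    """Great-circle separation in degrees between (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2 +
         math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def redshift_failure_weights(good, failed):
    """Return one weight per good-redshift galaxy, starting from 1.0.

    Each failed target adds 1 to the weight of its nearest good neighbor.
    good and failed are lists of (ra, dec) tuples in degrees.
    """
    weights = [1.0] * len(good)
    for f in failed:
        nearest = min(range(len(good)),
                      key=lambda i: separation_deg(good[i], f))
        weights[nearest] += 1.0
    return weights
```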

Angular systematic weights

Section 5 of Ross et al. (2012) examines a number of further effects that could cause spurious angular fluctuations in the galaxy target density. These systematic weights had much less of an effect on the LOWZ sample, and we have not yet compiled a list of potential systematic weights for LOWZ. Systematic weights determined by a stellar density map and the galaxy ifiber2 magnitude are applied to the CMASS sample. These weights may need to be recalibrated for a new CMASS galaxy subsample.

Minimum variance weights

To minimize the error in the measured clustering signal, weights based on the sample redshift distribution, such as FKP or J3, should be applied.
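For the FKP case, the weight takes the standard form 1 / (1 + n(z) P0), where n(z) is the expected number density at the galaxy's redshift and P0 is a characteristic power spectrum amplitude. The sketch below looks up n(z) from a binned table; the default P0 of 20000 (Mpc/h)^3 and the function names are assumptions for illustration, so take the value from the analysis you are reproducing.

```python
import bisect


def nbar_at(z, z_edges, nbar_bins):
    """Look up the tabulated number density for redshift z; 0 outside range."""
    if z < z_edges[0] or z >= z_edges[-1]:
        return 0.0
    return nbar_bins[bisect.bisect_right(z_edges, z) - 1]


def fkp_weight(nbar, p0=20000.0):
    """FKP weight 1 / (1 + nbar * P0); nbar in (h/Mpc)^3, P0 in (Mpc/h)^3."""
    return 1.0 / (1.0 + nbar * p0)
```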

Weights summary

The weights we have described above should be combined using Equation 18 of Anderson et al. 2012 to generate a final weight for each galaxy. Note that use of the J3 weights (as in Reid et al. 2012) is slightly more complicated, as the J3 weight is applied to pairs of galaxies.
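As a sketch of how the individual weights combine, the function below uses the form w_tot = w_FKP * w_sys * (w_rf + w_fc - 1). This form is an assumption about Equation 18 and should be checked against the paper before use; the argument names are hypothetical.

```python
def total_weight(w_fkp, w_sys, w_rf=1.0, w_fc=1.0):
    """Combine per-galaxy weights (assumed form of the combined weight).

    w_fkp : minimum variance (FKP) weight
    w_sys : angular systematic weight
    w_rf  : redshift failure weight (1 if no failure was assigned)
    w_fc  : fiber collision weight (1 if no collision was assigned)
    """
    # w_rf and w_fc both default to 1, so subtracting 1 counts the
    # galaxy itself only once.
    return w_fkp * w_sys * (w_rf + w_fc - 1.0)
```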

Angular Selection function

The masks describe the regions of sky observable by BOSS. The masks are spherical polygon files in the mangle format. The mask includes both an acceptance mask (regions of the sky that were included in the survey) and a rejection mask (regions of the sky that are explicitly excluded). The rejection mask removes regions around bright stars, the center posts of the plates, fields with bad imaging data, and regions where other targets had priority for being assigned a fiber.

When computing a correlation function statistic, we use points uniformly randomly distributed in the mask to trace out the geometry. The program ransack, distributed with mangle, will generate uniformly distributed randoms in an inclusion mask. Note that ransack will not work with FITS files, but we have provided a mangle .ply formatted version of the file for this purpose.
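To make "uniformly randomly distributed" concrete: points must be uniform per unit solid angle, which means uniform in RA and in sin(Dec), not in Dec itself. The following sketch draws such points inside a simple RA/Dec box. It is only an illustration of the sampling, with a hypothetical function name; it does not read mangle polygons and is not a substitute for ransack.

```python
import math
import random


def random_points_in_box(n, ra_min, ra_max, dec_min, dec_max, seed=None):
    """Draw n points uniform in solid angle within an RA/Dec box (degrees)."""
    rng = random.Random(seed)
    s_min = math.sin(math.radians(dec_min))
    s_max = math.sin(math.radians(dec_max))
    points = []
    for _ in range(n):
        ra = rng.uniform(ra_min, ra_max)
        # Uniform in sin(dec) gives uniform density per unit solid angle.
        dec = math.degrees(math.asin(rng.uniform(s_min, s_max)))
        points.append((ra, dec))
    return points
```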

Acceptance masks

There is one acceptance mask file given in the file list above. Accept all objects (galaxy targets and randoms) that are contained within the polygons in the acceptance mask.

Rejection masks

There are 4 rejection mask files given in the file list above. Reject all objects (galaxy targets and randoms) that are in the polygons in the bright-star mask, centerposts mask, collision priority mask, and bad field mask.
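The accept/reject logic of the two sections above can be sketched as follows. Each mask is represented by a callable that reports whether a point falls inside it; in practice that callable would wrap a lookup in the corresponding mangle polygon file. The box mask and all names here are toy stand-ins invented for this example.

```python
def in_survey(ra, dec, acceptance, rejections):
    """True if the point is inside the acceptance mask and no rejection mask.

    acceptance and each entry of rejections are callables (ra, dec) -> bool.
    """
    if not acceptance(ra, dec):
        return False
    return not any(mask(ra, dec) for mask in rejections)


def make_box(ra_min, ra_max, dec_min, dec_max):
    """Toy stand-in for a mask: a simple RA/Dec box test."""
    def contains(ra, dec):
        return ra_min <= ra <= ra_max and dec_min <= dec <= dec_max
    return contains
```

Apply the same function to galaxy targets and to randoms so that both trace the same geometry.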

Determine BOSS completeness by sector

This completeness specifies the probability in a given sector of the survey of obtaining a redshift for a target, and is an input for creating the angular mask of your galaxy sample. Anderson et al. 2012 (section 3.3) and Parejko et al. 2012 (section 3.1) provide details on how to account for redshift failures, fiber collision corrections, and legacy objects when computing the sector completeness.

The completeness in sectors containing no BOSS targets is ambiguous (these sectors are typically very small). This has been addressed differently in the various BOSS analyses published so far. In Anderson et al., we chose to remove such sectors if they were not surrounded in every direction by nearby sectors within 2 degrees that had spectroscopic observations, or if they were larger than 0.1273 square degrees. In White et al. and Parejko et al., we only keep sectors with areas larger than 10^-4 steradians.
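As an illustration of the per-sector bookkeeping, the sketch below computes a completeness from counts of each object class in a sector, counting close-pair targets as resolved and excluding previously known (legacy) redshifts from both numerator and denominator. This form, (N_obs + N_cp) / (N_obs + N_cp + N_missed) with N_obs = N_gal + N_star + N_fail, is an assumption about the definition in Anderson et al. and should be verified against section 3.3 of that paper. The function returns None for sectors with no BOSS targets, leaving the ambiguous case described above for you to handle explicitly.

```python
def boss_completeness(n_gal, n_star, n_fail, n_cp, n_missed):
    """Sector completeness from object counts (assumed definition).

    n_gal    : targets with good galaxy redshifts
    n_star   : targets spectroscopically classified as stars
    n_fail   : targets observed without a good redshift
    n_cp     : close-pair targets corrected via a neighbor
    n_missed : remaining targets with no spectrum
    """
    n_obs = n_gal + n_star + n_fail
    denom = n_obs + n_cp + n_missed
    if denom == 0:
        return None  # no BOSS targets: completeness is undefined
    return (n_obs + n_cp) / denom
```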

Downsample the legacy sample and close pairs to BOSS completeness

The DR9 CMASS analysis subsamples the "legacy" galaxy sample in each sector based on its BOSS completeness so that the full galaxy sample is described by a single mask. Moreover, one redshift is removed from fiber collision BOSS-legacy and legacy-legacy pairs in each sector based on the fraction of unresolved fiber collisions on a sector-by-sector basis. See section 3.3 of Anderson et al. 2012 for more detail.

An alternate method (used in Parejko et al.) is to assign galaxies that received a redshift as part of the legacy survey a BOSS completeness of 1, regardless of the completeness of the sector they occupy. This method does not throw away any objects, but it does require that the weights be applied to the galaxies instead of the randoms. This implies an increase in the variance, because the randoms better map out the on-sky completeness variation than the galaxies do.
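The first approach, subsampling the legacy galaxies, amounts to keeping each legacy galaxy with a probability equal to the BOSS completeness of its sector. A minimal sketch, with hypothetical names and without the additional close-pair removal described in Anderson et al.:

```python
import random


def downsample_legacy(legacy, completeness, seed=None):
    """Keep each legacy galaxy with probability equal to its sector completeness.

    legacy       : list of (galaxy_id, sector_id) tuples
    completeness : dict mapping sector_id to BOSS completeness in [0, 1]
    Galaxies in sectors missing from the dict are dropped.
    """
    rng = random.Random(seed)
    return [gid for gid, sector in legacy
            if rng.random() < completeness.get(sector, 0.0)]
```

Fixing the seed makes the subsample reproducible, which matters if the catalog is regenerated.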

Remove incomplete sectors

One should also reject sectors from the mask (and their associated galaxies and randoms) with a completeness less than some threshold value (taken to be 70% in Anderson et al., section 3.5), to remove highly incomplete sectors that have been only partially observed. Anderson et al. also applied a redshift completeness cut to sectors with more than 10 galaxies but less than 80% good redshifts (see equation 13). This removes sectors with a significant fraction of bad data.
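The two cuts can be expressed as a single test per sector, as in the sketch below. The thresholds are the ones quoted above. Treating "galaxies" as the number of non-stellar spectra (good redshifts plus failures), and the redshift completeness as the good fraction of those, is an assumption of this example; see equation 13 of Anderson et al. for the definition actually used.

```python
def keep_sector(c_boss, n_gal, n_fail, min_completeness=0.7,
                min_redshift_completeness=0.8, min_galaxies=10):
    """Return True if a sector passes both completeness cuts.

    c_boss : BOSS completeness of the sector (None if undefined)
    n_gal  : spectra with good galaxy redshifts
    n_fail : spectra without a good redshift
    """
    if c_boss is None or c_boss < min_completeness:
        return False
    n_spectra = n_gal + n_fail
    # The redshift completeness cut only applies to well-populated sectors.
    if n_spectra > min_galaxies:
        if n_gal / n_spectra < min_redshift_completeness:
            return False
    return True
```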

Radial selection function

The radial selection function for both LOWZ and CMASS galaxies differs between the northern and southern hemispheres, so distinct radial distributions for the two hemispheres must be used when assigning redshifts to the random galaxy catalog. Section 6 of Ross et al. (2012) compares methods of sampling the underlying redshift distribution to assign redshifts to the random galaxies, including randomly selecting from a "shuffled" list of galaxy redshifts and various choices for smooth spline fits to the observed distribution.
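The "shuffled" approach can be sketched as drawing, for each random point, a redshift from the observed galaxies in the same hemisphere. The version below draws with replacement and treats all galaxies equally, which is a simplification; if your galaxies carry weights, the draw should probably account for them. The hemisphere labels and function name are hypothetical.

```python
import random


def assign_random_redshifts(random_hemis, galaxy_z_by_hemi, seed=None):
    """Give each random point a redshift drawn from galaxies in its hemisphere.

    random_hemis     : list of hemisphere labels, one per random point
    galaxy_z_by_hemi : dict mapping label to a list of galaxy redshifts
    """
    rng = random.Random(seed)
    return [rng.choice(galaxy_z_by_hemi[h]) for h in random_hemis]
```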

Clustering statistics

The instructions above provide all of the necessary information to generate data and random catalogs. You are now ready to compute your favorite clustering statistic!