Bulk Data Downloads

Introduction

Data can be downloaded directly from data.sdss3.org with rsync or wget. Note that the total SDSS-III data volume is ~60 TB; see the data volume table. If you need a substantial fraction of that data (>1 TB), please contact the helpdesk to arrange a custom data transfer, which will be faster for you and easier on our servers.

NOTE: every rsync command on this page includes --dry-run, and every wget command includes --spider. Remove those options for the commands to actually download data.
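
If you copy commands from this page into a script, a small helper can strip those two options before running them. This is a minimal sketch, assuming each command is given as a shell-style string (continuation lines ending in a backslash are joined first); the name make_live is hypothetical and not part of any SDSS tool:

```python
import shlex

# Options this page adds so that copied commands do not transfer any data.
SAFETY_FLAGS = {"--dry-run", "--spider"}


def make_live(command):
    """Return `command` with --dry-run and --spider removed (hypothetical helper)."""
    # Join backslash-newline continuations so shlex sees a single command line.
    command = command.replace("\\\n", " ")
    tokens = shlex.split(command)
    # Drop the safety flags; shlex.join re-quotes patterns such as "????/".
    return shlex.join([t for t in tokens if t not in SAFETY_FLAGS])


print(make_live("wget --spider http://data.sdss3.org/sas/dr9/sdss/spectro/redux/plates-dr9.fits"))
# prints: wget http://data.sdss3.org/sas/dr9/sdss/spectro/redux/plates-dr9.fits
```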

wget commands use the same URL as you would in a web browser, e.g.,

  wget --spider http://data.sdss3.org/sas/dr9/sdss/spectro/redux/plates-dr9.fits

or, for rsync, use the rsync:// protocol and drop the "sas" from the URL path, e.g.,

  rsync --dry-run -v rsync://data.sdss3.org/dr9/sdss/spectro/redux/plates-dr9.fits .

If you are having any difficulty with rsync URLs, check the notes below.

The number of rsync connections is throttled, but the number of wget connections is not. We therefore recommend using wget to fetch the data initially and rsync only to confirm that the data you have is correct and complete.
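
Following that advice, here is a sketch that builds an rsync command to check a file you already fetched with wget. It assumes rsync's standard -c (--checksum) option, which compares file contents rather than size and modification time; --dry-run is kept so nothing is transferred, and with -v rsync should list any file that differs from the server copy. The function name verify_command is hypothetical:

```python
import re


def verify_command(http_url, local_path):
    """Build an rsync command (as a list) that checks `local_path` against the SAS copy."""
    # http(s):// becomes rsync:// and the /sas/ path component is dropped.
    rsync_url = re.sub(r"^https?://(data|mirror)\.sdss3\.org/sas/",
                       r"rsync://\1.sdss3.org/", http_url)
    # --dry-run: report only; -v: list files that would change; -c: compare checksums.
    return ["rsync", "--dry-run", "-vc", rsync_url, local_path]
```

The list can be passed straight to subprocess.run, which avoids any shell quoting issues.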

The SAS website data.sdss3.org/sas/dr9 (US West Coast) is completely mirrored at mirror.sdss3.org/sas/dr9 (US East Coast). If you have difficulty connecting to data.sdss3.org, try mirror.sdss3.org instead. Also check the status page for outage announcements.
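
Because the two hosts serve the same paths, a script can fall back to the mirror automatically. A minimal sketch using only Python's standard library; the function names and the 30-second timeout are assumptions, and the reachability check sends an HTTP HEAD request, so it applies to http(s) URLs only:

```python
import re
import urllib.request

# Primary SAS host first, then the East Coast mirror.
HOSTS = ("data.sdss3.org", "mirror.sdss3.org")


def candidate_urls(url):
    """Return `url` rewritten for each SAS host, in the order of HOSTS."""
    return [re.sub(r"//(data|mirror)\.sdss3\.org/", "//" + host + "/", url, count=1)
            for host in HOSTS]


def first_reachable(url, timeout=30):
    """Return the first http(s) candidate that answers a HEAD request, or None."""
    for candidate in candidate_urls(url):
        request = urllib.request.Request(candidate, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout):
                return candidate
        except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
            continue
    return None
```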

Spectra Catalog Data

Catalogs of parameters derived from the spectra and matched to photometric data are documented on the spectra data page. These can be directly downloaded from the links on that page, or via wget commands. For example, to download the redshifts and classifications of all SDSS spectra (2.9 GB):

  wget --spider http://data.sdss3.org/sas/dr9/sdss/spectro/redux/specObj-dr9.fits

Or, to get the associated position-based photometric matches (7.5 GB):

  wget --spider http://data.sdss3.org/sas/dr9/sdss/spectro/redux/photoPosPlate-dr9.fits

The stellar parameter (SSPP) results can be downloaded similarly:

  wget --spider http://data.sdss3.org/sas/dr9/sdss/sspp/ssppOut-dr9.fits
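
These catalogs are several GB each, so a script that fetches them should stream to disk instead of reading the whole response into memory. A minimal standard-library sketch; the function name download and the 1 MiB chunk size are assumptions. Unlike the wget examples above, there is no --spider equivalent here, so calling it really transfers data:

```python
import shutil
import urllib.request


def download(url, dest, chunk_size=1 << 20):
    """Stream `url` to the local file `dest`, `chunk_size` bytes at a time."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj reads and writes one chunk at a time, keeping memory use small.
        shutil.copyfileobj(response, out, chunk_size)
    return dest


# Real transfer of the ~2.9 GB specObj catalog:
# download("http://data.sdss3.org/sas/dr9/sdss/spectro/redux/specObj-dr9.fits",
#          "specObj-dr9.fits")
```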

Spectra Per-Object Files

If you want just a subset of the spectra, the most convenient form may be the spec files, with one file per PLATE-MJD-FIBER. Each file contains the coadded spectrum, the redshift and classification fits, spectral line fits, and optionally the individual exposures that contributed to the coadd. These are located at:

BOSS spectra: data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/
SDSS Legacy spectra: data.sdss3.org/sas/dr9/sdss/spectro/redux/26/spectra/
SDSS stellar cluster plates: data.sdss3.org/sas/dr9/sdss/spectro/redux/103/spectra/
SDSS SEGUE-2 plates: data.sdss3.org/sas/dr9/sdss/spectro/redux/104/spectra/

Beneath each of those directories, the spectra are organized by plate in the form

  PLATE/spec-PLATE-MJD-FIBER.fits

e.g.,

  3586/spec-3586-55181-0016.fits
  3609/spec-3609-55201-0646.fits
  3661/spec-3661-55614-0020.fits
  ...
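
In the examples above the PLATE and FIBER are zero-padded to four digits and the MJD to five. A sketch that builds and parses these relative paths, assuming that padding holds for every file; the function names are hypothetical:

```python
import re

# PLATE/spec-PLATE-MJD-FIBER.fits with 4-, 5- and 4-digit fields.
SPEC_RE = re.compile(r"^(\d{4})/spec-(\d{4})-(\d{5})-(\d{4})\.fits$")


def spec_path(plate, mjd, fiber):
    """Relative path of a per-object spec file, e.g. '3586/spec-3586-55181-0016.fits'."""
    return f"{plate:04d}/spec-{plate:04d}-{mjd:05d}-{fiber:04d}.fits"


def parse_spec_path(path):
    """Return (plate, mjd, fiber) as integers; raise ValueError for other names."""
    match = SPEC_RE.match(path)
    # The directory name must repeat the plate number used in the file name.
    if match is None or match.group(1) != match.group(2):
        raise ValueError(f"not a PLATE/spec-PLATE-MJD-FIBER.fits path: {path!r}")
    _, plate, mjd, fiber = match.groups()
    return int(plate), int(mjd), int(fiber)
```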

To download these spectra in bulk, write the paths of the spectra you want to a text file (here speclist.txt), one per line in the PLATE/spec-PLATE-MJD-FIBER.fits format above, and then use wget:

  wget --spider -nv -r -nH --cut-dirs=7 \
      -i speclist.txt \
      -B http://data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/
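
wget resolves each line of speclist.txt against the -B base URL. A sketch that writes such a file, assuming you already have (plate, mjd, fiber) triples, for example from a SkyServer query or the specObj catalog; the function name write_speclist is hypothetical:

```python
def write_speclist(triples, filename="speclist.txt"):
    """Write one PLATE/spec-PLATE-MJD-FIBER.fits line per (plate, mjd, fiber) triple.

    Returns the number of distinct lines written.
    """
    # De-duplicate and sort so repeated runs produce identical files.
    lines = sorted({f"{plate:04d}/spec-{plate:04d}-{mjd:05d}-{fiber:04d}.fits"
                    for plate, mjd, fiber in triples})
    with open(filename, "w") as out:
        out.write("\n".join(lines) + "\n")
    return len(lines)
```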

For BOSS spectra, a few sample lists have been pre-generated in data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/specfiles*.txt. For example, to download all the objects that were either targeted or classified as a QSO (201k files, ~250 GB):

  wget --spider -nv -r -nH --cut-dirs=7 \
    -i http://data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/specfiles-qso-v5_4_45.txt \
    -B http://data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/
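
Since these lists are used with the same -B base, their lines follow the same PLATE/spec-PLATE-MJD-FIBER.fits format, so a saved copy can be trimmed to the plates you need and passed to wget -i. A sketch, assuming the list has been saved locally; the function name and the plate range in the usage comment are illustrative only:

```python
def filter_speclist(lines, plates):
    """Keep entries of a PLATE/spec-PLATE-MJD-FIBER.fits list whose PLATE is in `plates`."""
    kept = []
    for line in lines:
        entry = line.strip()
        head = entry.split("/", 1)[0]
        if not head.isdigit():
            continue  # skip blank lines or anything that does not start with PLATE/
        if int(head) in plates:
            kept.append(entry)
    return kept


# Hypothetical usage with a locally saved copy of the QSO list:
# with open("specfiles-qso-v5_4_45.txt") as src:
#     subset = filter_speclist(src, set(range(3586, 3600)))
# with open("speclist.txt", "w") as out:
#     out.write("\n".join(subset) + "\n")
```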

Spectra Per-Object Lite Files

A "lite" version of the above files is also available in the "spectra/lite/PLATE/" subdirectories. These contain the same coadd and catalog information as the full spec files but do not include the individual exposures that contributed to the coadd. For example, to download the "lite" version of the above QSO files (~42 GB instead of ~250 GB):

  wget --spider -nv -r -nH --cut-dirs=8 \
    -i http://data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/specfiles-qso-v5_4_45.txt \
    -B http://data.sdss3.org/sas/dr9/boss/spectro/redux/v5_4_45/spectra/lite/
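
This command differs from the full-file one only in its -B base URL (ending in lite/) and in --cut-dirs=8 instead of 7. --cut-dirs tells wget how many leading directories of the URL path to drop once -nH has removed the host name, so that only PLATE/spec-PLATE-MJD-FIBER.fits is recreated locally. A sketch that derives the value by counting the path components of the base URL (the function name is hypothetical):

```python
from urllib.parse import urlparse


def cut_dirs_for(base_url):
    """Number of directory components in `base_url`'s path, i.e. the --cut-dirs
    value that leaves only PLATE/spec-PLATE-MJD-FIBER.fits on disk."""
    path = urlparse(base_url).path
    # Ignore the empty strings produced by the leading and trailing slashes.
    return len([part for part in path.split("/") if part])
```

For .../v5_4_45/spectra/ this counts sas, dr9, boss, spectro, redux, v5_4_45 and spectra, giving 7; the lite/ base adds one more.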

Spectra per-Plate Files

The spectra are also available grouped by plate, with all 640 (SDSS) or 1000 (BOSS) spectra in a single file. These are the original outputs of the spectroscopic pipeline and are itemized on the spectro pipeline page, including where they are in the SAS directory structure. The primary files are:

File	Description
spPlate	Coadded spectra
spCFrame	Individual exposure spectra
spZbest	Redshifts and classifications
spZall	Redshifts and classifications including second, third, etc. best fits
spZline	Spectral line fits

To download all the spPlate files for BOSS:

  rsync --dry-run -aLvz --include "????/" --include "spPlate*.fits" \
    --exclude "*" --exclude "spectra/*" \
    --prune-empty-dirs --progress \
    rsync://data.sdss3.org/dr9/boss/spectro/redux/v5_4_45/ v5_4_45/

Or for spPlate, spZall, spZbest, spZline:

  rsync --dry-run -aLvz --include "????/" \
    --include "spPlate*.fits" --include "spZ*.fits" \
    --exclude "*" --exclude "spectra/*" \
    --prune-empty-dirs --progress \
    rsync://data.sdss3.org/dr9/boss/spectro/redux/v5_4_45/ v5_4_45/

A version of the spPlate-only command for the SEGUE-2 plates:

  rsync --dry-run -aLvz --include "????/" --include "spPlate*.fits" --exclude "*" \
    --prune-empty-dirs --progress \
    rsync://data.sdss3.org/dr9/sdss/spectro/redux/104/segue2/ segue2/

This command downloads the per-plate spPlate files. If you also need stellar parameter (SSPP) data by plate, use:

  rsync --dry-run -aLvz --include "????/" --include "output/" \
    --include "param/" --include "ssppOut*.fit" \
    --include "ssppOut.lineindex*.fit" --exclude "*" \
    --prune-empty-dirs --progress \
    rsync://data.sdss3.org/dr9/sdss/sspp/122/ .
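
All of the rsync commands above rely on rsync applying --include/--exclude rules in the order given and acting on the first rule that matches. "????/" matches any four-character directory name, such as a plate number, so rsync descends into it; the file patterns keep the wanted files; and the final --exclude "*" drops everything else, which already covers directories such as spectra/. A sketch that assembles a command in that order; the function names and defaults are hypothetical:

```python
def rsync_filter_args(file_patterns, dir_patterns=("????/",)):
    """Build rsync --include/--exclude arguments in first-match-wins order.

    Directories matching `dir_patterns` are descended into, files matching
    `file_patterns` are kept, and everything else is excluded.
    """
    args = []
    for pattern in dir_patterns:
        args += ["--include", pattern]  # e.g. four-digit plate directories
    for pattern in file_patterns:
        args += ["--include", pattern]  # the file types you want
    args += ["--exclude", "*"]          # must come last: it matches everything
    return args


def rsync_command(src, dest, file_patterns, dir_patterns=("????/",)):
    """Full command as a list; --dry-run is kept, as on the rest of this page."""
    return (["rsync", "--dry-run", "-aLvz"]
            + rsync_filter_args(file_patterns, dir_patterns)
            + ["--prune-empty-dirs", "--progress", src, dest])


cmd = rsync_command("rsync://data.sdss3.org/dr9/boss/spectro/redux/v5_4_45/",
                    "v5_4_45/", ["spPlate*.fits", "spZ*.fits"])
```

Passing the list to subprocess.run means no shell is involved, so the * and ? patterns reach rsync unexpanded, which is why the shell versions above quote them.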

Imaging Data

Images and derived catalog data are described on the imaging data page. You can use a SkyServer search or the window_flist.fits file to identify which RERUN-RUN-CAMCOL-FIELD combinations overlap your region of interest. Then download the matching calibObj files (catalog data) or frame files (calibrated imaging data). For example, for RERUN 301, RUN 2505, CAMCOL 3, FIELD 38, the r-band image is:

  wget --spider http://data.sdss3.org/sas/dr9/boss/photoObj/frames/301/2505/3/frame-r-002505-3-0038.fits.bz2

and the associated calibObj catalog of identified galaxies, which covers all fields of RUN 2505, CAMCOL 3 rather than field 38 alone, is:

  wget --spider http://data.sdss3.org/sas/dr9/boss/sweeps/dr9/301/calibObj-002505-3-gal.fits.gz
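
The two URLs above follow a zero-padded pattern: a six-digit RUN and a four-digit FIELD in the file names, with the unpadded RUN as a directory. A sketch that builds both URLs, assuming that pattern holds generally; the function names are hypothetical, and "gal" is the only calibObj type shown on this page:

```python
BASE = "http://data.sdss3.org/sas/dr9/boss"


def frame_url(rerun, run, camcol, field, band):
    """URL of a calibrated frame image, e.g. frame-r-002505-3-0038.fits.bz2."""
    return (f"{BASE}/photoObj/frames/{rerun}/{run}/{camcol}/"
            f"frame-{band}-{run:06d}-{camcol}-{field:04d}.fits.bz2")


def calibobj_url(rerun, run, camcol, kind="gal"):
    """URL of the calibObj file for one RUN and CAMCOL (no FIELD in the name)."""
    return f"{BASE}/sweeps/dr9/{rerun}/calibObj-{run:06d}-{camcol}-{kind}.fits.gz"
```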

Notes on using rsync

Remember, to convert an http URL to an rsync URL you must:

Replace http:// with rsync://.
Remove /sas/.

Here's a Python function that accomplishes these steps:

import re


def http2rsync(url):
    """Convert a valid SDSS-III http URL to the rsync equivalent.

    URLs that do not match the expected pattern are returned unchanged.
    """
    return re.sub(r'https?://(data|mirror)\.sdss3\.org/sas/dr([0-9]+)/(.*)$',
                  r'rsync://\1.sdss3.org/dr\2/\3', url)

And here's the equivalent in IDL:

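FUNCTION http2rsync, url
    ; parts[0] is the whole match; parts[1..3] are the host, release number, and path
    parts = STREGEX(url,'https?://(data|mirror)\.sdss3\.org/sas/dr([0-9]+)/(.*)$',/EXTRACT,/SUBEXPR)
    IF parts[0] EQ '' THEN RETURN, url   ; no match: return the input unchanged
    RETURN, 'rsync://'+parts[1]+'.sdss3.org/dr'+parts[2]+'/'+parts[3]
END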