Loading and Examining data

The basic classes for reading in SDFITS files were explained in the previous section. Here we describe some of their keywords and how to inspect data after loading it.

Header Data Units#

By default GBTFITSLoad will load all Header Data Units (HDUs) from the input file(s). In SDFITS format, the HDUs are binary tables. For files with multiple HDUs, you can load a specific HDU with the hdu keyword, which can speed up data loading if you are only interested in one binary table.

# Load the second binary table from the input SDFITS files.
# Note that binary tables start at index 1, since HDU 0 is the
# Primary HDU, which contains only metadata.
sdfits = GBTFITSLoad("/path/to/data", hdu=2)
sdfits.info()

Flags and Flag Files#

Data may come with flag files (with extension “.flag”). By default, dysh does not read these files because they primarily contain VEGAS spur channels, which are flagged more quickly algorithmically from information in the SDFITS header. You can control flagging on input with the skipflags (default: True) and flag_vegas (default: False) keywords.

Note

It is more efficient to use flag_vegas=True in calibration routines rather than in GBTFITSLoad. GBTFITSLoad(flag_vegas=True) would cause all rows in the SDFITS file(s) to be read, since the keywords needed to calculate the VEGAS spur locations are defined per row.

# Load a single SDFITS file. Read in .flag file if it exists.
# VEGAS spurs are still flagged algorithmically.
sdfits = GBTFITSLoad("/path/to/mydata.fits", skipflags=False, flag_vegas=True)

# Load multiple SDFITS files from a given directory.
# Do not read in any .flag files and do not flag VEGAS spurs.
sdfits = GBTFITSLoad("/path/to/data/")

Tip

See the flagging chapter for more details on setting and applying flags.

Once you have loaded data, you can see what files were read in:

# Print out the files that were loaded
sdfits.filenames()

# or as Paths instead of strings
sdfits.files

The most basic description of the data is from summary, the output of which is customizable.

# The default, compact view
sdfits.summary()

# Show every record for a set of scans
sdfits.summary(scan=[19,20,21], verbose=True)

# List specific columns
sdfits.summary(columns=["OBJECT", "LST", "DATE-OBS"])

# List the default columns plus additional ones
sdfits.summary(add_columns=["TAMBIENT", "VELDEF"])

Information about individual scans can be seen with the scan_info function. This prints the ifnum, plnum and fdnum values for each scan requested.

sdfits.scan_info(scan=[60,84])
Scan 60: ifnum=[0, 1, 2, 3, 4, 5] plnum=[0, 1] fdnum=[0, 1, 2, 3, 4, 5, 6]
Scan 84: ifnum=[0, 1, 2, 3, 4] plnum=[0, 1] fdnum=[0, 1, 2, 3, 4, 5, 6]

Data in the individual SDFITS files are accessible via the sdf attribute, a list of SDFITSLoad objects whose length equals the number of files.

sdfits.sdf[0]
# Get array of data from first SDFITS file, row 3
array = sdfits.sdf[0].rawspectrum(3)

Examining and Setting Metadata#

Metadata are any columns in the SDFITS file other than DATA and FLAGS. A powerful mechanism for examining and modifying the metadata columns is the [] operator. This is used to access any column of the metadata.

# Return the names of all the columns (except DATA and FLAGS)
sdfits.columns

# Examine individual columns, note column names are case-insensitive
sdfits["obstype"]
sdfits["tunit7"]
sdfits["data"].shape

# Examine multiple columns at once
mylist = ["object", "backend", "intnum"]
sdfits[mylist]

# The unique set of values in a column
sdfits.udata("backend")

Note

dysh uses the GBTIDL-created index file (if present) to speed up loading of SDFITS files. However, the index file does not include all columns in the SDFITS binary table. When dysh needs columns that are not in the index file, it loads those columns from the SDFITS file(s). So the first time you use [] to access such a column, you will see a message such as:

Column(s) ['DOPFREQ'] not available in .index file. Loading from FITS file(s).

For operations such as gettp, getfs, etc., the missing columns are loaded only for the rows needed (again for performance reasons, which matters especially for very large files). Therefore, under certain circumstances you may find that sdfits[column_name] returns NaN for unloaded rows. You can force loading of all rows with sdfits.load_all(), but note that this can take a while for very large files.
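The NaN behavior described in this note can be pictured with a small NumPy model. This is only an illustrative sketch and not how dysh is implemented: the array names, the row count, the column values, and the choice of "needed" rows are all invented for the example.

```python
import numpy as np

nrows = 6

# Stand-in for a column (e.g. DOPFREQ) that exists in the FITS file
# but not in the index file. Values are invented for illustration.
column_on_disk = np.linspace(1.40e9, 1.45e9, nrows)

# Rows that a calibration call happened to need (hypothetical choice).
needed_rows = [1, 4]

# Model of a partially loaded column: NaN everywhere except the rows read.
column_in_memory = np.full(nrows, np.nan)
column_in_memory[needed_rows] = column_on_disk[needed_rows]

unloaded = np.isnan(column_in_memory)
print("rows still unloaded:", np.flatnonzero(unloaded).tolist())

# Model of forcing a full load: every row is read, so no NaN remains.
column_in_memory[:] = column_on_disk
print("any NaN after full load:", bool(np.isnan(column_in_memory).any()))
```

In this model, checking a column with np.isnan before using it is a cheap way to tell whether some rows were never read.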

Assignment also works. Assigned values will be used in any subsequent calibration commands. The underlying SDFITS files are not affected unless you actually overwrite them with sdfits.write(overwrite=True).

import numpy as np

# Assign one value to all rows
sdfits["TCAL"] = 1.5

# Assign an array with length equal to the number of rows
# (a silly example)
tcal = np.arange(sdfits.stats()['nrows'])
sdfits["TCAL"] = tcal

New columns can be created by assignment as well, either one value assigned to all rows, or with an array with length equal to the number of rows.

sdfits["PI"] = np.pi
sdfits["RANDOM"] = np.random.rand(sdfits.stats()['nrows'])
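The assignment rule above (one value for all rows, or an array with one value per row) can be sketched without any SDFITS data. The helper assign_column below is hypothetical and not part of dysh; it models the columns as a plain dictionary of NumPy arrays, and raising ValueError on a length mismatch is an assumption of this sketch rather than documented dysh behavior.

```python
import numpy as np


def assign_column(columns, nrows, name, value):
    """Hypothetical helper (not part of dysh) modelling the assignment rule."""
    values = np.asarray(value)
    if values.ndim == 0:
        # A single value is repeated for every row.
        columns[name] = np.full(nrows, values)
    elif values.ndim == 1 and values.shape[0] == nrows:
        # An array must supply exactly one value per row.
        columns[name] = values.copy()
    else:
        raise ValueError(
            f"{name}: expected a scalar or {nrows} values, got shape {values.shape}"
        )
    return columns


nrows = 4
columns = {}
assign_column(columns, nrows, "TCAL", 1.5)
assign_column(columns, nrows, "RANDOM", np.arange(nrows))

# An array of the wrong length is rejected in this model.
try:
    assign_column(columns, nrows, "BAD", [1.0, 2.0])
    length_error = None
except ValueError as err:
    length_error = str(err)

print(columns["TCAL"], columns["RANDOM"], length_error)
```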

Columns can be renamed:

sdfits.rename_column("PI","PIE")

Tip

For more information on setting metadata, see Metadata Management.

Examining the Raw Spectral Data#

GBTFITSLoad, GBTOnline, and GBTOffline have two methods to look at the raw integrations: rawspectrum returns an ndarray for a given integration, while getspec returns the data as a Spectrum object. By default, these retrieve the record from the first SDFITS file; other files can be accessed with the fitsindex parameter (equivalent to, e.g., sdfits.sdf[2].rawspectrum()).

array = sdfits.rawspectrum(10) #  get data array for row 10 from the first SDFITS file
array = sdfits.rawspectrum(10, fitsindex=2) #  get data array for row 10 from the third SDFITS file
spectrum = sdfits.getspec(10)  #  get a Spectrum for row 10 data and metadata
spectrum.plot()                #  Spectrum objects can always be plotted.

The full raw data array for any binary table from one of the underlying SDFITS files can be retrieved with rawspectra.

# Get the data array from the second binary table of the third SDFITS file
array = sdfits.rawspectra(bintable=1, fitsindex=2)

Warning

rawspectrum and rawspectra return references to the actual binary table data. If you alter the result, you alter the data! It is safer to use the “DATA” keyword.

If there is a single binary table, the entire raw data array can be retrieved using the “DATA” keyword.

allthedata = sdfits["DATA"]

Note that this is a copy of the data, not a reference to it, so modifying allthedata does not affect the binary table data.
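The difference between a reference and a copy is ordinary NumPy behavior and can be demonstrated with a stand-in array. The sketch below does not use dysh; table_data is an invented placeholder for a binary table's DATA column, and the shapes and values are arbitrary.

```python
import numpy as np

# Stand-in for a binary table's DATA column: 4 integrations x 8 channels.
table_data = np.zeros((4, 8))

# A reference (view) into the table, analogous to what rawspectrum
# and rawspectra are described as returning.
row_view = table_data[3]
row_view[0] = 99.0            # this also changes table_data

# An explicit copy is safe to modify.
row_copy = table_data[2].copy()
row_copy[0] = -1.0            # table_data is untouched

view_shares_memory = np.shares_memory(row_view, table_data)
copy_shares_memory = np.shares_memory(row_copy, table_data)

print(table_data[3, 0], table_data[2, 0])
print(view_shares_memory, copy_shares_memory)
```

If you need to experiment with an array returned by reference, calling .copy() on it first keeps the underlying table data intact.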

Adding Comments#

Users can add COMMENT or HISTORY cards to the GBTFITSLoad object with add_comment and add_history. These are propagated to ScanBlocks and Spectrum objects during processing and written to the output file(s) during write.