Data Structures Explained

Data Structures Explained#

GBTFITSLoad#

The main data object in dysh is GBTFITSLoad which represents one or more SDFITS files in the GBT SDFITS format. (A more generic SDFITS loader is SDFITSLoad, described below). The data are represented as one contiguous block even if multiple files were loaded.

sdfits = GBTFITSLoad(path)
sdfits.summary()   # A tabular summary of all the data

where path is a path.Path to a directory containing SDFITS files or a single SDFITS file.

Once loaded, the columns of the binary table, except the DATA and FLAGS columns, are stored as a DataFrame, and are accessible via the Python [] notation, e.g.:

sdfits["OBJECT"]         # The entire OBJECT column
sdfits.udata("OBJECT")   # The unique set of OBJECTS
sdfits.selection         # The entire DataFrame

This mechanism can be used to select data for calibration.

Although not in the DataFrame, The DATA column of a GBTFITSLoad is directly accessible as a ndarray with sdfits["DATA"]. More commonly, to look at the raw data you would access a single integration (row) as a numpy array or as a Spectrum, which would include the metadata.

array = sdfits.rawspectrum(10) #  get data array for row 10
spectrum = sdfits.getspec(10)  #  get a Spectrum for row 10 data and metadata

GBTOnline and GBTOffline#

For users at GBO, GBTOnline connects directly to the SDFITS file(s) currently being observed/written at the GBT. It functions exactly like GBTFITSLoad and updates its contents automatically as new data are written.

sdfits = GBTOnline()
sdfits.summary()   # A tabular summary of the current data

GBTOffline connects to a given project on disk, by default in /home/sdfits (where project data live a GBO). You can set the SDFITS_DATA environment variable to point to a different location.

# Load data from project id AGBT_22A_325_33
sdfits = GBTOffline('AGBT_22A_325_33')
sdfits.summary()

SDFITSLoad#

A generic class for loading single SDFITS files is SDFITSLoad. (Underneath the hood, GBTFITSLoad contains one SDFITSLoad for each file within). GBT users will not likely instantiate one of these objects directly. However, they can be used to read in a SDFITS file that came from a different telescope. It has basic methods to inspect data, such as summary info, getspec, rawspectrum, rawspectra, as well as the [] style accessor for column data.

from dysh.fits import SDFITSLoad

sdfits = SDFITSLoad('AGBT05B_047_01.raw.acs.fits')
sdfits.summary()

File:    AGBT05B_047_01.raw.acs.fits
i= 0
HDU       1
BINTABLE: 352 rows x 70 cols with 32768 chans
Selected  352/352 rows
Sources:  ['NGC5291']
RESTFREQ: [1.420405] GHz
Scans:    [51, 52, 53, 54, 55, 56, 57, 58]
Npol:     2
Nint:     176

sdf.info()

Filename: AGBT05B_047_01.raw.acs.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
0  PRIMARY       1 PrimaryHDU      12   ()
1  SINGLE DISH    1 BinTableHDU    229   352R x 70C   ['32A', '1D', '22A', '1D', '1D', '1D', '32768E', '16A', '6A', '8A', '1D', '1D', '1D', '4A', '1D', '4A', '1D', '1I', '32A', '32A', '1J', '32A', '16A', '1E', '8A', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '8A', '1D', '1D', '12A', '1I', '1I', '1D', '1D', '1I', '1A', '1I', '1I', '16A', '16A', '1J', '1J', '22A', '1D', '1D', '1I', '1A', '1D', '1E', '1D', '1A', '1A', '8A', '1E', '1E', '16A', '1I', '1I', '1I']

sdfits["IFNUM"]

0      0
1      0
2      0
3      0
4      0
      ..
347    0
348    0
349    0
350    0
351    0
Name: IFNUM, Length: 352, dtype: int16

Warning

Be careful accessing the data of an SDFITSLoad with the “DATA” key! It is a reference to the underlying data, not a copy. You could overwrite the data with sdfits["DATA"] = ...

Spectrum#

A Spectrum is a container representing a spectrum and its attributes, with data in a brightness unit (e.g., ct, K, Jy) and a spectral axis in frequency or velocity units. The data are accesible as a (unitless) ndarray (data) or as a Quantity (flux). Spectrum objects have a mask array which can be set with flagging operations. It is based on specutils Spectrum class. It supports most common velocity reference frames supported by astropy.coordinates.BaseCoordinateFrame (‘itrs’, ‘topo’, ‘lsrk’, ‘icrs’, ‘hcrs’, etc.). ‘topo’ is a synonym for ‘itrs’. Spectra can be displayed with plot, which opens up an interactive plot. The frequency/velocity locations of spectral lines within the spectral window can be listed using query_lines.

Standard operations such as baseline removal, smooth, and average are supported, as well as analysis functions like stats, roll , radiometer, normalness, and cog (Curve of Growth). Spectrum arithmetic is supported with common operators, e.g.:

s1 = Spectrum.fake_spectrum(nchan=2048)   # Default unit is K
s2 = Spectrum.fake_spectrum(nchan=2048)
s3 = s1+s2      # Sum of two spectra
ratio = s2/s1   # Unitless ratio

Spectrum objects are returned by operations like timeaverage and getspec. A useful method for creating dummy spectra is fake_spectrum.

Scan#

Scans are represented by subclasses of ScanBase specific to their observation type. For instance PSScan is position-switched, FSScan is frequency-switched, TPScan is total power, NodScan is nodding. These are created by the corresponding calibration routine, e.g., getps, getfs, gettp and collected into a common ScanBlock. Users generally will not interact with individual Scan objects, but rather collectively through the ScanBlock interface. However, certain attributes and functions are common to all Scan classes are helpful for understanding the data.

# Load data from project id AGBT_22A_325_33
sdfits = GBTFITSLoad(path)

# Get a ScanBlock containing some PSScans calibrated to Janskys
sb = sdfits.getps(scan=[23,25,27], ifnum=0, plnum=0, fdnum=0, units='flux', zenith_opacity=0.05)

# Look at the scale factors and weights of all the Scans in the ScanBlock
for s in sb:
    print(f"Scale factors for PSScan {s.scan} = {s.tscale_fac}")
    print(f"Weights for PSScan {s.scan} = {s.weights}")

# Get the weighted average of the integrations for the first PSScan
ta = sb[0].timeaverage()
ta.plot()

# Examine the integrations in the first PSScan
for i in range(len(sb[0]):
     sb[0].getspec(i).plot()

sb[0].add_comment("Integration 12 looks wonky.")

ScanBlock#

A ScanBlock is a list of ScanBase objects. As illustrated by the diagram below, it serves to hold together a series of scans. One of its purposes is to facilitate working with groups of scans, as it allows time averaging them together (using ScanBlock.timeaverage), or subtracting a baseline from the integrations in a group of scans (using ScanBlock.subtract_baseline).

        flowchart TD

subgraph newLines[GBTFITSLoad
    50 Scans
    Position Switching
    Dual Linear Polarization
    1 Beam
    4 Frequency Windows
    100 integrations each
    ]




end


newLines -- getps( scan=45, plnum=1, ifnum=0, fdnum=0 ) --> ScanBlock1
newLines -- gettp( scan=[17,18,19], intnum=np.r_[50:100], ifnum=2, plnum=0,fdnum=0 ) --> ScanBlock2



subgraph ScanBlock1[ScanBlock]
        psscan["spectra.scan.PSScan<br />scans = 44,45<br />plnum = 1<br />ifnum = 0<br />fdnum = 0<br />intnum = (0,100)"]
    end
subgraph ScanBlock2[ScanBlock]
        tpscan1["spectra.scan.TPScan<br />scan=17<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]
        tpscan2["spectra.scan.TPScan<br />scan=18<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]
        tpscan3["spectra.scan.TPScan<br />scan=19<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]

    end

ScanBlock1[Scan Block] -- timeaverage() --->spectrum1[Spectrum]
ScanBlock2[Scan Block] -- timeaverage() --->spectrum2[Spectrum]

Fig. 1 The contents of a ScanBlock are a series of ScanBase objects. ScanBlock are the return of the calibration routines (e.g., getps, getfs or gettp)#

The return object of all calibration methods (e.g., getps, getfs or gettp) is a ScanBlock. Since a ScanBlock is a list, it is possible to append to a ScanBlock. This can be useful when working with different observing procedures or for combining polarizations and/or spectral windows (IFs). For example, you can do the following:

# Calibrate plnum 1 and 2 data, then average.
sb0 = sdfits.getps(scan=1, ifnum=0, plnum=0, fdnum=0)
sb1 = sdfits.getps(scan=1, ifnum=0, plnum=1, fdnum=0)
sb0.extend(sb1)
pol_avg_spectrum = sb0.timeaverage()