Data Structures Explained#
GBTFITSLoad#
The main data object in dysh is
GBTFITSLoad
which represents one or more SDFITS files
in the GBT SDFITS format. (A more generic
SDFITS loader is SDFITSLoad, described below).
The data are represented as one contiguous block even if multiple
files were loaded.
sdfits = GBTFITSLoad(path)
sdfits.summary() # A tabular summary of all the data
where path is a path.Path to a directory containing SDFITS files or a single SDFITS file.
Once loaded, the columns of the binary table, except the DATA and FLAGS columns, are stored as a DataFrame, and are accessible via the Python [] notation, e.g.:
sdfits["OBJECT"] # The entire OBJECT column
sdfits.udata("OBJECT") # The unique set of OBJECTS
sdfits.selection # The entire DataFrame
This mechanism can be used to select data for calibration.
Although not in the DataFrame, The DATA column of a GBTFITSLoad is directly accessible as a ndarray with sdfits["DATA"]. More commonly, to look at the raw data you would access a single integration (row) as a numpy array or as a Spectrum, which would include the metadata.
array = sdfits.rawspectrum(10) # get data array for row 10
spectrum = sdfits.getspec(10) # get a Spectrum for row 10 data and metadata
GBTOnline and GBTOffline#
For users at GBO, GBTOnline connects directly to the SDFITS file(s)
currently being observed/written at the GBT. It functions exactly
like GBTFITSLoad and updates its contents automatically as new data are written.
sdfits = GBTOnline()
sdfits.summary() # A tabular summary of the current data
GBTOffline connects to a given project on disk,
by default in /home/sdfits (where project data live a GBO).
You can set the SDFITS_DATA environment variable to point to a different location.
# Load data from project id AGBT_22A_325_33
sdfits = GBTOffline('AGBT_22A_325_33')
sdfits.summary()
SDFITSLoad#
A generic class for loading single SDFITS files is SDFITSLoad.
(Underneath the hood, GBTFITSLoad contains one SDFITSLoad for each file within).
GBT users will not likely instantiate one of these objects directly. However, they can
be used to read in a SDFITS file that came from a different telescope. It has basic methods to inspect data, such as
summary
info,
getspec,
rawspectrum,
rawspectra, as well as the [] style accessor for column data.
from dysh.fits import SDFITSLoad
sdfits = SDFITSLoad('AGBT05B_047_01.raw.acs.fits')
sdfits.summary()
File: AGBT05B_047_01.raw.acs.fits
i= 0
HDU 1
BINTABLE: 352 rows x 70 cols with 32768 chans
Selected 352/352 rows
Sources: ['NGC5291']
RESTFREQ: [1.420405] GHz
Scans: [51, 52, 53, 54, 55, 56, 57, 58]
Npol: 2
Nint: 176
sdf.info()
Filename: AGBT05B_047_01.raw.acs.fits
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 12 ()
1 SINGLE DISH 1 BinTableHDU 229 352R x 70C ['32A', '1D', '22A', '1D', '1D', '1D', '32768E', '16A', '6A', '8A', '1D', '1D', '1D', '4A', '1D', '4A', '1D', '1I', '32A', '32A', '1J', '32A', '16A', '1E', '8A', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '1D', '8A', '1D', '1D', '12A', '1I', '1I', '1D', '1D', '1I', '1A', '1I', '1I', '16A', '16A', '1J', '1J', '22A', '1D', '1D', '1I', '1A', '1D', '1E', '1D', '1A', '1A', '8A', '1E', '1E', '16A', '1I', '1I', '1I']
sdfits["IFNUM"]
0 0
1 0
2 0
3 0
4 0
..
347 0
348 0
349 0
350 0
351 0
Name: IFNUM, Length: 352, dtype: int16
Warning
Be careful accessing the data of an SDFITSLoad with the “DATA” key! It is a reference
to the underlying data, not a copy. You could overwrite the data with sdfits["DATA"] = ...
Spectrum#
A Spectrum is a container representing a
spectrum and its attributes, with data in a brightness unit (e.g.,
ct, K, Jy) and a
spectral axis in frequency or velocity units.
The data are accesible as a (unitless) ndarray (data) or as a
Quantity (flux).
Spectrum objects have a mask array which can be set with
flagging operations.
It is based on specutils Spectrum class.
It supports most common velocity reference frames supported by astropy.coordinates.BaseCoordinateFrame (‘itrs’, ‘topo’, ‘lsrk’, ‘icrs’, ‘hcrs’, etc.).
‘topo’ is a synonym for ‘itrs’.
Spectra can be displayed with plot, which opens up an interactive plot.
The frequency/velocity locations of spectral lines within the spectral window can be listed using query_lines.
Standard operations such as baseline removal, smooth, and average are supported, as well as analysis functions like stats, roll , radiometer, normalness, and cog (Curve of Growth). Spectrum arithmetic is supported with common operators, e.g.:
s1 = Spectrum.fake_spectrum(nchan=2048) # Default unit is K
s2 = Spectrum.fake_spectrum(nchan=2048)
s3 = s1+s2 # Sum of two spectra
ratio = s2/s1 # Unitless ratio
Spectrum objects are returned by operations like timeaverage and getspec.
A useful method for creating dummy spectra is fake_spectrum.
Scan#
Scans are represented by subclasses of ScanBase specific to their observation type. For instance
PSScan is position-switched,
FSScan is frequency-switched,
TPScan is total power,
NodScan is nodding.
These are created by the corresponding calibration routine, e.g., getps, getfs, gettp
and collected into a common ScanBlock. Users generally will not interact with individual Scan objects, but rather collectively through the ScanBlock interface. However, certain attributes and functions are common to all Scan classes are helpful for understanding the data.
# Load data from project id AGBT_22A_325_33
sdfits = GBTFITSLoad(path)
# Get a ScanBlock containing some PSScans calibrated to Janskys
sb = sdfits.getps(scan=[23,25,27], ifnum=0, plnum=0, fdnum=0, units='flux', zenith_opacity=0.05)
# Look at the scale factors and weights of all the Scans in the ScanBlock
for s in sb:
print(f"Scale factors for PSScan {s.scan} = {s.tscale_fac}")
print(f"Weights for PSScan {s.scan} = {s.weights}")
# Get the weighted average of the integrations for the first PSScan
ta = sb[0].timeaverage()
ta.plot()
# Examine the integrations in the first PSScan
for i in range(len(sb[0]):
sb[0].getspec(i).plot()
sb[0].add_comment("Integration 12 looks wonky.")
ScanBlock#
A ScanBlock is a list of
ScanBase objects. As illustrated
by the diagram below, it serves to hold together a series
of scans. One of its purposes is to facilitate working with
groups of scans, as it allows time averaging them together (using
ScanBlock.timeaverage),
or subtracting a baseline from the integrations in a group
of scans (using ScanBlock.subtract_baseline).
flowchart TD
subgraph newLines[GBTFITSLoad
50 Scans
Position Switching
Dual Linear Polarization
1 Beam
4 Frequency Windows
100 integrations each
]
end
newLines -- getps( scan=45, plnum=1, ifnum=0, fdnum=0 ) --> ScanBlock1
newLines -- gettp( scan=[17,18,19], intnum=np.r_[50:100], ifnum=2, plnum=0,fdnum=0 ) --> ScanBlock2
subgraph ScanBlock1[ScanBlock]
psscan["spectra.scan.PSScan<br />scans = 44,45<br />plnum = 1<br />ifnum = 0<br />fdnum = 0<br />intnum = (0,100)"]
end
subgraph ScanBlock2[ScanBlock]
tpscan1["spectra.scan.TPScan<br />scan=17<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]
tpscan2["spectra.scan.TPScan<br />scan=18<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]
tpscan3["spectra.scan.TPScan<br />scan=19<br />plnum = 0<br />ifnum = 2<br />fdnum = 0 <br />intnum=(50,100)"]
end
ScanBlock1[Scan Block] -- timeaverage() --->spectrum1[Spectrum]
ScanBlock2[Scan Block] -- timeaverage() --->spectrum2[Spectrum]
Fig. 1 The contents of a ScanBlock are a series of ScanBase objects. ScanBlock are the return of the calibration routines (e.g., getps, getfs or gettp)#
The return object of all calibration methods (e.g., getps, getfs or gettp) is a ScanBlock.
Since a ScanBlock is a list, it is possible to append to a ScanBlock.
This can be useful when working with different observing procedures or for combining polarizations and/or spectral windows (IFs).
For example, you can do the following:
# Calibrate plnum 1 and 2 data, then average.
sb0 = sdfits.getps(scan=1, ifnum=0, plnum=0, fdnum=0)
sb1 = sdfits.getps(scan=1, ifnum=0, plnum=1, fdnum=0)
sb0.extend(sb1)
pol_avg_spectrum = sb0.timeaverage()