Performance vs. GBTIDL

One of the requirements of dysh is that it be computationally as fast as or faster than GBTIDL. Specifically, from the Requirements document:

5.3-R1 The software must be capable of the data reduction processes with the same or better accuracy and speed as GBTIDL.

Using our prototype design of SDFITSLoad and a spectrum class based on specutils.Spectrum1D, we profiled the code in 4 operations:

1. Loading an SDFITS file with one or more HDUs
2. Creating an index for each HDU from the FITS bintable columns using pandas
3. Creating a spectrum object for each row in the bintable
4. Removing baselines of order 1, 2 and 3 from each spectrum, excluding the inner 25% of channels from the fit
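
To make the last operation in the list concrete, here is a minimal numpy sketch of polynomial baseline removal that excludes the inner 25% of channels from the fit. It is an illustration only, not the dysh or GBTIDL implementation: the function name `remove_baseline`, the `exclude_fraction` parameter, and the synthetic spectrum are all assumptions made for this example.

```python
import numpy as np


def remove_baseline(spectrum, order, exclude_fraction=0.25):
    """Fit a polynomial to the outer channels and subtract it everywhere.

    Hypothetical helper for illustration; not part of dysh.
    """
    nchan = spectrum.size
    chan = np.arange(nchan, dtype=float)
    # Exclude the inner `exclude_fraction` of channels, centered on the band.
    center = 0.5 * (nchan - 1)
    fit_mask = np.abs(chan - center) > 0.5 * exclude_fraction * nchan
    # Rescale channel numbers to [-1, 1] so the fit stays well conditioned.
    x = 2.0 * chan / (nchan - 1) - 1.0
    coeffs = np.polynomial.polynomial.polyfit(x[fit_mask], spectrum[fit_mask], order)
    return spectrum - np.polynomial.polynomial.polyval(x, coeffs)


# Synthetic, noise-free spectrum: quadratic baseline plus a narrow line
# that sits inside the excluded inner window.
nchan = 1024
x = np.linspace(-1.0, 1.0, nchan)
true_baseline = 0.5 + 0.3 * x - 0.2 * x**2
line = 2.0 * np.exp(-0.5 * (x / 0.05) ** 2)
spectrum = true_baseline + line

# Baselines of order 1, 2 and 3, as in the profiled operation.
residuals = {order: remove_baseline(spectrum, order) for order in (1, 2, 3)}
```

Because the synthetic baseline is quadratic, the order 2 and order 3 fits recover it almost exactly and leave the line intact, while the order 1 fit leaves a curved residual.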

The equivalent GBTIDL commands were also profiled, as were pure numpy and pure C implementations of steps 1 and 4. The latter represents the maximum possible speed at which an operation could run.
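
This page does not say how the timings were collected. As a rough sketch of one common approach, the snippet below times a single operation with the standard library's `time.perf_counter` and keeps the best of several runs. The helper `time_step` and the stand-in workload `make_index` are hypothetical names invented for this example, not part of dysh.

```python
import time


def time_step(func, *args, repeats=3, **kwargs):
    """Return (best wall-clock seconds, last result) over `repeats` calls."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def make_index(nrows):
    """Stand-in workload (an assumption): build a trivial per-row lookup."""
    return {row: 2 * row for row in range(nrows)}


seconds, index = time_step(make_index, 352)
print(f"make_index: {seconds:.6f} s for {len(index)} rows")
```

Taking the minimum over repeats reduces the influence of one-off effects such as disk caching, which matters when comparing file-loading steps across tools.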

We used SDFITS files between 4 MB and 23 GB in size, with between 352 and 92032 rows and between 1024 and 65536 channels.

The result is that dysh performs better than GBTIDL in loading and indexing files and creating spectra, and comparably well for baselining (with no optimization). The prototype design can easily handle large files and spectra with many channels.

Figure: Performance of `dysh` versus `GBTIDL` in common operations. `dysh` is significantly faster at loading SDFITS files, creating indices (the equivalent of the GBTIDL index file), and creating spectra. It is comparable in removing baselines of arbitrary order. No attempt was made to optimize the prototype `dysh` code.

For those interested, you can find the GBTIDL code used for this comparison in this repo.