Selecting Data

This notebook shows how to select data in dysh, using dysh's Selection class to illustrate the available methods. A Selection object on its own has no effect on the data itself. At the end of the notebook we show how to make the same selections through a GBTFITSLoad object, so that they are actually applied to the data.

This tutorial is also available as a Jupyter notebook.

Loading Modules

We start by loading the modules we will use in this tutorial.

# These modules are required for the tutorial.
import astropy.units as u
from astropy.time import Time
from dysh.fits.gbtfitsload import GBTFITSLoad
from dysh.util.selection import Selection

# These modules are only used to download the data.
from pathlib import Path
from dysh.util.download import from_url

Data Retrieval

Download the example SDFITS data, if necessary.

The code below downloads an SDFITS file from http://www.gb.nrao.edu/dysh/example_data and saves it in a data directory in the current working directory. The code creates that directory if it does not exist. Without the directory, the download would be saved as a file named data instead. The example works either way, but keep this in mind if you find a new file named data after running it.

url = "http://www.gb.nrao.edu/dysh/example_data/hi-survey/data/AGBT04A_008_02.raw.acs/AGBT04A_008_02.raw.acs.fits"
savepath = Path.cwd() / "data"
savepath.mkdir(exist_ok=True) # Create the data directory if it does not exist.
filename = from_url(url, savepath)

Data Loading

Next, we use GBTFITSLoad to load the data, and then its summary method to inspect its contents.

sdfits = GBTFITSLoad(filename)
sdfits.summary()
SCAN OBJECT VELOCITY PROC PROCSEQN RESTFREQ DOPFREQ # IF # POL # INT # FEED AZIMUTH ELEVATION
220 3C286 0.0 OffOn 1 1.400000 1.400000 1 2 6 1 185.2806 82.0246
221 3C286 0.0 OffOn 2 1.400000 1.400000 1 2 6 1 187.2136 81.9980
222 3C286 0.0 OffOn 1 1.400000 1.400000 1 2 6 1 193.8331 81.8413
223 3C286 0.0 OffOn 2 1.400000 1.400000 1 2 6 1 195.6766 81.7788
224 3C286 0.0 OffOn 1 1.400000 1.400000 1 2 6 1 195.5182 80.2910
225 3C286 0.0 OffOn 2 1.400000 1.400000 1 2 5 1 199.9358 81.6005
226 3C286 0.0 OffOn 1 1.400000 1.400000 1 2 6 1 200.8333 80.0265
227 3C286 0.0 OffOn 2 1.400000 1.400000 1 2 6 1 205.9471 81.2609
228 B1328+254 0.0 OffOn 1 1.400000 1.400000 1 2 6 1 207.5257 73.9844
229 B1328+254 0.0 OffOn 2 1.400000 1.400000 1 2 6 1 210.9600 75.1584
230 B1345+125 0.0 OffOn 1 1.400000 1.400000 1 2 18 1 193.2738 62.0703
231 B1345+125 0.0 OffOn 2 1.400000 1.400000 1 2 18 1 195.6794 63.3244
244 B1345+125 0.0 OffOn 1 1.400000 1.400000 1 2 18 1 200.9543 60.9284
245 B1345+125 0.0 OffOn 2 1.400000 1.400000 1 2 18 1 203.5311 62.0587
246 B1345+125 0.0 OffOn 1 1.400000 1.400000 1 2 18 1 204.3408 60.3930
247 B1345+125 0.0 OffOn 2 1.400000 1.400000 1 2 18 1 206.9763 61.4655
248 B1345+125 0.0 OffOn 1 1.370000 1.370000 1 2 18 1 208.0408 59.6983
249 B1345+125 0.0 OffOn 2 1.370000 1.370000 1 2 18 1 210.7215 60.7060
250 B1345+125 0.0 Track 1 1.370000 1.370000 1 2 3 1 215.2430 59.6119
251 B1345+125 0.0 Track 1 1.370000 1.370000 1 2 3 1 215.9692 59.4175
263 U8091 213.0 OffOn 1 1.420405 1.420405 1 2 30 1 241.1339 50.6393
264 U8091 213.0 OffOn 2 1.420405 1.420405 1 2 30 1 240.9795 50.7395
265 U8091 213.0 Track 1 1.420405 1.420405 1 2 3 1 243.4807 49.8234
266 U8249 2541.0 OffOn 1 1.420405 1.420405 1 2 10 1 241.7065 50.2801
267 U8249 2541.0 OffOn 1 1.420405 1.420405 1 2 30 1 242.6892 49.6343
268 U8249 2541.0 OffOn 2 1.420405 1.420405 1 2 30 1 242.5451 49.7333
269 U8249 2541.0 Track 1 1.420405 1.420405 1 2 3 1 244.9435 48.0787
270 U8503 4676.0 OffOn 1 1.420405 1.420405 1 2 30 1 270.1741 60.2485
271 U8503 4676.0 OffOn 2 1.420405 1.420405 1 2 30 1 269.9396 60.5449
272 U8091 213.0 OffOn 1 1.420405 1.420405 1 2 30 1 253.5780 41.4420
273 U8091 213.0 OffOn 2 1.420405 1.420405 1 2 30 1 253.4586 41.5517
274 U8091 213.0 Track 1 1.420405 1.420405 1 2 3 1 255.5793 39.5541
275 U8249 2541.0 OffOn 1 1.420405 1.420405 1 2 30 1 266.5469 46.8377
276 U8249 2541.0 OffOn 2 1.420405 1.420405 1 2 30 1 266.3738 47.0388
277 U8249 2541.0 Track 1 1.420405 1.420405 1 2 3 1 267.9426 45.1863
278 U9965 4524.0 OffOn 1 1.420405 1.420405 1 2 30 1 221.5582 67.7182
279 U9965 4524.0 OffOn 2 1.420405 1.420405 1 2 30 1 221.2070 67.8070
280 U9965 4524.0 Track 1 1.420405 1.420405 1 2 3 1 222.9262 67.3657
281 U10351 891.0 OffOn 1 1.420405 1.420405 1 2 30 1 221.2955 77.4668
282 U10351 891.0 OffOn 2 1.420405 1.420405 1 2 30 1 220.4924 77.5922
283 U10351 891.0 Track 1 1.420405 1.420405 1 2 3 1 227.9032 76.2644
284 U9007 4618.0 OffOn 1 1.420405 1.420405 1 2 30 1 248.6632 38.3497
285 U9007 4618.0 OffOn 2 1.420405 1.420405 1 2 30 1 248.5549 38.4433
286 U9007 4618.0 Track 1 1.420405 1.420405 1 2 3 1 250.5574 36.6787
287 U9007 5257.0 OffOn 1 1.420405 1.420405 1 2 30 1 234.2956 48.0209
288 U9007 5257.0 OffOn 2 1.420405 1.420405 1 2 30 1 234.1669 48.0937
289 U9803 5257.0 OffOn 1 1.420405 1.420405 1 2 30 1 264.9610 57.9997
290 U9803 5257.0 OffOn 2 1.420405 1.420405 1 2 30 1 264.7276 58.2483
291 U9803 5257.0 Track 1 1.420405 1.420405 1 2 3 1 266.5221 56.2884
292 U10351 891.0 OffOn 1 1.420405 1.420405 1 2 30 1 252.1644 67.0847
293 U10351 891.0 OffOn 2 1.420405 1.420405 1 2 30 1 251.8134 67.3038
294 U10351 891.0 Track 1 1.420405 1.420405 1 2 3 1 255.3764 64.9184
295 U10629 2980.0 OffOn 1 1.420405 1.420405 1 2 30 1 233.9472 66.8551
296 U10629 2980.0 OffOn 2 1.420405 1.420405 1 2 30 1 233.5991 66.9846
297 U10629 2980.0 OffOn 1 1.420405 1.420405 1 2 30 1 238.3584 65.0614
298 U10629 2980.0 OffOn 2 1.420405 1.420405 1 2 30 1 238.0441 65.2006
299 U10629 2980.0 Track 1 1.420405 1.420405 1 2 3 1 241.4405 63.6131
300 U11017 4644.0 OffOn 1 1.420405 1.420405 1 2 30 1 235.4605 76.2074
301 U11017 4644.0 OffOn 2 1.420405 1.420405 1 2 30 1 234.7726 76.3879
302 U11017 4644.0 OffOn 1 1.420405 1.420405 1 2 30 1 241.5197 74.3514
303 U11017 4644.0 OffOn 2 1.420405 1.420405 1 2 30 1 240.9590 74.5471
304 U11017 4644.0 Track 1 1.420405 1.420405 1 2 3 1 242.6641 73.9425
305 U11017 4644.0 Track 1 1.420405 1.420405 1 2 1 1 242.9359 73.8408
306 U11017 4644.0 Track 1 1.420405 1.420405 1 2 3 1 246.0511 72.5738
307 U11461 3122.0 OffOn 1 1.420405 1.420405 1 2 30 1 168.0531 59.8975
308 U11461 3122.0 OffOn 2 1.420405 1.420405 1 2 30 1 167.8894 59.8843
309 U11461 3122.0 OffOn 1 1.420405 1.420405 1 2 30 1 173.4698 60.2443
310 U11461 3122.0 OffOn 2 1.420405 1.420405 1 2 30 1 173.3111 60.2376
311 U11461 3122.0 Track 1 1.420405 1.420405 1 2 3 1 178.1207 60.3802
312 U11578 4601.0 OffOn 1 1.420405 1.420405 1 2 30 1 156.3573 58.8036
313 U11578 4601.0 OffOn 2 1.420405 1.420405 1 2 30 1 156.1796 58.7731
314 U11578 4601.0 Track 1 1.420405 1.420405 1 2 3 1 160.4469 59.4464
315 U11578 4601.0 Track 1 1.420405 1.420405 1 2 3 1 161.0040 59.5231
316 U11627 4864.0 OffOn 1 1.420405 1.420405 1 2 30 1 157.8065 55.3372
317 U11627 4864.0 OffOn 2 1.420405 1.420405 1 2 30 1 157.6549 55.3107
318 U11627 4864.0 OffOn 1 1.420405 1.420405 1 2 30 1 162.4625 56.0655
319 U11627 4864.0 OffOn 2 1.420405 1.420405 1 2 30 1 162.3199 56.0465
320 U11627 4864.0 Track 1 1.420405 1.420405 1 2 3 1 166.7983 56.5802
321 U11992 3592.0 Track 1 1.420405 1.420405 1 2 3 1 124.4762 54.3242
322 U11992 3592.0 OffOn 1 1.420405 1.420405 1 2 30 1 125.6320 54.9128
323 U11992 3592.0 OffOn 2 1.420405 1.420405 1 2 30 1 125.4565 54.8251

Create a Selection Object for SDFITS Data

We will show how to select data using a Selection object. We start by creating the Selection object and putting it into a variable named selection_object.

selection_object = Selection(sdfits)

Using Selection

Now we show various ways in which the Selection object can be used to select data.

Select by Column Names

One way of selecting data is to specify a value for a column name. For example, we can select data with OBJECT="U8249" and polarization number 0 as follows.

selection_object.select(object="U8249", plnum=0)

We can view the contents of the selection using its show method.

selection_object.show()
 ID    TAG    OBJECT PLNUM # SELECTED
--- --------- ------ ----- ----------
  0 a56df05ca  U8249     0        152

This displays the selection as a table. In the background, each new selection rule is assigned an ID and a tag.

We can also choose the tag ourselves to give it a more meaningful name. In this case we select both polarizations.

selection_object.select(plnum=[0, 1], tag="plnums")
selection_object.show()
 ID    TAG    OBJECT PLNUM # SELECTED
--- --------- ------ ----- ----------
  0 a56df05ca  U8249     0        152
  1    plnums        [0,1]       3766
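The two rules above illustrate two kinds of column conditions: a single value that must match exactly, and a list of values where any one of them may match. The following is a minimal sketch of that matching logic in plain Python. It assumes these semantics and is not dysh's implementation. The helper row_matches and the rows are hypothetical.

```python
# Hypothetical helper, not part of dysh: evaluate a column-value rule on one row.
def row_matches(row, rule):
    """Return True if `row` (a dict of column -> value) satisfies every
    condition in `rule`. A list of values means "any of these values"."""
    for column, wanted in rule.items():
        value = row.get(column)
        if isinstance(wanted, (list, set)):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


# A few made-up rows standing in for the SDFITS index.
rows = [
    {"OBJECT": "U8249", "PLNUM": 0},
    {"OBJECT": "U8249", "PLNUM": 1},
    {"OBJECT": "U8091", "PLNUM": 0},
]

rule_a = {"OBJECT": "U8249", "PLNUM": 0}  # like select(object="U8249", plnum=0)
rule_b = {"PLNUM": [0, 1]}                # like select(plnum=[0, 1])

selected_a = [i for i, row in enumerate(rows) if row_matches(row, rule_a)]
selected_b = [i for i, row in enumerate(rows) if row_matches(row, rule_b)]
print(selected_a)  # [0]
print(selected_b)  # [0, 1, 2]
```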

Combining Selections

Once the Selection object holds several selection rules, we can combine them into a single selection through its final attribute, which returns a pandas.DataFrame.

selection_object.final
OBJECT BANDWID DATE-OBS DURATION EXPOSURE TSYS TDIM7 TUNIT7 CTYPE1 CRVAL1 ... SITELAT SITEELEV EXTNAME FITSINDEX UTC CHAN PROC OBSTYPE SUBOBSMODE INTNUM
0 U8249 12500000.0 2004-04-22T06:44:49.00 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408421e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 06:44:49.000 None OffOn PSWITCHOFF TPWCAL 0
1 U8249 12500000.0 2004-04-22T06:44:49.00 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408421e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 06:44:49.000 None OffOn PSWITCHOFF TPWCAL 0
2 U8249 12500000.0 2004-04-22T06:44:59.01 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408421e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 06:44:59.010 None OffOn PSWITCHOFF TPWCAL 1
3 U8249 12500000.0 2004-04-22T06:44:59.01 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408421e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 06:44:59.010 None OffOn PSWITCHOFF TPWCAL 1
4 U8249 12500000.0 2004-04-22T06:45:09.03 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408421e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 06:45:09.030 None OffOn PSWITCHOFF TPWCAL 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
147 U8249 12500000.0 2004-04-22T07:45:01.00 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408411e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 07:45:01.000 None Track NONE TPWCAL 0
148 U8249 12500000.0 2004-04-22T07:45:11.01 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408411e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 07:45:11.010 None Track NONE TPWCAL 1
149 U8249 12500000.0 2004-04-22T07:45:11.01 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408411e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 07:45:11.010 None Track NONE TPWCAL 1
150 U8249 12500000.0 2004-04-22T07:45:21.03 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408411e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 07:45:21.030 None Track NONE TPWCAL 2
151 U8249 12500000.0 2004-04-22T07:45:21.03 5.005 4.779488 1.0 (32768,1,1,1) Counts FREQ-OBS 1.408411e+09 ... 38.43312 824.595 SINGLE DISH 0 2004-04-22 07:45:21.030 None Track NONE TPWCAL 2

152 rows × 94 columns

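In this case the result has 152 rows. The rules are combined so that a row must satisfy all of them, and every row selected by rule 0 (OBJECT U8249, PLNUM 0) also satisfies rule 1 (PLNUM 0 or 1).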
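The row counts above (152 for rule 0, 3766 for rule 1, 152 in the final result) are consistent with a logical AND of the rules. Here is a minimal sketch of that combination as a set intersection of row indices. The helper combine_rules and the indices are hypothetical and only illustrate the idea; this is not dysh's code.

```python
# Hypothetical sketch: combine per-rule row selections into one final selection
# by intersecting them (logical AND). Row indices are made up for illustration.
def combine_rules(rule_rows):
    """rule_rows maps a rule ID to the set of row indices that rule selects.
    Returns the sorted indices selected by every rule, or None if there are
    no rules (meaning nothing is filtered)."""
    if not rule_rows:
        return None
    final = None
    for rows in rule_rows.values():
        final = set(rows) if final is None else final & rows
    return sorted(final)


rule_rows = {
    0: {4, 5, 6, 7},     # e.g. object="U8249", plnum=0
    1: set(range(10)),   # e.g. plnum=[0, 1], which matches every row here
}
print(combine_rules(rule_rows))  # [4, 5, 6, 7]
```

Because rule 1's rows contain all of rule 0's rows, the intersection equals rule 0's rows, just as in the tutorial.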

Remove Selections

Selection rules can be removed by ID or by tag. If several rules share the same tag, removing by that tag removes all of them.

selection_object.remove(id=0)
selection_object.remove(tag='plnums')
selection_object.show()
 ID TAG OBJECT PLNUM # SELECTED
--- --- ------ ----- ----------
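The bookkeeping behind removal can be pictured as a small table of rules, each with an ID and a tag. The following is a minimal sketch under that assumption. RuleTable is a hypothetical class, not dysh's Selection. Its clear() restarts IDs at 0, mirroring the tutorial, where the first rule added after clearing gets ID 0.

```python
# Hypothetical sketch of a rule table supporting removal by ID or by tag.
class RuleTable:
    def __init__(self):
        self.rules = []      # each entry: {"id": int, "tag": str, "rule": dict}
        self._next_id = 0

    def add(self, rule, tag):
        self.rules.append({"id": self._next_id, "tag": tag, "rule": rule})
        self._next_id += 1

    def remove(self, id=None, tag=None):
        # `id` mirrors the keyword used in the tutorial, e.g. remove(id=0).
        if (id is None) == (tag is None):
            raise ValueError("Specify exactly one of id or tag.")
        if id is not None:
            self.rules = [r for r in self.rules if r["id"] != id]
        else:
            # Every rule sharing this tag is removed, not just the first one.
            self.rules = [r for r in self.rules if r["tag"] != tag]

    def clear(self):
        self.rules = []
        self._next_id = 0


table = RuleTable()
table.add({"OBJECT": "U8249", "PLNUM": 0}, tag="a56df05ca")
table.add({"PLNUM": [0, 1]}, tag="plnums")
table.add({"PLNUM": [0]}, tag="plnums")
table.remove(tag="plnums")
print([r["id"] for r in table.rules])  # [0]
```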

To remove all selections at once, use Selection.clear:

selection_object.clear()
selection_object.show()
 ID TAG OBJECT BANDWID DATE-OBS ... PROC OBSTYPE SUBOBSMODE INTNUM # SELECTED
--- --- ------ ------- -------- ... ---- ------- ---------- ------ ----------

Select by Range

It is also possible to define a selection from a range of values. In this case the range must be given as either a list, [], or a tuple, (), containing a start and an end value. Lower limits are given by (value, None) or (value,). Upper limits are given by (None, value), since (,value) is not valid Python. Both () and [] can be used to indicate ranges, but only a tuple can be used for the (value,) lower-limit form. For coordinates the default unit is degrees; other units can be given explicitly.
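The range forms described above can be reduced to a (low, high) pair in which None means "unbounded". The following is a minimal sketch of that normalization and the resulting membership test. The helpers normalize_range and in_range are hypothetical, not dysh's parser, and they treat the endpoints as inclusive, which is an assumption.

```python
# Hypothetical sketch: normalize the range forms into (low, high) bounds.
def normalize_range(spec):
    """Turn (low,), (low, None), (None, high), or [low, high] into (low, high)."""
    if not isinstance(spec, (list, tuple)):
        raise TypeError("A range must be given as a list or a tuple.")
    if len(spec) == 1:
        if not isinstance(spec, tuple):
            raise ValueError("Only a tuple, such as (value,), can give a lower limit alone.")
        return (spec[0], None)
    if len(spec) == 2:
        return (spec[0], spec[1])
    raise ValueError("A range needs one or two values.")


def in_range(value, spec):
    """Check `value` against the range; endpoints are inclusive (an assumption)."""
    low, high = normalize_range(spec)
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


print(in_range(120.0, (114,)))     # True: above the lower limit
print(in_range(100.0, (114,)))     # False
print(in_range(75.0, [None, 80]))  # True: below the upper limit
print(in_range(82.0, [None, 80]))  # False
```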

For example, to select only rows where the right ascension is greater than 114 degrees, we would use

selection_object.select_range(ra=(114,))
selection_object.show()
 ID    TAG           CRVAL2       # SELECTED
--- --------- ------------------- ----------
  0 d2cefae02 [np.float64(114.0)]       3766

and to select rows where the elevation is below 80 degrees

selection_object.select_range(elevation=[None,80])
selection_object.show()
 ID    TAG           CRVAL2        ELEVATIO # SELECTED
--- --------- ------------------- --------- ----------
  0 d2cefae02 [np.float64(114.0)]                 3766
  1 8cc91cf46                     [None,80]       3582

We can check that the selections were applied properly by inspecting the final result and its ELEVATIO column.

It is also possible to use units in a selection. For example, to select declinations between 854 and 855 arcminutes:

selection_object.select_range(dec=[854, 855] * u.arcmin)
selection_object.show()
 ID    TAG           CRVAL2       ...  ELEVATIO # SELECTED
--- --------- ------------------- ... --------- ----------
  0 d2cefae02 [np.float64(114.0)] ...                 3766
  1 8cc91cf46                     ... [None,80]       3582
  2 fecefec57                     ...                  132

Selection keywords are case insensitive, so, for example, DeC is the same as dec. Note also that elevation is an alias for ELEVATIO, the actual SDFITS column name.

selection_object.select_range(eLEVaTIon=[None,80])
selection_object.show()
 ID    TAG           CRVAL2       ...  ELEVATIO # SELECTED
--- --------- ------------------- ... --------- ----------
  0 d2cefae02 [np.float64(114.0)] ...                 3766
  1 8cc91cf46                     ... [None,80]       3582
  2 fecefec57                     ...                  132
  3 f19b6ea7d                     ... [None,80]       3582

Notice that the selections with IDs 1 and 3 are identical. By default, Selection does not check for duplicate rules, which keeps it faster.
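If you want to avoid duplicate rules, you could compare a new rule against the existing ones before adding it. That extra pass over the rules is the cost that skipping the check avoids. The following is a minimal sketch. The helper is_duplicate is hypothetical, not a dysh option. It upper-cases keywords because selection keywords are case insensitive, but it does not resolve aliases such as ELEVATION to ELEVATIO.

```python
# Hypothetical sketch: detect whether an equivalent rule already exists.
def is_duplicate(new_rule, existing_rules):
    def canonical(rule):
        # Upper-case keywords, since selection keywords are case insensitive.
        return {key.upper(): value for key, value in rule.items()}

    target = canonical(new_rule)
    return any(canonical(rule) == target for rule in existing_rules)


existing = [{"ra": (114,)}, {"elevation": [None, 80]}]
print(is_duplicate({"eLEVaTIon": [None, 80]}, existing))  # True
print(is_duplicate({"elevation": [None, 70]}, existing))  # False
```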

Select Within a Range

It is also possible to make a selection by specifying a midpoint and a half-width. In this case we use select_within and pass the midpoint together with the ± range around it.

For example, to select elevations between 50-10 = 40 and 50+10 = 60 degrees, we would use

selection_object.select_within(eleVation=(50,10))
selection_object.show()
 ID    TAG           CRVAL2       ...  ELEVATIO # SELECTED
--- --------- ------------------- ... --------- ----------
  0 d2cefae02 [np.float64(114.0)] ...                 3766
  1 8cc91cf46                     ... [None,80]       3582
  2 fecefec57                     ...                  132
  3 f19b6ea7d                     ... [None,80]       3582
  4 e198fc276                     ...   [40,60]       1694

The new rule (ID 4) selects elevations between 40 and 60 degrees.
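The (midpoint, half-width) pair given to select_within maps directly onto an ordinary range. The following is a minimal sketch of that conversion. The helper within_to_range is hypothetical and illustrates the arithmetic only.

```python
# Hypothetical helper: convert a (midpoint, half-width) pair into [low, high].
def within_to_range(midpoint, half_width):
    if half_width < 0:
        raise ValueError("The half-width must be non-negative.")
    return [midpoint - half_width, midpoint + half_width]


print(within_to_range(50, 10))  # [40, 60]
```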

Using Aliases

Selection knows about certain aliases for column names. For example, the SDFITS column ELEVATIO can also be selected using ELEVATION. The aliases are defined in the aliases attribute of Selection.

selection_object.aliases
{'FREQ': 'CRVAL1',
 'RA': 'CRVAL2',
 'DEC': 'CRVAL3',
 'GLON': 'CRVAL2',
 'GLAT': 'CRVAL3',
 'GALLON': 'CRVAL2',
 'GALLAT': 'CRVAL3',
 'ELEVATION': 'ELEVATIO',
 'SOURCE': 'OBJECT',
 'POL': 'PLNUM',
 'SUBREF': 'SUBREF_STATE'}

It is also possible to add your own aliases. For example, to use target and az as aliases for OBJECT and AZIMUTH, we would use

selection_object.alias({'target':'object','az':'azimuth'})
selection_object.aliases
{'FREQ': 'CRVAL1',
 'RA': 'CRVAL2',
 'DEC': 'CRVAL3',
 'GLON': 'CRVAL2',
 'GLAT': 'CRVAL3',
 'GALLON': 'CRVAL2',
 'GALLAT': 'CRVAL3',
 'ELEVATION': 'ELEVATIO',
 'SOURCE': 'OBJECT',
 'POL': 'PLNUM',
 'SUBREF': 'SUBREF_STATE',
 'TARGET': 'OBJECT',
 'AZ': 'AZIMUTH'}
selection_object.select(target="U8249")
selection_object.show()
 ID    TAG    OBJECT ...  ELEVATIO # SELECTED
--- --------- ------ ... --------- ----------
  0 d2cefae02        ...                 3766
  1 8cc91cf46        ... [None,80]       3582
  2 fecefec57        ...                  132
  3 f19b6ea7d        ... [None,80]       3582
  4 e198fc276        ...   [40,60]       1694
  5 c2d095a81  U8249 ...                  304

Notice that this will only affect the aliases for this particular instance of a Selection. Any new Selection objects will not know about these aliases.
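The behavior described above, with case-insensitive keywords and aliases added to one instance that do not appear in new ones, can be sketched by giving each instance its own copy of a default alias table. The following is a minimal sketch under that assumption. AliasResolver is a hypothetical class, not dysh's code, and DEFAULT_ALIASES copies only a few of the entries listed above.

```python
# Hypothetical sketch of case-insensitive keyword resolution with per-instance aliases.
DEFAULT_ALIASES = {
    "RA": "CRVAL2",
    "DEC": "CRVAL3",
    "ELEVATION": "ELEVATIO",
    "SOURCE": "OBJECT",
    "POL": "PLNUM",
}


class AliasResolver:
    def __init__(self):
        # Copy the defaults so aliases added to one instance do not leak into others.
        self.aliases = dict(DEFAULT_ALIASES)

    def alias(self, mapping):
        # Store both sides upper-cased, like the alias table shown above.
        for name, column in mapping.items():
            self.aliases[name.upper()] = column.upper()

    def resolve(self, keyword):
        key = keyword.upper()  # keywords are case insensitive
        return self.aliases.get(key, key)


first = AliasResolver()
first.alias({"target": "object", "az": "azimuth"})
print(first.resolve("eLEVaTIon"))         # ELEVATIO
print(first.resolve("Target"))            # OBJECT
print(AliasResolver().resolve("target"))  # TARGET (no alias in a new instance)
```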

Selection(sdfits).aliases
{'FREQ': 'CRVAL1',
 'RA': 'CRVAL2',
 'DEC': 'CRVAL3',
 'GLON': 'CRVAL2',
 'GLAT': 'CRVAL3',
 'GALLON': 'CRVAL2',
 'GALLAT': 'CRVAL3',
 'ELEVATION': 'ELEVATIO',
 'SOURCE': 'OBJECT',
 'POL': 'PLNUM',
 'SUBREF': 'SUBREF_STATE'}

Empty Selections

Any selection that results in no data being selected is ignored. You will get a warning message in this case.

selection_object.select(target='foobar')
/home/docs/checkouts/readthedocs.org/user_builds/dysh/checkouts/release-0.11.5/src/dysh/util/selection.py:647: UserWarning: Your selection rule resulted in no data being selected. Ignoring.
  warnings.warn("Your selection rule resulted in no data being selected. Ignoring.")  # noqa: B028
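This "warn and ignore" behavior can be sketched with Python's standard warnings module. The following is a minimal sketch. The helper add_rule is hypothetical and reuses the warning text shown above only for illustration; it is not dysh's code.

```python
import warnings


# Hypothetical helper: keep a rule only if it selects at least one row.
def add_rule(rules, rule, selected_rows):
    if len(selected_rows) == 0:
        warnings.warn(
            "Your selection rule resulted in no data being selected. Ignoring.",
            UserWarning,
            stacklevel=2,
        )
        return False
    rules.append(rule)
    return True


rules = []
add_rule(rules, {"OBJECT": "foobar"}, [])      # warns, rule not added
add_rule(rules, {"OBJECT": "U8249"}, [0, 1])   # rule added
print(len(rules))  # 1
```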

Time Selections

UTC time ranges can be selected with astropy Time objects, which are compared against the UTC timestamp column. For LST, use select_range(lst=[number1, number2]).

selection_object.select_range(utc=(Time("2004-04-22T06:08:05", scale="utc"),
                                   Time("2004-04-22T06:08:26", scale="utc")))
selection_object.show()
 ID    TAG    ... # SELECTED
--- --------- ... ----------
  0 d2cefae02 ...       3766
  1 8cc91cf46 ...       3582
  2 fecefec57 ...        132
  3 f19b6ea7d ...       3582
  4 e198fc276 ...       1694
  5 c2d095a81 ...        304
  6 15ce594d4 ...         12
selection_object.final["UTC"]
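Series([], Name: UTC, dtype: datetime64[ns])

The series is empty. The new rule (ID 6) selects 12 rows on its own, but final combines all the rules, and none of those 12 rows also satisfies every other rule.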
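A time-range rule works like any other range, except that the bounds are timestamps. It can select rows on its own and still leave nothing once it is combined with the other rules. The following is a minimal sketch of both effects. It uses the standard library's datetime in place of astropy Time, and select_utc and the timestamps are hypothetical. The bounds are treated as inclusive, which is an assumption.

```python
from datetime import datetime


# Hypothetical helper: indices of timestamps within [start, end].
def select_utc(timestamps, start, end):
    return [i for i, t in enumerate(timestamps) if start <= t <= end]


# Made-up timestamps, 10 s apart, for illustration only.
utc = [datetime(2004, 4, 22, 6, 8, s) for s in (0, 10, 20, 30)]
start = datetime(2004, 4, 22, 6, 8, 5)
end = datetime(2004, 4, 22, 6, 8, 26)

time_rows = set(select_utc(utc, start, end))
print(sorted(time_rows))  # [1, 2]

# Intersecting with another rule that selects rows 0 and 3 leaves nothing,
# which is how a combined selection can come out empty.
other_rule_rows = {0, 3}
print(sorted(time_rows & other_rule_rows))  # []
```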

Channel Selection

To select channels there is a special method, Selection.select_channel. Channels can be given as ranges, individual channels, or combinations thereof. Note that selecting channels does not reduce the number of selected rows: the channel rule below still selects all 3766 rows.

a = [1, 4, (30, 40)]
selection_object.select_channel(a)
selection_object.show()
 ID    TAG    OBJECT ...      CHAN     # SELECTED
--- --------- ------ ... ------------- ----------
  0 d2cefae02        ...                     3766
  1 8cc91cf46        ...                     3582
  2 fecefec57        ...                      132
  3 f19b6ea7d        ...                     3582
  4 e198fc276        ...                     1694
  5 c2d095a81  U8249 ...                      304
  6 15ce594d4        ...                       12
  7 96ab2cec1        ... [1,4,(30,40)]       3766

Note that you can only have one channel selection rule at a time.

try: 
    selection_object.select_channel([60,70])
except Exception as e:
    print(e)
You can only have one channel selection rule. Remove the old rule before creating a new one.
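Following the example [1, 4, (30, 40)], a channel specification can be read as a mix of single channels and (start, stop) ranges. The following is a minimal sketch that expands such a specification and enforces the one-rule limit. The helpers expand_channels and ChannelRule are hypothetical. Treating ranges as inclusive, and top-level numbers as single channels, are assumptions rather than documented dysh behavior.

```python
# Hypothetical sketch: expand a channel specification into channel numbers.
def expand_channels(spec):
    channels = set()
    for item in spec:
        if isinstance(item, (list, tuple)):
            start, stop = item
            channels.update(range(start, stop + 1))  # inclusive range (assumption)
        else:
            channels.add(item)
    return sorted(channels)


class ChannelRule:
    """Holds at most one channel selection at a time."""

    def __init__(self):
        self.channels = None

    def select_channel(self, spec):
        if self.channels is not None:
            raise ValueError("Only one channel selection rule is allowed; remove it first.")
        self.channels = expand_channels(spec)

    def remove(self):
        self.channels = None


rule = ChannelRule()
rule.select_channel([1, 4, (30, 40)])
print(rule.channels[:4], len(rule.channels))  # [1, 4, 30, 31] 13
try:
    rule.select_channel([60, 70])
except ValueError as e:
    print(e)
```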

Applying Selections to Your Data

So far we have seen how to create and manage selections, but always with a separate Selection object, which does not change the data. All of the methods shown above are also available through a GBTFITSLoad object. For example, to list the selections we would use GBTFITSLoad.selection.show(), to clear them GBTFITSLoad.selection.clear(), and to select a range GBTFITSLoad.select_range().

To show the effect, we first call gettp with only the basic required selection of ifnum, plnum, and fdnum.

tp_all = sdfits.gettp(ifnum=0, plnum=0, fdnum=0)
len(tp_all)
81

That is, all 81 scans were selected.

Now we add a selection rule, keeping only elevations below 80 degrees, and show the selection.

sdfits.select_range(eLEVaTIon=[None,80])
sdfits.selection.show()
 ID    TAG     ELEVATIO # SELECTED
--- --------- --------- ----------
  0 917ec8542 [None,80]       3582

Now repeat the gettp call and notice the difference.

tp_selection = sdfits.gettp(ifnum=0, plnum=0, fdnum=0)
len(tp_selection)
74

Now only 74 scans are selected, the ones that have an elevation below 80 degrees.