Commit 7647830

Merge pull request #367 from jianyangli/add_dataclass_merge_202209
Enable combining `DataClass` objects
2 parents a6619c7 + 74d8edc commit 7647830

4 files changed

+295
-35
lines changed

CHANGES.rst

Lines changed: 3 additions & 0 deletions
@@ -22,6 +22,9 @@ sbpy.data
2222
- Added ``DataClass.__contains__`` to enable `in` operator for ``DataClass``
2323
objects. [#357]
2424

25+
- Added ``DataClass.add_row``, ``DataClass.vstack``
26+
methods. [#367]
27+
2528

2629
sbpy.photometry
2730
^^^^^^^^^^^^^^^
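
The changelog entry above adds `DataClass.add_row`, which accepts either a mapping or a plain sequence. The following is a minimal plain-Python sketch of that name/value bookkeeping only, assuming the behavior shown in the `sbpy/data/core.py` diff of this commit. `normalize_row` is a hypothetical helper, not part of sbpy. The length check is an added assumption, and unit and `Time` handling are left out.

```python
from collections.abc import Mapping


def normalize_row(vals, names=None, existing_names=()):
    """Return (names, values) for a new row.

    Hypothetical helper: mirrors only the name/value bookkeeping of
    ``DataClass.add_row`` in this commit, not unit or Time handling.
    """
    if isinstance(vals, Mapping):
        keys = list(vals.keys())
        values = [vals[k] for k in keys]
        # Explicitly passed ``names`` take precedence over the mapping's keys.
        if names is None:
            names = keys
    else:
        values = list(vals)
        # Without ``names``, fall back to the existing columns and their order.
        if names is None:
            names = list(existing_names)
    # Assumption of this sketch: reject mismatched lengths early.
    if len(names) != len(values):
        raise ValueError("names and values differ in length")
    return list(names), values


# A mapping carries its own column names ...
print(normalize_row({"rh": 4, "delta": 4, "phase": 15}))
# ... while a plain sequence is matched to the existing columns by position.
print(normalize_row((10.25, -12.39), existing_names=["ra", "dec"]))
```

Because a mapping brings its own names, it can introduce a column (such as `phase`) that the existing table lacks. A bare sequence can only fill the existing columns by position.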

docs/sbpy/data/dataclass.rst

Lines changed: 45 additions & 23 deletions
@@ -314,8 +314,8 @@ object, you can use `~sbpy.data.DataClass.field_names`:
314314
['ra', 'dec', 't']
315315

316316
You can also use the `in` operator to check if a field is contained in
317-
a `~sbpy.data.DataClass` object. Alternative field names can also be
318-
used for the `in` test:
317+
a `~sbpy.data.DataClass` object. Alternative field names can be used
318+
for the `in` test:
319319

320320
>>> 'ra' in obs
321321
True
@@ -411,21 +411,17 @@ directly addressing them:
411411
>>> obs['ra']
412412
<Quantity [10.323423, 10.333453, 10.343452] deg>
413413

414-
More complex data table modifications are possible by directly
415-
accessing the underlying `~astropy.table.QTable` object as shown below.
416-
417-
`~sbpy.data.DataClass` provides a direct interface to the table
418-
modification functions provided by `~astropy.table.Table`:
419-
`~astropy.table.Table.add_row`, `~astropy.table.Table.add_column`,
420-
`~astropy.table.Table.add_columns`, etc. For instance, it is trivial to add
421-
additional rows and columns to these objects.
414+
The basic functionalities to modify the data table are implemented in
415+
`~sbpy.data.DataClass`, including adding rows and columns and stacking a
416+
DataClass with another DataClass object or an `~astropy.table.Table`
417+
object.
422418

423419
Let's assume you want to add some more observations to your ``obs``
424420
object:
425421

426422
.. doctest-requires:: astropy>=5
427423

428-
>>> obs.table.add_row([10.255460 * u.deg, -12.39460 * u.deg, 2451523.94653 * u.d])
424+
>>> obs.add_row([10.255460 * u.deg, -12.39460 * u.deg, 2451523.94653 * u.d])
429425
>>> obs
430426
<QTable length=4>
431427
ra dec t
@@ -442,13 +438,12 @@ or if you want to add a column to your object:
442438

443439
.. doctest-requires:: astropy>=5
444440

445-
>>> from astropy.table import Column
446-
>>> obs.table.add_column(Column(['V', 'V', 'R', 'i'], name='filter'))
441+
>>> obs.apply(['V', 'V', 'R', 'i'], name='filter')
447442
>>> obs
448443
<QTable length=4>
449444
ra dec t filter
450445
deg deg
451-
float64 float64 Time str1
446+
float64 float64 Time str32
452447
--------- --------- ------------- ------
453448
10.323423 -12.42123 2451523.6234 V
454449
10.333453 -12.41562 2451523.7345 V
@@ -464,7 +459,7 @@ The same result can be achieved using the following syntax:
464459
<QTable length=4>
465460
ra dec t filter filter2
466461
deg deg
467-
float64 float64 Time str1 str1
462+
float64 float64 Time str32 str1
468463
--------- --------- ------------- ------ -------
469464
10.323423 -12.42123 2451523.6234 V V
470465
10.333453 -12.41562 2451523.7345 V V
@@ -477,16 +472,43 @@ Similarly, existing columns can be modified using:
477472

478473
>>> obs['filter'] = ['g', 'i', 'R', 'V']
479474

480-
Note how the `~astropy.table.Table.add_column` and
481-
`~astropy.table.Table.add_row` functions are called from
482-
``obs.table``. `~sbpy.data.DataClass.table` is a property that exposes
483-
the underlying `~astropy.table.QTable` object so that the user can
484-
directly interact with it. Please refer to the `~astropy.table.Table`
485-
reference and
486-
[documentation](https://docs.astropy.org/en/stable/table/index.html)
487-
for more information on how to modify `~astropy.table.QTable` objects.
475+
If you want to stack two observations into a single object:
488476

477+
.. doctest-requires:: astropy>=5
489478

479+
>>> ra = [20.223423, 20.233453, 20.243452] * u.deg
480+
>>> dec = [12.42123, 12.41562, 12.40435] * u.deg
481+
>>> phase = [10.1, 12.3, 15.6] * u.deg
482+
>>> epoch = Time(2451623.5 + array([0.1234, 0.2345, 0.3525]), format='jd')
483+
>>> obs2 = Obs.from_columns([ra, dec, epoch, phase],
484+
... names=['ra', 'dec', 't', 'phase'])
485+
>>>
486+
>>> obs.vstack(obs2)
487+
>>> obs
488+
<QTable length=7>
489+
ra dec t filter filter2 phase
490+
deg deg deg
491+
float64 float64 Time str1 str1 float64
492+
--------- --------- ------------- ------ ------- -------
493+
10.323423 -12.42123 2451523.6234 g V ———
494+
10.333453 -12.41562 2451523.7345 i V ———
495+
10.343452 -12.40435 2451523.8525 R R ———
496+
10.25546 -12.3946 2451523.94653 V i ———
497+
20.223423 12.42123 2451623.6234 -- -- 10.1
498+
20.233453 12.41562 2451623.7345 -- -- 12.3
499+
20.243452 12.40435 2451623.8525 -- -- 15.6
500+
501+
Note that the data table to be stacked doesn't have to have the same
502+
columns as the original data table. A keyword `join_type` is used to
503+
decide how to process the different sets of columns. See
504+
`~astropy.table.vstack` for more details.
505+
506+
Because the underlying `~astropy.table.QTable` is exposed by the
507+
`~sbpy.data.DataClass.table` property, it is possible to modify the data
508+
table by directly accessing the underlying `~astropy.table.QTable` object.
509+
However, this is not generally advised. You should use the mechanisms provided
510+
by `~sbpy.data.DataClass` to manipulate the data table as much as possible
511+
to maintain the integrity of the data table.
490512

491513
Additional Data Container Concepts
492514
==================================
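
The documentation change above says that `join_type` decides how differing column sets are handled when stacking. As a rough illustration of the three options `astropy.table.vstack` accepts (`'outer'`, `'inner'`, `'exact'`), here is a plain-Python sketch that operates on lists of dicts. `stack_rows` is a hypothetical stand-in, not the astropy implementation. It fills gaps with `None`, where astropy masks the missing entries, and raises a plain `ValueError` for mismatched columns.

```python
def stack_rows(first, second, join_type="outer", fill=None):
    """Stack two tables given as lists of dicts (hypothetical helper)."""
    cols_first = list(first[0]) if first else []
    cols_second = list(second[0]) if second else []
    if join_type == "outer":
        # Union of the columns, in first-seen order.
        columns = cols_first + [c for c in cols_second if c not in cols_first]
    elif join_type == "inner":
        # Only the columns that both tables share.
        columns = [c for c in cols_first if c in cols_second]
    elif join_type == "exact":
        # Both tables must have exactly the same column names.
        if set(cols_first) != set(cols_second):
            raise ValueError("column sets differ")
        columns = cols_first
    else:
        raise ValueError(f"unknown join_type: {join_type!r}")
    return [{c: row.get(c, fill) for c in columns} for row in first + second]


obs = [{"ra": 10.3, "filter": "g"}]
obs2 = [{"ra": 20.2, "phase": 10.1}]

outer = stack_rows(obs, obs2)           # keeps ra, filter, phase; gaps filled
inner = stack_rows(obs, obs2, "inner")  # keeps only ra
print(outer)
print(inner)
```

The `'outer'` behavior corresponds to the stacked table in the doctest above, where rows from `obs` have no `phase` value and rows from `obs2` have no `filter` values.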

sbpy/data/core.py

Lines changed: 128 additions & 11 deletions
@@ -7,9 +7,10 @@
77
created on June 22, 2017
88
"""
99

10+
from collections.abc import Mapping
1011
from copy import deepcopy
1112
from numpy import ndarray, array, hstack, iterable
12-
from astropy.table import QTable, Column
13+
from astropy.table import QTable, Table, Column, Row, vstack
1314
from astropy.time import Time
1415
from astropy.coordinates import Angle
1516
import astropy.units as u
@@ -661,32 +662,35 @@ def __contains__(self, value):
661662
else:
662663
return False
663664

664-
def _translate_columns(self, target_colnames):
665+
def _translate_columns(self, target_colnames, ignore_missing=False):
665666
"""Translate target_colnames to the corresponding column names
666667
present in this object's table. Returns a list of actual column
667668
names present in this object that corresponds to target_colnames
668-
(order is preserved). Raises KeyError if not all columns are
669-
present or one or more columns could not be translated.
669+
(order is preserved). If `ignore_missing == False` (default),
670+
raises a `KeyError` if a match cannot be found for an input column
671+
name (neither in this object nor defined in `Conf.fieldnames`).
672+
If `ignore_missing == True`, then the problematic column name will
673+
be silently carried over and returned.
670674
"""
671675

672676
if not isinstance(target_colnames, (list, ndarray, tuple)):
673677
target_colnames = [target_colnames]
674678

675679
translated_colnames = deepcopy(target_colnames)
676680
for idx, colname in enumerate(target_colnames):
677-
# colname is already a column name in self.table
678-
if colname in self.field_names:
679-
continue
680-
# colname is an alternative column name
681-
else:
681+
if colname not in self.field_names:
682+
# colname not already in self.table
682683
for alt in Conf.fieldnames[
683684
Conf.fieldname_idx.get(colname, slice(0))]:
685+
# defined in `Conf.fieldnames`
684686
if alt in self.field_names:
685687
translated_colnames[idx] = alt
686688
break
687689
else:
688-
raise KeyError('field "{:s}" not available.'.format(
689-
colname))
690+
# undefined colname
691+
if not ignore_missing:
692+
raise KeyError('field "{:s}" not available.'.format(
693+
colname))
690694

691695
return translated_colnames
692696

@@ -934,3 +938,116 @@ def verify_fields(self, field=None):
934938
):
935939
raise FieldError('Field {} does not have units of {}'
936940
.format(test_field, str(dim.unit)))
941+
942+
def add_row(self, vals, names=None, units=None):
943+
"""Add a new row to the end of DataClass.
944+
945+
This is similar to `astropy.table.Table.add_row`, but allows for
946+
a set of different columns in the new row from the original DataClass
947+
object. It also allows for aliases of column names.
948+
949+
Parameters
950+
----------
951+
vals : `~astropy.table.Row`, tuple, list, dict
952+
Row to be added
953+
names : iterable of strings, optional
954+
The names of columns if not implicitly specified in ``vals``.
955+
Takes precedence over the column names in ``vals`` if any.
956+
units : str or list-like, optional
957+
Unit labels (as provided by `~astropy.units.Unit`) in which
958+
the data provided in ``vals`` will be stored in the underlying
959+
table. If None, the units as provided by ``vals``
960+
are used. If the units provided in ``units`` differ from those
961+
used in ``vals``, ``vals`` will be transformed to the units
962+
provided in ``units``. Must have the same length as ``names``
963+
and ``vals``. Default: None
964+
965+
Notes
966+
-----
967+
If a time is included in ``vals``, it can either be an explicit
968+
`~astropy.time.Time` object, or a number, `~astropy.units.Quantity`
969+
object, or string that can be inferred to be a time by the existing
970+
column of the same name or by its position in the sequence. In
971+
this case, the type of time values must be valid to initialize
972+
an `~astropy.time.Time` object with format='jd' or 'isot', and
973+
the scale of time is default to the scale of the corresponding
974+
existing column of time.
975+
976+
Examples
977+
--------
978+
>>> import astropy.units as u
979+
>>> from sbpy.data import DataClass
980+
>>>
981+
>>> data = DataClass.from_dict(
982+
... {'rh': [1, 2, 3] * u.au, 'delta': [1, 2, 3] * u.au})
983+
>>> row = {'rh': 4 * u.au, 'delta': 4 * u.au, 'phase': 15 * u.deg}
984+
>>> data.add_row(row)
985+
"""
986+
if isinstance(vals, Row):
987+
vals = DataClass.from_table(vals)
988+
else:
989+
if isinstance(vals, Mapping):
990+
keys_list = list(vals.keys())
991+
vals_list = [vals[k] for k in keys_list]
992+
vals = vals_list
993+
if names is None:
994+
names = keys_list
995+
else:
996+
# assume it's an iterable that can be taken as columns
997+
if names is None:
998+
# if names of columns are not specified, default to the
999+
# existing names and orders
1000+
names = self.field_names
1001+
# check if any astropy Time columns
1002+
for i, k in enumerate(names):
1003+
if k in self and isinstance(self[k], Time):
1004+
vals[i] = Time(vals[i], scale=self[k].scale,
1005+
format='isot' if isinstance(vals[i], str)
1006+
else 'jd')
1007+
vals = DataClass.from_rows(vals, names, units=units)
1008+
self.vstack(vals)
1009+
1010+
def vstack(self, data, **kwargs):
1011+
"""Stack another DataClass object to the end of DataClass
1012+
1013+
Similar to `~astropy.table.Table.vstack`, the DataClass object
1014+
to be stacked doesn't have to have the same set of columns as
1015+
the existing object. The `join_type` keyword parameter will be
1016+
used to decide how to process the different sets of columns.
1017+
1018+
Joining will be in-place.
1019+
1020+
Parameters
1021+
----------
1022+
data : `~sbpy.data.DataClass`, dict, `~astropy.table.Table`
1023+
Object to be joined with the current object
1024+
kwargs : dict
1025+
Keyword parameters accepted by `~astropy.table.Table.vstack`.
1026+
1027+
Examples
1028+
--------
1029+
>>> import astropy.units as u
1030+
>>> from sbpy.data import DataClass
1031+
>>>
1032+
>>> data1 = DataClass.from_dict(
1033+
... {'rh': [1, 2, 3] * u.au, 'delta': [1, 2, 3] * u.au})
1034+
>>> data2 = DataClass.from_dict(
1035+
... {'rh': [4, 5] * u.au, 'phase': [15, 15] * u.deg})
1036+
>>> data1.vstack(data2)
1037+
"""
1038+
# check and process input data
1039+
if isinstance(data, dict):
1040+
data = DataClass.from_dict(data)
1041+
elif isinstance(data, Table):
1042+
data = DataClass.from_table(data)
1043+
if not isinstance(data, DataClass):
1044+
raise ValueError('DataClass, dict, or astropy.table.Table are '
1045+
'expected, but {} is received.'.
1046+
format(type(data)))
1047+
1048+
# adjust input column names for aliases
1049+
alt = self._translate_columns(data.field_names, ignore_missing=True)
1050+
data.table.rename_columns(data.field_names, alt)
1051+
1052+
# join with the input table
1053+
self.table = vstack([self.table, data.table], **kwargs)
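
Before stacking, `vstack` renames the incoming columns through `_translate_columns(..., ignore_missing=True)`, so aliased names line up and unknown names pass through as new columns. The sketch below reproduces that lookup in plain Python. The alias groups in `FIELDNAMES` are hypothetical examples standing in for `Conf.fieldnames`, and `translate_columns` is an illustrative function, not sbpy's method.

```python
# Hypothetical alias groups; sbpy keeps its real list in ``Conf.fieldnames``.
FIELDNAMES = [
    ["ra", "RA"],
    ["dec", "DEC", "Dec"],
    ["epoch", "t", "jd"],
]
# Map every known name to the index of its alias group.
FIELDNAME_IDX = {name: i for i, group in enumerate(FIELDNAMES) for name in group}


def translate_columns(present, targets, ignore_missing=False):
    """Translate ``targets`` to the names actually found in ``present``."""
    translated = list(targets)
    for idx, name in enumerate(targets):
        if name in present:
            continue  # already a column name, nothing to translate
        group = FIELDNAMES[FIELDNAME_IDX[name]] if name in FIELDNAME_IDX else []
        for alt in group:
            if alt in present:
                translated[idx] = alt
                break
        else:
            # No alias matched: raise, or carry the name over unchanged.
            if not ignore_missing:
                raise KeyError(f'field "{name}" not available.')
    return translated


present = ["ra", "dec", "t"]
print(translate_columns(present, ["RA", "epoch", "phase"], ignore_missing=True))
# ['ra', 't', 'phase']
```

With `ignore_missing=False` the same call would raise `KeyError` for `phase`, which is the stricter behavior the method had before this commit.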
