YTEP-0027: Non-Spatial Data¶
Created: December 1, 2015
Author: Matthew Turk, Nathan Goldbaum, John ZuHone
This YTEP outlines a plan to implement support for native non-spatial data representations in yt.
Currently, most of yt assumes that its data structures (particularly for
purposes of selection and units) are related to spatial coordinates. This
leads to issues such as spherical and cylindrical coordinates believing their
angular coordinates are in code_length, having to pretend that pressure is in
code_length, and so on.
An additional complication is that at present, index operations (particularly
in selection operations) cannot know in advance that their input arrays are in
“index space.” This leads to costly operations that check the units (which are
assumed to be code_length) and convert them if need be. It is often very
difficult to create a situation where the arrays are not in those units.
Fortunately, there are very few places where the arrays used to index the dataset are utilized directly; for the most part, they are manually stripped of units, and the correct units are then re-applied in classes such as the spherical coordinates handler.
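The check-convert-strip-reapply pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not yt's code: Quantity, TO_CODE_LENGTH, ensure_code_length, and select_inside are hypothetical names, and the conversion factors are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical conversion factors; here 1 code_length is assumed to equal 1 m.
TO_CODE_LENGTH = {"code_length": 1.0, "m": 1.0, "cm": 0.01}


@dataclass(frozen=True)
class Quantity:
    """Toy stand-in for a unit-carrying array (not yt's YTArray)."""
    values: tuple
    units: str


def ensure_code_length(q):
    """Check the units on every call and convert only if needed."""
    if q.units == "code_length":
        return q
    # Non-length units (for example pressure) have no entry and raise KeyError.
    factor = TO_CODE_LENGTH[q.units]
    return Quantity(tuple(v * factor for v in q.values), "code_length")


def select_inside(points, left, right):
    """Return the points inside [left, right), compared as raw floats."""
    lo = ensure_code_length(left).values
    hi = ensure_code_length(right).values
    selected = []
    for p in points:
        raw = ensure_code_length(p).values  # the unit check is repeated per point
        if all(l <= x < h for l, x, h in zip(lo, raw, hi)):
            selected.append(Quantity(raw, "code_length"))  # units re-applied
    return selected
```

Every call pays for a unit check even when the input is already in code_length, and an axis such as pressure cannot pass through this path at all.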
This YTEP concerns itself with a few things:
- Allowing datasets to be loaded that are indexed in non-spatial dimensions (for instance, lat, lon, pressure)
- Developing unitful coordinate systems for these non-spatial datasets
- Implementing a custom coordinate handler
Why is this hard?¶
There are assumptions made in a number of places that data is spatial. Often this shows up in one of these ways:
- Calls to ensure_code or explicit conversions to length units
- Assumptions that a set of units can be represented as a form of length, for instance during integration.
- Inhomogeneous units in a single YTArray are not supported in the current development tip of yt. Some behavior can be mocked up using object arrays, but this is incredibly unreliable.
Implementation of Index Arrays¶
To address this issue, an object intended explicitly for indexing data has
been implemented, currently called IndexArray. This object is based on
YTArray, but differs in some crucial ways.
- Multiple units may be specified. The number of units must match the length of the final axis of the array.
- The units in an array are immutable. To change units, the array must be copied. Practically, this means that convert_to_units will raise an exception, but it brings with it the benefit that it is difficult to find oneself in a situation where something like domain_left_edge is not in the native units of the indexing system.
- Fancy-indexing is not possible; only slicing can be conducted.
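The three rules above can be illustrated with a small pure-Python sketch. ToyIndexArray is a hypothetical class written for this example, not the IndexArray implementation in yt; in particular, rejecting plain integer indices as well as fancy indices is an assumption of the sketch.

```python
class ToyIndexArray:
    """Hypothetical sketch of the IndexArray rules; not yt's implementation."""

    def __init__(self, rows, units):
        rows = tuple(tuple(float(x) for x in row) for row in rows)
        units = tuple(units)
        # Rule 1: one unit per entry along the final axis.
        if any(len(row) != len(units) for row in rows):
            raise ValueError("need exactly one unit per entry of the final axis")
        self._rows = rows
        self._units = units

    @property
    def units(self):
        # Rule 2: read-only, because no setter is defined.
        return self._units

    def convert_to_units(self, units):
        # Rule 2: in-place conversion is refused; a copy must be made instead.
        raise RuntimeError("units are immutable; copy the array instead")

    def __getitem__(self, key):
        # Rule 3: only slices along the first axis are accepted.
        if not isinstance(key, slice):
            raise IndexError("only slicing is supported")
        return ToyIndexArray(self._rows[key], self._units)

    def __len__(self):
        return len(self._rows)


# Two corner points of a domain indexed by (length, mass, time).
edges = ToyIndexArray([[0.0, 0.0, 0.4], [10.0, 30.0, 0.9]], ("m", "g", "s"))
left_edge = edges[0:1]  # slicing keeps the per-axis units
```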
These arrays are almost always assumed to be created internally within yt.
Some situations, such as specifying a “center” for an object, can accept
user-supplied arrays.
Implementation of Coordinates¶
For inhomogeneous units to be useful, there must be a mechanism for specifying
the units to a coordinate handler. The implementation of a
CustomCoordinateHandler manages this task. This coordinate handler assumes
that the coordinate space is functionally Cartesian, but where the axes
correspond to non-spatial information. For instance, you might have the first
axis be mass, the second time, the third distance.
At present, distance metrics are assumed to be scaled identically amongst the three axes. This means that distance is computed in a Euclidean fashion!
To specify this, the CustomCoordinateHandler accepts an axis unit
specification. This extends the existing axis ordering argument to include
axis units. From the perspective of the user, this would look like this::

    ds = yt.load_uniform_grid(
        data,
        [30, 30, 30],
        bbox=np.array([[0.0, 10.0], [0.0, 30.0], [0.4, 0.9]]),
        geometry=('custom', (('length', 'm'), ('mass', 'g'), ('time', 's'))),
    )
In this function call, note that the geometry argument has been extended to
include both the axis ordering and the units that each axis takes. The first
axis is called length with units of m, the second is called mass with units
of g, and the third is called time with units of s.
Note that these could all be length units, but with different names – this would also be a custom coordinate system where the naming scheme can be modified.
All coordinate handlers now have an
axes_unit dict, which maps the axis
names to units.
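As a rough sketch of how such a mapping could be derived from the extended geometry argument, the following splits the tuple into an axis ordering and an axes_unit dict. The helper name parse_geometry and its return shape are assumptions made for illustration; the actual coordinate handler may build the mapping differently.

```python
def parse_geometry(geometry):
    """Split ('custom', ((name, unit), ...)) into kind, axis order, and units.

    Hypothetical helper for illustration only.
    """
    kind, axes = geometry
    axis_order = tuple(name for name, _ in axes)
    axes_unit = {name: unit for name, unit in axes}
    # Duplicate axis names would silently collapse in the dict, so reject them.
    if len(axes_unit) != len(axis_order):
        raise ValueError("axis names must be unique")
    return kind, axis_order, axes_unit


geometry = ("custom", (("length", "m"), ("mass", "g"), ("time", "s")))
kind, axis_order, axes_unit = parse_geometry(geometry)
```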
Future developments may include allowing for specification of non-Euclidean distance functions.
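To make the current Euclidean assumption concrete, here is a minimal sketch of a distance computed over raw axis values with the units ignored. The function name euclidean_distance and the sample points are illustrative assumptions, not part of yt.

```python
import math


def euclidean_distance(a, b):
    """Distance over raw axis values; the units of each axis are ignored."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Axes are (length in m, mass in g, time in s), as in the custom geometry.
p0 = (0.0, 0.0, 0.0)
p1 = (3.0, 4.0, 0.0)
d = euclidean_distance(p0, p1)  # 3 m and 4 g are treated as commensurate
```

Because every axis is weighted identically, rescaling one axis (say, grams to kilograms) changes which points count as close to one another.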
Impact on Plotting¶
PlotWindow as a whole is designed to be used for plotting spatial datasets.
Integrating non-spatial datasets presents us with two options:
- Refactor PlotWindow such that it is generic with respect to units and aspect ratios and usable for non-spatial data.
- Utilize something like ParticlePlot for plotting image data from non-spatial datasets.
At present, extremely basic plotting functionality has been put into
PlotWindow to deal with non-spatial datasets, but this has also caused some
minor impedance mismatches.
The current long-term strategy is to refactor the two plotting interfaces to
share a common base class (also likely with ParticlePlot), and then have
these choose the appropriate subclass for plotting non-spatial data and “do the
right thing.”
Future: More than Three Dimensions¶
IndexArray is the first step toward enabling additional dimensions of data
access. However, this functionality alone is far from sufficient. In order to
enable access to greater dimensionality of data, there must be a concerted
effort to eliminate assumptions of 3 dimensions and generalize data
structures. While this is now feasible, it is still quite the undertaking.
Backwards Compatibility¶
The biggest potential source of problems with backwards compatibility arises
from the utilization of YTArray objects where IndexArray objects are
required. This is most likely to happen in places like centers specified to
objects. However, in updating the tests, such cases appear to be minimally
invasive and should have only very minor impact on user-facing scripts.
Work is in progress to ensure that an IndexArray with homogeneous units
behaves the same as a YTArray with those same units. This should minimize
disruption.