YTEP-0027: Non-Spatial Data¶
Abstract¶
Created: December 1, 2015 Author: Matthew Turk, Nathan Goldbaum, John ZuHone
This YTEP outlines a plan to implement support for native non-spatial data representations in yt.
Status¶
In Progress
Project Management Links¶
- This pull request is the first attempt at implementing this: https://bitbucket.org/yt_analysis/yt/pull-requests/1891/wip-supporting-non-spatial-coordinate
- This Trello card discusses it a bit: https://trello.com/c/7d5PCUym/7-index-arrays as does this one: https://trello.com/c/MXF1sWam/6-non-spatial-data
Detailed Description¶
Background¶
Currently, most of yt assumes that its data structures (particularly for
purposes of selection and units) are related to spatial coordinates. This
leads to issues such as spherical and cylindrical coordinates believing their
angular coordinates are in code_length
, having to pretend that pressure
coordinates are code_length
, and so on.
An additional complication is that at present, index operations (particularly
in selection operations) cannot know in advance that their input arrays are in
“index space.” This leads to costly operations that check the units (which are
assumed to be code_length
) and converts if need be. It is often very
difficult to create a situation where the arrays are not in those units,
though.
Fortunately, there are very few places where the arrays used to index the dataset are utilized directly; for the most part, they are manually stripped of units and then re-applied with the correct units in classes such as the spherical coordinates handler.
This YTEP concerns itself with a few things:
- Allowing datasets to be loaded that are indexed in non-spatial dimensions (for instance, lat, lon, pressure)
- Developing unitful coordinate systems for these non-spatial datasets
- Implementing a custom coordinate handler
Why is this hard?¶
There are assumptions made in a number of places that data is spatial. Often this shows up in one of these ways:
- Calls to
ensure_code
or conversions explicitly tocode_length
.- Assumptions that a set of units can be represented as a form of length, for instance during integration.
- Inhomogeneous units in a single YTArray are not supported in the current development tip of yt. Some behavior can be mocked up using object arrays, but this is incredibly unreliable.
Implementation of Index Arrays¶
To address this issue, an implementation of an object explicitly for indexing
data has been created, currently called an IndexArray
. This object
subclasses from YTArray
, but differs in some crucial ways.
- Multiple units may be specified. These units must be of the same length as the final axis of the array.
- The units in an array are immutable. To change units, the array must be copied. Practically, this means that
convert_to_units
will raise an exception, but it brings with it the benefit that it is difficult to find oneself in a situation where something likedomain_left_edge
is not the native units of the indexing system.- Fancy-indexing is not possible; only slicing can be conducted.
These arrays are almost always assumed to be created internally within yt.
Some situations, such as specifying a “center” to an object, can accept
IndexArray
objects.
Implementation of Coordinates¶
For inhomogeneous units to be useful, there must be a mechanism for specifying
the units to a coordinate handler. The implementation of a
CustomCoordinateHandler
manages this task. This coordinate handler assumes
that the coordinate space is functionally Cartesian, but where the axes
correspond to non-spatial information. For instance, you might have the first
axis be mass, the second time, the third distance.
Warning
At present, distance metrics are assumed to be scaled identically amongst the three axes. This means that distance is computed in a Euclidean fashion!
To specify this, the CustomCoordinateHandler
accepts an axis unit
specification. This extends the existing axis ordering argument to include
axis units. From the perspective of the user, this would look like this::
ds = yt.load_uniform_grid(data, [30, 30, 30],
bbox=np.array([[0.0, 10.0], [0.0, 30.0], [0.4, 0.9]]),
geometry = ('custom', (('length', 'm'), ('mass', 'g'), ('time', 's')))
)
In this function call, note that the geometry
argument has been extended to
include both the axis ordering and the units that each takes. The first axis
is called length
with units of m
, the second is called mass
with
units of g
and the third is time
with units of s
.
Note that these could all be length units, but with different names – this would also be a custom coordinate system where the naming scheme can be modified.
All coordinate handlers now have an axes_unit
dict, which maps the axis
names to units.
Future developments may include allowing for specification of non-Euclidean distance functions.
Impact on Plotting¶
PlotWindow
as a whole is designed to be used for plotting spatial datasets.
Integrating non-spatial datasets presents us with two options:
- Modify
PlotWindow
such that it is generic with respect to units and aspect ratios and usable for non-spatial data.- Utilize something like
PhasePlot
orParticlePlot
for plotting image data from non-spatial datasets.
At present, extremely basic plotting functionality has been put into
PlotWindow
to deal with non-spatial datasets, but this has also caused some
minor impedance mismatches.
The current long-term strategy is to refactor the two plotting interfaces to
share a common base class (also likely with ParticlePlot
), and then have
these choose the appropriate subclass for plotting non-spatial data and “do the
right thing.”
Future: More than Three Dimensions¶
Utilizing IndexArray
is the first step toward enabling additional
dimensions of data access. However, this set of functionality alone is by far
insufficient. In order to enable access to greater dimensionality of data,
there must be concerted effort to eliminate assumptions of 3 dimensions and
generalize data structures. While this is now feasible, it is still quite the
undertaking.
Backwards Compatibility¶
The biggest potential source of problems with backwards compatibility arise
from the utilization of YTArray
objects where IndexArray
objects are
required. This is mostly likely to happen places like centers specified to
objects. However, in updating the tests, it seems that these are minimally
invasive and should have only very minor impact on user-facing scripts and
APIs.
Work is in progress to ensure that an IndexArray
with homogeneous units
behaves the same as a YTArray
with those same units. This should minimize
impact.