YTEP-0017: Domain-Specific Output Types
Created: September 18, 2013
Author: Matthew Turk and Anthony Scopatz
This YTEP is designed to begin the process of generalizing astrophysics-specific components of yt toward applications in other domains.
Proposed and completed.
This would only be implemented in yt 3.0.
Currently, yt is extremely strongly focused on astrophysical data. This leads
to the inclusion of attributes such as
current_redshift and so on, as well as some other fundam. Even within
astrophysical simulations, these can be irrelevant or unnecessary.
Furthermore, attributes from other domains (ones that transcend
a single subclass of
StaticOutput) may be relevant or necessary.
This concept of branding things extends even to the level of the commonly-used
pf, which originated within the original Enzo usage as
shorthand for “parameter file,” and the name
StaticOutput, chosen in contrast to
the “streaming” movie format within Enzo. In order to effectively move beyond
both astro- and Enzo-centrism, the terminology, attributes, and extensibility
of datasets should be emphasized and defined.
Attributes on StaticOutput
The following attributes are defined on every
StaticOutput regardless of
whether the dataset is astrophysics, cosmology, or even rectilinear cartesian
current_time (note: this also is not correctly implemented for Enzo)
Even if cosmological_simulation is set to off, the cosmology-related
parameters will be defined. Additionally, the default “field type” is
which is globally set and not necessarily trivial to modify. Changing the
units to be less astro-specific (which may not be necessary for length units)
is part of a larger units-related discussion, rather than part of this YTEP.
StaticOutput is tied very strongly to a file on disk.
Because that coupling is largely internal-facing, changing it may not require
a YTEP, but rather a simple refactoring.
Finally, not all simulation types have a concept of
if the indexing system does. This is currently outside the scope of this YTEP.
The domain left and right edges also do not always matter for particle
simulations (except in non-outflow boundary conditions) but are still always
relevant to the indexing system.
Below are a few suggested mechanisms for retaining this information as “first class” attributes of a given dataset when appropriate, while removing it from those datasets where it is not.
Naming and Branding
Objects will be renamed:
- StaticOutput will be renamed to
- TimeSeriesData will be renamed to DatasetSeries and will no longer exclusively refer to a time-related set of data, but instead include arbitrary collections of datasets.
- Instead of pf as shorthand, we will use
Currently, all datasets expose a
.hierarchy attribute, shortened to
This naming is a holdover from the time when Enzo and other patch-based AMR
datasets were the primary data examined with yt. However, this makes
considerably less sense when seen in light of support of particle datasets,
semi-structured datasets, unigrid datasets, and eventually unstructured mesh
datasets. What we really mean when we say
.h is index
or geometry. Currently, the
StaticOutput object also possesses a
.geometry attribute, although this is a string scalar.
I do not think we should replace the
.h attribute wholesale, and I do not
necessarily think that data objects should hang directly off of the
StaticOutput (or whatever it is renamed) object. However, I do think that
we should eliminate
hierarchy in favor of something more generic that is
more descriptive, and we should consider alternates for creating data objects.
Regardless of what we decide on, the
.h attribute should remain for the
time being, and we should also not instantiate our indexing method until
The resolution decided upon during discussion has been:
- Eliminate the hierarchy object as a name.
  geometry seems to be the most popular for what the
- Retain the h attribute as an alias (for now, possibly forever)
- Each dataset will have an index property which will be a
  OctIndex etc. etc. This is essentially the same as the
- Move data objects up to the top level of
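The index property and its h alias could be sketched as follows. This is a minimal illustration, not the actual yt implementation: ExampleDataset and the _index_class hook are hypothetical names, OctIndex is reduced to an empty stand-in, and building the index on first access is an assumption of this sketch.

```python
class OctIndex:
    """Stand-in for an index object; the real class would do far more."""

    def __init__(self, dataset):
        self.dataset = dataset


class ExampleDataset:
    """Hypothetical dataset class used only for illustration."""

    # Hypothetical hook: each frontend would name its own index class here.
    _index_class = OctIndex

    def __init__(self):
        # Nothing is built at construction time in this sketch.
        self._index = None

    @property
    def index(self):
        # Assumption: the index is created on first access and then cached.
        if self._index is None:
            self._index = self._index_class(self)
        return self._index

    @property
    def h(self):
        # Backwards-compatible alias for the old shorthand.
        return self.index
```

With this arrangement, existing code that reaches for h keeps working, while new code can use the more descriptive index name.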
Because some domains will have fundamental parameters that put into context the data they represent, this YTEP proposes a plugin system wherein domain-specific “contexts” register themselves and specific frontends identify which plugins are applicable to that specific frontend. This dual-ended handshake helps ensure that plugins confirm they are applicable to a frontend, and that frontends identify the plugins that work for them.
A domain plugin (called
DomainContext) will operate on a dataset
object, adding new attributes, but not new methods. This violates common
object-oriented philosophy and practice, but from an implementation perspective
it seems to be the cleanest approach, and the one that avoids the most meta-programming.
On instantiation, a static output normally goes through these steps:
This YTEP would propose changing this order to:
_apply_domain_contexts would iterate through the intersection of the
globally registered and frontend-specific domain plugins, and for each
one would call the class method
is_appropriate, supplying the dataset
(self) as the only argument. If the plugin is applicable, it would return
True and an instance of it would be appended to the dataset property
domain_contexts (or some other name, as this collides with
referring to simulation spatial information.) Alternately, we could mandate an
_adapt_* method (seen below) and in the absence of such a method assume the
plugin is blacklisted.
These plugins would then, in sequence, have their
apply method called with
the dataset as the only argument. They can then add additional attributes to
the dataset, as well as additional key parameters to print out. The runtime
overhead should be negligible.
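The handshake described above might look roughly like the sketch below. It is written under several assumptions: GLOBAL_CONTEXTS, register_context, _frontend_contexts, ExampleDataset and MagneticContext are all hypothetical names invented for illustration, and is_appropriate returns an instance or None, following the convention of the pseudocode in this YTEP.

```python
# Hypothetical global registry of context classes.
GLOBAL_CONTEXTS = []


def register_context(cls):
    """Add a context class to the (hypothetical) global registry."""
    GLOBAL_CONTEXTS.append(cls)
    return cls


class DomainContext:
    """Sketch of the proposed plugin base class."""

    domain = None

    @classmethod
    def is_appropriate(cls, ds):
        return None

    def apply(self, ds):
        pass


@register_context
class MagneticContext(DomainContext):
    """Hypothetical example context; not part of the proposal."""

    domain = "magnetic"

    @classmethod
    def is_appropriate(cls, ds):
        return cls() if ds.parameters.get("has_magnetic_fields") else None

    def apply(self, ds):
        # Contexts add attributes to the dataset, not methods.
        ds.has_magnetic_fields = True


class ExampleDataset:
    # Context classes this frontend declares itself compatible with.
    _frontend_contexts = []

    def __init__(self, parameters):
        self.parameters = parameters
        self.domain_contexts = []
        self._apply_domain_contexts()

    def _apply_domain_contexts(self):
        # Only plugins registered on both sides are considered.
        for cls in GLOBAL_CONTEXTS:
            if cls not in self._frontend_contexts:
                continue
            ctx = cls.is_appropriate(self)
            if ctx is not None:
                self.domain_contexts.append(ctx)
        # Apply the accepted contexts in sequence.
        for ctx in self.domain_contexts:
            ctx.apply(self)


class MagneticFrontendDataset(ExampleDataset):
    _frontend_contexts = [MagneticContext]
```

A dataset from a frontend that does not list MagneticContext never receives the attribute, even when its parameters would otherwise qualify, which is the point of requiring agreement from both ends.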
This extends further to the compartmentalization of field definitions. We
leave that somewhat unspecified here, but domain contexts should enable the
application of specific field objects based on runtime parameters. This could
mean, for instance, conversion of face-centered to cell-centered quantities,
magnetic field analysis, nuclear decay times, and so on. One mechanism for
doing this would be to add field objects to the already-created
object. (This is why that step must be raised in the list.)
One concern with this is that frontend-specific parameters (e.g.,
cosmological_simulation) are not universal, so an adapter between the
frontend and the plugin needs to be created. We propose that this be required
for each frontend by enabling plugins to call methods on the dataset. These
methods will be named
_adapt_* where the suffix is the context’s short name.
These will return dictionaries of parameters which will be rigorously checked
for contents (i.e., preventing incorrect or incomplete information from being
passed back.) Contexts must define these methods.
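One way the checking of adapter output could work is sketched below. The helper checked_adapt, the COSMOLOGY_KEYS set, the ExampleFrontendDataset class and its "comoving" and "z" parameter names are all hypothetical. Rejecting unexpected keys as well as missing ones is an assumption about what "rigorously checked" should mean.

```python
# Assumed set of keys a cosmology context would require from the adapter.
COSMOLOGY_KEYS = frozenset({"cosmological_simulation", "current_redshift"})


def checked_adapt(ds, shortname, required_keys):
    """Call ds._adapt_<shortname>() and validate the returned dictionary.

    Returns None when the dataset does not define the adapter method.
    """
    method = getattr(ds, "_adapt_" + shortname, None)
    if method is None:
        return None
    params = method()
    if not isinstance(params, dict):
        raise TypeError("_adapt_%s must return a dict" % shortname)
    missing = set(required_keys) - set(params)
    extra = set(params) - set(required_keys)
    if missing or extra:
        raise ValueError(
            "_adapt_%s returned bad keys: missing=%s, unexpected=%s"
            % (shortname, sorted(missing), sorted(extra))
        )
    return params


class ExampleFrontendDataset:
    """Hypothetical frontend with its own parameter names."""

    def __init__(self, parameters):
        self.parameters = parameters

    def _adapt_cosmology(self):
        # Translate frontend-specific names into the shared vocabulary.
        return {
            "cosmological_simulation": int(self.parameters.get("comoving", 0)),
            "current_redshift": float(self.parameters.get("z", 0.0)),
        }
```

A context's is_appropriate method could then call checked_adapt instead of calling the adapter directly, so that a malformed adapter fails loudly rather than silently producing a partially configured dataset.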
As an example, here is pseudocode for a cosmological simulation context:
    # DomainContext is the proposed plugin base class; Cosmology is assumed
    # to be yt's cosmology calculator, already in scope.
    class CosmologyContext(DomainContext):
        domain = 'cosmology'

        def __init__(self):
            pass

        @classmethod
        def is_appropriate(cls, pf):
            # Frontends opt in by defining an _adapt_cosmology method.
            if not hasattr(pf, '_adapt_cosmology'):
                return None
            rv = pf._adapt_cosmology()
            if rv['cosmological_simulation'] == 1:
                return cls()
            return None

        def apply(self, pf):
            params = pf._adapt_cosmology()
            pf.cosmological_simulation = params['cosmological_simulation']
            pf.cosmology = Cosmology()
This design mechanism is somewhat open for discussion; the problems of adapting varying parameters and matching both the generality of the domain context and the frontend dataset provide challenges. An alternative is to provide a default class method for each context that is used by the base dataset object to obtain a false value.
As noted during discussion, contexts can and should subclass one another. How this interacts with the order in which plugins are resolved is not yet clear, as (for instance) the base class should not necessarily modify an attribute that the subclass would then override.
These domain contexts will be extensible at runtime by specifying an additional list of plugins to check, by adding additional plugins to the global (and frontend-specific) registry, and by adding to the plugin list for each dataset type.
Much of the implementation has been described above. However, these domain
plugins should reside in a subdirectory of
data_objects, specifically named
yt/data_objects/domain_contexts/ and should be limited to one class per
- The backwards-compatibility impact of renaming is likely quite small, except for those cases where names would be changed.
- The backwards compatibility of checking for cosmological_simulation would probably require additional field validation (or instead, fields that are added specifically by the cosmology context).
- Moving TimeSeriesData to a new name may need to be gradually introduced, retaining backwards compatibility for a while.
- Fixing Enzo’s current_time will cause challenges for anyone who is not using internal time conversion factors. I think this number is likely small.
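A gradual rename of the kind mentioned above is commonly handled with a deprecated alias. The sketch below shows one possible approach using the standard warnings module. The class bodies are placeholders, and the choice of a warning-emitting subclass is an assumption, not something this YTEP prescribes.

```python
import warnings


class DatasetSeries:
    """Placeholder for the renamed class; holds an arbitrary collection."""

    def __init__(self, outputs):
        self.outputs = list(outputs)


class TimeSeriesData(DatasetSeries):
    """Old name kept as a thin, warning-emitting subclass."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "TimeSeriesData is deprecated; use DatasetSeries instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Existing scripts that construct TimeSeriesData continue to run and still pass isinstance checks against DatasetSeries, while users are nudged toward the new name.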
We could continue with the status quo.