YTEP-0016: Volume Traversal¶
Created: September 10, 2013 Author: Matthew Turk
yt should consider volume traversal, accumulation of data, and flexible definitions of paths to be first-class operations as well as implementable by individuals. Essentially, we need a method for describing “derived values for volumes”.
Currently, the only mechanisms for studying or understanding data in yt are contained in the following procedures:
- Derived quantities (generating scalars from fields in volumes)
- Derived fields (generating fields from other fields)
- Contour identification
- Ray casting (on or off axis; i.e., projections or volume rendering)
- Contour identification
- Surface extraction
Several of these items utilize brick decomposition, but not all. This is the process by which overlapping grids are broken apart until a full tesselation of the domain (or data source) is created with no overlaps. This is done by the kD-tree decomposition.
What this YTEP proposes is to make handling tiles of data first class, as well as provide easy mechanisms for creating volume traversal mechanisms. There are two components to this: handling tiles of data, and creating fast methods for passing through the data and moving between tiles.
Note that this YTEP does not (yet) address situations where the mesh of the simulation is too large to fit into memory on a single node. Because the kD-tree is able to build in parallel, this essentially will amount to distributing regions of the dataset (where the mesh may not be known) to individual processors, allowing build on those processors, and enabling individuals to describe reduction steps in their operators.
Currently, in yt-3.0, all data objects expose a “block” iterator that returns data containers as well as masks of data. A similar iterator should exist for iterating over the “tiles” that compose a given data object. How this should behave is somewhat open for discussion, as the kD-tree itself has a notion of a ‘viewpoint traversal’ which may be important. Furthermore, it is not necessarily true that the traversal will be easily defined. As an example of this, tiles may need to traversed according to extrema in some fluid.
The order that tiles are returned to the individual should be flexible and extensible. A few predefined orders should be implemented:
- Depth-first traversal
- View-point traversal
- Breadth-first traversal
Additionally, these should allow for front-to-back or back-to-front yielding. An example API for this would be:
data_source = ds.h.sphere(c, (10.0, 'mpc')) for brick in data_source.tiles.depth_traverse(): operation(brick)
depth_traverse() would also be exposed for simply iterating
tiles object. Additional traversals could be extensibly defined.
Many of these traversal already exist for the AMR kD-tree. A traversal
should be able to be defined and executed from the following set of
- kD-tree object
- Arguments defining the traversal itself
- Data-centering (optional) and fields (optional)
It should yield
Most importantly, the notion of what is returned by this system needs to be defined. The notions of what brick is and possessed need to be defined. There are several options:
- An empty
CoveringGridthat knows how to read data.
- A filled (i.e., data pre-specified)
PartitionedGrid, where vertex or cell-centered data must be specified.
- A slice object and a grid object
- A new object, designed for this system, which acts as a superset of
PartitionedGrid. This object would include connectivity information as well, as it would not be independent of the tree itself. The
PartitionedGridcould be modified to fit this.
Regardless of which object is returned, at a minimum a kD-tree (or other partitioning) must be created when requested, at the call to tiles, potentially cached, and then objects iterated over. Each of these tiles is guaranteed to be the finest data available within the region they cover, and they are guaranteed not to overlap with any others.
For the purposes of this YTEP, we will assume the fourth option. If the
PartitionedGrid object were to be extended, I believe it would likely be best
to extend it as follows. Note that for many of these operations we implicitly
assume that it is operating on a grid patch; for octree codes, the creation of
this object will be considerably simpler, and for particle codes we simply
define these as the leaf nodes from the octree index itself. Because we need
to handle particle codes, we must also ensure that these objects can query
- Cache a slice of the grid or data object that it operates on. (For situations where it fully encompasses the parent region, it need not have a slice.)
- Create a mechanism for filtering particles from the data object it operates on.
- Enable the object to query new fields from its source object. This means that at instantiation time we may not regard the object as having a given field, but that this field can be added at a later time by querying.
- Provide a mechanism for identifying neighbor objects from a given face index. This is the connectivity relationship described above; given any one cell that resides on the boundary of a brick, return the brick (which may or may not be a leaf node) that is adjacent. This would enable identifying the leaf node at a given location within that boundary cell, which may reside at a higher level of refinement and could thus correspond to multiple tiles. This degeneracy results from the fact that we cannot guarantee that neighboring tiles differ by only a single level of refinement. However, because this will be defined at the Python level, rather than specifically for well-defined traversal operations, this is acceptable as we should leave open to the individual how to select the appropriate cell or what to do with it.
- Provide mechanisms for generating vertex-centered data or cell-centered data quickly.
At the present time, a simple first-pass at implementation could occur with the following:
- Implement a
tilesroutine that mandates supplying fields to cache or load, the vertex or cell centering of data, and a viewpoint traversal scheme.
- Cache a kD-tree based on these tiles.
- Iteratively yield tiles from this tree based on the traversal specified above.
The interface for these tiles, at a minimum, must expose that of the
PartitionedGrid with one modification: fields should be accessible by
__getitem__, so that any possible changes in the future that would expose
this would be backwards compatible with usages now.
The second aspect of this YTEP is to define a mechanism for integrating paths through tiles. Currently we do this through strict vectors that cannot be re-entrant into a grid; these vectors cannot change path along the way, and the number of them is fixed at the time of the first grid traversal.
As currently implemented, flexibility in volume traversal is defined in terms
of the mechanism by which values are accumulated. This includes the definition
of these objects, all inside
- Accumulator data (i.e.,
Essentially, for a given image type, a sampler can be defined. This sample
receives the following arguments:
VolumeContainer (a C-interface to a
partitioned grid with
v_pos (vector position),
enter_t (cell-entrance in terms of the parameter),
exit_t (exit time based on the vector at time of entrance),
(index into the data) and
data, a pointer to an accumulator (i.e.,
These sampler functions are called for every index that a vector traverses.
The volumes themselves are traversed inside
walk_volume (and, in the
nascent cylindrical volume rendering bookmark,
This assumes cartesian coordinates and simply calls the
for every zone that is crossed. This enables volume rendering, projecting and
so on to be conducted simply by swapping out the sampler function and correctly
interpreting an image object returned.
However, this is not sufficient for arbitrary traversals or arbitrary data collection. We need flexibility to define the following things:
- The mechanism of traversing blocks of data (covered at a higher level by the kD-tree itself, and not necessarily a part of this YTEP)
- Bootstrapping traversal of a volume by a given ray object. This would include identifying the zones that a ray first encounters and setting its initial time of intersection.
- Defining a mechanism for updating the indices in the volume that a ray will intersect next
- Defining a method for determining when a ray has left an object
- Defining a method for selecting the next brick to traverse or connect to
- Updating the value of a ray’s direction
Many of the problems can be seen simply by considering cylindrical volume rendering itself. If the view point is somewhere outside the cylinder looking toward it, rays from an orthonormal image plane will each construct a chord through the cylindrical shells. These chords will each span up to pi along the theta direction, and can have the following properties in their traversal:
- dtheta/dl can switch signs
- grids can be periodic with themselves
- dr/dl can switch signs
- a ray can exit a grid off the r boundary and then re-enter it later in the computation
- Both dtheta/dl and dr/dl change with each update of the ray’s position, and are not even constant over a single zone.
While this demonstrates some of the complexity, we also want to be able to support translating streamlines, clump finding and even gravitational lensing into this new mechanism for traversing volumes.
Therefore, we need a new mechanism that abstracts (independently) both the collecting or accumulating of data as well as the mechanism by which a given ray traverses a patch of data, whether that patch is one or several cells large. In this manner we will remain neutral to the nature of the data container, which may be an octree, a kD-tree, or a single grid.
- At the outermost level, tiles will be traversed in Python, and a collection of rays (either in an
ImagePlaneor some other object) will be handed each brick as it comes.
- Each ray will be “bootstrapped” onto a brick. This will result either in a traversal or an immediate return. (At a later time we will consider fast evaluation of which rays to consider.)
- Each cell traversed by the ray will be “sampled” in some way.
- The ray state (location, index, direction, etc) will be updated.
- Rays will traverse until they leave a brick.
- The next brick will be identified, either from ray positions or from the traversal at the python level.
Note that this does not yet enable a ray to request the next brick at a given position, which will be necessary. However, for the purposes of this iteration of the YTEP, we take it as given that such communication will be defined at a later time, or will be handled on a ray-by-ray basis, where the iteration is managed for each ray individually.
Objects to Manage¶
To accommodate the flow control outlined above, the following classes will need to be implemented, with the following specifications. These will be in Cython. A base class (listed below) will form the basis for each type of traversal.
struct ray_state: np.float64_t v_dir np.float64_t v_pos np.float64_t tmax int ind int step np.float64_t enter_t np.float64_t exit_t void *sdata class GeometryTraversal: # set values like domain size or whatever is necessary here def __init__(self, parameter_file) # Return whether the ray hits the vc or not cdef int initialize_ray(self, ray_state *ray, VolumeContainer *vc) nogil cdef int increment_ray(self, ray_state *ray, VolumeContainer *vc) nogil cdef np.float64_t intersection(self, np.float64_t val, int axis, np.float64_t v_dir, np.float64_t v_pos) nogil cdef int walk_volume(self, VolumeContainer *vc, sampler_function *sampler, ray_state *ray, np.float64_t *return_t = ?) nogil cdef
ray_state object will be independent of the geometry, and will always
refer to the cartesian state of the ray. A given geometry traversal will set
up the ray state (i.e., where it intersects with a volume container) and how to
increment the ray state as zones are crossed. The
will determine the state of the ray as it first touches a brick, and will
return 0 or 1 if the ray is inside that brick. The
will receive a ray and determine the crossing time in the parameter
the ray uses as it passes through a cell. The return value is 0 for the ray
having left the object and 1 for the ray being within the object and the
sampler function needing to be called.
intersection will get the position
at which a ray intersects a given value, and
walk_volume will typically be
described in the base class and not overridden elsewhere. Part of the level of
abstraction is to enable
walk_volume to largely be the same for each
geometry, but enabling it to be overridden means we can use the same traversal
for other operations such as clump finding and so on.
As a first implementation, the following classes will need to be implemented:
At a later time, the
SphericalTraversal object can be implemented.
This should retain all backwards compatibility for cartesian systems.
I’m not sure of any alternatives currently.