Data Class

The Data Class

MIEN Data elements are instances of class Data. They correspond to the nmpml tag "Data" in an nmpml (XML) document, and are used to store large numerical data sets that can't easily be included inline in an XML document. The Data class is an NMPMLObject, so it inherits the NMPML API.

In addition, the Data class provides an extensive set of methods for dealing with numerical data and additional nested Data elements (referred to as subdata). The mien.datafiles.dataset module also provides a set of tools for interacting with Data elements.

Essentially, data elements store three types of things:

In addition, class Data provides some useful methods for loading and saving data on demand, manipulating sub-elements, and maintaining undo/redo information when the document is being used by a GUI that requests an undo history.

Since Python doesn't prevent outside functions from accessing instance members, you could just take a data element DE and use:

dat = DE.data          # gets an ndarray
meta = DE.attributes   # gets a dictionary of the XML attributes (footnote 1)

Now you can manipulate these objects using any code you like. Since they are mutable objects (and Python passes references to them), your changes will affect DE and will be reflected in the MIEN GUIs as you expect. There are, however, several good reasons not to do this.

Reason #1: You won't allow GUIs to keep an undo history. It's prohibitively expensive to copy the entire Data tree every time a user function might be called, so undo information is stored as incremental changes by the Data element itself. For the element to know that it should record changes, you need to call its methods. Consequently, if you modify object attributes directly, you will break "undo".

Reason #2: Your functions will be more likely to fail. For example, different Data elements have different sets of attributes, and some store their numerical data at a remote URL and don't load it until it's needed. If you read DE.data, you may get None, and if you read DE.attributes['SamplesPerSecond'], you may raise a KeyError. If you use DE.getData() and DE.fs() (or DE.attrib('SamplesPerSecond')) instead, the object does a lot of work for you to make sure that the data are loaded if possible, and that attributes are calculated if they need to be.
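Both reasons can be made concrete with a toy class. ToyData below is hypothetical and is not MIEN's implementation: it reuses the method names getData, attrib and setAttrib mentioned above, but its internals (an undo log of (key, previous value) pairs, a loader callable, attrib returning None for a missing key) are assumptions chosen only to illustrate the behavior described.

```python
_MISSING = object()  # sentinel: "this key did not exist before the change"


class ToyData:
    """Hypothetical stand-in for a Data element; not MIEN's implementation."""

    def __init__(self, attributes=None, loader=None):
        self.attributes = dict(attributes or {})
        self.data = None          # nothing loaded yet
        self._loader = loader     # callable that fetches the samples, or None
        self._undo = []           # incremental log of (key, previous value)

    def getData(self):
        # Load on demand, so callers don't see None when loading is possible.
        if self.data is None and self._loader is not None:
            self.data = self._loader()
        return self.data

    def attrib(self, name):
        # In this toy, a missing key gives None instead of raising KeyError.
        return self.attributes.get(name)

    def setAttrib(self, name, value):
        # Log only this change, not a copy of the whole element.
        self._undo.append((name, self.attributes.get(name, _MISSING)))
        self.attributes[name] = value

    def undo(self):
        # Revert the most recent logged change.
        name, old = self._undo.pop()
        if old is _MISSING:
            del self.attributes[name]
        else:
            self.attributes[name] = old


de = ToyData({"SampleType": "timeseries"}, loader=lambda: [[0.0, 0.0]] * 100)
print(de.data)                # None: the samples have not been loaded
print(len(de.getData()))      # 100: loaded on demand
de.setAttrib("SamplesPerSecond", 1000.0)    # logged
de.attributes["SampleType"] = "events"      # direct edit: not logged
de.undo()                                   # reverts only the logged change
print(de.attributes)          # {'SampleType': 'events'}
```

Because the direct edit bypasses setAttrib, undo() has no record of the old "timeseries" value and cannot restore it, which is exactly the failure described in Reason #1.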

Consequently, it's worth some effort to learn some of the Data API. I'll start with the basic things you need to know to write the most common extensions. Many of the things you need are methods of class Data, but some are also helper functions provided by the module mien.datafiles.dataset.

Data Sample Types

One of the XML attributes of a Data element is its "SampleType". This attribute specifies what sort of data are being represented, and determines how the data should be interpreted. You can call DE.stype() to get a string indicating the sample type. Data with no type information are type "generic". In general, you can't do many useful things with generic data without assuming a sample type. Here is a list of supported sample types (drawn from the pydoc documentation for mien.nmpml.data.Data)

You can go a long way toward writing extensions for neuroscience using only the "timeseries" and "events" types (and perhaps "labeledevents"). "timeseries" is the most common type, and much of this tutorial assumes timeseries data. The function mien.datafiles.dataset.isSampledType returns 'e' for event-like sampled data types, 's' for timeseries-like types, and False for everything else. Any data for which isSampledType returns a true value provides a sampling rate (DE.fs() or DE.attrib('SamplesPerSecond')). Most neuroscience extensions can ignore any data for which isSampledType returns False.
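A minimal sketch of how an extension might use this classification to skip unsuitable data. sampled_kind is a hypothetical simplification that only knows the three types named in this tutorial (treating "labeledevents" as event-like is an assumption); real code would call mien.datafiles.dataset.isSampledType instead, after checking its documentation for the argument it expects.

```python
# Hypothetical simplification of isSampledType: only the types named in this
# tutorial are classified; "labeledevents" is assumed to be event-like.
_KINDS = {"timeseries": "s", "events": "e", "labeledevents": "e"}


def sampled_kind(stype):
    """Return 's' (timeseries-like), 'e' (event-like), or False."""
    return _KINDS.get(stype, False)


def elements_to_process(elements):
    """Keep (name, stype, fs) triples whose sample type carries a sampling rate."""
    kept = []
    for name, stype, fs in elements:
        if not sampled_kind(stype):
            continue  # e.g. "generic": no sampling rate, so skip it
        kept.append((name, stype, fs))
    return kept


elements = [
    ("voltage", "timeseries", 10000.0),
    ("spikes", "events", 10000.0),
    ("notes", "generic", None),
]
print(elements_to_process(elements))
```

Only "voltage" and "spikes" survive the filter; the generic element is skipped because nothing guarantees it has a sampling rate.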

Most Common Data Methods

The most common user actions on Data elements get and set data and subdata. The following functions and methods will be helpful. Methods are written in Python calling syntax, assuming that the Data element calling the method is named "DE". Functions are listed as module.path->functionName (for example, mien.datafiles.dataset->isSampledType).

Headers

Data headers specify the XML attributes of a Data element. They can be represented with python dictionaries, xml attributes, or members of a Data instance. The usual MIEN rules for converting python attributes to XML attributes apply to elements of a Data header when a file is saved.

Headers are unusual in that certain Data sample types require particular header keys to be set to particular types of value, or the Data element can't be correctly interpreted. Attempting to call "datinit" with an invalid (data, header) pair will raise an error.

Not every XML attribute of a Data element is part of the data header. Saving a Data element in a non-xml format may destroy most of the xml attributes, but will usually preserve the data header.

The most common keys in the data header are:

The method DE.header() returns a Python dictionary of the whole data header. The method DE.attrib(name) returns the value of the header element name.

DE.setAttrib(name, value) is used to set individual header elements.
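A small helper written only against the three header methods just described. The helper name rescale_rate and the choice of 'SamplesPerSecond' as the key to change are illustrative; HeaderStub is a hypothetical stand-in so the sketch runs without MIEN, and a real Data element would be passed in its place.

```python
def rescale_rate(de, factor):
    """Multiply the sampling rate in the header of `de` by `factor`.

    Uses only header(), attrib() and setAttrib(), so an element that is
    tracking undo information can record the change.
    """
    if "SamplesPerSecond" not in de.header():
        raise ValueError("element has no SamplesPerSecond in its header")
    old = de.attrib("SamplesPerSecond")
    de.setAttrib("SamplesPerSecond", old * factor)
    return de.header()


class HeaderStub:
    """Hypothetical stand-in exposing the same three methods; not MIEN's class."""

    def __init__(self, header):
        self._h = dict(header)

    def header(self):
        return dict(self._h)

    def attrib(self, name):
        return self._h.get(name)

    def setAttrib(self, name, value):
        self._h[name] = value


stub = HeaderStub({"SampleType": "timeseries", "SamplesPerSecond": 1000.0})
print(rescale_rate(stub, 2))
```

Checking membership in header() before calling attrib() avoids relying on how a real element's attrib() behaves for a missing key.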

SubData and Data Paths

All nmpml objects can be nested, and can access sub-elements and container elements using the NMPML API, but the requirements of DSP functions motivate a separate, simpler system of sub-element access for Data elements nested within other Data elements. The only downside to this choice is that a subdata element has a confusing number of paths. Each element has an "xpath" (defined by the XML specification), a "upath" (defined by MIEN nmpml), and a "dpath" (or Data Path, defined by class Data). This section only discusses dpaths.

A Data element has dpath="/" if its container is not also a data element. Elements that are directly nested in an element with path "/" have path "/name" where name is the nmpml "name" attribute of the child. Deeply nested elements have additional path components reflecting each level of nesting, so "/physiology/filtered/ftime" refers to an element named "ftime" with parent "filtered" that has parent "physiology" that has some Data parent (with whatever name), which has a parent that is not a Data element.
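The path rule can be sketched with a toy tree. Node and resolve_dpath are hypothetical names used only here (with MIEN you would call DE.getSubData(path)); the sketch shows how each path component selects a child by its name attribute, starting from the element whose dpath is "/".

```python
class Node:
    """Toy stand-in for a Data element: a name plus child elements."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def child(self, name):
        for c in self.children:
            if c.name == name:
                return c
        raise KeyError(name)


def resolve_dpath(root, dpath):
    """Follow `dpath` from `root`, the element whose dpath is '/'."""
    node = root
    for part in dpath.strip("/").split("/"):
        if part:                      # '/' itself yields no components
            node = node.child(part)
    return node


# The root's own name ("whatever") does not appear in any dpath.
root = Node("whatever", [
    Node("physiology", [Node("filtered", [Node("ftime")])]),
])
print(resolve_dpath(root, "/physiology/filtered/ftime").name)   # ftime
print(resolve_dpath(root, "/").name)                            # whatever
```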

A collection of Data elements that are all nested together, and all share the same set of dpaths, is often referred to as a data set or workspace. Class Data formally refers to this collection as a data hierarchy.

The methods DE.getSubData(path) and DE.createSubData(path [,data] [,header] [,delete]) are used to manipulate sub-elements using dpaths. The method DE.getHierarchy() returns a dictionary that maps every dpath in a hierarchy to the corresponding element.
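The kind of mapping returned by DE.getHierarchy() can be pictured with the toy walk below, which builds a {dpath: element} dictionary for a tree of nested dicts. The dict layout and the function name hierarchy are assumptions made for illustration, not MIEN's internals.

```python
def hierarchy(element, dpath="/"):
    """Return {dpath: element} for `element` and everything nested in it.

    Each toy element is a dict with a "name" and an optional "children" list.
    """
    result = {dpath: element}
    for child in element.get("children", []):
        # Children of '/' are '/name'; deeper levels append '/name'.
        child_path = dpath.rstrip("/") + "/" + child["name"]
        result.update(hierarchy(child, child_path))
    return result


root = {"name": "workspace", "children": [
    {"name": "physiology", "children": [
        {"name": "filtered", "children": [{"name": "ftime"}]},
    ]},
    {"name": "stimulus"},
]}
print(sorted(hierarchy(root)))
# ['/', '/physiology', '/physiology/filtered', '/physiology/filtered/ftime', '/stimulus']
```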

Selection tuples

DSP functions frequently need to reference a chunk of data that is somewhere in a data hierarchy, but may not be in the top level element, and may not contain all the channels or samples of a particular element. Selection tuples are a type of label that can uniquely specify any contiguous chunk of data in a hierarchy.

Selection tuples have three components. The most complete form of a selection tuple is (dpath, channels, range). Dpath is a string (see SubData and Data Paths), channels is a list of integers, and range is a 2-tuple or slice object.

The tuple specifies, respectively, which subdata element, which channels (columns) in that element, and which range of samples (rows) are selected. The values have sensible defaults, so any of them can be set to None. The tuple (None, None, None) selects all samples of all channels of the top-level element (dpath='/').
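The defaulting rules can be sketched with a toy resolver. select is hypothetical (the real entry point is getSelection in mien.datafiles.dataset); it takes a dictionary mapping dpaths to 2-D data stored as plain lists of rows, standing in for ndarrays, and it assumes a 2-tuple range is half-open (start, stop), like a Python slice.

```python
def select(hierarchy, selection):
    """Return the rows and channels named by a (dpath, channels, range) tuple.

    `hierarchy` maps dpath strings to 2-D data stored as lists of rows
    (samples x channels). Any component of `selection` may be None.
    """
    dpath, channels, rng = selection
    data = hierarchy["/" if dpath is None else dpath]
    if rng is None:
        rows = data                        # all samples
    elif isinstance(rng, slice):
        rows = data[rng]
    else:
        start, stop = rng                  # assumed half-open, like a slice
        rows = data[start:stop]
    if channels is None:
        return [list(r) for r in rows]     # all channels
    return [[r[c] for c in channels] for r in rows]


# A 4-sample, 3-channel top level and a 4-sample, 1-channel child.
h = {
    "/": [[0, 10, 20], [1, 11, 21], [2, 12, 22], [3, 13, 23]],
    "/filtered": [[5], [6], [7], [8]],
}
print(select(h, (None, None, None)))                    # everything in '/'
print(select(h, (None, [0, 2], (1, 3))))                # [[1, 21], [2, 22]]
print(select(h, ("/filtered", None, slice(2, None))))   # [[7], [8]]
```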

Module mien.datafiles.dataset provides a number of functions for dealing with selections, including getSelection, setSelection, and getSelectionHeader.

There are some subtleties in using selection tuples on event and labeledevent data. In general, the selection functions should do "what you want", but you may want to read the module documentation or even the code if you run into strange behavior.

(footnote 1):

Note that XML attributes are slightly different from Python object attributes. For a data element named DE, the XML attributes are stored in a dictionary named DE.attributes. "attributes" itself is an attribute of the object DE in the Python sense, but in the XML sense only the keys stored inside this dictionary are attributes of the XML tag represented by DE.

 

Last edit: 05/29/09
