Mien Data elements are instances of class Data. They correspond to the nmpml tag "Data" in an nmpml (XML) document, and they are used to store all sorts of large numerical data that can't easily be included inline in an XML document. The Data class is an NMPMLObject, so it inherits the NMPML API.
In addition, the Data class provides an extensive set of methods for dealing with numerical data and with additional nested Data elements (referred to as subdata). The mien.dsp.dataset module also provides a set of tools for interacting with Data elements.
Essentially, data elements store three types of things:
Metadata, including the type of numerical data, sampling rate (if needed), names of channels in the data, etc. These are stored in xml attributes of the Data element. (footnote 1)
The numerical data itself, stored as a numpy ndarray. For a data element named DE, this is stored as DE.data.
Nested elements, including other Data elements, stored in a list named DE.elements.
In addition, class Data provides some useful methods for loading and saving data on demand, manipulating sub-elements, and maintaining undo/redo information when the document is being used by a GUI that requests an undo history.
Since Python doesn't prevent outside functions from accessing instance members, you could just take a data element DE and use DE.attributes, DE.data, and DE.elements directly.
Now you can manipulate these objects using any code you like. Since they are mutable objects (and are passed by reference), your changes will affect DE and will show up in the Mien GUIs as you expect. There are, however, several good reasons not to do this.
Reason #1: You won't allow GUIs to keep an undo history. It's prohibitively expensive to copy the entire Data tree every time a user function might be called, so undo information is stored as incremental changes by the Data element itself. For the element to know that it should record changes, you need to call its methods. Consequently, if you modify object attributes directly, you will break "undo".
Reason #2: Your functions will be more likely to fail. For example, not all Data elements have the same attributes, and some store their actual numerical data at a remote URL and don't load it until it's needed. If you read DE.data, you may get None, and if you read DE.attributes['SamplesPerSecond'] you may raise a KeyError. If you use DE.getData() and DE.fs() (or DE.attrib('SamplesPerSecond')) instead, the object will do a lot of work for you to make sure that the data are loaded if possible, and that the attributes are calculated if they need to be.
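Both reasons can be seen in a small pure-Python toy. This is not MIEN's Data class: the hypothetical `ToyData` below uses a `loader` callable to stand in for fetching data from a remote URL or file, and keeps a simple stack of replaced values as its undo log. It shows that reading `data` directly can see None before loading, and that a direct assignment to `data` is invisible to undo.

```python
class ToyData:
    """Hypothetical stand-in for a Data element (not MIEN's real class)."""

    def __init__(self, loader=None, data=None):
        self.data = data          # may stay None until someone asks for it
        self._loader = loader     # stands in for a remote URL or external file
        self._undo = []           # replaced values, one entry per recorded change

    def getData(self):
        # Reason #2: load on first request instead of returning None.
        if self.data is None and self._loader is not None:
            self.data = self._loader()
        return self.data

    def setData(self, dat):
        # Reason #1: remember what is being replaced so it can be undone.
        self._undo.append(self.getData())
        self.data = dat

    def undo(self):
        # Restore the most recently recorded value, if any.
        if self._undo:
            self.data = self._undo.pop()


de = ToyData(loader=lambda: [[0.1], [0.2], [0.3]])
print(de.data)          # None: the member alone sees nothing yet
print(de.getData())     # [[0.1], [0.2], [0.3]]: loaded on demand

de.setData([[1.0], [2.0], [3.0]])   # goes through the method: recorded
de.data = [[9.9]]                   # direct assignment: not recorded
de.undo()
print(de.data)          # [[0.1], [0.2], [0.3]]: the direct change was silently discarded
```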
Consequently, it's worth some effort to learn some of the Data API. I'll start with the basic things you need to know to write the most common extensions. Many of the things you need are methods of class Data, but some are also helper functions provided by the module mien.datafiles.dataset.
One of the XML attributes of a Data element is its "SampleType". This attribute specifies what sort of data are being represented, and determines how the data should be interpreted. You can call DE.stype() to get a string indicating the sample type. Data with no type information are type "generic". In general, you can't do very many useful things with generic data without imposing an assumption of sample type. Here is a list of supported sample types (drawn from the pydoc documentation for mien.nmpml.data.Data):
group - There are no data. This element simply serves to group other Data elements. This can be used, for example, to collect 10 different sets of "events" type data that may be semantically related.
timeseries - The data are a 2D array in which each row holds one sample, and the samples are sequential and evenly spaced in time. The columns are referred to as "channels" by analogy to the channels of a DAQ device. The attribute "SamplesPerSecond" is required for timeseries. Most "analog" data in neuroscience are timeseries.
This type can be used to represent any data that are uniformly sampled in the independent variable. Mien GUIs and method names will call this "time", but the methods will work the same if it is in fact space, frequency, etc. In general, Mien will use time and independent variable interchangeably, but if you use some other independent variable, don't be concerned by the nomenclature. All the functions should still work as expected.
ensemble - Like a timeseries, but each distinct channel spans several columns (one per repetition), so the number of channels is smaller than the number of columns in the array. The attribute "Reps" indicates the number of columns per channel (which must be the same for all channels). The shape of an ensemble is the shape of the data array with its number of columns divided by Reps.
events - The data are an Nx1 array and represent sequential binary events (e.g. action potentials). "SamplesPerSecond" must be specified, and the values are the indexes of the sample rows on which the events occur.
histogram - The data are an NxM integer array, representing the number of occurrences of some event that fall in each of a series of equally spaced, sequential bins of the independent variable. Usually M==1 and separate instances are used for different events, but multichannel histograms will work as expected.
labeledevents - The data are an NxM array with M >= 2. The first column is interpreted as for the "events" type, but need not be sequential. The second column is the "label" of the event (e.g. the id of the unit that generated a spike). Optional third and further columns may provide additional label information (intensity, the stimulus that elicited the event, etc.). The Data element will handle these "multilabeledevts" correctly (although the shape method will still return the number of units based on the first label), but not all of the extension functions in the dsp and datafiles modules are guaranteed to handle them correctly. Dataviewer will respect a third column, which is coded as the color of the event markers, but ignores subsequent columns.
function - The data are a 2D array, similar to a timeseries, but the values of the independent variable are explicitly listed in the first column. These must be unique and monotonic. The datafiles.dataset.resample function can convert "function" type data to timeseries. (Dataviewer will automatically resample a function if you try to display one.)
locus - The data are an NxM array representing N points in R^M (M-dimensional space). As a result, the values in any column may be non-monotonic and non-unique. (Note that the Dataviewer GUI can't deal with locus data. It will try to spawn a simple graph to display 2D or 3D loci; higher dimensional loci can't be displayed at all.)
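The ensemble shape rule amounts to a little arithmetic, sketched here as a hypothetical helper that is not part of MIEN. It only checks that every channel can own the same number of columns (Reps) and divides accordingly; it makes no assumption about how the repeats of each channel are ordered among the columns.

```python
def ensemble_shape(n_rows, n_cols, reps):
    """Hypothetical helper: the (samples, channels) shape of an ensemble.

    Every channel owns exactly `reps` columns, so the column count
    must be an exact multiple of reps.
    """
    if reps < 1 or n_cols % reps:
        raise ValueError("%d columns is not a multiple of Reps=%d" % (n_cols, reps))
    return (n_rows, n_cols // reps)


print(ensemble_shape(1000, 30, 10))   # (1000, 3): 3 channels, 10 repeats each
```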
You can go a long way toward writing extensions for neuroscience using only the "timeseries" and "events" (and maybe "labeledevents") types. "timeseries" is the most common type, and much of this tutorial will assume timeseries data. The function mien.datafiles.dataset.isSampledType will return 'e' for event-like sampled data types, 's' for timeseries-like types, and False for everything else. Any data with a true value of isSampledType will provide a sampling rate (DE.fs() or DE.attrib('SamplesPerSecond')). Most neuroscience extensions can ignore any data for which isSampledType is False.
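For sampled data, the sampling rate, together with the StartTime header key described in the headers section, is enough to turn row indexes into values of the independent variable. The two helpers below are hypothetical pure-Python illustrations, not functions in mien.datafiles.dataset; they assume the relation t = StartTime + index / SamplesPerSecond, applied to row numbers for timeseries-like ('s') data and to the stored index values for event-like ('e') data.

```python
def timeseries_times(n_samples, fs, start_time=0.0):
    """Hypothetical: independent-variable value of each row of timeseries data."""
    return [start_time + i / fs for i in range(n_samples)]


def event_times(indexes, fs, start_time=0.0):
    """Hypothetical: times of "events" data, whose values are sample-row indexes."""
    return [start_time + idx / fs for idx in indexes]


print(timeseries_times(4, fs=1000.0))                           # [0.0, 0.001, 0.002, 0.003]
print(event_times([0, 250, 1000], fs=1000.0, start_time=2.0))   # [2.0, 2.25, 3.0]
```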
The most common class of user actions for Data elements is to get and set data and subdata. These functions and methods will be helpful. Methods are written using python calling syntax, assuming that the Data element calling the method is named "DE". Functions are listed as module.with.function->functionName
DE.getData()
This method returns the Narray that carries the actual data for DE. Without arguments, this is similar to simply using the member DE.data, but it's usually better. In particular, some Data elements use "lazy" loading to load data from a remote URL or external file. In this case, getData() will correctly return an array, while DE.data will be None. getData() can still return None for "group" elements, for undefined data, or if loading the data fails. getData can also be called with the arguments (chans, range, copy). chans is a list or array of channels (columns) to select, range is a Python slice through samples (rows), and copy is a Boolean that, if True, makes getData return a copy of the data rather than a reference. copy is False by default, so changing the elements of an array returned by DE.getData() will modify the data in place.
DE.setData(dat)
Sets the data in DE to the array dat. You can also use setData(dat, chans, range) to set a subset. If you are changing the number of channels, the sample type, or anything else besides the values in the array, you will need to use datinit().
DE.datinit(data, header)
Sets the data in DE to the array data, and the XML attributes to the dictionary header. You can also use datinit(data, header, copy=True) to set the data to a copy of data (this prevents subsequent changes to DE from changing your saved reference to data). There are some constraints on legal values for data given the settings in header (see the section on data headers).
DE.getSubData(path)
Returns the Data element at the datapath indicated by path (see SubData and Data Paths). This returns None if there is no such path.
DE.createSubData(path)
This creates a subdata element at path. By default, it won't overwrite existing elements, so the path name may be modified to avoid an overwrite. The method returns the actual path name of the created element. You can also use createSubData(path, delete=True) to allow the overwrite of any existing element at that path and guarantee the creation of a new element with that exact path. createSubData(path, data, header) automatically initializes the new element with the provided data and header.
mien.nmpml.data->newData(data, header)
Returns a new element with the indicated data array and header.
mien.nmpml.data->newHeader(sampletype, samplingrate, labels, starttime)
Returns an appropriate header for a Data element with the indicated properties. All the properties attempt to take the "most common" default values, so, for example, newHeader(fs=10000) returns a header for a "timeseries" element with a sampling rate of 10000 Hz, 0.0 start time, and no channel labels.
mien.datafiles.dataset->getSelection(data, sel)
Returns the chunk of data or subdata from element data referenced by selection tuple sel. Selection tuples are used heavily by DSP functions. See selection tuples.
mien.datafiles.dataset->setSelection(data, value, sel)
Sets the chunk of data in element data referenced by tuple sel to value.
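The copy argument of DE.getData() is easy to get wrong. The sketch below is a pure-Python toy, not MIEN code: a hypothetical `ToyData` stores rows as nested lists instead of a numpy array, and names the range argument `rng` to avoid shadowing Python's built-in `range`. It shows why the default (copy=False) lets edits to the returned value change the element, while copy=True does not.

```python
import copy as copymod


class ToyData:
    """Hypothetical element: data are stored as a list of rows (samples)."""

    def __init__(self, rows):
        self.data = rows

    def getData(self, rng=None, copy=False):
        # rng plays the role of the "range" argument: a slice through the rows.
        rows = self.data if rng is None else self.data[rng]
        # copy=False hands back the stored rows themselves, so edits leak into the element.
        return copymod.deepcopy(rows) if copy else rows


de = ToyData([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])

ref = de.getData()              # default: a reference to the stored data
ref[0][0] = 99.0
print(de.data[0][0])            # 99.0, the element itself changed

safe = de.getData(copy=True)    # an independent copy
safe[1][1] = -1.0
print(de.data[1][1])            # 3.0, the element is untouched

print(de.getData(rng=slice(1, 3)))   # [[99.0, 1.0] is excluded] -> [[2.0, 3.0], [4.0, 5.0]]
```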
Data headers specify the XML attributes of a Data element. They can be represented with python dictionaries, xml attributes, or members of a Data instance. The usual MIEN rules for converting python attributes to XML attributes apply to elements of a Data header when a file is saved.
Headers are somewhat unusual in that certain Data sample types require particular header keys to be set to particular types of value, or the Data element can't be correctly interpreted. Attempting to call "datinit" with an invalid data, header pair will raise an error.
Not every XML attribute of a Data element is part of the data header. Saving a Data element in a non-xml format may destroy most of the xml attributes, but will usually preserve the data header.
The most common keys in the data header are:
SampleType
This tag is always required; it is discussed in the section on sample types above.
Labels
This tag is usually optional. If specified, it should be an array of strings, and provides human-readable names for different channels (columns) in the data. It is most common in timeseries data. Some sample types (ensemble, labeledevents) can have a different number of "channels" (and thus Labels) than they have actual columns in the data Narray.
StartTime
This is a floating point number associated with all "sampled" sample types (timeseries, ensemble, histogram, events, labeledevents), specifying the value of the independent variable associated with the first sample in the data.
SamplesPerSecond
This is the sampling rate (in Hz if the independent variable is in seconds) of sampled data. It is required for all sampled types.
Reps
This is a property unique to "ensemble" type data, and it must be specified for that type. It is an integer indicating the number of repeats (or events, instances, or whatever) in the ensemble.
The method DE.header() returns a Python dictionary of the whole data header. The method DE.attrib(name) returns the value of the header element name.
DE.setAttrib(name, value) is used to set individual header elements.
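The header requirements above can be summarized as a small check. This is a hypothetical pure-Python sketch, not MIEN's validation code: `new_header` imitates the defaults described for mien.nmpml.data->newHeader (timeseries, 0.0 start time, no labels), and `check_header` only tests the key rules listed here (SampleType always, SamplesPerSecond for sampled types, an integer Reps for ensembles). It does not check the data array against the header, which the real datinit also does.

```python
SAMPLED_TYPES = {"timeseries", "ensemble", "histogram", "events", "labeledevents"}


def new_header(stype="timeseries", fs=None, labels=None, start=0.0):
    """Hypothetical header builder using the defaults described for newHeader."""
    header = {"SampleType": stype}
    if stype in SAMPLED_TYPES:
        header["StartTime"] = float(start)
        if fs is not None:
            header["SamplesPerSecond"] = float(fs)
    if labels:
        header["Labels"] = list(labels)
    return header


def check_header(header):
    """Hypothetical check of the key requirements; returns a list of problems."""
    problems = []
    stype = header.get("SampleType")
    if stype is None:
        problems.append("SampleType is always required")
    if stype in SAMPLED_TYPES and "SamplesPerSecond" not in header:
        problems.append("sampled type %r needs SamplesPerSecond" % stype)
    if stype == "ensemble" and not isinstance(header.get("Reps"), int):
        problems.append("ensemble needs an integer Reps")
    return problems


h = new_header(fs=10000)
print(h)                 # {'SampleType': 'timeseries', 'StartTime': 0.0, 'SamplesPerSecond': 10000.0}
print(check_header(h))   # []
print(check_header(new_header("ensemble", fs=1000)))   # ['ensemble needs an integer Reps']
```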
All nmpml objects can be nested, and can access sub-elements and container elements using the NMPML API, but the requirements of DSP functions motivate a separate, simpler system of sub-element access for data elements nested within other data elements. The only downside to this choice is that there are a confusing number of paths for a subdata element. Each element has an "xpath" (defined by the XPath specification), a "upath" (defined by MIEN nmpml), and a "dpath" (or Data Path, defined by class Data). This section will only talk about dpaths.
A Data element has dpath="/" if its container is not also a data element. Elements that are directly nested in an element with path "/" have path "/name" where name is the nmpml "name" attribute of the child. Deeply nested elements have additional path components reflecting each level of nesting, so "/physiology/filtered/ftime" refers to an element named "ftime" with parent "filtered" that has parent "physiology" that has some Data parent (with whatever name), which has a parent that is not a Data element.
A collection of Data elements that are all nested together, and all share the same set of dpaths, is often referred to as a data set or workspace. Class Data formally refers to this collection as a data hierarchy.
The methods DE.getSubData(path) and DE.createSubData(path [,data] [,header] [,delete]) are used to manipulate sub-elements using dpaths. The method DE.getHierarchy() returns a dictionary that maps every dpath in the hierarchy to the corresponding element.
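The dpath rules can be illustrated by building the same kind of mapping that getHierarchy returns. This is a minimal sketch assuming a hypothetical `Node` class (a name plus a list of nested Data children), not MIEN's implementation. The top element always gets "/" regardless of its name, and each level of nesting appends "/name".

```python
class Node:
    """Hypothetical Data element: a name and a list of nested Data children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def hierarchy(top):
    """Map every dpath under `top` (whose own dpath is '/') to its element."""
    paths = {"/": top}
    stack = [("", child) for child in top.children]
    while stack:
        parent_path, node = stack.pop()
        path = parent_path + "/" + node.name
        paths[path] = node
        stack.extend((path, child) for child in node.children)
    return paths


top = Node("anything",
           Node("physiology", Node("filtered", Node("ftime"))),
           Node("stimulus"))
print(sorted(hierarchy(top)))
# ['/', '/physiology', '/physiology/filtered', '/physiology/filtered/ftime', '/stimulus']
```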
DSP functions frequently need to reference a chunk of data that is somewhere in a data hierarchy, but may not be in the top level element, and may not contain all the channels or samples of a particular element. Selection tuples are a type of label that can uniquely specify any contiguous chunk of data in a hierarchy.
Selection tuples have three components. The most complete form of a selection tuple is (dpath, channels, range). Dpath is a string (see SubData and Data Paths), channels is a list of integers, and range is a 2-tuple or slice object.
The tuple specifies, respectively, which subdata element, which channels (columns) in that element, and which range of samples (rows). The values have sensible defaults, so any one of them can be set to None. The tuple (None, None, None) selects all samples of all channels of the toplevel element (dpath='/').
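The defaults of a selection tuple can be sketched with a toy version of getSelection. This is hypothetical pure Python, not the mien.datafiles.dataset implementation: the workspace is a plain dict mapping dpaths to lists of rows, and a 2-tuple range is assumed to mean (start, stop) with stop excluded, like a Python slice. Event and labeledevent subtleties are ignored entirely.

```python
def get_selection(elements, sel):
    """Hypothetical getSelection over a dict mapping dpath -> list of rows."""
    dpath, chans, rng = sel
    rows = elements["/" if dpath is None else dpath]   # None means the top level
    if chans is None:
        chans = range(len(rows[0])) if rows else []    # None means every channel
    if rng is None:
        rng = slice(None)                              # None means every sample
    elif isinstance(rng, tuple):
        rng = slice(*rng)                              # assumption: (start, stop), stop excluded
    return [[row[c] for c in chans] for row in rows[rng]]


workspace = {
    "/": [[1, 10], [2, 20], [3, 30]],
    "/filtered": [[0.5], [0.6], [0.7]],
}
print(get_selection(workspace, (None, None, None)))                 # [[1, 10], [2, 20], [3, 30]]
print(get_selection(workspace, ("/", [1], (0, 2))))                 # [[10], [20]]
print(get_selection(workspace, ("/filtered", None, slice(2, 3))))   # [[0.7]]
```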
Module mien.datafiles.dataset provides a number of functions for dealing with selections, including getSelection, setSelection, and getSelectionHeader.
There are some subtleties in using selection tuples on event and labeledevent data. In general, the selection functions should do "what you want", but you may want to read the module documentation or even the code if you run into strange behavior.
Note that XML attributes are slightly different from Python object attributes. For a data element named DE, the XML attributes are stored in a dictionary named DE.attributes. "attributes" itself is an attribute of the object DE in the Python sense, but in the XML sense, only the keys stored inside this dictionary are attributes of the XML tag that is being represented by DE.
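The distinction can be seen by serializing a toy element with the standard library's xml.etree.ElementTree. The `ToyData` class and its `to_xml` method are hypothetical, and MIEN has its own rules for converting Python values to XML attribute strings; this sketch just uses str(). Only the keys of the `attributes` dictionary become XML attributes, while the Python attribute `data` does not appear in the tag.

```python
import xml.etree.ElementTree as ET


class ToyData:
    """Hypothetical element: only the `attributes` dict becomes XML attributes."""

    def __init__(self, attributes, data):
        self.attributes = attributes   # XML attributes of the <Data> tag
        self.data = data               # a Python attribute, not an XML one

    def to_xml(self):
        # ElementTree attribute values must be strings, so convert each one.
        attrib = {k: str(v) for k, v in self.attributes.items()}
        return ET.tostring(ET.Element("Data", attrib), encoding="unicode")


de = ToyData({"SampleType": "events", "SamplesPerSecond": 1000.0}, [[12], [40]])
print(de.to_xml())   # <Data SampleType="events" SamplesPerSecond="1000.0" />
```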