MIEN Extension Writing HowTo

Before reading this document, you should read the overview of the extension architecture, located in the Extensions Overview.

It is assumed that you know some basic things about Python. You don't need to know much, but you should be able to construct basic Python types, like a tuple of 3 strings or a function that takes 2 arguments. If you need help with this, there is extensive online documentation for Python.

I'll further assume that you are using some Posix operating system (OS X, Unix, Linux, BSD etc), or have at least installed cygwin or mingw on your Windows box. If you're writing extensions in Windows (particularly without cygwin or mingw) you're tackling a challenge. You should probably be a Windows guru who won't need any help with compiler setup, file management, environment variables and such. Extensions (especially compiled or mixed-language extensions) are much harder to write and use on Windows.

As described in the overview, mien blocks are implemented as python packages. To write extensions in other languages, you will still need to write a python package that wraps your external code. This tutorial will assume that your extension package is pure python code. Later tutorials will handle writing mixed-language extensions.

The first step to creating a mien extension is to generate a package, and place it in the directory referenced by MIEN_EXTENSION_DIR.

To generate a package, simply create a new directory, and put a file named __init__.py in it. To create a package named "newext", provided that $MIEN_EXTENSION_DIR = ~/mienblocks, use the following:

mkdir ~/mienblocks/newext
touch ~/mienblocks/newext/__init__.py

That's it! You have a python package. Currently it doesn't do anything, but we'll get to that. The rest of this tutorial assumes you are working on this package, and will refer to the directory $MIEN_EXTENSION_DIR/newext as simply "newext".
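If you would rather create the package from Python than from the shell, the same two steps can be sketched as follows. This is only an illustration: the helper name create_extension_package is made up for this tutorial and is not part of Mien.

```python
import os

def create_extension_package(ext_dir, name):
	"""Create ext_dir/name with an empty __init__.py and return the package path."""
	pkg_dir = os.path.join(os.path.expanduser(ext_dir), name)
	if not os.path.isdir(pkg_dir):
		os.makedirs(pkg_dir)
	init_path = os.path.join(pkg_dir, "__init__.py")
	if not os.path.exists(init_path):
		# Equivalent of "touch": create the file without writing anything to it.
		open(init_path, "a").close()
	return pkg_dir
```

Calling create_extension_package("~/mienblocks", "newext") should leave you with the same directory and empty __init__.py as the shell commands.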

To make your package do something in python, you need to add some python modules (other than __init__.py) to newext that define functions or classes. The majority of the tutorial will cover how to write these modules so that they provide useful mien extensions.

In order to get Mien to use the package as an extension, however, you will also need to provide an index to the mien blocks provided by the package. You do this by editing newext/__init__.py. You will need a working index in order to test your blocks as you build them, so the tutorial will first describe building the index.

Obviously, there's a bit of a bootstrapping problem here, in that you can't make a useful index without having something to index, so first we'll add a simple block. Don't worry too much about the implementation of the block yet.

Create a file newext/dataStats.py, and put the following code in it:

def printDataShape(ds):
	# Print the dimensions of the active Data instance to standard output.
	print(ds.shape())

This function (printDataShape, shown above) is a valid DSP type mien block. It will print the dimensions of the active Data instance to standard output. It is also automatically a member of newext, since it is in a .py file in the same directory as newext/__init__.py. Mien won't recognize it as a block though, since it is not reported in the extension index.
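You can try the function outside of Mien by handing it a stand-in object. The FakeData class below is hypothetical and exists only for this sketch; it assumes nothing about a real Data instance except that shape() returns the dimensions as a tuple.

```python
class FakeData(object):
	"""Hypothetical stand-in for a Mien Data instance; only shape() is provided."""
	def __init__(self, rows, cols):
		self._shape = (rows, cols)

	def shape(self):
		return self._shape

def printDataShape(ds):
	# The call is parenthesized so the line runs under both Python 2 and Python 3.
	print(ds.shape())

printDataShape(FakeData(1000, 2))
```

Because the block only calls ds.shape(), any object with that method is enough for a quick check before you index the block.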

PYTHON INDENTATION:

If you aren't used to writing Python, remember that Python delimits code blocks using indentation. The whitespace before the "print" line in printDataShape is required for proper execution. This can make Python tricky to cut and paste, especially from HTML pages. (There's much debate about whether this behavior is a good thing. I don't much like it, but I love Python despite it, and so Mien is written in Python. If you're going to deal with mien, and thus with Python, you will need to make peace with significant whitespace.)

Also, there is some disagreement in the python community about whether to use spaces or tabs to indent code. Python calculates the indent level of a line from its leading whitespace (spaces or tabs), and it does not necessarily interpret a tab the way your editor displays it. One thing every (sane) pythoneer can agree on is that code should never use mixed indentation (both spaces and tabs in the same file), since this can lead to the horribly frustrating situation where two lines line up visually in an editor but in fact have different indent levels.

The Mien core code always uses single Tab indenting (one Tab character per indent level). It is therefore a reasonably good idea to use single tab indenting in mien blocks. This way if you cut and paste code snippets from a mien core module, you won't accidentally generate mixed-indented code. If everyone who writes Mien extensions sticks to the single tab convention then it will be easier to share code and maintain other people's code.
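A quick way to find out whether a file has picked up stray space indentation is to classify the leading whitespace of every line. The function below is a minimal sketch written for this tutorial (the name indentation_report is not part of Mien or Python). It looks only at leading whitespace, so lines inside multi-line strings may be reported too.

```python
def indentation_report(source):
	"""Return (tab_lines, space_lines, mixed_lines) as lists of 1-based line numbers."""
	tab_lines, space_lines, mixed_lines = [], [], []
	for number, line in enumerate(source.splitlines(), 1):
		stripped = line.lstrip(" \t")
		if not stripped:
			continue  # blank lines carry no indentation information
		indent = line[:len(line) - len(stripped)]
		if not indent:
			continue  # top-level line, nothing to classify
		if " " in indent and "\t" in indent:
			mixed_lines.append(number)
		elif "\t" in indent:
			tab_lines.append(number)
		else:
			space_lines.append(number)
	return tab_lines, space_lines, mixed_lines

# Line 3 of this sample is indented with spaces, the others with tabs.
sample = "def f(x):\n\tif x:\n        return 1\n\treturn 0\n"
tabs, spaces, mixed = indentation_report(sample)
```

For a file that follows the single tab convention, the second and third lists should both come back empty. Python's standard library also ships the tabnanny module (run as "python -m tabnanny yourfile.py"), which checks for ambiguous indentation.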

Creating the Package and Package index

At startup (and whenever user extensions are reloaded), Mien builds an index of the contents of each extension package by reading a set of module attributes from the __init__.py file of the package. These attributes have the same names as the mien block types (DSP, SPATIAL, DV, CV, ME, MECM, PARSERS, NMPML). If a given attribute isn't defined, this doesn't cause an error, but Mien assumes that no blocks of the corresponding type are found in that package. Since newext/__init__.py was created as an empty file, all the mien identifiers are undefined, so Mien assumes that no blocks are present in the package. Since we added a DSP block, we should index it. That can be done by adding the following one line to newext/__init__.py:

DSP=[('dataStats', 'printDataShape', 'Print data shape')]

This is the most general and complete syntax for declaring a block index entry. For a simple module like dataStats, there are several other ways to write the index, but this way will work for any possible block. DSP and all the other index attributes are usually set to Python lists, as in this case. There is only one exception, which is that they may also be set to the string "ALL". If some identifier is set to "ALL" then Mien will search the entire package, recursively (except for file names beginning with "." or "_"), and register every function that is defined in each module as a block of the indicated type (unless the name of the function begins with "_") (footnote 1). This means that no other block types can be defined in the whole package, and all blocks must be functions (not classes, dictionaries, or tuples, which prevents the use of ALL indexing for NMPML, MECM and PARSER block types). It also means that the extensions will be registered with automatically generated names, which may be ugly or confusing. However, if the whole purpose of a large extension package is to provide, e.g., several hundred DSP blocks, this method can greatly simplify indexing. It also means that the index doesn't need to be updated when new blocks (of the correct type) are added to the package.
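To make the selection rule concrete, here is a rough sketch of how the functions of one module could be filtered in the way described: keep functions, skip names beginning with "_", and skip anything that was imported from elsewhere (see footnote 1). This is not Mien's actual code; public_functions is a name invented for this example, and the recursive walk over the package is left out.

```python
import inspect
import types

def public_functions(module):
	"""Names of functions defined in module itself whose names do not start with "_"."""
	names = []
	for name, obj in vars(module).items():
		if name.startswith("_"):
			continue  # private helpers are never registered
		if not inspect.isfunction(obj):
			continue  # classes, constants, and submodules are ignored
		if obj.__module__ != module.__name__:
			continue  # imported from elsewhere, e.g. "from os.path import join"
		names.append(name)
	return sorted(names)

# Build a throwaway module in memory to stand in for newext.dataStats.
demo = types.ModuleType("dataStats")
source = (
	"from os.path import join\n"
	"def printDataShape(ds):\n"
	"\tprint(ds.shape())\n"
	"def _helper(ds):\n"
	"\treturn ds\n"
)
exec(source, demo.__dict__)
found = public_functions(demo)
```

In this sketch only printDataShape survives the filter: join is rejected because it was defined in another module, and _helper because of its leading underscore.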

So far we have only one block in package newext. It's of type DSP, and its entry point is a function. We could have written the index as:

DSP="ALL"

We might need to change this if we add any more blocks to newext in the future, however.

Provided the index identifier is a list, then the elements of the list may be either tuples (footnote 2) or strings. The most complete form is to use a tuple. The string form is a shortcut. If an entry in an index list is a string (or unicode object), then it needs to be the name of a module within the package, and it indicates that all functions in that module (unless their names begin with "_") should be automatically registered as blocks of the indicated type. Again this means that only functions can be registered, and names will be auto-generated, but it is more flexible than using "ALL".

For DSP and SPATIAL functions, this is generally the preferred indexing method. We can decide that all the blocks in newext.dataStats will be simple DSP functions, and write our index as:

DSP=['dataStats']

Now we won't need to change the index if we add a few more DSP functions to dataStats, and if we want to add some SPATIAL functions or a PARSER to newext, we will simply have to add a new module. This method is the standard for DSP and SPATIAL blocks.
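As an illustration of how such an index might grow, the sketch below shows a newext/__init__.py that keeps DSP and SPATIAL functions in separate modules. The module name spatialTools is hypothetical; it stands for whatever new module you add for SPATIAL functions.

```python
# newext/__init__.py (sketch)

# Every public function in newext/dataStats.py is registered as a DSP block
# with an automatically generated name.
DSP = ['dataStats']

# Hypothetical second module that holds only SPATIAL functions.
SPATIAL = ['spatialTools']
```

Keeping one block type per module is what lets the string form work: adding another function to either module needs no change to the index.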

Finally, if the index entry is a tuple, then it should contain strings, and it may contain either two or three of them. The first string is the name of a module. The second is the name of a module attribute within that module that is the entry point for a block. If it is specified, the third string is the name of the block that this entry point defines.

If the third string is omitted, Mien automatically generates a name for the block, using the fully qualified name of the entry point. For example, if we set our index for newext using "DSP=[('dataStats', 'printDataShape')]", our block would be assigned the name "newext.dataStats.printDataShape". This name would then be used by the Mien GUIs, and written into the "function" attributes of MienBlocks elements to specify your function. In the case of the MienBlocks, the automatic name is informative, unique, and machine evaluatable, so it is usually better than a user-defined name. In addition, Dataviewer and Cellviewer use the "." structure in automatic names to build nested menu items in their DSP and SPATIAL menus, so it's usually a good idea to let Mien assign automatic names to DSP and SPATIAL functions. If you primarily intend your function for GUI use, however, you may want to assign a more human readable name, or at least choose the name of your entry point carefully. ME, DV, and CV blocks commonly have explicitly assigned names. NMPML and Parser blocks must assign a name explicitly.
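The naming rule for tuple entries can be written down in a few lines. The function below is a sketch of the rule as described here, not a copy of Mien's implementation, and the name resolve_entry is invented for this example.

```python
def resolve_entry(package, entry):
	"""Return (module, entry_point, block_name) for a two- or three-string index entry."""
	if len(entry) == 3:
		module, attr, name = entry
	elif len(entry) == 2:
		module, attr = entry
		# No explicit name given: fall back to the fully qualified entry point.
		name = ".".join([package, module, attr])
	else:
		raise ValueError("index entries need two or three strings: %r" % (entry,))
	return module, attr, name

auto = resolve_entry("newext", ("dataStats", "printDataShape"))
explicit = resolve_entry("newext", ("dataStats", "printDataShape", "Print data shape"))
```

Here auto carries the generated name "newext.dataStats.printDataShape", while explicit keeps the human readable name "Print data shape".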

An important note about block names:

They need to be unique. Mien stores blocks of each type in a dictionary keyed by the block names. Since dictionaries can only have one copy of any given key, if you have two blocks with identical names, only one of them will be available within Mien (usually the one that is defined in the extension package whose name would come last in a string sort). Some built-in mien capabilities are also provided as blocks (particularly DSP and SPATIAL blocks) and can be overwritten by user extensions of the same name. Mien's automatic names are always unique, so it's often a good idea to use them if you have a large number of similar DSP or SPATIAL blocks. An alternative is to use your name or some other uncommon string in your explicitly defined names. For example, it is not unlikely that if I write a block named "Spike Sorter" that it will someday end up in an extensions directory with someone else's block that is also named "Spike Sorter". If I had named it "Graham Cummins's Spike Sorter" this is much less likely.
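The effect of a name clash is easy to reproduce with a plain dictionary. The sketch below assumes that packages are processed in sorted order, which matches the "last in a string sort" behavior described above but is otherwise a guess at the details. The package names, entry point strings, and the build_registry helper are all hypothetical.

```python
def build_registry(index):
	"""index maps a package name to a list of (block_name, entry_point) pairs."""
	registry = {}
	for package in sorted(index):
		for name, entry_point in index[package]:
			# A repeated name silently replaces the block registered earlier.
			registry[name] = entry_point
	return registry

index = {
	"alpha_ext": [("Spike Sorter", "alpha_ext.sorting.sort")],
	"zeta_ext": [("Spike Sorter", "zeta_ext.spikes.sorter")],
}
registry = build_registry(index)
winner = registry["Spike Sorter"]
```

Only one "Spike Sorter" survives, and in this sketch it is the one from zeta_ext. The alpha_ext block is still importable as Python code, but it is no longer reachable by its block name.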

Now you should be equipped to build an index for your new blocks when you add them. The rest of the tutorial will usually focus on building blocks, and assume that, after you build them, you will index them in some way you like. There are a few extra complexities involved in indexing PARSER and NMPML blocks, however, which are presented in the appropriate sections.

Directory structure

As long as you register all your blocks correctly in the package index, it is possible to make a package with any subdirectory structure that will still work as a Mien extension block. However, I use two conventions for directory structure that will make it easier for others to understand and use your blocks.

First, if your blocks require a binary Python extension (e.g. written in C), the source code for the compiled language components should be placed in a subdirectory named "bin", along with a "setup.py" file that can build the extensions. The resulting compiled shared libraries should be placed in the top level of the extension package.

For example: Suppose you have an extension "myblocks" in the directory "~/mienblocks", with an associated C file. The C file should be placed in something like "~/mienblocks/myblocks/bin/mycfuncs.c". More importantly, the following command:

python ~/mienblocks/myblocks/bin/setup.py \
    --install-lib=~/mienblocks/myblocks/

should build and install your compiled libraries.

Secondly, keep block package directory structures simple when possible. Blocks are small special purpose projects by nature, and shouldn't need deeply-nested directory structures, and flatter structures are often easier to understand. You shouldn't need more than one level of nesting. I only ever use nested directories at all for "bin" as described above.

Getting Documentation

Writing any extensions requires some understanding of Python code, the numpy extension to python, and some of MIEN's built in behaviors.

A wide range of reasonably good Python language documentation can be found at the python web site. In particular, the Library Reference is an essential companion (at least for me) when writing Python.

The numpy extension provides fast numerical arrays for python, and MIEN uses it extensively. Most large blocks of numerical or anatomical data are stored in numpy arrays. Narrays act like some other Python containers (particularly lists, since they are mutable), but to manipulate them efficiently, you will need to understand methods and functions defined by the numpy package. Documentation for this package is harder to come by, because the official numpy guide is not free. If you can afford it, you may want to buy the guide and support Travis Oliphant's excellent work on scientific python. If not, you can try using the older numeric documentation and/or the numarray manual (pdf), which describe similar (but not identical) packages. For the impatient, I have summarized some of the most common numpy features on this site.

Each type of extension requires knowledge of different details of the MIEN API, so each is covered on its own HOWTO page.

footnote 1: Mien also checks to make sure that each function is actually defined in the package, not imported with "from" or "import ... as", so programmers can safely use "from numpy import *" in an extension module without worrying that all of the several hundred names defined by numpy will get registered as Mien blocks. The avoidance of modules and functions beginning with "_" in the indexing prevents indexing the __init__.py of subpackages, and is also useful for defining helper functions or helper modules that shouldn't be registered as blocks themselves.

footnote 2: Actually, Mien doesn't check that index entries are type tuple. If the entry is type str or unicode, it is treated as a string. Otherwise it is treated as a tuple. This should work fine if it is in fact a list, or some other ordered container (e.g. a numpy narray or PyObjects), but the convention is to use tuples.

 

Last edit: 05/29/09
