NumPy and SciPy for IronPython / .Net
The Python NumPy and SciPy packages have been ported to run under IronPython, Microsoft’s Python implementation for .NET. These packages implement a fast and flexible multi-dimensional array package (NumPy) and a large collection of scientific
and numerical algorithms built on top of the array (SciPy). Both packages consist of a heavily optimized native core implementation of the algorithms and a custom .NET interface for integrating the packages into IronPython.
The IronPython ports of NumPy and SciPy are full .NET ports and include custom C#/C interfaces to a common native C core. This means that the full functionality is available not only to IronPython but to all .NET languages such as C# or F# by directly accessing
the C# interface objects or sometimes by evaluating IronPython expressions from other .NET languages. This means that a multi-dimensional array object (ndarray) can be passed seamlessly between IronPython and C# or F# code. Further, the ndarray object
implements the standard IEnumerable interface, allowing the array object to often be used with existing code that is not specific to NumPy.
This release represents an early version of the architecture for the NumPy 2.0 release where there is a common, platform-independent core and multiple interface layers (currently CPython and IronPython/.NET). This version attempts to maintain full
compatibility with earlier versions of NumPy, and in general this goal has been achieved. However, some inconsistencies were necessarily as a result of the port and the re-architecting of the NumPy core. See ‘What You Should Know’ below
regarding this release.
The first part of this document provides a quick overview of the capabilities of NumPy and SciPy for those unfamiliar with the packages. These examples attempt to highlight a range of capabilities of these packages but are by no means complete.
Please see
http://numpy.org
and
http://scipy.org
for the complete documentation. The second part of the document describes the known limitations, incompatibilities, and future work for these packages.
Installing NumPy and SciPy on IronPython
NumPy and SciPy are bundled as pre-built “eggs” common to Python software distributions. The files include the Python source code and pre-built binary images in an easy-to-install package.
For download and installation instructions please see
http://www.enthought.com/repo/.iron/.
NumPy Examples
NumPy is the fundamental package needed for scientific computing with Python. It contains among other things:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. This allows NumPy to seamlessly and speedily integrate with a wide variety of data formats.
This section provides a brief tour of some of the capabilities of NumPy, particularly the array functionality.
Getting Started
All of the following examples assume that IronPython (2.7rc1 or later) and NumPy for .NET have been installed and the packages loaded into python (“import numpy as np”). Please see
http://www.enthought.com/repo/.iron/
for instructions on downloading and installing NumPy.
Indexing and Views
The NumPy package takes full advantage of the flexibility and expressiveness of the Python range syntax to provide powerful indexing capabilities. When a NumPy array is indexed, the package typically returns a view into the original data and avoids copying
it. This allows the selection of subsets of data to be performed very quickly, and modifications to be made using the much clearer syntax of the subset.
The following example creates a 10x10 2-d array and selects one row and one column from it. Both the row and col variables are views into the array so when col[2] is set to a new value, the original array is modified:
x = np.arange(100).reshape((10,10)) # Now have a 10x10 array
row = x[3] # Y is [30..39], view not copy
col = x[:,3] # Z is [3, 13, ... 39], not a copy
col[2] = 42
print "x[2, 3] = %d" % x[2, 3] # x[2,3] has been modified
Note, too, in the first line that the array is originally created as a 1-d, 100-element array and then a new shape is applied. NumPy provides powerful capabilities to “look at” the same data array in multiple ways as will be shown in later
examples.
Continuing the same example, the entire column can be modified. Now that the 3rd column in the array has been modified:
col += 100 # Modify third column of X
print "x = %s" % z # 0, 1, 2, 103, 4, 5, ...
# 10, 11, 12, 113, 14, 15, ...
NumPy fully supports the Python range syntax of start:step:increment. Using this syntax, we can select every third element of the array starting with the second:
x = np.arange(100)
y = x[1::3]
Again, this is simply a view into the original array so modifications to this array are reflected in the original. The view makes it much easier to operate on the selected subset of data. For example, if we wanted to multiply every third element
by 2:
y = x[1::3]
for i in range(len(y)): y[i] *= 2
Or more simply:
x[1::3] *= 2
There are two advantages to this approach. First, it avoids the off-by-one class of errors common in for-next loops in C by using the array indexing syntax. Second, in the single-line version, iteration over the array is performed in native code,
making it the fastest version, too.
Fancy Indexing
Going further, NumPy supports the use of ‘mask arrays’ as array indices. Mask arrays are arrays of Boolean values and when used as an array index, NumPy selects elements where the mask is true.
For example:
x = np.arange(100)
y = (x % 3) == 0 # y is bool array w/ true where (x%3 == 0)
z = x[y] # z is elements of X where Y is true.
In the second line, the result of applying the double equal operator to the result of the modulus operator is a Boolean array indicating each index where (x%3) is 0. Using ‘y’ as an index into ‘x’ produces an array containing
only the values that are even multiples of 3 (z). In this case it is important to note that ‘z’ is a
copy of x because there is no way to simply adjust the number of dimensions and strides to make a new view of ‘x’ since the mask array can be an arbitrary pattern. In this case, changes to ‘z’ will not be reflected in
‘x’.
A slightly more complex example uses fancy indexing to extract the subset of values that are perfect squares and passes the result as a C# function which can access it through the IEnumerable interface:
x = np.arange(100)
y = np.sqrt(x)
MyCSharpClass.DoSomething(x[y == np.trunc(y)])
C vs. Fortran Ordering and Interoperability
By default NumPy stores the data using a memory layout common to the C language (most rapidly changing index last) vs the standard Fortran memory layout (more rapidly changing index first). However, NumPy supports arrays with both layouts transparently and
makes it simple to convert between the two. Selecting Fortran memory layout makes it easier to integrate with legacy Fortran code:
# 10x10 2d array of ints, convert to Fortran order, floats
x = np.arange(100).reshape((10,10))
y = np.array(x, dtype=np.float32, order='Fortran')
# Pass IntPtr of array data to C# object
MyCSharpFortWrapr.DoSomething(y.UnsafeAddress)
Record arrays and Type Conversions
Record arrays are a variation on indexing in that they allow each element of the array to consist of multiple fields of different types. For example, the following array contains 10 elements consisting of a string, an integer, and a floating point value:
x = np.zeros(10, dtype=('S,i,f'))
Array fields can be accessed using the string ‘f0’, ‘f1’, … ‘fN’ or the fields can be explicitly named:
x = np.zeros(10, dtype=[('name', 'S'), ('quantity', 'i'), ('weight', 'f')])
Accessing each row of the record array returns the fields as a tuple:
name, qty, weight = np[0]
Alternately, each field can be accessed as a column and extracted into an array. For example, the weighted average weight of the inventory items:
avgWeight = np.sum(x['quantity'] * x['weight']) / np.sum(x['quantity'])
And similar to indexing earlier, referencing fields of an array produces a view so modifications to the view are reflected in the original. The weights can all be converted to from kg to lbs like this:
x['weight'] *= 2.206
One of the particularly powerful uses of record arrays is the ability to apply the record structure to an existing memory layout, such as one read from a file. Say we have a binary file that is a simple dump of an array of the following structure:
struct {
float latitude;
float longitude;
int16 elevation;
int16 surfaceType;
};
This file can be quickly loaded into a NumPy array for processing. For example:
x = np.fromfile('data.dat', dtype=[('lat', 'f4'), ('lng', 'f4'),('elev', 'i2'), ('surface', 'i2')])
maxElev = np.max(x['elev'])
Further, it is not uncommon to find data written using a machine architecture different than the current one, such as big- vs little-endian. By default NumPy uses the native machine format but can be instructed to use big-endian with the greater-than
symbol prefix (>) or little-endian with the less-than symbol prefix (<). The following code reads the above data file assuming it was written on a big-endian machine and extracts a copy of the elevation data into native-format 32-bit integers:
x = np.fromfile('data.dat', dtype=[('lat', '>f4'), ('lng', '>f4'), ('elev', '>i2'), ('surface', '>i2')])
elev = np.array(x['elev'], dtype='i4') # 'elev' is now a copy
As mentioned before, selecting a field out of an array produces a view so writes to the view are reflected back to the original. This is true even if the byte format is non-native; the byte swaps are handled transparently:
x['elev'][2] = 1500
Signal Processing, Linear Algebra, and Other Packages
A full description of the FFT, linear algebra (linalg), and other packages provided by NumPy are beyond the scope of this document and the reader is referred to
http://numpy.org
for complete documentation. However, a final short example will demonstrate how some of the above concepts can be combined with these packages.
The FFTPack module provides a range of fast Fourier transforms that operate on NumPy arrays. The following example reads data from a hypothetical data file containing signal data. The data is stored as a 32-bit int plus two 16-bit ints all in
big-endian format. The following code extracts the middle value and performs a basic, 1-d FFT transform on it. To make it useful without the hypothetical file, it also generates some synthetic data.
SciPy Overview
SciPy (pronounced "Sigh Pie") is large Python library for mathematics, science, and engineering. The SciPy library depends on
NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical
integration and optimization. NumPy and SciPy are easy to use, but powerful enough to be depended upon by some of the world's leading scientists and engineers. A full description of the SciPy library can be found at http://scipy.org.
Here is a brief look at some of the SciPy modules available on IronPython:
- Constants: a large collection of physical and numerical constant values
- FFTPack: Fourier and inverse Fourier transforms plus convolutions
- Linalg: Linear algebra library
- Special: Collection of special function such as Bessel and Elliptic functions
- Stats: Discrete and continuous statistics functions
import numpy as np
import numpy.fft as fft
x = np.fromfile('signal.dat', dtype=('>i4,>i2,>i2'))
# Copy, convert to native floats, rescale
sig = np.array(x['f1'], dtype=np.float32) / 32768
# Alternate: since no file is provided, we can make up some data:
data = [math.sin(np.pi / 8 * i) for i in range(320)]
noise = [np.random.rand() * 0.02 for i in range(320)]
sig = np.array(data, dtype='f')
sig += np.array(noise)
f = fft.fft(sig)
print "Strongest frequency = %d n" % np.argmax(np.abs(f))
What You Should Know
Some SciPy packages have not been ported yet - Many of the most important SciPy packages have been ported to IronPython but some packages are still in-progress. The following packages are included in this release: stats, special, linalg,
fftpack, constants, and misc. The remaining packages are in-progress and will be added as the ports are completed and passing regression tests.
Memory corruption in chararray package - In the NumPy chararray package (numpy.char), there is a memory corruption issue that has not yet been tracked down. This does not impact arrays of strings in general, just the numpy.char package
and surfaces in a couple of that package’s regression tests.
‘Bytes’ type is not recognized as a string type - In CPython strings can be both unicode and ASCII. However, in .NET all strings are unicode and the bytes type is represented as a byte[] type, which does not implement the
string interface. In most cases everything works as expected, but occasionally an error will be reported. In this case, it is necessary to manually convert to either a bytes or string type as needed. Because of this issue, numpy.compat.asbytes()
in IronPython returns a string type instead of a bytes type.
numpy.fromfile does not support file handles - In CPython numpy.fromfile accepts both file names (strings) and file handles. Because IronPython file handles are implemented as .NET streams and the NumPy file reading code is in the (native)
core, these streams can not be passed into the NumPy core. For now, only file names are accepted.
numpy.fromstring only support text strings - The CPython version of numpy.fromstring supports both text and binary strings. Because .NET strings are unicode, only text strings are currently supported.
Pickling from transposed and dis-continuous arrays is not implemented -
This functionality has not yet been implemented.
numpy._dotblas CBlas library has not yet been ported - This module must still be ported. The SciPy linear algebra package scipy.linalg can be used instead.
ndarray.resize() can not check for multiple references - IronPython uses a garbage collector instead of reference counting so there is no way for the resize function to check to see whether there are multiple references to the underlying
data. Using resize() on an array referenced by other objects is illegal and can cause memory issues. This function should only be used on a new copy of an array or avoided.
The __array_struct__ and __array_interface__ attributes are not implemented - These attributes provide methods for other objects to present an array interface to NumPy. If external classes implement these interfaces it will not cause a problem,
but the NumPy array on IronPython does not currently implement these interfaces. This breaks some regression tests in both NumPy and SciPy.
Buffer Support is weakly implemented - The current version of IronPython does not implement a full version of the buffer protocol in CPython. NumPy provides and implements the interface IExtBufferProtocol, but this is not a standard
part of IronPython.
Long double type is not supported - The .NET platform does not have support a long double type so it has not been implemented in this release.
[would you like to help port NumPy/SciPy to Mono? Let’s talk!]