Friday 15 August 2014

SWIG, Python and opaque structs

One of my open source projects is python-gphoto2, a Python interface to the popular libgphoto2 digital camera software library. It uses SWIG (Simplified Wrapper and Interface Generator) to generate the Python bindings from the library's header files, saving me the trouble of hand-crafting a Python interface to every C function in the library.

SWIG makes it quite easy to produce a "low level" Python interface to a C (or C++) library, but the result can be quite tricky to use. Problems often occur with object creation and deletion, which is something the average Python programmer doesn't normally have to think about. This blog post describes how I've tackled these problems in python-gphoto2.

The libgphoto2 C library makes a lot of use of "opaque" structs. These have their contents hidden from the library user, who only sees them as pointers to an unknown type. This is a classic example of good interface design, and fits well with Python, where every name is a reference to an object anyway.

A typical header file for the "Foo" object might look like this:

typedef struct _Foo Foo;

int foo_new(Foo **obj);
int foo_free(Foo *obj);

int foo_get_value_a(Foo *obj, int *result);
int foo_set_value_a(Foo *obj, int value);

int foo_get_value_b(Foo *obj, char **result);
int foo_set_value_b(Foo *obj, char *value);
"Foo" objects are created by foo_new and deleted by foo_free. Data stored in the Foo object can only be accessed through functions such as foo_get_value_a and foo_set_value_b. All the functions return an int to indicate success or failure.

The first challenge when writing a SWIG interface file for the Foo object is raised by foo_new, which returns its result in a pointer to a pointer. SWIG's "typemap" system is well suited to dealing with this, as in this interface file:
%module Foo

%{
#include "foo.h"
%}

%include "typemaps.i"

%typemap(in, numinputs=0) Foo ** (Foo *temp) {
  $1 = &temp;
}
%typemap(argout) Foo ** {
  if (!PyList_Check($result)) {
    PyObject* temp = $result;
    $result = PyList_New(1);
    PyList_SetItem($result, 0, temp);
  }
  PyObject* temp = SWIG_NewPointerObj(*$1, SWIGTYPE_p__Foo, 0);
  PyList_Append($result, temp);
  Py_DECREF(temp);
}

%include "foo.h"
The in typemap tells SWIG that any function in foo.h that has a Foo ** parameter does not require an input value. The argout typemap is a bit more complicated. It is also applied to any function that has a Foo ** parameter. The first five lines convert the existing function result to a Python list, if it isn't already a list. The next three lines create a new Python Foo * pointer from the function's output and append it to the result list. The upshot of all this is that Foo objects can then be used from Python as follows:
import Foo

OK, f = Foo.foo_new()
...
OK = Foo.foo_set_value_b(f, 'Hello world!')
...
OK = Foo.foo_free(f)
This is a usable Python binding to the Foo object, but it has three nasty surprises in store for the casual Python programmer.
  1. Forgetting to call foo_free will lead to a memory leak.
  2. Attempting to use the Foo object f after calling foo_free will cause a segmentation fault.
  3. Calling foo_free twice on the same Foo object will cause a "double free or corruption" error.
Whilst the first of these will probably go unnoticed, the other two will produce abrupt termination with an unhelpful error message. Not what one expects from a Python program.
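
The three failure modes above can be imitated in plain Python. This is only a sketch: FakeFooLib is a hypothetical stand-in for the compiled Foo module, 0 is assumed to be the success code, and the model raises RuntimeError at the points where the real C library would crash.

```python
# Pure-Python stand-in for the C library, for illustration only.
# FakeFooLib and its methods are hypothetical; a real C library would
# crash or corrupt memory where this model raises RuntimeError.
class FakeFooLib:
    def __init__(self):
        self._next_handle = 1
        self.live = set()          # handles allocated and not yet freed

    def foo_new(self):
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return [0, handle]         # [status, object], like the typemap result

    def foo_set_value_a(self, handle, value):
        if handle not in self.live:
            raise RuntimeError("use after free")      # problem 2
        return 0

    def foo_free(self, handle):
        if handle not in self.live:
            raise RuntimeError("double free")         # problem 3
        self.live.remove(handle)
        return 0


lib = FakeFooLib()
OK, leaked = lib.foo_new()         # never freed: problem 1, a silent leak
OK, f = lib.foo_new()
OK = lib.foo_free(f)

errors = []
for misuse in (lambda: lib.foo_set_value_a(f, 1), lambda: lib.foo_free(f)):
    try:
        misuse()
    except RuntimeError as err:
        errors.append(str(err))

print(errors)      # the two misuse cases caught by the model
print(lib.live)    # the leaked handle is still allocated
```

The leak produces no error at all, which is why it tends to go unnoticed, while the other two cases fail immediately.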

These problems can be partly cured if SWIG knows that foo_new is an object constructor and foo_free is an object destructor. This can be done quite easily:
%module Foo

%{
#include "foo.h"
%}

%include "typemaps.i"

%typemap(in, numinputs=0) Foo ** (Foo *temp) {
  $1 = &temp;
}
%typemap(argout) Foo ** {
  if (!PyList_Check($result)) {
    PyObject* temp = $result;
    $result = PyList_New(1);
    PyList_SetItem($result, 0, temp);
  }
  PyObject* temp = SWIG_NewPointerObj(*$1, SWIGTYPE_p__Foo, SWIG_POINTER_NEW);
  PyList_Append($result, temp);
  Py_DECREF(temp);
}
%delobject foo_free;

%include "foo.h"
As a result of passing SWIG_POINTER_NEW to SWIG_NewPointerObj we now get a "possible memory leak" warning if a Python program fails to call foo_free. The %delobject directive makes it safe to call foo_free twice. (See the SWIG documentation for more detail.) However, the Python programmer is still responsible for calling foo_free shortly before deleting the corresponding Python object.
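
The ownership bookkeeping behind this can be pictured with a small pure-Python model. It is a simplification under stated assumptions: PointerProxy, discard and warnings_from are hypothetical names, the warning text only paraphrases SWIG's message, 0 is assumed to be the success code, and the real proxy object is implemented in C by SWIG's Python runtime.

```python
import warnings

# Simplified, hypothetical model of a SWIG pointer proxy; the real object
# is implemented in C by SWIG's Python runtime.
class PointerProxy:
    def __init__(self, address, own):
        self.address = address
        self.own = own             # True when created with SWIG_POINTER_NEW

    def discard(self):
        """Stand-in for Python deleting the proxy object."""
        if self.own:
            # No destructor is known yet, so the model can only complain.
            warnings.warn("possible memory leak of Foo *")


def foo_free(proxy):
    """Stand-in for the wrapped foo_free with %delobject applied."""
    proxy.own = False              # the proxy gives up ownership
    return 0                       # assumed success code


def warnings_from(action):
    """Run action and return the text of any warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
    return [str(w.message) for w in caught]


forgotten = PointerProxy(0x1000, own=True)
freed = PointerProxy(0x2000, own=True)
foo_free(freed)

print(warnings_from(forgotten.discard))   # one leak warning
print(warnings_from(freed.discard))       # no warnings
```

Note that nothing in this model actually releases memory when the proxy is discarded; it only reports the problem, which matches the situation at this stage of the interface file.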

SWIG makes it possible to add functions to a C structure with the %extend directive, which is often used to add a destructor. My first attempts to use this failed, typically with a SWIG warning "%extend defined for an undeclared class _Foo". The solution turned out to be to declare an empty struct with the appropriate name: struct _Foo {};. Because the interface hides the detail of the struct we're using, it apparently doesn't matter if it has no contents at all!

This "solution" generates yet another problem. We now have a partially defined struct for which SWIG wants to generate its own constructor and destructor. (It does this for all structs by default.) The details of the struct are hidden, so compiling the generated wrapper fails with this error: "invalid application of ‘sizeof’ to incomplete type ‘struct _Foo’". To prevent this error we can tell SWIG to ignore the struct with the %ignore directive.

The final SWIG interface file is as follows:
%module Foo

%{
#include "foo.h"
%}

%include "typemaps.i"

%typemap(in, numinputs=0) Foo ** (Foo *temp) {
  $1 = &temp;
}
%typemap(argout) Foo ** {
  if (!PyList_Check($result)) {
    PyObject* temp = $result;
    $result = PyList_New(1);
    PyList_SetItem($result, 0, temp);
  }
  PyObject* temp = SWIG_NewPointerObj(*$1, SWIGTYPE_p__Foo, SWIG_POINTER_NEW);
  PyList_Append($result, temp);
  Py_DECREF(temp);
}
%delobject foo_free;
struct _Foo {};
%extend _Foo {
  ~_Foo() {
    foo_free($self);
  }
};
%ignore _Foo;

%include "foo.h"
At last SWIG has all the information it needs. It knows _Foo is a struct, but it knows nothing of its contents. It knows foo_new creates Foo objects and it knows foo_free deletes Foo objects. Finally it puts the pieces together and invokes foo_free automatically when the Python Foo object is deleted.
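
The end result, as seen from Python, behaves roughly like the following model. It is a sketch rather than the code SWIG generates: FooProxy and this foo_free are hypothetical pure-Python stand-ins, and 0 is assumed to be the success code.

```python
import gc

freed = []                          # records which handles were released


def foo_free(handle):
    """Hypothetical stand-in for the C function foo_free."""
    freed.append(handle)
    return 0                        # assumed success code


class FooProxy:
    """Simplified model of the Python object that wraps a Foo pointer."""

    def __init__(self, handle):
        self.handle = handle

    def __del__(self):
        # Mirrors the ~_Foo() destructor added with %extend.
        foo_free(self.handle)


f = FooProxy(1)
del f            # last reference gone, so the destructor runs
gc.collect()     # only matters on interpreters without reference counting
print(freed)
```

The Python programmer no longer calls foo_free at all; dropping the last reference to the object is enough.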