Overview of the Pyrex Language

This document informally describes the extensions to the Python language made by Pyrex. Some day there will be a reference manual covering everything in more detail.
 

Contents

  • Python functions vs. C functions
  • C variable and type definitions
  • External declarations
  • Scope rules
  • Statements and expressions
  • Extension Types
  • Special Methods
  • Limitations

    Python functions vs. C functions

    There are two kinds of function definition in Pyrex:

    Python functions are defined using the def statement, as in Python. They take Python objects as parameters and return Python objects.

    C functions are defined using the new cdef statement. They take either Python objects or C values as parameters, and can return either Python objects or C values.

    Within a Pyrex module, Python functions and C functions can call each other freely, but only Python functions can be called from outside the module by interpreted Python code. So, any functions that you want to "export" from your Pyrex module must be declared as Python functions.

    Parameters of either type of function can be declared to have C data types, using normal C declaration syntax. For example,

    def spam(int i, char *s):
        ...
    cdef int eggs(unsigned long l, float f):
        ...
    When a parameter of a Python function is declared to have a C data type, it is passed in as a Python object and automatically converted to a C value, if possible. Automatic conversion is currently only possible for numeric types and string types; attempting to use any other type for the parameter of a Python function will result in a compile-time error.

    C functions, on the other hand, can have parameters of any type, since they're passed in directly using a normal C function call.

    If no type is specified for a parameter or return value, it is assumed to be a Python object. (Note that this is different from the C convention, where it would default to int.) For example, the following defines a C function that takes two Python objects as parameters and returns a Python object:

    cdef spamobjs(x, y):
        ...
    Reference counting for these objects is performed automatically according to the standard Python/C API rules (i.e. borrowed references are taken as parameters and a new reference is returned).

    The name object can also be used to explicitly declare something as a Python object. This can be useful if the name being declared would otherwise be taken as the name of a type, for example,

    cdef ftang(object int):
        ...
    declares a parameter called int which is a Python object.


    C variable and type definitions

    The cdef statement is also used to declare C variables, either local or module-level:
    cdef int i, j, k
    cdef float f, g[42], *h
    and C struct, union or enum types:
    cdef struct Grail:
        int age
        float volume
    cdef union Food:
        char *spam
        float *eggs
    cdef enum CheeseType:
        cheddar, edam, 
        camembert
    cdef enum CheeseState:
        hard = 1
        soft = 2
        runny = 3
    Note that the words struct, union and enum are used only when defining a type, not when referring to it. For example, to declare a variable pointing to a Grail you would write
    cdef Grail *gp
    and not
    cdef struct Grail *gp # WRONG

    External declarations

    By default, C functions and variables declared at the module level are local to the module (i.e. they have the C static storage class). They can also be declared extern to specify that they are defined elsewhere, for example:
    cdef extern int spam_counter
    cdef extern void order_spam(int tons)
    At some time in the future it is planned that Pyrex will be able to read C header files. In the meantime, to create interfaces to existing C code you will have to include extern definitions for all the outside C functions and variables that you use.


    Scope rules

    Pyrex determines whether a variable belongs to a local scope, the module scope, or the built-in scope completely statically. As with Python, assigning to a variable which is not otherwise declared implicitly declares it to be a Python variable residing in the scope where it is assigned. Unlike Python, however, a name which is referred to but not declared or assigned is assumed to reside in the built-in scope, not the module scope. Names added to the module dictionary at run time will not shadow such names.
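
    For contrast, here is a minimal sketch of the run-time lookup that ordinary interpreted Python performs, which is the behaviour the static rule above replaces. It is plain Python, not Pyrex, and the function name count_items is made up for illustration. Going by the rule described above, the same function compiled by Pyrex would be expected to keep using the built-in len even after the module dictionary is modified.

```python
def count_items(seq):
    # 'len' is neither assigned nor declared in this module, so interpreted
    # Python looks it up on every call: module globals first, then builtins.
    return len(seq)


# No module-level 'len' exists yet, so the built-in is found.
before = count_items([1, 2, 3])

# Add a name to the module dictionary at run time.
globals()["len"] = lambda seq: -1

# Interpreted Python now finds the module-level name first.
after = count_items([1, 2, 3])

# Remove the shadowing name again; the built-in becomes visible once more.
del globals()["len"]
restored = count_items([1, 2, 3])
```

    In interpreted Python, before and restored are 3 while after is -1, because the module-level name shadows the built-in for as long as it exists.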


    Statements and expressions

    Control structures and expressions follow Python syntax for the most part. When applied to Python objects, they have the same semantics as in Python (unless otherwise noted). Most of the Python operators can also be applied to C values, with the obvious semantics.

    If Python objects and C values are mixed in an expression, conversions are performed automatically between Python objects and C numeric or string types.

    Reference counts are maintained automatically for all Python objects, and all Python operations are automatically checked for errors, with appropriate action taken.

    C operations which do not have direct Python equivalents are handled as follows:

  • There is no -> operator in Pyrex. Instead of p->x, use p.x
  • There is no unary * (dereference) operator in Pyrex. Instead of *p, use p[0]
  • There is an & operator, with the same semantics as in C
  • Type casts are written <type>value, for example:
    cdef char *p
    cdef float *q
    p = <char*>q

    Extension Types

    Pyrex lets you create new built-in Python types, also known as extension types. You define an extension type using the cdef class statement. Here's an example:
    cdef class Shrubbery:

        cdef int width, height

        def __init__(self, w, h):
            self.width = w
            self.height = h

        def describe(self):
            print "This shrubbery is", self.width, \
                "by", self.height, "cubits."

    As you can see, a Pyrex extension type definition looks a lot like a Python class definition. Within it, you use the def statement to define methods that can be called from Python code. You can even define many of the special methods such as __init__ as you would in Python.

    The main difference is that you can also use the cdef statement to define attributes of any C data type, so you can use extension types to wrap arbitrary C data structures and provide a Python-like interface to them.

    Some other differences you need to be aware of are:

  • The set of attributes of an extension type is fixed at compile time; you can't add attributes to an extension type instance at run time simply by assigning to them, as you could with a Python class instance. (You can subclass the extension type in Python and add attributes to instances of the subclass, however.)
  • Attributes defined with cdef are only accessible from Pyrex code, not from Python code. (A way of defining Python-accessible attributes is planned, but not yet implemented. In the meantime, use accessor methods.)
  • To access the cdef-attributes of an extension type instance, the Pyrex compiler must know that you have an instance of that type, and not just a generic Python object. It knows this already in the case of the "self" parameter of the methods of that type, but in other cases you will have to tell it by means of a declaration. For example, declaring a function parameter as Shrubbery rather than leaving it untyped lets the function body read and assign its width and height attributes.
  • Some of the __xxx__ special methods behave differently from their Python counterparts, and some of them are named differently as well. See the Special Methods section for more information.
  • Pyrex extension types can be subclassed in Python. They cannot currently inherit from other built-in or extension types, but this may be possible in a future version.
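
    The fixed attribute set and the accessor-method advice can be pictured with a rough analogy in plain Python. This is not Pyrex code: it uses Python's __slots__ as a stand-in for cdef attributes, and the names LabelledShrubbery, get_width, set_width and colour are invented for the sketch. Unlike real cdef attributes, slot attributes remain directly readable from Python, so only the "fixed set" and "subclass to extend" behaviour carries over.

```python
class Shrubbery(object):
    # Analogy only: __slots__ fixes the set of instance attributes,
    # loosely mirroring the fixed cdef attributes of an extension type.
    __slots__ = ("width", "height")

    def __init__(self, w, h):
        self.width = w
        self.height = h

    # Accessor methods: the pattern suggested for exposing attributes
    # that Python code cannot reach directly.
    def get_width(self):
        return self.width

    def set_width(self, w):
        self.width = w


class LabelledShrubbery(Shrubbery):
    # A Python subclass without __slots__ gets a per-instance dictionary,
    # so new attributes can be added to its instances at run time.
    pass


s = Shrubbery(3, 4)
try:
    s.colour = "green"          # not in the fixed attribute set
    base_accepts_new_attribute = True
except AttributeError:
    base_accepts_new_attribute = False

t = LabelledShrubbery(5, 6)
t.colour = "green"              # allowed on the subclass instance

s.set_width(10)
```

    Here the assignment to s.colour is rejected while t.colour succeeds, which matches the subclassing escape hatch described in the first bullet.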


    Limitations

    Pyrex is not quite a full superset of Python. The following restrictions apply:
  • Function definitions (whether using def or cdef) cannot be nested within other function definitions.
  • Class definitions can only appear at the top level of a module, not inside a function.
  • import * is not supported under any circumstances.
  • Generators are not supported.

    There are also some temporary limitations which may eventually be lifted:
  • Class and function definitions cannot be placed inside control structures.
  • In-place operators (+=, etc) are not yet supported
  • Default argument values, keyword arguments, * and ** arguments, and tuple unpacking in argument lists are not yet supported
  • List comprehensions are not yet supported

    There are probably also some other gaps which I can't think of at the moment.
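
    As a rough guide to living within these restrictions, the sketch below rewrites a few common Python idioms without nested functions, generators, in-place operators, default arguments or list comprehensions. It is written as plain Python and all function names are hypothetical; it assumes that the remaining constructs used here (for and while loops, list methods, ordinary calls) are accepted by Pyrex, which the text above suggests but does not list explicitly.

```python
def _square(x):
    # Module-level helper, used where a nested function might otherwise go.
    return x * x


def sum_of_squares(values):
    total = 0
    for v in values:
        # Spelled out in full instead of: total += _square(v)
        total = total + _square(v)
    return total


def squares(values):
    # Explicit loop instead of: [_square(v) for v in values]
    result = []
    for v in values:
        result.append(_square(v))
    return result


def scale(values, factor):
    result = []
    for v in values:
        result.append(v * factor)
    return result


def scale_by_two(values):
    # Separate function instead of a default argument such as factor=2.
    return scale(values, 2)


def first_n_evens(n):
    # Builds a complete list instead of yielding values from a generator.
    result = []
    i = 0
    while len(result) < n:
        result.append(2 * i)
        i = i + 1
    return result
```

    Each function takes only positional arguments and is defined at the top level of the module, outside any control structure, so none of the listed restrictions is triggered.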