Species, derivatives of types and Gröbner bases for operads

This is a typed up copy of my lecture notes from the combinatorics seminar at KTH, 2010-09-01. This is not a perfect copy of what was said at the seminar, rather a starting point from which the talk grew.

In some points, I've tried to fill in the most sketchy and un-articulated points with some simile of what I ended up actually saying.

Combinatorial species started out as a theory to deal with enumerative combinatorics, by providing a toolset & calculus for formal power series. (see Bergeron-Labelle-Leroux and Joyal)

As it turns out, not only is species useful for manipulating generating functions, btu it provides this with a categorical approach that may be transplanted into other areas.

For the benefit of the entire audience, I shall introduce some definitions.

Definition: A category C is a collection of objects and arrows with each arrow assigned a source and target object, such that

  1. Each object has its own identity arrow 1.
  2. Chains of arrows are associatively composable, with 1 the identity of this composition.

Examples: Sets, Finite sets, k-Vector spaces, left and right R-Modules, Graphs, Groups, Abelian groups.

[tex]\mathbb B[/tex]: finite sets with only bijections as morphisms.


  • Category of a monoid.
  • Category generated by a graph
  • Category of a group (groupoid version of the category of a monoid)
  • Category of a poset
  • Category of Haskell types and functions.

Definition: A functor F is a map of categories, in other words, a pair of maps Fo on objects and Fa on arrows, such that the identity arrow maps to identity arrows and compositions of maps map to compositions of their images.

Constant functor: sends every object to A, every map to 1.

Identity functor: sends objects and arrows to themselves.

Underlying set functor, free monoid/vector space/module/... functors.

We are now equipped to define Species:
Definition: A species of structures is a functor [tex]\mathbb B\to\mathbb B[/tex].

The idea is a set gets mapped to the set of all structures labelled by the elements in the original set. A bijection on labels maps to the transport of structures along the relabeling.

[tex]A[/tex]: rooted trees, labels on vertices
[tex]G[/tex]: simple graphs, labels on vertices
[tex]Gc[/tex]: connected simple graphs, labels on vertices.
[tex]a[/tex]: trees, labels on vertices.
[tex]D[/tex]: directed graphs, labels on vertices
[tex]\wp[/tex]: subsets, [tex]\wp[U] = \{S: S\subseteq U\}[/tex].
[tex]End[/tex]: endofunctions
[tex]Inv[/tex]: involutions
[tex]S[/tex]: permutations
[tex]C[/tex]: cycles
[tex]L[/tex]: linear orders
[tex]E[/tex]: sets: [tex]E[U] = \{U\}[/tex]
[tex]e[/tex]: elements: [tex]e[U] = U[/tex]
[tex]X[/tex]: singletons: [tex]X[U] = U[/tex] if [tex]|U|=1[/tex] and [tex]X[U] = \emptyset[/tex] otherwise
[tex]1[/tex]: empty set: [tex]1[\emptyset]=\{\emptyset\}[/tex], [tex]1[U] = \emptyset[/tex] otherwise
[tex]0[/tex]: empty species: [tex]0[U] = \emptyset[/tex].

In enumerative combinatorics, the power of species resides in the association to each species a number of generating functions:

The generating series of a species F is the exponential formal power series
[tex]F(x) = \sum_{n=0}^\infty |F[n]|\frac{x^n}{n!}[/tex]
where we use the convention [tex][n] = \{1,2,\dots,n\}[/tex], and [tex]F[n] = F[[n]][/tex].
[tex]L(x) = \frac{1}{1-x}[/tex]
[tex]S(x) = \frac{1}{1-x}[/tex]
[tex]E(x) = e^x[/tex]
[tex]e(x) = xe^x[/tex]
[tex]\wp(x) = e^{2x}[/tex]
[tex]X(x) = x[/tex]
[tex]1(x) = 1[/tex]
[tex]0(x) = 0[/tex]

There are a number other in use for combinatorics, but for my purposes, this is the one I'll focus on.

Operations on species

The real power, though, emerges when we start combining species, and carry over the combinations to actions on the corresponding power series.


The number of ways to form either, say, a graph or a linear order on a set of labels is the sum of the numbers of ways to form either in isolation.

This corresponds cleanly to the coproduct in Set, and we may write F+G for the species
[tex](F+G)[U] = F[U] + G[U][/tex]
(where the second + is the coproduct — i.e. disjoint union — of sets)

In the power series, we get [tex](F+G)(x) = F(x) + G(x)[/tex].

Examples: We may define the species of all non-empty sets [tex]E_+[/tex] by
[tex]E = 1 + E_+[/tex]
This kind of functional equations is where the theory of species starts to really shine.


A tricoloring of a set is a subdivision of the set into three disjoint subsets covering the original set. The number of tricolorings of size n is
[tex]\sum_{i+j+k=n} \#\text{sets of size i}\cdot\#\text{sets of size j}\cdot\#\text{sets of size k}[/tex]
A permutation fixes some set of points. The permutation restricted to the non-fixed points is a derangement. Total number of permutations on n elements is
[tex]\sum_{i+j=n}\#\text{sets of size i}\cdot\#\text{derangements of size j}[/tex]
In both of these cases, the total generating series is a product of the component series, and we end up defining
[tex]F\cdot G[U] = \{(f,g): f\in F[U_1], g\in G[U_2], U_1\cap U_2 = \emptyset, U_1\cup U_2 = U\}[/tex]
So [tex]F\cdot G[U] = \sum_{U_1,U_2\text{ decompose }U} F[U_1]\times G[U_2][/tex].

Thus, tricolorings are [tex]E\cdot E\cdot E[/tex] and permutations are [tex]S = E\cdot Der[/tex], where the set is that of fixed points, and the derangement captures the actual action of the permutation.


Endofunctions of sets decompose in their actions on the points as cycles or directed trees leading in to these cycles.

Since a collection of disjoint cycles corresponds to a permutation, we can consider such endofunctions to be permutations decorated with rooted trees attached to points of the permutations; or even permutations of rooted trees.

To form such a structure on a set U, we'd first partition U into subsets, put the structure of a rooted tree on each subset, and then the structure of a permutation on the set of these subsets.

Thus, the number of such structures on n elements is
[tex]\sum_{r\leq n}\sum_{\sum_{k=1}^n i_k = n} \#\tex{permutations on [r]}\cdot\prod_{k=1}^r\#\text{rooted trees on [i_k]}[/tex]

This corresponds to the power series [tex]S(A(x))[/tex], and we write, in general, [tex]F\circ G[U][/tex] for the species of F-structures of G-structures on subsets.

A = X·E(A)
L = 1 + X·L
B = 1 + X·B·B


Picking out a single point in a structure on n points can be done in precisely n ways.

Thus the corresponding generating function will be
[tex]\sum n\cdot f_n\cdot\frac{x^n}{n!}[/tex]
for f-structures with a single label distinguished.
Since we're working with exponential power series, we may notice that
[tex]\frac{\partial}{\partial x} \sum f_n\frac{x^n}{n!} = \sum f_{n+1}\frac{x^n}{n!}[/tex]
and thus that derivatives are shifts.
Furthermore, [tex]x\cdot\sum f_n\frac{x^n}{n!} = \sum f_n\frac{x^{n+1}}{n!} = \sum f_{n-1}n\cdot{x^n}{n!}[/tex]
so that the generating function for F-structures with a single distinguished are
[tex]F^\bullet(x) = x\cdot\frac{\partial}{\partial x}F(x)[/tex]

In species, this process is called pointing.

In functional programming, Conor McBride related this construction to Huet's Zipper datatypes.

As it turns out, many of the constructions for species make eminent sense outside the category [tex]\mathbb B[/tex]. In fact, species in Hask are known to programming language researchers as container datatypes and the whole calculus translates relatively cleanly.

Functional equations translate to the standard new data type definitions in Haskell.

L = 1 + X·L
`` data List a = Nil | Cons a (List a)``
B = 1 + X·B·B
`` data BinaryTree a = Leaf | Node (BinaryTree a) a (BinaryTree a)``
A = X·E(A)
`` data RootedTree a = Node a (Set (RootedTree a))``
usually, we simulate the Set here by a List. If we need for our rooted trees to be planar, we can in fact impose a Linear Order structure instead, and get something like
Ap = X·L(Ap)

The species interpretation of [tex]\frac{\partial}{\partial x}[/tex] corresponding to leaving a hole in the structure carries over cleanly, so that [tex]\frac{\partial}{\partial x}T[/tex] is the type of T-with-a-hole.

Two derivatives of lists

We can deal with [tex]\frac{\partial}{\partial x}L[/tex] in two different ways:

DL = D(1+X·L) = D1 + D(X·L) = 0 + DX·L+X·DL
and thus
DL-X·DL = 1·L
so (1-X)·DL = L
and thus
DL = L·1/(1-X)
Now, from L=1+X·L follows by a similar cavalier use of subtraction and division — which, by the way, in species theory is captured by the idea of a virtual species, and dealt with relatively cleanly — that
L = 1+X·L
L-X·L = 1
and thus
(1-X)·L = 1
L = 1/(1-X)
Thus, we can conclude that
DL = L·1/(1-X) = L·L
and thus, a list with a hole is a pair of lists: the stuff before the hole and the stuff after the hole.
We could, instead of using implicit differentiation, as in 1, attack the derivation we had of
L = 1/(1-X) = 1+X+X·X+X·X·X+…
DL = D(1/(1-X)) = D(1+X+X·X+…) = 0+1+2X+3X·X+…
which we can observe factors as
= (1+X+X·X+X·X·X+…)·(1+X+X·X+X·X·X+…) = L·L
Or we can just use the division rule
DL = D(1/(1-X)) = D(1/u)·D(1-X) = -1/(u·u)·(-1) = 1/[(1-X)·(1-X)] = L·L

Gröbner bases for operads

All of this becomes relevant to the implementation of Buchberger's algorithm on shuffle operads (see Dotsenko—Khoroshkin and Dotsenko—Vejdemo-Johansson) in the step where the S-polynomials (and thereby also the reductions) are defined. With a common multiple defined, we need some way to extend the modifications that take the initial term to the common multiple to the rest of that term.

For this, it turns out, that the derivative of the tree datatype used provides theoretical guarantees that only partially filled in trees of the right size, with holes the right size, can be introduced; and also provides an easy and relatively efficient algorithm for contructing the hole-y trees and later filling in the holes.