On C Linked Lists

There are two basic styles of double-linked lists in C; I'll call them ring style and linear style.

The Linux kernel has ring-style, defined in include/linux/list.h. You declare a 'struct list_head' and everyone who wants to be in the list puts a 'struct list_head list' in their structure (struct list_node in CCAN's version). This forms a ring of pointers in both the forward and reverse directions.

SAMBA uses linear-style, defined in lib/util/dlinklist.h. You simply declare a 'struct foo *' as your list, and everyone who wants to be in the list puts a 'struct foo *prev, *next' in their structure. This forms a NULL-terminated list in the forward direction (the reverse direction became a ring recently to facilitate tail access). (Linux's hlist.h datastructure is similar, only done in list.h style).

Points in favor of ring-style lists:

Insertion and deletion are branchless, as the elements and head are homogeneous:

 head->next->prev = new;
 new->next = head->next;
 new->prev = head;
 head->next = new;

vs:

if (!head) {
    new->prev = head = new;
    new->next = NULL;
} else {
    new->prev = head->prev;
    head->prev = new;
    new->next = head;
    head = new;
}

list_add, list_del are inline functions, not macros, since the types are known.

Points in favor of linear-style lists:

They're typesafe, since the head pointer and internal pointers are all the correct type.
Forward iteration is simpler, since the list ends at NULL rather than back at the head. In theory this could free a register, but the bigger difference is that it's often useful to have NULL in your iterator once the loop is done.

As a corollary, iteration, initialization and emptiness testing don't need some tricky macros:

  struct foo *head = NULL;
  if (head == NULL) ...
  for (i = head; i; i = i->next) ...

  LIST_HEAD(head);
  if (list_empty(&head)) ...
  list_for_each(i, head, list) ...

Uses slightly less space for the head pointer (one pointer, vs two).

So how important is efficiency of insert and delete in a real project? To provide some data on this, I first annotate the linux kernel so each list.h operation would increment a counter which I could dump out every so often. Then I booted the kernel on my laptop and ran as normal for three days.

Operation	Frequency
Empty Test	45%
Delete	25%
Add	23%
Iterate Start	3.5%
Iterate Next	2.5%
Replace	0.76%
Other Manipulation	0.0072%

Firstly, I was surprised that we add and remove from lists much more than we look through them. On reflection, this makes sense: if lookups are really common we don't use a linked list. And note how often we check for whether the list is empty: this looks like a "list as a slow path" pattern. I wonder if SAMBA (or other projects) list usage looks the same... anyone?

Secondly, we delete more than we add, but we could be "deleting" initialized-but-unadded elements. I didn't chase this beyond re-reading and enhancing my patch to measure "other manipulations" to see if they could explain it (no).

Thirdly, when we iterate it's short: the list is empty or we break out on the first element 28% of the time (I didn't differentiate). I wonder if a single-linked list would be an interesting micro-space-optimization for many of these lists.

Summary: I like the safety of SAMBA's lists, but there's clearly real merit in eliminating those branches for add and delete. It's a genuine performance-usability trade-off, so I I think we'll still have both in CCAN...