On C Linked Lists
There are two basic styles of double-linked lists in C; I'll call them ring style and linear style.
The Linux kernel has ring-style, defined in include/linux/list.h. You declare a 'struct list_head' and everyone who wants to be in the list puts a 'struct list_head list' in their structure (struct list_node in CCAN's version). This forms a ring of pointers in both the forward and reverse directions.
SAMBA uses linear-style, defined in lib/util/dlinklist.h. You simply declare a 'struct foo *' as your list, and everyone who wants to be in the list puts a 'struct foo *prev, *next' in their structure. This forms a NULL-terminated list in the forward direction (the reverse direction became a ring recently to facilitate tail access). (Linux's hlist.h datastructure is similar, only done in list.h style).
Points in favor of ring-style lists:
- Insertion and deletion are branchless, as the elements and head are homogeneous:
head->next->prev = new; new->next = head->next; new->prev = head; head->next = new;
vs:
if (!head) { new->prev = head = new; new->next = NULL; } else { new->prev = head->prev; head->prev = new; new->next = head; head = new; }
- list_add, list_del are inline functions, not macros, since the types are known.
Points in favor of linear-style lists:
- They're typesafe, since the head pointer and internal pointers are all the correct type.
- Forward iteration is simpler, since the list ends at NULL rather than back at the head. In theory this could free a register, but the bigger difference is that it's often useful to have NULL in your iterator once the loop is done.
- As a corollary, iteration, initialization and emptiness testing don't need some tricky macros:
struct foo *head = NULL; if (head == NULL) ... for (i = head; i; i = i->next) ...
vs
LIST_HEAD(head); if (list_empty(&head)) ... list_for_each(i, head, list) ...
- Uses slightly less space for the head pointer (one pointer, vs two).
So how important is efficiency of insert and delete in a real project? To provide some data on this, I first annotate the linux kernel so each list.h operation would increment a counter which I could dump out every so often. Then I booted the kernel on my laptop and ran as normal for three days.
Operation | Frequency |
---|---|
Empty Test | 45% |
Delete | 25% |
Add | 23% |
Iterate Start | 3.5% |
Iterate Next | 2.5% |
Replace | 0.76% |
Other Manipulation | 0.0072% |
Firstly, I was surprised that we add and remove from lists much more than we look through them. On reflection, this makes sense: if lookups are really common we don't use a linked list. And note how often we check for whether the list is empty: this looks like a "list as a slow path" pattern. I wonder if SAMBA (or other projects) list usage looks the same... anyone?
Secondly, we delete more than we add, but we could be "deleting" initialized-but-unadded elements. I didn't chase this beyond re-reading and enhancing my patch to measure "other manipulations" to see if they could explain it (no).
Thirdly, when we iterate it's short: the list is empty or we break out on the first element 28% of the time (I didn't differentiate). I wonder if a single-linked list would be an interesting micro-space-optimization for many of these lists.
Summary: I like the safety of SAMBA's lists, but there's clearly real merit in eliminating those branches for add and delete. It's a genuine performance-usability trade-off, so I I think we'll still have both in CCAN...