We have a table of the checklists. A “checklist” is a version at a particular time, so there are multiple records for ‘APC’ but only one of these will be the current APC checklist, with a null replaced_by.
|copy_of_id||Checklist history pointers|
|persisted_at_ts||Checklist history timestamps|
|Other data fields. Labels, owner, long title, uri, etc.|
We have two tables for the items in the checklists: checklist_p and checklist_v, p=’physical’ and v=’virtual’.
Checklist_p always contains the current version of every checklist. In particular, checklist_p contains the taxon ids. The data structure is
|id||The physical id|
|cl_id||The checklist id|
|row||The row number relative to the checklist (starting at row 0).|
|parent_row||Tree structure in doubly-linked grapevine format. I’m thinking about using offsets rather than numbers for reasons which shall be explained.
(I might make these names shorter)
|Taxon id and other data items|
Note that this makes it straightforward to read out current lists in row order, and to check that current lists do not have duplicate taxa. If we slap a unique constraint on cli_id/taxon_uri, then job done.
Row number zero has a null taxon_id, it’s used so that a checklist may contain multiple items at the top level.
Checklist_v contains the arrangement of all checklists. It is similar in structure to checklist_p – a grapevine list using row numbers. Each item in the list refers to another list item either in checklist_v or checklist_p. Each item in the list either uses the arrangement of sub nodes from the thing referred to, or has its arrangement in checklist_v.
This means that checklist_v, while being structured like checklist_p has entries missing – they are located elsewhere. You can see why I am thinking that storing the tree structure as offsets might be the go. It means that when you chase a reference, you don’t have to keep track of an adjustment to the row numbers referred to.
My main concern is whether or not checklist_v should be permitted to refer to itself. The problem with this is that it means that reconstructing old checklists becomes a tree-walk of arbitrary depth. The problem with *not* doing it is that if a node is rearranged so that it no longer appears in checklist_p, then the node as it was needs to be reconstructed in checklist_v everywhere that it appears. In effect, checklist_v fills up with crud that we don’t really care about anymore but we have to keep.
But it’s not as bad as it appears – nodes get moved (more on this in a minute), so checklist_v self-references don’t follow a history, they point to the latest place where the node appears in that form. The very cool thing about this is that we can show that on a printout of an old, reconstructed list.
The basic job of editing is this:
You construct a new checklist made up of bits of other checklists and new data which you enter. You can opt to push an entry in your list into another (current) list – replacing a node in that list with a node in yours. Note that inserts and deletes are done by replacing the parent node of the node you are trying to insert.
When you do this, a new checklist is created. The nodes from your list and the unaltered nodes from the target list are moved into the new list and replaced with references.
And, well, and that’s it. (TODO: what happens to the NSL list? it has to be left in permanent “I am not a current list” state, or something.)
A worked example
Actually – no. I’ll start coding, then once that’s done do an update and then dump the results. Which is how those graphs were done.