Ok.


Alright.

We have a table of the checklists. A “checklist” is a version at a particular time, so there are multiple records for ‘APC’ but only one of these will be the current APC checklist, with a null replaced_by.

id Checklist id
copy_of_id Checklist history pointers
replaced_by_id
persisted_at_ts Checklist history timestamps
replaced_at_ts
Other data fields. Labels, owner, long title, uri, etc.

We have two tables for the items in the checklists: checklist_p and checklist_v, p=’physical’ and v=’virtual’.

Checklist_p always contains the current version of every checklist. In particular, checklist_p contains the taxon ids. The data structure is

id The physical id
cl_id The checklist id
row The row number relative to the checklist (starting at row 0).
parent_row Tree structure in doubly-linked grapevine format. I’m thinking about using offsets rather than numbers for reasons which shall be explained.
(I might make these names shorter)
size
prev_sibling_row
next_sibling_row
first_child_row
last_child_row
Taxon id and other data items

Note that this makes it straightforward to read out current lists in row order, and to check that current lists do not have duplicate taxa. If we slap a unique constraint on cli_id/taxon_uri, then job done.

Row number zero has a null taxon_id, it’s used so that a checklist may contain multiple items at the top level.

Checklist_v contains the arrangement of all checklists. It is similar in structure to checklist_p – a grapevine list using row numbers. Each item in the list refers to another list item either in checklist_v or checklist_p. Each item in the list either uses the arrangement of sub nodes from the thing referred to, or has its arrangement in checklist_v.

This means that checklist_v, while being structured like checklist_p has entries missing – they are located elsewhere. You can see why I am thinking that storing the tree structure as offsets might be the go. It means that when you chase a reference, you don’t have to keep track of an adjustment to the row numbers referred to.

My main concern is whether or not checklist_v should be permitted to refer to itself. The problem with this is that it means that reconstructing old checklists becomes a tree-walk of arbitrary depth. The problem with *not* doing it is that if a node is rearranged so that it no longer appears in checklist_p, then the node as it was needs to be reconstructed in checklist_v everywhere that it appears. In effect, checklist_v fills up with crud that we don’t really care about anymore but we have to keep.

But it’s not as bad as it appears – nodes get moved (more on this in a minute), so checklist_v self-references don’t follow a history, they point to the latest place where the node appears in that form. The very cool thing about this is that we can show that on a printout of an old, reconstructed list.

To continue.

The basic job of editing is this:

You construct a new checklist made up of bits of other checklists and new data which you enter. You can opt to push an entry in your list into another (current) list – replacing a node in that list with a node in yours. Note that inserts and deletes are done by replacing the parent node of the node you are trying to insert.

When you do this, a new checklist is created. The nodes from your list and the unaltered nodes from the target list are moved into the new list and replaced with references.

And, well, and that’s it. (TODO: what happens to the NSL list? it has to be left in permanent “I am not a current list” state, or something.)

No, actually, that’s not it. The other issue is where two people are hammering away at the same list. A replacement of a node at high level means that the checkout needs to be turned into a checkout of the new node. Unaltered sibling nodes need to be fixed up …. I still need the notion of a ‘tracking’ link. Which is kind of comforting. This system is probably isomorphic to my git-like nodes system, just redone with row numbers to make working with the current list much easier.

A worked example

Actually – no. I’ll start coding, then once that’s done do an update and then dump the results. Which is how those graphs were done.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: