I probably shouldn’t tell you this, but I have a test instance of d2rq running against a test copy of the new database at http://biodiversity.org.au/d2rq.
By the time you read this, it has probably already crashed and will need to be bounced tomorrow. But hopefully not.
Most of my work over the past few weeks has been to build a d2rq mapping for our data, and to write a corresponding OWL vocabulary. I have not attempted to use existing vocablaries and fit them to what we are doing, instead opting to write a self-contained vocabulary. For terms that – hopefully – map to common things pertaining generally to taxonomy, see http://biodiversity.org.au/voc/boa/BOA.rdf, and the other files alongside it: Name.rdf, Author.rdf, Reference.rdf, Instance.rdf. Terms that are more specific to our National Species List application are in http://biodiversity.org.au/voc/nsl/NSL.rdf.
Naturally, it still needs cleaning up. Consider http://biodiversity.org.au/voc/boa/Name#Rank-visible. This boolean means “ranks like this need to be shown in the name” – eg, ‘var’. Does this belong in the boa vocabulary? Or is it an artifact of how we produce name strings, belonging in the nsl vocabulary? I don’t know. To a degree, it doesn’t really matter – the main thing is the overall organisation of the data as a model of how the business of taxonomy gets done, and the persistence of the URIs for the individual objects.
Before I continue on to what I actually wanted to write about, I see I need to justify not using existing vocabularies.
First: God knows, we tried. Taxon Concept Schema, OAI-PMH, the TDWG ontology, SKOS, LSIDs – it’s all in there in the current stuff. But there’s endless quibbling about whether or not what SKOS means by a ‘concept’ is really the same as a name or an instance of a name (aka: not all instances are taxon concepts), or whether names are really names in the sense intended by the vocabulary you are trying to use, or if some of them are, and if so which ones (An illegitimate name is a name, but is it really a name? Those kinds of discussions.). Multiply this by the total number of terms you are trying to borrow. Each vocabulary presents a list of ranks, of nomenclatural statuses and what have you, but those lists never quite match what we have. 80% are fine.
But you are always having to invent extra vocabulary to make things fit properly. We tried to use TCS and wound up putting chunks of information in the ‘custom data’ section (before giving up on TCS altogether because the schema is so tight that it’s very difficult to generate a TCS document that is correct).
The solution we are going with is just to expose the data we have, with a vocabulary that we publish at the site (currently full of “insert description here” descriptions), and to offload the job of determining whether what we mean is what a querent is asking about onto the querent.
If you want to write a query across our SPARQL server and dbpedia, by all means go for it. Provided that we keep the ids for the actual data items persistent, we can fool around with the triples we serve up for them at a later stage.
Second, and rather more interestingly: some of what we are doing is – as far as I can make out – new. Some of this new stuff is what I am hoping to write about.
Looking back, this is a little long for a blog post, and if I go on to discuss what I actually want to discuss, then this post would have two separate and almost completely unrelated topics.
So I will change the title of this post and start a new one about our concretization of relationship instances, including the vexing question of what I am supposed to call doing it. ‘Concretization’ is just plain wrong.
— EDIT —
I think I am going to have to go with “realize”. It seems odd, because it’s the object of the verb “abstract”, but it’s the only thing that fits.