D2RQ


(This post is a bit of a note-to-self. Apologies if it lacks the context needed to follow it.)

So. We would like to publish our data live to the semantic web using d2rq. This is actually pretty exciting. It will replace our eXist XML database. eXist was a nice idea, but there turned out to be all sorts of problems with using it as a platform.

Furthermore, our new data model embodies a different way of looking at names, references, and taxon concepts which we believe to be an advance on the current TDWG picture. Part of the purpose of publishing the data to the semantic web is to expose this new data model.

But, that’s not what this post is about.

I ran into a nasty little problem with mapping the database. I seem to have solved it, but the solution wasn’t obvious from the d2rq docs.

It’s like this:

  1. We have a table NAME.
  2. It has an optional one-to-one join to SIMPLE_NAME, which has derived, denormalised data.
  3. SIMPLE_NAME has three fields (FAMILY, GENUS, and SPECIES), each of which links back to NAME.
  4. I would like to expose this with hasFamily, hasGenus, and hasSpecies.

How hard could it be?

I found that when I put one of these properties in, everything was sweet. When I put two in – oh my Lord. It started thinking that things were the genus-of their family: stuff like that.

Eventually, I found something that worked.

The clue is that table aliases are global across the entire configuration. So what works is to create a completely new entity mapping for name for each of the three joins, each with its own table alias (simplename_family, simplename_genus, simplename_species). Using those mappings, the underlying gear seems to be able to produce the triples without tripping over itself: it just treats them as separate things.
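To see why each join wants its own alias, here is a toy version in Python with sqlite3. It is plain SQL, not anything d2rq is known to generate, and the rows are made up; only the table, column, and alias names come from the mapping in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE nsl_simple_name (
        id INTEGER PRIMARY KEY,
        family_nsl_id INTEGER,
        genus_nsl_id INTEGER,
        species_nsl_id INTEGER
    );
    -- Made-up rows, for illustration only.
    INSERT INTO name VALUES (1, 'Family A'), (2, 'Genus B'), (3, 'Species C');
    INSERT INTO nsl_simple_name VALUES (3, 1, 2, 3);
""")

# One alias of NAME per join: each column resolves independently.
row = conn.execute("""
    SELECT simplename_family.full_name,
           simplename_genus.full_name,
           simplename_species.full_name
    FROM nsl_simple_name
    JOIN name AS simplename_family
      ON nsl_simple_name.family_nsl_id = simplename_family.id
    JOIN name AS simplename_genus
      ON nsl_simple_name.genus_nsl_id = simplename_genus.id
    JOIN name AS simplename_species
      ON nsl_simple_name.species_nsl_id = simplename_species.id
    WHERE nsl_simple_name.id = 3
""").fetchone()

# One way it can go wrong: two join conditions landing on the same,
# un-aliased NAME table must both hold for a single NAME row.
shared = conn.execute("""
    SELECT name.full_name
    FROM nsl_simple_name
    JOIN name
      ON nsl_simple_name.family_nsl_id = name.id
     AND nsl_simple_name.genus_nsl_id = name.id
""").fetchall()

print(row)
print(shared)
```

With separate aliases the family, genus, and species come back in the right slots. When the conditions share one table reference, the family and genus are forced to be the same row, which is the flavour of confusion described above (though whether d2rq went wrong in exactly this way is a guess).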

But, you may ask, isn’t it going to be a drag to link all of these things with owl:sameAs?

Not at all!

It seems that d2rq is perfectly happy to have multiple mappings that resolve to the same uri. At a guess, it generates a humungous union query underneath it all.
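A toy model of why no owl:sameAs is needed, again in Python. This is not d2rq's implementation: the three functions below are hypothetical stand-ins for the class maps and one property bridge, and the ids and name are taken from the sample output at the end of this post. The point is only that an RDF resource is identified by its URI, so triples from different mappings that build the same URI describe the same thing.

```python
BASE = "http://localhost:2020/resource/"
NAME_CLASS = "<http://biodiversity.org.au/voc/nsl/Name>"


def uri(row_id):
    # Every class map in this post uses a pattern of the form
    # "nsl.name/@@<table or alias>.id@@", so they all build the same URI.
    return f"<{BASE}nsl.name/{row_id}>"


def name_map(row_id, full_name):
    # Stand-in for map:APNI_Name plus its fullName bridge.
    s = uri(row_id)
    return [(s, "rdf:type", NAME_CLASS), (s, "nsl_name:fullName", full_name)]


def family_alias_map(row_id):
    # Stand-in for map:APNI_simplenameFamily (NAME under another alias).
    return [(uri(row_id), "rdf:type", NAME_CLASS)]


def family_bridge(simple_name_id, family_id):
    # Stand-in for map:APNI_SimpleName_family.
    return [(uri(simple_name_id), "nsl_name:hasFamily", uri(family_id))]


# Concatenating the outputs is the "humungous union" guess.
all_rows = (
    name_map(54444, "Orchidaceae Juss.")
    + family_alias_map(54444)
    + family_bridge(204868, 54444)
)

# As a graph (a set of triples) the duplicates collapse.
graph = set(all_rows)

family_uri = next(o for s, p, o in graph if p == "nsl_name:hasFamily")
described = {p for s, p, o in graph if s == family_uri}
print(len(all_rows), len(graph), sorted(described))
```

The object of hasFamily is the very URI that carries fullName, so the link works with no extra glue. The duplicated rdf:type row in `all_rows` would be consistent with the repeated rdf:type lines in the query output, if the results really are concatenated rather than de-duplicated.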

The relevant code looks a bit like this:

# ===============================================================================================
# main name table

map:APNI_Name a d2rq:ClassMap;
	d2rq:dataStorage map:APNI_database;
	d2rq:uriPattern "nsl.name/@@name.id@@";
	d2rq:class <http://biodiversity.org.au/voc/nsl/Name>;
	.

# this is the only actual data field that I am pulling out at present

map:APNI_Name_fullName a d2rq:PropertyBridge;
	d2rq:belongsToClassMap map:APNI_Name;
	d2rq:property nsl_name:fullName;
	d2rq:propertyDefinitionLabel "name.fullName";
	d2rq:column "name.full_name";
	.

# ===============================================================================================
# simple name - derived links

map:APNI_SimpleName a d2rq:ClassMap;
	d2rq:dataStorage map:APNI_database;
	d2rq:uriPattern "nsl.name/@@nsl_simple_name.id@@";
	d2rq:class <http://biodiversity.org.au/voc/nsl/Name>;
	.

# ===============================================================================================
# Alias the name table once for each join

map:APNI_simplenameFamily a d2rq:ClassMap;
	d2rq:dataStorage map:APNI_database;
	d2rq:uriPattern "nsl.name/@@simplename_family.id@@";
	d2rq:alias "name AS simplename_family";
	d2rq:class <http://biodiversity.org.au/voc/nsl/Name>;
	.

map:APNI_simplenameGenus a d2rq:ClassMap;
	d2rq:dataStorage map:APNI_database;
	d2rq:uriPattern "nsl.name/@@simplename_genus.id@@";
	d2rq:alias "name AS simplename_genus";
	d2rq:class <http://biodiversity.org.au/voc/nsl/Name>;
	.

map:APNI_simplenameSpecies a d2rq:ClassMap;
	d2rq:dataStorage map:APNI_database;
	d2rq:uriPattern "nsl.name/@@simplename_species.id@@";
	d2rq:alias "name AS simplename_species";
	d2rq:class <http://biodiversity.org.au/voc/nsl/Name>;
	.

# ===============================================================================================
# Map the joins on simplename

map:APNI_SimpleName_family a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:APNI_SimpleName;
    d2rq:property nsl_name:hasFamily;
    d2rq:alias "name AS simplename_family";
    d2rq:refersToClassMap map:APNI_simplenameFamily;
    d2rq:join "nsl_simple_name.family_nsl_id => simplename_family.id";
    d2rq:limitInverse 3;
    .

map:APNI_SimpleName_genus a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:APNI_SimpleName;
    d2rq:property nsl_name:hasGenus;
    d2rq:alias "name AS simplename_genus";
    d2rq:refersToClassMap map:APNI_simplenameGenus;
    d2rq:join "nsl_simple_name.genus_nsl_id => simplename_genus.id";
    d2rq:limitInverse 3;
    .

map:APNI_SimpleName_species a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:APNI_SimpleName;
    d2rq:property nsl_name:hasSpecies;
    d2rq:alias "name AS simplename_species";
    d2rq:refersToClassMap map:APNI_simplenameSpecies;
    d2rq:join "nsl_simple_name.species_nsl_id => simplename_species.id";
    d2rq:limitInverse 3;
    .

And with that, this SPARQL:

SELECT ?s ?p ?o WHERE {
  {
    { 
      <http://localhost:2020/resource/nsl.name/54444> ?p ?o 
    }
    union 
    {
      ?s ?p <http://localhost:2020/resource/nsl.name/54444>
    }
  }
}

Correctly produces the output (apologies for the WordPress clipping of the URIs):

s                    | p                             | o
---------------------+-------------------------------+------------------------------------------
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | nsl_name:hasFamily            | <…/nsl.name/54444>
                     | nsl_name:hasParent            | <…/nsl.name/214968>
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | nsl_name:hasNameType          | <…/nsl.name.type/scientific>
                     | nsl_name:name-hasNameType     | <…/nsl.name.type/scientific>
                     | nsl_name:name-hasNameGroup    | <…/nsl.name.group/botanical>
                     | nsl_name:hasNameGroup         | <…/nsl.name.group/botanical>
                     | nsl_name:fullName             | "Orchidaceae Juss."
                     | rdfs:label                    | "name #54444: Orchidaceae Juss."
                     | nsl:dbTable                   | "NAME"
                     | nsl:dbId                      | "54444"
                     | rdf:type                      | <http://biodiversity.org.au/voc/nsl/Name>
                     | nsl_name:name-hasNameCategory | <…/nsl.name.category/scientific>
                     | nsl_name:hasNameCategory      | <…/nsl.name.category/scientific>
                     | nsl_name:name-hasNameStatus   | <…/nsl.name.status/nom._cons.>
                     | nsl_name:hasNameStatus        | <…/nsl.name.status/nom._cons.>
                     | nsl_name:hasNameRank          | <…/nsl.name.rank/Familia>
                     | nsl_name:name-hasNameRank     | <…/nsl.name.rank/Familia>
<…/nsl.name/204868>  | nsl_name:hasFamily            |
<…/nsl.name/124970>  | nsl_name:hasFamily            |
<…/nsl.name/132656>  | nsl_name:hasFamily            |
<…/nsl.name/204868>  | nsl_name:hasParent            |
<…/nsl.name/120932>  | nsl_name:hasParent            |
<…/nsl.name/120939>  | nsl_name:hasParent            |
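
For anyone wanting to re-run that query from a script rather than a browser, here is a rough sketch using only the Python standard library. The endpoint URL is an assumption (a D2R Server on localhost:2020 serving SPARQL at /sparql), and `build_query`, `build_request`, and `run` are hypothetical helper names, not part of d2rq.

```python
import json
import urllib.parse
import urllib.request

# Assumption: D2R Server is running locally and serves SPARQL at /sparql.
ENDPOINT = "http://localhost:2020/sparql"


def build_query(resource_uri):
    # Same shape as the query above: everything the resource points at,
    # plus everything that points at the resource.
    return (
        "SELECT ?s ?p ?o WHERE { "
        f"{{ <{resource_uri}> ?p ?o }} UNION {{ ?s ?p <{resource_uri}> }} "
        "}"
    )


def build_request(resource_uri, endpoint=ENDPOINT):
    # SPARQL protocol: the query goes in the "query" parameter of a GET.
    params = urllib.parse.urlencode({"query": build_query(resource_uri)})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run(resource_uri):
    # Needs a live server; returns the list of result bindings.
    with urllib.request.urlopen(build_request(resource_uri)) as resp:
        return json.load(resp)["results"]["bindings"]


request = build_request("http://localhost:2020/resource/nsl.name/54444")
print(request.full_url)
```

Only the request is built here; calling `run(...)` requires the server to be up. In the JSON results, a row from the first branch of the union has no `s` binding and a row from the second has no `o`.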
