Sparql at biodiversity.org.au, pt 2


When last we spoke – this morning, in fact – I was having trouble getting a simple string match going.

After some investigation – our DBA asking Oracle to dob on what joseki is doing – it’s… damn weird.

We have different SQL being generated for the FILTER query and the =”…” query. The filter one, as I’d expect, does a select * from nodes. It’s surprising that its as fast as it is. But so does the =”…” one.

Joseki seems to generate all the right SQL, but it’s commented out, and all that’s left uncommented is “select * from nodes” … no – I was totally wrong about that. Disregard everything I just said about Joseki not converting queries into SQL. The SQL is good, I was just not seeing the line breaks at the end of the commented out bits.

The hash that joseki is generating for the string “Abacopteris aspera” does not match the hash for that string in the database. We are using SDB with index2, and that means that each distinct value is hashed and the has indexed – that’s how it deals with different data types.

The bit that seems to matter from the query is

-- Const: <http://biodiversity.org.au/voc/graph/DATASET#20111129-APNI_TAX_NAM&gt;
INNER JOIN Nodes N_2
-- Const: <http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete&gt;
ON ( N_1.hash = -707528504822182151 AND N_2.hash = 696684872646940002 ) INNER JOIN Nodes N_3
-- Const: "Abacopteris aspera"
ON ( N_3.hash = -6114722394035499839 )
INNER JOIN Quads Q_1
-- <http://biodiversity.org.au/voc/graph/DATASET#20111129-APNI_TAX_NAM&gt; :2s <http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete&gt; "Abacopteris aspera"
ON ( Q_1.g = N_1.id
-- Const condition: <http://biodiversity.org.au/voc/graph/DATASET#20111129-APNI_TAX_NAM&gt;
AND Q_1.p = N_2.id
-- Const condition: <http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete&gt;
AND Q_1.o = N_3.id
-- Const condition: "Abacopteris aspera" )
LEFT OUTER JOIN Nodes R_1
-- Var: :3s
ON ( Q_1.s = R_1.id )

Now, pulling out all the hash values from that and querying against the oracle data tables:

HASH TO_CHAR(LEX)
-707528504822182151 http://biodiversity.org.au/voc/graph/DATASET#20111129-APNI_TAX_NAM
696684872646940002 http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete

As you see, the hash values for the URIs are correctly computed. But the has value for the string – according to the value in the data table, it should be 6576901907426019494, which is nowhere to be seen.

Hmm. What’s that in hex, I wonder? Hash in the database: 5B45D9705AB788A6, hash in the query: AB2424613B634CC1. Nope – no luck there. Nothing to do with each other.

So: why is the query engine computing a different hash value for a constant string than the SDB loader generated when it loaded it?

I hacked up Joseki by recompiling one of the classes and adding debugging. There’s a method Nodelayout2.hash(String lex, String lang, String datatype, int type). It gives me the SDB hash when passed
Abacopteris aspera, null, http://www.w3.org/2001/XMLSchema#string, 4
and the JOSEKI has when passed
Abacopteris aspera, null, null, 3

So I’m guessing that type 3 is “untyped literal” and type 4 is “typed literal”. …

Ok. Types are in Enumeration ValueType. 3 and 4 are STRING and XSDSTRING, respectively, which makes perfect sense.

Can I get JOSEKI to covert my literal into an XSDSTRING?

select ?lbl ?pred ?value
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
     tn:nameComplete 
     "Abacopteris aspera"^^<http://www.w3.org/2001/XMLSchema#string> .
  }
}

Drat. No. But here’s the intriguing thing …

Ah ha! It’s not intriguing at all! I’m an idiot! I just wasted half a day puzzled over this! I do indeed get one row back, but because none of the variables are bound, my results page shows a table row that’s only a couple of pixels high! If I hadn’t coloured the rows, I’d have seen nothing at all!

Well … that’s awesome. It means that this should work:

select ?taxNamUri
where { 
  graph g:APNI_TAX_NAM { 
    ?taxNamUri
     tn:nameComplete 
     "Abacopteris aspera"^^<http://www.w3.org/2001/XMLSchema#string> .
  }
}

And not only does it work, it comes back really fast. Hmm. Now, that’s running agains an instance of joseki running on my machine, which I have hacked up for the occasion. What about running it agains the one at BOA? …

Oh my Lord! It’s awesome! Let’s try using a prefix for the xml schema namespace. …

Yep, that’s good too. Now then: lets combine two different names into a single subgraph. This is important, because I am aiming at being able to submit a list of names:

select ?taxNamUri
where { 
  graph g:APNI_TAX_NAM { 
    {?taxNamUri tn:nameComplete "Abacopteris aspera"^^xs:string .}
    UNION
    {?taxNamUri tn:nameComplete "Abacopteris presliana"^^xs:string .}
    UNION
    {?taxNamUri tn:nameComplete "Abacopteris triphylla"^^xs:string .}
  }

And finally, I should be able to hook that up to my “branch” graph (its a long story) to get the accepted taxon, and the “taxon” graph to get the full title of that taxon.

select ?taxNamUri ?name  ?acceptedTax ?accTaxTitle
where { 
  graph g:APNI_TAX_NAM { 
    {
      {?taxNamUri tn:nameComplete "Abacopteris aspera"^^xs:string .}
      UNION
      {?taxNamUri tn:nameComplete "Abacopteris presliana"^^xs:string .}
      UNION
      {?taxNamUri tn:nameComplete "Abacopteris triphylla"^^xs:string .}
    }
    ?taxNamUri tn:nameComplete ?name
  }
  OPTIONAL {
    graph g:APC_TREE {
      ?acceptedTax ibis:isAcceptedConceptFor ?taxNamUri .
    }
    graph g:APNI_TAX_CON {
      ?acceptedTax dcterms:title ?accTaxTitle
    }
  }
}

(I’ll get rid of the URI columns)

?name ?accTaxTitle
Abacopteris aspera Pronephrium asperum (C.Presl) Holttum [CHAH 2006]
Abacopteris presliana
Abacopteris triphylla Pronephrium triphyllum (Sw.) Holttum sensu Bostock, P.D. (1998)

And that’s it. Well, the SPARQL, anyway. I just need to write a little bit if HTML and javascript allowing a user to paste a list of names in a HTML form, and done.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: