When last we spoke – this morning, in fact – I was having trouble getting a simple string match going.
After some investigation – our DBA asking Oracle to dob on what joseki is doing – it’s… damn weird.
We have different SQL being generated for the FILTER query and the =”…” query. The filter one, as I’d expect, does a select * from nodes. It’s surprising that it’s as fast as it is. But so does the =”…” one.
Joseki seems to generate all the right SQL, but it’s commented out, and all that’s left uncommented is “select * from nodes” … no – I was totally wrong about that. Disregard everything I just said about Joseki not converting queries into SQL. The SQL is good, I was just not seeing the line breaks at the end of the commented out bits.
The hash that joseki is generating for the string “Abacopteris aspera” does not match the hash for that string in the database. We are using SDB with index2, and that means that each distinct value is hashed and the hash is indexed – that’s how it deals with different data types.
The bit that seems to matter from the query is
Now, pulling out all the hash values from that and querying against the oracle data tables:
As you see, the hash values for the URIs are correctly computed. But the hash value for the string – according to the value in the data table, it should be 6576901907426019494, which is nowhere to be seen.
Hmm. What’s that in hex, I wonder? Hash in the database: 5B45D9705AB788A6, hash in the query: AB2424613B634CC1. Nope – no luck there. Nothing to do with each other.
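For anyone wanting to repeat that decimal-to-hex comparison, here is a quick sketch in Python. The only value taken from the post is the database hash; `to_hex64` and `to_signed64` are helper names I made up, and treating the column as a signed 64-bit integer is an assumption.

```python
MASK64 = (1 << 64) - 1

def to_hex64(value: int) -> str:
    """Render a (possibly negative) 64-bit integer as 16 upper-case hex digits."""
    return format(value & MASK64, "016X")

def to_signed64(hex_digits: str) -> int:
    """Read 16 hex digits as a signed (two's complement) 64-bit integer."""
    value = int(hex_digits, 16)
    return value - (1 << 64) if value >= (1 << 63) else value

db_hash = 6576901907426019494  # the value found in the data table
print(to_hex64(db_hash))       # 5B45D9705AB788A6
```

The hash from the query starts with A, so its top bit is set. Read as a signed 64-bit number it would be negative, which is worth keeping in mind when eyeballing it against a signed column.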
So: why is the query engine computing a different hash value for a constant string than the SDB loader generated when it loaded it?
I hacked up Joseki by recompiling one of the classes and adding debugging. There’s a method Nodelayout2.hash(String lex, String lang, String datatype, int type). It gives me the SDB hash when passed
Abacopteris aspera, null, http://www.w3.org/2001/XMLSchema#string, 4
and the JOSEKI hash when passed
Abacopteris aspera, null, null, 3
So I’m guessing that type 3 is “untyped literal” and type 4 is “typed literal”. …
Ok. Types are in Enumeration ValueType. 3 and 4 are STRING and XSDSTRING, respectively, which makes perfect sense.
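To see why that matters, here is a toy sketch of a hash that mixes the datatype and the value type in with the lexical form. This is not SDB's real algorithm: the function name, the separator and the use of SHA-256 are all my own stand-ins. The point is only that the same text hashes differently once the datatype and type code change.

```python
import hashlib

# Type codes as seen in the debug output above.
STRING = 3
XSDSTRING = 4
XSD_STRING_URI = "http://www.w3.org/2001/XMLSchema#string"

def sketch_hash(lex, lang, datatype, vtype):
    """Illustrative stand-in for a node hash; NOT SDB's actual implementation."""
    parts = [lex, lang or "", datatype or "", str(vtype)]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).digest()
    # Keep the first 8 bytes as a signed 64-bit value.
    return int.from_bytes(digest[:8], "big", signed=True)

plain = sketch_hash("Abacopteris aspera", None, None, STRING)
typed = sketch_hash("Abacopteris aspera", None, XSD_STRING_URI, XSDSTRING)
print(plain != typed)  # True: same text, different node
```

So a plain literal in the query can never hit a row that was loaded as an xsd:string, no matter how identical the text looks.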
Can I get JOSEKI to convert my literal into an XSDSTRING?
Drat. No. But here’s the intriguing thing …
Ah ha! It’s not intriguing at all! I’m an idiot! I just wasted half a day puzzled over this! I do indeed get one row back, but because none of the variables are bound, my results page shows a table row that’s only a couple of pixels high! If I hadn’t coloured the rows, I’d have seen nothing at all!
Well … that’s awesome. It means that this should work:
And not only does it work, it comes back really fast. Hmm. Now, that’s running against an instance of joseki running on my machine, which I have hacked up for the occasion. What about running it against the one at BOA? …
Oh my Lord! It’s awesome! Let’s try using a prefix for the xml schema namespace. …
Yep, that’s good too. Now then: let’s combine two different names into a single subgraph. This is important, because I am aiming at being able to submit a list of names:
And finally, I should be able to hook that up to my “branch” graph (it’s a long story) to get the accepted taxon, and the “taxon” graph to get the full title of that taxon.
(I’ll get rid of the URI columns)
|Abacopteris aspera||Pronephrium asperum (C.Presl) Holttum [CHAH 2006]|
|Abacopteris triphylla||Pronephrium triphyllum (Sw.) Holttum sensu Bostock, P.D. (1998)|
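The upshot is that each name has to go into the query as an xsd:string typed literal, so the hash computed at query time matches what the loader stored. A minimal sketch of generating such a query for a list of names, in Python: the predicate URI and the helper names are placeholders of mine rather than the real vocabulary, and the UNION shape is only one plausible way of combining names.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def typed_string(lexical: str) -> str:
    """Format text as an xsd:string typed literal, escaping backslashes and quotes."""
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'

def names_query(names, predicate="<http://example.org/hypothetical#name>"):
    """Build one SELECT with a UNION branch per name (hypothetical predicate)."""
    if not names:
        raise ValueError("need at least one name")
    branches = [f"  {{ ?item {predicate} {typed_string(n)} }}" for n in names]
    body = "\n  UNION\n".join(branches)
    return f"PREFIX xsd: <{XSD}>\nSELECT ?item\nWHERE {{\n{body}\n}}"

print(names_query(["Abacopteris aspera", "Abacopteris triphylla"]))
```

Building the literal in one place means a name is never accidentally sent as a plain string, which is exactly the mistake that cost me half a day.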