SPARQL at biodiversity.org.au


Well! Quite a bit of success with our test deployment of a SPARQL server. The server runs at http://biodiversity.org.au/sparql/. I have a very nice HTML page that uses the JSON output, but WordPress won’t let me upload it. And, of course, our wiki is locked up.

Oh well. Here’s code for a simple HTML form. WordPress clips it, but that’s only a display problem: you can still copy and paste it into an HTML file.

<html>
  <body>
    <form action="http://biodiversity.org.au/sparql/" 
        method="post" target="SPARQLOUTPUT">
      <textarea style="background-color: #F0F0F0;" 
          name="query" cols="70" rows="30">
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
prefix tc: <http://rs.tdwg.org/ontology/voc/TaxonConcept#>
prefix pc: <http://rs.tdwg.org/ontology/voc/PublicationCitation#>
prefix tcomm: <http://rs.tdwg.org/ontology/voc/Common#>
prefix ibis: <http://biodiversity.org.au/voc/ibis/IBIS#>
prefix afd: <http://biodiversity.org.au/voc/afd/AFD#>
prefix apni: <http://biodiversity.org.au/voc/apni/APNI#>
prefix apc: <http://biodiversity.org.au/voc/apc/APC#>
prefix afdp: <http://biodiversity.org.au/voc/afd/profile#>
prefix apnip: <http://biodiversity.org.au/voc/apni/profile#>
prefix g: <http://biodiversity.org.au/voc/graph/GRAPH#>

select ?uri ?label ?title ?desc
  where {
    graph g:meta {
      ?uri rdf:type g:GraphURI .
      OPTIONAL { ?uri rdfs:label ?label  } .
      OPTIONAL { ?uri dcterms:title ?title  } .
      OPTIONAL { ?uri dcterms:description ?desc  } .
    }
  }
ORDER BY ?uri
      </textarea>
      <br>
      <input type="radio" name="output" value="xml"> xml,
      <input type="radio" name="output" value="json"> json,
      <input type="radio" name="output" value="text"> text,
      <input type="radio" name="output" value="csv"> csv,
      <input type="radio" name="output" value="tsv" checked> tsv<br>
      Force <tt>text/plain</tt>: <input 
          type="checkbox" name="force-accept"   value="text/plain"><br>
      <input type="submit" value="Get Results" >
    </form>
  </body>
</html>

The form contains a little sample query.
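If you would rather script the endpoint than use a browser, the same submission can be sketched in Python. This is a minimal sketch, assuming the server accepts exactly the fields the form sends (`query`, `output`, and optionally `force-accept`) as an ordinary form-encoded POST. The function name `build_sparql_request` is made up for illustration. It only builds the request and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the form's action attribute.
ENDPOINT = "http://biodiversity.org.au/sparql/"


def build_sparql_request(query, output="tsv", force_text_plain=False):
    """Build (but do not send) the POST request the HTML form would submit.

    Hypothetical helper: the field names mirror the form's input names.
    """
    fields = {"query": query, "output": output}
    if force_text_plain:
        # Mirrors the form's "force-accept" checkbox.
        fields["force-accept"] = "text/plain"
    body = urlencode(fields).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_sparql_request("select ?s where { ?s ?p ?o } limit 1", output="json")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually run the query: urllib.request.urlopen(req).read()
```

Keeping the request construction separate from the network call makes it easy to inspect what will be sent before hitting the server.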

A big problem is metadata, which involves questions like

  • What named graphs does the SPARQL service expose?
  • What vocabularies are used?
  • What publicly visible identifiers/top-level objects are available?

I’ve made a bit of an attempt at making this self-documenting by having the “meta” and “ibis_voc” graphs, containing the graphs and the vocabulary. But it’s hard going interpreting OWL, which is what the vocabulary documents are, and the rdfs:comment entries for the local vocabularies are not always well-written. Sigh: no matter how clever you try to be with tools and structure, ultimately you have to sit down and write the content.

So: What classes and predicates are defined in our custom ibis vocabulary – in addition to the TDWG standards? (my sample code assumes that you have left the prefix declarations as they are in the sample HTML form)

select ?pred 
where { 
  graph g:ibis_local_voc { 
    ?pred rdf:type owl:ObjectProperty .
  }
}
ORDER BY ?pred

Rad. Do any of them have domains and ranges defined?

select ?domain ?pred ?range 
where { 
  graph g:ibis_local_voc { 
    ?pred rdf:type owl:ObjectProperty .
    OPTIONAL { ?pred rdfs:domain ?domain } .
    OPTIONAL { ?pred rdfs:range ?range } .
  }
}
ORDER BY ?domain ?pred

System.out.println(
    new String[]{"Booyah", "Awesome", "Boss", "Amazing"}
        [new java.util.Random().nextInt(4)] + "!");

Ok. But what about the data? Well, our identifiers are biodiversity.org.au URIs (a rather important nugget of info, that), in order to “play nice” with the semantic web.

select ?pred ?value
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
        ?pred ?value .
  }
}
ORDER BY ?pred

You know, I’d really like to see the rdfs:labels for those predicates rather than the URLs.

select ?lbl ?value
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
        ?pred ?value .
  }
  OPTIONAL {
    graph g:ibis_voc {
      ?pred rdfs:label ?lbl
    }
  }
}
ORDER BY ?pred

And we see that I haven’t written rdfs:label values for some of them.
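Finding the gaps by eye gets tedious, so here is a minimal Python sketch that picks out the unlabelled predicates from the JSON output. It relies on the standard SPARQL JSON results layout, in which a variable left unbound by an OPTIONAL is simply missing from that row's binding. It also assumes the query has been changed to select `?pred` as well as `?lbl` and `?value`. The sample document, the label text, and the `example.org` predicate are invented for illustration and are not real output from the server.

```python
import json

# Invented sample in the standard SPARQL JSON results layout (not real server output).
SAMPLE = """
{
  "head": {"vars": ["pred", "lbl", "value"]},
  "results": {"bindings": [
    {"pred": {"type": "uri", "value": "http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete"},
     "lbl": {"type": "literal", "value": "name complete"},
     "value": {"type": "literal", "value": "Abacopteris aspera"}},
    {"pred": {"type": "uri", "value": "http://example.org/voc#unlabelled"},
     "value": {"type": "literal", "value": "x"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Turn a SPARQL JSON results document into a list of plain dicts.

    Variables that are unbound in a row are filled in as None.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


def unlabelled_predicates(rows):
    """Return the sorted, de-duplicated predicates whose ?lbl came back unbound."""
    return sorted({row["pred"] for row in rows if row["lbl"] is None})


if __name__ == "__main__":
    for pred in unlabelled_predicates(rows_from_sparql_json(SAMPLE)):
        print(pred)
```

The resulting list is effectively a to-do list of predicates that still need an rdfs:label.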

Of course, the really big thing is to search by name. And here we hit a snag. Let’s get specific about http://biodiversity.org.au/apni.name/277356 .

select ?value
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
     tn:nameComplete 
     ?value .
  }
}

As you see, it definitely has a name and that name is Abacopteris aspera. So that means this should work. It should pull out a single row with none of the variables bound:

select ?lbl ?pred ?value
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
     tn:nameComplete 
     "Abacopteris aspera" .
  }
}

But no – no rows. This, however:

select ?nameComplete
where { 
  graph g:APNI_TAX_NAM { 
    <http://biodiversity.org.au/apni.name/277356>
     tn:nameComplete ?nameComplete .
    FILTER(?nameComplete = "Abacopteris aspera") .
  }
}

Works fine. So … to find the URIs with a name complete of “Abacopteris aspera”, we just use a filter, right? Not so fast! It takes a while to run. It’s pretty obvious that it’s not hitting the index.
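I haven’t established here why the exact-match pattern returns nothing. One general possibility, and it is only an assumption at this point, is that the stored literal carries a datatype or a language tag. A triple pattern matches on the exact RDF term, so a plain "Abacopteris aspera" would then be a different term from the stored one. The Python sketch below builds the lookup query with the literal written in each form so the variants can be tried in turn. The helper names are made up, and the queries still need the prefix declarations from the form.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def sparql_literal(text, datatype=None, lang=None):
    """Write text as a SPARQL literal: plain, typed, or language-tagged.

    Escapes backslashes and double quotes only, which is enough for
    simple single-line names.
    """
    if datatype and lang:
        raise ValueError("a literal cannot have both a datatype and a language tag")
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    term = '"' + escaped + '"'
    if lang:
        return term + "@" + lang
    if datatype:
        return term + "^^<" + datatype + ">"
    return term


def name_lookup_query(name, datatype=None, lang=None):
    """Build an exact-match lookup on tn:nameComplete (prefixes not included)."""
    literal = sparql_literal(name, datatype=datatype, lang=lang)
    return (
        "select ?uri\n"
        "where {\n"
        "  graph g:APNI_TAX_NAM {\n"
        "    ?uri tn:nameComplete " + literal + " .\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    # Candidate forms to try; which one (if any) matches is not known here.
    print(name_lookup_query("Abacopteris aspera"))
    print(name_lookup_query("Abacopteris aspera", datatype=XSD_STRING))
    print(name_lookup_query("Abacopteris aspera", lang="en"))
```

If one of the typed or tagged variants does return rows, the lookup would be a direct pattern match rather than a FILTER over every name.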

(Continued …)
