COinS, OpenURL, etc


Oh those librarians! I have encountered them before, dealing with OAI-PMH, a spec that mandates how to declare your XML namespaces in the documents you hand around.

They have another one: OpenURL.

This is all about these two posts:

Naturally, the boss would like all of this cool stuff available on our site at Biodiversity.org.au. I have noticed that scientists are a trifle competitive.

Hmm.


OpenURL seems to be a way to embed a data record in a querystring. Nasty. Equivalently: it is a set of well-known querystring parameters pertaining to things available in libraries, specifically scientific books, journals, and journal articles. Part of the standard is a

  • OpenURLs for books are described here.
  • OpenURLs for journal and journal articles are described here.

The important bits are the metadata fields

  • url_ver=Z39.88-2004 – this is an OpenURL version 1.0 querystring
  • rft_val_fmt=info:ofi/fmt:kev:mtx:journal – we are using the journal OpenURL format

The other important bit is the DOI field, which both formats include.
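To make the "data record in a querystring" idea concrete, here is a minimal sketch in Python of building such a querystring. The resolver address, the sample article, and the helper name are all made up for illustration. The `rft.` prefix on the metadata keys and the `rft_id=info:doi/...` form for the DOI are my reading of the key/value format, not something taken from the two pages linked above.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute whichever resolver you actually use.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def build_journal_openurl(fields, doi=None, base=RESOLVER_BASE):
    """Return an OpenURL 1.0 GET URL describing a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Assumption: metadata about the thing being cited is prefixed with "rft.".
    params += [("rft." + key, value) for key, value in fields.items()]
    if doi:
        # Assumption: the DOI travels as an info:doi/ identifier in rft_id.
        params.append(("rft_id", "info:doi/" + doi))
    # urlencode does the URL-encoding of every key and value.
    return base + "?" + urlencode(params)


url = build_journal_openurl(
    {"atitle": "Frogs & toads", "jtitle": "Made-up Journal", "volume": "12"},
    doi="10.1000/example",
)
print(url)
```

Note that the ampersand in the title has to be encoded, otherwise it would be read as a parameter separator.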


COinS is a way to attach metadata to blocks of text on an HTML page. You have a span element with a class and a title. The class specifies that this is a coin and what format/namespace the coin belongs to. Not sure where the spec for that is, or the list of well-known values. The class for OpenURLs is Z3988. The title contains the data itself – in this case, the OpenURL.
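A rough sketch of what generating one of these spans might look like, in Python. The helper name is hypothetical and the parameters simply reuse the two fields listed earlier in this post plus a made-up title; real COinS may expect slightly different keys.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(params, default_text=""):
    """Wrap OpenURL key/value pairs in a span with class Z3988."""
    querystring = urlencode(params)
    # Two layers of escaping: urlencode handles the values, and escape()
    # turns the "&" separators into "&amp;" so the HTML attribute is valid.
    return '<span class="Z3988" title="{}">{}</span>'.format(
        escape(querystring, quote=True), escape(default_text)
    )


span = coins_span([
    ("url_ver", "Z39.88-2004"),
    ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ("rft.atitle", "Frogs & toads"),  # made-up sample title
])
print(span)
```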


Finally, webhooks. As far as I can tell, a webhook is simply a URL that you can call with some parameters. The spec (such as it is) insists on POST requests. I don’t know why. The spec also does not state what should be – you know – *in* the POST request. I rather suspect that it assumes application/x-www-form-urlencoded key/value pairs and fails to say so because someone or other is not aware that a POST can contain anything. And it doesn’t seem to specify any way of declaring what the parameters are.

It really seems like someone has re-discovered web services and gotten excited about ’em. The only thing of note is that there’s a client/server switcheroo. When you register a webhook URL with Google Docs, conceptually Google is “upstream”, but mechanically it is a client calling a service you implement. It’s just a push subscription.
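For what it's worth, the receiving end of a webhook might be sketched like this in Python, under my guess that the body is form-encoded key/value pairs. The handler class, the function name, and the sample field names (`doi`, `note`) are invented for the example; no webhook spec defines them.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs


def parse_hook_body(content_type, body):
    """Decode a webhook POST body, assuming form-encoded key/value pairs."""
    media_type = content_type.split(";")[0].strip()
    if media_type != "application/x-www-form-urlencoded":
        raise ValueError("unsupported content type: " + content_type)
    # parse_qs returns a list per key because a key may be repeated.
    return parse_qs(body.decode("utf-8"), keep_blank_values=True)


class HookHandler(BaseHTTPRequestHandler):
    """The service you implement; the 'upstream' site is the client."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            fields = parse_hook_body(self.headers.get("Content-Type", ""), body)
        except ValueError:
            self.send_error(415)  # Unsupported Media Type
            return
        print("hook called with", fields)
        self.send_response(204)  # accepted, nothing to send back
        self.end_headers()


fields = parse_hook_body(
    "application/x-www-form-urlencoded; charset=utf-8",
    b"doi=10.1000%2Fexample&note=found+it",
)
print(fields)
```

To actually listen for calls you would hand `HookHandler` to `http.server.HTTPServer` and call `serve_forever()`; that part is left out so the sketch can be run without opening a port.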


For both the OpenURL specification and for these webhook things in general, it seems to me that calling ’em a webservice and using WSDL to document them is the way to go. Hence my whinge about librarians at the start – like the OAI-PMH spec, the guys at OpenURL don’t seem to be aware of existing standards. In particular, if it was defined at the WSDL level, they wouldn’t need to go into detail about the fact that you need to URL-encode the values: it’s implied by the binding type. Having words in the OpenURL spec explaining url encoding is exactly the same kind of mistake as having words in the OAI-PMH spec specifying how to declare the oai namespace in your XML.

Sigh.

So Rod Page has an OpenURL resolver service. When it resolves an OpenURL, it has an interface by which the user can go “Yes, I meant *this one*”. That interface looks at the referring entity identifier on the OpenURL, gets the WebHook (from where? Is it in the OpenURL? There’s no spot for that in the info:ofi/fmt:kev:mtx:journal format. Is it in some other querystring parameter?) and sends to that webhook … I don’t know. Possibly the OpenURL with all the fields filled in. Or maybe with just the DOI, which is the bit that this demo was interested in.

In other words: I am rather getting the impression that we have a custom bit of software that talks to another custom bit of software.

Oh well. Adding COinS to our publication records, and even to internal links, is doable. A snag is that the Firefox plugin does not display the content of the span but replaces it with an image. Gahh! Who TF thought that that was a good idea??? The idea of COinS is that you wrap something – the reference citation or whatever. If the Firefox plugin is running, it will actually not show the user the citation text if it’s wrapped. Goddamn …

Ok. How about a service that allows a client to attach an OpenURL to a publication record? Damn it, damn it, damn it: the webhooks docs state specifically that it has to be a POST request. Why? I don’t know. An OpenURL resolved against a base service URL is a GET request with a querystring. 3 options:

  1. Ignore the “webhooks” guys and accept the parameters as a GET request
  2. Ignore the OpenURL guys and take a POST with the parameters in the OpenURL as application/x-www-form-urlencoded parameters. That is: not as a URL as such.
  3. Accept a form-encoded POST with a single in-house parameter named “openURL”, which has to be an OpenURL … which may or may not have to be URL-encoded. This satisfies no-one, and involves making up our own spec on top of these others.
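The three options differ only in where the key/value pairs are hiding, which a small Python sketch can show. The function name and the sample values are hypothetical, and for option 3 this assumes the whole OpenURL is URL-encoded once more when it is placed in the form field.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def openurl_fields(method, query_string="", body=""):
    """Return the OpenURL key/value pairs however the caller sent them."""
    if method == "GET":
        # Option 1: a plain OpenURL, parameters in the querystring.
        return parse_qs(query_string)
    if method != "POST":
        raise ValueError("unsupported method: " + method)
    form = parse_qs(body)
    if "openURL" in form:
        # Option 3: one in-house parameter holding a whole OpenURL.
        # parse_qs has already undone the outer layer of encoding.
        return parse_qs(urlsplit(form["openURL"][0]).query)
    # Option 2: the OpenURL parameters sent directly as form fields.
    return form


inner = "url_ver=Z39.88-2004&rft.atitle=Frogs+%26+toads"
from_get = openurl_fields("GET", query_string=inner)
from_post = openurl_fields("POST", body=inner)
wrapped = urlencode({"openURL": "http://resolver.example.org/openurl?" + inner})
from_wrapped = openurl_fields("POST", body=wrapped)
print(from_get == from_post == from_wrapped)
```

All three routes end up with the same dictionary, which is rather the point: the payload is one thing, and how it is carried over HTTP is another.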

All of which could have been avoided if they’d used WSDL and specified that OpenURLs are WSDL “Messages”, rather than insisting that they must be URLs. And if the WebHooks guys had admitted that a WebHook is when you allow a user to give you a URL with which to do this:

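<wsdl:definitions
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
        xmlns:hookspec="http://hookspec">
    <wsdl:import namespace="http://hookspec"/>
    <wsdl:service name="FredsCallback">
        <wsdl:port name="Callback"
                binding="hookspec:httpBinding">
            <http:address location="http://fred.com/webhook1"/>
        </wsdl:port>
    </wsdl:service>
</wsdl:definitions>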

Maybe I’ve gotten it all wrong. Wouldn’t be the first time.


UPDATE: found the spec. It’s at http://www.niso.org/kst/reports/standards?step=2&gid=None&project_key=d5320409c5160be4697dc046613f71b9a773cd9e, obviously.

8 Responses to COinS, OpenURL, etc

  1. Rod Page says:

    Paul,

    To my eyes WSDL is as bad as, if not worse than, OpenURL. SOAP lost the web service battle to REST, in part because it makes the simple stuff way too much hassle. Just pointing a URL at something and getting data beats formulating XML documents to get other XML documents.

    I agree that OpenURL is ugly as hell (see http://iphylo.blogspot.com/2007/05/amnh-dspace-and-openurl.html ), in part because it’s modelling a whole range of possible objects, as well as service relationships. Hence it’s vastly over-engineered for 99% of what people use it for.

    Regarding COinS, these are not intended to be visible in web pages; the <span> tag is invisible as it has no content (just attributes). The Firefox OpenURL referrer add-on finds these and can insert an image to click on, or a text string, depending on the user’s preferences. The idea is the user clicks on this to go to the OpenURL resolver. The original web page is still expected to display the citation data (in whatever form it wants); the COinS only show up if a browser extension is installed (or a tool such as Zotero is installed, which can extract information from the COinS). I hard-coded displaying a text link in the AFD CouchDB demo simply because I use Safari as my default browser, and there’s no COinS extension for Safari.

    The web hooks idea appeals to me simply as a way to connect original web site to the results of the OpenURL resolver. I’m assuming the web hooks “spec” uses POST to avoid placing a limit on how much information to send to the hook. I’m assuming the web hooks people avoided SOAP because that would have killed the idea stone dead. Anyone can write a script that receives a POST request and does something. Simple and open wins.

    The OpenURL resolver I show in the demo is simply an attempt to explore ways to feed the resolver results back to the originator of the request, but allowing for user interaction. When originally populating the AFD CouchDB database I used the default BioStor OpenURL resolver as a web service: I called the OpenURL using HTTP GET with a parameter telling it I wanted JSON. If the JSON response included a BioStor reference id I added that to the AFD database. This worked well, but there are cases where the OpenURL couldn’t find a match, for example if the metadata in AFD was wrong (or incorrectly parsed). I could search for those manually using the OpenURL resolver’s web interface, but then how do I add those to AFD? I could do this manually by editing the database, but I think it’s more elegant to “tell” the database that I’ve found the record. Hence web hooks.

    Hope this makes the motivation a bit clearer.

    Rod

    • Paul Murray says:

      According to http://ocoins.info , “the web page might have default text inside the span for users without access to activating agents”. But that does seem to indicate that if you do have the activating agent, then it gets stripped … ok, fine. No problem there.

      WSDL is not necessarily about SOAP. It’s a general way to define interfaces: the available services and the types they pass around. You can then specify that there are SOAP, http GET/POST – heck, *email* – implementations of those interfaces at physical locations.

      The power of it is in the toolset. If I were to plonk a service at a URL with parameters and to describe them with a WSDL, you can use a tool to generate java code which will make that service look like a remote object with a java interface that you simply call. There’s support in other languages too.

      Oh, I found the OpenURL specification, by the way. Added the link to this blog post.

      • Rod Page says:

        Yes, my comments on SOAP may have been a red herring (and I was scarred from trying to hand craft WSDLs a few years back). I guess it also reflects what layer you program at. Do you embed web services calls within a larger project, treating them as just another function call, or do you write scripts that essentially speak HTTP and JSON, so anything that gets in the way of that seems like overhead. I’ve ended up doing the latter.

  2. Greg Whitbread says:

    Thanks Paul. A valuable contribution to our take on this approach. But competitive? The brief was – investigate this. Is it a way to leverage the work Rod is doing to map AFD references into BHL and onto DOIs as updates to associated links in AFD? Two-way communication through the service layer has always been part of the spec.

    greg

    • Roderic Page says:

      We could use web hooks as a way to push updates from my version of AFD to AFD in Australia (as opposed to Australia pulling them from my server by regularly polling, say, an RSS feed). For example, if a record is updated on my CouchDB version of AFD, my code could send new updated data to a web hook in Oz, saying “here’s some updated data”, and you could elect to accept it or ignore it. There are various ways of doing this, but I’m keen on anything that avoids data in silos, and enables annotations and links added in one place to be made more widely available, in this case including the original data provider.

  3. Paul Murray says:

    I could write a web-hook (or web service) that takes an RDF triple and stores it somewhere. Perhaps we’d limit the RDF verbs that you were allowed to use to just a subset of the dcmi terms.

    Or we could go for a more structured approach and store a data block, say: URI, timestamp, who-says-so, URI type (essentially, the RDF predicate), and perhaps a comment.

    The main thing is who-says-so, although not having a HTML form will limit the amount of spam. I’ll briefly have a look at openid. The other possibility is simply a list of authorised posters that is manually maintained.

    I attempted to do COinS in the XSL, but the XSLT processor included with eXist is only 1.0 and does not have a url-encode function, meaning that titles or authors with ampersands don’t get done correctly. This means that the OpenURL will have to be generated on the back end … or I could try to explain to eXist about XSLT2.

    Generating it on the back-end is problematic as we are re-engineering the XML and don’t want to release the results until we are happy with it.

  4. Eric Hellman says:

    A nicer URI for the spec is http://www.niso.org/standards/z39-88-2004

    Also, POST is fine for OpenURL.

    The discussion of how agents should process a span ended up the way it did because there was no other clean way for the author of a document to specify that content in the span was a default that should be overwritten. It was also felt that to match user expectation for OpenURL links, the span should be used to indicate where a link should be added, not as an annotation to a block of citation text.

    The implementation issue with XSLT is sticky; perhaps it could be addressed in javascript. Putting the metadata in a URI element and using HTML output method in XSLT would not have solved the problem, if it’s any consolation.
