Enhancing Institutional Repositories One Line of Code at a Time

March 23rd, 2009 by Julian Cheal

Back in February, I met with quite a few UK HE institutional repository managers over a period of two weeks. The meetings were designed to gather some basic requirements for additional lightweight software that could be developed for repositories and that would aid not only repository managers, but also repository users and depositors.

Once back in the office, I read over the notes I had taken at the meetings, wrote them up as one big list and pasted them into the wordle.net tool. This gave me a tag cloud of all my notes, which looks like the following.

Repository Wants List Wordle

However, as you can see, it’s hard to read because the words “repository” and “repositories” are so much larger than everything else. That is what you’d expect from meetings about repositories, but it’s not very helpful here, so here is an updated version with those words taken out.

Repository Wants List Wordle Without Repositories

In the meetings, the main topics we asked about were:

  • Deposit
  • User Workflows
  • Metadata creation
  • Managing copyright
  • Preservation
  • Using the content of repositories
  • Integrating with institutional software
  • Integrating with other software, services and websites outside of the institution

Looking at the Wordle output, it is easy to see that the main topics discussed at the meetings were:

  • Search
  • REF
  • SWORD
  • Import
  • arXiv.org (or any other subject-specific repository)
  • Funder
  • Statistics
  • And so on

These are all good topics on which to start development. During our first meeting we had a flip chart and sketched out ideas for a couple of desktop applications: one to aid repository deposit and one to deal with publications output.

The following are mockups of what such applications could look like.

Software Ideas

Desktop Repository Deposit Tool

The idea behind this tool is a simple, easy-to-use application into which academics drag and drop their files, fill in some basic metadata and click Deposit. That’s all they’d need to do to deposit into their institutional repository. Behind the scenes, SWORD would take care of talking to the repository and uploading the files, and the repository manager could then check the upload just like any other deposit.

Desktop Repository Deposit Tool

Desktop Repository Deposit Tool Advanced Options

Desktop Repository Deposit Tool Integrated Sherpa RoMEO Lookup

Desktop Repository Deposit Tool Drag and Drop files

Desktop Repository Deposit Tool Browse For Files

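To make the “behind the scenes SWORD” step of the deposit tool more concrete, here is a minimal Python sketch of the HTTP request such a tool might send. It assumes the SWORD 1.3 conventions as we understand them (a POST of a zipped package to a collection URL, with `Content-Disposition`, `Content-MD5` and `X-Packaging` headers and HTTP Basic authentication); the function name, the collection URL and the METS/DSpace packaging default are our assumptions, so check them against the SWORD specification and your repository’s service document. The sketch only builds the request and does not send it.

```python
import base64
import hashlib
import urllib.request

# Assumed default: the METS/DSpace SIP packaging identifier used with SWORD 1.3.
DEFAULT_PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit_request(collection_url, package_bytes, filename,
                                username, password,
                                packaging=DEFAULT_PACKAGING,
                                content_type="application/zip"):
    """Build (but do not send) a SWORD 1.3-style deposit request.

    Hypothetical helper: header names follow our reading of SWORD 1.3.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": content_type,
        # Tells the server what to call the uploaded package.
        "Content-Disposition": f"filename={filename}",
        # Lets the server check that the package arrived intact.
        "Content-MD5": hashlib.md5(package_bytes).hexdigest(),
        # Tells the server how the package is structured.
        "X-Packaging": packaging,
        # SWORD servers commonly use HTTP Basic authentication.
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(collection_url, data=package_bytes,
                                  headers=headers, method="POST")
```

Sending the request would then be a call to `urllib.request.urlopen(req)`; a SWORD server typically answers a successful deposit with `201 Created` and an Atom entry describing the new item, which the repository manager can then review like any other deposit.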

Software Ideas

Multiple Feed RSS Publications Output

The idea behind this application is that academics sometimes want some control over their publications output, so that they can embed it in their personal or departmental pages. The application would run on their desktop: the user would copy and paste their own RSS feeds from their institutional repository into it, and the application would show the feed items in a list, from which the user would choose the publications to include in their new feed. An academic could add multiple input feeds, perhaps from different repositories where they have items deposited. The application could also create formatted output or a widget for easy inclusion in an existing website.

Repository RSS Publication Output

Desktop Repository RSS Publication Output Tool

Repository RSS Publication Output Final Output

Desktop Repository RSS Publication Output Tool New RSS Feed
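The core of the multiple-feed tool (merge several feeds, let the academic pick items, produce output for a web page) can be sketched in a few lines of Python using only the standard library. This is a minimal sketch assuming RSS 2.0 input with `<item>`, `<title>` and `<link>` elements (Atom feeds would need different parsing); the function names are hypothetical, and the output here is a plain HTML list, although an RSS feed could be generated in much the same way.

```python
import xml.etree.ElementTree as ET
from html import escape


def parse_rss_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def merge_feeds(feeds):
    """Combine items from several feeds, keeping the first copy of each item."""
    seen = set()
    merged = []
    for rss_xml in feeds:
        for title, link in parse_rss_items(rss_xml):
            key = link or title  # fall back to the title if an item has no link
            if key not in seen:
                seen.add(key)
                merged.append((title, link))
    return merged


def render_html_list(items, selected_links):
    """Render only the chosen publications as an HTML list for a web page."""
    rows = [f'  <li><a href="{escape(link)}">{escape(title)}</a></li>'
            for title, link in items if link in selected_links]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

De-duplicating on the item link matters because the same paper may appear in more than one repository’s feed; escaping titles and links keeps characters such as `&` from breaking the page the list is embedded in.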

Next steps

This post is really the starting point for these ideas. What we really want is input from users, so please feel free to comment on any of the ideas mentioned in this post, or to add new ideas and discuss possible applications that could be written for UK repository users.

Desktop applications improving your web experience

March 11th, 2009 by md269

I recently attended FOWA in Dublin, where Matthew Ogles of Last.fm fame spoke about how we shouldn’t consider the desktop and the web to be separate entities. While it is certainly true that many applications that historically lived on the desktop, such as word processors, spreadsheets and even image editors, are now available on the web, we shouldn’t ignore the desktop. In fact, Matthew demonstrated that Last.fm has grown by harnessing users’ desktop interactions.

Matthew highlighted the importance of the Last.fm API in facilitating interoperability and helping the website grow, and he focused on scrobbles for the rest of his talk. A scrobble is a record that a user is listening to a given track, and scrobbles can be seen in various applications such as Facebook or instant messaging clients. In this case a scrobble is fed back to Last.fm and appears on your Last.fm profile.

There are a few important points to note here about scrobbles. Matthew explained that this is an activity people want to perform, as they are keen to “point out to their friends that they are a unique and individual musical snowflake.” He also pointed out that the information piggybacks on an action users already perform; in other words, the submission is integrated into their workflow.

Scrobbles are also used to grow the site. If someone listens to a track that doesn’t yet appear on Last.fm, the scrobble reveals this and a page is created for that artist or track, which is great for search engines and site visibility.

Last.fm have now collected 30 billion scrobbles, which is certainly a statistically significant body of crowd knowledge. Matthew made a very good case that the fact that someone listened to a track is probably a better indication of liking it than a 0-5 star rating, a point I completely agree with. I do, however, question whether this approach would work with other resources. For example, a web resource may just have an appealing title and be skimmed and dismissed, rather than being read and digested by many people.

In summary there were a lot of great ideas and concepts that we can apply to our environment but more on this later…

SWAP - Django Demonstrator

February 19th, 2009 by md269

In the course of working with the Scholarly Works Application Profile (SWAP) at UKOLN, it has become apparent that repository managers often find it difficult to understand how SWAP might work for them, as they have nothing concrete to work with. This makes it difficult to ‘sell’ the idea of SWAP to this important stakeholder group, and it has meant that SWAP is still largely untested in terms of usability. For this reason we have decided to investigate a concrete implementation of SWAP: a demonstrator that lets users experiment with entering real-world data to populate a SWAP profile.

I have recently been using Django to build quick prototypes, and I realised it would be the perfect tool for creating the demonstrator with minimum fuss. Django has many great features, including my favourite implementations of a template engine and an object-relational mapper (ORM), and an automagic administration interface. In this case we are interested in the latter two. The data model is defined in Python code and the database is generated from these models. In most cases there is no need to write any SQL, as the Django model API provides the functionality to create and alter objects. Having used several ORMs in the past, including Hibernate (which, admittedly, does a lot more, with wider database support and more features), I really appreciate the simplicity. The admin interface comes for free and can be used as is or customised with little effort. It is this admin interface that I am using as a SWAP demonstrator.

Each of the major elements of SWAP can be represented as an object, and hence as a database table, which in turn appears in the admin interface.

Django admin interface

As we would expect, each main element of the profile can be added and built up in order. In this example we are adding an expression to an existing Scholarly Work.

SWAP Expression

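The post doesn’t show the Django models themselves. As a dependency-free illustration of the structure they capture, here are the FRBR-based SWAP entities (Scholarly Work, Expression, Manifestation, Copy, Agent) as plain Python dataclasses rather than Django models. The attributes are a hypothetical, heavily simplified selection, not the full profile. In Django, each class would instead be a model linked to its parent by a `ForeignKey`, giving one table and one admin page per entity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str  # a person or organisation, e.g. a creator of the work


@dataclass
class Copy:
    location: str  # e.g. a URL where one copy of the file can be retrieved


@dataclass
class Manifestation:
    media_type: str  # e.g. "application/pdf"
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Expression:
    title: str
    version: str  # e.g. "preprint" or "published version"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class ScholarlyWork:
    title: str
    creators: List[Agent] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)

    def add_expression(self, expression: Expression) -> Expression:
        """Attach an expression to this work, as in the admin screen above."""
        self.expressions.append(expression)
        return expression


work = ScholarlyWork("Example paper", creators=[Agent("A. Author")])
preprint = work.add_expression(Expression("Example paper", "preprint"))
preprint.manifestations.append(
    Manifestation("application/pdf", [Copy("http://repository.example/1.pdf")]))
```

Building the record top-down like this mirrors the order in which a repository manager fills in the demonstrator: the work first, then its expressions, then their manifestations and copies.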

We will be running an interactive workshop for an invited group of repository managers to test our SWAP implementation, with a view to getting feedback about the usability of the profile itself. Any comments about this approach will be welcome!

Dev8D

February 17th, 2009 by md269

I have just returned from The Developer Happiness Days hosted in London, which offered a unique approach to innovation and development. In most sectors, developers get a raw deal. They are the voiceless and the downtrodden. They are found lurking in the windowless basements of corporate offices working into the early hours, pasty faced, tired and lonely with only LCD glare for company.

Melodrama aside, David Flanders hosted a week-long event that addressed this imbalance. The event aimed to give developers a voice and draw them further into the collaborative process of software and systems development, and it succeeded.

Developers of varying vintage assembled in Bloomsbury, keen to participate and network. The fresh-faced developers were able to absorb the many lightning talks and presentations, while the more experienced ones hosted these sessions and shared their knowledge. I think it is fair to say that there was something for everyone, with all the developers leaving that little bit wiser.

I found two of the sessions particularly useful. The first highlighted the pain that can be encountered when dealing with character encodings, explained why you should always program for Unicode, and demonstrated some best practices. We were shown why ignoring Unicode will give you headaches later in the development cycle, and that converting early and late (decoding input to Unicode as soon as it arrives and encoding output only as it leaves, using the calls Python supplies) can alleviate much of the pain. The second, Mark van Harmelen’s “Designing with Sketches, Paper Prototypes, and Users”, showed a much faster, better (and fun) way to develop software designs and prototypes. By tearing up paper and moving Post-its about, he demonstrated a far friendlier way of designing. As someone who previously moved in the complicated and sometimes restrictive world of UML, I found this incredibly valuable.
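The “convert early, convert late” advice from the encodings session can be sketched in Python 3 as follows (in 2009 the equivalent Python 2 calls were `unicode()` and `.decode()`/`.encode()`). This is our reconstruction of the general pattern, not the presenter’s code, and the function names are hypothetical: bytes are decoded to text at the point of input, all processing happens on text, and text is encoded back to bytes only at the point of output.

```python
import unicodedata


def decode_early(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn incoming bytes into text at the boundary of the program."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """Work on text only; NFC normalisation makes equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def encode_late(text: str, encoding: str = "utf-8") -> bytes:
    """Turn text back into bytes only when it leaves the program."""
    return text.encode(encoding)


raw = "Université de Montréal".encode("utf-8")
text = decode_early(raw)
print(len(raw), len(text))  # 24 bytes, but 22 characters
```

The headaches come from mixing the two layers: decoding those same UTF-8 bytes as Latin-1, for example, silently produces “UniversitÃ© de MontrÃ©al” rather than raising an error, which is why the conversion should happen once, at a well-defined boundary.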

The participants were given poker chips to be used as currency. They could either be exchanged for beer on Wednesday or cashed in to enter a competition where the people with the most chips won a rather snazzy netbook. After the first day everyone soon became accustomed to handing over chips and interacting at a level I don’t think would have been reached otherwise.

We followed up with a Dragons’ Den-inspired session in which the developers pitched software solutions to a panel. This was a light-hearted critique session that proved valuable both for the advice and for formalising the pitch process. Having participated in the session I can report that it was both useful and good fun. Maybe next year we could get some fancy dress in the style of a Harry and Paul sketch… ok maybe not.

Is this a first, investing in and listening to developers on such a public and large scale? I think this might just catch on and I certainly hope it does. A big thanks to David Flanders, Ben O’Steen and Rachel Bruce for a fantastic week.

Sherpa RoMEO AJAX Autocomplete

January 19th, 2009 by Julian Cheal

Depositing papers in an institutional repository usually requires manual entry of data into a web form by the depositor, who is either an academic, a member of clerical staff, or the repository manager. The next step in the workflow is for the repository manager to validate the item before it is deposited into the repository.

What problems are there with manual entry?

The problem with manual data entry is that it leaves room for mistakes, typos and general incorrectness. Manual entry is also a laborious task. One section prone to mistakes is the publication details of an item, which usually consist of a journal or publication title, an ISSN and the publisher. Getting these details right matters for knowing whether a paper can be deposited or not: if a repository manager uses the journal’s name to look up the publisher’s details, for example, they need to know that the name has been entered correctly.

Can a paper be deposited in a repository?

Repository managers need to know whether copyright policies allow a paper to be deposited into their institutional repository, or whether there is an embargo on the paper. If a paper can be deposited, which version can they use: the pre-print, the post-print, or some other variant? Services such as Sherpa’s RoMEO have been created to give managers a tool for searching and retrieving such information.

The Sherpa RoMEO service is maintained by Sherpa and supported by JISC and the Wellcome Trust. It is a development that has grown out of the RoMEO project which aimed to produce a listing of journal publishers. The journal information is provided by the British Library’s Zetoc service which is hosted by MIMAS. Community contributions and Sherpa’s partners update the publisher information.

However, even with services like RoMEO the situation remains as it was before: publisher information is entered manually and then checked manually in RoMEO for any restrictions on an item. There is now a better solution: AJAX autocompletion.

How it works:

  • The user starts typing the journal or publication title
  • Local JavaScript sends the text via an HTTP request to the Sherpa RoMEO API
  • RoMEO returns the results encoded in XML
  • A local server-side script (written in Perl, Java or another language) parses the XML results
  • Local JavaScript then displays the results, and the user chooses the correct title from them
  • The publication section then gets filled in with the correct data.
RoMEO API Lookup using AJAX

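The server-side half of the lookup above (query RoMEO with what the user has typed so far, then parse the XML into candidates for the JavaScript to display) might look something like this minimal Python sketch. The endpoint `api29.php`, the `jtitle` and `qtype` parameters and the `<journal>`, `<jtitle>`, `<issn>` and `<romeopub>` element names are our assumptions based on the legacy (2009-era) RoMEO API; that API has since been superseded and may no longer be available, so check the current RoMEO documentation before reusing any of it. The function names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: the legacy RoMEO API endpoint; verify before use.
ROMEO_API = "http://www.sherpa.ac.uk/romeo/api29.php"


def build_romeo_query_url(partial_title, qtype="starts"):
    """URL asking RoMEO for journals whose title starts with the typed text."""
    query = urllib.parse.urlencode({"jtitle": partial_title, "qtype": qtype})
    return f"{ROMEO_API}?{query}"


def parse_romeo_journals(xml_text):
    """Extract candidate journals from a RoMEO-style XML response."""
    root = ET.fromstring(xml_text)
    journals = []
    for journal in root.iter("journal"):
        journals.append({
            "title": journal.findtext("jtitle", default=""),
            "issn": journal.findtext("issn", default=""),
            "publisher": journal.findtext("romeopub", default=""),
        })
    return journals
```

The script would fetch `build_romeo_query_url(...)` with `urllib.request.urlopen` and hand the parsed list back to the page, typically as JSON; routing the request through a local script also sidesteps the browser’s same-origin restriction on calling a remote API directly.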

EPrints & DSpace

EPrints (current version 3.1.2) has built-in autocomplete functionality using the http://script.aculo.us/ JavaScript libraries. DSpace (current version 1.5.1) does not currently include autocomplete out of the box, but there is a plugin, http://wiki.dspace.org/index.php/Autosuggest_using_AJAX, that (technical) users can install. Stuart Lewis is currently working on implementing autocomplete natively in DSpace.

Within the EPrints repository there are a couple of different approaches to autocompletion. One uses what EPrints calls authority lists, which are simply static files that sit on the web server. The EPrints wiki page http://wiki.eprints.org/w/Autocompletion_and_Authority_Files_(Romeo_Autocomplete) describes them. The EPrints autocomplete can search either these static authority lists or functions that execute SQL queries. Using the authority lists can range from a simple implementation to a complex solution. EPrints users can also install a local copy of the RoMEO database, which EPrints stores in an authority file. Using AJAX autocomplete, EPrints can then look up the publisher information for a journal. However, this local data will need periodic updating as publisher information is added or removed.
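Autocompleting against a static authority list comes down to fast prefix matching over a sorted list. The sketch below is a generic Python illustration of that idea, not EPrints’ own implementation: the tab-separated `title<TAB>publisher` line format and the function names are hypothetical. Keeping the list sorted lets `bisect` jump straight to the first possible match instead of scanning every entry on each keystroke.

```python
import bisect


def load_authority_list(lines):
    """Parse hypothetical 'title<TAB>publisher' lines into a sorted list."""
    entries = []
    for line in lines:
        if line.strip():
            title, _, publisher = line.rstrip("\n").partition("\t")
            # Store a case-folded key first so sorting and matching ignore case.
            entries.append((title.casefold(), title, publisher))
    entries.sort()
    return entries


def autocomplete(entries, typed, limit=10):
    """Return up to `limit` (title, publisher) pairs whose title starts with `typed`."""
    key = typed.casefold()
    # (key,) sorts before every tuple that begins with key, so this finds
    # the first entry that could match.
    start = bisect.bisect_left(entries, (key,))
    matches = []
    for folded, title, publisher in entries[start:]:
        if not folded.startswith(key) or len(matches) == limit:
            break
        matches.append((title, publisher))
    return matches
```

Because the data is a local snapshot, results are only as current as the last refresh of the file, which is exactly the updating burden noted above and the reason a live API lookup is attractive.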

Another option is a Perl script by Ian Stuart at EDINA (http://lucas.ucs.ed.ac.uk/test/ajax-romeo.html). Unlike the EPrints authority lists, this script uses the Sherpa RoMEO API, which means it works with up-to-date publisher information. Ian’s script is very easy to install, and instructions are available at http://wiki.iedemonstrator.org/confluence/display/ied/EPrints+Romeo+AJAX+lookup+widget

Why is autocomplete better?

An AJAX autocomplete form requests data asynchronously in the background, without the need to submit the page as is typical of most web forms. So as the depositor types in the journal name or title, the AJAX script requests a list of matching results from the Sherpa RoMEO database or the EPrints authority lists in ‘real time’.

If a depositor does not have JavaScript enabled in their web browser, the AJAX script will not load and the form fails gracefully, allowing the depositor to continue entering the details manually.

Autocomplete is better because the depositor no longer has to look up the journal information separately in RoMEO while completing a deposit. Furthermore, manual entry is prone to typos; by letting the depositor pick the title from the returned results, this method reduces user error and should allow faster, more accurate deposits.

CRIG DRY Workshop

June 7th, 2008 by paulwalk

On Friday (06/06/2008) the IE Demonstrator project ran its first event, in conjunction with CRIG. CRIG has, through the efforts of the WoCRIG team, assembled a strong group of institutional repository developers. We decided to exploit this for our first IE Demonstrator event by giving it a focus on repositories. To this end, we invited a few services with potential appeal to repository developers, mainly from the JISC Information Environment (JISC IE), to send a representative with a technical perspective to make an “elevator pitch” to the assembled developers. Or, to put it another way, in Tony Hirst’s words:

…the hardcore(?!) of the UK’s academic repository hackers were plotting on how to get the most out of helper webservices being produced by other JISC funded webservices projects….

For the record, the services who courageously came and pitched were:

We chose the Parade Bar on the University of Bath’s campus, as the students have mostly left for the summer vacation period. The facilities worked, I think, and my main fear, that other customers would disrupt our activities, did not come to pass, although the last “pitch” did have to compete with the background noise. The feedback I have had so far is very positive.

Some of the Q&A following the pitches was quite lively. My hope was that, along with the repository developers getting to see some service offerings which might tempt them, the service representatives might get some useful feedback from the developers. One undeniable and strong signal from the repository developers was broadcast with groans whenever anyone mentioned SOAP interfaces. There is a generation of JISC IE services which was developed when SOAP was the fashionable way of exposing the functionality of services to other services and applications, even when this was perhaps unnecessarily complex. With the current appetite for simple, RESTful interfaces, SOAP interfaces are increasingly poorly received by developers of repository and web services.

DRY CRIG

An event like this has its impact in the small moments: the serendipitous conversations, the quickly hacked demonstrations, the moments of epiphany. I experienced all of these. Some quick examples:

  • Ian Ibbotson showed me his library for exposing a SOLR interface as a Z39.50 target. He knocked together a quick demo cross-searching the index to the repository aggregation which UKOLN has recently developed for Intute, together with the index to Oxford University’s repository developed by Ben O’Steen (who was also at the event). His comment to me was “Couldn’t have got this sorted without people’s input here today, so thats cool!”
  • In conversation with James Reid and Tony Hirst, we started to explore the idea that datasets might be geoparsed and marked up in a “just-in-time” (JIT) fashion, rather than having the markup applied to the entire dataset up front. In other words, enrich data at the point of outward-facing service, rather than at the point of creation or ingest, and only then when necessary. This way we avoid the overhead of maintaining an increasingly complex and “heavy” source dataset.
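The just-in-time idea from the second conversation can be illustrated with a small Python sketch: stored records stay plain, and geographic enrichment is computed only when a request asks for it, with the result cached so repeated requests are cheap. This is only an illustration of the architecture we discussed; the toy gazetteer (with approximate coordinates), the naive substring matching and the function names are all hypothetical stand-ins for a real geoparsing service.

```python
from functools import lru_cache

# Hypothetical toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Bath": (51.38, -2.36),
    "Edinburgh": (55.95, -3.19),
}


@lru_cache(maxsize=1024)
def geoparse(text):
    """Find known place names in `text` (naive substring matching), cached."""
    return tuple((place, coords) for place, coords in GAZETTEER.items()
                 if place in text)


def serve_record(record, want_geo=False):
    """Return a record for output, enriching it only if the caller asks."""
    output = dict(record)  # the stored source record is never modified
    if want_geo:
        output["places"] = list(geoparse(record["description"]))
    return output
```

The source dataset never grows: the enrichment lives only in the outward-facing response, and the cache stands in for whatever persistence a real service might use to avoid re-parsing popular records.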

I have asked participants to send in their moments like this, so we can begin to capture some of the knowledge and ideas generated at such events. My particular avenue of investigation, inspired by my conversation with James and Tony, will be the geoparsing of repository metadata.

Thanks to all who came, especially to the speakers, and to David Flanders for co-organising the event with us.

Welcome!

April 30th, 2008 by paulwalk

Welcome to the blog for the JISC IE Demonstrator Project. We’re just getting under way, but should soon be able to announce some plans for development and an event in the very near future. In the meantime, if you are interested in this project, your best bet is to subscribe to the RSS feed for this blog as we intend to use this mechanism as the primary channel for news and announcements about the project.