lucene-solr-user mailing list archives

From Dennis Gearon <>
Subject Re: OAI on SOLR already done?
Date Wed, 02 Feb 2011 22:19:58 GMT
Does something like this work to extract dates, phone numbers, addresses across 
international formats and languages?

Or, just in the plain ol' USA?

 Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from ''

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Demian Katz <>
To: "" <>
Cc: Paul Libbrecht <>
Sent: Wed, February 2, 2011 12:40:58 PM
Subject: RE: OAI on SOLR already done?

I already replied to the original poster off-list, but it seems that it may be 
worth weighing in here as well...

The next release of VuFind is going to include OAI-PMH 
server support.  As you say, there is really no way to plug OAI-PMH directly 
into Solr...  but a tool like VuFind can provide a fairly generic, extensible, 
Solr-based platform for building an OAI-PMH server.  Obviously this is helpful 
for some use cases and not others...  but I'm happy to provide more information 
if anyone needs it.

- Demian
From: Jonathan Rochkind []
Sent: Wednesday, February 02, 2011 3:38 PM
Cc: Paul Libbrecht
Subject: Re: OAI on SOLR already done?

The trick is that you can't just have a generic black box OAI-PMH
provider on top of any Solr index. How would it know where to get the
metadata elements it needs, such as title, last-updated date, etc.?
Any given solr index might not even have this in stored fields -- and a
given app might want to look them up from somewhere other than stored
fields.

If the Solr index does have them in stored fields, and you do want to
get them from the stored fields, then it's, I think (famous last words)
relatively straightforward code to write. A mapping from solr stored
fields to metadata elements needed for OAI-PMH, and then simply
outputting the XML template with those filled in.
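
A minimal sketch of that mapping, assuming the Solr document has already been fetched as a plain dict and that Dublin Core (oai_dc) is the target format. The field names in FIELD_MAP, the "id" and "last_modified" fields, and the function name are all hypothetical; a real index would need its own mapping, and this only builds a single <record> element, not a full OAI-PMH response.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping: Solr stored field name -> Dublin Core element name.
FIELD_MAP = {
    "title": "title",
    "author": "creator",
    "description": "description",
}


def solr_doc_to_oai_record(doc, id_field="id", date_field="last_modified",
                           field_map=FIELD_MAP):
    """Build an OAI-PMH <record> element from one Solr document (a dict)."""
    record = ET.Element(f"{{{OAI_NS}}}record")

    # The header carries the identifier and the last-updated datestamp.
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = str(doc[id_field])
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = str(doc[date_field])

    # The metadata section holds the mapped Dublin Core elements.
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    dc = ET.SubElement(metadata, f"{{{OAI_DC_NS}}}dc")
    for solr_field, dc_element in field_map.items():
        values = doc.get(solr_field, [])
        # Solr fields may be single-valued or multi-valued.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            ET.SubElement(dc, f"{{{DC_NS}}}{dc_element}").text = str(value)
    return record


if __name__ == "__main__":
    example_doc = {
        "id": "oai:example.org:1",
        "last_modified": "2011-02-02T20:38:00Z",
        "title": "An example record",
        "author": ["First Author", "Second Author"],
    }
    print(ET.tostring(solr_doc_to_oai_record(example_doc), encoding="unicode"))
```

Fields that are missing from the document are simply skipped, which matches the point above that a given index may not store everything the protocol wants.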

I am not aware of anyone that has done this in a
re-useable/configurable-for-your-solr tool. You could possibly do it
solely using the built-in Solr
JSP/XSLT/other-templating-stuff-I-am-not-familiar-with stuff, rather
than as an external Solr client app, or it could be an external Solr
client app.

This is actually a very similar problem to something someone else asked
a few days ago "Does anyone have an OpenSearch add-on for Solr?"  Very
very similar problem, just with a different XML template for output
(usually RSS or Atom) instead of OAI-PMH.

On 2/2/2011 3:14 PM, Paul Libbrecht wrote:
> Peter,
> I'm afraid your service is harvesting and I am trying to look at a PMH provider.
> Your project appeared early in the Google matches.
> paul
> On 2 Feb 2011 at 20:46, Péter Király wrote:
>> Hi,
>> I don't know whether it fits your need, but we are building a tool
>> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
>> with OAI-PMH and index the harvested records into Solr. The records are
>> harvested, processed, and stored into MySQL, then we index them into
>> Solr. We created some ways to manipulate the original values before
>> sending to Solr. We created it in a modular way, so you can change
>> settings in an admin interface or write your own "hooks" (special
>> Drupal functions), to tailor the application to your needs. We support
>> only Dublin Core, and our own FRBR-like schema (called XC schema), but
>> you can add more schemas. Since this forum is about Solr, and not
>> applications using Solr, if you are interested in this tool, please write me a
>> private message, or visit, or the
>> module's page at
>> Hope this helps,
>> Péter
>> eXtensible Catalog
>> 2011/2/2 Paul Libbrecht<>:
>>> Hello list,
>>> I've met a few google matches that indicate that SOLR-based servers implement
>>> the Open Archive Initiative's Metadata Harvesting Protocol.
>>> Is there something made to be re-usable that would be an add-on to solr?
>>> thanks in advance
>>> paul
