lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Jain" <Eric.J...@isb-sib.ch>
Subject Re: Lucene as a high-performance RDF database.
Date Mon, 11 Aug 2003 06:47:48 GMT
> I have been giving some thought to using Lucene as an RDF database.
> I'm specifically thinking about the RDF model and not the RDF syntax.

Excellent idea! In fact, I am going to try it out right now...


> Can anyone see any problems here?  This database will eventually grow
> to around 2TB in the next month so performance issues are non-trivial.

We have something like 20 million statements per gigabyte (RDF/XML).
Even if what you store into the index is more compact (let's say 2
million statements per gigabyte), with terrabytes of data you might
eventually run out of document identifiers (positive integers).


I have previously done some tests with storing RDF into a relational
database. Following is a list of fields I used; I believe the same
should work for a Lucene index.

  model_ns
  model
  statement_ns
  statement
  subject_ns
  subject
  predicate_ns
  predicate
  resource_ns
  resource
  literal

model_ns/model is used to create logical groups of statements. Either
resource_ns/resource or literal is used to store the object of the
statement. Latter can only be queried properly when using a database
that supports large text fields such as MySQL or PostreSQL; obviously
not an issue for Lucene.


--
Eric Jain


Mime
View raw message