lucene-java-user mailing list archives

From "Kevin A. Burton" <bur...@newsmonster.org>
Subject Lucene as a high-performance RDF database.
Date Sun, 10 Aug 2003 22:33:07 GMT

I have been giving some thought to using Lucene as an RDF database.   
I'm specifically thinking about the RDF model and not the RDF syntax.

Essentially, each triple would be encoded as a single document, with its
parts stored as fields.

So, for example, we would have the subject, predicate, and object of each
relationship as document fields.  Subjects and predicates would each be a
single token, and the object field would be indexed for full-text search.

For example, a triple (document) would be:

    http://jakarta.apache.org -> title -> "A great Java developer's website"

This would be just one document in the index.
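
As a rough sketch of that layout, here is the mapping in plain Python.
This is not Lucene's API: the `Triple` class, the `triple_to_document`
helper, and the field names are hypothetical, and the boolean simply
marks which fields would be analyzed for full-text search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One RDF statement (hypothetical helper type, not part of Lucene)."""
    subject: str
    predicate: str
    obj: str


def triple_to_document(t):
    """Map a triple to one document: field name -> (value, tokenized?)."""
    return {
        "subject": (t.subject, False),      # kept as a single exact-match token
        "predicate": (t.predicate, False),  # kept as a single exact-match token
        "object": (t.obj, True),            # analyzed for full-text search
    }


doc = triple_to_document(
    Triple("http://jakarta.apache.org", "title",
           "A great Java developer's website")
)
```

Every triple produces exactly one such document, which is where the
large document count comes from.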

This would have a lot of advantages, most importantly the speed and
reliability of Lucene and the ability to run a full-text query on objects.

For example, we could query on "Java" and get back
"http://jakarta.apache.org".
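
To make the lookup concrete, here is a toy in-memory inverted index in
plain Python. It only illustrates the idea and is not how Lucene is
implemented or called: the `TripleIndex` class, its methods, and the
simple lowercase word tokenizer are all assumptions.

```python
import re
from collections import defaultdict


class TripleIndex:
    """Toy index: one document per triple, full-text terms on the object."""

    def __init__(self):
        self.docs = []                    # doc id -> (subject, predicate, object)
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, subject, predicate, obj):
        doc_id = len(self.docs)
        self.docs.append((subject, predicate, obj))
        # Only the object text is tokenized (lowercased word characters).
        for term in re.findall(r"\w+", obj.lower()):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, word):
        """Return the subjects of all triples whose object contains `word`."""
        ids = sorted(self.postings.get(word.lower(), ()))
        return [self.docs[i][0] for i in ids]


index = TripleIndex()
index.add("http://jakarta.apache.org", "title",
          "A great Java developer's website")
hits = index.search("Java")  # subjects whose object text mentions "Java"
```

Here `hits` holds the subject URL of the single matching triple.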

The major downside I can see is that we would be indexing a LOT of small
documents with a LOT of index updates.

Can anyone see any problems here?  This database will grow to around 2TB
within the next month, so performance issues are non-trivial.

Most people have deployed Lucene with large document sizes, and the fact
that most people are citing document COUNT makes me nervous.

Kevin


