lucene-java-user mailing list archives

From "Johannes.Lichtenberger" <>
Subject Re: Indexing and searching across versioned document collections
Date Fri, 09 Nov 2012 08:53:41 GMT
On 11/09/2012 09:41 AM, jake dsouza wrote:
> Hello,
> Has anyone worked on making Lucene index and search versioned document
> collections, i.e. any corpus with multiple versions of documents, similar to
> Wikipedia or source code?
> I am working on a project to index and search versioned collections while
> keeping the index size to a minimum by taking the differences between
> versions into consideration.
> Could someone direct me to any existing efforts to make Lucene work with
> versions?

Hello Jake,

I never found the time, but it's still on my to-do list for a versioned 
XML DBS[1]. That is also my problem: I would somehow need access to the 
internal buckets, nodes, or whatever index structure Lucene uses. With a 
PATRICIA trie, for instance, it's very simple with my system, as I can 
just store the nodes, which are then versioned (copy-on-write, so only 
changed nodes are written, depending on the versioning strategy used; 
possibly a set of nodes is also grouped into a "page"). I never figured 
out how to do this with Lucene, which is why I'm thinking about 
implementing or simply integrating a PATRICIA trie and enhancing an 
XQuery parser with full-text capabilities.
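A minimal sketch of the copy-on-write idea, assuming an in-memory, plain character trie (no PATRICIA path compression) that maps terms to document IDs. Each `add` creates a new revision by copying only the nodes on the root-to-term path (path copying) and sharing every other subtree with the previous revision, so old revisions stay readable. The names `VersionedTrie`, `add`, `search` and `nodesWritten` are hypothetical; this is neither Lucene's API nor the storage format of the system mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: a persistent (copy-on-write) trie, not a PATRICIA trie and not Lucene.
public class VersionedTrie {
    // Immutable trie node: children keyed by character, plus the doc IDs of the term ending here.
    static final class Node {
        final TreeMap<Character, Node> children;
        final TreeSet<Integer> postings;

        Node(TreeMap<Character, Node> children, TreeSet<Integer> postings) {
            this.children = children;
            this.postings = postings;
        }
    }

    // Shared empty node; never modified because every update copies before changing.
    private static final Node EMPTY = new Node(new TreeMap<>(), new TreeSet<>());

    // roots.get(r) is the root of revision r; revision 0 is the empty trie.
    private final List<Node> roots = new ArrayList<>();

    // Number of nodes written so far, to show that only the changed path is copied.
    int nodesWritten = 0;

    public VersionedTrie() {
        roots.add(EMPTY);
    }

    // Adds docId to the postings of term on top of the latest revision; returns the new revision number.
    public int add(String term, int docId) {
        roots.add(insert(roots.get(roots.size() - 1), term, 0, docId));
        return roots.size() - 1;
    }

    // Path copying: copies the nodes from the root down to the term's node and shares all other subtrees.
    private Node insert(Node node, String term, int depth, int docId) {
        nodesWritten++;
        TreeMap<Character, Node> children = new TreeMap<>(node.children); // shallow copy, siblings shared
        TreeSet<Integer> postings = node.postings;
        if (depth == term.length()) {
            postings = new TreeSet<>(node.postings);
            postings.add(docId);
        } else {
            char c = term.charAt(depth);
            Node child = children.getOrDefault(c, EMPTY);
            children.put(c, insert(child, term, depth + 1, docId));
        }
        return new Node(children, postings);
    }

    // Looks up term as of the given revision; older revisions remain readable unchanged.
    public TreeSet<Integer> search(int revision, String term) {
        Node node = roots.get(revision);
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.children.get(term.charAt(i));
        }
        return node == null ? new TreeSet<>() : new TreeSet<>(node.postings);
    }

    public static void main(String[] args) {
        VersionedTrie trie = new VersionedTrie();
        int r1 = trie.add("lucene", 1);
        int r2 = trie.add("lucid", 2);
        System.out.println(trie.search(r1, "lucid"));  // []
        System.out.println(trie.search(r2, "lucid"));  // [2]
        System.out.println(trie.search(r2, "lucene")); // [1]
        System.out.println(trie.nodesWritten);         // 13
    }
}
```

Adding "lucene" and then "lucid" writes 7 + 6 = 13 nodes in total; the second revision shares the subtree below "luc" that spells "ene" with the first revision instead of copying it. A real on-disk version would write the copied nodes (or the pages holding them) per revision instead of keeping them in a list.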

However, _if_ it's possible with Lucene, it would be great :-) That said, 
it's open source, so maybe someone sees value in it and is motivated 
to contribute, but that's just a wish ;-)

kind regards,


