lucene-java-user mailing list archives

From "Johannes.Lichtenberger" <Johannes.Lichtenber...@uni-konstanz.de>
Subject Re: Indexing and searching across versioned document collections
Date Fri, 09 Nov 2012 08:53:41 GMT
On 11/09/2012 09:41 AM, jake dsouza wrote:
> Hello,
>
> Has any one worked on making Lucene index and search versioned document
> collections i.e any corpus with multiple versions of documents similar to
> wikipedia or source code.
> I am working on a project to index and search versioned collections while
> keeping the index size minimum by taking into consideration differences in
> the versions to minimize the size of the index .
>
> Could some one direct me to any existing efforts to make Lucene work with
> versions .

Hello Jake,

I never found the time, but this is still on my to-do list for a versioned
XML DBS [1]. That is also where I'm stuck, though: I would need access to the
internal buckets, nodes, or whatever index structure Lucene uses. With a
PATRICIA trie, for instance, it's very simple in my system, because I can just
store the nodes, which are then versioned (following the CoW principle, so
that only changed nodes are written, depending on the versioning strategy
used; possibly also a bunch of nodes in a "page" that holds a set of nodes).
I never figured out how to do this with Lucene, which is why I'm thinking
about implementing or simply integrating a PATRICIA trie and enhancing an
XQuery parser with full-text capabilities.
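
To make the CoW idea above concrete, here is a minimal sketch of a path-copying term trie that keeps one root per revision. It is only an illustration under stated assumptions: it uses a plain character trie rather than a compressed PATRICIA trie, it lives entirely in memory, and the names `VersionedTrie`, `commit`, and `search` are hypothetical. It is not taken from sirix or from Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a copy-on-write (path-copying) trie with one root per revision. */
public class VersionedTrie {

    /** Nodes are never mutated after creation, so revisions can share them safely. */
    static final class Node {
        final Map<Character, Node> children;
        final Set<Integer> docIds; // documents containing the term that ends at this node

        Node(Map<Character, Node> children, Set<Integer> docIds) {
            this.children = children;
            this.docIds = docIds;
        }
    }

    // roots.get(r) is the root of revision r; revision 0 is the empty index.
    private final List<Node> roots = new ArrayList<>();

    public VersionedTrie() {
        roots.add(new Node(new HashMap<Character, Node>(), new HashSet<Integer>()));
    }

    /** Adds (term, docId) on top of the latest revision; returns the new revision number. */
    public int commit(String term, int docId) {
        Node latest = roots.get(roots.size() - 1);
        roots.add(insert(latest, term, 0, docId));
        return roots.size() - 1;
    }

    // Copies only the nodes on the path from the root to the term's last character.
    // Every subtree off that path is shared with the previous revision.
    private static Node insert(Node node, String term, int pos, int docId) {
        if (pos == term.length()) {
            Set<Integer> ids = new HashSet<>(node.docIds);
            ids.add(docId);
            return new Node(node.children, ids); // children map is reused as-is
        }
        char c = term.charAt(pos);
        Node child = node.children.get(c);
        if (child == null) {
            child = new Node(new HashMap<Character, Node>(), new HashSet<Integer>());
        }
        Map<Character, Node> children = new HashMap<>(node.children); // shallow copy
        children.put(c, insert(child, term, pos + 1, docId));
        return new Node(children, node.docIds);
    }

    /** Looks up a term as the index existed at the given revision. */
    public Set<Integer> search(int revision, String term) {
        Node node = roots.get(revision);
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.children.get(term.charAt(i));
        }
        if (node == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(node.docIds);
    }
}
```

Each commit allocates at most one new node per character of the term, so the cost of a new revision is proportional to the change rather than to the size of the index. A persistent variant would write those new nodes (or a page of them) to storage instead of keeping them on the heap.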

However, _if_ it's possible with Lucene, that would be great :-) That said,
it's open source, so maybe someone will see some value in it and be motivated
to contribute, but that's just a wish ;-)

kind regards,
Johannes

[1] https://github.com/JohannesLichtenberger/sirix

