lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Flex indexing : Hybrid index maintenance for faster indexing
Date Tue, 05 Oct 2010 10:20:33 GMT
Nice paper!

It's a neat trick to index the large postings as separate files, i.e.
let the filesystem handle the growth as new postings are appended
over time.

But, unfortunately, we can't easily do this in Lucene, since Lucene
assumes index files are write-once, and derives its transactional
semantics from this approach.  I.e., this would require sizable changes,
beyond just swapping in a different Codec.

Still, the idea that small/big postings lists should be handled
differently is something we can take advantage of in a Codec, and I
think we should.  I think we will likely switch to a default codec
that uses pulsing (storing a term's postings directly in the terms dict)
for very low freq terms, maybe vInt for medium freq terms, and FOR/PFOR
for high freq terms.
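
To make the small/medium/big split concrete, here is a rough sketch of
what choosing an encoding by document frequency might look like. It is
not Lucene's Codec API: the class name, the enum, and both cutoff values
are made up for illustration. Only the vInt byte layout (7 bits per byte,
low-order bits first, high bit set on every byte except the last) follows
Lucene's documented format. FOR/PFOR is named but not implemented here.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: pick a postings encoding by document frequency. */
final class PostingsEncodingSketch {

    enum Encoding { PULSING, VINT, FOR_PFOR }

    // Assumed cutoffs, chosen only for illustration; not Lucene defaults.
    static final int PULSING_MAX_DOC_FREQ = 1;
    static final int VINT_MAX_DOC_FREQ = 128;

    /** Very rare terms are inlined in the terms dict, frequent ones get block coding. */
    static Encoding choose(int docFreq) {
        if (docFreq <= PULSING_MAX_DOC_FREQ) return Encoding.PULSING;
        if (docFreq <= VINT_MAX_DOC_FREQ) return Encoding.VINT;
        return Encoding.FOR_PFOR;
    }

    /**
     * Delta-codes ascending doc IDs and writes each gap as a vInt:
     * 7 payload bits per byte, low-order bits first, with the high bit
     * set on every byte except the last one of each value.
     */
    static byte[] encodeVInt(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            previous = docId;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // more bytes follow
                gap >>>= 7;
            }
            out.write(gap); // final byte, high bit clear
        }
        return out.toByteArray();
    }
}
```

With these assumed cutoffs, a term that appears in one document would be
pulsed, one that appears in 50 documents would be vInt-coded, and one that
appears in 10,000 documents would go to FOR/PFOR. Small gaps between doc
IDs cost one byte each under vInt, which is why it suits the mid range.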

Mike

On Mon, Oct 4, 2010 at 6:42 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
> Hi all,
>
> Would it be possible to implement something like this in Flex?
>
>
> Büttcher, S., & Clarke, C. L. A. (2008). Hybrid index maintenance for contiguous
> inverted lists. Information Retrieval, 11(3), 175-207. doi:10.1007/s10791-007-9042-8
>
> The approach takes advantage of having a different policy for large postings lists (i.e.
> frequent terms) versus small postings lists for flushing the buffer and writing to disk.
>
>
> Tom Burton-West
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
