Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <A902805D9E8F66428EACF947DDA9840C08F3CF5911@ITCS-ECLS-1-VS3.adsroot.itcs.umich.edu>
References: 
 <A902805D9E8F66428EACF947DDA9840C08F3CF5911@ITCS-ECLS-1-VS3.adsroot.itcs.umich.edu>
Date: Tue, 5 Oct 2010 06:20:33 -0400
Message-ID: <AANLkTi=J=Y894uWdS=Yun=PWcVB7OcuiudHdL4csuA7K@mail.gmail.com>
Subject: Re: Flex indexing : Hybrid index maintenance for faster indexing
From: Michael McCandless <lucene@mikemccandless.com>
To: dev@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Nice paper!

It's a neat trick to index the large postings as separate files, i.e.,
let the filesystem handle the growth as new postings are appended
over time.

But, unfortunately, we can't easily do this in Lucene, since Lucene
assumes index files are write-once, and derives its transactional
semantics from this approach.  I.e., this would require sizable changes,
beyond just swapping in a different Codec.

Still, the idea that small/big postings lists should be handled
differently is something we can take advantage of in a Codec, and I
think we should.  I think we will likely switch to a default codec
that uses pulsing (storing a term's postings directly in the terms
dict) for very low freq terms, maybe vInt for medium freq terms, and
FOR/PFOR for high freq terms.
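
As a rough sketch of that idea (Python, not Lucene code): pick an
encoding from a term's doc freq, and delta + vInt encode the medium
case. The cutoffs and all function names here are made up for
illustration, and for_bit_width only reports the bits per gap that a
FOR-style block would need; it does not pack anything.

```python
# Hypothetical cutoffs, chosen only for illustration.
PULSE_MAX_DOCFREQ = 1     # at or below: inline postings in the terms dict
VINT_MAX_DOCFREQ = 127    # at or below: delta + vInt; above: block packing


def choose_encoding(doc_freq):
    """Pick a postings encoding from the term's document frequency."""
    if doc_freq <= PULSE_MAX_DOCFREQ:
        return "pulsing"
    if doc_freq <= VINT_MAX_DOCFREQ:
        return "vint"
    return "for"


def vint_encode(value):
    """Encode a non-negative int, 7 bits per byte, low-order bits first.

    The high bit of each byte signals that more bytes follow.
    """
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_postings_vint(doc_ids):
    """Delta-encode sorted doc IDs, then vInt-encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += vint_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)


def for_bit_width(doc_ids):
    """Bits per gap a frame-of-reference block would need for these IDs."""
    prev = 0
    max_gap = 0
    for doc_id in doc_ids:
        max_gap = max(max_gap, doc_id - prev)
        prev = doc_id
    return max(1, max_gap.bit_length())
```

With these cutoffs, a term in 1 doc would be pulsed, a term in 50
docs would get vInt, and a term in 10000 docs would go to the
block-packed path.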

Mike

On Mon, Oct 4, 2010 at 6:42 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
> Hi all,
>
> Would it be possible to implement something like this in Flex?
>
>
> Büttcher, S., & Clarke, C. L. A. (2008). Hybrid index maintenance for contiguous inverted lists. Information Retrieval, 11(3), 175-207. doi:10.1007/s10791-007-9042-8
>
> The approach takes advantage of having a different policy for large postings lists (i.e., frequent terms) versus small postings lists for flushing the buffer and writing to disk.
>
>
> Tom Burton-West
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>