lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Thu, 13 Dec 2012 11:33:37 GMT
On Wed, Dec 12, 2012 at 9:08 PM, lukai <lukai1984@gmail.com> wrote:
> Do we have any plan to decouple the index process?
>
> Lucene was designed for search, but judging by the questions people ask in
> this thread, the use cases sometimes go beyond search functionality. For
> example, we might want to customize our scoring function based on payloads.
> Sometimes I don't need TF/IDF information at all, because we can
> pre-calculate features and store them in the system, but I still have to
> store the extra TF/IDF information. And sometimes we want to load the whole
> postings into memory to improve performance. In those cases, we really want
> to customize the functionality/process of the inverted index.

Much of this can already be done with Lucene.  E.g., plug in your own
Similarity to get custom scoring (and we already have a bunch of
standard models: TF/IDF (the default), BM25, DFR, language models,
etc.).  Use MemoryPostingsFormat to pull everything into RAM.
Customize other parts of the index using your own Codec.
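
To make "plug in your own scoring" concrete, here is a minimal,
self-contained sketch in plain Java.  It is a toy model, not Lucene's
Similarity API: the ToySimilarity interface, its score(...) signature
and the class names are made up for illustration, and the formulas are
simplified versions of classic TF/IDF and BM25 (no query normalization,
coord or boosts).  In Lucene itself the corresponding hook is a
Similarity subclass, set on both IndexWriterConfig and IndexSearcher.

```java
/** Toy model of pluggable scoring. Hypothetical; NOT Lucene's Similarity API. */
interface ToySimilarity {
    /** Scores one term in one document from basic collection statistics. */
    double score(int freq, int docLen, double avgDocLen, int docFreq, int numDocs);
}

/** Simplified classic TF/IDF: sqrt(freq) * idf^2 * 1/sqrt(docLen). */
final class ToyTfIdf implements ToySimilarity {
    @Override
    public double score(int freq, int docLen, double avgDocLen, int docFreq, int numDocs) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        double lengthNorm = 1.0 / Math.sqrt(docLen);
        return tf * idf * idf * lengthNorm;
    }
}

/** Simplified BM25 with the usual k1 (saturation) and b (length) parameters. */
final class ToyBm25 implements ToySimilarity {
    private final double k1;
    private final double b;

    ToyBm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(int freq, int docLen, double avgDocLen, int docFreq, int numDocs) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Longer-than-average documents get a larger denominator, so a lower score.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1.0) / (freq + norm);
    }
}

/** Scores the same (made-up) statistics with each pluggable model. */
final class ToyScoringDemo {
    public static void main(String[] args) {
        ToySimilarity[] sims = { new ToyTfIdf(), new ToyBm25(1.2, 0.75) };
        for (ToySimilarity sim : sims) {
            double s = sim.score(3, 100, 120.0, 10, 1000);
            System.out.println(sim.getClass().getSimpleName() + " -> " + s);
        }
    }
}
```

The caller only sees the interface, so swapping the scoring model does
not touch the code that walks the postings.  That is the same
separation the Similarity plug-in point gives you.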

> The main problem is that the
> implementation is highly coupled with the index chain. It's not easy to
> write a new one. Do we have plans to make the index chain easier to
> change?
>
> Flexible index chain logic, flexible codec formats.

The indexing chain, which is inside IndexWriter and processes each
document into temporary RAM structures and then writes a new segment
via the Codec API, can in fact be changed, but doing so is extremely
expert-level and the APIs are not documented (you must read the source
code to work through it).
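
As a rough picture of that two-stage shape (buffer in RAM, then flush
through a writer interface), here is a toy sketch in plain Java.
Everything in it is hypothetical: ToyIndexingChain and ToySegmentWriter
are illustrative names, the tokenization is whitespace-only, and none
of this mirrors the real classes inside IndexWriter or the Codec API.

```java
/** Hypothetical stand-in for the codec side: receives each term and its postings at flush. */
interface ToySegmentWriter {
    void writeTerm(String term, java.util.List<Integer> docIds);
}

/** Toy indexing chain: buffers postings in RAM, then flushes them as one "segment". */
final class ToyIndexingChain {
    // term -> docIDs containing it; TreeMap keeps terms sorted for the flush.
    private final java.util.TreeMap<String, java.util.List<Integer>> buffer =
            new java.util.TreeMap<>();
    private int nextDocId = 0;

    /** Splits on whitespace and records each distinct term once per document. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            java.util.List<Integer> postings =
                    buffer.computeIfAbsent(token, t -> new java.util.ArrayList<>());
            // docIDs only increase, so checking the last entry is enough to dedupe.
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
        return docId;
    }

    /** Hands the buffered postings to the writer in term order, then resets the buffer. */
    void flush(ToySegmentWriter writer) {
        for (java.util.Map.Entry<String, java.util.List<Integer>> e : buffer.entrySet()) {
            writer.writeTerm(e.getKey(), e.getValue());
        }
        buffer.clear();
        nextDocId = 0; // in this toy, docIDs restart with each new segment
    }
}
```

The point of the split is that the on-disk (or in-RAM) format lives
entirely behind the writer interface, which is why a custom codec is
usually enough and the chain itself rarely needs replacing.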

That said, customizing the chain is rarely necessary ... typically
the existing pluggability (payloads, Similarities, custom codecs) can
solve most problems.

Mike McCandless

http://blog.mikemccandless.com
