lucene-java-user mailing list archives

From lukai <lukai1...@gmail.com>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Thu, 13 Dec 2012 02:08:16 GMT
Do we have any plans to decouple the indexing process?

Lucene was designed for search, but judging by the questions people ask in
this thread, the use cases sometimes go beyond search. For example, we might
want to customize the scoring function based on payloads. Sometimes I don't
need TF/IDF information at all, because we can pre-calculate features and
store them in the system, yet I am still forced to store the extra TF/IDF
information. And sometimes we want to load the entire postings into memory
to improve performance. In those cases we really want to customize the
behavior of the inverted index. The main problem is that the implementation
is tightly coupled to the index chain, so it is not easy to write a new one.
Is there a plan to make the index chain easier to change?

In short: flexible index chain logic as well as flexible codec formats.
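
To make the request more concrete, here is a minimal sketch in plain Java of
what a decoupled chain could look like. It is not Lucene code and calls no
Lucene APIs; ToyIndexChain, FeatureExtractor, and Scorer are hypothetical
names made up for illustration. It assumes single-term queries and whitespace
tokenization, keeps all postings in memory, and stores one precomputed
feature per posting in place of TF/IDF statistics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration only; none of these names are Lucene APIs. */
final class ToyIndexChain {

    /** One posting: a doc id plus a caller-defined, precomputed feature. */
    static final class Posting {
        final int docId;
        final float feature;

        Posting(int docId, float feature) {
            this.docId = docId;
            this.feature = feature;
        }
    }

    /** Hypothetical plug-in point: decides what is stored per (term, doc). */
    interface FeatureExtractor {
        float featureFor(String term, int docId, int termFreqInDoc);
    }

    /** Hypothetical plug-in point: maps a stored feature to a score. */
    interface Scorer {
        float score(float feature);
    }

    // The whole inverted index lives in memory: term -> postings list.
    private final Map<String, List<Posting>> postings = new HashMap<>();
    private final FeatureExtractor extractor;

    ToyIndexChain(FeatureExtractor extractor) {
        this.extractor = extractor;
    }

    /** Tokenize, count terms, then let the extractor choose what to store. */
    void addDocument(int docId, String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            // The term frequency is offered to the extractor but never stored itself.
            float f = extractor.featureFor(e.getKey(), docId, e.getValue());
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(docId, f));
        }
    }

    /** Scores every doc containing the term, using only the stored feature. */
    Map<Integer, Float> search(String term, Scorer scorer) {
        Map<Integer, Float> scores = new HashMap<>();
        List<Posting> list = postings.get(term);
        if (list == null) {
            return scores; // unknown term: no hits
        }
        for (Posting p : list) {
            scores.put(p.docId, scorer.score(p.feature));
        }
        return scores;
    }
}
```

The point of the sketch is only that what gets stored per posting and how it
is scored are both supplied by the caller, e.g.
new ToyIndexChain((term, docId, tf) -> myFeature(docId)), where myFeature is
again a hypothetical helper. Whether Lucene's real index chain could expose
comparable hooks is exactly the open question.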

Thanks,



On Fri, Nov 30, 2012 at 10:02 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Fri, Nov 30, 2012 at 12:25 PM, Wu, Stephen T., Ph.D.
> <Wu.Stephen@mayo.edu> wrote:
> > Is there any (preliminary) code checked in somewhere that I can look at,
> > that would help me understand the practical issues that would need to be
> > addressed?
> >
> > If I understand you correctly, it's a little different from what's
> > happening in your blog posts:
> >
> > http://blog.mikemccandless.com/2012/07/building-new-lucene-postings-format.html
> >
> > http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html
> >
> > Those posts deal with making your own codec, but not about changing
> > what's stored in the postings?  I guess I misunderstood "postings format"
> > before.
>
> I don't know of any examples of adding an entirely new attribute to
> the postings, except via payloads.
>
> All the examples we have are of Codecs/PostingsFormats/etc. storing
> all the usual attributes (term & its stats (docFreq/totalTermFreq),
> doc, freq, position, offsets, payload) in "interesting" ways.
>
> Maybe we can make this more concrete: what new attribute are you
> needing to record in the postings and access at search time?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
