lucene-dev mailing list archives

From: Michael McCandless
Subject: Re: who clears attributes?
Date: Tue, 11 Aug 2009 12:22:42 GMT
I think extensible analysis (the new TokenStream API) is a net
positive: it gives us strongly typed and high performance
extensibility to a Token, so apps can now add whatever attrs they

And I see it as the first (of 3) big "legs" that we need to reach
flexible indexing.  We really have to do flexible indexing piecemeal
since it's so big.

The flexible indexing chain (still package private, but otherwise
"done") is the 2nd leg, allowing you to pull whatever app-specific
attrs you've created during analysis, and get them into the index in
some manner.
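
To make the "strongly typed" attribute idea concrete, here is a minimal
sketch in Java. It does not use Lucene's classes: Attr, TermAttr,
PartOfSpeechAttr and MiniAttributeSource are hypothetical stand-ins, and
the Supplier-based addAttribute signature is an assumption chosen to keep
the example self-contained. It only illustrates the pattern: one shared
instance per attribute type, looked up by class, with all attributes
reset between tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration; NOT Lucene's real classes.
interface Attr {
    void clear(); // reset to a default state before the next token
}

final class TermAttr implements Attr {
    String term = "";
    public void clear() { term = ""; }
}

// An app-specific attribute that an application might add.
final class PartOfSpeechAttr implements Attr {
    String tag = "UNKNOWN";
    public void clear() { tag = "UNKNOWN"; }
}

final class MiniAttributeSource {
    private final Map<Class<? extends Attr>, Attr> attrs = new LinkedHashMap<>();

    // Returns the single shared instance for this type, creating it on first use.
    <T extends Attr> T addAttribute(Class<T> type, Supplier<T> factory) {
        Attr attr = attrs.computeIfAbsent(type, k -> factory.get());
        return type.cast(attr); // typed access, no casts at the call site
    }

    // Resets every registered attribute to its default state.
    void clearAttributes() {
        for (Attr attr : attrs.values()) {
            attr.clear();
        }
    }
}
```

In this sketch a producer and a consumer that both ask for
PartOfSpeechAttr.class receive the same object, so a value set by one is
visible to the other until clearAttributes() is called.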

The final leg is LUCENE-1458, which has seen good progress (eg, I got
it to the point where I had a pulsing codec working well, for inlining
low-freq terms directly into the terms dict), but I need to get back
to it, modernize it, iterate, etc.  That API enables you to make your
own codecs to write/read stuff in the index.
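
As a rough picture of what "inlining low-freq terms directly into the
terms dict" means, here is a conceptual sketch in Java. It is not the
LUCENE-1458 code or its API: PulsingTermsDict, Entry and maxInlineDocs
are hypothetical names, and the in-memory lists stand in for on-disk
files. Terms whose doc count is at or below a threshold keep their doc
IDs in the dictionary entry itself; all other terms store an offset into
a separate postings area.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only; names and layout are assumptions, not Lucene's codec API.
final class PulsingTermsDict {
    // A terms-dict entry: either inlined doc IDs or an offset into the postings area.
    static final class Entry {
        final int[] inlineDocs;   // non-null when the term is inlined
        final int postingsOffset; // start position in postings, or -1 when inlined
        final int docFreq;

        Entry(int[] inlineDocs, int postingsOffset, int docFreq) {
            this.inlineDocs = inlineDocs;
            this.postingsOffset = postingsOffset;
            this.docFreq = docFreq;
        }
    }

    private final int maxInlineDocs;
    private final Map<String, Entry> terms = new TreeMap<>();
    private final List<Integer> postings = new ArrayList<>(); // stands in for a postings file

    PulsingTermsDict(int maxInlineDocs) {
        this.maxInlineDocs = maxInlineDocs;
    }

    void addTerm(String term, int[] docs) {
        if (docs.length <= maxInlineDocs) {
            // Low-frequency term: keep the doc IDs inside the dictionary entry.
            terms.put(term, new Entry(docs.clone(), -1, docs.length));
        } else {
            // Higher-frequency term: append to postings and remember where it starts.
            int offset = postings.size();
            for (int doc : docs) {
                postings.add(doc);
            }
            terms.put(term, new Entry(null, offset, docs.length));
        }
    }

    boolean isInlined(String term) {
        Entry e = terms.get(term);
        return e != null && e.inlineDocs != null;
    }

    int[] docsFor(String term) {
        Entry e = terms.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlineDocs != null) {
            return e.inlineDocs.clone(); // no second lookup needed
        }
        int[] out = new int[e.docFreq];
        for (int i = 0; i < e.docFreq; i++) {
            out[i] = postings.get(e.postingsOffset + i);
        }
        return out;
    }
}
```

The point of the design is that looking up an inlined term touches only
the dictionary, which matters most for the many terms that occur in just
one or two documents.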

Once we get that finished, I think we finally have the basic full
infrastructure in place for flexible indexing.

I think what's happening now is people are really starting to dig into
the new stuff.  I've been drilling into the new QueryParser, and
besides a few small issues (mostly different defaults), it looks solid
and very configurable/extensible.  Solr & others have been digging
into the extensible analysis API, and I think of all features in 2.9,
the extensible analysis API has received the most hardening.  Hoss and
Mark have been drilling on the "long tail" of the impact of
per-segment searching & collection, uncovering sneaky "explain"
challenges and others.  I think this is all healthy, to be expected,

I do still think a longish 2.9 beta is warranted, if we can succeed in
getting users outside the dev group to kick the tires and uncover


On Tue, Aug 11, 2009 at 7:31 AM, Mark Miller wrote:
> Earwin Burrfoot wrote:
>>>> The only person that tried to disprove this claim is Uwe. Others
>>>> either say "the problems are solved, so it's okay to move to the new
>>>> API", or "this will be usable when flexindexing arrives".
>>> Others (not me) have spent a lot of time going over this before (more
>>> than once I think) - they prob are just sick of retyping. Lots of
>>> searchable archives out there though.
>> Okay, I'll dig into them. Sorry for being a bother.
> You're not being a bother - sorry if I came off that way. Didn't mean to. I
> just know a lot of the reasons for the API switch have been discussed
> before, and much of it has not come up again in this discussion.
> If you felt the tone of that email was anything but trying to throw out some
> info, I apologize. Not trying to squash this current debate at all.
> --
> - Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: