lucene-dev mailing list archives

From "Earwin Burrfoot (JIRA)" <>
Subject [jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
Date Mon, 22 Jun 2009 19:36:07 GMT


Earwin Burrfoot commented on LUCENE-1701:

>>> Design for today.
>> And spend two years deprecating and supporting today's designs after you get a better
>> thing tomorrow. Lucene-style back-compat and agile design don't marry well.
>> Donating something to Lucene means casting it in concrete.
> We can't let fear of back-compat prevent us from making progress.
My point was that strict back-compat prevents people from donating work that is not yet finalized.
They either give up the comfortable volatility of private code or have to maintain two versions
of it: a private one and the one in Lucene.

>> NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround
>> time after the newly introduced per-segment collection.
> I agree, per-segment collection was the bulk of the gains needed for
> NRT. This was a big change and a huge step forward in simple reopen
> turnaround.
I'd vote it both the most frustrating (in terms of adapting your custom code) and the most useful
change of 2.9 :)

> But, not having to write & read deletes to disk, not commit (fsync)
> from writer in order to see those changes in reader should also give
> us decent gains. fsync is surprisingly and intermittently costly.
I suspect this could be achieved without messing with the IR/IW guts this much. If I'm not mistaken,
the LinkedIn folks driving this feature had a prior solution with separate indexes: one on disk,
one in RAM. Per-segment collection brings superfast reopens and makes MultiReader far better than
MultiSearcher, so you can finally do adequately fast searches across separate indexes. Do we still
need to add this complexity for minor performance gains?
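For readers less familiar with per-segment collection, here is a minimal, stdlib-only Java sketch of why a disk index plus a RAM index can be searched as one: each segment is searched with its own local doc IDs, and the collector is told the segment's docBase so it can map hits to global IDs. All names here (Segment, SegmentCollector, CompositeSearch, GlobalIdCollector) are hypothetical; in Lucene 2.9 the corresponding hook is Collector.setNextReader(IndexReader reader, int docBase).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical toy: a "segment" holds one int field value per local doc ID.
final class Segment {
    final int[] values;

    Segment(int... values) {
        this.values = values;
    }

    int maxDoc() {
        return values.length;
    }
}

// Shaped like a per-segment collector: told which segment is next and its docBase.
interface SegmentCollector {
    void setNextSegment(Segment segment, int docBase);

    void collect(int localDoc);
}

final class CompositeSearch {
    private CompositeSearch() {}

    // Visits segments in order (e.g. on-disk segments, then a RAM segment),
    // handing the collector segment-local doc IDs plus a running docBase.
    static void search(List<Segment> segments, IntPredicate matches, SegmentCollector collector) {
        int docBase = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(segment, docBase);
            for (int doc = 0; doc < segment.maxDoc(); doc++) {
                if (matches.test(segment.values[doc])) {
                    collector.collect(doc);
                }
            }
            docBase += segment.maxDoc();
        }
    }
}

// Maps each segment-local hit to a global doc ID by adding the segment's docBase.
final class GlobalIdCollector implements SegmentCollector {
    final List<Integer> hits = new ArrayList<>();
    private int docBase;

    @Override
    public void setNextSegment(Segment segment, int docBase) {
        this.docBase = docBase;
    }

    @Override
    public void collect(int localDoc) {
        hits.add(docBase + localDoc);
    }
}
```

For example, with disk segments (5, 12, 7) and (20, 3) plus a RAM segment (15), a `value > 10` query collects global IDs 1, 3 and 5. Since no segment's work depends on the others, appending a fresh RAM segment after a reopen leaves everything computed for the older segments valid.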

> And this integration lets us take it a step further with LUCENE-1313,
> where recently created segments can remain in RAM and be shared with
> the reader.

>> Some time ago I finished a first version of IR plugins, and I enjoy pretty low reopen
>> times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins
>> once they stabilize enough.)
> I'm confused: I thought that effort was to make SegmentReader's
> components fully pluggable? (Not to actually change what components
> SegmentReader is creating). EG does this modularization alter the
> approach to NRT? I thought they were orthogonal.
Yes, they are orthogonal. That was just more praise for per-segment collection and an example
of how the approach extends to your own custom stuff (like a filter cache).
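Why per-segment caches keep reopen cheap can be sketched the same way: if a filter cache is keyed by segment rather than by the whole reader, warming a reopened reader only computes entries for segments it has not seen before. This is a hypothetical, stdlib-only model (SegmentData and PerSegmentFilterCache are made-up names); it is not the IR plugins API mentioned above, which isn't shown in this thread.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntPredicate;

// Hypothetical toy segment; identity equality, so each instance is its own cache key.
final class SegmentData {
    final int[] values;

    SegmentData(int... values) {
        this.values = values;
    }
}

// Caches a filter's matching docs per segment instead of per whole reader.
final class PerSegmentFilterCache {
    private final IntPredicate filter;
    // Weak keys let entries for merged-away, unreferenced segments be garbage collected.
    private final Map<SegmentData, BitSet> cache = new WeakHashMap<>();
    int computations; // number of cache misses, kept only for demonstration

    PerSegmentFilterCache(IntPredicate filter) {
        this.filter = filter;
    }

    BitSet bitsFor(SegmentData segment) {
        return cache.computeIfAbsent(segment, s -> {
            computations++;
            BitSet bits = new BitSet(s.values.length);
            for (int doc = 0; doc < s.values.length; doc++) {
                if (filter.test(s.values[doc])) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Warming after a reopen: only segments not seen before cost any work.
    void warm(List<SegmentData> segments) {
        for (SegmentData segment : segments) {
            bitsFor(segment);
        }
    }
}
```

Warming segments a and b, then reopening with a new segment c and warming a, b, c again, computes exactly three bitsets: the second warmup only pays for c.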

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache,
> move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-1701
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch,
>                      LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch,
> In discussions about LUCENE-1673, Mike and I wanted to add a new NumericField to o.a.l.document
> specifically for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory
> that creates a preconfigured Field instance with norms and tf off, optionally a stored text
> (LUCENE-1699), and the TokenStream already initialized. Likewise, NumericUtils.newXxxSortField
> could be moved to NumericSortField.
> Yonik and I tend to prefer the factory for both; Mike tends to prefer the new classes.
> Also, the parsers for string-formatted numerics are not public in FieldCache. As the new
> SortField API (LUCENE-1478) makes it possible to pass a parser when instantiating a SortField,
> it would be good to make the static parsers in FieldCache publicly available. SortField would
> initialize its member variable to them (instead of null), which would make the code a lot simpler
> (FieldComparator has these ugly null checks when retrieving values from the cache).
> Moving the trie parsers into FieldCache as static instances as well would make the code
> cleaner, and we would be able to hide the "hack" StopFillCacheException by making it private
> to FieldCache (currently it's public because NumericUtils is in o.a.l.util).
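A rough, stdlib-only Java model of the design proposed above: parsers exposed as public static instances, a sort field that defaults to them instead of null, and a trie parser that signals the end of full-precision terms with an exception kept private to the cache class. Every name below is hypothetical, and the "trie" term encoding is a deliberate simplification (full-precision terms are assumed to start with '0' and sort before all lower-precision terms); the real NumericUtils encoding differs.

```java
// Hypothetical toy model of the proposal; not Lucene's actual FieldCache.
final class ToyFieldCache {
    private ToyFieldCache() {}

    /** Parses one indexed term into an int value. */
    interface IntParser {
        int parseInt(String term);
    }

    // Plain-text parser, exposed as a shared static instance.
    static final IntParser DEFAULT_INT_PARSER = Integer::parseInt;

    // Toy "trie" parser. Assumption: full-precision terms start with '0' and sort
    // first, so the first other term means the rest can be skipped.
    static final IntParser TRIE_INT_PARSER = term -> {
        if (term.charAt(0) != '0') {
            throw new StopFillCacheException();
        }
        return Integer.parseInt(term.substring(1));
    };

    // Private to the cache: the "hack" no longer leaks into the public API.
    private static final class StopFillCacheException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Fills one value per document from (term, doc) pairs given in term order,
    // stopping early if the parser throws StopFillCacheException.
    static int[] getInts(String[] sortedTerms, int[] docs, int maxDoc, IntParser parser) {
        int[] values = new int[maxDoc];
        try {
            for (int i = 0; i < sortedTerms.length; i++) {
                values[docs[i]] = parser.parseInt(sortedTerms[i]);
            }
        } catch (StopFillCacheException stop) {
            // Remaining terms are lower-precision duplicates; nothing left to fill.
        }
        return values;
    }
}

final class ToySortField {
    final String field;
    final ToyFieldCache.IntParser parser;

    ToySortField(String field) {
        this(field, null);
    }

    ToySortField(String field, ToyFieldCache.IntParser parser) {
        this.field = field;
        // Default to the shared static parser instead of storing null,
        // so comparator code can call parser.parseInt(...) without null checks.
        this.parser = (parser != null) ? parser : ToyFieldCache.DEFAULT_INT_PARSER;
    }
}
```

With terms "03", "05", "07" for docs 2, 0, 1 followed by lower-precision terms "10" and "11", getInts with TRIE_INT_PARSER returns {5, 7, 3} and stops at "10". A comparator built on ToySortField can call its parser directly, which is the simplification the issue is after.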

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
