lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: New facet module
Date Sat, 09 Jul 2011 11:13:11 GMT
Actually I think the faceting module is per-segment?

The facets are encoded into payloads; at search time it visits the payload
of each hit, segment by segment, and aggregates the counts.

Like, on reopen (NRT or not) of a reader, there are no global data
structures that must be recomputed.  EG, this facets impl doesn't use
FieldCache on the global reader (leading to insanity....).
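
To make that concrete, here is a rough sketch of the idea as I understand it,
not the module's actual code. It assumes the category ordinals of each doc
have already been decoded from its payload (plain in-memory arrays stand in
for the payloads here), and the names Segment, ordinalsByDoc and countFacets
are made up for illustration. Hits are visited one segment at a time, and
every hit bumps a count array sized to the taxonomy:

```java
import java.util.Arrays;

public class PerSegmentFacetCounts {

    /** Hypothetical stand-in for one segment: for each segment-local doc id,
     *  the category ordinals that would be decoded from that doc's payload. */
    static final class Segment {
        final int[][] ordinalsByDoc;

        Segment(int[][] ordinalsByDoc) {
            this.ordinalsByDoc = ordinalsByDoc;
        }
    }

    /** Visits the hits of each segment in turn and adds them into one count
     *  array indexed by taxonomy ordinal. Nothing is cached per reader. */
    static int[] countFacets(Segment[] segments, int[][] hitsBySegment, int taxonomySize) {
        int[] counts = new int[taxonomySize];
        for (int s = 0; s < segments.length; s++) {
            for (int doc : hitsBySegment[s]) {              // segment-local doc ids
                for (int ordinal : segments[s].ordinalsByDoc[doc]) {
                    counts[ordinal]++;                      // one hit, one increment per category
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Segment seg0 = new Segment(new int[][] { {0, 1}, {0, 2} });
        Segment seg1 = new Segment(new int[][] { {1}, {2, 3}, {0} });
        int[][] hits = { {0, 1}, {1, 2} };                  // hits per segment
        int[] counts = countFacets(new Segment[] { seg0, seg1 }, hits, 4);
        System.out.println(Arrays.toString(counts));        // prints [3, 1, 2, 1]
    }
}
```

Since the only state is the per-query count array, adding or dropping a
segment on reopen leaves nothing to rebuild in this sketch.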

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jul 9, 2011 at 12:40 AM, Shai Erera <serera@gmail.com> wrote:
> Well, the approach is entirely different, and the new module
> introduces features not available in the other impls (and I imagine
> vice versa).
>
> The taxonomy is managed on the side, hence why it is global to the
> 'content' index. It plays very well with NRT, and we in fact have
> several apps that use the module in an NRT environment.
>
> The taxonomy index supports NRT by itself, by using the IR.open(IW)
> API and then it's up to the application to manage its content index
> search as NRT.
>
> I think you should read the high-level description I put on
> LUCENE-3079 and the userguide I put on LUCENE-3261. As I said, the
> approach is quite different than the bitset and FieldCache ones.
>
> Shai
>
> On Saturday, July 9, 2011, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>> The taxonomy is global to the index, but I think it will be
>>> interesting to explore per-segment taxonomy, and how it can be used to
>>> improve indexing or search perf (hopefully both)
>>
>> Right so with NRT this'll be an issue.  Is there a write up on this?
>> It sounds fairly radical in design.  Eg, I'm curious as to how it
>> compares with the bit set and un-inverted field cache based faceting
>> systems.
>>
>> On Fri, Jul 8, 2011 at 8:44 PM, Shai Erera <serera@gmail.com> wrote:
>>> Currently it doesn't facet per segment, because the approach it uses
>>> doesn't depend on segments.
>>>
>>> It maintains a count array the size of the taxonomy, and every
>>> matching document contributes to the weight of the categories it is
>>> associated with, regardless of the segment it is found in.
>>>
>>> The taxonomy is global to the index, but I think it will be
>>> interesting to explore per-segment taxonomy, and how it can be used to
>>> improve indexing or search perf (hopefully both).
>>>
>>> Shai
>>>
>>> On Saturday, July 9, 2011, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>>> Is it faceting per-segment?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org