lucene-java-user mailing list archives

From Chitra R <chithu.r...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Mon, 28 Nov 2016 11:01:13 GMT
Hi,
When SortedSetDocValuesReaderState is opened at search time, is the
information from all of the doc values files (.dvd & .dvm) loaded into memory,
or is only the information for the specified field (say, the $facets field)
loaded?

Any help is much appreciated.


Regards,
Chitra

On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r111@gmail.com> wrote:

>
> Kindly post your suggestions.
>
> Regards,
> Chitra
>
> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r111@gmail.com> wrote:
>
>> Hey, I got it clearly. Thank you so much. Could you please help us to
>> implement it in our use case?
>>
>>
>> In our case, we have a dynamic index and it is variable depth too, so
>> flat facets are enough. There is no need for hierarchical facets.
>>
>> What I think is,
>>
>>
>>    1. Index my facet field as a normal doc values field, so that no special
>>    operation (like taxonomy or SortedSetDocValuesFacetField) is done at
>>    index time and each doc values field only stores the ordinals for its
>>    own field.
>>    2. At search time, I will pass the query (the user's search query) and
>>    the filter (the list of paths traversed so far) and collect the matching
>>    documents in a FacetsCollector.
>>
>>    3. To compute the facet counts for a specific field, I will take those
>>    matching docs, then go through each segment and collect the matching
>>    ordinals using AtomicReader.
>>
>>
>> And I know that with this approach I can't calculate facet counts for more
>> than one field (facet) in a single search.
>>
>> Instead of loading all the dimensions in DocValuesReaderState (which takes
>> more time and memory) at search time, loading only specific fields should
>> take less time and memory, I hope. Kindly help us solve this.
>>
>>
>> I think this will keep the index and search cost minimal. I hope it won't
>> add load at index time, and that it will also perform better at search time.
>>
>>
>> Kindly post your suggestions.
>>
>>
>> Regards,
>> Chitra
>>
>>
>>
>>
>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>> I think you've summed up exactly the differences!
>>>
>>> And, yes, it would be possible to emulate hierarchical facets on top
>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>>
>>> But if it's variable depth, it's trickier (but I think still
>>> possible).  See e.g. the Committed Paths drill-down on the left, on
>>> our dog-food server
>>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
>>> > case 1:
>>> >         In taxonomy, each indexed document's facet labels are examined
>>> > and their ordinals and mappings are computed; these are stored in the
>>> > sidecar index at index time.
>>> >
>>> > case 2:
>>> >         In doc values, these (ordinals) are computed at search time, so
>>> > I expect there is a time and memory trade-off between the two cases.
>>> >
>>> >
>>> > In taxonomy, building hierarchical facets at index time makes the
>>> > faceting cost at search time lower than for flat facets in doc values.
>>> >
>>> > Apart from memory, time and NRT latency, is there any other difference
>>> > between hierarchical and flat facets at search time?
>>> >
>>> >
>>> > Kindly post your suggestions...
>>> >
>>> >
>>> > Regards,
>>> > Chitra
>>> >
>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
>>> >>
>>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>>> >> facets during indexing. I take hierarchical to mean that we might index
>>> >> the field Publish date : 2010/10/15 as Publish date: 2010 , Publish
>>> >> date: 2010/10 and Publish date: 2010/10/15 ; their facet ordinals are
>>> >> maintained in the sidecar index, which is mapped to the main index.
>>> >>
>>> >> For example:
>>> >>
>>> >>                 In search-lucene.com , I enter a term (say facet), and
>>> >> the top documents and their categories are displayed after the search
>>> >> is performed. Say I drill down through Publish date/2010 to collect its
>>> >> child counts, and afterwards I pass through publishdate/2010/10 to
>>> >> collect their child counts. For each drill-down, a separate search is
>>> >> performed to collect its top docs and categories.
>>> >>
>>> >>
>>> >>                I can achieve even this with flat facets by changing the
>>> >> drill down query.
>>> >>
>>> >> Am I right, or have I missed anything?
>>> >>
>>> >> So what is the need for hierarchical facets? Could you please explain
>>> >> them (hierarchical facets) with a real-world use case?
>>> >>
>>> >>
>>> >> Regards,
>>> >> Chitra
>>> >>
>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>>> >> <lucene@mikemccandless.com> wrote:
>>> >>>
>>> >>> You store dimension + string (a single value path, since it's not
>>> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
>>> >>> ordinary drill down counts or the drill sideways counts.
>>> >>>
>>> >>> You can see examples of drill sideways at
>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>>> >>> fields on the left and you don't lose the previous facet counts for
>>> >>> that field.
>>> >>>
>>> >>> Mike McCandless
>>> >>>
>>> >>> http://blog.mikemccandless.com
>>> >>>
>>> >>>
>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > Lucene-Drill sideways
>>> >>> >
>>> >>> > jira_issue:LUCENE-4748
>>> >>> >
>>> >>> >                                  Is this (i.e. that drill sideways
>>> >>> > makes a very nice faceted search UI because we don't "lose" the
>>> >>> > facet counts after drilling in) the reason for storing the path and
>>> >>> > dimension for the given SSDVF field? Or is there another reason?
>>> >>> >
>>> >>> > Regards,
>>> >>> > Chitra
>>> >>> >
>>> >>> >
>>> >>> >      Hey, thank you so much for the fast response. I agree that NRT
>>> >>> > refresh is a somewhat costly operation, and that this is the major
>>> >>> > pitfall if we use doc values faceting.
>>> >>> >
>>> >>> >
>>> >>> >                  While indexing, SortedSetDocValuesFacetField stores
>>> >>> > the path and dimension of the given field internally. So can we
>>> >>> > achieve hierarchical facets using DrillDownQuery? I assume the
>>> >>> > purpose of storing the path and dimension is to achieve hierarchical
>>> >>> > facets. If yes (i.e. we can achieve a hierarchy in SSDVFF), what is
>>> >>> > the need to move to taxonomy? Or have I missed anything?
>>> >>> >
>>> >>> >
>>> >>> >                  What is the real purpose of storing the path and
>>> >>> > dimension in the SSDVF field?
>>> >>> >
>>> >>> >
>>> >>> > Kindly post your suggestions.
>>> >>> >
>>> >>> > Regards,
>>> >>> > Chitra
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>>> >>> > <lucene@mikemccandless.com> wrote:
>>> >>> >>
>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >> >         i) I assume that when opening
>>> >>> >> > SortedSetDocValuesReaderState we calculate the ordinals (which
>>> >>> >> > are used to calculate facet counts) for the doc values field, and
>>> >>> >> > that this alone is what makes the state instance somewhat costly.
>>> >>> >> >                       Am I right, or is there another reason
>>> >>> >> > behind that?
>>> >>> >>
>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
>>> >>> >> heap used to hold the ordinal mappings.
>>> >>> >>
>>> >>> >> >          ii) During indexing, we provide facet ordinals in each
>>> >>> >> > doc, and I think this is useful on the search side for
>>> >>> >> > calculating facet counts only for matching docs. Does it carry
>>> >>> >> > any other benefits?
>>> >>> >>
>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>>> >>> >> separate index.
>>> >>> >>
>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
>>> >>> >> facets yet (though this could be fixed if someone just built it).
>>> >>> >>
>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe, i.e.
>>> >>> >> > can multiple threads use it concurrently?
>>> >>> >>
>>> >>> >> Yes.
>>> >>> >>
>>> >>> >> Mike McCandless
>>> >>> >>
>>> >>> >> http://blog.mikemccandless.com
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >
>>>
>>
>>
>
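The fixed-depth case mentioned in this thread (emulating a hierarchy such as
year/month/day on top of flat facets) can be sketched in plain Java as below.
This is only an illustration under stated assumptions: FlatFacetPaths, expand
and childrenOf are hypothetical names, not part of Lucene, and the sketch
assumes a well-formed '/'-separated path with no leading or trailing slash.
The idea is that every ancestor prefix is indexed as its own flat label, and
the labels returned by a flat facet count are then filtered down to the direct
children of the path the user has drilled into.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for this thread's example; not a Lucene class. */
final class FlatFacetPaths {
    private FlatFacetPaths() {}

    /**
     * Returns every ancestor prefix of a '/'-separated path, shortest first,
     * followed by the full path. Each returned label would be indexed as a
     * separate flat facet value for the same dimension.
     */
    static List<String> expand(String path) {
        List<String> labels = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            labels.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        labels.add(path);
        return labels;
    }

    /**
     * Keeps only the labels that sit exactly one level below parent,
     * which is what a drill-down UI would display next.
     */
    static List<String> childrenOf(String parent, List<String> labels) {
        String prefix = parent + "/";
        List<String> children = new ArrayList<>();
        for (String label : labels) {
            boolean below = label.startsWith(prefix);
            // A direct child has no further '/' after the parent prefix.
            if (below && label.indexOf('/', prefix.length()) < 0) {
                children.add(label);
            }
        }
        return children;
    }
}
```

With this sketch, expand("2010/10/15") yields the three labels 2010, 2010/10
and 2010/10/15, matching the Publish date example earlier in the thread. The
cost of this design is that each document carries one flat value per level, so
the number of unique labels grows with the depth of the hierarchy.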
