lucene-java-user mailing list archives

From Chitra R <chithu.r...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Wed, 30 Nov 2016 05:50:01 GMT
Thank you so much, Mike. I learned a lot about doc values faceting, and
all my doubts are cleared up. Thanks!


*Another use case:*

After getting the matching documents for a given query, is there any way to
calculate the min and max values of a NumericDocValuesField (say, a date field)?


I would like to use them for numeric range faceting, by splitting the
numeric values (taken from the matching documents) into ranges.
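
The splitting step described above could be sketched roughly as follows. This is plain Java with no Lucene dependency, and it is only an illustration under stated assumptions: the class RangeBuckets and its methods minMax, split and count are hypothetical names, the long[] array stands in for the per-document values that would be read from the NumericDocValuesField for the matching documents, and the ranges are simple equal-width buckets. In a real setup the computed bounds could probably be handed to the range faceting classes in the lucene-facet module (LongRange and LongRangeFacetCounts) instead of being counted by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class RangeBuckets {

    /** Returns {min, max} of the given values; the array must be non-empty. */
    public static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    /**
     * Splits [min, max] into at most `buckets` inclusive, equal-width ranges.
     * Assumes max - min + 1 does not overflow a long (true for timestamps).
     */
    public static List<long[]> split(long min, long max, int buckets) {
        if (buckets <= 0 || min > max) {
            throw new IllegalArgumentException("bad arguments");
        }
        long span = max - min + 1;
        long width = (span + buckets - 1) / buckets; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long lo = min; lo <= max; lo += width) {
            long hi = Math.min(lo + width - 1, max);
            ranges.add(new long[] {lo, hi});
        }
        return ranges;
    }

    /** Counts how many values fall into each inclusive range. */
    public static int[] count(long[] values, List<long[]> ranges) {
        int[] counts = new int[ranges.size()];
        for (long v : values) {
            for (int i = 0; i < counts.length; i++) {
                long[] r = ranges.get(i);
                if (v >= r[0] && v <= r[1]) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for values collected from the matching documents.
        long[] values = {10L, 50L, 30L, 99L, 0L};
        long[] mm = minMax(values);
        List<long[]> ranges = split(mm[0], mm[1], 4);
        int[] counts = count(values, ranges);
        for (int i = 0; i < counts.length; i++) {
            long[] r = ranges.get(i);
            System.out.println("[" + r[0] + ", " + r[1] + "] -> " + counts[i]);
        }
    }
}
```

With the sample values the four ranges are [0, 24], [25, 49], [50, 74] and [75, 99]. Note that this needs two passes over the matching documents: one to find the min and max, and one to count per range.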


Chitra


On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Doc values fields are never loaded into memory; at most some small
> index structures are.
>
> When you use those fields, the bytes (for just the one doc values
> field you are using) are pulled from disk, and the OS will cache them
> in memory if available.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > Hi,
> >          When opening SortedSetDocValuesReaderState at search time, is
> > the information from the whole doc values files (.dvd & .dvm) loaded into
> > memory, or is only the information for the specified field (say, the
> > $facets field) loaded into memory?
> >
> >
> >
> >
> > Any help is much appreciated.
> >
> >
> > Regards,
> > Chitra
> >
> > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r111@gmail.com> wrote:
> >>
> >>
> >> Kindly post your suggestions.
> >>
> >> Regards,
> >> Chitra
> >>
> >>
> >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r111@gmail.com> wrote:
> >>>
> >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> >>> implement it in our use case?
> >>>
> >>>
> >>> In our case, we have a dynamic index, and it is variable depth too, so
> >>> flat facets are enough. There is no need for hierarchical facets.
> >>>
> >>> What I think is,
> >>>
> >>> Index my facet field as a normal doc values field, so that no special
> >>> operation (like taxonomy or the sorted set doc values facet field) is
> >>> done at index time, and the doc values field alone stores its ordinals
> >>> in its respective field.
> >>> At search time, I will pass the query (user search query) and the filter
> >>> (path traversed list), and collect the matching documents in a
> >>> FacetsCollector.
> >>> To compute the facet counts for a specific field, I will take those
> >>> matching docs, then move through each segment, collecting the matching
> >>> ordinals using AtomicReader.
> >>>
> >>>
> >>> And I know that with this approach I can't calculate facet counts for
> >>> more than one field (facet) in a search.
> >>>
> >>> Instead of loading all the dimensions in DocValuesReaderState at search
> >>> time (which will take more time and memory), loading only specific
> >>> fields should take less time and memory, I hope. Kindly help to solve
> >>> this.
> >>>
> >>>
> >>> I think this will work with minimal index and search cost. I hope it
> >>> won't add overhead at index time, and that it will also be better at
> >>> search time.
> >>>
> >>>
> >>> Kindly post your suggestions.
> >>>
> >>>
> >>> Regards,
> >>> Chitra
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> >>> <lucene@mikemccandless.com> wrote:
> >>>>
> >>>> I think you've summed up exactly the differences!
> >>>>
> >>>> And, yes, it would be possible to emulate hierarchical facets on top
> >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> >>>>
> >>>> But if it's variable depth, it's trickier (but I think still
> >>>> possible).  See e.g. the Committed Paths drill-down on the left, on
> >>>> our dog-food server
> >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com
> >>>>
> >>>>
> >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
> >>>> > case 1:
> >>>> >         In taxonomy, for each indexed document, the facet labels are
> >>>> > examined and their ordinals and mappings are computed, and these are
> >>>> > stored in the sidecar index at index time.
> >>>> >
> >>>> > case 2:
> >>>> >         In doc values, these (ordinals) are computed at search time,
> >>>> > so there will be a time and memory trade-off between the two cases,
> >>>> > I think.
> >>>> >
> >>>> >
> >>>> > In taxonomy, building hierarchical facets at index time makes the
> >>>> > faceting cost at search time lower than for flat facets in doc values.
> >>>> >
> >>>> > Apart from memory, time and NRT latency, is there any other difference
> >>>> > between hierarchical and flat facets at search time?
> >>>> >
> >>>> >
> >>>> > Kindly post your suggestions...
> >>>> >
> >>>> >
> >>>> > Regards,
> >>>> > Chitra
> >>>> >
> >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
> >>>> >>
> >>>> >> Okay. I agree with you: taxonomy maintains and supports hierarchical
> >>>> >> facets during indexing. By hierarchical, I take it to mean that we
> >>>> >> might index the field Publish date: 2010/10/15 as Publish date: 2010,
> >>>> >> Publish date: 2010/10 and Publish date: 2010/10/15; their facet
> >>>> >> ordinals are maintained in the sidecar index and mapped to the main
> >>>> >> index.
> >>>> >>
> >>>> >> For example:
> >>>> >>
> >>>> >>                 In search-lucene.com, I enter a term (say, facet),
> >>>> >> and the top documents and their categories are displayed after
> >>>> >> performing the search. Say I drill down through Publish date/2010 to
> >>>> >> collect its child counts, and afterwards I drill down through
> >>>> >> Publish date/2010/10 to collect its child counts. For each
> >>>> >> drill-down, a search is performed to collect its top docs and
> >>>> >> categories.
> >>>> >>
> >>>> >>
> >>>> >>                I can achieve even this with flat facets by changing
> >>>> >> the drill-down query.
> >>>> >>
> >>>> >> Am I right, or have I missed anything? I don't yet know if I missed
> >>>> >> anything...
> >>>> >>
> >>>> >> So what is the need for hierarchical facets? Could you please explain
> >>>> >> a real-world use case for them (hierarchical facets)?
> >>>> >>
> >>>> >>
> >>>> >> Regards,
> >>>> >> Chitra
> >>>> >>
> >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> >>>> >> <lucene@mikemccandless.com> wrote:
> >>>> >>>
> >>>> >>> You store dimension + string (a single value path, since it's not
> >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> >>>> >>> either ordinary drill down counts or the drill sideways counts.
> >>>> >>>
> >>>> >>> You can see examples of drill sideways at
> >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> >>>> >>> those fields on the left and you don't lose the previous facet
> >>>> >>> counts for that field.
> >>>> >>>
> >>>> >>> Mike McCandless
> >>>> >>>
> >>>> >>> http://blog.mikemccandless.com
> >>>> >>>
> >>>> >>>
> >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com>
> >>>> >>> wrote:
> >>>> >>> > Hi,
> >>>> >>> >
> >>>> >>> > Lucene-Drill sideways
> >>>> >>> >
> >>>> >>> > jira_issue:LUCENE-4748
> >>>> >>> >
> >>>> >>> > Is this the reason (i.e. drill sideways makes a very nice faceted
> >>>> >>> > search UI because we don't "lose" the facet counts after drilling
> >>>> >>> > in) behind storing the path and dimension for the given SSDVF
> >>>> >>> > field? Or is there something else?
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >      Hey, thank you so much for the fast response. I agree that
> >>>> >>> > NRT refresh is a somewhat costly operation, and this is the major
> >>>> >>> > pitfall if we use doc values faceting.
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >                  While indexing, SortedSetDocValuesFacetField
> >>>> >>> > stores the path and dimension of the given field internally. So
> >>>> >>> > can we achieve hierarchical facets using DrillDownQuery? I assume
> >>>> >>> > the purpose of storing the path and dimension is to achieve
> >>>> >>> > hierarchical facets. If yes (i.e. we can achieve hierarchy in
> >>>> >>> > SSDVFF), what is the need to move over to taxonomy? Or have I
> >>>> >>> > missed anything?
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >                  What is the real purpose of storing the path and
> >>>> >>> > dimension in an SSDVF field?
> >>>> >>> >
> >>>> >>> >
> >>>> >>> > Kindly post your suggestions.
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >
> >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> >>>> >>> > <lucene@mikemccandless.com> wrote:
> >>>> >>> >>
> >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com> wrote:
> >>>> >>> >>
> >>>> >>> >> >         i) I assume that when opening
> >>>> >>> >> > SortedSetDocValuesReaderState, we calculate the ordinals (which
> >>>> >>> >> > are used to calculate facet counts) for the doc values field,
> >>>> >>> >> > and that this alone makes the state instance somewhat costly.
> >>>> >>> >> >                       Am I right, or is there any other reason
> >>>> >>> >> > behind that?
> >>>> >>> >>
> >>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
> >>>> >>> >> heap used to hold the ordinal mappings.
> >>>> >>> >>
> >>>> >>> >> >          ii) During indexing, we provide facet ordinals in each
> >>>> >>> >> > doc, and I think this is useful on the search side to calculate
> >>>> >>> >> > facet counts only for the matching docs. Does it carry any
> >>>> >>> >> > other benefits?
> >>>> >>> >>
> >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require
> >>>> >>> >> a separate index.
> >>>> >>> >>
> >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> >>>> >>> >>
> >>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe (i.e.
> >>>> >>> >> > can multiple threads use it concurrently)?
> >>>> >>> >>
> >>>> >>> >> Yes.
> >>>> >>> >>
> >>>> >>> >> Mike McCandless
> >>>> >>> >>
> >>>> >>> >> http://blog.mikemccandless.com
> >>>> >>> >
> >>>> >>> >
> >>>> >>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>
