lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Wed, 30 Nov 2016 08:47:01 GMT
This feature is not available in Lucene currently, but it shouldn't be hard
to add it. See Mike's comment here:
http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144

A trickier (yet nicer) feature would be to do it all in one go, i.e.
you'd say something like "facet on field price" and you'd get "interesting"
buckets, chosen according to the variance in the results.
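
As a rough illustration of what such automatic buckets could look like, here
is a minimal sketch in plain Java. It is not Lucene code: EqualCountBuckets
and its compute method are hypothetical names, and it takes just one possible
reading of "interesting", namely equal-count buckets, so that densely
populated value ranges get narrower buckets. It assumes the values of the
matching documents were already collected into an array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: derives bucket bounds from the
// values of the matching documents so each bucket holds roughly the same
// number of values.
final class EqualCountBuckets {

    /** Returns inclusive {min, max} pairs that together cover all values. */
    static List<long[]> compute(long[] values, int numBuckets) {
        List<long[]> ranges = new ArrayList<>();
        if (values.length == 0 || numBuckets <= 0) {
            return ranges;
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        int start = 0;
        for (int b = 1; b <= numBuckets && start < n; b++) {
            // Index one past the last value that belongs to bucket b.
            int end = (int) ((long) n * b / numBuckets);
            // Keep equal values together so bucket bounds never overlap.
            while (end > 0 && end < n && sorted[end] == sorted[end - 1]) {
                end++;
            }
            if (end <= start) {
                continue; // an earlier bucket already absorbed these values
            }
            ranges.add(new long[] {sorted[start], sorted[end - 1]});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] prices = {5, 7, 8, 9, 10, 12, 300, 450};
        for (long[] r : compute(prices, 4)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```

For the prices 5, 7, 8, 9, 10, 12, 300, 450 and four buckets this gives
5 to 7, 8 to 9, 10 to 12 and 300 to 450, rather than four equally wide ranges
where almost every value lands in the first one.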

But before that, we could have a StatsFacets in Lucene which provides some
statistics about a numeric field (min/max/avg etc.).
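
To make the idea concrete, here is a minimal sketch of such statistics in
plain Java. NumericStats is a hypothetical name, not an existing Lucene class,
and the sketch assumes the field's values for the matching documents are
already available as a long[] (for example epoch millis for a date field). It
also shows how equal-width ranges could be derived from min and max.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not a Lucene class: summary statistics over the
// numeric values of the matching documents, plus equal-width ranges.
final class NumericStats {
    final long min;
    final long max;
    final double avg;
    final int count;

    private NumericStats(long min, long max, double avg, int count) {
        this.min = min;
        this.max = max;
        this.avg = avg;
        this.count = count;
    }

    /** Computes min, max and average in a single pass over the values. */
    static NumericStats of(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double sum = 0;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new NumericStats(min, max, sum / values.length, values.length);
    }

    /**
     * Splits [min, max] into at most numRanges inclusive, non-overlapping
     * {lo, hi} pairs. Assumes max - min + 1 does not overflow a long.
     */
    List<long[]> equalWidthRanges(int numRanges) {
        List<long[]> ranges = new ArrayList<>();
        if (numRanges <= 0) {
            return ranges;
        }
        long span = max - min + 1;
        // Round the width up so that numRanges ranges always reach max.
        long width = (span + numRanges - 1) / numRanges;
        for (long lo = min; lo <= max; lo += width) {
            ranges.add(new long[] {lo, Math.min(lo + width - 1, max)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        NumericStats stats = NumericStats.of(new long[] {10, 20, 30, 100});
        System.out.println(stats.min + " " + stats.max + " " + stats.avg);
        for (long[] r : stats.equalWidthRanges(3)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```

The resulting bounds could then be turned into LongRange instances for
Lucene's existing range faceting (LongRangeFacetCounts).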

On Wed, Nov 30, 2016 at 7:50 AM Chitra R <chithu.r111@gmail.com> wrote:

> Thank you so much, Mike. I learned a lot about doc values faceting, and
> all my doubts are cleared up. Thanks!
>
>
> *Another use case:*
>
> After getting the matching documents for a given query, is there any way to
> calculate min and max values on a NumericDocValuesField (say a date field)?
>
>
> I would like to use this for numeric range faceting, by splitting the
> numeric values (taken from the matching documents) into ranges.
>
>
> Chitra
>
>
> On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > Doc values fields are never loaded into memory; at most some small
> > index structures are.
> >
> > When you use those fields, the bytes (for just the one doc values
> > field you are using) are pulled from disk, and the OS will cache them
> > in memory if available.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > > Hi,
> > >          When opening SortedSetDocValuesReaderState at search time, is
> > > the whole content of the doc values files (.dvd & .dvm) loaded into
> > > memory, or only the data for the specified field (say the $facets field)?
> > >
> > >
> > >
> > >
> > > Any help is much appreciated.
> > >
> > >
> > > Regards,
> > > Chitra
> > >
> > > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>
> > >>
> > >> Kindly post your suggestions.
> > >>
> > >> Regards,
> > >> Chitra
> > >>
> > >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>>
> > >>> Hey, I got it clearly. Thank you so much. Could you please help us
> > >>> to implement it in our use case?
> > >>>
> > >>>
> > >>> In our case, we have a dynamic index, and it is variable depth too.
> > >>> So flat facets are enough; there is no need for hierarchical facets.
> > >>>
> > >>> What I think is:
> > >>>
> > >>> Index my facet field as a normal doc values field, so that no special
> > >>> operation (like taxonomy or sorted set doc values facet field) is done
> > >>> at index time, and only the doc values field stores its ordinals, in
> > >>> its respective field.
> > >>> At search time, I will pass the query (user search query) and filter
> > >>> (path traversed list), and collect the matching documents in a
> > >>> FacetsCollector.
> > >>> To compute facet counts for the specific field, I will gather those
> > >>> matching docs, then move through each segment, collecting the matching
> > >>> ordinals using AtomicReader.
> > >>>
> > >>>
> > >>> I know that with this approach I can't calculate facet counts for
> > >>> more than one field (facet) in a search.
> > >>>
> > >>> Instead of loading all the dimensions in DocValuesReaderState (which
> > >>> will take more time and memory) at search time, loading only specific
> > >>> fields should take less time and memory, I hope. Kindly help to solve
> > >>> this.
> > >>>
> > >>>
> > >>> I think this will keep index and search cost minimal. I hope it
> > >>> won't add load at index time, and that it will be better at search
> > >>> time too.
> > >>>
> > >>>
> > >>> Kindly post your suggestions.
> > >>>
> > >>>
> > >>> Regards,
> > >>> Chitra
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> > >>> <lucene@mikemccandless.com> wrote:
> > >>>>
> > >>>> I think you've summed up exactly the differences!
> > >>>>
> > >>>> And, yes, it would be possible to emulate hierarchical facets on top
> > >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> > >>>>
> > >>>> But if it's variable depth, it's trickier (but I think still
> > >>>> possible).  See e.g. the Committed Paths drill-down on the left, on
> > >>>> our dog-food server
> > >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> > >>>>
> > >>>> Mike McCandless
> > >>>>
> > >>>> http://blog.mikemccandless.com
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>>> > case 1:
> > >>>> >         In taxonomy, each indexed document's facet labels are
> > >>>> > examined and their ordinals and mappings computed, and these are
> > >>>> > stored in the sidecar index at index time.
> > >>>> >
> > >>>> > case 2:
> > >>>> >         In doc values, these (ordinals) are computed at search time,
> > >>>> > so I expect there is a time and memory trade-off between the two
> > >>>> > cases.
> > >>>> >
> > >>>> >
> > >>>> > In taxonomy, building hierarchical facets at index time makes
> > >>>> > faceting cheaper at search time than flat facets in doc values.
> > >>>> >
> > >>>> > Apart from memory, time and NRT latency, is there any other
> > >>>> > difference between hierarchical and flat facets at search time?
> > >>>> >
> > >>>> >
> > >>>> > Kindly post your suggestions...
> > >>>> >
> > >>>> >
> > >>>> > Regards,
> > >>>> > Chitra
> > >>>> >
> > >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>>> >>
> > >>>> >> Okay. I agree with you, Taxonomy maintains and supports
> > >>>> >> hierarchical facets during indexing. I take hierarchical to mean
> > >>>> >> that we might index the field Publish date: 2010/10/15 as
> > >>>> >> Publish date: 2010, Publish date: 2010/10 and
> > >>>> >> Publish date: 2010/10/15; their facet ordinals are maintained in
> > >>>> >> the sidecar index, which is mapped to the main index.
> > >>>> >>
> > >>>> >> For example:
> > >>>> >>
> > >>>> >>                 In search-lucene.com, I enter a term (say facet);
> > >>>> >> top documents and their categories are displayed after performing
> > >>>> >> the search. Say I drill down through Publish date/2010 to collect
> > >>>> >> its child counts, and afterwards I pass through
> > >>>> >> publishdate/2010/10 to collect its child counts.
> > >>>> >> For each drill down, a search is performed to collect its top
> > >>>> >> docs and categories.
> > >>>> >>
> > >>>> >>
> > >>>> >>                I can achieve even this with flat facets by
> > >>>> >> changing the drill down query.
> > >>>> >>
> > >>>> >> Am I right? I don't know yet whether I have missed anything...
> > >>>> >>
> > >>>> >> So what is the need for hierarchical facets? Could you please
> > >>>> >> explain them (hierarchical facets) with a real-world use case?
> > >>>> >>
> > >>>> >>
> > >>>> >> Regards,
> > >>>> >> Chitra
> > >>>> >>
> > >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> > >>>> >> <lucene@mikemccandless.com> wrote:
> > >>>> >>>
> > >>>> >>> You store dimension + string (a single value path, since it's not
> > >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> > >>>> >>> either ordinary drill down counts or the drill sideways counts.
> > >>>> >>>
> > >>>> >>> You can see examples of drill sideways at
> > >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> > >>>> >>> those fields on the left and you don't lose the previous facet
> > >>>> >>> counts for that field.
> > >>>> >>>
> > >>>> >>> Mike McCandless
> > >>>> >>>
> > >>>> >>> http://blog.mikemccandless.com
> > >>>> >>>
> > >>>> >>>
> > >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>>> >>> > Hi,
> > >>>> >>> >
> > >>>> >>> > Lucene-Drill sideways
> > >>>> >>> >
> > >>>> >>> > jira_issue:LUCENE-4748
> > >>>> >>> >
> > >>>> >>> >         Is this the reason (i.e. drill sideways makes a very
> > >>>> >>> > nice faceted search UI because we don't "lose" the facet counts
> > >>>> >>> > after drilling in) behind storing path and dimension for the
> > >>>> >>> > given SSDVF field? Or is there anything else?
> > >>>> >>> >
> > >>>> >>> > Regards,
> > >>>> >>> > Chitra
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> >      Hey, thank you so much for the fast response. I agree that
> > >>>> >>> > NRT refresh is a somewhat costly operation, and that this is the
> > >>>> >>> > major pitfall if we use doc values faceting.
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> >         While indexing, SortedSetDocValuesFacetField stores the
> > >>>> >>> > path and dimension of the given field internally. So can we
> > >>>> >>> > achieve hierarchical facets using DrillDownQuery? I assume the
> > >>>> >>> > purpose of storing path and dimension is to achieve hierarchical
> > >>>> >>> > facets. If yes (i.e. we can achieve hierarchy in SSDVFF), what
> > >>>> >>> > is the need to move over to taxonomy? Or have I missed anything?
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> >         What is the real purpose of storing path and dimension
> > >>>> >>> > in an SSDVF field?
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> > Kindly post your suggestions.
> > >>>> >>> >
> > >>>> >>> > Regards,
> > >>>> >>> > Chitra
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> > >>>> >>> > <lucene@mikemccandless.com> wrote:
> > >>>> >>> >>
> > >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > >>>> >>> >>
> > >>>> >>> >> >         i) I assume that when opening
> > >>>> >>> >> > SortedSetDocValuesReaderState we calculate ordinals (which are
> > >>>> >>> >> > used to calculate facet counts) for the doc values field, and
> > >>>> >>> >> > that this alone is what makes the state instance somewhat
> > >>>> >>> >> > costly. Am I right, or is there any other reason behind that?
> > >>>> >>> >>
> > >>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
> > >>>> >>> >> heap used to hold the ordinal mappings.
> > >>>> >>> >>
> > >>>> >>> >> >          ii) During indexing, we are providing facet ordinals in
> > >>>> >>> >> > each doc, and I think they will be useful on the search side, to
> > >>>> >>> >> > calculate facet counts only for matching docs. Otherwise, does
> > >>>> >>> >> > it carry any other benefits?
> > >>>> >>> >>
> > >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
> > >>>> >>> >> separate index.
> > >>>> >>> >>
> > >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> > >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> > >>>> >>> >>
> > >>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe, i.e.
> > >>>> >>> >> > can multiple threads use it concurrently?
> > >>>> >>> >>
> > >>>> >>> >> Yes.
> > >>>> >>> >>
> > >>>> >>> >> Mike McCandless
> > >>>> >>> >>
> > >>>> >>> >> http://blog.mikemccandless.com
> > >>>> >>> >
> > >>>> >>> >
> > >>>> >>
> > >>>> >>
> > >>>> >
> > >>>
> > >>>
> > >>
> > >
> >
>
