lucene-java-user mailing list archives

From Chitra R <chithu.r...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Wed, 30 Nov 2016 09:10:53 GMT
Thank you so much, Shai...

Chitra

On Wed, Nov 30, 2016 at 2:17 PM, Shai Erera <serera@gmail.com> wrote:

> This feature is not available in Lucene currently, but it shouldn't be hard
> to add it. See Mike's comment here:
> http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144
>
> One more tricky (yet nicer) feature would be to have it all in one go, i.e.
> you'd say something like "facet on field price" and you'd get "interesting"
> buckets, per the variance in the results.
>
> But before that, we could have a StatsFacets in Lucene which provide some
> statistics about a numeric field (min/max/avg etc.).
>
> On Wed, Nov 30, 2016 at 7:50 AM Chitra R <chithu.r111@gmail.com> wrote:
>
> > Thank you so much, Mike. I have learned a lot about doc values
> > faceting, and all my doubts are cleared up. Thanks!
> >
> >
> > *Another use case:*
> >
> > After getting the matching documents for a given query, is there any way to
> > calculate the min and max values of a NumericDocValuesField (say, a date field)?
> >
> >
> > I would like to implement it in numeric range faceting by splitting the
> > numeric values (getting from resulted documents) into ranges.
> >
> >
> > Chitra
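
One way to approach the min/max question above, as a minimal sketch in plain Java rather than a Lucene implementation: scan the numeric values of the matching documents once for min and max, then cut that span into equal-width buckets and count. The class and method names (MinMaxRanges, Bucket, minMax, bucket) are hypothetical, and the input array stands in for the values that would be read from the field's doc values for each collected hit; that part is left out because the doc values API differs between Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; plain Java, no Lucene classes are used. */
final class MinMaxRanges {

    /** One bucket covering [lo, hiExclusive) plus the number of values that fell into it. */
    static final class Bucket {
        final long lo;
        final long hiExclusive;
        int count;

        Bucket(long lo, long hiExclusive) {
            this.lo = lo;
            this.hiExclusive = hiExclusive;
        }
    }

    /** Returns {min, max} over the values of the matching docs, or null if there are none. */
    static long[] minMax(long[] matchingValues) {
        if (matchingValues.length == 0) {
            return null;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : matchingValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    /**
     * Splits [min, max] into numBuckets equal-width buckets and counts the values in each.
     * Assumes max - min does not overflow a long (true for typical timestamps).
     */
    static List<Bucket> bucket(long[] matchingValues, int numBuckets) {
        List<Bucket> buckets = new ArrayList<>();
        long[] mm = minMax(matchingValues);
        if (mm == null || numBuckets <= 0) {
            return buckets;
        }
        long min = mm[0];
        long span = mm[1] - min;
        // ceil((span + 1) / numBuckets), so the buckets together cover max as well.
        long width = (span + numBuckets) / numBuckets;
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new Bucket(min + i * width, min + (i + 1) * width));
        }
        for (long v : matchingValues) {
            buckets.get((int) ((v - min) / width)).count++;
        }
        return buckets;
    }
}
```

Calling MinMaxRanges.bucket(values, 3) on the values of the current hits yields three ranges with their counts. The same boundaries could also be handed to Lucene's range faceting in a second pass instead of being counted by hand.
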
> >
> >
> > On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
> >
> > > Doc values fields are never loaded into memory; at most some small
> > > index structures are.
> > >
> > > When you use those fields, the bytes (for just the one doc values
> > > field you are using) are pulled from disk, and the OS will cache them
> > > in memory if available.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > >
> > > On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > When opening SortedSetDocValuesReaderState at search time, is the
> > > > content of the whole doc values files (.dvd & .dvm) loaded into memory,
> > > > or only the information for the specified field (say, the $facets field)?
> > > >
> > > >
> > > >
> > > >
> > > > Any help is much appreciated.
> > > >
> > > >
> > > > Regards,
> > > > Chitra
> > > >
> > > > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > > >>
> > > >> Kindly post your suggestions.
> > > >>
> > > >> Regards,
> > > >> Chitra
> > > >>
> > > >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > > >>>
> > > > >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> > > > >>> implement it in our use case?
> > > > >>>
> > > > >>>
> > > > >>> In our case, we have a dynamic index and the depth is variable too, so
> > > > >>> flat facets are enough. There is no need for hierarchical facets.
> > > >>>
> > > > >>> What I think is:
> > > > >>>
> > > > >>> 1. Index my facet field as a normal doc values field, so that no special
> > > > >>>    operation (like taxonomy and sorted set doc values facet field) is
> > > > >>>    done at index time and only the doc values field stores its ordinals.
> > > > >>> 2. At search time, pass the query (user search query) and the filter
> > > > >>>    (list of paths traversed), and collect the matching documents in a
> > > > >>>    FacetsCollector.
> > > > >>> 3. To compute the facet counts for a specific field, take the resulting
> > > > >>>    docs, then move through each segment and collect the matching
> > > > >>>    ordinals using AtomicReader.
> > > >>>
> > > >>>
> > > > >>> I know that with this approach I can't calculate facet counts for more
> > > > >>> than one field (facet) in a single search.
> > > > >>>
> > > > >>> Instead of loading all the dimensions in DocValuesReaderState at search
> > > > >>> time (which takes more time and memory), loading only specific fields
> > > > >>> should take less time and memory, I hope. Kindly help to solve this.
> > > > >>>
> > > > >>>
> > > > >>> I think this keeps the index and search cost minimal. I hope it won't
> > > > >>> add overhead at index time, and that it will also be better at search
> > > > >>> time.
> > > >>>
> > > >>>
> > > >>> Kindly post your suggestions.
> > > >>>
> > > >>>
> > > >>> Regards,
> > > >>> Chitra
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> > > >>> <lucene@mikemccandless.com> wrote:
> > > >>>>
> > > >>>> I think you've summed up exactly the differences!
> > > >>>>
> > > > >>>> And, yes, it would be possible to emulate hierarchical facets on top
> > > > >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> > > > >>>>
> > > > >>>> But if it's variable depth, it's trickier (but I think still
> > > > >>>> possible).  See e.g. the Committed Paths drill-down on the left, on
> > > > >>>> our dog-food server
> > > > >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> > > >>>>
> > > >>>> Mike McCandless
> > > >>>>
> > > >>>> http://blog.mikemccandless.com
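
The fixed-depth emulation mentioned above can be sketched roughly as follows, in plain Java with no Lucene classes: each path such as 2010/10/15 is expanded into one flat label per level, and the children of a node are counted by looking only at labels exactly one level below it. The names FlatHierarchy, prefixes and childCounts are hypothetical, and the "/" separator is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper showing hierarchical counts on top of flat labels. */
final class FlatHierarchy {

    /** Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"]. */
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int slash = path.indexOf('/', from);
            if (slash < 0) {
                out.add(path);
                return out;
            }
            out.add(path.substring(0, slash));
            from = slash + 1;
        }
    }

    /**
     * Counts the direct children of parent among the flat labels of the matching docs.
     * Pass "" as parent to count the top-level labels.
     */
    static Map<String, Integer> childCounts(List<List<String>> labelsPerDoc, String parent) {
        String prefix = parent.isEmpty() ? "" : parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> labels : labelsPerDoc) {
            for (String label : labels) {
                boolean below = label.startsWith(prefix) && label.length() > prefix.length();
                // A direct child has no further separator after the parent prefix.
                if (below && label.indexOf('/', prefix.length()) < 0) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

In a real index each expanded label would be added as its own flat facet value for the document. That multiplies the number of values per document by the depth of the path, which is part of the cost of emulating the hierarchy this way.
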
> > > >>>>
> > > >>>>
> > > > >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > > > >>>> > case 1:
> > > > >>>> >         With taxonomy, the facet labels of each indexed document are
> > > > >>>> > examined, and their ordinals and mappings are computed and stored in
> > > > >>>> > the sidecar index at index time.
> > > > >>>> >
> > > > >>>> > case 2:
> > > > >>>> >         With doc values, these ordinals are computed at search time,
> > > > >>>> > so I expect a time and memory trade-off between the two cases.
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > With taxonomy, building hierarchical facets at index time makes
> > > > >>>> > faceting cheaper at search time than flat facets in doc values.
> > > > >>>> >
> > > > >>>> > Apart from memory, time and NRT latency, is there any other
> > > > >>>> > difference between hierarchical and flat facets at search time?
> > > >>>> >
> > > >>>> >
> > > >>>> > Kindly post your suggestions...
> > > >>>> >
> > > >>>> >
> > > >>>> > Regards,
> > > >>>> > Chitra
> > > >>>> >
> > > > >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
> > > > >>>> >>
> > > > >>>> >> Okay. I agree with you: taxonomy maintains and supports hierarchical
> > > > >>>> >> facets during indexing. I take hierarchical to mean that we might
> > > > >>>> >> index the field Publish date: 2010/10/15 as Publish date: 2010,
> > > > >>>> >> Publish date: 2010/10 and Publish date: 2010/10/15, and that their
> > > > >>>> >> facet ordinals are maintained in the sidecar index and mapped to the
> > > > >>>> >> main index.
> > > > >>>> >>
> > > > >>>> >> For example:
> > > > >>>> >>
> > > > >>>> >> On search-lucene.com, I enter a term (say, facet), and the top
> > > > >>>> >> documents and their categories are displayed after the search is
> > > > >>>> >> performed. Say I drill down through Publish date/2010 to collect its
> > > > >>>> >> child counts, and afterwards through Publish date/2010/10 to collect
> > > > >>>> >> its child counts. For each drill-down, a search is performed to
> > > > >>>> >> collect its top docs and categories.
> > > > >>>> >>
> > > > >>>> >>
> > > > >>>> >> I can achieve this with flat facets too, by changing the drill-down
> > > > >>>> >> query.
> > > > >>>> >>
> > > > >>>> >> Am I right, or have I missed anything?
> > > > >>>> >>
> > > > >>>> >> So what is the need for hierarchical facets? Could you please explain
> > > > >>>> >> hierarchical facets with a real-world use case?
> > > >>>> >>
> > > >>>> >>
> > > >>>> >> Regards,
> > > >>>> >> Chitra
> > > >>>> >>
> > > >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> > > >>>> >> <lucene@mikemccandless.com> wrote:
> > > >>>> >>>
> > > > >>>> >>> You store dimension + string (a single value path, since it's not
> > > > >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> > > > >>>> >>> either ordinary drill down counts or the drill sideways counts.
> > > > >>>> >>>
> > > > >>>> >>> You can see examples of drill sideways at
> > > > >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> > > > >>>> >>> those fields on the left and you don't lose the previous facet
> > > > >>>> >>> counts for that field.
> > > >>>> >>>
> > > >>>> >>> Mike McCandless
> > > >>>> >>>
> > > >>>> >>> http://blog.mikemccandless.com
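
The difference between drill-down and drill-sideways counts described above can be illustrated with a small plain-Java sketch (no Lucene classes; DrillSidewaysSketch and counts are hypothetical names, and each document is modelled as a map from dimension to a single value). A document that satisfies every selection is counted under all of its dimensions. A document that fails exactly one selected dimension is a near miss and is counted only under that dimension, which is why the other values of the drilled-into field keep their counts.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of drill-sideways counting over in-memory docs. */
final class DrillSidewaysSketch {

    /**
     * docs: the documents already matching the base query, as dimension -> value.
     * selections: the drill-downs chosen so far, as dimension -> required value.
     * Returns dimension -> value -> count.
     */
    static Map<String, Map<String, Integer>> counts(
            List<Map<String, String>> docs, Map<String, String> selections) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            int misses = 0;
            String failedDim = null;
            for (Map.Entry<String, String> sel : selections.entrySet()) {
                if (!sel.getValue().equals(doc.get(sel.getKey()))) {
                    misses++;
                    failedDim = sel.getKey();
                }
            }
            if (misses > 1) {
                continue; // fails two or more selections: contributes to no count
            }
            for (Map.Entry<String, String> field : doc.entrySet()) {
                String dim = field.getKey();
                // Full hit: count under every dimension.
                // Near miss: count only under the dimension it failed (the sideways count).
                if (misses == 0 || dim.equals(failedDim)) {
                    result.computeIfAbsent(dim, k -> new TreeMap<>())
                          .merge(field.getValue(), 1, Integer::sum);
                }
            }
        }
        return result;
    }
}
```

With a selection on one field, say status=Open, the status dimension still reports counts for Closed documents, while every other dimension is counted only over the Open documents.
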
> > > >>>> >>>
> > > >>>> >>>
> > > > >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > > > >>>> >>> > Hi,
> > > > >>>> >>> >
> > > > >>>> >>> > Lucene-Drill sideways
> > > > >>>> >>> >
> > > > >>>> >>> > jira_issue:LUCENE-4748
> > > > >>>> >>> >
> > > > >>>> >>> > Is this (i.e. drill sideways makes a very nice faceted search UI
> > > > >>>> >>> > because we don't "lose" the facet counts after drilling in) the
> > > > >>>> >>> > reason behind storing the path and dimension for a given SSDVF
> > > > >>>> >>> > field? Or is there anything else?
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >      Hey, thank you so much for the fast
response, I agree
> NRT
> > > >>>> >>> > refresh
> > > >>>> >>> > is
> > > >>>> >>> > somewhat costly operations and this is the
major pitfall,
> > > suppose
> > > >>>> >>> > we
> > > >>>> >>> > use doc
> > > >>>> >>> > value faceting.
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >                  While indexing
> SortedSetDocValuesFacetField ,
> > > it
> > > >>>> >>> > stores
> > > >>>> >>> > path and dimension of the given field internally.
So Can we
> > > >>>> >>> > achieve
> > > >>>> >>> > hierarchical facets using DrillDownQuery?
Hope, purpose of
> > > storing
> > > >>>> >>> > path
> > > >>>> >>> > and
> > > >>>> >>> > dimension is to achieve hierarchical facets.
If yes (ie we
> can
> > > >>>> >>> > achieve
> > > >>>> >>> > hierarchy in SSDVFF) , so what is the need
to move over
> > > taxonomy?
> > > >>>> >>> >  Else I missed anything?
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >                  What is the real purpose
to store path and
> > > >>>> >>> > dimension
> > > >>>> >>> > in
> > > >>>> >>> > SSDVF field?
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> > Kindly post your suggestions.
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >
> > > > >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> > > > >>>> >>> > <lucene@mikemccandless.com> wrote:
> > > > >>>> >>> >>
> > > > >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com> wrote:
> > > >>>> >>> >>
> > > >>>> >>> >> >         i)Hope, when opening
> SortedSetDocValuesReaderState
> > ,
> > > we
> > > >>>> >>> >> > are
> > > >>>> >>> >> > calculating ordinals( this will
be used to calculate
> facet
> > > >>>> >>> >> > count )
> > > >>>> >>> >> > for
> > > >>>> >>> >> > doc
> > > >>>> >>> >> > values field and this only made
the state instance
> somewhat
> > > >>>> >>> >> > costly.
> > > >>>> >>> >> >                       Am I right
or any other reason
> behind
> > > >>>> >>> >> > that?
> > > >>>> >>> >>
> > > > >>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
> > > > >>>> >>> >> heap used to hold the ordinal mappings.
> > > >>>> >>> >>
> > > >>>> >>> >> >          ii) During indexing, we
are providing facet
> > ordinals
> > > >>>> >>> >> > in
> > > >>>> >>> >> > each
> > > >>>> >>> >> > doc
> > > >>>> >>> >> > and I think it will be useful in
search side, to
> calculate
> > > >>>> >>> >> > facet
> > > >>>> >>> >> > counts
> > > >>>> >>> >> > only for matching docs.  otherwise,
it carries any other
> > > >>>> >>> >> > benefits?
> > > >>>> >>> >>
> > > > >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
> > > > >>>> >>> >> separate index.
> > > > >>>> >>> >>
> > > > >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> > > > >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> > > >>>> >>> >>
> > > >>>> >>> >> >          iii) Is SortedSetDocValuesReaderState
> thread-safe
> > > (ie)
> > > >>>> >>> >> > multiple
> > > >>>> >>> >> > threads can call this method concurrently?
> > > >>>> >>> >>
> > > >>>> >>> >> Yes.
> > > >>>> >>> >>
> > > >>>> >>> >> Mike McCandless
> > > >>>> >>> >>
> > > >>>> >>> >> http://blog.mikemccandless.com
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>
> > > >>>> >>
> > > >>>> >
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
