lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Getting multi-values to use in filter?
Date Wed, 23 Apr 2014 15:13:25 GMT
A NumericDocValues field can only hold one value. Have you thought about
encoding the values in a BinaryDocValues field? Or are you talking about
multiple fields (different names), each with its own single value, where at
search time you sum the values from a different set of fields?

If it's one field, multiple values, then why do you need to separate the
values? Is it because you sometimes sum and sometimes e.g. avg? Do you
always include all values of a document in the formula, but the formula
changes between searches, or do you sometimes use only a subset of the
values?

If you always use all values, but change the formula between queries, then
perhaps you can just encode the pre-computed value under different NDV
fields? If you only use a handful of functions (and they are known in
advance), it may not be too heavy on the index, and will definitely perform
better during search.
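
A minimal sketch of that pre-computation, in plain Java with no Lucene
dependency. The class name, the method name and the "_sum"/"_min"/"_max"/"_avg"
field suffixes are hypothetical; each resulting entry would then be indexed as
its own single-valued NDV field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes the aggregates at index time so that each one
// can be stored under its own single-valued numeric field.
final class PrecomputedAggregates {

    /** Returns derived field name -> pre-computed value for one document. */
    static Map<String, Long> compute(String field, long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        long sum = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        Map<String, Long> out = new LinkedHashMap<>();
        out.put(field + "_sum", sum);
        out.put(field + "_min", min);
        out.put(field + "_max", max);
        // Integer division: the average is truncated toward zero.
        out.put(field + "_avg", sum / values.length);
        return out;
    }
}
```

For the values 60 and 40 from the original question, "field_sum" would hold
100, so sum(field)=100 becomes an ordinary single-value lookup.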

Otherwise, I believe I'd consider indexing them as a BDV field. For facets,
we basically need the same multi-valued numeric field, and given that NDV
is single valued, we went w/ BDV.
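
As a rough illustration of the BDV idea, here is a sketch of packing several
long values into one byte array and summing them back out. It is plain Java
with no Lucene dependency; the class name and the fixed-width 8-bytes-per-value
layout are assumptions chosen for simplicity, not necessarily the encoding the
facet module uses. The resulting bytes would be wrapped in a BytesRef and
indexed as a BinaryDocValuesField; reading them back at search time depends on
the Lucene version and is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical codec: all values of one document packed into a single byte[].
final class MultiValueCodec {

    /** Packs the values big-endian, 8 bytes per long. */
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Decodes the values stored in bytes[offset, offset + length). */
    static long[] decode(byte[] bytes, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    /** Sums the packed values without allocating an intermediate array. */
    static long sum(byte[] bytes, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long total = 0;
        while (buf.hasRemaining()) {
            total += buf.getLong();
        }
        return total;
    }
}
```

With the values from the original question, encode(new long[] {60, 40}) gives
a 16-byte array and sum over that array returns 100. Because the raw values are
kept, the same bytes can feed a different formula (avg, max, a subset) on the
next search.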

If I misunderstood the scenario, I'd appreciate it if you could clarify :)

Shai


On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.audenaerde@gmail.com> wrote:

> Hi Shai, all,
>
> I am trying to write that Filter :). But I'm a bit at a loss as to how to
> efficiently grab the multi-values. I can call
> context.reader().document(), which reads the stored fields, but that seems
> slow.
>
> For single-value fields I use a compiled JavaScript Expression with
> SimpleBindings as ValueSource, which seems to work quite well. The downside
> is that I cannot find a way to support multi-valued fields with that solution.
>
> These create for example a LongFieldSource, which uses the
> FieldCache.LongParser. These parsers only seem to parse one field.
>
> Is there an efficient way to get -all- of the (numeric) values for a field
> in a document?
>
>
> On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <serera@gmail.com> wrote:
>
> > You can do that by writing a Filter which returns matching documents
> > based on a sum of the field's values. However, I suspect that is going
> > to be slow, unless you know that you will need several such filters and
> > can cache them.
> >
> > Another approach would be to write a Collector which serves as a Filter,
> > but computes the sum only for documents that match the query. Hopefully
> > that would mean you compute the sum for fewer documents than you would
> > have w/ the Filter approach.
> >
> > Shai
> >
> >
> > On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <
> > msokolov@safaribooksonline.com> wrote:
> >
> > > This isn't really a good use case for an index like Lucene.  The most
> > > essential property of an index is that it lets you look up documents
> > > very quickly based on *precomputed* values.
> > >
> > > -Mike
> > >
> > >
> > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm looking for a way to use multi-values in a filter.
> > >>
> > >> I want to be able to search on sum(field)=100, where field has these
> > >> values in one document:
> > >>
> > >> field=60
> > >> field=40
> > >>
> > >> In this case 'field' is a LongField. I examined the code in the
> > >> FieldCache,
> > >> but that seems to focus on single-valued fields only, or
> > >>
> > >>
> > >> Is this something that can be done in Lucene? And what would be a good
> > >> approach?
> > >>
> > >> Thanks in advance,
> > >>
> > >> -Rob
> > >>
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
