lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: dv field is too large
Date Thu, 07 Jul 2016 09:46:31 GMT
I agree, I'll improve the docs about this limit.  Thanks Sheng.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:59 PM, Sheng <shengcer@gmail.com> wrote:

> I agree. That said, wouldn't it also make sense to point it out clearly by
> adding comments to the corresponding classes? This is not the first
> time I have run into this "magic number" pitfall when using Lucene
> (e.g., the 1024 limit on token length in early versions of Lucene).
> Generally speaking, the documentation is pretty good and helpful. But
> without documentation, subtle issues like this may only manifest themselves
> in production, when the real data come in and they are "big".
>
> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > Well, if you must sort on a 32K single value (although I think this is
> > extremely silly, _nobody_ will notice that two docs are out of order
> > because they were identical up until the 30,000th character but the
> > 30,001st character isn't sorted correctly), do as Mike suggests and
> > chop it off before sending it to Lucene.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 6, 2016 at 3:53 PM, Sheng <shengcer@gmail.com> wrote:
> > > You misunderstand. I have many fields, and unfortunately a few of them are
> > > quite big, i.e. exceeding the 32k limit. In order to make these "big"
> > > fields sortable, they have to be stored as SortedDocValuesField. Or is that
> > > wrong, and one can actually sort the search results by a "big" field
> > > without indexing it as a SortedDocValuesField? Suggestions?
> > >
> > > On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > >> bq: In this case, we
> > >> have to index a particular data structure which has a bunch of fields and
> > >> each of them is promised to be searchable and search-sortable to the user
> > >>
> > >> If I'm reading this right, you have some structure. You say
> > >> "each of them is promised to be searchable and search-sortable"
> > >>
> > >> It _sounds_ like what you want to do is break these fields out
> > >> into separate fields each of which is searchable and sortable
> > >> independently. But from what you've described, putting the entire
> > >> thing into a single DV field isn't useful.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>
> > >>
> > >> On Wed, Jul 6, 2016 at 3:10 PM, Sheng <shengcer@gmail.com> wrote:
> > >> > To be clear, the "field" is indeed tokenized, and it is accompanied by a
> > >> > SortedDocValuesField so that it is sortable too. Am I making the wrong
> > >> > assumption here?
> > >> >
> > >> > On Wednesday, July 6, 2016, Sheng <shengcer@gmail.com> wrote:
> > >> >
> > >> >> Hi Erick,
> > >> >>
> > >> >> I am refactoring a legacy system. One of the most annoying things is that I
> > >> >> have to keep the old features even when they make little sense. In this
> > >> >> case, we have to index a particular data structure which has a bunch of
> > >> >> fields, and each of them is promised to be searchable and search-sortable
> > >> >> to the user. It turns out one field is notoriously large. I think the old
> > >> >> implementation used some quite clumsy way to make it happen. But since we
> > >> >> decided to refactor the system with all the goodies from Lucene, we want
> > >> >> to do the sorting right, and here we are at this issue... :-(
> > >> >>
> > >> >> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
> > >> >>
> > >> >>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
> > >> >>> 32K?
> > >> >>>
> > >> >>> You can't search it as text as it's not tokenized. Faceting and sorting
> > >> >>> by a 32K field doesn't seem very useful. You may have a perfectly valid
> > >> >>> reason, but it's not obvious what use-case you're serving from this
> > >> >>> thread so far....
> > >> >>>
> > >> >>> Nobody has yet put forth a compelling use-case for such large fields,
> > >> >>> perhaps this would be one.
> > >> >>>
> > >> >>> Best,
> > >> >>> Erick
> > >> >>>
> > >> >>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <shengcer@gmail.com> wrote:
> > >> >>> > Mike - Thanks for the prompt response. Is there a way to bypass this
> > >> >>> > constraint for SortedDocValuesField? Or do we have to live with it,
> > >> >>> > meaning no fix even in a future release?
> > >> >>> >
> > >> >>> > On Wednesday, July 6, 2016, Michael McCandless <lucene@mikemccandless.com> wrote:
> > >> >>> >
> > >> >>> >> I believe only binary DVs can be larger than 32K bytes.
> > >> >>> >>
> > >> >>> >> Mike McCandless
> > >> >>> >>
> > >> >>> >> http://blog.mikemccandless.com
> > >> >>> >>
> > >> >>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
> > >> >>> >>
> > >> >>> >> > Hi,
> > >> >>> >> >
> > >> >>> >> > I am getting an IAE indicating that one of the SortedDocValuesField
> > >> >>> >> > values is too large (> 32k).
> > >> >>> >> >
> > >> >>> >> > I googled a bit, and it seems like LUCENE-4583 addressed this issue
> > >> >>> >> > in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I missing
> > >> >>> >> > or misunderstanding anything?
> > >> >>> >> >
> > >> >>> >> > Thanks,
> > >> >>> >> >
> > >> >>> >>
> > >> >>>
> > >> >>> ---------------------------------------------------------------------
> > >> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >> >>>
>
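
The workaround suggested in the thread (chop the value off before sending it to Lucene) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the 32766-byte figure is assumed to be the per-value limit in Lucene 6.x, so check it against the version in use. The helper cuts on a UTF-8 character boundary so the truncated sort key remains valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
final class SortKeyTruncator {

    // Assumption: maximum byte length accepted for a sorted doc value in Lucene 6.x.
    static final int MAX_SORT_KEY_BYTES = 32766;

    // Returns the UTF-8 bytes of value, cut to at most maxBytes bytes
    // without splitting a multi-byte character.
    static byte[] truncateUtf8(String value, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return utf8; // already short enough
        }
        int end = maxBytes;
        // utf8[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut falls inside a character,
        // so back up to the start of that character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }
}
```

With Lucene on the classpath, the result would typically be wrapped as `new SortedDocValuesField(name, new BytesRef(bytes))`, while the full text continues to be indexed in a separate tokenized field for searching. Documents that are identical up to the cut point will then sort as equal, which is the trade-off already discussed in the thread.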
