lucene-java-user mailing list archives

From Sheng <sheng...@gmail.com>
Subject Re: dv field is too large
Date Thu, 07 Jul 2016 02:59:10 GMT
I agree. That said, wouldn't it also make sense to point this out clearly by
adding comments to the corresponding classes? This is not the first
time I have run into this kind of "magic number" pitfall when using Lucene
(e.g., the 1024 limit on token length in early versions of Lucene). Generally
speaking, the documentation is pretty good and helpful. But when subtle limits
like this are undocumented, they may only manifest themselves in production,
when the real data comes in and it is "big".

On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:

> Well, if you must sort on a 32K single value (although I think this is
> extremely silly, _nobody_ will notice that two docs are out of order
> because they were identical up until the 30,000th character but the
> 30,001st character isn't sorted correctly), do as Mike suggests and
> chop it off before sending it to Lucene.
>
> Best,
> Erick
>
> On Wed, Jul 6, 2016 at 3:53 PM, Sheng <shengcer@gmail.com> wrote:
> > You misunderstand. I have many fields, and unfortunately a few of them are
> > quite big, i.e. exceeding the 32k limit. In order to make these "big"
> > fields sortable, they have to be stored as SortedDocValuesField. Or is that
> > wrong, and one can actually sort the search results by a "big" field without
> > indexing it as a SortedDocValuesField? Any suggestions?
> >
> > On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> bq: In this case, we
> >> have to index a particular data structure which has a bunch of fields and
> >> each of them is promised to be searchable and search-sortable to the user
> >>
> >> If I'm reading this right, you have some structure. You say
> >> "each of them is promised to be searchable and search-sortable"
> >>
> >> It _sounds_ like what you want to do is break these fields out
> >> into separate fields each of which is searchable and sortable
> >> independently. But from what you've described, putting the entire
> >> thing into a single DV field isn't useful.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>
> >> On Wed, Jul 6, 2016 at 3:10 PM, Sheng <shengcer@gmail.com> wrote:
> >> > To be clear, the "field" is indeed tokenized, and it is accompanied by a
> >> > SortedDocValuesField so that it is sortable too. Am I making the wrong
> >> > assumption here?
> >> >
> >> > On Wednesday, July 6, 2016, Sheng <shengcer@gmail.com> wrote:
> >> >
> >> >> Hi Erick,
> >> >>
> >> >> I am refactoring a legacy system. One of the most annoying things is that
> >> >> I have to keep the old features even when they make little sense. In this
> >> >> case, we have to index a particular data structure which has a bunch of
> >> >> fields, and each of them is promised to be searchable and search-sortable
> >> >> to the user. It turns out one field is notoriously large. I think the old
> >> >> implementation uses some quite clumsy way to make this happen. But since
> >> >> we decided to refactor the system with all the goodies from Lucene, we
> >> >> want to do the sorting right, and here we are at this issue... :-(
> >> >>
> >> >> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>
> >> >>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
> >> >>> 32K?
> >> >>>
> >> >>> You can't search it as text as it's not tokenized. Faceting and sorting
> >> >>> by a 32K field doesn't seem very useful. You may have a perfectly valid
> >> >>> reason, but it's not obvious what use-case you're serving from this
> >> >>> thread so far....
> >> >>>
> >> >>> Nobody has yet put forth a compelling use-case for such large fields;
> >> >>> perhaps this would be one.
> >> >>>
> >> >>> Best,
> >> >>> Erick
> >> >>>
> >> >>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <shengcer@gmail.com> wrote:
> >> >>> > Mike - Thanks for the prompt response. Is there a way to bypass this
> >> >>> > constraint for SortedDocValuesField? Or do we have to live with it,
> >> >>> > meaning no fix even in a future release?
> >> >>> >
> >> >>> > On Wednesday, July 6, 2016, Michael McCandless <lucene@mikemccandless.com> wrote:
> >> >>> >
> >> >>> >> I believe only binary DVs can be larger than 32K bytes.
> >> >>> >>
> >> >>> >> Mike McCandless
> >> >>> >>
> >> >>> >> http://blog.mikemccandless.com
> >> >>> >>
> >> >>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
> >> >>> >>
> >> >>> >> > Hi,
> >> >>> >> >
> >> >>> >> > I am getting an IAE indicating that one of the SortedDocValuesField
> >> >>> >> > values is too large (> 32k).
> >> >>> >> >
> >> >>> >> > I googled a bit, and it seems like LUCENE-4583 has addressed this
> >> >>> >> > issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I
> >> >>> >> > missing or misunderstanding anything?
> >> >>> >> >
> >> >>> >> > Thanks,
> >> >>> >> >
> >> >>> >>
> >> >>>
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>>
> >> >>>
> >>
> >>
> >>
>
>
>
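
A minimal sketch of the "chop it off before sending it to Lucene" suggestion
made in this thread, in plain Java with no Lucene dependency. The class and
method names are hypothetical, and the 32766-byte cap is an assumption about
the exact limit behind the "32k" IAE discussed above. The value is cut on a
code point boundary so that its UTF-8 encoding never exceeds the cap.

```java
final class SortKeyTruncator {
    // Assumed cap for a single sorted doc values term, in UTF-8 bytes.
    static final int MAX_SORT_KEY_BYTES = 32766;

    private SortKeyTruncator() {}

    /** Truncates using the assumed Lucene cap. */
    static String truncateForSorting(String value) {
        return truncateUtf8(value, MAX_SORT_KEY_BYTES);
    }

    /**
     * Returns the longest prefix of value whose UTF-8 encoding fits in
     * maxBytes, never splitting a code point (or a surrogate pair).
     */
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // UTF-8 length of this code point: 1 to 4 bytes.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break; // the next code point would exceed the budget
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }
}
```

Under these assumptions, only the truncated string would be wrapped in a
BytesRef and handed to the SortedDocValuesField, while the full text would
still go to the tokenized field used for searching. Two documents that differ
only beyond the cut would then sort as equal, which is the trade-off Erick
describes.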
