lucene-java-user mailing list archives

From Erick Erickson
Subject Re: Dubious error message?
Date Fri, 05 Aug 2016 04:51:23 GMT
Question 2: Not that I know of

Question 2.1: It's actually pretty difficult to understand how a single _term_
can be over 32K and still make sense. This is not to say that a
single _text_ field can't be over 32K; it's just that each term within
that field is (usually) much shorter than that.
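
To make the term-versus-field distinction concrete, here is a minimal sketch in plain Java (no Lucene on the classpath). It assumes whitespace splitting as a rough stand-in for a real analyzer, and the class name, method names and sample addresses are all made up for illustration. The 32766 figure is the limit quoted in the error message below.

```java
import java.nio.charset.StandardCharsets;

public class TermLengthCheck {
    // Limit quoted in the exception message; hard-coded here as an assumption
    // so the sketch runs without Lucene.
    static final int MAX_TERM_BYTES = 32766;

    // Length of the value once encoded as UTF-8, which is what the limit applies to.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the whole value could be indexed as one term (the "string" case).
    static boolean fitsAsSingleTerm(String value) {
        return utf8Length(value) <= MAX_TERM_BYTES;
    }

    // Longest whitespace-separated token, in UTF-8 bytes (a crude analyzer stand-in).
    static int longestTokenBytes(String text) {
        int longest = 0;
        for (String token : text.trim().split("\\s+")) {
            longest = Math.max(longest, utf8Length(token));
        }
        return longest;
    }

    // Hypothetical field value: many short addresses joined by spaces.
    static String sampleAddressList(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("user").append(i % 10).append("@example.com ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = sampleAddressList(20000);
        System.out.println("whole value, UTF-8 bytes: " + utf8Length(value));
        System.out.println("fits as a single term:    " + fitsAsSingleTerm(value));
        System.out.println("longest token, bytes:     " + longestTokenBytes(value));
    }
}
```

The whole value is far over the limit when treated as one term, while every individual token is tiny, which is why a tokenized field rarely runs into this check.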

Do you have a real-world use-case where you have a 115K term
that can _only_ be matched by searching for exactly that
sequence of 115K characters? Not substrings. Not wildcards. In other
words, a "string" type (as opposed to anything based on solr.TextField).

As far as the error message is concerned, that does seem somewhat opaque.
Care to raise a JIRA on it (and, if you're really ambitious, attach a patch)?


On Thu, Aug 4, 2016 at 8:20 PM, Trejkaz wrote:
> Trying to add a document, someone saw:
>     java.lang.IllegalArgumentException: Document contains at least one
> immense term in field="bcc-address" (whose UTF8 encoding is longer
> than the max length 32766), all of which were skipped.  Please correct
> the analyzer to not produce such terms.  The prefix of the first
> immense term is: '[00, --omitted--]...', original message: bytes can
> be at most 32766 in length; got 115597
> Question 1: It says the bytes are being skipped, but to me "skipped"
> means it's just going to continue, yet I get this exception. Is that
> intentional?
> Question 2: Can we turn this check off?
> Question 2.1: Why limit in the first place? Every time I have ever
> seen someone introduce a limit, it has only been a matter of time
> until someone hits it, no matter how improbable it seemed when it was
> put in.
> TX
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
