lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Getting a MaxBytesLengthExceededException for a TextField
Date Fri, 25 Oct 2019 13:29:41 GMT
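Text-based fields indeed do not have that limit for the _entire_ field. They _do_ have that
limit (32766 bytes of UTF-8) for any single token the analyzer produces. Your
KeywordTokenizerFactory emits each field value as one token, so the 242905-byte value hits
it. Likewise, if your field contains, say, a base-64 encoded image that is not broken up
into smaller tokens, you’ll still get this error.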
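
As a rough sketch of one way around it, assuming you can live without the oversized
values being searchable: add a length filter to the index analyzer so that no token
over the limit ever reaches the index. solr.LengthFilterFactory is a stock Solr filter,
but the cutoff of 10000 is my own assumption, not a value from your setup. The filter
counts characters while the limit counts UTF-8 bytes, and a character takes at most
3 bytes, so 10000 characters stays under 32766 bytes.

```xml
<fieldType name="case_insensitive_text" class="solr.TextField"
           multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumed cutoff: drop any token longer than 10000 characters so it
         never reaches the 32766-byte term limit. -->
    <filter class="solr.LengthFilterFactory" min="0" max="10000"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With stored="true" the original value is still stored and returned; only the indexed
term is dropped. If you would rather keep a searchable prefix than drop the value,
solr.TruncateTokenFilterFactory with a prefixLength attribute should work in the same
position. Either way you need to reindex after changing the schema.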

Best,
Erick

> On Oct 25, 2019, at 4:28 AM, Marko Ćurlin <marko.curlin@reversinglabs.com> wrote:
> 
> Hi everyone,
> 
> I am getting an
> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException, while
> trying to insert a list with 9 elements, of which one is 242905 bytes long,
> into Solr.  I am aware that StrField has a hard limit of slightly less than
> 32k. I am using a TextField that by my understanding hasn't got such a
> limit, as tested here
> <https://stackoverflow.com/questions/32936361/in-solr-what-is-the-maximum-size-of-a-text-field>
> (taking into consideration that the field wasn't multivalued). So I'm
> wondering what is the correlation here, and how could it be solved? Below I
> have the error and the relevant part of the solr managed_schema. I am still
> new to Solr so take into account that there could be something obvious I am
> missing.
> 
> ERROR:
> 
> "error":{
>    "metadata":[
>      "error-class","org.apache.solr.common.SolrException",
>      "root-error-class","org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException",
>      "error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
>      "root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
>    "msg":"Async exception during distributed update: Error from
> server at http://solr-host:8983/solr/search_collection_xx: Bad Request
> \n\n request: http://solr-host:8983/solr/search_collection_xx \n\n
> Remote error message: Exception writing document id <document_id> to
> the index; possible analysis error: Document contains at least one
> immense term in field=\"text_field_name\" (whose UTF8 encoding is
> longer than the max length 32766), all of which were skipped.  Please
> correct the analyzer to not produce such terms.  The prefix of the
> first immense term is: '[115, 97, 115, 109, 101, 45, 100, 97, 109,
> 101, 46, 99, 111, 109, 47, 108, 121, 99, 107, 97, 47, 37, 50, 50, 37,
> 50, 48, 109, 101, 116]...', original message: bytes can be at most
> 32766 in length; got 242905. Perhaps the document has an indexed
> string field (solr.StrField) which is too large",
>    "code":400}
> }
> 
> relevant managed_schema:
> 
>    <dynamicField name="text_field_*"  indexed="true" stored="true"
> multiValued="true" type="case_insensitive_text" />
> 
>    <fieldType name="case_insensitive_text" class="solr.TextField"
> multiValued="false">
>      <analyzer type="index">
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> 
> Best regards,
> Marko



