lucene-solr-user mailing list archives

From Emir Arnautovic <emir.arnauto...@sematext.com>
Subject Re: SOLR 4.10.4 - error creating document
Date Mon, 11 May 2015 13:25:30 GMT
Hi Bernd,
The dcdescription field is not indexed, so it produces no terms and the term length limit does not apply to it.
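
As a rough illustration, the byte prefix quoted in the exception can be decoded to see what kind of value is landing in f_dcperson. This is a minimal Python sketch that uses only the numbers from the error message; the helper name `utf8_len` is made up for illustration.

```python
# Byte prefix copied from the "immense term" message for field f_dcperson.
prefix = [102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115,
          101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32,
          50, 48]

MAX_TERM_BYTES = 32766  # limit quoted in the exception
REPORTED_BYTES = 38177  # size reported in the exception


def utf8_len(value):
    """Length of a string once encoded as UTF-8 (hypothetical helper)."""
    return len(value.encode("utf-8"))


# Turn the numeric prefix back into readable text.
decoded = bytes(prefix).decode("utf-8")
print(repr(decoded))

# How far over the limit the offending value was.
print(REPORTED_BYTES - MAX_TERM_BYTES)
```

The decoded prefix reads like free text rather than a person's name, which would be consistent with a description-like value arriving through dccreator or dccontributor.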

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 11.05.2015 15:22, Bernd Fehling wrote:
> Hi Emir,
>
> the dcdescription field is definitely too big.
> But why is it complaining about f_dcperson and not dcdescription?
>
> Regards
> Bernd
>
>
> Am 11.05.2015 um 15:12 schrieb Emir Arnautovic:
>> Hi Bernd,
>> The issue is with f_dcperson and what ends up in that field. It is configured as a string,
>> which means it is not tokenized, so if some huge value is in either dccreator or
>> dccontributor it will end up as a single term. The names suggest that these fields should
>> not contain such values, but double-check in your import code whether you are reading the
>> wrong column, concatenating contributors, or doing something else that makes the value
>> too big. Also check whether you have some copyField that should not be there.
>>
>> Thanks,
>> Emir
>>
>>
>> On 11.05.2015 14:13, Bernd Fehling wrote:
>>> I'm getting the following error with 4.10.4
>>>
>>> WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document:
>>> SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
>>> ....
>>> ..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
>>> org.apache.solr.common.SolrException: Exception writing document
>>> id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible analysis error.
>>>           at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
>>>           at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>>>           at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>>> ...
>>>           at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>>>           at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>>> Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term
>>> in field="f_dcperson" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
>>> Please correct the analyzer to not produce such terms.  The prefix of the first immense
>>> term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114,
>>> 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
>>> bytes can be at most 32766 in length; got 38177
>>>           at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
>>> ...
>>>
>>>
>>> My huge field is dcdescription, with the following schema:
>>>
>>>      <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
>>>      <field name="dcdescription" type="string" indexed="false" stored="true" />
>>>      <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
>>> ...
>>>     <copyField source="dccreator" dest="f_dcperson" />
>>>     <copyField source="dccontributor" dest="f_dcperson" />
>>>
>>>
>>> I guess I also have to make dcdescription multiValued="true"?
>>>
>>> But why is it complaining about f_dcperson, which is already multiValued?
>>>
>>> Second guess: dcdescription is not multiValued, but filled to the max (32766).
>>> Then it is UTF-8 encoded and goes beyond 32766, which is larger than a single subfield
>>> of a multiValued field, and therefore the error?
>>>
>>> Is there any real explanation for this, and how can I prevent it?
>>>
>>> Regards
>>> Bernd
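
One way to catch such values before they reach Solr is to measure the UTF-8 byte length of every value headed for an untokenized field. The sketch below is a hedged illustration in Python, not part of the data import handler: the document layout (a dict mapping field names to lists of values), the sample values, and the function name `find_immense_values` are all assumptions.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Lucene exception


def find_immense_values(doc, fields, limit=MAX_TERM_BYTES):
    """Return (field, index, byte_length) for every value over the limit.

    Each value of a multiValued string field becomes its own term, so the
    check is applied per value, not per field.
    """
    hits = []
    for field in fields:
        for i, value in enumerate(doc.get(field, [])):
            n = len(value.encode("utf-8"))  # bytes, not characters
            if n > limit:
                hits.append((field, i, n))
    return hits


# Made-up sample document: one short name, one value of the size reported
# in the exception, and one value that is under 32766 characters but over
# 32766 bytes once encoded as UTF-8.
doc = {
    "dccreator": ["Browning, R."],
    "dccontributor": ["x" * 38177, "ä" * 20000],
}

oversized = find_immense_values(doc, ["dccreator", "dccontributor"])
print(oversized)
```

Checking bytes rather than characters matters because non-ASCII text can exceed the limit even when its character count looks safe, and being multiValued does not help when a single value is too large.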

