lucene-solr-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: SOLR 4.10.4 - error creating document
Date Mon, 11 May 2015 13:30:45 GMT
Hi Emir,

ahhh, yes you're right. I missed that. Now I understand why it is not
complaining about dcdescription and the error shows up on f_dcperson.
"delay of error" ;-)

Thanks
Bernd
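
To illustrate the point above, here is a minimal sketch in Python (not Solr code) of why the error is reported for f_dcperson and never for dcdescription. It assumes, as a simplification, that only indexed string fields turn each value into a single term that is checked against the 32766-byte limit from the error message. The field names and copyField rules come from the schema quoted below; the helper names `apply_copy_fields` and `find_immense_terms` and the sample document are hypothetical, and dccontributor's own field definition is not shown in the thread, so it is not treated as indexed here.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Lucene error message

# Simplified view of the quoted schema: indexed string fields only.
# dcdescription is indexed="false", so it is deliberately absent.
INDEXED_STRING_FIELDS = {"dccreator", "f_dcperson"}
COPY_FIELDS = [("dccreator", "f_dcperson"), ("dccontributor", "f_dcperson")]


def apply_copy_fields(doc):
    """Return a copy of doc with copyField sources appended to their destinations."""
    expanded = {name: list(values) for name, values in doc.items()}
    for source, dest in COPY_FIELDS:
        expanded.setdefault(dest, []).extend(doc.get(source, []))
    return expanded


def find_immense_terms(doc):
    """List (field, utf8_byte_length) for indexed string values over the limit."""
    problems = []
    expanded = apply_copy_fields(doc)
    for field in sorted(INDEXED_STRING_FIELDS):
        for value in expanded.get(field, []):
            size = len(value.encode("utf-8"))  # the limit is in bytes, not characters
            if size > MAX_TERM_BYTES:
                problems.append((field, size))
    return problems


# Hypothetical document: a huge description and a huge contributor value.
doc = {
    "dcdescription": ["d" * 50000],
    "dccontributor": ["c" * 38177],
}
print(find_immense_terms(doc))  # [('f_dcperson', 38177)]
```

The oversized dcdescription passes unnoticed because it is never turned into a term, while the contributor value only becomes a problem once copyField places it in the indexed f_dcperson field. A check like this could be run in the import code before documents are sent to Solr.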



On 11.05.2015 15:25, Emir Arnautovic wrote:
> Hi Bernd,
> the dcdescription field is not indexed, so no term is created for it and the
> term length limit does not apply to it.
> 
> Thanks,
> Emir
> 
> -- 
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On 11.05.2015 15:22, Bernd Fehling wrote:
>> Hi Emir,
>>
>> the dcdescription field is definitely too big.
>> But why is it complaining about f_dcperson and not dcdescription?
>>
>> Regards
>> Bernd
>>
>>
>> On 11.05.2015 15:12, Emir Arnautovic wrote:
>>> Hi Bernd,
>>> The issue is with f_dcperson and what ends up in that field. It is configured as a
>>> string, which means it is not tokenized, so a huge value in either dccreator or
>>> dccontributor ends up as a single term. The names suggest that these fields should
>>> not contain such values, so double-check your import code to see whether you are
>>> reading the wrong column, concatenating contributors, or doing something else that
>>> makes the value too big. Also check whether you have a copyField that should not
>>> be there.
>>>
>>> Thanks,
>>> Emir
>>>
>>>
>>> On 11.05.2015 14:13, Bernd Fehling wrote:
>>>> I'm getting the following error with 4.10.4
>>>>
>>>> WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document :
>>>> SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
>>>> ....
>>>> ..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
>>>> org.apache.solr.common.SolrException: Exception writing document
>>>> id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible analysis error.
>>>>           at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
>>>>           at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>>>>           at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>>>> ...
>>>>           at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>>>>           at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>>>> Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term
>>>> in field="f_dcperson" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
>>>> Please correct the analyzer to not produce such terms.  The prefix of the first immense
>>>> term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114,
>>>> 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
>>>> bytes can be at most 32766 in length; got 38177
>>>>           at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
>>>> ...
>>>>
>>>>
>>>> My huge field is dcdescription, with the following schema:
>>>>
>>>>      <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
>>>>      <field name="dcdescription" type="string" indexed="false" stored="true" />
>>>>      <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
>>>> ...
>>>>     <copyField source="dccreator" dest="f_dcperson" />
>>>>     <copyField source="dccontributor" dest="f_dcperson" />
>>>>
>>>>
>>>> I guess I also have to make dcdescription multiValued="true"?
>>>>
>>>> But why is it complaining about f_dcperson, which is already multiValued?
>>>>
>>>> Second guess: dcdescription is not multiValued, but filled to the max (32766).
>>>> Then it is UTF8 encoded and goes beyond 32766, which is larger than a single
>>>> subfield of a multiValued field, and therefore the error?
>>>>
>>>> Is there a real explanation for this, and how can I prevent it?
>>>>
>>>> Regards
>>>> Bernd
> 
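
The byte prefix printed in the error message can help track down which source value ended up in f_dcperson. A small sketch, assuming only that the listed numbers are UTF-8 bytes as the message states; the variable names are illustrative:

```python
# Prefix of the first immense term, copied from the error message above.
prefix = [
    102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115,
    101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48,
]

# errors="replace" guards against a prefix that cuts a multi-byte character in half.
text = bytes(prefix).decode("utf-8", errors="replace")
print(repr(text))  # 'for a review see Browning   20'
```

The decoded text reads like free-form prose rather than a person's name, which fits the suggestion to check the import code for a wrong column or an unintended concatenation feeding dccreator or dccontributor.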
