lucene-solr-user mailing list archives

From Aaron Daubman <daub...@gmail.com>
Subject Re: How to more gracefully handle field format exceptions?
Date Tue, 25 Sep 2012 01:26:53 GMT
Hi Otis,

I was just looking at how to implement that, but was hoping for a
cleaner method - it seems like I will have to actually parse the error
as text to find the field that caused it, then remove/mangle that
field and attempt re-adding the document - which seems less than
ideal.
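
For reference, the text-parsing step could be sketched roughly as below. This is a minimal sketch, assuming the message always has the form seen in the log ("Error adding field '<name>'='<value>'"); the class name FieldErrorParser and the method extractField are hypothetical and not part of Solr.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Solr): pulls the offending field name
// out of an error message shaped like the one in the log.
class FieldErrorParser {
    // Assumption: the message contains "Error adding field '<name>'=".
    private static final Pattern BAD_FIELD =
            Pattern.compile("Error adding field '([^']+)'=");

    // Returns the field name if the message matches, otherwise empty.
    static Optional<String> extractField(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = BAD_FIELD.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The client would then catch the exception thrown by add(), pass its message to extractField, call removeField on the SolrInputDocument with the returned name, and retry. The message may be wrapped in another exception, so the cause chain might need to be walked first.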

I would think there would be a flag or an easy way to override the add
method that would just drop (or set to default value) any field that
didn't meet expectations.
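
Lacking such a flag, the same effect can be approximated on the client by validating fields before the add. The following is only a sketch under assumptions: the document is modeled as a plain Map rather than a SolrInputDocument, the set of float field names is supplied by the caller, and FloatFieldSanitizer is a hypothetical name. It uses Float.parseFloat, the same call that fails in the stack trace.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side filter (not a Solr API): drops float fields
// whose values would fail Float.parseFloat, keeping the rest of the doc.
class FloatFieldSanitizer {
    // Returns a copy of 'fields' without the unparseable float fields.
    static Map<String, Object> dropUnparseableFloats(
            Map<String, Object> fields, Set<String> floatFields) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (floatFields.contains(e.getKey()) && !isFloat(e.getValue())) {
                continue; // skip the malformed field, keep the document
            }
            cleaned.put(e.getKey(), e.getValue());
        }
        return cleaned;
    }

    // True if the value is a number or a string that parses as a float.
    static boolean isFloat(Object value) {
        if (value instanceof Number) {
            return true;
        }
        if (value == null) {
            return false;
        }
        try {
            Float.parseFloat(value.toString());
            return true;
        } catch (NumberFormatException ex) {
            return false; // e.g. the empty string from the log
        }
    }
}
```

Substituting a default value instead of dropping the field would only require replacing the continue with a put of that default.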

Thanks for the suggestion,
     Aaron

On Mon, Sep 24, 2012 at 9:24 PM, Otis Gospodnetic
<otis.gospodnetic@gmail.com> wrote:
> Hi Aaron,
>
> You could catch the error on the client, fix/clean/remove, and retry, no?
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman <daubman@gmail.com> wrote:
>> Greetings,
>>
>> Is there a way to configure more graceful handling of field formatting
>> exceptions when indexing documents?
>>
>> Currently, a field in some of the documents I am indexing is supposed
>> to be a float but sometimes slips through as an empty string. (I know,
>> fix the docs, but bad values occasionally get through, and it would be
>> nice to handle them in a more forgiving manner.)
>>
>> Here's an example of the exception - when this happens, the entire doc
>> is thrown out due to the one malformed field:
>> ---snip---
>> ERROR org.apache.solr.core.SolrCore -
>> org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error
>> adding field 'f_floatfield'=''
>> ...
>> Caused by: java.lang.NumberFormatException: empty String
>>
>> 00:56:46,288 [SI] WARN  com.company.IndexerThread - BAD DOC:
>> a82a2f6a6a42ad3c98a05ddb3f2c382c
>> 01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore -
>> org.apache.solr.common.SolrException: ERROR:
>> [doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field
>> 'f_afloatfield'=''
>>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
>>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>>         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
>>         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>>         at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
>>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
>>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
>>         at com.company.IndexerThread.run(IndexerThread.java:55)
>>         at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.NumberFormatException: empty String
>>         at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
>>         at java.lang.Float.parseFloat(Float.java:452)
>>         at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
>>         at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
>>         at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
>>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
>>         ... 12 more
>>
>> 01:02:12,713 [SI] WARN  com.company.IndexerThread - BAD DOC:
>> 6ff90020f9ec0f6dd623e9879c3e024d
>> ---snip---
>>
>> In my thinking (and for this situation), it would be much better to
>> just ignore the malformed field and keep the doc - is there any way to
>> configure this or enable this behavior instead?
>>
>> Thanks,
>>      Aaron
