lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
Date Thu, 07 Jun 2012 15:03:23 GMT

    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291055#comment-13291055 ]

Yonik Seeley commented on SOLR-445:
-----------------------------------

I imagine a maxErrors parameter might be useful (and more readable than abortOnFirstBatchIndexError).

 - maxErrors=0: the current behavior; stop processing further updates when we hit an error
 - maxErrors=10: allow up to 10 documents to fail before aborting the update. Useful for true
bulk uploading, where you want to tolerate an isolated failure or two but still want to stop
if every single update is failing because something is configured wrong.
 - maxErrors=-1: allow an unlimited number of documents to fail
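To make those semantics concrete, here's a rough standalone sketch of how a maxErrors counter could drive the abort decision. Note this is not Solr code: the document list, the "BAD" failure check, and all names are invented for illustration.

```java
// Minimal standalone sketch (NOT actual Solr code) of the proposed maxErrors
// semantics: maxErrors=0 aborts on the first failure, a positive value
// tolerates that many failures, and -1 means unlimited.
import java.util.List;

public class MaxErrorsSketch {

    /** Returns the number of docs successfully added before any abort. */
    static int process(List<String> docs, int maxErrors) {
        int errors = 0;
        int added = 0;
        for (String doc : docs) {
            boolean ok = !doc.contains("BAD"); // stand-in for a real index attempt
            if (ok) {
                added++;
            } else {
                errors++;
                // maxErrors < 0 means unlimited, so never abort in that case
                if (maxErrors >= 0 && errors > maxErrors) {
                    break; // abort the rest of the batch
                }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("doc1", "BAD_doc2", "doc3");
        System.out.println(process(batch, 0));  // prints 1: aborts after the bad doc
        System.out.println(process(batch, 10)); // prints 2: tolerates the failure
        System.out.println(process(batch, -1)); // prints 2: unlimited tolerance
    }
}
```

With maxErrors=0 the behavior matches what Solr does today (doc 1 is added, then the batch aborts); with any tolerance the remaining docs still get indexed.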

Making updates transactional seems really tough in cloud mode since we don't keep old versions
of documents around... although it might be possible for a short time with the transaction
log.  Anyway, that should definitely be a separate issue.

A couple of other notes:
 - structured error responses were recently added in 4.0-dev that should make this issue easier
in general.  Example:
{code}
{"responseHeader":{"status":400,"QTime":0},"error":{"msg":"ERROR: [doc=mydoc] unknown field
'foo'","code":400}}
{code}
 - Per did some error handling work that's included in a patch attached to SOLR-3178
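Building on that structured format, a tolerant update could conceivably report every failed document in a single response. Something like the following, though this shape is purely hypothetical and just for illustration; nothing like it exists in Solr today:
{code}
{"responseHeader":{"status":400,"QTime":5},
 "errors":[
   {"doc":"mydoc2","code":400,"msg":"unknown field 'foo'"},
   {"doc":"mydoc7","code":400,"msg":"invalid date"}
 ]}
{code}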
                
> Update Handlers abort with bad documents
> ----------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Will Johnson
>             Fix For: 4.1
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures mid-batch?  E.g.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts.  It would seem it should either fail
> the entire batch, or log a message / return a code and then continue on to add doc 3.
> Option 1 seems much harder to accomplish and might require more memory, while Option 2
> would require more information to come back from the API.  I'm about to dig into this,
> but I thought I'd ask to see if anyone has suggestions, thoughts or comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
