lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
Date Mon, 18 Apr 2011 00:32:06 GMT

     [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:
--------------------------------

    Attachment: SOLR-445.patch

So, Grant. How do you feel about refactorings <G>?

I got bitten by this problem again, so I dusted off the patch and re-created it. This version
shouldn't have the gratuitous reformatting. After I added the bookkeeping, though, the method
got even more unwieldy, so I extracted some of the code into methods in XMLLoader.
I also have the un-refactored version if this one is too painful.

This patch incorporates the changes you suggested months ago. I'm a little uncertain whether
UpdateParams.java was the correct place for the new constant, but that seemed to be the pattern
used for other parameters.

One minor issue: the behavior is unchanged if you don't start the packet with <add>; an NPE
is thrown. That's because the addCmd variable isn't initialized until the <add> tag is
encountered, and the NPE results from using addCmd later (I think I was seeing it at
line 118). I think it would be better to fail explicitly when the first element isn't an
<add> element, rather than failing because it just happens to cause an NPE.

While I'm at it, though, what do you think about making this robust enough to ignore ?xml
and/or !DOCTYPE entries? Or is that just not worth the bother?
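
The two points above can be sketched together. This is only an illustration, not the code in
the attached patch: it assumes a StAX XMLStreamReader, and the class and method names
(UpdateRootCheck, firstElementName, requireAddRoot) are made up for the example. StAX does not
report the ?xml declaration as an event from next(), and a !DOCTYPE arrives as a DTD event, so
a loop that waits for the first START_ELEMENT passes over both. Other top-level update
commands are out of scope here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr: checks the first element of an update packet.
final class UpdateRootCheck {

    // Returns the local name of the first element, skipping the XML declaration,
    // DOCTYPE, comments, and whitespace that may precede it.
    static String firstElementName(String xml) {
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    return parser.getLocalName();
                }
            }
            return null; // no element found
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML: " + e.getMessage(), e);
        }
    }

    // Fails with a descriptive message instead of a later NullPointerException.
    static void requireAddRoot(String xml) {
        String name = firstElementName(xml);
        if (!"add".equals(name)) {
            throw new IllegalArgumentException(
                "Expected <add> as the first element but found "
                    + (name == null ? "no element" : "<" + name + ">"));
        }
    }
}
```

With a check like this, a packet that starts with <doc> would be rejected with a message
naming the unexpected element, while a packet with a leading ?xml declaration or !DOCTYPE
would still be accepted.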

Erick

> XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
> --------------------------------------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Will Johnson
>            Assignee: Grant Ingersoll
>             Fix For: Next
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures mid-batch? For example:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts. It seems like it should either (1) fail
> the entire batch, or (2) log a message/return a code and then continue on to add doc 3.
> Option 1 would seem to be much harder to accomplish and might require more memory, while
> Option 2 would require more information to come back from the API. I'm about to dig into
> this, but I thought I'd ask whether anyone has suggestions, thoughts, or comments.
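
Option 2 from the description above (record the failure and keep going) might look roughly
like the following. This is a sketch under assumptions, not Solr's update code: documents are
modeled as plain maps, TolerantBatch, Result, and validate are hypothetical names, and date
parsing with java.time stands in for whatever field validation the real indexing path does.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "continue past bad documents and report them".
final class TolerantBatch {

    // Per-batch bookkeeping: which ids were added and which failed, with the reason.
    static final class Result {
        final List<String> added = new ArrayList<>();
        final Map<String, String> failed = new LinkedHashMap<>();
    }

    static Result addAll(List<Map<String, String>> docs) {
        Result result = new Result();
        for (Map<String, String> doc : docs) {
            String id = doc.get("id");
            try {
                validate(doc);          // stand-in for the real add step
                result.added.add(id);
            } catch (IllegalArgumentException e) {
                // Record the failure and move on to the next document.
                result.failed.put(id, e.getMessage());
            }
        }
        return result;
    }

    // Assumed validation rule: myDateField, if present, must be an ISO date.
    static void validate(Map<String, String> doc) {
        String date = doc.get("myDateField");
        if (date == null) {
            return;
        }
        try {
            LocalDate.parse(date);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Invalid date '" + date + "'", e);
        }
    }
}
```

For the three-document batch in the description, this approach would add docs 1 and 3 and
report doc 2 as failed, which is the extra information Option 2 needs the API to return.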

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
