lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2
Date Thu, 11 Feb 2016 18:18:25 GMT
I first wrote the “fall back to one at a time” code for Solr 1.3.

It is pretty easy if you plan for it. Make the batch size variable. When a batch fails, retry
that particular batch with a batch size of 1. Then keep going or fail; either way, you
have good logging on which document failed.
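
A minimal sketch of that "fall back to one at a time" loop, in plain Java. The names BatchFallback, BatchSender and sendAll are hypothetical and not part of SolrJ; BatchSender stands in for whatever call actually sends documents (for example a wrapper around solrClient.add), and the sketch assumes a failed batch surfaces as an exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BatchFallback {

    /** Hypothetical stand-in for the real indexing call, e.g. a wrapper around solrClient.add(batch). */
    public interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Sends items in batches of batchSize. When a batch fails, only that batch
     * is retried one item at a time, so each failing item can be logged.
     * Returns the items that still failed when sent individually.
     */
    public static <T> List<T> sendAll(List<T> items, int batchSize, BatchSender<T> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<T> failed = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            List<T> batch = items.subList(start, end);
            try {
                sender.send(batch);
            } catch (Exception batchError) {
                // The batch failed as a whole: retry it with a batch size of 1.
                for (T item : batch) {
                    try {
                        sender.send(Collections.singletonList(item));
                    } catch (Exception itemError) {
                        // This is the point where the failing item is known.
                        System.err.println("Failed to add " + item + ": " + itemError.getMessage());
                        failed.add(item);
                    }
                }
            }
        }
        return failed;
    }
}
```

The sketch assumes that resending an item is harmless. As the quoted messages below point out, Solr may already have added the other documents of a failed batch, so with version constraints in place a retry can report errors for those documents as well; the caller would need to tell those apart from the original failure.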

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 11, 2016, at 10:06 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> Steven's solution is a very common one, right down to the
> notion of re-chunking. Depending on the throughput requirements,
> simply resending the offending packet's documents one at a time is
> often sufficient (but not _efficient_). I can imagine fallback scenarios
> like "try chunking 100 at a time, for those chunks that fail
> do 10 at a time and for those do 1 at a time".
> 
> That said, in a lot of situations, the number of failures is low
> enough that just falling back to one at a time, while not elegant,
> is sufficient.
> 
> It sure will be nice to have SOLR-445 done, if we can just keep
> Hoss from going crazy before he gets done.
> 
> Best,
> Erick
> 
> On Thu, Feb 11, 2016 at 7:39 AM, Steven White <swhite4141@gmail.com> wrote:
>> For my application, the solution I implemented is to log the chunk that
>> failed into a file.  This file is then post-processed one record at a
>> time.  The ones that fail are reported to the admin and never looked at
>> again until the admin takes action.  This is not the most efficient
>> solution right now, but I intend to refactor this code so that the failed
>> chunk is itself re-processed in smaller chunks until the chunk with the
>> failed record(s) is down to a 1-record "chunk" that will fail.
>> 
>> Like Debraj, I would love to hear from others how they handle such failures.
>> 
>> Steve
>> 
>> 
>> On Thu, Feb 11, 2016 at 2:29 AM, Debraj Manna <subharaj.manna@gmail.com>
>> wrote:
>> 
>>> Thanks Erick. How do people handle this scenario? Right now the only option
>>> I can think of is to replay the entire batch by doing an add for every single
>>> doc. But this will give me an error for all the docs which already got added
>>> from the batch.
>>> 
>>> On Tue, Feb 9, 2016 at 10:57 PM, Erick Erickson <erickerickson@gmail.com>
>>> wrote:
>>> 
>>>> This has been a long-standing issue. Hoss is doing some current work on
>>>> it, see:
>>>> https://issues.apache.org/jira/browse/SOLR-445
>>>> 
>>>> But the short form is "no, not yet".
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Tue, Feb 9, 2016 at 8:19 AM, Debraj Manna <subharaj.manna@gmail.com>
>>>> wrote:
>>>>> Hi,
>>>>> 
>>>>> 
>>>>> 
>>>>> I have a Document Centric Versioning Constraints added in solr schema:-
>>>>> 
>>>>> <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
>>>>>  <bool name="ignoreOldUpdates">false</bool>
>>>>>  <str name="versionField">doc_version</str>
>>>>> </processor>
>>>>> 
>>>>> I am adding multiple documents in solr in a single call using SolrJ 5.2.
>>>>> The code fragment looks something like below :-
>>>>> 
>>>>> 
>>>>> try {
>>>>>     // The second argument (500) is commitWithinMs.
>>>>>     UpdateResponse resp = solrClient.add(docs.getDocCollection(), 500);
>>>>>     if (resp.getStatus() != 0) {
>>>>>         throw new Exception("Failed to add docs in solr " + resp);
>>>>>     }
>>>>> } catch (Exception e) {
>>>>>     logError("Adding docs to solr failed", e);
>>>>> }
>>>>> 
>>>>> 
>>>>> If one of the documents violates the versioning constraints, then Solr
>>>>> returns an exception with an error message like "user version is not
>>>>> high enough: 1454587156" and the other documents are added
>>>>> successfully. Is there a way I can know which document is violating
>>>>> the constraints, either in the Solr logs or from the update response
>>>>> returned by Solr?
>>>>> 
>>>>> Thanks
>>>> 
>>> 

