lucene-solr-user mailing list archives

From Anurag Sharma <anura...@gmail.com>
Subject Re: Solr exceptions during batch indexing
Date Sat, 08 Nov 2014 10:51:07 GMT
Just trying to understand: what's the challenge in returning the bad doc
id(s)?
Solr already knows which doc(s) failed on update and can return their id(s)
in the response or a callback. Can we have a JIRA ticket for it if one
doesn't exist?

This looks like a common use case, and every Solr consumer might be writing
their own version to handle this issue.

On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood <wunder@wunderwood.org>
wrote:

> Right, that is why we batch.
>
> When a batch of 1000 fails, drop to a batch size of 1 and start the batch
> over. Then it can report the exact document with problems.
>
> If you want to continue, go back to the bigger batch size. I usually fail
> the whole batch on one error.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Nov 7, 2014, at 11:44 AM, Peter Keegan <peterlkeegan@gmail.com> wrote:
>
> > I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
> > single thread, so it's certainly worth it.
> >
> > Thanks,
> > Peter
> >
> >
> > On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> >> And Walter has also been around for a _long_ time ;)
> >>
> >> (sorry, couldn't resist)....
> >>
> >> Erick
> >>
> >> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <wunder@wunderwood.org>
> >> wrote:
> >>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
> >>>
> >>> It isn’t too hard if the code is structured for it; retry with a batch
> >>> size of 1.
> >>>
> >>> wunder
> >>>
> >>> On Nov 7, 2014, at 11:01 AM, Erick Erickson <erickerickson@gmail.com>
> >>> wrote:
> >>>
> >>>> Yeah, this has been an ongoing issue for a _long_ time. Basically,
> >>>> you can't. So far, people have essentially written fallback logic to
> >>>> index the docs of a failing packet one at a time and report it.
> >>>>
> >>>> I'd really like better reporting back, but we haven't gotten there
> >>>> yet.
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan <peterlkeegan@gmail.com>
> >>>> wrote:
> >>>>> How are folks handling Solr exceptions that occur during batch
> >>>>> indexing? Solr stops parsing the docs stream when an error occurs
> >>>>> (e.g. a doc with a missing mandatory field), and stops indexing the
> >>>>> batch. The bad document is not identified, so it would be hard for
> >>>>> the client to recover by skipping over it.
> >>>>>
> >>>>> Peter
> >>>
> >>
>
>
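
The fallback described in the thread (send large batches, and when one fails,
resend that batch one document at a time to find the bad document) could be
sketched roughly as follows. This is only an illustration, not anyone's actual
code from the thread: `index_with_fallback` and the `send_batch` callable are
hypothetical names, and `send_batch` stands in for whatever client call posts
documents to Solr and raises on error. It also assumes that resending a
document that was already accepted is harmless.

```python
from typing import Callable, List, Sequence, Tuple

Doc = dict


def index_with_fallback(
    docs: Sequence[Doc],
    send_batch: Callable[[Sequence[Doc]], None],  # hypothetical: raises on error
    batch_size: int = 1000,
) -> List[Tuple[Doc, Exception]]:
    """Index docs in batches; retry a failed batch one document at a time.

    Returns the documents that still failed when sent alone, each paired
    with the exception it raised.
    """
    failed: List[Tuple[Doc, Exception]] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send_batch(batch)
        except Exception:
            # The batch failed somewhere; drop to a batch size of 1 so the
            # exact failing document(s) can be identified.
            for doc in batch:
                try:
                    send_batch([doc])
                except Exception as exc:
                    failed.append((doc, exc))
        # The next iteration goes back to the bigger batch size.
    return failed
```

This variant skips bad documents and keeps going. To fail the whole batch on
the first error instead, as Walter mentions, the inner loop could re-raise
after recording the first failing document.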
