accumulo-dev mailing list archives

From Josh Elser
Subject Re: BatchWriter Improvements - An end user's perspective
Date Tue, 30 Aug 2016 22:21:07 GMT

Keith Turner wrote:
>> >  Assuming batches were isolated from each other, and all batch/mutation
>> >  flushes were controlled and done once per batch, is it difficult because
>> >  the writes could be going to different tablet servers? Couldn't we keep
>> >  track of which failed and have a choice of having a configurable internal
>> >  retry (transient errors) or return the subset of mutations which failed and
>> >  leave it up to the caller? This could work for us. We might need some
>> >  guarantees for a given row on the same server though - would have to think
>> >  about that.
> The batch writer does retry on network errors (until timeout is
> reached, which defaults to max long or int).  I think the only things
> that percolate up to the users are unexpected exceptions in the batch
> writer, tserver, or constraint violations.  Are you interested in
> knowing what mutations failed because of a timeout?   I don't think
> this can be done w/o introducing a more expensive multi-step
> protocol for writing data.   Currently when the batch writer sends
> data it's possible that the tserver received it and wrote it, but could
> not report success to the client.   The client may either timeout or
> send the data again.
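
To make the ambiguity Keith describes concrete, here is a minimal sketch in plain Java. It is not Accumulo code: `FakeServer`, `acksToDrop`, and `sendWithRetry` are hypothetical names made up for illustration. The fake server always applies the write but can "lose" the reply, so the client either retries (and the data lands more than once) or gives up without knowing whether the write landed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client talking to a tablet server; not Accumulo code.
class LostAckSketch {
    static class FakeServer {
        final List<String> applied = new ArrayList<>();
        int acksToDrop; // number of replies that get "lost" on the way back

        FakeServer(int acksToDrop) {
            this.acksToDrop = acksToDrop;
        }

        // Always applies the write; returns false when the reply is lost.
        boolean write(String mutation) {
            applied.add(mutation);
            if (acksToDrop > 0) {
                acksToDrop--;
                return false; // client sees a timeout even though the write landed
            }
            return true;
        }
    }

    // Client-side retry loop: resend until an ack arrives or attempts run out.
    static boolean sendWithRetry(FakeServer server, String mutation, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (server.write(mutation)) {
                return true;
            }
        }
        return false; // gave up: the client cannot tell whether the write landed
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer(1);
        boolean acked = sendWithRetry(server, "row1:cf:cq=v", 3);
        System.out.println("acked=" + acked + ", copies applied=" + server.applied.size());
    }
}
```

In this toy model, one dropped reply means the same mutation is applied twice before the client sees success. Reporting exactly which mutations "failed" on timeout would need the client and server to agree on what was applied, which is the extra protocol step mentioned above.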

It's trickier because, server-side, we're also doing group commits to the
WAL. Your update session (started by the BatchWriter) will make some
updates to the WAL and block until those are sync'ed. That sync may also
cover updates to the WAL other than your own.
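
A toy model of that group-commit behavior, in plain Java, might look like the following. It is only a sketch under assumptions: `FakeWal`, `SyncResult`, and the session names are hypothetical, and the real tserver logging code is far more involved. The point is that one sync covers entries from several sessions, so a failed sync cannot be pinned on a single session's mutations.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a group-committed log; not Accumulo code.
class GroupCommitSketch {
    static class SyncResult {
        final boolean ok;
        final Set<String> sessions; // every session with an entry in this sync

        SyncResult(boolean ok, Set<String> sessions) {
            this.ok = ok;
            this.sessions = sessions;
        }
    }

    static class FakeWal {
        private final List<String[]> pending = new ArrayList<>(); // {session, update}
        final List<String> durable = new ArrayList<>();

        void append(String session, String update) {
            pending.add(new String[] {session, update});
        }

        // A single sync covers all pending entries, whichever session wrote them.
        SyncResult sync(boolean succeeds) {
            Set<String> sessions = new LinkedHashSet<>();
            for (String[] entry : pending) {
                sessions.add(entry[0]);
                if (succeeds) {
                    durable.add(entry[1]);
                }
            }
            pending.clear();
            return new SyncResult(succeeds, sessions);
        }
    }

    public static void main(String[] args) {
        FakeWal wal = new FakeWal();
        wal.append("sessionA", "a1");
        wal.append("sessionB", "b1");
        wal.append("sessionA", "a2");
        SyncResult result = wal.sync(false); // simulate a failed sync
        System.out.println("ok=" + result.ok + ", sessions affected=" + result.sessions);
    }
}
```

Here a single failed sync touches both `sessionA` and `sessionB`, even though each session only knows about its own updates.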

That said, I'm not sure under what conditions Accumulo will "normally"
throw you such an error (e.g. one not related to HDFS being hosed or
something). Maybe the HoldTimeoutException (tserver being too busy)? I'd
have to lock myself in a room and really take a good look at this stuff
again to refresh the cases where Accumulo might actually see an update
but still send you an error... Maybe this isn't as much of a concern to
you as I'm making it out to be, either :)
