accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: BatchWriter Improvements - An end user's perspective
Date Tue, 30 Aug 2016 19:10:23 GMT
On Tue, Aug 30, 2016 at 2:43 PM, Michael Moss <michael.moss@gmail.com> wrote:
> Thanks, Josh. Comments below.
>
> On Tue, Aug 30, 2016 at 2:02 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> tl;dr These are very good points, Mike. If you have the time, I think it
>> would be great to start scratching out requirements, stub out what the API
>> would look like, and box the work (as that would make it easier for you or
>> another dev to pick it up and implement it).
>
>
> I think stubbing out some APIs is a good idea. We can do that.
>
>
>>
>> Michael Moss wrote:
>>
>>> Hello, Folks.
>>>
>>> As I look at the following tickets, I thought it might be useful to share
>>> how we are using the BatchWriter, some of the challenges we've had, some
>>> thoughts about its redesign, and how we might get involved.
>>>
>>> https://issues.apache.org/jira/browse/ACCUMULO-4154
>>> https://issues.apache.org/jira/browse/ACCUMULO-2589
>>> https://issues.apache.org/jira/browse/ACCUMULO-2990
>>>
>>> One of our primary use cases of the BatchWriter is from within a Storm
>>> topology, reading from Kafka. Generally speaking, Storm might be
>>> persisting a single mutation or a small set of mutations at a time (low
>>> latency), or larger batches with Trident (higher throughput). In addition
>>> to ACCUMULO-2990 (any TimedOutException, which then throws
>>> MutationsRejectedException and requires a new connection to be made), one
>>> of our requirements is to ensure that any given thread's mutations are
>>> the ones which are flushed and no others (pseudo transactions).
>>> Otherwise, we might get a failure for a mutation which belongs to another
>>> thread (already ACKed by Storm), which means we don't have a 'handle' on
>>> that offset anymore in Kafka to replay the failure - i.e. the message
>>> could be 'lost'.
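>>>
>>> (As a rough sketch of the pattern we need - OffsetTracker is a
>>> hypothetical stand-in for however Storm/Kafka offsets are tracked:)
>>>
>>>   void persistBatch(BatchWriter writer, List<Mutation> batch,
>>>       OffsetTracker offsets) throws MutationsRejectedException {
>>>     for (Mutation m : batch)
>>>       writer.addMutation(m);
>>>     // the flush must cover only this batch's mutations...
>>>     writer.flush();
>>>     // ...so offsets are ACKed to Kafka only after a successful flush
>>>     offsets.ack();
>>>   }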
>>>
>>> Despite the BatchWriter being thread-safe, we end up using a single
>>> BatchWriter per thread to make reasoning about the above simpler, but
>>> this creates a resource issue - the number of connections to Accumulo
>>> and ZooKeeper.
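>>>
>>> (A minimal sketch of that per-thread workaround, assuming Java 8 and a
>>> shared Connector named 'conn':)
>>>
>>>   ThreadLocal<BatchWriter> writers = ThreadLocal.withInitial(() -> {
>>>     try {
>>>       // one writer per thread: flush() covers only this thread's writes
>>>       return conn.createBatchWriter("mytable", new BatchWriterConfig());
>>>     } catch (TableNotFoundException e) {
>>>       throw new RuntimeException(e);
>>>     }
>>>   });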
>>>
>>> This all makes me wonder what the design goals might have been for the
>>> current version of the driver and if the efforts to rewrite it might
>>> benefit from incorporating elements to address some of these use cases
>>> above.
>>>
>>> What can we learn from how drivers for other "NoSQL" databases are
>>> implemented? Would it make sense to remove all the global variables
>>> ("somethingFailed"), thread sleep/notify, frequent calls to
>>> "checkForFailures()" and consider using a 'connection pool' model where
>>> writes are single-threaded, linearized and isolated during the connection
>>> lease?
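>>>
>>> (For example, a hypothetical API shape - none of these types exist in
>>> Accumulo today:)
>>>
>>>   interface WriterPool extends AutoCloseable {
>>>     WriterLease lease(); // borrow an isolated, single-threaded channel
>>>   }
>>>
>>>   interface WriterLease extends AutoCloseable {
>>>     void write(Mutation m);
>>>     // flushes only the writes made through this lease
>>>     void commit() throws MutationsRejectedException;
>>>   }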
>>>
>>
>> The MultiTableBatchWriter was an attempt in this direction for bounded
>> resources. In the case where you had a single client writing to multiple
>> tables, you wanted to be able to say "I want all ingest from this client to
>> my tables to use X resources".
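>>
>> For example (sketching against the 1.x client API):
>>
>>   BatchWriterConfig cfg = new BatchWriterConfig()
>>       .setMaxMemory(50 * 1024 * 1024)  // shared buffer across all tables
>>       .setMaxWriteThreads(4);          // shared send threads
>>   MultiTableBatchWriter mtbw = conn.createMultiTableBatchWriter(cfg);
>>   BatchWriter events = mtbw.getBatchWriter("events");
>>   BatchWriter index = mtbw.getBatchWriter("events_index");
>>   // mutations added to either writer draw from the same resource budget
>>   mtbw.close();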
>>
>> I think your point about resource management across multiple BatchWriters
>> is a big problem presently when looked at with the concurrency problems you
>> outline.
>>
>> Being unable to determine which mutations succeeded/failed in a batch is a
>> big pain. However, making this have exactly-once semantics would be
>> extremely difficult.
>
>
> Assuming batches were isolated from each other, and all batch/mutation
> flushes were controlled and done once per batch, is it difficult because
> the writes could be going to different tablet servers? Couldn't we keep
> track of which failed and have a choice of having a configurable internal
> retry (transient errors) or return the subset of mutations which failed and
> leave it up to the caller? This could work for us. We might need some
> guarantees for a given row on the same server though - would have to think
> about that.
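>
> To make that concrete, a hypothetical shape (nothing like this exists
> today) for "return the subset that failed and let the caller decide":
>
>   BatchResult result = writer.writeBatch(mutations);  // hypothetical API
>   if (!result.failures().isEmpty()) {
>     // caller's choice: retry transient failures internally, or replay
>     // them from Kafka using the offsets we still hold for this batch
>     handleFailures(result.failures());                 // hypothetical
>   }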

The batch writer does retry on network errors (until the timeout is
reached, which defaults to max long or int).  I think the only things
that percolate up to the user are unexpected exceptions in the batch
writer or tserver, and constraint violations.  Are you interested in
knowing which mutations failed because of a timeout?  I don't think
this can be done w/o introducing a more expensive multi-step protocol
for writing data.  Currently, when the batch writer sends data, it's
possible that the tserver received it and wrote it but could not
report success to the client.  The client may either time out or send
the data again.
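
For reference, the knobs and failure info that exist today look roughly
like this (a sketch against the 1.x client API):

  BatchWriterConfig cfg = new BatchWriterConfig()
      .setTimeout(60, TimeUnit.SECONDS);  // default is effectively forever
  BatchWriter bw = conn.createBatchWriter("mytable", cfg);
  try {
    bw.addMutation(m);
    bw.flush();
  } catch (MutationsRejectedException e) {
    e.getConstraintViolationSummaries(); // permanent: constraint failures
    e.getAuthorizationFailuresMap();     // permanent: visibility/auth failures
    e.getErrorServers();                 // servers involved in the failure
    // note: no per-mutation report of what timed out vs. what was written
  }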

>
>
>>
>>
>> > Could we make the client non-blocking and with optional pipelining,
>> > so multiple writes could share a connection and allow interleaving of
>> > operations (with individual acks)?
>>
>> Right now, I don't think so. Multiplexing one connection isn't something
>> that Thrift is capable of AFAIK (whereas this is something that Hadoop RPC
>> can do). Presently, connections will remain open/cached to a tserver, but
>> they cannot be concurrently shared.
>
>
> Good to know. Pipelining is an internal optimization, so it's less
> visible/important to us. We would feel much more comfortable if the guts of
> Connector/BatchWriter had better resource sharing and less threading code.
>
>
>>
>>
>>> Looking forward to hearing everyone's thoughts.
>>>
>>> -Mike
>>>
>>>
