accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Read/Write Invariants Questions
Date Wed, 16 May 2012 07:42:17 GMT
On Wed, May 16, 2012 at 12:15 AM, Sukant Hajra <qn2b6c2b9w@snkmail.com> wrote:
> Hi,
>
> There's a couple of sanity checks I wanted to run by the list:
>
>    1. I see in the documentation that mutations may be partially read unless
>    using IsolatedScanners, which is a way to have atomicity for applications.
>    Is there any other mechanism for atomic operations to know about?

For the batch scanner, take a look at the WholeRowIterator and the
BatchScanner Javadocs.

>
>    2. I'm assuming that a flushed write to a row is not guaranteed to be
>    sensed by a subsequent read (no immediate consistency).  Is this correct?

After a call to flush() on a BatchWriter returns, any mutations
written before the call to flush() should be immediately visible.

>
>    3. When using a BatchWriter does the order in which mutations are added
>    make any reliable assertion on the order that these mutations are sensed by
>    subsequent reads?  Given two mutations A and B, I'd like to assert that any
>    node sensing B will also sense A.

No, the order in which you add mutations gives no such guarantee.  The
batch writer has multiple background threads writing mutations to
different tablet servers, so the mutations become visible at different
times irrespective of the order you add them.  For the A and B case,
you could write both mutations and then call flush().  After the flush
returns, both will be visible.  However, during the flush operation
one may be visible and the other not.
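
A minimal sketch of the add-then-flush pattern for the A and B case.
FlushModel is a hypothetical toy stand-in, not the Accumulo
BatchWriter; it only models the one guarantee described above (what
was added before flush() is visible once flush() returns).  In the toy
nothing is visible before flush(), whereas a real batch writer may
make A or B visible earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a batch writer; NOT the Accumulo API.
final class FlushModel {
    // Mutations queued by the client but not yet sent.
    private final List<String> buffer = new ArrayList<>();
    // Mutations a reader would be able to see.
    private final Set<String> visible = ConcurrentHashMap.newKeySet();

    // Queue a mutation; no visibility promise is made here.
    synchronized void addMutation(String id) {
        buffer.add(id);
    }

    // Publish everything queued so far; when this returns, all of it is visible.
    synchronized void flush() {
        visible.addAll(buffer);
        buffer.clear();
    }

    boolean isVisible(String id) {
        return visible.contains(id);
    }

    // Write A and B, flush, then report whether a reader would see both.
    static boolean bothVisibleAfterFlush() {
        FlushModel writer = new FlushModel();
        writer.addMutation("A");
        writer.addMutation("B");
        writer.flush();
        return writer.isVisible("A") && writer.isVisible("B");
    }
}
```

The point of the pattern is that readers should only rely on A and B
together after flush() has returned, never on the order of the two
addMutation calls.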

>
>    4. I'm going to have a long standing thread doing batch writing.  Is it
>    reasonable/safe to give this thread an open BatchWriter (making sure to
>    close the writer when shutting down the thread)?  Or might this cause a
>    memory leak?

When you close a BatchWriter, it flushes any data it has in memory and
shuts down its thread pool.
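
One way to arrange the lifecycle asked about, sketched with
hypothetical names: a long-lived worker thread owns the writer and
closes it in a finally block, so buffered data is flushed on shutdown.
ToyWriter and the STOP marker are assumptions for illustration, not
Accumulo classes; the toy close() only mimics "flush what is in
memory, then mark closed".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class LongLivedWriterModel {
    // Toy stand-in for a batch writer; NOT the Accumulo API.
    static final class ToyWriter {
        final AtomicInteger pending = new AtomicInteger();
        final AtomicInteger written = new AtomicInteger();
        final AtomicBoolean closed = new AtomicBoolean();

        // Buffer one mutation in memory.
        void addMutation(String m) {
            pending.incrementAndGet();
        }

        // Flush whatever is buffered, then mark the writer closed.
        void close() {
            written.addAndGet(pending.getAndSet(0));
            closed.set(true);
        }
    }

    // Hypothetical marker that tells the worker to shut down.
    static final String STOP = "__stop__";

    // Run a worker that owns one writer for its whole life and closes it on exit.
    static ToyWriter runWorker(String... mutations) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ToyWriter writer = new ToyWriter();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String m = queue.take();
                    if (m.equals(STOP)) {
                        break;
                    }
                    writer.addMutation(m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                writer.close(); // always runs, even if the loop is interrupted
            }
        });
        worker.start();
        for (String m : mutations) {
            queue.add(m);
        }
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer;
    }
}
```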

>
>    5. I'm assuming that BatchWriter is minimally blocking.  Is there any merit
>    to or precedent of load balancing across multiple writers?  Or would that
>    be redundant to optimizations already built into BatchWriter?

It's safe for multiple threads to use one BatchWriter.  This may be
more optimal up to the point where there are so many threads that it
causes lock contention.  The nice thing about having multiple threads
share one batch writer is that the background threads sending data to
tablet servers will presumably have larger batches.  This should result
in fewer network round trips.  It also allows larger batches for the
write-ahead log on the server side.  Write-ahead log batching should
be less of a concern in 1.5 w/ group commit.
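
A rough sketch of several producer threads sharing one writer.
SharedWriterModel is a hypothetical toy, not the Accumulo BatchWriter;
it only illustrates that mutations from all threads land in one shared
buffer and can go out as a single larger batch.  The synchronized
methods are also where the lock contention mentioned above would show
up with many threads.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a shared batch writer; NOT the Accumulo API.
final class SharedWriterModel {
    private final List<String> buffer = new ArrayList<>();
    private int flushedCount = 0;
    private int batchesSent = 0;

    // Called concurrently by every producer thread.
    synchronized void addMutation(String m) {
        buffer.add(m);
    }

    // Send the whole shared buffer as one batch.
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            flushedCount += buffer.size();
            batchesSent++;
            buffer.clear();
        }
    }

    synchronized int flushedCount() {
        return flushedCount;
    }

    synchronized int batchesSent() {
        return batchesSent;
    }

    // Start `threads` producers that all add to one writer, then flush once.
    static SharedWriterModel run(int threads, int perThread) {
        SharedWriterModel writer = new SharedWriterModel();
        List<Thread> producers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread p = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    writer.addMutation("t" + id + "-m" + i);
                }
            });
            producers.add(p);
            p.start();
        }
        try {
            for (Thread p : producers) {
                p.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        writer.flush();
        return writer;
    }
}
```

With one writer per thread, the same mutations would be split into at
least one batch per thread; the shared buffer is what makes the larger
batches possible.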

>
> Thanks a lot for helping me better understand Accumulo.  Feel free to point me
> to documentation I might have missed.
>
> -Sukant
