accumulo-user mailing list archives

From Keith Turner <>
Subject Re: BatchWriter woes
Date Tue, 24 Jun 2014 13:42:06 GMT
On Tue, Jun 24, 2014 at 9:09 AM, William Slacum <> wrote:

> I can try to confirm that, but the monitor isn't showing any failures
> during ingest. By "half dead" do you mean the master thinks it is alive,
> but in actuality it isn't?

Yeah, it's holding its lock in ZooKeeper (and has tablets assigned to it in
the metadata table) but is unresponsive to clients.

In this case you may see the behavior you described. BatchWriters buffer
more and more data for that tserver, and are able to send less and less to
other tservers, until finally the BatchWriters' buffers are full of data for
the half-alive tserver.
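
To make the starvation effect concrete, here is a toy model in plain Java. It is not Accumulo code: the class, the method, and the round-robin assignment of mutations to servers are all assumptions made for illustration. It only models one shared memory budget where a single server never drains its share.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not the Accumulo implementation): one shared
// buffer budget, several servers, and one server that never accepts data.
final class BufferStarvationSketch {

    /**
     * Runs rounds of "fill the shared buffer, then flush".
     * Returns the number of bytes delivered to healthy servers in each round.
     */
    static List<Long> simulate(int servers, int deadServer,
                               long maxMemory, long mutationSize, int rounds) {
        long[] buffered = new long[servers]; // bytes queued per server
        long used = 0;                       // bytes held in the shared buffer
        int next = 0;                        // assumed round-robin destination
        List<Long> delivered = new ArrayList<>();

        for (int r = 0; r < rounds; r++) {
            // Fill: the writer accepts mutations until the shared buffer is full.
            while (used + mutationSize <= maxMemory) {
                buffered[next] += mutationSize;
                used += mutationSize;
                next = (next + 1) % servers;
            }
            // Flush: healthy servers drain completely, the dead one keeps its data.
            long sent = 0;
            for (int s = 0; s < servers; s++) {
                if (s != deadServer) {
                    sent += buffered[s];
                    used -= buffered[s];
                    buffered[s] = 0;
                }
            }
            delivered.add(sent);
        }
        return delivered;
    }
}
```

Calling `BufferStarvationSketch.simulate(4, 0, 128, 1, 100)` gives a per-round delivery count that shrinks every round, because the share of the buffer pinned by the unresponsive server only grows. Once that server owns the whole budget, nothing more can be accepted or sent.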

> On Fri, Jun 20, 2014 at 10:32 AM, Keith Turner <> wrote:
>> On Thu, Jun 19, 2014 at 11:57 PM, William Slacum <> wrote:
>>> I'm finding some ingest jobs I have running in a bit of a sticky sitch:
>>> I have a MapReduce job that reads a table, transforms the entries,
>>> creates an inverted index, and writes out mutations to two tables. The
>>> cluster size is in the tens of nodes, and I usually have 32 mappers running.
>>> The batch writer configs are:
>>> - memory buffer: 128MB
>>> - max latency: 5 minutes
>>> - threads: 32
>>> - timeout: default Long.MAX_VALUE
>>> I know we're on Accumulo 1.5.0 and I believe using CDH 4.5.0, Zookeeper
>>> 3.3.6.
>>> I'm noticing an ingest pattern of usually ok rates for the cluster (in
>>> the 100K+ entries per second), but after some time they start to drop off
>>> to ~10K E/s. Sometimes this happens when a round of compactions kicks off
>>> (usually major, not minor), sometimes not. Eventually, the mappers will
>>> time out. We have them set to time out after 10 minutes of not reporting
>>> status.
>>> I added a bit of probing/profiling, and noticed that there's an
>>> exponential growth in per entry processing time in the mapper. They're of
>>> pretty uniform size, so there should not be much variance in the times. The
>>> times go from single milliseconds, to hundreds of milliseconds, to seconds,
>>> to minutes.
>>> If I jstack a mapper, it's sitting in TabletServerBatchWriter#waitRTE.
>>> It should only enter that method if the batch writer has (a) too much data
>>> buffered or (b) the user requested a flush. I'm inferring that (a) is the
>>> case, because there is no explicit TabletServerBatchWriter#flush() call.
>>> We did notice that there was a send thread trying to send to a dead
>>> server. We can't ssh to the IP it was trying to send to, and have verified
>>> manually that it's not listed in the current tablet servers. We did notice
>>> that the master log is reporting that a recovery on a WAL associated with
>>> that IP is under way. Looking back, the master had been reporting that
>>> message for about a day and a half. The message was similar to the one
>>> described in . I do
>>> not know the significance of this as it relates to my jobs.
>> Do you think it's trying to write to a half-dead server?  Does that server
>> still have locations in the metadata table?
>>> I did some digging in TabletServerBatchWriter, and the only thing I can
>>> kind of see happening is that if SendTask#sendMutationsToTabletServer
>>> receives a TException, it rethrows it as an IOException, then SendTask#send
>>> will catch that exception and add the mutations to the failures collection.
>>> Since the timeout is Long.MAX_VALUE, I think it's possible this loop can
>>> continue forever or until some outside force kills the entire process.
>>> Does this seem coherent? Is there anything else that could cause this?
>>> I'm on the track of converting the code over to using bulk ingest, but I
>>> think there's an issue with a vanilla BatchWriter that I would just be
>>> getting around instead of actually fixing.
>>> Also, I'd love to provide logs, but there's a high amount of friction in
>>> getting them, so I won't be able to deliver on that front.
