accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Performance during node failure
Date Fri, 08 Nov 2013 20:10:11 GMT
On 11/8/13, 2:53 PM, Slater, David M. wrote:
> Hi all,
> I have an 8-node cluster (1 name node, 7 data nodes), running accumulo
> 1.4.2, zookeeper 3.3.6, and hadoop 1.0.3, and I have it optimized for
> ingest performance. My question has to do with how to make the performance
> degrade gracefully under node failure.
> 1) When nodes fail, I assume that what happens is that Accumulo needs to
> migrate those tablets, and hadoop needs to replicate the underlying data
> blocks. This seems to have a rather catastrophic effect on ingest rates.
> Is there a way to migrate tablets more gradually (starting with
> more active ones) and replicate data blocks in order to not interfere
> with ingestion as severely?

First, the master needs to notice that a TabletServer died (via its
ZooKeeper lock). This will take up to 30 seconds unless you have
configured a more aggressive timeout using
`instance.zookeeper.timeout`. In practice, I think I normally see it
take less than 10 seconds for a failure to be noticed. Next, the Master
will reassign the tablets hosted by the now-failed tserver to other
tserver(s). Perhaps the Master could sort the tablets to make
reassignment happen faster, but I would guess that the time the new
TabletServer spends bringing a tablet online and performing recovery (as
is the case for you with active ingest) would dwarf the time it takes
the master to request that the tablets be brought online.

Ultimately, I would guess this is a balancing act: larger in-memory maps
should speed up ingest, but they also mean more unflushed data to
recover from the write-ahead logs after a failure, so the recovery
penalty grows with them.
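
For reference, both knobs mentioned above live in accumulo-site.xml. The
fragment below is only a sketch: the property names are real, but the
values are illustrative assumptions, not recommendations for your
cluster. Keep in mind that a very short ZooKeeper timeout can cause
healthy tservers to lose their locks during long GC pauses.

```xml
<!-- accumulo-site.xml (illustrative values only) -->
<property>
  <!-- How long before ZooKeeper expires a dead tserver's lock; default is 30s -->
  <name>instance.zookeeper.timeout</name>
  <value>15s</value>
</property>
<property>
  <!-- Upper bound on the in-memory maps per tserver; larger values mean
       faster ingest but more data to replay during recovery -->
  <name>tserver.memory.maps.max</name>
  <value>1G</value>
</property>
```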

> 2) What happens to BatchWriters when a tablet server they are
> attempting to write to fails? Will I need to start catching
> MutationsRejectedException, will it block, or is there some other
> failure mode?

The BatchWriter will block/retry these mutations. You shouldn't have to 
do anything special to handle TabletServer failure at the BatchWriter level.
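
To picture what "block/retry" means for the caller, here is a small
standalone sketch of the pattern. It is not Accumulo's client code:
`FlakyServer` and `retryUntilSuccess` are made-up names for
illustration, and the backoff values are arbitrary. The point is only
that the caller sees a slow call rather than an exception while the
server is unavailable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class BlockingRetrySketch {

    /** Hypothetical stand-in for a tablet server that is unreachable for the first few calls. */
    static class FlakyServer {
        private final int failuresBeforeRecovery;
        int attempts = 0;

        FlakyServer(int failuresBeforeRecovery) {
            this.failuresBeforeRecovery = failuresBeforeRecovery;
        }

        String write(String row) throws IOException {
            attempts++;
            if (attempts <= failuresBeforeRecovery) {
                throw new IOException("server unavailable (attempt " + attempts + ")");
            }
            return "ack:" + row;
        }
    }

    /** Blocks the caller, retrying with capped exponential backoff until the operation succeeds. */
    static <T> T retryUntilSuccess(Callable<T> op, long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        while (true) {
            try {
                return op.call();
            } catch (Exception e) {
                // Treat every failure as transient in this sketch and wait before retrying.
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }

    public static void main(String[] args) {
        FlakyServer server = new FlakyServer(3);
        String ack = retryUntilSuccess(() -> server.write("row1"), 10, 80);
        System.out.println(ack + " after " + server.attempts + " attempts");
    }
}
```

Separately from tserver failure, the real BatchWriter methods
(addMutation, flush, close) declare the checked
MutationsRejectedException, so your code has to handle it regardless.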

> 3) This I believe is a separate issue from node failure, but I was
> seeing some very odd zookeeper behavior, involving a number of timeouts.
> I currently have zookeeper running on all 7 data nodes, with the
> batchwriters running on the name node. Basically, I was getting a number
> of the following:
> client session timed out …
> opening socket connection
> socket connection established
> session establishment complete
> …
> client session timed out …
> repeat

This may be normal ZooKeeper operation. As the session times out, if
the client is still there, it will renew. I'm not a ZooKeeper expert, though.

> I would also occasionally get
> session expired for /accumulo/fe7…
> as well as
> Zookeeper.KeeperException$ConnectionLoss
> Exception: KeeperErrorCode = ConnectionLoss
> for /accumulo/f37…/tables/3b/state
> at accumulo.core.zookeeper.ZooCache$
> accumulo.core.zookeeper.ZooCache.retry
> accumulo.core.zookeeper.ZooCache.get
> core.clientimpl.tables.getTableState
> core.clientimpl.multiTableBatchWriter.getBatchWriter

I'm guessing your "myIngestorProcess" doesn't actually fail, does it? 
Again, I'm guessing "normal" operations, although things like 
maxClientCnxns in zoo.cfg can influence this.
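
For context, maxClientCnxns limits how many concurrent connections a
single client IP address may open to one ZooKeeper server. As far as I
know, the default in ZooKeeper 3.3.x is 10 (it was raised to 60 in 3.4),
and 0 removes the limit. Since all of your batchwriters run from one
host, that limit could plausibly matter. The value below is an
illustrative assumption, not a recommendation:

```
# zoo.cfg (illustrative value only)
# Max concurrent connections from one client IP to a single ZooKeeper server
maxClientCnxns=100
```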

> Does anyone know if this is an Accumulo problem, a Zookeeper problem, or
> something else (network overly busy, etc.)?
> Thanks,
> David
