accumulo-notifications mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-2227) Concurrent randomwalk fails when namenode dies after bulk import step
Date Wed, 22 Jan 2014 19:07:23 GMT


ASF subversion and git services commented on ACCUMULO-2227:

Commit 06f80305e4587f519cb3dfae0686b52b32e7a0b8 in branch refs/heads/1.6.0-SNAPSHOT from [~bhavanki]
[;h=06f8030 ]

ACCUMULO-2227 / ACCUMULO-2228 Update randomwalk README with HA warning

Hadoop 2.1.0 includes better retry/failover handling than prior versions. This
commit adds a warning to the randomwalk README advising testers to expect more
failures when exercising HA under Hadoop versions prior to 2.1.0.
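For context, client-side failover under HDFS HA is typically configured along these lines. This is an illustrative sketch of a standard Hadoop HA client setup, not configuration taken from this issue; the nameservice name "mycluster", the namenode IDs, and the hostnames are all placeholders:

```xml
<!-- hdfs-site.xml (client side): illustrative HDFS HA settings.
     "mycluster", "nn1", "nn2", and the hostnames are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With this in place the DFS client can retry against the standby namenode after a failover, which is the behavior the README warning refers to.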

> Concurrent randomwalk fails when namenode dies after bulk import step
> ---------------------------------------------------------------------
>                 Key: ACCUMULO-2227
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>              Labels: ha, randomwalk, test
> When running Concurrent randomwalk under HDFS HA, if the active namenode is killed, the test fails with:
> {noformat}
> 20 12:27:51,119 [retry.RetryInvocationHandler] WARN : Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete. Not retrying because the invoked method is not idempotent, and unable to determine whether it was invoked
> Failed on local exception: Response is null.; Host Details : local host is: ""; destination host is: "":8020;
> ...
>  at org.apache.hadoop.hdfs.DFSClient.delete(
> at org.apache.hadoop.hdfs.DistributedFileSystem.delete(
> at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(
> ...
> Caused by: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(
> at org.apache.hadoop.ipc.Client$
> {noformat}
> This arises from an HDFS path delete call that cleans up from the bulk import. The test
> should be resilient here (and when the paths are made earlier in the test) so that the test
> can continue once failover has completed.
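A minimal sketch of that kind of resilience (a hypothetical helper, not the committed fix): retry the cleanup delete with a backoff so the test can ride out the failover window. Since the log above notes the delete RPC is not idempotent, a real fix would also need to treat "path already gone" as success on a retry; the `FileOp` interface here stands in for an HDFS call such as `FileSystem.delete(path, true)`.

```java
import java.io.IOException;

public class ResilientOp {
    /** Stand-in for a file-system call that may fail transiently during failover. */
    public interface FileOp {
        boolean run() throws IOException;
    }

    /**
     * Retry op up to maxAttempts times, sleeping backoffMs between attempts.
     * Returns the op's result, or rethrows the last IOException if every
     * attempt failed (e.g., failover never completed).
     */
    public static boolean retry(FileOp op, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();
            } catch (IOException e) {
                last = e; // e.g. "Response is null." while the standby takes over
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a cleanup delete that fails twice mid-failover, then succeeds.
        final int[] calls = {0};
        boolean deleted = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Failed on local exception: Response is null.");
            }
            return true;
        }, 5, 10);
        System.out.println("deleted=" + deleted + " attempts=" + calls[0]);
    }
}
```

Running the simulation prints `deleted=true attempts=3`: the first two attempts fail as the standby takes over, and the third succeeds.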

This message was sent by Atlassian JIRA
