Date: Thu, 5 Sep 2013 15:23:52 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-1685) bench testing shows that the NN loses the WAL

    [ https://issues.apache.org/jira/browse/ACCUMULO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759156#comment-13759156 ]

ASF subversion and git services commented on ACCUMULO-1685:
-----------------------------------------------------------

Commit ff02d20db00dfa00929365e1b7befc5ebb91f76f in branch refs/heads/master from [~ecn]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ff02d20 ]

ACCUMULO-1685 properly parse logSet data for confirming deletes

> bench testing shows that the NN loses the WAL
> ---------------------------------------------
>
>                 Key: ACCUMULO-1685
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1685
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: Hadoop 1.0.4, single node dev't system
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> Doing bench testing; I build accumulo:
> {noformat}
> $ mvn -Pnative package -DskipTests
> {noformat}
> I go into the assembly area and configure and run accumulo
> {noformat}
> $ cd assemble/target/accumulo-1.6.0-SNAPSHOT-dev/accumulo-1.6.0-SNAPSHOT
> $ cp ~/conf/* conf
> $ hadoop fs -rmr /accumulo
> Moved to trash: hdfs://somehost:9000/accumulo
> $ ( echo test ; echo Y ; echo secret ; echo secret ) | ./bin/accumulo init
> $ 2013-09-04 12:23:51,558 [util.Initialize] INFO : Hadoop Filesystem is hdfs://somehost:9000
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Accumulo data dirs are [hdfs://somehost:9000/accumulo]
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Zookeeper server is localhost:2181
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
> Instance name : test
> Instance name "test" exists. Delete existing entry from zookeeper? [Y/N] : Y
> Enter initial password for root (this may not be applicable for your security setup): ******
> Confirm initial password for root: ******
> $ ./bin/start-all.sh
> Starting monitor on localhost
> Starting tablet servers .... done
> Starting tablet server on localhost
> 2013-09-04 12:26:24,545 [server.Accumulo] INFO : Attempting to talk to zookeeper
> 2013-09-04 12:26:24,675 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
> 2013-09-04 12:26:24,679 [server.Accumulo] INFO : Connected to HDFS
> Starting master on localhost
> Starting garbage collector on localhost
> Starting tracer on localhost
> {noformat}
> Next, create a table
> {noformat}
> $ ./bin/accumulo shell -u root -p secret
> 2013-09-04 12:27:01,628 [shell.Shell] WARN : Specifying a raw password is deprecated.
> Shell - Apache Accumulo Interactive Shell
> -
> - version: 1.6.0-SNAPSHOT
> - instance name: test
> - instance id: 1967c1ec-cc0f-439b-b4da-4029debd16e3
> -
> - type 'help' for a list of available commands
> -
> root@test> createtable t
> root@test t>
> {noformat}
> Then I checked the tserver log for the write-ahead log created for this update to the root table:
> {noformat}
> $ fgrep -a /wal/ logs/tserver_*.debug.log
> 2013-09-04 12:26:27,130 [log.DfsLogger] DEBUG: Got new write-ahead log: localhost+9997/hdfs://rd6ul-14706v.tycho.ncsc.mil:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> 2013-09-04 12:26:58,264 [tabletserver.Tablet] DEBUG: Logs for memory compacted: !!R<< localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> {noformat}
> Now, let's check for the file:
> {noformat}
> $ hadoop fs -ls hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> ls: Cannot access hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9: No such file or directory.
> {noformat}
> What?
> Check the NN logs:
> {noformat}
> $ fgrep 1dd2727f /some/log/dir/hadoop-ecnewt2-local-namenode-somehost.log
> 2013-09-04 12:26:27,075 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9. blk_-6011963215434912690_971163
> 2013-09-04 12:26:27,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 for DFSClient_-787226921
> {noformat}
> So, the NN seems to be making the file, but it's not there when we go to look!
> Here's my hdfs-site.xml file:
> {noformat}
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/local/ecn/data/hadoop/nn</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/disk01/data/hadoop/dn,/disk02/data/hadoop/dn,/disk03/data/hadoop/dn</value>
>   </property>
>   <property>
>     <name>dfs.support.append</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.data.synconclose</name>
>     <value>true</value>
>   </property>
> </configuration>
> {noformat}
> I have written an integration test that I dumped into RestartIT.java, but that doesn't seem to fail in the same way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
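
The write-ahead log entries in the tserver output quoted above have the form `server/path`, where the path is itself a full HDFS URI containing more slashes. The following is a minimal sketch, assuming only that layout, of splitting such an entry on the first slash so the path stays intact. The helper name `split_log_entry` is hypothetical, and this is an illustration in Python, not the Accumulo code changed by the commit.

```python
def split_log_entry(entry):
    """Split a 'server/path' log entry on the FIRST slash only.

    Assumption: the entry looks like the tserver log lines in the report,
    e.g. 'localhost+9997/hdfs://host:9000/accumulo/wal/...'. The path part
    keeps all of its own slashes.
    """
    server, sep, path = entry.partition("/")
    if not sep or not server or not path:
        raise ValueError("not a server/path entry: %r" % (entry,))
    return server, path


# Entry copied from the 'Logs for memory compacted' line in the report.
entry = (
    "localhost+9997/hdfs://somehost:9000/accumulo/wal/"
    "localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9"
)
server, path = split_log_entry(entry)
print(server)
print(path)
```

Splitting on every slash instead would break the `hdfs://` URI into pieces, so the resulting path could not be checked with `hadoop fs -ls`.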