Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Thu, 5 Sep 2013 15:23:52 +0000 (UTC)
From: "ASF subversion and git services (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12666920.1378313585460.85600.1378394632696@arcas>
In-Reply-To: <JIRA.12666920.1378313585460@arcas>
References: <JIRA.12666920.1378313585460@arcas>
Subject: [jira] [Commented] (ACCUMULO-1685) bench testing shows that the NN
 loses the WAL
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/ACCUMULO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759156#comment-13759156 ] 

ASF subversion and git services commented on ACCUMULO-1685:
-----------------------------------------------------------

Commit ff02d20db00dfa00929365e1b7befc5ebb91f76f in branch refs/heads/master from [~ecn]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ff02d20 ]

ACCUMULO-1685 properly parse logSet data for confirming deletes
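
For readers unfamiliar with the logSet format: the tserver log lines quoted in the report show entries shaped like `localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/<uuid>`, which look like `<server>/<WAL path>`, where the WAL path itself contains `/`. The following is a minimal hypothetical sketch, not Accumulo's actual garbage-collector code. It assumes that format and shows why "confirming deletes" depends on splitting on the first `/` only: a naive split on every `/` yields `hdfs:` as the path, so no candidate ever matches and an in-use WAL looks unreferenced. The class and method names (`LogSetSketch`, `walPath`, `naiveWalPath`, `safeToDelete`) are illustrative only, and the actual change in commit ff02d20 may differ.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.function.Function;

/**
 * Hypothetical sketch, NOT Accumulo's actual garbage-collector code.
 * Assumption: a logSet entry looks like the ones in the tserver log,
 *   "localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/<uuid>",
 * i.e. "<server>/<WAL path>", where the WAL path itself contains '/'.
 */
final class LogSetSketch {

  /** Server part: everything before the FIRST '/'. */
  static String server(String entry) {
    return entry.substring(0, separator(entry));
  }

  /** WAL path part: everything after the FIRST '/' (the full hdfs:// URI). */
  static String walPath(String entry) {
    return entry.substring(separator(entry) + 1);
  }

  /** Naive parse for contrast: splitting on every '/' makes field 1 "hdfs:". */
  static String naiveWalPath(String entry) {
    return entry.split("/")[1];
  }

  private static int separator(String entry) {
    int slash = entry.indexOf('/');
    if (slash < 0) {
      throw new IllegalArgumentException("expected <server>/<path>: " + entry);
    }
    return slash;
  }

  /** Confirm a delete only if no logSet entry still references the candidate WAL. */
  static boolean safeToDelete(String candidate, Collection<String> logSet,
      Function<String, String> parser) {
    for (String entry : logSet) {
      if (parser.apply(entry).equals(candidate)) {
        return false; // still referenced by a tablet: keep the file
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String entry = "localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/"
        + "1dd2727f-1de9-417b-a5a2-e56f7d8020a9";
    Collection<String> logSet = Collections.singletonList(entry);
    String wal = walPath(entry);
    System.out.println(server(entry));                                          // localhost+9997
    System.out.println(safeToDelete(wal, logSet, LogSetSketch::walPath));       // false
    System.out.println(safeToDelete(wal, logSet, LogSetSketch::naiveWalPath));  // true
  }
}
```

With the naive parser the check reports the still-referenced WAL as safe to delete. That is one plausible way a file the NN created could later be missing, as described in the report.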

                
> bench testing shows that the NN loses the WAL
> ---------------------------------------------
>
>                 Key: ACCUMULO-1685
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1685
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: Hadoop 1.0.4, single-node development system
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> While bench testing, I built Accumulo:
> {noformat}
> $ mvn -Pnative package -DskipTests
> {noformat}
> Then I went into the assembly directory, configured, initialized, and started Accumulo:
> {noformat}
> $ cd assemble/target/accumulo-1.6.0-SNAPSHOT-dev/accumulo-1.6.0-SNAPSHOT
> $ cp ~/conf/* conf
> $ hadoop fs -rmr /accumulo
> Moved to trash: hdfs://somehost:9000/accumulo
> $ ( echo test ; echo Y ; echo secret ; echo secret ) | ./bin/accumulo init
> $ 2013-09-04 12:23:51,558 [util.Initialize] INFO : Hadoop Filesystem is hdfs://somehost:9000
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Accumulo data dirs are [hdfs://somehost:9000/accumulo]
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Zookeeper server is localhost:2181
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
> Instance name : test
> Instance name "test" exists. Delete existing entry from zookeeper? [Y/N] : Y
> Enter initial password for root (this may not be applicable for your security setup): ******
> Confirm initial password for root: ******
> $ ./bin/start-all.sh 
> Starting monitor on localhost
> Starting tablet servers .... done
> Starting tablet server on localhost
> 2013-09-04 12:26:24,545 [server.Accumulo] INFO : Attempting to talk to zookeeper
> 2013-09-04 12:26:24,675 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
> 2013-09-04 12:26:24,679 [server.Accumulo] INFO : Connected to HDFS
> Starting master on localhost
> Starting garbage collector on localhost
> Starting tracer on localhost
> {noformat}
> Next, I created a table:
> {noformat}
> $ ./bin/accumulo shell -u root -p secret
> 2013-09-04 12:27:01,628 [shell.Shell] WARN : Specifying a raw password is deprecated.
> Shell - Apache Accumulo Interactive Shell
> - 
> - version: 1.6.0-SNAPSHOT
> - instance name: test
> - instance id: 1967c1ec-cc0f-439b-b4da-4029debd16e3
> - 
> - type 'help' for a list of available commands
> - 
> root@test> createtable t
> root@test t> 
> {noformat}
> Then I checked the tserver log for the write-ahead log created for this update to the root table:
> {noformat}
> $ fgrep -a /wal/ logs/tserver_*.debug.log
> 2013-09-04 12:26:27,130 [log.DfsLogger] DEBUG: Got new write-ahead log: localhost+9997/hdfs://rd6ul-14706v.tycho.ncsc.mil:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> 2013-09-04 12:26:58,264 [tabletserver.Tablet] DEBUG: Logs for memory compacted: !!R<< localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> {noformat}
> Now, let's check for the file:
> {noformat}
> $ hadoop fs -ls hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> ls: Cannot access hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9: No such file or directory.
> {noformat}
> What?
> Check the NN logs:
> {noformat}
> $ fgrep 1dd2727f /some/log/dir/hadoop-ecnewt2-local-namenode-somehost.log 
> 2013-09-04 12:26:27,075 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9. blk_-6011963215434912690_971163
> 2013-09-04 12:26:27,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 for DFSClient_-787226921
> {noformat}
> So the NN appears to create the file (it allocates a block and records an fsync for it), but the file is gone when we go to look!
> Here's my hdfs-site.xml file:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!-- Put site-specific property overrides in this file. -->
> <configuration>
>   <property>
>       <name>dfs.replication</name>
>       <value>1</value>
>   </property>
>   <property>
>       <name>dfs.name.dir</name>
>       <value>/local/ecn/data/hadoop/nn</value>
>   </property>
>   <property>
>       <name>dfs.data.dir</name>
>       <value>/disk01/data/hadoop/dn,/disk02/data/hadoop/dn,/disk03/data/hadoop/dn</value>
>   </property>
>   <property>
>       <name>dfs.support.append</name>
>       <value>true</value>
>   </property>
>   <property>
>       <name>dfs.data.synconclose</name>
>       <value>true</value>
>   </property>
> </configuration>
> {noformat}
> I have written an integration test and added it to RestartIT.java, but it does not seem to fail in the same way.
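
The manual check in the report (grepping the tserver debug log for `/wal/`, then listing each path with `hadoop fs -ls`) can be partly automated. The following is a hypothetical Java helper, not part of Accumulo. It assumes WAL paths appear in the log as `hdfs://` URIs that contain `/wal/` and end at whitespace, as in the quoted log lines. It only collects the distinct paths and leaves the existence check to `hadoop fs -ls`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Accumulo) mirroring
 *   fgrep -a /wal/ logs/tserver_*.debug.log
 * It collects every hdfs:// WAL path mentioned in tserver debug-log lines so
 * each one can then be checked by hand with `hadoop fs -ls <path>`.
 */
final class WalPathScanner {
  // Assumption: a WAL path is an hdfs:// URI containing "/wal/" that ends at whitespace.
  private static final Pattern WAL_URI = Pattern.compile("hdfs://\\S*/wal/\\S+");

  /** Returns the distinct WAL URIs found in the given lines, in first-seen order. */
  static Set<String> walPaths(List<String> lines) {
    Set<String> paths = new LinkedHashSet<>();
    for (String line : lines) {
      Matcher m = WAL_URI.matcher(line);
      while (m.find()) {
        paths.add(m.group());
      }
    }
    return paths;
  }

  public static void main(String[] args) throws IOException {
    List<String> lines = new ArrayList<>();
    for (String file : args) {
      // ISO-8859-1 maps every byte to a char, so binary noise in the log
      // (the reason for fgrep -a) cannot cause a decoding error.
      lines.addAll(Files.readAllLines(Paths.get(file), StandardCharsets.ISO_8859_1));
    }
    for (String path : walPaths(lines)) {
      System.out.println(path);
    }
  }
}
```

Usage, after compiling: `java WalPathScanner logs/tserver_*.debug.log`, then run `hadoop fs -ls` on each printed path.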

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira