accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-1685) bench testing shows that the NN loses the WAL
Date Wed, 04 Sep 2013 16:54:51 GMT
Eric Newton created ACCUMULO-1685:
-------------------------------------

             Summary: bench testing shows that the NN loses the WAL
                 Key: ACCUMULO-1685
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1685
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
         Environment: Hadoop 1.0.4, single-node development system
            Reporter: Eric Newton
            Assignee: Eric Newton
            Priority: Critical
             Fix For: 1.6.0


While bench testing, I built Accumulo:

{noformat}
$ mvn -Pnative package -DskipTests
{noformat}

I went into the assembly area, then configured and ran Accumulo:

{noformat}
$ cd assemble/target/accumulo-1.6.0-SNAPSHOT-dev/accumulo-1.6.0-SNAPSHOT
$ cp ~/conf/* conf
$ hadoop fs -rmr /accumulo
Moved to trash: hdfs://somehost:9000/accumulo
$ ( echo test ; echo Y ; echo secret ; echo secret ) | ./bin/accumulo init
$ 2013-09-04 12:23:51,558 [util.Initialize] INFO : Hadoop Filesystem is hdfs://somehost:9000
2013-09-04 12:23:51,559 [util.Initialize] INFO : Accumulo data dirs are [hdfs://somehost:9000/accumulo]
2013-09-04 12:23:51,559 [util.Initialize] INFO : Zookeeper server is localhost:2181
2013-09-04 12:23:51,559 [util.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
Instance name : test
Instance name "test" exists. Delete existing entry from zookeeper? [Y/N] : Y
Enter initial password for root (this may not be applicable for your security setup): ******
Confirm initial password for root: ******
$ ./bin/start-all.sh 
Starting monitor on localhost
Starting tablet servers .... done
Starting tablet server on localhost
2013-09-04 12:26:24,545 [server.Accumulo] INFO : Attempting to talk to zookeeper
2013-09-04 12:26:24,675 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2013-09-04 12:26:24,679 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost
Starting garbage collector on localhost
Starting tracer on localhost
{noformat}

Next, I created a table:

{noformat}
$ ./bin/accumulo shell -u root -p secret
2013-09-04 12:27:01,628 [shell.Shell] WARN : Specifying a raw password is deprecated.

Shell - Apache Accumulo Interactive Shell
- 
- version: 1.6.0-SNAPSHOT
- instance name: test
- instance id: 1967c1ec-cc0f-439b-b4da-4029debd16e3
- 
- type 'help' for a list of available commands
- 
root@test> createtable t
root@test t> 
{noformat}

Then I checked the tserver log for the write-ahead log created for this update to the root
table:

{noformat}
$ fgrep -a /wal/ logs/tserver_*.debug.log
2013-09-04 12:26:27,130 [log.DfsLogger] DEBUG: Got new write-ahead log: localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
2013-09-04 12:26:58,264 [tabletserver.Tablet] DEBUG: Logs for memory compacted: !!R<< localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
{noformat}
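
The same lookup can be scripted. This is a minimal sketch, assuming the tserver debug log records each WAL as an `hdfs://` URI containing `/wal/`, as in the lines above. The function name `extract_wal_paths` is hypothetical and not part of Accumulo.

```shell
# Hypothetical helper: print each distinct WAL URI found on stdin.
# Assumes WAL paths look like hdfs://host:port/.../wal/<server>/<uuid>.
extract_wal_paths() {
  # -a treats the log as text even if it contains binary bytes;
  # -o prints only the matching part of each line.
  grep -a -o 'hdfs://[^[:space:]]*/wal/[^[:space:]]*' | sort -u
}
```

It could be run as `cat logs/tserver_*.debug.log | extract_wal_paths`, which prints one URI per distinct write-ahead log.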

Now, let's check for the file:

{noformat}
$ hadoop fs -ls hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
ls: Cannot access hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9: No such file or directory.
{noformat}
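
To repeat this check for more than one log, something like the following might help. It is only a sketch: the function name `report_missing_wals` and the `WAL_CHECK_CMD` override are hypothetical, and it assumes `hadoop fs -test -e <path>` exits with status 0 when the path exists.

```shell
# Hypothetical helper: read one path per line on stdin and report every
# path that the existence check cannot find.
# WAL_CHECK_CMD is an assumed override (useful for dry runs); by default
# the check is `hadoop fs -test -e <path>`.
report_missing_wals() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # Redirect stdin so the checker cannot consume the remaining paths.
    if ! ${WAL_CHECK_CMD:-hadoop fs -test -e} "$path" </dev/null; then
      echo "MISSING: $path"
    fi
  done
}
```

For the case above it would be invoked as `echo hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 | report_missing_wals`.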

What?

Check the NN logs:

{noformat}
$ fgrep 1dd2727f /some/log/dir/hadoop-ecnewt2-local-namenode-somehost.log 
2013-09-04 12:26:27,075 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9. blk_-6011963215434912690_971163
2013-09-04 12:26:27,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 for DFSClient_-787226921
{noformat}
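
A quick way to see which NameNode operations were logged for a file is to reduce the matching lines to their `NameSystem.<operation>` tags. This is a sketch only: the function name `nn_ops_for` is hypothetical, and it assumes each state-change line carries a tag of that form, as the two lines above do.

```shell
# Hypothetical helper: list the distinct NameSystem operations logged on
# lines that mention a given file id (for example a WAL UUID).
#   $1 = id to search for, $2 = NameNode log file
nn_ops_for() {
  local id="$1" log="$2"
  # -F matches the id literally; -a treats the log as text.
  grep -a -F -- "$id" "$log" | grep -o 'NameSystem\.[A-Za-z]*' | sort -u
}
```

Run against the log excerpt above, it would list only `NameSystem.allocateBlock` and `NameSystem.fsync`, which is what makes the missing file surprising.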

So, the NN seems to be making the file, but it's not there when we go to look!

Here's my hdfs-site.xml file:

{noformat}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
      <name>dfs.replication</name>
      <value>1</value>
  </property>
  <property>
      <name>dfs.name.dir</name>
      <value>/local/ecn/data/hadoop/nn</value>
  </property>
  <property>
      <name>dfs.data.dir</name>
      <value>/disk01/data/hadoop/dn,/disk02/data/hadoop/dn,/disk03/data/hadoop/dn</value>
  </property>
  <property>
      <name>dfs.support.append</name>
      <value>true</value>
  </property>
  <property>
      <name>dfs.data.synconclose</name>
      <value>true</value>
  </property>
</configuration>
{noformat}
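
When comparing this setup against another node, it can help to flatten the file into `name=value` lines. A minimal sketch, assuming each `<name>` and `<value>` element sits on its own line as in the file above; `list_hadoop_props` is a hypothetical name, and this is line-oriented text processing rather than real XML parsing.

```shell
# Hypothetical helper: print name=value for each property in a Hadoop
# site file. Assumes <name> and <value> are each on their own line.
list_hadoop_props() {
  # Splitting on < and > puts the element text in field 3.
  awk -F'[<>]' '
    /<name>/  { name = $3 }
    /<value>/ { print name "=" $3 }
  ' "$1"
}
```

For the file above, `list_hadoop_props hdfs-site.xml` would include lines such as `dfs.support.append=true` and `dfs.data.synconclose=true`.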

I wrote an integration test and added it to RestartIT.java, but it does not seem to fail in
the same way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
