accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1685) bench testing shows that the NN loses the WAL
Date Wed, 04 Sep 2013 17:40:52 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758013#comment-13758013 ]

Eric Newton commented on ACCUMULO-1685:
---------------------------------------

It's the GC:

{noformat}
2013-09-04 13:38:20,353 [util.MetadataTableUtil] INFO : Setting range to (-inf,~ : [] 9223372036854775807 false)
2013-09-04 13:38:20,358 [gc.GarbageCollectWriteAheadLogs] INFO : 1 log entries scanned in 0.02 seconds
2013-09-04 13:38:20,364 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://somehost:9000/accumulo/wal/localhost+9997/ffb89027-b28f-4509-9a72-c3d08d1f31ac
{noformat}

Ugh!
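
A correlation like this can be checked mechanically. The following is a minimal sketch, not part of Accumulo: it assumes only the two log line shapes quoted in this message ("Got new write-ahead log: ..." from the tserver and "Removing WAL for offline server ..." from the GC), and the names CREATED, REMOVED, wal_ids and created_then_removed are hypothetical. It reports WAL ids that the tserver created and the GC later removed.

```python
import re

# Assumption: WAL file names are UUIDs, as in the log excerpts in this message.
WAL_ID = r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"

# Assumption: these message shapes are taken from the quoted logs only;
# other Accumulo versions may word them differently.
CREATED = re.compile(r"Got new write-ahead log:\s+\S*/wal/\S+/" + WAL_ID)
REMOVED = re.compile(r"Removing WAL for offline\s+server\s+\S*/wal/\S+/" + WAL_ID)


def wal_ids(pattern, text):
    """Return the set of WAL ids that `pattern` finds in the log text."""
    return set(pattern.findall(text))


def created_then_removed(tserver_log, gc_log):
    """Return sorted WAL ids the tserver created and the GC removed."""
    return sorted(wal_ids(CREATED, tserver_log) & wal_ids(REMOVED, gc_log))
```

Both arguments are the full text of a log file (for example, read with open(path).read()). A non-empty result only flags candidates for inspection; it does not show that a WAL was still needed when it was removed. The two excerpts quoted in this message name different ids (ffb89027... in the GC log, 1dd2727f... in the tserver log), so they would not intersect on their own.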

                
> bench testing shows that the NN loses the WAL
> ---------------------------------------------
>
>                 Key: ACCUMULO-1685
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1685
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: Hadoop 1.0.4, single-node development system
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> For bench testing, I build Accumulo:
> {noformat}
> $ mvn -Pnative package -DskipTests
> {noformat}
> I go into the assembly area, then configure and run Accumulo:
> {noformat}
> $ cd assemble/target/accumulo-1.6.0-SNAPSHOT-dev/accumulo-1.6.0-SNAPSHOT
> $ cp ~/conf/* conf
> $ hadoop fs -rmr /accumulo
> Moved to trash: hdfs://somehost:9000/accumulo
> $ ( echo test ; echo Y ; echo secret ; echo secret ) | ./bin/accumulo init
> $ 2013-09-04 12:23:51,558 [util.Initialize] INFO : Hadoop Filesystem is hdfs://somehost:9000
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Accumulo data dirs are [hdfs://somehost:9000/accumulo]
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Zookeeper server is localhost:2181
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
> Instance name : test
> Instance name "test" exists. Delete existing entry from zookeeper? [Y/N] : Y
> Enter initial password for root (this may not be applicable for your security setup): ******
> Confirm initial password for root: ******
> $ ./bin/start-all.sh 
> Starting monitor on localhost
> Starting tablet servers .... done
> Starting tablet server on localhost
> 2013-09-04 12:26:24,545 [server.Accumulo] INFO : Attempting to talk to zookeeper
> 2013-09-04 12:26:24,675 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
> 2013-09-04 12:26:24,679 [server.Accumulo] INFO : Connected to HDFS
> Starting master on localhost
> Starting garbage collector on localhost
> Starting tracer on localhost
> {noformat}
> Next, create a table
> {noformat}
> $ ./bin/accumulo shell -u root -p secret
> 2013-09-04 12:27:01,628 [shell.Shell] WARN : Specifying a raw password is deprecated.
> Shell - Apache Accumulo Interactive Shell
> - 
> - version: 1.6.0-SNAPSHOT
> - instance name: test
> - instance id: 1967c1ec-cc0f-439b-b4da-4029debd16e3
> - 
> - type 'help' for a list of available commands
> - 
> root@test> createtable t
> root@test t> 
> {noformat}
> Then I checked the tserver log for the write-ahead log created for this update to the root table:
> {noformat}
> $ fgrep -a /wal/ logs/tserver_*.debug.log
> 2013-09-04 12:26:27,130 [log.DfsLogger] DEBUG: Got new write-ahead log: localhost+9997/hdfs://rd6ul-14706v.tycho.ncsc.mil:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> 2013-09-04 12:26:58,264 [tabletserver.Tablet] DEBUG: Logs for memory compacted: !!R<< localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> {noformat}
> Now, let's check for the file:
> {noformat}
> $ hadoop fs -ls hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> ls: Cannot access hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9: No such file or directory.
> {noformat}
> What?
> Check the NN logs:
> {noformat}
> $ fgrep 1dd2727f /some/log/dir/hadoop-ecnewt2-local-namenode-somehost.log 
> 2013-09-04 12:26:27,075 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9. blk_-6011963215434912690_971163
> 2013-09-04 12:26:27,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 for DFSClient_-787226921
> {noformat}
> So, the NN seems to be making the file, but it's not there when we go to look!
> Here's my hdfs-site.xml file:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!-- Put site-specific property overrides in this file. -->
> <configuration>
>   <property>
>       <name>dfs.replication</name>
>       <value>1</value>
>   </property>
>   <property>
>       <name>dfs.name.dir</name>
>       <value>/local/ecn/data/hadoop/nn</value>
>   </property>
>   <property>
>       <name>dfs.data.dir</name>
>       <value>/disk01/data/hadoop/dn,/disk02/data/hadoop/dn,/disk03/data/hadoop/dn</value>
>   </property>
>   <property>
>       <name>dfs.support.append</name>
>       <value>true</value>
>   </property>
>   <property>
>       <name>dfs.data.synconclose</name>
>       <value>true</value>
>   </property>
> </configuration>
> {noformat}
> I have written an integration test that I dumped into RestartIT.java, but that doesn't seem to fail in the same way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
