hadoop-hdfs-user mailing list archives

From Gokulakannan M <gok...@huawei.com>
Subject RE: Lots of Different Kind of Datanode Errors
Date Tue, 08 Jun 2010 05:31:30 GMT
Hi Andy,

What is the reference for that fix?

Thanks,
Gokul


  _____  

From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Tuesday, June 08, 2010 1:24 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Lots of Different Kind of Datanode Errors

 


The current synchronization in FSDataset does not seem quite right. Applying
what amounted to Todd's patch, which modifies FSDataset to use reentrant
read-write locks, cleared up that type of problem for us.
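
For readers unfamiliar with the idea, here is a minimal sketch of what guarding
a block map with a reentrant read-write lock can look like. This is an
illustration only and is not Todd's patch or the real FSDataset code: the class
BlockMapSketch, its methods, and the String paths are hypothetical names chosen
for the example. It only uses java.util.concurrent.locks.ReentrantReadWriteLock
from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a datanode block map; NOT the real FSDataset. */
class BlockMapSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> blocks = new HashMap<Long, String>();

    /** Lookups take the shared read lock, so many readers can proceed at once. */
    String getBlockFile(long blockId) {
        lock.readLock().lock();
        try {
            return blocks.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Mutations take the exclusive write lock. The check-then-put happens
     * under one lock hold, so two writers cannot both create the same block.
     * Returns false if the block already exists.
     */
    boolean addBlock(long blockId, String path) {
        lock.writeLock().lock();
        try {
            // Reentrancy: the thread holding the write lock may also
            // acquire the read lock inside getBlockFile without deadlock.
            if (getBlockFile(blockId) != null) {
                return false;
            }
            blocks.put(blockId, path);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The point of the split is that frequent read-only lookups no longer serialize
behind one monitor, while writes stay exclusive. Note that the reverse nesting
(taking the write lock while holding only the read lock) is not supported by
ReentrantReadWriteLock and would block forever.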

 

  - Andy


From: Jeff Whiting <jeffw@qualtrics.com>
Subject: Re: Lots of Different Kind of Datanode Errors
To: hdfs-user@hadoop.apache.org
Date: Monday, June 7, 2010, 10:02 AM

Thanks for the replies.  I have turned off swap on all the machines to
prevent any swap problems.  I was pounding my hard drives quite hard.  I had
a simulated 60 clients loading data as fast as I could into hbase with a map
reduce export job going at the same time.  Would that scenario explain some
of the errors I was seeing?

Over the weekend, under more of a normal load, I haven't seen any exceptions
except for about 6 of these:
2010-06-05 03:46:41,229 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.0.98:50010, storageID=DS-1806250311-192.168.0.98-50010-1274208294562, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-1677111232590888964_4471547 is valid, and cannot be written to.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:999)

The reason the config shows 4096 is that I increased the xceiver count
after the first email message in this thread.

~Jeff

Allen Wittenauer wrote:

On Jun 4, 2010, at 12:03 PM, Todd Lipcon wrote:

Hi Jeff,

That seems like a reasonable config, but the error message you pasted
indicated xceivers was set to 2048 instead of 4096.

Also, in my experience SocketTimeoutExceptions are usually due to swapping.
Verify that your machines aren't swapping when you're under load.

Or doing any other heavy disk IO.


-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com

 

