hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Data loss on EMR cluster running Hadoop and Hive
Date Tue, 04 Sep 2012 17:40:28 GMT
Max, 
Yes, you will get better performance if your data is on HDFS (local/ephemeral) versus S3.


I'm not sure why you couldn't see the bad block. 
Next time this happens, try running a hadoop fsck from the NameNode. 
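Something like the following, run on the NameNode, will list each file's blocks and replica locations and flag anything MISSING or CORRUPT. The path is just an example (point it at the Hive table's warehouse directory); the guard is only so the sketch degrades gracefully on a machine without the hadoop client:

```shell
# Sketch only -- assumes the hadoop client is on PATH and can reach the
# NameNode. -files -blocks -locations prints block IDs and the DataNodes
# holding each replica, and marks files HEALTHY, CORRUPT, or MISSING.
hadoop fsck /user/hive/warehouse/my_table -files -blocks -locations \
  || echo "hadoop client not available on this machine"
```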

The reason I suggested running against S3 is that, while slower, it's still faster
than copying the data to local disk, running the job, and then pushing the results back to S3.
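That is, reading S3 directly skips the copy-in/copy-out steps below (bucket and paths are made up; s3n:// was the usual scheme on 0.20-era EMR; the guards just keep the sketch harmless off-cluster):

```shell
# The pattern that reading/writing S3 directly from the Hive job avoids:
hadoop distcp s3n://my-bucket/input /data/input \
  || echo "distcp needs a live cluster"           # pull input down to HDFS
# ... run the Hive job against /data/input, writing to /data/output ...
hadoop distcp /data/output s3n://my-bucket/output \
  || echo "distcp needs a live cluster"           # push results back to S3
```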


Again, I would suggest that you try and contact support from AWS.
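FWIW, the numbers in the DataNode log you quoted do line up with a truncated replica rather than a read at a bogus offset:

```shell
# The failed read asked for offset 134217727, length 1. That offset is
# 128*1024*1024 - 1, i.e. the last byte of a full 128 MB HDFS block -- so the
# reader believed the block was full-sized. The replica on disk was only
# 120152064 bytes, leaving the request well past end-of-block.
echo $((128 * 1024 * 1024 - 1))    # the offset in the error message
echo $((134217727 - 120152064))    # bytes the replica is short
```

Which is consistent with the block having been written (or recorded) as a full block but the on-disk replica ending up short.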

HTH

-Mike

On Sep 4, 2012, at 12:08 PM, Max Hansmire <hansmire@gmail.com> wrote:

> Especially since I am reading the file with a Map-Reduce job in the
> next step, I am not sure it makes sense, performance-wise, to put the
> file on S3. I have not tested it, but my suspicion is that local disk
> reads on HDFS would outperform reading and writing the file on S3.
> 
> This is a bad block in HDFS, not in the underlying filesystem. I
> thought HDFS was supposed to be tolerant of native file system
> failures.
> 
> Max
> 
> On Tue, Sep 4, 2012 at 12:43 PM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> Next time, try reading and writing to S3 directly from your hive job.
>> 
>> Not sure why the block was bad... What did the AWS folks have to say?
>> 
>> -Mike
>> 
>> On Sep 4, 2012, at 11:30 AM, Max Hansmire <hansmire@gmail.com> wrote:
>> 
>>> I ran into an issue yesterday where one of the blocks on HDFS seems to
>>> have gone away. I would appreciate any help that you can provide.
>>> 
>>> I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running
>>> hadoop version 0.20.205 and hive version 0.8.1.
>>> 
>>> I have a hive table that is written out in the reduce step of a map
>>> reduce job created by hive. This step completed with no errors, but
>>> the next map-reduce job that tries to read it failed with the
>>> following error message.
>>> 
>>> "Caused by: java.io.IOException: No live nodes contain current block"
>>> 
>>> I ran hadoop fs -cat on the same file and got the same error.
>>> 
>>> Looking more closely at the data and name node logs, I see this error
>>> for the same problem block. It comes from the data node when trying to
>>> serve the data.
>>> 
>>> 2012-09-03 11:56:05,054 WARN
>>> org.apache.hadoop.hdfs.server.datanode.DataNode
>>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>>> DatanodeRegistration(10.193.39.159:9200,
>>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>>> infoPort=9102, ipcPort=9201):sendBlock() :  Offset 134217727 and
>>> length 1 don't match block blk_-7100869813617535842_5426 ( blockLen
>>> 120152064 )
>>> 2012-09-03 11:56:05,054 WARN
>>> org.apache.hadoop.hdfs.server.datanode.DataNode
>>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>>> DatanodeRegistration(10.193.39.159:9200,
>>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>>> infoPort=9102, ipcPort=9201):Got exception while serving
>>> blk_-7100869813617535842_5426 to /10.96.57.112:
>>> java.io.IOException:  Offset 134217727 and length 1 don't match block
>>> blk_-7100869813617535842_5426 ( blockLen 120152064 )
>>>      at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
>>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>>      at java.lang.Thread.run(Thread.java:662)
>>> 
>>> 2012-09-03 11:56:05,054 ERROR
>>> org.apache.hadoop.hdfs.server.datanode.DataNode
>>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>>> DatanodeRegistration(10.193.39.159:9200,
>>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>>> infoPort=9102, ipcPort=9201):DataXceiver
>>> java.io.IOException:  Offset 134217727 and length 1 don't match block
>>> blk_-7100869813617535842_5426 ( blockLen 120152064 )
>>>      at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
>>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>>      at java.lang.Thread.run(Thread.java:662)
>>> 
>>> Unfortunately the EMR cluster that had the data on it has since been
>>> terminated. I have access to the logs, but I can't run an fsck. I can
>>> provide more detailed stack traces etc. if you think it would be
>>> helpful. Rerunning my process to re-generate the corrupted block
>>> resolved the issue.
>>> 
>>> I would really appreciate it if anyone has a reasonable explanation of
>>> what happened and how to avoid it in the future.
>>> 
>>> Max
>>> 
>> 
> 

