hadoop-user mailing list archives

From Max Hansmire <hansm...@gmail.com>
Subject Re: Data loss on EMR cluster running Hadoop and Hive
Date Tue, 04 Sep 2012 17:08:37 GMT
Especially since I am reading the file with a Map-Reduce job in the
next step, I am not sure that it makes sense in terms of performance
to put the file on S3. I have not tested it, but my suspicion is that
local disk reads on HDFS would outperform reading and writing the
file to S3.

This was a bad block on HDFS, not a failure of the underlying file
system. I thought that HDFS was supposed to be tolerant of native
file system failures.
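
On that point, here is another untested sketch (path and class name
made up) of how I would check how many datanodes actually hold each
block of a file; if replication is greater than one, a single bad
local disk should not be enough to lose a block.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockReplicas {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path p = new Path("hdfs:///user/hive/warehouse/my_table/part-00000");
    FileSystem fs = FileSystem.get(p.toUri(), conf);

    FileStatus status = fs.getFileStatus(p);
    System.out.println("replication factor: " + status.getReplication());

    // One entry per block, with the datanodes that report a replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + " length " + block.getLength()
          + " hosts " + Arrays.toString(block.getHosts()));
    }
  }
}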

Max

On Tue, Sep 4, 2012 at 12:43 PM, Michael Segel
<michael_segel@hotmail.com> wrote:
> Next time, try reading and writing to S3 directly from your hive job.
>
> Not sure why the block was bad... What did the AWS folks have to say?
>
> -Mike
>
> On Sep 4, 2012, at 11:30 AM, Max Hansmire <hansmire@gmail.com> wrote:
>
>> I ran into an issue yesterday where one of the blocks on HDFS seems to
>> have gone away. I would appreciate any help that you can provide.
>>
>> I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running
>> hadoop version 0.20.205 and hive version 0.8.1.
>>
>> I have a Hive table that is written out in the reduce step of a
>> map-reduce job created by Hive. That step completed with no errors,
>> but the next map-reduce job that tried to read it failed with the
>> following error message.
>>
>> "Caused by: java.io.IOException: No live nodes contain current block"
>>
>> I ran hadoop fs -cat on the same file and got the same error.
>>
>> Looking more closely at the data node and name node logs, I see this
>> error for the same problem block. It is in the data node log, written
>> when the data node tries to serve the block.
>>
>> 2012-09-03 11:56:05,054 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode
>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>> DatanodeRegistration(10.193.39.159:9200,
>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>> infoPort=9102, ipcPort=9201):sendBlock() :  Offset 134217727 and
>> length 1 don't match block blk_-7100869813617535842_5426 ( blockLen
>> 120152064 )
>> 2012-09-03 11:56:05,054 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode
>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>> DatanodeRegistration(10.193.39.159:9200,
>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>> infoPort=9102, ipcPort=9201):Got exception while serving
>> blk_-7100869813617535842_5426 to /10.96.57.112:
>> java.io.IOException:  Offset 134217727 and length 1 don't match block
>> blk_-7100869813617535842_5426 ( blockLen 120152064 )
>>       at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
>>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>       at java.lang.Thread.run(Thread.java:662)
>>
>> 2012-09-03 11:56:05,054 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode
>> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
>> DatanodeRegistration(10.193.39.159:9200,
>> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
>> infoPort=9102, ipcPort=9201):DataXceiver
>> java.io.IOException:  Offset 134217727 and length 1 don't match block
>> blk_-7100869813617535842_5426 ( blockLen 120152064 )
>>       at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
>>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>       at java.lang.Thread.run(Thread.java:662)
>>
>> Unfortunately the EMR cluster that had the data on it has since been
>> terminated. I have access to the logs, but I can't run an fsck. I can
>> provide more detailed stack traces etc. if you think it would be
>> helpful. Rerunning my process, which regenerated the corrupted block,
>> resolved the issue.
>>
>> I would really appreciate it if anyone has a reasonable explanation
>> of what happened and how to avoid it in the future.
>>
>> Max
>>
>
