hadoop-common-user mailing list archives

From Da Zheng <zhengda1...@gmail.com>
Subject a bug? inputsplit cannot return the location correctly
Date Tue, 30 Nov 2010 00:32:16 GMT

This might not be the right mailing list for this question, but I cannot 
send emails to the MapReduce mailing list, and I don't know why.

I have a 6-node cluster with 25GB of data stored in HDFS. I ran a 
simple MapReduce program and used mapred.reduce.slowstart.completed.maps 
to delay the execution of the reducers, so that only mappers run during 
the map phase. Normally there shouldn't be much network traffic while 
only mappers are running. However, I can see that almost 25GB of data 
is transmitted over the network.

So I printed every split, the file it points to, and the nodes where it 
is located at the time the InputSplits are generated. I also printed the 
same information for each split when a RecordReader is initialized. To my 
surprise, the InputSplit (in my case, a FileSplit) passed to the 
RecordReader doesn't have any location information. That would seem to 
explain why Hadoop cannot take data locality into account when launching 
mapper tasks.
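
For reference, the kind of logging described above could look roughly like 
the sketch below. It is only an illustration, not the actual code used: the 
class name SplitLocationLogger, the describe helper, the file path and the 
host names are all hypothetical. In a real job the arguments would be taken 
from FileSplit.getPath(), getStart(), getLength() and getLocations(); the 
sketch uses plain values so that it runs without Hadoop on the classpath.

```java
public class SplitLocationLogger {

    /**
     * Formats one log line for a split.
     * Assumption: in a real job these values would come from
     * FileSplit.getPath(), getStart(), getLength() and getLocations().
     */
    public static String describe(String stage, String path,
                                  long start, long length, String[] hosts) {
        // Treat a null host array like an empty one, so both cases
        // show up clearly as "no location information".
        String locations = (hosts == null || hosts.length == 0)
                ? "<none>"
                : String.join(",", hosts);
        return String.format("[%s] %s:%d+%d hosts=%s",
                stage, path, start, length, locations);
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for a split as seen when
        // the splits are generated (locations present).
        System.out.println(describe("getSplits", "/data/input.txt",
                0L, 67108864L, new String[] {"node1", "node2"}));

        // The same split as seen on the task side, where the
        // location array comes back empty.
        System.out.println(describe("RecordReader", "/data/input.txt",
                0L, 67108864L, new String[0]));
    }
}
```

Calling the same helper at both points makes it easy to compare the two 
sets of log lines side by side and spot splits whose host list has become 
empty.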

It looks like a bug to me. I use Hadoop v0.20.2. Has anyone experienced 
a similar problem?

