hadoop-common-user mailing list archives

From Siddharth Karandikar <siddharth.karandi...@gmail.com>
Subject Re: newbie - job failing at reduce
Date Wed, 30 Jun 2010 16:40:14 GMT
Yeah, SSH is working as described in the docs. And the directory
specified for 'mapred.local.dir' has enough free space.
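
For the record, this is roughly how I verified both (the node names are
placeholders for my three machines):

    # passwordless SSH from the master to each node
    for h in node1 node2 node3; do ssh $h 'hostname && echo ok'; done

    # free space on the partition holding mapred.local.dir
    df -h /tmp/mapred/local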

- Siddharth

On Wed, Jun 30, 2010 at 10:01 PM, Chris Collord <ccollord@lanl.gov> wrote:
> Interesting that the reduce phase makes it that far before failing!
> Are you able to SSH (without a password) into the failing node?  Any
> possible folder permissions issues?
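> For example, from the master (the hostname is a placeholder):
>
>    ssh failing-node 'hostname'                   # should not prompt for a password
>    ssh failing-node 'ls -ld /tmp/mapred/local'   # check owner and permissions
>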
> ~Chris
>
> On 06/30/2010 10:26 AM, Siddharth Karandikar wrote:
>>
>> Hey Chris,
>> Thanks for your inputs. I have tried most of this already, but I will
>> surely go through the tutorial you pointed out. Maybe I will find some
>> hint there.
>>
>> Interestingly, while experimenting with it more, I noticed that with a
>> small input file (~50 MB) the job works perfectly fine. With a bigger
>> input, it starts hanging at the reduce tasks. The map phase always
>> finishes at 100%.
>>
>> - Siddharth
>>
>>
>> On Wed, Jun 30, 2010 at 9:11 PM, Chris Collord<ccollord@lanl.gov>  wrote:
>>
>>>
>>> Hi Siddharth,
>>> I'm VERY new to this myself, but here are a few thoughts (since nobody
>>> else
>>> is responding!).
>>> -You might want to set dfs.replication to 2.  I have read that clusters
>>> with fewer than 8 nodes should use a replication factor of 2, and 8+ node
>>> clusters should use 3.  This may make your cluster work, but it won't fix
>>> your problem.  (Commands for this and the checks below are sketched after
>>> this list.)
>>> -Run a "bin/hadoop dfsadmin -report" with the hadoop cluster running and
>>> see
>>> what it shows for your failing node.
>>> -Check your logs/ folder for "datanode" logs and see if there's anything
>>> useful in there before the error you're getting.
>>> -You might try reformatting your hdfs, if you don't have anything
>>> important in there: "bin/hadoop namenode -format".  (Note: this has
>>> caused problems for me in the past with namenode IDs; see the bottom of
>>> Michael Noll's tutorial, linked below, if that happens.)
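>>>
>>> Roughly, the commands I mean (just a sketch - adjust paths to your
>>> layout; note that changing dfs.replication in hdfs-site.xml only affects
>>> newly written files, so existing files need setrep):
>>>
>>>   bin/hadoop dfsadmin -report                 # per-datanode capacity/state
>>>   grep -i 'error\|warn' logs/hadoop-*-datanode-*.log | tail   # recent datanode trouble
>>>   bin/hadoop fs -setrep -R 2 /                # lower replication of existing files
>>>   bin/hadoop namenode -format                 # full reformat - wipes HDFS!
>>>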
>>>
>>> You should check out Michael Noll's tutorial for all the little details:
>>>
>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>
>>> Let me know if anything helps!
>>> ~Chris
>>>
>>>
>>>
>>> On 06/30/2010 04:02 AM, Siddharth Karandikar wrote:
>>>
>>>>
>>>> Anyone?
>>>>
>>>>
>>>> On Tue, Jun 29, 2010 at 8:41 PM, Siddharth Karandikar
>>>> <siddharth.karandikar@gmail.com>    wrote:
>>>>
>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I am new to Hadoop, but by reading the online docs and other resources
>>>>> I have moved ahead and am now trying to run a cluster of 3 nodes.
>>>>> Before doing this, I tried my program in standalone and pseudo-distributed
>>>>> modes, and it works fine there.
>>>>>
>>>>> Now the issue that I am facing: the map phase works correctly, but
>>>>> while reducing I am seeing the following error on one of the nodes -
>>>>>
>>>>> 2010-06-29 14:35:01,848 WARN org.apache.hadoop.mapred.TaskTracker:
>>>>> getMapOutput(attempt_201006291958_0001_m_000008_0,0) failed :
>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>> taskTracker/jobcache/job_201006291958_0001/attempt_201006291958_0001_m_000008_0/output/file.out.index
>>>>> in any of the configured local directories
>>>>>
>>>>> Let's say this is on Node1. But there is no such directory named
>>>>> 'taskTracker/jobcache/job_201006291958_0001/attempt_201006291958_0001_m_000008_0'
>>>>> under /tmp/mapred/local/taskTracker/ on Node1. Interestingly, this
>>>>> directory is available on Node2 (or Node3). I have tried running the
>>>>> job multiple times, but it always fails while reducing, with the same
>>>>> error.
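>>>>>
>>>>> (This is roughly how I compared the nodes - node names are placeholders:
>>>>>
>>>>>   for h in node1 node2 node3; do
>>>>>     ssh $h 'hostname; ls /tmp/mapred/local/taskTracker/jobcache 2>/dev/null'
>>>>>   done
>>>>> )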
>>>>>
>>>>> I have configured /tmp/mapred/local on each node from mapred-site.xml.
>>>>>
>>>>> I really don't understand why the mappers are misplacing these files.
>>>>> Or am I missing something in the configuration?
>>>>>
>>>>> If someone wants to look at the configurations, I have pasted them below.
>>>>>
>>>>> Thanks,
>>>>> Siddharth
>>>>>
>>>>>
>>>>> Configurations
>>>>> ==========
>>>>>
>>>>> conf/core-site.xml
>>>>> ---------------------------
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>> <configuration>
>>>>>  <property>
>>>>>    <name>fs.default.name</name>
>>>>>    <value>hdfs://192.168.2.115/</value>
>>>>>  </property>
>>>>> </configuration>
>>>>>
>>>>>
>>>>> conf/hdfs-site.xml
>>>>> --------------------------
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>> <configuration>
>>>>>  <property>
>>>>>    <name>fs.default.name</name>
>>>>>    <value>hdfs://192.168.2.115</value>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>dfs.data.dir</name>
>>>>>    <value>/home/siddharth/hdfs/data</value>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>dfs.name.dir</name>
>>>>>    <value>/home/siddharth/hdfs/name</value>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>dfs.replication</name>
>>>>>    <value>3</value>
>>>>>  </property>
>>>>> </configuration>
>>>>>
>>>>> conf/mapred-site.xml
>>>>> ------------------------------
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>> <configuration>
>>>>>  <property>
>>>>>    <name>mapred.job.tracker</name>
>>>>>    <value>192.168.2.115:8021</value>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>mapred.local.dir</name>
>>>>>    <value>/tmp/mapred/local</value>
>>>>>    <final>true</final>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>mapred.system.dir</name>
>>>>>    <value>hdfs://192.168.2.115/maperdsystem</value>
>>>>>    <final>true</final>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>>    <value>4</value>
>>>>>    <final>true</final>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>>    <value>4</value>
>>>>>    <final>true</final>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>mapred.child.java.opts</name>
>>>>>    <value>-Xmx512m</value>
>>>>>    <!-- Not marked as final so jobs can include JVM debugging options -->
>>>>>  </property>
>>>>> </configuration>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> ------------------------------
>>> Chris Collord, ACS-PO 9/80 A
>>> ------------------------------
>>>
>>>
>>>
>
>
> --
> ------------------------------
> Chris Collord, ACS-PO 9/80 A
> ------------------------------
>
>
