From: Raghava Mutharaju <m.vijayaraghava@gmail.com>
Date: Sun, 18 Apr 2010 04:24:43 -0400
Subject: Re: Reduce gets struck at 99%
To: common-user@hadoop.apache.org, Ted Yu
Cc: mapreduce-user@hadoop.apache.org

Hi,

        Thank you Ted. I would just describe the problem again, so that it is easier for anyone reading this email chain.

I run a series of jobs one after another. Starting from the 4th job, the Reducer gets stuck at 99% (Map 100% and Reduce 99%). It stays stuck at 99% for many hours and then the job fails. Earlier there were 2 exceptions in the logs --- a DFSClient exception (could not completely write into a file <file name>) and a LeaseExpiredException. Then I increased ulimit -n (max no. of open files) from 1024 to 32768 on the advice of Ted. After this, there are no exceptions in the logs, but the reduce still gets stuck at 99%.

Do you have any suggestions?

Thank you.

Regards,
Raghava.
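The workflow described above is a chain of MapReduce jobs run back to back, each stage reading the previous stage's output. A rough sketch of that pattern, assuming the Hadoop 0.20 mapreduce API (the class name, stage names and paths below are illustrative, not taken from the actual workload):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: run several MR jobs in sequence and stop the chain
// as soon as one stage fails instead of silently moving on.
public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        for (int stage = 1; stage <= 4; stage++) {
            Job job = new Job(conf, "stage-" + stage);
            job.setJarByClass(ChainedJobsDriver.class);
            // ... set the mapper, reducer and key/value classes for this stage ...
            Path output = new Path(args[1] + "/stage-" + stage);
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);
            if (!job.waitForCompletion(true)) {
                System.err.println("stage-" + stage + " failed, stopping the chain");
                System.exit(1);
            }
            input = output;   // the next stage reads this stage's output
        }
    }
}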
On Sat, Apr 17, 2010 at 9:36 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Hi,
> Putting this thread back in the pool to leverage collective intelligence.
>
> If you get the full command line of the java processes, it wouldn't be
> difficult to correlate reduce task(s) with a particular job.
>
> Cheers
>
> On Sat, Apr 17, 2010 at 2:20 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
>
> > Hello Ted,
> >
> >        Thank you for the suggestions :). I haven't come across any other
> > serious issue before this one. In fact, the same MR job runs for a smaller
> > input size, although a lot slower than what we expected.
> >
> > I will use jstack to get a stack trace. I had a question in this regard. How
> > would I know which MR job (job id) is related to which java process (pid)? I
> > can get a list of hadoop jobs with "hadoop job -list" and a list of java
> > processes with "jps", but I couldn't determine how to get the connection
> > between these 2 lists.
> >
> > Thank you again.
> >
> > Regards,
> > Raghava.
> >
> > On Fri, Apr 16, 2010 at 11:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> If you look at
> >> https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12408776,
> >> you can see that hdfs-127-branch20-redone-v2.txt
> >> (https://issues.apache.org/jira/secure/attachment/12431012/hdfs-127-branch20-redone-v2.txt)
> >> was the latest.
> >>
> >> You need to download the source code corresponding to your version of
> >> hadoop, apply the patch and rebuild.
> >>
> >> If you haven't experienced serious issues with hadoop in other scenarios,
> >> we should try to find out the root cause of the current problem without
> >> the 127 patch.
> >>
> >> My advice is to use jstack to find out what each thread is waiting for
> >> after the reducers get stuck.
> >> I would expect a deadlock in either your code or HDFS; I would think it
> >> would be the former.
> >>
> >> You can replace sensitive names in the stack traces and paste them if you
> >> cannot determine the deadlock.
> >>
> >> Cheers
> >>
> >> On Fri, Apr 16, 2010 at 5:46 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>
> >>> Hello Ted,
> >>>
> >>>        Thank you for the reply. Will this change fix my issue? I ask
> >>> because I again need to convince my admin to make this change.
> >>>
> >>>        We have a gateway to the cluster-head. We generally run our MR jobs
> >>> on the gateway. Should this change be made to the hadoop installation on
> >>> the gateway?
> >>>
> >>> 1) I am confused about which patch should be applied. There are 4 patches
> >>> listed at https://issues.apache.org/jira/browse/HDFS-127
> >>>
> >>> 2) How do I apply the patch? Should we change the lines of code specified
> >>> and rebuild hadoop? Or is there any other way?
> >>>
> >>> Thank you again.
> >>>
> >>> Regards,
> >>> Raghava.
> >>>
> >>> On Fri, Apr 16, 2010 at 6:42 PM, <yuzhihong@gmail.com> wrote:
> >>>
> >>>> That patch is very important.
> >>>>
> >>>> please apply it.
> >>>>
> >>>> Sent from my Verizon Wireless BlackBerry
> >>>> ------------------------------
> >>>> *From: * Raghava Mutharaju <m.vijayaraghava@gmail.com>
> >>>> *Date: *Fri, 16 Apr 2010 17:27:11 -0400
> >>>> *To: *Ted Yu <yuzhihong@gmail.com>
> >>>> *Subject: *Re: Reduce gets struck at 99%
> >>>>
> >>>> Hi Ted,
> >>>>
> >>>>         It took some time to contact my department's admin (he was on
> >>>> leave) and ask him to make the ulimit changes effective in the cluster
> >>>> (just adding an entry in /etc/security/limits.conf was not sufficient,
> >>>> so it took some time to figure out). Now the ulimit is 32768. I ran the
> >>>> set of MR jobs, and the result is the same --- it gets stuck at Reduce
> >>>> 99%. But this time, there are no exceptions in the logs. I view the
> >>>> JobTracker logs through the Web UI. I checked "Running Jobs" as well as
> >>>> "Failed Jobs".
> >>>>
> >>>> I haven't asked the admin to apply the patch
> >>>> https://issues.apache.org/jira/browse/HDFS-127 that you mentioned
> >>>> earlier. Is this important?
> >>>>
> >>>> Do you have any suggestions?
> >>>>
> >>>> Thank you.
> >>>>
> >>>> Regards,
> >>>> Raghava.
> >>>>
> >>>> On Fri, Apr 9, 2010 at 3:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>>> For the user under whom you launch MR jobs.
> >>>>>
> >>>>> On Fri, Apr 9, 2010 at 12:02 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Ted,
> >>>>>>
> >>>>>>        Sorry to bug you again :) but I do not have an account on all
> >>>>>> the datanodes; I just have one on the machine on which I start the MR
> >>>>>> jobs. So is it required to increase the ulimit on all the nodes (in
> >>>>>> which case the admin may have to increase it for all the users)?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Raghava.
> >>>>>>
> >>>>>> On Fri, Apr 9, 2010 at 11:43 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>
> >>>>>>> ulimit should be increased on all nodes.
> >>>>>>>
> >>>>>>> The link I gave you lists several actions to take. I think they're
> >>>>>>> not specifically for hbase.
> >>>>>>> Also make sure the following is applied:
> >>>>>>> https://issues.apache.org/jira/browse/HDFS-127
> >>>>>>>
> >>>>>>> On Thu, Apr 8, 2010 at 10:13 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hello Ted,
> >>>>>>>>
> >>>>>>>>        Should the increase in ulimit to 32768 be applied on all the
> >>>>>>>> datanodes (it's a 16-node cluster)? Is this related to HBase? I ask
> >>>>>>>> because I am not using HBase.
> >>>>>>>>        Are the exceptions & delay (at Reduce 99%) due to this?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Raghava.
> >>>>>>>>
> >>>>>>>> On Fri, Apr 9, 2010 at 1:01 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Your ulimit is low.
> >>>>>>>>> Ask your admin to increase it to 32768.
> >>>>>>>>>
> >>>>>>>>> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting, item #6
> >>>>>>>>>
> >>>>>>>>> On Thu, Apr 8, 2010 at 9:46 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Ted,
> >>>>>>>>>>
> >>>>>>>>>> I am pasting below the timestamps from the log.
> >>>>>>>>>>
> >>>>>>>>>> Lease exception:
> >>>>>>>>>>
> >>>>>>>>>> Task attempt: attempt_201004060646_0057_r_000014_0
> >>>>>>>>>> Machine: /default-rack/nimbus15
> >>>>>>>>>> Status: FAILED    Progress: 0.00%
> >>>>>>>>>> Start time: 8-Apr-2010 07:38:53
> >>>>>>>>>> Shuffle finished: 8-Apr-2010 07:39:21 (27sec)
> >>>>>>>>>> Sort finished: 8-Apr-2010 07:39:21 (0sec)
> >>>>>>>>>> Finish time: 8-Apr-2010 09:54:33 (2hrs, 15mins, 39sec)
> >>>>>>>>>>
> >>>>>>>>>> -------------------------------------
> >>>>>>>>>>
> >>>>>>>>>> DFS Client Exception:
> >>>>>>>>>>
> >>>>>>>>>> Task attempt: attempt_201004060646_0057_r_000006_0
> >>>>>>>>>> Machine: /default-rack/nimbus3.cs.wright.edu
> >>>>>>>>>> Status: FAILED    Progress: 0.00%
> >>>>>>>>>> Start time: 8-Apr-2010 07:38:47
> >>>>>>>>>> Shuffle finished: 8-Apr-2010 07:39:10 (23sec)
> >>>>>>>>>> Sort finished: 8-Apr-2010 07:39:11 (0sec)
> >>>>>>>>>> Finish time: 8-Apr-2010 08:51:33 (1hrs, 12mins, 46sec)
> >>>>>>>>>>
> >>>>>>>>>> ------------------------------------------
> >>>>>>>>>>
> >>>>>>>>>> The file limit is set to 1024. I checked a couple of datanodes; I
> >>>>>>>>>> haven't checked the headnode though.
> >>>>>>>>>>
> >>>>>>>>>> The number of currently open files under my username, on the system
> >>>>>>>>>> on which I started the MR jobs, is 346.
> >>>>>>>>>>
> >>>>>>>>>> Thank you for your help :)
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Raghava.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Apr 9, 2010 at 12:14 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Can you give me the timestamps of the two exceptions?
> >>>>>>>>>>> I want to see if they're related.
> >>>>>>>>>>>
> >>>>>>>>>>> I saw DFSClient$DFSOutputStream.close() in the first stack trace.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Apr 8, 2010 at 9:09 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Just to double check it's not a file limits issue, could you run
> >>>>>>>>>>>> the following on each of the hosts:
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ ulimit -a
> >>>>>>>>>>>> $ lsof | wc -l
> >>>>>>>>>>>>
> >>>>>>>>>>>> The first command will show you (among other things) the file
> >>>>>>>>>>>> limits; it should be above the default 1024. The second will tell
> >>>>>>>>>>>> you how many files are currently open...
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Apr 8, 2010 at 7:40 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Ted,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         Thank you for all the suggestions. I went through the
> >>>>>>>>>>>>> job tracker logs and I have attached the exceptions found in the
> >>>>>>>>>>>>> logs. I found two exceptions:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> >>>>>>>>>>>>> Could not complete write to file    (DFS Client)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) org.apache.hadoop.ipc.RemoteException:
> >>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> >>>>>>>>>>>>> /user/raghava/MR_EL/output/_temporary/_attempt_201004060646_0057_r_000014_0/part-r-00014
> >>>>>>>>>>>>> File does not exist. Holder DFSClient_attempt_201004060646_0057_r_000014_0
> >>>>>>>>>>>>> does not have any open files.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The exception occurs at the point of writing out <K,V> pairs in
> >>>>>>>>>>>>> the reducer and it occurs only in certain task attempts.
> >>>>>>>>>>>>> I am not using any custom output format or record writers, but
> >>>>>>>>>>>>> I do use a custom input reader.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What could have gone wrong here?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Raghava.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Apr 8, 2010 at 5:51 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Raghava:
> >>>>>>>>>>>>>> Are you able to share the last segment of the reducer log?
> >>>>>>>>>>>>>> You can get it from the web UI:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> http://snv-it-lin-012.pr.com:50060/tasklog?taskid=attempt_201003221148_1211_r_000003_0&start=-8193
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Adding more logging in your reducer task would help pinpoint
> >>>>>>>>>>>>>> where the issue is.
> >>>>>>>>>>>>>> Also look in the job tracker log.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Apr 8, 2010 at 2:46 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> > Hi Ted,
> >>>>>>>>>>>>>> >
> >>>>>>>>>>>>>> >       Thank you for the suggestion. I enabled it using the
> >>>>>>>>>>>>>> > Configuration class because I cannot change the hadoop-site.xml
> >>>>>>>>>>>>>> > file (I am not an admin). The situation is still the same ---
> >>>>>>>>>>>>>> > it gets stuck at reduce 99% and does not move further.
> >>>>>>>>>>>>>> >
> >>>>>>>>>>>>>> > Regards,
> >>>>>>>>>>>>>> > Raghava.
> >>>>>>>>>>>>>> >
> >>>>>>>>>>>>>> > On Thu, Apr 8, 2010 at 4:40 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>>> >
> >>>>>>>>>>>>>> > > You need to turn it on yourself (hadoop-site.xml):
> >>>>>>>>>>>>>> > >
> >>>>>>>>>>>>>> > > <property>
> >>>>>>>>>>>>>> > >   <name>mapred.reduce.tasks.speculative.execution</name>
> >>>>>>>>>>>>>> > >   <value>true</value>
> >>>>>>>>>>>>>> > > </property>
> >>>>>>>>>>>>>> > >
> >>>>>>>>>>>>>> > > <property>
> >>>>>>>>>>>>>> > >   <name>mapred.map.tasks.speculative.execution</name>
> >>>>>>>>>>>>>> > >   <value>true</value>
> >>>>>>>>>>>>>> > > </property>
> >>>>>>>>>>>>>> > >
> >>>>>>>>>>>>>> > > On Thu, Apr 8, 2010 at 1:14 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>>>>>> > >
> >>>>>>>>>>>>>> > > > Hi,
> >>>>>>>>>>>>>> > > >
> >>>>>>>>>>>>>> > > >      Thank you Eric, Prashant and Greg. Although the
> >>>>>>>>>>>>>> > > > timeout problem was resolved, the reduce is getting stuck
> >>>>>>>>>>>>>> > > > at 99%. As of now, it has been stuck there for about 3 hrs.
> >>>>>>>>>>>>>> > > > That is too high a wait time for my task. Do you guys see
> >>>>>>>>>>>>>> > > > any reason for this?
> >>>>>>>>>>>>>> > > >
> >>>>>>>>>>>>>> > > >      Speculative execution is "on" by default, right? Or
> >>>>>>>>>>>>>> > > > should I enable it?
> >>>>>>>>>>>>>> > > >
> >>>>>>>>>>>>>> > > > Regards,
> >>>>>>>>>>>>>> > > > Raghava.
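For reference, the two speculative-execution properties quoted above can also be set from the job driver when hadoop-site.xml cannot be edited, presumably along the lines of the Configuration-class approach mentioned in this thread. A minimal sketch, assuming the Hadoop 0.20 mapreduce API; the driver class name is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: enable speculative execution programmatically
// when hadoop-site.xml cannot be changed (e.g. no admin access).
public class SpeculativeExecutionDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same keys as the hadoop-site.xml <property> blocks quoted above.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);

        Job job = new Job(conf, "job with speculative execution enabled");
        // ... set the jar, mapper, reducer, and input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}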
> >>>>>>>>>>>>>> > > > On Thu, Apr 8, 2010 at 3:15 PM, Gregory Lawrence <gregl@yahoo-inc.com> wrote:
> >>>>>>>>>>>>>> > > >
> >>>>>>>>>>>>>> > > > > Hi,
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > I have also experienced this problem. Have you tried
> >>>>>>>>>>>>>> > > > > speculative execution?
> >>>>>>>>>>>>>> > > > > Also, I have had jobs that took a long time for one
> >>>>>>>>>>>>>> > > > > mapper / reducer because of a record that was
> >>>>>>>>>>>>>> > > > > significantly larger than those contained in the other
> >>>>>>>>>>>>>> > > > > filesplits. Do you know if it always slows down for the
> >>>>>>>>>>>>>> > > > > same filesplit?
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Regards,
> >>>>>>>>>>>>>> > > > > Greg Lawrence
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > On 4/8/10 10:30 AM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Hello all,
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > >      I got the time out error as mentioned below -- after
> >>>>>>>>>>>>>> > > > > 600 seconds, that attempt was killed and the attempt would
> >>>>>>>>>>>>>> > > > > be deemed a failure. I searched around about this error,
> >>>>>>>>>>>>>> > > > > and one of the suggestions was to include "progress"
> >>>>>>>>>>>>>> > > > > statements in the reducer -- it might be taking longer
> >>>>>>>>>>>>>> > > > > than 600 seconds and so is timing out. I added calls to
> >>>>>>>>>>>>>> > > > > context.progress() and context.setStatus(str) in the
> >>>>>>>>>>>>>> > > > > reducer. Now it works fine -- there are no timeout errors.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > >      But, for a few jobs, it takes an awfully long time to
> >>>>>>>>>>>>>> > > > > move from "Map 100%, Reduce 99%" to Reduce 100%. For some
> >>>>>>>>>>>>>> > > > > jobs it's 15 mins and for some it was more than an hour.
> >>>>>>>>>>>>>> > > > > The reduce code is not complex -- a 2-level loop and a
> >>>>>>>>>>>>>> > > > > couple of if-else blocks. The input size is also not huge;
> >>>>>>>>>>>>>> > > > > the job that gets stuck for an hour at reduce 99% takes in
> >>>>>>>>>>>>>> > > > > 130 files. Some of them are 1-3 MB in size and a couple of
> >>>>>>>>>>>>>> > > > > them are 16 MB in size.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > >      Has anyone encountered this problem before? Any
> >>>>>>>>>>>>>> > > > > pointers? I use Hadoop 0.20.2 on a linux cluster of 16
> >>>>>>>>>>>>>> > > > > nodes.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Thank you.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Regards,
> >>>>>>>>>>>>>> > > > > Raghava.
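A minimal sketch of the progress-reporting pattern described above (periodic context.progress() and context.setStatus() calls inside a long-running reduce), assuming the Hadoop 0.20 mapreduce API; the key/value types and the per-record work are illustrative, not taken from the actual reducer:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: report progress periodically so the task is not
// killed after mapred.task.timeout (600 seconds by default in 0.20).
public class ProgressReportingReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long seen = 0;
        for (LongWritable value : values) {
            sum += value.get();           // placeholder for the real per-record work
            if (++seen % 10000 == 0) {
                context.progress();       // tell the framework the task is still alive
                context.setStatus("key " + key + ": processed " + seen + " values");
            }
        }
        context.write(key, new LongWritable(sum));
    }
}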
> >>>>>>>>>>>>>> > > > > On Thu, Apr 1, 2010 at 2:24 AM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Hi all,
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > >       I am running a series of jobs one after another.
> >>>>>>>>>>>>>> > > > > While executing the 4th job, the job fails. It fails in
> >>>>>>>>>>>>>> > > > > the reducer --- the progress percentage would be map 100%,
> >>>>>>>>>>>>>> > > > > reduce 99%. It gives out the following message:
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > 10/04/01 01:04:15 INFO mapred.JobClient: Task Id :
> >>>>>>>>>>>>>> > > > > attempt_201003240138_0110_r_000018_1, Status : FAILED
> >>>>>>>>>>>>>> > > > > Task attempt_201003240138_0110_r_000018_1 failed to report
> >>>>>>>>>>>>>> > > > > status for 602 seconds. Killing!
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > It makes several more attempts to execute it but fails
> >>>>>>>>>>>>>> > > > > with a similar message. I couldn't get anything from this
> >>>>>>>>>>>>>> > > > > error message and wanted to look at the logs (located in
> >>>>>>>>>>>>>> > > > > the default dir, ${HADOOP_HOME}/logs). But I don't find
> >>>>>>>>>>>>>> > > > > any files which match the timestamp of the job. Also, I
> >>>>>>>>>>>>>> > > > > did not find history and userlogs in the logs folder.
> >>>>>>>>>>>>>> > > > > Should I look at some other place for the logs? What could
> >>>>>>>>>>>>>> > > > > be the possible causes of the above error?
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > >       I am using Hadoop 0.20.2 and I am running it on a
> >>>>>>>>>>>>>> > > > > cluster with 16 nodes.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Thank you.
> >>>>>>>>>>>>>> > > > >
> >>>>>>>>>>>>>> > > > > Regards,
> >>>>>>>>>>>>>> > > > > Raghava.