hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gopi Krishna M <mgo...@gmail.com>
Subject Re: one minute delay in running a simple ls command on hadoop (maybe near security groups..): hadoop 0.23.5
Date Thu, 04 Apr 2013 15:17:11 GMT
Hi Marcos,

executed it from shell.. returns immediately, no delay and the output is
just hduser..

hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser

one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*.  This looks like some timeout.
 Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output.  Not clear if it is stuck on futex there as
that seems to be the last system call.  I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3)                       = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x   - hduser supergroup          0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x   - hduser supergroup          0 2013-03-06 12:43 /in
drwxr-xr-x   - hduser supergroup          0 2013-03-04 12:57 /out1
drwxrwx---   - hduser supergroup          0 2013-03-04 12:32 /tmp
drwx------   - hduser supergroup          0 2013-03-04 12:49 /user
 <unfinished ... exit status 0>
Thu Apr  4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$







On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
marcosm@buscapecompany.com> wrote:

>  Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
>  % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  49.56    0.000853          45        19        14 execve
>  36.20    0.000623           1       925           rt_sigprocmask
>   5.75    0.000099           2        62           brk
>   5.29    0.000091           9        10           getgid
>   3.20    0.000055           1        81           rt_sigaction
>   0.00    0.000000           0        71           read
>   0.00    0.000000           0         3           write
>   0.00    0.000000           0        49        11 open
>   0.00    0.000000           0        87           close
>   0.00    0.000000           0        71        31 stat
>   0.00    0.000000           0        36           fstat
>   0.00    0.000000           0        12           lseek
>   0.00    0.000000           0        72           mmap
>   0.00    0.000000           0        36           mprotect
>   0.00    0.000000           0         9           munmap
>   0.00    0.000000           0        19           rt_sigreturn
>   0.00    0.000000           0         2         2 ioctl
>   0.00    0.000000           0        40        31 access
>   0.00    0.000000           0        19           pipe
>   0.00    0.000000           0         8           dup2
>   0.00    0.000000           0         2           getpid
>   0.00    0.000000           0        20           clone
>   0.00    0.000000           0        38        19 wait4
>   0.00    0.000000           0         2           uname
>   0.00    0.000000           0        24         2 fcntl
>   0.00    0.000000           0        16           getdents
>   0.00    0.000000           0         2           readlink
>   0.00    0.000000           0         2           gettimeofday
>   0.00    0.000000           0         5           getrlimit
>   0.00    0.000000           0        10           getuid
>   0.00    0.000000           0        10           geteuid
>   0.00    0.000000           0        10           getegid
>   0.00    0.000000           0         2           getppid
>   0.00    0.000000           0         2           getpgrp
>   0.00    0.000000           0         5           arch_prctl
>   0.00    0.000000           0         2         1 futex
>   0.00    0.000000           0         1           set_tid_address
>   0.00    0.000000           0        10         2 openat
>   0.00    0.000000           0         1           set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.001721                  1795       113 total
>
>
> On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mgopik@gmail.com> wrote:
>
>> Hi..
>>
>>  I have installed hadoop 0.23.5 and is working fine on two of my
>> installations.  In a new installation on Windows Azure VMs, I am seeing an
>> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>>
>>  This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>>
>>  command : bin/hadoop dfs -ls /
>>
>>  With log set to DEBUG, following is the output on the console.  The
>> logs in bold show the sleep/timeout/delay of *1 minute*.  Any pointers
>> on what to check for / what is the reason for this delay?
>>
>>  13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> successful kerberos logins
>> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> failed kerberos logins and
>> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and
>> group related metrics
>> 13/03/31 05:49:55 DEBUG security.Groups:  Creating new Groups object
>> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUni*
>> *xGroupsMapping; cacheTimeout=300000*
>> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
>> user:UnixPrincipal: hduser
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
>> loginUser:hduser (auth:SIMPLE)
>> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
>> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
>> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for
>> protocol ClientProtocol
>> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>>
>>
>>  Thanks
>>  Gopi
>>
>>
>
>

Mime
View raw message