hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evan Pollan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
Date Wed, 01 Feb 2012 06:28:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197610#comment-13197610
] 

Evan Pollan commented on MAPREDUCE-3583:
----------------------------------------

Turns out, the bug is in CDH3U3's version of ProcfsBasedProcessTree.  Once I updated my cluster
creation automation to specifically use CDH3U2 (rather than the latest update to CDH3), the
problem went away.

CDH3U3's version of ProcfsBasedProcessTree is much closer to the trunk's version than CDH3U2
(as you would expect).  So, it could be that more recent versions of this class have introduced
incompatibilities with 64 bit Ubuntu (and possibly other distros).
                
> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0
>         Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC
2011 x86_64 GNU/Linux
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file set.  User
classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Long.parseLong(Long.java:422)
> 	at java.lang.Long.parseLong(Long.java:468)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
> 	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
> 	at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE:
> {code}
>         // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>          pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC
2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like that the ppid is a 64-bit positive integer but Java long is signed and
so only works with 63-bit positive integers.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is a NFE. 
> {noformat}
> I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't
encounter this problem by avoiding parsing large integer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message