hadoop-mapreduce-issues mailing list archives

From "Tom White (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
Date Thu, 02 Feb 2012 23:18:53 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199353#comment-13199353 ]

Tom White commented on MAPREDUCE-3583:
--------------------------------------

Changing to use BigInteger sounds like the right thing to do. 

> Since dtime is used in getCumulativeCpuTime(), we need to change getCumulativeCpuTime() to return BigInteger as well.

Is it not possible to do the calculations using BigIntegers (to avoid overflow) and then convert
the result to a long, since the final result can be represented in a long?
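
To make the suggestion concrete, here is a minimal sketch of doing the arithmetic in BigInteger and narrowing to a long only at the end. The class and method names (CpuTimeSketch, deltaJiffies, toLongExact) are hypothetical and are not taken from ProcfsBasedProcessTree or from the attached patch; the inputs are assumed to be decimal strings such as the utime and stime fields read from /proc.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only; not the actual Hadoop code.
final class CpuTimeSketch {

    /**
     * Adds utime and stime (decimal strings that may exceed Long.MAX_VALUE),
     * subtracts the previously recorded total, and returns the difference
     * as a long. All intermediate arithmetic is done in BigInteger.
     */
    static long deltaJiffies(String utime, String stime, BigInteger previousTotal) {
        BigInteger total = new BigInteger(utime).add(new BigInteger(stime));
        return toLongExact(total.subtract(previousTotal));
    }

    /**
     * Narrows to long, failing loudly instead of silently truncating.
     * A BigInteger fits in a long exactly when bitLength() <= 63.
     */
    static long toLongExact(BigInteger value) {
        if (value.bitLength() > 63) {
            throw new ArithmeticException("value does not fit in a long: " + value);
        }
        return value.longValue();
    }

    public static void main(String[] args) {
        // The first argument is the value from the stack trace in this issue;
        // the other two are made-up numbers chosen for the example.
        BigInteger previous = new BigInteger("18446743988060683500");
        System.out.println(deltaJiffies("18446743988060683582", "10", previous));
    }
}
```

The range check in toLongExact is there so that a result that unexpectedly does not fit in a long raises an error rather than wrapping around.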
                
> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0
>         Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Long.parseLong(Long.java:422)
> 	at java.lang.Long.parseLong(Long.java:468)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
> 	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
> 	at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From the hadoop 0.20.205 source code, it looks like ppid was 18446743988060683582, causing the NFE:
> {code}
>         // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>          pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like the ppid is a 64-bit positive integer, but a Java long is signed and so only holds positive integers of up to 63 bits.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is an NFE.
> {noformat}
> I propose changing allProcessInfo to Map<String, ProcessInfo> so that we avoid parsing large integers and do not encounter this problem.
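
The range argument quoted from Nicolas Sze can be checked directly. The following is a small sketch, with hypothetical class and method names (UnsignedRangeSketch, parsesAsLong, isAboveSignedRange), showing that Long.parseLong rejects the value from the stack trace while BigInteger accepts it and places it between 2^63 and 2^64.

```java
import java.math.BigInteger;

// Hypothetical demonstration class; names are not from the Hadoop source.
final class UnsignedRangeSketch {
    static final BigInteger TWO_POW_63 = BigInteger.ONE.shiftLeft(63);
    static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    /** Returns true if Long.parseLong accepts the string. */
    static boolean parsesAsLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Returns true if 2^63 <= value < 2^64, i.e. unsigned 64-bit but not a signed long. */
    static boolean isAboveSignedRange(String s) {
        BigInteger v = new BigInteger(s);
        return v.compareTo(TWO_POW_63) >= 0 && v.compareTo(TWO_POW_64) < 0;
    }

    public static void main(String[] args) {
        String field = "18446743988060683582"; // value from the stack trace above
        System.out.println("parses as long: " + parsesAsLong(field));
        System.out.println("in [2^63, 2^64): " + isAboveSignedRange(field));
    }
}
```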


        
