hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12941) abort in Unsafe_GetLong when running IA64 HPUX 64bit mode
Date Mon, 21 Mar 2016 21:52:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205231#comment-15205231
] 

Colin Patrick McCabe commented on HADOOP-12941:
-----------------------------------------------

Hi gene,

It would be better to put the stack into a comment rather than in the description.

bq. if (System.getProperty("os.arch").equals("sparc") || System.getProperty("os.arch").equals("ia64"))
should be a pretty easy fix; just testing it would be the issue.

ok

bq. And I can post the fix, but since I have never done this before would benefit from some
guidance.

Just attach it as a patch file to this JIRA.  Thanks!

> abort in Unsafe_GetLong when running IA64 HPUX 64bit mode 
> ----------------------------------------------------------
>
>                 Key: HADOOP-12941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12941
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: hpux IA64  running 64bit mode 
>            Reporter: gene bradley
>
> Now that we have a core to look at we can sorta see what is going on:
>
> #14 0x9fffffffaf000dd0 in Java native_call_stub frame
> #15 0x9fffffffaf014470 in JNI frame: sun.misc.Unsafe::getLong (java.lang.Object, long) ->long
> #16 0x9fffffffaf0067a0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (byte[], int, int, byte[], int, int) ->int bci: 74
> #17 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (java.lang.Object, int, int, java.lang.Object, int, int) ->int bci: 16
> #18 0x9fffffffaf006720 in interpreted frame: org.apache.hadoop.hbase.util.Bytes::compareTo (byte[], int, int, byte[], int, int) ->int bci: 11
> #19 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compareRowKey (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 36
> #20 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 3
> #21 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (java.lang.Object, java.lang.Object) ->int bci: 9
>
> ;; Line: 400
> 0xc00000003ad84d30:0 <Unsafe_GetLong+0x130>:    (p1)  ld8              r45=[r34]
> 0xc00000003ad84d30:1 <Unsafe_GetLong+0x131>:          adds             r34=16,r32
> 0xc00000003ad84d30:2 <Unsafe_GetLong+0x132>:          adds             ret0=8,r32;;
> 0xc00000003ad84d40:0 <Unsafe_GetLong+0x140>:          add              ret1=r35,r45 <==== r35 is off
> 0xc00000003ad84d40:1 <Unsafe_GetLong+0x141>:          ld8              r35=[r34],24
> 0xc00000003ad84d40:2 <Unsafe_GetLong+0x142>:          nop.i            0x0
> 0xc00000003ad84d50:0 <Unsafe_GetLong+0x150>:          ld8              r41=[ret0];;
> 0xc00000003ad84d50:1 <Unsafe_GetLong+0x151>:          ld8.s            r49=[r34],-24
> 0xc00000003ad84d50:2 <Unsafe_GetLong+0x152>:          nop.i            0x0
> 0xc00000003ad84d60:0 <Unsafe_GetLong+0x160>:          ld8              r39=[ret1];; <=== abort
> 0xc00000003ad84d60:1 <Unsafe_GetLong+0x161>:          ld8              ret0=[r35]
> 0xc00000003ad84d60:2 <Unsafe_GetLong+0x162>:          nop.i            0x0;;
> 0xc00000003ad84d70:0 <Unsafe_GetLong+0x170>:          cmp.ne.unc       p1=r0,ret0;;  M,MI
> 0xc00000003ad84d70:1 <Unsafe_GetLong+0x171>:    (p1)  mov              r48=r41
> 0xc00000003ad84d70:2 <Unsafe_GetLong+0x172>:    (p1)  chk.s.i          r49,Unsafe_GetLong+0x290
>
> (gdb) x /10i $pc-48*2
> 0x9fffffffaf000d70:           flushrs                                  MMI
> 0x9fffffffaf000d71:           mov              r44=r32
> 0x9fffffffaf000d72:           mov              r45=r33
> 0x9fffffffaf000d80:           mov              r46=r34                 MMI
> 0x9fffffffaf000d81:           mov              r47=r35
> 0x9fffffffaf000d82:           mov              r48=r36
> 0x9fffffffaf000d90:           mov              r49=r37                 MMI
> 0x9fffffffaf000d91:           mov              r50=r38
> 0x9fffffffaf000d92:           mov              r51=r39
> 0x9fffffffaf000da0:           adds             r14=0x270,r4            MMI
> (gdb) p /x $r35
> $9 = 0x22
> (gdb) x /x $ret1
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
> (gdb) x /x $r45+0x22
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
>
> So here is the problem, this is a 64bit JVM:
>
>  0 : /opt/java8/bin/IA64W/java
>  1 : -Djava.util.logging.config.file=/test28/gzh/tomcat/conf/logging.properties
>  2 : -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>  3 : -Dorg.apache.catalina.security.SecurityListener.UMASK=022
>  4 : -server
>  5 : -XX:PermSize=128m
>  6 : -XX:MaxPermSize=256m
>  7 : -Djava.endorsed.dirs=/test28/gzh/tomcat/endorsed
>  8 : -classpath
>  9 : /test28/gzh/tomcat/bin/bootstrap.jar:/test28/gzh/tomcat/bin/tomcat-juli.jar
> 10 : -Dcatalina.base=/test28/gzh/tomcat
> 11 : -Dcatalina.home=/test28/gzh/tomcat
> 12 : -Djava.io.tmpdir=/test28/gzh/tomcat/temp
> 13 : org.apache.catalina.startup.Bootstrap
> 14 : start
>
> Since they are not passing any -Xmx values,
> we are taking defaults which look at the system resources. So what is happening here is that a
> 32 bit word aligned address is being used to index into a byte array:
>
> (gdb) jo 0x9ffffffe1d0d2bb8
> _mark = 0x0000000000000001, _klass = 0x9fffffffa8c00768, instance of type [B
> length of the array: 118
> 0 0 0 102 0 0 0 8 0 70 103 122 104 103 108 120 116 58 70 83 78 95 50 48 49 53 49 48 50
> 50 44 65 44 49 52 52 53 52 55 57 57 51 51 57 53 56 46 52 56 54 55 50 48 51 49 99 57 97 101
> 52 57 101 97 101 49 100 56 49 51 53 51 99 99 97 97 54 98 56 100 46 4 105 110 102 111 115 101
> 113 110 117 109 68 117 114 105 110 103 79 112 101 110 0 0 1 80 -6 96 -95 -48 4 0 0 0 0 0 0
> 0 4
>
> This is the whole string:
>
> (gdb) x /2s 0x9ffffffe1d0d2bd8
> 0x9ffffffe1d0d2bd8:      ""
> 0x9ffffffe1d0d2bd9:      "Fgzhglxt:FSN_20151022,A,1445479933958.48672031c9ae49eae1d81353ccaa6b8d.\004infoseqnumDuringOpen"
>
> To me this is a bug in the callee, potentially in
> org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo.
> Why are they calling Unsafe_GetLong on a byte array? There is no checking of alignment and I
> really think this is a bug on their part. As far as I know, GetLong expects 64 bit alignment.
> I did find some other 64 bit users who saw this with the same stack trace as this customer:
> https://issues.apache.org/jira/browse/PHOENIX-1438
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.devel/39017
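The alignment arithmetic behind the abort can be checked by hand from the gdb output above. The following is only a sketch: the class and method names (AlignmentCheck, isAligned) are made up for illustration, and the two constants are copied from the debugger session (the array object passed to "jo", and the 0x22 offset held in r35).

```java
public class AlignmentCheck {
    // Hypothetical helper: true when 'address' is a multiple of 'size'.
    // 'size' is assumed to be a power of two (8 for a 64-bit load).
    static boolean isAligned(long address, int size) {
        return (address & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        long arrayObject = 0x9ffffffe1d0d2bb8L; // byte[] object, from "jo" in the report
        long offset = 0x22L;                    // value of r35, from "p /x $r35"
        long effective = arrayObject + offset;  // address the ld8 tries to read

        System.out.println("effective address: 0x" + Long.toHexString(effective));
        System.out.println("8-byte aligned:    " + isAligned(effective, 8));
        System.out.println("address mod 8:     " + (effective & 7));
    }
}
```

The effective address matches the 0x9ffffffe1d0d2bda shown for $ret1, and it is 2 bytes past an 8-byte boundary, which is consistent with the ld8 at Unsafe_GetLong+0x160 being the faulting instruction.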
> The fix would go here, by adding a test for ia64.
> Looking at the code from a bug, they are checking whether the box is sparc:
>
>    static Comparer<byte[]> getBestComparer() {
> +      if (System.getProperty("os.arch").equals("sparc")) {  <====
> +        if (LOG.isTraceEnabled()) {
> +          LOG.trace("Lexicographical comparer selected for "
> +              + "byte aligned system architecture");
> +        }
> +        return lexicographicalComparerJavaImpl();
> +      }
>        try {
>          Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
>
> So this is 'fixable' from a java class perspective.
> Hari said he will talk with his open source contact.
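A minimal, self-contained sketch of the proposed architecture test follows. It is not the actual Hadoop or HBase patch: the class ComparerSelection and the method avoidsUnsafeComparer are hypothetical names, and matching on a lowercase prefix is an assumption, made because the report does not state the exact os.arch string that the HP-UX JVM returns.

```java
import java.util.Locale;

public class ComparerSelection {
    // Hypothetical list of os.arch prefixes for which the Unsafe-based
    // comparer would be skipped in favour of the pure Java one.
    private static final String[] STRICT_ALIGNMENT_PREFIXES = {"sparc", "ia64"};

    // Returns true when the pure Java comparer should be selected.
    static boolean avoidsUnsafeComparer(String osArch) {
        if (osArch == null) {
            return true; // unknown architecture: take the safe path
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        for (String prefix : STRICT_ALIGNMENT_PREFIXES) {
            if (arch.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> "
            + (avoidsUnsafeComparer(arch) ? "pure Java comparer" : "Unsafe comparer"));
    }
}
```

In getBestComparer() a check of this kind would sit where the existing "sparc" test is, before Class.forName(UNSAFE_COMPARER_NAME) is attempted.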
> This Hadoop bug report points to the same problem in the same code:
> https://issues.apache.org/jira/browse/HADOOP-11466
> In that case the symptom of the unaligned accesses was bad performance instead of a crash.
> This shows diffs for that fix:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201501.mbox/%3Cb19d5f83ca7148b782e5b432817b6448@git.apache.org%3E
> Those diffs show that the fix only avoids the bad code when running on "sparc". They really
> should have instead avoided that bad code for every architecture other than x86. They should
> not be assuming that the FastByteComparisons enhancement will work on other processors and
> actually improve performance. On processors that do allow unaligned accesses, but at
> considerable cost, they are just creating bad performance that will be hard for anyone to ever find.
> For all IA64 customers this will be an issue when running 64 bit. The IA processor enforces
> alignment on instruction types
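The alternative suggested in the description, enabling the fast path only on x86 and using a byte-at-a-time comparison everywhere else, could look roughly like the sketch below. Everything in it is illustrative: SafeByteCompare, unalignedAccessAssumedCheap and the list of x86 os.arch names are assumptions, and the loop is a generic unsigned lexicographic comparison rather than the code in FastByteComparisons.

```java
import java.util.Locale;

public class SafeByteCompare {
    // Allowlist approach: only architectures named here get the Unsafe path.
    // The set of names is an assumption about common x86 os.arch values.
    static boolean unalignedAccessAssumedCheap(String osArch) {
        if (osArch == null) {
            return false;
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("x86") || arch.equals("i386")
            || arch.equals("amd64") || arch.equals("x86_64");
    }

    // Pure Java fallback: compares two byte ranges as unsigned bytes,
    // one byte at a time, so no multi-byte load is ever issued.
    static int compareTo(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // equal prefix: the shorter range sorts first
    }

    public static void main(String[] args) {
        byte[] x = {'a', 'b', 'c'};
        byte[] y = {'a', 'b', 'd'};
        System.out.println("compare result sign: " + Integer.signum(compareTo(x, 0, 3, y, 0, 3)));
        System.out.println("fast path allowed:   "
            + unalignedAccessAssumedCheap(System.getProperty("os.arch")));
    }
}
```

An allowlist fails safe: an architecture nobody thought about gets the slower but always valid loop instead of a crash or a hidden performance penalty.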



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
