hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9601) Support native CRC on byte arrays
Date Mon, 18 Aug 2014 21:27:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101335#comment-14101335 ]

Colin Patrick McCabe commented on HADOOP-9601:
----------------------------------------------

bq. btw, I found out bad interaction between GC & getArrayCritical when the memory
is fragmented.  This is faster until it gets slow all of a sudden.  Please pass in the &isCopy
and run with G1GC to make sure it is doing zero-copy ops for getArrayRegion.

Interesting.

The documentation says this about {{GetPrimitiveArrayCritical}}:

bq. After calling GetPrimitiveArrayCritical, the native code should not run for an extended
period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside
this pair of functions as running in a "critical region." Inside a critical region, native
code must not call other JNI functions, or any system call that may cause the current thread
to block and wait for another Java thread. (For example, the current thread must not call
read on a stream being written by another Java thread.)

This is exactly what we're doing in the HADOOP-10838 patch.  We call {{GetPrimitiveArrayCritical}},
do the checksums, and then immediately call {{ReleasePrimitiveArrayCritical}}.  If the JVM
chooses not to take the zero-copy route, we can't override its decision.  And we can't access
that array without calling one of the accessor functions.  So I don't know how this could
be improved; do you have any ideas?

> Support native CRC on byte arrays
> ---------------------------------
>
>                 Key: HADOOP-9601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9601
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance, util
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Gopal V
>              Labels: perfomance
>         Attachments: HADOOP-9601-WIP-01.patch, HADOOP-9601-WIP-02.patch, HADOOP-9601-bench.patch,
> HADOOP-9601-rebase+benchmark.patch, HADOOP-9601-trunk-rebase-2.patch, HADOOP-9601-trunk-rebase.patch
>
>
> When we first implemented the Native CRC code, we only did so for direct byte buffers,
> because these correspond directly to native heap memory and thus make it easy to access via
> JNI. We'd generally assumed that accessing byte[] arrays from JNI was not efficient enough,
> but now that I know more about JNI I don't think that's true -- we just need to make sure
> that the critical sections where we lock the buffers are short.
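
For readers unfamiliar with the two buffer kinds contrasted in the description above, here is a minimal Java sketch. It is only an illustration: it uses the JDK's {{java.util.zip.CRC32}} as a stand-in, not Hadoop's native CRC code, and the class and method names ({{CrcBuffersSketch}}, {{crcOfArray}}, {{crcOfDirectBuffer}}) are hypothetical. It shows that the same bytes give the same checksum whether they live in a heap {{byte[]}} or in a direct {{ByteBuffer}}; what differs is how native code would reach the memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of Hadoop.
class CrcBuffersSketch {

    // CRC over a heap byte[]. Native code would have to go through a JNI
    // array accessor (for example GetPrimitiveArrayCritical) to read it.
    static long crcOfArray(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // CRC over a direct ByteBuffer, whose storage lives outside the Java
    // heap and can be addressed directly from native code.
    static long crcOfDirectBuffer(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip(); // switch from writing to reading
        CRC32 crc = new CRC32();
        crc.update(direct); // CRC32.update(ByteBuffer) exists since Java 8
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("byte[]: %08x%n", crcOfArray(data));
        System.out.printf("direct: %08x%n", crcOfDirectBuffer(data));
    }
}
```

Both calls should report the same value for the same input; the sketch says nothing about relative speed, which is what the benchmark patches attached to this issue are for.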



--
This message was sent by Atlassian JIRA
(v6.2#6252)
