hadoop-hdfs-issues mailing list archives

From "Stephen Bovy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5541) LIBHDFS questions and performance suggestions
Date Mon, 25 Nov 2013 22:28:36 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Bovy updated HDFS-5541:

    Attachment: pdclibhdfs.zip

Windows Porting Project (and other *nix compatibility)

Testing with the Hortonworks Windows distribution, based on Hadoop 1.1.3, with JDK "1.6.0_31"

These changes are based on the latest GA 2.0.xx release

Unix/Windows Compatibility Changes
And Some Performance Enhancements

Added "uthash" for the Windows hash table

#ifdef WIN32
#include "uthash.h"
#endif

Added many #defines for Windows vs. Unix

Added a JVM mutex macro

#ifdef WIN32
#define LOCK_JVM_MUTEX() \
dwWaitResult = WaitForSingleObject(hdfs_JvmMutex,INFINITE) 
#define LOCK_JVM_MUTEX() \

#ifdef WIN32
#define UNLOCK_JVM_MUTEX() \
#define UNLOCK_JVM_MUTEX() \
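
The macro bodies above appear only partially in this message. As a minimal sketch of what such a pair of macros could look like, assuming a Win32 mutex HANDLE on Windows and a pthread_mutex_t everywhere else, something like the following would work. All `demo_` / `DEMO_` names are hypothetical and are not taken from the attached patch.

```c
#ifdef _WIN32
#include <windows.h>
/* Hypothetical global handle, created once by demo_mutex_init(). */
static HANDLE demo_JvmMutex = NULL;
#define DEMO_LOCK_JVM_MUTEX()   WaitForSingleObject(demo_JvmMutex, INFINITE)
#define DEMO_UNLOCK_JVM_MUTEX() ReleaseMutex(demo_JvmMutex)
#else
#include <pthread.h>
/* Statically initialized, so no explicit init call is needed on POSIX. */
static pthread_mutex_t demo_JvmMutex = PTHREAD_MUTEX_INITIALIZER;
#define DEMO_LOCK_JVM_MUTEX()   pthread_mutex_lock(&demo_JvmMutex)
#define DEMO_UNLOCK_JVM_MUTEX() pthread_mutex_unlock(&demo_JvmMutex)
#endif

static int demo_counter = 0;

/* Create the mutex where the platform requires it. Returns 0 on success. */
int demo_mutex_init(void)
{
#ifdef _WIN32
    if (!demo_JvmMutex) demo_JvmMutex = CreateMutex(NULL, FALSE, NULL);
    return demo_JvmMutex ? 0 : -1;
#else
    return 0;
#endif
}

/* Increment a shared counter under the lock and return the new value. */
int demo_increment(void)
{
    int value;
    DEMO_LOCK_JVM_MUTEX();
    value = ++demo_counter;
    DEMO_UNLOCK_JVM_MUTEX();
    return value;
}
```

The calling code only ever sees the two macros, so the platform difference stays in one place.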

>> Dynamically load the JVM <<  (more flexible) (and easier to build)
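
The message does not show how the dynamic load is done. A minimal sketch of the general technique, assuming LoadLibraryA/GetProcAddress on Windows and dlopen/dlsym elsewhere, could look like this. The helper name `demo_load_symbol` is hypothetical.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/*
 * Resolve `symbol` from the shared library at `path`.
 * Returns NULL if the library or the symbol cannot be found.
 * The library handle is deliberately left open for the life of the process.
 */
void *demo_load_symbol(const char *path, const char *symbol)
{
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(path);
    if (!lib) return NULL;
    return (void *) GetProcAddress(lib, symbol);
#else
    void *lib = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!lib) return NULL;
    return dlsym(lib, symbol);
#endif
}
```

For the JVM, the symbol of interest would be `JNI_CreateJavaVM` from the JNI Invocation API. The library path (jvm.dll or libjvm.so under the Java installation) is an assumption that each deployment has to supply. With this approach the client library needs no link-time dependency on the JVM.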

Added a simplistic starting point for a library init function.
When this function is used, locking in getJNIEnv can be avoided.

int hdfsLibInit ( void * parms )
{
    JNIEnv* env = getJNIEnv();
    if (!env) return 1;
    hdfs_InitLib = 1;
    return 0;
}

Converted thread-local storage init to use pthread_once to eliminate some locking issues (see below):

JNIEnv* getJNIEnv(void)
{
    JNIEnv *env = NULL;
    HDFSTLS *tls = NULL;
    int ret = 0;
    jint rv = 0;
#ifdef WIN32
    DWORD dwWaitResult;
    tls = TlsGetValue(hdfs_dwTlsIndex1);
    if (tls) return tls->env;
#else
    static __thread HDFSTLS *quickTls = NULL;
    if (quickTls) return quickTls->env;

    pthread_once(&hdfs_threadInit_Once, Make_Thread_Key);
    if (!hdfs_gTlsKeyInitialized)
        return NULL;
    tls = pthread_getspecific(hdfs_gTlsKey);
    if (tls) {
        return tls->env;
    }
#endif

    if (!hdfs_InitLib) {
        env = getGlobalJNIEnv();
    } else {
        rv = (*hdfs_JVM)->AttachCurrentThread(hdfs_JVM, (void**) &env, 0);
        if (rv != 0) {
            fprintf(stderr, "Call to AttachCurrentThread "
                    "failed with error: %d\n", rv);
            return NULL;
        }
    }
    if (!env) {
        fprintf(stderr, "getJNIEnv: getGlobalJNIEnv failed\n");
        return NULL;
    }
    tls = calloc ( 1, sizeof(HDFSTLS) );
    if (!tls) {
        fprintf(stderr, "getJNIEnv: OOM allocating %lu bytes\n",
                (unsigned long) sizeof(HDFSTLS) );
        return NULL;
    }

    tls->env = env;

#ifdef WIN32
    printf ( "dll: save environment\n" );
    if (!TlsSetValue(hdfs_dwTlsIndex1, tls))
         return NULL;
#else
    ret = pthread_setspecific(hdfs_gTlsKey, tls);
    if (ret) {
        fprintf(stderr, "getJNIEnv: pthread_setspecific failed with "
            "error code %d\n", ret);
        return NULL;
    }
    quickTls = tls;
#endif
    return env;
}
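
To isolate the pthread_once idea from the JNI details, here is a minimal POSIX-only sketch: the TLS key is created exactly once per process, and each thread allocates its own state on first use. The `demo_` names are hypothetical, and the struct is a stand-in for HDFSTLS rather than the real type.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread state (HDFSTLS in the message). */
typedef struct demo_tls {
    void *env;   /* placeholder for the cached JNIEnv pointer */
} demo_tls;

static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;
static pthread_key_t  demo_key;
static int            demo_key_ok = 0;

/* Runs when a thread that owns a demo_tls exits. */
static void demo_tls_destroy(void *p)
{
    free(p);
}

/* Runs exactly once per process, no matter how many threads race here. */
static void demo_make_key(void)
{
    demo_key_ok = (pthread_key_create(&demo_key, demo_tls_destroy) == 0);
}

/* Returns this thread's state, creating it on first use; NULL on failure. */
demo_tls *demo_get_tls(void)
{
    demo_tls *tls;

    pthread_once(&demo_key_once, demo_make_key);
    if (!demo_key_ok) return NULL;

    tls = pthread_getspecific(demo_key);
    if (tls) return tls;

    tls = calloc(1, sizeof(*tls));
    if (!tls) return NULL;
    if (pthread_setspecific(demo_key, tls) != 0) {
        free(tls);
        return NULL;
    }
    return tls;
}
```

After the first call on a given thread, the fast path is a single pthread_getspecific with no mutex involved.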


Also used pthread_once to init the hash table and simplify hash table locking

static int insertEntryIntoTable ( const char *key, void *data )
{
    ENTRY e, *ep = NULL;
    if (key == NULL || data == NULL) {
        return 0;
    }

    pthread_once ( &hdfs_hashTable_Once, hashTableInit );
    if ( !hdfs_hashTableInited ) {
      return -1;
    }
    /* ... */
}


Note: Some recent enhancements are not backwards compatible

        /* This is not backwards compatible */
        /*
        jthr = invokeMethod ( env, NULL, STATIC, NULL,
                         "loadFileSystems", "()V" );
        if (jthr) {
            printExceptionAndFree ( env, jthr, PRINT_EXC_ALL,
                                    "loadFileSystems" );
            return NULL;
        } */


The "newInstance" functions are not backwards compatible and therefore must be avoided.

The new readDirect function produces a method error on Windows with the 64-bit JDK 1.6.0_31:

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

could not find method read from class org/apache/hadoop/fs/FSDataInputStream with signature (Ljava/nio/ByteBuffer;)I
readDirect: FSDataInputStream#read error:
Begin Method Invokation:org/apache/commons/lang/exception/ExceptionUtils ## getS
End Method Invokation
Method success
java.lang.NoSuchMethodError: read
hdfsOpenFile(/tmp/testfile.txt): WARN: Unexpected error 255 when testing for direct read compatibility


And finally  >>

Dag nab it >> I cannot figure this one out >> the append does not work 

Begin Method Invokation:org/apache/hadoop/fs/FileSystem ## append

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Append is not supported. Please see the dfs.support.append configuration parameter
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSName
        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:

> LIBHDFS questions and performance suggestions
> ---------------------------------------------
>                 Key: HDFS-5541
>                 URL: https://issues.apache.org/jira/browse/HDFS-5541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Stephen Bovy
>            Priority: Minor
>         Attachments: pdclibhdfs.zip
> Since libhdfs is a "client" interface, and especially because it is a "C" interface, it should be assumed that the code will be used across many different platforms and many different compilers.
> 1) The code should be cross-platform (no Linux extras).
> 2) The code should compile on standard C89 compilers; the
> >>>  {least common denominator rule applies here} !! <<
> C code with a ".c" extension should follow the rules of the C standard.
> All variables must be declared at the beginning of scope, and no (//) comments are allowed.

> >> I just spent a week white-washing the code back to normal C standards so that it could compile and build across a wide range of platforms <<
> Now on to performance questions
> 1) If threads are not used, why do a thread attach? (When threads are not used, all the thread attach nonsense is a waste of time and a performance killer.)
> 2) The JVM init code should not be embedded within the context of every function call. The JVM init code should be in a stand-alone LIBINIT function that is only invoked once. The JVM * and the JNI * should be global variables for use when no threads are utilized.
> 3) When threads are utilized, the attach function can use the GLOBAL jvm * created by the LIBINIT {WHICH IS INVOKED ONLY ONCE}, and thus safely outside the scope of any LOOP that is using the functions.
> 4) Hash Table and Locking: Why ?????
> When threads are used, the hash table locking is going to hurt performance. Why not use thread-local storage for the hash table? That way no locking is required, either with or without threads.
> 5) FINALLY Windows Compatibility
> Do not use POSIX features if they cannot easily be replaced on other platforms !!
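
As a rough illustration of point 4, a per-thread lookup table needs no locking at all, because no other thread can ever see it. The sketch below assumes compiler-level thread-local storage (`__thread` on GCC/Clang, `__declspec(thread)` on MSVC) and uses a small fixed-size open-addressing table. All `demo_` names are hypothetical and this is not the table libhdfs uses.

```c
#include <string.h>
#include <stddef.h>

#if defined(_MSC_VER)
#define DEMO_THREAD_LOCAL __declspec(thread)
#else
#define DEMO_THREAD_LOCAL __thread
#endif

#define DEMO_CACHE_SLOTS 64

typedef struct demo_entry {
    const char *key;   /* the caller keeps the key string alive */
    void       *data;
} demo_entry;

/* One table per thread, so lookups and inserts need no mutex. */
static DEMO_THREAD_LOCAL demo_entry demo_cache[DEMO_CACHE_SLOTS];

static unsigned demo_hash(const char *s)
{
    unsigned h = 5381u;
    while (*s) h = h * 33u + (unsigned char) *s++;
    return h;
}

/* Returns 0 on success, -1 if key or data is NULL or the table is full. */
int demo_cache_put(const char *key, void *data)
{
    unsigned i, start;
    if (!key || !data) return -1;
    start = demo_hash(key) % DEMO_CACHE_SLOTS;
    for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
        demo_entry *e = &demo_cache[(start + i) % DEMO_CACHE_SLOTS];
        if (!e->key || strcmp(e->key, key) == 0) {
            e->key = key;
            e->data = data;
            return 0;
        }
    }
    return -1;
}

/* Returns the cached pointer, or NULL if the key is absent. */
void *demo_cache_get(const char *key)
{
    unsigned i, start;
    if (!key) return NULL;
    start = demo_hash(key) % DEMO_CACHE_SLOTS;
    for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
        demo_entry *e = &demo_cache[(start + i) % DEMO_CACHE_SLOTS];
        if (!e->key) return NULL;
        if (strcmp(e->key, key) == 0) return e->data;
    }
    return NULL;
}
```

The trade-off is that every thread pays for its own first lookup of each key and holds its own copy of the table, so this suits a small set of frequently used keys.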

This message was sent by Atlassian JIRA
