hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Basic question about using C# with Hadoop filesystems
Date Sun, 10 Jan 2010 19:30:42 GMT
Bear in mind that hdfs-fuse has something like a 30% performance impact
when compared with direct access via the Java API. The data path is
something like:

    your app -> kernel -> libfuse -> JVM -> kernel -> HDFS

    HDFS -> kernel -> JVM -> libfuse -> kernel -> your app

On Windows especially, context switching during I/O like that carries
a high penalty. Maybe it would be better to bind the C libhdfs API
directly via a C# wrapper (see http://wiki.apache.org/hadoop/LibHDFS).
But at that point you have pulled the Java Virtual Machine into the
address space of your process and are bridging between Java land and
C# land over the JNI and the C# equivalent. So, at that point, why not
just use Java instead of C#? Or just use C and limit the damage to
only one native-to-managed interface instead of two?
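
To make the comparison concrete, here is a rough sketch that just counts
boundaries. It is a toy model, not a benchmark: the layer lists are copied
from the data path above, the function names are made up for illustration,
and treating every hop into or out of "kernel" as one crossing is a
simplifying assumption.

```python
# Toy model of the client-side paths discussed above (illustrative only).
# The layer names come from the data path in this message; the helper
# names and the counting rule are assumptions made for this sketch.

FUSE_WRITE_PATH = ["your app", "kernel", "libfuse", "JVM", "kernel", "HDFS"]
FUSE_READ_PATH = ["HDFS", "kernel", "JVM", "libfuse", "kernel", "your app"]
# Assumed shape of direct access from a Java client: no libfuse detour.
DIRECT_JAVA_PATH = ["your app (JVM)", "kernel", "HDFS"]

# Language stacks for each binding option, outermost layer first.
BINDINGS = {
    "C# wrapper over libhdfs": ["C# runtime", "native C (libhdfs)", "JVM"],
    "C over libhdfs": ["native C (libhdfs)", "JVM"],
    "Java": ["JVM"],
}


def kernel_crossings(path):
    """Count hops that enter or leave the 'kernel' layer along a path."""
    return sum(
        1
        for a, b in zip(path, path[1:])
        if (a == "kernel") != (b == "kernel")
    )


def runtime_bridges(layers):
    """Count the interfaces between adjacent runtimes in a binding stack."""
    return len(layers) - 1


if __name__ == "__main__":
    print("fuse write crossings:", kernel_crossings(FUSE_WRITE_PATH))
    print("fuse read crossings:", kernel_crossings(FUSE_READ_PATH))
    print("direct Java crossings:", kernel_crossings(DIRECT_JAVA_PATH))
    for name, layers in BINDINGS.items():
        print(name, "bridges:", runtime_bridges(layers))
```

Under these assumptions the FUSE route crosses the kernel boundary twice as
often as direct Java access, and the C# wrapper carries two runtime bridges
where plain C carries one and Java none. The counts say nothing about the
actual cost of each crossing.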

The situation will change somewhat when/if all HDFS RPC is moved to
an RPC and serialization scheme which is truly language independent,
i.e. Avro. I have no idea when or if that will happen. Even if that
happens, as Ryan said before, the HDFS client is fat. Just speaking
the RPC protocol gets you maybe 25% of the way toward a functional
HDFS client.

The bottom line is the Hadoop software ecosystem has a strong Java
affinity. 

   - Andy



----- Original Message ----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Sun, January 10, 2010 8:57:32 AM
> Subject: Re: Basic question about using C# with Hadoop filesystems
> 
> http://code.google.com/p/hdfs-fuse/
> 
> On Sun, Jan 10, 2010 at 7:36 AM, Aram Mkhitaryan
> wrote:
> > ah, sorry, forgot to mention, it's in hdfs-user mailing list
> > hdfs-user@hadoop.apache.org


      

