hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4707) Improvements to Hadoop Thrift bindings
Date Sat, 06 Jun 2009 18:30:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716901#action_12716901
] 

Todd Lipcon commented on HADOOP-4707:
-------------------------------------

Hey Dhruba,

It's true that this introduces code that runs in the NN. However, the design of this new code
maps one-for-one onto the communication design of HDFS itself - no data goes through the NN
thrift server. The additional load on the NN is simply the metadata RPCs that the clients
are sending, and of course the sockets serving the Thrift connections. Given that the existing
(external) Thrift service just fowards the metadata RPCs to the NameNode, there's no additional
load there. The external server probably does demultiplex them down to fewer RPC connections,
and Thrift is probably less CPU-efficient than the hand-coded Writable-based RPCs that Hadoop
uses internally.

As you mentioned, and correlated with my experience, the NN resource that is in short supply
is memory more often than CPU. Given that, I don't think the additional load from the Thrift
SerDe is going to end up being significant.

If we expect that there will be a high number of concurrent clients to the NN Thrift service,
switching out the TThreadPoolServer for something based on the new-ish TNonBlockingServer
would decrease memory consumption by reducing the number of threads. I can look into that
if you like.

However, I do agree that some ops people will have philosophical objections to putting more
code in the NameNode. This makes sense, and might be a good reason to keep the external Thrift
gateway alive. However, I haven't touched its code, and don't want to commit to maintain it
implicitly by adding new code next to it :)

Thanks
-Todd

> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thiftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, DefaultBlockManager.java,
DFSBlockManager.java, gen.diff, hadoop-4707-31c331.patch.gz, HADOOP-4707-55c046a.txt, hadoop-4707-6bc958.txt,
hadoop-4707-867f26.txt.gz, HADOOP-4707.diff, HADOOP-4707.patch, HADOOP-4707.patch, hadoopfs_thrift.diff,
hadoopthriftapi.jar, HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py, hdfs_py_venky.diff,
libthrift.jar, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them consistent
(in particular FileStatus{blockReplication,blockSize} vs FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a Perl/Python/C++
user, 1) it is already clear that we're using Thrift, and 2) the fact that we're dealing with
Hadoop is already explicit in the namespace.  The usage of generated code is more compact
and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>          _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>          _instead of_:
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing with a
Thrift object, and its purpose (to act as a handle for file operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, MalformedInputException
and ThriftIOException/IOException, to improve compatibility of existing clients with future
versions of the interface which might add new fields to those objects (like stack traces for
the exception types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in [^gen.diff].
They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also attached,
since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the Java bindings
needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java (see attachment [^build_xml.diff]
and modified HadoopThriftServer.java to make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be regenerated
from the stuff under 'gen-java' and the new 'compile-gen' Ant target.
> The whole changeset is also included as [^all.diff].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message