hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carlos Valiente (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4707) Improvements to Hadoop Thrift bindings
Date Wed, 11 Feb 2009 21:12:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672780#action_12672780

Carlos Valiente commented on HADOOP-4707:

Thanks for your feedback, Dhruba!

bq. 2. Is there a way to "layer" the Thrift layer (DatanodeThriftServer and NameNodeThrift
server) around the org.apache.hadoop.hdfs package (instead of making them part of org.apache.hadoop.hdfs).
Can these be part of org.apache.hadoop.thriftfs and reside in the src/contrib/thriftfs directory?

Yes, no problem with that. I chose to work in src/hdfs mainly to avoid conflicts with your
code under src/contrib/thriftfs.

bq. 3. There could be new source files in the base hdfs package that allows plugging in of
multiple protocol stacks (the first one being Thrift). This code could use reflection (and
configuration settings) to figure out if the libhdfsthrift.jar is in the classpath, and if
so, then use those methods from that jar to initialize the Thrift server. The reason I propose
the above is because it does not force every Hadoop install to use Thrift. It keeps the base
Namenode/Datanode code clean and elegent. It also allows plugging in other protocol stacks
to expose the Namenode/datanode functionality.

Yes, that sounds good. Would it make sense to open a new JIRA issue to implement the plugin
functionality? Otherwise the attachment list to this issue might end up quite crowded.

bq. 4. It is possible that the Thrift implementation might need to use some (currently) package-private
methods in the Datanode/Namenode, but we can work on making them public if need be.

OK, so no real reason for hosting code under org.hadoop.hdfs.server.{datanode,namenode}, then.

bq. 5. Allowing the Thrift interface to read file contents is easy. However, writing to blocks
is more difficult, especially because the DFSClient.java is a heavy-weight piece of code and
participates heavily in ensuring correct recovery from write-pipeline failures, allows "appending"
to existing files, ensuring all blocks of a file are equal size, etc.etc. Do you have an application
that will need to write to HDFS files using this Thrift interface?

Reading is by far more important to me than writing, and I could just do the writes using
Java. If the current implementations of append(), create(), addBlock() and writeBlock() were
show-stoppers for getting the stuff committed sooner, I can focus first on addressing your
points, and then figure out later what to do with them. What would you recommend?

bq. 6. Your unit tests are nice. It is imperative for us to detect incompatible changes to
base HDFS APIs earlier rather than later.

Being able to test the Thrift code (and nothing else) by doing a:
    $ ant -Dtestcase=TestThriftServer test-core

was very helpful during development. Will it be possible to do a similar thing when the Thrift
code moves back to src/contrib/thriftfs?

> Improvements to Hadoop Thrift bindings
> --------------------------------------
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thiftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, DefaultBlockManager.java,
DFSBlockManager.java, gen.diff, HADOOP-4707.diff, hadoopfs_thrift.diff, hadoopthriftapi.jar,
HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py, hdfs_py_venky.diff, libthrift.jar,
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them consistent
(in particular FileStatus{blockReplication,blockSize} vs FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a Perl/Python/C++
user, 1) it is already clear that we're using Thrift, and 2) the fact that we're dealing with
Hadoop is already explicit in the namespace.  The usage of generated code is more compact
and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>          _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>          _instead of_:
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing with a
Thrift object, and its purpose (to act as a handle for file operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, MalformedInputException
and ThriftIOException/IOException, to improve compatibility of existing clients with future
versions of the interface which might add new fields to those objects (like stack traces for
the exception types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in [^gen.diff].
They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also attached,
since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the Java bindings
needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java (see attachment [^build_xml.diff]
and modified HadoopThriftServer.java to make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be regenerated
from the stuff under 'gen-java' and the new 'compile-gen' Ant target.
> The whole changeset is also included as [^all.diff].

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message