hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-941) Make Hadoop Record I/O Easier to use outside Hadoop
Date Sat, 24 Feb 2007 03:58:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475586 ]

Milind Bhandarkar commented on HADOOP-941:
------------------------------------------

> Since one of the uses that is being considered is for a wire protocol, that is precisely
> what makes Writable and WritableComparable valuable.

Unfortunately, Writable imposes a single format. As it currently stands in Hadoop, Writable
implies a binary (and not particularly compact) format. From users' experience, just by
changing the format string passed to the record I/O interface, they can output CSV or XML, or
other common formats (the formats are pluggable, and JSON is a possible candidate), with the
advantage that these are human-readable and therefore easier to debug. If the Writable
interface allowed us to specify a different serialization method, its output would not be as
indecipherable as it is now.

If the Record interface, which has the more readable "serialize" and "deserialize" methods
rather than the binary-only "write" and "readFields", were to replace Writable completely, it
would be good for Hadoop in the long term.

Then all the popular formats, e.g. Text, Binary, and XML, would be interpretable through a
single interface.
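
To make the argument concrete, here is a minimal sketch of a record written once against an
abstract output, with the caller choosing a compact binary encoding or a readable CSV-like
one. All names (RecordOutput, RecordInput, BinaryOut, BinaryIn, CsvOut, SimpleRecord,
Employee, Formats) are hypothetical and simplified; they are not Hadoop's actual record I/O
API. SimpleRecord merely stands in for the Record interface discussed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified output/input abstractions (not Hadoop's actual API).
interface RecordOutput {
    void writeInt(int v, String tag) throws IOException;
    void writeString(String s, String tag) throws IOException;
}

interface RecordInput {
    int readInt(String tag) throws IOException;
    String readString(String tag) throws IOException;
}

// Binary encoding: compact, but opaque when inspected by hand.
final class BinaryOut implements RecordOutput {
    private final DataOutputStream out;
    BinaryOut(DataOutputStream out) { this.out = out; }
    public void writeInt(int v, String tag) throws IOException { out.writeInt(v); }
    public void writeString(String s, String tag) throws IOException { out.writeUTF(s); }
}

final class BinaryIn implements RecordInput {
    private final DataInputStream in;
    BinaryIn(DataInputStream in) { this.in = in; }
    public int readInt(String tag) throws IOException { return in.readInt(); }
    public String readString(String tag) throws IOException { return in.readUTF(); }
}

// CSV-like text encoding: human-readable, hence easier to debug.
// Sketch only: commas inside strings are not escaped.
final class CsvOut implements RecordOutput {
    private final StringBuilder sb;
    private boolean first = true;
    CsvOut(StringBuilder sb) { this.sb = sb; }
    private void separator() {
        if (!first) sb.append(',');
        first = false;
    }
    public void writeInt(int v, String tag) { separator(); sb.append(v); }
    public void writeString(String s, String tag) { separator(); sb.append(s); }
}

// Stand-in for a Record interface: serialization is written once, the format is chosen by the caller.
interface SimpleRecord {
    void serialize(RecordOutput out, String tag) throws IOException;
    void deserialize(RecordInput in, String tag) throws IOException;
}

// Hypothetical example record with two fields.
final class Employee implements SimpleRecord {
    private String name = "";
    private int id;

    Employee() {}
    Employee(String name, int id) { this.name = name; this.id = id; }

    String getName() { return name; }
    int getId() { return id; }

    public void serialize(RecordOutput out, String tag) throws IOException {
        out.writeString(name, "name");
        out.writeInt(id, "id");
    }

    public void deserialize(RecordInput in, String tag) throws IOException {
        name = in.readString("name");
        id = in.readInt("id");
    }
}

final class Formats {
    static String toCsv(SimpleRecord r) throws IOException {
        StringBuilder sb = new StringBuilder();
        r.serialize(new CsvOut(sb), "");
        return sb.toString();
    }

    static byte[] toBinary(SimpleRecord r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        r.serialize(new BinaryOut(out), "");
        out.flush();
        return bytes.toByteArray();
    }

    static void fromBinary(byte[] data, SimpleRecord r) throws IOException {
        r.deserialize(new BinaryIn(new DataInputStream(new ByteArrayInputStream(data))), "");
    }
}
```

For example, Formats.toCsv(new Employee("alice", 7)) yields the readable string "alice,7",
while Formats.toBinary produces opaque bytes; Employee.serialize is the same code in both
cases, which is the point being made about a single interface with pluggable formats.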

Maybe we can use this discussion for a better cause: fixing the problems in the Hadoop
development process itself. This proposal has been on Jira for a whole month, asking for
input from users and other stakeholders. Only after significant effort was spent on
implementation has this minor issue come to the fore. What bothers me is the failure of the
process, not so much the wasted effort. We need to learn from this so that we avoid such
situations in the future. How can we improve this process?


> Make Hadoop Record I/O Easier to use outside Hadoop
> ---------------------------------------------------
>
>                 Key: HADOOP-941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-941
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.10.1
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>         Attachments: jute-patch.txt
>
>
> Hadoop record I/O can be used effectively outside of Hadoop. It would increase its utility
> if developers could use it without having to import Hadoop classes or depend on Hadoop
> jars. The following changes to the current translator and runtime are proposed.
> Proposed Changes:
> 1. Use java.lang.String as a native type for ustring (instead of Text.)
> 2. Provide a Buffer class as a native Java type for buffer (instead of BytesWritable),
> so that later BytesWritable could be implemented as the following DDL:
> module org.apache.hadoop.io {
>   record BytesWritable {
>     buffer value;
>   }
> }
> 3. Member names in generated classes should not have the prefix 'm'. In the above
> example, the private member name would be 'value', not 'mvalue' as it is now.
> 4. Convert getters and setters to CamelCase, e.g. in the above example the getter
> will be:
>   public Buffer getValue();
> 5. Provide a 'swiggable' C binding, so that processing the generated C code with SWIG
> allows it to be used from scripting languages such as Python and Perl.
> 6. The default --language="java" target would generate class code for records without a
> Hadoop dependency on the WritableComparable interface; instead, the classes would declare
> "implements Record, Comparable" (i.e. they will not have write() and readFields() methods).
> An additional "--writable" option will need to be specified on the rcc command line to
> generate classes that declare "implements Record, WritableComparable".
> 7. Optimize the generated write() and readFields() methods so that they do not have to
> create a BinaryOutputArchive or BinaryInputArchive every time these methods are called on
> a record.
> 8. Implement ByteInStream and ByteOutStream for the C++ runtime, as they will be needed
> for using Hadoop Record I/O with the forthcoming C++ MapReduce framework (currently, only
> FileStreams are provided).
> 9. Generate clone() methods for records in Java, i.e. the generated classes should
> implement Cloneable.
> 10. As part of the Hadoop build process, produce a tar bundle for Record I/O alone. This
> tar bundle will contain the translator classes and ant task (lib/rcc.jar), the translator
> script (bin/rcc), the Java runtime (recordio.jar) that includes org.apache.hadoop.record.*,
> sources for the Java runtime (src/java), and C/C++ runtime sources with Makefiles
> (src/c++, src/c).
> 11. Make generated Java codes for maps and vectors use Java generics.
> These are the proposed user-visible changes. Internally, the translator will be
> restructured so that it is easier to plug in translators for different targets.
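
Several of the proposed items (1, 3, 4, 6, 9, and the vector part of 11) concern the shape of
the generated Java class. The following is a hand-written approximation, not actual rcc
output, for a hypothetical DDL record Person; the Record interface and its serialization
methods are deliberately omitted so the sketch compiles without any Hadoop or record I/O jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hand-written approximation of a generated class for a hypothetical DDL record:
//   record Person { ustring name; int age; vector<ustring> tags; }
// No Hadoop imports; the Record interface is omitted to keep the sketch self-contained.
final class Person implements Comparable<Person>, Cloneable {
    // Item 3: plain member names, no 'm' prefix.
    private String name = "";                           // Item 1: ustring as java.lang.String
    private int age;
    private ArrayList<String> tags = new ArrayList<>(); // Item 11: vector<ustring> as a generic list

    // Item 4: CamelCase getters and setters.
    public String getName() { return name; }
    public void setName(String name) { this.name = Objects.requireNonNull(name); }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public List<String> getTags() { return tags; }
    public void setTags(List<String> tags) { this.tags = new ArrayList<>(tags); }

    // Item 6: plain Comparable instead of WritableComparable.
    // This sketch compares field by field in declaration order.
    @Override
    public int compareTo(Person other) {
        int c = name.compareTo(other.name);
        if (c != 0) return c;
        c = Integer.compare(age, other.age);
        if (c != 0) return c;
        int n = Math.min(tags.size(), other.tags.size());
        for (int i = 0; i < n; i++) {
            c = tags.get(i).compareTo(other.tags.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(tags.size(), other.tags.size());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return age == p.age && name.equals(p.name) && tags.equals(p.tags);
    }

    @Override
    public int hashCode() { return Objects.hash(name, age, tags); }

    // Item 9: clone() copies the vector so the clone can be modified independently.
    @Override
    public Person clone() {
        try {
            Person copy = (Person) super.clone();
            copy.tags = new ArrayList<>(tags); // Strings are immutable, so copying the list suffices
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class implements Cloneable
        }
    }
}
```

Because compareTo and equals look at the same fields, the ordering stays consistent with
equality, which matters if such records are used as sorted keys.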
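
Item 7 asks that write() and readFields() stop creating a new archive on every call. One
possible approach, sketched here under the assumption of a hypothetical BinaryArchive class
(not the actual BinaryOutputArchive API, and not necessarily what the patch does), is to keep
one archive per thread and re-point it at whatever stream the caller passes in.

```java
import java.io.DataOutput;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical binary archive (not the actual BinaryOutputArchive API).
final class BinaryArchive {
    private static final AtomicInteger CREATED = new AtomicInteger(); // allocation counter, for illustration
    // One cached archive per thread, re-pointed at whichever stream the caller passes in.
    private static final ThreadLocal<BinaryArchive> CACHE =
            ThreadLocal.withInitial(BinaryArchive::new);

    private DataOutput out;

    private BinaryArchive() { CREATED.incrementAndGet(); }

    static BinaryArchive get(DataOutput out) {
        BinaryArchive archive = CACHE.get();
        archive.out = out;
        return archive;
    }

    static int created() { return CREATED.get(); }

    void writeInt(int v, String tag) throws IOException { out.writeInt(v); }
}

// A Writable-style record whose write() reuses the cached archive.
final class Counter {
    private final int value;

    Counter(int value) { this.value = value; }

    public void write(DataOutput out) throws IOException {
        // Instead of allocating a fresh archive on every call, fetch the cached one and re-point it.
        BinaryArchive.get(out).writeInt(value, "value");
    }
}
```

Writing 1000 Counter records on one thread allocates a single archive. The per-thread cache
keeps this safe across threads, but the cached archive holds a reference to the last stream
it wrote to, and a nested write() to a different stream would re-point the shared archive
mid-record; a real implementation would have to account for both.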

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

