hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1053) Make Record I/O usable independent of Hadoop
Date Sat, 03 Mar 2007 04:06:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477594 ]

Milind Bhandarkar commented on HADOOP-1053:

I agree with the points that David has made, i.e. the description of this JIRA issue should
be modified so that removing the "hadoop-dependency" is no longer the stated goal; rather,
the goal should be making Hadoop Record I/O functionally modular (thanks for those apt words,
David. After all, proper usage of words does matter for easing objections and reaching consensus).

In the context of this discussion, I would like to know the opinion of the hadoop-committers:
if a significant user base had asked for independent usage of, let's say, only Hadoop DFS,
independent of the Hadoop map-reduce framework, would they have agreed to separate the two
in order to gain a huge base of users? This may seem a hypothetical question, but I have been
asked it by a lot of people, so I can assure you all that it is indeed not hypothetical.

The particular question that I was asked by an aspiring web 2.0 company was this: "I have been
following the hadoop-dev list for the last three months, and I have noticed that while the
DFS side of Hadoop is adding features, the map-reduce side of things has been fixing serious
bugs. In any case, I do not need map-reduce, but I believe that DFS would be a great feature
to have. Is there a way to use just DFS in my project without the rest of Hadoop?"

My answer to that was: yes, of course. DFS is functionally modular. (P.S. They do not want
SequenceFile; they have their own ideas about storing key-value pairs.) Was my answer correct?
Should I instead be telling them, "No, you have to use the map-reduce framework even if
you do not need it"? (P.S. I have suggested to them a way to repackage the jar so that they
can use only DFS, and to watch only the dfs-component traffic on hadoop-dev.)

So, now this comes back to me in the Record I/O context. Why can't I say the same to Hadoop
Record I/O users? (About three users, two of them startup founders I happen to know, have
asked me this.)

But long-term vision aside, I believe this patch is an important step forward in this ongoing
saga; it at least gets us halfway. Those looking at the generated code will no longer be puzzled
about why there are two ways of serializing a record in the packed binary serialization format,
when Record.serialize suffices for all current and future formats.
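The idea that one serialize entry point covers every format can be sketched as follows. This is an illustrative sketch only: the names here (RecordOutput, CsvRecordOutput, BinaryRecordOutput, Employee) are simplified stand-ins modeled on the concept, not the actual org.apache.hadoop.record API. The point is that a generated record exposes a single serialize method taking a format-specific writer, so binary, CSV, or any future format requires no additional generated code paths:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical format-abstraction interface: each output format implements it.
interface RecordOutput {
    void writeInt(int i, String tag) throws IOException;
    void writeString(String s, String tag) throws IOException;
}

// One format-specific writer (comma-separated text)...
class CsvRecordOutput implements RecordOutput {
    private final StringBuilder sb = new StringBuilder();
    public void writeInt(int i, String tag) { sb.append(i).append(','); }
    public void writeString(String s, String tag) { sb.append(s).append(','); }
    public String toString() { return sb.toString(); }
}

// ...and another (packed binary); the record never knows which one it gets.
class BinaryRecordOutput implements RecordOutput {
    private final DataOutputStream out;
    BinaryRecordOutput(DataOutputStream out) { this.out = out; }
    public void writeInt(int i, String tag) throws IOException { out.writeInt(i); }
    public void writeString(String s, String tag) throws IOException { out.writeUTF(s); }
}

// A "generated" record: one serialize method suffices for all formats.
class Employee {
    int id; String name;
    Employee(int id, String name) { this.id = id; this.name = name; }
    void serialize(RecordOutput rout) throws IOException {
        rout.writeInt(id, "id");
        rout.writeString(name, "name");
    }
}
```

With this shape there is no need for a second, format-specific serialization method in the generated record; adding a new format means writing one new RecordOutput implementation, and every existing record works with it unchanged.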

> Make Record I/O usable independent of Hadoop
> -------------------------------------------------
>                 Key: HADOOP-1053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1053
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.11.2
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.13.0
>         Attachments: jute-patch.txt
> This issue has been created to separate out one proposal originally included in HADOOP-941,
> for which no consensus could be reached. For earlier discussion about the issue, please see
> HADOOP-941.
> I will summarize the proposal here. We need to provide a way for users who want to use the
> Record I/O framework outside of Hadoop.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
