Message-ID: <10432062.1172894811916.JavaMail.jira@brutus>
Date: Fri, 2 Mar 2007 20:06:51 -0800 (PST)
From: "Milind Bhandarkar (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1053) Make Record I/O usable from independent of Hadoop
In-Reply-To: <24644682.1172700110807.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477594 ]

Milind Bhandarkar commented on HADOOP-1053:
-------------------------------------------

I agree with the
points that David has made, i.e., the description of this Jira issue should be modified so that removing the "hadoop-dependency" is no longer stated as the goal; rather, the goal is making hadoop record i/o functionally modular. (Thanks for those well-chosen words, David. After all, proper usage of words does matter for easing objections and reaching consensus.)

In the context of this discussion, I would like to know the opinion of hadoop-committers: if a significant user-base had asked to use, let's say, only hadoop DFS independent of the hadoop map-reduce framework, would they have agreed to separate the two so that they could gain a huge base of users? I agree that this may seem a hypothetical question, but I have been asked it by a lot of people, so I can assure you all that it is indeed not hypothetical.

The particular question that I was asked by an aspiring web 2.0 company was this: "I have been following the hadoop-dev list for the last three months, and I have noticed that while the dfs side of hadoop is adding features, the map-reduce side has been fixing serious bugs. In any case, I do not need map-reduce, but I believe that dfs would be a great feature to have. Is there a way to use just dfs in my project without the rest of hadoop?"

My answer to that was: yes, of course. DFS is functionally modular. (P.S. They do not want SequenceFile; they have their own ideas about storing key-value pairs.) Was my answer correct? Should I instead be telling them, "No, you have to use the map-reduce framework even if you do not need it"? (P.S. I have suggested to them a way to repackage the jar so that they can use only DFS, and watch only the dfs component traffic on hadoop-dev.)

So, now this comes back to me in the record i/o context. Why can't I say the same to Hadoop record I/O users? (About three users, two of them startup founders that I happen to know, have asked me this.)
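The jar repackaging mentioned above can be sketched roughly as follows. This is a hypothetical illustration in Python (not part of Hadoop, and the package prefixes and jar names are assumptions for illustration, not the actual set of packages DFS needs):

```python
import zipfile

# Package prefixes assumed, for illustration only, to cover the DFS-side
# classes; a real repackaging would need the actual dependency closure.
KEEP_PREFIXES = (
    "org/apache/hadoop/dfs/",
    "org/apache/hadoop/fs/",
    "org/apache/hadoop/conf/",
    "org/apache/hadoop/util/",
)

def repackage(src_jar, dst_jar, prefixes=KEEP_PREFIXES):
    """Copy only the entries under the given package prefixes into a new jar.

    A jar is just a zip archive, so the stdlib zipfile module suffices.
    """
    with zipfile.ZipFile(src_jar) as src, \
         zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.namelist():
            # Keep jar metadata plus anything under a wanted package.
            if entry.startswith("META-INF/") or entry.startswith(prefixes):
                dst.writestr(entry, src.read(entry))
```

A user could then call, say, `repackage("hadoop-0.11.2.jar", "hadoop-dfs-only.jar")` and depend on the smaller jar alone, which is the sense in which DFS is already functionally modular even without a separate release artifact.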
But long-term vision aside, I believe this patch is an important step forward in this on-going saga. It at least gets us halfway there. Those looking at the generated code will no longer be puzzled about why there are two ways of serializing a record in the packed binary serialization format, when Record.serialize suffices for all current and future formats.

> Make Record I/O usable from independent of Hadoop
> -------------------------------------------------
>
>                 Key: HADOOP-1053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1053
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.11.2
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.13.0
>
>         Attachments: jute-patch.txt
>
>
> This issue has been created to separate one proposal originally included in HADOOP-941, for which no consensus could be reached. For earlier discussion about the issue, please see HADOOP-941.
> I will summarize the proposal here. We need to provide a way for some users who want to use record I/O framework outside of Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
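The design point about a single serialization entry point can be sketched as follows. This is an illustrative sketch in Python, not Hadoop's actual Record I/O API (which is Java, where Record.serialize takes a format-specific RecordOutput); all class and method names below are made up to show why one serialize method covers every current and future format:

```python
# Sketch of the single-entry-point design: a record hands itself to a
# format-specific writer, so adding a new format never touches record code.

class CsvOutput:
    """Illustrative stand-in for a text-format record writer."""
    def __init__(self):
        self.parts = []
    def write_int(self, value, tag):
        self.parts.append(str(value))
    def result(self):
        return ",".join(self.parts)

class BinaryOutput:
    """Illustrative stand-in for a packed-binary record writer."""
    def __init__(self):
        self.buf = bytearray()
    def write_int(self, value, tag):
        self.buf += value.to_bytes(4, "big")
    def result(self):
        return bytes(self.buf)

class PointRecord:
    """A generated record type needs exactly one serialize method."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def serialize(self, out):
        # One method serves every format: the writer decides the encoding.
        out.write_int(self.x, "x")
        out.write_int(self.y, "y")
```

Under this scheme there is no reason for the generated code to carry a second, binary-only serialization path; the binary writer is just one more implementation of the output interface.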