Message-ID: <10432062.1172894811916.JavaMail.jira@brutus>
Date: Fri, 2 Mar 2007 20:06:51 -0800 (PST)
From: "Milind Bhandarkar (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1053) Make Record I/O usable from independent of Hadoop
In-Reply-To: <24644682.1172700110807.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477594 ]

Milind Bhandarkar commented on HADOOP-1053:
-------------------------------------------

I agree with the
points that David has made, i.e., the description of this Jira issue should be modified so that removing the "hadoop-dependency" is no longer stated as the goal; rather, the goal is making hadoop record i/o functionally modular. (Thanks for those well-chosen words, David. After all, proper usage of words does matter for easing objections and reaching consensus.)

In the context of this discussion, I would like to know the opinion of hadoop-committers: if a significant user-base had asked to use, let's say, only hadoop DFS independent of the hadoop map-reduce framework, would they have agreed to separate the two so that they could gain a huge base of users? I agree that this may seem a hypothetical question, but I have been asked it by a lot of people, so I can assure you all that it is indeed not hypothetical.

The particular question that I was asked by an aspiring web 2.0 company was this: "I have been following the hadoop-dev list for the last three months, and I have noticed that while the dfs side of hadoop is adding features, the map-reduce side has been fixing serious bugs. In any case, I do not need map-reduce, but I believe that dfs would be a great feature to have. Is there a way to use just dfs in my project without the rest of hadoop?"

My answer to that was: yes, of course. DFS is functionally modular. (P.S. They do not want SequenceFile; they have their own ideas about storing key-value pairs.) Was my answer correct? Should I instead be telling them, "No, you have to use the map-reduce framework even if you do not need it"? (P.S. I have suggested to them a way to repackage the jar so that they can use only DFS, and watch only the dfs component traffic on hadoop-dev.)

So, now this comes back to me in the record i/o context. Why can't I say the same to Hadoop record I/O users? (About three users, two of them startup founders that I happen to know, have asked me this.)
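The jar repackaging mentioned above can be sketched roughly as follows. This is a hypothetical illustration in Python (not part of Hadoop, and the package prefixes and jar names are assumptions for illustration, not the actual set of packages DFS needs):

```python
import zipfile

# Package prefixes assumed, for illustration only, to cover the DFS-side
# classes; a real repackaging would need the actual dependency closure.
KEEP_PREFIXES = (
    "org/apache/hadoop/dfs/",
    "org/apache/hadoop/fs/",
    "org/apache/hadoop/conf/",
    "org/apache/hadoop/util/",
)

def repackage(src_jar, dst_jar, prefixes=KEEP_PREFIXES):
    """Copy only the entries under the given package prefixes into a new jar.

    A jar is just a zip archive, so the stdlib zipfile module suffices.
    """
    with zipfile.ZipFile(src_jar) as src, \
         zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.namelist():
            # Keep jar metadata plus anything under a wanted package.
            if entry.startswith("META-INF/") or entry.startswith(prefixes):
                dst.writestr(entry, src.read(entry))
```

A user could then call, say, `repackage("hadoop-0.11.2.jar", "hadoop-dfs-only.jar")` and depend on the smaller jar alone, which is the sense in which DFS is already functionally modular even without a separate release artifact.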
But long-term vision aside, I believe this patch is an important step forward in this on-going saga. It at least gets us halfway there. Those looking at the generated code will no longer be puzzled about why there are two ways of serializing a record in the packed binary serialization format, when Record.serialize suffices for all current and future formats.

> Make Record I/O usable from independent of Hadoop
> -------------------------------------------------
>
>                 Key: HADOOP-1053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1053
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.11.2
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.13.0
>
>         Attachments: jute-patch.txt
>
>
> This issue has been created to separate one proposal originally included in HADOOP-941, for which no consensus could be reached. For earlier discussion about the issue, please see HADOOP-941.
> I will summarize the proposal here. We need to provide a way for some users who want to use record I/O framework outside of Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
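The design point about a single serialization entry point can be sketched as follows. This is an illustrative sketch in Python, not Hadoop's actual Record I/O API (which is Java, where Record.serialize takes a format-specific RecordOutput); all class and method names below are made up to show why one serialize method covers every current and future format:

```python
# Sketch of the single-entry-point design: a record hands itself to a
# format-specific writer, so adding a new format never touches record code.

class CsvOutput:
    """Illustrative stand-in for a text-format record writer."""
    def __init__(self):
        self.parts = []
    def write_int(self, value, tag):
        self.parts.append(str(value))
    def result(self):
        return ",".join(self.parts)

class BinaryOutput:
    """Illustrative stand-in for a packed-binary record writer."""
    def __init__(self):
        self.buf = bytearray()
    def write_int(self, value, tag):
        self.buf += value.to_bytes(4, "big")
    def result(self):
        return bytes(self.buf)

class PointRecord:
    """A generated record type needs exactly one serialize method."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def serialize(self, out):
        # One method serves every format: the writer decides the encoding.
        out.write_int(self.x, "x")
        out.write_int(self.y, "y")
```

Under this scheme there is no reason for the generated code to carry a second, binary-only serialization path; the binary writer is just one more implementation of the output interface.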