From: "Till Westmann"
To: dev@asterixdb.apache.org
Subject: Re: Working with Hadoop
Date: Thu, 21 Jul 2016 12:57:40 -0700

Ok, so would it make sense (and work) to update all of our dependencies
to the latest 2.6 release?

Longer term - if we want to continue to support HDFS - it seems that we
should think about being able to support different versions of HDFS with
the same AsterixDB instance. That way we could use and combine data from
different clusters with the data in AsterixDB.
Does that make sense? Would that be desirable and feasible?

Cheers,
Till

On 21 Jul 2016, at 11:10, Mike Carey wrote:

> My 0.15 cents' worth:
>
> 1 is of definite interest as a way of sneakily expanding our turf -
> AsterixDB is in the "NoSQL on steroids" space, in terms of our features
> and functionality - but it can properly encroach on the "SQL on Hadoop"
> analytics world with 1. That's something that's of interest, I think.
> For now I think supporting one popular version of Hadoop is good - so
> 2.x.x is a fine answer for that.
>
> 2 was an NSF deliverable and we felt it would be helpful w.r.t. the
> world of 1 - i.e., maybe folks would be more comfortable running us in
> their data centers if their YARN sysadmins could be the resource/etc.
> managers. I think that's also still of interest, and both 1 and 2 are
> things we should maintain.
>
> 3 is for an interesting/fun research question - namely, would AsterixDB
> on HDFS storage be better from a replication, etc., standpoint than
> AsterixDB doing everything natively and using DB-style replication?
> The goal of 3 is to explore that question, but not to make HDFS-ified
> AsterixDB a released/supported feature in AsterixDB in any particular
> timeframe.
> At the time we started looking at 3, we were also thinking it might
> (albeit misguidedly :-)) make potential "enterprise adopters" of
> AsterixDB happier to "know that their data is safely kept in HDFS".
> (Never mind that we could still corrupt the details of their data and
> make it unusable. :-)) I think that's no longer something we need to
> worry about as a reason for 3 - the real reason for 3 is experimental
> systems research (i.e., the native vs. HDFS storage performance study).
>
> Cheers,
>
> Mike
>
>
> On 7/21/16 1:49 AM, abdullah alamoudi wrote:
>> I think that list is all we've got. We only support Hadoop 2.x.x.
>> We found that supporting both 1.x and 2.x has a cost that we couldn't
>> afford. I believe there are fundamental differences between Hadoop 1.x
>> and 2.x and that a good segment of the Hadoop community still uses
>> 1.x. However, it has been a while since 1.x got a new release, so I am
>> not sure if it is worth investing time in making it work.
>>
>> Also, it seems to me that our Hadoop support is mainly for attracting
>> existing users of Hadoop, so I really think we should not invest in
>> that area anymore. The only thing that I think we should continue
>> doing is maybe to add more tests (for different formats, etc.). That
>> is just my opinion :)
>>
>> What happened to the Hadoop Compatibility Layer? Is that still a
>> thing?
>>
>> On Thu, Jul 21, 2016 at 5:24 AM, Ian Maxon wrote:
>>
>>> That's all the ways we use Hadoop at the moment that I can think of
>>> as well. Maybe the two other minor ones are ZooKeeper and HDFS backup
>>> in Managix.
>>>
>>> For 1) and 2) it's using Hadoop 2.2.0 right now. In my experimental
>>> branch for 3) I'm using 2.6.0; it doesn't cause any more issues for
>>> me than 2.2.0. I believe 1) used to support Hadoop 0.20.0 and other
>>> 1.x versions, but I'm not sure if that works anymore.
>>>
>>> On Wed, Jul 20, 2016 at 7:14 PM, Till Westmann wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> recently the topic of Hadoop support came up and I realized that my
>>>> understanding is quite spotty, so I'm trying to understand where we
>>>> are.
>>>>
>>>> AFAIK we support
>>>> 1) HDFS for (potentially indexed) external datasets,
>>>> 2) YARN as a resource manager, and
>>>> 3) HDFS as a basis for internal storage.
>>>> Is this list complete or do we have other Hadoop touchpoints?
>>>>
>>>> I believe that 1) and 2) should be reasonably stable and that 3) is
>>>> still in the works. Is that correct?
>>>>
>>>> Further, I'm wondering
>>>> a) which versions of Hadoop we support and
>>>> b) which ones we should support for all the cases.
>>>> Please chime in on this as well.
>>>>
>>>> Any other things that anybody working with AsterixDB and Hadoop
>>>> should be aware of?
>>>>
>>>> Thanks!
>>>> Till
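
P.S. On the "different HDFS versions / different clusters from one
AsterixDB instance" question above, here is a minimal sketch (not code
from the thread or from AsterixDB, and the cluster host names and paths
are made up) of the Hadoop 2.x client pattern that would make this
plausible: one JVM can resolve a separate FileSystem client per namenode
URI, so datasets pointing at different clusters could coexist in one
process. Whether the 2.6 client libraries stay wire-compatible with the
older 2.x namenodes we care about would still need to be checked.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MultiClusterHdfsProbe {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode URIs - FileSystem.get() returns a
            // (cached) client keyed by scheme and authority, so the two
            // iterations below talk to two different clusters.
            String[] locations = {
                "hdfs://cluster-a:8020/data/external/tweets",
                "hdfs://cluster-b:8020/data/external/logs"
            };
            for (String location : locations) {
                Path path = new Path(location);
                FileSystem fs = FileSystem.get(path.toUri(), conf);
                // List what the dataset would see on that cluster.
                for (FileStatus status : fs.listStatus(path)) {
                    System.out.println(status.getPath() + "\t" + status.getLen());
                }
            }
        }
    }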