asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: Working with Hadoop
Date Thu, 21 Jul 2016 23:32:20 GMT
Ok, I’ve filed 2 issues
https://issues.apache.org/jira/browse/ASTERIXDB-1540
https://issues.apache.org/jira/browse/ASTERIXDB-1541
and I’ve assigned the second one (update dependencies) to Ian as I 
think that he is familiar with the field and probably the only one that 
knows about the YARN part :)

Cheers,
Till

On 21 Jul 2016, at 13:45, Mike Carey wrote:

> IMO:  Yes to all...  :-)
>
>
> On 7/21/16 12:57 PM, Till Westmann wrote:
>> Ok, so would it make sense (and work) to update all of our 
>> dependencies to the latest 2.6 release?
>>
>> Longer term - if we want to continue to support HDFS - it seems that 
>> we should think about being able to support different versions of 
>> HDFS with the same AsterixDB instance. That way we could use and 
>> combine data from different clusters with the data in AsterixDB.
>> Does that make sense?
>> Would that be desirable and feasible?
>>
>> Cheers,
>> Till
>>
>> On 21 Jul 2016, at 11:10, Mike Carey wrote:
>>
>>> My 0.15 cents' worth:
>>>
>>> 1 is of definite interest as a way of sneakily expanding our turf - 
>>> AsterixDB is in the "NoSQL on steroids" space, in terms of our 
>>> features and functionality - but can properly encroach on the "SQL 
>>> on Hadoop" analytics world with 1.  That's something that's of 
>>> interest, I think.  For now I think supporting one popular version 
>>> of Hadoop is good - so 2.x.x is a fine answer for that.
>>>
>>> 2 was an NSF deliverable and we felt it would be helpful w.r.t. the 
>>> world of 1 - i.e., maybe folks would be more comfortable running us 
>>> in their data centers if their YARN sysadmins could be the 
>>> resource/etc managers.  I think that's also still of interest, and 
>>> both 1 and 2 are things we should maintain.
>>>
>>> 3 is for an interesting/fun research question - namely, would 
>>> AsterixDB on HDFS storage be better from a replication, etc., 
>>> standpoint than AsterixDB doing everything natively and using 
>>> DB-style replication.  The goal of 3 is to explore that question but 
>>> not to make HDFS-ified AsterixDB a released/supported feature in 
>>> AsterixDB in any particular timeframe.  At the time we started 
>>> looking at 3, we were also thinking it might (albeit misguidedly 
>>> :-)) make potential "enterprise adopters" of AsterixDB happier to 
>>> "know that their data is safely kept in HDFS".  (Nevermind that we 
>>> could corrupt the details of their data and make it unusable still. 
>>> :-))  I think that's no longer something we need to worry about as a 
>>> reason for 3 - the real reason for 3 is experimental systems 
>>> research (i.e., the native vs. HDFS performance issues study).
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>>
>>> On 7/21/16 1:49 AM, abdullah alamoudi wrote:
>>>> I think that list is all we've got. We only support Hadoop 2.x.x.
>>>> We found that supporting both 1.x and 2.x has a cost that we couldn't
>>>> afford. I believe there are fundamental differences between Hadoop 1.x
>>>> and 2.x and that a good segment of the Hadoop community still uses 1.x.
>>>> However, it has been a while since 1.x got a new release, so I am not
>>>> sure if it is worth investing time in making it work.
>>>>
>>>> Also, it seems to me that our Hadoop support is mainly for attracting
>>>> existing users of Hadoop, so I really think we should not invest in
>>>> that area anymore. The only thing that I think we should continue doing
>>>> is maybe add more tests (for different formats, etc.). That is just my
>>>> opinion :)
>>>>
>>>> What happened to Hadoop Compatibility Layer? Is that still a thing?
>>>>
>>>> On Thu, Jul 21, 2016 at 5:24 AM, Ian Maxon <imaxon@uci.edu> wrote:
>>>>
>>>>> That's all the ways we use Hadoop at the moment that I can think of as
>>>>> well. Maybe the two other minor ones are ZooKeeper and HDFS backup in
>>>>> Managix.
>>>>>
>>>>> For 1) and 2) it's using Hadoop 2.2.0 right now. In my experimental
>>>>> branch for 3) I'm using 2.6.0; it doesn't cause any more issues for me
>>>>> than 2.2.0. I believe 1) used to support Hadoop 0.20.0 and other 1.x
>>>>> versions but I'm not sure if that works anymore.
>>>>>
>>>>> On Wed, Jul 20, 2016 at 7:14 PM, Till Westmann <tillw@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> recently the topic of Hadoop support came up and I realized that my
>>>>>> understanding is quite spotty so I’m trying to understand where we are.
>>>>>>
>>>>>> AFAIK we support
>>>>>> 1) HDFS for (potentially indexed) external datasets,
>>>>>> 2) YARN as a resource manager, and
>>>>>> 3) HDFS as a basis for internal storage.
>>>>>> Is this list complete or do we have other Hadoop touchpoints?
>>>>>>
>>>>>> I believe that 1) and 2) should be reasonably stable and that 3) is
>>>>>> still in the works. Is that correct?
>>>>>>
>>>>>> Further I'm wondering
>>>>>> a) which versions of Hadoop we support and
>>>>>> b) which ones we should support for all the cases.
>>>>>> Please chime in on this as well.
>>>>>>
>>>>>> Any other things that anybody working with AsterixDB and Hadoop
>>>>>> should be aware of?
>>>>>>
>>>>>> Thanks!
>>>>>> Till
>>>>>>
>>>>>>
