asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Working with Hadoop
Date Thu, 21 Jul 2016 20:45:56 GMT
IMO:  Yes to all...  :-)


On 7/21/16 12:57 PM, Till Westmann wrote:
> Ok, so would it make sense (and work) to update all of our
> dependencies to the latest 2.6 release?
>
> Longer term - if we want to continue to support HDFS - it seems that 
> we should think about being able to support different versions of HDFS 
> with the same AsterixDB instance. That way we could use and combine 
> data from different clusters with the data in AsterixDB.
> Does that make sense?
> Would that be desirable and feasible?
>
> Cheers,
> Till
>
> On 21 Jul 2016, at 11:10, Mike Carey wrote:
>
>> My 0.15 cents' worth:
>>
>> 1 is of definite interest as a way of sneakily expanding our turf - 
>> AsterixDB is in the "NoSQL on steroids" space, in terms of our 
>> features and functionality - but can properly encroach on the "SQL on 
>> Hadoop" analytics world with 1.  That's something that's of interest, 
>> I think.  For now I think supporting one popular version of Hadoop is 
>> good - so 2.x.x is a fine answer for that.
>>
>> 2 was an NSF deliverable and we felt it would be helpful w.r.t. the 
>> world of 1 - i.e., maybe folks would be more comfortable running us 
>> in their data centers if their YARN sysadmins could be the 
>> resource/etc managers.  I think that's also still of interest, and 
>> both 1 and 2 are things we should maintain.
>>
>> 3 is for an interesting/fun research question - namely, would 
>> AsterixDB on HDFS storage be better from a replication, etc., 
>> standpoint than AsterixDB doing everything natively and using 
>> DB-style replication.  The goal of 3 is to explore that question but 
>> not to make HDFS-ified AsterixDB a released/supported feature in 
>> AsterixDB in any particular timeframe.  At the time we started 
>> looking at 3, we were also thinking it might (albeit misguidedly :-)) 
>> make potential "enterprise adopters" of AsterixDB happier to "know 
>> that their data is safely kept in HDFS".  (Never mind that we could 
>> corrupt the details of their data and make it unusable still. :-))  I 
>> think that's no longer something we need to worry about as a reason 
>> for 3 - the real reason for 3 is experimental systems research (i.e., 
>> the native vs. HDFS performance issues study).
>>
>> Cheers,
>>
>> Mike
>>
>>
>> On 7/21/16 1:49 AM, abdullah alamoudi wrote:
>>> I think that list is all we've got. We only support Hadoop 2.x.x.
>>> We found that supporting both 1.x and 2.x has a cost that we couldn't
>>> afford. I believe there are fundamental differences between Hadoop 1.x
>>> and 2.x and that a good segment of the Hadoop community still uses 1.x.
>>> However, it has been a while since 1.x got a new release, so I am not
>>> sure if it is worth investing time in making it work.
>>>
>>> Also, it seems to me that our Hadoop support is mainly for attracting
>>> existing users of Hadoop, so I really think we should not invest in
>>> that area anymore. The only thing that I think we should continue doing
>>> is maybe add more tests (for different formats, etc.). That is just my
>>> opinion :)
>>>
>>> What happened to the Hadoop Compatibility Layer? Is that still a thing?
>>>
>>> On Thu, Jul 21, 2016 at 5:24 AM, Ian Maxon <imaxon@uci.edu> wrote:
>>>
>>>> That's all the ways we use Hadoop at the moment that I can think of as
>>>> well. Maybe the two other minor ones are ZooKeeper and HDFS backup in
>>>> Managix.
>>>>
>>>> For 1) and 2) it's using Hadoop 2.2.0 right now. In my experimental
>>>> branch for 3) I'm using 2.6.0; it doesn't cause any more issues for me
>>>> than 2.2.0. I believe 1) used to support Hadoop 0.20.0 and other 1.x
>>>> versions but I'm not sure if that works anymore.
>>>>
>>>> On Wed, Jul 20, 2016 at 7:14 PM, Till Westmann <tillw@apache.org> wrote:
>>>>
>>>>> Hi everybody,
>>>>>
>>>>> recently the topic of Hadoop support came up and I realized that my
>>>>> understanding is quite spotty so I’m trying to understand where we
>>>>> are.
>>>>>
>>>>> AFAIK we support
>>>>> 1) HDFS for (potentially indexed) external datasets,
>>>>> 2) YARN as a resource manager, and
>>>>> 3) HDFS as a basis for internal storage.
>>>>> Is this list complete or do we have other Hadoop touchpoints?
>>>>>
>>>>> I believe that 1) and 2) should be reasonably stable and that 3) is
>>>>> still in the works. Is that correct?
>>>>>
>>>>> Further I'm wondering
>>>>> a) which versions of Hadoop we support and
>>>>> b) which ones we should support for all the cases.
>>>>> Please chime in on this as well.
>>>>>
>>>>> Any other things that anybody working with AsterixDB and Hadoop
>>>>> should be aware of?
>>>>>
>>>>> Thanks!
>>>>> Till
>>>>>
>>>>>

