tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Fri, 04 Apr 2014 14:26:26 GMT
Hi Alvin,

Your idea sounds nice. I think you mean some prepared VirtualBox images
or Vagrant (http://www.vagrantup.com/) images that immediately
demonstrate Tajo or let users try Tajo themselves. That sounds reasonable
and seems helpful for introducing Tajo more widely. I'll file it
on the wiki page.

Smart caching based on machine learning seems to be a kind of LRFU
caching approach. It sounds nice. In my view, smart caching is a
kind of caching policy. For this idea, we first need to have some
basic caching feature. Probably, HDFS cache inspired by HDFS-4949 and
TAJO-472 are the candidates for the basic caching feature. You may be
interested in these issues. I'll file your idea as the pluggable
caching policy feature on the wiki.

Best regards,
Hyunsik


On Fri, Apr 4, 2014 at 10:37 PM, Alvin Henrick <share.code@aol.com> wrote:
> Hi Hyunsik,
>
> I apologize for the late reply. I am in the US (Dallas time zone), so you
> will notice some time differences in my replies :).
>
> I personally think Tajo is an amazing tool that people would love to use,
> but I would like to stress that we need to add a couple of samples to the
> project showing developers and business owners how to use it effectively.
>
> Add a couple of small examples, and add one example that uses at least
> 2 to 4 GB of data files and then runs SQL queries on top of it, as a kind
> of small benchmark application.
>
> We also need to improve our documentation. It is the best way to get
> people's attention and sell the product to the masses.
>
> There are a few SQL-on-Hadoop products out there, but most of the good
> ones are not open source, like Impala and HAWQ.
>
> In the future, I think we should also introduce the concept of smart
> caching based on machine learning (the data sets that users query
> frequently should be cached for faster access).
>
> Thanks!
> Warm Regards,
> Alvin.
>
> On Apr 4, 2014, at 12:24 AM, Hyunsik Choi wrote:
>
>> Hi folks,
>>
>> I'm very happy to see that our community is growing! Also, it's a pleasure
>> to discuss the Tajo 0.8.0 release. Recently, I've tested various features
>> in various contexts, and tried to figure out if there are any critical
>> problems. I think that there are only a few issues and we can release 0.8.0
>> next week. If there are further issues to be solved before the 0.8.0
>> release, feel free to suggest ideas.
>>
>> Also, I'd like to discuss our next roadmap. We are open to any suggestion
>> from users, contributors, and committers. Please fire away!
>>
>> I'm thinking that our next stage should focus on improving the way Tajo
>> runs on large clusters of thousands of nodes and for a number of concurrent
>> users. The key issues associated with this include the following:
>>
>> * High availability
>> * Multi-tenancy scheduling
>> * More stability
>> * Improved shuffle
>>
>> The current work status is as follows. Min is working on Tajo's new
>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know,
>> Alvin is working on TajoMaster HA (TAJO-704). Also, some guys including
>> myself are investigating and solving the issues which occur in large
>> clusters. These issues should be solved in order to make Tajo a complete
>> enterprise-ready product.
>>
>> In addition, there are some SQL feature support issues. Many analytic
>> problems require window functions. Also, IN subqueries and scalar subqueries
>> should be supported. So, I'd like to schedule them with high priority. In
>> my view, there will be very few SQL support issues if Tajo provides these
>> features.
>>
>> Besides those areas, David is working on a nested schema and its related
>> work (TAJO-710). I guess this will take quite a while because it requires a
>> lot of hard work. So, it would be great to schedule the nested schema
>> loosely. Those are just my thoughts, anyhow.
>>
>> Aside from the discussion of our roadmap, I'd like to suggest that we need
>> to release more frequently after the 0.8.0 release. So far, there has been
>> a long period between each release because Tajo is undergoing heavy
>> development. By 'releasing early, releasing often', we will create a
>> tighter feedback loop between users and developers.
>>
>> I think that there are many additional interesting issues to be
>> included in our roadmap. Feel free to suggest your ideas. We will arrange
>> our short-term roadmap and long-term roadmap based on your suggestions.
>>
>> Thank you all so much for your contribution!
>>
>> Warm Regards,
>> Hyunsik
>
