tajo-dev mailing list archives

From Alvin Henrick <share.c...@aol.com>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Mon, 14 Apr 2014 02:51:56 GMT
+1 Hyunsik.

Thanks!
Warm Regards,
Alvin.

On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:

> Hi folks,
> 
> I'd like to discuss the next version number. In Jira, we have provisionally
> used 1.0, but we haven't decided on the next major version. I propose 0.9
> as the next major version. What do you think?
> 
> Regards,
> Hyunsik
> 
> 
> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
> 
>> Min, thanks for reminding us!
>> It's a mandatory issue.
>> We need to implement that feature ASAP.
>> 
>> Thanks,
>> Jihoon
>> 
>> 
>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> 
>>> Min,
>>> 
>>> Yes, you are right. I think about it every day, but I missed it. Thank
>>> you for reminding me. It could be achieved by modifying the Query class
>>> to execute independent execution blocks in parallel. I'll add it to the
>>> wiki.
>>> 
>>> Thanks,
>>> Hyunsik
>>> 
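The idea of executing independent execution blocks in parallel can be illustrated with a minimal Java sketch. It assumes a plan represented as a list of blocks in dependency order; `Block`, `runAll`, `scan`, and `demo` are hypothetical names for illustration, not Tajo's actual Query or ExecutionBlock API. Each block starts as soon as all of its dependencies finish, so the two scans in A join B can run at the same time instead of in consecutive stages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBlocks {
    // Hypothetical stand-in for an execution block: a name, the names of the
    // blocks it depends on, and the work it performs.
    record Block(String name, List<String> deps, Runnable work) {}

    // Starts each block as soon as all of its dependencies have finished, so
    // blocks with no dependency on each other run concurrently.
    // Assumes `blocks` is in topological order (dependencies listed first).
    static void runAll(List<Block> blocks, ExecutorService pool) {
        Map<String, CompletableFuture<Void>> done = new HashMap<>();
        for (Block b : blocks) {
            CompletableFuture<?>[] parents = b.deps().stream()
                    .map(done::get)
                    .toArray(CompletableFuture[]::new);
            done.put(b.name(), CompletableFuture.allOf(parents).thenRunAsync(b.work(), pool));
        }
        // Wait for the whole plan to finish.
        CompletableFuture.allOf(done.values().toArray(new CompletableFuture[0])).join();
    }

    // A scan that records whether another scan was running at the same time.
    static Runnable scan(String table, CountDownLatch started, List<String> log) {
        return () -> {
            started.countDown();
            try {
                // Returns true before the timeout only if the other scan has also started.
                boolean overlapped = started.await(5, TimeUnit.SECONDS);
                log.add("scan " + table + (overlapped ? " (overlapped)" : " (alone)"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Models "A join B": two independent scans followed by a join that needs both.
    static List<String> demo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch started = new CountDownLatch(2);
        List<Block> plan = List.of(
                new Block("scanA", List.of(), scan("A", started, log)),
                new Block("scanB", List.of(), scan("B", started, log)),
                new Block("join", List.of("scanA", "scanB"), () -> log.add("join A, B")));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            runAll(plan, pool);
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In `demo()`, the latch only releases early when both scans are running at once, so each scan logs "(overlapped)" only if the two actually ran concurrently; the join is always logged last because it depends on both scans.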
>>> 
>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
>>> 
>>>> Yeah. Another issue: for a query like A join B, it seems that Tajo
>>>> scans A in the first stage and then scans B in the second stage. It
>>>> doesn't run them in parallel, right?
>>>> 
>>>> 
>>>> Min
>>>> 
>>>> 
>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>> 
>>>>> I've just updated the roadmap page. Please take a look at the section
>>>>> 'After 0.8.0'
>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>> 
>>>>> If I missed anything or you have additional ideas, feel free to add
>>>>> them to that page or suggest them here. After we discuss them further,
>>>>> we will decide their priorities.
>>>>> 
>>>>> Best regards,
>>>>> Hyunsik
>>>>> 
>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>>>> Hi Hyoungjun,
>>>>>> 
>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>>>>>> users with a prepared benchmark environment, they can test Tajo
>>>>>> easily. I'll add your idea to the wiki. Thank you for your
>>>>>> suggestion.
>>>>>> 
>>>>>> Regards,
>>>>>> Hyunsik
>>>>>> 
>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
>>>>>>> Hi Hyunsik,
>>>>>>> 
>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
>>>>>>> like the ones available for Hive and Impala would be helpful for
>>>>>>> testing.
>>>>>>> 
>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> Hyoungjun
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>> 
>>>>>>>> Hi Jihoon,
>>>>>>>> 
>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
>>>>>>>> them on the wiki.
>>>>>>>> 
>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>>>>>>>> logical planning and distributed query planning. However, I'm not
>>>>>>>> sure they can be included in the short-term roadmap. They are
>>>>>>>> necessary, but they are not required right now. In my view, it
>>>>>>>> would be reasonable to schedule them on the long-term roadmap.
>>>>>>>> 
>>>>>>>> Warm regards,
>>>>>>>> Hyunsik
>>>>>>>> 
>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
>>>>>>>>> Hi Hyunsik,
>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>> Also, I appreciate the guidelines for the next roadmap.
>>>>>>>>> 
>>>>>>>>> In addition to the aforementioned features, I have two
>>>>>>>>> suggestions.
>>>>>>>>> First is support for the CUBE operator (TAJO-259). Actually, I
>>>>>>>>> started it quite a long time ago, but it has been delayed because
>>>>>>>>> it had a lower priority than other stability issues. However,
>>>>>>>>> since this operator is widely used in analytic applications, we
>>>>>>>>> need to add this feature as soon as possible. So, in my opinion,
>>>>>>>>> it would be good to add this feature to the next roadmap.
>>>>>>>>> 
>>>>>>>>> Second is advanced query optimization. TAJO-266 is an issue for
>>>>>>>>> making the query plan more flexible. After that, we can exploit
>>>>>>>>> the many optimization opportunities described in TAJO-161.
>>>>>>>>> 
>>>>>>>>> What do you guys think about these issues?
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Jihoon
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>>>> 
>>>>>>>>>> Hi folks,
>>>>>>>>>> 
>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
>>>>>>>>>> various features in various contexts and tried to figure out
>>>>>>>>>> whether there are any critical problems. I think there are only a
>>>>>>>>>> few remaining issues, and we can release 0.8.0 next week. If there
>>>>>>>>>> are further issues to be solved before the 0.8.0 release, feel
>>>>>>>>>> free to raise them.
>>>>>>>>>> 
>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>>>>>>>>>> suggestions from users, contributors, and committers. Please fire
>>>>>>>>>> away!
>>>>>>>>>> 
>>>>>>>>>> I think our next stage should focus on improving the way Tajo
>>>>>>>>>> runs on large clusters of thousands of nodes and for many
>>>>>>>>>> concurrent users. The key issues associated with this include the
>>>>>>>>>> following:
>>>>>>>>>> 
>>>>>>>>>> * High availability
>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>> * More stability
>>>>>>>>>> * Improved shuffle
>>>>>>>>>> 
>>>>>>>>>> The current work status is as follows. Min is working on Tajo's
>>>>>>>>>> new scheduler (TAJO-540) based on Sparrow, and I'll support him.
>>>>>>>>>> As far as I know, Alvin is working on TajoMaster HA (TAJO-704).
>>>>>>>>>> Also, several of us, including myself, are investigating and
>>>>>>>>>> solving the issues that occur in large clusters. These issues
>>>>>>>>>> should be solved in order to make Tajo a complete,
>>>>>>>>>> enterprise-ready production system.
>>>>>>>>>> 
>>>>>>>>>> In addition, there are some SQL feature support issues. Many
>>>>>>>>>> analytic problems require window functions. Also, IN subqueries
>>>>>>>>>> and scalar subqueries should be supported. So, I'd like to
>>>>>>>>>> schedule them with high priority. In my view, very few SQL support
>>>>>>>>>> issues will remain once Tajo provides these features.
>>>>>>>>>> 
>>>>>>>>>> Besides those areas, David is working on nested schemas and
>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
>>>>>>>>>> schedule the nested schema work loosely. That's just my thought,
>>>>>>>>>> anyhow.
>>>>>>>>>> 
>>>>>>>>>> Aside from the roadmap discussion, I'd like to suggest that we
>>>>>>>>>> release more frequently after the 0.8.0 release. So far, there
>>>>>>>>>> have been long gaps between releases because Tajo has been
>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing
>>>>>>>>>> often', we will create a tighter feedback loop between users and
>>>>>>>>>> developers.
>>>>>>>>>> 
>>>>>>>>>> I think there are many more interesting issues to be included in
>>>>>>>>>> our roadmap. Feel free to suggest your ideas. We will arrange our
>>>>>>>>>> short-term and long-term roadmaps based on your suggestions.
>>>>>>>>>> 
>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>> 
>>>>>>>>>> Warm Regards,
>>>>>>>>>> Hyunsik
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>> http://tajo.apache.org/
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> My research interests are distributed systems, parallel computing and
>>>> bytecode based virtual machine.
>>>> 
>>>> My profile:
>>>> http://www.linkedin.com/in/coderplay
>>>> My blog:
>>>> http://coderplay.javaeye.com
>>>> 
>>> 
>> 

