tajo-dev mailing list archives

From David Chen <dc...@linkedin.com>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Mon, 14 Apr 2014 19:31:51 GMT
+1 

I agree with Hyunsik as well. I think since 1.0 increments the major version number, it should
be used for a particularly significant release. :)

Thanks,
David


On Apr 13, 2014, at 7:51 PM, Alvin Henrick <share.code@aol.com> wrote:

> +1 Hyunsik.
> 
> Thanks!
> Warm Regards,
> Alvin.
> 
> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> 
>> Hi folks,
>> 
>> I'd like to discuss the next version number. In Jira, we have provisionally
>> used 1.0, but we haven't decided on the next major version. I propose 0.9 as
>> the next major version. What do you think about this?
>> 
>> Regards,
>> Hyunsik
>> 
>> 
>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
>> 
>>> Min, thanks for reminding us!
>>> It's a mandatory issue.
>>> We need to implement that feature ASAP.
>>> 
>>> Thanks,
>>> Jihoon
>>> 
>>> 
>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>> 
>>>> Min,
>>>> 
>>>> Yes, you are right. I think about it every day, but I missed it. Thank you
>>>> for reminding me. It could be achieved by modifying the Query class to
>>>> execute independent execution blocks in parallel. I'll add it to the wiki.
>>>> 
>>>> Thanks,
>>>> Hyunsik
>>>> 
>>>> 
>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
>>>> 
>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A in
>>>>> the first stage and then scans B in the second stage. It doesn't run
>>>>> them in parallel, right?
>>>>> 
>>>>> 
>>>>> Min
>>>>> 
>>>>> 
>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>>> 
>>>>>> I've just updated the roadmap page. Please take a look at the section
>>>>>> 'After 0.8.0'
>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>> 
>>>>>> If there are missed or additional ideas, feel free to add them on that
>>>>>> page or suggest them here. After we discuss them more, we will decide
>>>>>> their priorities.
>>>>>> 
>>>>>> Best regards,
>>>>>> Hyunsik
>>>>>> 
>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>>>>> Hi Hyoungjun,
>>>>>>> 
>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>>>>>>> users with a prepared benchmark environment, they can test Tajo
>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>>>>>>> suggestion.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Hyunsik
>>>>>>> 
>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
>>>>>>>> Hi Hyunsik,
>>>>>>>> 
>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like
>>>>>>>> the ones for Hive and Impala would make testing easier.
>>>>>>>> 
>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> Hyoungjun
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>>> 
>>>>>>>>> Hi Jihoon,
>>>>>>>>> 
>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed it
>>>>>>>>> on the wiki.
>>>>>>>>> 
>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>>>>>>>>> logical planning and distributed query planning. But I'm not sure
>>>>>>>>> they can be included in the short-term roadmap. They are necessary,
>>>>>>>>> but they are not required right now. In my view, it would be
>>>>>>>>> reasonable to schedule them on the long-term roadmap.
>>>>>>>>> 
>>>>>>>>> Warm regards,
>>>>>>>>> Hyunsik
>>>>>>>>> 
>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>>> Also, I appreciate the guideline for the next roadmap.
>>>>>>>>>> 
>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually, I
>>>>>>>>>> started it quite a long time ago, but it has been delayed because it
>>>>>>>>>> had lower priority than other stability issues. But since this
>>>>>>>>>> operator is widely used in analytic applications, we need to add this
>>>>>>>>>> feature as soon as possible. So, in my opinion, it would be good to
>>>>>>>>>> add this feature to the next roadmap.
>>>>>>>>>> 
>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue for
>>>>>>>>>> making the query plan more flexible. After that, we can exploit the
>>>>>>>>>> many optimization opportunities like those described in TAJO-161.
>>>>>>>>>> 
>>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Jihoon
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>>>>> 
>>>>>>>>>>> Hi folks,
>>>>>>>>>>> 
>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
>>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
>>>>>>>>>>> various features in various contexts, and tried to figure out if
>>>>>>>>>>> there are any critical problems. I think that there are only a few
>>>>>>>>>>> issues and we can release 0.8.0 next week. If there are further
>>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
>>>>>>>>>>> ideas.
>>>>>>>>>>> 
>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>>>>>>>>>>> suggestion from users, contributors, and committers. Please fire
>>>>>>>>>>> away!
>>>>>>>>>>> 
>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
>>>>>>>>>>> Tajo runs on large clusters of thousands of nodes and serves many
>>>>>>>>>>> concurrent users. The key issues associated with this include the
>>>>>>>>>>> following:
>>>>>>>>>>> 
>>>>>>>>>>> * High availability
>>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>>> * More stability
>>>>>>>>>>> * Improved shuffle
>>>>>>>>>>> 
>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
>>>>>>>>>>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I
>>>>>>>>>>> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
>>>>>>>>>>> including myself are investigating and solving the issues which
>>>>>>>>>>> occur in large clusters. These issues should be solved in order to
>>>>>>>>>>> make Tajo a complete, enterprise-ready product.
>>>>>>>>>>> 
>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
>>>>>>>>>>> analytic problems require window functions. Also, IN subqueries and
>>>>>>>>>>> scalar subqueries should be supported. So, I'd like to schedule them
>>>>>>>>>>> with high priority. In my view, there will be very few SQL support
>>>>>>>>>>> issues once Tajo provides these features.
>>>>>>>>>>> 
>>>>>>>>>>> Besides those areas, David is working on a nested schema and its
>>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
>>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
>>>>>>>>>>> schedule the nested schema loosely. That's just my thoughts, anyhow.
>>>>>>>>>>> 
>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that we
>>>>>>>>>>> release more frequently after the 0.8.0 release. So far, there has
>>>>>>>>>>> been a long period between each release because Tajo is undergoing
>>>>>>>>>>> heavy development. By 'releasing early, releasing often', we will
>>>>>>>>>>> create a tighter feedback loop between users and developers.
>>>>>>>>>>> 
>>>>>>>>>>> I think that there are many additional interesting issues to be
>>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
>>>>>>>>>>> arrange our short-term roadmap and long-term roadmap based on your
>>>>>>>>>>> suggestions.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>>> 
>>>>>>>>>>> Warm Regards,
>>>>>>>>>>> Hyunsik
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>>> http://tajo.apache.org/
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> My research interests are distributed systems, parallel computing and
>>>>> bytecode based virtual machine.
>>>>> 
>>>>> My profile:
>>>>> http://www.linkedin.com/in/coderplay
>>>>> My blog:
>>>>> http://coderplay.javaeye.com
>>>>> 
>>>> 
>>> 
> 
