tajo-dev mailing list archives

From ktpark <sir...@apache.org>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Tue, 15 Apr 2014 00:03:57 GMT
+1

I agree with Hyunsik.
Sorry for the late reply.

On Apr 15, 2014, at 5:05 AM, Min Zhou <coderplay@gmail.com> wrote:

> I only realized today that my reply hadn't been sent.
> 
> +1
> 
> Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> 
> Min
> 
> 
> On Mon, Apr 14, 2014 at 12:31 PM, David Chen <dchen@linkedin.com> wrote:
> 
>> +1
>> 
>> I agree with Hyunsik as well. I think since 1.0 increments the major
>> version number, it should be used for a particularly significant release. :)
>> 
>> Thanks,
>> David
>> 
>> 
>> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <share.code@aol.com> wrote:
>> 
>>> +1 Hyunsik.
>>> 
>>> Thanks!
>>> Warm Regards,
>>> Alvin.
>>> 
>>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>>> 
>>>> Hi folks,
>>>> 
>>>> I'd like to discuss the next version number. In Jira, we have
>>>> provisionally used 1.0, and we didn't decide the next major version.
>>>> I propose 0.9 as the next major version. What do you think about this?
>>>> 
>>>> Regards,
>>>> Hyunsik
>>>> 
>>>> 
>>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
>>>> 
>>>>> Min, thanks for reminding us!
>>>>> It's a mandatory issue.
>>>>> We need to implement that feature ASAP.
>>>>> 
>>>>> Thanks,
>>>>> Jihoon
>>>>> 
>>>>> 
>>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>> 
>>>>>> Min,
>>>>>> 
>>>>>> Yes, you are right. I think about it every day, but I missed it.
>>>>>> Thank you for reminding me. It could be achieved by modifying the
>>>>>> Query class to execute independent execution blocks in parallel.
>>>>>> I'll add it to the wiki.
>>>>>> 
>>>>>> Thanks,
>>>>>> Hyunsik
>>>>>> 
>>>>>> 
>>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
>>>>>> 
>>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo
>>>>>>> scans A in the first stage and only then scans B in the second
>>>>>>> stage. It doesn't run them in parallel, right?
>>>>>>> 
>>>>>>> 
>>>>>>> Min
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>>>>> 
>>>>>>>> I've just updated the roadmap page. Please take a look at the
>>>>>>>> section 'After 0.8.0'
>>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>>>> 
>>>>>>>> If there are missing or additional ideas, feel free to add them
>>>>>>>> on that page or suggest them here. After we discuss them more,
>>>>>>>> we will decide their priorities.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Hyunsik
>>>>>>>> 
>>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>>>>>>>> Hi Hyoungjun,
>>>>>>>>> 
>>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we
>>>>>>>>> provide users with some prepared benchmark environment, users
>>>>>>>>> can test Tajo easily. I'll file your idea on the wiki. Thank
>>>>>>>>> you for your suggestion.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Hyunsik
>>>>>>>>> 
>>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
>>>>>>>>>> Hi Hyunsik ,
>>>>>>>>>> 
>>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
>>>>>>>>>> scripts like the ones for Hive and Impala would make testing
>>>>>>>>>> easier.
>>>>>>>>>> 
>>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> Hyoungjun
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>>>>> 
>>>>>>>>>>> Hi Jihoon,
>>>>>>>>>>> 
>>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems.
>>>>>>>>>>> I filed them on the wiki.
>>>>>>>>>>> 
>>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization
>>>>>>>>>>> opportunities to logical planning and distributed query
>>>>>>>>>>> planning. But, I'm not sure they can be included in the
>>>>>>>>>>> short-term roadmap. They are necessary, but they are not
>>>>>>>>>>> required right now. In my view, it would be reasonable to
>>>>>>>>>>> schedule them on the long-term roadmap.
>>>>>>>>>>> 
>>>>>>>>>>> Warm regards,
>>>>>>>>>>> Hyunsik
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
>>>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>>>>> Also, I appreciate the guideline for the next roadmap.
>>>>>>>>>>>> 
>>>>>>>>>>>> In addition to the aforementioned features, I have two
>>>>>>>>>>>> suggestions. First is the support of the CUBE operator
>>>>>>>>>>>> (TAJO-259). Actually, I started it quite a long time ago,
>>>>>>>>>>>> but it has been delayed because it had lower priority than
>>>>>>>>>>>> other stability issues. But, since this operator is widely
>>>>>>>>>>>> used in analytic applications, we need to add this feature
>>>>>>>>>>>> as soon as possible. So, in my opinion, it would be good
>>>>>>>>>>>> to add this feature to the next roadmap.
>>>>>>>>>>>> 
>>>>>>>>>>>> Second is advanced query optimization. TAJO-266 is an
>>>>>>>>>>>> issue for making the query plan more flexible. After that,
>>>>>>>>>>>> we can exploit plenty of optimization opportunities like
>>>>>>>>>>>> those described in TAJO-161.
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Jihoon
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm very happy to see that our community is growing!
>>>>>>>>>>>>> Also, it's a pleasure to discuss the Tajo 0.8.0 release.
>>>>>>>>>>>>> Recently, I've tested various features in various
>>>>>>>>>>>>> contexts, and tried to figure out if there are any
>>>>>>>>>>>>> critical problems. I think that there are only a few
>>>>>>>>>>>>> issues and we can release 0.8.0 next week. If there are
>>>>>>>>>>>>> further issues to be solved before the 0.8.0 release,
>>>>>>>>>>>>> feel free to suggest ideas.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open
>>>>>>>>>>>>> to any suggestion from users, contributors, and
>>>>>>>>>>>>> committers. Please fire away!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm thinking that our next stage should focus on
>>>>>>>>>>>>> improving the way Tajo runs on large clusters of
>>>>>>>>>>>>> thousands of nodes and serves a number of concurrent
>>>>>>>>>>>>> users. The key issues associated with this include the
>>>>>>>>>>>>> following:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> * High availability
>>>>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>>>>> * More stability
>>>>>>>>>>>>> * Improved shuffle
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The current work status is as follows. Min is working on
>>>>>>>>>>>>> Tajo's new scheduler (TAJO-540) based on Sparrow. I'll
>>>>>>>>>>>>> support him. As far as I know, Alvin is working on
>>>>>>>>>>>>> TajoMaster HA (TAJO-704). Also, some guys including
>>>>>>>>>>>>> myself are investigating and solving the issues which
>>>>>>>>>>>>> occur in large clusters. These issues should be solved
>>>>>>>>>>>>> in order to make Tajo a complete, enterprise-ready
>>>>>>>>>>>>> production system.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In addition, there are some SQL feature support issues.
>>>>>>>>>>>>> Many analytic problems require window functions. Also,
>>>>>>>>>>>>> IN subqueries and scalar subqueries should be supported.
>>>>>>>>>>>>> So, I'd like to schedule them with high priority. In my
>>>>>>>>>>>>> view, there will be very few SQL support issues if Tajo
>>>>>>>>>>>>> provides these features.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Besides those areas, David is working on a nested schema
>>>>>>>>>>>>> and its related work (TAJO-710). I guess this will take
>>>>>>>>>>>>> quite a while because it requires a lot of hard work.
>>>>>>>>>>>>> So, it would be great to schedule the nested schema
>>>>>>>>>>>>> loosely. That's just my thoughts, anyhow.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to
>>>>>>>>>>>>> suggest that we need to release more frequently after
>>>>>>>>>>>>> the 0.8.0 release. So far, there has been a long period
>>>>>>>>>>>>> between each release because Tajo is undergoing heavy
>>>>>>>>>>>>> development. By 'releasing early, releasing often', we
>>>>>>>>>>>>> will make a tighter feedback loop between users and
>>>>>>>>>>>>> developers.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that there are many additional interesting
>>>>>>>>>>>>> issues to be included in our roadmap. Feel free to
>>>>>>>>>>>>> suggest your ideas. We will arrange our short-term
>>>>>>>>>>>>> roadmap and long-term roadmap based on your suggestions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Warm Regards,
>>>>>>>>>>>>> Hyunsik
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>>>>> http://tajo.apache.org/
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> My research interests are distributed systems, parallel computing and
>>>>>>> bytecode based virtual machine.
>>>>>>> 
>>>>>>> My profile:
>>>>>>> http://www.linkedin.com/in/coderplay
>>>>>>> My blog:
>>>>>>> http://coderplay.javaeye.com
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
> 
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com

