tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Tue, 15 Apr 2014 06:42:04 GMT
As David mentioned, version 1.0 usually carries a special meaning, such as GA.
Once we are confident in the stability and features of Tajo, we can use
1.0. Thanks again, everyone!


On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <hyunsik@apache.org> wrote:

> Thank you for votes! Let's go ahead!
>
> Cheers,
> Hyunsik
>
>
> On Tue, Apr 15, 2014 at 9:03 AM, ktpark <sirpkt@apache.org> wrote:
>
>> +1
>>
>> I agree with Hyunsik.
>> Sorry for late reply.
>>
>> On Apr 15, 2014, at 5:05 AM, Min Zhou <coderplay@gmail.com> wrote:
>>
>> > I didn't realize until today that my reply hadn't been sent.
>> >
>> > +1
>> >
>> > Totally agree with Hyunsik. 0.9 is more appropriate for the next
>> > release.
>> >
>> > Min
>> >
>> >
>> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <dchen@linkedin.com> wrote:
>> >
>> >> +1
>> >>
>> >> I agree with Hyunsik as well. I think since 1.0 increments the major
>> >> version number, it should be used for a particularly significant
>> >> release. :)
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >>
>> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <share.code@aol.com> wrote:
>> >>
>> >>> +1 Hyunsik.
>> >>>
>> >>> Thanks!
>> >>> Warm Regards,
>> >>> Alvin.
>> >>>
>> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>> >>>
>> >>>> Hi folks,
>> >>>>
>> >>>> I'd like to discuss the next version number. In Jira, we have
>> >>>> provisionally used 1.0, but we haven't decided on the next major
>> >>>> version. I propose 0.9 as the next major version. What do you think
>> >>>> about this?
>> >>>>
>> >>>> Regards,
>> >>>> Hyunsik
>> >>>>
>> >>>>
>> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
>> >>>>
>> >>>>> Min, thanks for reminding us!
>> >>>>> It's a mandatory issue.
>> >>>>> We need to implement that feature ASAP.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Jihoon
>> >>>>>
>> >>>>>
>> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >>>>>
>> >>>>>> Min,
>> >>>>>>
>> >>>>>> Yes, you are right. I think about it every day, but I missed it.
>> >>>>>> Thank you for reminding me. It can be achieved by modifying the Query
>> >>>>>> class to execute independent execution blocks in parallel. I'll add
>> >>>>>> it to the wiki.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Hyunsik
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A
>> >>>>>>> in the first stage and then scans B in the second stage. It doesn't
>> >>>>>>> run them in parallel, right?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Min
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>> >>>>>>>
>> >>>>>>>> I've just updated the roadmap page. Please take a look at the section
>> >>>>>>>> 'After 0.8.0'
>> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>> >>>>>>>>
>> >>>>>>>> If there are missing or additional ideas, feel free to add them on
>> >>>>>>>> that page or suggest them here. After we discuss them more, we will
>> >>>>>>>> decide their priorities.
>> >>>>>>>>
>> >>>>>>>> Best regards,
>> >>>>>>>> Hyunsik
>> >>>>>>>>
>> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>> >>>>>>>>> Hi Hyoungjun,
>> >>>>>>>>>
>> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>> >>>>>>>>> users with some prepared benchmark environment, users can test Tajo
>> >>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>> >>>>>>>>> suggestion.
>> >>>>>>>>>
>> >>>>>>>>> Regards,
>> >>>>>>>>> Hyunsik
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
>> >>>>>>>>>> Hi Hyunsik,
>> >>>>>>>>>>
>> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
>> >>>>>>>>>> like the ones for Hive and Impala would make testing easier.
>> >>>>>>>>>>
>> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks!
>> >>>>>>>>>> Hyoungjun
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi Jihoon,
>> >>>>>>>>>>>
>> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
>> >>>>>>>>>>> them on the wiki.
>> >>>>>>>>>>>
>> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>> >>>>>>>>>>> logical planning and distributed query planning. But I'm not sure
>> >>>>>>>>>>> they can be included in the short-term roadmap. They are necessary,
>> >>>>>>>>>>> but they are not required right now. In my view, it would be
>> >>>>>>>>>>> reasonable to schedule them on the long-term roadmap.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Warm regards,
>> >>>>>>>>>>> Hyunsik
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
>> >>>>>>>>>>>> Hi Hyunsik,
>> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
>> >>>>>>>>>>>> I also appreciate the guideline for the next roadmap.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>> >>>>>>>>>>>> First is support for the CUBE operator (TAJO-259). Actually, I
>> >>>>>>>>>>>> started it quite a long time ago, but it has been delayed because
>> >>>>>>>>>>>> it had lower priority than other stability issues. But since this
>> >>>>>>>>>>>> operator is widely used in analytic applications, we need to add
>> >>>>>>>>>>>> this feature as soon as possible. So, in my opinion, it would be
>> >>>>>>>>>>>> good to add this feature to the next roadmap.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Second is advanced query optimization. TAJO-266 is an issue for
>> >>>>>>>>>>>> making the query plan more flexible. After that, we can exploit
>> >>>>>>>>>>>> the many optimization opportunities like those described in
>> >>>>>>>>>>>> TAJO-161.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> What do you guys think about these issues?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best Regards,
>> >>>>>>>>>>>> Jihoon
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi folks,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
>> >>>>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
>> >>>>>>>>>>>>> various features in various contexts, and tried to figure out if
>> >>>>>>>>>>>>> there are any critical problems. I think that there are only a few
>> >>>>>>>>>>>>> issues and we can release 0.8.0 next week. If there are further
>> >>>>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
>> >>>>>>>>>>>>> ideas.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>> >>>>>>>>>>>>> suggestions from users, contributors, and committers. Please fire
>> >>>>>>>>>>>>> away!
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
>> >>>>>>>>>>>>> Tajo runs on large clusters of thousands of nodes and serves many
>> >>>>>>>>>>>>> concurrent users. The key issues associated with this include the
>> >>>>>>>>>>>>> following:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> * High availability
>> >>>>>>>>>>>>> * Multi-tenancy scheduling
>> >>>>>>>>>>>>> * More stability
>> >>>>>>>>>>>>> * Improved shuffle
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
>> >>>>>>>>>>>>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as
>> >>>>>>>>>>>>> I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some
>> >>>>>>>>>>>>> guys including myself are investigating and solving the issues
>> >>>>>>>>>>>>> which occur in large clusters. These issues should be solved in
>> >>>>>>>>>>>>> order to make Tajo a complete, enterprise-ready product.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
>> >>>>>>>>>>>>> analytic problems require window functions. Also, IN subqueries and
>> >>>>>>>>>>>>> scalar subqueries should be supported. So, I'd like to schedule
>> >>>>>>>>>>>>> them with high priority. In my view, there will be very few SQL
>> >>>>>>>>>>>>> support issues if Tajo provides these features.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and its
>> >>>>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
>> >>>>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
>> >>>>>>>>>>>>> schedule the nested schema loosely. Those are just my thoughts,
>> >>>>>>>>>>>>> anyhow.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that
>> >>>>>>>>>>>>> we release more frequently after the 0.8.0 release. So far, there
>> >>>>>>>>>>>>> has been a long period between each release because Tajo is
>> >>>>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing
>> >>>>>>>>>>>>> often', we will create a tighter feedback loop between users and
>> >>>>>>>>>>>>> developers.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I think that there are many additional interesting issues to be
>> >>>>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
>> >>>>>>>>>>>>> arrange our short-term roadmap and long-term roadmap based on your
>> >>>>>>>>>>>>> suggestions.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thank you all so much for your contributions!
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Warm Regards,
>> >>>>>>>>>>>>> Hyunsik
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>> >>>>>>>>>> http://tajo.apache.org/
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> My research interests are distributed systems, parallel computing and
>> >>>>>>> bytecode based virtual machine.
>> >>>>>>>
>> >>>>>>> My profile:
>> >>>>>>> http://www.linkedin.com/in/coderplay
>> >>>>>>> My blog:
>> >>>>>>> http://coderplay.javaeye.com
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > My research interests are distributed systems, parallel computing and
>> > bytecode based virtual machine.
>> >
>> > My profile:
>> > http://www.linkedin.com/in/coderplay
>> > My blog:
>> > http://coderplay.javaeye.com
>>
>>
>
