tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Thu, 24 Apr 2014 06:20:51 GMT
Hi Eli,

Thank you for your comment. I also really hope that you will have time
to contribute to open source projects. Since you are very skilled in
YARN, your contribution would be a great help to us :).

Thanks,
Hyunsik

On Sun, Apr 20, 2014 at 3:07 AM, Eli Reisman <apache.mailbox@gmail.com> wrote:
> Great discussion everyone, sorry to have missed so much of it. I will
> certainly keep an eye on the YARN support angle and would love to help.
>
> I am hoping now that my team is growing at work I will have time to dive
> back into my open source projects. I agree that YARN (and Mesos) support
> will be a huge plus.
>
>
>
> On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
>> As David mentioned, version 1.0 usually has a special meaning, such as GA.
>> When we are confident in the stability and features of Tajo, we can use
>> 1.0. Thank you all again!
>>
>>
>> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>
>> > Thank you for the votes! Let's go ahead!
>> >
>> > Cheers,
>> > Hyunsik
>> >
>> >
>> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <sirpkt@apache.org> wrote:
>> >
>> >> +1
>> >>
>> >> I agree with Hyunsik.
>> >> Sorry for the late reply.
>> >>
>> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <coderplay@gmail.com> wrote:
>> >>
>> >> > I only realized today that my reply hadn't been sent.
>> >> >
>> >> > +1
>> >> >
>> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next
>> >> > release.
>> >> >
>> >> > Min
>> >> >
>> >> >
>> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <dchen@linkedin.com> wrote:
>> >> >
>> >> >> +1
>> >> >>
>> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major
>> >> >> version number, it should be used for a particularly significant
>> >> >> release. :)
>> >> >>
>> >> >> Thanks,
>> >> >> David
>> >> >>
>> >> >>
>> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <share.code@aol.com> wrote:
>> >> >>
>> >> >>> +1 Hyunsik.
>> >> >>>
>> >> >>> Thanks!
>> >> >>> Warm Regards,
>> >> >>> Alvin.
>> >> >>>
>> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>> >> >>>
>> >> >>>> Hi folks,
>> >> >>>>
>> >> >>>> I'd like to discuss the next version number. In Jira, we have
>> >> >>>> provisionally used 1.0, and we haven't decided on the next major
>> >> >>>> version. I propose 0.9 as the next major version. What do you
>> >> >>>> think about this?
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Hyunsik
>> >> >>>>
>> >> >>>>
>> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
>> >> >>>>
>> >> >>>>> Min, thanks for reminding us!
>> >> >>>>> It's a mandatory issue.
>> >> >>>>> We need to implement that feature ASAP.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Jihoon
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >> >>>>>
>> >> >>>>>> Min,
>> >> >>>>>>
>> >> >>>>>> Yes, you are right. I think about it every day, but I missed it.
>> >> >>>>>> Thank you for reminding me. It would be achieved by modifying the
>> >> >>>>>> Query class to execute independent execution blocks in parallel.
>> >> >>>>>> I'll add it to the wiki.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Hyunsik
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
>> >> >>>>>>
>> >> >>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo
>> >> >>>>>>> scans A in the first stage and then scans B in the second
>> >> >>>>>>> stage. It doesn't run them in parallel, right?
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> Min
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the
>> >> >>>>>>>> section 'After 0.8.0':
>> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>> >> >>>>>>>>
>> >> >>>>>>>> If there are missing or additional ideas, feel free to add them
>> >> >>>>>>>> on that page or suggest them here. After we discuss them further,
>> >> >>>>>>>> we will decide their priorities.
>> >> >>>>>>>>
>> >> >>>>>>>> Best regards,
>> >> >>>>>>>> Hyunsik
>> >> >>>>>>>>
>> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>> >> >>>>>>>>> Hi Hyoungjun,
>> >> >>>>>>>>>
>> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we
>> >> >>>>>>>>> provide users with a prepared benchmark environment, they can
>> >> >>>>>>>>> test Tajo easily. I'll file your idea on the wiki. Thank you
>> >> >>>>>>>>> for your suggestion.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Regards,
>> >> >>>>>>>>> Hyunsik
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
>> >> >>>>>>>>>> Hi Hyunsik ,
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
>> >> >>>>>>>>>> scripts like the ones for Hive and Impala would make testing
>> >> >>>>>>>>>> easier.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Thanks!
>> >> >>>>>>>>>> Hyoungjun
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Hi Jihoon,
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I
>> >> >>>>>>>>>>> filed them on the wiki.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization
>> >> >>>>>>>>>>> opportunities to logical planning and distributed query
>> >> >>>>>>>>>>> planning. But I'm not sure they can be included in the
>> >> >>>>>>>>>>> short-term roadmap. They are necessary, but they are not
>> >> >>>>>>>>>>> required right now. In my view, it would be reasonable to
>> >> >>>>>>>>>>> schedule them on the long-term roadmap.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Warm regards,
>> >> >>>>>>>>>>> Hyunsik
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
>> >> >>>>>>>>>>>> Hi Hyunsik,
>> >> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
>> >> >>>>>>>>>>>> I also appreciate the guideline for the next roadmap.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> In addition to the aforementioned features, I have two
>> >> >>>>>>>>>>>> suggestions. The first is support for the CUBE operator
>> >> >>>>>>>>>>>> (TAJO-259). Actually, I started it quite a long time ago,
>> >> >>>>>>>>>>>> but it has been delayed because it had lower priority than
>> >> >>>>>>>>>>>> other stability issues. But since this operator is widely
>> >> >>>>>>>>>>>> used in analytic applications, we need to add this feature
>> >> >>>>>>>>>>>> as soon as possible. So, in my opinion, it would be good to
>> >> >>>>>>>>>>>> add this feature to the next roadmap.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an
>> >> >>>>>>>>>>>> issue for making the query plan more flexible. After that,
>> >> >>>>>>>>>>>> we can exploit the many optimization opportunities described
>> >> >>>>>>>>>>>> in TAJO-161.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> What do you guys think about these issues?
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Best Regards,
>> >> >>>>>>>>>>>> Jihoon
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Hi folks,
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also,
>> >> >>>>>>>>>>>>> it's a pleasure to discuss the Tajo 0.8.0 release. Recently,
>> >> >>>>>>>>>>>>> I've tested various features in various contexts, and tried
>> >> >>>>>>>>>>>>> to figure out if there are any critical problems. I think
>> >> >>>>>>>>>>>>> that there are only a few issues and we can release 0.8.0
>> >> >>>>>>>>>>>>> next week. If there are further issues to be solved before
>> >> >>>>>>>>>>>>> the 0.8.0 release, feel free to suggest ideas.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to
>> >> >>>>>>>>>>>>> any suggestion from users, contributors, and committers.
>> >> >>>>>>>>>>>>> Please fire away!
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving
>> >> >>>>>>>>>>>>> the way Tajo runs on large clusters of thousands of nodes
>> >> >>>>>>>>>>>>> and for a number of concurrent users. The key issues
>> >> >>>>>>>>>>>>> associated with this include the following:
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> * High availability
>> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
>> >> >>>>>>>>>>>>> * More stability
>> >> >>>>>>>>>>>>> * Improved shuffle
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on
>> >> >>>>>>>>>>>>> Tajo's new scheduler (TAJO-540) based on Sparrow. I'll
>> >> >>>>>>>>>>>>> support him. As far as I know, Alvin is working on
>> >> >>>>>>>>>>>>> TajoMaster HA (TAJO-704). Also, some guys including myself
>> >> >>>>>>>>>>>>> are investigating and solving the issues which occur in
>> >> >>>>>>>>>>>>> large clusters. These issues should be solved in order to
>> >> >>>>>>>>>>>>> make Tajo a complete enterprise-ready product.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues.
>> >> >>>>>>>>>>>>> Many analytic problems require window functions. Also, IN
>> >> >>>>>>>>>>>>> subqueries and scalar subqueries should be supported. So,
>> >> >>>>>>>>>>>>> I'd like to schedule them with high priority. In my view,
>> >> >>>>>>>>>>>>> there will be very few SQL support issues if Tajo provides
>> >> >>>>>>>>>>>>> these features.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema
>> >> >>>>>>>>>>>>> and its related work (TAJO-710). I guess this will take
>> >> >>>>>>>>>>>>> quite a while because it requires a lot of hard work. So,
>> >> >>>>>>>>>>>>> it would be great to schedule the nested schema loosely.
>> >> >>>>>>>>>>>>> Those are just my thoughts, anyhow.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to
>> >> >>>>>>>>>>>>> suggest that we release more frequently after the 0.8.0
>> >> >>>>>>>>>>>>> release. So far, there has been a long period between
>> >> >>>>>>>>>>>>> releases because Tajo is undergoing heavy development. By
>> >> >>>>>>>>>>>>> 'releasing early, releasing often', we will create a
>> >> >>>>>>>>>>>>> tighter feedback loop between users and developers.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I think that there are many additional interesting issues
>> >> >>>>>>>>>>>>> to be included in our roadmap. Feel free to suggest your
>> >> >>>>>>>>>>>>> ideas. We will arrange our short-term roadmap and long-term
>> >> >>>>>>>>>>>>> roadmap based on your suggestions.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Warm Regards,
>> >> >>>>>>>>>>>>> Hyunsik
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> --
>> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>> >> >>>>>>>>>> http://tajo.apache.org/
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> My research interests are distributed systems, parallel computing and
>> >> >>>>>>> bytecode based virtual machine.
>> >> >>>>>>>
>> >> >>>>>>> My profile:
>> >> >>>>>>> http://www.linkedin.com/in/coderplay
>> >> >>>>>>> My blog:
>> >> >>>>>>> http://coderplay.javaeye.com
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > My research interests are distributed systems, parallel computing and
>> >> > bytecode based virtual machine.
>> >> >
>> >> > My profile:
>> >> > http://www.linkedin.com/in/coderplay
>> >> > My blog:
>> >> > http://coderplay.javaeye.com
>> >>
>> >>
>> >
>>
