tajo-dev mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Sat, 19 Apr 2014 18:07:13 GMT
Great discussion everyone, sorry to have missed so much of it. I will
certainly keep an eye on the YARN support angle and would love to help.

I am hoping now that my team is growing at work I will have time to dive
back into my open source projects. I agree that YARN (and Mesos) support
will be a huge plus.



On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <hyunsik@apache.org> wrote:

> As David mentioned, the version 1.0 usually has special meanings like GA.
> When we are confident with the stability and features of Tajo, we can use
> 1.0. Thank you all guys again!
>
>
> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
> > Thank you for votes! Let's go ahead!
> >
> > Cheers,
> > Hyunsik
> >
> >
> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <sirpkt@apache.org> wrote:
> >
> >> +1
> >>
> >> I agree with Hyunsik.
> >> Sorry for late reply.
> >>
> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <coderplay@gmail.com> wrote:
> >>
> >> > I didn't realize until today that my reply hadn't been sent.
> >> >
> >> > +1
> >> >
> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> >> >
> >> > Min
> >> >
> >> >
> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <dchen@linkedin.com> wrote:
> >> >
> >> >> +1
> >> >>
> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major
> >> >> version number, it should be used for a particularly significant release. :)
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >>
> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <share.code@aol.com> wrote:
> >> >>
> >> >>> +1 Hyunsik.
> >> >>>
> >> >>> Thanks!
> >> >>> Warm Regards,
> >> >>> Alvin.
> >> >>>
> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >> >>>
> >> >>>> Hi folks,
> >> >>>>
> >> >>>> I'd like to discuss the next version number. In Jira, we have provisionally
> >> >>>> used 1.0, and we didn't decide the next major version. I propose 0.9 as the
> >> >>>> next major version. What do you think about this?
> >> >>>>
> >> >>>> Regards,
> >> >>>> Hyunsik
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <jihoonson@apache.org> wrote:
> >> >>>>
> >> >>>>> Min, thanks for reminding us!
> >> >>>>> It's a mandatory issue.
> >> >>>>> We need to implement that feature ASAP.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Jihoon
> >> >>>>>
> >> >>>>>
> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> >> >>>>>
> >> >>>>>> Min,
> >> >>>>>>
> >> >>>>>> Yes, you are right. I think about it every day, but I missed it. Thank you
> >> >>>>>> for reminding me. It would be achieved by modifying the Query class to
> >> >>>>>> execute independent execution blocks in parallel. I'll add it to the wiki.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Hyunsik
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A in
> >> >>>>>>> the first stage and then scans B in the second stage. It doesn't run
> >> >>>>>>> them in parallel, right?
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Min
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
> >> >>>>>>>
> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the section
> >> >>>>>>>> 'After 0.8.0'
> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >> >>>>>>>>
> >> >>>>>>>> If there are missing or additional ideas, feel free to add them on that
> >> >>>>>>>> page or suggest them here. After we discuss them more, we will decide
> >> >>>>>>>> their priorities.
> >> >>>>>>>>
> >> >>>>>>>> Best regards,
> >> >>>>>>>> Hyunsik
> >> >>>>>>>>
> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
> >> >>>>>>>>> Hi Hyoungjun,
> >> >>>>>>>>>
> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >> >>>>>>>>> users with some prepared benchmark environment, users can test Tajo
> >> >>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
> >> >>>>>>>>> suggestion.
> >> >>>>>>>>>
> >> >>>>>>>>> Regards,
> >> >>>>>>>>> Hyunsik
> >> >>>>>>>>>
> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
> >> >>>>>>>>>> Hi Hyunsik ,
> >> >>>>>>>>>>
> >> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> >> >>>>>>>>>> like the ones for Hive and Impala would make testing easier.
> >> >>>>>>>>>>
> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thanks!
> >> >>>>>>>>>> Hyoungjun
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Hi Jihoon,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed them
> >> >>>>>>>>>>> on the wiki.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> >> >>>>>>>>>>> logical planning and distributed query planning. But I'm not sure
> >> >>>>>>>>>>> they can be included in the short-term roadmap. They are necessary,
> >> >>>>>>>>>>> but they are not required right now. In my view, it would be
> >> >>>>>>>>>>> reasonable to schedule them on the long-term roadmap.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Warm regards,
> >> >>>>>>>>>>> Hyunsik
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
> >> >>>>>>>>>>>> Hi Hyunsik,
> >> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
> >> >>>>>>>>>>>> Also, I appreciate the guideline for the next roadmap.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
> >> >>>>>>>>>>>> First is the support of the CUBE operator (TAJO-259). Actually, I
> >> >>>>>>>>>>>> started it quite a long time ago, but it was delayed because it had
> >> >>>>>>>>>>>> lower priority than other stability issues. But since this operator
> >> >>>>>>>>>>>> is widely used in analytic applications, we need to add this feature
> >> >>>>>>>>>>>> as soon as possible. So, in my opinion, it would be good to add this
> >> >>>>>>>>>>>> feature to the next roadmap.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Second is the advanced query optimization. TAJO-266 is an issue for
> >> >>>>>>>>>>>> making the query plan more flexible. After that, we can employ the
> >> >>>>>>>>>>>> many optimization opportunities like those described in TAJO-161.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> What do you guys think about these issues?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Best Regards,
> >> >>>>>>>>>>>> Jihoon
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> Hi folks,
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
> >> >>>>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> >> >>>>>>>>>>>>> various features in various contexts, and tried to figure out if
> >> >>>>>>>>>>>>> there are any critical problems. I think that there are only a few
> >> >>>>>>>>>>>>> issues and we can release 0.8.0 next week. If there are further
> >> >>>>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
> >> >>>>>>>>>>>>> ideas.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
> >> >>>>>>>>>>>>> suggestion from users, contributors, and committers. Please fire
> >> >>>>>>>>>>>>> away!
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
> >> >>>>>>>>>>>>> Tajo runs on large clusters of thousands of nodes and with many
> >> >>>>>>>>>>>>> concurrent users. The key issues associated with this include the
> >> >>>>>>>>>>>>> following:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> * High availability
> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
> >> >>>>>>>>>>>>> * More stability
> >> >>>>>>>>>>>>> * Improved shuffle
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
> >> >>>>>>>>>>>>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I
> >> >>>>>>>>>>>>> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
> >> >>>>>>>>>>>>> including myself are investigating and solving the issues which
> >> >>>>>>>>>>>>> occur in large clusters. These issues should be solved in order to
> >> >>>>>>>>>>>>> make Tajo a complete enterprise-ready product.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >> >>>>>>>>>>>>> analytic problems require window functions. Also, IN subqueries and
> >> >>>>>>>>>>>>> scalar subqueries should be supported. So, I'd like to schedule them
> >> >>>>>>>>>>>>> with high priority. In my view, there will be very few SQL support
> >> >>>>>>>>>>>>> issues if Tajo provides these features.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and its
> >> >>>>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
> >> >>>>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
> >> >>>>>>>>>>>>> schedule the nested schema loosely. Those are just my thoughts,
> >> >>>>>>>>>>>>> anyhow.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that
> >> >>>>>>>>>>>>> we need to release more frequently after the 0.8.0 release. So far,
> >> >>>>>>>>>>>>> there has been a long period between each release because Tajo is
> >> >>>>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing often',
> >> >>>>>>>>>>>>> we will make a tighter feedback loop between users and developers.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I think that there are many additional interesting issues to be
> >> >>>>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
> >> >>>>>>>>>>>>> arrange our short-term roadmap and long-term roadmap based on your
> >> >>>>>>>>>>>>> suggestions.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Warm Regards,
> >> >>>>>>>>>>>>> Hyunsik
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> --
> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >> >>>>>>>>>> http://tajo.apache.org/
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> My research interests are distributed systems, parallel computing and
> >> >>>>>>> bytecode based virtual machine.
> >> >>>>>>>
> >> >>>>>>> My profile:
> >> >>>>>>> http://www.linkedin.com/in/coderplay
> >> >>>>>>> My blog:
> >> >>>>>>> http://coderplay.javaeye.com
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > My research interests are distributed systems, parallel computing and
> >> > bytecode based virtual machine.
> >> >
> >> > My profile:
> >> > http://www.linkedin.com/in/coderplay
> >> > My blog:
> >> > http://coderplay.javaeye.com
> >>
> >>
> >
>
