tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Wed, 09 Apr 2014 18:19:56 GMT
Min,

Yes, you are right. I think about it every day, but I missed it. Thank you
for reminding me. It could be achieved by modifying the Query class to execute
independent execution blocks in parallel. I'll add it to the wiki.

Thanks,
Hyunsik


On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <coderplay@gmail.com> wrote:

> Yeah. Another issue: for a query like A join B, it seems Tajo scans A in
> the first stage and then scans B in the second stage. It doesn't run them
> in parallel, right?
>
>
> Min
>
>
> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
> > I've just updated the roadmap page. Please take a look at the section
> > 'After 0.8.0'
> > https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >
> > If there are missing or additional ideas, feel free to add them on that
> > page or suggest them here. After we discuss them more, we will decide
> > their priorities.
> >
> > Best regards,
> > Hyunsik
> >
> > On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
> > > Hi Hyoungjun,
> > >
> > > Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> > > users with some prepared benchmark environment, users can test Tajo
> > > easily. I'll file your idea on the wiki. Thank you for your
> > > suggestion.
> > >
> > > Regards,
> > > Hyunsik
> > >
> > > On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <babokim@gmail.com> wrote:
> > >> Hi Hyunsik,
> > >>
> > >> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> > >> like those for Hive and Impala would be more helpful for testing.
> > >>
> > >> https://github.com/rxin/TPC-H-Hive
> > >> https://github.com/cartershanklin/hive-testbench
> > >> https://github.com/cloudera/impala-tpcds-kit
> > >>
> > >> Thanks!
> > >> Hyoungjun
> > >>
> > >>
> > >> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> > >>
> > >>> Hi Jihoon,
> > >>>
> > >>> CUBE and ROLL-UP are key features for analytic problems. I filed them
> > >>> on the wiki.
> > >>>
> > >>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> > >>> logical planning and distributed query planning. But I'm not sure
> > >>> they can be included in the short-term roadmap. They are necessary,
> > >>> but they are not required right now. In my view, it would be
> > >>> reasonable to schedule them on the long-term roadmap.
> > >>>
> > >>> Warm regards,
> > >>> Hyunsik
> > >>>
> > >>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
> > >>> > Hi Hyunsik,
> > >>> > I'm very glad that we can release the next version soon.
> > >>> > I also appreciate the guideline for the next roadmap.
> > >>> >
> > >>> > In addition to the aforementioned features, I have two suggestions.
> > >>> > The first is support for the CUBE operator (TAJO-259). Actually, I
> > >>> > started it quite a long time ago, but it has been delayed because it
> > >>> > had lower priority than the stability issues. But since this
> > >>> > operator is widely used in analytic applications, we need to add
> > >>> > this feature as soon as possible. So, in my opinion, it would be
> > >>> > good to add this feature to the next roadmap.
> > >>> >
> > >>> > The second is advanced query optimization. TAJO-266 is an issue for
> > >>> > making the query plan more flexible. After that, we can exploit the
> > >>> > many optimization opportunities like those described in TAJO-161.
> > >>> >
> > >>> > What do you guys think about these issues?
> > >>> >
> > >>> > Best Regards,
> > >>> > Jihoon
> > >>> >
> > >>> >
> > >>> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> > >>> >
> > >>> >> Hi folks,
> > >>> >>
> > >>> >> I'm very happy to see that our community is growing! Also, it's a
> > >>> >> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> > >>> >> various features in various contexts, and tried to figure out if
> > >>> >> there are any critical problems. I think that there are only a few
> > >>> >> issues and we can release 0.8.0 next week. If there are further
> > >>> >> issues to be solved before the 0.8.0 release, feel free to suggest
> > >>> >> ideas.
> > >>> >>
> > >>> >> Also, I'd like to discuss our next roadmap. We are open to any
> > >>> >> suggestion from users, contributors, and committers. Please fire
> > >>> >> away!
> > >>> >>
> > >>> >> I'm thinking that our next stage should focus on improving the way
> > >>> >> Tajo runs on large clusters of thousands of nodes and for a number
> > >>> >> of concurrent users. The key issues associated with this include
> > >>> >> the following:
> > >>> >>
> > >>> >> * High availability
> > >>> >> * Multi-tenancy scheduling
> > >>> >> * More stability
> > >>> >> * Improved shuffle
> > >>> >>
> > >>> >> The current work status is as follows. Min is working on Tajo's new
> > >>> >> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I
> > >>> >> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
> > >>> >> including myself are investigating and solving the issues which
> > >>> >> occur in large clusters. These issues should be solved in order to
> > >>> >> make Tajo a complete, enterprise-ready product.
> > >>> >>
> > >>> >> In addition, there are some SQL feature support issues. Many
> > >>> >> analytic problems require window functions. Also, IN subqueries and
> > >>> >> scalar subqueries should be supported. So, I'd like to schedule them
> > >>> >> with high priority. In my view, there will be very few SQL support
> > >>> >> issues if Tajo provides these features.
> > >>> >>
> > >>> >> Besides those areas, David is working on a nested schema and its
> > >>> >> related work (TAJO-710). I guess this will take quite a while
> > >>> >> because it requires a lot of hard work. So, it would be great to
> > >>> >> schedule the nested schema loosely. Those are just my thoughts,
> > >>> >> anyhow.
> > >>> >>
> > >>> >> Aside from the discussion of our roadmap, I'd like to suggest that
> > >>> >> we release more frequently after the 0.8.0 release. So far, there
> > >>> >> has been a long period between each release because Tajo is
> > >>> >> undergoing heavy development. By 'releasing early, releasing
> > >>> >> often', we will create a tighter feedback loop between users and
> > >>> >> developers.
> > >>> >>
> > >>> >> I think that there are many additional interesting issues to be
> > >>> >> included in our roadmap. Feel free to suggest your ideas. We will
> > >>> >> arrange our short-term roadmap and long-term roadmap based on your
> > >>> >> suggestions.
> > >>> >>
> > >>> >> Thank you all so much for your contribution!
> > >>> >>
> > >>> >> Warm Regards,
> > >>> >> Hyunsik
> > >>> >>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Tajo - Big Data Warehouse System on Hadoop
> > >> http://tajo.apache.org/
> >
>
>
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
