tajo-dev mailing list archives

From 김형준 <babo...@gmail.com>
Subject Re: [DISCUSS] 0.8.0 release and next roadmap
Date Fri, 04 Apr 2014 14:48:45 GMT
Hi Hyunsik,

I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like the
ones for Hive and Impala would be more helpful for testing.

https://github.com/rxin/TPC-H-Hive
https://github.com/cartershanklin/hive-testbench
https://github.com/cloudera/impala-tpcds-kit

Thanks!
Hyoungjun


2014-04-04 23:40 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:

> Hi Jihoon,
>
> CUBE and ROLLUP are key features for analytic problems. I filed them on the
> wiki.
>
> TAJO-266 and TAJO-161 will give more optimization opportunities to
> logical planning and distributed query planning. But I'm not sure they
> can be included in the short-term roadmap. They are necessary, but they
> are not required right now. In my view, it would be reasonable to
> schedule them on the long-term roadmap.
>
> Warm regards,
> Hyunsik
>
> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <jihoonson@apache.org> wrote:
> > Hi Hyunsik,
> > I'm very glad that we can release the next version soon.
> > Also, I appreciate the guidelines for the next roadmap.
> >
> > In addition to the aforementioned features, I have two suggestions.
> > The first is support for the CUBE operator (TAJO-259). Actually, I started it
> > quite a long time ago, but it has been delayed because it had lower priority
> > than other stability issues. But, since this operator is widely used in
> > analytic applications, we need to add this feature as soon as possible. So,
> > in my opinion, it would be good to add this feature to the next roadmap.
> >
> > The second is advanced query optimization. TAJO-266 is an issue for making
> > the query plan more flexible. After that, we can exploit the many
> > optimization opportunities described in TAJO-161.
> >
> > What do you guys think about these issues?
> >
> > Best Regards,
> > Jihoon
> >
> >
> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <hyunsik@apache.org>:
> >
> >> Hi folks,
> >>
> >> I'm very happy to see that our community is growing! Also, it's a pleasure
> >> to discuss the Tajo 0.8.0 release. Recently, I've tested various features
> >> in various contexts, and tried to figure out if there are any critical
> >> problems. I think that there are only a few issues and we can release 0.8.0
> >> next week. If there are further issues to be solved before the 0.8.0
> >> release, feel free to suggest ideas.
> >>
> >> Also, I'd like to discuss our next roadmap. We are open to any suggestion
> >> from users, contributors, and committers. Please fire away!
> >>
> >> I'm thinking that our next stage should focus on improving the way Tajo
> >> runs on large clusters of thousands of nodes and for a number of concurrent
> >> users. The key issues associated with this include the following:
> >>
> >> * High availability
> >> * Multi-tenancy scheduling
> >> * More stability
> >> * Improved shuffle
> >>
> >> The current work status is as follows. Min is working on Tajo's new
> >> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know,
> >> Alvin is working on TajoMaster HA (TAJO-704). Also, some guys including
> >> myself are investigating and solving the issues which occur in large
> >> clusters. These issues should be solved in order to make Tajo a complete
> >> enterprise-ready product.
> >>
> >> In addition, there are some SQL feature support issues. Many analytic
> >> problems require window functions. Also, IN subqueries and scalar subqueries
> >> should be supported. So, I'd like to schedule them with high priority. In
> >> my view, there will be very few SQL support issues if Tajo provides these
> >> features.
> >>
> >> Besides those areas, David is working on a nested schema and its related
> >> work (TAJO-710). I guess this will take quite a while because it requires a
> >> lot of hard work. So, it would be great to schedule the nested schema
> >> loosely. Those are just my thoughts, anyhow.
> >>
> >> Aside from the discussion of our roadmap, I'd like to suggest that we need
> >> to release more frequently after the 0.8.0 release. So far, there has been
> >> a long period between each release because Tajo is undergoing heavy
> >> development. By 'releasing early, releasing often', we will create a
> >> tighter feedback loop between users and developers.
> >>
> >> I think that there are many additional interesting issues to be
> >> included in our roadmap. Feel free to suggest your ideas. We will arrange
> >> our short-term roadmap and long-term roadmap based on your suggestions.
> >>
> >> Thank you all so much for your contribution!
> >>
> >> Warm Regards,
> >> Hyunsik
> >>
>



-- 
Tajo - Big Data Warehouse System on Hadoop
http://tajo.apache.org/
