From: Min Zhou
To: dev@tajo.apache.org
Date: Mon, 14 Apr 2014 13:05:54 -0700
Subject: Re: [DISCUSS] 0.8.0 release and next roadmap
In-Reply-To: <2C704A84-FE00-4A90-8CE4-A46629DB8F27@linkedin.com>
References: <05D8FDE4-46C7-4C6F-9446-7CB89BA440CB@aol.com> <2C704A84-FE00-4A90-8CE4-A46629DB8F27@linkedin.com>

I only realized today that my reply hadn't been sent.

+1

Totally agree with Hyunsik. 0.9 is more appropriate for the next release.

Min


On Mon, Apr 14, 2014 at 12:31 PM, David Chen wrote:

> +1
>
> I agree with Hyunsik as well. I think since 1.0 increments the major
> version number, it should be used for a particularly significant release. :)
>
> Thanks,
> David
>
>
> On Apr 13, 2014, at 7:51 PM, Alvin Henrick wrote:
>
> > +1 Hyunsik.
> >
> > Thanks!
> > Warm Regards,
> > Alvin.
> >
> > On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >
> >> Hi folks,
> >>
> >> I'd like to discuss the next version number. In Jira, we have
> >> provisionally used 1.0, and we haven't decided the next major version.
> >> I propose 0.9 as the next major version. What do you think about this?
> >>
> >> Regards,
> >> Hyunsik
> >>
> >>
> >> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son wrote:
> >>
> >>> Min, thanks for reminding us!
> >>> It's a mandatory issue.
> >>> We need to implement that feature ASAP.
> >>>
> >>> Thanks,
> >>> Jihoon
> >>>
> >>>
> >>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi:
> >>>
> >>>> Min,
> >>>>
> >>>> Yes, you are right. I think about it every day, but I missed it. Thank
> >>>> you for reminding me.
> >>>> It would be achieved by modifying the Query class to execute
> >>>> independent execution blocks in parallel. I'll add it to the wiki.
> >>>>
> >>>> Thanks,
> >>>> Hyunsik
> >>>>
> >>>>
> >>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou wrote:
> >>>>
> >>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A
> >>>>> in the first stage and only then scans B in the second stage. It
> >>>>> doesn't run them in parallel, right?
> >>>>>
> >>>>>
> >>>>> Min
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi wrote:
> >>>>>
> >>>>>> I've just updated the roadmap page. Please take a look at the section
> >>>>>> 'After 0.8.0'
> >>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >>>>>>
> >>>>>> If there are missed or additional ideas, feel free to add them on that
> >>>>>> page or suggest them here. After we discuss them more, we will decide
> >>>>>> their priorities.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Hyunsik
> >>>>>>
> >>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi wrote:
> >>>>>>> Hi Hyoungjun,
> >>>>>>>
> >>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >>>>>>> users with a prepared benchmark environment, they can test Tajo
> >>>>>>> easily. I'll file your idea on the wiki. Thank you for your
> >>>>>>> suggestion.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Hyunsik
> >>>>>>>
> >>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 wrote:
> >>>>>>>> Hi Hyunsik,
> >>>>>>>>
> >>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> >>>>>>>> like the ones for Hive and Impala would make testing easier:
> >>>>>>>>
> >>>>>>>> https://github.com/rxin/TPC-H-Hive
> >>>>>>>> https://github.com/cartershanklin/hive-testbench
> >>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> Hyoungjun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi:
> >>>>>>>>
> >>>>>>>>> Hi Jihoon,
> >>>>>>>>>
> >>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
> >>>>>>>>> them on the wiki.
> >>>>>>>>>
> >>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> >>>>>>>>> logical planning and distributed query planning. But I'm not sure
> >>>>>>>>> they can be included in the short-term roadmap. They are necessary,
> >>>>>>>>> but they are not required right now. In my view, it would be
> >>>>>>>>> reasonable to schedule them on the long-term roadmap.
> >>>>>>>>>
> >>>>>>>>> Warm regards,
> >>>>>>>>> Hyunsik
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son wrote:
> >>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>> I'm very glad that we can release the next version soon.
> >>>>>>>>>> I also appreciate the guideline for the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
> >>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually, I
> >>>>>>>>>> started it quite a long time ago, but it was delayed because it had
> >>>>>>>>>> lower priority than other stability issues. But since this operator
> >>>>>>>>>> is widely used in analytic applications, we need to add this
> >>>>>>>>>> feature as soon as possible. So, in my opinion, it would be good to
> >>>>>>>>>> add this feature to the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue for
> >>>>>>>>>> making the query plan more flexible. After that, we can exploit the
> >>>>>>>>>> many optimization opportunities like those described in TAJO-161.
> >>>>>>>>>>
> >>>>>>>>>> What do you guys think about these issues?
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jihoon
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi:
> >>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
> >>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> >>>>>>>>>>> various features in various contexts, and tried to figure out if
> >>>>>>>>>>> there are any critical problems. I think that there are only a few
> >>>>>>>>>>> issues and we can release 0.8.0 next week. If there are further
> >>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
> >>>>>>>>>>> ideas.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
> >>>>>>>>>>> suggestion from users, contributors, and committers. Please fire
> >>>>>>>>>>> away!
> >>>>>>>>>>>
> >>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
> >>>>>>>>>>> Tajo runs on large clusters with thousands of nodes and for a
> >>>>>>>>>>> number of concurrent users. The key issues associated with this
> >>>>>>>>>>> include the following:
> >>>>>>>>>>>
> >>>>>>>>>>> * High availability
> >>>>>>>>>>> * Multi-tenancy scheduling
> >>>>>>>>>>> * More stability
> >>>>>>>>>>> * Improved shuffle
> >>>>>>>>>>>
> >>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
> >>>>>>>>>>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as
> >>>>>>>>>>> I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some
> >>>>>>>>>>> guys including myself are investigating and solving the issues
> >>>>>>>>>>> which occur in large clusters. These issues should be solved in
> >>>>>>>>>>> order to make Tajo a complete, enterprise-ready production system.
> >>>>>>>>>>>
> >>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >>>>>>>>>>> analytic problems require window functions. Also, IN subqueries and
> >>>>>>>>>>> scalar subqueries should be supported. So, I'd like to schedule
> >>>>>>>>>>> them with high priority. In my view, there will be very few SQL
> >>>>>>>>>>> support issues if Tajo provides these features.
> >>>>>>>>>>>
> >>>>>>>>>>> Besides those areas, David is working on a nested schema and its
> >>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
> >>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
> >>>>>>>>>>> schedule the nested schema loosely. That's just my thoughts,
> >>>>>>>>>>> anyhow.
> >>>>>>>>>>>
> >>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that
> >>>>>>>>>>> we release more frequently after the 0.8.0 release. So far, there
> >>>>>>>>>>> has been a long period between each release because Tajo is
> >>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing
> >>>>>>>>>>> often', we will create a tighter feedback loop between users and
> >>>>>>>>>>> developers.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that there are many additional interesting issues to be
> >>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
> >>>>>>>>>>> arrange our short-term roadmap and long-term roadmap based on your
> >>>>>>>>>>> suggestions.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you all so much for your contribution!
> >>>>>>>>>>>
> >>>>>>>>>>> Warm Regards,
> >>>>>>>>>>> Hyunsik
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >>>>>>>> http://tajo.apache.org/
> >>>>>
> >>>>>
> >>>>> --
> >>>>> My research interests are distributed systems, parallel computing and
> >>>>> bytecode based virtual machine.
> >>>>>
> >>>>> My profile:
> >>>>> http://www.linkedin.com/in/coderplay
> >>>>> My blog:
> >>>>> http://coderplay.javaeye.com

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com