Date: Mon, 9 Sep 2013 09:52:15 +0900
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
From: Hyunsik Choi
To: tajo-dev, camelia c

Hi Camelia,

Could you send me the following? It would make the problem easier to investigate.

* your submitted SQL query
* which physical operator was used (NLJoin or MergeJoin?)
* (if possible) a data sample that reproduces the problem

Best regards,
Hyunsik

On Mon, Sep 9, 2013 at 7:30 AM, camelia c wrote:
> A small addition to the previous message:
>
> The value obtained with
>
> innerTuple = rightChild.next();
>
> is in the join operator.
>
> Camelia
>
>
> ----- Forwarded Message -----
> From: camelia c
> To: "dev@tajo.incubator.apache.org"
> Sent: Monday, September 9, 2013 1:25 AM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hello,
>
> Thank You very much for Your helpful answer of yesterday!
>
> While testing, I encountered the following issue: null values read from files are sometimes replaced, seemingly at random, by numbers such as 24, 29, or 30. This is a serious problem for the algorithms! Can You please tell me why You think this happens and how it can be corrected?
>
> Let me give You an example:
>
> create external table emp1 (emp_id int, first_name text, last_name text, dep_id int, salary float, job_id int) using csv with ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>
> I specify null values in the file like this:
>
> 1000,Tom,Smith,10,333,100
> 1001,Mary,Thompson,10,555,
> 1002,Aron,Weber,,777,100
> 1003,Susan,Carlson,,999,
>
> Both the internal nulls and the trailing nulls (those at the end of a line) are sometimes randomly substituted with a small number. For example, (last_name, salary, emp_id, dep_id) was read from the file with
>
> innerTuple = rightChild.next();
>
> and innerTuple.toString() returned:
>
> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>
> Sometimes, in other queries, the null value is correctly read as NULL.
>
> Thank You in advance!
>
> Yours sincerely,
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi
> To: tajo-dev ; camelia c
> Sent: Saturday, September 7, 2013 6:00 PM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi camelia,
>
> I'm sorry for the late response. I've just come back home from a family
> meeting. I have left in-line comments on your questions.
>
> Best regards,
> Hyunsik
>
>
> On Sep 7, 2013, at 8:42 PM, camelia c wrote:
>
>> Hello,
>>
>> I am resending You an updated list of the questions that I have.
>> For some of the older ones, I have already found the answer.
>>
>> 1) In MergeJoinExec, what is the purpose of innerTupleSlots and outerTupleSlots, and can You please give me an example of how they are filled, based on a dummy data set?
>
> Merge join advances through each relation in order to find tuples
> with the same join key. Each slot list keeps the tuples of one relation
> whose join keys are the same.
> Consider the example below, where two relations are to be joined
> and the first column of each relation is the join key.
>
> -----------------------------------
> Two relations to be joined
> -----------------------------------
> Left      Right
> (1, A)    (1, B)
> (1, C)    (1, C)
> (3, D)    (1, D)
>           (2, E)
>
> MergeJoin first finds all the tuples with the same key in each relation. So,
> the tuple slots contain the following:
>
> outerTupleSlots : (1, A), (1, C)
> innerTupleSlots : (1, B), (1, C), (1, D)
>
> Then, MergeJoin produces the joined tuples. In the above example,
> MergeJoin results in 6 tuples (2 x 3).
>
>>
>> 2) I understood from a talk that MergeJoinExec has some issues and that Mr Jihoon is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?
>
> MergeJoinExec does not have any problem. It is correct. There was a
> misunderstanding.
>
>>
>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the name of the block containing it?
>> Even for a single-block query, how do we find out that a JoinNode belongs to @ROOT, for example?
>>
>> More precisely, in class OuterJoinRewriteRule, in method
>> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, Stack stack, Integer depth)
>>
>> I tried to do
>> plan.getBlock(joinNode).getName()
>> but I receive a NullPointerException.
>>
>
> The current API cannot do what you want. The API needs to be improved to
> support that.
> Probably, that can be achieved by modifying
> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
> method with some object that includes the current block name. I'll create a
> JIRA issue for this improvement.
>
>
>>
>>
>> I look forward to receiving Your answer!
>>
>> Yours sincerely,
>> Camelia
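
For reference, the tuple-slot behaviour described in the quoted reply (collect the run of equal-key tuples on each side, then emit their cross product) can be sketched in plain Java. This is only an illustrative sketch, not Tajo's MergeJoinExec: the class MergeJoinSketch, the Row type, and the nextSlot and example helpers are hypothetical names introduced here, and both inputs are assumed to be already sorted by the join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeJoinSketch {
    /** Hypothetical two-column tuple: an integer join key and a payload. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /** Collects the run of consecutive rows sharing the key of rows.get(start). */
    static List<Row> nextSlot(List<Row> rows, int start) {
        List<Row> slot = new ArrayList<>();
        int key = rows.get(start).key;
        for (int i = start; i < rows.size() && rows.get(i).key == key; i++) {
            slot.add(rows.get(i));
        }
        return slot;
    }

    /** Inner merge join; assumes both lists are sorted by key. */
    static List<String> join(List<Row> outer, List<Row> inner) {
        List<String> result = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) {
                o++;                       // no partner for this outer tuple
            } else if (outerKey > innerKey) {
                i++;                       // no partner for this inner tuple
            } else {
                // Fill both slot lists with all tuples of the current key.
                List<Row> outerTupleSlots = nextSlot(outer, o);
                List<Row> innerTupleSlots = nextSlot(inner, i);
                // Emit the cross product of the two slot lists.
                for (Row l : outerTupleSlots) {
                    for (Row r : innerTupleSlots) {
                        result.add("(" + l.key + ", " + l.value + ", " + r.value + ")");
                    }
                }
                o += outerTupleSlots.size();
                i += innerTupleSlots.size();
            }
        }
        return result;
    }

    /** Runs the join on the two relations from the quoted example. */
    static List<String> example() {
        List<Row> left = Arrays.asList(
            new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
            new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        return join(left, right);
    }

    public static void main(String[] args) {
        // The slots for key 1 hold 2 and 3 tuples, so 2 x 3 = 6 rows are printed.
        for (String row : example()) {
            System.out.println(row);
        }
    }
}
```

Keeping the slots as separate lists is what a full or right outer variant would build on: a key that appears on only one side is the case where the sketch currently just advances an index, and an outer join would instead emit that side's tuples padded with NULLs.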