tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Date Mon, 09 Sep 2013 00:52:15 GMT
Hi Camelia,

Could you let me know the following? That would make it easier to investigate the problem.

* the SQL query you submitted
* which physical operator was used (NLJoin or MergeJoin?)
* (if possible) a data sample that reproduces the problem

Best regards,
Hyunsik


On Mon, Sep 9, 2013 at 7:30 AM, camelia c <camelie_1985@yahoo.com> wrote:
> A small addition to the previous message:
>
> The value obtained with
>
>    innerTuple = rightChild.next();
>
>
> is in the join operator.
>
>
> Camelia
>
>
> ----- Forwarded Message -----
> From: camelia c <camelie_1985@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <dev@tajo.incubator.apache.org>
> Sent: Monday, September 9, 2013 1:25 AM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
>
>
> Hello,
>
> Thank you very much for your helpful answer yesterday!
>
> While testing, I encountered the following issue: the null values read from
> files are sometimes randomly replaced by numbers such as 24, 29 or 30. This is a serious
> problem for the algorithms! Can you please tell me why you think this happens and how
> it can be corrected?
>
>
> Let me give you an example:
>
> create external table emp1 (emp_id int, first_name text, last_name text, dep_id int,
>   salary float, job_id int) using csv with ('csvfile.delimiter'=',')
>   location 'file:/home/camelia/testdata/EMP1';
>
>
>
> I specify null values in the file like this:
>
> 1000,Tom,Smith,10,333,100
> 1001,Mary,Thompson,10,555,
> 1002,Aron,Weber,,777,100
> 1003,Susan,Carlson,,999,
>
> Both the internal nulls and the trailing nulls (those at the end of a line) are sometimes
> randomly replaced with a small number. For example, (last_name, salary, emp_id, dep_id)
> was read from the file with
>
> innerTuple = rightChild.next();
>
> and innerTuple.toString() returned:
>
>
> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>
>
> In other queries, the null value is sometimes read correctly as NULL.
>
>
>
> Thank you in advance!
>
> Yours sincerely,
> Camelia
>
>
>
>
> ________________________________
>  From: Hyunsik Choi <hyunsik@apache.org>
> To: tajo-dev <dev@tajo.incubator.apache.org>; camelia c <camelie_1985@yahoo.com>
> Sent: Saturday, September 7, 2013 6:00 PM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
>
> Hi camelia,
>
> I'm sorry for the late response. I've just come back home from a family
> gathering. I've left in-line comments on your questions.
>
> Best regards,
> Hyunsik
>
>
> On Sep 7, 2013, at 8:42 PM, camelia c <camelie_1985@yahoo.com> wrote:
>
>> Hello,
>>
>> I am resending an updated list of my questions. For some of the older ones,
>> I have already found the answer.
>>
>> 1) In MergeJoinExec, what is the purpose of innerTupleSlots and outerTupleSlots,
>> and can you please give me an example of how they are filled, based on a dummy data set?
>
> Merge join advances through each relation in order to find the tuples
> that share the same join key. Each slot list keeps the tuples of one relation whose
> join keys are the same. Consider the example below, where there are two relations
> to be joined and the first column of each relation is the join key.
>
> --------------------------
> Two relations to be joined
> --------------------------
> Left        Right
> (1, A)      (1, B)
> (1, C)      (1, C)
> (3, D)      (1, D)
>             (2, E)
>
>
> MergeJoin first collects all the tuples with the same key from each relation, so
> the tuple slots contain the following:
>
> outerTupleSlots : (1, A), (1, C)
> innerTupleSlots : (1, B), (1, C), (1, D)
>
> MergeJoin then produces the joined tuples by pairing every outer slot tuple with
> every inner slot tuple. In the above example, this results in 6 tuples (2 x 3).
>
>>
>> 2) I understood from a talk that MergeJoinExec has some issues and that Mr Jihoon
>> is trying to fix them. Can I rely on the current version of MergeJoinExec and extend it for
>> FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?
>
> MergeJoinExec does not have any problems. It is correct. There was a
> misunderstanding.
>
>>
>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the name
>> of the block containing it?
>> Even for a single-block query, how do we find out that a JoinNode belongs to @ROOT,
>> for example?
>>
>> More precisely, in class OuterJoinRewriteRule, in method
>>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>        Stack<LogicalNode> stack, Integer depth)
>>
>> I tried to do
>>     plan.getBlock(joinNode).getName()
>> but I get a NullPointerException.
>>
>
> The current API cannot do what you want. The API needs to be improved to
> support that. Probably, that can be achieved by modifying
> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
> methods with some object that includes the current block name. I'll create a
> jira issue for this improvement.
>
>
>>
>>
>> I look forward to receiving your answer!
>>
>> Yours sincerely,
>> Camelia
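
The tuple-slot behaviour described in the answer to question 1 can be sketched roughly as follows. This is only an illustration on the example data from that answer, not Tajo's MergeJoinExec: the class MergeJoinSketch, the Tuple type and the mergeJoin method are hypothetical names, and the sketch assumes both inputs are already sorted on an integer join key.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
    /** Hypothetical two-column tuple: the join key plus one payload value. */
    static final class Tuple {
        final int key;
        final String value;
        Tuple(int key, String value) { this.key = key; this.value = value; }
    }

    /** Inner equi-join of two inputs that are assumed to be sorted by key. */
    static List<String> mergeJoin(List<Tuple> left, List<Tuple> right) {
        List<String> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < left.size() && r < right.size()) {
            int lk = left.get(l).key;
            int rk = right.get(r).key;
            if (lk < rk) {
                l++;                       // left side is behind: advance it
            } else if (lk > rk) {
                r++;                       // right side is behind: advance it
            } else {
                // Keys match: fill both slot lists with every tuple carrying this key.
                List<Tuple> outerTupleSlots = new ArrayList<>();
                List<Tuple> innerTupleSlots = new ArrayList<>();
                while (l < left.size() && left.get(l).key == lk) {
                    outerTupleSlots.add(left.get(l++));
                }
                while (r < right.size() && right.get(r).key == lk) {
                    innerTupleSlots.add(right.get(r++));
                }
                // Emit every outer/inner pairing for this key.
                for (Tuple o : outerTupleSlots) {
                    for (Tuple i : innerTupleSlots) {
                        out.add("(" + lk + ", " + o.value + ", " + i.value + ")");
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tuple> left = List.of(new Tuple(1, "A"), new Tuple(1, "C"), new Tuple(3, "D"));
        List<Tuple> right = List.of(new Tuple(1, "B"), new Tuple(1, "C"),
                                    new Tuple(1, "D"), new Tuple(2, "E"));
        mergeJoin(left, right).forEach(System.out::println);
    }
}
```

For key 1 the outer slots hold two tuples and the inner slots hold three, so the sketch emits the 2 x 3 = 6 joined tuples mentioned in the answer; keys 2 and 3 have no partner and produce nothing in an inner join. An outer-join variant would additionally have to emit the unmatched tuples padded with nulls.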
