tajo-dev mailing list archives

From camelia c <camelie_1...@yahoo.com>
Subject Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Date Mon, 09 Sep 2013 13:04:20 GMT
Hello,

I am sending You an archive with the 3 problems I have encountered so far with
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java

Could You please help me solve them?

For each problem there is a separate folder in the archive, containing the query, the problem,
the TAJO output, the logical plan of MasterLOG and the worker's log.

To summarize:
Problem 1) partial output and

java.lang.NullPointerException
    at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
    at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
    at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
    at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)

This happens even though the physical operator's next method returns correct and complete results.

Problem 2) incorrect values in tuples received from child nodes

Problem 3) values unexpectedly stop being received, and
ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space

The dataset is also concatenated in a separate data file in the archive.


Thank You very much!
Camelia




________________________________
 From: Hyunsik Choi <hyunsik@apache.org>
To: tajo-dev <dev@tajo.incubator.apache.org>; camelia c <camelie_1985@yahoo.com>

Sent: Monday, September 9, 2013 3:52 AM
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
 

Hi Camelia,

Could you let me know the following? That would make it easier to investigate the problem.

* your submitted SQL query
* which physical operator (NLJoin or MergeJoin?)
* (if possible) data sample that reproduces the problem

Best regards,
Hyunsik


On Mon, Sep 9, 2013 at 7:30 AM, camelia c <camelie_1985@yahoo.com> wrote:
> A small addition to the previous message:
>
> The call
>
>    innerTuple = rightChild.next();
>
> that returns the value is made inside the join operator.
>
>
> Camelia
>
>
> ----- Forwarded Message -----
> From: camelia c <camelie_1985@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <dev@tajo.incubator.apache.org>
> Sent: Monday, September 9, 2013 1:25 AM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
>
>
> Hello,
>
> Thank You very much for Your helpful answer yesterday!
>
> While testing, I encountered the following issue: the null values which are read from
> files are sometimes randomly replaced by numbers such as 24, 29 or 30. This is a serious
> problem for the algorithms! Can You please tell me why You think this happens and how
> it can be corrected?
>
>
> Let me give You an example:
>
> create external table emp1 (emp_id int, first_name text, last_name text, dep_id int,
>   salary float, job_id int) using csv with ('csvfile.delimiter'=',')
>   location 'file:/home/camelia/testdata/EMP1';
>
>
>
> I specify null values in the file like this:
>
> 1000,Tom,Smith,10,333,100
> 1001,Mary,Thompson,10,555,
> 1002,Aron,Weber,,777,100
> 1003,Susan,Carlson,,999,
>
> Both the internal nulls and the trailing nulls (those at the end of a line) are sometimes
> randomly substituted with a small number. For example, (last_name, salary, emp_id, dep_id)
> was read from the file with
>
> innerTuple = rightChild.next();
>
> obtaining these values from innerTuple.toString():
>
>
> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>
>
> Sometimes, in other queries the null value is correctly read as NULL.
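
As a point of comparison for the data file above, here is a minimal Java sketch of reading such lines so that both internal and trailing empty fields become null. It is not Tajo's CSV scanner and does not diagnose the behaviour reported here; the class CsvNullSketch and the method parseLine are hypothetical names used only for illustration. The one detail it relies on is standard Java: String.split(",") drops trailing empty strings, while split(",", -1) keeps them.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Tajo.
class CsvNullSketch {
    // Splits one comma-separated line and maps every empty field to null.
    // The limit of -1 keeps trailing empty fields, which split(",") would drop.
    static String[] parseLine(String line, int expectedColumns) {
        String[] raw = line.split(",", -1);
        String[] fields = new String[expectedColumns];
        for (int c = 0; c < expectedColumns; c++) {
            // Columns missing from a short line are treated as empty.
            String value = c < raw.length ? raw[c] : "";
            fields[c] = value.isEmpty() ? null : value;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "1000,Tom,Smith,10,333,100",
            "1001,Mary,Thompson,10,555,",
            "1002,Aron,Weber,,777,100",
            "1003,Susan,Carlson,,999,"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parseLine(line, 6)));
        }
    }
}
```

With this approach dep_id for employee 1002 and job_id for employee 1001 both come back as null rather than as a missing or shifted column.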
>
>
>
> Thank You in advance!
>
> Yours sincerely,
> Camelia
>
>
>
>
> ________________________________
>  From: Hyunsik Choi <hyunsik@apache.org>
> To: tajo-dev <dev@tajo.incubator.apache.org>; camelia c <camelie_1985@yahoo.com>
> Sent: Saturday, September 7, 2013 6:00 PM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
>
> Hi camelia,
>
> I'm sorry for the late response. I've just come back home from a family
> meeting. I have left in-line comments on your questions.
>
> Best regards,
> Hyunsik
>
>
> On Sep 7, 2013, at 8:42 PM, camelia c <camelie_1985@yahoo.com> wrote:
>
>> Hello,
>>
>> I am resending You an updated list of my questions. For some of the older ones,
>> I have already found the answer.
>>
>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and outerTupleSlots,
>> and can You please give me an example of how they are filled, based on a dummy data set?
>
> Merge join advances through each relation to find the tuples with the
> same join key. Each tuple slot keeps a list of tuples whose join keys are
> the same. Consider the example below, where two relations are to be joined
> and the first column of each relation is the join key.
>
> -----------------------------------
> Two relations to be joined
> -----------------------------------
> Left               Right
> (1, A)             (1, B)
> (1, C)             (1, C)
> (3, D)             (1, D)
>                    (2, E)
>
>
> MergeJoin first finds all the tuples with the same key in each relation, so
> the tuple slots contain the following:
>
> outerTupleSlots : (1, A), (1, C)
> innerTupleSlots : (1, B), (1, C), (1, D)
>
> Then, MergeJoin produces the joined tuples. In the above example,
> MergeJoin results in 6 tuples (2 x 3).
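
The slot-filling step described above can be sketched in Java roughly as follows. This is a simplified illustration, not the code of MergeJoinExec: the Row class and the mergeJoin method are hypothetical, both inputs are assumed to be already sorted on the join key, and only an inner join is shown. The names outerTupleSlots and innerTupleSlots are borrowed from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a tuple: (join key, value).
final class Row {
    final int key;
    final String value;
    Row(int key, String value) { this.key = key; this.value = value; }
    @Override public String toString() { return "(" + key + ", " + value + ")"; }
}

class MergeJoinSketch {
    // Inner-joins two inputs that are assumed to be sorted by key.
    static List<String> mergeJoin(List<Row> outer, List<Row> inner) {
        List<String> result = new ArrayList<>();
        int o = 0, i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) { o++; continue; }  // advance the smaller side
            if (outerKey > innerKey) { i++; continue; }
            // Keys match: collect every tuple that shares this key on each side.
            List<Row> outerTupleSlots = new ArrayList<>();
            List<Row> innerTupleSlots = new ArrayList<>();
            while (o < outer.size() && outer.get(o).key == outerKey) outerTupleSlots.add(outer.get(o++));
            while (i < inner.size() && inner.get(i).key == outerKey) innerTupleSlots.add(inner.get(i++));
            // Emit the cross product of the two slots.
            for (Row left : outerTupleSlots) {
                for (Row right : innerTupleSlots) {
                    result.add(left + " x " + right);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        mergeJoin(left, right).forEach(System.out::println);
    }
}
```

On the example relations the slots hold 2 and 3 tuples for key 1, so the sketch emits 2 x 3 = 6 joined tuples; keys 2 and 3 have no partner on the other side and produce nothing in an inner join.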
>
>>
>> 2) I understood from a talk that the MergeJoinExec has some issues and that Mr Jihoon
>> is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for
>> FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?
>
> MergeJoinExec does not have any problem. It is correct. There was a
> misunderstanding.
>
>>
>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the block
>> name containing it?
>> Even for a single-block query, how do we find for a JoinNode that it belongs to @ROOT,
>> for example?
>>
>> More precisely, in class OuterJoinRewriteRule, in method
>>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>        Stack<LogicalNode> stack, Integer depth)
>>
>> I tried to do
>>     plan.getBlock(joinNode).getName()
>> but I receive a NullPointerException.
>>
>
> The current API cannot do what you want. The API needs to be improved to
> support that. Probably, that can be achieved by modifying
> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
> method with some object that includes the current block name. I'll create a
> jira issue for this improvement.
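
The idea suggested above, having the traversal hand the current block name to each visit method, could look roughly like the Java sketch below. Every type in it (Node, BlockRootNode, JoinLikeNode, ScanLikeNode, BlockAwareVisitor) is a hypothetical stand-in invented for this illustration; none of them is a Tajo class, and the real BasicLogicalNodeVisitor may be structured quite differently. The block name "@SUB" is likewise made up; only "@ROOT" comes from the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// All types below are hypothetical stand-ins, not Tajo classes.
abstract class Node {
    final Node[] children;
    Node(Node... children) { this.children = children; }
}

final class ScanLikeNode extends Node {
    ScanLikeNode() { super(); }
}

// Marks the top of a query block and carries that block's name.
final class BlockRootNode extends Node {
    final String blockName;
    BlockRootNode(String blockName, Node child) { super(child); this.blockName = blockName; }
}

final class JoinLikeNode extends Node {
    String seenInBlock;  // recorded by the visitor, for demonstration only
    JoinLikeNode(Node left, Node right) { super(left, right); }
}

class BlockAwareVisitor {
    // Stack of enclosing block names; the top is the current block.
    private final Deque<String> blockNames = new ArrayDeque<>();

    void visitChild(Node node) {
        boolean entersBlock = node instanceof BlockRootNode;
        if (entersBlock) {
            blockNames.push(((BlockRootNode) node).blockName);
        }
        if (node instanceof JoinLikeNode) {
            // The visit method receives the current block name as context.
            visitJoin((JoinLikeNode) node, blockNames.peek());
        }
        for (Node child : node.children) {
            visitChild(child);
        }
        if (entersBlock) {
            blockNames.pop();
        }
    }

    void visitJoin(JoinLikeNode join, String currentBlock) {
        join.seenInBlock = currentBlock;
    }

    public static void main(String[] args) {
        JoinLikeNode innerJoin = new JoinLikeNode(new ScanLikeNode(), new ScanLikeNode());
        JoinLikeNode outerJoin = new JoinLikeNode(new ScanLikeNode(), new BlockRootNode("@SUB", innerJoin));
        Node root = new BlockRootNode("@ROOT", outerJoin);
        new BlockAwareVisitor().visitChild(root);
        System.out.println(outerJoin.seenInBlock + " " + innerJoin.seenInBlock);
    }
}
```

Keeping the block names on a stack inside the visitor means a rewrite rule never has to look a node up in the plan afterwards, which is the lookup that failed in the question.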
>
>
>>
>>
>> I look forward to receiving Your answer!
>>
>> Yours sincerely,
>> Camelia