tajo-dev mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Date Mon, 09 Sep 2013 16:39:27 GMT
Thank you for the more detailed information. Are these problems caused by
your work-in-progress code?

If so, how can I access your latest source? Is it on your GitHub?

Actually, the recommended way to share a problem is as follows:

* create a Jira issue
* submit your patch or your GitHub revision URL
* describe your problem (your attached file already covers this)

Best regards,
Hyunsik Choi

On Mon, Sep 9, 2013 at 10:04 PM, camelia c <camelie_1985@yahoo.com> wrote:
> Hello,
>
> I am sending you an archive with the 3 problems I have encountered so far in
> tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
>
> Could you please help me solve them?
>
> For each problem there is a separate folder in the archive, containing the
> query, the problem, the TAJO output, the logical plan of MasterLOG and the
> worker's log.
>
> To summarize:
> Problem 1) partial output and
>
> java.lang.NullPointerException
>     at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
>     at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
>     at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
>     at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
>
> even though the physical operator's next() method returns correct and complete
> results.
>
> Problem 2) incorrect values in tuples received from child nodes
>
> Problem 3) values unexpectedly stop arriving, and
> ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
>
> The complete dataset is also included in the archive as a single concatenated data file.
>
>
> Thank you very much!
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi <hyunsik@apache.org>
> To: tajo-dev <dev@tajo.incubator.apache.org>; camelia c
> <camelie_1985@yahoo.com>
> Sent: Monday, September 9, 2013 3:52 AM
>
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi Camelia,
>
> Could you send me the following? That would make it easier to investigate
> the problem.
>
> * your submitted SQL query
> * which physical operator (NLJoin or MergeJoin?)
> * (if possible) data sample that reproduces the problem
>
> Best regards,
> Hyunsik
>
>
> On Mon, Sep 9, 2013 at 7:30 AM, camelia c <camelie_1985@yahoo.com> wrote:
>> A small addition to the previous message:
>>
>> The value obtained with
>>
>>    innerTuple = rightChild.next();
>>
>>
>> is read inside the join operator.
>>
>>
>> Camelia
>>
>>
>> ----- Forwarded Message -----
>> From: camelia c <camelie_1985@yahoo.com>
>> To: "dev@tajo.incubator.apache.org" <dev@tajo.incubator.apache.org>
>> Sent: Monday, September 9, 2013 1:25 AM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>>
>> Hello,
>>
>> Thank you very much for your helpful answer yesterday!
>>
>> While testing, I encountered the following issue: the null values that
>> are read from files are sometimes randomly replaced by numbers such as 24,
>> 29 or 30. This is a serious problem for the algorithms! Could you please
>> tell me why you think this happens and how it can be corrected?
>>
>>
>> Let me give you an example:
>>
>> create external table emp1 (emp_id int, first_name text, last_name text,
>> dep_id int, salary float, job_id int) using csv with
>> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>>
>>
>>
>> I specify null values in the file like this:
>>
>> 1000,Tom,Smith,10,333,100
>> 1001,Mary,Thompson,10,555,
>> 1002,Aron,Weber,,777,100
>> 1003,Susan,Carlson,,999,
>>
>> Both the internal nulls and the trailing nulls (those at the end of a line)
>> are sometimes randomly substituted with a small number. For example,
>> (last_name, salary, emp_id, dep_id) was read from the file with
>>
>> innerTuple = rightChild.next();
>>
>> and innerTuple.toString() returned:
>>
>>
>> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>>
>>
>> Sometimes, in other queries the null value is correctly read as NULL.
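The field-to-NULL mapping that the example data above expects can be illustrated with a small Java sketch. It is not Tajo's CSV scanner, the thread does not establish what actually causes the stray values, and the class and method names (`CsvNullSketch`, `parseFields`) are hypothetical. It shows two things: empty fields, whether internal or trailing, mapping to NULL (Java `null` here), and a general Java pitfall worth ruling out in any hand-written parser: `String.split(regex)` silently drops trailing empty strings unless a negative limit is passed.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Tajo's CSV scanner.
public class CsvNullSketch {

    /** Splits a line into exactly expectedColumns fields; empty or missing fields become null. */
    public static String[] parseFields(String line, String delimiter, int expectedColumns) {
        // A negative limit keeps trailing empty strings; split(regex) alone would drop them.
        String[] raw = line.split(Pattern.quote(delimiter), -1);
        String[] fields = new String[expectedColumns];
        for (int c = 0; c < expectedColumns; c++) {
            // Empty or missing field -> NULL
            fields[c] = (c < raw.length && !raw[c].isEmpty()) ? raw[c] : null;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "1000,Tom,Smith,10,333,100",
            "1001,Mary,Thompson,10,555,",
            "1002,Aron,Weber,,777,100",
            "1003,Susan,Carlson,,999,"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parseFields(line, ",", 6)));
        }
        // For contrast: the default split drops the trailing empty field.
        System.out.println("1001,Mary,Thompson,10,555,".split(",").length); // 5
    }
}
```

For the row `1001,Mary,Thompson,10,555,` this prints `[1001, Mary, Thompson, 10, 555, null]`; type conversion (e.g. to `int` for `job_id`) would happen afterwards and should leave a `null` field as NULL rather than substituting a number.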
>>
>>
>>
>> Thank you in advance!
>>
>> Yours sincerely,
>> Camelia
>>
>>
>>
>>
>> ________________________________
>>  From: Hyunsik Choi <hyunsik@apache.org>
>> To: tajo-dev <dev@tajo.incubator.apache.org>; camelia c
>> <camelie_1985@yahoo.com>
>> Sent: Saturday, September 7, 2013 6:00 PM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hi camelia,
>>
>> I'm sorry for the late response. I've just come back home from a family
>> meeting. I've left inline comments on your questions.
>>
>> Best regards,
>> Hyunsik
>>
>>
>> On Sep 7, 2013, at 8:42 PM, camelia c <camelie_1985@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I am resending you an updated list of my questions. I have already
>>> found the answers to some of the older ones.
>>>
>>> 1) In MergeJoinExec, what is the purpose of innerTupleSlots and
>>> outerTupleSlots? Could you please give me an example of how they are
>>> filled, based on a dummy data set?
>>
>> Merge join advances through each relation (both sorted on the join key)
>> to find tuples with the same join key. Each tuple slot keeps a list of
>> the tuples whose join keys are the same.
>> Consider the example below, where two relations are to be joined
>> and the first column of each relation is the join key.
>>
>> -----------------------------------
>> Two relations to be joined
>> -----------------------------------
>> Left          Right
>> (1, A)        (1, B)
>> (1, C)        (1, C)
>> (3, D)        (1, D)
>>               (2, E)
>>
>>
>> MergeJoin first collects all tuples with the same key from each relation,
>> so the tuple slots contain the following:
>>
>> outerTupleSlots : (1, A), (1, C)
>> innerTupleSlots : (1, B), (1, C), (1, D)
>>
>> Then, MergeJoin produces the joined tuples by pairing every tuple in
>> outerTupleSlots with every tuple in innerTupleSlots. In the above example,
>> MergeJoin results in 6 tuples (2 x 3).
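The slot-filling step described above can be sketched roughly as follows. This is a minimal, self-contained Java illustration of a sort-merge inner equi-join, not Tajo's MergeJoinExec: it assumes both inputs are already sorted on the join key (column 0), models tuples with a hypothetical `Row` class, and uses the names `outerTupleSlots`/`innerTupleSlots` only to mirror the explanation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a sort-merge inner equi-join illustrating tuple slots.
// Assumes both inputs are already sorted by the join key; Row is a hypothetical tuple type.
public class MergeJoinSketch {

    /** Hypothetical tuple: an int join key (column 0) and a text payload. */
    public static final class Row {
        final int key;
        final String value;

        public Row(int key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + value + ")";
        }
    }

    /** Collects the run of rows starting at 'from' that share rows.get(from).key. */
    static List<Row> collectSlot(List<Row> rows, int from) {
        List<Row> slot = new ArrayList<>();
        int key = rows.get(from).key;
        for (int i = from; i < rows.size() && rows.get(i).key == key; i++) {
            slot.add(rows.get(i));
        }
        return slot;
    }

    /** Inner merge join; each result is rendered as "outer x inner" for readability. */
    public static List<String> mergeJoin(List<Row> outer, List<Row> inner) {
        List<String> result = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.size() && i < inner.size()) {
            int cmp = Integer.compare(outer.get(o).key, inner.get(i).key);
            if (cmp < 0) {
                o++; // outer key is smaller: advance the outer side
            } else if (cmp > 0) {
                i++; // inner key is smaller: advance the inner side
            } else {
                // Same key on both sides: fill each slot with all tuples sharing it.
                List<Row> outerTupleSlots = collectSlot(outer, o);
                List<Row> innerTupleSlots = collectSlot(inner, i);
                // Emit the cross product of the two slots.
                for (Row l : outerTupleSlots) {
                    for (Row r : innerTupleSlots) {
                        result.add(l + " x " + r);
                    }
                }
                // Skip past both runs and continue merging.
                o += outerTupleSlots.size();
                i += innerTupleSlots.size();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = List.of(new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        List<String> joined = mergeJoin(left, right);
        joined.forEach(System.out::println);
        System.out.println(joined.size() + " joined tuples"); // 6 joined tuples
    }
}
```

On the example relations this prints the six pairs for key 1; keys 2 and 3 have no partner and produce nothing. That is exactly the point where a right outer variant would instead emit the unmatched right tuple (2, E) padded with NULLs, and a full outer variant would also emit (3, D).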
>>
>>>
>>> 2) I understood from a talk that MergeJoinExec has some issues and
>>> that Mr. Jihoon is trying to fix them. Can I rely on the current version of
>>> MergeJoinExec to extend it into FullOuter_MergeJoinExec and
>>> RightOuter_MergeJoinExec?
>>
>> MergeJoinExec does not have any problem; it is correct. There was a
>> misunderstanding.
>>
>>>
>>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>>> the name of the block containing it?
>>> Even for a single-block query, how do we determine that a JoinNode
>>> belongs to @ROOT, for example?
>>>
>>> More precisely, in class OuterJoinRewriteRule, in method
>>>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>> Stack<LogicalNode> stack, Integer depth)
>>>
>>> I tried to do
>>>    plan.getBlock(joinNode).getName()
>>> but I get a NullPointerException.
>>>
>>
>> The current API cannot do what you want. The API needs to be improved to
>> support that. Probably, that can be achieved by modifying
>> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
>> methods with some object that includes the current block name. I'll create a
>> Jira issue for this improvement.
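The proposed improvement can be sketched roughly as below. Everything here is hypothetical: the `Node` and `Context` classes, the type tags, and the `visitChild`/`visitJoin` signatures are illustrative stand-ins, not Tajo's `BasicLogicalNodeVisitor` API. The sketch only shows the idea of threading a context object that records the current block name through the traversal, so that a join visit can tell whether it sits in `@ROOT` or in a nested block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types or methods are Tajo's real API.
public class BlockAwareVisitorSketch {

    /** Minimal plan node: a type tag, an optional block name, and children. */
    public static class Node {
        final String type;      // e.g. "ROOT_BLOCK", "SUBQUERY_BLOCK", "JOIN", "SCAN"
        final String blockName; // non-null only for nodes that start a new query block
        final List<Node> children;

        public Node(String type, String blockName, Node... children) {
            this.type = type;
            this.blockName = blockName;
            this.children = Arrays.asList(children);
        }
    }

    /** Context object carried through the traversal, including the current block name. */
    public static class Context {
        String currentBlock;
        public final List<String> joinBlocks = new ArrayList<>();
    }

    /** Plays the role of visitChild: updates the context, then dispatches by type. */
    public static void visitChild(Context ctx, Node node) {
        String saved = ctx.currentBlock;
        if (node.blockName != null) {
            ctx.currentBlock = node.blockName; // entering a new block
        }
        if (node.type.equals("JOIN")) {
            visitJoin(ctx, node);
        }
        for (Node child : node.children) {
            visitChild(ctx, child);
        }
        ctx.currentBlock = saved; // leaving this node restores the enclosing block
    }

    /** A join visitor that can now see which block it belongs to. */
    static void visitJoin(Context ctx, Node join) {
        ctx.joinBlocks.add(ctx.currentBlock);
    }

    public static void main(String[] args) {
        Node plan = new Node("ROOT_BLOCK", "@ROOT",
            new Node("JOIN", null,
                new Node("SCAN", null),
                new Node("SUBQUERY_BLOCK", "@sub1",
                    new Node("JOIN", null, new Node("SCAN", null), new Node("SCAN", null)))));
        Context ctx = new Context();
        visitChild(ctx, plan);
        System.out.println(ctx.joinBlocks); // [@ROOT, @sub1]
    }
}
```

Saving and restoring `currentBlock` around each node keeps sibling subtrees from inheriting a nested block's name.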
>>
>>
>>>
>>>
>>> I look forward to receiving your answer!
>>>
>>> Yours sincerely,
>>> Camelia
>
