tajo-dev mailing list archives

From "camelia_c (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TAJO-175) MergeJoinExec incorrect processing
Date Mon, 09 Sep 2013 18:07:53 GMT
camelia_c created TAJO-175:
------------------------------

             Summary: MergeJoinExec incorrect processing
                 Key: TAJO-175
                 URL: https://issues.apache.org/jira/browse/TAJO-175
             Project: Tajo
          Issue Type: Bug
          Components: physical operator
         Environment: DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"

Hadoop 0.20.2-cdh3u3

            Reporter: camelia_c


For the query:
select dep1.dep_id, emp1.dep_id, emp1.salary from dep1 join emp1 on dep1.dep_id=emp1.dep_id;


And the data:

---------------dep1

10,Purchasing,1
20,Shipping,1
30,Manufacturing,3
40,QA,6
50,Accounting,


create external table dep1 (dep_id int, dep_name text, loc_id int) using csv with ('csvfile.delimiter'=',')
location 'file:/home/camelia/testdata/DEP1';

----------------- emp1

1000,Tom,Smith,10,333,100
1001,Mary,Thompson,10,555,
1002,Aron,Weber,,777,100
1003,Susan,Carlson,,999,

create external table emp1 (emp_id int, first_name text, last_name text, dep_id int, salary float, job_id int) using csv with ('csvfile.delimiter'=',')
location 'file:/home/camelia/testdata/EMP1';
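Several fields in this data are empty and should be read as NULL:

- the trailing loc_id of department 50;
- the trailing job_id of employees 1001 and 1003;
- the dep_id of employees 1002 and 1003.

As a point of reference only, here is a minimal Java sketch of mapping empty CSV fields to NULL. It is not Tajo's CSV scanner, and the names CsvNullFields, splitFields and toNullableInt are hypothetical. It guards against one known Java pitfall: String.split(",") without a limit drops trailing empty strings, so "50,Accounting," yields only two fields. Passing a limit of -1 keeps them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tajo: splits one CSV line and maps empty fields to null.
public class CsvNullFields {

    // Split on ',' with limit -1 so trailing empty fields are kept,
    // then turn every empty field into null.
    public static List<String> splitFields(String line) {
        String[] raw = line.split(",", -1);
        List<String> fields = new ArrayList<>(raw.length);
        for (String f : raw) {
            fields.add(f.isEmpty() ? null : f);
        }
        return fields;
    }

    // Convert a nullable text field to a nullable Integer (null stays null).
    public static Integer toNullableInt(String field) {
        return field == null ? null : Integer.valueOf(field);
    }

    public static void main(String[] args) {
        // Department row with an empty trailing loc_id.
        System.out.println(splitFields("50,Accounting,")); // prints [50, Accounting, null]
        // Employee row with an empty dep_id (fourth column).
        List<String> emp = splitFields("1002,Aron,Weber,,777,100");
        System.out.println(toNullableInt(emp.get(3))); // prints null
    }
}
```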


-------------------------------------------------

With the original MergeJoinExec (instrumented with INFO log messages at each processing step), the query returns no rows. The right child also reads wrong values: dep_id 12 instead of NULL for the row with salary 777 (emp_id 1002):

13/09/09 20:46:01 INFO physical.MergeJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)

13/09/09 20:46:01 INFO physical.MergeJoinExec: ********rightChild.next() =(0=>777.0, 1=>12)
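Under standard SQL inner-join semantics, NULL never equals anything, including another NULL. On this data the query should therefore return exactly two rows, both for dep_id 10: employees 1000 and 1001, with salaries 333 and 555.

For comparison, here is a minimal sort-merge inner join sketch in Java that produces that result. It is a hypothetical reference, not Tajo's MergeJoinExec. The names Row, mergeJoin and sortedNonNull are illustrative, and the sketch assumes both inputs fit in memory so they can be filtered and sorted up front.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an inner sort-merge join; not Tajo's MergeJoinExec.
public class MergeJoinSketch {

    // Illustrative row: a nullable join key plus one payload column.
    public static final class Row {
        final Integer key;
        final Object payload;
        public Row(Integer key, Object payload) { this.key = key; this.payload = payload; }
    }

    // Returns {leftKey, rightKey, rightPayload} triples for every matching pair.
    public static List<Object[]> mergeJoin(List<Row> left, List<Row> right) {
        List<Row> l = sortedNonNull(left);
        List<Row> r = sortedNonNull(right);
        List<Object[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = Integer.compare(l.get(i).key, r.get(j).key);
            if (cmp < 0) {
                i++;                      // left key too small: advance left
            } else if (cmp > 0) {
                j++;                      // right key too small: advance right
            } else {
                int key = l.get(i).key;
                // Find the run of equal keys on each side ...
                int iEnd = i;
                while (iEnd < l.size() && l.get(iEnd).key == key) iEnd++;
                int jEnd = j;
                while (jEnd < r.size() && r.get(jEnd).key == key) jEnd++;
                // ... and emit their cross product.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new Object[] {l.get(a).key, r.get(b).key, r.get(b).payload});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    // Rows with a NULL key can never satisfy an equality predicate, so drop them,
    // then sort by key (List.sort is stable, so input order is kept for equal keys).
    private static List<Row> sortedNonNull(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (row.key != null) kept.add(row);
        }
        kept.sort(Comparator.comparingInt(x -> x.key));
        return kept;
    }

    public static void main(String[] args) {
        List<Row> dep1 = List.of(new Row(10, "Purchasing"), new Row(20, "Shipping"),
                new Row(30, "Manufacturing"), new Row(40, "QA"), new Row(50, "Accounting"));
        // emp1 reduced to (dep_id, salary); empty dep_id fields are NULL.
        List<Row> emp1 = List.of(new Row(10, 333.0), new Row(10, 555.0),
                new Row(null, 777.0), new Row(null, 999.0));
        for (Object[] t : mergeJoin(dep1, emp1)) {
            System.out.println(t[0] + ", " + t[1] + ", " + t[2]);
        }
    }
}
```

Running main prints `10, 10, 333.0` and `10, 10, 555.0`. Dropping NULL keys before merging keeps the comparison loop simple. A streaming operator would instead skip NULL-keyed tuples as it reads them from each child.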


The Tajo output is:

tajo> select dep1.dep_id, emp1.dep_id, emp1.salary from dep1 join emp1 on dep1.dep_id=emp1.dep_id;
2013-09-09 20:45:52,947 INFO  client.TajoClient (TajoClient.java:connectionToQueryMaster(190)) - Connected to Query Master (qid=q_1378748585102_0001, addr=127.0.1.1:8091)
Progress: 0%, response time: 1.036 sec
Progress: 0%, response time: 2.04 sec
Progress: 0%, response time: 3.042 sec
Progress: 0%, response time: 4.045 sec
Progress: 0%, response time: 5.047 sec
Progress: 0%, response time: 6.049 sec
Progress: 0%, response time: 7.05 sec
Progress: 0%, response time: 8.052 sec
Progress: 100%, response time: 8.32 sec
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/camelia/tajo_git/incubator-tajo/tajo-dist/target/tajo-0.2.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.3-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2013-09-09 20:46:02,513 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-09-09 20:46:02,782 INFO  rpc.NettyClientBase (NettyClientBase.java:close(87)) - Proxy is disconnected from 127.0.1.1:8091
2013-09-09 20:46:02,784 INFO  client.TajoClient (TajoClient.java:closeQuery(113)) - Closed a QueryMaster connection (qid=q_1378748585102_0001, addr=mmm2/127.0.1.1:8091)
final state: QUERY_SUCCEEDED, init time: 1.61 sec, execution time: 0.0 sec, total response time: 8.32 sec
result: file:/home/camelia/tajo/q_1378748585102_0001

dep_id,  dep_id,  salary
-------------------------------
tajo> 



I will attach an archive with the log data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
