tajo-dev mailing list archives

From Jinho Kim <jh...@apache.org>
Subject Re: I have question with tajo query
Date Tue, 12 Nov 2013 01:58:16 GMT
Hi Jae,

Could you try the following? I suspect that some column values
may contain Hive's default null marker, '\\N'.

CREATE EXTERNAL TABLE tablename (A int, B text, C text)
USING csv WITH ('csvfile.delimiter'='\001', 'csvfile.null'='\\N')
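To make the two options concrete, here is a minimal Python sketch of what they describe. It is only an illustration under my own assumptions, not Tajo's actual reader: the helper name split_row and the sample line are hypothetical. Fields are separated by the \001 byte, and a field equal to the two characters backslash + N is read as NULL.

```python
# Hypothetical helper, not part of Tajo: split one delimited text line
# and map the configured null marker to None.
FIELD_DELIMITER = "\x01"  # the same byte as '\001' in the table options
NULL_MARKER = "\\N"       # two characters: a backslash followed by N


def split_row(line, delimiter=FIELD_DELIMITER, null_marker=NULL_MARKER):
    """Return the fields of one line, with null markers replaced by None."""
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == null_marker else field for field in fields]


# Made-up sample line: three fields, the middle one is the null marker.
print(split_row("2008\x01\\N\x01SFO\n"))  # ['2008', None, 'SFO']
```

Without a null marker configured, the middle field would reach the integer parser as the raw text \N instead of NULL.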


Besides, in Hive, values that cannot be parsed are treated as NULL, but
in the current Tajo they cause errors.
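The difference can be sketched in Python as follows. This is only an illustration of the two behaviours, not code from Hive or Tajo, and both function names are hypothetical. The sample values include the string "Year" that appears in the NumberFormatException quoted below.

```python
# Hypothetical illustration of strict vs. lenient integer parsing.
def parse_int_strict(text):
    """Raise ValueError on unparsable input (like the error reported for Tajo)."""
    return int(text)


def parse_int_lenient(text):
    """Return None for unparsable input (like Hive treating it as NULL)."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


values = ["2008", "", "\\N", "Year"]
print([parse_int_lenient(v) for v in values])  # [2008, None, None, None]

try:
    parse_int_strict("Year")
except ValueError as err:
    print("strict parser failed:", err)
```

With the lenient behaviour a blank field, a null marker, or a stray non-numeric string simply becomes NULL, whereas the strict behaviour stops the task at the first such value.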


Thanks
-Jinho


2013/11/12 Jae Lee <otooiland@gmail.com>

> Hi Jihoon,
>
> Thank you for your answer.
> I have a further question about Q1.
> I have already waited a long time for the query result, and I don't think
> that is normal.
> The COUNT(*) query returned a result in only 300 sec, but the DISTINCT,
> GROUP BY, and SUM queries have been executing for a whole day.
> I found another error message; please see it below.
> The error is about the integer-type column named "Year".
> The query was "select distinct year from departuredelay;"
> I executed the same query on Hive, and it had no error.
> However, the Year column has some null or blank data.
> The table was created as an EXTERNAL table over several CSV files on HDFS.
> ---------------------------------------------------------------------------
>
> 2013-11-11 18:34:01,436 ERROR worker.Task (Task.java:run(363)) -
> java.lang.NumberFormatException: For input string: "Year"
>  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>  at java.lang.Integer.parseInt(Integer.java:492)
>  at java.lang.Integer.valueOf(Integer.java:582)
>  at org.apache.tajo.datum.DatumFactory.createInt4(DatumFactory.java:140)
>  at org.apache.tajo.storage.LazyTuple.createByTextBytes(LazyTuple.java:313)
>  at org.apache.tajo.storage.LazyTuple.get(LazyTuple.java:126)
>  at org.apache.tajo.engine.eval.FieldEval.eval(FieldEval.java:58)
>  at org.apache.tajo.engine.planner.Projector.eval(Projector.java:87)
>  at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:111)
>  at org.apache.tajo.engine.planner.physical.HashAggregateExec.compute(HashAggregateExec.java:57)
>  at org.apache.tajo.engine.planner.physical.HashAggregateExec.next(HashAggregateExec.java:83)
>  at org.apache.tajo.engine.planner.physical.PartitionedStoreExec.next(PartitionedStoreExec.java:121)
>  at org.apache.tajo.worker.Task.run(Task.java:355)
>  at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:376)
>  at java.lang.Thread.run(Thread.java:744)
>
>
> -----------------------------------------------------------------------------
>
>
>
> I have also attached my tajo-site.xml.
> Please check whether my config is correct.
>
> hostname   | hadoop              | tajo   | DBMS
> namenode   | namenode            | master | Maria (Metastore)
> snamenode  | snamenode+datanode1 | worker |
> datanode02 | datanode2           | worker |
> datanode03 | datanode3           | worker |
>
>
> -----------------------------------------------------------------------------
> <configuration>
> <property>
>     <name>tajo.rootdir</name>
>     <value>hdfs://namenode:9000/tajo</value>
> </property>
> <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>namenode:26001</value>
> </property>
> <property>
>     <name>tajo.master.client-rpc.address</name>
>     <value>namenode:26002</value>
> </property>
> <property>
>     <name>tajo.master.info-http.address</name>
>     <value>namenode:26080</value>
> </property>
> <property>
>     <name>tajo.catalog.client-rpc.address</name>
>     <value>namenode:26005</value>
> </property>
> </configuration>
>
>
> Regards,
> Jae
>
>
> 2013/11/11 Jihoon Son <ghoonson@gmail.com>
>
> > Hi Jae Lee,
> > Thanks for your interest in Tajo.
> >
> > Here are my answers.
> >
> > 1. The timeout message looks like an error, but it does not mean that the
> > query has failed. (We should change the message.)
> > Would you please wait for some time after executing a query?
> > If any other errors occur, please report them to us.
> >
> > 2. Tajo's SQL commands are designed to follow those of traditional
> > relational databases.
> > In those systems, the 'DROP table' command deletes data from disks.
> > However, we are also considering the Hive-style 'DROP table', because
> > tables are generally very large.
> >
> > 3. Tajo currently does not provide any commands to kill executing queries.
> > Instead, you should kill the master and every worker using the unix 'kill'
> > command.
> >
> > If you have any other questions,
> > please feel free to ask us.
> >
> > Thanks,
> > Jihoon
> >
> >
> > 2013/11/11 Jae Lee <otooiland@gmail.com>
> >
> > > Hello,
> > >
> > > :: I get the error message below, and the query hangs.
> > > It's from a clustered Tajo worker.
> > > CentOS 6.2 + hadoop 2.0.5 + tajo 0.2.0
> > > A plain count(*) query works, but a query using distinct or group by
> > > hangs and produces these error messages.
> > >
> > > :: I have more questions
> > > Tajo deletes files on HDFS when I drop an EXTERNAL table. Is that normal?
> > > Hive does not delete files when an external table is dropped.
> > >
> > > :: How can I kill Tajo jobs (queries)?
> > >
> > > ---------------------------------------------------------------------
> > > 2013-11-11 18:44:22,751 WARN  worker.TaskRunner (TaskRunner.java:run(339)) - Timeout GetTask:eb_1384155011466_0005_000001,container_1384155011466_0005_01_000013, but retry
> > > java.util.concurrent.TimeoutException
> > >  at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:81)
> > >  at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:328)
> > >  at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > Regards,
> > > Jae
> > >
> >
> >
> >
> > --
> > Jihoon Son
> >
> > Database & Information Systems Group,
> > Prof. Yon Dohn Chung Lab.
> > Dept. of Computer Science & Engineering,
> > Korea University
> > 1, 5-ga, Anam-dong, Seongbuk-gu,
> > Seoul, 136-713, Republic of Korea
> >
> > Tel : +82-2-3290-3580
> > E-mail : jihoonson@korea.ac.kr
> >
>
