tajo-dev mailing list archives

From Jinho Kim <jh...@apache.org>
Subject Re: Some Tajo-0.9.0 questions
Date Fri, 16 Jan 2015 06:50:30 GMT
Hello Azuryy Yu,

I left some comments inline.

Best regards,
Jinho

2015-01-16 14:37 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:

> Hi,
>
> I tested Tajo half a year ago, but then stopped focusing on it because of
> other work.
>
> Then this week I set up a small dev Tajo cluster (six VM nodes) based on
> Hadoop 2.6.0.
>
> So my questions are:
>
> 1) As far as I knew half a year ago, Tajo ran on YARN and used the YARN
> scheduler to manage job resources. But now it seems not to rely on YARN,
> because I only started the HDFS daemons and no YARN daemons. So does Tajo
> have its own job scheduler?
>
>
Yes. Tajo now uses its own task scheduler, so you can start Tajo without
the YARN daemons.
Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html


>
> 2) Do we need to replicate the files to every node in the Tajo
> cluster?
>

No, Tajo does not need extra replication. If you set a higher replication
factor, data locality can improve.
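To put rough numbers on the locality trade-off, here is a minimal sketch that assumes the `replication` copies of each block sit on distinct nodes chosen uniformly at random. That is a simplification: the real HDFS placement policy is rack-aware. Under that assumption, the chance that a given node holds a local copy of a given block is simply replication / nodes. The function names are hypothetical helpers for illustration only.

```python
from fractions import Fraction


def local_copy_probability(replication: int, nodes: int) -> Fraction:
    """Chance that one particular node holds a replica of one block,
    ASSUMING the replicas sit on `replication` distinct nodes picked
    uniformly at random (a simplification of HDFS's placement policy)."""
    if not 1 <= replication <= nodes:
        raise ValueError("replication must be between 1 and the node count")
    return Fraction(replication, nodes)


def expected_local_blocks(blocks: int, replication: int, nodes: int) -> Fraction:
    # Linearity of expectation: every block is local with the same probability.
    return blocks * local_copy_probability(replication, nodes)


# Six-node cluster, as in the question.
for r in (1, 3, 6):
    p = local_copy_probability(r, 6)
    print(f"replication={r}: local copy on a given node with p={p} "
          f"({float(p):.0%}), storage cost {r}x")
```

With replication 6 every node holds every block, but every block is also stored six times. That is why a higher replication factor improves locality without being required for correctness (the HDFS default is 3).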

> For example, I have a six-node Tajo cluster; should I set the HDFS block
> replication to six? Because:
>
> I noticed that when I run a Tajo query, some nodes are busy but others are
> idle, because the file's blocks are located only on those nodes, not on the
> others.
>
>
In my opinion, you need to run the HDFS balancer:
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
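The Hadoop documentation describes the balancer's goal as follows: a cluster is balanced when every DataNode's utilization (used space over capacity) is within `threshold` percentage points of the whole cluster's utilization. The default threshold is 10, and it can be changed with `hdfs balancer -threshold <percent>`. The sketch below only reproduces that check, so you can see which nodes the balancer would treat as out of range. The `DataNode` class and helper names are hypothetical, not Hadoop APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical record of one DataNode's disk usage."""
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity that is in use.
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: list[DataNode]) -> float:
    # Cluster-wide utilization: total used space over total capacity.
    used = sum(n.used_bytes for n in nodes)
    capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * used / capacity


def out_of_balance(nodes: list[DataNode], threshold: float = 10.0) -> list[str]:
    """Names of nodes whose utilization differs from the cluster-wide value
    by more than `threshold` percentage points."""
    average = cluster_utilization(nodes)
    return [n.name for n in nodes if abs(n.utilization - average) > threshold]


# Six equally sized nodes where all the data sits on three of them.
cluster = [DataNode(f"dn{i}", used, 1_000)
           for i, used in enumerate([600, 600, 600, 0, 0, 0], start=1)]
print(out_of_balance(cluster))
```

Note that this criterion looks at each node's total disk usage, not at how the blocks of one particular table are spread. A single skewed file on an otherwise balanced cluster may therefore stay where it is.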


> 3) The test data set is 4 million rows, roughly several GB, but it's very
> slow when I run: select count(distinct ID) from ****;
> Are there any possible problems here?
>

Could you share your tajo-env.sh and tajo-site.xml?
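As general background on why `count(distinct ID)` tends to be heavier than a plain `count(*)`: an exact distinct count cannot be computed by summing per-node distinct counts, because the same ID may appear on several nodes. A common approach in distributed engines is two phases. First, shuffle rows so that each ID goes to exactly one partition by hash. Then deduplicate within each partition and sum. The sketch below illustrates that generic technique. It is NOT a description of Tajo's actual query plan, and the function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from zlib import crc32


def partition_of(key: str, partitions: int) -> int:
    # crc32 is stable across processes (unlike Python's salted hash()),
    # so every worker would route the same ID to the same partition.
    return crc32(key.encode("utf-8")) % partitions


def count_distinct(ids: Iterable[str], partitions: int = 4) -> int:
    # Phase 1 ("shuffle"): send each ID to exactly one partition by its hash.
    buckets: list[set[str]] = [set() for _ in range(partitions)]
    for key in ids:
        buckets[partition_of(key, partitions)].add(key)
    # Phase 2: each partition deduplicates locally. A given ID can only land
    # in one partition, so the per-partition counts add up without overlap.
    return sum(len(bucket) for bucket in buckets)


rows = ["u1", "u2", "u1", "u3", "u2", "u4"]
print(count_distinct(rows))  # 4
```

Because every row's ID must cross the network in phase 1 and each partition must hold its set of IDs in memory, worker memory and the number of partitions are the kind of settings worth checking in the configuration files.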


>
>
> Thanks
>
