tajo-dev mailing list archives

From Jinho Kim <jh...@apache.org>
Subject Re: Some Tajo-0.9.0 questions
Date Fri, 16 Jan 2015 06:50:30 GMT
Hello Azuryy Yu,

I left some comments.

Best regards

2015-01-16 14:37 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:

> Hi,
> I tested Tajo half a year ago, then stopped focusing on it because of
> other work.
> This week I set up a small dev Tajo cluster (six nodes, VMs) based on
> Hadoop-2.6.0.
> So my questions are:
> 1) From what I knew half a year ago, Tajo worked on YARN, using the YARN
> scheduler to manage job resources. But now I found it doesn't rely on YARN,
> because I only started the HDFS daemons, no YARN daemons. So does Tajo have
> its own job scheduler?
Yes, Tajo now uses its own task scheduler, and you can start Tajo without
the YARN daemons.
Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html

> 2) Do we need to replicate the files to every node in the Tajo
> cluster?

No, Tajo does not need more replication. If you set a higher replication
factor, data locality can increase.
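
A minimal Python sketch of the locality point, for illustration only. It
assumes the replicas of a block land on distinct nodes chosen uniformly at
random, which simplifies HDFS's real placement policy. The function name is
hypothetical.

```python
from fractions import Fraction


def local_replica_chance(num_nodes: int, replication: int) -> Fraction:
    """Chance that one given node holds a replica of one given block.

    Assumption: replicas are placed on distinct nodes picked uniformly at
    random, so the chance is simply replication / num_nodes.
    """
    if not 1 <= replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    return Fraction(replication, num_nodes)


if __name__ == "__main__":
    # Six-node cluster, as in the question above.
    for replication in (1, 3, 6):
        print(replication, local_replica_chance(6, replication))
```

Under that assumption, raising the replication factor raises the chance that
a worker finds a block locally, at the cost of more disk space.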

> For example, I have a six-node Tajo cluster, so should I set the HDFS block
> replication to six? I ask because:
> I noticed that when I run a Tajo query, some nodes are busy but others are
> idle, because the file's blocks are located only on those nodes and no others.
In my opinion, you need to run the balancer.
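
A small sketch of how that could be invoked, assuming "balancer" here means
the HDFS balancer (`hdfs balancer`), which the reply does not spell out. The
helper name and the chosen threshold are illustrative. The code only builds
and prints the command; it does not run it.

```python
import shlex
from typing import List


def balancer_command(threshold_percent: int = 10) -> List[str]:
    """Build an `hdfs balancer` command line.

    -threshold is the allowed deviation, in percent of disk capacity,
    between a DataNode's utilization and the cluster average.
    """
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return ["hdfs", "balancer", "-threshold", str(threshold_percent)]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it on the cluster.
    print(shlex.join(balancer_command(5)))
```

A lower threshold spreads blocks more evenly across the DataNodes but makes
the balancer move more data.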

> 3) The test data set is 4 million rows, several GB in size, but it's very
> slow when I run: select count(distinct ID) from ****;
> Any possible problems here?

Could you share your tajo-env.sh and tajo-site.xml?

> Thanks
