tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Some Tajo-0.9.0 questions
Date Fri, 16 Jan 2015 07:02:54 GMT
Thanks Kim.

Here are my tajo-env.sh and tajo-site.xml:

*tajo-env.sh:*
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/local/java
_TAJO_OPTS="-server -verbose:gc
  -XX:+PrintGCDateStamps
  -XX:+PrintGCDetails
  -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=9
  -XX:GCLogFileSize=256m
  -XX:+DisableExplicitGC
  -XX:+UseCompressedOops
  -XX:SoftRefLRUPolicyMSPerMB=0
  -XX:+UseFastAccessorMethods
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:+CMSParallelRemarkEnabled
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSCompactAtFullCollection
  -XX:CMSFullGCsBeforeCompaction=0
  -XX:+CMSClassUnloadingEnabled
  -XX:CMSMaxAbortablePrecleanTime=300
  -XX:+CMSScavengeBeforeRemark
  -XX:PermSize=160m
  -XX:GCTimeRatio=19
  -XX:SurvivorRatio=2
  -XX:MaxTenuringThreshold=60"
_TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
_TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
_TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
export TAJO_OPTS=$_TAJO_OPTS
export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
export TAJO_LOG_DIR=${TAJO_HOME}/logs
export TAJO_PID_DIR=${TAJO_HOME}/pids
export TAJO_WORKER_STANDBY_MODE=true

*tajo-site.xml:*

<configuration>
  <property>
    <name>tajo.rootdir</name>
    <value>hdfs://test-cluster/tajo</value>
  </property>
  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>10-0-86-51:26001</value>
  </property>
  <property>
    <name>tajo.master.client-rpc.address</name>
    <value>10-0-86-51:26002</value>
  </property>
  <property>
    <name>tajo.resource-tracker.rpc.address</name>
    <value>10-0-86-51:26003</value>
  </property>
  <property>
    <name>tajo.catalog.client-rpc.address</name>
    <value>10-0-86-51:26005</value>
  </property>
  <property>
    <name>tajo.worker.tmpdir.locations</name>
    <value>/test/tajo1,/test/tajo2,/test/tajo3</value>
  </property>
  <!--  worker  -->
  <property>
    <name>tajo.worker.resource.tajo.worker.resource.cpu-cores</name>
    <value>4</value>
  </property>
  <property>
    <name>tajo.worker.resource.memory-mb</name>
    <value>5120</value>
  </property>
  <property>
    <name>tajo.worker.resource.dfs-dir-aware</name>
    <value>true</value>
  </property>
  <property>
    <name>tajo.worker.resource.dedicated</name>
    <value>true</value>
  </property>
  <property>
    <name>tajo.worker.resource.dedicated-memory-ratio</name>
    <value>0.6</value>
  </property>
</configuration>

On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <jhkim@apache.org> wrote:

> Hello Azuryy Yu,
>
> I left some comments inline.
>
> -Jinho
> Best regards
>
> 2015-01-16 14:37 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:
>
> > Hi,
> >
> > I tested Tajo half a year ago, then stopped following it because of
> > other work.
> >
> > This week I set up a small Tajo dev cluster (six nodes, VMs) based on
> > Hadoop-2.6.0.
> >
> > My questions are:
> >
> > 1) As far as I knew half a year ago, Tajo ran on YARN and used the YARN
> > scheduler to manage job resources. Now I find it doesn't rely on YARN,
> > because I only started the HDFS daemons, no YARN daemons. So does Tajo
> > have its own job scheduler?
> >
> >
> Tajo now uses its own task scheduler, so you can start Tajo without the
> YARN daemons.
> Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
>
>
> >
> > 2) Do we need to place file replicas on every node of the Tajo
> > cluster?
> >
>
> No, Tajo does not need more replication. If you set a higher replication
> factor, data locality can increase.
>
> > For example, I have a six-node Tajo cluster. Should I set HDFS block
> > replication to six? I ask because:
> >
> > I noticed that when I run a Tajo query, some nodes are busy while others
> > are idle, because the file's blocks are located only on the busy nodes.
> >
> >
> In my opinion, you need to run the HDFS balancer:
>
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
>
>
> > 3) The test data set is 4 million rows, several GB in size, but it is
> > very slow when I run: select count(distinct ID) from ****;
> > Any possible problems here?
> >
>
> Could you share your tajo-env.sh and tajo-site.xml?
>
>
> >
> >
> > Thanks
> >
>
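
For reference, the balancer and replication suggestions quoted above could be tried roughly as follows. This is only a sketch under assumptions: TABLE_PATH is a hypothetical table directory under tajo.rootdir, REPLICATION=3 is a placeholder value, and -threshold 10 is simply the balancer's default percentage. The script prints the commands by default and only executes them when DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only: prints the HDFS commands unless DRY_RUN=0 is set.
# TABLE_PATH and REPLICATION are hypothetical; substitute real values.
TABLE_PATH="${TABLE_PATH:-/tajo/warehouse/mytable}"
REPLICATION="${REPLICATION:-3}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Show which DataNodes hold the table's blocks.
run hdfs fsck "$TABLE_PATH" -files -blocks -locations

# 2. Spread existing blocks more evenly across the DataNodes.
run hdfs balancer -threshold 10

# 3. Optionally change the replication factor and wait for it to apply.
run hdfs dfs -setrep -w "$REPLICATION" "$TABLE_PATH"
```

Running the fsck step before and after the balancer shows whether the blocks actually moved to the idle nodes.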
