tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Some Tajo-0.9.0 questions
Date Fri, 16 Jan 2015 08:09:31 GMT
Thanks Kim, I'll try and post back.

On Fri, Jan 16, 2015 at 4:02 PM, Jinho Kim <jhkim@apache.org> wrote:

> Thanks, Azuryy Yu.
>
> Your tajo-worker runs 10 parallel tasks (5120 MB / 512 MB per default
> slot), but its heap is only 3 GB. That causes long JVM pauses.
> I recommend the following:
>
> tajo-env.sh
> TAJO_WORKER_HEAPSIZE=3000 or more
>
> tajo-site.xml
> <!--  worker  -->
> <property>
>   <name>tajo.worker.resource.memory-mb</name>
>   <value>3512</value> <!--  3 tasks + 1 qm task  -->
> </property>
> <property>
>   <name>tajo.task.memory-slot-mb.default</name>
>   <value>1000</value> <!--  default 512 -->
> </property>
> <property>
>    <name>tajo.worker.resource.dfs-dir-aware</name>
>    <value>true</value>
> </property>
> <!--  end  -->
> http://tajo.apache.org/docs/0.9.0/configuration/worker_configuration.html
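>
> As a quick sanity check of the arithmetic (the query-master share is my
> reading of the "3 tasks + 1 qm task" comment above):
>
>   3512 MB worker memory / 1000 MB per task slot = 3 task slots,
>   leaving 512 MB for one query master task.
>
> With TAJO_WORKER_HEAPSIZE=3000, three concurrent tasks then share the 3 GB
> heap (roughly 1 GB each) instead of ten, which should avoid the long GC
> pauses.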
>
> -Jinho
> Best regards
>
> 2015-01-16 16:02 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:
>
> > Thanks Kim.
> >
> > The following is my tajo-env and tajo-site
> >
> > *tajo-env.sh:*
> > export HADOOP_HOME=/usr/local/hadoop
> > export JAVA_HOME=/usr/local/java
> > _TAJO_OPTS="-server -verbose:gc
> >   -XX:+PrintGCDateStamps
> >   -XX:+PrintGCDetails
> >   -XX:+UseGCLogFileRotation
> >   -XX:NumberOfGCLogFiles=9
> >   -XX:GCLogFileSize=256m
> >   -XX:+DisableExplicitGC
> >   -XX:+UseCompressedOops
> >   -XX:SoftRefLRUPolicyMSPerMB=0
> >   -XX:+UseFastAccessorMethods
> >   -XX:+UseParNewGC
> >   -XX:+UseConcMarkSweepGC
> >   -XX:+CMSParallelRemarkEnabled
> >   -XX:CMSInitiatingOccupancyFraction=70
> >   -XX:+UseCMSCompactAtFullCollection
> >   -XX:CMSFullGCsBeforeCompaction=0
> >   -XX:+CMSClassUnloadingEnabled
> >   -XX:CMSMaxAbortablePrecleanTime=300
> >   -XX:+CMSScavengeBeforeRemark
> >   -XX:PermSize=160m
> >   -XX:GCTimeRatio=19
> >   -XX:SurvivorRatio=2
> >   -XX:MaxTenuringThreshold=60"
> > _TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
> > _TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
> > _TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
> > export TAJO_OPTS=$_TAJO_OPTS
> > export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
> > export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
> > export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
> > export TAJO_LOG_DIR=${TAJO_HOME}/logs
> > export TAJO_PID_DIR=${TAJO_HOME}/pids
> > export TAJO_WORKER_STANDBY_MODE=true
> >
> > *tajo-site.xml:*
> >
> > <configuration>
> >   <property>
> >     <name>tajo.rootdir</name>
> >     <value>hdfs://test-cluster/tajo</value>
> >   </property>
> >   <property>
> >     <name>tajo.master.umbilical-rpc.address</name>
> >     <value>10-0-86-51:26001</value>
> >   </property>
> >   <property>
> >     <name>tajo.master.client-rpc.address</name>
> >     <value>10-0-86-51:26002</value>
> >   </property>
> >   <property>
> >     <name>tajo.resource-tracker.rpc.address</name>
> >     <value>10-0-86-51:26003</value>
> >   </property>
> >   <property>
> >     <name>tajo.catalog.client-rpc.address</name>
> >     <value>10-0-86-51:26005</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.tmpdir.locations</name>
> >     <value>/test/tajo1,/test/tajo2,/test/tajo3</value>
> >   </property>
> >   <!--  worker  -->
> >   <property>
> >     <name>tajo.worker.resource.cpu-cores</name>
> >     <value>4</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.memory-mb</name>
> >     <value>5120</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dfs-dir-aware</name>
> >     <value>true</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dedicated</name>
> >     <value>true</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dedicated-memory-ratio</name>
> >     <value>0.6</value>
> >   </property>
> > </configuration>
> >
> > On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <jhkim@apache.org> wrote:
> >
> > > Hello Azuryy Yu,
> > >
> > > I left some comments.
> > >
> > > -Jinho
> > > Best regards
> > >
> > > 2015-01-16 14:37 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:
> > >
> > > > Hi,
> > > >
> > > > I tested Tajo half a year ago, then stopped following it because of
> > > > other work.
> > > >
> > > > Then I set up a small dev Tajo cluster this week (six nodes, VMs)
> > > > based on Hadoop-2.6.0.
> > > >
> > > > So my questions are:
> > > >
> > > > 1) From what I knew half a year ago, Tajo ran on YARN, using the
> > > > YARN scheduler to manage job resources. But now I found it doesn't
> > > > rely on YARN, because I only started the HDFS daemons, no YARN
> > > > daemons. So Tajo has its own job scheduler?
> > > >
> > > >
> > > Now, Tajo uses its own task scheduler, and you can start Tajo
> > > without YARN daemons.
> > > Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
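> > >
> > > For example, a minimal sketch of bringing the cluster up (the script
> > > names come from the Hadoop and Tajo distributions; the environment
> > > variables are assumptions about your install paths):
> > >
> > >   $HADOOP_HOME/sbin/start-dfs.sh   # HDFS daemons only, no YARN
> > >   $TAJO_HOME/bin/start-tajo.sh     # TajoMaster and the workers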
> > >
> > >
> > > >
> > > > 2) Do we need to put file replicas on every node of the Tajo
> > > > cluster?
> > > >
> > >
> > > No, Tajo does not need a higher replication factor. That said, if
> > > you set more replication, data locality can increase.
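> > >
> > > A minimal sketch, if you do want a higher replication factor on an
> > > existing table's files (the warehouse path is a placeholder):
> > >
> > >   hdfs dfs -setrep -w 6 /tajo/warehouse/some_table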
> > >
> > > > For example, I have a six-node Tajo cluster; should I set the HDFS
> > > > block replication factor to six? Because:
> > > >
> > > > I noticed that when I run a Tajo query, some nodes are busy while
> > > > others are idle, because the file's blocks are located only on
> > > > those nodes and not the others.
> > > >
> > > >
> > > In my opinion, you need to run the HDFS balancer:
> > >
> > >
> > > http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
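> > >
> > > For example (-threshold is the allowed disk-usage deviation in
> > > percent; 10 is Hadoop's default, shown here only as a sketch):
> > >
> > >   hdfs balancer -threshold 10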
> > >
> > >
> > > > 3) The test data set is 4 million rows, nearly several GB, but
> > > > it's very slow when I run: select count(distinct ID) from ****;
> > > > Are there any possible problems here?
> > > >
> > >
> > > Could you share your tajo-env.sh and tajo-site.xml?
> > >
> > >
> > > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>
