tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Some Tajo-0.9.0 questions
Date Fri, 16 Jan 2015 09:03:49 GMT
Hi,
There is no big improvement; sometimes it is even slower than before. I also
tried increasing the worker's heap size and parallelism, but nothing improved.

default> select count(distinct auid) from test_pl_00_0;
Progress: 0%, response time: 0.963 sec
Progress: 0%, response time: 0.964 sec
Progress: 0%, response time: 1.366 sec
Progress: 0%, response time: 2.168 sec
Progress: 0%, response time: 3.17 sec
Progress: 0%, response time: 4.172 sec
Progress: 16%, response time: 5.174 sec
Progress: 16%, response time: 6.176 sec
Progress: 16%, response time: 7.178 sec
Progress: 33%, response time: 8.18 sec
Progress: 50%, response time: 9.181 sec
Progress: 50%, response time: 10.183 sec
Progress: 50%, response time: 11.185 sec
Progress: 50%, response time: 12.187 sec
Progress: 66%, response time: 13.189 sec
Progress: 66%, response time: 14.19 sec
Progress: 100%, response time: 15.003 sec
2015-01-16T17:00:56.410+0800: [GC2015-01-16T17:00:56.410+0800: [ParNew:
26473K->6582K(31488K), 0.0105030 secs] 26473K->6582K(115456K), 0.0105720
secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2015-01-16T17:00:56.593+0800: [GC2015-01-16T17:00:56.593+0800: [ParNew:
27574K->6469K(31488K), 0.0086300 secs] 27574K->6469K(115456K), 0.0086940
secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2015-01-16T17:00:56.800+0800: [GC2015-01-16T17:00:56.800+0800: [ParNew:
27461K->5664K(31488K), 0.0122560 secs] 27461K->6591K(115456K), 0.0123210
secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2015-01-16T17:00:57.065+0800: [GC2015-01-16T17:00:57.065+0800: [ParNew:
26656K->6906K(31488K), 0.0070520 secs] 27583K->7833K(115456K), 0.0071470
secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
?count
-------------------------------
1222356
(1 rows, 15.003 sec, 8 B selected)
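Incidentally, the four ParNew events pasted above total well under 0.1 s of pause time against a 15 s query, and the heap they report (115456K, roughly 113 MB committed) is nowhere near the 3 GB worker heap, so these lines look like they come from a different JVM (possibly the tsql client) rather than the worker. A quick tally, with the wrapped log lines re-joined onto single lines:

```python
import re

# The four ParNew minor-GC events from the log above (timestamps trimmed,
# each event re-joined onto a single line).
gc_log = """
[ParNew: 26473K->6582K(31488K), 0.0105030 secs] 26473K->6582K(115456K), 0.0105720 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
[ParNew: 27574K->6469K(31488K), 0.0086300 secs] 27574K->6469K(115456K), 0.0086940 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[ParNew: 27461K->5664K(31488K), 0.0122560 secs] 27461K->6591K(115456K), 0.0123210 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
[ParNew: 26656K->6906K(31488K), 0.0070520 secs] 27583K->7833K(115456K), 0.0071470 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
"""

# The stop-the-world pause of each event is the figure just before "[Times:".
pauses = [float(p) for p in re.findall(r"(\d+\.\d+)\s+secs\]\s+\[Times", gc_log)]
total = sum(pauses)
print(f"{len(pauses)} minor GCs, {total:.4f} s total pause")  # -> 4 minor GCs, 0.0387 s total pause
```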


On Fri, Jan 16, 2015 at 4:09 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Thanks Kim, I'll try and post back.
>
> On Fri, Jan 16, 2015 at 4:02 PM, Jinho Kim <jhkim@apache.org> wrote:
>
>> Thanks Azuryy Yu
>>
>> Your tajo-worker runs 10 parallel tasks, but its heap is only 3 GB. That
>> can cause long JVM pauses.
>> I recommend the following:
>>
>> tajo-env.sh
>> TAJO_WORKER_HEAPSIZE=3000 or more
>>
>> tajo-site.xml
>> <!--  worker  -->
>> <property>
>>   <name>tajo.worker.resource.memory-mb</name>
>>   <value>3512</value> <!--  3 tasks + 1 qm task  -->
>> </property>
>> <property>
>>   <name>tajo.task.memory-slot-mb.default</name>
>>   <value>1000</value> <!--  default 512 -->
>> </property>
>> <property>
>>    <name>tajo.worker.resource.dfs-dir-aware</name>
>>    <value>true</value>
>> </property>
>> <!--  end  -->
>> http://tajo.apache.org/docs/0.9.0/configuration/worker_configuration.html
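The arithmetic behind those recommended numbers, as I read the worker-configuration page linked above (a sketch, not an authoritative formula):

```python
# Sketch of how the recommended values fit together; property names are the
# ones from the tajo-site.xml fragment above.
worker_memory_mb = 3512   # tajo.worker.resource.memory-mb
task_slot_mb = 1000       # tajo.task.memory-slot-mb.default (raised from 512)
qm_slot_mb = 512          # the query-master task keeps the 512 MB default

# Memory budget left after the query-master task, divided into task slots.
concurrent_tasks = (worker_memory_mb - qm_slot_mb) // task_slot_mb
print(concurrent_tasks, "tasks + 1 qm task")  # -> 3 tasks + 1 qm task
```

This matches the "3 tasks + 1 qm task" comment in the config, and keeps the per-task working set comfortably inside the 3 GB heap.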
>>
>> -Jinho
>> Best regards
>>
>> 2015-01-16 16:02 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:
>>
>> > Thanks Kim.
>> >
>> > The following is my tajo-env and tajo-site
>> >
>> > *tajo-env.sh:*
>> > export HADOOP_HOME=/usr/local/hadoop
>> > export JAVA_HOME=/usr/local/java
>> > _TAJO_OPTS="-server -verbose:gc
>> >   -XX:+PrintGCDateStamps
>> >   -XX:+PrintGCDetails
>> >   -XX:+UseGCLogFileRotation
>> >   -XX:NumberOfGCLogFiles=9
>> >   -XX:GCLogFileSize=256m
>> >   -XX:+DisableExplicitGC
>> >   -XX:+UseCompressedOops
>> >   -XX:SoftRefLRUPolicyMSPerMB=0
>> >   -XX:+UseFastAccessorMethods
>> >   -XX:+UseParNewGC
>> >   -XX:+UseConcMarkSweepGC
>> >   -XX:+CMSParallelRemarkEnabled
>> >   -XX:CMSInitiatingOccupancyFraction=70
>> >   -XX:+UseCMSCompactAtFullCollection
>> >   -XX:CMSFullGCsBeforeCompaction=0
>> >   -XX:+CMSClassUnloadingEnabled
>> >   -XX:CMSMaxAbortablePrecleanTime=300
>> >   -XX:+CMSScavengeBeforeRemark
>> >   -XX:PermSize=160m
>> >   -XX:GCTimeRatio=19
>> >   -XX:SurvivorRatio=2
>> >   -XX:MaxTenuringThreshold=60"
>> > _TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
>> > _TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
>> > _TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
>> > export TAJO_OPTS=$_TAJO_OPTS
>> > export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
>> > export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
>> > export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
>> > export TAJO_LOG_DIR=${TAJO_HOME}/logs
>> > export TAJO_PID_DIR=${TAJO_HOME}/pids
>> > export TAJO_WORKER_STANDBY_MODE=true
>> >
>> > *tajo-site.xml:*
>> >
>> > <configuration>
>> >   <property>
>> >     <name>tajo.rootdir</name>
>> >     <value>hdfs://test-cluster/tajo</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.master.umbilical-rpc.address</name>
>> >     <value>10-0-86-51:26001</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.master.client-rpc.address</name>
>> >     <value>10-0-86-51:26002</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.resource-tracker.rpc.address</name>
>> >     <value>10-0-86-51:26003</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.catalog.client-rpc.address</name>
>> >     <value>10-0-86-51:26005</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.tmpdir.locations</name>
>> >     <value>/test/tajo1,/test/tajo2,/test/tajo3</value>
>> >   </property>
>> >   <!--  worker  -->
>> >   <property>
>> >     <name>tajo.worker.resource.cpu-cores</name>
>> >     <value>4</value>
>> >   </property>
>> >  <property>
>> >    <name>tajo.worker.resource.memory-mb</name>
>> >    <value>5120</value>
>> >  </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dfs-dir-aware</name>
>> >     <value>true</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dedicated</name>
>> >     <value>true</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dedicated-memory-ratio</name>
>> >     <value>0.6</value>
>> >   </property>
>> > </configuration>
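As a side note on the figures above: with tajo.task.memory-slot-mb.default left unset (the 0.9.0 default is 512 MB, per the worker-configuration docs), the 5120 MB worker budget works out to the 10 parallel tasks Jinho mentions. A quick check:

```python
# Rough check of worker parallelism implied by the config above, assuming the
# 0.9.0 default task slot of 512 MB (tajo.task.memory-slot-mb.default is not
# set in this tajo-site.xml).
worker_memory_mb = 5120  # tajo.worker.resource.memory-mb
default_slot_mb = 512    # assumed 0.9.0 default slot size
tasks = worker_memory_mb // default_slot_mb
print(tasks)  # -> 10
```

Ten concurrent tasks against a 3 GB heap is the mismatch Jinho's recommendation addresses.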
>> >
>> > On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <jhkim@apache.org> wrote:
>> >
>> > > Hello Azuryy Yu,
>> > >
>> > > I left some comments.
>> > >
>> > > -Jinho
>> > > Best regards
>> > >
>> > > 2015-01-16 14:37 GMT+09:00 Azuryy Yu <azuryyyu@gmail.com>:
>> > >
>> > > > Hi,
>> > > >
>> > > > I tested Tajo about half a year ago, then set it aside because of
>> > > > other work.
>> > > >
>> > > > This week I set up a small dev Tajo cluster (six nodes, VMs) based on
>> > > > Hadoop-2.6.0.
>> > > >
>> > > > So my questions are:
>> > > >
>> > > > 1) From what I knew half a year ago, Tajo ran on Yarn, using the Yarn
>> > > > scheduler to manage job resources. But now I find it doesn't rely on
>> > > > Yarn, because I only started the HDFS daemons, no Yarn daemons. So
>> > > > does Tajo have its own job scheduler?
>> > > >
>> > > >
>> > > Yes, Tajo now uses its own task scheduler, and you can start Tajo
>> > > without the Yarn daemons.
>> > > Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
>> > >
>> > >
>> > > >
>> > > > 2) Do we need to place file replicas on every node in the Tajo
>> > > > cluster?
>> > > >
>> > >
>> > > No, Tajo does not need a higher replication factor, although setting
>> > > more replicas can increase data locality.
>> > >
>> > > > For example, I have a six-node Tajo cluster; should I set the HDFS
>> > > > block replication to six? I ask because I noticed that when I run a
>> > > > Tajo query, some nodes are busy while others are idle, since the
>> > > > file's blocks are located only on those busy nodes, not the others.
>> > > >
>> > > >
>> > > In my opinion, you need to run the HDFS balancer:
>> > >
>> > >
>> >
>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
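For reference, the balancer described on that page is invoked like this (the threshold value here is illustrative, not a recommendation from the thread):

```shell
# Rebalance HDFS blocks until each datanode's utilization is within
# 5 percentage points of the cluster average (the default threshold is 10).
hdfs balancer -threshold 5
```

It runs until the cluster is balanced to the given threshold, then exits; it can be stopped safely at any time with Ctrl-C.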
>> > >
>> > >
>> > > > 3) The test data set is 4 million rows, a few GB in total, but it's
>> > > > very slow when I run: select count(distinct ID) from ****;
>> > > > Any possible problems here?
>> > > >
>> > >
>> > > Could you share your tajo-env.sh and tajo-site.xml?
>> > >
>> > >
>> > > >
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>
>
