tajo-dev mailing list archives

From: Jihoon Son <ghoon...@gmail.com>
Subject: Re: Tajo query scheduler and performance question
Date: Thu, 05 Mar 2015 07:17:31 GMT
Hi Azuryy,
Truly sorry for the late response.
I left some comments below.

Sincerely,
Jihoon

On Wed, Mar 4, 2015 at 7:15 PM Azuryy Yu <azuryyyu@gmail.com> wrote:

> Hi,
>
> I read the Tajo 0.9.0 source code and found that Tajo uses a simple FIFO
> scheduler.
>
> I can accept this at the current stage. But when Tajo picks a query from the
> scheduler queue and then allocates workers for it, the allocator only
> considers the available resources on a random worker list and then selects a
> set of workers.
>
> 1)
> So my question is: why don't we consider HDFS data locality? Otherwise, the
> network will become the bottleneck.
>
> I understand that Tajo currently does not use YARN as a scheduler and instead
> has a temporary, simple FIFO scheduler. I have also looked at
> https://issues.apache.org/jira/browse/TAJO-540, and I hope the new Tajo
> scheduler will be similar to Sparrow.
>
It seems that there are some misunderstandings about our resource scheduling.
The FIFO scheduler plays the role of the *query* scheduler. That is, given a
list of submitted queries, it reserves the resources required to execute them
consecutively. A Sparrow-like scheduler can be used for the concurrent
execution of multiple queries.
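To make the distinction concrete, here is a very rough sketch in plain Java of
what the query-level FIFO scheduling does: resources are reserved for queries
strictly in submission order, and a query behind the head of the queue waits
even when the cluster could otherwise accommodate it. The class and method
names below are made up for illustration only; they are not Tajo's actual API.

import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: not Tajo's real scheduler code.
class FifoQueryScheduler {
  static class Query {
    final String id;
    final int requiredMemoryMb;
    Query(String id, int requiredMemoryMb) {
      this.id = id;
      this.requiredMemoryMb = requiredMemoryMb;
    }
  }

  private final Queue<Query> pending = new ArrayDeque<>();
  private int freeClusterMemoryMb;

  FifoQueryScheduler(int clusterMemoryMb) {
    this.freeClusterMemoryMb = clusterMemoryMb;
  }

  synchronized void submit(Query q) {
    pending.add(q);   // queries wait in submission (FIFO) order
  }

  // Reserve resources for the query at the head of the queue, if possible.
  synchronized Query scheduleNext() {
    Query head = pending.peek();
    if (head != null && head.requiredMemoryMb <= freeClusterMemoryMb) {
      freeClusterMemoryMb -= head.requiredMemoryMb;
      return pending.poll();
    }
    return null;      // the head must wait; later queries are not considered
  }

  // Called when a query finishes, returning its reservation to the pool.
  synchronized void release(Query q) {
    freeClusterMemoryMb += q.requiredMemoryMb;
  }
}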

Once a query is started, the *task scheduler* is responsible for allocating
tasks to workers. As you said, tasks are allocated to workers only if the
workers have enough resources. However, when allocating tasks, our task
scheduler considers the physical disk on which the data is stored as well as
the node containing the data. For example, with your cluster, each worker can
be assigned 12 tasks, each of which processes data stored on a different one
of its 12 disks. Since a worker is generally equipped with multiple disks,
this approach utilizes the disk bandwidth efficiently.
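As a rough illustration of this host- and disk-aware assignment (the names
below are invented for this sketch; the actual logic lives in
DefaultTaskScheduler and is more involved), the assignment preference can be
thought of roughly as follows:

import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a simplified, made-up model of locality-aware assignment.
class LocalityAwareAssigner {
  static class Task {
    String preferredHost;       // host holding the HDFS block
    int preferredDiskVolume;    // physical disk (volume id) holding the block
  }

  static class Worker {
    String host;
    Map<Integer, Integer> freeSlotsPerDisk;  // disk volume id -> free task slots
  }

  // Prefer the same host and the same disk, then the same host, then any
  // worker that still has capacity.
  Optional<Worker> pick(Task task, List<Worker> workers) {
    Optional<Worker> hostAndDisk = workers.stream()
        .filter(w -> w.host.equals(task.preferredHost))
        .filter(w -> w.freeSlotsPerDisk.getOrDefault(task.preferredDiskVolume, 0) > 0)
        .findFirst();
    if (hostAndDisk.isPresent()) {
      return hostAndDisk;
    }

    Optional<Worker> hostOnly = workers.stream()
        .filter(w -> w.host.equals(task.preferredHost))
        .filter(w -> w.freeSlotsPerDisk.values().stream().anyMatch(slots -> slots > 0))
        .findFirst();
    if (hostOnly.isPresent()) {
      return hostOnly;
    }

    return workers.stream()   // remote assignment as a last resort
        .filter(w -> w.freeSlotsPerDisk.values().stream().anyMatch(slots -> slots > 0))
        .findFirst();
  }
}

The fully remote assignments are what show up as the non-local part of the
locality statistics in the log example below.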

You can see the locality information in Tajo's query master log. Here is an
example:
...
2015-03-05 15:14:12,662 INFO
org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned
Local/Rack/Total: (9264/1555/10819), Locality: 85.63%, Rack host:
xxx.xxx.xxx.xxx
...
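Reading that line: 9264 assignments were data-local and 1555 were non-local
(9264 + 1555 = 10819 in total), so the reported locality is the data-local
fraction, 9264 / 10819 ≈ 85.63%.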


> 2) Performance related.
> I set up a 10-node cluster (1 master, 9 workers).
>
> 64 GB memory, 24 CPUs, 12 * 4 TB HDDs, 1.6 GB of test data (160 million records).
>
> It works well for some aggregation SQL tests, except count(distinct).
>
> count(distinct) is very slow - around ten minutes.
>
This result looks strange, and it is hard to tell from this alone what makes
the query execution slow.
Would you mind sharing some logs and additional information about the input
data (the number of files and how the data is distributed on HDFS)?
In addition, it would be great if you could share the evaluation results of
the other queries whose response times you consider sufficiently short.

>
> Who can give me a simple explanation of how Tajo handles count(distinct)?
> I can share my tajo-site.xml here:
>
> <configuration>
>   <property>
>     <name>tajo.rootdir</name>
>     <value>hdfs://realtime-cluster/tajo</value>
>   </property>
>
>   <!-- master -->
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>xx:26001</value>
>   </property>
>   <property>
>     <name>tajo.master.client-rpc.address</name>
>     <value>xx:26002</value>
>   </property>
>   <property>
>     <name>tajo.master.info-http.address</name>
>     <value>xx:26080</value>
>   </property>
>   <property>
>     <name>tajo.resource-tracker.rpc.address</name>
>     <value>xx:26003</value>
>   </property>
>   <property>
>     <name>tajo.catalog.client-rpc.address</name>
>     <value>xx:26005</value>
>   </property>
>   <!--  worker  -->
>   <property>
>     <name>tajo.worker.tmpdir.locations</name>
>     <value>file:///data/hadoop/data1/tajo,file:///data/hadoop/data2/tajo,file:///data/hadoop/data3/tajo,file:///data/hadoop/data4/tajo,file:///data/hadoop/data5/tajo,file:///data/hadoop/data6/tajo,file:///data/hadoop/data7/tajo,file:///data/hadoop/data8/tajo,file:///data/hadoop/data9/tajo,file:///data/hadoop/data10/tajo,file:///data/hadoop/data11/tajo,file:///data/hadoop/data12/tajo</value>
>   </property>
>   <property>
>     <name>tajo.worker.tmpdir.cleanup-at-startup</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tajo.worker.history.expire-interval-minutes</name>
>     <value>60</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.cpu-cores</name>
>     <value>24</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.memory-mb</name>
>     <value>60512</value> <!-- 3584 3 tasks + 1 qm task  -->
>   </property>
>   <property>
>     <name>tajo.task.memory-slot-mb.default</name>
>     <value>3000</value> <!--  default 512 -->
>   </property>
>   <property>
>     <name>tajo.task.disk-slot.default</name>
>     <value>1.0f</value> <!--  default 0.5 -->
>   </property>
>   <property>
>     <name>tajo.shuffle.fetcher.parallel-execution.max-num</name>
>     <value>5</value>
>   </property>
>   <property>
>     <name>tajo.executor.external-sort.thread-num</name>
>     <value>2</value>
>   </property>
>   <!-- client -->
>   <property>
>     <name>tajo.rpc.client.worker-thread-num</name>
>     <value>4</value>
>   </property>
>   <property>
>     <name>tajo.cli.print.pause</name>
>     <value>false</value>
>   </property>
> <!--
>   <property>
>     <name>tajo.worker.resource.dfs-dir-aware</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.dedicated</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.dedicated-memory-ratio</name>
>     <value>0.6</value>
>   </property>
> -->
> </configuration>
>
>
> tajo-env:
>
> export TAJO_WORKER_HEAPSIZE=60000
>
