tajo-dev mailing list archives

From "Jinho Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-385) Refactoring TaskScheduler to assign multiple fragments
Date Mon, 16 Dec 2013 07:41:08 GMT

    [ https://issues.apache.org/jira/browse/TAJO-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848893#comment-13848893 ]

Jinho Kim commented on TAJO-385:
--------------------------------

Jihoon,

I ran into some problems on the cluster. Maybe the temporary directory is wrong.
Also, scan performance dropped by about 30%.
How can I switch back to the previous TaskScheduler?
{noformat}
Query: select count(*) from lineitem_100

<property>
  <name>tajo.task.size-mb.default</name>
  <value>134217728</value>
</property>
{noformat}
{code}
2013-12-16 16:20:58,843 INFO  worker.TajoResourceAllocator (TajoResourceAllocator.java:run(190)) - ContainerProxy started:container_1387178314978_0001_01_000041
2013-12-16 16:20:58,846 INFO  master.TaskScheduler (TaskScheduler.java:handle(276)) - TaskRequest: container_1387178314978_0001_01_000041,eb_1387178314978_0001_000002
2013-12-16 16:20:58,915 ERROR querymaster.QueryUnitAttempt (QueryUnitAttempt.java:transition(297)) - FROM gruter105.gruter.com >> java.io.FileNotFoundException: File does not exist: /tajo/warehouse/eb_1387178314978_0001_000001
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1303)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1258)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1231)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1205)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
{code}

> Refactoring TaskScheduler to assign multiple fragments
> ------------------------------------------------------
>
>                 Key: TAJO-385
>                 URL: https://issues.apache.org/jira/browse/TAJO-385
>             Project: Tajo
>          Issue Type: Improvement
>          Components: query master
>    Affects Versions: 0.8-incubating
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>         Attachments: TAJO-385.patch, TAJO-385_2.patch, TAJO-385_3.patch
>
>
> In the current implementation, each task processes only one fragment.
> However, processing multiple fragments in a task can improve query processing performance, depending on the storage layout and the user queries.
> In this issue, TaskScheduler is refactored so that multiple fragments can be assigned to each task.
> The following should be included:
> * Schedule Fragments instead of QueryUnits in TaskScheduler
> ** The QueryUnit creation is postponed until TaskScheduler receives task requests from workers.
> ** When TaskScheduler receives a task request from a worker, it dynamically creates a QueryUnit and assigns one or more fragments to it.
> ** The fragment scheduling should take disk load balancing into account.
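
The scheduling idea described in the issue can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: Fragment, QueryUnitSketch, FragmentSchedulerSketch, handleTaskRequest and the "fragments handed out per disk" load measure are all hypothetical names and simplifications. It assumes every fragment is tagged with one host and one disk id, and it ignores remote (non-local) assignment entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a fragment: a file split stored on one host and disk. */
final class Fragment {
    final String path;
    final String host;
    final int diskId;

    Fragment(String path, String host, int diskId) {
        this.path = path;
        this.host = host;
        this.diskId = diskId;
    }
}

/** Hypothetical task descriptor, created only when a worker asks for work. */
final class QueryUnitSketch {
    final int id;
    final List<Fragment> fragments;

    QueryUnitSketch(int id, List<Fragment> fragments) {
        this.id = id;
        this.fragments = fragments;
    }
}

final class FragmentSchedulerSketch {
    // Pending fragments grouped by host, then by disk id.
    private final Map<String, Map<Integer, Deque<Fragment>>> pending = new HashMap<>();
    // Fragments handed out so far per "host:disk"; a crude stand-in for disk load.
    private final Map<String, Integer> assignedPerDisk = new HashMap<>();
    private int nextId = 0;

    void addFragment(Fragment f) {
        pending.computeIfAbsent(f.host, h -> new HashMap<>())
               .computeIfAbsent(f.diskId, d -> new ArrayDeque<>())
               .add(f);
    }

    /**
     * Creates a task for the requesting host with up to maxFragments local
     * fragments, always taking the next fragment from the least-loaded disk.
     * Returns null when the host has no pending fragments.
     */
    QueryUnitSketch handleTaskRequest(String host, int maxFragments) {
        Map<Integer, Deque<Fragment>> disks = pending.get(host);
        List<Fragment> chosen = new ArrayList<>();
        while (disks != null && chosen.size() < maxFragments) {
            Integer best = null;
            for (Map.Entry<Integer, Deque<Fragment>> e : disks.entrySet()) {
                if (e.getValue().isEmpty()) {
                    continue; // nothing left on this disk
                }
                if (best == null || load(host, e.getKey()) < load(host, best)) {
                    best = e.getKey();
                }
            }
            if (best == null) {
                break; // all disks on this host are drained
            }
            chosen.add(disks.get(best).poll());
            assignedPerDisk.merge(host + ":" + best, 1, Integer::sum);
        }
        return chosen.isEmpty() ? null : new QueryUnitSketch(nextId++, chosen);
    }

    private int load(String host, int diskId) {
        return assignedPerDisk.getOrDefault(host + ":" + diskId, 0);
    }
}
```

In this sketch the task object does not exist until handleTaskRequest is called, which mirrors the "postponed QueryUnit creation" point, and spreading picks across disks is one simple way to approximate disk load balancing. A real scheduler would presumably also need a fallback for hosts that hold no local fragments.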



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
