drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5420) all cores at 100% of all servers
Date Wed, 28 Jun 2017 17:50:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066939#comment-16066939 ]

ASF GitHub Bot commented on DRILL-5420:
---------------------------------------

GitHub user kkhatua opened a pull request:

    https://github.com/apache/drill/pull/862

    DRILL-5420: ParquetAsyncPgReader goes into infinite loop during cleanup

    PageQueue is cleaned up using poll() instead of take(), because take() constantly gets interrupted and causes CPU churn.
    During a columnReader shutdown, a flag is set to block any new page-reading tasks from being submitted before the queues are finally cleared and the memory occupied by the pages is released.
    More details are in this JIRA comment: https://issues.apache.org/jira/browse/DRILL-5420?focusedCommentId=16066933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16066933
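The cleanup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch in pull request #862: `PageQueueCleanupSketch`, `Page`, `trySubmit()` and the `shuttingDown` flag are hypothetical names, and producers are assumed to enqueue pages directly under the same lock (in-flight asynchronous read tasks are out of scope). Setting the flag under the lock guarantees nothing is added once draining starts, and draining with `poll()` lets an interrupted thread still empty the queue instead of failing on every call.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the cleanup described in the pull request; not Drill's actual code.
public class PageQueueCleanupSketch {

  /** Hypothetical page whose buffer must be released during cleanup. */
  public static final class Page {
    private final AtomicInteger releaseCounter;

    public Page(AtomicInteger releaseCounter) {
      this.releaseCounter = releaseCounter;
    }

    public void release() {
      releaseCounter.incrementAndGet(); // stands in for freeing the page's memory
    }
  }

  private final LinkedBlockingQueue<Page> pageQueue = new LinkedBlockingQueue<>();
  private boolean shuttingDown = false; // guarded by "this"

  /**
   * Queues a page unless shutdown has begun; returns false if the page was rejected,
   * in which case the caller keeps ownership of it.
   */
  public synchronized boolean trySubmit(Page page) {
    if (shuttingDown) {
      return false;
    }
    return pageQueue.offer(page);
  }

  /** Blocks new submissions, then drains and releases every queued page; returns the count. */
  public int clear() {
    synchronized (this) {
      shuttingDown = true; // no trySubmit() can add a page after this point
    }
    int released = 0;
    Page page;
    // poll() never throws InterruptedException, so an interrupted thread still drains the queue.
    while ((page = pageQueue.poll()) != null) {
      page.release();
      released++;
    }
    return released;
  }

  public static void main(String[] args) {
    AtomicInteger releases = new AtomicInteger();
    PageQueueCleanupSketch reader = new PageQueueCleanupSketch();
    reader.trySubmit(new Page(releases));
    reader.trySubmit(new Page(releases));

    Thread.currentThread().interrupt(); // simulate a cancelled fragment thread
    int drained = reader.clear();
    Thread.interrupted(); // reset the interrupt flag

    boolean acceptedAfterClear = reader.trySubmit(new Page(releases));
    System.out.println("drained=" + drained + " released=" + releases.get()
        + " acceptedAfterClear=" + acceptedAfterClear);
    // prints "drained=2 released=2 acceptedAfterClear=false"
  }
}
```

Ordering matters here: the flag is set before draining, so a late submission is rejected rather than left in a queue that nobody will empty again.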

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kkhatua/drill DRILL-5420

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #862
    
----
commit 2f233d4b1318e29211856877937ef9988c34ffaf
Author: Kunal Khatua <kkhatua@maprtech.com>
Date:   2017-06-28T06:35:34Z

    DRILL-5420: ParquetAsyncPgReader goes into infinite loop during cleanup
    
    PageQueue is cleaned up using poll() instead of take(), because take() constantly gets interrupted and causes CPU churn.
    During a columnReader shutdown, a flag is set to block any new page-reading tasks from being submitted.

----


> all cores at 100% of all servers
> --------------------------------
>
>                 Key: DRILL-5420
>                 URL: https://issues.apache.org/jira/browse/DRILL-5420
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>         Environment: linux, cluster with 5 servers over hdfs/parquet
>            Reporter: Hugo Bellomusto
>            Assignee: Kunal Khatua
>         Attachments: 2709a36d-804a-261a-64e5-afa271e782f8.json
>
>
> We have a Drill cluster of five servers over HDFS/Parquet.
> Each machine has 8 cores, and all cores reach 100% utilization.
> Each thread is looping in the while loop at line 314 of AsyncPageReader.java, inside the clear() method.
> https://github.com/apache/drill/blob/1.10.0/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java#L314
> jstack -l 19255|grep -A 50 $(printf "%x" 29250)
> "271d6262-ff19-ad24-af36-777bfe6c6375:frag:1:4" daemon prio=10 tid=0x00007f5b2adec800 nid=0x7242 runnable [0x00007f5aa33e8000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Throwable.fillInStackTrace(Native Method)
> 	at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> 	- locked <0x00000007374bfcb0> (a java.lang.InterruptedException)
> 	at java.lang.Throwable.<init>(Throwable.java:250)
> 	at java.lang.Exception.<init>(Exception.java:54)
> 	at java.lang.InterruptedException.<init>(InterruptedException.java:57)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
> 	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:317)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:140)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:632)
> 	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> 	at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
> 	at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
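The trace above shows LinkedBlockingQueue.take() throwing InterruptedException from lockInterruptibly(). The standalone demo below (hypothetical class and method names, not Drill's code) illustrates the JDK behaviour behind it: while the thread's interrupt flag is set, take() throws before it looks at the queue, even when elements are available. It assumes, as one plausible way to get the reported spin, that the cleanup loop restores the interrupt flag after each failure and retries while the queue is non-empty; the demo caps retries so it terminates. poll() takes no interruptible lock, so it drains the same queue.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative demo of take() vs poll() on an interrupted thread; not Drill's code.
public class InterruptedTakeDemo {

  /**
   * Drains with take(), restoring the interrupt flag after each InterruptedException
   * (an assumed loop shape). Capped at maxAttempts so the demo terminates;
   * returns how many take() calls failed.
   */
  public static int drainWithTake(LinkedBlockingQueue<String> queue, int maxAttempts) {
    int failures = 0;
    for (int attempt = 0; attempt < maxAttempts && !queue.isEmpty(); attempt++) {
      try {
        queue.take();
      } catch (InterruptedException e) {
        failures++;
        Thread.currentThread().interrupt(); // flag is set again, so the next take() fails too
      }
    }
    return failures;
  }

  /** Drains with poll(), which does not check the interrupt flag; returns items removed. */
  public static int drainWithPoll(LinkedBlockingQueue<String> queue) {
    int removed = 0;
    while (queue.poll() != null) {
      removed++;
    }
    return removed;
  }

  public static void main(String[] args) {
    LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    queue.add("page-1");
    queue.add("page-2");

    Thread.currentThread().interrupt(); // e.g. the query fragment was cancelled
    int failures = drainWithTake(queue, 1_000);
    int removed = drainWithPoll(queue);
    Thread.interrupted(); // reset the flag before exiting

    System.out.println("take() failures: " + failures + ", left for poll(): " + removed);
    // prints "take() failures: 1000, left for poll(): 2"
  }
}
```

Without the cap, the take() loop never terminates: every iteration allocates and throws a new InterruptedException (note the fillInStackTrace frames at the top of the trace), which matches the reported 100% CPU.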



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
