drill-issues mailing list archives

From "Roman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3665) Deadlock while executing CTAS that runs out of memory
Date Thu, 03 Aug 2017 17:57:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113185#comment-16113185 ]

Roman commented on DRILL-3665:
------------------------------

I tried several ways to reproduce this issue on Drill at commit a7e298760f9c9e (before the
DRILL-5599 fix), using different memory settings and table sizes, but the only failure I
could trigger was the drillbit going down:

{code:xml}
Error: CONNECTION ERROR: Connection /192.168.121.7:52656 <--> node1/192.168.121.7:31010
(user client) closed unexpectedly. Drillbit down?


[Error Id: acc425f5-eb8c-4028-84fe-ac649cf36bee ] (state=,code=0)
{code}

However, this error is no longer reproducible after the DRILL-5599 fix.

I used 2 queries:

{code:title=Query 1|borderStyle=solid}
create table lineitema as select
    cast(columns[0] as int) l_orderkey,
    cast(columns[1] as int) l_partkey,
    cast(columns[2] as int) l_suppkey,
    cast(columns[3] as int) l_linenumber,
    cast(columns[4] as double) l_quantity,
    cast(columns[5] as double) l_extendedprice,
    cast(columns[6] as double) l_discount,
    cast(columns[7] as double) l_tax,
    cast(columns[8] as varchar(200)) l_returnflag,
    cast(columns[9] as varchar(200)) l_linestatus,
    cast(columns[10] as date) l_shipdate,
    cast(columns[11] as date) l_commitdate,
    cast(columns[12] as date) l_receiptdate,
    cast(columns[13] as varchar(200)) l_shipinstruct,
    cast(columns[14] as varchar(200)) l_shipmode,
    cast(columns[15] as varchar(200)) l_comment
from `lineitembig.dat`;
{code}
{code:title=Query 2|borderStyle=solid}
create table lineitem as select
    a.l_orderkey,
    a.l_partkey,
    a.l_suppkey,
    a.l_linenumber,
    a.l_quantity,
    a.l_extendedprice,
    a.l_discount,
    a.l_tax,
    a.l_returnflag,
    a.l_linestatus,
    a.l_shipdate,
    a.l_commitdate,
    a.l_receiptdate,
    a.l_shipinstruct,
    b.l_shipmode,
    b.l_comment
from lineitema a
INNER JOIN
lineitemb b 
ON a.l_orderkey = b.l_orderkey; 
{code}

It seems the original issue cannot be reproduced on Drill 1.11, but this "CONNECTION ERROR"
is related to this ticket, so it was closed as a duplicate of DRILL-5599.

> Deadlock while executing CTAS that runs out of memory
> -----------------------------------------------------
>
>                 Key: DRILL-3665
>                 URL: https://issues.apache.org/jira/browse/DRILL-3665
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Roman
>            Priority: Critical
>             Fix For: Future
>
>         Attachments: drillbit.log.drill-3665, jstack.txt
>
>
> I had a query running out of memory during CTAS and after that drillbit was rendered unusable:
> {code}
> 0: jdbc:drill:schema=dfs> create table lineitem as select
> . . . . . . . . . . . . >     cast(columns[0] as int) l_orderkey,
> . . . . . . . . . . . . >     cast(columns[1] as int) l_partkey,
> . . . . . . . . . . . . >     cast(columns[2] as int) l_suppkey,
> . . . . . . . . . . . . >     cast(columns[3] as int) l_linenumber,
> . . . . . . . . . . . . >     cast(columns[4] as double) l_quantity,
> . . . . . . . . . . . . >     cast(columns[5] as double) l_extendedprice,
> . . . . . . . . . . . . >     cast(columns[6] as double) l_discount,
> . . . . . . . . . . . . >     cast(columns[7] as double) l_tax,
> . . . . . . . . . . . . >     cast(columns[8] as varchar(200)) l_returnflag,
> . . . . . . . . . . . . >     cast(columns[9] as varchar(200)) l_linestatus,
> . . . . . . . . . . . . >     cast(columns[10] as date) l_shipdate,
> . . . . . . . . . . . . >     cast(columns[11] as date) l_commitdate,
> . . . . . . . . . . . . >     cast(columns[12] as date) l_receiptdate,
> . . . . . . . . . . . . >     cast(columns[13] as varchar(200)) l_shipinstruct,
> . . . . . . . . . . . . >     cast(columns[14] as varchar(200)) l_shipmode,
> . . . . . . . . . . . . >     cast(columns[15] as varchar(200)) l_comment
> . . . . . . . . . . . . > from `lineitem.dat`;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Fragment 1:10
> [Error Id: 11084315-5388-4500-b165-642a5f595ebf on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> Here is drill's behavior after that:
> 1. Tried to run: "select * from sys.options" in the same sqlline session - hangs.
> 2. Was able to start sqlline and connect to drillbit:
>         - If you try running anything on this connection: it hangs.
>         - Issue ^C --> you will get result if you are lucky (these queries will appear as: "CANCELLATION_REQUESTED" on WebUI)
>           (I only tried querying sys.memory, sys.options which possibly have a different code path than queries from actual user data)
>         - If you are not lucky, you will get this error below:
> {code}
>         0: jdbc:drill:schema=dfs> show files;
>         java.lang.RuntimeException: java.sql.SQLException: Unexpected RuntimeException: java.lang.IllegalArgumentException: Buffer has negative reference count.
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> or maybe something like this:
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from nation group by n_regionkey;
> Error: CONNECTION ERROR: Exceeded timeout (5000) while waiting send intermediate work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
> [Error Id: 6abce8e9-78a1-4b3d-bcec-503930482b40 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> I'm attaching results of a jstack and drillbit.log and so far I was not able to reproduce this problem again (working on it).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
