drill-dev mailing list archives

From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3665) Deadlock while executing CTAS that runs out of memory
Date Tue, 18 Aug 2015 17:58:46 GMT
Victoria Markman created DRILL-3665:
---------------------------------------

             Summary: Deadlock while executing CTAS that runs out of memory
                 Key: DRILL-3665
                 URL: https://issues.apache.org/jira/browse/DRILL-3665
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Chris Westin
            Priority: Critical


I ran a CTAS query that ran out of memory, and after that the drillbit became unusable:

{code}
0: jdbc:drill:schema=dfs> create table lineitem as select
. . . . . . . . . . . . >     cast(columns[0] as int) l_orderkey,
. . . . . . . . . . . . >     cast(columns[1] as int) l_partkey,
. . . . . . . . . . . . >     cast(columns[2] as int) l_suppkey,
. . . . . . . . . . . . >     cast(columns[3] as int) l_linenumber,
. . . . . . . . . . . . >     cast(columns[4] as double) l_quantity,
. . . . . . . . . . . . >     cast(columns[5] as double) l_extendedprice,
. . . . . . . . . . . . >     cast(columns[6] as double) l_discount,
. . . . . . . . . . . . >     cast(columns[7] as double) l_tax,
. . . . . . . . . . . . >     cast(columns[8] as varchar(200)) l_returnflag,
. . . . . . . . . . . . >     cast(columns[9] as varchar(200)) l_linestatus,
. . . . . . . . . . . . >     cast(columns[10] as date) l_shipdate,
. . . . . . . . . . . . >     cast(columns[11] as date) l_commitdate,
. . . . . . . . . . . . >     cast(columns[12] as date) l_receiptdate,
. . . . . . . . . . . . >     cast(columns[13] as varchar(200)) l_shipinstruct,
. . . . . . . . . . . . >     cast(columns[14] as varchar(200)) l_shipmode,
. . . . . . . . . . . . >     cast(columns[15] as varchar(200)) l_comment
. . . . . . . . . . . . > from `lineitem.dat`;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Fragment 1:10
[Error Id: 11084315-5388-4500-b165-642a5f595ebf on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

Here is Drill's behavior after that:

1. Ran "select * from sys.options" in the same sqlline session: it hangs.

2. Started a new sqlline session and connected to the drillbit successfully:
        - Any query run on this connection hangs.
        - Pressing ^C sometimes returns a result (these queries appear as "CANCELLATION_REQUESTED" in the Web UI).
          (I only tried querying sys.memory and sys.options, which possibly take a different code path than queries on actual user data.)
        - Other times, you get the error below:
{code}
        0: jdbc:drill:schema=dfs> show files;
        java.lang.RuntimeException: java.sql.SQLException: Unexpected RuntimeException: java.lang.IllegalArgumentException: Buffer has negative reference count.
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Or something like this:

{code}
0: jdbc:drill:schema=dfs> select count(*) from nation group by n_regionkey;
Error: CONNECTION ERROR: Exceeded timeout (5000) while waiting send intermediate work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
[Error Id: 6abce8e9-78a1-4b3d-bcec-503930482b40 on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

I'm attaching the jstack output and drillbit.log. So far I have not been able to reproduce
this problem (working on it).
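
For anyone triaging a jstack dump like the one mentioned above, here is a minimal sketch of a first-pass summary. It is not part of Drill or of this report; the function name `summarize_jstack` is hypothetical. It assumes the standard jstack text format: per-thread lines of the form `java.lang.Thread.State: <STATE>`, and the `Found one Java-level deadlock:` marker that jstack prints when the JVM itself detects a lock cycle. A missing marker does not rule out a hang, since threads may be stuck waiting on something other than a monitor or lock.

```python
import re
from collections import Counter

# Marker jstack prints when the JVM detects a cycle among monitors/locks.
DEADLOCK_RE = re.compile(r"Found (?:one|\d+) Java-level deadlocks?:")
# Per-thread state line, e.g. "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_jstack(dump_text):
    """Return (deadlock_reported, Counter of thread states) for jstack output.

    Hypothetical helper for illustration only.
    """
    deadlock_reported = bool(DEADLOCK_RE.search(dump_text))
    states = Counter(STATE_RE.findall(dump_text))
    return deadlock_reported, states
```

To use it, read the dump file into a string and pass it in; a large number of BLOCKED or WAITING threads with no reported deadlock would suggest looking at what those threads are waiting on.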



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
