drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3874) flattening large JSON objects consumes too much direct memory
Date Fri, 02 Oct 2015 02:00:31 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940700#comment-14940700 ]

ASF GitHub Bot commented on DRILL-3874:

Github user cwestin commented on the pull request:

    Re ObjectVector: I don't know what that's for. I just followed the pattern: getBufferSize()
already throws that exception.
    Re OUTPUT_MEMORY_LIMIT: what do you think? I tend to avoid adding more knobs, but I can
easily do that if you like (with the current 512MB as the default). Let me know soon; I'm about
to kick off testing on Jason's suggested replacement of the getBufferSize() implementations with
calls to getBufferSizeFor(). The problem I see here is that it will affect all flatten()s,
whether they need it or not. And this isn't really the long-term solution, which is
to add projection capabilities so that we're not passing through the original record like
> flattening large JSON objects consumes too much direct memory
> -------------------------------------------------------------
>                 Key: DRILL-3874
>                 URL: https://issues.apache.org/jira/browse/DRILL-3874
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.1.0
>            Reporter: Chris Westin
>            Assignee: Chris Westin
> A JSON record has a field whose value is an array with 20,000 elements; the record's
> size is 4MB. A select is used to flatten this. The query profile reports that the peak memory
> utilization was 8GB, most of it used by the flatten.

This message was sent by Atlassian JIRA
