drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill issue #717: DRILL-5080: Memory-managed version of external sort
Date Wed, 08 Feb 2017 08:00:15 GMT
Github user paul-rogers commented on the issue:

    Some comments got lost in the force-push. One concerned the output batch size and suggested
we cap it at 16 MB. The reason is that value vectors above 16 MB cause memory fragmentation.
A later fix will limit an output batch to either 64K rows (the size of an sv2) or the row count
at which the largest vector reaches 16 MB, whichever is smaller. The most recent commit added
per-column size information so that we can enforce this limit. For example, 64K rows of a
256-byte column fit exactly within a 16 MB vector. There is no reason not to allow 64K rows even
for rows with four such 256-byte columns. Total batch size would be 64 MB, but no single vector
would be above 16 MB.
    That fix will be offered, along with tests and enabling the managed sort by default, in
a subsequent PR.
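
    The row limit described above can be sketched as follows. This is only an illustration of the
arithmetic, not the code in the PR: the function and constant names are hypothetical, and it assumes
each column has a known fixed width in bytes, ignoring offset vectors and null bitmaps.

```python
# Illustrative sketch only; these names are hypothetical and not part of Drill.
MAX_ROWS = 64 * 1024                  # 65,536 rows, the most an sv2 can address
MAX_VECTOR_BYTES = 16 * 1024 * 1024   # 16 MB cap on any single value vector


def max_batch_rows(column_widths):
    """Return the largest row count at which no vector exceeds the cap.

    column_widths: per-column width in bytes (assumed fixed per row).
    """
    if not column_widths:
        return MAX_ROWS
    widest = max(column_widths)
    if widest <= 0:
        raise ValueError("column widths must be positive")
    # The widest column fills its vector first, so it alone sets the limit.
    # Always allow at least one row, even if a single value exceeds the cap.
    return max(1, min(MAX_ROWS, MAX_VECTOR_BYTES // widest))


def batch_bytes(column_widths, rows):
    """Total data bytes across all vectors for the given row count."""
    return rows * sum(column_widths)


rows = max_batch_rows([256, 256, 256, 256])
total_mb = batch_bytes([256, 256, 256, 256], rows) // (1024 * 1024)
```

With four 256-byte columns this allows the full 64K rows: each vector is exactly 16 MB and the batch
totals 64 MB. A 1,024-byte column would instead limit the batch to 16K rows.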

