drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
Date Wed, 29 Nov 2017 21:55:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271600#comment-16271600 ]

ASF GitHub Bot commented on DRILL-5993:

Github user paul-rogers commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java
    @@ -353,6 +353,23 @@ public int getRecordCount() {
       public boolean hasRecordCount() { return recordCount != -1; }
    +  /**
    +   * This works with non-hyper {@link VectorContainer}s which have no selection vectors.
    +   * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}.
    +   * @param srcContainer The {@link VectorContainer} to copy a row from.
    +   * @param srcIndex The index of the row to copy from the source {@link VectorContainer}.
    +   */
    +  public void appendRow(VectorContainer srcContainer, int srcIndex) {
    +    for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) {
    +      ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
    +      ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector();
    +      destVector.copyEntry(recordCount, srcVector, srcIndex);
    +    }
    +    recordCount++;
    --- End diff --
    This is OK for a row-by-row copy. But you'll get better performance if you optimize for
the entire batch. Because you have no SV4, the source and dest batches have the same vector
layout, so the vectors can be preloaded into an array of vectors once to avoid repeating the
vector wrapper lookup per column on every row.
    Plus, if the code is written per batch, you can go a step further: vectorize the operation.
Copy all values for column 1, then all for column 2, and so on. (In this case, you only get
each vector once, so sticking with the wrappers is fine.) By copying column by column, you may
get the cache-locality benefit that Drill promises from its vectorized operations. Worth a try
to see if you get any speed-up.
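
    To make the two copy orders concrete, here is a minimal, self-contained sketch. It does
not use Drill's classes: `IntColumn` and `Batch` are hypothetical stand-ins for `ValueVector`
and `VectorContainer`, and `appendRows` is an assumed name for a batch-level method, not part
of the pull request. Only the loop structure is meant to carry over.

```java
import java.util.Arrays;

public class ColumnarCopySketch {

  /** Hypothetical stand-in for a fixed-width ValueVector of ints. */
  static final class IntColumn {
    private int[] values = new int[16];

    private void ensureCapacity(int index) {
      if (index >= values.length) {
        values = Arrays.copyOf(values, Math.max(values.length * 2, index + 1));
      }
    }

    void set(int index, int value) {
      ensureCapacity(index);
      values[index] = value;
    }

    int get(int index) {
      return values[index];
    }

    /** Same shape as the copyEntry call in the diff: (destIndex, source, srcIndex). */
    void copyEntry(int destIndex, IntColumn src, int srcIndex) {
      set(destIndex, src.values[srcIndex]);
    }
  }

  /** Hypothetical stand-in for a non-hyper container with no selection vector. */
  static final class Batch {
    final IntColumn[] columns; // preloaded once, so there is no per-row lookup
    int recordCount;

    Batch(int columnCount) {
      columns = new IntColumn[columnCount];
      for (int c = 0; c < columnCount; c++) {
        columns[c] = new IntColumn();
      }
    }

    /** Row-by-row order: touches every column for each appended row. */
    void appendRow(Batch src, int srcIndex) {
      for (int c = 0; c < columns.length; c++) {
        columns[c].copyEntry(recordCount, src.columns[c], srcIndex);
      }
      recordCount++;
    }

    /** Column-by-column order: finishes one column before moving to the next. */
    void appendRows(Batch src, int srcStart, int count) {
      for (int c = 0; c < columns.length; c++) {
        IntColumn dest = columns[c];
        IntColumn source = src.columns[c];
        for (int i = 0; i < count; i++) {
          dest.copyEntry(recordCount + i, source, srcStart + i);
        }
      }
      recordCount += count;
    }
  }

  public static void main(String[] args) {
    Batch src = new Batch(2);
    for (int r = 0; r < 3; r++) {
      src.columns[0].set(r, r + 1);
      src.columns[1].set(r, (r + 1) * 10);
    }
    src.recordCount = 3;

    Batch dest = new Batch(2);
    dest.appendRows(src, 1, 2); // append source rows 1 and 2
    System.out.println(dest.recordCount + " rows, last = "
        + dest.columns[0].get(1) + "," + dest.columns[1].get(1));
  }
}
```

    Both methods produce the same destination contents; they differ only in traversal
order. Whether the column-wise order is actually faster in Drill would have to be measured.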

> Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
> ------------------------------------------------------------------------
>                 Key: DRILL-5993
>                 URL: https://issues.apache.org/jira/browse/DRILL-5993
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
> Currently the copier can only copy a record from an incoming batch to the beginning of
an outgoing batch. We need to be able to copy a record and append it to the end of the outgoing

