drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill issue #1057: DRILL-5993 Append Row Method For VectorContainer (WIP)
Date Wed, 29 Nov 2017 21:50:37 GMT
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1057
  
    To answer the two questions:
    
    1. The copier is used in multiple places, some of which see batches with selection vectors (SVs).
Sort uses a copier to merge rows coming from multiple sorted batches. The selection vector remover (SVR)
compresses out SVs. A filter produces an SV2, which the SVR removes; an in-memory sort produces an SV4.
But, because of the way plans are generated, the hash join will never see a batch with an
SV. (An SVR will be inserted, if needed, to remove the SV.)
    
    2. We never write a batch using an SV. The SV is always a source-side indirection. Because
we do indirection on the source side (and vectors are append-only), there can be no SV on
the destination side.
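    To make point 2 concrete, here is a minimal Java sketch. It is not Drill code: the class and method names are hypothetical, and a plain int[] stands in for a value vector. A filter only records the indexes of qualifying rows in an SV2 (2-byte offsets into one batch); an SVR-style copy then reads through that indirection and appends to a fresh, append-only destination, so the output never needs an SV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model (not Drill's API): a "vector" is an int column.
public class Sv2CompactionSketch {

    // Append-only destination: values can only be added at the end,
    // so there is no place for a selection vector on the write side.
    static final class AppendOnlyColumn {
        private final List<Integer> values = new ArrayList<>();
        void append(int v) { values.add(v); }
        int size() { return values.size(); }
        int get(int i) { return values.get(i); }
    }

    // A filter does not move data; it records offsets of qualifying rows.
    // An SV2 entry is a 16-bit (char) offset into a single batch.
    static char[] filter(int[] source, int threshold) {
        char[] sv2 = new char[source.length];
        int n = 0;
        for (int i = 0; i < source.length; i++) {
            if (source[i] > threshold) {
                sv2[n++] = (char) i;
            }
        }
        return Arrays.copyOf(sv2, n);
    }

    // SVR-style copy: indirection happens only on the read (source) side;
    // the destination is filled strictly in order.
    static AppendOnlyColumn removeSv2(int[] source, char[] sv2) {
        AppendOnlyColumn dest = new AppendOnlyColumn();
        for (char idx : sv2) {
            dest.append(source[idx]);
        }
        return dest;
    }

    public static void main(String[] args) {
        int[] batch = {5, 12, 7, 30, 1};
        char[] sv2 = filter(batch, 6);              // selects rows 1, 2, 3
        AppendOnlyColumn out = removeSv2(batch, sv2);
        System.out.println(out.size());             // 3 rows, no SV needed
    }
}
```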
    
    Note also that the {{VectorContainer}} class, despite its API, knows nothing about SVs.
The SV is tacked on separately by the {{RecordBatch}}. (This is a less-than-ideal design,
but it is how things work at present.) FWIW, the test-oriented {{RowSet}} abstractions came
about as wrappers around both the {{VectorContainer}} and the SV to provide a unified view.
    
    Because of how we do SVs, you'll need three copy methods: one for no SV, one for an SV2
and another for an SV4.
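    A minimal sketch of what the three copy paths might look like, again with hypothetical names and int[] columns standing in for vectors (this is not the PR's code or Drill's copier API). It assumes the SV4 packs a batch index in the upper 16 bits and a row offset in the lower 16 bits. The point is only that each SV mode needs its own source-side indirection loop, while the destination is always appended in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each incoming batch is modeled as an int[] column.
public class ThreeCopierSketch {

    private final List<Integer> dest = new ArrayList<>(); // append-only output

    // No SV: rows are read in physical order from a single batch.
    void copyNoSv(int[] batch, int start, int count) {
        for (int i = start; i < start + count; i++) {
            dest.add(batch[i]);
        }
    }

    // SV2: each entry is a 16-bit row offset into a single batch.
    void copySv2(int[] batch, char[] sv2, int start, int count) {
        for (int i = start; i < start + count; i++) {
            dest.add(batch[sv2[i]]);
        }
    }

    // SV4 (assumed layout): upper 16 bits = batch index, lower 16 bits = row
    // offset, so rows may come from many batches, as after an in-memory sort.
    void copySv4(List<int[]> batches, int[] sv4, int start, int count) {
        for (int i = start; i < start + count; i++) {
            int batchIndex = sv4[i] >>> 16;
            int rowIndex = sv4[i] & 0xFFFF;
            dest.add(batches.get(batchIndex)[rowIndex]);
        }
    }

    // Helper to build an SV4 entry under the assumed packing.
    static int sv4Entry(int batchIndex, int rowIndex) {
        return (batchIndex << 16) | rowIndex;
    }

    List<Integer> output() { return dest; }

    public static void main(String[] args) {
        ThreeCopierSketch copier = new ThreeCopierSketch();
        int[] b0 = {10, 20, 30};
        int[] b1 = {15, 25};
        copier.copyNoSv(b0, 0, 2);                          // 10, 20
        copier.copySv2(b0, new char[] {2, 0}, 0, 2);        // 30, 10
        int[] sv4 = {sv4Entry(0, 0), sv4Entry(1, 0), sv4Entry(0, 1), sv4Entry(1, 1)};
        copier.copySv4(List.of(b0, b1), sv4, 0, 4);         // 10, 15, 20, 25
        System.out.println(copier.output());
    }
}
```

Keeping three separate loops means the per-row work never branches on the SV mode, which is presumably why one method per mode is preferable to a single generic loop.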
    
    In the fullness of time, the new "column reader" and "column writer" abstractions will
hide all this stuff, but it will take time before those tools come online.

