spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From nongli <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-12879][SQL] improve the unsafe row writ...
Date Mon, 18 Jan 2016 19:13:41 GMT
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10809#discussion_r50034304
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
---
    @@ -26,36 +26,44 @@
     import org.apache.spark.unsafe.types.UTF8String;
     
     /**
    - * A helper class to write data into global row buffer using `UnsafeRow` format,
    - * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + * A helper class to write data into global row buffer using `UnsafeRow` format.
    + *
    + * It will remember the offset of row buffer which it starts to write, and move the cursor
of row
    + * buffer while writing.  If a new record comes, the cursor of row buffer will be reset,
so we need
    + * to also call `reset` of this class before writing, to update the `startingOffset`
and clear out
    + * null bits.  Note that if we use it to write data into the result unsafe row, which
means we will
    + * always write from the very beginning of the global row buffer, we don't need to update
    + * `startingOffset` and can just call `zeroOutNullBites` before writing new record.
      */
     public class UnsafeRowWriter {
     
    -  private BufferHolder holder;
    +  private final BufferHolder holder;
       // The offset of the global buffer where we start to write this row.
       private int startingOffset;
    -  private int nullBitsSize;
    -  private UnsafeRow row;
    +  private final int nullBitsSize;
    +  private final int fixedSize;
     
    -  public void initialize(BufferHolder holder, int numFields) {
    -    this.holder = holder;
    +  public void reset() {
         this.startingOffset = holder.cursor;
    -    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
     
         // grow the global buffer to make sure it has enough space to write fixed-length
data.
    -    final int fixedSize = nullBitsSize + 8 * numFields;
    -    holder.grow(fixedSize, row);
    +    holder.grow(fixedSize);
         holder.cursor += fixedSize;
     
    -    // zero-out the null bits region
    +    zeroOutNullBites();
    +  }
    +
    +  public void zeroOutNullBites() {
         for (int i = 0; i < nullBitsSize; i += 8) {
           Platform.putLong(holder.buffer, startingOffset + i, 0L);
         }
       }
     
    -  public void initialize(UnsafeRow row, BufferHolder holder, int numFields) {
    -    initialize(holder, numFields);
    -    this.row = row;
    +  public UnsafeRowWriter(BufferHolder holder, int numFields) {
    --- End diff --
    
    can you remove numFields. The holder contains a row and there is only one valid value
for numFields. Better to pull it from holder.row


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message