spark-reviews mailing list archives

From kiszk <...@git.apache.org>
Subject [GitHub] spark pull request #19842: [SPARK-22643][SQL] ColumnarArray should be an imm...
Date Wed, 29 Nov 2017 15:56:35 GMT
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19842#discussion_r153829687
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -175,9 +175,7 @@ public ColumnarRow getStruct(int rowId, int size) {
        * Returns the array at rowid.
        */
       public final ColumnarArray getArray(int rowId) {
    -    resultArray.length = getArrayLength(rowId);
    -    resultArray.offset = getArrayOffset(rowId);
    -    return resultArray;
    +    return new ColumnarArray(arrayData(), getArrayOffset(rowId), getArrayLength(rowId));
    --- End diff --
    
    Would it be better to create the `ColumnarArray` for each `rowId` only once (e.g. by caching it)?
I am curious whether allocating a new `ColumnarArray` on every `getArray` call adds measurable
overhead when accessing elements of a multi-dimensional array (e.g. `a[1][2] + a[1][3]`, which
looks up `a[1]` twice).
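
To make the question concrete, here is a minimal sketch of the caching idea. It is not Spark code: `ArrayView`, `CachingVector`, the `allocations` counter, and the flat `int[]` layout are all hypothetical stand-ins for `ColumnarArray` and `ColumnVector`, chosen only so the example is self-contained. It assumes the view is immutable, so returning the same instance twice is safe.

```java
// Hypothetical stand-in for an immutable array view (NOT Spark's ColumnarArray).
final class ArrayView {
    private final int[] data;
    private final int offset;
    private final int length;

    ArrayView(int[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
            throw new IndexOutOfBoundsException("ordinal: " + ordinal);
        }
        return data[offset + ordinal];
    }
}

// Hypothetical stand-in for a column vector (NOT Spark's ColumnVector).
// It remembers the view for the most recently requested rowId only.
final class CachingVector {
    private final int[] data;     // flattened element storage
    private final int[] offsets;  // start of each row's array in data
    private final int[] lengths;  // length of each row's array
    private int cachedRowId = -1;
    private ArrayView cachedView;
    private int allocations = 0;  // counts views created, for illustration only

    CachingVector(int[] data, int[] offsets, int[] lengths) {
        this.data = data;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    ArrayView getArray(int rowId) {
        // Allocate only when the requested row differs from the cached one.
        if (rowId != cachedRowId) {
            cachedView = new ArrayView(data, offsets[rowId], lengths[rowId]);
            cachedRowId = rowId;
            allocations++;
        }
        return cachedView;
    }

    int allocations() {
        return allocations;
    }
}

final class CachingSketch {
    public static void main(String[] args) {
        // Row 0 holds [1, 2]; row 1 holds [10, 20, 30, 40].
        CachingVector a = new CachingVector(
            new int[] {1, 2, 10, 20, 30, 40},
            new int[] {0, 2},
            new int[] {2, 4});

        // Analogue of a[1][2] + a[1][3]: row 1 is looked up twice.
        int sum = a.getArray(1).getInt(2) + a.getArray(1).getInt(3);
        System.out.println("sum = " + sum);
        System.out.println("views allocated = " + a.allocations());
    }
}
```

The trade-off in this sketch is that the cache adds mutable state to the vector, so it would not be safe to share across threads without extra care, and it only helps when the same row is requested back to back. The JVM's escape analysis may already eliminate some short-lived allocations, so a benchmark would be needed before concluding that caching is worthwhile.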


---


