spark-reviews mailing list archives

From gatorsmile <...@git.apache.org>
Subject [GitHub] spark pull request #20116: [SPARK-20960][SQL] make ColumnVector public
Date Wed, 03 Jan 2018 16:43:19 GMT
GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20116#discussion_r159470325
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -14,32 +14,39 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.spark.sql.execution.vectorized;
    +package org.apache.spark.sql.vectorized;
     
     import org.apache.spark.sql.catalyst.util.MapData;
     import org.apache.spark.sql.types.DataType;
     import org.apache.spark.sql.types.Decimal;
     import org.apache.spark.unsafe.types.UTF8String;
     
     /**
    - * This class represents in-memory values of a column and provides the main APIs to access the data.
    - * It supports all the types and contains get APIs as well as their batched versions. The batched
    - * versions are considered to be faster and preferable whenever possible.
    + * An interface representing in-memory columnar data in Spark. This interface defines the main APIs
    + * to access the data, as well as their batched versions. The batched versions are considered to be
    + * faster and preferable whenever possible.
      *
    - * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these
    - * columns have child columns. All of the data are stored in the child columns and the parent column
    - * only contains nullability. In the case of Arrays, the lengths and offsets are saved in the child
    - * column and are encoded identically to INTs.
    + * Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values
    + * in this ColumnVector.
      *
    - * Maps are just a special case of a two field struct.
    + * ColumnVector supports all the data types including nested types. To handle nested types,
    + * ColumnVector can have children and is a tree structure. For struct type, it stores the actual
    + * data of each field in the corresponding child ColumnVector, and only store null information in
    --- End diff --
    
    Nit: `only store` -> `only stores`

