drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets
Date Wed, 12 Apr 2017 00:33:42 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965209#comment-15965209 ]

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110783017
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    --- End diff --
    
    This can throw IndexOutOfBoundsException. Check the index against columns.size() before calling columns.get().
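A minimal sketch of the check the reviewer suggests. It assumes an out-of-range index should return null, in the same way HashMap.get returns null for an absent key, rather than throw. The get(String) and count() methods are hypothetical additions for illustration only; the quoted diff is truncated before any such methods appear. In test code, throwing an exception with a descriptive message would be an equally reasonable choice, since failing fast usually makes a broken test easier to diagnose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a tuple name space with bounds-checked indexed access.
 * Assumption: out-of-range indexes and unknown names both yield null.
 */
class NameSpace<T> {
  private final Map<String, Integer> nameSpace = new HashMap<>();
  private final List<T> columns = new ArrayList<>();

  /** Adds a column and returns its index (same logic as the diff). */
  public int add(String key, T value) {
    int index = columns.size();
    nameSpace.put(key, index);
    columns.add(value);
    return index;
  }

  /** Indexed access: checks the index against columns.size() first. */
  public T get(int index) {
    if (index < 0 || index >= columns.size()) {
      return null; // assumption: treat a bad index like a missing name
    }
    return columns.get(index);
  }

  /** Hypothetical named access: returns null if the name was never added. */
  public T get(String key) {
    Integer index = nameSpace.get(key);
    return index == null ? null : columns.get(index);
  }

  /** Hypothetical helper: number of columns in the name space. */
  public int count() {
    return columns.size();
  }
}
```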


> Provide test tools to create, populate and compare row sets
> -----------------------------------------------------------
>
>                 Key: DRILL-5323
>                 URL: https://issues.apache.org/jira/browse/DRILL-5323
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
>
> Operators work with individual row sets. A row set is a collection of records stored
> as column vectors. (Drill uses various terms for this concept. A record batch is a row set
> with an operator implementation wrapped around it. A vector container is a row set, but with
> much functionality left as an exercise for the developer. And so on.)
> To simplify tests, we need a {{TestRowSet}} concept that wraps a {{VectorContainer}}
> and provides easy ways to:
> * Define a schema for the row set.
> * Create a set of vectors that implement the schema.
> * Populate the row set with test data via code.
> * Add an SV2 to the row set.
> * Pass the row set to operator components (such as generated code blocks).
> * Compare the results of the operation with an expected result set.
> * Dispose of the underlying direct memory when work is done.
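To illustrate the row set concept described above (records stored column by column, with a schema, populated via code, and compared against an expected result), here is a hypothetical heap-based sketch. SimpleRowSet and its methods are invented names. They are not Drill's TestRowSet or VectorContainer API, and the sketch does not model direct memory, value vectors, or SV2 selection vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, heap-based illustration of a row set: records stored
 * column by column, one list ("vector") per column. This is NOT Drill's
 * TestRowSet or VectorContainer API; it only models the concept.
 */
class SimpleRowSet {
  private final List<String> schema;        // column names, in order
  private final List<List<Object>> columns; // one "vector" per column

  /** Defines the schema and creates one empty column per name. */
  SimpleRowSet(String... columnNames) {
    schema = Arrays.asList(columnNames);
    columns = new ArrayList<>();
    for (int i = 0; i < columnNames.length; i++) {
      columns.add(new ArrayList<>());
    }
  }

  /** Populates one record by writing each value into its column. */
  SimpleRowSet addRow(Object... values) {
    if (values.length != schema.size()) {
      throw new IllegalArgumentException(
          "Expected " + schema.size() + " values, got " + values.length);
    }
    for (int i = 0; i < values.length; i++) {
      columns.get(i).add(values[i]);
    }
    return this;
  }

  /** Number of records; every column holds the same number of values. */
  int rowCount() {
    return columns.isEmpty() ? 0 : columns.get(0).size();
  }

  /** Reads a single value by row and column position. */
  Object get(int row, int col) {
    return columns.get(col).get(row);
  }

  /** True if the schema and every value match the expected row set. */
  boolean matches(SimpleRowSet expected) {
    return schema.equals(expected.schema) && columns.equals(expected.columns);
  }
}
```

In a real Drill test the columns would be value vectors backed by direct memory, which is why the issue lists explicit disposal as a requirement; the list-based sketch sidesteps that concern entirely.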



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
