drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
Date Thu, 17 Sep 2015 22:24:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804637#comment-14804637 ]

Steven Phillips commented on DRILL-3229:

Basic design outline:

A Union type represents a field where the type can vary between records. The data for a field
of type Union will be stored in a UnionVector.

h4. UnionVector
	Internally uses a MapVector to hold the vectors for the various types. The types include
all of the MinorTypes, including List and Map.
	For example, the internal MapVector will have a subfield named "bigInt", which will refer
to a NullableBigIntVector.

	In addition to the vectors corresponding to the minor types, there will be two additional
fields, both represented by UInt1Vectors. These are
	"bits" and "types", which will represent the nullability and types of the underlying data.
The "bits" vector will work the same way it works in other
	nullable vectors. The "types" vector will store the number corresponding to the value of
the MinorType as defined in the protobuf definition. There
	will be mutator methods for setting null and type.
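
	The layout above can be sketched roughly as follows, with plain Java arrays standing in for Drill's value vectors. The class SimpleUnionVector, its method names, and the numeric type codes are hypothetical and chosen only for illustration; they are not Drill's API, and the codes are not the protobuf MinorType values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the UnionVector layout described above.
// Plain arrays replace Drill's value vectors.
class SimpleUnionVector {
  static final byte BIGINT = 1;   // illustrative code, not the MinorType number
  static final byte VARCHAR = 2;  // illustrative code, not the MinorType number

  private final byte[] bits;   // 1 = value present, 0 = null
  private final byte[] types;  // type code of the value stored at each index
  // Stand-in for the internal MapVector: one child "vector" per type name.
  private final Map<String, Object[]> children = new LinkedHashMap<>();

  SimpleUnionVector(int capacity) {
    bits = new byte[capacity];
    types = new byte[capacity];
    children.put("bigInt", new Object[capacity]);
    children.put("varChar", new Object[capacity]);
  }

  // Mutator methods for setting null and type.
  void setNull(int index) { bits[index] = 0; }
  void setType(int index, byte type) { types[index] = type; }

  void setBigInt(int index, long value) {
    setType(index, BIGINT);
    bits[index] = 1;
    children.get("bigInt")[index] = value;
  }

  void setVarChar(int index, String value) {
    setType(index, VARCHAR);
    bits[index] = 1;
    children.get("varChar")[index] = value;
  }

  boolean isNull(int index) { return bits[index] == 0; }
  byte getType(int index) { return types[index]; }
  Object getChildValue(String child, int index) { return children.get(child)[index]; }
}

class SimpleUnionVectorDemo {
  public static void main(String[] args) {
    SimpleUnionVector v = new SimpleUnionVector(3);
    v.setBigInt(0, 42L);
    v.setVarChar(1, "drill");
    v.setNull(2);
    for (int i = 0; i < 3; i++) {
      System.out.println(i + ": null=" + v.isNull(i) + ", type=" + v.getType(i));
    }
  }
}
```

	Each record occupies the same index in every child array, so only the child selected by "types" holds a meaningful value at that index.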

h4. UnionWriter
	The UnionWriter implements and overrides all of the methods of FieldWriter. It holds field
writers corresponding to each of the types included in the underlying
	UnionVector, and delegates the method calls for each type to the corresponding writer. For
example, the BigIntWriter interface:

public interface BigIntWriter extends BaseWriter {
  public void write(BigIntHolder h);

  public void writeBigInt(long value);
}

	UnionWriter overrides these methods:

  public void writeBigInt(long value) {
    data.getMutator().setType(idx(), MinorType.BIGINT);
    // ... then delegate to the underlying BigIntWriter
  }

  public void write(BigIntHolder h) {
    data.getMutator().setType(idx(), MinorType.BIGINT);
    // ... then delegate to the underlying BigIntWriter
  }

	This requires users of the interface to go through the UnionWriter, rather than using the
underlying BigIntWriter directly. Otherwise, the "types" and "bits" vectors would not get set.
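
	A self-contained sketch of this set-type-then-delegate pattern, and of what goes wrong when the type-specific writer is used directly. The names BigIntSink, ArrayBigIntWriter and TypeTaggingWriter, and the type codes, are hypothetical stand-ins and not Drill classes.

```java
// Hypothetical sketch of the set-type-then-delegate pattern; not Drill classes.
interface BigIntSink {
  void setPosition(int index);
  void writeBigInt(long value);
}

// Stand-in for a type-specific writer: stores values, knows nothing about types.
class ArrayBigIntWriter implements BigIntSink {
  final long[] values;
  private int position;

  ArrayBigIntWriter(int capacity) { values = new long[capacity]; }

  @Override public void setPosition(int index) { position = index; }
  @Override public void writeBigInt(long value) { values[position] = value; }
}

// Stand-in for the union-level writer: records the type, then delegates.
class TypeTaggingWriter implements BigIntSink {
  static final byte UNSET = 0;   // illustrative code
  static final byte BIGINT = 1;  // illustrative code, not the MinorType number

  final byte[] types;
  final ArrayBigIntWriter bigIntWriter;
  private int position;

  TypeTaggingWriter(int capacity) {
    types = new byte[capacity];
    bigIntWriter = new ArrayBigIntWriter(capacity);
  }

  @Override public void setPosition(int index) { position = index; }

  @Override public void writeBigInt(long value) {
    types[position] = BIGINT;            // record the type first
    bigIntWriter.setPosition(position);  // keep the delegate aligned
    bigIntWriter.writeBigInt(value);     // then delegate the actual write
  }
}

class TypeTaggingWriterDemo {
  public static void main(String[] args) {
    TypeTaggingWriter writer = new TypeTaggingWriter(2);
    writer.setPosition(0);
    writer.writeBigInt(7L);               // goes through the union-level writer

    writer.bigIntWriter.setPosition(1);
    writer.bigIntWriter.writeBigInt(9L);  // bypasses it: value stored, type not recorded

    System.out.println("types[0]=" + writer.types[0] + ", types[1]=" + writer.types[1]);
  }
}
```

	In the second write the value lands in the child storage, but its type code stays unset, so a reader that dispatches on the type would not find it.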

h4. UnionReader
	Much the same as the UnionWriter, the UnionReader overrides the methods of FieldReader,
and delegates to the corresponding type-specific FieldReader implementation, depending on which type
the current value is.
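
	The read side can be sketched in the same spirit: look up the type code at the current position and hand the call to the matching type-specific reader. TypedReader, the two array-backed readers, DispatchingReader and the type codes are all hypothetical names for illustration, not Drill's FieldReader API.

```java
// Hypothetical sketch of type-based dispatch on read; not Drill classes.
interface TypedReader {
  Object read(int index);
}

class BigIntArrayReader implements TypedReader {
  private final long[] values;
  BigIntArrayReader(long[] values) { this.values = values; }
  @Override public Object read(int index) { return values[index]; }
}

class VarCharArrayReader implements TypedReader {
  private final String[] values;
  VarCharArrayReader(String[] values) { this.values = values; }
  @Override public Object read(int index) { return values[index]; }
}

class DispatchingReader {
  static final byte BIGINT = 1;   // illustrative code, not the MinorType number
  static final byte VARCHAR = 2;  // illustrative code, not the MinorType number

  private final byte[] bits;   // 1 = value present, 0 = null
  private final byte[] types;  // type code per index
  private final TypedReader bigIntReader;
  private final TypedReader varCharReader;
  private int position;

  DispatchingReader(byte[] bits, byte[] types, long[] bigInts, String[] varChars) {
    this.bits = bits;
    this.types = types;
    this.bigIntReader = new BigIntArrayReader(bigInts);
    this.varCharReader = new VarCharArrayReader(varChars);
  }

  void setPosition(int index) { position = index; }
  boolean isSet() { return bits[position] == 1; }

  // Delegates to the reader that matches the type of the current value.
  Object readObject() {
    if (!isSet()) {
      return null;
    }
    switch (types[position]) {
      case BIGINT:
        return bigIntReader.read(position);
      case VARCHAR:
        return varCharReader.read(position);
      default:
        throw new IllegalStateException("Unknown type code " + types[position]);
    }
  }
}

class DispatchingReaderDemo {
  public static void main(String[] args) {
    DispatchingReader reader = new DispatchingReader(
        new byte[] {1, 1, 0},
        new byte[] {DispatchingReader.BIGINT, DispatchingReader.VARCHAR, 0},
        new long[] {5L, 0L, 0L},
        new String[] {null, "drill", null});
    for (int i = 0; i < 3; i++) {
      reader.setPosition(i);
      System.out.println(i + ": " + reader.readObject());
    }
  }
}
```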

h4. UnionListVector
	UnionListVector extends BaseRepeatedVector. It works much the same as other Repeated vectors;
there is a data vector and an offset vector. The data vector in this case is a UnionVector.
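
	As a rough illustration of the data-plus-offsets arrangement, the sketch below keeps one flat list of values and an offsets array in which list i spans offsets[i] (inclusive) to offsets[i + 1] (exclusive). That convention, the class SimpleListVector, and the requirement that lists be written in index order are assumptions made for the example; a java.util.List stands in for the UnionVector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a repeated vector: flat data plus an offsets array.
// Assumes lists are written in increasing index order.
class SimpleListVector {
  private final int[] offsets;                          // offsets[i]..offsets[i + 1] bounds list i
  private final List<Object> data = new ArrayList<>();  // stand-in for the UnionVector

  SimpleListVector(int capacity) {
    offsets = new int[capacity + 1];
  }

  // Starts list `index` as empty, beginning where the previous list ended.
  void startNewValue(int index) {
    offsets[index + 1] = offsets[index];
  }

  // Appends a value of any type to list `index` and extends its end offset.
  void add(int index, Object value) {
    data.add(value);
    offsets[index + 1]++;
  }

  // Returns a copy of the values belonging to list `index`.
  List<Object> get(int index) {
    return new ArrayList<>(data.subList(offsets[index], offsets[index + 1]));
  }
}

class SimpleListVectorDemo {
  public static void main(String[] args) {
    SimpleListVector v = new SimpleListVector(3);
    v.startNewValue(0);
    v.add(0, 1L);
    v.add(0, "a");
    v.startNewValue(1);  // an empty list
    v.startNewValue(2);
    v.add(2, 2.5);
    for (int i = 0; i < 3; i++) {
      System.out.println(i + ": " + v.get(i));
    }
  }
}
```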

h4. UnionListWriter
	The UnionListWriter overrides all FieldWriter methods. When starting a new list, the startList()
method is called. This calls the startNewValue(int index) method
	of the underlying UnionListVector.Mutator. Subsequent calls to the ListWriter methods
(such as bigInt()) return the UnionListWriter itself, and calls to write are handled by calling
	the appropriate method on the underlying UnionListVector.Mutator, which handles updating
the offset vector.

	In the case that the map() method is called (i.e. repeated map), the UnionListWriter is itself
returned, but a state variable is updated to indicate that it should operate as a MapWriter.
	While in MapWriter mode, calls to the MapWriter methods will also return the UnionListWriter
itself, but will also update the field indicating what the name of the current field is.
	Subsequent calls to the ScalarWriter methods will write to the underlying UnionVector using
the UnionWriter interface.

	For example,

UnionListWriter list;
list.startList();
list.map().bigInt("a").writeBigInt(1);


	This code first indicates that a new list is starting. By doing this, the offset vector is
correctly set. Calling map() sets the internal state of the writer to "MAP". bigInt("a") sets
the current
	field of the writer to "a", and writeBigInt(1) writes the value 1 to the underlying UnionVector.
	Another example:

MapWriter mapWriter = list.map().map("a");

	In this case, the final call to map("a") delegates to the underlying UnionWriter, and returns
a new MapWriter, with the position set according to the current offset.
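
	The state-variable approach described in this section can be sketched with a small fluent writer that always returns itself and tracks whether it is acting as a list writer or a map writer. FluentListWriter and its result() accessor are hypothetical, nested Java collections stand in for the vectors, and the nested map("a") case is not covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a writer that returns itself and tracks its mode.
// Not Drill's UnionListWriter; collections stand in for the vectors.
class FluentListWriter {
  private enum Mode { LIST, MAP }

  private Mode mode = Mode.LIST;
  private String currentField;            // name of the current map field
  private List<Object> currentList;       // list currently being written
  private Map<String, Object> currentMap; // map currently being written
  private final List<List<Object>> lists = new ArrayList<>();

  FluentListWriter startList() {
    currentList = new ArrayList<>();
    lists.add(currentList);
    mode = Mode.LIST;
    return this;
  }

  // Switches to map mode and adds a new map element to the current list.
  FluentListWriter map() {
    requireList();
    currentMap = new LinkedHashMap<>();
    currentList.add(currentMap);
    mode = Mode.MAP;
    return this;
  }

  // List-style call: the next write appends a scalar to the current list.
  FluentListWriter bigInt() {
    mode = Mode.LIST;
    return this;
  }

  // Map-style call: remembers which field the next write targets.
  FluentListWriter bigInt(String name) {
    if (mode != Mode.MAP) {
      throw new IllegalStateException("call map() first");
    }
    currentField = name;
    return this;
  }

  void writeBigInt(long value) {
    requireList();
    if (mode == Mode.MAP) {
      currentMap.put(currentField, value);
    } else {
      currentList.add(value);
    }
  }

  List<List<Object>> result() { return lists; }

  private void requireList() {
    if (currentList == null) {
      throw new IllegalStateException("call startList() first");
    }
  }
}

class FluentListWriterDemo {
  public static void main(String[] args) {
    FluentListWriter list = new FluentListWriter();
    list.startList();
    list.map().bigInt("a").writeBigInt(1);  // a list holding one map
    list.startList();
    list.bigInt().writeBigInt(2);           // a list holding one scalar
    System.out.println(list.result());
  }
}
```

	Returning the same object from every call avoids allocating a writer per nesting level, at the cost of the caller having to respect the call order that the mode implies.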

> Create a new EmbeddedVector
> ---------------------------
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
> Embedded Vector will leverage a binary encoding for holding information about type for each individual field.
