drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
Date Wed, 14 Mar 2018 04:10:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398069#comment-16398069
] 

ASF GitHub Bot commented on DRILL-6234:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1164#discussion_r174348633
  
    --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java
---
    @@ -0,0 +1,152 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.vector;
    +
    +import org.apache.drill.common.types.TypeProtos;
    +import org.apache.drill.common.types.Types;
    +import org.apache.drill.exec.memory.RootAllocator;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +/**
    + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector.
    + */
    +public class VariableLengthVectorTest
    +{
    +  /**
    +   * If the vector contains 1000 records, setting a value count of 1000 should work.
    +   */
    +  @Test
    +  public void testSettingSameValueCount()
    +  {
    +    try (RootAllocator allocator = new RootAllocator(10_000_000)) {
    +      final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR));
    +      final VarCharVector vector = new VarCharVector(field, allocator);
    +
    +      vector.allocateNew();
    +
    +      try {
    +        final int size = 1000;
    +        final VarCharVector.Mutator mutator = vector.getMutator();
    +        final VarCharVector.Accessor accessor = vector.getAccessor();
    +
    +        setSafeIndexStrings("", 0, size, mutator);
    +
    +        mutator.setValueCount(size);
    +        Assert.assertEquals(size, accessor.getValueCount());
    +        checkIndexStrings("", 0, size, accessor);
    +      } finally {
    +        vector.clear();
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Test trunicating data. If you have 10000 records, reduce the vector to 1000 records.
    +   */
    +  @Test
    +  public void testTrunicateVectorSetValueCount()
    +  {
    +    try (RootAllocator allocator = new RootAllocator(10_000_000)) {
    +      final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR));
    +      final VarCharVector vector = new VarCharVector(field, allocator);
    +
    +      vector.allocateNew();
    +
    +      try {
    +        final int size = 1000;
    +        final int fluffSize = 10000;
    +        final VarCharVector.Mutator mutator = vector.getMutator();
    +        final VarCharVector.Accessor accessor = vector.getAccessor();
    +
    +        setSafeIndexStrings("", 0, size, mutator);
    +        setSafeIndexStrings("first cut ", size, fluffSize, mutator);
    +
    +        mutator.setValueCount(fluffSize);
    +        Assert.assertEquals(fluffSize, accessor.getValueCount());
    +
    +        mutator.setValueCount(size);
    +        Assert.assertEquals(size, accessor.getValueCount());
    +        setSafeIndexStrings("redone cut ", size, fluffSize, mutator);
    +        mutator.setValueCount(fluffSize);
    +        Assert.assertEquals(fluffSize, accessor.getValueCount());
    +
    +        mutator.setValueCount(size);
    +        Assert.assertEquals(size, accessor.getValueCount());
    +
    +        checkIndexStrings("", 0, size, accessor);
    +
    +      } finally {
    +        vector.clear();
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Set 10000 values. Then go back and set new values starting at the 1001 the record.
    --- End diff --
    
    I agree the vector writers should be used. The reason why I was looking into this is that
I saw strange behavior in the legacy HashTable where setValueCount was being called with a
larger valueCount than there was data in a VarCharVector. I did an ugly (and now I think incorrect
work around) for the issue and set about to make setValueCount support setting a valueCount
larger than the number elements in the VarCharVector. Now I am realizing the issue is with
the HashTableTemplate and I need to look into why it is behaving incorrectly.
    
    Your review has been very helpful, I have a much better understanding of how value vectors
should be used now. Thanks!


> Improve Documentation of VariableWidthVector Behavior
> -----------------------------------------------------
>
>                 Key: DRILL-6234
>                 URL: https://issues.apache.org/jira/browse/DRILL-6234
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>       final VarCharVector vector = new VarCharVector(field, allocator);
>       vector.allocateNew();
>       vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is uninitialized
you should not call get for that index.
> {code}
> 	at org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
> 	at org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
> 	at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
> 	at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
> 	at org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
> 	at org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
> 	at org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> 	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> 	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> 	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message