drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5488) Useless code in VectorTrimmer, fixed-width vector setValueCount()
Date Mon, 08 May 2017 23:00:07 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001706#comment-16001706
] 

Paul Rogers edited comment on DRILL-5488 at 5/8/17 11:00 PM:
-------------------------------------------------------------

Consider the {{setValueCount()}} method again:

{code}
    public void setValueCount(int valueCount) {
      final int currentValueCapacity = getValueCapacity();
      ...
      while(valueCount > getValueCapacity()) {
        reAlloc();
      }
      if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
        ...
{code}

The {{if}} statement is very probably wrong. We get the current capacity. If too small, we
realloc. Then, we check the value, after the possible reallocation, without updating the capacity.
It is not clear if we are trying to do check the before or after.

Then:

{code}
      if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
        incrementAllocationMonitor(); // is ++allocationMonitor
      } else if (allocationMonitor > 0) {
        allocationMonitor = 0;
      }
{code}

It seems we are trying to guess if we did a realloc rather than simply incrementing the monitor
when we actually do the reAlloc. It is also not clear that the if-statement mimics the if
that does the reAlloc.

In fact, a close inspection of the {{setValueCount()}} method, shows that it does not even
work correctly. It is allocating enough space to hold the value count, which will implicitly
set the unused slots to 0. But, this is an offset vector; we should be copying forward the
last set offset. Consider an example:

\[0, 3, 9] as set values.

Set row count as 5. What we create is:

\[0, 3, 9, 0, 0, 0]

What we should have set is:

\[0, 3, 9, 9, 9, 9]

It seems to be the responsibility of the caller to set the last value, so what we actually
set today, in this case, is:

\[0, 3, 9, 0, 0, 9]

The caveat is that the above works if the caller diligently fills in the offsets row by row.
But, if it die, there would be no reason to extend the vector in {{setValueCount()}}. It has
to be one way or the other, not the ambiguous middle that we have in the code now.


was (Author: paul-rogers):
Consider the {{setValueCount()}} method again:

{code}
    public void setValueCount(int valueCount) {
      final int currentValueCapacity = getValueCapacity();
      ...
      while(valueCount > getValueCapacity()) {
        reAlloc();
      }
      if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
        ...
{code}

The {{if}} statement is very probably wrong. We get the current capacity. If too small, we
realloc. Then, we check the value, after the possible reallocation, without updating the capacity.
It is not clear if we are trying to do check the before or after.

Then:

{code}
      if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
        incrementAllocationMonitor(); // is ++allocationMonitor
      } else if (allocationMonitor > 0) {
        allocationMonitor = 0;
      }
{code}

It seems we are trying to guess if we did a realloc rather than simply incrementing the monitor
when we actually do the reAlloc. It is also not clear that the if-statement mimics the if
that does the reAlloc.

In fact, a close inspection of the {{setValueCount()}} method, shows that it does not even
work correctly. It is allocating enough space to hold the value count, which will implicitly
set the unused slots to 0. But, this is an offset vector; we should be copying forward the
last set offset. Consider an example:

\[0, 3, 9] as set values.

Set row count as 5. What we create is:

\[0, 3, 9, 0, 0, 0]

What we should have set is:

\[0, 3, 9, 9, 9, 9]

It seems to be the responsibility of the caller to set the last value, so what we actually
set today, in this case, is:

\[0, 3, 9, 0, 0, 9]

> Useless code in VectorTrimmer, fixed-width vector setValueCount()
> -----------------------------------------------------------------
>
>                 Key: DRILL-5488
>                 URL: https://issues.apache.org/jira/browse/DRILL-5488
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Trivial
>
> Consider this code in a generated fixed-width vector, such as UInt4Vector:
> {code}
>     @Override
>     public void setValueCount(int valueCount) {
>       ...
>       final int idx = (VALUE_WIDTH * valueCount);
>       ...
>       VectorTrimmer.trim(data, idx);
>       data.writerIndex(valueCount * VALUE_WIDTH);
>     }
> {code}
> Consider the {{trim()}} method:
> {code}
> public class VectorTrimmer {
>   ...
>   public static void trim(ByteBuf data, int idx) {
>     data.writerIndex(idx);
>     if (data instanceof DrillBuf) {
>       // data.capacity(idx);
>       data.writerIndex(idx);
>     }
>   }
> }
> {code}
> This method is called {{trim}}, but it actually sets the writer index in the buffer (though
we never use that index.) Since all buffers we use are {{DrillBuf}}, the if-statement is a
no-op: we simply set the writer index twice.
> But, notice that the {{setValueCount()}} method itself calls the same {{writerIndex()}}
method, so it is actually being called three times.
> It seems this code can simply be discarded: it is called from only two places; neither
of which end up using the writer index.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message