drill-issues mailing list archives

From "salim achouche (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
Date Tue, 03 Apr 2018 20:36:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424555#comment-16424555 ]

salim achouche commented on DRILL-6202:

This is my take on the Drill boundary checks:

_*Short Term -*_
 * Ideally, the Drill boundary checks should be always on as long as 
 ** The impact of a Drillbit process crash (or data corruption) is big since there is no built-in
 ** The code is extensible and extensions are allowed to access direct memory
 * Having said that, my priority would have been to minimize the cost associated with these
checks instead of completely turning them off
 ** This is no different from Java's behavior with regard to array boundary checks
 * How do we do that? Actually there are multiple strategies (which could be combined)
 ** Fine-grained checks
 *** Add boundary checks within +all+ DrillBuf data accessors
 ***  Invoke the accessor API within a loop and ensure the JVM is able to optimize the checks 
 **** This will help you answer the question(s) around whether we access DM directly or through
 ** Caller override
 *** Allow the caller to disable checks that are deemed too expensive or not easily optimizable
by HotSpot (e.g., Reference Checks)
 *** This pattern works well for a centralized layer (e.g., Paul's accessor framework) but
not for extensions as they cannot be always trusted to do the right thing
 *** To mitigate this, we could always have an auxiliary flag that will force execution of
such checks if set; that is, override untrusted callers
 **** This should be done if a crash or corruption is observed
 ** Bulk Processing
 *** Bulk accessor APIs will allow +all the checks+ to be performed but with a minimal cost
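
The three strategies above can be sketched roughly as follows. This is only an illustration under stated assumptions: CheckedBuffer, forceChecks, and checksPerformed are hypothetical names and not part of DrillBuf or any Drill API, and the check counter exists only so the cost of each strategy is visible. Note also that java.nio.ByteBuffer performs its own bounds checks, so the explicit check here merely models the one being discussed.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a direct-memory buffer; NOT Drill's DrillBuf API. */
final class CheckedBuffer {
  /** Assumed auxiliary flag: when true, checks run even if the caller opted out. */
  static volatile boolean forceChecks = false;

  /** Instrumentation for this sketch only: counts explicit boundary checks. */
  long checksPerformed = 0;

  private final ByteBuffer data;

  CheckedBuffer(int capacityBytes) {
    this.data = ByteBuffer.allocateDirect(capacityBytes);
  }

  /** Fine-grained check: every single access is validated. */
  int getInt(int byteIndex) {
    checkRange(byteIndex, Integer.BYTES);
    return data.getInt(byteIndex);
  }

  /** Caller override: a trusted caller may skip the check unless forceChecks is set. */
  int getInt(int byteIndex, boolean callerWantsCheck) {
    if (callerWantsCheck || forceChecks) {
      checkRange(byteIndex, Integer.BYTES);
    }
    return data.getInt(byteIndex);
  }

  /** Bulk accessor: a single range check covers all count values. */
  void getInts(int byteIndex, int[] dst, int count) {
    if (count < 0 || count > dst.length) {
      throw new IndexOutOfBoundsException("count=" + count + ", dst.length=" + dst.length);
    }
    checkRange(byteIndex, Math.multiplyExact(count, Integer.BYTES));
    for (int i = 0; i < count; i++) {
      dst[i] = data.getInt(byteIndex + i * Integer.BYTES);
    }
  }

  void setInt(int byteIndex, int value) {
    checkRange(byteIndex, Integer.BYTES);
    data.putInt(byteIndex, value);
  }

  /** Validates that [byteIndex, byteIndex + length) lies inside the buffer. */
  private void checkRange(int byteIndex, int length) {
    checksPerformed++;
    if (byteIndex < 0 || length < 0 || byteIndex > data.capacity() - length) {
      throw new IndexOutOfBoundsException(
          "index=" + byteIndex + ", length=" + length + ", capacity=" + data.capacity());
    }
  }
}
```

Reading N values through getInt(int) costs N checks, while getInts(...) validates the whole range once. Setting forceChecks re-enables validation for callers that passed false, which is the escape hatch for diagnosing a crash or corruption.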

_*Long Term -*_
 * With the new Accessor Framework in place all DM checks should be primarily within this layer
 ** The promise of this layer is that other memory formats can be transparently substituted
(e.g., Apache Arrow)
 * The question on whether the runtime checks are enabled by default becomes less important
 ** The chance of crash / corruption is highly minimized
 ** It should be rather easy for this layer to optimize the runtime checks; then the question
becomes "why not?"

_*Question -*_
 * Your Jira doesn't quite explain
 ** Why you intend to deprecate the IndexOutOfBoundsException (since it is an unchecked exception)
 ** What other mechanism would replace it


*NOTE -* 
 * To minimize bookkeeping complexity, Drill operators allocate memory up front for the
variable-length value vectors to minimize the cost of re-allocs
 * The setSafe() APIs are called (at least for Parquet) when the associated column
 ** Has enough VV space to insert the new value(s)
 ** Can extend the current VV to the next power of two; the setSafe() API is responsible for
extending the vector(s)
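
As a rough illustration of the setSafe() behavior described in this note, the sketch below grows a vector to the next power of two using an explicit capacity comparison, so it does not depend on an IndexOutOfBoundsException being thrown. GrowableIntVector, reallocCount, and nextPowerOfTwo are hypothetical names for this example; Drill's real value vectors are backed by direct memory and are more involved.

```java
import java.util.Arrays;

/** Hypothetical, heap-backed stand-in for a value vector; NOT Drill's ValueVector API. */
final class GrowableIntVector {
  private int[] values;

  /** Instrumentation for this sketch only: number of re-allocations so far. */
  int reallocCount = 0;

  GrowableIntVector(int initialCapacity) {
    values = new int[Math.max(1, initialCapacity)];
  }

  int capacity() {
    return values.length;
  }

  int get(int index) {
    return values[index];
  }

  /**
   * Writes the value, first extending the vector when the index does not fit.
   * The decision is an explicit capacity comparison, so it behaves the same
   * whether or not lower-level bounds checking is enabled.
   */
  void setSafe(int index, int value) {
    if (index < 0) {
      throw new IndexOutOfBoundsException("index=" + index);
    }
    if (index >= values.length) {
      reAlloc(nextPowerOfTwo(index + 1));
    }
    values[index] = value;
  }

  private void reAlloc(int newCapacity) {
    values = Arrays.copyOf(values, newCapacity);
    reallocCount++;
  }

  /** Smallest power of two that is >= n, for 1 <= n <= 2^30. */
  static int nextPowerOfTwo(int n) {
    if (n < 1 || n > (1 << 30)) {
      throw new IllegalArgumentException("unsupported size: " + n);
    }
    int highest = Integer.highestOneBit(n);
    return highest == n ? n : highest << 1;
  }
}
```

With an initial capacity of 4, writing index 3 needs no re-alloc, while writing index 4 extends the vector to 8 entries and keeps the existing values.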

> Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
> ----------------------------------------------------------------
>                 Key: DRILL-6202
>                 URL: https://issues.apache.org/jira/browse/DRILL-6202
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>            Priority: Major
>             Fix For: 1.14.0
> As bounds checking may be enabled or disabled, using IndexOutOfBoundsException to resize
> vectors is unreliable. It works only when bounds checking is enabled.

This message was sent by Atlassian JIRA
