drill-issues mailing list archives

From "salim achouche (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
Date Tue, 03 Apr 2018 20:36:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424555#comment-16424555 ]

salim achouche commented on DRILL-6202:
---------------------------------------

This is my take on the Drill boundary checks:

_*Short Term -*_
 * Ideally, the Drill boundary checks should always be on as long as:
 ** The impact of a Drillbit process crash (or data corruption) is big since there is no built-in
fault-tolerance
 ** The code is extensible and extensions are allowed to access direct memory
 * Having said that, my priority would have been to minimize the cost associated with these
checks instead of completely turning them off
 ** This is no different from Java's behavior with regard to array boundary checks
 * How do we do that? Actually there are multiple strategies (which could be combined)
 ** Fine-grained checks
 *** Add boundary checks within +all+ DrillBuf data accessors
 *** Invoke the accessor API within a loop and ensure the JVM is able to optimize the checks
 **** This will help you answer the question(s) around whether we access DM directly or through
Netty 
 ** Caller override
 *** Allow the caller to disable checks that are deemed too expensive or not easily optimizable
by HotSpot (e.g., Reference Checks)
 *** This pattern works well for a centralized layer (e.g., Paul's accessor framework) but
not for extensions as they cannot be always trusted to do the right thing
 *** To mitigate this, we could always have an auxiliary flag that, if set, forces execution of
such checks; that is, it overrides untrusted callers
 **** This should be done if a crash or corruption is observed
 ** Bulk Processing
 *** Bulk accessor APIs will allow +all the checks+ to be performed but with a minimal cost
(amortized)
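
The three strategies above can be sketched together in one small class. This is only an illustration under stated assumptions: {{CheckedBuffer}}, its methods, and the {{forceChecks}} flag are hypothetical names and are not part of DrillBuf or Netty. A plain {{int[]}} that is deliberately larger than the logical capacity stands in for direct memory, so an unchecked out-of-range write lands silently in "neighboring" memory instead of throwing.

```java
// Hypothetical sketch; CheckedBuffer is NOT a Drill or Netty class.
final class CheckedBuffer {
    // Stand-in for direct memory. It is twice the logical capacity so that an
    // unchecked write just past the end goes unnoticed, as it might off-heap.
    private final int[] memory;
    private final int capacity;
    // Auxiliary flag (assumed name): when set, checks run even if the caller opted out.
    private final boolean forceChecks;

    CheckedBuffer(int capacity, boolean forceChecks) {
        this.memory = new int[capacity * 2];
        this.capacity = capacity;
        this.forceChecks = forceChecks;
    }

    private void checkRange(int index, int length) {
        if (index < 0 || length < 0 || index > capacity - length) {
            throw new IndexOutOfBoundsException(
                "index=" + index + ", length=" + length + ", capacity=" + capacity);
        }
    }

    // Fine-grained check: every accessor call validates its own index.
    void setInt(int index, int value) {
        checkRange(index, 1);
        memory[index] = value;
    }

    // Caller override: a trusted caller may skip the check unless forceChecks is set.
    void setInt(int index, int value, boolean checked) {
        if (checked || forceChecks) {
            checkRange(index, 1);
        }
        memory[index] = value;
    }

    // Bulk processing: one range check covers the whole batch of values.
    void setInts(int index, int[] values) {
        checkRange(index, values.length);
        System.arraycopy(values, 0, memory, index, values.length);
    }

    int getInt(int index) {
        checkRange(index, 1);
        return memory[index];
    }
}
```

With a capacity of 4, {{setInt(4, 9)}} throws, {{setInt(4, 9, false)}} silently writes out of range unless the buffer was created with {{forceChecks}} set, and {{setInts(0, values)}} pays for a single check regardless of how many values are written.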

_*Long Term -*_
 * With the new Accessor Framework in place all DM checks should be primarily within this
layer
 ** The promise of this layer is that other memory formats can be transparently substituted
(e.g., Apache Arrow)
 * The question on whether the runtime checks are enabled by default becomes less important
 ** The chance of a crash / corruption is greatly reduced
 ** It should be rather easy for this layer to optimize the runtime checks; then the question
becomes "why not?"

_*Question -*_
 * Your Jira doesn't quite explain:
 ** Why you intend to deprecate the IndexOutOfBoundsException (since it is an unchecked exception)
 ** What other mechanism would replace it

 

*NOTE -* 
 * To reduce bookkeeping complexity and the cost of re-allocs, Drill operators allocate memory
up front for the variable-length value vectors
 * The setSafe() APIs are called (at least for Parquet) when the associated column
 ** Has enough VV space to insert the new value(s)
 ** Can extend the current VV to the next power of two; the setSafe() API is responsible for
extending the vector(s)
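
As a rough illustration of the second case, here is a minimal sketch of a setSafe()-style write that grows its backing storage to the next power of two using an explicit capacity comparison. {{GrowableVector}} and all of its methods are hypothetical names, and a {{byte[]}} stands in for the vector's buffer; this is not Drill's value vector code.

```java
// Hypothetical sketch; GrowableVector is NOT a Drill value vector.
final class GrowableVector {
    private byte[] data; // stand-in for the vector's data buffer

    GrowableVector(int initialCapacity) {
        this.data = new byte[initialCapacity];
    }

    int capacity() {
        return data.length;
    }

    // Smallest power of two that is >= n (for 1 <= n <= 2^30).
    static int nextPowerOfTwo(int n) {
        if (n > (1 << 30)) {
            throw new IllegalArgumentException("requested size too large: " + n);
        }
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Writes the value, extending the vector first if the index does not fit.
    void setSafe(int index, byte value) {
        if (index < 0 || index >= (1 << 30)) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        if (index >= data.length) {
            reAlloc(nextPowerOfTwo(index + 1));
        }
        data[index] = value;
    }

    private void reAlloc(int newCapacity) {
        byte[] bigger = new byte[newCapacity];
        System.arraycopy(data, 0, bigger, 0, data.length);
        data = bigger;
    }

    byte get(int index) {
        return data[index];
    }
}
```

Starting from a capacity of 4, writing at index 4 grows the sketch vector to 8 and writing at index 20 grows it to 32, while values written earlier are preserved.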

> Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
> ----------------------------------------------------------------
>
>                 Key: DRILL-6202
>                 URL: https://issues.apache.org/jira/browse/DRILL-6202
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As bounds checking may be enabled or disabled, using IndexOutOfBoundsException to resize
vectors is unreliable. It works only when bounds checking is enabled.
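
The failure mode described in the issue can be sketched as follows. This is a simplified model, not Drill code: {{ExceptionDrivenVector}}, its {{boundsChecking}} flag, and the {{corrupted}} marker are hypothetical. The marker stands in for what would, with real direct memory, be a silent overwrite or a crash.

```java
// Hypothetical sketch; ExceptionDrivenVector is NOT a Drill class.
final class ExceptionDrivenVector {
    private int[] memory;
    private final boolean boundsChecking; // models the "checks enabled / disabled" switch
    int reallocCount;                     // how many times the vector was resized
    boolean corrupted;                    // set when an unchecked out-of-range write occurred

    ExceptionDrivenVector(int capacity, boolean boundsChecking) {
        this.memory = new int[capacity];
        this.boundsChecking = boundsChecking;
    }

    int capacity() {
        return memory.length;
    }

    // Low-level write: only throws when bounds checking is enabled.
    private void set(int index, int value) {
        if (index >= memory.length) {
            if (boundsChecking) {
                throw new IndexOutOfBoundsException("index=" + index);
            }
            corrupted = true; // with real direct memory: undefined behavior
            return;
        }
        memory[index] = value;
    }

    // Resize driven by the exception: relies on set() throwing.
    void setSafe(int index, int value) {
        if (index < 0) {
            throw new IllegalArgumentException("negative index: " + index);
        }
        while (true) {
            try {
                set(index, value);
                return;
            } catch (IndexOutOfBoundsException e) {
                reAlloc();
            }
        }
    }

    private void reAlloc() {
        int[] bigger = new int[Math.max(1, memory.length * 2)];
        System.arraycopy(memory, 0, bigger, 0, memory.length);
        memory = bigger;
        reallocCount++;
    }

    int get(int index) {
        return memory[index];
    }
}
```

With checking enabled, {{setSafe(5, 7)}} on a vector of capacity 2 resizes twice and stores the value. With checking disabled, the same call never resizes, and the write is lost or lands out of range.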



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
