arrow-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
Date Mon, 06 May 2019 05:23:23 GMT
>
> Maybe I need to take a closer look at how the other SQL engines are using
> Arrow. To see if they are also bypassing Arrow APIs.
> I agree that a random user should be able to protect themselves, and this
> is the utmost priority.
>
> According to my experience in Flink, JIT cannot optimize away the checks,
> and removing the checks addresses the issue.
> I want to illustrate this from two points:
>
> 1. Theoretical view point: JIT makes optimizations without changing
> semantics of the code, so it can never remove the checks without changing
> code semantics. To make it simple, if the JIT has witnessed the engine
> successfully process 1,000,000 records, how can it be sure that the
> 1,000,001st record will be successful?
>
> 2. Practical view point: we have evaluated our SQL engine on TPC-H 1TB data
> set. This is really a large number of records. So the JIT must have done
> all it could to improve the code. According to the performance results,
> however, it could not eliminate the impact caused by the checks.
>

I don't think you're following my point. It seems like there are two
different points you want to discuss. Let's evaluate each separately:

1) Bounds checking for safety
2) Supposed inefficiency of the call hierarchy.

For #1 we provide a system-level property that can disable these. The JVM
should successfully optimize away this operation if that flag is set. Please
look at the JIT output to confirm whether this is true.

For #2: We designed things to collapse so the call hierarchy shouldn't be a
problem. Please look at the JIT output to confirm.
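
To make both points concrete, here is a minimal sketch of the general pattern,
not Arrow's actual code. The class name, the property name
example.unsafe_memory_access, and the int[] backing store are all assumptions
made for illustration; the array stands in for off-heap memory. The check is
guarded by a static final boolean read once from a system property, so the
JIT can treat it as a constant and should be able to drop the guarded branch
when the flag disables it. The small nested methods are the kind of call
hierarchy that is expected to be inlined into the caller's loop.

```java
final class CheckedIntBuffer {
    // Hypothetical property name, used only for this sketch.
    // Read once at class initialization, so the JIT sees a constant.
    static final boolean BOUNDS_CHECKING_ENABLED =
        !Boolean.getBoolean("example.unsafe_memory_access");

    // Stand-in for off-heap memory. The JVM still applies its own
    // array bounds check here, which real off-heap access would not have.
    private final int[] data;

    CheckedIntBuffer(int capacity) {
        this.data = new int[capacity];
    }

    int get(int index) {
        checkIndex(index);      // small method, expected to be inlined
        return rawGet(index);   // small method, expected to be inlined
    }

    void set(int index, int value) {
        checkIndex(index);
        data[index] = value;
    }

    private void checkIndex(int index) {
        // When the flag is false, this whole branch is dead code.
        if (BOUNDS_CHECKING_ENABLED) {
            if (index < 0 || index >= data.length) {
                throw new IndexOutOfBoundsException(
                    "index " + index + ", capacity " + data.length);
            }
        }
    }

    private int rawGet(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        CheckedIntBuffer buf = new CheckedIntBuffer(1024);
        for (int i = 0; i < 1024; i++) {
            buf.set(i, i);
        }
        long sum = 0;
        // Run the hot loop often enough for the JIT to compile it.
        for (int iter = 0; iter < 100_000; iter++) {
            for (int i = 0; i < 1024; i++) {
                sum += buf.get(i);
            }
        }
        System.out.println(sum);
    }
}
```

One way to look at what HotSpot does with this is to run it with
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining, with and without
-Dexample.unsafe_memory_access=true, and compare. -XX:+PrintAssembly shows
the generated code itself but needs the hsdis disassembler plugin installed.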

Please come with data around #1 and #2 to make an argument for a set of
changes.

thanks
