arrow-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
Date Mon, 06 May 2019 09:51:59 GMT
I am still asking the same question: can you please analyze the assembly
the JIT is producing and identify why the disabled-bounds-checking case is
still 30% slower, and what we can do to address it? For example, we have
talked before about a bytecode transformer that simply removes the bounds
checking when loading Arrow if you want that behavior. If necessary, that
may be a big win from a code-maintenance standpoint over having duplicate
interfaces.
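
As a rough sketch of that transformer idea (all class and package names
below are illustrative, not real Arrow code, and the actual method-body
rewrite would need a bytecode library such as ASM, so it is left as a
no-op here):

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical java agent: registered with -javaagent, it sees every class
// as it is loaded and may return rewritten bytecode.
public final class BoundsCheckStrippingAgent {

  public static void premain(String args, Instrumentation inst) {
    inst.addTransformer(new StrippingTransformer());
  }

  static final class StrippingTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain domain, byte[] classBytes) {
      // Only touch Arrow vector classes; returning null tells the JVM to
      // leave the class unchanged.
      if (className == null || !className.startsWith("org/apache/arrow/vector/")) {
        return null;
      }
      // A real implementation would use a bytecode library (e.g. ASM) to
      // visit each method and drop the calls to the bounds-checking
      // helpers, then return the rewritten bytes. No-op in this sketch.
      return null;
    }
  }
}
```

The point of the approach is that users who want safety keep the stock
jar, while users who want raw speed opt in at load time, with no duplicate
API surface to maintain.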

The static block seems like a non-problem. You could simply load another
class that sets that system property before loading any Arrow code. If
you're proposing a code change to solve your problem today, this seems
just as feasible.
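
A minimal sketch of that ordering trick (the holder-class name is made up;
the property name is the one used in this thread):

```java
// Hypothetical holder class (not part of Arrow): loading it sets the
// property before the BoundsChecking static initializer can read it, so it
// must be referenced before any Arrow class.
public final class ArrowUnsafeConfig {
  static {
    System.setProperty("arrow.enable_unsafe_memory_access", "true");
  }

  // Merely referencing the class triggers the static block; this method
  // just makes the intent explicit at the call site.
  public static void ensureLoaded() {
  }
}
```

Calling `ArrowUnsafeConfig.ensureLoaded()` as the first statement of
`main` guarantees the property is set before any Arrow static initializer
runs, even when you cannot touch the JVM command line.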

The other question: in a real algorithm, how much does that 30% matter?
Your benchmarks are entirely about this one call, whereas real workloads
are impacted by many things, and the time spent writing/reading vectors is
minuscule compared to everything else.

On Mon, May 6, 2019 at 1:16 PM Fan Liya <liya.fan03@gmail.com> wrote:

> Hi Jacques,
>
> Thank you so much for your kind reminder.
>
> To come up with some performance data, I have set up an environment and
> run some micro-benchmarks.
> The server runs Linux and has 64 cores and 256 GB of memory.
> The benchmarks are simple iterations over some double vectors (the source
> file is attached):
>
>   @Benchmark
>   @BenchmarkMode(Mode.AverageTime)
>   @OutputTimeUnit(TimeUnit.MICROSECONDS)
>   public void testSafe() {
>     safeSum = 0;
>     for (int i = 0; i < VECTOR_LENGTH; i++) {
>       safeVector.set(i, i + 10.0);
>       safeSum += safeVector.get(i);
>     }
>   }
>
>   @Benchmark
>   @BenchmarkMode(Mode.AverageTime)
>   @OutputTimeUnit(TimeUnit.MICROSECONDS)
>   public void testUnSafe() {
>     unSafeSum = 0;
>     for (int i = 0; i < VECTOR_LENGTH; i++) {
>       unsafeVector.set(i, i + 10.0);
>       unSafeSum += unsafeVector.get(i);
>     }
>   }
>
> The safe vector in the testSafe benchmark is from the original Arrow
> implementation, whereas the unsafe vector in the testUnSafe benchmark is
> based on our initial implementation in PR
> <https://github.com/apache/arrow/pull/4212> (this is not the final
> version, but we believe much of the overhead has already been removed).
> The evaluation is based on the JMH framework (thanks to the suggestion
> from Jacques Nadeau). The framework runs each benchmark enough times that
> the effects of JIT compilation are well accounted for.
>
> In the first experiment, we use the default configuration (bounds
> checking enabled), and the original Arrow vector is about 4 times slower:
>
> Benchmark                       Mode  Cnt   Score   Error  Units
> VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
>
> In the second experiment, we disable the bounds checking via JVM options:
>
> -Ddrill.enable_unsafe_memory_access=true
> -Darrow.enable_unsafe_memory_access=true
>
> This time, the original Arrow vector is about 30% slower:
>
> Benchmark                       Mode  Cnt  Score   Error  Units
> VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
>
> This is a significant improvement: about 2.84x faster than with bounds
> checking enabled.
> However, in our scenario, we would still choose to bypass the Arrow APIs
> without hesitation, because these memory accesses are such frequent
> operations that a 30% performance degradation would easily cost us our
> competitive edge.
>
> The results can be attributed to the following factors:
> 1. Although the checks have been disabled, we still read the flag and
> branch on it repeatedly in the Arrow APIs, which accumulates into a large
> performance overhead.
> 2. There is much more code on the call stack compared with the unsafe
> API. This leads to less efficient use of the i-cache, even if the JIT
> avoids the cost of stack frames by inlining most of the method code.
>
> Another, perhaps separate, problem is that the flag
> BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
> static block. That means the only reliable way to override it is to set
> the system property on the JVM command line. However, in some scenarios
> we do not have access to the command line (e.g. when running Flink on
> YARN). I think this deserves a separate issue.
>
> Best,
> Liya Fan
>
> On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <jacques@apache.org> wrote:
>
>> >
>> > Maybe I need to take a closer look at how the other SQL engines are
>> > using Arrow, to see if they are also bypassing the Arrow APIs.
>> > I agree that a random user should be able to protect themselves, and
>> > this is of the utmost priority.
>> >
>> > According to my experience in Flink, the JIT cannot optimize away the
>> > checks, and removing the checks addresses the issue.
>> > I want to illustrate this from two points:
>> >
>> > 1. Theoretical viewpoint: the JIT makes optimizations without changing
>> > the semantics of the code, so it can never remove the checks without
>> > changing those semantics. To put it simply, even if the JIT has
>> > witnessed the engine successfully process 1,000,000 records, how can
>> > it be sure that the 1,000,001st record will also succeed?
>> >
>> > 2. Practical viewpoint: we have evaluated our SQL engine on the TPC-H
>> > 1TB data set. That is a very large number of records, so the JIT must
>> > have done all it could to improve the code. According to the
>> > performance results, however, it could not eliminate the impact caused
>> > by the checks.
>> >
>>
>> I don't think you're following my point. There are two different points it
>> seems like you want to discuss. Let's evaluate each separately:
>>
>> 1) Bounds checking for safety
>> 2) Supposed inefficiency of the call hierarchy.
>>
>> For #1, we provide a system-level property that can disable these. The
>> JVM should successfully optimize away this operation if that flag is
>> set. Please look at the JIT output to confirm whether this is true.
>>
>> For #2: we designed things to collapse, so the call hierarchy shouldn't
>> be a problem. Please look at the JIT output to confirm.
>>
>> Please come with data around #1 and #2 to make an argument for a set of
>> changes.
>>
>> thanks
>>
>
