arrow-dev mailing list archives

From Fan Liya <liya.fa...@gmail.com>
Subject Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
Date Mon, 06 May 2019 07:46:22 GMT
Hi Jacques,

Thank you so much for your kind reminder.

To come up with some performance data, I have set up an environment and run
some micro-benchmarks.
The server runs Linux and has 64 cores and 256 GB of memory.
The benchmarks are simple iterations over some double vectors (the source
file is attached):

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testSafe() {
    safeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      safeVector.set(i, i + 10.0);
      safeSum += safeVector.get(i);
    }
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testUnSafe() {
    unSafeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      unsafeVector.set(i, i + 10.0);
      unSafeSum += unsafeVector.get(i);
    }
  }

The safe vector in the testSafe benchmark is from the original Arrow
implementation, whereas the unsafe vector in the testUnSafe benchmark is
based on our initial implementation in PR
<https://github.com/apache/arrow/pull/4212> (this is not the final version,
but we believe it already removes much of the overhead).
The evaluation is based on the JMH framework (thanks to Jacques Nadeau for
the suggestion). The framework runs the benchmarks enough times that the
effects of JIT compilation are fully taken into account.

In the first experiment, we use the default configuration (boundary
checking enabled), and the original Arrow vector is about 4 times slower:

Benchmark                       Mode  Cnt   Score   Error  Units
VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op

In the second experiment, we disable bounds checking via JVM options:

-Ddrill.enable_unsafe_memory_access=true
-Darrow.enable_unsafe_memory_access=true

This time, the original Arrow vector is about 30% slower:

Benchmark                       Mode  Cnt  Score   Error  Units
VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op

This is a significant improvement: about 2.84x faster than with bounds
checking enabled.
However, in our scenario we would still choose to bypass the Arrow APIs
without hesitation, because these memory accesses are such frequent
operations that a 30% performance degradation would easily cost us our edge.

The results can be attributed to the following factors:
1. Although the checks have been disabled, we still need to read the flag
and test it on every call in the Arrow APIs, which accumulates into a large
performance overhead.
2. There is much more code in the call stack compared with the unsafe API.
This leads to less efficient use of the instruction cache, even if the JIT
avoids the cost of stack frames by inlining most of the method code.
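
To make factor 1 concrete, here is a minimal sketch of the two access
patterns (hypothetical class and method names, not Arrow's actual code): the
safe path tests the flag and the index on every call, while the unsafe path
writes directly. Whether the JIT can constant-fold the static final flag and
erase the branch entirely is exactly the open question in this thread:

```java
public class AccessPatternSketch {
  // Mirrors the kind of flag Arrow consults, read from a system property.
  static final boolean BOUNDS_CHECKING_ENABLED =
      !"true".equals(System.getProperty("arrow.enable_unsafe_memory_access"));

  final double[] data;

  AccessPatternSketch(int length) {
    this.data = new double[length];
  }

  // Safe path: a flag test and a bounds check on every single call.
  void safeSet(int index, double value) {
    if (BOUNDS_CHECKING_ENABLED) {
      if (index < 0 || index >= data.length) {
        throw new IndexOutOfBoundsException("index: " + index);
      }
    }
    data[index] = value;
  }

  // Unsafe path: no flag, no branch; the caller guarantees the index.
  void unsafeSet(int index, double value) {
    data[index] = value;
  }
}
```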

Another, possibly separate, problem is that the
flag BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
static block. That means the only reliable way to override it is to set the
system properties on the JVM command line. However, in some scenarios we do
not have access to the command line (e.g. running Flink on YARN). I think
this deserves a separate issue.
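
One possible workaround, sketched below under the assumption that no class
triggering the loading of BoundsChecking has been touched yet (once its
static initializer runs, the final flag is fixed for the lifetime of the
JVM), is to set the property programmatically at the very start of the entry
point:

```java
public class DisableBoundsCheckingEarly {
  public static void main(String[] args) {
    // Must execute before the first reference to any class that causes
    // BoundsChecking to be loaded; after its static block runs, the
    // final flag can no longer be changed.
    System.setProperty("arrow.enable_unsafe_memory_access", "true");

    // ... initialize Arrow vectors / start the application here ...
    System.out.println(
        System.getProperty("arrow.enable_unsafe_memory_access"));
  }
}
```

This is fragile (any earlier classloading of Arrow defeats it), which is why
a non-final or lazily read flag would be a more robust fix.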

Best,
Liya Fan

On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <jacques@apache.org> wrote:

> >
> > Maybe I need to take a closer look at how the other SQL engines are using
> > Arrow. To see if they are also bypassing Arrow APIs.
> > I agree that a random user should be able to protect themselves, and this
> > is the utmost priority.
> >
> > According to my experience in Flink, JIT cannot optimize away the checks,
> > and removing the checks addresses the issue.
> > I want to illustrate this from two points:
> >
> > 1. Theoretical viewpoint: JIT makes optimizations without changing the
> > semantics of the code, so it can never remove the checks without changing
> > code semantics. To put it simply, if the JIT has witnessed the engine
> > successfully process 1,000,000 records, how can it be sure that the
> > 1,000,001st record will also succeed?
> >
> > 2. Practical viewpoint: we have evaluated our SQL engine on the TPC-H
> > 1 TB data set. This is a really large number of records, so the JIT must
> > have done all it could to improve the code. According to the performance
> > results, however, it could not eliminate the impact caused by the checks.
> >
>
> I don't think you're following my point. There are two different points it
> seems like you want to discuss. Let's evaluate each separately:
>
> 1) Bounds checking for safety
> 2) Supposed inefficiency of the call hierarchy.
>
> For #1 we provide a system level property that can disable these. The JVM
> should successfully optimize away this operation if that flag is set. Please
> look at the JIT output to confirm whether this is true.
>
> For #2: We designed things to collapse so the call hierarchy shouldn't be a
> problem. Please look at the JIT output to confirm.
>
> Please come with data around #1 and #2 to make an argument for a set of
> changes.
>
> thanks
>
