impala-dev mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3103: More efficient Bloom Filter serialisation.
Date Tue, 01 Mar 2016 05:29:09 GMT
Henry Robinson has uploaded a new patch set (#2).

Change subject: IMPALA-3103: More efficient Bloom Filter serialisation.
......................................................................

IMPALA-3103: More efficient Bloom Filter serialisation.

TBloomFilters have a 'directory' structure that is a list of individual
buckets (buckets are about 64k wide). The total size of the directory
can be 1MB or even much more. That leads to a lot of buckets, and very
inefficient deserialisation as each bucket has to be allocated on the
heap.

Instead, this patch changes the TBloomFilter representation to use one
contiguous string (like the real BloomFilter does), so that it can be
allocated with a single operation and deserialized with a single copy.

This reduces the kernel time spent deserializing a TBloomFilter by about
20x, and speeds up converting a TBloomFilter to a 'real' BloomFilter by
about 20x as well.
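
As a rough illustration of the difference between the two layouts, here is
a minimal C++ sketch. It is not the code from this patch: DirectoryFilter,
ContiguousFilter, FlattenDirectory and CopyContiguous are hypothetical
stand-ins, and the real Thrift-generated TBloomFilter type may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the old layout: a list of separately
// allocated buckets, so deserialising costs one heap allocation each.
struct DirectoryFilter {
  std::vector<std::vector<uint8_t>> buckets;
};

// Hypothetical stand-in for the new layout: one contiguous byte string.
struct ContiguousFilter {
  std::string directory;
};

// Old-style conversion: walk every bucket and append its bytes.
std::vector<uint8_t> FlattenDirectory(const DirectoryFilter& in) {
  std::size_t total = 0;
  for (const auto& bucket : in.buckets) total += bucket.size();
  std::vector<uint8_t> out;
  out.reserve(total);
  for (const auto& bucket : in.buckets) {
    out.insert(out.end(), bucket.begin(), bucket.end());
  }
  assert(out.size() == total);
  return out;
}

// New-style conversion: a single allocation followed by a single copy.
std::vector<uint8_t> CopyContiguous(const ContiguousFilter& in) {
  std::vector<uint8_t> out(in.directory.size());
  if (!out.empty()) {
    std::memcpy(out.data(), in.directory.data(), out.size());
  }
  return out;
}
```

Both functions produce the same bytes for equivalent inputs; the contiguous
form simply avoids the per-bucket allocations and copies.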

Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
---
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
3 files changed, 15 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/59/2359/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
