hive-user mailing list archives

From Frank Luo <>
Subject RE: bloom filter used in 0.14?
Date Wed, 03 Feb 2016 18:19:17 GMT
Thank you all for this discussion. Very helpful.

-----Original Message-----
From: Gopal Vijayaraghavan [] On Behalf Of Gopal Vijayaraghavan
Sent: Thursday, January 28, 2016 7:43 PM
Subject: Re: bloom filter used in 0.14?

> So I am questioning whether it is enabled on the version I am on,
> which is 0.14. Does anyone know?

The version you are using does not have bloom filter support; the ORC bloom filter feature has fix-version 1.2.0.

When writing, it should ignore the parameter and not generate any bloom filter streams.

In later versions, hive --orcfiledump prints the BLOOM_FILTER streams next to the row index streams.

> Without any optimization, I have to use thousands of mappers to find
> just one id.

Everything else you are doing is appropriate. However, be aware that the bloom filter index
(and the row index) is consulted only *after* a mapper starts up.
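To illustrate why the bloom filter only helps once the reader is running, here is a minimal sketch of a reader consulting a per-row-group bloom filter before decoding rows. The BloomFilter class, its double-hashing scheme, and the (filter, rows) layout are hypothetical simplifications, not ORC's actual implementation or sizing. A "no" answer means the id is definitely absent, so that row group can be skipped; a "yes" may be a false positive.

```python
import hashlib
from typing import Iterator, List, Sequence, Tuple


class BloomFilter:
    """Simplified bloom filter (hypothetical): a bit array plus k hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive k bit positions from two 64-bit halves of a SHA-256 digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False: definitely absent. True: possibly present (false positives allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_group(ids: Sequence[str]) -> Tuple[BloomFilter, Sequence[str]]:
    # Hypothetical writer side: one bloom filter per row group.
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for x in ids:
        bf.add(x)
    return bf, ids


def row_groups_to_read(groups: List[Tuple[BloomFilter, Sequence[str]]], wanted_id: str) -> List[int]:
    # Runs inside an already-started mapper: only groups that might hold the id are decoded.
    return [i for i, (bf, _rows) in enumerate(groups) if bf.might_contain(wanted_id)]


groups = [build_group([f"id-{n}" for n in range(start, start + 100)]) for start in range(0, 1000, 100)]
print(row_groups_to_read(groups, "id-250"))  # always includes 2; other indexes only on false positives
```

Note that nothing here prevents the mapper from being launched; the savings are only in the I/O and decoding that the mapper skips after it starts.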

So a mapper might still be launched, but it can exit almost immediately. This plays well with
Tez container reuse on very busy clusters; in fact, the query might run faster on a busy cluster
than on a completely idle one.

The sorted[1] min-max indicators suggested by Prasanth, however, are rolled up to the split
level and can be used to prune splits before they are scheduled.

[1] Only CLUSTER BY is needed, not ORDER BY.
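By contrast, split-level min-max pruning happens before any mapper is scheduled. The sketch below is a hypothetical model (the Split class and prune_splits function are illustrative, not Hive APIs): each split carries the min and max of the id column, and a split whose range cannot contain the wanted id is dropped. For simplicity it models sorted data as one globally sorted sequence; CLUSTER BY actually sorts within each reducer's output, which still keeps per-stripe ranges narrow.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Split:
    """Hypothetical split summary: min/max of the id column rolled up from its stripes."""
    name: str
    min_id: int
    max_id: int


def make_splits(ids: List[int], per_split: int) -> List[Split]:
    # Chunk the rows in write order and record each chunk's min/max.
    chunks = [ids[i:i + per_split] for i in range(0, len(ids), per_split)]
    return [Split(f"split-{n}", min(c), max(c)) for n, c in enumerate(chunks)]


def prune_splits(splits: List[Split], wanted_id: int) -> List[Split]:
    # Keep only splits whose [min, max] range could contain wanted_id.
    # Pruned splits are never scheduled, so no mapper starts for them.
    return [s for s in splits if s.min_id <= wanted_id <= s.max_id]


ids = list(range(10_000))
shuffled = ids[:]
random.Random(42).shuffle(shuffled)

clustered = make_splits(ids, 1000)       # data written sorted by id
scattered = make_splits(shuffled, 1000)  # same data, unsorted

print([s.name for s in prune_splits(clustered, 4321)])  # ['split-4']
print(len(prune_splits(scattered, 4321)))  # nearly every split survives
```

With unsorted data every split's range spans almost all ids, so min-max statistics prune nothing; sorting is what makes the ranges narrow enough to be useful.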

