hive-user mailing list archives

From Prasanth Jayachandran
Subject Re: bloom filter used in 0.14?
Date Fri, 29 Jan 2016 00:53:29 GMT

To get optimal performance from a bloom filter, make sure the records are sorted on col1. Sorting
on the column of interest lets the reader prune stripes and row groups efficiently. If the records that
you are searching for are spread across row groups (10K rows by default) or stripes (64MB by
default), then the ORC reader will have to read all or most of the row groups and stripes. Sorting
clusters the matching records together, which makes pruning more effective.
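
As a rough sketch of what that could look like, assuming the data is rewritten into a second ORC table (the name test_sorted is hypothetical; col1 and col2 come from the question below). The hive.optimize.index.filter setting is included on the assumption that predicate pushdown is not already enabled in your session; whether the bloom filter itself is honored depends on your Hive version:

```sql
-- Hypothetical target table; same columns as the original "test" table.
CREATE TABLE test_sorted (
  col1 STRING,
  col2 STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.bloom.filter.columns"="col1");

-- Rewrite the data so rows with the same col1 go to the same reducer
-- and are sorted within each output file. Equal values then land in
-- the same stripes and row groups instead of being spread across all of them.
INSERT OVERWRITE TABLE test_sorted
SELECT col1, col2
FROM test
DISTRIBUTE BY col1
SORT BY col1;

-- Assumption: predicate pushdown is off by default in this session,
-- so turn it on to let the ORC reader consult its indexes.
SET hive.optimize.index.filter=true;

SELECT * FROM test_sorted WHERE col1 = '1234';
```

Comparing the bytes read by the last query against the same query on the unsorted table should show whether pruning is taking effect.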


On Jan 28, 2016, at 6:46 PM, Frank Luo wrote:


I have a huge table on which I periodically want to select a particular value. For example,
suppose I have a table for the entire world population. Then I learn that the person with id “1234”
is a criminal, so I want to pull his information out of the table.

Without any optimization, I have to use thousands of mappers to find just one id, which is not ideal.
I tried to enable a bloom filter on the column that I want to search on, but a simple query
shows that the amount of data read is the same as without a bloom filter. So I am questioning
whether it is enabled in the version I am on, which is 0.14. Does anyone know? If a bloom filter
is not the way to go, does anyone have suggestions?

Here is the hql:

create table test (
  col1   STRING,
  col2   STRING
)
tblproperties ("orc.bloom.filter.columns"="col1");

select * from test where col1 = '1234';


