spark-dev mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Question about Bloom Filter in Spark 2.0
Date Wed, 22 Jun 2016 18:07:18 GMT
You should see it at both levels: there is one bloom filter for ORC data and one for in-memory data.


It is already a good step towards integrating the on-disk format and the in-memory representation of
columnar data.
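To make the "both levels" point concrete, here is a minimal conceptual sketch of a bloom filter (plain Python, not Spark's actual implementation; all names are illustrative). Both the ORC-level and in-memory-level filters rely on the same guarantee shown here: a value that was added is always reported as possibly present (no false negatives), while absent values are usually, but not always, ruled out.

```python
# Conceptual sketch, not Spark's implementation: a minimal bloom filter.
# Guarantee: no false negatives, only possible false positives.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one integer

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "maybe present"; False means "definitely absent",
        # which is what lets a reader skip a block of data entirely.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for value in ["spark", "orc", "parquet"]:
    bf.add(value)

assert bf.might_contain("orc")  # added values always hit
```

A "definitely absent" answer is the useful one: whichever level holds the filter (ORC file metadata or the in-memory columnar cache) can then skip reading that chunk of data.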

> On 22 Jun 2016, at 14:01, BaiRan <lizbai@icloud.com> wrote:
> 
> After building a bloom filter on existing data, does the Spark engine utilise the bloom filter
> during query processing?
> Is there any plan for predicate pushdown using the bloom filters in ORC / Parquet?
> 
> Thanks
> Ran
>> On 22 Jun, 2016, at 10:48 am, Reynold Xin <rxin@databricks.com> wrote:
>> 
>> SPARK-12818 is about building a bloom filter on existing data. It has nothing to
>> do with the ORC bloom filter, which can be used to do predicate pushdown.
>> 
>> 
>>> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <lizbai@icloud.com> wrote:
>>> Hi all,
>>> 
>>> I have a question about the bloom filter implementation in the SPARK-12818 issue. If
>>> I have an ORC file with bloom filter metadata, how can I utilise it via Spark SQL?
>>> Thanks.
>>> 
>>> Best,
>>> Ran
> 
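The predicate-pushdown side of the thread can also be sketched conceptually (plain Python; `Stripe` and `scan_with_pushdown` are hypothetical names, not the ORC reader's actual code). Each stripe of a file carries a filter over a column; an equality predicate consults the filter first and skips stripes that cannot contain the value.

```python
# Conceptual sketch of bloom-filter predicate pushdown; names are
# illustrative, not the actual ORC/Spark reader API.

class Stripe:
    def __init__(self, rows):
        self.rows = rows
        # A real reader stores a bloom filter in stripe metadata; a set
        # behaves like a bloom filter with a 0% false-positive rate.
        self.filter = set(rows)

    def might_contain(self, value):
        return value in self.filter

def scan_with_pushdown(stripes, value):
    """Read only the stripes whose filter says the value may be present."""
    stripes_read, hits = 0, []
    for stripe in stripes:
        if not stripe.might_contain(value):
            continue  # stripe skipped without reading its rows
        stripes_read += 1
        hits.extend(row for row in stripe.rows if row == value)
    return stripes_read, hits

stripes = [Stripe([1, 2, 3]), Stripe([4, 5, 6]), Stripe([7, 8, 9])]
print(scan_with_pushdown(stripes, 5))  # (1, [5]) -- two stripes skipped
```

The point Reynold makes above is that this pushdown path (filters stored in ORC metadata, consulted by the reader) is separate from SPARK-12818, which builds a bloom filter over a DataFrame's existing data for the application to use.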
