lucene-dev mailing list archives

From mark harwood <>
Subject Re: New Lucene features and Solr indexes
Date Wed, 13 Feb 2013 18:09:47 GMT
>>should be a stupid simple postings format like any other postings format with a default configuration

It does have a default config. It just needs a PF delegate in the constructor, just like Pulsing.
Like Rob said:
>>In other words, it should work just like pulsing.

So far so good.

Now, where people are getting upset (for no particularly good reason, in my view) is the per-field
stuff. If you really, really want to, you can supply a subclass of BloomFilterFactory to
your BloomPF constructor, which gives customised control over the choice of hashing algo, bitset
sizing and saturation policies if the DefaultBloomFilterFactory fails to make the right choices.
99.99999% of people will not do this. The reason it is a factory object and not some dumb
settings is that it is called on a per-segment basis with state info that is useful context
for making sizing choices. Now (horror of horrors), the factory's API is passed a FieldInfo
object in the method designed to produce a bitset. It is conceivable that some rogue agents
could choose to implement some per-field decisions here if the same BloomPF instance were registered
to handle >1 field. In addition, BloomPF has some common-sense defensive coding that checks
whether the factory returns null for the bitset, in which case it passes all calls un-bloomed
directly to the delegate.
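
To make the factory behaviour described above concrete, here is a minimal, self-contained sketch. It does not use Lucene's real classes: FieldInfoStub, SegmentStateStub, BitsetFactorySketch, DefaultFactorySketch, IdOnlyFactorySketch and BloomedFieldSketch are all hypothetical stand-ins, and the sizing rule (8 bits per document, at least 64) and the single hash are assumptions made only for illustration. It shows a factory being asked for a bitset with segment state and field info, and a null return meaning "skip the bloom filter and go straight to the delegate".

```java
import java.util.BitSet;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class FieldInfoStub {
    final String name;
    FieldInfoStub(String name) { this.name = name; }
}

final class SegmentStateStub {
    final int maxDoc; // number of documents in the segment being written
    SegmentStateStub(int maxDoc) { this.maxDoc = maxDoc; }
}

// Asked once per field per segment; returning null means "do not bloom this field".
interface BitsetFactorySketch {
    BitSet getSetForField(SegmentStateStub state, FieldInfoStub field);
}

// Sizes the bitset from segment state (assumed rule: 8 bits per doc, at least 64).
class DefaultFactorySketch implements BitsetFactorySketch {
    @Override
    public BitSet getSetForField(SegmentStateStub state, FieldInfoStub field) {
        return new BitSet(Math.max(64, state.maxDoc * 8));
    }
}

// A "rogue agent" factory that makes a per-field decision: only "id" is bloomed.
final class IdOnlyFactorySketch extends DefaultFactorySketch {
    @Override
    public BitSet getSetForField(SegmentStateStub state, FieldInfoStub field) {
        return "id".equals(field.name) ? super.getSetForField(state, field) : null;
    }
}

// Per-field state held by the wrapping format.
final class BloomedFieldSketch {
    private final BitSet bits; // null when the factory declined to bloom this field

    BloomedFieldSketch(BitsetFactorySketch factory, SegmentStateStub state, FieldInfoStub field) {
        this.bits = factory.getSetForField(state, field);
    }

    boolean isBloomed() { return bits != null; }

    private int slot(String term) {
        return Math.floorMod(term.hashCode(), Math.max(1, bits.size()));
    }

    // Records a term; a no-op when the field is un-bloomed.
    void add(String term) {
        if (bits != null) bits.set(slot(term));
    }

    // false means "definitely absent"; true means "ask the delegate".
    boolean mayContain(String term) {
        return bits == null || bits.get(slot(term));
    }
}
```

With IdOnlyFactorySketch, a field named "body" gets no bitset, so every lookup falls through to the delegate, while "id" lookups are filtered first.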

None of this prevents the use of BloomPF in the prescribed PerFieldPF manner for handling
field-specific choices.

I happen to use a custom BloomFilterFactory to implement a more efficient indexing pipeline
than the prescribed PerFieldPF route of implementing all per-field policies "up high" in the
stack, but none of that comes at the cost of a clean BloomPF API or involves any unnecessary
duplication of PerFieldPF logic.

If anything needs changing here, there may be a case for providing a convenience class that
weds BloomPF to a default choice of Lucene40 codec, so it can help with whatever Solr and
other config-driven engines may need, i.e. zero-arg constructors, if that's how their registry
of codecs works.
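
A rough sketch of what such a convenience class could look like is below. It is self-contained and uses no Lucene types: FormatSketch, BaseFormatSketch, BloomWrapperSketch, BloomOverBaseSketch and RegistrySketch are hypothetical names, and the name-to-constructor registry is only an assumption about how a config-driven engine might look formats up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a postings format; not a Lucene type.
interface FormatSketch {
    String describe();
}

// Stand-in for the default base format that would be wrapped.
final class BaseFormatSketch implements FormatSketch {
    @Override
    public String describe() { return "Base"; }
}

// The wrapper needs a delegate in its constructor, like a pulsing-style wrapper.
class BloomWrapperSketch implements FormatSketch {
    private final FormatSketch delegate;

    BloomWrapperSketch(FormatSketch delegate) { this.delegate = delegate; }

    @Override
    public String describe() { return "Bloom(" + delegate.describe() + ")"; }
}

// Convenience class: a zero-arg constructor that fixes the delegate choice.
final class BloomOverBaseSketch extends BloomWrapperSketch {
    BloomOverBaseSketch() { super(new BaseFormatSketch()); }
}

// Assumed shape of a config-driven registry: formats are created by name only,
// so each registered format must be constructible without arguments.
final class RegistrySketch {
    private final Map<String, Supplier<FormatSketch>> byName = new HashMap<>();

    void register(String name, Supplier<FormatSketch> ctor) { byName.put(name, ctor); }

    FormatSketch forName(String name) {
        Supplier<FormatSketch> ctor = byName.get(name);
        if (ctor == null) throw new IllegalArgumentException("unknown format: " + name);
        return ctor.get();
    }
}
```

Registering `BloomOverBaseSketch::new` under a name lets a config file refer to the wrapped format by that name alone, without any per-field logic in the wrapper.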


 From: Uwe Schindler <>
Sent: Wednesday, 13 February 2013, 16:47
Subject: RE: New Lucene features and Solr indexes
Hi Shawn,

I also argued about this at the time it was committed. I fully agree with Robert: the current
API is not in good shape!
I have the same feeling: Bloom Postings should be a stupid simple postings format like any
other postings format with a default configuration. If you really want to change its configuration,
you can subclass it as a separate postings format.


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Shawn Heisey []
> Sent: Wednesday, February 13, 2013 3:59 PM
> To:
> Subject: Re: New Lucene features and Solr indexes
> >> BloomFilterPostingsFormat is a little special compared to other
> >> postings formats because it can wrap any postings format. So maybe it
> >> should require special support, like an additional attribute in the
> >> field type definition?
> >
> > -1
> >
> > Instead of making other APIs to accommodate BloomFilter's current
> > brokenness: remove its custom per-field logic so it works with
> > PerFieldPostingsFormat, like every other PF.
> >
> > In other words, it should work just like pulsing.
> >
> > I brought this up before it was committed, and I was ignored. That's
> > fine, but I'll be damned if I let its incorrect design complicate
> > other parts of the codebase too. I'd rather it continue to stay
> > difficult to integrate and continue walking its current path to an
> > open source death instead.
> Robert,
> I have to send you a general thank you for your dedication to the quality of
> this project, and for your amazing ability to seemingly keep the entire design
> for Lucene in your head at all times.
> I'm not sure what exactly you want to die here, or what you think would be
> the best option for me, the Solr end-user.  Is BloomFilter something that's
> not worth pursuing, or would you just like it to be integrated in a different
> way?
> Thanks,
> Shawn
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: For additional
> commands, e-mail:
