lucene-dev mailing list archives

From Israel Ekpo <israele...@gmail.com>
Subject Re: Bitwise Operations on Integer Fields in Lucene and Solr Index
Date Fri, 14 May 2010 02:13:15 GMT
I have created two issues as new features:

https://issues.apache.org/jira/browse/LUCENE-1560

https://issues.apache.org/jira/browse/SOLR-1913

The first one is for the Lucene Filter.

The second one is for the Solr QParserPlugin.

The source code and jar files are attached and the Solr plugin is available
for use immediately.
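
For anyone who has not opened the attachments yet, the kind of check described in the quoted message below can be sketched roughly as follows. This is not the attached code and it does not use the Lucene or Solr APIs. The class name BitwiseFilterSketch, the Op enum, and the "all mask bits set" rule are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code attached to the issues.
final class BitwiseFilterSketch {

    /** Operations assumed for this sketch. */
    enum Op { AND, OR, XOR }

    /** Applies the chosen operation between a document's field value and the caller's value. */
    static int apply(Op op, int fieldValue, int sourceValue) {
        switch (op) {
            case AND: return fieldValue & sourceValue;
            case OR:  return fieldValue | sourceValue;
            default:  return fieldValue ^ sourceValue;
        }
    }

    /**
     * Returns the positions (stand-ins for document ids) whose field value
     * has every bit of the mask set, i.e. (value AND mask) == mask.
     */
    static List<Integer> keepWithAllBits(int[] fieldValues, int mask) {
        List<Integer> kept = new ArrayList<>();
        for (int doc = 0; doc < fieldValues.length; doc++) {
            if (apply(Op.AND, fieldValues[doc], mask) == mask) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

In this sketch the int array stands in for the integer field of the documents matched by the pre-constructed query, and the returned list stands in for the documents that survive the filter.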



On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki <ab@getopt.org> wrote:

> On 2010-05-13 23:27, Israel Ekpo wrote:
> > Hello Lucene and Solr Community
> >
> > I have a custom org.apache.lucene.search.Filter that I would like to
> > contribute to the Lucene and Solr projects.
> >
> > So I would need some direction as to how to create an issue or submit a
> > patch.
> >
> > It looks like there have been changes to the way this is done since the
> > latest merge of the two projects (Lucene and Solr).
> >
> > Recently, some Solr users have been looking for a way to perform bitwise
> > operations between an integer value and some fields in the index.
> >
> > So, I wrote a Solr QParser plugin to do this using a custom Lucene
> > Filter.
> >
> > This package makes it possible to filter results returned from a query
> > based on the results of a bitwise operation on an integer field in the
> > documents returned from the pre-constructed query.
>
> Hi,
>
> What a coincidence! :) I'm working on something very similar, only the
> use case that I need to support is slightly different - I want to
> support a ranked search based on a bitwise overlap of query value and
> field value. That is, the number of differing bits would reduce the
> score. This scenario occurs e.g. during near-duplicate detection that
> uses fuzzy signatures at the document or sentence level.
>
> I'm going to submit my code early next week; it still needs some
> polishing. I have two ways to execute this query, neither of which uses
> filters at the moment:
>
> * method 1: during indexing the bits in the fields are turned into
> on/off terms on the same field, and during search a BooleanQuery is
> formed from the int value with the same terms. Scoring is courtesy of
> BooleanScorer. This method supports only a single int value per field.
>
> * method 2 (not yet complete): during indexing the bits are turned into
> terms as before, but this method supports multiple int values per field:
> terms that correspond to bitmasks on the same value are put at the same
> positions. Then a specialized Query / Scorer traverses all 32 posting
> lists in parallel, moving through all matching docs and scoring
> according to how many terms matched at the same position.
>
> I wrapped this in a Solr FieldType, and instead of using a custom
> QParser plugin I simply implemented FieldType.getFieldQuery().
>
> It would be great to work out a convenient user-level API for this
> feature, both the scoring and the non-scoring case.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
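
As a rough illustration of method 1 in the quoted message, the bit-to-term encoding and the resulting overlap count might look something like the sketch below. It is plain Java rather than Lucene code. The term naming ("b3_1" for bit 3 on, "b3_0" for bit 3 off) and the class name BitTermSketch are assumptions, not taken from Andrzej's implementation, and the plain match count only stands in for what BooleanScorer would compute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; term names and class name are assumptions.
final class BitTermSketch {

    /** Turns each of the 32 bits of the value into an on/off term, e.g. "b3_1" or "b3_0". */
    static List<String> toTerms(int value) {
        List<String> terms = new ArrayList<>(32);
        for (int bit = 0; bit < 32; bit++) {
            int state = (value >>> bit) & 1;
            terms.add("b" + bit + "_" + state);
        }
        return terms;
    }

    /** Counts how many of the query's terms also occur for the field value (0 to 32). */
    static int overlap(int queryValue, int fieldValue) {
        List<String> fieldTerms = toTerms(fieldValue);
        int matched = 0;
        for (String term : toTerms(queryValue)) {
            if (fieldTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }
}
```

With this encoding, the number of matching terms equals 32 minus the number of differing bits, so each differing bit lowers the count by one, which is the ranking behaviour described above.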


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
