lucene-dev mailing list archives

From Israel Ekpo <israele...@gmail.com>
Subject Re: Bitwise Operations on Integer Fields in Lucene and Solr Index
Date Fri, 14 May 2010 02:28:09 GMT
Correction,

I meant to list

https://issues.apache.org/jira/browse/LUCENE-2460
https://issues.apache.org/jira/browse/SOLR-1913



On Thu, May 13, 2010 at 10:13 PM, Israel Ekpo <israelekpo@gmail.com> wrote:

> I have created two ISSUES as new features
>
> https://issues.apache.org/jira/browse/LUCENE-1560
>
> https://issues.apache.org/jira/browse/SOLR-1913
>
> The first one is for the Lucene Filter.
>
> The second one is for the Solr QParserPlugin
>
> The source code and jar files are attached and the Solr plugin is available
> for use immediately.
>
>
>
>
> On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki <ab@getopt.org> wrote:
>
>> On 2010-05-13 23:27, Israel Ekpo wrote:
>> > Hello Lucene and Solr Community
>> >
>> > I have a custom org.apache.lucene.search.Filter that I would like to
>> > contribute to the Lucene and Solr projects.
>> >
>> > So I would need some direction as to how to create an ISSUE or submit a
>> > patch.
>> >
>> > It looks like there have been changes to the way this is done since the
>> > latest merge of the two projects (Lucene and Solr).
>> >
>> > Recently, some Solr users have been looking for a way to perform bitwise
>> > operations between an integer value and some fields in the index.
>> >
>> > So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter.
>> >
>> > This package makes it possible to filter results returned from a query
>> > based on the results of a bitwise operation on an integer field in the
>> > documents returned from the pre-constructed query.
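
The filtering idea quoted above can be sketched in plain Java, without any Lucene or Solr classes. This is only an illustration, not the code attached to the issues: the class name, the Op enum, and the rule that a document is kept when the result of the operation is non-zero are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the acceptance rule are assumptions.
final class BitwiseFilterSketch {
    enum Op { AND, OR, XOR }

    // Apply the chosen bitwise operation to a document's field value.
    static int apply(Op op, int fieldValue, int queryValue) {
        switch (op) {
            case AND: return fieldValue & queryValue;
            case OR:  return fieldValue | queryValue;
            default:  return fieldValue ^ queryValue;
        }
    }

    // Assumed rule: keep the document when the result is non-zero.
    static boolean accepts(Op op, int fieldValue, int queryValue) {
        return apply(op, fieldValue, queryValue) != 0;
    }

    // fieldValues[docId] stands in for the integer field of each document
    // already matched by the pre-constructed query.
    static List<Integer> filter(int[] fieldValues, Op op, int queryValue) {
        List<Integer> kept = new ArrayList<>();
        for (int docId = 0; docId < fieldValues.length; docId++) {
            if (accepts(op, fieldValues[docId], queryValue)) {
                kept.add(docId);
            }
        }
        return kept;
    }
}
```

For example, filtering the values {0b0101, 0b0010, 0b0111} with Op.AND and the query value 0b0100 keeps the documents at positions 0 and 2.
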
>>
>> Hi,
>>
>> What a coincidence! :) I'm working on something very similar, only the
>> use case that I need to support is slightly different - I want to
>> support a ranked search based on a bitwise overlap of query value and
>> field value. That is, the number of differing bits would reduce the
>> score. This scenario occurs e.g. during near-duplicate detection that
>> uses fuzzy signatures, at the document or sentence level.
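
As a rough sketch of "the number of differing bits would reduce the score": the differing bits of two ints can be counted with an XOR followed by a population count. The class name and the linear mapping to a score in the range 0 to 1 are assumptions for illustration, not the scoring used in the code described above.

```java
// Illustrative sketch only; the score formula is an assumption.
final class BitOverlapScoreSketch {
    // Number of bit positions where the query and field values differ.
    static int differingBits(int queryValue, int fieldValue) {
        return Integer.bitCount(queryValue ^ fieldValue);
    }

    // Assumed mapping: 1.0 for identical values, reduced by 1/32 for
    // every differing bit, reaching 0.0 when all 32 bits differ.
    static float score(int queryValue, int fieldValue) {
        return 1.0f - differingBits(queryValue, fieldValue) / (float) Integer.SIZE;
    }
}
```
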
>>
>> I'm going to submit my code early next week; it still needs some
>> polishing. I have two ways to execute this query, neither of which uses
>> filters at the moment:
>>
>> * method 1: during indexing the bits in the fields are turned into
>> on/off terms on the same field, and during search a BooleanQuery is
>> formed from the int value with the same terms. Scoring is courtesy of
>> BooleanScorer. This method supports only a single int value per field.
>>
>> * method 2 (still incomplete): during indexing the bits are turned into
>> terms as before, but this method supports multiple int values per field:
>> terms that correspond to bitmasks on the same value are put at the same
>> positions. Then a specialized Query / Scorer traverses all 32 posting
>> lists in parallel, moving through all matching docs and scoring
>> according to how many terms matched at the same position.
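
Method 1 above can be sketched in plain Java as follows. The term format "<bit index>=<0 or 1>" and the class and method names are hypothetical, since the actual term encoding is not given in this thread. Counting shared terms stands in for what a BooleanQuery over optional clauses would match; real Lucene scoring weights terms differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only; the term format is a hypothetical choice.
final class BitTermsSketch {
    // Turn each of the 32 bits into an on/off term such as "3=1" or "3=0".
    static Set<String> toTerms(int value) {
        Set<String> terms = new LinkedHashSet<>();
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            int state = (value >>> bit) & 1;
            terms.add(bit + "=" + state);
        }
        return terms;
    }

    // Number of terms shared by the query value and the indexed value.
    // This equals 32 minus the number of differing bits.
    static int matchingTerms(int queryValue, int fieldValue) {
        Set<String> shared = toTerms(queryValue);
        shared.retainAll(toTerms(fieldValue));
        return shared.size();
    }
}
```

With this encoding the values 5 and 7 differ only in bit 1, so they share 31 of their 32 terms.
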
>>
>> I wrapped this in a Solr FieldType, and instead of using a custom
>> QParser plugin I simply implemented FieldType.getFieldQuery().
>>
>> It would be great to work out a convenient user-level API for this
>> feature, both the scoring and the non-scoring case.
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
