lucene-dev mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Indexing Boolean Expressions
Date Mon, 26 Mar 2012 06:44:03 GMT
Hello Joaquin,

I looked through the paper several times, and I see no problem implementing
it in Lucene (at least the trivial case):

Let's index a conjunctive condition as
 {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
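A minimal sketch of that layout in plain Java, without any Lucene classes. The hypothetical `ConjunctionDoc` stores one field per clause plus a `numClauses` field equal to the clause count. It assumes the trivial case: each clause is a single field=value pair and no field appears twice in one condition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one conjunctive condition, e.g.
// fieldA=valA AND fieldB=valB AND fieldC=valC (not a Lucene class).
final class ConjunctionDoc {
    final Map<String, String> clauses; // field -> required value, one per clause
    final int numClauses;              // how many clauses must all match

    ConjunctionDoc(Map<String, String> clauses) {
        this.clauses = new LinkedHashMap<>(clauses);
        this.numClauses = clauses.size();
    }

    // Flattened field/value layout that would be indexed as one document.
    Map<String, String> toFields() {
        Map<String, String> fields = new LinkedHashMap<>(clauses);
        fields.put("numClauses", Integer.toString(numClauses));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> c = new LinkedHashMap<>();
        c.put("fieldA", "valA");
        c.put("fieldB", "valB");
        c.put("fieldC", "valC");
        // prints {fieldA=valA, fieldB=valB, fieldC=valC, numClauses=3}
        System.out.println(new ConjunctionDoc(c).toFields());
    }
}
```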

Then, form a query from the incoming fact (event):
fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
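The query side can be sketched the same way: each field/value pair of the event becomes one term clause, and the clauses are ORed together in the query syntax shown above. `EventQuery.toDisjunction` is a hypothetical helper, and it assumes values are single tokens that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class EventQuery {
    // One "field:value" clause per event attribute, joined with OR.
    static String toDisjunction(Map<String, String> event) {
        StringJoiner query = new StringJoiner(" OR ");
        for (Map.Entry<String, String> e : event.entrySet()) {
            query.add(e.getKey() + ":" + e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("fieldA", "valA");
        event.put("fieldB", "valB");
        event.put("fieldC", "valC");
        event.put("fieldD", "valD");
        // prints fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
        System.out.println(toDisjunction(event));
    }
}
```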

To enforce the overlap between condition and event, wrap the query above in
a custom query whose scorer checks that numClauses for the matched doc is
equal to the number of matched clauses.
To get "numClauses for the matched doc" you can use FieldCache, which is damn
fast; the "number of matched clauses" can be obtained from
DisjunctionSumScorer.nrMatchers().
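To make the counting rule concrete, here is an in-memory simulation in plain Java rather than real Lucene code: a `HashMap` of posting lists stands in for the inverted index, a per-condition `numClauses` list stands in for the FieldCache lookup, and the per-condition hit counter plays the role of `nrMatchers()`. `ConjunctionMatcher` and its methods are hypothetical names, and the sketch again assumes single-valued fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConjunctionMatcher {
    // "field:value" term -> ids of the conditions containing that clause
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // numClauses per condition id (what a FieldCache lookup would return)
    private final List<Integer> numClauses = new ArrayList<>();

    int add(Map<String, String> condition) {
        int id = numClauses.size();
        numClauses.add(condition.size());
        for (Map.Entry<String, String> e : condition.entrySet()) {
            postings.computeIfAbsent(e.getKey() + ":" + e.getValue(), k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    // Returns the ids of all conditions whose every clause is satisfied by the event.
    List<Integer> match(Map<String, String> event) {
        // Disjunction over the event's terms, counting hits per condition.
        Map<Integer, Integer> matched = new HashMap<>();
        for (Map.Entry<String, String> e : event.entrySet()) {
            String term = e.getKey() + ":" + e.getValue();
            for (int id : postings.getOrDefault(term, List.of())) {
                matched.merge(id, 1, Integer::sum);
            }
        }
        // Keep a condition only if matched clauses == numClauses.
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> m : matched.entrySet()) {
            if (m.getValue().intValue() == numClauses.get(m.getKey())) {
                result.add(m.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        ConjunctionMatcher idx = new ConjunctionMatcher();
        idx.add(Map.of("fieldA", "valA", "fieldB", "valB", "fieldC", "valC")); // id 0
        idx.add(Map.of("fieldA", "valA", "fieldD", "other"));                  // id 1
        idx.add(Map.of("fieldB", "valB"));                                     // id 2
        Map<String, String> event = Map.of("fieldA", "valA", "fieldB", "valB",
                                           "fieldC", "valC", "fieldD", "valD");
        // prints [0, 2]: id 1 matched only 1 of its 2 clauses
        System.out.println(idx.match(event));
    }
}
```

A condition with zero clauses never appears in any posting list, so this sketch never returns it.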

Negative clauses and multivalued fields can be covered too, I believe.
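This is stated only as a belief, so the following is one possible extension and not something taken from the thread or the paper: a hypothetical `SignedConjunction` check in which an event field may carry several values, every positive clause must be present, and any matching negative (NOT) clause vetoes the condition. It evaluates a single condition; in an index-driven version, conditions made only of negative clauses would not be retrieved by the OR query and would need separate handling, which this sketch does not attempt.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical condition with positive and negated "field:value" clauses.
final class SignedConjunction {
    final Set<String> positive; // all of these must be present in the event
    final Set<String> negative; // none of these may be present in the event

    SignedConjunction(Set<String> positive, Set<String> negative) {
        this.positive = Set.copyOf(positive);
        this.negative = Set.copyOf(negative);
    }

    // event: field -> set of values, so multivalued fields are allowed.
    boolean matches(Map<String, Set<String>> event) {
        int matchedPositive = 0;
        for (Map.Entry<String, Set<String>> e : event.entrySet()) {
            for (String value : e.getValue()) {
                String term = e.getKey() + ":" + value;
                if (negative.contains(term)) {
                    return false; // a satisfied NOT clause rules the condition out
                }
                if (positive.contains(term)) {
                    matchedPositive++; // each distinct term is counted at most once
                }
            }
        }
        // The numClauses rule, applied to positive clauses only.
        return matchedPositive == positive.size();
    }

    public static void main(String[] args) {
        SignedConjunction c = new SignedConjunction(
                Set.of("country:US", "device:mobile"), Set.of("category:gambling"));
        Map<String, Set<String>> news = Map.of(
                "country", Set.of("US"), "device", Set.of("mobile"),
                "category", Set.of("news", "sports"));
        Map<String, Set<String>> casino = Map.of(
                "country", Set.of("US"), "device", Set.of("mobile"),
                "category", Set.of("news", "gambling"));
        System.out.println(c.matches(news));   // true
        System.out.println(c.matches(casino)); // false
    }
}
```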

WDYT?

On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delgado@gmail.com> wrote:

> I looked at LUCENE-2987 and its work on the query side (changes to the
> query syntax to accept lowercase 'or' and 'and'), which isn't really
> related to my proposal.
>
> What I'm proposing is to be able to index complex boolean expressions
> using Lucene. This can be viewed as the opposite of the regular search
> task. The objective here is to find a set of relevant queries given a
> document (an assignment of values to fields).
>
> This by itself may not sound that interesting, but it's a key piece
> in efficiently implementing any MATCHING system, which is effectively a
> two-way search where constraints are defined in both directions. Examples
> of this would be:
>
> 1) Job matching: Potential employers define their "job posting" as
> documents along with complex boolean expressions used to narrow down
> potential candidates. Job seekers upload their "profile" and may formulate
> complex queries when executing a search. Once a search is initiated from
> either side, constraints need to be satisfied both ways.
> 2) Advertising: Publishers define constraints on the type of
> advertisers/ads they are willing to show on their sites. On the other hand,
> advertisers define constraints (typically at the campaign level) on the
> publisher sites they want their ads to show on, as well as on the user
> audiences they are targeting. While some attribute values are known at
> definition time, others are only instantiated once the user visits a given
> page, which triggers a matching request that must be satisfied within a
> few milliseconds to select "valid" ads, which are then scored based on
> "relevance".
>
> So in a matching system a MATCH QUERY is considered to be a tuple that
> consists of a value assignment to attributes/fields (doc) + a boolean
> expression (query), which goes against a double index also built on tuples
> that simultaneously hold boolean expressions and associated documents.
>
> To do this efficiently we need to be able to build indexes on Boolean
> expressions (Lucene queries) and retrieve the set of matching expressions
> given a doc (typically a few attributes with values assigned), which is the
> core of what is described in this paper: "Indexing Boolean Expressions"
> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>
> -- J
>
>
> So to effectively resolve the problem of realtime matching one can
>
> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <calcmaster16@gmail.com> wrote:
>
>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>
>>>  So if Aayush Kothari is interested in working on this as a Student,
>>> all we need is a formal mentor (I can be the informal one).
>>>
>>>  Anyone up for the task?
>>>
>>>
>> Completely interested in working on and learning about the
>> aforementioned subject/project. +1.
>>
>> This may be related to the work I'm doing on LUCENE-2987:
>> basically, changing the grammar to accept the conjunctions AND and OR in
>> the query text.
>> I would be interested in working with you on some of the details.
>>
>> However, I too am not a formal committer.
>>
>> --
>> Joe Cabrera
>> eminorlabs.com
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
