lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "J. Delgado" <joaquin.delg...@gmail.com>
Subject Re: Indexing Boolean Expressions
Date Mon, 26 Mar 2012 17:07:50 GMT
In full dislosure, there is a patent application that Yahoo! has filed for
the use of inverted indexes for using complex  predicates for matching
contracts and opportunities in advertising:
http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false

However I believe there are many more applications that can benefit from
similar matching techniques (i.e. recommender systems,
e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
exposed in the original VLDB'09 paper (which is public) in Lucene.

As a Yahoo! employee, I might not be able to directly contribute to this
project but will be happy to point to any publicly available pointer that
can help.

Cheers,

-- Joaquin


On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Hello Joaquin,
>
> I looked through the paper several times, and see no problem to implement
> it in Lucene (the trivial case at least):
>
> Let's index conjunctive condition as
>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>
> then, form query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>
> to enforce overlap between condition and event, wrap the query above into
> own query whose scorer will check that numClauses for the matched doc is
> equal to number of matched clauses.
> To get "numClauses for the matched doc" you can use FieldCache that's damn
> fast; and "number of matched clauses" can be obtained from
> DisjunctionSumScorer.nrMatchers()
>
> Negative clauses, and multivalue can be covered also, I believe.
>
> WDYT?
>
>
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delgado@gmail.com>wrote:
>
>> I looked at LUCENE-2987 and its work on the query side (changes to the
>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>> related to my proposal.
>>
>> What I'm proposing is to be able to index complex boolean expressions
>> using Lucene. This can be viewed as the opposite of the regular search
>> task. The objective here is find a set of relevant queries given a document
>> (assignment of values to fields).
>>
>> This by itself may not sound that interesting but its a key piece
>> to efficiently implementing any MATCHING system which is effectively a
>> two-way search where constraints are defined both-ways. An example of this
>> would be:
>>
>> 1) Job matching: Potential employers define their "job posting" as a
>> documents along with complex boolean expressions used to narrow potential
>> candidates. Job searchers upload their "profile" and may formulate complex
>> queries when executing a search. Once a is search initiated from any of the
>> sides constraints need to satisfied both ways.
>> 2) Advertising: Publishers define constraints on the type of
>> advertisers/ads they are willing to show in their sites. On the other hand,
>> advertisers define constraints (typically at the campaign level) on
>> publisher sites they want their ads to show at as well as on the user
>> audiences they are targeting to. While some attribute values are known at
>> definition time, others are only instantiated once the user visits a given
>> page which triggers a matching request that must be satisfied in
>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>
>> So in a matching system a MATCH QUERY is considered to to be a tuple that
>> consists of a value assignment to attributes/fields (doc) + a boolean
>> expression (query) that goes against a double index also built on tuples
>> that  simultaneously boolean expressions and associated documents.
>>
>> To do this efficiently we need to be able to build indexes on Boolean
>> expressions (Lucene queries) and retrieve the set of matching expressions
>> given a doc (typically few attributes with values assigned), which is the
>> core of what is described in this paper: "Indexing Boolean Expressions"
>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>
>> -- J
>>
>>
>> So to effectively resolve the problem of realtime matching one can
>>
>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <calcmaster16@gmail.com>wrote:
>>
>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>
>>>
>>>
>>>
>>>>  So if Aayush Kothari is interested in working on this as a Student,
>>>> all we need is a formal mentor (I can be the informal one).
>>>>
>>>>  Anyone up for the task?
>>>>
>>>>
>>>>   Completely interested in working for and learning about the
>>> aforementioned subject/project. +1.
>>>
>>> This may be related to the work I'm doing with LUCENE-2987
>>> Basically changing the grammar to accepts conjunctions AND and OR in the
>>> query text.
>>> I would be interested in working with you on some of the details.
>>>
>>> However, I too am not a formal committer.
>>>
>>> --
>>> Joe Cabreraeminorlabs.com
>>>
>>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
>
>

Mime
View raw message