lucene-dev mailing list archives

From "J. Delgado" <>
Subject Re: Indexing Boolean Expressions
Date Mon, 05 Mar 2012 18:05:53 GMT
I looked at LUCENE-2987 and its work on the query side (changes to the
accepted syntax to accept lower case 'or' and 'and'), which isn't really
related to my proposal.

What I'm proposing is to be able to index complex boolean expressions using
Lucene. This can be viewed as the opposite of the regular search task. The
objective here is to find a set of relevant queries given a document
(assignment of values to fields).

This by itself may not sound that interesting, but it's a key piece
of efficiently implementing any MATCHING system, which is effectively a
two-way search where constraints are defined both ways. Examples of this
would be:

1) Job matching: Potential employers define their "job posting" as a
document along with complex boolean expressions used to narrow potential
candidates. Job searchers upload their "profile" and may formulate complex
queries when executing a search. Once a search is initiated from either
side, constraints need to be satisfied both ways.
2) Advertising: Publishers define constraints on the type of
advertisers/ads they are willing to show on their sites. On the other hand,
advertisers define constraints (typically at the campaign level) on the
publisher sites they want their ads to show on, as well as on the user
audiences they are targeting. While some attribute values are known at
definition time, others are only instantiated once the user visits a given
page, which triggers a matching request that must be satisfied within a
few milliseconds to select "valid" ads, which are then scored based on
"relevance".

So in a matching system a MATCH QUERY is considered to be a tuple that
consists of a value assignment to attributes/fields (doc) + a boolean
expression (query) that goes against a double index also built on tuples
that simultaneously contain boolean expressions and associated documents.
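
To make the tuple idea concrete, here is a minimal sketch in Python of a
two-way match done as a plain linear scan. Everything in it (MatchTuple,
two_way_match, the field names and sample values) is made up for
illustration and is not Lucene API; boolean expressions are modeled as
ordinary Python predicates rather than Lucene queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, str]            # value assignment to attributes/fields
Expr = Callable[[Doc], bool]    # boolean expression evaluated against a doc


@dataclass
class MatchTuple:
    """Hypothetical match tuple: a doc plus a constraint on the other side."""
    name: str
    doc: Doc
    expr: Expr


def two_way_match(query: MatchTuple, stored: List[MatchTuple]) -> List[str]:
    """Return names of stored tuples whose constraints hold in both directions."""
    return [
        t.name
        for t in stored
        if query.expr(t.doc) and t.expr(query.doc)
    ]


# Job postings: each has its own attributes and a constraint on candidates.
jobs = [
    MatchTuple(
        "java-dev",
        {"role": "developer", "city": "NYC"},
        lambda d: d.get("skill") == "java" and d.get("degree") in {"BS", "MS"},
    ),
    MatchTuple(
        "data-analyst",
        {"role": "analyst", "city": "NYC"},
        lambda d: d.get("skill") == "sql",
    ),
]

# A candidate: a profile plus a constraint on job postings.
candidate = MatchTuple(
    "alice",
    {"skill": "java", "degree": "MS"},
    lambda d: d.get("city") == "NYC" and d.get("role") == "developer",
)

print(two_way_match(candidate, jobs))  # ['java-dev']
```

The scan evaluates every stored expression on every request, which is
exactly the cost an index on the expressions is meant to avoid.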

To do this efficiently we need to be able to build indexes on Boolean
expressions (Lucene queries) and retrieve the set of matching expressions
given a doc (typically a few attributes with values assigned), which is the
core of what is described in this paper: "Indexing Boolean Expressions"
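
As a rough illustration of the retrieval direction (doc in, matching
expressions out), here is a simplified counting-style inverted index in
Python. It is a sketch under strong assumptions, not the algorithm from
the paper and not Lucene code: it only handles conjunctions of equality
predicates, and ConjunctionIndex and the sample expressions are
hypothetical names. More general expressions would first have to be
normalized (for example into a disjunction of such conjunctions).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Conjunction = Dict[str, str]  # field -> required value; all must hold


class ConjunctionIndex:
    """Hypothetical inverted index over conjunctions of field=value terms."""

    def __init__(self) -> None:
        # (field, value) -> ids of expressions that require that pair
        self.postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        # expression id -> number of predicates it contains
        self.size: Dict[str, int] = {}

    def add(self, expr_id: str, conj: Conjunction) -> None:
        if not conj:
            raise ValueError("empty conjunctions are not supported in this sketch")
        self.size[expr_id] = len(conj)
        for field, value in conj.items():
            self.postings[(field, value)].add(expr_id)

    def match(self, doc: Dict[str, str]) -> List[str]:
        """Return ids of expressions fully satisfied by the doc."""
        hits: Dict[str, int] = defaultdict(int)
        # Only posting lists for the doc's own (field, value) pairs are read.
        for pair in doc.items():
            for expr_id in self.postings.get(pair, ()):
                hits[expr_id] += 1
        # An expression matches when every one of its predicates was hit.
        return sorted(e for e, n in hits.items() if n == self.size[e])


index = ConjunctionIndex()
index.add("ad1", {"age": "3", "state": "NY"})
index.add("ad2", {"state": "CA"})
index.add("ad3", {"age": "3"})

print(index.match({"age": "3", "state": "NY", "gender": "F"}))  # ['ad1', 'ad3']
```

The work per request depends on the posting lists touched by the doc's
few attribute values rather than on the total number of stored
expressions.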

-- J

So to effectively resolve the problem of realtime matching one can

On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <> wrote:

>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>  So if Aayush Kothari is interested in working on this as a Student, all
>> we need is a formal mentor (I can be the informal one).
>>  Anyone up for the task?
>>   Completely interested in working for and learning about the
> aforementioned subject/project. +1.
> This may be related to the work I'm doing with LUCENE-2987
> Basically changing the grammar to accept conjunctions AND and OR in the
> query text.
> I would be interested in working with you on some of the details.
> However, I too am not a formal committer.
> --
> Joe
