lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Walter Underwood <wun...@wunderwood.org>
Subject Re: Indexing Boolean Expressions
Date Mon, 26 Mar 2012 17:17:38 GMT
Efficient rule matching goes further back, at least to "alerting" in Verity K2.

wunder
Search Guy, Chegg

On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:

> BTW, the idea of indexing Boolean Expressions inside a text indexing engine is not new.
For example Oracle Text provides the CTXRULE index and the MATCHES operator within their indexing
stack, which is primarily used for Rule-based text classification.
> 
> See:
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
> 
> -- J
> 
> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <joaquin.delgado@gmail.com> wrote:
> In full dislosure, there is a patent application that Yahoo! has filed for the use of
inverted indexes for using complex  predicates for matching contracts and opportunities in
advertising:
> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
> 
> However I believe there are many more applications that can benefit from similar matching
techniques (i.e. recommender systems, e-commerce, recruiting,etc) to make it worthwhile implementing
the ideas exposed in the original VLDB'09 paper (which is public) in Lucene.
> 
> As a Yahoo! employee, I might not be able to directly contribute to this project but
will be happy to point to any publicly available pointer that can help.
> 
> Cheers,
> 
> -- Joaquin
> 
> 
> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <mkhludnev@griddynamics.com>
wrote:
> Hello Joaquin,
> 
> I looked through the paper several times, and see no problem to implement it in Lucene
(the trivial case at least):
> 
> Let's index conjunctive condition as
>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
> 
> then, form query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
> 
> to enforce overlap between condition and event, wrap the query above into own query whose
scorer will check that numClauses for the matched doc is equal to number of matched clauses.

> To get "numClauses for the matched doc" you can use FieldCache that's damn fast; and
"number of matched clauses" can be obtained from DisjunctionSumScorer.nrMatchers()
> 
> Negative clauses, and multivalue can be covered also, I believe.  
> 
> WDYT?
> 
> 
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delgado@gmail.com> wrote:
> I looked at LUCENE-2987 and its work on the query side (changes to the accepted syntax
to accept lower case 'or' and 'and'), which isn't really related to my proposal.
> 
> What I'm proposing is to be able to index complex boolean expressions using Lucene. This
can be viewed as the opposite of the regular search task. The objective here is find a set
of relevant queries given a document (assignment of values to fields).
> 
> This by itself may not sound that interesting but its a key piece to efficiently implementing
any MATCHING system which is effectively a two-way search where constraints are defined both-ways.
An example of this would be:
> 
> 1) Job matching: Potential employers define their "job posting" as a documents along
with complex boolean expressions used to narrow potential candidates. Job searchers upload
their "profile" and may formulate complex queries when executing a search. Once a is search
initiated from any of the sides constraints need to satisfied both ways. 
> 2) Advertising: Publishers define constraints on the type of advertisers/ads they are
willing to show in their sites. On the other hand, advertisers define constraints (typically
at the campaign level) on publisher sites they want their ads to show at as well as on the
user audiences they are targeting to. While some attribute values are known at definition
time, others are only instantiated once the user visits a given page which triggers a matching
request that must be satisfied in few milliseconds to select "valid" ads and then scored based
on "relevance".
> 
> So in a matching system a MATCH QUERY is considered to to be a tuple that consists of
a value assignment to attributes/fields (doc) + a boolean expression (query) that goes against
a double index also built on tuples that  simultaneously boolean expressions and associated
documents.
> 
> To do this efficiently we need to be able to build indexes on Boolean expressions (Lucene
queries) and retrieve the set of matching expressions given a doc (typically few attributes
with values assigned), which is the core of what is described in this paper: "Indexing Boolean
Expressions" (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
> 
> -- J
> 
> 
> So to effectively resolve the problem of realtime matching one can 
> 
> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <calcmaster16@gmail.com> wrote:
> On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>> 
>> 
>> 
>> 
>> So if Aayush Kothari is interested in working on this as a Student, all we need is
a formal mentor (I can be the informal one). 
>> 
>> Anyone up for the task?
>> 
>> 
>> Completely interested in working for and learning about the aforementioned subject/project.
+1.  
> This may be related to the work I'm doing with LUCENE-2987
> Basically changing the grammar to accepts conjunctions AND and OR in the query text.
> I would be interested in working with you on some of the details.
> 
> However, I too am not a formal committer.
> 
> -- 
> Joe Cabrera
> eminorlabs.com
> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
> 





Mime
View raw message