lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Indexing Boolean Expressions
Date Mon, 11 Feb 2013 18:02:06 GMT
Yeah.... except with Percolator (which uses MemoryIndex under the hood, I
believe), there is no relevancy score, so you just get match/no match, and
if you need to figure out top N best matching ads against queries derived
from page content, you need that relevancy score to get not all matching
docs, but just those top N.

No?

Otis
--
http://sematext.com/





On Mon, Feb 11, 2013 at 11:22 AM, J. Delgado <joaquin.delgado@gmail.com>wrote:

> I guess ElasticSearch went ahead of SOLR with the percolate API, which is
> exactly what is needed for two-way constraint+doc matching problem present
> in Advertising systems and other use cases:
>
> http://www.elasticsearch.org/guide/reference/api/percolate.html
>
> Cheers,
>
> Joaquin Delgado, PhD.
> http://www.linkedin.com/pub/profile/0/04b/277
>
> On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <wunder@wunderwood.org>wrote:
>
>> Efficient rule matching goes further back, at least to "alerting" in
>> Verity K2.
>>
>> wunder
>> Search Guy, Chegg
>>
>> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>>
>> BTW, the idea of indexing Boolean Expressions inside a text indexing
>> engine is not new. For example Oracle Text provides the CTXRULE index and
>> the MATCHES operator within their indexing stack, which is primarily used
>> for Rule-based text classification.
>>
>> See:
>>
>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>>
>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>>
>> -- J
>>
>> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <joaquin.delgado@gmail.com>wrote:
>>
>>> In full dislosure, there is a patent application that Yahoo! has filed
>>> for the use of inverted indexes for using complex  predicates for matching
>>> contracts and opportunities in advertising:
>>>
>>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>>
>>> However I believe there are many more applications that can benefit from
>>> similar matching techniques (i.e. recommender systems,
>>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
>>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>>
>>> As a Yahoo! employee, I might not be able to directly contribute to this
>>> project but will be happy to point to any publicly available pointer that
>>> can help.
>>>
>>> Cheers,
>>>
>>> -- Joaquin
>>>
>>>
>>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>>> mkhludnev@griddynamics.com> wrote:
>>>
>>>> Hello Joaquin,
>>>>
>>>> I looked through the paper several times, and see no problem to
>>>> implement it in Lucene (the trivial case at least):
>>>>
>>>> Let's index conjunctive condition as
>>>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>>
>>>> then, form query from the incoming fact (event):
>>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>>
>>>> to enforce overlap between condition and event, wrap the query above
>>>> into own query whose scorer will check that numClauses for the matched doc
>>>> is equal to number of matched clauses.
>>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>>> damn fast; and "number of matched clauses" can be obtained from
>>>> DisjunctionSumScorer.nrMatchers()
>>>>
>>>> Negative clauses, and multivalue can be covered also, I believe.
>>>>
>>>> WDYT?
>>>>
>>>>
>>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delgado@gmail.com>wrote:
>>>>
>>>>> I looked at LUCENE-2987 and its work on the query side (changes to the
>>>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>>>>> related to my proposal.
>>>>>
>>>>> What I'm proposing is to be able to index complex boolean expressions
>>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>>> task. The objective here is find a set of relevant queries given a document
>>>>> (assignment of values to fields).
>>>>>
>>>>> This by itself may not sound that interesting but its a key piece
>>>>> to efficiently implementing any MATCHING system which is effectively
a
>>>>> two-way search where constraints are defined both-ways. An example of
this
>>>>> would be:
>>>>>
>>>>> 1) Job matching: Potential employers define their "job posting" as a
>>>>> documents along with complex boolean expressions used to narrow potential
>>>>> candidates. Job searchers upload their "profile" and may formulate complex
>>>>> queries when executing a search. Once a is search initiated from any
of the
>>>>> sides constraints need to satisfied both ways.
>>>>> 2) Advertising: Publishers define constraints on the type of
>>>>> advertisers/ads they are willing to show in their sites. On the other
hand,
>>>>> advertisers define constraints (typically at the campaign level) on
>>>>> publisher sites they want their ads to show at as well as on the user
>>>>> audiences they are targeting to. While some attribute values are known
at
>>>>> definition time, others are only instantiated once the user visits a
given
>>>>> page which triggers a matching request that must be satisfied in
>>>>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>>>>
>>>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>>>> expression (query) that goes against a double index also built on tuples
>>>>> that  simultaneously boolean expressions and associated documents.
>>>>>
>>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>>> given a doc (typically few attributes with values assigned), which is
the
>>>>> core of what is described in this paper: "Indexing Boolean Expressions"
>>>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>>
>>>>> -- J
>>>>>
>>>>>
>>>>> So to effectively resolve the problem of realtime matching one can
>>>>>
>>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <calcmaster16@gmail.com>wrote:
>>>>>
>>>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>  So if Aayush Kothari is interested in working on this as a
>>>>>>> Student, all we need is a formal mentor (I can be the informal
one).
>>>>>>>
>>>>>>>  Anyone up for the task?
>>>>>>>
>>>>>>>
>>>>>>>   Completely interested in working for and learning about the
>>>>>> aforementioned subject/project. +1.
>>>>>>
>>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>>> Basically changing the grammar to accepts conjunctions AND and OR
in
>>>>>> the query text.
>>>>>> I would be interested in working with you on some of the details.
>>>>>>
>>>>>> However, I too am not a formal committer.
>>>>>>
>>>>>> --
>>>>>> Joe Cabreraeminorlabs.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Lucid Certified
>>>> Apache Lucene/Solr Developer
>>>> Grid Dynamics
>>>>
>>>>
>>
>>
>>
>

Mime
View raw message