lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: [ot] a reverse lucene
Date Sun, 23 Nov 2008 13:02:23 GMT
The "formal" name for this stuff is "document filtering" or just  
"filtering".  You can start on it, by looking at TREC, which had a  
filtering task for a number of years: http://trec.nist.gov/tracks.html

At any rate, one approach is to store your queries as Lucene  
documents, albeit short ones.  Then, as others have said, you index  
new, incoming docs into a Memory Index.  From that, you can extract  
the key terms which can then be used to come up with a Query to be run  
against your "query" index.  The MoreLikeThis functionality should  
help in determining the important terms.  Then, you need to decide how  
to handle dealing with the results.  You probably don't want to route  
the document to each and every query that matches.

-Grant

On Nov 23, 2008, at 2:35 AM, Ian Holsman wrote:

> Anshum wrote:
>> Hi Ian,
>> I guess that could be achieved if you write code to read the  
>> queries and
>> query for each document (using lucene).
>> Assuming that I got the question right! :)
>>
>>
>
> yes.. that is one way, but probably not the most efficient one.
>
> think of something like http://www.google.com/alerts, but instead of  
> running once a day, it would run each time it sees a document. (as- 
> it-happens mode)
> and you would have a couple of million queries to run through.
>
> regards
> Ian
>> --
>> Anshum Gupta
>> Naukri Labs!
>> http://ai-cafe.blogspot.com
>>
>> The facts expressed here belong to everybody, the opinions to me. The
>> distinction is yours to draw............
>>
>>
>> On Sun, Nov 23, 2008 at 9:27 AM, Ian Holsman <lists@holsman.net>  
>> wrote:
>>
>>
>>> Hi. apologies for the off-topic question.
>>>
>>> I was wondering if anyone knew of a open source solution (or a  
>>> pointer to
>>> the algorithms)
>>> that do the reverse of lucene.
>>> By that I mean store a whole lot of queries, and run them against a
>>> document to see which queries match it. (with a score etc)
>>>
>>> I can see the case for this would be a news-article and several  
>>> people
>>> writing queries to get alerted if it matched a certain condition.
>>>
>>>
>>> Regards
>>> Ian
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message