lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Alert function (aka "profiled alerting")
Date Wed, 16 Mar 2005 18:37:59 GMT
On Wednesday 16 March 2005 18:26, Robert Watkins wrote:
> We are considering Lucene as a replacement for Verity K2 (I won't go
> into the myriad reasons, other than to highlight that the K2 Java API
> is riddled with errors, falsities and just plain stupidity [okay,
> I've had my rant]) and figure that Lucene can do what we are
> currently using K2 for, except for what Verity calls "profiling"
> (matching incoming documents against a stored index of queries).
> 
> The very question I need answered was asked on 31 October, 2001 in
> a message with the subject "Alert function" (message ID 116006, I do
> believe), but there was never an answer. I will, in fact, quote this
> message, as it states the question very well indeed:
> 
> > One thing I need is an alert function, that is, instead of searching
> > with a query on a lot of documents, I want all incoming documents to be
> > searched against a bunch of stored search queries (called "agents").
> >
> > The simple solution is to index the document and then iterate through
> > all agents, but that isn't very scalable (say there are 10 new documents
> > a minute, and 10000 stored agents...).
> >
> > The way commercial products like Verity and Autonomy handles this is to
> > store the search queries (the agents) instead, and then use the incoming
> > document as a search query against the stored agents. This way it
> > becomes extremely scalable.
> >
> > Could this be a solution with Lucene too, and how would that look like?
> 
> This was originally posted by a Christian Ubbesen. I have tried to contact
> him, to see if he had managed to solve this, but have not heard back (hey,
> he could well have moved on in 3.5 years!)

There are two ways to make this scalable using Lucene without turning
things upside down (i.e. without querying the alert queries with the new
documents as queries):
- collect the new documents in a temporary index, run the alert queries
  on that index, and then merge the new and the old index, possibly deleting
  old documents in the old index just before merging.
- create a filter for the new documents in the updated index and use that in a
  FilteredQuery for each alert on the new index. For very large indexes and
  relatively small numbers of new documents, the FilteredQuery proposed here
  scales better than the current one:
  http://issues.apache.org/bugzilla/show_bug.cgi?id=32965
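
To make the two options concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene API: the "index" is just a map from
document id to a set of terms, and an alert is modeled as a set of terms
that must all occur. The class name TempIndexAlerts and the methods
buildIndex and runAlerts are hypothetical names made up for illustration.
In both options the alerts only ever visit the new documents, so an old
document that happens to match is not reported again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TempIndexAlerts {

    // Toy stand-in for an index (NOT a Lucene index):
    // document id -> set of lower-cased terms.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            String[] terms = e.getValue().toLowerCase().split("\\W+");
            index.put(e.getKey(), new HashSet<>(Arrays.asList(terms)));
        }
        return index;
    }

    // Run every alert (modeled as an AND of terms) against the documents
    // listed in docIds only. docIds plays the role of the filter.
    static Map<String, List<String>> runAlerts(Map<String, Set<String>> alerts,
                                               Map<String, Set<String>> index,
                                               Set<String> docIds) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> alert : alerts.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String id : docIds) {
                Set<String> terms = index.get(id);
                if (terms != null && terms.containsAll(alert.getValue())) {
                    matched.add(id);
                }
            }
            hits.put(alert.getKey(), matched);
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> oldDocs = new LinkedHashMap<>();
        oldDocs.put("old-1", "Lucene release notes from last year");

        Map<String, String> newDocs = new LinkedHashMap<>();
        newDocs.put("new-1", "Lucene release announced");
        newDocs.put("new-2", "Verity K2 API notes");

        Map<String, Set<String>> alerts = new LinkedHashMap<>();
        alerts.put("lucene-news", new HashSet<>(Arrays.asList("lucene", "release")));
        alerts.put("k2", new HashSet<>(Arrays.asList("verity", "k2")));

        // Option 1: temporary index for the new documents, run alerts, then merge.
        Map<String, Set<String>> temp = buildIndex(newDocs);
        Map<String, List<String>> viaTempIndex = runAlerts(alerts, temp, temp.keySet());
        Map<String, Set<String>> merged = buildIndex(oldDocs);
        merged.putAll(temp);

        // Option 2: run alerts on the merged index, restricted to the new ids.
        Map<String, List<String>> viaFilter = runAlerts(alerts, merged, temp.keySet());

        System.out.println(viaTempIndex);
        System.out.println(viaFilter);
    }
}
```

In real Lucene code the temporary index would be a second index that is
merged into the main one, and the set of new ids would be a filter over the
updated index. The sketch only shows the control flow, not the API.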

Regards,
Paul Elschot

