lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <>
Subject Re: Alert function (aka "profiled alerting")
Date Thu, 17 Mar 2005 17:13:28 GMT
Robert Watkins wrote:

> Thank you, David, for your very interesting suggestions.

You're welcome, it's a fun problem.

Unless there's a requirement to the contrary, I would try to only 
execute the queries, say, 1x/day, and try to avoid all the cool 
optimizations by default - this uses one of those first principles of 
optimization, "don't do it".

I'd also suggest you look at how you invoke the (Lucene) JVM - a 
combination of the -server flag, a big argument to -Xmx (max memory), 
use of a RAMDirectory, and maybe even -XX:CompileThreshold=10 should 
give you better-than-default performance. The last secret arg tells the 
JVM to compile methods sooner.

>  As I said
> earlier in this thread somewhere, we are still at the exploratory
> stage (considering Lucene as a replacement for a commercial engine)
> so it will be some time before I can get my hands dirty, but you have
> certainly given me some good ideas. Answers to your questions are
> below.
> -- Robert
> On Thu, 17 Mar 2005, David Spencer wrote:
>>Fun, interesting question - maybe you could elaborate on the
>>requirements a bit.
> We deliver on-line content -- journals, reference works and the like.
> Users can save their own queries and set an alert on any of the saved
> queries. As new documents get published (quite a few each day) they
> are matched against the saved queries and emails are generated to let
> people know that such-and-such an article was matched by one of their
> saved queries.
>>[ snipped ]
>>- How complicated are the queries? Are they essentially a list of words
>>ANDed together, or are they generalized queries against multiple fields
>>with things like fuzzy term expansion and phrase matches allowed?
> The queries can achieve any level of complexity, as they are written by
> users. There are certainly multiple fields, wildcards, sounds-like, phrase
> matches, etc. etc.
>>- How big are the incoming documents?
> While most of the incoming documents are relatively small (under 100 Kb),
> some can be fairly large (500 Kb and more)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message