lucene-java-user mailing list archives

From Daniel Noll <>
Subject Re: Inappropriate content detection
Date Mon, 06 Feb 2006 05:01:14 GMT
Jeff Thorne wrote:
> I am trying to figure out whether or not Lucene is an appropriate solution
> for a problem that our site faces.
> I would like to analyze each user's post for various words and expressions
> before publishing their post to the DB. I am reading through the Lucene in
> Action book and it looks as if I cannot analyze a string without first
> indexing it. If this is true will indexing each post be a performance hit to
> the site? I was wondering if someone could shed some light on the best way
> to tackle this problem with Lucene or another api if doing so makes more
> sense?

You can definitely use Lucene's analyser classes without indexing.  Our 
own application does this when it needs to do things like highlighting 
text on the screen.

The idea would be you'd have a bunch of terms which are considered 
nasty, and then every new document would get analysed, and you would 
look through the terms returned from the analyser for the suspicious ones.
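
As a rough illustration of that idea, here is a minimal sketch in plain Java. 
It does not call Lucene at all: the analyse() method is a hypothetical 
stand-in for an analyser (it only lower-cases and splits on non-alphanumeric 
characters), and the class name, method names and word list are all made up 
for the example. In a real application the terms would come from whichever 
Lucene analyser you already use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class PostScreener {
    // Hypothetical stand-in for an analyser: lower-case the text and
    // split it on anything that is not a letter or a digit.
    static Set<String> analyse(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Returns the terms of the post that also appear in the banned set,
    // in the order they first occur in the post. Nothing is indexed.
    static Set<String> findFlagged(String post, Set<String> banned) {
        Set<String> hits = analyse(post);
        hits.retainAll(banned);
        return hits;
    }

    public static void main(String[] args) {
        // Example word list, invented for this sketch.
        Set<String> banned = new HashSet<>(Arrays.asList("spam", "scam"));
        System.out.println(findFlagged("This is SPAM, not a scam!", banned));
    }
}
```

A post would be held back whenever findFlagged() returns a non-empty set. 
Because the banned terms live in a hash set, the cost per post depends on the 
length of the post and not on the size of the word list.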

But no, it certainly isn't something that Lucene as a whole is very good 
at solving.  Lucene is fast for executing a single query against 
multiple documents, but what you really need is something fast for 
executing multiple queries against a single document.
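
To make the "multiple queries against a single document" shape concrete, here 
is a small sketch, again in plain Java and not using any Lucene API. It 
tokenises one post once, records where each token occurs, and then tests a 
list of expressions (single words or multi-word phrases) against those 
positions. The class name, the tokeniser and the sample expressions are 
assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ExpressionMatcher {
    // Hypothetical tokeniser: lower-case, split on non-alphanumerics.
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Runs every expression against one post and returns those that occur
    // as a run of consecutive tokens, in the order they were supplied.
    static List<String> matchAll(String post, List<String> expressions) {
        List<String> tokens = tokenise(post);

        // Build the position table once per post: token -> positions.
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            positions.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }

        List<String> matched = new ArrayList<>();
        for (String expression : expressions) {
            List<String> words = tokenise(expression);
            if (words.isEmpty()) {
                continue;
            }
            // Only try the places where the first word of the expression occurs.
            for (int start : positions.getOrDefault(words.get(0), Collections.emptyList())) {
                int end = start + words.size();
                if (end <= tokens.size() && tokens.subList(start, end).equals(words)) {
                    matched.add(expression);
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> expressions = Arrays.asList("cheap pills", "free money", "now");
        System.out.println(matchAll("Buy cheap pills now, cheap pills!", expressions));
    }
}
```

This still loops over every expression for every post, so it is only a 
starting point. With a very large list of expressions you would want a 
structure built over the expressions themselves, so that one pass over the 
post finds all matches.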


Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902
