lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "gekkokid">
Subject Re: Inappropriate content detection
Date Mon, 06 Feb 2006 04:39:59 GMT
Hi, what scale is this website? millions of posts or under?

wouldn't it be easiler to use a bayesian algorithm to scan each new post 
before it is posted to detect whether it is acceptable or not? just a quick 
idea of my head


----- Original Message ----- 
From: "Jeff Thorne" <>
To: <>
Sent: Monday, February 06, 2006 3:56 AM
Subject: Inappropriate content detection

>I am trying to figure out whether or not Lucene is an appropriate solution
> for a problem that our site faces. Our site
> allows users to post their opinions on various topics. Due to various
> government legislations around the world our management would like us to
> scan each users post against various keywords that would indicate
> inappropriate content
> in the users posting. We are looking for racial slurs, profanity and 
> attacks
> against sexual orientation. Each users posting is
> generally not more that a few paragraphs.
> I would like to analyze each users post for various words and expressions
> before publishing their post to the DB. I am reading through the Lucene in
> action book and it looks as if I cannot analyze a string without first
> indexing it. If this is true will indexing each post be a performance hit 
> to
> the site? I was wondering if someone could shed some light on the best way
> to tackle this problem with Lucene or another api if doing so makes more
> sense?
> Thanks,
> Jeff

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message