lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "gekkokid" ...@gekkokid.org.uk>
Subject Re: Inappropriate content detection
Date Mon, 06 Feb 2006 04:39:59 GMT
Hi, what scale is this website? millions of posts or under?

wouldn't it be easiler to use a bayesian algorithm to scan each new post 
before it is posted to detect whether it is acceptable or not? just a quick 
idea of my head



_gk

----- Original Message ----- 
From: "Jeff Thorne" <jeff_thorne@yahoo.com>
To: <java-user@lucene.apache.org>
Sent: Monday, February 06, 2006 3:56 AM
Subject: Inappropriate content detection


>I am trying to figure out whether or not Lucene is an appropriate solution
> for a problem that our site faces. Our site
>
> allows users to post their opinions on various topics. Due to various
> government legislations around the world our management would like us to
> scan each users post against various keywords that would indicate
> inappropriate content
>
> in the users posting. We are looking for racial slurs, profanity and 
> attacks
> against sexual orientation. Each users posting is
>
> generally not more that a few paragraphs.
>
>
>
> I would like to analyze each users post for various words and expressions
> before publishing their post to the DB. I am reading through the Lucene in
> action book and it looks as if I cannot analyze a string without first
> indexing it. If this is true will indexing each post be a performance hit 
> to
> the site? I was wondering if someone could shed some light on the best way
> to tackle this problem with Lucene or another api if doing so makes more
> sense?
>
>
>
> Thanks,
>
> Jeff
>
>
>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message