lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Negative Filtering (such as for profanity)
Date Wed, 07 Mar 2007 19:07:38 GMT
Not sure if this is helpful given your proposed solution, but could you  
do something on the indexing side, such as:

1.  Remove the profanity from the token stream, much like a  
stopword.  This would also mean stripping it from the display text.
2. If your TokenFilter comes across a profanity, somehow mark the  
document as containing a profanity via a "profanity" Field (not sure  
if there is a way, in Lucene, to add another Field while you are in  
the analysis phase, but you could also have it update a table in a db  
or something.)  Then on search, you could just say (regular query)  
+profanity:false
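
As a rough illustration of the two options above, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: ProfanityScan, scan, Result, and the placeholder word list are all made-up names for illustration. In a real setup the same check would presumably sit in a TokenFilter (option 1) or run before the Document is built, so that the flag can be stored in a "profanity" Field (option 2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProfanityScan {
    // Placeholder word list (assumption); a real list would come from configuration.
    static final Set<String> PROFANE = Set.of("darn", "heck");

    /** Outcome of scanning one document's text. */
    static final class Result {
        final List<String> cleanTokens; // option 1: tokens with profanity dropped
        final boolean profanity;        // option 2: value for a "profanity" field

        Result(List<String> cleanTokens, boolean profanity) {
            this.cleanTokens = cleanTokens;
            this.profanity = profanity;
        }
    }

    /** Lowercases the text, splits on non-word characters, and checks each token. */
    static Result scan(String text) {
        List<String> clean = new ArrayList<>();
        boolean found = false;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            if (PROFANE.contains(token)) {
                found = true;      // remember that the document is flagged
            } else {
                clean.add(token);  // keep only non-profane tokens
            }
        }
        return new Result(clean, found);
    }

    public static void main(String[] args) {
        Result r = scan("Well, darn it all");
        System.out.println(r.cleanTokens + " profanity=" + r.profanity);
    }
}
```

Computing the flag before the document is added avoids the question of whether a Field can be added from inside the analysis phase.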

HTH,
Grant

On Mar 7, 2007, at 10:07 AM, Greg Gershman wrote:

> I'm attempting to create a profanity filter.  I thought to use a  
> QueryFilter created with a Query of (-$#!+ AND -@#$% AND etc).  The  
> problem I have run into is that, as a pure negative query is not  
> supported (a query for (-term) DOES NOT return the inverse of a  
> query for (term)), I believe the bit set returned by a purely  
> negative QueryFilter is empty, so no matter how many results  
> returned by the initial query, the result after filtering is always  
> zero documents.
>
> I was wondering if anyone had suggestions as to how else to do  
> this.  I've considered simply amending the query string submitted  
> by the user to include a pre-generated String that would exclude  
> the query terms, but I consider this a non-elegant solution.  I had  
> also thought about creating a new sub-class of QueryFilter,  
> NegativeQueryFilter.  Basically, it would work just like a  
> QueryFilter, taking a positive query (so, I would pass it an OR'ed  
> list of profane words), then the resulting bits are simply  
> flipped.  I think this would work, unless I'm missing something.   
> I'm going to experiment with it, and I'd appreciate anyone's thoughts  
> on this.
>
> Thanks,
>
> Greg
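
The bit-flipping idea described above can be sketched with plain java.util.BitSet, which, to my understanding, is the type a Filter hands back from bits(IndexReader) in Lucene of this vintage. The sketch is not wired to Lucene, and NegativeBits and negate are hypothetical names. Two assumptions are worth stating: the flip has to be bounded by the number of documents in the index (called maxDoc here), and a flipped set would also switch on the ids of deleted documents, which this sketch assumes are dealt with elsewhere.

```java
import java.util.BitSet;

public class NegativeBits {
    /**
     * Returns the complement of {@code matches} over doc ids 0..maxDoc-1.
     * The input is cloned so the caller's bit set is left untouched.
     */
    static BitSet negate(BitSet matches, int maxDoc) {
        BitSet result = (BitSet) matches.clone();
        result.flip(0, maxDoc); // flips bits in [0, maxDoc)
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 6;

        // Documents matching the OR'ed list of profane words (made-up ids).
        BitSet profane = new BitSet(maxDoc);
        profane.set(1);
        profane.set(4);

        // Everything that did NOT match the profanity query.
        BitSet allowed = negate(profane, maxDoc);

        // Hits for the user's query (made-up ids), restricted to allowed docs.
        BitSet queryHits = new BitSet(maxDoc);
        queryHits.set(0);
        queryHits.set(1);
        queryHits.set(5);
        queryHits.and(allowed);

        System.out.println(queryHits); // prints {0, 5}
    }
}
```

Because the positive query always yields a well-defined bit set, flipping it sidesteps the empty result that a purely negative query produces.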

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


