lucene-java-user mailing list archives

From: Mike Klaas
Subject: Re: Index a source, but not store it... can it be done?
Date: Fri, 09 Mar 2007 00:23:48 GMT
On 3/8/07, Chris Hostetter wrote:
> : If you store a hash code of the word rather than the actual word you
> : should be able to search for stuff but not be able to actually retrieve
> that's a really great solution ... it could even be implemented as a
> TokenFilter so none of your client code would ever even need to know that
> it was being used (just make sure it comes last after any stemming or
> whatnot)

I don't know... hashing individual words is an extremely weak form of
security that should be breakable without even using a computer... all
the statistical information is still there (somewhat like 'encrypting'
a message as a cryptoquote).
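
To make the cryptoquote comparison concrete, here is a rough Python sketch. It is not Lucene code: the sample sentence and the `hash_token` helper are made up for illustration, and SHA-1 is just a stand-in for whatever hash a TokenFilter might apply. Because each word always maps to the same digest, the frequency profile of the hashed tokens matches that of the plain tokens, so the most common digest can be guessed as the most common word.

```python
import hashlib
from collections import Counter


def hash_token(token: str) -> str:
    """Hypothetical stand-in for a hashing TokenFilter: same word, same digest."""
    return hashlib.sha1(token.encode("utf-8")).hexdigest()


# Made-up sample text; any natural-language text shows the same effect.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()
hashed = [hash_token(t) for t in tokens]

plain_counts = Counter(tokens)
hashed_counts = Counter(hashed)

# Hashing is deterministic, so the frequency profile survives untouched.
plain_profile = sorted(plain_counts.values(), reverse=True)
hashed_profile = sorted(hashed_counts.values(), reverse=True)

# An attacker who knows the language guesses that the most frequent
# digest stands for the most frequent word.
most_common_hash, _ = hashed_counts.most_common(1)[0]
guess = {most_common_hash: "the"}

print(plain_profile == hashed_profile)
print(guess[most_common_hash])
```

With a real corpus the same trick extends down the frequency ranking, and known word-length or co-occurrence patterns narrow the remaining guesses further.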

Doron's suggestion is preferable: eliminate token position information
from the index entirely.
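
As a rough illustration of why positions matter, the following Python sketch uses a toy in-memory inverted index. It is an assumption-laden model, not Lucene's actual index format or API, and `build_index` and `reconstruct` are hypothetical helpers. With positions, the original token sequence can be rebuilt exactly; without them, only an unordered bag of words with counts remains.

```python
from collections import defaultdict


def build_index(tokens, keep_positions):
    """Toy inverted index: term -> list of positions, or term -> count."""
    if keep_positions:
        index = defaultdict(list)
        for pos, tok in enumerate(tokens):
            index[tok].append(pos)
    else:
        index = defaultdict(int)
        for tok in tokens:
            index[tok] += 1
    return dict(index)


def reconstruct(positional_index):
    """Rebuild the token sequence from a positional index."""
    slots = {}
    for tok, positions in positional_index.items():
        for pos in positions:
            slots[pos] = tok
    return [slots[i] for i in range(len(slots))]


# Made-up sample document.
tokens = "meet me at the old bridge at noon".split()

with_positions = build_index(tokens, keep_positions=True)
without_positions = build_index(tokens, keep_positions=False)

print(reconstruct(with_positions) == tokens)
print(sorted(without_positions.items()))
```

The trade-off is that anything depending on word order, such as phrase queries, stops working once positions are gone, and the terms themselves are still visible in the index.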

