lucene-java-user mailing list archives

From Jason Pump
Subject Re: Index a source, but not store it... can it be done?
Date Thu, 08 Mar 2007 18:54:43 GMT
If you store a hash code of each word rather than the word itself, you 
should be able to search the index without being able to retrieve the 
original text; you can trade precision for "security" by choosing the 
number of bits in the hash code (e.g. 32 or 64 bits). I'd think a 64-bit 
hash would be a reasonable midpoint.

hash64("dog") = 4312311231123121;

"body:4312311231123121" returns every document containing "dog", but also 
any other document containing a word that hashes to the same value.
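
A minimal sketch of the idea in plain Java, with no Lucene calls. The 
message names no hash function, so FNV-1a is an assumption here, and 
HashedTerms, hash64, truncate, toTerm and hashTokens are hypothetical names 
used only for illustration. The same toTerm call would be applied to each 
token at index time and to each query word at search time; the bits 
parameter is the precision/"security" knob described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HashedTerms {
    // FNV-1a 64-bit constants. Choosing FNV-1a is an assumption;
    // any stable 64-bit hash would serve the same purpose.
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash a lower-cased word to 64 bits.
    public static long hash64(String word) {
        long h = FNV_OFFSET;
        for (byte b : word.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xffL);
            h *= FNV_PRIME;
        }
        return h;
    }

    // Keep only the low `bits` bits: fewer bits means more collisions,
    // so lower search precision but less certainty about the original word.
    public static long truncate(long hash, int bits) {
        return bits >= 64 ? hash : hash & ((1L << bits) - 1);
    }

    // The string that would be indexed (and searched) in place of the word.
    public static String toTerm(String word, int bits) {
        return Long.toUnsignedString(truncate(hash64(word), bits));
    }

    // Replace every token in a text with its hashed term.
    public static List<String> hashTokens(String text, int bits) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("\\W+")) {
            if (!tok.isEmpty()) {
                out.add(toTerm(tok, bits));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Query side: hash the search word the same way before querying.
        System.out.println("body:" + toTerm("dog", 64));
        // Index side: these terms would be indexed instead of the words.
        System.out.println(hashTokens("sell cat and dog robot toys", 32));
    }
}
```

Only the hashed terms would go into the index, so a query for "dog" has to 
be rewritten to its hashed form before it is run.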

Walt Stoneburner wrote:
> Have an interesting scenario I'd like to get your take on with respect
> to Lucene:
> A data provider (e.g. someone with a private website or corporately
> shared directory of proprietary documents) has requested their content
> be indexed with Lucene so employees can be redirected to it, but
> provisionally -- under no circumstance should that content be stored
> or recreated from the index.
> Is that even possible?
> The data owner's request makes sense in the context of them wanting to
> retain full access control via logins as well as collecting access
> metrics.
> If the token 'CAT' points to C:\Corporate\animals.doc and the token
> 'DOG' also points there, then great, CAT AND DOG will give that
> document a higher rating, though it is not possible to reconstruct
> (with any great accuracy) what the actual document content is.
> However, if for the sake of using the NEAR operator with Lucene the
> tokens are stored as LET'S:1 SELL:2 CAT:3 AND:4 DOG:5 ROBOT:6 TOYS:7
> THIS:8 DECEMBER:9 ... then someone could pull all tokens for
> animals.doc and reconstitute the token stream.
> Does Lucene have any kind of trade off for working with "secure" (and
> I use this term loosely) data?
> -wls
