lucene-java-user mailing list archives

From Jason Pump <jp...@mindspring.com>
Subject Re: Index a source, but not store it... can it be done?
Date Thu, 08 Mar 2007 18:54:43 GMT
If you store a hash code of each word rather than the word itself, you 
should be able to search the content without being able to retrieve 
it. You can trade precision for "security" through the number of bits 
in the hash code (e.g. 32 or 64 bits): fewer bits mean more collisions. 
I'd think a 64-bit hash would be a reasonable midpoint.

hash64("dog") = 4312311231123121;

"body:4312311231123121" returns the documents containing "dog", but also 
any other document containing a word that hashes to the same value.
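
A minimal sketch of the idea in plain Java, with no Lucene calls. The 
class name HashedTokens, the helpers term() and hashTokens(), and the 
choice of SHA-256 truncated to 64 bits are my own assumptions for 
illustration; any 64-bit hash would do. The same function has to be 
applied to the words at index time and to the query terms at search time. 
Keep in mind that an unkeyed hash of ordinary words can be reversed by 
hashing a dictionary, so this obscures the text rather than protecting it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HashedTokens {

    // Hypothetical hash64: the first 8 bytes of SHA-256 over the
    // lower-cased token, packed big-endian into a long.
    static long hash64(String token) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(
                    token.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // The string that would be indexed or queried in place of the word.
    static String term(String token) {
        return Long.toUnsignedString(hash64(token));
    }

    // Splits text on non-word characters and replaces each word by its hash.
    static List<String> hashTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(term(t));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // What gets indexed instead of the original words.
        System.out.println(hashTokens("Let's sell cat and dog robot toys"));
        // What a search for "dog" turns into.
        System.out.println("body:" + term("dog"));
    }
}
```

The hashed strings would be fed to the index as the field's terms, and the 
query side would call term() on each user-supplied word before searching.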


Walt Stoneburner wrote:
> Have an interesting scenario I'd like to get your take on with respect
> to Lucene:
>
> A data provider (e.g. someone with a private website or corporately
> shared directory of proprietary documents) has requested their content
> be indexed with Lucene so employees can be redirected to it, but
> provisionally -- under no circumstance should that content be stored
> or recreated from the index.
>
> Is that even possible?
>
> The data owner's request makes sense in the context of them wanting to
> retain full access control via logins as well as collecting access
> metrics.
>
> If the token 'CAT' points to C:\Corporate\animals.doc and the token
> 'DOG' also points there, then great, CAT AND DOG will give that
> document a higher rating, though it is not possible to reconstruct
> (with any great accuracy) what the actual document content is.
>
> However, if for the sake of using the NEAR operator with Lucene the
> tokens are stored as LET'S:1 SELL:2 CAT:3 AND:4 DOG:5 ROBOT:6 TOYS:7
> THIS:8 DECEMBER:9 ... then someone could pull all tokens for
> animals.doc and reconstitute the token stream.
>
> Does Lucene have any kind of trade off for working with "secure" (and
> I use this term loosely) data?
>
> -wls
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



