lucene-java-user mailing list archives

From "Walt Stoneburner"
Subject Index a source, but not store it... can it be done?
Date Thu, 08 Mar 2007 15:28:59 GMT
Have an interesting scenario I'd like to get your take on with respect
to Lucene:

A data provider (e.g. someone with a private website or a corporately
shared directory of proprietary documents) has requested that their
content be indexed with Lucene so employees can be directed to it, but
with one proviso -- under no circumstances should that content be
stored in the index or be recoverable from it.

Is that even possible?

The data owner's request makes sense in the context of them wanting to
retain full access control via logins as well as collecting access

If the token 'CAT' points to C:\Corporate\animals.doc and the token
'DOG' also points there, then great, CAT AND DOG will give that
document a higher rating, yet it is not possible to reconstruct
(with any great accuracy) what the actual document content is.
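
To make that first case concrete, here is a minimal sketch in plain Java.
It is a toy model, not Lucene's actual index format: the class
DocOnlyIndex and its methods are hypothetical names, tokenizing is a
plain whitespace split, and the query is a boolean AND with no scoring.
The index keeps only token -> document path, so the most anyone can pull
back out for a document is its vocabulary in alphabetical order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index: token -> documents only (no positions, no stored text).
public class DocOnlyIndex {
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Tokenize on whitespace, upper-case each token, and record which document it came from.
    public void add(String path, String text) {
        for (String token : text.toUpperCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
            }
        }
    }

    // Documents that contain every one of the given tokens (plain AND, no ranking).
    public Set<String> searchAll(String... tokens) {
        Set<String> result = null;
        for (String token : tokens) {
            Set<String> docs = postings.getOrDefault(
                    token.toUpperCase(Locale.ROOT), Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    // Everything recoverable about one document: its distinct tokens, sorted.
    // The original word order and repetition counts are not in the index.
    public List<String> vocabularyOf(String path) {
        List<String> vocabulary = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : postings.entrySet()) {
            if (entry.getValue().contains(path)) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }
}
```

Even in this form the vocabulary of a document leaks, which may or may
not be acceptable to the data owner.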

However, if for the sake of using the NEAR operator with Lucene the
tokens are stored with their positions, as in LET'S:1 SELL:2 CAT:3
AND:4 DOG:5 ROBOT:6 TOYS:7 THIS:8 DECEMBER:9 ... then someone could
pull all tokens for animals.doc and reconstitute the token stream.
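
The worry about positions can be sketched the same way. Again this is a
toy model in plain Java rather than Lucene's real data structures, and
PositionalIndex and reconstruct are hypothetical names. Once each
posting carries a position, sorting a document's tokens by position
gives back the token stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy index: token -> (document -> 1-based positions of that token).
public class PositionalIndex {
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    // Tokenize on whitespace, upper-case each token, and record its position in the document.
    public void add(String path, String text) {
        int position = 0;
        for (String token : text.toUpperCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            position++;
            postings.computeIfAbsent(token, k -> new HashMap<>())
                    .computeIfAbsent(path, k -> new ArrayList<>())
                    .add(position);
        }
    }

    // Rebuild the token stream of one document from the index alone:
    // collect every (position, token) pair for that document and order by position.
    public String reconstruct(String path) {
        SortedMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<String, List<Integer>>> entry : postings.entrySet()) {
            List<Integer> positions = entry.getValue().get(path);
            if (positions == null) {
                continue;
            }
            for (int p : positions) {
                byPosition.put(p, entry.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }
}
```

In this sketch only what the tokenizer discarded (here, letter case and
spacing) is lost; the wording of the document comes back in order.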

Does Lucene have any kind of trade off for working with "secure" (and
I use this term loosely) data?

