lucene-java-user mailing list archives

From Babak Farhang <farh...@gmail.com>
Subject Re: Lucene Index Encryption
Date Mon, 11 May 2009 03:27:33 GMT
Seems to me this discussion is not necessarily limited to
*encryption*: if you can implement encryption, you can also implement
compression--which is perhaps interesting for archiving purposes (at
access time, faster than unzipping an entire archived Directory and
loading it, for example).

>> Lucene needs to be able to ask a RAF opened for writing what its
>> current "position" is during indexing..

If "position" is the only thing Lucene needs during writing, then that
is good news: seeking backwards and overwriting what's already
written would be difficult to implement.  If Lucene employs a strict
write-once strategy for file I/O (with no exceptions), then we won't
really need a true RAF during the *write* phase: all we need is an
"append-only" RAF. No?
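
A minimal sketch of what such an "append-only" RAF might look like. It
assumes nothing about Lucene's actual output classes: the class name
AppendOnlyOutput and its getFilePointer() method are hypothetical, chosen
only for illustration. It wraps any java.io.OutputStream and counts the
bytes written, so the current position can be reported without ever
seeking.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical append-only output: bytes can only be added at the end. */
final class AppendOnlyOutput extends OutputStream {
    private final OutputStream out; // underlying sink (file, cipher stream, ...)
    private long position;          // number of bytes written so far

    AppendOnlyOutput(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        position += len;
    }

    /** The current "position": the offset at which the next byte will land. */
    long getFilePointer() {
        return position;
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Because there is no seek method at all, the wrapped stream could just as
well be a compressing or encrypting stream, provided the reader can later
map a recorded position back to the stored bytes.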

> What could be done in the directory implementation is to also encrypt chunks
> and on each read/write access (write: re-read the chunk, decrypt, change
> requested bytes, encrypt, store chunk; read: read the chunk, decrypt, return
> bytes - as you see writing is very costly. So a caching needs to be

I think that's the standard approach to the problem. Typically, the
chunks are fixed-length, say 32k, providing random access over the
chunks. Chunks can have headers encoding positional information (or
the information can be stored in an auxiliary file) in order to
support the setup Uwe describes above.
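
To make the chunk idea concrete, here is a rough sketch under several
assumptions of my own: AES/GCM from javax.crypto, a 32k chunk size, and a
12-byte random nonce stored as each chunk's header. The class
ChunkedCipherStore and its methods are hypothetical and not part of
Lucene. The "file" is just a byte array, and the data is written once,
so the costly read-modify-write path is not shown.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical store: fixed-length plaintext chunks, each encrypted on its own. */
final class ChunkedCipherStore {
    static final int CHUNK = 32 * 1024;              // plaintext bytes per chunk
    static final int NONCE = 12;                     // per-chunk header
    static final int TAG = 16;                       // GCM authentication tag
    static final int RECORD = NONCE + CHUNK + TAG;   // stored size of a full chunk

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    ChunkedCipherStore(byte[] rawKey) {
        key = new SecretKeySpec(rawKey, "AES");
    }

    /** Encrypts the data chunk by chunk; only the last record may be short. */
    byte[] encryptAll(byte[] plain) throws GeneralSecurityException {
        int chunks = (plain.length + CHUNK - 1) / CHUNK;
        ByteBuffer out = ByteBuffer.allocate(plain.length + chunks * (NONCE + TAG));
        for (int i = 0; i < chunks; i++) {
            int off = i * CHUNK;
            int len = Math.min(CHUNK, plain.length - off);
            byte[] nonce = new byte[NONCE];
            random.nextBytes(nonce);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG * 8, nonce));
            out.put(nonce).put(c.doFinal(plain, off, len));
        }
        return out.array();
    }

    /** Reads len plaintext bytes at pos, decrypting only the chunks it touches. */
    byte[] read(byte[] stored, long pos, int len) throws GeneralSecurityException {
        byte[] result = new byte[len];
        int done = 0;
        while (done < len) {
            long p = pos + done;
            int index = (int) (p / CHUNK);   // which chunk holds this position
            int within = (int) (p % CHUNK);  // offset inside that chunk
            byte[] chunk = decryptChunk(stored, index);
            int n = Math.min(len - done, chunk.length - within);
            if (n <= 0) {
                throw new IndexOutOfBoundsException("read past end of data");
            }
            System.arraycopy(chunk, within, result, done, n);
            done += n;
        }
        return result;
    }

    private byte[] decryptChunk(byte[] stored, int index) throws GeneralSecurityException {
        int start = index * RECORD;          // fixed record size gives random access
        if (start + NONCE + TAG > stored.length) {
            throw new IndexOutOfBoundsException("no such chunk: " + index);
        }
        int end = Math.min(start + RECORD, stored.length);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG * 8, stored, start, NONCE));
        return c.doFinal(stored, start + NONCE, end - start - NONCE);
    }
}
```

The point of the fixed record size is that a plaintext position maps to
a chunk index by simple division, so a seek costs one chunk decryption
rather than a pass over the whole file. A real Directory implementation
would presumably also cache decrypted chunks, as Uwe suggests.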

-Babak
http://skwish.sourceforge.net/

On Tue, May 5, 2009 at 3:07 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> Lucene needs to be able to ask a RAF opened for writing what its
>> current "position" is during indexing, which it then stores away, and
>> later during searching it needs to ask a RAF opened for reading to
>> seek back to that position so it can read bytes from there.  Would the
>> encryption APIs allow this?
>
> And this would be the problem: depending on your encryption, most
> ciphers need a "stream" and cannot restart somewhere in the middle! E.g.
> filesystem encryption encrypts block-wise.
>
> What could be done in the directory implementation is to also encrypt chunks,
> and on each read/write access to process the whole chunk (write: re-read the
> chunk, decrypt, change requested bytes, encrypt, store chunk; read: read the
> chunk, decrypt, return bytes). As you can see, writing is very costly, so
> caching needs to be implemented in the Directory impl, but calling flush
> should write the encrypted chunk. Another possibility is to copy the
> decrypted files into a RAMDirectory. E.g. maybe use some algorithm like the
> one in the recently added Split-Directory.
>
>> If this is possible then couldn't one make a Directory impl that hides
>> all encryption/decryption "under the hood"?
>>
>> (Presumably performance will suffer, perhaps substantially, since every
>> search will need to decrypt on the fly...).
>>
>> Mike
>>
>> On Mon, May 4, 2009 at 6:29 PM,  <Peter_Lenahan@ibi.com> wrote:
>> >
>> > I hope to make this a discussion rather than a request for a feature.
>> >
>> > In the database world, secure data is always encrypted in the database.
>> > Since I am interested in storing data from a database in the index, at
>> > times I want to encrypt the index when the file is on disk.
>> >
>> > Currently data stored in the Lucene Index is easily accessible to any
>> > program that wants to access it. You cannot store sensitive data in the
>> > index without the fear that it will be readable by all the people that
>> > have access to the system.
>> >
>> > There are two other posts in the mailing list that ask a question about
>> > Lucene Index Encryption. In both cases, I think that the conversation
>> > was dropped or the feature put off.
>> >
>> > Basically, I am asking for comments on the topic. I might consider
>> > coding the feature, but I would only do it if I am sure that the feature
>> > would be useful and accepted back into the core codebase of Lucene.
>> >
>> > The Sun javax.crypto package is available in JDK 1.4, so using that
>> > package could be a possible way of providing an encrypted file.
>> >
>> > The other option is Bouncy Castle, which is now being used in the PDFBox
>> > and Tika projects.
>> >
>> > In any case, because the normal Lucene Index implementation would not
>> > use an encrypted index, all references to Security classes should load
>> > dynamically with the "Class.forName()" method if they were not part of a
>> > standard JRE, to guarantee no additional requirements are placed on
>> > people currently using the Lucene libraries.
>> >
>> > Then there is the issue of what to use as the Encryption Key, and how to
>> > allow access to the Index files from the various programs that may need
>> > to get to the data. The Encryption Key needs to be external to any
>> > program that accesses the Index, because with Java, if the key were
>> > stored in the code, it would be easily found with a simple decompile of
>> > the Java class.
>> >
>> > I don't have answers to the questions, but basically I am requesting
>> > comments on the topic.
>> >
>> > I imagine that if I put Encryption and Decryption at the I/O level,
>> > immediately before a segment is written or immediately after a segment
>> > is read, I would minimize the overall impact on the Lucene
>> > Library.
>> >
>> > Another area to address is Remote Searching. The Remote Interface would
>> > need extensions that allow for Encrypted Remote files as well as
>> > Encrypted communication between the machines.
>> >
>> > However, I am not sure of these assumptions. I don't know how many
>> > places the segments are read and written. I really do not know how to do
>> > this currently, but would be willing to give it a try if there were
>> > enough interest shown in the topic.
>> >
>> >
>> > Peter
>> >
>> >
>> >
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>
>
>
>
>


