lucene-java-user mailing list archives

From Karl Wettin <>
Subject Re: Lucene Index Encryption
Date Fri, 08 May 2009 17:27:13 GMT
I might be missing something here, but why not just store the index on  
a cryptographic virtual file system?


On 8 May 2009, at 19:09, <> wrote:

> Michael,
> Thanks for the comments; they are very insightful.
> I hadn't thought about the Random Access issues until you brought
> them up.
> This makes the project a little tougher, but not impossible.
> I was searching last night and there have been a couple of papers
> written on the topic of Encrypted Random Access files at MIT.
> I haven't finished reading all of them yet, but they suggest ways of
> solving the Encryption problems for Random Access Files.
> I am going to spend a few days looking at the various papers before I
> waste your time discussing this any further.
>> (Presumably performance will suffer perhaps substantially since every
>> search will need to decrypt on the fly...).
> Yes, I imagine that there will be a performance hit; this will add
> significant overhead to every byte that Lucene accesses.
> However, in some applications the price of having insecure data is
> unacceptable, such as when secure data is published to a laptop for
> use offline.
> In this case, the additional time needed to access the index would be
> acceptable.
> Examples: Military, Medical, and Financial information.
> Thanks,
> Peter
> From: Michael McCandless
> Date: May 5, 2009 1:22:00 am
> Subject: Re: Lucene Index Encryption
> Would you encrypt at the file level?  I.e., the encryption would live
> "under" a RandomAccessFile (RAF) and otherwise feel "normal" to
> Lucene?
> (I think I remember others exploring encryption at the individual term
> level, which is interesting but does leak information in that you can
> see individual terms & their frequencies).
> Lucene needs to be able to ask a RAF opened for writing what its
> current "position" is during indexing, which it then stores away, and
> later during searching it needs to ask a RAF opened for reading to
> seek back to that position so it can read bytes from there.  Would the
> encryption APIs allow this?
> If this is possible then couldn't one make a Directory impl that hides
> all encryption/decryption "under the hood"?
> (Presumably performance will suffer perhaps substantially since every
> search will need to decrypt on the fly...).
> Mike
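
One way the position/seek requirement above might be met is a stream cipher mode such as AES in counter (CTR) mode, where the keystream for any byte offset can be computed directly. The following is a minimal sketch, not something proposed in the thread: the class and method names (SeekableCtr, cipherAt, readAt) are hypothetical, it uses only javax.crypto from a current JDK, and CTR by itself provides no tamper detection.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;

// Hypothetical helper: positions an AES/CTR cipher at an arbitrary byte offset.
public class SeekableCtr {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Returns a CTR cipher whose next output byte corresponds to file offset pos. */
    static Cipher cipherAt(byte[] key, byte[] baseIv, long pos) {
        // Advance the 128-bit big-endian counter by the number of whole blocks skipped.
        BigInteger counter = new BigInteger(1, baseIv).add(BigInteger.valueOf(pos / BLOCK));
        byte[] raw = counter.toByteArray();
        byte[] iv = new byte[BLOCK];
        // Right-align the counter in a 16-byte IV (drops any sign or overflow byte).
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, iv, BLOCK - n, n);
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            // In CTR mode, encryption and decryption are the same XOR operation.
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            // Discard the keystream bytes that precede pos inside its block.
            c.update(new byte[(int) (pos % BLOCK)]);
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Transforms len bytes starting at offset pos, without processing earlier bytes. */
    static byte[] readAt(byte[] key, byte[] baseIv, byte[] data, int pos, int len) {
        return cipherAt(key, baseIv, pos).update(data, pos, len);
    }
}
```

Because the offsets in the encrypted file match the offsets in the plaintext, a stored "position" would stay valid for a later seek. A Directory implementation could, in principle, wrap its reads and writes with something along these lines.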
> On Mon, May 4, 2009 at 6:29 PM,  <> wrote:
> I hope to make this a discussion rather than a request for a feature.
> In the database world, secure data is always encrypted in the
> database.
> Since I am interested in storing data from a database in the index, at
> times I want to encrypt the index when the file is on disk.
> Currently, data stored in the Lucene Index is easily accessible to any
> program that wants to access it. You cannot store sensitive data in the
> index without the fear that it will be readable by everyone who has
> access to the system.
> There are two other posts in the mailing list that ask about Lucene
> Index Encryption. In both cases, I think that the conversation was
> dropped or the feature put off.
> Basically, I am asking for comments on the topic. I might consider
> coding the feature, but I would only do it if I am sure that the
> feature would be useful and accepted back into the core codebase of
> Lucene.
> The Sun javax.crypto package is available in JDK 1.4, so using that
> package could be a possible way of providing an encrypted file.
> The other option is Bouncy Castle, which is now being used in the
> PDFBox and Tika projects.
> In any case, because the normal Lucene Index implementation would not
> use an encrypted index, all references to Security classes should load
> dynamically with the "Class.forName()" method if they were not part of
> a standard JRE, to guarantee no additional requirements are placed on
> people currently using the Lucene libraries.
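
The dynamic-loading idea above could look roughly like the following sketch. It assumes a current JDK; OptionalProvider and registerIfPresent are hypothetical names, and the only library-specific detail is the provider class name passed in by the caller.

```java
import java.security.Provider;
import java.security.Security;

// Hypothetical helper: registers a JCE provider only if its class is on the classpath.
public class OptionalProvider {
    /**
     * Returns true if the named provider class was found and is now registered,
     * false if it is missing or is not a usable java.security.Provider.
     */
    static boolean registerIfPresent(String providerClassName) {
        try {
            // Resolved by name at run time, so there is no compile-time dependency.
            Class<?> cls = Class.forName(providerClassName);
            Provider p = (Provider) cls.getDeclaredConstructor().newInstance();
            if (Security.getProvider(p.getName()) == null) {
                Security.addProvider(p);
            }
            return true;
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Optional dependency absent or unsuitable; callers fall back to defaults.
            return false;
        }
    }
}
```

For Bouncy Castle, the call would be registerIfPresent("org.bouncycastle.jce.provider.BouncyCastleProvider"). Users who never enable encryption would not need the jar at all.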
> Then there is the issue of what to use as the Encryption Key, and how
> to allow access to the Index files from the various programs that may
> need to get to the data. The Encryption Key needs to be external to any
> program that accesses the Index, because with Java, if the key were
> stored in the code, it would be easily found with a simple decompile of
> the Java class.
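
As an illustration of keeping the key outside the code, the sketch below reads raw AES key bytes from a file whose location is supplied at run time. This is only one possible arrangement and an assumption, not a design from the thread: ExternalKey and loadAesKey are hypothetical names, and protecting the key file itself (file permissions, a keystore, a passphrase) is left open.

```java
import javax.crypto.spec.SecretKeySpec;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: loads an AES key from a file kept outside the application code.
public class ExternalKey {
    /** Reads raw key bytes from keyFile and wraps them as an AES key. */
    static SecretKeySpec loadAesKey(Path keyFile) {
        try {
            byte[] raw = Files.readAllBytes(keyFile);
            // AES accepts only 128-, 192- or 256-bit keys.
            if (raw.length != 16 && raw.length != 24 && raw.length != 32) {
                throw new IllegalArgumentException(
                        "expected 16, 24 or 32 key bytes, got " + raw.length);
            }
            return new SecretKeySpec(raw, "AES");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The path could come from a system property or a command-line argument, so that each program needing the index is given the key location by its deployment rather than by its class files.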
> I don't have answers to the questions, but basically I am requesting
> comments on the topic.
> I imagine that if I put Encryption and Decryption at the I/O level,
> immediately before a segment was written or immediately after a
> segment was read, I would minimize the overall impact on the Lucene
> Library.
> Another area to address is Remote Searching. The Remote Interface
> would need extensions that allow for Encrypted Remote files as well as
> Encrypted communication between the machines.
> However, I am not sure of these assumptions. I don't know how many
> places the segments are read and written. I really do not know how to
> do this currently, but would be willing to give it a try if there was
> enough interest shown in the topic.
> Peter
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
> ---------------------------------------------------------------------