Subject: RE: Lucene Index Encryption
Date: Fri, 8 May 2009 15:02:55 -0400
Reply-To: java-user@lucene.apache.org
In-Reply-To: <1e33aedb0905081134s1337631by3e4097e53e1bdd57@mail.gmail.com>

You are correct. Other vulnerabilities will of course include the swap
file, which is much easier to dump than the memory contents, since it
may persist even when the process dies or the machine is turned off,
and of course a process dump or snapshot file.

In either case, those attacks would be on a system that generally has
access to the data while it is running. The same attacks would be
available against the swap or dump file when, say, an SQL table is in
memory, so we would have the same issue there.

The reality is that the data will need to be decrypted in RAM at some
time in its lifetime, whether it comes from the Lucene index or from
the original source, such as an SQL table or a decrypted PDF file.

However, if the file or the computer is stolen, an encrypted index
would be useless to the thief, because its encryption would presumably
match that of the original source.

Other system tools, outside the scope of this discussion, would need to
wipe secure content from the swap file on the hard drive when the
process dies.

Peter

-----Original Message-----
From: patrick o'leary [mailto:pjaol@pjaol.com]
Sent: Friday, May 08, 2009 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene Index Encryption

There will always be levels at which data is insecurely available,
most notably within the memory of an application once it's running,
unless you want to go down the path of encrypting and decrypting each
and every string, at which point you lose dictionary functionality and,
well, any useful enumeration.

If you run a system where your security concern is for persistent data
storage (e.g. disk, backups, off-lease returns, repairs, etc.) and not
volatile storage, one method might be to store the Lucene index in an
encrypted archive, file system or hardware device, and use something
like Dave Spencer's idea of converting an FSDirectory into a
RAMDirectory, as in
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00633.html

Once it's in volatile memory, it's obviously still possible for someone
who knows what they're doing to get at it, but it is a hell of a lot
harder to penetrate, and when the process dies the unencrypted data is
gone (as long as you're using appropriate RAM).

On Fri, May 8, 2009 at 1:27 PM, Karl Wettin wrote:

> I might be missing something here, but why not just store the index on a
> cryptographic virtual file system?
>
>
> karl
>
>
> On 8 May 2009 at 19:09, wrote:
>
>> Michael,
>>
>> Thanks for the comments, they are very insightful.
>>
>> I hadn't thought about the random access issues until you brought
>> them up.
>>
>> This makes the project a little tougher, but not impossible.
>> I was searching last night, and a couple of papers have been written
>> at MIT on the topic of encrypted random access files.
>>
>> I haven't finished reading all of them yet, but they suggest ways of
>> solving the encryption problems for random access files.
>>
>> http://groups.csail.mit.edu/cis/theses/fu-masters.pdf
>>
>> I am going to spend a few days looking at the various papers before I
>> waste your time discussing this any further.
>>
>>> (Presumably performance will suffer perhaps substantially since every
>>> search will need to decrypt on the fly...).
>>
>> Yes, I imagine that there will be a performance hit; this will add
>> significant overhead to every byte that Lucene accesses.
>> However, in some applications the price of having unsecured data is
>> unacceptable, for example when secure data is published to a laptop
>> for use offline. In this case, the additional time needed to access
>> the index would be acceptable.
>>
>> Examples: Military, Medical, and Financial information.
>>
>> Thanks,
>> Peter
>>
>>
>> Subject: Re: Lucene Index Encryption
>> From: Michael McCandless (luc...@mikemccandless.com)
>> Date: May 5, 2009 1:22:00 am
>> List: org.apache.lucene.java-user
>>
>> Would you encrypt at the file level? Ie, the encryption would live
>> "under" a RandomAccessFile (RAF) and otherwise feel "normal" to
>> Lucene?
>>
>> (I think I remember others exploring encryption at the individual term
>> level, which is interesting but does leak information in that you can
>> see individual terms & their frequencies).
>>
>> Lucene needs to be able to ask a RAF opened for writing what its
>> current "position" is during indexing, which it then stores away, and
>> later during searching it needs to ask a RAF opened for reading to
>> seek back to that position so it can read bytes from there. Would the
>> encryption APIs allow this?
>>
>> If this is possible then couldn't one make a Directory impl that hides
>> all encryption/decryption "under the hood"?
>>
>> (Presumably performance will suffer perhaps substantially since every
>> search will need to decrypt on the fly...).
>>
>> Mike
>>
>> On Mon, May 4, 2009 at 6:29 PM, wrote:
>>
>> I hope to make this a discussion rather than a request for a feature.
>>
>> In the database world, secure data is always encrypted in the database.
>> Since I am interested in storing data from a database in the index, at
>> times I want to encrypt the index when the file is on disk.
>>
>> Currently data stored in the Lucene index is easily accessible to any
>> program that wants to access it. You cannot store sensitive data in the
>> index without the fear that it will be readable by all the people that
>> have access to the system.
>>
>> There are two other posts in the mailing list that ask a question about
>> Lucene index encryption. In both cases, I think that the conversation
>> was dropped or the feature put off.
>>
>> Basically, I am asking for comments on the topic. I might consider
>> coding the feature, but I would only do it if I am sure that the feature
>> would be useful and accepted back into the core codebase of Lucene.
>>
>> The Sun javax.crypto package is available in JDK 1.4, so using that
>> package could be a possible way of providing an encrypted file.
>>
>> The other option is Bouncy Castle, which is now being used in the PDFBox
>> and Tika projects.
>>
>> In any case, because the normal Lucene index implementation would not
>> use an encrypted index, all references to security classes should load
>> dynamically with the "Class.forName()" method if they are not part of a
>> standard JRE, to guarantee no additional requirements are placed on
>> people currently using the Lucene libraries.
>>
>> Then there is the issue of what to use as the encryption key, and how to
>> allow access to the index files from the various programs that may need
>> to get to the data. The encryption key needs to be external to any
>> program that accesses the index, because with Java, if the key were
>> stored in the code, it would be easily found with a simple decompile of
>> the Java class.
>>
>> I don't have answers to the questions, but basically I am requesting
>> comments on the topic.
>>
>> I imagine that if I put encryption and decryption at the I/O level,
>> immediately before a segment is written or immediately after a segment
>> is read, I would minimize the overall impact on the Lucene library.
>>
>> Another area to address is remote searching. The remote interface would
>> need extensions that allow for encrypted remote files as well as
>> encrypted communication between the machines.
>>
>> However, I am not sure of these assumptions. I don't know how many
>> places the segments are read and written. I really do not know how to do
>> this currently, but would be willing to give it a try if there was
>> enough interest shown in the topic.
>>
>> Peter
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java...@lucene.apache.org
>> For additional commands, e-mail: java...@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
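
A note on Mike's question in the quoted thread, namely whether an encryption layer underneath a RandomAccessFile could still honour "position" and "seek". One common way to get that property is a counter-mode cipher, where the keystream for any byte offset can be derived without touching the bytes before it. The Java sketch below illustrates only that idea with javax.crypto; it is not something proposed in the thread and not a Lucene Directory implementation. The class name SeekableCtrSketch, its two methods, and the all-zero demo key and IV are hypothetical, and the counter arithmetic assumes the provider treats the whole 16-byte IV as a big-endian counter (as the default Sun provider does).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: AES in CTR mode addressed by absolute file offset.
final class SeekableCtrSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    // Counter block for a given block index: (baseIv + blockIndex) mod 2^128.
    static byte[] counterFor(byte[] baseIv, long blockIndex) {
        byte[] raw = new BigInteger(1, baseIv)
                .add(BigInteger.valueOf(blockIndex))
                .toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        // Right-align the low 16 bytes; drops a sign byte or overflow byte.
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // Encrypts or decrypts (the same operation in CTR mode) the given bytes
    // as if they started at absolute position 'offset' in the file.
    static byte[] cryptAt(byte[] key, byte[] baseIv, long offset, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(counterFor(baseIv, offset / BLOCK)));
            // Burn the keystream bytes that precede 'offset' inside its block.
            int skip = (int) (offset % BLOCK);
            byte[] padded = new byte[skip + data.length];
            System.arraycopy(data, 0, padded, skip, data.length);
            byte[] out = cipher.doFinal(padded);
            return Arrays.copyOfRange(out, skip, out.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo only: a real key must live outside the code
        byte[] iv = new byte[16];  // demo only: a real IV must be unique per file
        byte[] plain = "term dictionary | postings | stored fields"
                .getBytes(StandardCharsets.US_ASCII);

        // "Index time": encrypt the whole file from position 0.
        byte[] stored = cryptAt(key, iv, 0, plain);

        // "Search time": seek to a remembered position and decrypt from there.
        int pos = 18;
        byte[] tail = cryptAt(key, iv, pos,
                Arrays.copyOfRange(stored, pos, stored.length));
        System.out.println(new String(tail, StandardCharsets.US_ASCII));
    }
}
```

The position Lucene stores stays a plain byte offset, since ciphertext and plaintext have the same length. Note that counter mode on its own gives confidentiality only: it does not detect tampering, and the same key and IV must never be reused to write different data at the same offset.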