From: robert engels
Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date: Sat, 2 Dec 2006 16:50:42 -0600
To: java-dev@lucene.apache.org

I think the point of the discussion is really to determine the answer
to #1.

I would counter that it is not a compelling feature for MOST users of
Lucene, but those who require it can still implement it externally
using binary fields, or, even more easily (and maybe even faster), by
using an encrypted filesystem with proper security.

Adding it to core Lucene complicates the code base, and I do not
believe it is warranted. This is only my opinion.

On Dec 2, 2006, at 2:38 PM, negrinv wrote:

>
> On the contrary, Mike, I am beginning to think that there have been a
> number of misunderstandings, starting with my original posting.
> When I submitted my proposal I was prepared for some discussion on the
> merits or otherwise of my proposed solution. I had no idea that the
> discussion would drift towards security and performance in absolute
> terms. I would like now to steer the debate in its intended direction.
>
> I have no difficulty agreeing with you on both counts. A non-encrypted
> swap file is a security risk, and encryption imposes a performance
> penalty. Neither, I submit, is relevant to my posting, for the
> following reasons.
> Security is all about knowing where you stand so you can take
> counter-measures; it is not about a "false sense of security" provided
> by knowing you have an encrypted swap file or a 3000 byte encryption
> key. Lucene cannot provide security. It would be a legal nightmare and
> an absurd expectation. The underlying operating system within which
> Lucene runs does not guarantee security, the encryption software
> provider does not guarantee security, and password protection and
> physical security are also outside of Lucene's control. What Lucene can
> do is provide encryption services, while the application has to
> provide a given level of security. For instance, if you run under an
> operating system which does not provide swap file encryption, then you
> must disable the swap file. Does that impose a performance penalty?
> Probably, if your memory is limited, but now you know where you stand,
> so you make a decision: performance, or encryption, or more memory. But
> one cannot, in my view, shift the responsibility for that decision to
> Lucene.
> I'll give you another example: you mentioned padding of 128 bits. True,
> there are encryption routines which impose that penalty. For my
> (initial) implementation I had the choice between an algorithm with
> padding, or RC4, which does not pad. A 10 character term remains a 10
> character term after encryption. No padding and no index size
> implications. I said so in my posting, and as an application developer
> you then have a choice to make: use Lucene RC4 encryption as proposed
> (for the time being), use another product, or write your own. Without
> knowing the application, any decision would be totally out of context,
> and no one piece of software can satisfy all applications. A possible
> solution would be for Lucene to offer a choice of algorithms.
>
> The army, I am sure, would like to run its tanks at the speed of a
> Ferrari, but it cannot; it hits a wall known as the cost-benefit ratio.
> It must choose between security and speed and budget, keeping in mind
> the application. The modern tank is the answer. A compromise.
> My original posting avoided the notion of security and performance in
> absolute terms precisely because of all the above considerations. It
> simply addressed a couple of points which need to be resolved before
> the specifics of the implementation can be discussed.
>
> 1) Is it a good idea to have encryption added to Lucene? I think so,
> obviously, but not everyone agrees. As was pointed out in this
> discussion, some relational database software provides encryption at
> the column level, a functionality equivalent to the one I proposed.
> Lucene in some ways competes with relational databases.
>
> 2) Assuming the answer to 1) above is yes, how should one go about
> including encryption in Lucene? My solution is just that, one approach.
> Others have proposed directory or file system encryption. My view on
> this is that this level of encryption is already provided by all major
> operating systems, as well as by some hardware devices. I would not see
> a justifiable benefit in adding it to Lucene. But that is only my
> personal opinion, although I am aware that directory encryption is in
> the hands of the system administrator, not the application end user.
> Perhaps there are other options which have not been raised yet.
>
> 3) Assuming my proposal is acceptable, can it be implemented better? I
> am not a Lucene expert; I learned Lucene on the go. I would be
> delighted to see a better solution presented; it would be a learning
> experience for me.
>
> I hope I have not added to the confusion.
>
> Season's greetings to you and to all who took time to participate in
> this discussion.
> Victor
>
> Robert Engels wrote:
>>
>> I think you misunderstood. If you do not have encrypted swap (like
>> OSX provides for) then your encryption is pointless, as anyone can
>> inspect the data as it is loaded into the heap by Lucene, bypassing
>> the encryption.
>>
>> I also think you underestimated the impact on the size of the
>> indexes, as most secure encryption schemes are going to pad the
>> payloads to a minimum of 128 bits, and usually much more.
>>
>> This is going to make a HUGE difference in the size of the index.
>>
>> On Dec 1, 2006, at 2:00 PM, negrinv wrote:
>>
>>>
>>> Good news for OSX users! But what about all the others, should I
>>> say the majority?
>>> One more reason for encrypting at field level.
>>> Victor
>>>
>>>
>>> Robert Engels wrote:
>>>>
>>>> Not if running under OSX with encrypted swap turned on ! :)
>>>>
>>>> -----Original Message-----
>>>>> From: Nicolas Lalevée
>>>>> Sent: Dec 1, 2006 4:49 AM
>>>>> To: java-dev@lucene.apache.org
>>>>> Subject: Re: Attached proposed modifications to Lucene 2.0 to
>>>>> support Field.Store.Encrypted
>>>>>
>>>>> On Friday 1 December 2006 11:10, negrinv wrote:
>>>>>> Nicolas Lalevée-2 wrote:
>>>>>>> On Friday 1 December 2006 01:33, negrinv wrote:
>>>>>>>> Thank you Robert for your comments. I am inclined to agree
>>>>>>>> with you, but I would like to establish first of all if
>>>>>>>> simplicity of implementation is the overriding consideration.
>>>>>>>> But before I dwell on that, let me say that I have discovered
>>>>>>>> that I am not a master of DIFF file creation with Eclipse.
>>>>>>>> The diff file attachment to my original posting is absurdly
>>>>>>>> large and not correct. I have therefore attached a zip file
>>>>>>>> containing the complete source code of the classes I modified.
>>>>>>>> I leave it to others to extract the diffs properly.
>>>>>>>> Back to the issue. So far the implementation has not been
>>>>>>>> difficult, considering that I knew nothing about Lucene
>>>>>>>> internals before I started. The reason is that Lucene is very
>>>>>>>> well structured, and the changes just fitted nicely by adding
>>>>>>>> some code in the right place with minimal changes to the
>>>>>>>> existing code. But I admit that the proposed implementation so
>>>>>>>> far is not complete and more work is required to overcome some
>>>>>>>> of its restrictions. While I like your idea, I believe that it
>>>>>>>> imposes too large a granularity on the encrypted data: all
>>>>>>>> fields with all kinds of data will be encrypted, including
>>>>>>>> images and others which normally would be left alone, thus
>>>>>>>> adding to the performance penalty due to encryption.
>>>>>>>
>>>>>>> I don't agree with you here. In Lucene, you will encrypt the
>>>>>>> field data, the field names, and the tokens: I would say that
>>>>>>> represents at least 2/3 of the index size. Then, with the
>>>>>>> implementation you suggest, I think (sorry, I didn't take the
>>>>>>> time to look at your patch) that every time Lucene data needs
>>>>>>> to be read, it is decrypted again. With an encrypted FS, your
>>>>>>> kernel will maintain a cache in RAM for you, so it won't hurt
>>>>>>> so much.
>>>>>>> It needs some benchmarking to see which is actually best, but I
>>>>>>> doubt that your solution will be faster.
>>>>>>>
>>>>>>> Nicolas.
>>>>>>
>>>>>> Nicolas, I am all in favour of some tests to establish which
>>>>>> solution is best, but I have to say that I don't believe file
>>>>>> system or directory encryption in Lucene is really justified.
>>>>>> Most operating systems already provide this feature, although
>>>>>> these are system-wide or policy-based solutions, hence not
>>>>>> always within individual user control.
>>>>>> But if the issue is user control, then I believe Lucene should
>>>>>> provide maximum granularity when it comes to choice of data to
>>>>>> encrypt.
>>>>>> The issue, I believe, is whether some form of encryption should
>>>>>> be provided within Lucene to enable application developers to
>>>>>> create applications which offer some data protection under user
>>>>>> control, with a minimum of impact, where by impact I mean both
>>>>>> performance and workload, either in Lucene code or user code.
>>>>>
>>>>> In fact you mean a user who has no control of his machine and
>>>>> who cannot encrypt his partition. Here you will have the issue
>>>>> with the swap: Lucene will decrypt the data in RAM, which can
>>>>> possibly be pushed to swap... I know this is extreme, but it's a
>>>>> security hole.
>>>>>
>>>>> --
>>>>> Nicolas LALEVÉE
>>>>> Solutions & Technologies
>>>>> ANYWARE TECHNOLOGIES
>>>>> Tel : +33 (0)5 61 00 52 90
>>>>> Fax : +33 (0)5 61 00 51 46
>>>>> http://www.anyware-tech.com
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645198
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>
> --
> View this message in context:
> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7657011
> Sent from the Lucene - Java Developer mailing list archive at
> Nabble.com.
>