lucene-dev mailing list archives

From Nicolas Lalevée <nicolas.lale...@anyware-tech.com>
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Fri, 01 Dec 2006 08:20:59 GMT
On Friday 1 December 2006 01:33, negrinv wrote:
> Thank you Robert for your comments. I am inclined to agree with you, but I
> would like to establish first of all if simplicity of implementation is the
> overriding consideration. But before I dwell on that let me say that I have
> discovered that I am not a master of DIFF file creation with Eclipse. The
> diff file attachment to my original posting is absurdly large and not
> correct. I have therefore attached a zip file containing the complete
> source code of the classes I modified. I leave it to others to extract the
> diffs properly.
> Back to the issue. So far the implementation has not been difficult
> considering that I knew nothing about Lucene internals before I started.
> The reason is that Lucene is very well structured and the changes just
> fitted nicely by adding some code in the right place with minimal changes
> to the existing code. But I admit that the proposed implementation so far
> is not complete and more work is required to overcome some of its
> restrictions. While I like your idea I believe that it imposes too large a
> granularity on the encrypted data: all fields with all kinds of data will
> be encrypted, including images and others which normally would be left
> alone, thus adding to the performance penalty due to encryption.

I don't agree with you here. In Lucene, you would encrypt the field data, the
field names, and the tokens: I would say that represents at least 2/3 of
the index size. Then, with the implementation you suggest, I think (sorry, I
didn't take the time to look at your patch) that Lucene data is decrypted
every time it needs to be read. With an encrypted FS, your kernel will
maintain a cache in RAM for you, so it won't hurt so much.
It would take some benchmarking to see which is actually best, but I doubt
that your solution will be faster.
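
A rough sketch of the kind of benchmark meant here, under stated assumptions: it uses the JCE "ARCFOUR" cipher (the RC4 equivalent mentioned in the original proposal) as a stand-in for the BouncyCastle RC4 in the patch, and a plain in-memory copy as a stand-in for data already cached by an encrypted file system. The class and method names are hypothetical and are not taken from the patch or from Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: cost of decrypting a stored value on every read. */
public class PerReadDecryptSketch {

    // Runs the JCE "ARCFOUR" (RC4-compatible) stream cipher over the data.
    // RC4 XORs the data with a keystream, so the same call both encrypts
    // and decrypts, and the output has the same length as the input.
    public static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ARCFOUR not available in this JRE", e);
        }
    }

    // Times `reads` reads of the same value, decrypting it on each read.
    public static long nanosDecryptingEachRead(byte[] key, byte[] encrypted, int reads) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            sink += rc4(key, encrypted).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException(); // keeps the loop result in use
        return elapsed;
    }

    // Times `reads` reads of an already-decrypted copy (assumed stand-in
    // for plaintext pages cached in RAM by an encrypted file system).
    public static long nanosReadingCachedPlaintext(byte[] plaintext, int reads) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            sink += Arrays.copyOf(plaintext, plaintext.length).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException(); // keeps the loop result in use
        return elapsed;
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 128-bit demo key
        byte[] plain = "some sensitive stored field value".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = rc4(key, plain);
        int reads = 100_000;

        System.out.println("same length: " + (encrypted.length == plain.length));
        System.out.println("decrypt on each read (ns): "
                + nanosDecryptingEachRead(key, encrypted, reads));
        System.out.println("cached plaintext (ns):     "
                + nanosReadingCachedPlaintext(plain, reads));
    }
}
```

This has no JIT warm-up and touches no real index files, so the numbers are only indicative; a fair comparison would measure real index reads through both a patched Lucene and an encrypted file system.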

Nicolas.

> Many hardware devices and most operating systems already provide directory
> or file system encryption, therefore that level of encryption appears to me
> an unnecessary addition to Lucene. Encryption at field level however is not
> provided by anything I know. The key in my opinion is to decide what is
> best from the end user's point of view, but perhaps we need more discussion
> on this.
> Victor
>
> http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
> LuceneEncryptionMods.zip
>
> Robert Engels wrote:
> > I think a simpler solution would be to create an EncryptedDirectory
> > implementation of Directory, which requires a password to open/modify the
> > directory.
> >
> > Far simpler, and if you are using encryption to begin with, you are
> > probably encrypting most of the data anyway.
> >
> > -----Original Message-----
> >
> >>From: negrinv <victornegrin@gmail.com>
> >>Sent: Nov 29, 2006 9:45 PM
> >>To: java-dev@lucene.apache.org
> >>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
> >>
> >>Thank you Luke for your comments and the references you supplied. I read
> >>through them and reached the following conclusions. There seems to be a
> >>philosophical issue about the boundary between a user application and the
> >>Lucene API, where should one start and the other stop.
> >>The other issue is the significant difference between compression and
> >>encryption.
> >>As far as the first issue is concerned it is really a matter of personal
> >>choice and preference. My feeling is that as long as adding functionality
> >>does not impair the performance of the API as a whole, it makes sense to add
> >>it to Lucene and thus simplify the task of the application developer. After
> >>all, application developers do not have to use all the features of the
> >>API and always have the option of subclassing, writing a better version
> >>of it if they can, or writing the functionality as part of the application,
> >>even if the API provides that functionality already. The API is there to
> >>make life easier for those developers who want to use it, nobody "has" to
> >>use it.
> >>The second issue is more technical. Compression simply compresses the stored
> >>data to save storage. The index itself is not compressed therefore searching
> >>proceeds as normal. With encryption however you must encrypt the index as
> >>well as the stored data otherwise one could reconstruct the source document
> >>from the index and thus defeat the purpose of encryption. Correct me if I am
> >>wrong, but I think that encrypting the Lucene index is not easy to
> >>achieve from outside of Lucene, it implies re-writing as part of the
> >>application much code now part of Lucene (see issue number one above),
> >>hence my preference for including it as part of the Lucene API rather
> >>than as part of the application.
> >>Victor
> >>
> >>Luke Nezda wrote:
> >>> I think that adding encryption support to Lucene fields is a bad idea for
> >>> the same reasons adding compression was a bad idea (conclusive comments on
> >>> the tail of this issue
> >>> http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary fields
> >>> can be used by users to achieve this end. Maybe a contrib with utility
> >>> methods would be a compromise to preserve this work and make it accessible
> >>> to others, or alternatively just a FAQ entry with the sample code or
> >>> references to it.
> >>> Luke
> >>>
> >>> On 11/29/06, negrinv <victornegrin@gmail.com> wrote:
> >>>> Attached are proposed modifications to Lucene 2.0 to support
> >>>> Field.Store.Encrypted.
> >>>> The rationale behind this proposal is simple. Since Lucene can store data
> >>>> in the index, it effectively makes the data portable. It is conceivable
> >>>> that some of the data may be sensitive in nature, hence the option to
> >>>> encrypt it.
> >>>> Both the data and its index are encrypted in this implementation.
> >>>> This is only an initial implementation. It has the following
> >>>> restrictions, all of which can be resolved if required, albeit with some
> >>>> effort and more changes to Lucene:
> >>>> 1) binary and compressed fields cannot be encrypted as well (a plaintext
> >>>> once encrypted becomes binary).
> >>>> 2) Field.Store.Encrypted implies Field.Store.Yes
> >>>> This makes sense but it forces one to store the data in the same index
> >>>> where the tokens are stored. It may be preferable at times to have two
> >>>> indices, one for tokens, the other for the data.
> >>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
> >>>> open source package, very simple to use, which has the advantage of
> >>>> guaranteeing that the length of the encrypted field is the same as the
> >>>> original plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent
> >>>> in its Java Cryptography Extension, but unfortunately not in Java 1.4.
> >>>> The BouncyCastle RC4 is not the only algorithm available, others not
> >>>> depending on third party code can be used, but it was just the
> >>>> simplest to implement for this first attempt.
> >>>> 4) The attachments are modifications in diff form based on an early
> >>>> (I think August or September '06) repository snapshot of Lucene 2.0
> >>>> subsequently updated from the Lucene repository on 29/11/06. They may
> >>>> need some additional work to merge with the latest version in the Lucene
> >>>> repository. They also include a couple of JUnit test programs which
> >>>> explain, as well as test, the usage. You will need the BouncyCastle .jar
> >>>> (bcprov-jdk14-134.jar) to run them. I did not attach it to minimize the
> >>>> size of the attachments, but it can be downloaded free from:
> >>>> http://www.bouncycastle.org/latest_releases.html
> >>>>
> >>>> 5) Searching an encrypted field is restricted to single terms, no phrase
> >>>> or boolean searches allowed yet, and the term has to be encrypted by the
> >>>> application before searching it. (ref. attached JUnit test programs)
> >>>>
> >>>> To the extent that I have tested it, the code works as intended and does
> >>>> not appear to introduce any regression problems, but more testing by
> >>>> others would be desirable.
> >>>> I don't propose at this stage to do any further work with these API
> >>>> extensions unless there is some expression of interest and direction from
> >>>> the Lucene Developers team. I have an application ready to roll which uses
> >>>> the proposed Lucene encryption API additions (please see
> >>>> http://www.kbforge.com/index.html). The application is not yet available
> >>>> for downloading simply because I am not sure if the Lucene licence allows
> >>>> me to do so. I would appreciate your advice in this regard. My application
> >>>> is free but its source code is not available (yet). I should add that
> >>>> encryption does not have to be an integral part of Lucene, it can be just
> >>>> part of the end application, but somehow it seems to me that
> >>>> Field.Store.Encrypted belongs in the same category as compression and
> >>>> binary values.
> >>>> I would be happy to receive your feedback.
> >>>>
> >>>> victor negrin
> >>>>
> >>>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
> >>>> http://www.nabble.com/file/4377/TestEncryptedDocument.java
> >>>> TestEncryptedDocument.java
> >>>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
> >>>> --
> >>>> View this message in context:
> >>>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
> >>>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com
