lucene-dev mailing list archives

From negrinv <victorneg...@gmail.com>
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Fri, 01 Dec 2006 19:59:41 GMT

I think we should not make too many assumptions about performance until we
can test alternative solutions.
The small payload overhead will, in my opinion, be amply offset by the ability
to be very selective about what is being encrypted, as opposed to wholesale
encryption and decryption. We should also look at performance in the larger
context of all the possible reasons why users might need encryption. A large
proportion may not be worried about performance at all. In the final
analysis, any performance degradation is not going to be crippling; we are
probably talking about very small percentages either way, and as long as
these are measured and made available, users can make an informed
decision.
Victor


Robert Engels wrote:
> 
> I agree with Nicolas.
> 
> I think the overhead of decrypting such small payloads (I think it is also
> subject to an easy attack, and/or will increase index size dramatically in
> order to prevent such small encryption blocks) will have a serious impact
> on performance.
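One plausible reading of the "easy attack" (an editorial interpretation; the thread does not spell it out) is keystream reuse. If every field is encrypted with RC4 under the same key and no per-field nonce, every ciphertext is XORed with the same keystream, so XORing two ciphertexts cancels the keystream and yields the XOR of the two plaintexts. The sketch below uses the JCE "ARCFOUR" cipher in place of BouncyCastle; the key and the field values are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class KeystreamReuse {
    // Encrypts with a fresh RC4 cipher initialised from the same key each time,
    // which is what a fixed-key, no-nonce per-field scheme amounts to.
    static byte[] rc4(byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("ARCFOUR");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
        return c.doFinal(data);
    }

    // Byte-wise XOR over the common length of the two arrays.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) out[i] = (byte) (a[i] ^ b[i]);
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.US_ASCII);
        byte[] p1 = "salary".getBytes(StandardCharsets.US_ASCII);
        byte[] p2 = "london".getBytes(StandardCharsets.US_ASCII);

        byte[] c1 = rc4(key, p1);
        byte[] c2 = rc4(key, p2);

        // The keystream cancels out: c1 XOR c2 == p1 XOR p2, without knowing the key.
        System.out.println("c1^c2 == p1^p2: " + Arrays.equals(xor(c1, c2), xor(p1, p2)));

        // Knowing one plaintext therefore reveals the other outright.
        System.out.println(new String(xor(xor(c1, c2), p1), StandardCharsets.US_ASCII));
    }
}
```

Avoiding this requires a distinct key or nonce per encrypted value, which is one source of the index-size growth Robert mentions.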
> 
> We use Lucene for indexing only and store the actual payloads elsewhere,
> so in our case your solution is not optimal for us.
> -----Original Message-----
>>From: Nicolas Lalevée <nicolas.lalevee@anyware-tech.com>
>>Sent: Dec 1, 2006 2:20 AM
>>To: java-dev@lucene.apache.org
>>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>>
>>On Friday 1 December 2006 01:33, negrinv wrote:
>>> Thank you Robert for your comments. I am inclined to agree with you, but I
>>> would like to establish first of all whether simplicity of implementation
>>> is the overriding consideration. Before I dwell on that, let me say that I
>>> have discovered that I am not a master of DIFF file creation with Eclipse.
>>> The diff file attached to my original posting is absurdly large and not
>>> correct. I have therefore attached a zip file containing the complete
>>> source code of the classes I modified. I leave it to others to extract the
>>> diffs properly.
>>> Back to the issue. So far the implementation has not been difficult,
>>> considering that I knew nothing about Lucene internals before I started.
>>> The reason is that Lucene is very well structured, and the changes fitted
>>> nicely by adding some code in the right places with minimal changes to the
>>> existing code. But I admit that the proposed implementation is not yet
>>> complete, and more work is required to overcome some of its restrictions.
>>> While I like your idea, I believe that it imposes too coarse a granularity
>>> on the encrypted data: all fields with all kinds of data will be
>>> encrypted, including images and others which would normally be left
>>> alone, thus adding to the performance penalty due to encryption.
>>
>>I don't agree with you here. In Lucene, you will encrypt the field data, the
>>field names, and the tokens: I would say that represents at least 2/3 of
>>the index size. Then, with the implementation you suggest, I think (sorry, I
>>didn't take the time to look at your patch) that every time Lucene data
>>needs to be read, it is decrypted again. With an encrypted FS, your kernel
>>will maintain a cache in RAM for you, so it won't hurt so much.
>>It needs some benchmarking to see what is actually best, but I doubt that
>>your solution will be faster.
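Nicolas's point is that an encrypted file system gets plaintext caching for free from the kernel's page cache, whereas a field-level scheme decrypts on every read unless it caches results itself. A minimal sketch of such an in-process cache follows; the class name, the LRU policy, the int document-id key and the decryptor callback are all illustrative assumptions, not part of the proposed patch. Like the page cache, it keeps plaintext in RAM.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU cache of decrypted stored-field values, keyed by document id. */
public class DecryptedFieldCache {
    private final IntFunction<String> decryptor; // decrypts the stored value for a doc id
    private final Map<Integer, String> cache;
    private int decryptCalls = 0;

    public DecryptedFieldCache(final int capacity, IntFunction<String> decryptor) {
        this.decryptor = decryptor;
        // accessOrder=true keeps entries in least-recently-used-first order,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public String get(int docId) {
        String value = cache.get(docId);
        if (value == null) {            // miss: pay the decryption cost once
            value = decryptor.apply(docId);
            decryptCalls++;
            cache.put(docId, value);
        }
        return value;
    }

    public int decryptCalls() {
        return decryptCalls;
    }

    public static void main(String[] args) {
        DecryptedFieldCache c = new DecryptedFieldCache(2, id -> "plaintext-" + id);
        c.get(1); c.get(1); c.get(2); c.get(1); c.get(3); c.get(2);
        System.out.println("decryptions: " + c.decryptCalls());
    }
}
```

In the six reads above, the repeated read of document 1 is served from the cache, so only four decryptions happen; whether that closes the gap with an encrypted file system is exactly what a benchmark would have to show.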
>>
>>Nicolas.
>>
>>> Many hardware devices and most operating systems already provide directory
>>> or file system encryption, so that level of encryption appears to me to be
>>> an unnecessary addition to Lucene. Encryption at field level, however, is
>>> not provided by anything I know of. The key, in my opinion, is to decide
>>> what is best from the end user's point of view, but perhaps we need more
>>> discussion on this.
>>> Victor
>>>
>>> http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
>>> LuceneEncryptionMods.zip
>>>
>>> Robert Engels wrote:
>>> > I think a simpler solution would be to create an EncryptedDirectory
>>> > implementation of Directory, which requires a password to open/modify the
>>> > directory.
>>> >
>>> > Far simpler, and if you are using encryption to begin with, you are
>>> > probably encrypting most of the data anyway.
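A Directory-level approach has to support random access, since Lucene seeks within index files. One way to make that compatible with encryption (an editorial assumption; the thread does not name a cipher mode) is a counter-mode cipher such as AES/CTR, where decryption can start at any block without processing the bytes before it, combined with a key derived from the password via PBKDF2. The class name, salt, iteration count and sample data below are hypothetical, and mapping this onto Lucene's actual Directory/IndexInput classes is not shown.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrRandomAccess {
    static final int BLOCK = 16; // AES block size in bytes

    // Derives a 128-bit AES key from a password (hypothetical iteration count).
    static SecretKeySpec deriveKey(char[] password, byte[] salt) throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] k = f.generateSecret(new PBEKeySpec(password, salt, 100_000, 128)).getEncoded();
        return new SecretKeySpec(k, "AES");
    }

    // Counter block for the AES block containing byte `offset`:
    // initial counter + offset / 16, modulo 2^128, as a 16-byte big-endian value.
    static byte[] counterAt(byte[] iv, long offset) {
        BigInteger ctr = new BigInteger(1, iv).add(BigInteger.valueOf(offset / BLOCK));
        byte[] raw = ctr.mod(BigInteger.ONE.shiftLeft(128)).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK); // drop a leading sign byte, right-align short values
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // Decrypts `len` bytes at `offset` of a file encrypted as one CTR stream,
    // touching only the blocks that overlap the requested range.
    static byte[] readAt(SecretKeySpec key, byte[] iv, byte[] file, int offset, int len)
            throws Exception {
        int blockStart = offset - (offset % BLOCK); // align down to an AES block
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(counterAt(iv, offset)));
        byte[] plain = c.doFinal(Arrays.copyOfRange(file, blockStart, offset + len));
        return Arrays.copyOfRange(plain, offset - blockStart, plain.length);
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = deriveKey("hypothetical password".toCharArray(),
                "hypothetical-salt".getBytes(StandardCharsets.US_ASCII));
        byte[] iv = new byte[BLOCK];
        new SecureRandom().nextBytes(iv); // random per-file IV, stored alongside the file

        byte[] plain = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJ"
                .getBytes(StandardCharsets.US_ASCII);
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] file = enc.doFinal(plain);

        // Seek to byte 20 and read 10 bytes without decrypting bytes 0..15.
        System.out.println(new String(readAt(key, iv, file, 20, 10), StandardCharsets.US_ASCII));
    }
}
```

The read at offset 20 prints `klmnopqrst`. CTR gives confidentiality only; detecting tampering with index files would need a separate integrity mechanism.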
>>> >
>>> > -----Original Message-----
>>> >
>>> >>From: negrinv <victornegrin@gmail.com>
>>> >>Sent: Nov 29, 2006 9:45 PM
>>> >>To: java-dev@lucene.apache.org
>>> >>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>>> >>
>>> >>Thank you Luke for your comments and the references you supplied. I read
>>> >>through them and reached the following conclusions. There seems to be a
>>> >>philosophical issue about the boundary between a user application and the
>>> >>Lucene API: where should one start and the other stop?
>>> >>The other issue is the significant difference between compression and
>>> >>encryption.
>>> >>As far as the first issue is concerned, it is really a matter of personal
>>> >>choice and preference. My feeling is that as long as adding functionality
>>> >>does not impair the performance of the API as a whole, it makes sense to
>>> >>add it to Lucene and thus simplify the task of the application developer.
>>> >>After all, application developers do not have to use all the features of
>>> >>the API, and always have the option of subclassing, writing a better
>>> >>version of it if they can, or writing the functionality as part of the
>>> >>application, even if the API already provides that functionality. The API
>>> >>is there to make life easier for those developers who want to use it;
>>> >>nobody "has" to use it.
>>> >>The second issue is more technical. Compression simply compresses the
>>> >>stored data to save storage. The index itself is not compressed, therefore
>>> >>searching proceeds as normal. With encryption, however, you must encrypt
>>> >>the index as well as the stored data, otherwise one could reconstruct the
>>> >>source document from the index and thus defeat the purpose of encryption.
>>> >>Correct me if I am wrong, but I think that encrypting the Lucene index is
>>> >>not easy to achieve from outside of Lucene: it implies re-writing, as part
>>> >>of the application, much code that is now part of Lucene (see issue number
>>> >>one above), hence my preference for including it as part of the Lucene API
>>> >>rather than as part of the application.
>>> >>Victor
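Victor's point that the index itself can leak the document is easy to illustrate with a toy positional inverted index. The sketch below is not Lucene code: the whitespace tokenizer, the map-based postings and the sample sentence are illustrative assumptions standing in, roughly, for the terms and positions a real index records. Even if the stored field were encrypted, plaintext terms plus their positions are enough to rebuild the tokenized text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReconstructFromIndex {
    // term -> positions at which it occurs (single-document toy index)
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Rebuilds the token sequence purely from the postings, never touching stored data.
    static String reconstruct(Map<String, List<Integer>> postings) {
        Map<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index("The quarterly report shows a loss");
        System.out.println(reconstruct(postings)); // the quarterly report shows a loss
    }
}
```

With stemming or stop-word removal the reconstruction becomes lossy, but it still reveals most of the content, which is why stored-field encryption alone is not enough.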
>>> >>
>>> >>Luke Nezda wrote:
>>> >>> I think that adding encryption support to Lucene fields is a bad idea
>>> >>> for the same reasons adding compression was a bad idea (conclusive
>>> >>> comments at the tail of this issue:
>>> >>> http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary
>>> >>> fields can be used by users to achieve this end. Maybe a contrib with
>>> >>> utility methods would be a compromise to preserve this work and make it
>>> >>> accessible to others, or alternatively just a FAQ entry with the sample
>>> >>> code or references to it.
>>> >>> Luke
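Luke's suggestion of handling encryption in the application and storing the result in a binary field could look roughly like this: the application encrypts a value into a byte[] before handing it to Lucene and decrypts it after retrieval. Because a stored-only field is never searched, it can use randomized encryption; AES-GCM with a fresh random nonce per value is an assumption here, since the thread does not name a mode. The Lucene calls appear only in comments and are assumed, not compiled against a particular Lucene version.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class BinaryFieldCrypto {
    private static final int NONCE_BYTES = 12; // standard GCM nonce length
    private static final int TAG_BITS = 128;
    private static final SecureRandom RNG = new SecureRandom();

    // Returns nonce || ciphertext+tag, suitable for storing as a binary field value.
    static byte[] seal(SecretKey key, String plaintext) throws Exception {
        byte[] nonce = new byte[NONCE_BYTES];
        RNG.nextBytes(nonce);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        byte[] ct = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(NONCE_BYTES + ct.length).put(nonce).put(ct).array();
    }

    // Reverses seal(); throws AEADBadTagException if the stored bytes were altered.
    static String open(SecretKey key, byte[] stored) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, NONCE_BYTES));
        byte[] pt = c.doFinal(stored, NONCE_BYTES, stored.length - NONCE_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] stored = seal(key, "confidential note");
        // Assumed Lucene usage (not compiled here):
        //   doc.add(new Field("note", stored, Field.Store.YES));
        //   byte[] back = hitDoc.getBinaryValue("note");
        System.out.println(open(key, stored));
    }
}
```

This covers only stored data; as Victor notes, it does nothing for the indexed terms.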
>>> >>>
>>> >>> On 11/29/06, negrinv <victornegrin@gmail.com> wrote:
>>> >>>> Attached are proposed modifications to Lucene 2.0 to support
>>> >>>> Field.Store.Encrypted.
>>> >>>> The rationale behind this proposal is simple. Since Lucene can store
>>> >>>> data in the index, it effectively makes the data portable. It is
>>> >>>> conceivable that some of the data may be sensitive in nature, hence the
>>> >>>> option to encrypt it.
>>> >>>> Both the data and its index are encrypted in this implementation.
>>> >>>> This is only an initial implementation. It has the following
>>> >>>> restrictions, all of which can be resolved if required, albeit with
>>> >>>> some effort and more changes to Lucene:
>>> >>>> 1) Binary and compressed fields cannot also be encrypted (a plaintext
>>> >>>> becomes binary once encrypted).
>>> >>>> 2) Field.Store.Encrypted implies Field.Store.Yes.
>>> >>>> This makes sense, but it forces one to store the data in the same index
>>> >>>> where the tokens are stored. It may be preferable at times to have two
>>> >>>> indices, one for tokens, the other for the data.
>>> >>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
>>> >>>> open source package, very simple to use, which has the advantage of
>>> >>>> guaranteeing that the length of the encrypted field is the same as the
>>> >>>> original plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent
>>> >>>> in its Java Cryptography Extension, but unfortunately not in Java 1.4.
>>> >>>> BouncyCastle RC4 is not the only algorithm available; others not
>>> >>>> depending on third-party code can be used, but it was just the
>>> >>>> simplest to implement for this first attempt.
>>> >>>> 4) The attachments are modifications in diff form based on an early
>>> >>>> (I think August or September '06) repository snapshot of Lucene 2.0,
>>> >>>> subsequently updated from the Lucene repository on 29/11/06. They may
>>> >>>> need some additional work to merge with the latest version in the
>>> >>>> Lucene repository. They also include a couple of JUnit test programs
>>> >>>> which explain, as well as test, the usage. You will need the
>>> >>>> BouncyCastle .jar (bcprov-jdk14-134.jar) to run them. I did not attach
>>> >>>> it, to minimize the size of the attachments, but it can be downloaded
>>> >>>> free from:
>>> >>>> http://www.bouncycastle.org/latest_releases.html
>>> >>>>
>>> >>>> 5) Searching an encrypted field is restricted to single terms; no
>>> >>>> phrase or boolean searches are allowed yet, and the term has to be
>>> >>>> encrypted by the application before searching for it (ref. attached
>>> >>>> JUnit test programs).
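Items 3 and 5 together imply that each term is encrypted deterministically: the query term must be encrypted with the same key, by the same procedure, to match the indexed term. The sketch below uses the JCE "ARCFOUR" cipher (the Java 5+ RC4 equivalent mentioned in item 3) instead of BouncyCastle. The method names, the fresh-cipher-per-term approach and the hex encoding of ciphertext into a term string are assumptions; the patch's actual encoding is not described in the thread. It also checks the length-preservation property from item 3.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class EncryptedTerm {
    // Encrypts one term with a freshly initialised RC4 cipher, so the same
    // plaintext under the same key always yields the same ciphertext.
    static byte[] encrypt(byte[] key, String term) throws Exception {
        Cipher rc4 = Cipher.getInstance("ARCFOUR");
        rc4.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
        return rc4.doFinal(term.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical encoding of ciphertext bytes as a printable term string.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.US_ASCII);

        byte[] indexed = encrypt(key, "invoice"); // at indexing time
        byte[] query = encrypt(key, "invoice");   // by the application at search time

        System.out.println("same term: " + toHex(indexed).equals(toHex(query)));
        System.out.println("length preserved: " + (indexed.length == "invoice".length()));
    }
}
```

The same determinism that makes exact-term search work also means equal terms produce equal ciphertexts, so term frequencies remain visible, and with a shared keystream the attack described earlier in the thread applies.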
>>> >>>>
>>> >>>> To the extent that I have tested it, the code works as intended and
>>> >>>> does not appear to introduce any regression problems, but more testing
>>> >>>> by others would be desirable.
>>> >>>> I don't propose at this stage to do any further work on these API
>>> >>>> extensions unless there is some expression of interest and direction
>>> >>>> from the Lucene developers team. I have an application ready to roll
>>> >>>> which uses the proposed Lucene encryption API additions (please see
>>> >>>> http://www.kbforge.com/index.html). The application is not yet
>>> >>>> available for download simply because I am not sure whether the Lucene
>>> >>>> licence allows me to do so. I would appreciate your advice in this
>>> >>>> regard. My application is free, but its source code is not available
>>> >>>> (yet). I should add that encryption does not have to be an integral
>>> >>>> part of Lucene; it can be just part of the end application, but somehow
>>> >>>> it seems to me that Field.Store.Encrypted belongs in the same category
>>> >>>> as compression and binary values.
>>> >>>> I would be happy to receive your feedback.
>>> >>>>
>>> >>>> victor negrin
>>> >>>>
>>> >>>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>>> >>>> http://www.nabble.com/file/4377/TestEncryptedDocument.java
>>> >>>> TestEncryptedDocument.java
>>> >>>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>>> >>>> --
>>> >>>> View this message in context:
>>> >>>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>>> >>>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> ---------------------------------------------------------------------
>>> >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>> >>
>>> >>--
>>> >>View this message in context:
>>> >>http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7613046
>>> >>Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >>
>>> >>
>>> >
>>
>>-- 
>>Nicolas LALEVÉE
>>Solutions & Technologies
>>ANYWARE TECHNOLOGIES
>>Tel : +33 (0)5 61 00 52 90
>>Fax : +33 (0)5 61 00 51 46
>>http://www.anyware-tech.com
>>
>>
> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645184
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



