lucene-dev mailing list archives

From Robert Engels <>
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Thu, 30 Nov 2006 15:44:02 GMT
I think a simpler solution would be to create an EncryptedDirectory implementation of Directory,
which requires a password to open/modify the directory.

Far simpler, and if you are using encryption to begin with, you are probably encrypting most
of the data anyway.
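A minimal sketch of the byte-level piece an EncryptedDirectory would need, assuming AES in CTR mode (the thread does not name an algorithm for this approach). Lucene reads index files with random access (IndexInput can seek), so the cipher must be able to start decrypting at any file offset. CTR mode allows this because the keystream for block k depends only on the key and the counter value IV + k. The class `SeekableCtrCipher` is hypothetical, not Lucene API. Deriving the AES key from the password and storing the per-file IV (for example in a file header) are left out.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: the core an EncryptedDirectory's input/output
// streams could delegate to. Not a Lucene class.
final class SeekableCtrCipher {
    private static final int BLOCK = 16; // AES block size in bytes
    private static final BigInteger MOD = BigInteger.ONE.shiftLeft(128);

    private final SecretKeySpec key;
    private final BigInteger baseCounter;

    SeekableCtrCipher(byte[] aesKey, byte[] iv) {
        if (iv.length != BLOCK) throw new IllegalArgumentException("IV must be 16 bytes");
        this.key = new SecretKeySpec(aesKey, "AES");
        this.baseCounter = new BigInteger(1, iv);
    }

    // Counter block for a block index: (iv + index) mod 2^128, big-endian,
    // matching how the JDK's CTR mode increments the whole 16-byte counter.
    private byte[] counterBlock(long blockIndex) {
        byte[] raw = baseCounter.add(BigInteger.valueOf(blockIndex)).mod(MOD).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK); // drop a leading sign byte if present
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // Encrypts or decrypts (CTR is its own inverse) `data` located at absolute offset `pos`.
    byte[] crypt(byte[] data, long pos) throws GeneralSecurityException {
        long blockIndex = pos / BLOCK;
        int skip = (int) (pos % BLOCK);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(counterBlock(blockIndex)));
        // Prepend `skip` dummy bytes so the keystream lines up with `pos`, then drop them.
        byte[] aligned = new byte[skip + data.length];
        System.arraycopy(data, 0, aligned, skip, data.length);
        byte[] out = cipher.doFinal(aligned);
        return Arrays.copyOfRange(out, skip, out.length);
    }
}
```

Because a seek only changes which counter block is used, an IndexInput built on this could decrypt any slice without reading from the start of the file. Each file needs its own IV under a given key; reusing one lets an attacker XOR two files' ciphertexts to get the XOR of their plaintexts. CTR alone also does not detect tampering.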

-----Original Message-----
>From: negrinv <>
>Sent: Nov 29, 2006 9:45 PM
>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>Thank you, Luke, for your comments and the references you supplied. I read
>through them and reached the following conclusions. There seems to be a
>philosophical issue about the boundary between a user application and the
>Lucene API: where should one start and the other stop?
>
>The other issue is the significant difference between compression and
>encryption.
>
>As far as the first issue is concerned, it is really a matter of personal
>choice and preference. My feeling is that as long as adding functionality
>does not impair the performance of the API as a whole, it makes sense to add
>it to Lucene and thus simplify the task of the application developer. After
>all, application developers do not have to use all the features of the API
>and always have the option of subclassing, writing a better version of it if
>they can, or writing the functionality as part of the application, even if
>the API already provides it. The API is there to make life easier for those
>developers who want to use it; nobody "has" to use it.
>
>The second issue is more technical. Compression simply compresses the stored
>data to save storage. The index itself is not compressed, so searching
>proceeds as normal. With encryption, however, you must encrypt the index as
>well as the stored data; otherwise one could reconstruct the source document
>from the index and thus defeat the purpose of encryption. Correct me if I am
>wrong, but I think that encrypting the Lucene index is not easy to achieve
>from outside Lucene: it implies rewriting, as part of the application, much
>code that is now part of Lucene (see issue number one above), hence my
>preference for including it in the Lucene API rather than in the
>application.
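To illustrate why the index needs protecting too, and why an encrypted term supports only exact matching, here is a sketch that replaces each indexed term with a keyed, deterministic token. It swaps in HMAC-SHA256 purely for illustration: the proposal encrypts terms with RC4, which is reversible, whereas an HMAC is one-way. `TermTokenizer` is a hypothetical name, and the field-name binding is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: maps a plaintext term to an opaque, deterministic token.
// HMAC-SHA256 stands in for the proposal's RC4 term encryption.
final class TermTokenizer {
    private final SecretKeySpec key;

    TermTokenizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Same key + same field + same term -> same token, so an exact-term lookup still matches.
    String token(String field, String term) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        // Bind the field name (plus a 0 separator) so equal terms in different fields differ.
        mac.update(field.getBytes(StandardCharsets.UTF_8));
        mac.update((byte) 0);
        byte[] digest = mac.doFinal(term.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

The application would index `token(field, term)` instead of the term and compute the same token for each query term. Because the tokens are pseudo-random, their sort order has no relation to the terms' order, so prefix, wildcard and range queries over the term dictionary no longer work; only exact term lookups do.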
>Luke Nezda wrote:
>> I think that adding encryption support to Lucene fields is a bad idea for
>> the same reasons adding compression was a bad idea (see the conclusive
>> comments at the tail of this issue). Users can achieve this end with
>> binary fields. Maybe a contrib with utility methods would be a compromise
>> to preserve this work and make it accessible to others, or alternatively
>> just a FAQ entry with the sample code or references to it.
>> Luke
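Luke's alternative keeps encryption in the application: encrypt the value, store the resulting bytes in a binary stored field, and decrypt them after retrieval. A minimal sketch of such an application-side codec, assuming AES-GCM from the standard JDK providers (the thread does not specify an algorithm for this route). `StoredValueCodec` is hypothetical, and handing the bytes to Lucene is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical application-side codec: value -> bytes for a binary stored field, and back.
final class StoredValueCodec {
    private static final int IV_LEN = 12;    // recommended GCM nonce length in bytes
    private static final int TAG_BITS = 128; // authentication tag length in bits

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    StoredValueCodec(byte[] aesKey) {
        this.key = new SecretKeySpec(aesKey, "AES");
    }

    byte[] encrypt(String value) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv); // fresh nonce per value; never reuse one with the same key
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        // Layout: [12-byte IV][ciphertext followed by 16-byte tag]
        return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
    }

    // Throws AEADBadTagException if the stored bytes were modified.
    String decrypt(byte[] stored) throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_LEN);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] pt = c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

This protects stored values only. Any terms indexed from the same text remain plaintext in the inverted index, which is exactly the gap negrinv raises in his reply.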
>> On 11/29/06, negrinv <> wrote:
>>> Attached are proposed modifications to Lucene 2.0 to support
>>> Field.Store.Encrypted.
>>> The rationale behind this proposal is simple. Since Lucene can store data
>>> in the index, it effectively makes the data portable. It is conceivable
>>> that some of the data may be sensitive in nature, hence the option to
>>> encrypt it.
>>> Both the data and its index are encrypted in this implementation.
>>> This is only an initial implementation. It has the following
>>> restrictions, all of which can be resolved if required, albeit with some
>>> effort and more changes to Lucene:
>>> 1) Binary and compressed fields cannot also be encrypted (a plaintext,
>>> once encrypted, becomes binary).
>>> 2) Field.Store.Encrypted implies Field.Store.YES.
>>> This makes sense, but it forces one to store the data in the same index
>>> where the tokens are stored. It may be preferable at times to have two
>>> indices, one for the tokens, the other for the data.
>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
>>> open-source package, very simple to use, and RC4 has the advantage of
>>> guaranteeing that the length of the encrypted field is the same as the
>>> original plaintext. As of Java 1.5 (5.0), Sun provides an RC4 equivalent
>>> in its Java Cryptography Extension, but unfortunately not in Java 1.4.
>>> The BouncyCastle RC4 is not the only algorithm available; others not
>>> depending on third-party code can be used, but it was just the simplest
>>> to implement for this first attempt.
>>> 4) The attachments are modifications in diff form based on an early (I
>>> think August or September '06) repository snapshot of Lucene 2.0,
>>> subsequently updated from the Lucene repository on 29/11/06. They may
>>> need some additional work to merge with the latest version in the Lucene
>>> repository. They also include a couple of JUnit test programs which
>>> explain, as well as test, the usage. You will need the BouncyCastle .jar
>>> (bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the
>>> size of the attachments, but it can be downloaded free from:
>>> 5) Searching an encrypted field is restricted to single terms; no phrase
>>> or boolean searches are allowed yet, and the term has to be encrypted by
>>> the application before searching for it (ref. attached JUnit test
>>> programs).
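The length preservation in item 3 and the "encrypt the term before searching" rule in item 5 both follow from RC4 being a stream cipher keyed without a nonce. A minimal sketch, with textbook RC4 (key scheduling, then keystream generation) written out by hand so it needs neither BouncyCastle nor a JCE provider; it is not the code from the attached patch, and the class name `Rc4` is hypothetical. RC4 is now considered broken (RFC 7465 prohibits it in TLS) and appears here only to illustrate these two properties.

```java
// Textbook RC4, for illustration only. Encryption and decryption are the same operation.
final class Rc4 {
    static byte[] crypt(byte[] key, byte[] input) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) s[i] = i;
        // Key-scheduling algorithm (KSA): permute s under the key.
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        // Pseudo-random generation (PRGA): one keystream byte per input byte,
        // so the output is always exactly as long as the input.
        byte[] out = new byte[input.length];
        for (int n = 0, i = 0, j = 0; n < input.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (input[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }
}
```

Since the keystream restarts from the key on every call, `Rc4.crypt(key, term)` returns the same bytes for the same key and term, which is what lets a query term encrypted by the application match the encrypted indexed term. The flip side is that every value encrypted under one key reuses the same keystream: XORing two ciphertexts yields the XOR of the two plaintexts, and each encrypted term also reveals its exact length.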
>>> To the extent that I have tested it, the code works as intended and does
>>> not appear to introduce any regressions, but more testing by others
>>> would be desirable.
>>> I don't propose at this stage to do any further work on these API
>>> extensions unless there is some expression of interest and direction from
>>> the Lucene Developers team. I have an application ready to roll which
>>> uses the proposed Lucene encryption API additions (please see
>>> The application is not yet available for downloading simply because I am
>>> not sure whether the Lucene licence allows me to do so. I would appreciate
>>> your advice in this regard. My application is free, but its source code
>>> is not available (yet). I should add that encryption does not have to be
>>> an integral part of Lucene; it can be just part of the end application,
>>> but somehow it seems to me that Field.Store.Encrypted belongs in the same
>>> category as compression and binary values.
>>> I would be happy to receive your feedback.
>>> victor negrin
>>> luceneDiff2.txt
>>> --
>>> View this message in context:
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: