lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Babak Farhang <farh...@gmail.com>
Subject Re: Lucene Index Encryption
Date Mon, 11 May 2009 18:06:43 GMT
On Mon, May 11, 2009 at 12:19 AM, Andrzej Bialecki <ab@getopt.org> wrote:

>
> Unfortunately, current Lucene IndexWriter implementation uses seek /
> overwrite when writing term info dictionary. This is described in more
> detail here:
>
> https://issues.apache.org/jira/browse/LUCENE-532
>

Thanks for the enlightening link. I skimmed through LUCENE-532 and
some related issues (LUCENE-701 (wow!) and LUCENE-704). It appears
that though there's already a patch for this, the reason why it is not
committed is for lack of support for the CFS file format. In
particular, Michael McCandless comments in LUCENE-532:

<quote>
Also: I like the idea of never doing "seek" when writing. The less
functionality we rely on from the filesystem, the more portable Lucene
will be. Since Lucene is so wonderfully simple, never using "seek"
during write is in fact very feasible.

I think to do this we need to change the CFS file format, so that the
offsets are stored at the end of the file. We actually can't
pre-compute where the offsets will be because we can't make
assumptions about how the file position changes when bytes are
written: this is implementation specific. For example, if the
Directory implementation does on-the-fly compression, then the file
position will not be the number of bytes written. So I think we have
to write at the end of the file.

Any opinions or other suggestions?
</quote>

I am not familiar with the details of CFS, but I didn't interpret
Michael's comment to mean that there is actually any rewriting going
on here. The problem here appears to be one of translating the
encrypted/compressed file position to the uncompressed file position.
Am I reading this right?

If in fact so, then a simple solution would be to push down all the
encoding logic into the RAF implementation itself.  The "append-only"
RAF implementation would maintain a decoded view of the file.  This
decoded view would include the (virtual) decoded file position.  In
that case, CFS could be oblivious to the actual RAF implementation.

Kind regards,

-Babak
http://skwish.sourceforge.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message