lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3218) Make CFS appendable
Date Sun, 21 Aug 2011 10:19:27 GMT


Uwe Schindler commented on LUCENE-3218:

Hi, after thinking about the whole thing once more, I may have a solution to decouple CFS
from the parent directory again, so that any CFS can be created using one single class (though
the factory in Directory may still be worth keeping to make it customizable). There are several
solutions, but most of them have customization problems:
- The current approach has already been discussed; there is nothing more to say.
- One way to let MMap map only certain parts of the file is to move
getIndexInputSlice up to the abstract Directory base class and make the default implementation
the current CFIndexInput from the default CFS impl. This would even be backwards compatible.
The CFS impl could then simply ask the parent directory it wraps for a slice. The problem here
is simple: the current CFS impl opens the CFS file exactly once and consumes exactly one file
handle. The slices work on that same file handle. If we move the slice handling up to the directory,
the "state" is gone, so the permanently open CFS file can no longer be managed.
If we use a new file handle for each slice, we gain nothing (the point of CFS is to reduce file handles).
- Last night I had an idea that might fix this issue. Let's move the slice handling into the
abstract IndexInput base class; again, the default impl would simply use the current CFIndexInput
to return a slice. In the case of MMapIndexInput, it would simply return a remapped slice on
the current file handle. The only thing that would change is that the RAF would be kept open
the whole time (as MMapCFDirectory does), in contrast to the current behavior, where the RAF is closed
directly after mapping. This approach would allow the CFS impl to simply ask its parent
directory for an IndexInput for the CFS file itself and, for each sub-slice, ask this
IndexInput for it.
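
The third option could be sketched roughly as follows. This is only an illustration under assumptions: SketchInput, OffsetSliceInput and BufferInput are hypothetical stand-ins, not Lucene classes, and a ByteBuffer plays the role of the mapped file. The base class offers a generic default slice, and the buffer-backed subclass overrides it to return a view on the same underlying storage.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an abstract IndexInput base class (not Lucene's API).
abstract class SketchInput {
    abstract byte readByte();
    abstract void seek(long pos);
    abstract long length();

    // Default slice: a generic wrapper that offsets every read into the parent,
    // playing the role the message assigns to the current CFIndexInput.
    SketchInput slice(long offset, long length) {
        return new OffsetSliceInput(this, offset, length);
    }
}

// Generic slice that works for any input by seeking the parent before each read.
final class OffsetSliceInput extends SketchInput {
    private final SketchInput parent;
    private final long offset, length;
    private long pos = 0;

    OffsetSliceInput(SketchInput parent, long offset, long length) {
        if (offset < 0 || length < 0 || offset + length > parent.length()) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.parent = parent;
        this.offset = offset;
        this.length = length;
    }

    @Override byte readByte() {
        if (pos >= length) throw new IndexOutOfBoundsException("read past end of slice");
        parent.seek(offset + pos);
        pos++;
        return parent.readByte();
    }

    @Override void seek(long p) {
        if (p < 0 || p > length) throw new IllegalArgumentException("seek out of bounds");
        pos = p;
    }

    @Override long length() { return length; }
}

// Stand-in for an mmap-backed input: slicing returns a view on the same buffer.
final class BufferInput extends SketchInput {
    private final ByteBuffer buf;

    BufferInput(ByteBuffer buf) { this.buf = buf; }

    @Override byte readByte() { return buf.get(); }
    @Override void seek(long pos) { buf.position((int) pos); }
    @Override long length() { return buf.limit(); }

    @Override SketchInput slice(long offset, long length) {
        ByteBuffer dup = buf.duplicate();      // independent position, shared content
        dup.limit((int) (offset + length));
        dup.position((int) offset);
        return new BufferInput(dup.slice());   // no copy, no new handle
    }
}
```

In this sketch the generic wrapper shares the parent's position, so it is not safe for concurrent readers; a real implementation would have to deal with that.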

The last approach seems reasonable, but we still need to check how to implement it. It
keeps both "features" of CFS:
- One OS file handle
- The possibility for certain directory implementations to return sliced IndexInputs in an optimal
way. The current IndexInput has a clone method; in this case we would need a similar method
that takes an offset and a length.
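
To illustrate how both features can coexist, here is a minimal sketch, assuming a single shared java.nio FileChannel stands in for the one open handle. ChannelSlice is a hypothetical name, not a Lucene class. Its slice(offset, length) method is the clone-like operation described above: each slice keeps its own position but reads through the same channel using positional reads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical slice over one shared file handle (not a Lucene class).
final class ChannelSlice {
    private final FileChannel channel; // shared by all slices, never closed by a slice
    private final long offset;         // start of this slice within the file
    private final long length;         // number of bytes visible through this slice
    private long pos = 0;              // private read position, relative to offset

    ChannelSlice(FileChannel channel, long offset, long length) {
        this.channel = channel;
        this.offset = offset;
        this.length = length;
    }

    // Analogue of clone() that also narrows the visible range.
    ChannelSlice slice(long subOffset, long subLength) {
        if (subOffset < 0 || subLength < 0 || subOffset + subLength > length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        return new ChannelSlice(channel, offset + subOffset, subLength);
    }

    // Returns the next byte as 0..255, or -1 at the end of the slice.
    int readByte() throws IOException {
        if (pos >= length) return -1;
        ByteBuffer one = ByteBuffer.allocate(1);
        // Positional read: does not touch the channel's own position,
        // so sibling slices do not interfere with each other.
        int n = channel.read(one, offset + pos);
        if (n != 1) throw new IOException("unexpected end of file");
        pos++;
        return one.get(0) & 0xFF;
    }
}
```

Reading one byte per call is only for brevity; a real input would buffer.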

On the other hand, we could remove the "factory" for CFS files from Directory and go back
to a simple new CFSDirectory(parentDirectory, cfsName).
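
As a rough sketch of what such a factory-free constructor could look like: ParentDir and CompoundDirSketch below are hypothetical names, the entry table is registered by hand rather than read from the file, and a ByteBuffer stands in for the opened CFS file. None of this is the actual CFSDirectory code.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal parent directory: opens a named file as a buffer.
interface ParentDir {
    ByteBuffer openInput(String name);
}

// Hypothetical compound reader built from just (parent, cfsName), no factory.
final class CompoundDirSketch {
    private final ByteBuffer cfs; // the compound file, opened exactly once
    private final Map<String, int[]> entries = new HashMap<>(); // name -> {offset, length}

    CompoundDirSketch(ParentDir parent, String cfsName) {
        this.cfs = parent.openInput(cfsName);
    }

    // A real format would read the entry table from the file itself;
    // here entries are added manually to keep the sketch short.
    void addEntry(String name, int offset, int length) {
        entries.put(name, new int[] {offset, length});
    }

    // Each sub-file is a view on the already opened compound file.
    ByteBuffer openInput(String name) {
        int[] e = entries.get(name);
        if (e == null) throw new IllegalArgumentException("no such sub-file: " + name);
        ByteBuffer dup = cfs.duplicate();
        dup.limit(e[0] + e[1]);
        dup.position(e[0]);
        return dup.slice();
    }
}
```

The parent only needs to know how to open the compound file; everything about slicing stays with the input it returns.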

Does this sound reasonable?

> Make CFS appendable  
> ---------------------
>                 Key: LUCENE-3218
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>         Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch,
LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch
> Currently CFS is created once all files are written during a flush / merge. Once on disk,
the files are copied into the CFS format, which is basically unnecessary for some of the
files. We can at any time write at least one file directly into the CFS, which can save a reasonable
amount of IO. For instance, stored fields could be written directly during indexing, and during
a Codec flush one of the written files could be appended directly. This optimization is a nice
side effect for Lucene indexing itself, but more importantly for DocValues and LUCENE-3216 we
could transparently pack per-field files into a single file only for docvalues without changing
any code once LUCENE-3216 is resolved.
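
The idea of writing one file at a time directly into the compound file can be sketched as below. This is a simplified illustration, not the patch attached to the issue: AppendableCompoundSketch is a hypothetical name, and an in-memory stream stands in for the CFS output. Only one sub-file may be open for writing at any moment, and its offset and length are recorded when it is closed.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical append-only compound writer (not the LUCENE-3218 patch).
final class AppendableCompoundSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Map<String, long[]> entries = new LinkedHashMap<>(); // name -> {offset, length}
    private String openName; // sub-file currently being written, or null
    private long openStart;  // offset at which the open sub-file began

    void beginFile(String name) {
        if (openName != null) {
            throw new IllegalStateException("only one sub-file may be open at a time");
        }
        openName = name;
        openStart = out.size();
    }

    // Bytes go straight into the compound stream; no separate file, no later copy.
    void write(byte[] bytes) {
        if (openName == null) throw new IllegalStateException("no open sub-file");
        out.write(bytes, 0, bytes.length);
    }

    void endFile() {
        if (openName == null) throw new IllegalStateException("no open sub-file");
        entries.put(openName, new long[] {openStart, out.size() - openStart});
        openName = null;
    }

    long offsetOf(String name) { return entries.get(name)[0]; }
    long lengthOf(String name) { return entries.get(name)[1]; }
    long totalSize() { return out.size(); }
}
```

Files that are written concurrently would still have to be written separately and copied in afterwards, which matches the "at least one file" wording in the description.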

This message is automatically generated by JIRA.