lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3201) improved compound file handling
Date Sat, 20 Aug 2011 09:20:27 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3201:
----------------------------------

    Comment: was deleted

(was: During code review I found a problem in the MMap special handling regarding the number
of open files:

The default CFS reader opens one file handle for the CFS and then maps slices using CFIndexInput.
MMap's CFS directory impl, on the other hand, creates a separate mapping for each slice. To map
a slice, it opens a new file handle, mmaps the slice, and closes the file handle.

The question now is: will this file handle stay occupied until the mapping disappears? If
so, we could hit TooManyOpenFiles even for CFS, as each sub-file would occupy one file
handle. At the very least, the MMap-specific CFS reader should use the same RAF all the
time and keep it open for mapping.)
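
A minimal sketch of the "same RAF, kept open" idea from the deleted comment, using only
java.nio and no Lucene classes. The class name SharedHandleSliceMapper and its mapSlice
method are hypothetical and do not come from the patch. For context, the Javadoc of
FileChannel.map states that a mapping, once established, does not depend on the channel
that created it; whether the OS still counts a descriptor after the close is
platform-specific and not settled here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper (not Lucene code): maps sub-file slices of one compound file. */
final class SharedHandleSliceMapper implements AutoCloseable {
  // The single file handle that is reused for every slice mapping.
  private final RandomAccessFile raf;
  private final FileChannel channel;

  SharedHandleSliceMapper(String path) throws IOException {
    this.raf = new RandomAccessFile(path, "r");
    this.channel = raf.getChannel();
  }

  /** Maps the byte range [offset, offset + length) read-only via the shared channel. */
  MappedByteBuffer mapSlice(long offset, long length) throws IOException {
    if (offset < 0 || length < 0 || offset + length > channel.size()) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    // No new file handle is opened here; only a new mapping is created.
    return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
  }

  @Override
  public void close() throws IOException {
    raf.close(); // closing the RandomAccessFile also closes its channel
  }
}
```

With this shape, the number of open handles per compound file stays at one regardless of
how many sub-files are mapped.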

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for directories like mmap.
> * it seeks on every readInternal
> * it's not possible for a directory to override or improve the handling of compound files.
> For example: it seems if you were impl'ing this thing from scratch, you would just wrap
> the II directly (not extend BufferedIndexInput), add compound file offset X to seek()
> calls, and override length(). But of course, then you couldn't always throw read past EOF
> when you should, as a user could read into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory could return
> an IndexInput that simply mmaps the 'slice' of the CFS file.
> Its underlying ByteBuffer naturally does bounds checks already, so it wouldn't
> need to be buffered, and wouldn't even need to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize how compound
> files are handled. The simplest case for the least code change would be to add this to
> Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact that compound files are implemented as a Directory
> and are transparent. At least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it expert+internal+experimental
> or whatever.
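
The 'slice' behaviour described in the issue can be illustrated with plain java.nio,
independent of Lucene. This is only a sketch: SliceDemo and its slice method are
hypothetical names, and a heap ByteBuffer stands in for the mmapped CFS file. It shows
that a sliced buffer starts at position 0 (so no offset has to be added when seeking)
and that reading past the end of the slice fails instead of running into the next
sub-file.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration (not Lucene code) of slicing one sub-file out of a buffer. */
final class SliceDemo {
  /** Returns a view of [offset, offset + length) whose own position starts at 0. */
  static ByteBuffer slice(ByteBuffer whole, int offset, int length) {
    ByteBuffer dup = whole.duplicate(); // shares the bytes, has its own position/limit
    dup.position(offset);
    dup.limit(offset + length);         // throws IllegalArgumentException if out of range
    return dup.slice();                 // the view covers only position..limit of dup
  }
}
```

A relative get() past the slice's limit throws BufferUnderflowException, which is the
kind of built-in bounds check the issue refers to.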

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

