lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2632) FilteringCodec, TeeCodec, TeeDirectory
Date Tue, 14 Feb 2012 13:12:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207690#comment-13207690 ]

Robert Muir commented on LUCENE-2632:
-------------------------------------

{quote}
Also, the handling of segments.gen and compound files, which bypasses the codec, actually forced me to implement TeeDirectory.
{quote}

True, though I don't know of any simple solutions to either of these :)

For CFS, we made some tiny steps in LUCENE-3728, but the codec only has limited control here. For example, it can store certain files outside of the CFS; this is how the preflex codec reads separate norms. But it cannot yet customize the CFS filenames or the actual format/packing process.

                
> FilteringCodec, TeeCodec, TeeDirectory
> --------------------------------------
>
>                 Key: LUCENE-2632
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2632
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>         Attachments: LUCENE-2632.patch, LUCENE-2632.patch
>
>
> This issue adds two new Codec implementations (TeeCodec and FilteringCodec) plus a supporting TeeDirectory:
> * TeeCodec: there have been attempts in the past to implement parallel writing to multiple indexes so that they all stay synchronized. This proved difficult, however, because of the complexity of the IndexWriter/SegmentMerger logic. The solution presented here offers similar functionality but works at a different level: as the name suggests, TeeCodec duplicates index data into multiple output Directories.
> * TeeDirectory (also used by TeeCodec) is a simple abstraction that performs Directory operations on several directories in parallel, effectively mirroring their data. Optionally, you can specify a set of file suffixes to mirror; files that don't match are skipped.
> * FilteringCodec is loosely related to the ideas of index pruning presented in LUCENE-1812 and to the concept of tiered search. Since TeeCodec can write to multiple output Directories in a synchronized way, we can also filter out or modify some of the data being written. FilteringCodec provides this functionality, so you can use it like this:
> {code}
> IndexWriter --> TeeCodec
>                  |  |
>                  |  +--> StandardCodec --> Directory1
>                  +--> FilteringCodec --> StandardCodec --> Directory2
> {code}
> The end result of this chain is two indexes that are kept in sync: one is the full regular index, and the other is a filtered index.
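To make the TeeDirectory idea concrete, here is a minimal sketch of mirroring with an optional suffix filter. It is not Lucene's Directory API and not the code from the attached patch: it uses plain java.nio.file, and the class name TeeDirectorySketch, its methods, and the rule that the primary directory receives every file while only mirrors are filtered are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the TeeDirectory idea using plain java.nio.file
 * (NOT Lucene's Directory API). Every file goes to the primary directory;
 * mirrors receive it only if it matches the optional suffix filter.
 */
public class TeeDirectorySketch {
    private final Path primary;
    private final List<Path> mirrors;
    private final Set<String> suffixes; // empty set means "mirror every file"

    public TeeDirectorySketch(Path primary, List<Path> mirrors, Set<String> suffixes) {
        this.primary = primary;
        this.mirrors = List.copyOf(mirrors);
        this.suffixes = Set.copyOf(suffixes);
    }

    /** Returns true if the file should also be written to the mirrors. */
    boolean shouldMirror(String fileName) {
        if (suffixes.isEmpty()) {
            return true;
        }
        for (String suffix : suffixes) {
            if (fileName.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** Writes the bytes to the primary directory and, if selected, to each mirror (sequentially). */
    public void writeFile(String fileName, byte[] data) throws IOException {
        Files.write(primary.resolve(fileName), data);
        if (shouldMirror(fileName)) {
            for (Path mirror : mirrors) {
                Files.write(mirror.resolve(fileName), data);
            }
        }
    }

    /** Deletes the file everywhere it may exist. */
    public void deleteFile(String fileName) throws IOException {
        Files.deleteIfExists(primary.resolve(fileName));
        for (Path mirror : mirrors) {
            Files.deleteIfExists(mirror.resolve(fileName));
        }
    }

    public static void main(String[] args) throws IOException {
        Path primary = Files.createTempDirectory("primary");
        Path mirror = Files.createTempDirectory("mirror");
        // Illustrative suffix and file names only.
        TeeDirectorySketch tee = new TeeDirectorySketch(primary, List.of(mirror), Set.of(".idx"));
        tee.writeFile("a.idx", "kept".getBytes(StandardCharsets.UTF_8));
        tee.writeFile("a.dat", "skipped".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(mirror.resolve("a.idx"))); // true
        System.out.println(Files.exists(mirror.resolve("a.dat"))); // false
    }
}
```

Keeping the primary complete and filtering only the mirrors loosely parallels the diagram above, where one output holds the full index and the other a reduced one. A real implementation would sit behind Lucene's Directory abstraction and would have to mirror the other Directory operations as well; this sketch covers only writes and deletes.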

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


