lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-2632) FilteringCodec, TeeCodec, TeeDirectory
Date Thu, 27 Sep 2012 21:39:07 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2632:
-------------------------------

    Attachment: LUCENE-2632-filtering.patch

The patch includes only the FilteringCodec files. I've also fixed some minor issues, such as license docs.

About the *.impl package, I think that if all classes were under *.filtering, we could make
all but FilteringCodec, WriteFilter and Noop* classes package-private, as everything seems
to be controlled by WriteFilter. What do you think?

Anyway, this isolated patch is cleaner, so perhaps we can now think of a different design,
such as moving the WriteFilter functionality into the different Formats/Consumers and letting users
override it by using FilterCodec over FilteringCodec and providing their own Consumers/Formats. After
all, WriteFilter by default doesn't filter anything ...
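
To make the "doesn't filter anything by default" point concrete, here is a minimal sketch that does not use Lucene at all. Every name in it (FilterSketch, PassThroughFilter, acceptTerm, FilteringConsumer) is made up for illustration and is not the API in the patch; it only assumes that a filter accepts everything unless a user overrides it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the real WriteFilter API in the patch may differ.
final class FilterSketch {

    /** Default behaviour: nothing is filtered, every term is accepted. */
    static class PassThroughFilter {
        boolean acceptTerm(String field, String term) {
            return true;
        }
    }

    /** Stand-in for a consumer that forwards only the terms the filter accepts. */
    static final class FilteringConsumer {
        final List<String> written = new ArrayList<>();
        private final PassThroughFilter filter;

        FilteringConsumer(PassThroughFilter filter) {
            this.filter = filter;
        }

        void addTerm(String field, String term) {
            if (filter.acceptTerm(field, term)) {
                written.add(field + ":" + term);
            }
        }
    }
}
```

A user who wants real filtering would subclass PassThroughFilter and override acceptTerm. With the alternative design discussed above, the same decision would instead live in a user-supplied Consumer/Format.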

And now that we have FilterCodec, perhaps we should rename FilteringCodec to something else,
like IndexFilteringCodec or DataFilteringCodec ... to make it more distinguishable from FilterCodec.

Comments are welcome.
                
> FilteringCodec, TeeCodec, TeeDirectory
> --------------------------------------
>
>                 Key: LUCENE-2632
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2632
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>    Affects Versions: 4.0-ALPHA
>            Reporter: Andrzej Bialecki 
>         Attachments: LUCENE-2632-filtering.patch, LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632-trunk.patch
>
>
> This issue adds two new Codec implementations and a supporting Directory:
> * TeeCodec: there have been attempts in the past to implement parallel writing to multiple
> indexes so that they are all synchronized. This was, however, complicated by the complexity
> of the IndexWriter/SegmentMerger logic. The solution presented here offers similar functionality
> but works at a different level: as the name suggests, TeeCodec duplicates index data
> into multiple output Directories.
> * TeeDirectory (also used in TeeCodec) is a simple abstraction that performs Directory operations
> on several directories in parallel (effectively mirroring their data). Optionally, you can
> specify a set of file suffixes to be mirrored, so that non-matching files are
> skipped.
> * FilteringCodec is remotely related to the ideas of index pruning presented in
> LUCENE-1812 and to the concept of tiered search. Since we can use TeeCodec to write to multiple
> output Directories in a synchronized way, we could also filter out or modify some of the data
> that is being written. FilteringCodec provides this functionality, so you can use
> it like this:
> {code}
> IndexWriter --> TeeCodec
>                  |  |
>                  |  +--> StandardCodec --> Directory1
>                  +--> FilteringCodec --> StandardCodec --> Directory2
> {code}
> The end result of this chain is two indexes that are kept in sync: one is the full regular
> index, and the other is a filtered index.
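
The mirroring idea behind TeeDirectory can be sketched without Lucene as follows. This is only an illustration under stated assumptions: the names (TeeSketch, MemDir, Tee) are hypothetical and not taken from the patch, the "directories" are in-memory maps, and the sketch assumes that the main directory always receives every file while the suffix set only restricts what reaches the mirrors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Lucene-free sketch; none of these names come from the patch.
final class TeeSketch {

    /** Stand-in for a Directory: maps a file name to its contents. */
    static final class MemDir {
        final Map<String, byte[]> files = new HashMap<>();

        void write(String name, byte[] data) {
            files.put(name, data.clone());
        }
    }

    /** Writes each file to the main directory and to the mirrors if its suffix matches. */
    static final class Tee {
        private final MemDir main;
        private final List<MemDir> mirrors;
        private final Set<String> suffixes; // assumption: an empty set means "mirror everything"

        Tee(MemDir main, List<MemDir> mirrors, Set<String> suffixes) {
            this.main = main;
            this.mirrors = new ArrayList<>(mirrors);
            this.suffixes = suffixes;
        }

        private boolean shouldMirror(String name) {
            if (suffixes.isEmpty()) {
                return true;
            }
            for (String suffix : suffixes) {
                if (name.endsWith(suffix)) {
                    return true;
                }
            }
            return false;
        }

        void write(String name, byte[] data) {
            main.write(name, data); // the main directory always gets the file
            if (shouldMirror(name)) {
                for (MemDir mirror : mirrors) {
                    mirror.write(name, data);
                }
            }
        }
    }
}
```

For example, a Tee built with the suffix set {".tim"} would copy a file named "_0.tim" to every mirror but keep "_0.fdt" in the main directory only.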


