lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christian Kohlschütter (JIRA) <j...@apache.org>
Subject [jira] Updated: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
Date Tue, 08 Dec 2009 15:38:18 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Christian Kohlschütter updated LUCENE-2133:
-------------------------------------------

    Attachment:     (was: demo.patch)

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2133
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2133
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9.1, 3.0
>            Reporter: Christian Kohlschütter
>         Attachments: LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the FieldCache.
The FieldCache is a singleton which is supposed to cache certain information for every IndexReader
that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader)
or decorators (FilterIndexReader) which all access the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In fact, some
IndexReaders may be reopen()'ed and thus they may contain completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the limitations
imposed by the singleton construct there was no implementation other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes
that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579
and LUCENE-1749, but the overall situation remains the same: There is a central registry for
assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache
instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders
and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store
different data than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache
is an interface just like FieldCache and may support different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever
necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at
least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField) 
> I have provided an patch which takes care of all these issues. It passes all JUnit tests.
> The patch is quite large, admittedly, but the change required several modifications and
some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving
some of the updated functionality in the package org.apache.lucene.search.fields (field comparators
and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader
to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method
to all registered instances by calling an onClose() method with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource.
This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions
because of unsupported type values.
> The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The Lucene
community has to decide which classes/methods can immediately be removed, which ones later,
which not at all. Whenever new classes depend on the old ones, an appropriate notice exists
in the javadocs.
> - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which
is just there for testing purposes, to show that no sanity checks are necessary any longer.
This class may be removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as the patch
removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been
done so far, though.
> - I have tried to preserve the existing functionality wherever possible and to focus
on the class/method structure only. We certainly may improve the caches' behavior, but this
out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all supported
atomic types (byte, double, float, int, long, short) three classes each are required: *Cache,
*Comparator and *Parser. I think that further simplification might be possible (maybe using
Java generics?), but I guess the current patch is large enough for now.
> Cheers,
> Christian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message