lucene-dev mailing list archives

From Alan Woodward <a...@flax.co.uk>
Subject Re: Anticipating a benchmark for direct posting format
Date Mon, 07 Apr 2014 21:32:07 GMT
Does FilterDirectoryReader do what you want?  https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html

Alan Woodward
www.flax.co.uk
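
Editor's sketch: the idea running through this thread is to wrap each per-segment reader in a filter when the index is opened, so that keeping an in-memory copy is decided at read time rather than by the on-disk format. The model below is a minimal, dependency-free illustration of that shape only. Every type in it (LeafReader, DiskLeafReader, InMemoryFilterLeafReader, FilteringCompositeReader) is a hypothetical stand-in invented for this example and is not Lucene's API. In Lucene 4.7 the corresponding hooks would presumably be FilterDirectoryReader with a SubReaderWrapper that returns a FilterAtomicReader, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-segment reader; NOT Lucene's AtomicReader. */
interface LeafReader {
    /** Returns the doc IDs containing the term, or an empty array if none. */
    int[] postings(String term);
}

/** Simulates an on-disk segment and counts how often it is consulted. */
final class DiskLeafReader implements LeafReader {
    private final Map<String, int[]> data;
    int reads = 0;

    DiskLeafReader(Map<String, int[]> data) {
        this.data = data;
    }

    @Override
    public int[] postings(String term) {
        reads++;
        int[] docs = data.get(term);
        return docs == null ? new int[0] : docs;
    }
}

/** Filter that keeps an in-heap copy of every postings list it has loaded. */
final class InMemoryFilterLeafReader implements LeafReader {
    private final LeafReader in;
    private final Map<String, int[]> cache = new HashMap<>();

    InMemoryFilterLeafReader(LeafReader in) {
        this.in = in;
    }

    @Override
    public int[] postings(String term) {
        // Delegate to the wrapped reader only on the first request for a term.
        return cache.computeIfAbsent(term, in::postings);
    }
}

/** Composite that wraps each leaf at open time, whatever format backs it. */
final class FilteringCompositeReader {
    private final List<LeafReader> leaves = new ArrayList<>();

    FilteringCompositeReader(List<? extends LeafReader> segments) {
        for (LeafReader segment : segments) {
            leaves.add(new InMemoryFilterLeafReader(segment));
        }
    }

    /** Sums the number of matching documents across all wrapped leaves. */
    int docFreq(String term) {
        int total = 0;
        for (LeafReader leaf : leaves) {
            total += leaf.postings(term).length;
        }
        return total;
    }
}
```

The point of the design is that the caller chooses the wrapper when opening the reader, so the same segments can be opened with or without the in-memory copy. A cache limited by a heap budget, as suggested further down the thread, would replace the unbounded HashMap here.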


On 7 Apr 2014, at 22:19, Benson Margulies wrote:

> Typically, an app gets a directory reader, which is a composite
> reader. To get a filter down there into the leaves of the composite
> reader, does anyone have a suggestion about where to enter the
> modularity?
> 
> I sort of want to insert myself at
> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
> I could make a sort of filtering composite reader that wraps each of
> the segment readers in a filter.
> 
> 
> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera <serera@gmail.com> wrote:
>> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
>> I think this might be the case. We would need to test of course. The key
>> point is that this FilterAtomicReader will be able to serve anything as
>> direct, even DV, so it might eliminate DVF too. We need to experiment and
>> benchmark!
>> 
>> Shai
>> 
>> On Apr 7, 2014 7:32 PM, "david.w.smiley@gmail.com"
>> <david.w.smiley@gmail.com> wrote:
>>> 
>>> Aaaah, nice idea to simply use FilterAtomicReader — of course!  So this
>>> would ultimately be a new IndexReaderFactory that creates
>>> FilterAtomicReaders for a subset of the fields you want to do this on.
>>> Cool!  With that, I don’t think there would be a need for
>>> DirectPostingsFormat as a postings format, would there be?
>>> 
>>> ~ David
>>> 
>>> 
>>> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <serera@gmail.com> wrote:
>>>> 
>>>> The only problem is how the Codec makes a dynamic decision on whether to
>>>> use the wrapped Codec for reading vs pre-load data into in-memory
>>>> structures, because Codecs are loaded through reflection by the SPI loading
>>>> mechanism.
>>>> 
>>>> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>>>> mentioning in case you want to tackle DPF.
>>>> 
>>>> I think that if we allowed passing something like a CodecLookupService,
>>>> with an SPILookupService default impl, you could easily pass that to
>>>> DirectoryReader which will use your runtime logic to load the right PF (e.g.
>>>> DPF) instead of the one the index was created with.
>>>> 
>>>> But it sounds like the core problem is that when we load a Codec/PF/DVF
>>>> for reading, we cannot pass it any arguments, and so we must make an
>>>> index-time decision about how we're going to read the data later on. If we
>>>> could somehow support that, I think that will help you to achieve what you
>>>> want too.
>>>> 
>>>> E.g. currently it's an all-or-nothing decision, but if we could pass a
>>>> parameter like "50% available heap", the Codec/PF/DVF could cache the
>>>> frequently accessed postings instead of loading all of them into memory.
>>>> But, that can also be achieved at the IndexReader level, through a custom
>>>> FilterAtomicReader. And if you could reuse DPF's structures (like
>>>> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>>>> perhaps we can think about a DirectAtomicReader which does that? I believe
>>>> it can share some code w/ DPF, as long as we don't make these APIs public,
>>>> or make them @super.experimental and @super.expert.
>>>> 
>>>> Just throwing some ideas...
>>>> 
>>>> Shai
>>>> 
>>>> 
>>>> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smiley@gmail.com
>>>> <david.w.smiley@gmail.com> wrote:
>>>>> 
>>>>> Benson, I like your idea.
>>>>> 
>>>>> I think your idea can be achieved as a codec, one that wraps another
>>>>> codec that establishes the on-disk format.  By default the wrapped codec can
>>>>> be Lucene’s default codec.  I think, if implemented, this would be a change
>>>>> to DPF instead of an additional DPF-variant codec.
>>>>> 
>>>>> ~ David
>>>>> 
>>>>> 
>>>>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <bimargulies@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>>> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>>>>>>> <bimargulies@gmail.com> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> My takeaway from the prior conversation was that various people didn't
>>>>>>>> entirely believe that I'd seen a dramatic improvement in query performance
>>>>>>>> using D-P-F, and so would not smile upon a patch intended to liberate
>>>>>>>> D-P-F from codecs. It could be that the effect I saw has to do with
>>>>>>>> the fact that our system depends on hitting and scoring 50% of the
>>>>>>>> documents in an index with a lot of documents.
>>>>>>>> 
>>>>>>> 
>>>>>>> I don't understand the word "liberate" here. Why is it such a problem
>>>>>>> that this is a codec?
>>>>>> 
>>>>>> I don't want to have to declare my intentions at the time I create
>>>>>> the index. I don't want to have to use D-P-F for all readers all the
>>>>>> time. Because I want to be able to decide to open up an index with an
>>>>>> arbitrary on-disk format and get the in-memory cache behavior of
>>>>>> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
>>>>>> memory' from the choice of the on-disk format.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> I do not think we should give it any more status than that; it wastes
>>>>>>> too much RAM.
>>>>>> 
>>>>>> It didn't seem like 'waste' when it solved a big practical problem for us.
>>>>>> We had an application that was too slow, and had plenty of RAM available,
>>>>>> and we were able to trade space for time by applying D-P-F.
>>>>>> 
>>>>>> Maybe I'm going about this backwards; if I can come up with a small,
>>>>>> inconspicuous proposed change that does what I want, there won't be
>>>>>> any disagreement.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 

