incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: CacheIndexInput cacheSize
Date Mon, 24 Oct 2016 13:37:58 GMT
On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

> Our application uses only the 'write-thru-block-cache'. During
> search/merge reads, we have modified the block-cache code to only probe the
> block cache and avoid inserting into it.
>
> In such a usage scenario, I was thinking about introducing a
> 'readBufferSize' (default=1KB) in CacheIndexInput. From the block cache or
> the underlying file, we would read only 'readBufferSize' bytes and adjust
> the counters accordingly when it's a short-circuit read...
>
> Do you think it could be made workable?
>

Yeah it should be.
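
A minimal sketch of the proposed 'readBufferSize' read path, assuming a probe-only cache as described above: a hit copies at most readBufferSize bytes out of the cached 64KB block, and a miss reads only readBufferSize bytes from the underlying file without inserting anything into the cache. ProbeOnlyReader, FileSource and the Map-based cache are hypothetical stand-ins, not Blur's CacheIndexInput or block-cache API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a probe-only read path with a small miss buffer.
// None of these names are Blur APIs; they only illustrate the proposal.
final class ProbeOnlyReader {
    /** Hypothetical stand-in for the underlying (e.g. short-circuit) file. */
    interface FileSource {
        int read(long position, byte[] dst, int offset, int length);
    }

    private final Map<Long, byte[]> blockCache; // blockId -> cached block, probed only
    private final FileSource file;
    private final int blockSize;      // cache block size, e.g. 64 KB
    private final int readBufferSize; // bytes read per call, e.g. 1 KB
    long bytesFromFile;               // counter: bytes fetched from the file on misses

    ProbeOnlyReader(Map<Long, byte[]> blockCache, FileSource file,
                    int blockSize, int readBufferSize) {
        this.blockCache = blockCache;
        this.file = file;
        this.blockSize = blockSize;
        this.readBufferSize = readBufferSize;
    }

    /** Reads up to readBufferSize bytes at position into dst; returns the count. */
    int readChunk(long position, byte[] dst) {
        int want = Math.min(readBufferSize, dst.length);
        byte[] block = blockCache.get(position / blockSize); // probe, never insert
        if (block != null) {
            int offsetInBlock = (int) (position % blockSize);
            int n = Math.min(want, block.length - offsetInBlock);
            System.arraycopy(block, offsetInBlock, dst, 0, n);
            return n;
        }
        // Miss: fetch only `want` bytes instead of filling a whole cache block.
        int n = file.read(position, dst, 0, want);
        bytesFromFile += n;
        return n;
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        FileSource src = (pos, dst, off, len) -> {
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, dst, off, n);
            return n;
        };
        Map<Long, byte[]> cache = new HashMap<>();
        ProbeOnlyReader reader = new ProbeOnlyReader(cache, src, 64 * 1024, 1024);
        int n = reader.readChunk(70_000, new byte[4096]);
        System.out.println(n + " bytes read, " + reader.bytesFromFile + " from file");
    }
}
```

With an empty cache, the demo reads 1024 bytes on the miss rather than the 64KB a whole-block fill would pull in.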


>
> > Another idea could be to bypass the cache directory during merges and read
> > directly from the hdfsdirectory. Then perhaps you could take advantage of
> > the SC reads without having to deal with the cache directly.
>
>
> This is what we are currently evaluating & it looks to be a safe bet
>

Ok, let me know if you have any questions.
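
For the bypass approach being evaluated above, a minimal sketch of the routing decision, assuming the directory wrapper can tell merge reads from search reads. In Lucene, Directory.openInput receives an IOContext that carries this information; here it is modeled with a hypothetical ReadContext enum, and the two directories are generic placeholders rather than Blur's cache directory and HDFS directory classes.

```java
// Hypothetical sketch: merge reads go straight to the HDFS-backed directory
// (so short-circuit reads apply without the cache), everything else goes
// through the block-cache directory. D is a placeholder for a directory type.
final class MergeBypassRouter<D> {
    /** Hypothetical stand-in for the read context Lucene passes to openInput. */
    enum ReadContext { SEARCH, MERGE }

    private final D cacheDirectory;
    private final D hdfsDirectory;

    MergeBypassRouter(D cacheDirectory, D hdfsDirectory) {
        this.cacheDirectory = cacheDirectory;
        this.hdfsDirectory = hdfsDirectory;
    }

    /** Picks the directory that should serve a read in the given context. */
    D directoryFor(ReadContext context) {
        return context == ReadContext.MERGE ? hdfsDirectory : cacheDirectory;
    }

    public static void main(String[] args) {
        MergeBypassRouter<String> router = new MergeBypassRouter<>("block-cache", "hdfs");
        System.out.println(router.directoryFor(ReadContext.MERGE));  // hdfs
        System.out.println(router.directoryFor(ReadContext.SEARCH)); // block-cache
    }
}
```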


>
> --
> Ravi
>
> On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > In my experience, I too have used block cache sizes in the 64KB range for
> > the same reasons you listed. The biggest of which was that we were running
> > caches of upwards of 100GB, and 1KB block cache sizes are not really
> > possible at that size. The biggest problem with compaction is the .tim
> > file; the rest of the files are mostly read sequentially, but because that
> > file is a tree it tends to jump all over the place during compaction. If
> > you want to speed up compaction (merges), I would recommend allowing the
> > .tim files to be put into the block cache during the merge (i.e. turn quiet
> > reads off for those files). This of course could flood your cache with
> > data that you are about to remove, so if you have the cache space it's the
> > easiest solution.
> >
> > Another idea could be to bypass the cache directory during merges and read
> > directly from the hdfsdirectory. Then perhaps you could take advantage of
> > the SC reads without having to deal with the cache directly.
> >
> > Aaron
> >
> > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:
> >
> > > We have set a fairly large cacheSize of 64KB in the block cache to avoid
> > > too many keys, GC pressure, etc.
> > >
> > > But CacheIndexInput tries to read 64KB of data on a cache miss and fills
> > > up the CacheValue. When doing short-circuit reads, this could turn out to
> > > be excessive, no? For comparison, Lucene uses only 1KB buffers for the
> > > same purpose.
> > >
> > > Do you think this is likely to affect search performance, albeit in a
> > > minor way?
> > >
> > > --
> > > Ravi
> > >
> >
>
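
To put the block-size trade-off discussed in this thread in numbers: a cache's entry count is its capacity divided by the block size, and a miss that fills a whole block reads blockSize bytes where a 1KB buffer would read 1KB. The snippet below only does that arithmetic for the 100GB cache mentioned above, using binary units; per-entry memory overhead is deliberately not modeled, and BlockSizeMath is a hypothetical helper name.

```java
// Back-of-the-envelope numbers for the block-size trade-off in this thread.
final class BlockSizeMath {
    static final long KB = 1L << 10;
    static final long GB = 1L << 30;

    /** Number of cache entries needed to hold cacheBytes in blockBytes blocks. */
    static long entries(long cacheBytes, long blockBytes) {
        return cacheBytes / blockBytes;
    }

    /** How many times more bytes a whole-block fill reads than a small buffer. */
    static long missAmplification(long blockBytes, long bufferBytes) {
        return blockBytes / bufferBytes;
    }

    public static void main(String[] args) {
        long cache = 100 * GB;
        System.out.println(entries(cache, 1 * KB));             // 104857600
        System.out.println(entries(cache, 64 * KB));            // 1638400
        System.out.println(missAmplification(64 * KB, 1 * KB)); // 64
    }
}
```

Roughly 105 million entries versus 1.6 million is the gap behind "not really possible at that size", while the factor of 64 is the extra data a single miss pulls in when the reader needed only a small part of the block.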

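The suggestion to turn quiet reads off for .tim files during merges amounts to a per-file policy. A minimal sketch, assuming the read path knows the file name and whether the read is part of a merge; MergeCachePolicy and quietRead are hypothetical names, not Blur configuration or API, and every other read stays probe-only as in the write-thru setup described at the top of the thread. The file names in the demo are only examples.

```java
import java.util.Set;

// Hypothetical policy: during merges, reads of files with the listed
// extensions (e.g. the .tim terms file, which is read non-sequentially) may
// populate the block cache; all other reads stay quiet (probe-only).
final class MergeCachePolicy {
    private final Set<String> cacheOnMergeExtensions;

    MergeCachePolicy(Set<String> cacheOnMergeExtensions) {
        this.cacheOnMergeExtensions = cacheOnMergeExtensions;
    }

    /** True if the read should only probe the cache without inserting. */
    boolean quietRead(String fileName, boolean mergeRead) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return !(mergeRead && cacheOnMergeExtensions.contains(ext));
    }

    public static void main(String[] args) {
        MergeCachePolicy policy = new MergeCachePolicy(Set.of("tim"));
        System.out.println(policy.quietRead("_0_Lucene41_0.tim", true));  // false
        System.out.println(policy.quietRead("_0.fdt", true));             // true
        System.out.println(policy.quietRead("_0_Lucene41_0.tim", false)); // true
    }
}
```

The caveat from the thread still applies: cached .tim blocks belong to segments the merge is about to replace, so this only pays off when there is spare cache space.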