incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: CacheIndexInput cacheSize
Date Wed, 07 Dec 2016 21:23:45 GMT
Solr uses the original block cache that was created in Blur.  As for
locking, the only locking code in the read path should be in the cache map
itself and in the HDFS client code.  I believe both have some form of Java
locks; the HDFS client will likely be far worse for performance.  The block
cache itself should be lock free.

Aaron
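
A minimal sketch of a probe-only read against a cache map, as discussed in
this thread.  It is not Blur's implementation: the class name, the
"<file>@<blockId>" key format and the use of java.util.concurrent's
ConcurrentHashMap are assumptions for illustration.  ConcurrentHashMap is
used because its retrieval operations generally do not block, so a lookup
that never inserts on a miss stays off any write-side locking in the map.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ProbeOnlyBlockCache {
    // Hypothetical key format "<file>@<blockId>"; Blur's real cache has its own key type.
    private final ConcurrentHashMap<String, byte[]> blocks = new ConcurrentHashMap<>();

    private static String key(String file, long blockId) {
        return file + "@" + blockId;
    }

    // Write-through path: the writer populates the cache as it writes.
    public void put(String file, long blockId, byte[] data) {
        blocks.put(key(file, blockId), data.clone());
    }

    // Probe-only read: returns the cached block, or null on a miss.
    // A miss never inserts, so search/merge reads cannot pollute the cache.
    public byte[] probe(String file, long blockId) {
        return blocks.get(key(file, blockId));
    }

    public int size() {
        return blocks.size();
    }

    public static void main(String[] args) {
        ProbeOnlyBlockCache cache = new ProbeOnlyBlockCache();
        cache.put("_0.tim", 0L, new byte[] {1, 2, 3});
        System.out.println(cache.probe("_0.tim", 0L).length); // hit
        System.out.println(cache.probe("_0.tim", 1L) == null); // miss, nothing inserted
        System.out.println(cache.size());
    }
}
```

On a miss the caller would fall back to the underlying file (for example a
short-circuit HDFS read); that part is left out here.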

On Fri, Dec 2, 2016 at 8:10 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Ok, I forgot to include this link. Don't know what version Cloudera is on
> w.r.t BlockCache, but they are claiming that using it during merges results
> in critical-section (allocation) lock causing meltdown...
>
> https://blog.cloudera.com/blog/2016/08/resolving-lock-contention-in-apache-solr-a-performance-analysis-detective-story/
>
>
> Will this hold good for the latest BlockCache version of Blur too?
>
> On Fri, Dec 2, 2016 at 6:20 PM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > One thing I was wondering is, does block-cache acquire locks of any kind
> > during reads?
> >
> > I don't use the 'read-then-cache' construct at all, so was just thinking
> > if it is fine to eliminate locks (if any) on the read path
> >
> >
> > On Mon, Oct 24, 2016 at 7:07 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >
> >> On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan <
> >> ravikumar.govindarajan@gmail.com> wrote:
> >>
> >> > Our application makes use of 'write-thru-block-cache' only. During
> >> > search/merge-reads, we have modified block-cache code to only probe the
> >> > block-cache and avoid inserting to it.
> >> >
> >> > In such a usage scenario, I was thinking about introducing a
> >> > 'readBufferSize' (default=1KB) in CacheIndexInput. From block-cache or
> >> > underlying file we read only 'readBufferSize' data & adjust counters
> >> > accordingly when it's a short-circuit read...
> >> >
> >> > You think it could be made workable?
> >> >
> >>
> >> Yeah it should be.
> >>
> >>
> >> >
> >> > > Another idea could be to bypass the cache directory during merges and
> >> > > read directly from the hdfsdirectory.  Then perhaps you could take
> >> > > advantage of the SC reads without having to deal with the cache
> >> > > directly.
> >> >
> >> >
> >> > This is what we are currently evaluating & it looks to be a safe bet
> >> >
> >>
> >> Ok, let me know if you have any questions.
> >>
> >>
> >> >
> >> > --
> >> > Ravi
> >> >
> >> > On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> >> >
> >> > > In my experience I too have used block cache sizes in the 64KB range
> >> > > for the same reasons you listed.  The biggest of these was that we
> >> > > were running upwards of 100GB caches, and 1K block cache sizes are
> >> > > not really possible at that size.  The biggest problem with
> >> > > compaction is the .tim file; the rest of the files are mostly
> >> > > sequential reads, but because that file is a tree it tends to jump
> >> > > all over the place during compaction.  If you want to speed up
> >> > > compaction (merges), I would recommend allowing the .tim files to be
> >> > > put into the block cache during the merge (i.e. turn quiet reads off
> >> > > for those files).  This of course could flood your cache with data
> >> > > that you are about to remove, so if you have the cache space it's
> >> > > the easiest solution.
> >> > >
> >> > > Another idea could be to bypass the cache directory during merges and
> >> > > read directly from the hdfsdirectory.  Then perhaps you could take
> >> > > advantage of the SC reads without having to deal with the cache
> >> > > directly.
> >> > >
> >> > > Aaron
> >> > >
> >> > > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
> >> > > ravikumar.govindarajan@gmail.com> wrote:
> >> > >
> >> > > > We have set a fairly large cacheSize of 64KB in block-cache to
> >> > > > avoid too many keys, gc pressure etc...
> >> > > >
> >> > > > But CacheIndexInput tries to read 64KB of data during a cache-miss
> >> > > > & fills up the CacheValue. When doing short-circuit-reads, this
> >> > > > could turn out to be excessive, no? For comparison, Lucene uses
> >> > > > only 1KB buffers for the same..
> >> > > >
> >> > > > Do you think this will likely affect performance of searches,
> >> > > > albeit in a minor way?
> >> > > >
> >> > > > --
> >> > > > Ravi
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
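
A small worked example of the block-size trade-off raised earlier in the
thread (64KB blocks chosen to avoid too many keys; 1K blocks described as
impractical for caches upwards of 100GB).  The class and method names are
illustrative, and binary units (1KB = 1024 bytes) are assumed.

```java
public class BlockCountEstimate {
    // Number of cache entries (keys) needed to cover cacheBytes with fixed-size blocks.
    static long blockCount(long cacheBytes, long blockBytes) {
        return cacheBytes / blockBytes;
    }

    public static void main(String[] args) {
        long cacheBytes = 100L * 1024 * 1024 * 1024; // 100GB, binary units assumed
        System.out.println(blockCount(cacheBytes, 1024L));      // 104857600 keys at 1KB
        System.out.println(blockCount(cacheBytes, 64L * 1024)); // 1638400 keys at 64KB
    }
}
```

At 1KB blocks the map would hold roughly 105 million keys; at 64KB it holds
about 1.6 million, a 64x reduction in entries to track.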
