incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: CacheIndexInput cacheSize
Date Thu, 20 Oct 2016 21:56:37 GMT
In my experience I too have used block cache sizes in the 64KB range for the
same reasons you listed.  The biggest of which was that we were running
caches upwards of 100GB, and 1KB block cache sizes are not really possible at
that size.  The biggest problem with the compaction is the .tim file; the
rest of the files are mostly sequential reads, but because that file is a
tree it tends to jump all over the place during compaction.  If you want to
speed up compaction (merges), I would recommend allowing the .tim files to
be put into the block cache during the merge (i.e. turn quiet reads off for
those files).  This of course could flood your cache with data that you are
about to remove, but if you have the cache space it's the easiest solution.
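
To make the "quiet reads off for .tim" idea concrete, here is a minimal
sketch of a name-based check that decides whether a merge-time read should
still populate the block cache. Where it plugs in (a quiet/file-name-filter
style callback on the cache) is an assumption on my part; the actual
blockcache_v2 interfaces may differ:

// Hedged sketch only: the hook point (a quiet/file-name-filter style
// callback on the block cache) is an assumption, not the actual
// blockcache_v2 API.
public class TimNotQuiet {

  /**
   * Returns true when a read of the given file should stay "quiet",
   * i.e. not populate the block cache. During merges everything except
   * the term dictionary (.tim) stays quiet; .tim reads are allowed to
   * fill the cache because they jump around the tree.
   */
  public static boolean shouldBeQuiet(String fileName) {
    return !fileName.endsWith(".tim");
  }
}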

Another idea could be to bypass the cache directory during merges and read
directly from the HdfsDirectory.  Then perhaps you could take advantage of
the short-circuit reads without having to deal with the cache directly.
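
A rough sketch of that second idea, assuming a Lucene version that ships
FilterDirectory (on older versions you would delegate each Directory method
by hand). The wiring of a separate, uncached HdfsDirectory alongside the
cache directory is an assumption here, not how Blur is configured out of
the box:

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

/**
 * Sketch: route merge-time reads straight to the underlying
 * HdfsDirectory (short-circuit reads) while search-time reads keep
 * going through the cache directory.
 */
public class MergeBypassDirectory extends FilterDirectory {

  private final Directory hdfsDirectory; // uncached, short-circuit reads

  public MergeBypassDirectory(Directory cacheDirectory, Directory hdfsDirectory) {
    super(cacheDirectory); // searches still go through the block cache
    this.hdfsDirectory = hdfsDirectory;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (context.context == IOContext.Context.MERGE) {
      // Merges read mostly sequentially; skip the cache entirely.
      return hdfsDirectory.openInput(name, context);
    }
    return in.openInput(name, context);
  }
}

This keys off IOContext.Context.MERGE, which Lucene passes for merge-time
reads, so only merges bypass the cache.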

Aaron

On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> We have set a fairly large cacheSize of 64KB in block-cache for avoiding
> too many keys, gc pressure etc...
>
> But CacheIndexInput tries to read 64KB of data during a cache-miss & fills
> up the CacheValue. When doing short-circuit reads, this could turn out to
> be excessive, no? For comparison, Lucene uses only 1KB buffers for the
> same...
>
> Do you think this is likely to affect search performance, albeit in a
> minor way?
>
> --
> Ravi
>
