incubator-blur-user mailing list archives

From Ravikumar Govindarajan <>
Subject Re: Block-Cache and usage
Date Wed, 19 Mar 2014 17:57:20 GMT
One obvious advantage is the cache-hit case: with the block-cache, a hit is
served locally, whereas without it every read needs a fairly heavy round-trip
to the data-node. It is also quite likely that the data-node has already
evicted the hot pages because of other active reads.
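To make the cache-hit case concrete, here is a toy sketch in Python (not Blur's actual implementation, which is written in Java). It is an LRU cache keyed by (file, block id): a hit is served from local memory, and a miss falls through to a `fetch` callback that stands in for the round-trip to the data-node. The class `BlockCache`, the callback `fake_datanode_read`, the file name and the tiny two-block capacity are all hypothetical, chosen only for illustration.

```python
from collections import OrderedDict
from typing import Callable

BLOCK_SIZE = 8192  # 8KB blocks, as described in the thread


class BlockCache:
    """Toy LRU block cache keyed by (file name, block id). Hypothetical, not Blur's API."""

    def __init__(self, capacity_blocks: int, fetch: Callable[[str, int], bytes]):
        self.capacity = capacity_blocks
        self.fetch = fetch  # hypothetical stand-in for a round-trip to the data-node
        self.blocks = OrderedDict()  # (name, block_id) -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, name: str, block_id: int) -> bytes:
        key = (name, block_id)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.fetch(name, block_id)  # expensive remote read on a miss
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


remote_reads = []


def fake_datanode_read(name: str, block_id: int) -> bytes:
    # Records each simulated remote read and returns an empty 8KB block.
    remote_reads.append((name, block_id))
    return bytes(BLOCK_SIZE)


cache = BlockCache(capacity_blocks=2, fetch=fake_datanode_read)
for block in [0, 1, 0, 0, 2, 1]:
    cache.get("example-file", block)

print(cache.hits, cache.misses)  # 2 4
```

With a capacity of two blocks, the access trace 0, 1, 0, 0, 2, 1 yields two hits and four remote reads: repeated accesses to block 0 become hits, while block 1 is evicted before it is reused and has to be fetched again.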

How often do cache hits actually occur in Blur? Am I correct in saying that
only searches with repeated terms will benefit from the block-cache?


On Wed, Mar 19, 2014 at 11:06 PM, Ravikumar Govindarajan <> wrote:

> I was looking at the block-cache code and trying to understand why we need it.
> We divide each file into 8KB blocks and write them to Hadoop. While reading,
> we read only in 8KB batches and store them in the block-cache.
> This is a form of read-ahead caching on the client side (the shard-server). Is
> my understanding correct?
> Recent releases of Hadoop support read-ahead caching in the data-node
> itself. The default read-ahead size is 4MB, but I believe it can be
> configured to whatever is needed.
> What are the advantages of a block-cache over the data-node read-ahead
> cache?
> I am also not familiar enough with the Hadoop I/O subsystem to know whether
> read-ahead in data-nodes is correct and performant for a use case like
> Lucene.
> Can someone help me?
> --
> Ravi
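The block arithmetic described in the quoted message (fixed 8KB blocks, reads done in whole blocks) can be sketched as follows, assuming nothing beyond the 8KB block size and the 4MB read-ahead default mentioned in the thread. `blocks_for_range` is a hypothetical helper, not a Blur API: it maps a byte range onto the blocks it touches, which is the unit a block-cache would look up.

```python
from typing import List, Tuple

BLOCK_SIZE = 8192  # 8KB, the block size described in the thread


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (block_id, start_in_block, end_in_block) for every block a read touches.

    Hypothetical helper for illustration only.
    """
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    spans = []
    for block_id in range(first, last + 1):
        block_start = block_id * block_size
        # Clip the requested range to this block, expressed as offsets within the block.
        start = max(offset, block_start) - block_start
        end = min(offset + length, block_start + block_size) - block_start
        spans.append((block_id, start, end))
    return spans


# A 100-byte read starting 50 bytes before a block boundary touches two blocks.
print(blocks_for_range(8142, 100))  # [(0, 8142, 8192), (1, 0, 50)]
# Number of 8KB blocks covered by one 4MB read-ahead window.
print((4 * 1024 * 1024) // BLOCK_SIZE)  # 512
```

A single 4MB read-ahead window on the data-node spans 512 such blocks, whereas the scheme described in the thread reads and caches one 8KB block at a time.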
