incubator-blur-user mailing list archives

From Ravikumar Govindarajan <>
Subject Re: Short circuit reads and hadoop
Date Tue, 15 Oct 2013 13:25:19 GMT
Your idea looks great.

But I'm afraid I don't fully grasp it yet. I am attempting to understand
your suggestions, so please bear with me for a moment.

When co-locating shard-servers and data-nodes:

1. Every shard-server will attempt to write a local copy
2. Proceed with Hadoop's default settings [remote rack etc...] for the
remaining replicas
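
If a custom placement policy is used for step 1, it can (in Hadoop
versions with pluggable placement, HDFS-385) be wired in through a single
hdfs-site.xml property. The policy class named below is hypothetical, just
to show the shape:

```xml
<!-- hdfs-site.xml: plug in a custom block placement policy.
     The class name below is hypothetical; the property key is the one
     introduced by HDFS-385 (check your Hadoop version). -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.blur.hdfs.ShardAwareBlockPlacementPolicy</value>
</property>
```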

Ideally, there are 2 problems when writing a file:

1. Unable to write a local copy [lack of sufficient storage etc...].
Hadoop internally deflects the write to some other destination.
2. Shard-server failure. Since shard-servers are stateless, this will most
likely be a hardware failure or planned maintenance.

Instead of asynchronously re-balancing every write [moving blocks to the
shard], is it possible for us to trap just the above two cases?
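
A minimal sketch of what trapping only those two cases might look like —
purely illustrative, not Blur code: given each block's replica hosts, flag
a file for relocation only when some block has no replica on the shard
server's own datanode (case 1: deflected write; case 2: shard moved after
a failure).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: decides whether a shard's index file needs block
// relocation. A file is fully local only if every block has a replica on
// the shard server's own datanode; otherwise the write was deflected
// (case 1) or the shard moved after a failure (case 2).
public class LocalityCheck {

    // blockHosts: for each block of the file, the hostnames holding a replica.
    public static boolean needsRelocation(List<List<String>> blockHosts,
                                          String localHost) {
        for (List<String> hosts : blockHosts) {
            if (!hosts.contains(localHost)) {
                return true; // at least one block has no local replica
            }
        }
        return false; // fully local: nothing to do
    }

    public static void main(String[] args) {
        List<List<String>> file = Arrays.asList(
                Arrays.asList("shard1", "rackB-dn3"),   // local replica present
                Arrays.asList("rackB-dn3", "rackC-dn7") // deflected block
        );
        System.out.println(needsRelocation(file, "shard1")); // true
    }
}
```

On a real cluster the replica hosts per block would come from something
like FileSystem.getFileBlockLocations(), but the decision logic is the
same.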

BTW, how do we re-arrange the shard layout when shards are added/removed?

I looked at the 0.20 code and it seems to move shards around too much.
That could be detrimental for what we are discussing now, right?
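
For contrast, a layout that moves shards minimally when servers come and
go could look like this sketch — illustrative only, not the actual 0.20
layout code: keep every assignment whose server is still online, and
reassign only the orphaned shards to the least-loaded survivor.

```java
import java.util.*;

// Illustrative minimal-movement layout, not Blur's actual algorithm:
// assignments to servers that are still online are preserved, and only
// orphaned shards are reassigned, so adding/removing one server never
// shuffles shards between the surviving servers.
public class StickyLayout {

    public static Map<String, String> rebalance(Map<String, String> current,
                                                Set<String> onlineServers) {
        Map<String, String> next = new TreeMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (String s : onlineServers) load.put(s, 0);

        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (onlineServers.contains(e.getValue())) {
                next.put(e.getKey(), e.getValue());           // sticky
                load.merge(e.getValue(), 1, Integer::sum);
            } else {
                orphans.add(e.getKey());                      // server gone
            }
        }
        for (String shard : orphans) {
            String target = Collections.min(load.entrySet(),
                    Map.Entry.comparingByValue()).getKey();   // least loaded
            next.put(shard, target);
            load.merge(target, 1, Integer::sum);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> cur = new TreeMap<>();
        cur.put("shard-0", "srvA");
        cur.put("shard-1", "srvB");
        cur.put("shard-2", "srvB");
        Set<String> online = new TreeSet<>(Arrays.asList("srvA", "srvC"));
        // shard-0 stays on srvA; only srvB's shards are reassigned
        System.out.println(rebalance(cur, online));
    }
}
```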


On Sun, Oct 13, 2013 at 12:51 AM, Aaron McCurry <> wrote:

> On Fri, Oct 11, 2013 at 11:12 AM, Ravikumar Govindarajan <
>> wrote:
> > I came across this interesting JIRA in hadoop
> >
> >
> > In essence, it allows us more control over where blocks are placed.
> >
> > A BlockPlacementPolicy to optimistically write all blocks of a given
> > index-file into same set of datanodes could be highly helpful.
> >
> > Co-locating shard-servers and datanodes along with short-circuit reads
> > should improve greatly. We can always utilize the file-system cache for
> > local files. In case a given file is not served locally, then
> shard-servers
> > can use the block-cache
> >
> > Do you see some positives in such an approach?
> >
> I'm sure that there will be some performance improvements by using local
> reads for accessing HDFS.  The areas I would assume to see the biggest
> increases in performance would be merging and fetching data for retrieval.
>  Even with accessing local drives, the search time would likely not be
> improved assuming the index is hot.  One of the reasons for doing
> short-circuit reads is to make use of the OS file system cache, and since
> Blur already uses a filesystem cache it might just be duplicating
> functionality.  Another big reason for short-circuit reads is to reduce
> network traffic.
> Consider the scenario where another server has gone down.  If the shard
> server is in an opening state we would have to change the Blur layout
> system to only fail over to the server that contains the data for the
> index.  This might have some problems because the layout of the shards is
> based on how many shards are online on the existing servers, not on where
> they are located.
> So in a perfect world, if the shard fails over to the correct server it
> would reduce the amount of network traffic (to near 0 if it was perfect).
> So in short it could work, but it might be very hard.
> Assuming for a moment that the system is not dealing with a failure and
> assuming that the shard server is also running a datanode.  The first
> replica for HDFS is the local datanode, so if we just configure Hadoop
> for short-circuit reads we would already get the benefits for data
> fetching and merging.
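
[The short-circuit read settings referred to above would, in Hadoop 2.x
with HDFS-347, look roughly like this hdfs-site.xml fragment; the socket
path is just an example value:]

```xml
<!-- hdfs-site.xml: enable short-circuit local reads (Hadoop 2.x,
     HDFS-347). The socket path is an example; it must be writable by
     the datanode user. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```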
> I had another idea that I wanted to run by you.
> What if, instead of actively placing blocks during writes with a Hadoop
> block placement policy, we wrote something like the Hadoop balancer?
>  We get Hadoop to move the blocks to the shard server (datanode) that is
> hosting the shard.  That way it would be asynchronous and after a failure /
> shard movement it would relocate the blocks and still use a
> short-circuit read.
>  Kind of like the blocks following the shard around, instead of deciding up
> front where the data for the shard should be located.  If this is what you
> were thinking, I'm sorry for not understanding your suggestion.
> Aaron
> >
> > --
> > Ravi
> >
