incubator-blur-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Short circuit reads and hadoop
Date Tue, 15 Oct 2013 17:37:21 GMT
Actually, what I unearthed after a long time of fishing around is this:

1. Specifying hints to the namenode about the favored set of datanodes
   [HDFS-2576] when writing a file in Hadoop (a rough sketch follows below).

   The downside of such a feature is a datanode failure. HDFS Balancer and

   Facebook's HBase port on GitHub has this patch in production.

2.



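To make point 1 concrete, here is a rough, untested sketch of how a writer
could pass the favored-nodes hint. It uses the DistributedFileSystem.create()
overload from the HDFS-2576 work in the Hadoop 2.x line; the exact signature
can differ between versions, and the class name, file path, hostnames and
ports below are just made-up examples.

    import java.net.InetSocketAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class FavoredNodesWriteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("hdfs://namenode:8020/blur/tables/t1/shard-0/seg_0.tim");
        DistributedFileSystem dfs =
            (DistributedFileSystem) file.getFileSystem(conf);

        // Hint to the namenode: try to place this file's replicas on these
        // datanodes (e.g. the ones co-located with the shard-server).  If a
        // favored node is down or full, the namenode falls back to the
        // default placement policy.
        InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
            new InetSocketAddress("shard-server-1", 50010),
            new InetSocketAddress("shard-server-2", 50010)
        };

        FSDataOutputStream out = dfs.create(
            file,
            FsPermission.getFileDefault(),
            true,                                   // overwrite
            conf.getInt("io.file.buffer.size", 4096),
            (short) 3,                              // replication
            dfs.getDefaultBlockSize(file),          // block size
            null,                                   // no progress callback
            favoredNodes);
        try {
          out.write("index bytes".getBytes("UTF-8"));
        } finally {
          out.close();
        }
      }
    }

As far as I can tell, the hint is only honored at create time and is not
persisted, which is why a datanode failure (re-replication falls back to the
default policy) is the main downside mentioned above.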

On Tue, Oct 15, 2013 at 7:15 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Tue, Oct 15, 2013 at 9:25 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Your idea looks great.
> >
> > But I am afraid I don't grasp it fully. I am attempting to understand
> > your suggestions, so please bear with me for a moment.
> >
> > When co-locating shard-servers and data-nodes:
> >
> > 1. Every shard-server will attempt to write a local copy
> > 2. Proceed with the default settings of Hadoop [remote rack etc...] for
> > the next copies
> >
> > Broadly, there are 2 problems when writing a file:
> >
> > 1. Unable to write a local copy [lack of sufficient storage etc...].
> > Hadoop deflects the
> >
>
> True, this might happen if the cluster is low on storage and there are
> local disk failures.  If the cluster is in this kind of condition, and
> running the normal Hadoop balancer won't help (reducing local storage by
> migrating blocks to another machine), then it's likely we can't do
> anything to help the situation.
>
>
> > write to some other destination, internally.
> >
> > 2. Shard-server failure. Since this is stateless, it will most likely be
> > a hardware failure/planned maintenance.
> >
>
> Yes this is true.
>
>
> >
> > Instead of asynchronously re-balancing after every write [moving blocks
> > to the shard], is it possible for us to trap only the above two cases?
> >
>
> Not sure what you mean here.
>
>
> >
> > BTW, how do we re-arrange the shard layout when shards are added/removed?
> >
> > I looked at the 0.20 code and it seems to move shards too much. That
> > could be detrimental to what we are discussing now, right?
> >
>
> I have fixed this issue or at least improved it.
>
> https://issues.apache.org/jira/browse/BLUR-260
>
> This implementation will only move the shards that are down, and will never
> move any more shards than is necessary to maintain a properly balanced
> shard cluster.
>
> Basically it recalculates an optimal layout when the number of servers
> changes.  It will likely need to be improved by optimizing shard placement:
> figuring out which eligible server already holds the most blocks from that
> shard's index and assigning the shard there.  Doing this should minimize
> block movement.
>
>
> I also had another thought: if we follow something similar to what HBase
> does and perform a full optimization on each shard once a day, that would
> relocate the data to the local server without us having to do any other
> work.  Of course this assumes that there is enough space locally and that
> there has been some change in the index in the last 24 hours to warrant
> doing the merge.
>
> Aaron
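On the once-a-day "full optimization" idea: that would basically be a
force-merge of each shard's Lucene index down to one segment, which rewrites
every file and so (with a co-located datanode writing the first replica
locally) pulls the blocks back to the local machine.  A minimal sketch,
assuming the Lucene 4.x API and leaving out how the shard's Directory is
obtained, since Blur's shard server owns that wiring:

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Version;

    public class DailyOptimizeSketch {
      // Force-merge a shard's index into a single segment.  The analyzer is
      // not used for merging, but IndexWriterConfig requires one.
      public static void optimizeShard(Directory shardDir) throws Exception {
        IndexWriterConfig conf =
            new IndexWriterConfig(Version.LUCENE_43, new KeywordAnalyzer());
        IndexWriter writer = new IndexWriter(shardDir, conf);
        try {
          writer.forceMerge(1);  // "full optimize": merge all segments into one
        } finally {
          writer.close();
        }
      }
    }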
>
>
> > --
> > Ravi
> >
> >
> >
> >
> > On Sun, Oct 13, 2013 at 12:51 AM, Aaron McCurry <amccurry@gmail.com>
> > wrote:
> >
> > > On Fri, Oct 11, 2013 at 11:12 AM, Ravikumar Govindarajan <
> > > ravikumar.govindarajan@gmail.com> wrote:
> > >
> > > > I came across this interesting JIRA in hadoop
> > > > https://issues.apache.org/jira/browse/HDFS-385
> > > >
> > > > In essence, it allows us more control over where blocks are placed.
> > > >
> > > > A BlockPlacementPolicy to optimistically write all blocks of a given
> > > > index-file into the same set of datanodes could be highly helpful.
> > > >
> > > > Co-locating shard-servers and datanodes, along with short-circuit
> > > > reads, should improve performance greatly. We can always utilize the
> > > > file-system cache for local files. In case a given file is not served
> > > > locally, the shard-servers can use the block-cache.
> > > >
> > > > Do you see some positives in such an approach?
> > > >
> > >
> > > I'm sure that there will be some performance improvements by using
> > > local reads for accessing HDFS.  The areas where I would assume to see
> > > the biggest increases in performance would be merging and fetching data
> > > for retrieval.  Even with accessing local drives, the search time would
> > > likely not be improved, assuming the index is hot.  One of the reasons
> > > for doing short-circuit reads is to make use of the OS file system
> > > cache, and since Blur already uses a filesystem cache it might just be
> > > duplicating functionality.  Another big reason for short-circuit reads
> > > is to reduce network traffic.
> > >
> > > Consider the scenario where another server has gone down: if the shard
> > > server is in an opening state we would have to change the Blur layout
> > > system to only fail over to the server that contains the data for the
> > > index.  This might have some problems because the layout of the shards
> > > is based on how many shards are online on the existing servers, not
> > > where they are located.
> > >
> > > So in a perfect world, if the shard fails over to the correct server it
> > > would reduce the amount of network traffic (to near 0 if it was
> > > perfect).  So in short it could work, but it might be very hard.
> > >
> > > Assuming for a moment that the system is not dealing with a failure,
> > > and that the shard server is also running a datanode: HDFS writes the
> > > first replica to the local datanode, so if we just configure Hadoop for
> > > short-circuit reads we would already get the benefits for data fetching
> > > and merging.
> > >
> > > I had another idea that I wanted to run by you.
> > >
> > > What if, instead of actively placing blocks during writes with a Hadoop
> > > block placement policy, we write something like the Hadoop balancer?
> > > We get Hadoop to move the blocks to the shard server (datanode) that is
> > > hosting the shard.  That way it would be asynchronous, and after a
> > > failure / shard movement it would relocate the blocks and still use a
> > > short-circuit read.  Kind of like the blocks following the shard
> > > around, instead of deciding up front where the data for the shard
> > > should be located.  If this is what you were thinking, I'm sorry for
> > > not understanding your suggestion.
> > >
> > > Aaron
> > >
> > >
> > >
> > > >
> > > > --
> > > > Ravi
> > > >
> > >
> >
>
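For reference, since this whole thread assumes short-circuit reads are
enabled: with the newer HDFS-347 based implementation in the Hadoop 2.x line,
the client side boils down to the two settings below.  A minimal sketch; the
datanode needs the same dfs.domain.socket.path in its hdfs-site.xml plus the
native libhadoop library, and the socket path and namenode URI here are only
examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShortCircuitClientSketch {
      public static FileSystem openWithShortCircuit() throws Exception {
        Configuration conf = new Configuration();
        // Ask the DFS client to read local blocks directly from disk.
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // Must match the UNIX domain socket path configured on the datanode.
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
        return FileSystem.get(new Path("hdfs://namenode:8020/").toUri(), conf);
      }
    }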
