hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: [Shadow Regions / Read Replicas ]
Date Wed, 04 Dec 2013 22:47:23 GMT
A few comments after reading through this thread:

+ Thanks for moving the (good) discussion here out of the issue.
+ Testing WAL 'tailing'* would be a good input to have.  My sense is that a
WALpr would make for about the same load on HDFS (and if so, let's just go
there altogether).
+ I like the notion of doing the minimum work necessary first BUT, as has
been said above, we can't add a 'feature' that works for only one exotic
use case; it will just rot.  Any 'customer' of said addition likely does
not want to be in a position where they are the only ones using any such
new addition.
+ I like the list Vladimir makes above.  We need to work on his list too,
but that work should be kept separate from this one.

Thanks,
St.Ack

* HDFS does not support 'tailing'.  Rather, it is a heavyweight reopen of
the file each time we run off the end of the data.  Doing this for
replication and then per region replica would impose 'heavy' HDFS loading
(to be measured).
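
To make the footnote concrete, here is a minimal sketch (hypothetical, not
HBase code) of what 'tailing' amounts to with the stock HDFS client: there
is no tail API, so a reader that runs off the end of the data has to reopen
the file to see newly written bytes.  The WAL path argument and the poll
interval are illustrative assumptions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class NaiveWalTailer {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path wal = new Path(args[0]);  // e.g. an hlog file under the .logs dir
      long offset = 0;
      byte[] buf = new byte[64 * 1024];
      while (true) {
        // Each pass is a full, heavyweight reopen: fresh block locations,
        // fresh datanode connections.  Done for replication and then again
        // per region replica, this load on HDFS multiplies.
        try (FSDataInputStream in = fs.open(wal)) {
          in.seek(offset);
          int n;
          while ((n = in.read(buf)) > 0) {
            offset += n;
            // hand the bytes to a WAL reader / replication sink here
          }
        }
        Thread.sleep(1000);  // poll; new data is only visible after reopen
      }
    }
  }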


On Thu, Dec 5, 2013 at 6:00 AM, Enis Söztutar <enis.soz@gmail.com> wrote:

> >
> >
> > Thanks for adding it there -- I really think it is a big headline caveat
> > on my expectation of "eventual consistency".  Other systems out there
> > give you eventual consistency on the millisecond level for most cases,
> > while this initial implementation would have 'eventual' mean 10's of
> > minutes or even handfuls of minutes behind (with the snapshot flush
> > mechanism)!
>
>
> > There are a handful of other things in the phase one part of the
> > implementation section that limit the usefulness of the feature to a
> > certain kind of constrained hbase user.  I'll start another thread for
> > those.
> >
> >
> Yes, hopefully we will not stop with only phase 1, and will continue to
> implement the lower-latency async wal replication and/or wal tailing.
> However, phase 1 will get us to the point of demonstrating that replicated
> regions work, that the client side of execution is manageable, and that
> there is real benefit for read-only or bulk loaded tables, plus some
> specific use cases for read/write tables.
>
>
> >
> > >
> > > We are proposing to implement "Region snapshots" first and "Async wal
> > > replication" second.  As argued, I think wal-tailing only makes sense
> > > with WALpr, so that work is left until after we have WAL per region.
> > >
> > >
> > This is our main disagreement -- I'm not convinced that wal tailing only
> > makes sense for the wal-per-region hlog implementation.  Instead of
> > bouncing around hypotheticals, it sounds like I'll be doing more
> > experiments to prove it to myself and to convince you. :)
> >
>
> That would be awesome! Region grouping and other related proposals for
> efficient wal tailing deserve their own design doc(s).
>
>
> > >
> > I think that would be great.  Back when we did snapshots, we had active
> > development against a prototype and spent a bit of time breaking it down
> > into manageable, more polished pieces that had slightly lenient reviews.
> > This exercise really helped us with our interfaces.  We committed code to
> > the dev branch, which limited merge pains and the diff for modifications
> > made by different contributors.  In the end, when we had something we
> > were happy with on the dev branch, we merged with trunk and fixed
> > bugs/diffs that had cropped up in the meantime.  I'd suggest a similar
> > process for this.
> >
>
> Agreed. We can make use of the previous best practices. Shame that we
> still do not have a read-write git repo.
>
>
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
> >
>
