hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Time-series data problem (WAS -> Cassandra vs HBase)
Date Fri, 04 Sep 2009 17:54:52 GMT
Yes.

As Andrew suggests, it would be good to have a generalized solution to this
issue.  Also, going forward, we'll probably start to talk up hbase-48
as a solution to the common "how to load hbase quickly" problem but only
after the patch in hbase-48 has had some exercise and review by others.

So, I'm interested in helping out with this problem of yours.

One thing I should have mentioned is the partitioner you'll have to write
for hbase-48 to ensure global sort order.  The partitioner cannot be
generic.  It needs to be intimate with how row keys are structured.  In this
case the time-range window would move forward with each MR job.  At least
the start of the window would have to be fed to the partitioner.  You might
have to come up w/ a heuristic for how many reducers to run at any one time
and how they partition the time range window so there is a decent
distribution of events.
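
For illustration, here is a minimal sketch of such a partitioner, assuming row
keys that lead with an 8-byte timestamp and a window start/width passed in
through the job configuration (class and property names are made up, not from
the hbase-48 patch):

    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Partitioner;

    /** Spreads timestamp-prefixed row keys over contiguous reducer ranges. */
    public class TimeWindowPartitioner
        extends Partitioner<ImmutableBytesWritable, KeyValue>
        implements Configurable {

      private Configuration conf;
      private long windowStart;  // fed in by whatever launches each MR run
      private long windowWidth;  // e.g. the ten minutes this run covers

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        this.windowStart = conf.getLong("timewindow.start", 0L);
        this.windowWidth = conf.getLong("timewindow.width", 10 * 60 * 1000L);
      }

      @Override
      public Configuration getConf() {
        return conf;
      }

      @Override
      public int getPartition(ImmutableBytesWritable key, KeyValue value,
          int numReducers) {
        // Assumes the row key leads with an 8-byte big-endian timestamp at
        // offset 0 of the backing array.
        long ts = Bytes.toLong(key.get());
        long offset = Math.min(Math.max(ts - windowStart, 0), windowWidth - 1);
        // Slice the window into numReducers contiguous, non-overlapping
        // ranges so each reducer writes one globally ordered chunk of keys.
        return (int) (offset * numReducers / windowWidth);
      }
    }

Whatever heuristic you settle on for the reducer count, the important property
is that the partition ranges are contiguous and non-overlapping in key order;
otherwise the hfiles the reducers write won't line up into a globally sorted
table.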

St.Ack

On Fri, Sep 4, 2009 at 10:38 AM, Schubert Zhang <zsongbo@gmail.com> wrote:

> @ stack
>
> Thank you very much. Your idea is very good.
> Yes, monotonically increasing rowkeys are very well suited to using
> bulk-load tools to create new regions from the bottom up. I will study
> HBASE-48 and do these experiments.
>
> This idea should be based on strictly monotonically increasing rowkeys, is
> that right?
>
> Schubert
>
>
> On Sat, Sep 5, 2009 at 1:16 AM, Andrew Purtell <apurtell@apache.org> wrote:
>
> > Clean this up a bit and smooth the rough edges and this is, I think, a
> > valid strategy for handling a continuous high-volume import of sequential
> > data with monotonically increasing row keys generated by some global
> > counter -- NTP-synchronized System.currentTimeMillis(), for example. To
> > support this, especially when the functions of META are relocated to ZK,
> > we should add the necessary APIs under HBaseAdmin.
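
For example, a key of that shape could be built like this (a rough sketch; the
sequence number is just one way of breaking ties between events that land in
the same millisecond):

    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Row keys that sort by (NTP-synchronized) time, with a tiebreaker. */
    public class EventKeys {
      private static final AtomicLong SEQ = new AtomicLong();

      /** 8-byte big-endian timestamp followed by an 8-byte sequence number. */
      public static byte[] rowKey(long timestampMillis) {
        return Bytes.add(Bytes.toBytes(timestampMillis),
            Bytes.toBytes(SEQ.incrementAndGet()));
      }
    }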
> >
> > > If by chance they have edits that don't fit the table's current time
> > > range, they'll fail to go in because first and last regions are
> > > read-only.
> >
> > I'm not sure I understand that phrasing. If the script modifies the
> > left-most and right-most region (in terms of the key space) to have the
> > empty row as start and end key respectively, won't any edits to the left
> > or right of the table's current time range go in?
> >
> >    - Andy
> >
> >
> >
> >
> > ________________________________
> > From: stack <stack@duboce.net>
> > To: hbase-user@hadoop.apache.org
> > Sent: Friday, September 4, 2009 9:50:33 AM
> > Subject: Time-series data problem (WAS -> Cassandra vs HBase)
> >
> > Ok.  Thanks for answering the questions.
> >
> > Could you do something like the below?
> >
> > + One table keyed by timestamp with an additional qualifier of some sort
> > -- random if you have to -- so each event is a row of its own or a column
> > of its own inside a row keyed by ts.
> > + In such a table, new data would always get added to the end of the
> > table; old data would be removed from the top of the table.
> > + Catch the incoming events in hdfs.  They will be unordered.  Catch them
> > into sequence files (Would 'scribe' help here?)
> > + Study hbase-48.  See how it works.  It's an MR job that writes regions,
> > each of which has a single hfile of close to maximum regionsize.
> > + Run a MR job every 10 minutes that takes all events that have arrived
> > since the last MR job, sorts them, passes them through the hbase-48
> > sorting reducer, and in turn out to the hfileoutputformat writer writing
> > regions (a driver sketch follows after this list).
> > + When the MR job completes, trigger a version of the script that is in
> > hbase-48 for adding the new regions.  It will move the new regions into
> > place and add to .META. the regions made by the last MR job.
> > + On the next run of the meta scanner (it runs every minute), the new
> > regions will show up in the table.
> > + The first and last regions in the table would be managed by an external
> > script.  Aging out regions would just be a matter of deleting mention of
> > the regions from .META. and removing their content from hdfs.  The same
> > script would be responsible for updating the first region in the .META.
> > table, moving its end row to encompass the start row of the new 'first'
> > row in the table.  Some similar juggling would have to be done around the
> > 'last' entry in .META. -- every time you added regions, the last .META.
> > entry would need adjusting.  These regions would need to be read-only
> > (you can set a flag on these regions from the script) so they didn't take
> > on any writes.
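
A rough driver for that every-10-minutes job might look like the sketch below.
It assumes the HFileOutputFormat and KeyValue-sorting reducer that came out of
the hbase-48 work, the time-window partitioner sketched earlier, and a
hypothetical mapper that turns a raw event into (rowkey, KeyValue); the exact
class names depend on the version of the patch you pick up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TenMinuteBulkLoad {
      public static void main(String[] args) throws Exception {
        // args: <raw sequence files in> <hfile output dir> <window start ms>
        Configuration conf = HBaseConfiguration.create();
        long windowStart = Long.parseLong(args[2]);
        conf.setLong("timewindow.start", windowStart);
        conf.setLong("timewindow.width", 10 * 60 * 1000L);

        Job job = new Job(conf, "events-" + windowStart);
        job.setJarByClass(TenMinuteBulkLoad.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(EventToKeyValueMapper.class); // hypothetical mapper
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setPartitionerClass(TimeWindowPartitioner.class); // sketched above
        job.setReducerClass(KeyValueSortReducer.class); // sorts KVs per row
        job.setOutputFormatClass(HFileOutputFormat.class); // writes hfiles

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The output directory then holds roughly one region's worth of sorted hfile per
reducer, which is what the region-adding script would move into place.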
> >
> > The MR job writing hfiles directly should be a good bit faster than hbase
> > taking the edits directly.
> >
> > If there's a hiccup in the system, the spill to hdfs will make it so you
> > don't lose events.  The next MR job that runs will just take a while.
> >
> > The batch updates that arrive by other channels, which you mention in
> > another mail, should go in fine... If by chance they have edits that don't
> > fit the table's current time range, they'll fail to go in because the
> > first and last regions are read-only.
> >
> > You should run with bigger regions as the lads suggest elsewhere (and
> > yeah, 10-20 machines seems low for the size of the data -- you'll probably
> > have to start with a narrower time range than 3 months).
> >
> > Sorry if above is a bit of a rube goldberg machine.  Just a suggestion.
> >
> > St.Ack
> >
> >
> >
> >
> >
> > On Thu, Sep 3, 2009 at 11:44 PM, Schubert Zhang <zsongbo@gmail.com> wrote:
> >
> > > @stack
> > >
> > >
> > >
> > > On Fri, Sep 4, 2009 at 7:34 AM, stack <stack@duboce.net> wrote:
> > >
> > > > More questions:
> > > >
> > > > + What are the requirements regarding the lag between receipt of data
> > > > and it showing in the system? One minute, ten minutes, an hour, or 24
> > > > hours?
> > >
> > > As soon as possible.
> > > A delay of 10 or more minutes is acceptable.
> > >
> > >
> > > > + How many column families?
> > > >
> > >
> > > One is OK if using HBase,
> > > but I am still selecting the best solution now.
> > >
> > >
> > > >
> > > > St.Ack
> > > >
> > > >
> > > > On Wed, Sep 2, 2009 at 11:37 PM, Schubert Zhang <zsongbo@gmail.com> wrote:
> > > >
> > > > > in line.
> > > > >
> > > > > On Thu, Sep 3, 2009 at 12:46 PM, stack <stack@duboce.net> wrote:
> > > > >
> > > > > >
> > > > > > How many machines can you use for this job?
> > > > >
> > > > >
> > > > > Tens.  (e.g. 10~20)
> > > > >
> > > > >
> > > > > >
> > > > > > Do you need to keep it all?  Does some data expire (or can it be
> > > > > > moved offline)?
> > > > > >
> > > > > Yes, we need to remove old data when it expires.
> > > > >
> > > > >
> > > > >
> > > > > > I see why you have timestamp as part of the key in your current
> > > > > > hbase cluster -- i.e. tall tables -- as you have no other choice
> > > > > > currently.
> > > > >
> > > > >
> > > > > > It might make sense to premake the regions in the table.  Look at
> > > > > > how many regions were made the day before and go ahead and premake
> > > > > > them, to save yourself having to ride over splits (I can show you
> > > > > > how to write a little script to do this).
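
For illustration, such a script could boil down to creating the table
pre-split through HBaseAdmin (a sketch, assuming the createTable overload that
accepts split keys and the timestamp-leading row keys discussed in this
thread; the table and family names are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PremakeRegions {
      public static void main(String[] args) throws Exception {
        long dayStart = Long.parseLong(args[0]); // midnight, epoch millis
        int regions = Integer.parseInt(args[1]); // roughly yesterday's count
        long dayMillis = 24L * 60 * 60 * 1000;

        // Evenly spaced timestamp boundaries across the day.
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
          splits[i - 1] = Bytes.toBytes(dayStart + i * dayMillis / regions);
        }

        HTableDescriptor desc = new HTableDescriptor("events_" + dayStart);
        desc.addFamily(new HColumnDescriptor("d"));
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        admin.createTable(desc, splits); // table comes up already split
      }
    }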
> > > > > >
> > > > > > Does the time-series data arrive roughly on time -- e.g. all
> > > > > > instruments emit the 4 o'clock readings at 4 o'clock, or is there
> > > > > > some flux in here?  In other words, do you have a write rate of
> > > > > > thousands of updates per second, all carrying the same timestamp?
> > > > >
> > > > >
> > > > > The data will arrive with a delay of minutes.
> > > > > Usually, we need to write/ingest tens of thousands of new rows, many
> > > > > rows with the same timestamp.
> > > > >
> > > > >
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Schubert
> > > > > > >
> > > > > > > On Thu, Sep 3, 2009 at 2:32 AM, Jonathan Gray <jlist@streamy.com> wrote:
> > > > > > >
> > > > > > > > @Sylvain
> > > > > > > >
> > > > > > > > If you describe your use case, perhaps we can help you to
> > > > > > > > understand what others are doing / have done similarly.  Event
> > > > > > > > logging is certainly something many of us have done.
> > > > > > > >
> > > > > > > > If you're wondering about how much load HBase can handle,
> > > > > > > > provide some numbers of what you expect.  How much data in
> > > > > > > > bytes is associated with each event, how many events per hour,
> > > > > > > > and what operations do you want to do on it?  We could help
> > > > > > > > you determine how big of a cluster you might need and the kind
> > > > > > > > of write/read throughput you might see.
> > > > > > > >
> > > > > > > > @Schubert
> > > > > > > >
> > > > > > > > You do not need to partition your tables by stamp.  One
> > > > > > > > possibility is to put the stamp as the first part of your
> > > > > > > > rowkeys, and in that way you will have the table sorted by
> > > > > > > > time.  Using Scan's start/stop keys, you can prevent doing a
> > > > > > > > full table scan.
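
The time-bounded read then looks roughly like the sketch below (assuming the
stamp is the leading 8 bytes of the row key; the table name, family, and
argument handling are made up for illustration):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimeRangeScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "events");
        long start = Long.parseLong(args[0]); // window start, epoch millis
        long stop = Long.parseLong(args[1]);  // window end, exclusive

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes(start)); // inclusive lower bound
        scan.setStopRow(Bytes.toBytes(stop));   // exclusive upper bound
        scan.setCaching(1000);                  // fewer RPCs on a long scan

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process r ...
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }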
> > > > > > > >
> > > > > > > It would not work, since our data comes in very fast. In that
> > > > > > > method, only one region (server) is busy writing, so the write
> > > > > > > throughput is bad.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > For both of you... If you are storing massive amounts of
> > > > > > > > streaming log-type data, do you need full random read access
> > > > > > > > to it?  If you just need to process on subsets of time, that's
> > > > > > > > easily partitioned by file.  HBase should be used if you need
> > > > > > > > to *read* from it randomly, not streaming.  If you have
> > > > > > > > processing that HBase's inherent sorting, grouping, and
> > > > > > > > indexing can benefit from, then it also can make sense to use
> > > > > > > > HBase in order to avoid full-scans of data.
> > > > > > > >
> > > > > > >
> > > > > > > I know there is a contradiction between random access and batch
> > > > > > > processing. But the features of HBase (sorting, distributed
> > > > > > > b-tree, merge/compaction) are very attractive.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > HBase is not the answer because of the lack of HDFS append.
> > > > > > > > You could buffer in something outside HDFS, close files after
> > > > > > > > a certain size/time (this is what hbase does now; we can have
> > > > > > > > data loss because of no appends as well), etc...
> > > > > > > >
> > > > > > > > Reads/writes of lots of streaming data to HBase will always be
> > > > > > > > slower than HDFS.  HBase adds additional buffering, and the
> > > > > > > > compaction/split processes actually mean you copy the same data
> > > > > > > > multiple times (probably 3-4 times avg, which lines up with the
> > > > > > > > 3-4x slowdown you see).
> > > > > > > >
> > > > > > > >
> > > > > > > > And there is currently a patch in development (that works at
> > > > > > > > least partially) to do direct-to-hdfs imports to HBase, which
> > > > > > > > would then be nearly as fast as a normal HDFS writing job.
> > > > > > > >
> > > > > > > > Issue here:  https://issues.apache.org/jira/browse/HBASE-48
> > > > > > > >
> > > > > > > >
> > > > > > > > JG
> > > > > > > >
> > > > > > > >
> > > > > > > > Sylvain Hellegouarch wrote:
> > > > > > > >
> > > > > > > >> I must admit, I'm left as puzzled as you are. Our current use
> > > > > > > >> case at work involves a large amount of small event-log
> > > > > > > >> writes. Of course HDFS was quickly out of the question since
> > > > > > > >> it's not there yet for appending to a file and, more
> > > > > > > >> generally, for handling a large amount of small write ops.
> > > > > > > >>
> > > > > > > >> So we decided on HBase because we trust that the Hadoop/HBase
> > > > > > > >> infrastructure will offer us the robustness and reliability
> > > > > > > >> we need. That being said, I'm not feeling at ease with regard
> > > > > > > >> to the capacity of HBase to handle the potential load we are
> > > > > > > >> looking at putting in.
> > > > > > > >>
> > > > > > > >> In fact, it's a common trait of such systems: they've been
> > > > > > > >> designed with a certain use case in mind, and sometimes I feel
> > > > > > > >> like their design and implementation leak way too much into
> > > > > > > >> our infrastructure, leading us down the path of a virtual
> > > > > > > >> lock-in.
> > > > > > > >>
> > > > > > > >> Now I am not accusing anyone here, just observing that I find
> > > > > > > >> it really hard to locate any industrial story of those systems
> > > > > > > >> in a use case similar to the one we have at hand.
> > > > > > > >>
> > > > > > > >> The number of nodes this or that company has doesn't quite
> > > > > > > >> interest me as much as the way they are actually using HBase
> > > > > > > >> and Hadoop.
> > > > > > > >>
> > > > > > > >> RDBMS don't scale as well, but they've got a long history and
> > > > > > > >> people do know how to optimise, use and manage them. It seems
> > > > > > > >> column-oriented database systems are still young :)
> > > > > > > >>
> > > > > > > >> - Sylvain
> > > > > > > >>
> > > > > > > >> Schubert Zhang wrote:
> > > > > > > >>
> > > > > > > >>> Cassandra aside, I want to discuss some questions about
> > > > > > > >>> HBase/Bigtable.  Any advice is welcome.
> > > > > > > >>>
> > > > > > > >>> Regarding running MapReduce to scan/analyze big data in
> > > > > > > >>> HBase:
> > > > > > > >>>
> > > > > > > >>> Compared to sequentially reading data from HDFS files
> > > > > > > >>> directly, scanning/sequentially reading data from HBase is
> > > > > > > >>> slower (in my tests, at least 3:1 or 4:1).
> > > > > > > >>>
> > > > > > > >>> For the data in HBase, it is difficult to analyze only a
> > > > > > > >>> specified part of the data. For example, it is difficult to
> > > > > > > >>> analyze only the most recent day of data. In my application,
> > > > > > > >>> I am considering partitioning the data into different HBase
> > > > > > > >>> tables (e.g. one day per table); then I only need to touch
> > > > > > > >>> one table for analysis via MapReduce.
> > > > > > > >>> In Google's Bigtable paper, in "8.1 Google Analytics", they
> > > > > > > >>> also describe this usage, but I don't know how they do it.
> > > > > > > >>>
> > > > > > > >>> It is also slower to put a flood of data into an HBase table
> > > > > > > >>> than to write to files (in my tests, at least 3:1 or 4:1 as
> > > > > > > >>> well). So maybe in the future HBase can provide a bulk-load
> > > > > > > >>> feature, like PNUTS?
> > > > > > > >>>
> > > > > > > >>> Many people suggest that we only store metadata in HBase
> > > > > > > >>> tables and leave the data in HDFS files, because our
> > > > > > > >>> time-series dataset is very big.  I understand this idea
> > > > > > > >>> makes sense for some simple application requirements. But
> > > > > > > >>> usually I want different indexes into the raw data, and it
> > > > > > > >>> is difficult to build such indexes if the raw data files
> > > > > > > >>> (which are raw, or are reconstructed periodically via
> > > > > > > >>> MapReduce on recent data) are not totally sorted.  HBase can
> > > > > > > >>> provide us many expected features: sorting, a distributed
> > > > > > > >>> b-tree, compact/merge.
> > > > > > > >>>
> > > > > > > >>> So, it is very difficult for me to make a trade-off.
> > > > > > > >>> If I store the data in HDFS files (maybe partitioned) and the
> > > > > > > >>> metadata/index in HBase, the metadata/index is very difficult
> > > > > > > >>> to build.
> > > > > > > >>> If I rely on HBase totally, the performance of ingesting and
> > > > > > > >>> scanning data is not good. Is it reasonable to do MapReduce
> > > > > > > >>> on HBase? We know the goal of HBase is to provide random
> > > > > > > >>> access over HDFS, and it is an extension or adaptor over
> > > > > > > >>> HDFS.
> > > > > > > >>>
> > > > > > > >>> ----
> > > > > > > >>> Many a time I think that maybe we need a data storage engine
> > > > > > > >>> which does not need such strong consistency and can provide
> > > > > > > >>> better write and read throughput, like HDFS. Maybe we could
> > > > > > > >>> design another system, like a simpler HBase?
> > > > > > > >>>
> > > > > > > >>> Schubert
> > > > > > > >>>
> > > > > > > >>> On Wed, Sep 2, 2009 at 8:56 AM, Andrew Purtell <apurtell@apache.org> wrote:
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>> To be precise, S3.  http://status.aws.amazon.com/s3-20080720.html
> > > > > > > >>>>
> > > > > > > >>>>  - Andy
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> ________________________________
> > > > > > > >>>> From: Andrew Purtell <apurtell@apache.org>
> > > > > > > >>>> To: hbase-user@hadoop.apache.org
> > > > > > > >>>> Sent: Tuesday, September 1, 2009 5:53:09 PM
> > > > > > > >>>> Subject: Re: Cassandra vs HBase
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> Right... I recall an incident in AWS where a malformed
> > > > > > > >>>> gossip packet took down all of Dynamo.  Seems that even P2P
> > > > > > > >>>> doesn't mitigate against corner cases.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On Tue, Sep 1, 2009 at 3:12 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>> The big win for Cassandra is that its p2p distribution
> > > > > > > >>>>> model -- which drives the consistency model -- means there
> > > > > > > >>>>> is no single point of failure.  A SPOF can be mitigated by
> > > > > > > >>>>> failover but it's really, really hard to get all the corner
> > > > > > > >>>>> cases right with that approach.  Even Google with their 3
> > > > > > > >>>>> year head start and huge engineering resources still has
> > > > > > > >>>>> trouble with that occasionally.  (See e.g.
> > > > > > > >>>>> http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179 .)
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> >
> >
>
