hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Time-series data problem (WAS -> Cassandra vs HBase)
Date Fri, 04 Sep 2009 17:16:42 GMT
Clean this up a bit and smooth the rough edges and this is, I think, a valid strategy for
handling a continuous high-volume import of sequential data with monotonically increasing
row keys generated by some global counter, an NTP-synchronized System.currentTimeMillis()
for example. To support this, especially once the functions of META are relocated to ZK,
we should add the necessary APIs under HBaseAdmin.
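
For illustration, a minimal sketch of such a key (the 'sourceId' suffix and the names here
are just one way to do it, not anything the strategy above prescribes):

import org.apache.hadoop.hbase.util.Bytes;

public class TimeSeriesKey {
  /**
   * Row key = big-endian millisecond timestamp + a per-source suffix.
   * Lexicographic order of the keys then matches chronological order, and the
   * suffix keeps two events on the same millisecond from colliding on one row.
   */
  public static byte[] rowKey(long tsMillis, String sourceId) {
    return Bytes.add(Bytes.toBytes(tsMillis), Bytes.toBytes(sourceId));
  }

  public static void main(String[] args) {
    byte[] key = rowKey(System.currentTimeMillis(), "sensor-42");
    System.out.println(Bytes.toStringBinary(key));
  }
}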

> If by chance they have edits that don't fit the table's current time range, they'll
> fail to go in because the first and last regions are read-only.

I'm not sure I understand that phrasing. If the script modifies the left-most and right-most
regions (in terms of the key space) to have the empty row as start and end key respectively,
won't any edits to the left or right of the table's current time range go in?

    - Andy




________________________________
From: stack <stack@duboce.net>
To: hbase-user@hadoop.apache.org
Sent: Friday, September 4, 2009 9:50:33 AM
Subject: Time-series data problem (WAS -> Cassandra vs HBase)

Ok.  Thanks for answering the questions.

Could you do something like the below?

+ One table keyed by timestamp with an additional qualifier of some sort --
random if you have to -- so each event is a row of its own or a column of
its own inside a row keyed by ts.
+ In such a table, new data would always get added to the end of the table;
old data would be removed from the top of the table.
+ Catch the incoming events in hdfs.  They will be unordered.  Catch them
into sequence files (would 'scribe' help here?).
+ Study hbase-48.  See how it works.  It's an MR job that writes regions, each
of which has a single hfile of close to maximum regionsize.
+ Run an MR job every 10 minutes that takes all events that have arrived
since the last MR job, sorts them, passes them through the hbase-48 sorting
reducer and in turn out to the hfileoutputformat writer writing regions.
+ When the MR job completes, trigger a version of the script that is in
hbase-48 for adding the new regions.  It will move the new regions into place
and add to .META. the regions made by the last MR job.
+ On the next run of the meta scanner (it runs every minute), the new regions
will show up in the table.
+ The first and last regions in the table would be managed by an external
script.  Aging out regions would just be a matter of deleting mention of the
regions from .META. and removing their content from hdfs (see the sketch just
below this list).  The same script would be responsible for updating the first
region's entry in .META., moving its end row to encompass the start row of the
new 'first' region in the table.  Some similar juggling would have to be done
around the 'last' entry in .META. -- every time you added regions, the last
.META. entry would need adjusting.  These regions would need to be read-only
(you can set a flag on these regions from the script) so they didn't take on
any writes.
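
For the aging-out piece, a rough sketch of what that external script might do (Java here
just for illustration; the table name, .META. row key and hdfs path are made-up
placeholders, and a real script would also have to close the region and do the first/last
juggling described above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class AgeOutOldestRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // A region's .META. row key is "<table>,<startKey>,<regionId>"; this one
    // is a placeholder for the oldest region of an 'events' table.
    byte[] metaRow = Bytes.toBytes("events,1251763200000,1251763201234");

    // 1. Delete the region's entry from .META. so it drops out of the
    //    table's key space.
    HTable meta = new HTable(conf, ".META.");
    meta.delete(new Delete(metaRow));

    // 2. Remove the region's directory from hdfs to reclaim the space
    //    (the encoded region name here is a placeholder).
    FileSystem fs = FileSystem.get(conf);
    fs.delete(new Path("/hbase/events/1028785192"), true);
  }
}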

The MR job writing hfiles directly should be a good bit faster than hbase
taking the edits directly.
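
A rough sketch of that job's plumbing, assuming the HFileOutputFormat from hbase-48 and
the (timestamp, "source<TAB>payload") sequence files sketched further down; the class
names, column layout and the single-reducer shortcut are illustrative only:

import java.io.IOException;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventsToHFiles {

  /** Turns one spilled (timestamp, "source<TAB>payload") record into a KeyValue. */
  static class EventMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable ts, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split("\t", 2);
      // Row key: big-endian timestamp + source id, so rows sort by time.
      byte[] row = Bytes.add(Bytes.toBytes(ts.get()), Bytes.toBytes(f[0]));
      KeyValue kv = new KeyValue(row, Bytes.toBytes("event"),
          Bytes.toBytes("payload"), Bytes.toBytes(f[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  /** Sorts each row's KeyValues before they reach the HFile writer. */
  static class SortingReducer extends
      Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs, Context ctx)
        throws IOException, InterruptedException {
      TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
      for (KeyValue kv : kvs) {
        // Deep-copy: the framework reuses the KeyValue instance it hands us.
        byte[] copy = new byte[kv.getLength()];
        System.arraycopy(kv.getBuffer(), kv.getOffset(), copy, 0, kv.getLength());
        sorted.add(new KeyValue(copy));
      }
      for (KeyValue kv : sorted) {
        ctx.write(row, kv);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "events-to-hfiles");
    job.setJarByClass(EventsToHFiles.class);
    job.setMapperClass(EventMapper.class);
    job.setReducerClass(SortingReducer.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(HFileOutputFormat.class);
    // One reducer keeps the output totally ordered for this sketch; a real
    // job would use a total-order partitioner over the key range instead.
    job.setNumReduceTasks(1);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // spilled events
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // new region hfiles
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Later HBase releases wrap most of this up in HFileOutputFormat.configureIncrementalLoad()
plus a 'completebulkload' tool.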

If there's a hiccup in the system, the spill to hdfs means you don't lose
events.  The next MR job that runs will just take a while.
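
The spill can be as simple as appending to a SequenceFile per collector per 10-minute
window; a minimal sketch (the path convention and record shape are assumptions, chosen
to line up with the job sketch above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class EventSpill {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // One file per collector per 10-minute window (made-up convention) so the
    // next MR run can sweep up everything that arrived since the last run.
    Path spill = new Path("/events/incoming/collector-1/200909041710.seq");
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, spill, LongWritable.class, Text.class);
    try {
      // Events are appended unordered, as they arrive: event timestamp as the
      // key, "source<TAB>payload" as the value.
      writer.append(new LongWritable(System.currentTimeMillis()),
                    new Text("sensor-42\tcpu=87 mem=3012"));
    } finally {
      writer.close();
    }
  }
}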

The batch updates that arrive by other channels, which you mention in another
mail, should go in fine... If by chance they have edits that don't fit the
table's current time range, they'll fail to go in because the first and last
regions are read-only.

You should run with bigger regions as the lads suggest elsewhere (and yeah,
10-20 machines seems low for the size of the data -- you'll probably have to
start with a narrower time range than 3 months).
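
On the bigger-regions point, a small sketch of creating the table with a larger max file
size (the 1GB figure and the table/family names are just illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateEventsTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("event"));
    // Regions default to 256MB in this era; bumping the split threshold to
    // ~1GB means far fewer regions (and far fewer splits) for the same data.
    desc.setMaxFileSize(1024L * 1024L * 1024L);
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc);
  }
}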

Sorry if the above is a bit of a Rube Goldberg machine.  Just a suggestion.

St.Ack





On Thu, Sep 3, 2009 at 11:44 PM, Schubert Zhang <zsongbo@gmail.com> wrote:

> @stack
>
>
>
> On Fri, Sep 4, 2009 at 7:34 AM, stack <stack@duboce.net> wrote:
>
> > More questions:
> >
> > + What are the requirements regarding the lag between receipt of data and
> > it showing in the system?  One minute, ten minutes, an hour, or 24 hours?
> >
>
> As soon as possible.
> A delay of 10 or more minutes is acceptable.
>
>
> > + How many column families?
> >
>
> One is OK if using HBase,
> but I am still selecting the best solution now.
>
>
> >
> > St.Ack
> >
> >
> > On Wed, Sep 2, 2009 at 11:37 PM, Schubert Zhang <zsongbo@gmail.com>
> > wrote:
> >
> > > in line.
> > >
> > > On Thu, Sep 3, 2009 at 12:46 PM, stack <stack@duboce.net> wrote:
> > >
> > > >
> > > > How many machines can you use for this job?
> > >
> > >
> > > Tens.  (e.g. 10~20)
> > >
> > >
> > > >
> > > > Do you need to keep it all?  Does some data expire (or can it be
> > > > moved offline)?
> > > >
> > > Yes, we need to remove old data when it expires.
> > >
> > >
> > >
> > > > I see why you have timestamp as part of the key in your current hbase
> > > > cluster -- i.e. tall tables -- as you have no other choice currently.
> > >
> > >
> > > > It might make sense premaking the regions in the table.  Look at how
> > > > many regions were made the day before and go ahead and premake them to
> > > > save yourself having to ride over splits (I can show you how to write
> > > > a little script to do this).
> > > >
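
For reference, newer HBase client releases can premake the regions right at table-creation
time instead of via a script; a sketch, with made-up hour-boundary splits and names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PremakeRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("events-20090905");
    desc.addFamily(new HColumnDescriptor("event"));

    // Pre-split on hour boundaries for the day this table covers, so the
    // day's writes never ride over a split: 23 split keys => 24 regions.
    long dayStart = 1252108800000L; // 2009-09-05 00:00 UTC, illustrative
    byte[][] splits = new byte[23][];
    for (int h = 1; h <= 23; h++) {
      splits[h - 1] = Bytes.toBytes(dayStart + h * 3600L * 1000L);
    }

    new HBaseAdmin(conf).createTable(desc, splits);
  }
}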
> > > > Does the time-series data arrive roughly on time -- e.g. all
> > > > instruments emit the 4 o'clock readings at 4 o'clock or is there some
> > > > flux in here?  In other words, do you have a write rate of thousands
> > > > of updates per second, all carrying the same timestamp?
> > >
> > >
> > > The data will arrive with a delay of a few minutes.
> > > Usually, we need to write/ingest tens of thousands of new rows, many
> > > rows with the same timestamp.
> > >
> > >
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > > > Schubert
> > > > >
> > > > > On Thu, Sep 3, 2009 at 2:32 AM, Jonathan Gray <jlist@streamy.com>
> > > > > wrote:
> > > > >
> > > > > > @Sylvain
> > > > > >
> > > > > > If you describe your use case, perhaps we can help you to
> > > > > > understand what others are doing / have done similarly.  Event
> > > > > > logging is certainly something many of us have done.
> > > > > >
> > > > > > If you're wondering about how much load HBase can handle, provide
> > > > > > some numbers of what you expect.  How much data in bytes is
> > > > > > associated with each event, how many events per hour, and what
> > > > > > operations do you want to do on it?  We could help you determine
> > > > > > how big of a cluster you might need and the kind of write/read
> > > > > > throughput you might see.
> > > > > >
> > > > > > @Schubert
> > > > > >
> > > > > > You do not need to partition your tables by stamp.  One
> > > > > > possibility is to put the stamp as the first part of your rowkeys,
> > > > > > and in that way you will have the table sorted by time.  Using
> > > > > > Scan's start/stop keys, you can prevent doing a full table scan.
> > > > > >
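
For reference, the start/stop-key scan Jonathan describes would look roughly like this
(table name, column layout and the day boundaries are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanOneDay {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");

    // Row keys begin with a big-endian timestamp, so a [start, stop) range
    // scan only touches the regions holding that day's data.
    long dayStart = 1252108800000L; // 2009-09-05 00:00 UTC, illustrative
    long dayEnd = dayStart + 24L * 3600L * 1000L;
    Scan scan = new Scan(Bytes.toBytes(dayStart), Bytes.toBytes(dayEnd));

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toStringBinary(r.getRow())); // one event row
      }
    } finally {
      scanner.close();
    }
  }
}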
> > > > > It would not work, since our data arrives very fast.  With that
> > > > > method only one region (server) is busy writing, so the write
> > > > > throughput is bad.
> > > > >
> > > > >
> > > > > >
> > > > > > For both of you... If you are storing massive amounts of streaming
> > > > > > log-type data, do you need full random read access to it?  If you
> > > > > > just need to process on subsets of time, that's easily partitioned
> > > > > > by file.  HBase should be used if you need to *read* from it
> > > > > > randomly, not streaming.  If you have processing that HBase's
> > > > > > inherent sorting, grouping, and indexing can benefit from, then it
> > > > > > also can make sense to use HBase in order to avoid full-scans of
> > > > > > data.
> > > > > >
> > > > >
> > > > > I know there is a contradiction between random access and batch
> > > > > processing, but the features of HBase (sorting, distributed b-tree,
> > > > > merge/compaction) are very attractive.
> > > > >
> > > > >
> > > > > >
> > > > > > HBase is not the answer because of lack of HDFS append.  You could
> > > > > > buffer in something outside HDFS, close files after a certain
> > > > > > size/time (this is what hbase does now; we can have data loss
> > > > > > because of no appends as well), etc...
> > > > > >
> > > > > > Reads/writes of lots of streaming data to HBase will always be
> > > > > > slower than HDFS.  HBase adds additional buffering, and the
> > > > > > compaction/split processes actually mean you copy the same data
> > > > > > multiple times (probably 3-4 times avg, which lines up with the
> > > > > > 3-4x slowdown you see).
> > > > > >
> > > > > > And there is currently a patch in development (that works at least
> > > > > > partially) to do direct-to-hdfs imports to HBase which would then
> > > > > > be nearly as fast as a normal HDFS writing job.
> > > > > >
> > > > > > Issue here:  https://issues.apache.org/jira/browse/HBASE-48
> > > > > >
> > > > > >
> > > > > > JG
> > > > > >
> > > > > >
> > > > > > Sylvain Hellegouarch wrote:
> > > > > >
> > > > > >> I must admit, I'm left as puzzled as you are. Our current use case
> > > > > >> at work involves a large amount of small event log writing. Of
> > > > > >> course HDFS was quickly out of the question since it's not there
> > > > > >> yet to append to a file and, more generally, to handle a large
> > > > > >> amount of small write ops.
> > > > > >>
> > > > > >> So we decided on HBase because we trust that the Hadoop/HBase
> > > > > >> infrastructure will offer us the robustness and reliability we
> > > > > >> need. That being said, I'm not feeling at ease in regards to the
> > > > > >> capacity of HBase to handle the potential load we are looking at
> > > > > >> inputting.
> > > > > >>
> > > > > >> In fact, it's a common trait of such systems: they've been designed
> > > > > >> with a certain use case in mind, and sometimes I feel like their
> > > > > >> design and implementation leak way too much into our
> > > > > >> infrastructure, leading us down the path of a virtual lock-in.
> > > > > >>
> > > > > >> Now I am not accusing anyone here, just observing that I find it
> > > > > >> really hard to locate any industrial story of those systems in a
> > > > > >> use case similar to the one we have at hand.
> > > > > >>
> > > > > >> The number of nodes this or that company has doesn't quite interest
> > > > > >> me as much as the way they are actually using HBase and Hadoop.
> > > > > >>
> > > > > >> RDBMS don't scale as well, but they've got a long history and
> > > > > >> people do know how to optimise, use and manage them. It seems
> > > > > >> column-oriented database systems are still young :)
> > > > > >>
> > > > > >> - Sylvain
> > > > > >>
> > > > > >> Schubert Zhang wrote:
> > > > > >>
> > > > > >>> Regardless of Cassandra, I want to discuss some questions about
> > > > > >>> HBase/Bigtable.  Any advice is welcome.
> > > > > >>>
> > > > > >>> Regarding running MapReduce to scan/analyze big data in HBase:
> > > > > >>>
> > > > > >>> Compared to sequentially reading data from HDFS files directly,
> > > > > >>> scanning/sequentially reading data from HBase is slower (in my
> > > > > >>> tests, at least 3:1 or 4:1).
> > > > > >>>
> > > > > >>> For the data in HBase, it is difficult to analyze only a specified
> > > > > >>> part of the data.  For example, it is difficult to analyze only the
> > > > > >>> most recent day of data.  In my application, I am considering
> > > > > >>> partitioning data into different HBase tables (e.g. one day - one
> > > > > >>> table); then I can touch only one table for analysis via MapReduce.
> > > > > >>> In Google's Bigtable paper, in "8.1 Google Analytics", they also
> > > > > >>> describe this usage, but I don't know how.
> > > > > >>>
> > > > > >>> It is also slower to put flooding data into an HBase table than to
> > > > > >>> write to files (in my tests, at least 3:1 or 4:1 too).  So, maybe
> > > > > >>> in the future, HBase can provide a bulk-load feature, like PNUTS?
> > > > > >>>
> > > > > >>> Many people suggest that we only store metadata in HBase tables and
> > > > > >>> leave the data in HDFS files, because our time-series dataset is
> > > > > >>> very big.  I understand this idea makes sense for some simple
> > > > > >>> application requirements.  But usually, I want different indexes
> > > > > >>> into the raw data.  It is difficult to build such indexes if the
> > > > > >>> raw data files (which are raw or are reconstructed via MapReduce
> > > > > >>> periodically on recent data) are not totally sorted.  .... HBase
> > > > > >>> can provide us many expected features: sorted, distributed b-tree,
> > > > > >>> compact/merge.
> > > > > >>>
> > > > > >>> So, it is very difficult for me to make the trade-off.
> > > > > >>> If I store data in HDFS files (maybe partitioned) and
> > > > > >>> metadata/index in HBase, the metadata/index is very difficult to
> > > > > >>> build.
> > > > > >>> If I rely on HBase totally, the performance of ingesting data and
> > > > > >>> scanning data is not good.  Is it reasonable to do MapReduce on
> > > > > >>> HBase?  We know the goal of HBase is to provide random access over
> > > > > >>> HDFS, and it is an extension of or adaptor over HDFS.
> > > > > >>>
> > > > > >>> ----
> > > > > >>> Many a time, I am thinking, maybe we need a data storage engine
> > > > > >>> which does not need such strong consistency and which can provide
> > > > > >>> better writing and reading throughput, like HDFS.  Maybe we can
> > > > > >>> design another system like a simpler HBase?
> > > > > >>>
> > > > > >>> Schubert
> > > > > >>>
> > > > > >>> On Wed, Sep 2, 2009 at 8:56 AM, Andrew Purtell <apurtell@apache.org>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>> To be precise, S3.  http://status.aws.amazon.com/s3-20080720.html
> > > > > >>>>
> > > > > >>>>  - Andy
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> ________________________________
> > > > > >>>> From: Andrew Purtell <apurtell@apache.org>
> > > > > >>>> To: hbase-user@hadoop.apache.org
> > > > > >>>> Sent: Tuesday, September 1, 2009 5:53:09 PM
> > > > > >>>> Subject: Re: Cassandra vs HBase
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Right... I recall an incident in AWS where a malformed gossip
> > > > > >>>> packet took down all of Dynamo. Seems that even P2P doesn't
> > > > > >>>> mitigate against corner cases.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Tue, Sep 1, 2009 at 3:12 PM, Jonathan Ellis <jbellis@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>> The big win for Cassandra is that its p2p distribution model --
> > > > > >>>>> which drives the consistency model -- means there is no single
> > > > > >>>>> point of failure.  SPF can be mitigated by failover but it's
> > > > > >>>>> really, really hard to get all the corner cases right with that
> > > > > >>>>> approach.  Even Google with their 3 year head start and huge
> > > > > >>>>> engineering resources still has trouble with that occasionally.
> > > > > >>>>> (See e.g.
> > > > > >>>>> http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179 .)
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
>



      