hbase-user mailing list archives

From <Zlatin.Balev...@barclayscapital.com>
Subject RE: hbase bulk writes
Date Tue, 29 Dec 2009 23:05:42 GMT
Thank you for the link St.Ack.

The usage case described in that thread is similar to ours.  Here are
some more details:

The numbers are for a binary format that will not be very compressible.  Most
of the data will arrive during an 8-hour window.  It will be keyed
by a nanosecond timestamp, so all records will be unique.  Data will be
kept indefinitely; there will be rare updates/deletions of a small number
of rows.  The main use case is sequential range scanning and filtering
of 2^(40+) rows.
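
For a rough sense of scale, here is a back-of-the-envelope sketch of the
sustained ingest rate these figures imply.  It uses the daily volumes
from my earlier message quoted below (2^33 to 2^37 records, 2^43 to 2^45
bytes).  The assumptions are mine: all of the data (not just most of it)
arrives evenly over the 8-hour window, and the low record count goes
with the low byte count.  The function name is made up for illustration.

```python
# Back-of-the-envelope ingest rates. Assumes the whole daily volume
# arrives evenly across the 8-hour window (an upper-bound simplification).
WINDOW_SECONDS = 8 * 60 * 60  # 28,800 seconds


def ingest_rates(records_per_day, bytes_per_day, window_seconds=WINDOW_SECONDS):
    """Return average rates and record size for one daily load."""
    return {
        "records_per_sec": records_per_day / window_seconds,
        "bytes_per_sec": bytes_per_day / window_seconds,
        "avg_record_bytes": bytes_per_day / records_per_day,
    }


# Pairing low with low and high with high is an assumption.
low = ingest_rates(2**33, 2**43)
high = ingest_rates(2**37, 2**45)

for label, r in (("low", low), ("high", high)):
    print(
        f"{label}: {r['records_per_sec']:,.0f} records/s, "
        f"{r['bytes_per_sec'] / 2**20:,.0f} MiB/s, "
        f"{r['avg_record_bytes']:.0f} bytes/record"
    )
```

Under those assumptions the low end works out to roughly 300 thousand
records per second and the high end to several million per second,
before any replication overhead.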

There will be several column families; occasionally new ones will be
added and old ones deprecated.  That flexibility, the strong data
consistency and good scan performance (according to the published
benchmarks) are the main reasons we're looking at Hbase.
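
Since HBase sorts rows by the raw bytes of the row key, the range scans
mentioned above only line up with time order if the nanosecond timestamp
is encoded so that byte order matches numeric order.  A minimal sketch of
one such encoding, assuming non-negative timestamps that fit in 64 bits
(the helper names are hypothetical, not part of any HBase API):

```python
import struct


def row_key(ts_nanos: int) -> bytes:
    """Encode a non-negative nanosecond timestamp as 8 big-endian bytes.

    Fixed-width big-endian bytes compare lexicographically in the same
    order as the integers they encode.
    """
    return struct.pack(">Q", ts_nanos)


def scan_bounds(start_nanos: int, stop_nanos: int):
    """Return (start_key, stop_key) for a time-range scan.

    Treating the stop key as exclusive is an assumption about how the
    caller will use the bounds.
    """
    return row_key(start_nanos), row_key(stop_nanos)


timestamps = [1_262_127_942_000_000_009, 9, 10, 1_262_127_942_000_000_010]
keys = [row_key(t) for t in timestamps]

# Byte order of the keys matches numeric order of the timestamps.
assert sorted(keys) == [row_key(t) for t in sorted(timestamps)]

# A plain decimal string would not preserve order: "10" sorts before "9".
assert sorted(["9", "10"]) == ["10", "9"]
```

The same idea carries over to whatever client is used for loading; the
point is only that the key must be fixed-width and big-endian.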

A question: between the time the bulk-loading MR script finishes
and the next meta scan runs, which could be up to a minute,
how will querying and scanning work?  Will they produce inconsistent
results or just not see the new data?  What about update or delete
operations?  Is it necessary to suspend/queue those, and if so, is there
a way to do that within Hbase?

Best Regards,
Zlatin

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
stack
Sent: Tuesday, December 29, 2009 4:39 PM
To: hbase-user@hadoop.apache.org
Subject: Re: hbase bulk writes

You've seen the description at
http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg06010.html
of how timeseries data might be added quickly to hbase by just adding
regions to the tail of a table?  They'd come online as soon as the next meta
scan ran (usually every minute).

Your schema requires multiple families?

Generally, loading behind the API (bulk load) will get you an order of
magnitude or more of speedup over loading via the API.

Are your numbers for compressed data?
St.Ack



On Tue, Dec 29, 2009 at 12:28 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:

>
> >Can you put your input files under an http server and then write a
> mapreduce that pulls via HTTP?
>
> Greetings,
>
> I'm very interested in how much of an improvement HBASE-1861 would
> bring.  I am planning on inserting between 2^33 and 2^37 records,
> for an aggregate of 2^43 to 2^45 bytes, on a daily basis.  The records
> will be sequentially sorted, which I understand is the worst-case
> scenario for inserting into a live Hbase system.  To make things even
> more interesting, I can't afford any downtime, so any bulk load method
> will have to append to existing tables.
> Based on the load rates others are posting, I'm starting to doubt
> whether this will be possible with Hbase at all.  There will be plenty
> of cpu cores and storage space.
>
> Best Regards,
> Zlatin Balevsky
> AVP AMM Group,
> Barclays Capital
> _______________________________________________
>
> This e-mail may contain information that is confidential, privileged
> or otherwise protected from disclosure. If you are not an intended
> recipient of this e-mail, do not duplicate or redistribute it by any
> means. Please delete it and any attachments and notify the sender that
> you have received it in error. Unless specifically indicated, this
> e-mail is not an offer to buy or sell or a solicitation to buy or sell
> any securities, investment products or other financial product or
> service, an official confirmation of any transaction, or an official
> statement of Barclays. Any views or opinions presented are solely
> those of the author and do not necessarily represent those of
> Barclays. This e-mail is subject to terms available at the following
> link: www.barcap.com/emaildisclaimer. By messaging with Barclays you
> consent to the foregoing.  Barclays Capital is the investment banking
> division of Barclays Bank PLC, a company registered in England (number
> 1026167) with its registered office at 1 Churchill Place, London, E14
> 5HP.  This email may relate to or be sent from other members of the
> Barclays Group.
> _______________________________________________
>
