hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Question about table size limitations...
Date Fri, 15 Apr 2011 21:26:00 GMT
Michael,

This sounds like an excellent way to organize this data (buoy + time
interval id => sequence of data points).  Clearly you will also need an
auxiliary table that maps geolocation => {buoy,time}+
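A minimal sketch of how such a composite row key might be laid out. The fixed widths and the numeric buoy/interval ids are assumptions for illustration; HBase itself only ever sees the key as an ordered byte array, so the point is just that a fixed-width big-endian encoding makes one buoy's intervals a contiguous scan range:

```python
import struct

def row_key(buoy_id: int, interval_id: int) -> bytes:
    # Fixed-width big-endian fields sort byte-wise in the same order
    # they sort numerically, so all intervals for one buoy are adjacent
    # in the table and can be read with a single range scan.
    return struct.pack(">IQ", buoy_id, interval_id)

def scan_range(buoy_id: int) -> tuple[bytes, bytes]:
    # Start/stop keys covering every time interval for one buoy:
    # everything with the buoy-id prefix, up to (but excluding) the
    # next buoy id.
    return struct.pack(">I", buoy_id), struct.pack(">I", buoy_id + 1)
```

The same trick (prefix field first, time field second) is what makes the per-sweep access pattern cheap; the geolocation lookup would go through the auxiliary table instead.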

The question (as you point out) is whether HBase is going to be happy to
store so much data.  The current state is "it depends", but you are
definitely pushing things.
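A back-of-envelope sketch of why this is pushing things, using the ~10 PB and ~1 KB-per-point figures from Michael's message. The region size and the regions-per-server figure are assumed rule-of-thumb values, not measurements:

```python
# All constants here are assumptions for a rough sizing exercise.
total_bytes = 10 * 2**50                   # ~10 PB, from the question
point_bytes = 1024                         # ~1 KB per buoy/interval point
data_points = total_bytes // point_bytes   # ~1.1e13 data points

region_bytes = 1 * 2**30                   # assumed 1 GB max region size
regions = total_bytes // region_bytes      # ~10.5 million regions

regions_per_server = 100                   # assumed comfortable load
servers = regions // regions_per_server    # ~100,000 region servers
```

Even if the per-region and per-server assumptions are off by an order of magnitude, the cluster size this implies is far beyond typical deployments, which is the sense in which "it depends" becomes "you are pushing things".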

There may be some interesting technical alternatives that make this much
easier.  I will contact you off-list about these.

On Fri, Apr 15, 2011 at 1:01 PM, Michael Segel <michael_segel@hotmail.com>wrote:

>
> I have a question concerning the practical size limitations of a table in
> HBase.
>
> I was asked ‘how many rows can one reasonably expect HBase
> to handle…’ and the person asking didn’t like my “It depends…” answer.
> (I’m a consultant, so the answer to every problem has to start with an
> “It depends…” caveat. :-) )
>
> In trying to ascertain a practical answer, I’ve created this
> hypothetical problem and hopefully someone with a bit more insight and
> knowledge can provide a better answer.
> Please note that this is a hypothetical example and any resemblance to a
> real life problem is a coincidence.
>
>
> We have a fleet of petroleum exploration vessels. Each
> vessel tows a set of sonar buoys to take measurements of the ocean’s
> floor.  Overlapping of searches can occur.
> (Crisscross patterns)
>
> So our data sets contain both a geospatial aspect along with
> a time series aspect. The complete data set of a single ocean can be large.
> Measured in 10's of PBs.
>
>
> There are two known use cases:
>
> · For a given ‘sweep’, process that data set.
>   (A sweep is the data set for a given ship in a given grid space for a
>   single day, where we know the start and end times of the sweep.)
>
> · For a given grid_id (geospatial box), process all of the data
>   collected by all of the sweeps that occurred there. (Different ships,
>   dates, etc …)
>
>
> Having said all of that… how much data can we store in a
> table? How many rows?
>
>
> Assume that the data set per time interval per buoy is 1K in
> size and that there are going to be billions of these data points in the
> database. (And we can store each buoy’s result in a different column of the
> row.)
>
> What I’d like to have is some sort of formula that we can
> use to help determine a realistic size limit before performance falls
> apart.
>
>
>
> There’s more to this, but the idea is to explore HBase’s
> capabilities and limitations. We need to know this because we'd like to
> anticipate problems and design around them, without having to test the
> solution by buying and building a 2000 node cluster...
>
> Thx
>
>
>
> -Mike
> PS. JDCryans, does this help explain the problem?
>
>
>
