hbase-user mailing list archives

From "Daniel Leffel" <daniel.lef...@gmail.com>
Subject Should I pass on HBase for this project? (for now)
Date Mon, 05 May 2008 18:51:53 GMT
Hi All (and St.Ack),
I've spent the last few weeks figuring out how to use HBase for my project.
On the surface, HBase has seemed like the dream solution for this project
and had me very excited from the beginning.

However, from the moment I began to implement the project, I've had a
frustrating go at it. I've spent weeks just simply trying to construct the
environment under which my application will need to run. I've sent countless
messages to this group (and thank you all so much for answering so many of
them, especially St.Ack).

At this point, I can't seem to tell which one(s) of the following is true:

   - Maybe I'm just a freaking idiot
   - Maybe HBase is just not equipped to do what I want it to do
   - Maybe HBase is just still too unstable and it will do what I need it to
   do at some point in the future
   - Maybe I have the wrong expectations for the amount of hardware I need
   to throw at the situation.

I have Hadoop 0.16.3 running on 4 boxes (all 4 running DFS and 3 of them
running MapRed). I'm running HBase 0.1.2 (most recent release candidate)
with the master running on the same box as namenode and 3 region servers
(running on the same MapRed boxes).

My first and very simple task is to load a sparse table with 220 million
rows. The average row has 2 columns or so (very low byte count per row). I
have attempted to do this with a simple MapReduce job. In the Map phase, I'm
simply parsing through a text file and using the standard TableReduce to
load the table.

I've attempted to do this with various numbers of reduce tasks and various
configurations of which machines run each daemon.

The end result is always the same. At some point, region servers go offline -
the most recent behavior is that region servers just quit responding, and
logs set to debug give no useful information. If I had to guess, this is
typical deadlock behavior.

A simple table scan (just so I can find out how many rows were successfully
inserted before all the region servers died) usually causes the same
behavior (one by one, region servers just die - even with no MapRed jobs
running).

At this point, I'm at a crossroads and beginning to think that I will need
to leave HBase behind because I can't spend another week with no progress on
this project.

So, I ask the question(s) I posed in the beginning.

   - Maybe I'm just a freaking idiot
   - Maybe HBase is just not equipped to do what I want it to do
   - Maybe HBase is just still too unstable and it will do what I need it to
   do at some point in the future
   - Maybe I have the wrong expectations for the amount of hardware I need
   to throw at the situation.

Can someone please point me in the right direction?

Danny
