hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: About test/production server configuration
Date Tue, 06 Apr 2010 01:11:09 GMT
On Mon, Apr 5, 2010 at 11:56 PM, Jonathan Gray <jgray@facebook.com> wrote:
> Imran,
> It's impossible to give good advice on cluster size and hardware configuration without
> some idea of the requirements.

Sorry, my mistake; I should have elaborated a bit more. Please
find some requirements inline below.

> How much data?

To start with, we will not have much, but down the road it will
be a lot of data, plus a lot of user-created content. Initially there
will be a lot of log-like entries, plus transactions...

> How will the data be queried?

We are designing the system to look up data by key only. Solr will
handle all searching, and the data lookup will then be performed in
HBase. In addition, we will have both application-layer caching in
Ehcache and a web accelerator (Varnish).
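
A minimal sketch of the lookup flow described above, assuming plain
dictionaries as stand-ins for Solr, Ehcache and HBase. The class and
method names are hypothetical and no real client API is used; it only
illustrates "search returns keys, records are fetched by key, and the
cache sits in front of the store".

```python
from typing import Dict, List, Optional


class KeyLookupService:
    """Search returns row keys only; records are then fetched by key."""

    def __init__(self, search_index: Dict[str, List[str]], store: Dict[str, dict]):
        self.search_index = search_index   # term -> row keys (stand-in for Solr)
        self.store = store                 # row key -> record (stand-in for HBase)
        self.cache: Dict[str, dict] = {}   # application cache (stand-in for Ehcache)
        self.store_reads = 0               # number of lookups that reached the store

    def get(self, row_key: str) -> Optional[dict]:
        # Serve from the cache when possible, otherwise do a key lookup.
        if row_key in self.cache:
            return self.cache[row_key]
        self.store_reads += 1
        record = self.store.get(row_key)
        if record is not None:
            self.cache[row_key] = record
        return record

    def search(self, term: str) -> List[dict]:
        # The index yields keys; the store is only ever queried by key.
        keys = self.search_index.get(term, [])
        records = (self.get(key) for key in keys)
        return [record for record in records if record is not None]
```

With this shape the store never needs a scan or a secondary index,
which is the point of keeping all searching in Solr.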

> What kind of load do you expect?

Hard to estimate, but we are planning for a moderate installation, so
that if we get a good response from the market we can expand. That is
one of the 2 primary reasons for choosing HBase: we will be able to
scale it on the fly.

> You are going to be doing offline batch/MapReduce, online random access, as well as search
> all from the same nodes?  This can be dangerous.

Yes, the offline batch work and HBase lookups will be on the same
machines, but not search as a whole. Solr will use HDFS only to store
the index and read it back; no search processing will be done there.
Search will run on a separate box altogether. But your following
statement is tempting me to use RAID+DRBD for Solr-based searching.

Another thing to note is that the offline batch work would be building
summary tables. One example from our system would be generating daily
balance sheets of ledgers, profit and loss statements, etc. for 100+
Journals in a PoS SaaS.
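
To make the summary-table idea concrete, here is a rough sketch of the
aggregation such a batch job would perform, written as a plain
in-memory function rather than a MapReduce job. The entry layout
(journal id, date, signed amount) and the function name are
assumptions for illustration, not our actual schema.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Assumed entry layout: (journal id, entry date, signed amount).
Entry = Tuple[str, date, Decimal]


def daily_balances(entries: Iterable[Entry]) -> Dict[Tuple[str, date], Decimal]:
    """Sum the signed amounts per (journal, day)."""
    totals: Dict[Tuple[str, date], Decimal] = defaultdict(Decimal)
    for journal_id, day, amount in entries:
        # Decimal() starts each new key at 0; Decimal avoids float rounding.
        totals[(journal_id, day)] += amount
    return dict(totals)
```

In a MapReduce version the (journal, day) pair would be the map output
key and the summation would happen in the reducer.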

> I would strongly recommend against putting Hadoop+HBase on the same nodes as something
> like Solr, unless you have dedicated disks for each.  Also, don't forget about ZooKeeper
> which you definitely will need separate nodes/disks for if you will be co-locating so many
> other things.

Hmm... From what I understand from this discussion and Patrick's
point on ZK, I would go for:

 - 4 separate DNs (each DN with its own dedicated disk, but maybe not
a physical server) for Solr only. As a side note, initially we will have
2 Solr search boxes.
 - ZK needs a separate disk for performance, so it would have dedicated
disks too.
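
The two constraints above can be written down as a small check over a
planned layout. This is only a sketch: the layout format (host -> disk
-> roles) and the role labels, including "DN-solr" for a DataNode that
holds only the Solr index, are hypothetical and made up for
illustration.

```python
from typing import Dict, List, Set

# Hypothetical layout format: host -> disk -> roles placed on that disk.
Layout = Dict[str, Dict[str, Set[str]]]

# Roles serving HBase/MapReduce: DataNode, RegionServer, TaskTracker.
HADOOP_HBASE = {"DN", "RS", "TT"}


def disk_sharing_problems(layout: Layout) -> List[str]:
    """List disks where ZK or a Solr-only DN shares a disk it should not."""
    problems = []
    for host, disks in sorted(layout.items()):
        for disk, roles in sorted(disks.items()):
            # ZK should be alone on its disk.
            if "ZK" in roles and len(roles) > 1:
                others = sorted(roles - {"ZK"})
                problems.append(f"{host}:{disk} ZK shares a disk with {others}")
            # The Solr-only DN should not share a disk with HBase/MapReduce roles.
            if "DN-solr" in roles and roles & HADOOP_HBASE:
                others = sorted(roles & HADOOP_HBASE)
                problems.append(f"{host}:{disk} DN-solr shares a disk with {others}")
    return problems
```

An empty result only means these two rules hold; it says nothing about
how many nodes each role needs.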

But what I am confused about is how to spread out ZK, Multi-Master, RS,
DN, and TT for HBase. Insight, comments, suggestions on it would be most

Another note on our perspective is that we want to scale horizontally
by adding more machines and not vertically (if we wanted that or could
afford it, we would probably have chosen an RDBMS). Being able to scale
horizontally as our user base, load and revenue increase was/is
essential to us.

Waiting eagerly for some insight, comments and/or suggestions.

Thank you.


> JG
>> -----Original Message-----
>> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
>> Sent: Monday, April 05, 2010 9:52 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: About test/production server configuration
>> Hi,
>> We are a startup that has decided to use HBase purely because we want
>> to take advantage of HDFS-based reliability, redundancy, MapReduce and
>> BigTable. For that we are thinking of going for a test environment with 5
>> servers and a production environment with 10 servers; in both cases the
>> Hadoop cluster will be used for HBase + MapReduce + Solr Index.
>> Firstly, I would like some opinion on whether 10 servers is a good
>> number for all 3 purposes or not. Secondly what kind of test
>> environment is currently in use in different organizations. Thirdly, I
>> would like to learn some server configuration and purchase price (with
>> purchase location if possible).
>> Waiting eagerly for some feedback.
>> Thank you,
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: imran@smartitengineering.com
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557

Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557