hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: About test/production server configuration
Date Tue, 06 Apr 2010 22:10:34 GMT
Imran,

Have you run Solr atop HDFS?  I doubt this will be performant.

Also, to scope your cluster properly you need actual numeric targets; without them you
cannot provision hardware accurately.  "Not much" data now but "lots" of data later could
mean anything.  Decide what volume and load you want to provision for, then size the
cluster to that.
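
As a rough illustration of what "actual number targets" buys you, here is a minimal
back-of-envelope sketch.  Every input (daily ingest, retention, disk per node, fill limit,
overhead factor) is a hypothetical placeholder, not a recommendation; only the replication
factor of 3 reflects the usual HDFS default.  It sizes for disk capacity alone and ignores
read/write load, which you would need to estimate separately.

```python
import math

def nodes_needed(daily_ingest_gb, retention_days, replication=3,
                 overhead=1.3, usable_disk_per_node_gb=4000, max_fill=0.7):
    """Estimate DataNode count from storage targets alone.

    All defaults are illustrative assumptions:
    - replication: HDFS block replication factor (3 is the usual default)
    - overhead: fudge factor for indexes, compaction space, temp files
    - max_fill: fraction of each node's disk you are willing to fill
    """
    raw_gb = daily_ingest_gb * retention_days
    stored_gb = raw_gb * replication * overhead
    capacity_per_node_gb = usable_disk_per_node_gb * max_fill
    return math.ceil(stored_gb / capacity_per_node_gb)

if __name__ == "__main__":
    # Hypothetical targets: 10 GB/day kept for one year.
    print(nodes_needed(daily_ingest_gb=10, retention_days=365))
```

Plugging in a "now" figure and a "later" figure gives two concrete cluster sizes to compare,
which is the kind of target the hardware discussion needs.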

> -----Original Message-----
> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
> Sent: Monday, April 05, 2010 6:11 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: About test/production server configuration
> 
> On Mon, Apr 5, 2010 at 11:56 PM, Jonathan Gray <jgray@facebook.com>
> wrote:
> > Imran,
> >
> > It's impossible to give good advice on cluster size and hardware
> > configuration without some idea of the requirements.
> >
> 
> Sorry my mistake, I should have elaborated a little bit more. Please
> find some requirements below inline.
> 
> > How much data?
> 
> To start with, we will not have much, but down the road it will be
> a lot of data, plus a lot of user-created content. Initially there
> will be a lot of log-like entries, plus transactions...
> 
> > How will the data be queried?
> 
> We are designing the system to look up data by key only. All
> searching will go through Solr, and the matching data will then be
> looked up in HBase. In addition, we will have both application-layer
> caching in Ehcache and a web accelerator (Varnish).
> 
> > What kind of load do you expect?
> 
> Hard to estimate, but we are planning for a moderate installation, so
> that if we get a good response from the market we can expand. That's
> one of the 2 primary reasons for choosing HBase: we will be able to
> scale it on the fly.
> 
> > You are going to be doing offline batch/MapReduce, online random
> > access, as well as search all from the same nodes?  This can be
> > dangerous.
> 
> Yes, the offline batch and the HBase lookup will be on the same machines,
> but not search as a whole. Solr will use HDFS only to store the index
> and read it back; no search processing will be done there. It will
> be on a separate box altogether. But your following statement is
> tempting me to use RAID+DRBD for Solr-based searching.
> 
> Another thing to note is that the offline batch work would build summary
> tables. One example from our system would be generating daily balance
> sheets of ledgers, profit and loss statements, etc. for 100+ journals in
> a PoS SaaS.
> 
> > I would strongly recommend against putting Hadoop+HBase on the same
> > nodes as something like Solr, unless you have dedicated disks for each.
> > Also, don't forget about ZooKeeper which you definitely will need
> > separate nodes/disks for if you will be co-locating so many other
> > things.
> 
> Hmm. From this discussion and Patrick's point on ZK, I would go
> for:
> 
>  - 4 separate DNs for Solr only (each DN with its own dedicated disk,
> though maybe not its own physical server). As a side note, initially
> we will have 2 Solr search boxes.
>  - ZK needs a separate disk for performance, so it would have dedicated
> disks too.
> 
> But what I am confused about is how to spread out ZK, multi-master, RS,
> DN, and TT for HBase. Insight, comments, and suggestions on this would be
> most welcome.
> 
> Another note on our perspective: we want to scale horizontally
> by adding more machines, not vertically (if we wanted that or could
> afford it, we would probably have chosen an RDBMS). Being able to scale
> horizontally as our user base, load, and revenue increase is
> essential to us.
> 
> Waiting eagerly for some insight, comments and/or suggestions.
> 
> Thank you.
> 
> Imran
> 
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
> >> Sent: Monday, April 05, 2010 9:52 AM
> >> To: hbase-user@hadoop.apache.org
> >> Subject: About test/production server configuration
> >>
> >> Hi,
> >>
> >> We are a startup that has decided to use HBase purely because we
> >> want to take advantage of HDFS-based reliability, redundancy,
> >> MapReduce, and BigTable. For that we are thinking of a test
> >> environment with 5 servers and a production environment with 10
> >> servers; in both cases the Hadoop cluster will be used for
> >> HBase + MapReduce + Solr index.
> >>
> >> Firstly, I would like some opinions on whether 10 servers is a good
> >> number for all 3 purposes. Secondly, what kind of test
> >> environment is currently in use in different organizations? Thirdly,
> >> I would like to learn about some server configurations and purchase
> >> prices (with purchase location if possible).
> >>
> >> Waiting eagerly for some feedback.
> >>
> >> Thank you,
> >>
> >> --
> >> Imran M Yousuf
> >> Entrepreneur & Software Engineer
> >> Smart IT Engineering
> >> Dhaka, Bangladesh
> >> Email: imran@smartitengineering.com
> >> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> >> Mobile: +880-1711402557
> >
> 
> 
> 
> --
> Imran M Yousuf
> Entrepreneur & Software Engineer
> Smart IT Engineering
> Dhaka, Bangladesh
> Email: imran@smartitengineering.com
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
