lucene-solr-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Architecture Question
Date Fri, 16 Nov 2012 20:47:40 GMT

> I am not sure if this is the right forum for this question, but it would
> be great if I could be pointed in the right direction. We have been using a
> combination of MySQL and Solr for all our company's full-text and query
> needs. But as our customer base has grown, so has the amount of data, and MySQL
> is just not proving to be the right option for storing/querying.
> I have been looking at SolrCloud and it looks really impressive, but I am
> not sure if we should give up our storage system. So I have been
> exploring DataStax, but a commercial option is out of the question. So we were
> thinking of using HBase to store the data and at the same time index the
> data into SolrCloud, but for many reasons this design doesn't seem
> convincing (I have also looked at the basics of Lily).
> 1) Would it be recommended to just use SolrCloud with multiple
> replicas, or does HBase + Solr seem like a good option?

If you trust SolrCloud with replication and keep all your fields stored
then you could live without an external DB.  At this point I personally
would still want an external DB.  Whether HBase is the right DB for the job
I can't tell because I don't know anything about your data, volume, access
patterns, etc.  I can tell you that HBase does scale well - we have tables
with many billions of rows stored in it for instance.
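To make the "external DB plus index" option concrete, here is a minimal sketch of the dual-write pattern, assuming the DB is the system of record and the index is derived from it. `RecordStore`, `SearchIndex`, `DualWriter` and the in-memory classes are hypothetical stand-ins so the sketch runs without a cluster; they are not HBase or SolrJ APIs, and a real setup would also need update/delete handling and retries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the system of record (an HBase table in the proposed design).
interface RecordStore {
    void put(String rowKey, Map<String, String> fields);
    Optional<Map<String, String>> get(String rowKey);
}

// Hypothetical stand-in for the search side (a SolrCloud collection in the proposed design).
interface SearchIndex {
    void index(String id, Map<String, String> fields);
    List<String> search(String field, String term);
}

// In-memory store used only so the sketch runs without a cluster.
class InMemoryStore implements RecordStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    public void put(String rowKey, Map<String, String> fields) {
        rows.put(rowKey, new HashMap<>(fields));
    }

    public Optional<Map<String, String>> get(String rowKey) {
        return Optional.ofNullable(rows.get(rowKey));
    }
}

// Toy inverted index: "field:token" -> ids, lowercased whitespace tokens.
class InMemoryIndex implements SearchIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    public void index(String id, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new LinkedHashSet<>()).add(id);
            }
        }
    }

    public List<String> search(String field, String term) {
        return new ArrayList<>(postings.getOrDefault(field + ":" + term.toLowerCase(), Set.of()));
    }
}

// Writes go to the store first, then to the index; reads search the index
// for ids and fetch the full records from the store.
class DualWriter {
    private final RecordStore store;
    private final SearchIndex index;

    DualWriter(RecordStore store, SearchIndex index) {
        this.store = store;
        this.index = index;
    }

    void write(String id, Map<String, String> fields) {
        store.put(id, fields);   // 1. the store is the source of truth
        index.index(id, fields); // 2. the index can be rebuilt from the store if this step fails
    }

    List<Map<String, String>> searchAndFetch(String field, String term) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String id : index.search(field, term)) {
            store.get(id).ifPresent(results::add);
        }
        return results;
    }
}
```

The point of writing to the store first is that the index becomes disposable: if indexing fails or the schema changes, you can replay the store into a fresh collection. Without the external DB, the index itself has to keep every field stored and becomes the only copy of the data.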

> 2) How much strain would there be in keeping both a Solr shard and an HBase node on the
> same machine?

HBase loves memory.  So does Solr.  They both dislike disk IO (who
doesn't!).  Solr can use a lot of CPU for indexing/searching, depending on
the volume.  HBase RegionServers can use a lot of CPU if you run MapReduce
on data in HBase.
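One way to see the memory contention on a co-located node is a rough budget: both JVM heaps come out of the same RAM, and whatever is left is the OS page cache that Lucene index files (and HDFS blocks read by HBase) rely on. The sketch below assumes hypothetical numbers and a hypothetical `NodeMemoryBudget` class; it is a back-of-envelope aid, not a sizing rule.

```java
// All figures in GB and purely hypothetical; real numbers come from load testing.
class NodeMemoryBudget {
    final double totalRamGb;
    final double hbaseHeapGb;  // RegionServer JVM heap
    final double solrHeapGb;   // Solr JVM heap
    final double osReserveGb;  // OS plus other daemons (e.g. an HDFS DataNode), assumed

    NodeMemoryBudget(double totalRamGb, double hbaseHeapGb, double solrHeapGb, double osReserveGb) {
        this.totalRamGb = totalRamGb;
        this.hbaseHeapGb = hbaseHeapGb;
        this.solrHeapGb = solrHeapGb;
        this.osReserveGb = osReserveGb;
    }

    // RAM left for the OS page cache after both heaps and the reserve.
    double pageCacheGb() {
        return totalRamGb - hbaseHeapGb - solrHeapGb - osReserveGb;
    }

    // Fraction of the local Solr index that could sit in page cache, clamped to [0, 1].
    // Optimistic: HBase/HDFS reads compete for the same cache.
    double indexCachedFraction(double localIndexGb) {
        if (localIndexGb <= 0) {
            return 1.0;
        }
        return Math.max(0.0, Math.min(1.0, pageCacheGb() / localIndexGb));
    }
}
```

For example, a 64 GB node with a 16 GB HBase heap, an 8 GB Solr heap and 4 GB reserved leaves 36 GB of page cache, which covers at most half of a 72 GB local index before HBase takes its share.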

> 3) Is there a calculation for what kind of machine configuration I would
> need to store 500-1000 million records? Most of these will be social data
> (Twitter/Facebook/blogs etc.), and how many shards would I need?

No recipe here, unfortunately.  You'd have to experiment and test, do load
and performance testing, etc.  If you need help with Solr + HBase, we
happen to have a lot of experience with both and have even used them
together for some of our clients.
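Once load and performance tests have told you how many documents one shard can hold while still meeting your latency targets, turning that into a shard and node count is simple arithmetic. The sketch below assumes that measured per-shard capacity, a growth factor, and a replicas-per-node limit as inputs; `ShardEstimate` and every number in it are hypothetical, and it does not replace the testing itself.

```java
// Back-of-envelope arithmetic over numbers measured in load testing (all hypothetical).
class ShardEstimate {
    // Shards needed so that no shard exceeds the tested per-shard document capacity.
    static long shards(long totalDocs, double growthFactor, long testedDocsPerShard) {
        if (testedDocsPerShard <= 0) {
            throw new IllegalArgumentException("testedDocsPerShard must be positive");
        }
        double projectedDocs = totalDocs * growthFactor;
        return (long) Math.ceil(projectedDocs / testedDocsPerShard);
    }

    // Nodes needed to host every replica of every shard, given a per-node replica limit.
    static long nodes(long shards, int replicationFactor, int replicasPerNode) {
        long totalReplicas = shards * replicationFactor;
        return (totalReplicas + replicasPerNode - 1) / replicasPerNode; // ceiling division
    }
}
```

With 1,000 million records, 50% headroom for growth and a tested capacity of 50 million documents per shard, this gives 30 shards; at a replication factor of 2 and 4 replicas per node, that is 15 nodes.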

