lucene-solr-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: Architecture Question
Date Mon, 19 Nov 2012 18:38:53 GMT
If you just want to store the data, you can dump it into HDFS sequence files.  While HBase
is really nice if you want to process and serve data real-time, it adds overhead to use it
as pure storage.
Dave

-----Original Message-----
From: Cool Techi [mailto:cooltechie@outlook.com] 
Sent: Friday, November 16, 2012 8:26 PM
To: solr-user@lucene.apache.org
Subject: RE: Architecture Question

Hi Otis,

Thanks for your reply. I just wanted to check which NoSQL store would be best suited to
storing the data with the least amount of memory, since Solr is sufficient for most of my
work and I only want to keep the data as a backup and in case we need to reindex.

Regards,
Ayush

> Date: Fri, 16 Nov 2012 15:47:40 -0500
> Subject: Re: Architecture Question
> From: otis.gospodnetic@gmail.com
> To: solr-user@lucene.apache.org
> 
> Hello,
> 
> 
> 
> > I am not sure if this is the right forum for this question, but it would
> > be great if I could be pointed in the right direction. We have been using a
> > combination of MySQL and Solr for all our company's full-text and query
> > needs.  But as our customer base has grown, so has the amount of data, and
> > MySQL is just not proving to be the right option for storing/querying.
> >
> > I have been looking at SolrCloud and it looks really impressive, but I am
> > not sure if we should give up our storage system. So, I have been
> > exploring DataStax, but a commercial option is out of the question. So we
> > were thinking of using HBase to store the data and at the same time index
> > the data into SolrCloud, but for many reasons this design doesn't seem
> > convincing (we have also looked at the basics of Lily).
> >
> > 1) Would it be recommended to just use SolrCloud with multiple replicas,
> > or does HBase + Solr seem like a good option?
> >
> 
> If you trust SolrCloud with replication and keep all your fields stored
> then you could live without an external DB.  At this point I personally
> would still want an external DB.  Whether HBase is the right DB for the job
> I can't tell because I don't know anything about your data, volume, access
> patterns, etc.  I can tell you that HBase does scale well - we have tables
> with many billions of rows stored in it for instance.
> 
> 
> > 2) How much strain would it be to keep both a Solr shard and an HBase node
> > on the same machine?
> >
> 
> HBase loves memory.  So does Solr.  They both dislike disk IO (who
> doesn't!).  Solr can use a lot of CPU for indexing/searching, depending on
> the volume.  HBase RegionServers can use a lot of CPU if you run MapReduce
> on data in HBase.
> 
> 
> > 3) Is there a calculation for what kind of machine configuration I would
> > need to store 500-1000 million records, and how many shards? Most of these
> > will be social data (Twitter/Facebook/blogs etc.).
> >
> 
> No recipe here, unfortunately.  You'd have to experiment and test, do load
> and performance testing, etc.  If you need help with Solr + HBase, we
> happen to have a lot of experience with both and have even used them
> together for some of our clients.
> 
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
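
On question 3 above (sizing for 500-1000 million records): the thread gives no
recipe, and real numbers only come from load testing. A back-of-envelope estimate
can still bound the experiment. The sketch below is only that. The function names
are hypothetical, and the average record size (2000 bytes), the replication factor
(3, the HDFS default) and the shard count (8) are assumptions for illustration,
not figures from this thread.

```python
def estimate_storage_gb(num_records, avg_record_bytes, replication_factor):
    """Rough raw-storage estimate in decimal GB.

    Ignores compression, index size and per-row overhead, all of which
    must be measured on real data.
    """
    total_bytes = num_records * avg_record_bytes * replication_factor
    return total_bytes / 1e9


def records_per_shard(num_records, num_shards):
    """Records each shard holds if documents are spread evenly (rounded up)."""
    return -(-num_records // num_shards)  # integer ceiling division


if __name__ == "__main__":
    # Assumed inputs: 2000-byte average record, 3 copies, 8 shards.
    for n in (500_000_000, 1_000_000_000):
        gb = estimate_storage_gb(n, avg_record_bytes=2000, replication_factor=3)
        per_shard = records_per_shard(n, num_shards=8)
        print(f"{n} records: ~{gb:.0f} GB stored, {per_shard} records per shard")
```

Replacing the assumed record size with a value measured on a sample of the real
data, and then indexing that sample into a single shard, gives a much better
starting point than any generic formula.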
 		 	   		  
