hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Data currently stored in Solr index. Should it be moved to HDFS?
Date Thu, 28 Jan 2010 20:36:58 GMT
Hm, yes.  See how few hits this shows:


  http://search-hadoop.com/?q=non-distributed&fc_project=Hadoop

You can set it up on one box, but that's really only useful for development.
 
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: "Ranganathan, Sharmila" <sranganathan@library.rochester.edu>
> To: common-user@hadoop.apache.org
> Sent: Wed, January 20, 2010 3:23:34 PM
> Subject: RE: Data currently stored in Solr index. Should it be moved to HDFS?
> 
> Thanks for your reply. Is Hadoop only for distributed applications? 
> 
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Wednesday, January 20, 2010 2:03 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Data currently stored in Solr index. Should it be moved to HDFS?
> 
> Hello,
> 
> Reading large result sets from Solr is not the way we typically advise
> people to use Solr. It's not designed for that (nor is Lucene, the
> search library at its core).  There is some work being done right now
> on making Solr better at retrieving large result sets, but my
> feeling is you'd be better off avoiding Solr and getting data to your MR
> jobs from files stored in HDFS.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> 
> 
> 
> ----- Original Message ----
> > From: "Ranganathan, Sharmila" 
> > To: common-user@hadoop.apache.org
> > Sent: Tue, January 19, 2010 5:15:36 PM
> > Subject: Data currently stored in Solr index. Should it be moved to HDFS?
> > 
> > Hi,
> > 
> > 
> > 
> > Our application stores GBs of data in a Lucene/Solr index. It reads from
> > the Solr index, does some processing on the data, and stores it back in
> > Solr as an index. It is stored in a Solr index so that faceted search is
> > possible.  The process of reading from Solr, processing the data, and
> > writing back to the index is very slow, so we are looking at some parallel
> > programming frameworks. Hadoop MapReduce seems to take its input in the
> > form of a file and to create its output as a file. Since we have data in a
> > Solr index, should we read the data from the index, convert it to a file,
> > send it as input to Hadoop, then read the output file and write the
> > results to the index? This read and write to the index will still be time
> > consuming if not run in parallel. Or should we get rid of the Solr index
> > and just store the data in HDFS?  Also, the index is stored in one folder,
> > which means one disk. We do not use multiple disks. Is the use of multiple
> > disks a must for Hadoop?
> > 
> > 
> > 
> > I am new to Hadoop and trying to figure out whether Hadoop is the
> > solution for our application.
> > 
> > 
> > 
> > Thanks
> > 
> > SR
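
The export-then-process flow discussed in this thread (dump records from the
index to a line-oriented file, run a map step and a reduce step over it, write
the results out) can be sketched roughly as below. This is only an
illustration, not code from the thread: the input format ("<doc_id>TAB<category>"
per line), the counting logic, and the names map_record, reduce_group and
run_local are all assumptions made for the example. The driver runs in a single
process and merely imitates the sort that Hadoop performs between map and
reduce.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map step: turn one exported line into (key, value) pairs.

    Assumed (hypothetical) input format: "<doc_id>\t<category>".
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        return  # skip malformed lines
    _doc_id, category = parts
    yield category, 1


def reduce_group(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def run_local(lines):
    """Single-process stand-in for map -> sort by key -> reduce."""
    pairs = [pair for line in lines for pair in map_record(line)]
    pairs.sort(key=itemgetter(0))  # imitate the sort between map and reduce
    return [
        reduce_group(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Hypothetical sample of records exported from the index.
    sample = ["doc1\tbooks\n", "doc2\tmaps\n", "doc3\tbooks\n"]
    for key, total in run_local(sample):
        print(f"{key}\t{total}")
```

On a cluster, the same two functions could be wrapped as Hadoop Streaming
mapper and reducer scripts that read lines from stdin and write tab-separated
lines to stdout. Loading the results back into Solr is a separate step that
this sketch does not cover.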

