hadoop-common-user mailing list archives

From "Ranganathan, Sharmila" <sranganat...@library.rochester.edu>
Subject RE: Data currently stored in Solr index. Should it be moved to HDFS?
Date Wed, 20 Jan 2010 20:23:34 GMT
Thanks for your reply. Is Hadoop only for distributed applications? 


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Wednesday, January 20, 2010 2:03 PM
To: common-user@hadoop.apache.org
Subject: Re: Data currently stored in Solr index. Should it be moved to HDFS?

Hello,

Reading large result sets from Solr is not the way we typically advise
people to use Solr. It's not designed for that (nor is Lucene, the
search library at its core).  There is some work being done right now
on making Solr better at retrieving large result sets, but my
feeling is you'd be better off avoiding Solr and getting data to your MR
jobs from files stored in HDFS.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
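
To illustrate the suggestion above (feeding MR jobs from files rather than from Solr), here is a minimal sketch in Python. It assumes the records have already been exported as one JSON document per line, which is a layout that line-oriented MapReduce input handles naturally. The field names ("id", "title", "title_length") and the helper names to_json_lines, map_line and run_mapper are hypothetical and stand in for whatever the real application stores and computes.

```python
import json
from typing import Iterable, Iterator


def to_json_lines(docs: Iterable[dict]) -> Iterator[str]:
    """Serialize each document as one JSON object per line."""
    for doc in docs:
        yield json.dumps(doc, sort_keys=True)


def map_line(line: str) -> str:
    """Hypothetical per-record step: parse one line, transform it, emit one line."""
    doc = json.loads(line)
    # Placeholder for the real processing; the field name is an assumption.
    doc["title_length"] = len(doc.get("title", ""))
    return json.dumps(doc, sort_keys=True)


def run_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Apply map_line to every non-empty input line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield map_line(line)


if __name__ == "__main__":
    sample = [{"id": "1", "title": "faceted search"}]
    for out in run_mapper(to_json_lines(sample)):
        print(out)
```

Because each line is processed independently, the same per-line function could be run over many file splits in parallel, for example as a Hadoop Streaming mapper that reads lines from standard input and writes results to standard output. The processed output could then be indexed back into Solr as a separate step if faceted search is still needed.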



----- Original Message ----
> From: "Ranganathan, Sharmila" <sranganathan@library.rochester.edu>
> To: common-user@hadoop.apache.org
> Sent: Tue, January 19, 2010 5:15:36 PM
> Subject: Data currently stored in Solr index. Should it be moved to HDFS?
> 
> Hi,
> 
> 
> 
> Our application stores GBs of data in a Lucene/Solr index. It reads from
> the Solr index, does some processing on the data, and stores the results
> back in Solr. The data is kept in a Solr index so that faceted search is
> possible. The process of reading from Solr, processing the data, and
> writing back to the index is very slow, so we are looking at parallel
> programming frameworks. Hadoop MapReduce seems to take its input in the
> form of a file and to create its output as a file. Since we have data in
> a Solr index, should we read the data from the index, convert it to a
> file, send it as input to Hadoop, then read Hadoop's output file and
> write the results to the index? This reading from and writing to the
> index will still be time consuming if not run in parallel. Or should we
> get rid of the Solr index and just store the data in HDFS? Also, the
> index is stored in one folder, which means one disk. We do not use
> multiple disks. Is the use of multiple disks a must for Hadoop?
> 
> 
> 
> I am new to Hadoop and trying to figure out whether Hadoop is the
> solution for our application.
> 
> 
> 
> Thanks
> 
> SR

