hadoop-common-user mailing list archives

From "Ranganathan, Sharmila" <sranganat...@library.rochester.edu>
Subject Data currently stored in Solr index. Should it be moved to HDFS?
Date Tue, 19 Jan 2010 22:15:36 GMT


Our application stores GBs of data in a Lucene/Solr index. It reads from
the Solr index, does some processing on the data, and stores the results
back in Solr as an index; the data is kept in Solr so that faceted
search is possible. This read-process-write cycle is very slow, so we
are looking at parallel programming frameworks. Hadoop MapReduce seems
to take its input from files and produce its output as files. Since our
data lives in a Solr index, should we read the data from the index,
convert it to a file, feed that file to Hadoop as input, and then write
Hadoop's output file back into the index? That export and re-import will
still be time consuming unless it, too, runs in parallel. Or should we
get rid of the Solr index and store the data directly in HDFS? Also,
the index is stored in one folder, which means one disk; we do not use
multiple disks. Is the use of multiple disks a must for Hadoop?
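One common way to bridge an index and MapReduce is Hadoop Streaming: export each Solr document as one line of JSON (e.g. by paging through /select?q=*:*&wt=json with start/rows), run a mapper/reducer over those lines, and re-index the reducer output. A minimal mapper sketch follows; the "category" field and the one-JSON-doc-per-line export format are assumptions for illustration, not something from this post:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch.
# Assumes each exported line is one Solr document as a JSON object,
# with a hypothetical multivalued "category" field whose values we
# want to count in parallel. All field names here are illustrative.
import json
import sys

def map_line(line):
    """Parse one exported Solr doc; emit one 'category\t1' pair per value."""
    doc = json.loads(line)
    for cat in doc.get("category", []):
        yield "%s\t1" % cat

if __name__ == "__main__":
    # Hadoop Streaming feeds input splits to the mapper on stdin
    # and collects tab-separated key/value pairs from stdout.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        for pair in map_line(line):
            print(pair)
```

Run under Streaming this needs no Hadoop-specific code at all, which keeps the export file (not the Solr index) as the only contract between Solr and the cluster; a matching reducer would sum the counts per key.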


I am new to Hadoop and trying to figure out whether Hadoop is the
solution for our application.




