hadoop-mapreduce-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject Creating Solr index from map/reduce
Date Wed, 29 Dec 2010 09:26:24 GMT
Hi,

I'm trying to generate a Solr index from Hadoop (map/reduce), so I'm using
the patch SOLR-1301 <https://issues.apache.org/jira/browse/SOLR-1301>, but I
can't get it to work.

CSVIndexer takes these arguments: <Solr index directory>
-solr <Solr home> <input, in this case CSV>

I'm running CSVIndexer like this:

<HADOOP_INSTALL>/bin/hadoop jar my.jar CSVIndexer <INDEX_FOLDER> -solr
/<SOLR_HOME> <CSV FILE PATH>

Before running CSVIndexer, I put the CSV file into HDFS.

My Solr home doesn't use the default configuration layout; instead it is
divided into multiple folders:

/conf
/schema

I have custom Solr configuration files, so CSVIndexer can't find schema.xml.
Obviously it can't, because that file doesn't exist: in my case the file is
named "schema-xx.xml", and CSVIndexer looks for it inside the "conf" folder
and doesn't know that the schema folder exists. I also have a Solr
configuration file (solr.xml) where I configure multiple cores.
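
To make the mismatch concrete, here is a minimal sketch (not part of the
SOLR-1301 patch) that stages a copy of a Solr home with the custom schema
placed at conf/schema.xml, which is the path named in the
FileNotFoundException further down. The function name stage_solr_home and
the assumption that the custom schema sits at <Solr home>/schema/schema-xx.xml
are hypothetical. Whether the patch needs other files in the default layout
is not something this sketch checks.

```python
import shutil
import tempfile
from pathlib import Path


def stage_solr_home(solr_home: Path, schema_name: str, staged: Path) -> Path:
    """Copy solr_home to staged and put the custom schema at conf/schema.xml.

    Assumption (hypothetical layout): the custom schema lives at
    <solr_home>/schema/<schema_name>. The original Solr home is not modified.
    """
    src_schema = solr_home / "schema" / schema_name
    if not src_schema.is_file():
        raise FileNotFoundError(f"custom schema not found: {src_schema}")
    # copytree requires that the destination does not exist yet.
    shutil.copytree(solr_home, staged)
    target = staged / "conf" / "schema.xml"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src_schema, target)
    return target


if __name__ == "__main__":
    # Demo on a throwaway directory tree that mimics the layout above.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp) / "solr_home"
        (home / "conf").mkdir(parents=True)
        (home / "schema").mkdir()
        (home / "schema" / "schema-xx.xml").write_text("<schema/>")
        target = stage_solr_home(home, "schema-xx.xml", Path(tmp) / "staged")
        print("staged schema at", target)
```

The staged directory, rather than the original Solr home, would then be the
one passed after -solr.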

I tried modifying Solr's paths, but it still doesn't work.

As I understand it, CSVIndexer copies the specified Solr home into HDFS
(/tmp/hadoop-user/mapred/local/taskTracker/archive/...), and when it tries to
find "schema.xml", the file doesn't exist:

10/12/29 10:18:11 INFO mapred.JobClient: Task Id : attempt_201012291016_0002_r_000000_1, Status : FAILED
java.lang.IllegalStateException: Failed to initialize record writer for my.jar, attempt_201012291016_0002_r_000000_1
        at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:253)
        at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:152)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.FileNotFoundException: Source '/tmp/hadoop-guest/mapred/local/taskTracker/archive/localhost/tmp/e8be5bb1-e910-47a1-b5a7-1352dfec2b1f.solr.zip/conf/schema.xml' does not exist
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:636)
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:606)
        at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:222)
        ... 4 more
