lucene-solr-user mailing list archives

From Brett Hoerner <br...@bretthoerner.com>
Subject Re: Confusion when using go-live and MapReduceIndexerTool
Date Tue, 22 Apr 2014 14:36:56 GMT
I think I'm just misunderstanding the use of go-live. From the mergeindexes
docs: "The indexes must exist on the disk of the Solr host, which may make
using this in a distributed environment cumbersome."

I'm guessing I'll have to write some sort of tool that pulls each completed
index out of HDFS and onto the respective SolrCloud machines and manually
do some kind of merge? I don't want to (can't) be running my Hadoop jobs on
the same nodes that SolrCloud is running on...
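
A rough sketch of what such a tool might do for a single shard, run on the Solr host that owns the target core. It is only an illustration under assumptions: the helper names `merge_url` and `pull_and_merge`, the local directory, and the core name are all hypothetical, and it relies only on `hadoop fs -get` and the CoreAdmin `mergeindexes` action with an `indexDir` parameter. It does nothing to stop writes during the merge, so that question is still open.

```shell
#!/usr/bin/env bash
# Sketch only: copy one MapReduceIndexerTool output shard out of HDFS and
# merge it into a core on this Solr host. Hosts, paths and core names used
# with these helpers are hypothetical placeholders.

# Build the CoreAdmin mergeindexes URL for a core and a local index directory.
# Assumes the directory path needs no URL-encoding.
merge_url() {
  local solr_base="$1" core="$2" index_dir="$3"
  printf '%s/admin/cores?action=mergeindexes&core=%s&indexDir=%s\n' \
    "$solr_base" "$core" "$index_dir"
}

# Copy <hdfs_shard>/data/index to <local_dir>, then request the merge.
# Assumes <local_dir> does not exist yet, so it becomes the index directory.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
pull_and_merge() {
  local hdfs_shard="$1" local_dir="$2" solr_base="$3" core="$4"
  local run=""
  if [ "${DRY_RUN:-1}" = "1" ]; then
    run="echo"
  fi
  # Only call Solr if the copy out of HDFS succeeded.
  $run hadoop fs -get "$hdfs_shard/data/index" "$local_dir" &&
    $run curl -sS "$(merge_url "$solr_base" "$core" "$local_dir")"
}
```

Something like `DRY_RUN=0 pull_and_merge hdfs://$MASTERIP:9000/output/results/part-00000 /mnt/incoming/part-00000 http://localhost:8983/solr <core-name>` would then be run once per shard on the matching node. A commit on the target core is probably still needed afterwards to make the merged documents visible.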

Also confusing to me: "no writes should be allowed on either core until the
merge is complete. If writes are allowed, corruption may occur on the
merged index." Is that saying that Solr will block writes, or is that
saying the end user has to ensure no writes are happening against the
collection during a merge? That seems... risky?


On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner <brett@bretthoerner.com> wrote:

> Anyone have any thoughts on this?
>
> In general, am I expected to be able to go-live from an unrelated cluster
> of Hadoop machines to a SolrCloud that isn't running off of HDFS?
>
> input: HDFS
> output: HDFS
> go-live cluster: SolrCloud cluster on different machines running on plain
> MMapDirectory
>
> I'm back to looking at the code but holy hell is debugging Hadoop hard. :)
>
>
> On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner <brett@bretthoerner.com> wrote:
>
>> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
>>
>>
>> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>> Odd - might be helpful if you can share the solrconfig.xml being used.
>>>
>>> --
>>> Mark Miller
>>> about.me/markrmiller
>>>
>>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (brett@bretthoerner.com)
>>> wrote:
>>>
>>> I'm doing HDFS input and output in my job, with the following:
>>>
>>> hadoop jar /mnt/faas-solr.jar \
>>> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
>>> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver \
>>> --morphline-file /mnt/morphline-ignore.conf \
>>> --zk-host $ZKHOST \
>>> --output-dir hdfs://$MASTERIP:9000/output/ \
>>> --collection $COLLECTION \
>>> --go-live \
>>> --verbose \
>>> hdfs://$MASTERIP:9000/input/
>>>
>>> Index creation works,
>>>
>>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-00000
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index
>>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fdt
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fdx
>>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fnm
>>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.si
>>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.doc
>>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.pos
>>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.tim
>>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.tip
>>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene45_0.dvd
>>> -rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene45_0.dvm
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/segments_1
>>> -rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/segments_2
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/tlog
>>> -rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/tlog/tlog.0000000000000000000
>>>
>>> But the go-live step fails; it's trying to use the HDFS path as the remote
>>> index path?
>>>
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://10.98.33.114:9000/output/results/part-00000 into http://discover8-test-1d.i.massrel.com:8983/solr
>>> 14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command
>>> java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-00000/data/index' does not exist
>>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>>> at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
>>> at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
>>> at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>> Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-00000/data/index' does not exist
>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>> at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
>>> at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
>>> at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of index shards into Solr cluster took 2.31269488E8 secs
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging failed
>>>
>>> I'm digging into the code now, but wanted to send this out as a sanity
>>> check.
>>>
>>> Thanks,
>>> Brett
>>>
>>
>>
>
