lucene-solr-user mailing list archives

From Tom Chen <tomchen1...@gmail.com>
Subject Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system
Date Thu, 03 Jul 2014 16:18:32 GMT
Hi,

In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr
instances. Each request has an indexDir parameter holding an hdfs:// path to
the index generated on HDFS, as shown in the MRIT log:

2014-07-02 15:03:55,123 DEBUG org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2 HTTP/1.1
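
For anyone who wants to reproduce such a request by hand, here is a minimal
Python sketch of how the query string in that log line is put together. The
function name and the base URL are hypothetical; only the action, core and
indexDir parameters and the HDFS path are taken from the log above, and the
wt/version parameters added by the client are left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def merge_indexes_url(solr_base, core, index_dir):
    """Build a CoreAdmin MERGEINDEXES URL (hypothetical helper).

    urlencode percent-encodes ':' and '/' in index_dir, which is why the
    hdfs:// path shows up as hdfs%3A%2F%2F... in the log.
    """
    query = urlencode({
        "action": "MERGEINDEXES",
        "core": core,
        "indexDir": index_dir,
    })
    return f"{solr_base}/admin/cores?{query}"


# Base URL is an assumption for illustration; the path comes from the log.
url = merge_indexes_url(
    "http://localhost:8983/solr",
    "collection1",
    "hdfs://hdtest041.test.com:9000/outdir_webaccess_app/results/part-00000/data/index",
)
print(url)

# Decoding the query string gives back the original hdfs:// path.
decoded = parse_qs(urlsplit(url).query)["indexDir"][0]
print(decoded)
```

The point is only that the indexDir value reaches Solr as a full hdfs:// URI;
what Solr does with it depends on how that instance is configured.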

So it is up to the Solr instance to know how to read the index from HDFS
(rather than up to the MRIT to find a local disk to copy the index to).
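
The "does not exist" error in my original message (quoted below) is consistent
with this: the path in it looks as if the hdfs:// URI had been treated as a
plain relative path under the local Solr directory. The Python sketch below
only mimics that kind of path resolution. It is not Solr's code, and the
assumption that a non-HDFS directory factory resolves indexDir this way is
mine, inferred from the error message.

```python
import posixpath


def resolve_as_local(base_dir, index_dir):
    """Resolve index_dir as if it were a local path relative to base_dir.

    Illustration only: joining and normalizing collapses the '//' in
    'hdfs://' to a single '/', as seen in the error message.
    """
    return posixpath.normpath(posixpath.join(base_dir, index_dir))


resolved = resolve_as_local(
    "/opt/testdir/solr/node",
    "hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W"
    "/results/part-00001/data/index",
)
print(resolved)
# /opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index
```

That is the same directory named in the RemoteSolrException, which suggests
the merge request itself was fine and the receiving core simply could not
interpret an HDFS location.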

The go-live option is a very convenient way to merge the generated index into
the live index. It is preferable to copying the indexes to the local file
system and then merging them.

I tried starting the Solr instance with these properties, so that it writes
its own index to the local file system while still being able to read the
index on HDFS when doing MERGEINDEXES:

  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
  -Dsolr.lock.type=hdfs \
  -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \

i.e. the full command:
java -DnumShards=2 \
  -Dbootstrap_confdir=./solr/collection1/conf \
  -Dcollection.configName=myconf \
  -DzkHost=<zookeeper>:2181 \
  -Dhost=<node1> \
  -DSTOP.PORT=7983 -DSTOP.KEY=key \
  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
  -Dsolr.lock.type=hdfs \
  -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
  -jar start.jar


With that, go-live works fine.

Any comments on this approach?



Tom

On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> How would the MapReduceIndexerTool (MRIT for short)
> find the local disk to write from HDFS to for each shard?
> All it has is the information in the Solr configs, which are
> usually relative paths on the local Solr machines, relative
> to SOLR_HOME. Which could be different on each node
> (that would be screwy, but possible).
>
> Permissions would also be a royal pain to get right....
>
> You _can_ forego the --go-live option and copy from
> the HDFS nodes to your local drive and then execute
> the "mergeIndexes" command, see:
> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
> Note that there is the MergeIndexTool, but there's also
> the Core Admin command.
>
> The sub-indexes are in a partition in HDFS and numbered
> sequentially.
>
> Best,
> Erick
>
> On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1000@gmail.com> wrote:
> > Hi,
> >
> >
> > When we run the Solr Map Reduce Indexer Tool (
> > https://github.com/markrmiller/solr-map-reduce-example), it generates
> > indexes on HDFS.
> >
> > The last stage is Go Live, which merges the generated index into the live
> > SolrCloud index.
> >
> > If the live SolrCloud writes its index to the local file system (rather
> > than HDFS), Go Live fails with an error like this:
> >
> > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000 into http://bdvs087.test.com:8983/solr
> > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending live merge command
> > java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index' does not exist
> > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:94)
> > at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> > at java.lang.reflect.Method.invoke(Method.java:611)
> > at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(AccessController.java:310)
> > at javax.security.auth.Subject.doAs(Subject.java:573)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index' does not exist
> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> > at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> > at java.lang.Thread.run(Thread.java:738)
> >
> > Is there any way to set up SolrCloud to write its index to the local file
> > system while still allowing the MapReduceIndexerTool's GoLive to merge an
> > index generated on HDFS into SolrCloud?
> >
> > Thanks,
> > Tom
>
