lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system
Date Sat, 05 Jul 2014 23:25:21 GMT
OK, I asked some folks who know, and the response is that "that should
work, but it's not supported/tested". IOW, you're in somewhat
uncharted territory. The people who wrote the code don't have this
use-case on their priority list and probably won't be expending energy
in this direction any time soon.

So feel free! It'd be great if you reported any problems you run across
or supplied patches for them; this has been a recurring theme with
HdfsDirectoryFactory and Solr replicas: "Why should three replicas
have 9 copies of the index lying around?"
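The "9 copies" figure is the product of the two replication layers. Assuming HDFS's default block replication factor of 3 (the `dfs.replication` setting) and a collection with three Solr replicas, each of which keeps its own copy of the index:

```latex
\[
N_{\text{copies}} = r_{\text{Solr}} \times r_{\text{HDFS}} = 3 \times 3 = 9
\]
```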

Do note, however, that disk space is cheap, and considerable work has
been done to minimize any performance issues with HDFS.

Best,
Erick

On Thu, Jul 3, 2014 at 9:18 AM, Tom Chen <tomchen1000@gmail.com> wrote:
> Hi,
>
> In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr
> instances. Each request has an indexDir parameter holding an HDFS path to
> the index generated on HDFS, as shown in the MRIT log:
>
> 2014-07-02 15:03:55,123 DEBUG
> org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET
> /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2
> HTTP/1.1
>
> So it's up to the Solr instance to read the index from HDFS
> (rather than up to the MRIT to find a local disk to copy the HDFS index to).
>
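The MERGEINDEXES request line in the log above can be reproduced with standard Java, which makes it clear that the HDFS location travels as a plain, URL-encoded `indexDir` string and that the receiving Solr node alone decides how to open it. This is a minimal sketch: the class name `MergeIndexesUrl` is hypothetical, and the parameter order simply mirrors the logged request rather than SolrJ's internals.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper that rebuilds the Core Admin MERGEINDEXES request line
// seen in the MRIT debug log. Parameter order mirrors that log line.
final class MergeIndexesUrl {

    // Builds "/solr/admin/cores?action=MERGEINDEXES&core=...&indexDir=...".
    // wt=javabin&version=2 are the response-format parameters present in the
    // logged request.
    static String build(String core, String indexDir) {
        return "/solr/admin/cores?action=MERGEINDEXES"
                + "&core=" + encode(core)
                + "&indexDir=" + encode(indexDir)
                + "&wt=javabin&version=2";
    }

    // Form-encodes a query parameter value (':' becomes %3A, '/' becomes %2F).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses encode(): this is the string the receiving Solr node must open.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String hdfsDir = "hdfs://hdtest041.test.com:9000/outdir_webaccess_app/results/part-00000/data/index";
        System.out.println(build("collection1", hdfsDir));
    }
}
```

Nothing in the request says "HDFS" except the scheme inside the decoded string, so whether the merge succeeds depends entirely on how the target node's directory factory interprets that path.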
> The go-live option is a very convenient way to merge the generated index
> into the live index. It's preferable to use go-live rather than copy
> indexes around to the local file system and then merge.
>
> I started the Solr instance with these properties so that it writes its
> index to the local file system while still being able to read an index on
> HDFS during MERGEINDEXES:
>
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
>
> i.e. the full command:
> java -DnumShards=2 \
>   -Dbootstrap_confdir=./solr/collection1/conf \
>   -Dcollection.configName=myconf \
>   -DzkHost=<zookeeper>:2181 \
>   -Dhost=<node1> \
>   -DSTOP.PORT=7983 -DSTOP.KEY=key \
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
>   -jar start.jar
>
>
> With that, the go-live works fine.
>
> Any comment on this approach?
>
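A plausible reason this works: `HdfsDirectoryFactory` reads and writes through Hadoop's `FileSystem` abstraction, which selects an implementation from the URI scheme, so a `file:///` home lands on local disk while an `hdfs://` merge source is read from HDFS. The sketch below is a simplified, hypothetical model of that scheme-based selection (the `SchemeDispatch` class and `Backend` enum are invented for illustration); it does not call Hadoop and is not Hadoop's code.

```java
import java.net.URI;

// Simplified, hypothetical model of scheme-based file-system selection.
// It only mirrors the idea that the URI scheme, not the directory factory,
// decides where the bytes live. It is NOT Hadoop's FileSystem code.
final class SchemeDispatch {

    enum Backend { LOCAL_DISK, HDFS }

    // Picks a backend from the URI scheme; scheme-less paths fall back to the
    // supplied default (Hadoop qualifies such paths against fs.defaultFS).
    static Backend backendFor(String location, Backend defaultBackend) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return defaultBackend;
        }
        switch (scheme) {
            case "file":
                return Backend.LOCAL_DISK;
            case "hdfs":
                return Backend.HDFS;
            default:
                throw new IllegalArgumentException("unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        // Index home, as passed via -Dsolr.hdfs.home in the command above.
        System.out.println(backendFor("file:///opt/test/solr/node/solr", Backend.HDFS));
        // Merge source, as sent by GoLive in the MERGEINDEXES request.
        System.out.println(backendFor(
                "hdfs://hdtest041.test.com:9000/outdir_webaccess_app/results/part-00000/data/index",
                Backend.HDFS));
    }
}
```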
>
>
> Tom
>
> On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> How would the MapReduceIndexerTool (MRIT for short)
>> find the local disk to copy the index from HDFS to for each shard?
>> All it has is the information in the Solr configs, which are
>> usually relative paths on the local Solr machines, relative
>> to SOLR_HOME, and SOLR_HOME could be different on each node
>> (that would be screwy, but possible).
>>
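To illustrate the SOLR_HOME point: one relative data directory from a shared config resolves to different absolute paths on nodes started from different homes, so the MRIT has no single local target it could compute. A minimal sketch, assuming a core's instance directory is `SOLR_HOME/<core name>`; the class name and both home paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration: one relative dataDir from a shared config maps to
// different absolute directories on nodes with different SOLR_HOME values.
// Assumes each core's instance directory is SOLR_HOME/<coreName>.
final class SolrHomeResolve {

    // Resolves a core's relative data directory against one node's SOLR_HOME.
    static Path dataDirOn(String solrHome, String coreName, String relativeDataDir) {
        return Paths.get(solrHome).resolve(coreName).resolve(relativeDataDir).normalize();
    }

    public static void main(String[] args) {
        // Same config value ("data") on two nodes with hypothetical homes.
        System.out.println(dataDirOn("/opt/solr/home", "collection1", "data"));
        System.out.println(dataDirOn("/data/solr", "collection1", "data"));
    }
}
```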
>> Permissions would also be a royal pain to get right....
>>
>> You _can_ forgo the --go-live option, copy the indexes from
>> the HDFS nodes to your local drive, and then execute
>> the "mergeIndexes" command, see:
>> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
>> Note that there is the Lucene IndexMergeTool, but there's also
>> the Core Admin MERGEINDEXES command.
>>
>> The sub-indexes are in a partition in HDFS and are numbered
>> sequentially (part-00000, part-00001, and so on).
>>
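A sketch of the manual route, assuming each `part-NNNNN` directory has already been copied to local disk on the node that hosts the matching shard (for example with `hadoop fs -copyToLocal`) and that part *i* belongs to the *i*-th shard. It only builds the Core Admin request URLs; the class name, the local copy root `/tmp/mrit-copy`, and the node URLs are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical plan for the manual (non-go-live) route. Assumes part i has been
// copied to <localCopyRoot>/part-0000i on the node hosting the i-th shard.
final class ManualMergePlan {

    // MapReduce output directories are named part-00000, part-00001, ...
    static String partName(int index) {
        return String.format(Locale.ROOT, "part-%05d", index);
    }

    // Builds one Core Admin MERGEINDEXES URL per shard; shardBaseUrls.get(i)
    // is the Solr base URL of the node hosting shard i.
    static List<String> mergeRequests(String localCopyRoot, List<String> shardBaseUrls, String core) {
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < shardBaseUrls.size(); i++) {
            // indexDir is resolved on the Solr node, so it must be a path on that node's disk.
            String indexDir = localCopyRoot + "/" + partName(i) + "/data/index";
            requests.add(shardBaseUrls.get(i) + "/admin/cores?action=MERGEINDEXES"
                    + "&core=" + encode(core)
                    + "&indexDir=" + encode(indexDir));
        }
        return requests;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("http://node1:8983/solr", "http://node2:8983/solr");
        for (String url : mergeRequests("/tmp/mrit-copy", nodes, "collection1")) {
            System.out.println(url);
        }
    }
}
```

The per-shard mapping matters: merging every part into one core would mix documents that belong to different shards of the collection.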
>> Best,
>> Erick
>>
>> On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1000@gmail.com> wrote:
>> > Hi,
>> >
>> >
>> > When we run the Solr Map Reduce Indexer Tool
>> > (https://github.com/markrmiller/solr-map-reduce-example), it generates
>> > indexes on HDFS.
>> >
>> > The last stage is Go Live, which merges the generated index into the
>> > live SolrCloud index.
>> >
>> > If the live SolrCloud writes its index to the local file system (rather
>> > than HDFS), Go Live fails with an error like this:
>> >
>> > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
>> > hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
>> > into http://bdvs087.test.com:8983/solr
>> > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending live merge command
>> > java.util.concurrent.ExecutionException:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
>> > does not exist
>> > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
>> > at java.util.concurrent.FutureTask.get(FutureTask.java:94)
>> > at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
>> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
>> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> > at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>> > at java.lang.reflect.Method.invoke(Method.java:611)
>> > at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
>> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
>> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> > at java.security.AccessController.doPrivileged(AccessController.java:310)
>> > at javax.security.auth.Subject.doAs(Subject.java:573)
>> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> > Caused by:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
>> > does not exist
>> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
>> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>> > at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
>> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
>> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>> > at java.lang.Thread.run(Thread.java:738)
>> >
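The odd path in the error is consistent with the `hdfs://` URI being treated as a local, relative path: `java.io.File` collapses the double slash after the scheme and, because the string does not start with `/`, resolves it against a base directory. The sketch below reproduces the reported path on a Unix-like system. It is an illustration, not Solr's actual code path; the class name is hypothetical, and `/opt/testdir/solr/node` is assumed to be the base the node resolved against (likely its working directory).

```java
import java.io.File;

// Illustration only (not Solr's code): what an hdfs:// URI looks like when it
// is handled as a plain local path. On Unix-like systems java.io.File
// normalizes "//" to "/" and treats a string without a leading '/' as relative.
final class HdfsUriAsLocalPath {

    // Resolves 'location' against 'baseDir' exactly as java.io.File would.
    static String resolveAsLocal(String baseDir, String location) {
        return new File(baseDir, location).getPath();
    }

    public static void main(String[] args) {
        String hdfsIndex = "hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index";
        // Prints the same path that appears in the "does not exist" error above.
        System.out.println(resolveAsLocal("/opt/testdir/solr/node", hdfsIndex));
    }
}
```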
>> > Is there any way to set up SolrCloud to write its index to the local file
>> > system while still allowing the MapReduceIndexerTool's GoLive to merge an
>> > index generated on HDFS into SolrCloud?
>> >
>> > Thanks,
>> > Tom
>>
