lucene-solr-user mailing list archives

From Tom Evans
Subject Re: SolrCloud, DIH, and XPathEntityProcessor
Date Tue, 12 Jan 2016 14:45:46 GMT
On Tue, Jan 12, 2016 at 2:32 PM, Shawn Heisey wrote:
> On 1/12/2016 6:05 AM, Tom Evans wrote:
>> Hi all, trying to move our Solr 4 setup to SolrCloud (5.4). Having
>> some problems with a DIH config that attempts to load an XML file and
>> iterate through the nodes in that file; it tries to load the file from
>> disk instead of from zookeeper.
>> <entity
>>     dataSource="lookup_conf"
>>     rootEntity="false"
>>     name="lookups"
>>     processor="XPathEntityProcessor"
>>     url="lookup_conf.xml"
>>     forEach="/lookups/lookup">
>> The file exists in zookeeper, adjacent to the data_import.conf in the
>> lookups_config conf folder.
> SolrCloud puts all the *config* for Solr into zookeeper, and adds a new
> abstraction for indexes (the collection), but other parts of Solr like
> DIH are not really affected.  The entity processors in DIH cannot
> retrieve data from zookeeper.  They do not know how.

That makes no sense whatsoever. DIH loads data_import.conf from ZK
just fine. Or is that file handed to DIH by another module that does
know about ZK?

Either way, it is entirely sub-optimal to have SolrCloud store "all"
its configuration in ZK, but still require manually storing and
updating files on specific nodes in order to influence DIH. If a
server is mistakenly not updated, or manually modified locally on
disk, that node would start indexing documents differently than other
replicas, which sounds dangerous and scary!

If there is not a ZkFileDataSource, it shouldn't be too tricky to add
one... I'll see how much I dislike having config files on the host...
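
For reference, the on-host workaround would look roughly like the
fragment below. It is only a sketch: the basePath directory
/opt/solr/dih-files is a hypothetical location, not something from our
setup, and the file would have to be copied there and kept identical on
every node by hand. FileDataSource resolves a relative url against
basePath.

```xml
<dataConfig>
  <!-- Hypothetical directory: must exist, with identical contents, on every node. -->
  <dataSource name="lookup_conf" type="FileDataSource"
              basePath="/opt/solr/dih-files" encoding="UTF-8"/>
  <document>
    <entity
        dataSource="lookup_conf"
        rootEntity="false"
        name="lookups"
        processor="XPathEntityProcessor"
        url="lookup_conf.xml"
        forEach="/lookups/lookup">
      <!-- field and child entity definitions unchanged -->
    </entity>
  </document>
</dataConfig>
```

This reads the file from local disk only, so it does nothing about the
nodes drifting out of sync.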