lucene-solr-user mailing list archives

From Chris Rogers <chris.rog...@bodleian.ox.ac.uk>
Subject Using DIH FileListEntityProcessor with SolrCloud
Date Fri, 02 Dec 2016 16:36:12 GMT
Hi all,

A question about using the DIH FileListEntityProcessor with SolrCloud (Solr 6.3.0, ZooKeeper
3.4.8).

I understand that the config in SolrCloud lives on the ZooKeeper node (a different server from the
Solr nodes in my setup).

With this in mind, what is the baseDir attribute in the FileListEntityProcessor config relative
to? I can see the config in the Solr GUI, and I’ve tried setting baseDir to an absolute path
on my ZooKeeper server, but this doesn’t seem to work. Any ideas how this should be set up?

My DIH config is below:

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- this outer processor generates a list of files satisfying the conditions
         specified in the attributes -->
    <entity name="f" processor="FileListEntityProcessor"
            fileName=".*xml"
            newerThan="'NOW-5YEARS'"
            recursive="true"
            rootEntity="false"
            dataSource="null"
            baseDir="/home/bodl-zoo-svc/files/">

      <!-- this processor extracts content using XPath from each file found -->

      <entity name="tei" processor="XPathEntityProcessor"
              forEach="/TEI" url="${f.fileAbsolutePath}" transformer="RegexTransformer" >
        <field column="manuscript_title" name="manuscript_title" xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
        <field column="repository" name="repository" xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
        <field column="id" name="id" xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
      </entity>

    </entity>

  </document>
</dataConfig>


This same config worked as expected on a single Solr node (i.e. not in SolrCloud mode).
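
As a sanity check on the config above, here is a minimal Python sketch (not part of DIH or Solr) that lists the files the outer entity would be expected to pick up. It rests on two assumptions: that FileListEntityProcessor reads baseDir from the local filesystem of whichever Solr node executes the import, and that fileName is applied as a regex search over each file name. The helper name `list_matching_files` is hypothetical, and the newerThan filter is deliberately left out.

```python
import os
import re


def list_matching_files(base_dir, file_name_pattern=r".*xml", recursive=True):
    """Return sorted absolute paths under base_dir whose file name matches the pattern.

    Rough stand-in for the fileName / recursive / baseDir attributes of the
    outer entity; it is an illustration, not the DIH implementation.
    """
    pattern = re.compile(file_name_pattern)
    matches = []
    # os.walk yields nothing if base_dir does not exist on this machine.
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            # Assumption: the pattern is searched within the bare file name.
            if pattern.search(name):
                matches.append(os.path.abspath(os.path.join(root, name)))
        if not recursive:
            break  # stop after the top-level directory
    return sorted(matches)


if __name__ == "__main__":
    base_dir = "/home/bodl-zoo-svc/files/"  # baseDir from the config above
    if not os.path.isdir(base_dir):
        print("baseDir not found on this machine:", base_dir)
    else:
        for path in list_matching_files(base_dir):
            print(path)
```

Running this on each Solr node, rather than on the ZooKeeper server, would show whether the directory and its XML files are visible from the machine that actually performs the import.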

Thanks,
Chris

--
Chris Rogers
Digital Projects Manager
Bodleian Digital Library Systems and Services
chris.rogers@bodleian.ox.ac.uk