lucene-solr-user mailing list archives

From "Miller, William K - Norman, OK - Contractor" <William.K.Mil...@usps.gov.INVALID>
Subject RE: DIH issue with streaming xml file
Date Mon, 12 Jun 2017 18:11:43 GMT
Thank you for your response. That is exactly the issue I am having: I cannot figure out how
to get the list of files from the remote server. I tried changing the parent entity
processor to XPathEntityProcessor and the baseDir to an https URL. That did not
work, because XPathEntityProcessor requires a "forEach" attribute. Is there an entity processor that can
get the list of files from an https source, or will I have to use SolrJ or write
a custom entity processor?

~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158
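For reference: FileListEntityProcessor only walks a local filesystem, and XPathEntityProcessor always needs forEach because it splits each document into rows at those XPaths. In DIH the https transport belongs to a data source (URLDataSource), not to an entity processor. A minimal sketch, assuming the remote server serves one of the eflow files at the hypothetical URL https://remote.example.com/hbk/xml/section1.xml (host, path, file name and timeout values are all placeholders):

```xml
<dataConfig>
  <!-- URLDataSource fetches over http/https; baseUrl is a hypothetical placeholder.
       The JVM running Solr must trust the server's TLS certificate. -->
  <dataSource name="remote" type="URLDataSource" encoding="UTF-8"
              baseUrl="https://remote.example.com/hbk/xml/"
              connectionTimeout="5000" readTimeout="10000"/>
  <document name="hbk">
    <!-- A relative url is appended to baseUrl; forEach is required by XPathEntityProcessor -->
    <entity name="xml" dataSource="remote" processor="XPathEntityProcessor"
            url="section1.xml" pk="itemId" stream="true"
            forEach="/eflow/section | /eflow/section/item">
      <field column="sectionId" xpath="/eflow/section/@id" commonField="true"/>
      <field column="itemId" xpath="/eflow/section/item/@id"/>
      <field column="itemTitle" xpath="/eflow/section/item/@title"/>
    </entity>
  </document>
</dataConfig>
```

This only covers a single, known file; it does not by itself discover the list of files on the remote server.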


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Monday, June 12, 2017 12:57 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file

How do you get a list of URLs for the files on the remote server? That's probably the first
issue. Once you have the URLs in an outer entity or two, you can feed them one by one into
the inner entity.

Regards,
   Alex.

----
http://www.solr-start.com/ - Resources for Solr users, new and experienced
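The outer/inner pattern described above might look like this in DIH. This is a sketch that assumes, hypothetically, the remote server publishes a manifest at https://remote.example.com/hbk/manifest.xml shaped like `<files><file>https://remote.example.com/hbk/xml/section1.xml</file></files>`. The outer entity emits one row per file URL and, with rootEntity="false", creates no Solr document itself; the inner entity fetches each URL through the same URLDataSource.

```xml
<dataConfig>
  <!-- No baseUrl: the manifest and the file entries are absolute https URLs -->
  <dataSource name="remote" type="URLDataSource" encoding="UTF-8"/>
  <document name="hbk">
    <!-- Outer entity: one row per <file> element in the hypothetical manifest -->
    <entity name="manifest" dataSource="remote" processor="XPathEntityProcessor"
            url="https://remote.example.com/hbk/manifest.xml"
            forEach="/files/file" rootEntity="false">
      <field column="fileUrl" xpath="/files/file"/>
      <!-- Inner entity: fetches and parses each file URL produced by the outer entity -->
      <entity name="xml" dataSource="remote" processor="XPathEntityProcessor"
              url="${manifest.fileUrl}" pk="itemId" stream="true"
              forEach="/eflow/section | /eflow/section/item">
        <field column="sectionId" xpath="/eflow/section/@id" commonField="true"/>
        <field column="itemId" xpath="/eflow/section/item/@id"/>
        <field column="itemTitle" xpath="/eflow/section/item/@title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the server only offers an HTML directory listing, that is usually not well-formed XML and XPathEntityProcessor cannot parse it; in that case a SolrJ client or a custom entity processor may be simpler.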

On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor <William.K.Miller@usps.gov.invalid> wrote:

> I am using Solr 6.5.1 and importing XML files with the
> DataImportHandler. I want to get the files from a remote
> server, but I am dealing with multiple XML files in multiple folders.
> I am using a nested entity in my dataConfig; an example of how I have
> it set up is below. I got most of this from an online
> reference. In this example I am getting the XML files from a folder
> on the Solr server, but as I mentioned above, I want to get the files
> from a remote server. I have looked at the different entity
> processors for the DIH, but have not seen anything that seems to work.
> Is there a way to configure the code below to let me do this?
>
> <dataConfig>
>   <dataSource name="hbk" encoding="UTF-8" type="FileDataSource" />
>   <document name="hbk">
>     <!--
>       pickupdir fetches all files matching the fileName regex in the
>       supplied directory and passes them to the nested entity, which
>       parses the file contents.
>     -->
>     <entity
>         name="pickupdir"
>         processor="FileListEntityProcessor"
>         rootEntity="false"
>         dataSource="null"
>         fileName="^[\w\d-]+\.xml$"
>         baseDir="/var/solr/data/hbk/data/xml/"
>         recursive="true">
>       <!-- The xml entity parses each file picked up by pickupdir. -->
>       <entity
>           name="xml"
>           pk="itemId"
>           processor="XPathEntityProcessor"
>           transformer="RegexTransformer,TemplateTransformer"
>           dataSource="hbk"
>           stream="true"
>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>           url="${pickupdir.fileAbsolutePath}"
>           forEach="/eflow/section | /eflow/section/item">
>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>
>         <field column="itemId" xpath="/eflow/section/item/@id" />
>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>         <field column="itemType" xpath="/eflow/section/item/@type" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>