lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Perform incremental import with PDF Files
Date Mon, 29 Jan 2018 11:19:46 GMT
Hi Karan,
Did you try running a full import with clean=false? That keeps the documents already in the index instead of deleting them before the import starts.
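
A minimal sketch of how such a request could be built and sent, in case it helps. The host, the core name `pdfcore`, and the `/dataimport` handler path are assumptions for illustration, not taken from your setup. `command`, `clean`, and `commit` are the DataImportHandler request parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_import_url(solr_base, core, handler="/dataimport"):
    """Build a DIH full-import URL that keeps the existing index (clean=false)."""
    params = {
        "command": "full-import",
        "clean": "false",   # do not delete existing documents before importing
        "commit": "true",   # commit afterwards so the new files become searchable
    }
    return f"{solr_base.rstrip('/')}/{core}{handler}?{urlencode(params)}"


def trigger_import(url, timeout=30):
    """Send the request to Solr and return the raw response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical host and core name; replace with your own.
    print(build_import_url("http://localhost:8983/solr", "pdfcore"))
```

Calling `trigger_import(...)` with that URL each time a new batch of PDFs lands in the folder should add the new files without wiping the earlier ones, provided each file maps to a distinct `id`.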

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 11:18, Karan Saini <maximus392@gmail.com> wrote:
> 
> Hi folks,
> 
> Please suggest a solution for importing and indexing PDF files
> *incrementally*. My requirement is to pull the PDF files remotely from a
> network folder path. This network folder will receive new sets of PDF
> files at regular intervals (say, every 20 seconds). The folder is emptied
> every time a new set of PDF files is copied into it. I do
> not want to lose the previously saved index of the old files when running
> the next incremental import.
> 
> Currently, I am using Solr 6.6 for this research.
> 
> The DataImportHandler config currently looks like this:
> 
> <!--Remote Access-->
> <dataConfig>
>   <dataSource type="BinFileDataSource"/>
>   <document>
>     <entity name="K2FileEntity" processor="FileListEntityProcessor"
>             dataSource="null"
>             recursive="true"
>             baseDir="\\CLDSINGH02\RemoteFileDepot"
>             fileName=".*pdf" rootEntity="false">
> 
>       <field column="file" name="id"/>
>       <field column="fileSize" name="size"/>
>       <field column="fileLastModified" name="lastmodified"/>
> 
>       <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
>               url="${K2FileEntity.fileAbsolutePath}" format="text">
> 
>         <field column="title" name="title" meta="true"/>
>         <field column="dc:format" name="format" meta="true"/>
>         <field column="text" name="text"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> 
> 
> Kind regards,
> Karan Singh

