lucene-solr-user mailing list archives

From Karan Saini <maximus...@gmail.com>
Subject Re: Perform incremental import with PDF Files
Date Tue, 30 Jan 2018 07:34:35 GMT
Hi Emir,

There is one behavior I noticed while performing the incremental import. I
added a new field into managed-schema.xml to test the incremental
nature of clean=false.

         *<field name="xtimestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false"/>*

Now xtimestamp gets a new value on every DIH import, even with
clean=false. I am confused: how will I know whether clean=false is
working or not?
Please suggest.
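One likely explanation: clean=false only skips the wipe of the index before the import starts; a full-import still re-reads every file currently in baseDir and re-adds those documents, and a field with default="NOW" is assigned at add time, so re-indexed documents get a fresh xtimestamp either way. A better check is whether a document whose source PDF has since been removed from the folder still exists after a clean=false import. A minimal sketch of the URLs involved (the host, port, and core name "k2core" are assumptions, not from this thread):

```python
from urllib.parse import urlencode

# Assumption: Solr at localhost:8983 with a core named "k2core".
SOLR = "http://localhost:8983/solr/k2core"

def dataimport_url(clean: bool) -> str:
    """Build the DIH full-import URL; clean=false preserves existing docs."""
    params = {"command": "full-import", "clean": str(clean).lower()}
    return f"{SOLR}/dataimport?{urlencode(params)}"

def old_doc_query_url(doc_id: str) -> str:
    """Query for a doc whose source PDF is no longer in the folder.
    If it still returns numFound=1 after a clean=false import, the old
    index survived; with clean=true it would be gone."""
    params = {"q": f'id:"{doc_id}"', "wt": "json"}
    return f"{SOLR}/select?{urlencode(params)}"

print(dataimport_url(clean=False))
# Issuing the request needs a running Solr, e.g.:
# from urllib.request import urlopen
# urlopen(dataimport_url(clean=False))
```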

Kind regards,
Karan



On 29 January 2018 at 20:12, Emir Arnautović <emir.arnautovic@sematext.com>
wrote:

> Hi Karan,
> Glad it worked for you.
>
> I am not sure how to do it in the C# client, but adding the clean=false
> parameter to the URL should do the trick.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 14:48, Karan Saini <maximus392@gmail.com> wrote:
> >
> > Thanks Emir :-) . Setting the property *clean=false* worked for me.
> >
> > Is there a way I can selectively clean a particular index from the
> > C#.NET code using the SolrNet API?
> > Please suggest.
> >
> > Kind regards,
> > Karan
> >
> >
> > On 29 January 2018 at 16:49, Emir Arnautović <emir.arnautovic@sematext.com>
> > wrote:
> >
> >> Hi Karan,
> >> Did you try running full import with clean=false?
> >>
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 29 Jan 2018, at 11:18, Karan Saini <maximus392@gmail.com> wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> Please suggest a solution for importing and indexing PDF files
> >>> *incrementally*. My requirement is to pull the PDF files remotely from a
> >>> network folder path. This network folder receives new sets of PDF files
> >>> at certain intervals (say, every 20 seconds). The folder is emptied each
> >>> time a new set of PDF files is copied into it. I do not want to lose the
> >>> previously saved index of the old files while doing the next incremental
> >>> import.
> >>>
> >>> Currently, I am using Solr 6.6 for this research.
> >>>
> >>> The dataimport handler config is currently like this:
> >>>
> >>> <!--Remote Access-->
> >>> <dataConfig>
> >>>   <dataSource type="BinFileDataSource"/>
> >>>   <document>
> >>>     <entity name="K2FileEntity" processor="FileListEntityProcessor"
> >>>             dataSource="null"
> >>>             recursive="true"
> >>>             baseDir="\\CLDSINGH02\RemoteFileDepot"
> >>>             fileName=".*pdf" rootEntity="false">
> >>>
> >>>       <field column="file" name="id"/>
> >>>       <field column="fileSize" name="size"/>
> >>>       <field column="fileLastModified" name="lastmodified"/>
> >>>
> >>>       <entity name="pdf" processor="TikaEntityProcessor"
> >>>               onError="skip"
> >>>               url="${K2FileEntity.fileAbsolutePath}" format="text">
> >>>
> >>>         <field column="title" name="title" meta="true"/>
> >>>         <field column="dc:format" name="format" meta="true"/>
> >>>         <field column="text" name="text"/>
> >>>       </entity>
> >>>     </entity>
> >>>   </document>
> >>> </dataConfig>
> >>>
> >>>
> >>> Kind regards,
> >>> Karan Singh
> >>
> >>
>
>
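On the still-open question above about selectively cleaning part of the index from C#: independent of which client library is used, Solr's update handler accepts a delete-by-query command as a plain HTTP POST, so any client that can POST JSON (SolrNet's raw update path included) can issue it. A minimal sketch of building that request (the host, core name "k2core", and the example query are assumptions, not from this thread):

```python
import json
from urllib.request import Request

# Assumption: Solr at localhost:8983 with a core named "k2core".
SOLR = "http://localhost:8983/solr/k2core"

def delete_by_query_request(query: str) -> Request:
    """Build a delete-by-query update request against Solr's JSON update
    handler; commit=true makes the deletion visible immediately."""
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return Request(
        f"{SOLR}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: drop only documents whose lastmodified is older than 30 days.
req = delete_by_query_request("lastmodified:[* TO NOW-30DAYS]")
print(req.full_url)
# Sending it needs a running Solr: urllib.request.urlopen(req)
```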
