manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling new/updated files using Windows share connection takes too long
Date Fri, 18 Jan 2013 11:32:56 GMT
Hi Shigeki,

What database is ManifoldCF configured to use in this case?  Do you
see any indication of slow queries in the ManifoldCF log?


Karl

On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi
<shigeki.kobayashi3@g.softbank.co.jp> wrote:
> Hello
>
>
> I would like some advice on reducing the crawl time for new/updated files
> over a Windows share connection.
>
> I crawl files on a Windows server and index them into Solr.
>
> Currently, the second crawl of two hundred thousand files takes over 5
> hours, even though no files have been updated, created, or deleted.
>
> I assume MCF performs the following steps (let me know if I am wrong):
>
> - obtain the updated time of a file
> - compare that updated time with the one MCF obtained during the previous
> crawl (probably stored in the DB)
> - if they differ, MCF recognizes that the file needs to be indexed.
>
> If the above steps are done for two thousand files, which step is likely
> to take the most time? Obtaining the updated time? Reading data from the
> DB? What do you think could be done to reduce the crawling time?
>
> Please give me some advice.
>
>
> Regards,
>
> Shigeki
>
>
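
For illustration only, the three steps Shigeki assumes above can be sketched
as follows. This is not ManifoldCF's actual implementation: the function name
find_changed_files and the in-memory dict stored_mtimes (standing in for the
database) are hypothetical, and a local directory walk stands in for the
Windows share. It only shows where the per-file cost would fall: one metadata
read and one lookup of the stored value for every file, even when nothing has
changed.

```python
import os
import tempfile


def find_changed_files(root, stored_mtimes):
    """Return paths under root whose modification time differs from the stored one.

    stored_mtimes is a hypothetical stand-in for the crawler's database:
    a dict mapping file path -> modification time (ns) seen on the last crawl.
    It is updated in place so the next call sees the new state.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            current = os.stat(path).st_mtime_ns   # step 1: obtain updated time
            previous = stored_mtimes.get(path)    # step 2: look up stored time
            if current != previous:               # step 3: differs -> index it
                changed.append(path)
                stored_mtimes[path] = current
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as handle:
            handle.write("hello")
        state = {}
        print(len(find_changed_files(root, state)))  # first crawl
        print(len(find_changed_files(root, state)))  # second crawl, no changes
```

Under these assumptions, a second crawl with no changes still performs the
metadata read and the lookup once per file, so timing those two calls
separately is one way to see which of them dominates.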
