manifoldcf-user mailing list archives

From Kambiz Niktabar <nikta...@yahoo.com>
Subject Fw: Slow performance of Windows Share connector
Date Mon, 26 Jan 2015 12:37:22 GMT
Hi,
I'm sending this to the user mailing list in case anybody else has the same issue with the
JCIFS connector.
Regards,
Kambiz Niktabar
    ----- Forwarded Message -----
  From: Karl Wright <daddywri@gmail.com>
 To: Kambiz Niktabar <niktabar@yahoo.com> 
 Sent: Monday, January 26, 2015 12:58 PM
 Subject: Re: Slow performance of Windows Share connector
   
Thanks for the update!
It would be great if you could post this to the user list; other people may encounter similar
problems.

Karl




On Mon, Jan 26, 2015 at 6:25 AM, Kambiz Niktabar <niktabar@yahoo.com> wrote:

Hi Karl,
As promised, I wanted to inform you about the result of this case. Looking at the Wireshark
capture, I noticed many errors complaining about a duplicate domain name. I then changed the
"Authentication domain" in the Server tab of the repository connection to our pre-Windows 2000
domain name, and now it works perfectly fine.
Regards,
Kambiz
      From: Karl Wright <daddywri@gmail.com>
 To: Kambiz Niktabar <niktabar@yahoo.com> 
 Sent: Friday, January 23, 2015 1:18 PM
 Subject: Re: Slow performance of Windows Share connector
   
Hi Kambiz,

The "access" time includes the fetching of the document up to the time spent sending the document
to the outputs.

If you are crawling the local file system through JCIFS, and you are still writing data locally,
then clearly the output connection is not involved.

My suspicion is that, because CIFS is involved under Windows, it's possible that you are indeed
going through the network even though both source and destination are local.  You can readily
figure this out by using Wireshark to see what packets are going in and out of that machine
during crawling.

I should also state that, in my experience, the CIFS protocol is relatively fragile, because
it is multiplexed.  That means that when any one virtual connection has errors, multiple
connections must be dropped and retried.  Windows implementations of CIFS, likewise, are
not very good at handling large numbers of virtual connections simultaneously.  If you have
a max connection count that is set too big, then, you might have errors you are unaware of.

My suggestions:
First: Look at the log to see if there are any errors.
Second: Lower the maximum number of JCIFS repository connections to between 2 and 5.
Third: Verify that you are not doing something funny with the network, using Wireshark.

As far as performance of the CIFS connector is concerned, that's wholly a function of the
jcifs library and the CIFS server.  It is what it is, therefore, and there's not a lot you can
do about it, other than to make sure there are no obvious bottlenecks in the network or errors
in the log.

Karl




On Fri, Jan 23, 2015 at 6:52 AM, Kambiz Niktabar <niktabar@yahoo.com> wrote:

Thanks for your prompt reply. The snapshot I sent you is from the test that crawls documents
on the local disk with File System as the output connector (also writing to a folder on the
local disk), so no switch is involved in this test. I tried crawling the same folder with a
File System repository connection and output to Solr, and it was very quick, so it seems to be
something related to the JCIFS connector. What kind of performance (docs/sec) do you get with
the JCIFS connector?
P.S. What exactly does the "access" time mean? Is it the time the connector takes to read and
fetch the content into %USERPROFILE%\Local Settings\Temp?
Regards,
Kambiz
      From: Karl Wright <daddywri@gmail.com>
 To: Kambiz Niktabar <niktabar@yahoo.com> 
 Sent: Friday, January 23, 2015 12:23 PM
 Subject: Re: Slow performance of Windows Share connector
   
From your simple history, dividing the size of the document by the time it takes to fetch
it, I get a pretty constant number (about 70 bytes per millisecond, or 70K bytes per second,
on average).  The longer the file, though, the slower it gets.  It looks to me like you
are crawling through an internet switch somewhere that is throttling your fetches.  Popular
behavior for such switches these days is to have fetches start off being fast, but then progressively
slow down to some minimum speed as more data is transmitted.  Obviously the point is to conserve
bandwidth.
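
The size-divided-by-time estimate described above can be reproduced with a short script.
This is only a minimal sketch: the HistoryRow type, its field names, and the sample rows are
hypothetical illustrations, not values taken from the attached snapshot or from any ManifoldCF
export format.

```python
from dataclasses import dataclass


@dataclass
class HistoryRow:
    """One simple-history entry (hypothetical shape, for illustration only)."""
    identifier: str
    size_bytes: int
    access_ms: int


def throughput_bytes_per_ms(row: HistoryRow) -> float:
    """Document size divided by the time taken to fetch it."""
    if row.access_ms <= 0:
        raise ValueError("access time must be positive")
    return row.size_bytes / row.access_ms


# Made-up sample rows; replace them with values read from your own history report.
rows = [
    HistoryRow("doc-a.pdf", 70_000, 1_000),
    HistoryRow("doc-b.docx", 350_000, 5_000),
    HistoryRow("doc-c.pdf", 1_200_000, 20_000),
]

for row in rows:
    rate = throughput_bytes_per_ms(row)
    # bytes per millisecond times 1000 gives bytes per second
    print(f"{row.identifier}: {rate:.1f} bytes/ms ({rate * 1000:.0f} bytes/s)")
```

A roughly constant rate across documents of very different sizes points at a bandwidth-like
limit, while rates that vary widely suggest per-document overhead or errors instead.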


Karl





On Fri, Jan 23, 2015 at 6:15 AM, Kambiz Niktabar <niktabar@yahoo.com> wrote:

Hi Karl,
I am facing a performance issue with the Windows Share (JCIFS) connector. Crawling a folder
that contains PDF and Word documents (with images in the files) takes a long time. The
following scenarios have been tested:
1- Testing with Solr and File System connectors in separate jobs, but the results were almost
the same.
2- Copying the documents onto the local disk of the ManifoldCF server, but there was no
difference, so it couldn't be a network issue.
Looking at the simple history report (for the scenario with documents on the local disk and
File System as the output connector), I noticed that the access time for some documents is
extremely long (see the attached snapshot). As it shows, there is not always a direct relation
between the size of the file and the access time. Do you have any idea what could be the
reason for the slow performance?
Regards,
Kambiz Niktabar




   



   



  