manifoldcf-user mailing list archives

From Dan Davis <dansm...@gmail.com>
Subject Re: Fw: Slow performance of Windows Share connector
Date Mon, 26 Jan 2015 16:20:23 GMT
For ingesting CIFS, I wonder whether it would be efficient (on Linux) to
use the CIFS kernel module and to ask it for NT Security Descriptors
(if document-level security is desired/needed). I believe the CIFS kernel
module has this capability in recent kernels (3.6+, I think); it was
missing from at least some 2.x kernels and was added at some point in the
3.x series. Of course, the NT Security Descriptor is also available for
network drives in a Windows environment.
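
To make the idea concrete, here is a minimal sketch, not something ManifoldCF does today. It assumes a Linux host with the share mounted through the CIFS kernel module with the `cifsacl` mount option, and that the raw descriptor is exposed through the `system.cifs_acl` extended attribute (the one `getcifsacl` from cifs-utils reads, as far as I know). The function names and the mount path are made up for illustration; the header layout is the standard self-relative SECURITY_DESCRIPTOR format.

```python
import os
import struct

# Assumption: the CIFS kernel module exposes the raw NT Security Descriptor
# through this extended attribute (the same one cifs-utils' getcifsacl reads).
CIFS_ACL_XATTR = "system.cifs_acl"

# Self-relative SECURITY_DESCRIPTOR header: revision, padding, control flags,
# then offsets to the owner SID, group SID, SACL and DACL (little-endian).
_SD_HEADER = struct.Struct("<BBHIIII")


def read_security_descriptor(path):
    """Return the raw descriptor bytes for path, or None if unavailable."""
    if not hasattr(os, "getxattr"):  # os.getxattr only exists on Linux
        return None
    try:
        return os.getxattr(path, CIFS_ACL_XATTR)
    except OSError:
        # Not a CIFS mount, attribute not supported, or permission denied.
        return None


def parse_sd_header(raw):
    """Decode the fixed 20-byte header of a self-relative descriptor."""
    if len(raw) < _SD_HEADER.size:
        raise ValueError("security descriptor is too short")
    revision, _sbz1, control, owner, group, sacl, dacl = _SD_HEADER.unpack_from(raw)
    return {
        "revision": revision,
        "control": control,
        "owner_offset": owner,
        "group_offset": group,
        "sacl_offset": sacl,
        "dacl_offset": dacl,
    }


if __name__ == "__main__":
    # Hypothetical mount point; replace with a file on a real CIFS mount.
    raw = read_security_descriptor("/mnt/share/example.docx")
    print(parse_sd_header(raw) if raw else "no security descriptor available")
```

Decoding the owner SID and the DACL entries that the offsets point to is left out here; a connector would still have to map those SIDs to access tokens.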

I'm mainly a lurker here, but I'm looking forward to ManifoldCF becoming
more mature ... and pitching in with suggestions where I can.

On Mon, Jan 26, 2015 at 7:37 AM, Kambiz Niktabar <niktabar@yahoo.com> wrote:

> Hi,
>
> I'm sending this to the user mailing list in case anybody else has the same
> issue with the JCIFS connector.
>
> Regards
> Kambiz Niktabar
>
>   ----- Forwarded Message -----
>  *From:* Karl Wright <daddywri@gmail.com>
> *To:* Kambiz Niktabar <niktabar@yahoo.com>
> *Sent:* Monday, January 26, 2015 12:58 PM
> *Subject:* Re: Slow performance of Windows Share connector
>
> Thanks for the update!
> It would be great if you could post this to the user list; other people
> may encounter similar problems.
>
> Karl
>
>
>
>
> On Mon, Jan 26, 2015 at 6:25 AM, Kambiz Niktabar <niktabar@yahoo.com>
> wrote:
>
> Hi Karl,
>
> As promised, I wanted to inform you about the result of this case.
> Looking at the Wireshark capture, I noticed many errors complaining
> about a duplicate domain name. I then changed the
> "Authentication domain" in the Server tab of the repository connection to our
> pre-Windows 2000 domain name, and now it works perfectly fine.
>
> Regards
> Kambiz
>
>   ------------------------------
>  *From:* Karl Wright <daddywri@gmail.com>
> *To:* Kambiz Niktabar <niktabar@yahoo.com>
> *Sent:* Friday, January 23, 2015 1:18 PM
> *Subject:* Re: Slow performance of Windows Share connector
>
> Hi Kambiz,
>
> The "access" time includes the fetching of the document up to the time
> spent sending the document to the outputs.
>
> If you are crawling the local file system through JCIFS, and you are still
> writing data locally, then clearly the output connection is not involved.
>
> My suspicion is that, because CIFS is involved under Windows, it's
> possible that you are indeed going through the network even though both
> source and destination are local.  You can readily figure this out using
> Wireshark, and see what packets are going in and out of that machine during
> crawling.
>
> I should also state that, in my experience, the CIFS protocol is
> relatively fragile, because it is multiplexed.  That means that when any
> one virtual connection has errors, multiple connections must be dropped and
> retried.  Windows implementations of CIFS, likewise, are not very good at
> handling large numbers of virtual connections simultaneously.  If you have
> a max connection count that is set too big, then, you might have errors you
> are unaware of.
>
> My suggestions:
> First, look at the log to see if there are any errors.
> Second, lower the maximum number of JCIFS repository connections to
> between 2 and 5.
> Third, verify with Wireshark that you are not doing something funny with
> the network.
>
> As far as performance of the CIFS connector is concerned, that's wholly a
> function of the jcifs library and the CIFS server.  It is what it is,
> therefore, and there's not a lot you can do about it, other than to make
> sure there are no obvious bottlenecks in the network or errors in the log.
>
> Karl
>
>
>
>
> On Fri, Jan 23, 2015 at 6:52 AM, Kambiz Niktabar <niktabar@yahoo.com>
> wrote:
>
> Thanks for your prompt reply. Basically, the snapshot I sent you is
> from the test that crawls documents on the local disk with File
> system as the output connector (outputting into a folder on the local disk
> too), so in this case no switch is involved in the test.
> I tried testing the same folder with a File System repository connection and
> output to Solr and it was very quick, so it seems to be something related to
> the JCIFS connector.
> What kind of performance do you get with the JCIFS connector (docs/sec)?
>
> P.S. What exactly does that "access" time mean? Is it the time the connector
> takes to read and fetch the content into %USERPROFILE%\Local Settings\Temp?
>
> Regards
> Kambiz
>
>   ------------------------------
>  *From:* Karl Wright <daddywri@gmail.com>
> *To:* Kambiz Niktabar <niktabar@yahoo.com>
> *Sent:* Friday, January 23, 2015 12:23 PM
> *Subject:* Re: Slow performance of Windows Share connector
>
> From your simple history, dividing the size of the document by the time it
> takes to fetch it, I get a pretty constant number (about 70 bytes per
> millisecond, or 70K bytes per second, on average).  The longer the file,
> though, the slower it gets.  It looks to me like you are crawling through
> an internet switch somewhere that is throttling your fetches.  Popular
> behavior for such switches these days is to have fetches start off being
> fast, but then progressively slow down to some minimum speed as more data
> is transmitted.  Obviously the point is to conserve bandwidth.
>
>
> Karl
>
>
>
>
>
> On Fri, Jan 23, 2015 at 6:15 AM, Kambiz Niktabar <niktabar@yahoo.com>
> wrote:
>
> Hi Karl,
>
> I am facing a performance issue with the Windows Share (JCIFS)
> connector. Crawling a folder that contains PDF and Word documents (with
> images in the files) takes a long time. The following scenarios have been
> tested:
> 1- Testing with Solr and File system connectors in separate jobs; the
> results were almost the same.
> 2- Copying the documents onto the local disk of the ManifoldCF server; this
> made no difference, so it couldn't be a network issue.
>
> Looking at the simple history report (for the scenario of
> documents on local disk and File system as output connector), I noticed
> that the access times for some documents are extremely long (check the
> attached snapshot). As it shows, there is not always a direct relation
> between the size of the file and the access time.
> Do you have any idea what could be the reason for the slow performance?
>
> Regards
> Kambiz Niktabar
>
>
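
Karl's back-of-the-envelope estimate in the quoted thread (document size divided by fetch time) can be sketched as below. The sample records are invented for illustration and are not the values from Kambiz's snapshot; the record layout is also an assumption, not the actual Simple History report format.

```python
def throughput_bytes_per_ms(size_bytes, access_ms):
    """Average fetch rate for one document, in bytes per millisecond."""
    if access_ms <= 0:
        raise ValueError("access time must be positive")
    return size_bytes / access_ms


def summarize(records):
    """Return (identifier, bytes/ms, bytes/s) per record, sorted by size."""
    rows = []
    for identifier, size_bytes, access_ms in sorted(records, key=lambda r: r[1]):
        rate = throughput_bytes_per_ms(size_bytes, access_ms)
        rows.append((identifier, rate, rate * 1000))
    return rows


if __name__ == "__main__":
    # Hypothetical (identifier, size in bytes, access time in ms) records.
    sample = [
        ("a.pdf", 70_000, 1_000),
        ("b.docx", 350_000, 5_600),
        ("c.pdf", 1_400_000, 28_000),
    ]
    for identifier, per_ms, per_s in summarize(sample):
        print(f"{identifier}: {per_ms:.1f} bytes/ms ({per_s:.0f} bytes/s)")
```

Sorting by size makes it easy to see whether the rate falls as files get larger, which is the pattern Karl describes, or whether slow fetches are scattered regardless of size, which would point more toward intermittent errors.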
