manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Sharepoint 2013 Crawling a large list
Date Fri, 08 Apr 2016 10:05:38 GMT
Hi Cheng,

That is a pretty impressively messed up system!

Let's start with what we know and then go on to what we don't.

The "Remote procedure exception" error is due to an
org.apache.axis.AxisFault exception that apparently is not coming from the
server.  That's pretty weird in its own right.  Equally weird is the NPE
coming from within HttpClient during NTLM processing.  Unfortunately we
aren't seeing the actual stack traces themselves, which would allow us to
figure out what was happening; instead you are getting
ArrayIndexOutOfBoundsExceptions and NullPointerExceptions doing basic things
like array copying (!).

Can you include one or two of the actual traces (with line numbers)?
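
To illustrate what a "full trace" means here, the following is a minimal sketch of rendering an exception together with its nested cause and line numbers. The class and method names (TraceDump, fullTrace) are hypothetical and are not part of ManifoldCF, Axis, or HttpClient; the sketch only shows the kind of output that would help with diagnosis.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of ManifoldCF: shows what a complete trace contains.
final class TraceDump {
    /** Renders the throwable, its stack frames, and any "Caused by" chain as text. */
    static String fullTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            t.printStackTrace(out);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            int[] source = new int[2];
            // Deliberately copies more elements than the destination can hold.
            System.arraycopy(source, 0, new int[1], 0, 2);
        } catch (RuntimeException e) {
            // Wrap the failure the way a remote-call layer might, then print everything.
            Exception wrapped = new Exception("Remote procedure exception", e);
            System.err.println(fullTrace(wrapped));
        }
    }
}
```

A trace produced this way lists each frame with its source line, followed by a "Caused by:" section for the nested exception, which is the detail missing from the log excerpt.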

My sense is that (a) you are using a non-standard JVM that is (b) running
out of memory, but not throwing an out-of-memory exception when that
happens.  Instead, it fails to allocate memory that it needs and then blows
up elsewhere.  It's running out of memory most likely because (c) you are
using HSQLDB, and HSQLDB is keeping its database tables in memory, which is
what it does.

I would recommend either (1) giving MCF more memory, or (2) better yet,
switching to PostgreSQL.  And if this keeps happening under either scenario,
please include a few of the full traces so I can make better sense of the
problem.
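
As a rough way to check points (a) and (b), a small standalone program can report which JVM is in use and how much heap it is allowed to use. This is only a sketch: the class name HeapReport and its methods are hypothetical, and it would need to be run with the same java executable and memory options as the ManifoldCF process for the numbers to be meaningful.

```java
// Hypothetical diagnostic, not part of ManifoldCF: reports JVM identity and heap limits.
final class HeapReport {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Summarizes current heap usage and the maximum heap the JVM will attempt to use. */
    static String describe(Runtime rt) {
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory();
        return String.format("heap used=%d MiB, max=%d MiB", toMiB(used), toMiB(max));
    }

    public static void main(String[] args) {
        // Standard system properties that identify the JVM vendor and version.
        System.out.println("JVM vendor:  " + System.getProperty("java.vm.vendor"));
        System.out.println("JVM name:    " + System.getProperty("java.vm.name"));
        System.out.println("JVM version: " + System.getProperty("java.version"));
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

If the reported maximum is small relative to the size of the crawl, raising the heap limit with the standard -Xmx option is the usual way to give the process more memory.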

Please let us know what happens.

Thanks,
Karl


On Fri, Apr 8, 2016 at 3:32 AM, Cheng Zeng <zengc@hotmail.co.uk> wrote:

> Hi,
>
> I am trying to extract web pages and attachments from SharePoint 2013 and
> upload this data to Solr for indexing.
>
> I have installed the SharePoint plugin on the SharePoint 2013 server and
> have been able to use ManifoldCF to fetch items from lists with fewer than
> 160 items. My problem is that a few lists have more than 4,900 items. When
> ManifoldCF tried to crawl these large lists, it started to process items
> very slowly and seemed to stop working after 2,100 items were processed. I
> tried to slow down the upload of items to the Solr instance by forcing the
> worker thread to sleep for 3 seconds after every 50 items were added to
> the pipeline. I tried this several times, but ManifoldCF still starts to
> process items very slowly once 2,100 items in the list have been
> processed. Note that ManifoldCF starts to slow down about 30 minutes after
> the crawling job starts, and the following errors are logged.
>
> WARN 2016-04-08 12:29:14,762 (Worker thread '19') - Service interruption
> reported for job 1460088455222 connection 'SharepointRepoistoryConn':
> Remote procedure exception: ; nested exception is:
> java.lang.ArrayIndexOutOfBoundsException
> FATAL 2016-04-08 12:29:14,777 (Worker thread '28') - Error tossed: null
> java.lang.NullPointerException
> FATAL 2016-04-08 12:30:37,611 (Worker thread '29') - Error tossed: null
> java.lang.NullPointerException
>
>
> The log is attached.  If someone could help me, I would really appreciate
> it.
>
> Best regards,
>
> Cheng
>
