manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Wiki connector stuck crawling namespaces other than default
Date Wed, 01 Oct 2014 13:57:47 GMT
Hi Kambiz,

In the log you sent, I did not see any activity at all other than seeding.
Was the log complete?

You can get a better sense of what is happening by obtaining a simple
history report for this connection, and a document status report for the
job.  If there are only 27 documents, it should be very clear what is
happening by looking at these. Can you include them please?

Karl


On Wed, Oct 1, 2014 at 9:50 AM, Kambiz Niktabar <niktabar@yahoo.com> wrote:

> Hi Karl,
>
> A snapshot of the job view page is attached. By the way, it seems the number
> of pages under that namespace is only 27, and they are not being processed
> even after several minutes (see the second snapshot).
>
> Regards
> Kambiz
>
>   ------------------------------
>  *From:* Karl Wright <daddywri@gmail.com>
> *To:* "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>; Kambiz
> Niktabar <niktabar@yahoo.com>
> *Sent:* Wednesday, October 1, 2014 2:05 PM
> *Subject:* Re: Wiki connector stuck crawling namespaces other than default
>
> Hi Kambiz,
>
> The debugging output indicates that your namespace name is "404".  That
> doesn't sound correct to me.
>
> >>>>>>
> GET
> /wiki/api.php?format=xml&action=query&list=allpages&apnamespace=404&apfrom=Africa%3ATetianCarbonates&aplimit=500
> HTTP/1.1
> <<<<<<
>
> I've gone back and looked at the code and can find no way that the
> namespace would be corrupted.  But maybe this is actually correct.  Can you
> send along a screenshot of the view page for the job?
>
> Also, the wiki connector seeds documents in batches of 500 at a time.  It
> uses the last title fetched in order to be able to find the next batch of
> 500.  So if there are a lot of documents, it will take a while to seed them
> all.  In your log I see signs that this is what is happening.  Have a look
> at all the GET requests and note the apfrom parameter.
>
>
>
>
>
> Thanks,
> Karl
>
>
>
>
>
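The batched seeding described in the quoted message (500 titles per request, with the last title fetched used as the starting point for the next request) can be illustrated with a small Python sketch. This is not the connector's actual code: the helper names `build_allpages_path` and `extract_apfrom` are hypothetical, and only the query parameters visible in the logged GET request above are used. The second helper pulls the `apfrom` value out of a logged request line, which is one way to follow seeding progress through a log.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_allpages_path(api_path, namespace, apfrom=None, limit=500):
    """Build an allpages query path like the one seen in the log.

    Hypothetical helper: parameter names mirror the logged request only.
    """
    params = {
        "format": "xml",
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
    }
    if apfrom is not None:
        # Title at which the next batch should start.
        params["apfrom"] = apfrom
    params["aplimit"] = limit
    return api_path + "?" + urlencode(params)


def extract_apfrom(request_line):
    """Return the decoded apfrom value from a 'GET <path> HTTP/1.1' line.

    Returns None if the line has no request path or no apfrom parameter.
    """
    parts = request_line.split()
    if len(parts) < 2:
        return None
    query = urlsplit(parts[1]).query
    values = parse_qs(query).get("apfrom")
    return values[0] if values else None


if __name__ == "__main__":
    path = build_allpages_path("/wiki/api.php", 404, "Africa:TetianCarbonates")
    print(path)
    print(extract_apfrom("GET " + path + " HTTP/1.1"))
```

If the `apfrom` values extracted from successive GET requests keep advancing through the titles, seeding is still making progress rather than being stuck.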
