Subject: Re: Wiki connector stuck crawling namespaces other than default
From: Karl Wright
To: user@manifoldcf.apache.org, Kambiz Niktabar
Date: Wed, 1 Oct 2014 08:05:46 -0400

Hi Kambiz,

The debugging output indicates that your namespace name is "404". That doesn't sound correct to me.

>>>>>>
GET /wiki/api.php?format=xml&action=query&list=allpages&apnamespace=404&apfrom=Africa%3ATetianCarbonates&aplimit=500 HTTP/1.1
<<<<<<
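To see exactly which namespace and starting title the connector asked for, the logged request line can be decoded. This is a minimal Python sketch using only the standard library; `allpages_params` is a hypothetical helper for reading the log, not part of ManifoldCF, and it assumes the request line has the usual "METHOD TARGET VERSION" form.

```python
from urllib.parse import parse_qs, urlsplit

# Request line copied from the connector's debug log.
REQUEST_LINE = (
    "GET /wiki/api.php?format=xml&action=query&list=allpages"
    "&apnamespace=404&apfrom=Africa%3ATetianCarbonates&aplimit=500 HTTP/1.1"
)


def allpages_params(request_line: str) -> dict[str, str]:
    """Return the query parameters of a logged HTTP request line.

    Hypothetical helper. parse_qs also percent-decodes the values,
    so %3A in apfrom becomes ':'.
    """
    _method, target, _version = request_line.split(" ")
    query = urlsplit(target).query
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: values[0] for name, values in parse_qs(query).items()}


params = allpages_params(REQUEST_LINE)
print(params["apnamespace"], params["apfrom"], params["aplimit"])
# prints: 404 Africa:TetianCarbonates 500
```

Decoded, the request asks for namespace 404, starting at the title "Africa:TetianCarbonates", 500 titles at a time.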

I've gone back and looked at the code and can find no way that the namespace would be corrupted. But maybe this is actually correct. Can you send along a screen shot of the view page for the job?

Also, the wiki connector seeds documents in batches of 500. It uses the last title fetched as the starting point (the apfrom parameter) for the next batch of 500. So if there are a lot of documents, it will take a while to seed them all. Your log shows signs that this is what is happening: look at the successive GET requests and note how the apfrom parameter advances from one request to the next.
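The batching described above can be sketched as a loop. This is a minimal Python illustration under stated assumptions, not the connector's actual code: `seed_all`, `make_fake_fetch` and the in-memory fetcher are hypothetical stand-ins for the real HTTP requests, and the sketch assumes apfrom is inclusive, so the boundary title returned again at the head of each batch is skipped. The recorded apfrom values show the pattern to look for in the log.

```python
import bisect
from typing import Callable, Iterator, Optional

# A fetcher takes (apfrom, limit) and returns up to `limit` sorted titles.
Fetcher = Callable[[Optional[str], int], list[str]]


def seed_all(fetch: Fetcher, limit: int = 500) -> Iterator[str]:
    """Yield every title, one batch of `limit` at a time (hypothetical).

    Each request after the first passes the last title seen as apfrom.
    Assumption: apfrom is inclusive, so that title comes back first in
    the next batch and is dropped to avoid seeding it twice.
    """
    apfrom: Optional[str] = None
    while True:
        batch = fetch(apfrom, limit)
        if apfrom is not None and batch and batch[0] == apfrom:
            batch = batch[1:]  # drop the overlap with the previous batch
        if not batch:
            return  # nothing new: every title has been seeded
        yield from batch
        apfrom = batch[-1]


def make_fake_fetch(titles: list[str]) -> Fetcher:
    """Hypothetical in-memory stand-in for a list=allpages request."""
    ordered = sorted(titles)

    def fetch(apfrom: Optional[str], limit: int) -> list[str]:
        # Start at the first title that sorts at or after apfrom.
        start = 0 if apfrom is None else bisect.bisect_left(ordered, apfrom)
        return ordered[start:start + limit]

    return fetch


# Usage: 1,200 hypothetical titles seeded in batches of 500.
titles = [f"Page{i:04d}" for i in range(1200)]
apfroms: list[Optional[str]] = []
fake = make_fake_fetch(titles)


def logged_fetch(apfrom: Optional[str], limit: int) -> list[str]:
    apfroms.append(apfrom)  # record each request's apfrom, like the debug log
    return fake(apfrom, limit)


seeded = list(seed_all(logged_fetch))
print(len(seeded), apfroms)
# prints: 1200 [None, 'Page0499', 'Page0998', 'Page1199']
```

Each request starts where the previous batch ended, so a large namespace produces a long series of GET requests whose apfrom values creep forward through the title list rather than a single request that is stuck.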


Thanks,
Karl

