Date: Tue, 19 Jul 2011 02:04:21 +0300
Subject: parser warnings
From: Cam Bazz
To: user@nutch.apache.org

What does the following log message mean?

2011-07-19 01:00:07,034 WARN parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/xhtml+xml

Does that mean my HTML parser is not receiving part of the crawled data?

Best.