From: Markus Jelsma
To: user@nutch.apache.org
Subject: Re: how are CSV/TXT files handled
Date: Tue, 7 Feb 2012 10:17:48 +0100
Message-Id: <201202071017.48416.markus@apache.org>

Upgrade to 1.4.

> With the "nutch parsechecker" command I get the following error message:
>
> "Error: Could not find or load main class parsechecker", this doesn't sound
> good!
>
> On Tue, Feb 7, 2012 at 9:58 AM, remi tassing wrote:
> > What made me start thinking about this is the following error
> > message:
> >
> > "failed(2,0): Can't retrieve Tika parser for mime-type
> > application/ms-excel"
> >
> > I'm using Nutch-1.2 and my nutch-site.xml has:
> >
> > "
> > plugin.includes
> > protocol-httpclient|urlfilter-regex|parse-(text|html|js|tika)|index-(basic|anchor)|q..."
> >
> > Remi
> >
> > On Tue, Feb 7, 2012 at 9:16 AM, remi tassing wrote:
> >> Hey guys,
> >>
> >> I checked the mailing-list archive but couldn't get an answer on this. I
> >> think CSV and TXT don't need any kind of parsing, but how are they
> >> handled by default?
> >>
> >> Remi
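
The nutch-site.xml snippet quoted above appears to have lost its XML tags in the archive. As a rough sketch only, assuming the usual Hadoop-style property layout that Nutch configuration files use, the entry would look something like the fragment below. The value is copied from the quoted message, including its truncation ("q..."), so it is illustrative and not a complete or working plugin list.

```xml
<!-- Sketch of the quoted property; layout assumed, not taken from the original mail. -->
<property>
  <name>plugin.includes</name>
  <!-- Value as quoted in the thread; the trailing "q..." is truncated in the
       original message and is deliberately left incomplete here. -->
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|tika)|index-(basic|anchor)|q...</value>
</property>
```

Which parser plugin handles a given MIME type depends on the parse-* entries in this list together with the rest of the installation's configuration, so the full, untruncated value is what would need checking after an upgrade.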