From: El-Glabro
To: user@nutch.apache.org
Date: Tue, 19 Jul 2011 16:18:02 +0200
Subject: Re: Nutch War file

I have solved my problem with this:

$ bin/nutch readseg -get crawl_urls/segments/XXXXXX http://thepagethatyouwanttosee/uri/

On 18/07/11 23:10, Sethi, Parampreet wrote:
> Hey Lewis,
>
> Thanks for the quick reply. I have set up Nutch with Solr and I am able to
> index the documents in the Solr server. How can I check the downloaded HTML
> content? I need to parse the content to fetch rich snippets data from
> various sites.
>
> Also, I have installed Hadoop separately on my system and was trying to
> integrate Hadoop with Nutch. Is there any tutorial available for this?
>
> Thanks
> Param
> AIM : parampreetsethi
> Blog: http://param-techie.blogspot.com
>
> On 7/18/11 5:01 PM, "lewis john mcgibbney"
> wrote:
>
>> Simple answer here is no.
>>
>> Both the web app and the Lucene index, which previously shipped with Nutch,
>> have been deprecated.
>>
>> Please have a look at the new tutorial [1] and the site for more
>> information on the new functionality and features which ship with Nutch 1.3.
>>
>> [1] http://wiki.apache.org/nutch/RunningNutchAndSolr
>>
>> On Mon, Jul 18, 2011 at 9:52 PM, Sethi, Parampreet <
>> parampreet.sethi@teamaol.com> wrote:
>>
>>> Hi All,
>>>
>>> I downloaded the source code for Nutch 1.3. I tried generating a war
>>> file using the command:
>>>
>>> ant war
>>>
>>> But I am getting an error (I checked build.xml; the war target is indeed
>>> missing):
>>>
>>> BUILD FAILED
>>> Target "war" does not exist in the project "Nutch".
>>>
>>> Is there any other way to generate nutch.war in version 1.3?
>>>
>>> Thanks
>>> Param
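The `readseg -get` command in the reply above needs the exact segment directory, which is elided there as `XXXXXX`. When you do not know which segment fetched a given page, a small wrapper can simply try every segment in turn. This is a minimal sketch, not part of Nutch: the function name `dump_url_from_segments` and the `NUTCH_BIN` override are hypothetical, and it assumes the segments sit directly under `<crawl_dir>/segments/`, as with `crawl_urls/segments/` above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Nutch): run
#   bin/nutch readseg -get <segment> <url>
# against every segment of a crawl directory, since the page may be in any of them.
# Assumptions: segments are subdirectories of <crawl_dir>/segments/, and the
# NUTCH_BIN environment variable (hypothetical) may override the bin/nutch path.

dump_url_from_segments() {
  local crawl_dir="$1"   # e.g. crawl_urls
  local url="$2"         # the page whose stored record you want to see
  local nutch="${NUTCH_BIN:-bin/nutch}"
  local seg

  # Both arguments are required; report usage and return 2 otherwise.
  if [ -z "$crawl_dir" ] || [ -z "$url" ]; then
    echo "usage: dump_url_from_segments <crawl_dir> <url>" >&2
    return 2
  fi

  # The trailing slash makes the glob match directories only.
  for seg in "$crawl_dir"/segments/*/; do
    [ -d "$seg" ] || continue   # glob matched nothing: skip the literal pattern
    seg="${seg%/}"              # strip the trailing slash before passing it on
    echo "=== $seg ==="
    "$nutch" readseg -get "$seg" "$url"
  done
}
```

For example, from the Nutch runtime directory: `dump_url_from_segments crawl_urls http://thepagethatyouwanttosee/uri/`. The wrapper passes each segment path and the URL to `readseg -get` unchanged, so whatever that command prints for a segment that does not contain the URL is shown as-is rather than filtered.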