Date: Tue, 10 May 2005 09:44:50 +0800
From: Zhou LiBing
To: user@nutch.org
Subject: Re: [Nutch-general] using nutch just for crawling, not indexing?

Hi,

I have a question about the Nutch crawler: how can I crawl the web starting from one or several specified URLs? I don't want to use the DMOZ data.

On 5/3/05, Jason Manfield wrote:
>
> We would like to use Nutch just for crawling, and then index the crawled
> database into our proprietary datastore/index. How do we go about this? I
> see that nutch is a shell script, so it is possible to just crawl. Once it
> crawls, I suppose the crawled data is dumped into the webdb. Are there
> exposed APIs to extract the data from the webdb?
>
> One more catch: our company is a .NET shop :((, so we would like to use
> C# to read the data of the fetched/crawled pages for further indexing.
>
> Ideas/suggestions?
>
> Any plans to have Nutch for .NET (like dotLucene)?

--
---Letter From your friend Blue at HUST CGCL---
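On the question above of seeding a crawl from specific URLs instead of DMOZ: the intranet-crawl setup described in the Nutch tutorials of this period takes a flat text file of root URLs (one per line) and relies on accept rules in `conf/crawl-urlfilter.txt` of the form `+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/` to keep the crawl on those sites. The sketch below is a hypothetical helper, `build_seed_files`, that produces both pieces from a list of URLs. It assumes that file format and rule style; it is not part of Nutch, and the `https?` in the generated rules is an adaptation of the tutorial pattern, which only lists `http`.

```python
import re
from urllib.parse import urlparse


def build_seed_files(urls):
    """Hypothetical helper: build seed-file text and URL-filter rules.

    Returns (seed_text, filter_lines):
      seed_text    -- one root URL per line, for the flat seed file
      filter_lines -- one accept rule per host, modeled on the tutorial
                      pattern +^http://([a-z0-9]*\\.)*MY.DOMAIN.NAME/
    """
    seeds = []
    hosts = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Only absolute http(s) URLs with a real host make sense as seeds.
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        if url not in seeds:
            seeds.append(url)
        host = parsed.hostname  # already lowercased, port removed
        # Drop a leading "www." so the rule also accepts sibling subdomains.
        if host.startswith("www."):
            host = host[4:]
        if host not in hosts:
            hosts.append(host)
    # re.escape turns "apache.org" into "apache\.org" so dots match literally.
    filter_lines = ["+^https?://([a-z0-9]*\\.)*" + re.escape(h) + "/" for h in hosts]
    return "\n".join(seeds) + "\n", filter_lines
```

The seed text would be written to a file (for example `urls`) and passed to the crawl command, as in the tutorial's `bin/nutch crawl urls -dir crawl.test -depth 3`; the generated rules would replace the `MY.DOMAIN.NAME` line in the filter file. Check both against the documentation for the Nutch version you actually run.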