Subject: RE: Example in Java Please
Date: Mon, 10 Nov 2008 09:15:10 -0600
From: "Lukas, Ray"

There is a really good article at
http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html

It was written a while back by Tom White. Although it is older (the article, not Tom), it is a very good description of Nutch for a beginner. It is worth reading if, like me, you are new to Nutch. I thought I would post it for other newbies.

ray

-----Original Message-----
From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
Sent: Monday, November 10, 2008 9:02 AM
To: nutch-user@lucene.apache.org
Subject: Example in Java Please

If you could, please. I am, as you probably are or have recently been, short on time for my project. I need something very simple: an example that goes to a single URL, parses the pages under it, gathers up all the words (terms), and returns a Lucene index of them, so that I can then ask "do any of the terms I have in mind (terms from my Oracle database) appear in this index, and how many times do they appear?" That is it, very simple.

I would like to use Nutch. I am going through the Nutch source code examples, which require an understanding of Hadoop. I would love to learn it if I had the time, which I do not. So can someone post, or point me to, an example? Sorry to bother you, but time is a problem. I hope that you understand. Thanks.
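
The last step of that request, checking whether given terms appear in a body of text and how many times, can be sketched in plain Java without Nutch, Lucene, or Hadoop. This is only a minimal sketch under stated assumptions: the class `TermCounter`, its methods, and the sample text are hypothetical names made up for illustration, the page text is assumed to be already fetched and stripped of markup, and tokenizing is a naive split on non-alphanumeric characters rather than a Lucene analyzer. It does not crawl anything and does not build a real Lucene index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch or Lucene.
class TermCounter {

    // Lowercase the text, split it on runs of non-letter/non-digit
    // characters, and count how often each resulting term occurs.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Return how many times a term occurred (0 if it never appeared).
    static int occurrences(Map<String, Integer> counts, String term) {
        return counts.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        // Assumption: this stands in for text extracted from crawled pages.
        String pageText = "Nutch builds on Lucene. Lucene indexes terms; Nutch crawls pages.";
        Map<String, Integer> counts = countTerms(pageText);

        // Assumption: these stand in for terms loaded from a database.
        String[] terms = {"lucene", "nutch", "hadoop"};
        for (String term : terms) {
            System.out.println(term + " -> " + occurrences(counts, term));
        }
        // prints:
        // lucene -> 2
        // nutch -> 2
        // hadoop -> 0
    }
}
```

An in-memory map like this only suits small amounts of text. For a real crawl, the counts would come from the index that Nutch and Lucene build.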