From: "hank williams"
To: nutch-user@lucene.apache.org
Subject: noob wants to know: joining with a relational database result, is it possible?
Date: Thu, 8 Nov 2007 04:42:41 -0500

I want to run searches within a constrained set of URLs, where the set is determined by a MySQL query result. For example, let's say we have a program that maintains a MySQL database of URLs, each with a "name" field. I want to find all the URLs that have "foo" in the name field and "bar" in the text of the web page.

Is there any way to tell Nutch, "Hey! I don't want *all* the results that contain the word 'bar', just the ones within this set of URLs that I am giving you"? Perhaps in this case you would feed Nutch a list of URLs to constrain it, or perhaps there is some other way.

I am not necessarily suggesting the best approach here; I don't have a strategy yet. I am just wondering whether there is a way to marry the worlds of structured database search and Nutch/web search in a kind of "cross-database" join.

Hank
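
To make the idea concrete, here is a minimal sketch of the kind of join described above, done as a post-filter on the search side. It is only an illustration under assumptions: Python's built-in `sqlite3` stands in for MySQL, the `pages` table and its columns are hypothetical, and `hits` is a made-up list standing in for whatever a full-text search for "bar" would return. No actual Nutch API is used.

```python
import sqlite3


def allowed_urls(conn, name_term):
    """Return the set of URLs whose name field contains name_term."""
    pattern = f"%{name_term}%"
    rows = conn.execute("SELECT url FROM pages WHERE name LIKE ?", (pattern,))
    return {url for (url,) in rows}


def constrain_hits(hits, allowed):
    """Keep only hits whose URL is in the allowed set, preserving rank order."""
    return [hit for hit in hits if hit["url"] in allowed]


# Hypothetical database of URLs with a "name" field (sqlite3 in place of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("http://a.example/", "foo site"),
        ("http://b.example/", "other"),
        ("http://c.example/", "more foo"),
    ],
)

# Hypothetical ranked results for the text query "bar" from the search engine.
hits = [
    {"url": "http://a.example/", "score": 0.9},
    {"url": "http://b.example/", "score": 0.8},
    {"url": "http://c.example/", "score": 0.5},
    {"url": "http://d.example/", "score": 0.4},
]

# The "join": database rows matching "foo" intersected with search hits for "bar".
filtered = constrain_hits(hits, allowed_urls(conn, "foo"))
print([hit["url"] for hit in filtered])
```

Filtering after the search like this is probably only reasonable when the URL set and the hit list are both small; for larger sets the constraint would presumably have to be pushed into the search query itself.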