From: appy74@dsl.pipex.com
To: java-user@lucene.apache.org
Date: Thu, 14 Oct 2010 15:17:43 +0100
Subject: Use of Lucene to store data from RSS feeds

Hello

I would like to store data retrieved hourly from RSS feeds in a database or in Lucene so that the text can be easily indexed for word frequencies. I need to get the text from the title and description elements of RSS items.

Ideally, for each hourly retrieval from a given feed, I would add a row to a table in a dataset made up of the following columns:

feed_url, title_element_text, description_element_text, polling_date_time

From this, I can look up any element in a feed and calculate keyword frequencies over whatever period of time is required.

This can be done with a database table, using hashmaps to calculate the word frequencies. But can I do this in Lucene to this degree of granularity at all? If so, would each feed form a Lucene document, or would each 'row' from the database table form one?

Can anyone advise?

Thanks

Martin O'Shea.
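
For illustration only, the table-and-hashmap approach described in the message could be sketched roughly as below. This does not use Lucene and is not taken from the original post: the class name `FeedWordFrequencies`, the `FeedRow` type, the `wordFrequencies` method, the example feed URL, and the naive tokenisation are all assumptions made for the sketch. Each `FeedRow` mirrors one row of the four-column table, and the counts are restricted to one feed and one time window.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FeedWordFrequencies {

    /** One polled RSS item, mirroring the four columns listed in the message (hypothetical type). */
    static final class FeedRow {
        final String feedUrl;
        final String titleText;
        final String descriptionText;
        final LocalDateTime polledAt;

        FeedRow(String feedUrl, String titleText, String descriptionText, LocalDateTime polledAt) {
            this.feedUrl = feedUrl;
            this.titleText = titleText;
            this.descriptionText = descriptionText;
            this.polledAt = polledAt;
        }
    }

    /**
     * Counts words in the title and description of every row that belongs to
     * the given feed and was polled in the half-open window [from, to).
     */
    static Map<String, Integer> wordFrequencies(List<FeedRow> rows, String feedUrl,
                                                LocalDateTime from, LocalDateTime to) {
        Map<String, Integer> counts = new HashMap<>();
        for (FeedRow row : rows) {
            boolean inWindow = !row.polledAt.isBefore(from) && row.polledAt.isBefore(to);
            if (!row.feedUrl.equals(feedUrl) || !inWindow) {
                continue;
            }
            String text = row.titleText + " " + row.descriptionText;
            // Naive tokenisation (an assumption): lower-case, then split on
            // anything that is not a letter or a digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String feed = "http://example.org/feed"; // placeholder URL
        List<FeedRow> rows = List.of(
            new FeedRow(feed, "Lucene release", "New Lucene index format",
                        LocalDateTime.of(2010, 10, 14, 14, 0)),
            new FeedRow(feed, "Lucene tips", "Older item",
                        LocalDateTime.of(2010, 10, 13, 14, 0)));

        // Word counts for everything polled from this feed on 14 Oct 2010.
        Map<String, Integer> counts = wordFrequencies(rows, feed,
            LocalDateTime.of(2010, 10, 14, 0, 0), LocalDateTime.of(2010, 10, 15, 0, 0));
        System.out.println(counts);
    }
}
```

In this sketch the unit of storage is the individual polled row, not the feed, which is what makes per-window counts possible; whether the same granularity maps onto one Lucene document per row is the question the message leaves open.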