From: Dogacan Güney (JIRA)
To: nutch-dev@lucene.apache.org
Date: Fri, 9 Feb 2007 00:37:05 -0800 (PST)
Subject: [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

    [ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471620 ]

Dogacan Güney commented on NUTCH-443:
-------------------------------------

This is pretty much the merge of our work (except parse-rss, it kept failing on something like RSSContentUtils, so it returns a single parse for now).

I also had a bug in MapWritable, this fixes it.

Since the code now compiles :), I ran junit tests over it. TestFetcher fails for some reason, will look into it.

Also, there is a bug in updatedb. If getParse returns keys different than content.getUrl and if these keys do not have entries in crawl_fetch, CrawlDbReducer will ignore those (assuming [correctly] that they are not fetched and there is no point in processing them). I will look into this too.

> allow parsers to return multiple Parse object, this will speed up the rss parser
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-443
>                 URL: https://issues.apache.org/jira/browse/NUTCH-443
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: NUTCH-443-draft-v1.patch, parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff
>
>
> allow Parser#parse to return a Map. This way, the RSS parser can return multiple parse objects, that will all be indexed separately. Advantage: no need to fetch all feed-items separately.
> see the discussion at http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html
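
For readers following the thread, here is a rough sketch of the two ideas discussed above: a parser that returns one parse per feed item keyed by URL, and the updatedb behavior where keys without a crawl_fetch entry get dropped. Everything in it is hypothetical and for illustration only: the class MultiParseSketch, the methods parseFeed and keepFetched, and the use of plain strings in place of Parse objects are assumptions, not the actual Nutch API or the code in the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: these names are NOT Nutch classes or methods.
public class MultiParseSketch {

    // Hypothetical multi-parse result: URL -> parsed text.
    // The feed itself and every feed item get their own entry.
    static Map<String, String> parseFeed(String feedUrl, Map<String, String> itemTextByUrl) {
        Map<String, String> parses = new LinkedHashMap<>();
        parses.put(feedUrl, "feed with " + itemTextByUrl.size() + " items");
        parses.putAll(itemTextByUrl);
        return parses;
    }

    // Assumed model of the behavior described for updatedb: entries whose
    // key has no matching fetch record are silently ignored.
    static Map<String, String> keepFetched(Map<String, String> parses, Set<String> fetchedUrls) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parses.entrySet()) {
            if (fetchedUrls.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put("http://example.com/item1", "first item text");
        items.put("http://example.com/item2", "second item text");

        String feedUrl = "http://example.com/feed.rss";
        Map<String, String> parses = parseFeed(feedUrl, items);

        // Only the feed URL was actually fetched, so the item parses are lost.
        Map<String, String> kept = keepFetched(parses, Set.of(feedUrl));
        System.out.println("parses returned: " + parses.keySet());
        System.out.println("kept by update step: " + kept.keySet());
    }
}
```

In this toy model the item URLs never appear in the fetched set, so only the feed's own entry survives the update step, which mirrors the problem described for keys that differ from content.getUrl.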