nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser
Date Tue, 13 Feb 2007 14:57:06 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472692 ]

Chris A. Mattmann commented on NUTCH-443:
-----------------------------------------

Hi Nutch Newbie:

I've already contacted Doğacan off-list and am currently testing his patch.
In open source projects, developers typically have day jobs, along with other
commitments that keep them busy. I am no different in this case. Additionally, a
patch such as this one requires *a lot* of testing, since it fundamentally changes
the core Nutch API. I need to test the patch thoroughly before committing anything.
This patch also has its idiosyncrasies, as all patches do (e.g., in some places it
removes the log guards, and I'm not sure why yet; it has whitespace issues, as many
patches do; it removes code in places and then adds it back in others; etc.).
These kinds of things must be addressed before anything is committed to Nutch. Since Doğacan
has taken the lead on making this patch happen (which is great, by the way; thanks, Doğacan!),
I will continue to work with him off-list to get these required updates made.

So, while I'm not there yet, I am working on it. In the meantime, you are welcome to patch
your Nutch system with the existing NUTCH-443 patch that I am working on, and start your development
from there.

Cheers,
 Chris


> allow parsers to return multiple Parse object, this will speed up the rss parser
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-443
>                 URL: https://issues.apache.org/jira/browse/NUTCH-443
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>         Assigned To: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch, NUTCH-443-draft-v3.patch,
>                      NUTCH-443-draft-v4.patch, NUTCH-443-draft-v5.patch, NUTCH-443-draft-v6.patch,
>                      parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff
>
>
> allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser can
> return multiple parse objects, that will all be indexed separately. Advantage: no need to
> fetch all feed-items separately.
> see the discussion at http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html
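
To make the idea in the issue description concrete, here is a minimal Java sketch of a parser that returns one parse per feed item rather than a single parse for the whole feed. It is not taken from any of the attached patches and does not use the real Nutch classes: SimpleParse, FeedItem, and parseFeed are hypothetical stand-ins, and keying the map by each item's URL is an assumption (the issue only says Map<String,Parse>).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParseSketch {

    // Hypothetical stand-in for Nutch's Parse object (the real one carries more data).
    public static final class SimpleParse {
        public final String title;
        public final String text;

        public SimpleParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    // Hypothetical feed item, standing in for whatever an RSS library would produce.
    public static final class FeedItem {
        public final String link;
        public final String title;
        public final String description;

        public FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    // Builds one parse per feed item. Keying by the item URL is an assumption;
    // a LinkedHashMap keeps the items in feed order.
    public static Map<String, SimpleParse> parseFeed(List<FeedItem> items) {
        Map<String, SimpleParse> parses = new LinkedHashMap<>();
        for (FeedItem item : items) {
            parses.put(item.link, new SimpleParse(item.title, item.description));
        }
        return parses;
    }

    public static void main(String[] args) {
        List<FeedItem> items = Arrays.asList(
                new FeedItem("http://example.com/a", "First item", "Text of the first item"),
                new FeedItem("http://example.com/b", "Second item", "Text of the second item"));

        // Each entry could be indexed separately, without fetching the item URLs again.
        for (Map.Entry<String, SimpleParse> entry : parseFeed(items).entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue().title);
        }
    }
}
```

Under these assumptions, a caller would iterate over the returned map and index each entry on its own, which is where the saving described in the issue would come from.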

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

