nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <>
Subject [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser
Date Fri, 09 Feb 2007 18:38:06 GMT


Chris A. Mattmann commented on NUTCH-443:

Nutch Newbie,

   What exactly do you mean when you mention Apache politics? Feedparser wasn't selected because
it was an Apache sub-project. In fact, that's as far from the truth as possible. I selected
feedparser at the time (in May 2005 or so) because it was the only one of the three RSS reading
APIs (informa, feedparser and rome) that I could figure out. The time that it took me just to
understand rome and informa was far greater than the time that it took me to write the entire
RSS parser using feedparser.

   That said, things may have changed in the past year and a half. Perhaps rome provides an
easier API than feedparser now. Perhaps informa is faster. I'm not exactly sure what the answers
to these and other questions on this subject are. However, before anything is said about feedparser,
it's only fair that the folks who wrote it get to chime in. For that matter, it would probably
be a good idea to contact Kevin Burton, the lead developer of commons-feedparser, and
ask him about its relationship to rome and to other APIs such as StAX, or even informa.


> allow parsers to return multiple Parse object, this will speed up the rss parser
> --------------------------------------------------------------------------------
>                 Key: NUTCH-443
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>         Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch, parse-map-core-draft-v1.patch,
>                      parse-map-core-untested.patch, parsers.diff
>
> allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser can
> return multiple parse objects, that will all be indexed separately. Advantage: no need to
> fetch all feed-items separately.
> see the discussion at
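
For readers skimming the issue, here is a minimal sketch of the idea in the description: one call to a parser yields a Map<String,Parse> with one entry per feed item, keyed by the item's URL, so that each entry can be indexed on its own. The Parse and FeedItem classes below are hypothetical stand-ins written for this illustration. They are not Nutch's actual classes, and none of this is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParseSketch {

    // Hypothetical stand-in for a parse result; it only carries extracted text.
    static final class Parse {
        final String text;

        Parse(String text) {
            this.text = text;
        }
    }

    // Hypothetical feed item: the item's own URL and its description text.
    static final class FeedItem {
        final String url;
        final String description;

        FeedItem(String url, String description) {
            this.url = url;
            this.description = description;
        }
    }

    // Builds one Parse per feed item, keyed by the item's URL.
    // LinkedHashMap keeps the items in feed order.
    static Map<String, Parse> parseFeed(List<FeedItem> items) {
        Map<String, Parse> parses = new LinkedHashMap<>();
        for (FeedItem item : items) {
            parses.put(item.url, new Parse(item.description));
        }
        return parses;
    }

    public static void main(String[] args) {
        List<FeedItem> feed = Arrays.asList(
            new FeedItem("http://example.com/a", "First item"),
            new FeedItem("http://example.com/b", "Second item"));

        // Each entry could be handed to the indexer as a separate document,
        // without fetching the item URLs individually.
        for (Map.Entry<String, Parse> entry : parseFeed(feed).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().text);
        }
    }
}
```

Keying the map by URL is an assumption made here so that each parse can be tied back to a distinct document. Items that share a URL would overwrite each other in this sketch.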

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
