| Doğacan Güney |
Re: ant test failures |
Sat, 01 Sep, 12:48 |
| Doğacan Güney |
Re: bug with generate performance |
Fri, 07 Sep, 07:37 |
| Doğacan Güney |
Re: Limiting outlink tags. |
Fri, 07 Sep, 07:55 |
| Doğacan Güney |
Re: Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 07:43 |
| Doğacan Güney |
Re: Build failed in Hudson: Nutch-Nightly #203 |
Tue, 11 Sep, 10:43 |
| Doğacan Güney |
Re: Scoring API issues (LONG) |
Tue, 18 Sep, 19:40 |
| Doğacan Güney |
Re: Host-level stats, ranking and recrawl |
Tue, 18 Sep, 19:43 |
| Doğacan Güney |
Re: Scoring API issues (LONG) |
Wed, 19 Sep, 06:09 |
| Doğacan Güney |
Re: Problem with trunk HtmlParser.java |
Thu, 27 Sep, 06:50 |
| Doğacan Güney |
Re: Build failed in Hudson: Nutch-Nightly #221 |
Sat, 29 Sep, 11:02 |
| Marcin Okraszewski |
Limiting outlink tags. |
Thu, 06 Sep, 21:09 |
| Marcin Okraszewski |
Re: Parsing extra fields from an html page in the web..... |
Thu, 27 Sep, 19:29 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 07:47 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 07:49 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Mon, 03 Sep, 07:53 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Mon, 03 Sep, 13:38 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 11:42 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Tue, 04 Sep, 11:42 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Tue, 04 Sep, 12:32 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Wed, 05 Sep, 15:07 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-251) Administration GUI |
Wed, 05 Sep, 18:09 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-546) file URL are filtered out by the crawler |
Thu, 06 Sep, 12:56 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb |
Thu, 06 Sep, 13:24 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-524) Generate Problem with Single Node |
Thu, 06 Sep, 13:26 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Fri, 07 Sep, 08:29 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Fri, 07 Sep, 08:29 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-549) Bug |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Mon, 10 Sep, 19:41 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-546) file URL are filtered out by the crawler |
Mon, 10 Sep, 19:47 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-491) dedup fails with ArrayIndexOutOfBoundsException |
Mon, 10 Sep, 19:49 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Mon, 10 Sep, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Mon, 10 Sep, 20:00 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 10 Sep, 20:44 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-551) performance for generate is often really bad |
Wed, 12 Sep, 06:24 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Thu, 20 Sep, 14:06 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list |
Fri, 21 Sep, 11:27 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Fri, 21 Sep, 11:38 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication |
Fri, 21 Sep, 11:51 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Mon, 24 Sep, 08:28 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Mon, 24 Sep, 08:28 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Wed, 26 Sep, 07:55 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-25) needs 'character encoding' detector |
Wed, 26 Sep, 14:06 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-487) Neko HTML parser goes on default settings. |
Wed, 26 Sep, 14:06 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. |
Wed, 26 Sep, 14:08 |
| Alexis Votta (JIRA) |
[jira] Commented: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication |
Tue, 25 Sep, 17:26 |
| Alexis Votta (JIRA) |
[jira] Created: (NUTCH-561) HttpClient plugin does not work with NTLM authentication |
Tue, 25 Sep, 17:28 |
| Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 18:20 |
| Andrzej Bialecki |
Re: bug with generate performance |
Fri, 07 Sep, 10:50 |
| Andrzej Bialecki |
GoogleMini URL rewriting |
Tue, 11 Sep, 20:01 |
| Andrzej Bialecki |
Scoring API issues (LONG) |
Thu, 13 Sep, 15:44 |
| Andrzej Bialecki |
Host-level stats, ranking and recrawl |
Mon, 17 Sep, 19:38 |
| Andrzej Bialecki |
Re: Scoring API issues (LONG) |
Tue, 18 Sep, 20:12 |
| Andrzej Bialecki |
Re: Scoring API issues (LONG) |
Wed, 19 Sep, 09:50 |
| Andrzej Bialecki |
Re: NUTCH-251(Administration gui) and next version |
Thu, 20 Sep, 19:33 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 03 Sep, 18:14 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb |
Thu, 06 Sep, 17:38 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm |
Mon, 10 Sep, 20:25 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-552) Upgrade Nutch to Hadoop 0.14.x |
Thu, 13 Sep, 16:09 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-553) Add more normalization rules to regex-normalize file. |
Thu, 13 Sep, 16:41 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Tue, 18 Sep, 19:08 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Tue, 18 Sep, 19:10 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication |
Fri, 21 Sep, 18:30 |
| Balachanthar |
Blank result page |
Fri, 21 Sep, 06:29 |
| Brian Whitman |
Re: nutch trunk filtering URLs in invertlinks even if -noFilter is on? |
Sun, 23 Sep, 15:38 |
| Brian Whitman |
Re: nutch trunk filtering URLs in invertlinks even if -noFilter is on? |
Sun, 23 Sep, 15:43 |
| Brian Whitman (JIRA) |
[jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable |
Fri, 14 Sep, 22:47 |
| Brian Whitman (JIRA) |
[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog |
Fri, 14 Sep, 23:34 |
| Brian Whitman (JIRA) |
[jira] Created: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Sat, 15 Sep, 15:16 |
| Brian Whitman (JIRA) |
[jira] Updated: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Mon, 17 Sep, 18:20 |
| Chris A. Mattmann (JIRA) |
[jira] Created: (NUTCH-562) Port mime type framework to use Tika mime detection framework |
Sat, 29 Sep, 04:36 |
| Chris A. Mattmann (JIRA) |
[jira] Work started: (NUTCH-562) Port mime type framework to use Tika mime detection framework |
Sat, 29 Sep, 04:36 |
| Chris Schneider |
Re: Host-level stats, ranking and recrawl |
Wed, 19 Sep, 16:02 |
| Chris Schneider (JIRA) |
[jira] Created: (NUTCH-558) Need tool to retrieve domain statistics |
Wed, 19 Sep, 23:52 |
| Chris Schneider (JIRA) |
[jira] Work started: (NUTCH-558) Need tool to retrieve domain statistics |
Fri, 21 Sep, 18:30 |
| Chris Schneider (JIRA) |
[jira] Updated: (NUTCH-558) Need tool to retrieve domain statistics |
Sat, 22 Sep, 21:59 |
| Chris Schneider (JIRA) |
[jira] Commented: (NUTCH-558) Need tool to retrieve domain statistics |
Sun, 23 Sep, 16:25 |
| Chris Schneider (JIRA) |
[jira] Commented: (NUTCH-558) Need tool to retrieve domain statistics |
Thu, 27 Sep, 15:20 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Mon, 03 Sep, 08:27 |
| Emmanuel Joke (JIRA) |
[jira] Closed: (NUTCH-526) Use a combiner in LinDbMerger to improve the performance as in LinkDb |
Tue, 04 Sep, 03:36 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format |
Tue, 04 Sep, 07:16 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Tue, 04 Sep, 08:46 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:34 |
| Emmanuel Joke (JIRA) |
[jira] Created: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:34 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 10:38 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Tue, 04 Sep, 15:30 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat |
Thu, 06 Sep, 16:32 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Tue, 11 Sep, 11:30 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication |
Wed, 19 Sep, 10:50 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Fri, 21 Sep, 16:03 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Fri, 21 Sep, 16:05 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Fri, 21 Sep, 16:05 |
| Enis Soztutar (JIRA) |
[jira] Commented: (NUTCH-558) Need tool to retrieve domain statistics |
Thu, 27 Sep, 08:15 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time |
Tue, 04 Sep, 17:00 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Tue, 11 Sep, 06:39 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1 |
Tue, 11 Sep, 06:39 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler |
Wed, 12 Sep, 04:22 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol |
Wed, 19 Sep, 05:09 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. |
Tue, 25 Sep, 04:18 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. |
Thu, 27 Sep, 17:38 |