| Nicolás Lichtmaier |
Plugins initialized all the time! |
Mon, 28 May, 20:47 |
| Nicolás Lichtmaier |
Re: Plugins initialized all the time! |
Mon, 28 May, 21:00 |
| Nicolás Lichtmaier |
Making "Hits" work as a normal List |
Thu, 31 May, 20:58 |
| Nicolás Lichtmaier |
[PATCH] Moving HitDetails construction to a constructor =) |
Thu, 31 May, 21:57 |
| Nicolás Lichtmaier |
Re: Plugins initialized all the time! |
Tue, 29 May, 20:39 |
| Nicolás Lichtmaier |
Re: Plugins initialized all the time! |
Tue, 29 May, 21:56 |
| Nicolás Lichtmaier |
Re: Plugins initialized all the time! |
Thu, 31 May, 17:54 |
| Doğacan Güney |
Re: Bug (with fix): Neko HTML parser goes on defaults. |
Mon, 21 May, 13:47 |
| Doğacan Güney |
Re: Plugins initialized all the time! |
Tue, 29 May, 15:50 |
| Doğacan Güney |
Re: Plugins initialized all the time! |
Tue, 29 May, 16:52 |
| Doğacan Güney |
Re: Plugins initialized all the time! |
Wed, 30 May, 06:07 |
| Doğacan Güney |
Re: Plugins initialized all the time! |
Wed, 30 May, 11:47 |
| Doğacan Güney |
Re: Plugins initialized all the time! |
Thu, 31 May, 14:02 |
| Marcin Okraszewski |
Bug (with fix): Neko HTML parser goes on defaults. |
Mon, 21 May, 10:45 |
| Marcin Okraszewski |
Re: Bug (with fix): Neko HTML parser goes on defaults. |
Mon, 21 May, 14:09 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt |
Tue, 01 May, 08:42 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Wed, 09 May, 08:47 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt |
Thu, 10 May, 12:47 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Thu, 10 May, 12:58 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Fri, 11 May, 07:59 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 09:28 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Sun, 13 May, 11:06 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Sun, 13 May, 11:15 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Sun, 13 May, 16:01 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 20:09 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Mon, 14 May, 17:52 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Mon, 14 May, 17:54 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-25) needs 'character encoding' detector |
Mon, 21 May, 20:48 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Tue, 22 May, 09:23 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Wed, 23 May, 06:10 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-491) dedup fails with ArrayIndexOutOfBoundsException |
Thu, 24 May, 11:55 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Tue, 29 May, 12:22 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates |
Thu, 31 May, 08:52 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates |
Thu, 31 May, 08:52 |
| Doğacan Güney (JIRA) |
[jira] Created: (NUTCH-495) Unnecessary delays in Fetcher2 |
Thu, 31 May, 15:49 |
| Doğacan Güney (JIRA) |
[jira] Updated: (NUTCH-495) Unnecessary delays in Fetcher2 |
Thu, 31 May, 15:51 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-466) Flexible segment format |
Thu, 31 May, 19:28 |
| Nicolás Lichtmaier (JIRA) |
[jira] Updated: (NUTCH-479) Support for OR queries |
Wed, 09 May, 20:32 |
| Nicolás Lichtmaier (JIRA) |
[jira] Created: (NUTCH-491) dedup fails with ArrayIndexOutOfBoundsException |
Wed, 23 May, 16:53 |
| Nicolás Lichtmaier (JIRA) |
[jira] Created: (NUTCH-492) java.lang.OutOfMemoryError while indexing. |
Sat, 26 May, 23:42 |
| Ronny Næss (JIRA) |
[jira] Commented: (NUTCH-470) Adding optional terms to a query |
Wed, 09 May, 13:34 |
| Andrzej Bialecki |
Re: SIGSEGV |
Sun, 06 May, 15:00 |
| Andrzej Bialecki |
Re: svn commit: r536606 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/metadata/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/util/ src/plugin/creativecommons/src/test/org/creativecommons/nutch/ src/... |
Wed, 09 May, 18:54 |
| Andrzej Bialecki |
Re: Issues pending before 0.9 release |
Fri, 18 May, 07:17 |
| Andrzej Bialecki |
Re: Get meta name="description" and other meta tags from Content |
Wed, 23 May, 16:54 |
| Andrzej Bialecki |
Re: Plugins initialized all the time! |
Wed, 30 May, 11:01 |
| Andrzej Bialecki |
Re: [jira] Resolved: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content |
Thu, 31 May, 10:18 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-477) Extend URLFilters to support different filtering chains |
Thu, 03 May, 21:53 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains |
Thu, 03 May, 21:55 |
| Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-479) Support for OR queries |
Mon, 07 May, 19:15 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-479) Support for OR queries |
Mon, 07 May, 19:18 |
| Andrzej Bialecki (JIRA) |
[jira] Assigned: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Wed, 09 May, 17:24 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Wed, 09 May, 18:03 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-467) DeleteDuplicate fails if Segment index directory has 0 documents |
Wed, 09 May, 18:05 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-418) Fixes parsing of XHTML (e.g. title) |
Wed, 09 May, 18:40 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-417) After upgrade to hadoop-0.9.1, parsing and indexing doesn't work. |
Wed, 09 May, 18:44 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters |
Wed, 09 May, 18:51 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-393) Indexer doesn't handle null documents returned by filters |
Wed, 09 May, 19:38 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-479) Support for OR queries |
Wed, 09 May, 21:48 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sat, 12 May, 21:55 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Mon, 14 May, 14:52 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Mon, 14 May, 21:58 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-486) Break searcher dependency on commons-cli |
Tue, 15 May, 06:22 |
| Andrzej Bialecki (JIRA) |
[jira] Work started: (NUTCH-466) Flexible segment format |
Mon, 28 May, 09:01 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content |
Wed, 30 May, 18:37 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-466) Flexible segment format |
Thu, 31 May, 18:42 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-486) Break searcher dependency on commons-cli |
Thu, 31 May, 19:01 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-466) Flexible segment format |
Thu, 31 May, 19:55 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-392) OutputFormat implementations should pass on Progressable |
Thu, 31 May, 21:25 |
| Antonio Eggberg (JIRA) |
[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Mon, 07 May, 06:06 |
| Antony Bowesman (JIRA) |
[jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file |
Thu, 10 May, 06:50 |
| Armel T. Nene |
RE: Document Classification - indexing question |
Tue, 08 May, 10:51 |
| Armel T. Nene |
RE: Document Classification - indexing question |
Tue, 08 May, 13:03 |
| Bastian Preindl |
Document Classification - indexing question |
Tue, 08 May, 10:30 |
| Bastian Preindl |
Re: Document Classification - indexing question |
Tue, 08 May, 12:37 |
| Brian Whitman |
SIGSEGV |
Sat, 05 May, 21:59 |
| Brian Whitman |
Re: SIGSEGV |
Sun, 06 May, 17:47 |
| Brian Whitman |
Re: SIGSEGV |
Mon, 07 May, 22:34 |
| Brian Whitman |
Re: SIGSEGV |
Wed, 09 May, 18:10 |
| Briggs |
Re: Plugins initialized all the time! |
Tue, 29 May, 16:07 |
| Briggs |
Re: Plugins initialized all the time! |
Tue, 29 May, 17:16 |
| Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Thu, 10 May, 14:32 |
| Chris A. Mattmann (JIRA) |
[jira] Reopened: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser |
Sun, 13 May, 16:23 |
| Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Sun, 13 May, 16:25 |
| Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Wed, 30 May, 13:55 |
| Chris Mattmann |
Committer |
Wed, 30 May, 13:42 |
| Dennis Kubes |
Re: SIGSEGV |
Sat, 05 May, 22:39 |
| Dennis Kubes |
Re: SIGSEGV |
Mon, 07 May, 13:07 |
| Dennis Kubes |
Re: OutOfMemoryError - Why should the while(1) loop stop? |
Wed, 30 May, 15:38 |
| Dennis Kubes |
Re: How is lib-http plugin called? It is not there in plugins.include! |
Thu, 31 May, 15:32 |
| Doug Cook (JIRA) |
[jira] Commented: (NUTCH-25) needs 'character encoding' detector |
Mon, 21 May, 16:34 |
| Doug Cook (JIRA) |
[jira] Commented: (NUTCH-25) needs 'character encoding' detector |
Tue, 22 May, 22:28 |
| Doug Cutting |
Re: NUTCH-348 and Nutch-0.7.2 |
Thu, 24 May, 16:27 |
| Doug Cutting |
Re: proposal for committer |
Tue, 29 May, 20:45 |
| Emmanuel Joke (JIRA) |
[jira] Created: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list |
Tue, 22 May, 07:38 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list |
Tue, 22 May, 07:40 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list |
Tue, 22 May, 07:42 |
| Emmanuel Joke (JIRA) |
[jira] Created: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Tue, 22 May, 08:35 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Tue, 22 May, 08:37 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters |
Wed, 23 May, 03:37 |