| Uygar Yüzsüren | neko parser or tagsoup parser? | Mon, 03 Jul, 07:27 |
| Piotr Kosiorowski | Re: Nutch web site | Tue, 04 Jul, 15:55 |
| Piotr Kosiorowski | Re: 0.8 release | Tue, 04 Jul, 15:56 |
| Doug Cutting | Re: 0.8 release | Wed, 05 Jul, 10:46 |
| Stefan Groschupf | Re: 0.8 release | Wed, 05 Jul, 18:52 |
| Jérôme Charron | Error with Hadoop-0.4.0 | Thu, 06 Jul, 15:54 |
| Jerome Charron (JIRA) | [jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means | Thu, 06 Jul, 16:49 |
| Sami Siren | Re: Error with Hadoop-0.4.0 | Thu, 06 Jul, 17:23 |
| Jérôme Charron | Re: Error with Hadoop-0.4.0 | Thu, 06 Jul, 21:48 |
| Doug Cutting (JIRA) | [jira] Reopened: (NUTCH-309) Uses commons logging Code Guards | Fri, 07 Jul, 08:59 |
| Jerome Charron (JIRA) | [jira] Commented: (NUTCH-309) Uses commons logging Code Guards | Fri, 07 Jul, 09:27 |
| nutch.newbie (JIRA) | [jira] Commented: (NUTCH-300) Clustering API improvements | Fri, 07 Jul, 10:01 |
| Dawid Weiss (JIRA) | [jira] Commented: (NUTCH-300) Clustering API improvements | Fri, 07 Jul, 13:17 |
| Stefan Groschupf | Re: Error with Hadoop-0.4.0 | Fri, 07 Jul, 15:14 |
| Lourival Júnior | Number of pages different to Indexed documents | Fri, 07 Jul, 17:02 |
| Teruhiko Kurosaka | RE: 0.8 release | Fri, 07 Jul, 20:41 |
| Jérôme Charron | Re: Error with Hadoop-0.4.0 | Fri, 07 Jul, 23:08 |
| Stefan Groschupf | Re: Error with Hadoop-0.4.0 | Sat, 08 Jul, 01:37 |
| Syed Kamran Ali | Nutch based directory and crawler based on keyword | Sat, 08 Jul, 14:03 |
| Stefan Neufeind (JIRA) | [jira] Updated: (NUTCH-279) Additions for regex-normalize | Sun, 09 Jul, 15:33 |
| AJ Chen | Crawl error | Mon, 10 Jul, 04:47 |
| Stefan Groschupf | Re: [Nutch-dev] Crawl error | Mon, 10 Jul, 05:36 |
| Stefan Groschupf | Re: Nutch based directory and crawler based on keyword | Mon, 10 Jul, 05:54 |
| AJ Chen | Re: [Nutch-dev] Crawl error | Mon, 10 Jul, 07:05 |
| Stefan Groschupf | Re: [Nutch-dev] Crawl error | Mon, 10 Jul, 07:08 |
| Andrzej Bialecki | Re: Error with Hadoop-0.4.0 | Mon, 10 Jul, 07:37 |
| Andrzej Bialecki | Re: Error with Hadoop-0.4.0 | Mon, 10 Jul, 07:42 |
| Doug Cutting | Re: Error with Hadoop-0.4.0 | Mon, 10 Jul, 08:11 |
| Gal Nitzan | RE: Error with Hadoop-0.4.0 | Mon, 10 Jul, 09:06 |
| Stefan Groschupf (JIRA) | [jira] Created: (NUTCH-318) log4j not proper configured, readdb doesnt give any information | Mon, 10 Jul, 19:07 |
| Mark Wilkerson | Opportunities at Oracle Corporation - Oracle Enterprise Search | Tue, 11 Jul, 05:42 |
| Sami Siren (JIRA) | [jira] Resolved: (NUTCH-172) Segment merger | Tue, 11 Jul, 21:02 |
| Stefan Neufeind | Know about Xapian-features | Tue, 11 Jul, 23:24 |
| Stefan Neufeind | Simultaneous update/search? | Tue, 11 Jul, 23:36 |
| Sami Siren | Re: Error with Hadoop-0.4.0 | Wed, 12 Jul, 06:31 |
| Sami Siren | Re: Error with Hadoop-0.4.0 | Wed, 12 Jul, 07:06 |
| Doug Cutting | Re: Error with Hadoop-0.4.0 | Wed, 12 Jul, 08:17 |
| Stefan Neufeind | Basic character-cleanups easily possible? | Wed, 12 Jul, 22:23 |
| ogjunk-nu...@yahoo.com | Re: Basic character-cleanups easily possible? | Thu, 13 Jul, 00:11 |
| Enrico Triolo | Re: Possible memory leak? | Thu, 13 Jul, 11:29 |
| Sami Siren | Re: Possible memory leak? | Thu, 13 Jul, 11:35 |
| Stefan Groschupf | OPICScoringFilter & Metadata transport scores as String | Sat, 15 Jul, 22:36 |
| Stefan Groschupf (JIRA) | [jira] Created: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace | Sat, 15 Jul, 22:45 |
| William Surowiec | Possible problem in WebAppModule | Mon, 17 Jul, 02:40 |
| Sami Siren (JIRA) | [jira] Created: (NUTCH-320) DmozParser does not output urls to stdout | Mon, 17 Jul, 06:53 |
| Sami Siren (JIRA) | [jira] Resolved: (NUTCH-320) DmozParser does not output urls to stdout | Mon, 17 Jul, 06:55 |
| William Surowiec | [Re: Possible problem in WebAppModule] | Mon, 17 Jul, 11:48 |
| Sami Siren | Re: Possible problem in WebAppModule | Mon, 17 Jul, 12:52 |
| William Surowiec | Re: Possible problem in WebAppModule | Mon, 17 Jul, 13:11 |
| Sudhi Seshachala | Vertical Search (Nutch) for Opensource Jobs- http://www.myopensourcejobs.com | Mon, 17 Jul, 13:21 |
| Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-321) Scoring API deficiency | Mon, 17 Jul, 13:53 |
| Andrzej Bialecki (JIRA) | [jira] Updated: (NUTCH-321) Scoring API deficiency | Mon, 17 Jul, 14:08 |
| Kerry Wilson | Windows BAT | Mon, 17 Jul, 14:18 |
| Jukka Zitting | Library for extracting text content from binaries | Mon, 17 Jul, 21:59 |
| William Surowiec | Re: Vertical Search (Nutch) for Opensource Jobs- http://www.myopensourcejobs.com | Tue, 18 Jul, 13:20 |
| Sami Siren (JIRA) | [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt | Tue, 18 Jul, 19:51 |
| Stefan Groschupf | db.max.inlinks | Tue, 18 Jul, 23:00 |
| Andrzej Bialecki | Re: db.max.inlinks | Tue, 18 Jul, 23:02 |
| Stefan Groschupf | Re: db.max.inlinks | Tue, 18 Jul, 23:09 |
| Andrzej Bialecki | Re: db.max.inlinks | Tue, 18 Jul, 23:19 |
| Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages | Wed, 19 Jul, 12:11 |
| Chris Stephens | error in recommended plugin example | Wed, 19 Jul, 17:24 |
| Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) | Wed, 19 Jul, 17:34 |
| Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-271) Meta-data per URL/site/section | Wed, 19 Jul, 18:22 |
| Stefan Neufeind (JIRA) | [jira] Commented: (NUTCH-271) Meta-data per URL/site/section | Wed, 19 Jul, 18:54 |
| Sami Siren | Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section | Wed, 19 Jul, 20:12 |
| Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt | Wed, 19 Jul, 20:53 |
| Sami Siren | Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt | Wed, 19 Jul, 21:22 |
| Andrzej Bialecki | Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt | Wed, 19 Jul, 21:32 |
| Stefan Groschupf (JIRA) | [jira] Created: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. | Wed, 19 Jul, 21:39 |
| Stefan Groschupf (JIRA) | [jira] Updated: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. | Wed, 19 Jul, 21:41 |
| Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-293) support for Crawl-delay in Robots.txt | Wed, 19 Jul, 22:06 |
| Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. | Wed, 19 Jul, 22:33 |
| Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-321) Scoring API deficiency | Wed, 19 Jul, 22:42 |
| Brian M.B. Keaney | Webcrawler | Wed, 19 Jul, 23:20 |
| Stefan Groschupf (JIRA) | [jira] Created: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored | Wed, 19 Jul, 23:48 |
| Stefan Groschupf (JIRA) | [jira] Updated: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored | Wed, 19 Jul, 23:54 |
| Stefan Groschupf (JIRA) | [jira] Resolved: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace | Wed, 19 Jul, 23:56 |
| Enrico Triolo (JIRA) | [jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages | Thu, 20 Jul, 09:50 |
| Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages | Thu, 20 Jul, 10:06 |
| Enrico Triolo (JIRA) | [jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages | Thu, 20 Jul, 12:31 |
| Stefan Groschupf | nutch-extensionpoints not in plugin.includes | Thu, 20 Jul, 20:09 |
| Andrzej Bialecki | Re: nutch-extensionpoints not in plugin.includes | Thu, 20 Jul, 20:56 |
| Stefan Groschupf | Re: nutch-extensionpoints not in plugin.includes | Thu, 20 Jul, 21:40 |
| Stefan Groschupf (JIRA) | [jira] Created: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes | Thu, 20 Jul, 21:55 |
| Stefan Groschupf (JIRA) | [jira] Updated: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes | Thu, 20 Jul, 21:57 |
| Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages | Thu, 20 Jul, 22:08 |
| Stefan Neufeind | Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section | Thu, 20 Jul, 22:33 |
| Stefan Groschupf | log when blocked by robots.txt | Thu, 20 Jul, 23:21 |
| Piotr Kosiorowski | Re: log when blocked by robots.txt | Fri, 21 Jul, 06:50 |
| Jack Tang | Distributed Matrix Computering on Hadoop | Fri, 21 Jul, 09:24 |
| Chris Stephens | multiple query filters | Fri, 21 Jul, 16:08 |
| Greg Kim | Changing javac.version to 1.5? | Fri, 21 Jul, 19:44 |
| Tom Jensen (JIRA) | [jira] Created: (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents | Fri, 21 Jul, 21:59 |
| Andrzej Bialecki | Re: Changing javac.version to 1.5? | Sat, 22 Jul, 18:45 |
| Sami Siren | Re: 0.8 release | Sat, 22 Jul, 20:15 |
| Sami Siren (JIRA) | [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb | Sun, 23 Jul, 18:22 |
| Sami Siren (JIRA) | [jira] Created: (NUTCH-327) bin/nutch setting of log path problems on cygwin | Sun, 23 Jul, 18:30 |
| Sami Siren (JIRA) | [jira] Resolved: (NUTCH-327) bin/nutch setting of log path problems on cygwin | Sun, 23 Jul, 18:45 |
| Sami Siren (JIRA) | [jira] Created: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 | Sun, 23 Jul, 18:56 |