| NIDHI MALIK |
nutch internet crawling help |
Fri, 28 Dec, 11:28 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #312 |
Tue, 01 Jan, 04:19 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #313 |
Wed, 02 Jan, 04:24 |
| Emmanuel Joke (JIRA) |
[jira] Created: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation |
Wed, 02 Jan, 08:54 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation |
Wed, 02 Jan, 08:58 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #314 |
Wed, 02 Jan, 16:05 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #315 |
Wed, 02 Jan, 19:38 |
| Frank McCown |
Student contributions |
Wed, 02 Jan, 22:44 |
| jian chen |
Re: Student contributions |
Wed, 02 Jan, 22:49 |
| Chris Mattmann |
Re: Student contributions |
Thu, 03 Jan, 01:43 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #316 |
Thu, 03 Jan, 04:42 |
| Frank McCown |
Re: Student contributions |
Thu, 03 Jan, 15:29 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #317 |
Fri, 04 Jan, 05:44 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Fri, 04 Jan, 07:31 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-580) Remove deprecated hadoop api calls (FS) |
Fri, 04 Jan, 07:37 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-531) Pages with no ContentType cause a Null Pointer exception |
Fri, 04 Jan, 07:57 |
| Emmanuel Joke (JIRA) |
[jira] Issue Comment Edited: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Fri, 04 Jan, 09:55 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS |
Fri, 04 Jan, 11:04 |
| Doğacan Güney (JIRA) |
[jira] Resolved: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Fri, 04 Jan, 19:51 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin |
Fri, 04 Jan, 19:51 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication |
Fri, 04 Jan, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-561) HttpClient plugin does not work with NTLM authentication |
Fri, 04 Jan, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Fri, 04 Jan, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit |
Fri, 04 Jan, 19:53 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation |
Fri, 04 Jan, 19:59 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server |
Sat, 05 Jan, 05:47 |
| Dawid Weiss (JIRA) |
[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup. |
Sat, 05 Jan, 17:38 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup. |
Sat, 05 Jan, 23:02 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #319 |
Sun, 06 Jan, 04:27 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #320 |
Mon, 07 Jan, 04:29 |
| Emmanuel Joke (JIRA) |
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation |
Mon, 07 Jan, 07:37 |
| Chris Mattmann |
Tika 0.1-incubating released |
Mon, 07 Jan, 18:00 |
| sudarat (JIRA) |
[jira] Created: (NUTCH-599) nutch crawl and index problem |
Tue, 08 Jan, 01:46 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #321 |
Tue, 08 Jan, 04:46 |
| Susam Pal |
Re: [jira] Created: (NUTCH-599) nutch crawl and index problem |
Tue, 08 Jan, 04:51 |
| Susam Pal |
Re: [jira] Created: (NUTCH-599) nutch crawl and index problem |
Tue, 08 Jan, 04:57 |
| Carl Cerecke (JIRA) |
[jira] Commented: (NUTCH-531) Pages with no ContentType cause a Null Pointer exception |
Tue, 08 Jan, 05:58 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-599) nutch crawl and index problem |
Tue, 08 Jan, 07:44 |
| Jesiel Trevisan |
Problems with Hadoop Log4J on Nutch 0.8.1 |
Tue, 08 Jan, 18:01 |
| sudarat (JIRA) |
[jira] Created: (NUTCH-600) Nutch index problem |
Wed, 09 Jan, 04:54 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #322 |
Wed, 09 Jan, 05:31 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #323 |
Wed, 09 Jan, 20:40 |
| tigger . |
nutch and future |
Thu, 10 Jan, 16:34 |
| Dennis Kubes |
Re: nutch and future |
Thu, 10 Jan, 17:08 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-534) SegmentMerger: add -normalize option |
Fri, 11 Jan, 12:02 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format |
Fri, 11 Jan, 12:04 |
| Doğacan Güney (JIRA) |
[jira] Closed: (NUTCH-600) Nutch index problem |
Fri, 11 Jan, 18:03 |
| viz |
setting number of reduce outputs problem |
Sat, 12 Jan, 00:05 |
| Bryan Bishop |
Plugins? |
Sat, 12 Jan, 01:37 |
| Bryan Bishop |
Re: Plugins? |
Sat, 12 Jan, 01:48 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation |
Sat, 12 Jan, 08:44 |
| Andrzej Bialecki |
Re: setting number of reduce outputs problem |
Sat, 12 Jan, 13:15 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-534) SegmentMerger: add -normalize option |
Tue, 15 Jan, 17:55 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-534) SegmentMerger: add -normalize option |
Tue, 15 Jan, 17:55 |
| Chris Chiappone (JIRA) |
[jira] Commented: (NUTCH-368) Message queueing system |
Tue, 15 Jan, 21:09 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format |
Tue, 15 Jan, 22:03 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format |
Tue, 15 Jan, 22:05 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS |
Tue, 15 Jan, 22:27 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. |
Tue, 15 Jan, 22:39 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. |
Tue, 15 Jan, 22:41 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-594) Serve Nutch search results in XML and JSON |
Tue, 15 Jan, 22:49 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED |
Tue, 15 Jan, 23:01 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point |
Tue, 15 Jan, 23:11 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 01:09 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 01:11 |
| Andrzej Bialecki |
Serious bug in Generator / FreeGenerator |
Wed, 16 Jan, 01:15 |
| iwan cornelius (JIRA) |
[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice |
Wed, 16 Jan, 06:57 |
| Emmanuel Joke (JIRA) |
[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice |
Wed, 16 Jan, 07:27 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. |
Wed, 16 Jan, 08:15 |
| Hudson (JIRA) |
[jira] Commented: (NUTCH-534) SegmentMerger: add -normalize option |
Wed, 16 Jan, 08:15 |
| Doğacan Güney (JIRA) |
[jira] Commented: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 08:43 |
| Ruslan Ermilov (JIRA) |
[jira] Commented: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 16:40 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 16:54 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-584) urls missing from fetchlist |
Wed, 16 Jan, 16:54 |
| Krishnamohan Meduri |
Help: parsing pdf files |
Wed, 16 Jan, 20:31 |
| Martin Kuen |
Re: Help: parsing pdf files |
Thu, 17 Jan, 00:07 |
| Manoj Bist |
Need pointers regarding accessing crawled data/customizing policy for crawl. |
Thu, 17 Jan, 07:32 |
| Andrzej Bialecki |
Re: Need pointers regarding accessing crawled data/customizing policy for crawl. |
Thu, 17 Jan, 09:35 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #331 |
Thu, 17 Jan, 16:34 |
| Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-570) Improvement of URL Ordering in Generator.java |
Thu, 17 Jan, 20:20 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml |
Thu, 17 Jan, 20:28 |
| Andrzej Bialecki (JIRA) |
[jira] Resolved: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small |
Thu, 17 Jan, 20:28 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml |
Thu, 17 Jan, 20:28 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small |
Thu, 17 Jan, 20:28 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-159) Specify temp/working directory for crawl |
Thu, 17 Jan, 20:30 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-159) Specify temp/working directory for crawl |
Thu, 17 Jan, 20:30 |
| Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments |
Thu, 17 Jan, 20:32 |
| Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-95) DeleteDuplicates depends on the order of input segments |
Thu, 17 Jan, 20:32 |
| Andrzej Bialecki |
End-Of-Life status for 0.7.x? |
Thu, 17 Jan, 20:38 |
| Dennis Kubes |
Re: End-Of-Life status for 0.7.x? |
Thu, 17 Jan, 20:49 |
| Yousef Ourabi |
Re: End-Of-Life status for 0.7.x? |
Thu, 17 Jan, 21:18 |
| Chris Mattmann |
Re: End-Of-Life status for 0.7.x? |
Fri, 18 Jan, 00:29 |
| Ahmad Dahlan |
New Developer |
Fri, 18 Jan, 01:53 |
| Sami Siren |
Re: End-Of-Life status for 0.7.x? |
Fri, 18 Jan, 04:22 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #332 |
Fri, 18 Jan, 06:00 |
| Cuong Le Manh |
Re: End-Of-Life status for 0.7.x? |
Fri, 18 Jan, 08:24 |
| Jérôme Charron |
Re: End-Of-Life status for 0.7.x? |
Fri, 18 Jan, 08:25 |
| Doğacan Güney |
Re: End-Of-Life status for 0.7.x? |
Fri, 18 Jan, 09:17 |
| Andrzej Bialecki |
NOTICE: End Of Life status for Nutch 0.7.x |
Fri, 18 Jan, 09:52 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-580) Remove deprecated hadoop api calls (FS) |
Sat, 19 Jan, 09:01 |