Mailing list archives: April 2009

Dennis Kubes Re: Is it possible to avoid Nutch 1.0 from indexing local directories ? Thu, 30 Apr, 13:42
Dennis Kubes Re: How to get the html that i crawled Thu, 30 Apr, 13:46
Dmitry Lihachev Re: how to restrict search result in defined domains? Wed, 22 Apr, 06:45
Fadzi Ushewokunze fetcher issues Mon, 13 Apr, 02:52
Fadzi Ushewokunze Re: fetcher issues Mon, 13 Apr, 03:33
Fadzi Ushewokunze Re: fetcher issues Mon, 13 Apr, 04:23
Felix Zimmermann What means "Ignoring position" using ArcSegmentCreator? Sat, 04 Apr, 10:55
Felix Zimmermann How to index segments after converted from Heritrix ARC-files. Thu, 16 Apr, 20:50
Felix Zimmermann Odd results and broken docs when indexing converted ARC-files. Fri, 17 Apr, 12:47
Felix Zimmermann Odd results and broken docs when indexing converted ARC-files (-> link to gif). Fri, 17 Apr, 12:54
Filipe Antunes Subcollections plugin not working Thu, 09 Apr, 14:49
Filipe Antunes Can't build Nutch Mon, 20 Apr, 10:00
Foss User Why 'crawl' is created in local directory instead of HDFS? Mon, 06 Apr, 18:42
Goddard, Michael J. Re: Can't build Nutch Mon, 20 Apr, 14:21
Gosavi.Shyam Spell checker in nutch 0.9 Fri, 17 Apr, 08:21
Grant Ingersoll Re: ebook resources - including lucene in action Mon, 20 Apr, 16:02
Grease How to ensure that a particular URL is not crawled (ever) again Thu, 16 Apr, 05:41
Hannu Väisänen Nutch can't find all files Fri, 03 Apr, 04:35
Hannu Väisänen Re: Problem with Crawler and Parent Directories Tue, 07 Apr, 04:16
Hannu Väisänen Re: Nutch can't find all files Wed, 08 Apr, 04:52
Hannu Väisänen Re: Nutch can't find all files Thu, 09 Apr, 04:42
Ian.huang Re: how to restrict search result in defined domains? Thu, 23 Apr, 08:50
Ilia chachkhunashvili getting WORDLIST Fri, 17 Apr, 19:35
Ilia chachkhunashvili way to get list of indexed URLS and list of words Mon, 20 Apr, 14:25
Jack Yu Re: nutch/hadoop performance and optimal configuration Fri, 03 Apr, 01:45
Jack Yu Re: nutch/hadoop performance and optimal configuration Fri, 03 Apr, 01:54
Jack Yu Re: nutch-1.0 distribution config problem Fri, 03 Apr, 10:15
Jason Todd Slack-Moehrle Nutch Crawling Questions Mon, 20 Apr, 23:10
Joel Halbert Unable to register IndexingFilter extesion plugin - N 0.9 Mon, 27 Apr, 17:40
Joel Halbert Re: Unable to register IndexingFilter extesion plugin - N 0.9 Tue, 28 Apr, 09:25
Joel Halbert N 0.9 - fetcher.threads.per.host Tue, 28 Apr, 16:34
Joel Halbert N 0.9 - fetcher.threads.per.host Tue, 28 Apr, 16:42
Joel Halbert Re: N 0.9 - fetcher.threads.per.host Tue, 28 Apr, 17:15
Joel Halbert Possible bug in when fetching page relative links after redirects - N 1.0. Wed, 29 Apr, 09:07
Joel Halbert Possible bug in when fetching relative links after a redirect - N 1.0 Wed, 29 Apr, 09:27
John Whelan Sizing Guide? Sat, 11 Apr, 21:46
John Whelan Nutch-based Application for Windows Sat, 18 Apr, 02:44
John Whelan Re: Nutch-based Application for Windows Sun, 19 Apr, 00:07
Julien Nioche Re: Problems with custom field query Wed, 15 Apr, 15:57
Justin Yao Re: crawl_parse keeps growing after re-crawling and segment merging Wed, 01 Apr, 14:38
Justin Yao Re: crawl_parse keeps growing after re-crawling and segment merging Wed, 08 Apr, 21:16
Justin Yao Re: crawl_parse keeps growing after re-crawling and segment merging Wed, 08 Apr, 22:53
Justin Yao Re: crawl_parse keeps growing after re-crawling and segment merging Thu, 09 Apr, 00:28
Justin Yao Re: crawl_parse keeps growing after re-crawling and segment merging Thu, 09 Apr, 01:43
Ken Krugler Re: The Future of Nutch Wed, 01 Apr, 14:42
Ken Krugler Re: Odd results and broken docs when indexing converted ARC-files. Fri, 17 Apr, 23:35
Ken Krugler Re: Can't build Nutch Mon, 20 Apr, 13:02
Ken Krugler Re: Nutch Crawling Questions Tue, 21 Apr, 00:46
Koch Martina AW: Problem with Crawler and Parent Directories Thu, 02 Apr, 15:40
Kunal Wku Multi-Lingual Support in Nutch Mon, 13 Apr, 15:30
Lauren Cooney Re: Seattle / PNW Hadoop + Lucene User Group? Tue, 21 Apr, 01:31
Lukas, Ray RE: ebook resources - including lucene in action Tue, 21 Apr, 11:49
Lukas, Ray Hadoop thread seems to remain alive Wed, 22 Apr, 20:30
Lukas, Ray RE: Hadoop thread seems to remain alive Thu, 23 Apr, 11:32
Lukas, Ray RE: Hadoop thread seems to remain alive Thu, 23 Apr, 13:20
Lukas, Ray RE: Hadoop thread seems to remain alive Thu, 23 Apr, 14:42
Lukas, Ray RE: Hadoop thread seems to remain alive Thu, 23 Apr, 14:47
Lukas, Ray Using nutchBean Thu, 23 Apr, 20:36
Lukas, Ray RE: Using nutchBean Thu, 23 Apr, 21:06
Lukas, Ray RE: Using nutchBean Thu, 23 Apr, 21:45
Lukas, Ray RE: Using nutchBean Thu, 23 Apr, 22:26
Lukas, Ray RE: Hadoop thread seems to remain alive Fri, 24 Apr, 11:54
Lukas, Ray RE: Hadoop thread seems to remain alive Fri, 24 Apr, 12:03
Lukas, Ray RE: Hadoop thread seems to remain alive Sat, 25 Apr, 21:53
Lyndon Maydwell Re: lukeall-0.9.1 to manually add indexes Wed, 01 Apr, 09:58
ML mail Dedup not working any more (Lock obtain timed out) Sun, 19 Apr, 07:53
Marc R. java.nio.charset.IllegalCharsetNameException: Fri, 10 Apr, 00:44
Matthew Hall Re: Seattle / PNW Hadoop + Lucene User Group? Mon, 20 Apr, 14:22
Mayank Kamthan Problem in compiling nutch 0.7 Fri, 03 Apr, 13:54
Mayank Kamthan Problem in generating the war file Mon, 27 Apr, 18:47
Mayank Kamthan Re: Problem in generating the war file Mon, 27 Apr, 21:38
Mayank Kamthan Adding a new class in Nutch and using it in a JSP Mon, 27 Apr, 21:46
MyD URL Scoring Fri, 24 Apr, 08:14
Niraj Aswani Null pointer exception Tue, 14 Apr, 14:18
Niraj Aswani null-pointer exception Tue, 14 Apr, 14:18
Quoi Nghia Chung RE: Seattle / PNW Hadoop + Lucene User Group? Sat, 18 Apr, 15:14
Rahil Baig General queries Thu, 30 Apr, 15:06
Saurabh Bhutyani Re: ebook resources - including lucene in action Mon, 20 Apr, 05:58
Sherjeel Niazi How to resume crawler after crash Thu, 23 Apr, 15:02
Stevan Kovacevic Re: why nutch repeat fetching some pages Wed, 08 Apr, 11:53
Susam Pal Re: hi Kubes:the question about develop environment! Thu, 23 Apr, 13:10
Thorsten Scherler Re: The Future of Nutch Wed, 01 Apr, 00:28
Thorsten Scherler Re: The Future of Nutch Wed, 01 Apr, 00:59
Thorsten Scherler Re: The Future of Nutch Thu, 02 Apr, 12:47
Tushar Jain Re: Seattle / PNW Hadoop + Lucene User Group? Tue, 21 Apr, 06:00
Wolf Fischer Problem with Crawler and Parent Directories Thu, 02 Apr, 15:00
Wolf Fischer Problem with Crawler and Parent Directories Thu, 02 Apr, 15:23
Wolf Fischer Re: AW: Problem with Crawler and Parent Directories Tue, 07 Apr, 06:30
Zanzico Gioele nutch search score Fri, 17 Apr, 09:35
Zanzico Gioele nutch multiple site Fri, 17 Apr, 09:37
alx...@aim.com Re: lukeall-0.9.1 to manually add indexes Wed, 01 Apr, 04:42
alx...@aim.com Re: lukeall-0.9.1 to manually add indexes Wed, 01 Apr, 17:30
alx...@aim.com Re: nutch/hadoop performance and optimal configuration Fri, 03 Apr, 08:08
andy2005cst Re: Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing. Fri, 03 Apr, 09:06
askNutch hi Kubes:the question about develop environment! Wed, 22 Apr, 05:41
askNutch run nutch on eclipse problem? Thu, 23 Apr, 06:24
askNutch Re: hi Kubes:the question about develop environment! Thu, 23 Apr, 06:39
askNutch Re: run nutch on eclipse problem? Thu, 23 Apr, 09:48
brainstorm Re: AW: Nutch Training Seminar Wed, 22 Apr, 10:01
consultas Nutch 1.0 experience Wed, 01 Apr, 19:47