nutch-user mailing list archives: June 2012

Message list, page 1 of 2 (threaded view)
Re: Setting the Fetch time with a CustomFetchSchedule
Vikas Hazrati   Re: Setting the Fetch time with a CustomFetchSchedule Fri, 01 Jun, 17:46
Re: [VOTE] Apache Nutch 1.5 release-1.5RC4
Mattmann, Chris A (388J)   Re: [VOTE] Apache Nutch 1.5 release-1.5RC4 Sat, 02 Jun, 05:11
Lewis John Mcgibbney     [RESULT] [VOTE] Apache Nutch 1.5 release-1.5RC4 Thu, 07 Jun, 11:56
pepe3059 threads disminution when fetching page Sat, 02 Jun, 20:14
Markus Jelsma   RE: threads disminution when fetching page Mon, 04 Jun, 10:46
pepe3059     RE: threads disminution when fetching page Mon, 04 Jun, 18:41
Markus Jelsma       RE: threads disminution when fetching page Mon, 04 Jun, 21:07
pepe3059         RE: threads disminution when fetching page Wed, 06 Jun, 00:57
Markus Jelsma           RE: threads disminution when fetching page Wed, 06 Jun, 08:10
Shameema Umer How to configure nutch to fetch only recent documents Mon, 04 Jun, 06:32
Markus Jelsma   RE: How to configure nutch to fetch only recent documents Mon, 04 Jun, 10:43
Shameema Umer     Re: How to configure nutch to fetch only recent documents Mon, 04 Jun, 11:25
Questions about the "hostCount" and related variables in org.apache.nutch.crawl.Generator$Selector::reduce()
Ali Safdar Kureishy   Questions about the "hostCount" and related variables in org.apache.nutch.crawl.Generator$Selector::reduce() Mon, 04 Jun, 20:19
Re: "nutch-site.xml" not robust
Andy Xue   Re: "nutch-site.xml" not robust Wed, 06 Jun, 02:53
Lewis John Mcgibbney     Re: "nutch-site.xml" not robust Thu, 07 Jun, 11:28
Andy Xue       Re: "nutch-site.xml" not robust Sat, 09 Jun, 01:42
Andy Xue         Re: "nutch-site.xml" not robust Tue, 12 Jun, 06:25
Lewis John Mcgibbney           Re: "nutch-site.xml" not robust Tue, 12 Jun, 22:03
Andy Xue Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Wed, 06 Jun, 03:03
Markus Jelsma   RE: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Wed, 06 Jun, 08:05
Andy Xue     Re: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Wed, 06 Jun, 09:10
Markus Jelsma       RE: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Wed, 06 Jun, 09:16
Lewis John Mcgibbney         Re: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Thu, 07 Jun, 11:24
Andy Xue           Re: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Tue, 12 Jun, 06:13
Sebastian Nagel       Re: Behaviour of "urlfilter-suffix" plug-in when dealing with a URL without filename extension Tue, 12 Jun, 21:32
chethan Nutch topN selection Wed, 06 Jun, 03:11
Markus Jelsma   RE: Nutch topN selection Wed, 06 Jun, 08:04
chethan     Re: Nutch topN selection Wed, 06 Jun, 08:15
Matthias Paul Linkdb empty Wed, 06 Jun, 07:46
Markus Jelsma   RE: Linkdb empty Wed, 06 Jun, 08:02
Matthias Paul     Re: Linkdb empty Wed, 06 Jun, 09:48
Markus Jelsma       RE: Linkdb empty Wed, 06 Jun, 09:59
Shameema Umer How to write complex rules on regex-urlfilter Wed, 06 Jun, 11:01
Markus Jelsma   RE: How to write complex rules on regex-urlfilter Wed, 06 Jun, 11:06
Shameema Umer     Re: How to write complex rules on regex-urlfilter Wed, 06 Jun, 11:16
SebaZ HTTP REFERER is missing Wed, 06 Jun, 11:36
Markus Jelsma   RE: HTTP REFERER is missing Wed, 06 Jun, 12:29
SebaZ     RE: HTTP REFERER is missing Wed, 20 Jun, 14:00
Markus Jelsma       RE: HTTP REFERER is missing Wed, 20 Jun, 22:48
SebaZ         RE: HTTP REFERER is missing Thu, 21 Jun, 08:38
Julien Nioche       Re: HTTP REFERER is missing Thu, 21 Jun, 07:14
SebaZ         Re: HTTP REFERER is missing Thu, 21 Jun, 11:13
Julien Nioche           Re: HTTP REFERER is missing Thu, 21 Jun, 11:49
SebaZ             Re: HTTP REFERER is missing Fri, 22 Jun, 07:57
kaveh minooie         getting reports from nutch Fri, 22 Jun, 07:03
Markus Jelsma           RE: getting reports from nutch Fri, 22 Jun, 07:20
Lewis John Mcgibbney             Re: getting reports from nutch Fri, 22 Jun, 08:28
SebaZ     RE: HTTP REFERER is missing Mon, 25 Jun, 09:28
Markus Jelsma       RE: HTTP REFERER is missing Mon, 25 Jun, 09:36
Ing. Eyeris Rodriguez Rueda how to crawl a specific time Wed, 06 Jun, 16:28
Shameema Umer can nutch crawl links in rss feed? Wed, 06 Jun, 17:14
Rémy Amouroux   Re: can nutch crawl links in rss feed? Wed, 06 Jun, 19:40
Shameema Umer     Re: can nutch crawl links in rss feed? Thu, 07 Jun, 10:14
Shameema Umer recrawl a certain site Thu, 07 Jun, 10:09
David MISTRETTA   Re: recrawl a certain site Thu, 07 Jun, 10:14
Lewis John Mcgibbney   Re: recrawl a certain site Thu, 07 Jun, 11:05
Shameema Umer     Re: recrawl a certain site Thu, 07 Jun, 16:37
Lewis John Mcgibbney       Re: recrawl a certain site Thu, 07 Jun, 16:41
Shameema Umer publishedDate and feed plugin Thu, 07 Jun, 10:41
Lewis John Mcgibbney   Re: publishedDate and feed plugin Thu, 07 Jun, 11:02
Shameema Umer     Re: publishedDate and feed plugin Fri, 08 Jun, 04:07
Lewis John Mcgibbney       Re: publishedDate and feed plugin Fri, 08 Jun, 13:18
Shameema Umer         Re: publishedDate and feed plugin Fri, 08 Jun, 17:32
Lewis John Mcgibbney           Re: publishedDate and feed plugin Sat, 09 Jun, 08:04
Shameema Umer             Re: publishedDate and feed plugin Sat, 09 Jun, 10:43
Shameema Umer               Re: publishedDate and feed plugin Wed, 13 Jun, 12:52
Shameema Umer                 Re: publishedDate and feed plugin Wed, 13 Jun, 12:58
Shameema Umer                   Re: publishedDate and feed plugin Thu, 14 Jun, 06:04
Lewis John Mcgibbney                     Re: publishedDate and feed plugin Thu, 14 Jun, 12:41
Shameema Umer                       Re: publishedDate and feed plugin Sat, 16 Jun, 10:11
david Utilisateurs Français (French users) Thu, 07 Jun, 10:43
chethan robots.txt UnknownHostException Thu, 07 Jun, 14:15
Markus Jelsma   RE: robots.txt UnknownHostException Thu, 07 Jun, 14:19
chethan     Re: robots.txt UnknownHostException Thu, 07 Jun, 14:28
Markus Jelsma       RE: robots.txt UnknownHostException Thu, 07 Jun, 14:37
Chethan Prasad   RE: robots.txt UnknownHostException Thu, 07 Jun, 14:48
Markus Jelsma     RE: robots.txt UnknownHostException Thu, 07 Jun, 14:57
chethan       Re: robots.txt UnknownHostException Thu, 07 Jun, 16:54
lewis john mcgibbney [ANNOUNCE] Apache Nutch 1.5 Released Thu, 07 Jun, 16:52
Markus Jelsma   RE: [ANNOUNCE] Apache Nutch 1.5 Released Thu, 07 Jun, 17:10
Julien Nioche   Re: [ANNOUNCE] Apache Nutch 1.5 Released Fri, 08 Jun, 08:22
Mattmann, Chris A (388J)     Re: [ANNOUNCE] Apache Nutch 1.5 Released Fri, 08 Jun, 15:02
Emre Çelikten Building Lucene index with Nutch 1.4 Thu, 07 Jun, 20:23
Markus Jelsma   RE: Building Lucene index with Nutch 1.4 Thu, 07 Jun, 20:27
Emre Çelikten     Re: Building Lucene index with Nutch 1.4 Thu, 07 Jun, 21:33
Emre Çelikten       Re: Building Lucene index with Nutch 1.4 Fri, 08 Jun, 03:22
Lewis John Mcgibbney         Re: Building Lucene index with Nutch 1.4 Fri, 08 Jun, 11:00
Emre Çelikten           Re: Building Lucene index with Nutch 1.4 Fri, 08 Jun, 15:42
Lewis John Mcgibbney             Re: Building Lucene index with Nutch 1.4 Sat, 09 Jun, 08:07
Shameema Umer not crawling external links Fri, 08 Jun, 06:39
Re: URL filtering and normalization
Bai Shen   Re: URL filtering and normalization Fri, 08 Jun, 13:53
Matthias Paul     Re: URL filtering and normalization Mon, 11 Jun, 06:55
Bai Shen       Re: URL filtering and normalization Mon, 11 Jun, 14:15
Bai Shen   Re: URL filtering and normalization Mon, 11 Jun, 14:19
remi tassing     Re: URL filtering and normalization Mon, 11 Jun, 22:50
lewis john mcgibbney VOTE Apache Nutch 2.0 RC1 Fri, 08 Jun, 14:49
abhishek tiwari Nutch hadoop integration Sat, 09 Jun, 15:36
Emre Çelikten   Re: Nutch hadoop integration Sat, 09 Jun, 19:02
abhishek tiwari     Re: Nutch hadoop integration Mon, 11 Jun, 05:30
Bharat Goyal       Re: Nutch hadoop integration Tue, 12 Jun, 08:43
Lewis John Mcgibbney         Re: Nutch hadoop integration Tue, 12 Jun, 12:13
chethan   Re: Nutch hadoop integration Mon, 11 Jun, 06:23
remi tassing Compilation of core classes Sun, 10 Jun, 09:35
Julien Nioche   Re: Compilation of core classes Sun, 10 Jun, 14:42
remi tassing     Re: Compilation of core classes Sat, 30 Jun, 12:37
sidbatra Nutch Parse Step Bafflingly Slow in Reduce Step [with example] Sun, 10 Jun, 22:17
Ali Safdar Kureishy Merging crawldbs and linkdbs during incremental crawl Mon, 11 Jun, 05:10
Ali Safdar Kureishy   Re: Merging crawldbs and linkdbs during incremental crawl Tue, 12 Jun, 08:40
Andy Xue Generator: 0 records selected for fetching, exiting ... Mon, 11 Jun, 08:59
Markus Jelsma   RE: Generator: 0 records selected for fetching, exiting ... Mon, 11 Jun, 09:22
Andy Xue     Re: Generator: 0 records selected for fetching, exiting ... Mon, 11 Jun, 09:42
Matthias Paul disable filtering and normalization in the crawl-tool Mon, 11 Jun, 10:08
remi tassing   Re: disable filtering and normalization in the crawl-tool Mon, 11 Jun, 22:52
Matthias Paul     Re: disable filtering and normalization in the crawl-tool Tue, 12 Jun, 13:56
Emre Çelikten Making the crawler follow a regular expression Mon, 11 Jun, 17:39
Lewis John Mcgibbney   Re: Making the crawler follow a regular expression Mon, 11 Jun, 19:08
Rémy Amouroux     Re: Making the crawler follow a regular expression Mon, 11 Jun, 19:35
Emre Çelikten       Re: Making the crawler follow a regular expression Wed, 13 Jun, 00:24
Sandeep C R Getting seed url Mon, 11 Jun, 18:09
Sebastian Nagel   Re: Getting seed url Mon, 11 Jun, 21:45
remi tassing     Re: Getting seed url Mon, 11 Jun, 22:45
Julien Nioche     Re: Getting seed url Tue, 12 Jun, 13:41
Julien Nioche       Re: Getting seed url Tue, 12 Jun, 13:42
Sebastian Nagel         Re: Getting seed url Tue, 12 Jun, 18:53
kaveh minooie           very long fetch reduce task Wed, 13 Jun, 00:31
Lewis John Mcgibbney             Re: very long fetch reduce task Wed, 13 Jun, 10:40
Ferdy Galema               Re: very long fetch reduce task Wed, 13 Jun, 11:36
Julien Nioche                 Re: very long fetch reduce task Wed, 13 Jun, 14:36
Ferdy Galema                   Re: very long fetch reduce task Wed, 13 Jun, 14:43
kaveh minooie                     Re: very long fetch reduce task Wed, 13 Jun, 17:27
Markus Jelsma                       RE: very long fetch reduce task Wed, 13 Jun, 17:33
kaveh minooie                         Re: very long fetch reduce task Wed, 13 Jun, 22:33
Re: ParseSegment taking a long time to finish
sidbatra   Re: ParseSegment taking a long time to finish Tue, 12 Jun, 00:36
Ali Safdar Kureishy How to ensure even distribution of the fetch phase across Hadoop nodes Tue, 12 Jun, 09:15
Lewis John Mcgibbney   Re: How to ensure even distribution of the fetch phase across Hadoop nodes Tue, 12 Jun, 12:06
Julien Nioche     Re: How to ensure even distribution of the fetch phase across Hadoop nodes Tue, 12 Jun, 13:56
Ali Safdar Kureishy       Re: How to ensure even distribution of the fetch phase across Hadoop nodes Tue, 12 Jun, 20:57
Ali Safdar Kureishy         Re: How to ensure even distribution of the fetch phase across Hadoop nodes Wed, 13 Jun, 12:52
Julien Nioche           Re: How to ensure even distribution of the fetch phase across Hadoop nodes Wed, 13 Jun, 14:33
Ali Safdar Kureishy             Re: How to ensure even distribution of the fetch phase across Hadoop nodes Wed, 13 Jun, 16:43
Ferdy Galema           Re: How to ensure even distribution of the fetch phase across Hadoop nodes Wed, 13 Jun, 14:35
Ali Safdar Kureishy             Re: How to ensure even distribution of the fetch phase across Hadoop nodes Thu, 14 Jun, 02:24
david Nutch name spyder Tue, 12 Jun, 12:36
Sebastian Nagel   Re: Nutch name spyder Tue, 12 Jun, 18:27
david     Re: Nutch name spyder Tue, 12 Jun, 18:43
Vlad Paunescu Nutch as a crawler Tue, 12 Jun, 14:01
Emre Çelikten   Re: Nutch as a crawler Tue, 12 Jun, 14:26
Vlad Paunescu     Re: Nutch as a crawler Fri, 15 Jun, 12:43
parnab kumar Restricting multiple hits from a site Tue, 12 Jun, 14:41
Magnús Skúlason focused crawl extended with user generated content Tue, 12 Jun, 15:56
Lewis John Mcgibbney   Re: focused crawl extended with user generated content Tue, 12 Jun, 21:38
Arkadi.Kosmy...@csiro.au   RE: focused crawl extended with user generated content Wed, 13 Jun, 01:00
Magnús Skúlason     Re: focused crawl extended with user generated content Sat, 16 Jun, 00:17
mhun...@jaydeonlineinc.com Inject using custom score and fetchInterval Tue, 12 Jun, 19:39
parnab kumar Restricting multiple hits from the same site Wed, 13 Jun, 18:09
Ali Safdar Kureishy Feedback on crawl settings Wed, 13 Jun, 19:51
lewis john mcgibbney [VOTE] Apache Nutch 2.0 RC2 Fri, 15 Jun, 12:48
Messages per month
Apr 2014: 79
Mar 2014: 228
Feb 2014: 149
Jan 2014: 109
Dec 2013: 193
Nov 2013: 164
Oct 2013: 207
Sep 2013: 83
Aug 2013: 251
Jul 2013: 362
Jun 2013: 481
May 2013: 215
Apr 2013: 219
Mar 2013: 305
Feb 2013: 350
Jan 2013: 279
Dec 2012: 174
Nov 2012: 309
Oct 2012: 314
Sep 2012: 206
Aug 2012: 387
Jul 2012: 336
Jun 2012: 309
May 2012: 348
Apr 2012: 208
Mar 2012: 235
Feb 2012: 349
Jan 2012: 319
Dec 2011: 319
Nov 2011: 322
Oct 2011: 291
Sep 2011: 305
Aug 2011: 305
Jul 2011: 606
Jun 2011: 283
May 2011: 159
Apr 2011: 178
Mar 2011: 222
Feb 2011: 241
Jan 2011: 236
Dec 2010: 184
Nov 2010: 266
Oct 2010: 240
Sep 2010: 279
Aug 2010: 230
Jul 2010: 204
Jun 2010: 151
May 2010: 173
Apr 2010: 194
Mar 2010: 148
Feb 2010: 136
Jan 2010: 193
Dec 2009: 259
Nov 2009: 308
Oct 2009: 258
Sep 2009: 184
Aug 2009: 199
Jul 2009: 312
Jun 2009: 196
May 2009: 163
Apr 2009: 247
Mar 2009: 408
Feb 2009: 214
Jan 2009: 204
Dec 2008: 249
Nov 2008: 194
Oct 2008: 171
Sep 2008: 269
Aug 2008: 165
Jul 2008: 122
Jun 2008: 243
May 2008: 220
Apr 2008: 294
Mar 2008: 209
Feb 2008: 194
Jan 2008: 284
Dec 2007: 146
Nov 2007: 233
Oct 2007: 268
Sep 2007: 273
Aug 2007: 301
Jul 2007: 339
Jun 2007: 392
May 2007: 242
Apr 2007: 309
Mar 2007: 283
Feb 2007: 188
Jan 2007: 370
Dec 2006: 225
Nov 2006: 160
Oct 2006: 251
Sep 2006: 412
Aug 2006: 450
Jul 2006: 315
Jun 2006: 380
May 2006: 232
Apr 2006: 458
Mar 2006: 659
Feb 2006: 581
Jan 2006: 592
Dec 2005: 430
Nov 2005: 398
Oct 2005: 304
Sep 2005: 404
Aug 2005: 278
Jul 2005: 342
Jun 2005: 216
May 2005: 151
Apr 2005: 220
Mar 2005: 167