Mailing list archives: July 2007

Message list« Previous · 1 · 2 · 3 · 4 · 5 · Next »Thread · Author · Date
Des Sant Dedup: delete from index(es) Tue, 24 Jul, 20:13
charlie w documents fetched but not indexed (Nutch 0.9) Tue, 24 Jul, 22:54
charlie w   Re: documents fetched but not indexed (Nutch 0.9) Wed, 25 Jul, 18:49
DS jha getting document link graph Tue, 24 Jul, 23:17
Brian Whitman   Re: getting document link graph Tue, 24 Jul, 23:20
Enis Soztutar   Re: getting document link graph Wed, 25 Jul, 06:21
Carl Cerecke NullPointerException fetching some sites with temp redirects Tue, 24 Jul, 23:52
Doğacan Güney   Re: NullPointerException fetching some sites with temp redirects Wed, 25 Jul, 06:08
Carl Cerecke     Re: NullPointerException fetching some sites with temp redirects Wed, 25 Jul, 20:48
Carl Cerecke     Re: NullPointerException fetching some sites with temp redirects Wed, 25 Jul, 22:40
Carl Cerecke   Re: NullPointerException fetching some sites with temp redirects Thu, 26 Jul, 23:21
Kai_testing Middleton   Re: NullPointerException fetching some sites with temp redirects Fri, 27 Jul, 00:10
Carl Cerecke     SOLVED? Re: NullPointerException fetching some sites with temp redirects Fri, 27 Jul, 01:41
Carl Cerecke       Re: SOLVED? Re: NullPointerException fetching some sites with temp redirects Fri, 27 Jul, 01:50
Doğacan Güney         Re: SOLVED? Re: NullPointerException fetching some sites with temp redirects Fri, 27 Jul, 05:52
kevin chen Inject error Wed, 25 Jul, 01:54
kevin chen   Re: Inject error Wed, 25 Jul, 02:14
kevin chen How to use automaton-urlfilter.txt Wed, 25 Jul, 02:25
Doğacan Güney   Re: How to use automaton-urlfilter.txt Wed, 25 Jul, 06:05
Anuradha doppalapudi Recrawling is not working in Nutch 0.9 Wed, 25 Jul, 06:48
bikram Nutch error /conf/masters: No such file or directory Wed, 25 Jul, 07:02
bikram   Re: Nutch error /conf/masters: No such file or directory Wed, 25 Jul, 08:27
Luca Rondanini slow generate process Wed, 25 Jul, 09:27
Doğacan Güney   Re: slow generate process Wed, 25 Jul, 11:00
Luca Rondanini     Re: slow generate process Wed, 25 Jul, 11:14
Emmanuel   Re: slow generate process Wed, 25 Jul, 12:03
Doğacan Güney     Re: slow generate process Wed, 25 Jul, 12:36
Emmanuel   Re: slow generate process Wed, 25 Jul, 12:52
Luca Rondanini     Re: slow generate process Wed, 25 Jul, 16:36
Doğacan Güney       Re: slow generate process Wed, 25 Jul, 17:29
Luca Rondanini         Re: slow generate process Thu, 26 Jul, 13:10
Doğacan Güney           Re: slow generate process Tue, 31 Jul, 07:42
Luca Rondanini             Re: slow generate process Tue, 31 Jul, 11:08
Robert Young Bad version number in .class file when injecting Wed, 25 Jul, 10:09
Robert Young   Re: Bad version number in .class file when injecting Wed, 25 Jul, 10:55
Robert Young Writing ScoringFilter plugins Wed, 25 Jul, 10:35
Emmanuel CrawlDbReader TopN Wed, 25 Jul, 11:50
Andrzej Bialecki   Re: CrawlDbReader TopN Wed, 25 Jul, 15:33
RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?)
Brette_M...@emc.com   RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Wed, 25 Jul, 12:28
Doğacan Güney     Re: RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Wed, 25 Jul, 12:44
Brette_M...@emc.com       RE: RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Wed, 25 Jul, 14:40
Doğacan Güney         Re: RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Wed, 25 Jul, 15:06
Brette_M...@emc.com           RE: RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Wed, 25 Jul, 17:08
Brette_M...@emc.com             RE: RE : Nutch overhead to Lucene (or: why is Nutch 4 times slower than Lucene ?) Thu, 26 Jul, 16:17
feran Point of Note to Windows Users Wed, 25 Jul, 15:13
Susam Pal   Re: Point of Note to Windows Users Thu, 26 Jul, 10:24
Kai_testing Middleton   Re: Point of Note to Windows Users Thu, 26 Jul, 17:18
Susam Pal     Re: Point of Note to Windows Users Thu, 26 Jul, 17:28
DES Lock obtain timed out Wed, 25 Jul, 20:38
Carl Cerecke Redirected-to pages and not-there pages are fetched multiple times Thu, 26 Jul, 04:07
Rüdiger Schulz (SkyGate)   Re: Redirected-to pages and not-there pages are fetched multiple times Thu, 26 Jul, 14:47
Carl Cerecke     Re: Redirected-to pages and not-there pages are fetched multiple times Thu, 26 Jul, 23:17
Kai_testing Middleton   Re: Redirected-to pages and not-there pages are fetched multiple times Fri, 27 Jul, 00:05
Anton Beza Pull out a page from already processed pages, re-parse and replace Thu, 26 Jul, 14:16
Andrzej Bialecki   Re: Pull out a page from already processed pages, re-parse and replace Thu, 26 Jul, 18:12
Anton Beza     Re: Pull out a page from already processed pages, re-parse and replace Fri, 27 Jul, 13:06
DS jha unable to open nutch index using IndexReader Thu, 26 Jul, 16:15
Kai_testing Middleton Multiple Nutch Instances Fri, 27 Jul, 01:04
Kai_testing Middleton DownloadingNutch - svn co nutch nightly Fri, 27 Jul, 03:41
Doğacan Güney   Re: DownloadingNutch - svn co nutch nightly Fri, 27 Jul, 06:00
Matthew A. Bockol eliminating almost duplicate URLs Fri, 27 Jul, 03:58
Kai_testing Middleton   Re: eliminating almost duplicate URLs Fri, 27 Jul, 05:27
Matthew A. Bockol     Re: eliminating almost duplicate URLs Mon, 30 Jul, 14:16
Doğacan Güney       Re: eliminating almost duplicate URLs Mon, 30 Jul, 14:54
Blaž Smolnikar Pages in UTF-16 Fri, 27 Jul, 06:32
Dmitry   search music, pdf files - configuration Fri, 27 Jul, 06:55
Susam Pal     Re: search music, pdf files - configuration Fri, 27 Jul, 07:24
Kai_testing Middleton cygwin - Input path doesnt exist Fri, 27 Jul, 06:56
Kai_testing Middleton   Re: cygwin - Input path doesnt exist Fri, 27 Jul, 07:33
Susam Pal     Re: cygwin - Input path doesnt exist Fri, 27 Jul, 07:56
feran   Re: cygwin - Input path doesnt exist Fri, 27 Jul, 13:20
Kai_testing Middleton   Re: cygwin - Input path doesnt exist Fri, 27 Jul, 23:00
feran     Re: cygwin - Input path doesnt exist Sat, 28 Jul, 17:06
Kai_testing Middleton   Re: cygwin - Input path doesnt exist Mon, 30 Jul, 16:24
Kai_testing Middleton cygwin and nightly builds Sat, 28 Jul, 01:17
Le Quoc Anh   Configuration for hadoop (5 computers) Sat, 28 Jul, 02:35
Enzo Michelangeli How to determine the number of pages in the index? Sat, 28 Jul, 09:30
DES   Re: How to determine the number of pages in the index? Sat, 28 Jul, 09:43
Enzo Michelangeli     Re: How to determine the number of pages in the index? Sat, 28 Jul, 10:59
Goethe Problems running crawl with cygwin, JAVA_HOME not set Sat, 28 Jul, 14:31
feran   Re: Problems running crawl with cygwin, JAVA_HOME not set Sat, 28 Jul, 15:50
Goethe     Re: Problems running crawl with cygwin, JAVA_HOME not set Sat, 28 Jul, 20:59
xu xiong online indexing? Sun, 29 Jul, 07:46
Renaud Richardet   Re: online indexing? Sun, 29 Jul, 15:42
Damian Florczyk     Re: online indexing? Mon, 30 Jul, 07:17
Emmanuel Map ouput Sun, 29 Jul, 08:52
Le Quoc Anh   error merger index Sun, 29 Jul, 09:14
Enzo Michelangeli     Re: error merger index Mon, 30 Jul, 00:05
Karsten Dello Fetching HTTPS behind Proxy fails - Patch exists, but is not included in 0.9 Sun, 29 Jul, 15:11
Goethe How do I remove ShowAllHits Mon, 30 Jul, 03:05
LE QuocAnh   Re: How do I remove ShowAllHits Mon, 30 Jul, 09:19
Susam Pal     Re: How do I remove ShowAllHits Mon, 30 Jul, 09:32
Micah Vivion Why does Nutch crawl keep on throwing an exception? Mon, 30 Jul, 08:01
DES   Re: Why does Nutch crawl keep on throwing an exception? Mon, 30 Jul, 18:30
Micah Vivion     Re: Why does Nutch crawl keep on throwing an exception? Mon, 30 Jul, 20:16
DES       Re: Why does Nutch crawl keep on throwing an exception? Mon, 30 Jul, 21:02
Micah Vivion         Re: Why does Nutch crawl keep on throwing an exception? Wed, 01 Aug, 02:09
Emmanuel MergeSegs Mon, 30 Jul, 12:28
Kai_testing Middleton How to create a wiki account for nutch-user Mon, 30 Jul, 22:36
Dmitry   Re: How to create a wiki account for nutch-user Mon, 30 Jul, 23:00
Kai_testing Middleton hung threads - NullPointerException in getPos(FSDataInputStream.java:87) Tue, 31 Jul, 00:40
LE QuocAnh   Re: hung threads - NullPointerException in getPos(FSDataInputStream.java:87) Tue, 31 Jul, 02:04
Dennis Kubes Really big indexing and timeouts? Tue, 31 Jul, 03:39
Doğacan Güney   Re: Really big indexing and timeouts? Tue, 31 Jul, 14:38
Dennis Kubes     Re: Really big indexing and timeouts? Tue, 31 Jul, 17:07
Kursun, Mahmut Error with Nutch 0.9 Tue, 31 Jul, 16:04
John Mendenhall   Re: Error with Nutch 0.9 Tue, 31 Jul, 16:13
charlie w spliting an index Tue, 31 Jul, 17:06
Kursun, Mahmut AW: Error with Nutch 0.9 Tue, 31 Jul, 17:14
John Mendenhall   Re: Error with Nutch 0.9 Tue, 31 Jul, 17:48
Kursun, Mahmut Tomcat without Apache Tue, 31 Jul, 17:21
Martin Kuen   Re: Tomcat without Apache Tue, 31 Jul, 17:32
Kai_testing Middleton   Re: Tomcat without Apache Tue, 31 Jul, 18:33
kevin chen     Re: Tomcat without Apache Wed, 01 Aug, 01:28
NutchBean (and mergecrawl.sh)
Kai_testing Middleton   NutchBean (and mergecrawl.sh) Tue, 31 Jul, 21:25
Kai_testing Middleton   NutchBean (and mergecrawl.sh) Wed, 01 Aug, 01:58