Mailing list archives: May 2007

Enis Soztutar Re: proposal for committer Tue, 29 May, 12:39
Gal Nitzan Site nightly API link is broken Sat, 12 May, 08:00
Gal Nitzan RE: Site nightly API link is broken Sat, 12 May, 08:07
Gal Nitzan proposal for committer Mon, 28 May, 12:32
Gal Nitzan (JIRA) [jira] Created: (NUTCH-484) Nutch Nightly API link is broken in site Sat, 12 May, 09:01
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-484) Nutch Nightly API link is broken in site Sat, 12 May, 09:06
Gal Nitzan (JIRA) [jira] Created: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sat, 12 May, 19:50
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sat, 12 May, 20:00
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sun, 13 May, 06:35
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sun, 13 May, 06:50
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sun, 13 May, 09:47
Gal Nitzan (JIRA) [jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object Sun, 13 May, 21:17
Ilya Vishnevsky bug in SegmentReader Mon, 21 May, 08:42
Ken Krugler (JIRA) [jira] Commented: (NUTCH-25) needs 'character encoding' detector Mon, 21 May, 18:01
Manoharam Reddy how is crawl-urlfilter.txt taken care of? Wed, 09 May, 15:00
Manoharam Reddy OutOfMemoryError - Why should the while(1) loop stop? Wed, 30 May, 14:55
Manoharam Reddy What is parse-oo and why doesn't parsed PDF content show up in cached.jsp ? Thu, 31 May, 07:07
Manoharam Reddy How is lib-http plugin called? It is not there in plugins.include! Thu, 31 May, 07:10
Manoharam Reddy How to create patch? Fri, 01 Jun, 06:12
Marcin Okraszewski Re: running nutch without http proxy Wed, 30 May, 06:03
Marcin Okraszewski (JIRA) [jira] Created: (NUTCH-487) Neko HTML parser goes on default settings. Mon, 21 May, 14:06
Marcin Okraszewski (JIRA) [jira] Updated: (NUTCH-487) Neko HTML parser goes on default settings. Mon, 21 May, 14:06
Marcin Okraszewski (JIRA) [jira] Created: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) Tue, 22 May, 12:18
Marcin Okraszewski (JIRA) [jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) Tue, 22 May, 12:18
Marcin Okraszewski (JIRA) [jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) Tue, 22 May, 12:20
Mark Woon (JIRA) [jira] Created: (NUTCH-486) Break searcher dependency on commons-cli Mon, 14 May, 23:36
Michael McIntosh Will any Nutch/Lucene folks be at the Enterprise Search Summit in week in New York? Fri, 11 May, 15:17
Mike Brzozowski (JIRA) [jira] Commented: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs Thu, 10 May, 16:16
Mike Brzozowski (JIRA) [jira] Updated: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs Thu, 10 May, 16:16
Mike Brzozowski (JIRA) [jira] Updated: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) Thu, 10 May, 16:18
Mike Brzozowski (JIRA) [jira] Commented: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) Thu, 10 May, 16:27
Mike Schwartz Re: [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 Wed, 09 May, 13:36
Mike Schwartz Re: [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 Thu, 10 May, 14:47
Nuther Re: How to install Nutch on Freebsd? Mon, 07 May, 06:59
Otis Gospodnetic IntelliJ & Eclipse Lucene code styles available Wed, 23 May, 06:20
Ravi Chintakunta (JIRA) [jira] Created: (NUTCH-480) Searching multiple indexes with a single nutch instance Tue, 08 May, 01:11
Ravi Chintakunta (JIRA) [jira] Updated: (NUTCH-480) Searching multiple indexes with a single nutch instance Tue, 08 May, 01:13
Sami Siren Re: how is crawl-urlfilter.txt taken care of? Wed, 09 May, 17:58
Sami Siren Re: svn commit: r536606 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/metadata/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/util/ src/plugin/creativecommons/src/test/org/creativecommons/nutch/ src/... Wed, 09 May, 18:21
Sami Siren Re: Site nightly API link is broken Sat, 12 May, 08:04
Sami Siren Re: Site nightly API link is broken Sat, 12 May, 08:31
Sami Siren (JIRA) [jira] Commented: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt Tue, 01 May, 09:03
Sami Siren (JIRA) [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 Wed, 09 May, 16:39
Sami Siren (JIRA) [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 Wed, 09 May, 16:55
Sami Siren (JIRA) [jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains Wed, 09 May, 17:16
Sami Siren (JIRA) [jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file Wed, 09 May, 17:20
Sami Siren (JIRA) [jira] Commented: (NUTCH-476) Would like to add a field to the document class for its MD5 signature Wed, 09 May, 17:42
Sami Siren (JIRA) [jira] Resolved: (NUTCH-456) parse msexcel plugin speedup Thu, 10 May, 16:16
Sami Siren (JIRA) [jira] Assigned: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt Thu, 10 May, 16:18
Sami Siren (JIRA) [jira] Resolved: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt Thu, 10 May, 16:32
Sami Siren (JIRA) [jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file Sat, 12 May, 05:28
Sami Siren (JIRA) [jira] Created: (NUTCH-482) Remove redundant plugin lib-log4j Sat, 12 May, 07:54
Sami Siren (JIRA) [jira] Created: (NUTCH-483) remove redundant commons-logging jar from ontology plugin Sat, 12 May, 07:56
Sami Siren (JIRA) [jira] Resolved: (NUTCH-484) Nutch Nightly API link is broken in site Sun, 13 May, 14:56
Sami Siren (JIRA) [jira] Resolved: (NUTCH-482) Remove redundant plugin lib-log4j Mon, 14 May, 14:38
Sami Siren (JIRA) [jira] Resolved: (NUTCH-483) remove redundant commons-logging jar from ontology plugin Mon, 14 May, 14:52
Sami Siren (JIRA) [jira] Resolved: (NUTCH-457) Create top level dist directory and checkin KEYS file to subversion be standard with Lucene Java and Hadoop Mon, 14 May, 15:16
Sami Siren (JIRA) [jira] Updated: (NUTCH-161) Change Plain text parser to use parser.character.encoding.default property for fall back encoding Tue, 15 May, 18:32
Sami Siren (JIRA) [jira] Resolved: (NUTCH-161) Change Plain text parser to use parser.character.encoding.default property for fall back encoding Tue, 15 May, 18:32
Trond Andersen (JIRA) [jira] Commented: (NUTCH-470) Adding optional terms to a query Wed, 09 May, 13:49
Vadim Bauer (JIRA) [jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. Tue, 22 May, 12:37
Vadim Bauer (JIRA) [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. Fri, 25 May, 21:05
Vadim Bauer (JIRA) [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. Fri, 25 May, 21:09
Vikas Scope-based crawling and indexing Mon, 07 May, 12:47
Yakn Get meta name="description" and other meta tags from Content Wed, 23 May, 15:02
charlie wanek (JIRA) [jira] Created: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin Fri, 11 May, 18:24
charlie wanek (JIRA) [jira] Updated: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin Fri, 11 May, 18:41
chee.wu (JIRA) [jira] Created: (NUTCH-478) Add function for stopping FetherThread gracefully Sat, 05 May, 06:27
hud...@lucene.zones.apache.org Build failed in Hudson: Nutch-Nightly #74 Thu, 03 May, 07:00
hud...@lucene.zones.apache.org Hudson build is back to normal: Nutch-Nightly #75 Fri, 04 May, 07:05
hud...@lucene.zones.apache.org Build failed in Hudson: Nutch-Nightly #80 Wed, 09 May, 07:00
hud...@lucene.zones.apache.org Hudson build is back to normal: Nutch-Nightly #81 Thu, 10 May, 07:07
hud...@lucene.zones.apache.org Build failed in Hudson: Nutch-Nightly #102 Thu, 31 May, 07:00
hud...@lucene.zones.apache.org Hudson build is back to normal: Nutch-Nightly #103 Thu, 31 May, 16:56
karthik085 Recrawl help Wed, 09 May, 19:41
karthik085 NUTCH-348 and Nutch-0.7.2 Thu, 24 May, 14:01
mr_max How to install Nutch on Freebsd? Mon, 07 May, 07:51
mr_max Re: How to install Nutch on Freebsd? Mon, 07 May, 08:11
mr_max Who of most pages indexed by means of it nutch and how many? Mon, 07 May, 08:17
mr_max And where it is possible to esteem about all opportunities nutch? Mon, 07 May, 08:20
mr_max And if nutch it would be written on With С++ worked more quickly? Mon, 07 May, 08:21
nutch.newbie (JIRA) [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility Thu, 10 May, 16:54
nutch.newbie (JIRA) [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility Fri, 11 May, 09:52
prem kumar running nutch without http proxy Tue, 29 May, 14:03
rubdabadub Re: Issues pending before 0.9 release Thu, 17 May, 04:21
rubdabadub Re: [jira] Resolved: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content Thu, 31 May, 08:04
simon_ece Nutch - Filtering (REGEX) Thu, 03 May, 07:36
wangxu (JIRA) [jira] Created: (NUTCH-493) contentType parse not correctly,,,,got empty content using readseg -get Wed, 30 May, 00:05