| Enis Soztutar |
Re: proposal for committer |
Tue, 29 May, 12:39 |
| Gal Nitzan |
Site nightly API link is broken |
Sat, 12 May, 08:00 |
| Gal Nitzan |
RE: Site nightly API link is broken |
Sat, 12 May, 08:07 |
| Gal Nitzan |
proposal for committer |
Mon, 28 May, 12:32 |
| Gal Nitzan (JIRA) |
[jira] Created: (NUTCH-484) Nutch Nightly API link is broken in site |
Sat, 12 May, 09:01 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-484) Nutch Nightly API link is broken in site |
Sat, 12 May, 09:06 |
| Gal Nitzan (JIRA) |
[jira] Created: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sat, 12 May, 19:50 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sat, 12 May, 20:00 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 06:35 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 06:50 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 09:47 |
| Gal Nitzan (JIRA) |
[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object |
Sun, 13 May, 21:17 |
| Ilya Vishnevsky |
bug in SegmentReader |
Mon, 21 May, 08:42 |
| Ken Krugler (JIRA) |
[jira] Commented: (NUTCH-25) needs 'character encoding' detector |
Mon, 21 May, 18:01 |
| Manoharam Reddy |
how is crawl-urlfilter.txt taken care of? |
Wed, 09 May, 15:00 |
| Manoharam Reddy |
OutOfMemoryError - Why should the while(1) loop stop? |
Wed, 30 May, 14:55 |
| Manoharam Reddy |
What is parse-oo and why doesn't parsed PDF content show up in cached.jsp ? |
Thu, 31 May, 07:07 |
| Manoharam Reddy |
How is lib-http plugin called? It is not there in plugins.include! |
Thu, 31 May, 07:10 |
| Manoharam Reddy |
How to create patch? |
Fri, 01 Jun, 06:12 |
| Marcin Okraszewski |
Re: running nutch without http proxy |
Wed, 30 May, 06:03 |
| Marcin Okraszewski (JIRA) |
[jira] Created: (NUTCH-487) Neko HTML parser goes on default settings. |
Mon, 21 May, 14:06 |
| Marcin Okraszewski (JIRA) |
[jira] Updated: (NUTCH-487) Neko HTML parser goes on default settings. |
Mon, 21 May, 14:06 |
| Marcin Okraszewski (JIRA) |
[jira] Created: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) |
Tue, 22 May, 12:18 |
| Marcin Okraszewski (JIRA) |
[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) |
Tue, 22 May, 12:18 |
| Marcin Okraszewski (JIRA) |
[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch) |
Tue, 22 May, 12:20 |
| Mark Woon (JIRA) |
[jira] Created: (NUTCH-486) Break searcher dependency on commons-cli |
Mon, 14 May, 23:36 |
| Michael McIntosh |
Will any Nutch/Lucene folks be at the Enterprise Search Summit in week in New York? |
Fri, 11 May, 15:17 |
| Mike Brzozowski (JIRA) |
[jira] Commented: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs |
Thu, 10 May, 16:16 |
| Mike Brzozowski (JIRA) |
[jira] Updated: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs |
Thu, 10 May, 16:16 |
| Mike Brzozowski (JIRA) |
[jira] Updated: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) |
Thu, 10 May, 16:18 |
| Mike Brzozowski (JIRA) |
[jira] Commented: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) |
Thu, 10 May, 16:27 |
| Mike Schwartz |
Re: [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 |
Wed, 09 May, 13:36 |
| Mike Schwartz |
Re: [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 |
Thu, 10 May, 14:47 |
| Nuther |
Re: How to install Nutch on Freebsd? |
Mon, 07 May, 06:59 |
| Otis Gospodnetic |
IntelliJ & Eclipse Lucene code styles available |
Wed, 23 May, 06:20 |
| Ravi Chintakunta (JIRA) |
[jira] Created: (NUTCH-480) Searching multiple indexes with a single nutch instance |
Tue, 08 May, 01:11 |
| Ravi Chintakunta (JIRA) |
[jira] Updated: (NUTCH-480) Searching multiple indexes with a single nutch instance |
Tue, 08 May, 01:13 |
| Sami Siren |
Re: how is crawl-urlfilter.txt taken care of? |
Wed, 09 May, 17:58 |
| Sami Siren |
Re: svn commit: r536606 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/metadata/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/util/ src/plugin/creativecommons/src/test/org/creativecommons/nutch/ src/... |
Wed, 09 May, 18:21 |
| Sami Siren |
Re: Site nightly API link is broken |
Sat, 12 May, 08:04 |
| Sami Siren |
Re: Site nightly API link is broken |
Sat, 12 May, 08:31 |
| Sami Siren (JIRA) |
[jira] Commented: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt |
Tue, 01 May, 09:03 |
| Sami Siren (JIRA) |
[jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 |
Wed, 09 May, 16:39 |
| Sami Siren (JIRA) |
[jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9 |
Wed, 09 May, 16:55 |
| Sami Siren (JIRA) |
[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains |
Wed, 09 May, 17:16 |
| Sami Siren (JIRA) |
[jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file |
Wed, 09 May, 17:20 |
| Sami Siren (JIRA) |
[jira] Commented: (NUTCH-476) Would like to add a field to the document class for its MD5 signature |
Wed, 09 May, 17:42 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-456) parse msexcel plugin speedup |
Thu, 10 May, 16:16 |
| Sami Siren (JIRA) |
[jira] Assigned: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt |
Thu, 10 May, 16:18 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt |
Thu, 10 May, 16:32 |
| Sami Siren (JIRA) |
[jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file |
Sat, 12 May, 05:28 |
| Sami Siren (JIRA) |
[jira] Created: (NUTCH-482) Remove redundant plugin lib-log4j |
Sat, 12 May, 07:54 |
| Sami Siren (JIRA) |
[jira] Created: (NUTCH-483) remove redundant commons-logging jar from ontology plugin |
Sat, 12 May, 07:56 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-484) Nutch Nightly API link is broken in site |
Sun, 13 May, 14:56 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-482) Remove redundant plugin lib-log4j |
Mon, 14 May, 14:38 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-483) remove redundant commons-logging jar from ontology plugin |
Mon, 14 May, 14:52 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-457) Create top level dist directory and checkin KEYS file to subversion be standard with Lucene Java and Hadoop |
Mon, 14 May, 15:16 |
| Sami Siren (JIRA) |
[jira] Updated: (NUTCH-161) Change Plain text parser to use parser.character.encoding.default property for fall back encoding |
Tue, 15 May, 18:32 |
| Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-161) Change Plain text parser to use parser.character.encoding.default property for fall back encoding |
Tue, 15 May, 18:32 |
| Trond Andersen (JIRA) |
[jira] Commented: (NUTCH-470) Adding optional terms to a query |
Wed, 09 May, 13:49 |
| Vadim Bauer (JIRA) |
[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. |
Tue, 22 May, 12:37 |
| Vadim Bauer (JIRA) |
[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. |
Fri, 25 May, 21:05 |
| Vadim Bauer (JIRA) |
[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. |
Fri, 25 May, 21:09 |
| Vikas |
Scope-based crawling and indexing |
Mon, 07 May, 12:47 |
| Yakn |
Get meta name="description" and other meta tags from Content |
Wed, 23 May, 15:02 |
| charlie wanek (JIRA) |
[jira] Created: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin |
Fri, 11 May, 18:24 |
| charlie wanek (JIRA) |
[jira] Updated: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin |
Fri, 11 May, 18:41 |
| chee.wu (JIRA) |
[jira] Created: (NUTCH-478) Add function for stopping FetherThread gracefully |
Sat, 05 May, 06:27 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #74 |
Thu, 03 May, 07:00 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #75 |
Fri, 04 May, 07:05 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #80 |
Wed, 09 May, 07:00 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #81 |
Thu, 10 May, 07:07 |
| hud...@lucene.zones.apache.org |
Build failed in Hudson: Nutch-Nightly #102 |
Thu, 31 May, 07:00 |
| hud...@lucene.zones.apache.org |
Hudson build is back to normal: Nutch-Nightly #103 |
Thu, 31 May, 16:56 |
| karthik085 |
Recrawl help |
Wed, 09 May, 19:41 |
| karthik085 |
NUTCH-348 and Nutch-0.7.2 |
Thu, 24 May, 14:01 |
| mr_max |
How to install Nutch on Freebsd? |
Mon, 07 May, 07:51 |
| mr_max |
Re: How to install Nutch on Freebsd? |
Mon, 07 May, 08:11 |
| mr_max |
Who of most pages indexed by means of it nutch and how many? |
Mon, 07 May, 08:17 |
| mr_max |
And where it is possible to esteem about all opportunities nutch? |
Mon, 07 May, 08:20 |
| mr_max |
And if nutch it would be written on With С++ worked more quickly? |
Mon, 07 May, 08:21 |
| nutch.newbie (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Thu, 10 May, 16:54 |
| nutch.newbie (JIRA) |
[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility |
Fri, 11 May, 09:52 |
| prem kumar |
running nutch without http proxy |
Tue, 29 May, 14:03 |
| rubdabadub |
Re: Issues pending before 0.9 release |
Thu, 17 May, 04:21 |
| rubdabadub |
Re: [jira] Resolved: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content |
Thu, 31 May, 08:04 |
| simon_ece |
Nutch - Filtering (REGEX) |
Thu, 03 May, 07:36 |
| wangxu (JIRA) |
[jira] Created: (NUTCH-493) contentType parse not correctly,,,,got empty content using readseg -get |
Wed, 30 May, 00:05 |