| Author | Subject | Date |
|---|---|---|
| Blaž Smolnikar | Pages in UTF-16 | Tue, 31 Jul, 11:23 |
| Doğacan Güney | Re: OPIC scoring differences | Mon, 09 Jul, 06:00 |
| Doğacan Güney | Re: Nutch nightly build and NUTCH-505 draft patch | Wed, 11 Jul, 06:55 |
| Doğacan Güney | Re: OPIC scoring differences | Wed, 11 Jul, 14:41 |
| Doğacan Güney | Re: OOM error during parsing with nekohtml | Tue, 17 Jul, 06:35 |
| Doğacan Güney | Re: [jira] Commented: (NUTCH-527) MapWritable doesn't support all hadoops writable types | Wed, 25 Jul, 18:05 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-509) Update Crawldb: avoid to start a job if there is no valid segment | Mon, 09 Jul, 06:05 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-507) lib-lucene-analyzers jar defintion is wrong in plugin.xml | Mon, 09 Jul, 06:18 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-507) lib-lucene-analyzers jar defintion is wrong in plugin.xml | Mon, 09 Jul, 06:18 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-503) Generator exits incorrectly for small fetchlists | Mon, 09 Jul, 06:48 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-505) Outlink urls should be validated | Tue, 10 Jul, 12:42 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-505) Outlink urls should be validated | Tue, 10 Jul, 19:12 |
| Doğacan Güney (JIRA) | [jira] Issue Comment Edited: (NUTCH-505) Outlink urls should be validated | Wed, 11 Jul, 06:30 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-505) Outlink urls should be validated | Wed, 11 Jul, 10:56 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-506) Nutch should delegate compression to Hadoop | Wed, 11 Jul, 12:04 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-510) IndexMerger delete working dir | Wed, 11 Jul, 15:32 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-510) IndexMerger delete working dir | Wed, 11 Jul, 15:32 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-506) Nutch should delegate compression to Hadoop | Thu, 12 Jul, 08:49 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-505) Outlink urls should be validated | Thu, 12 Jul, 12:17 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-505) Outlink urls should be validated | Thu, 12 Jul, 12:40 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-505) Outlink urls should be validated | Thu, 12 Jul, 15:09 |
| Doğacan Güney (JIRA) | [jira] Created: (NUTCH-513) suffix-urlfilter.txt does not have a template | Thu, 12 Jul, 17:13 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-505) Outlink urls should be validated | Thu, 12 Jul, 18:21 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-505) Outlink urls should be validated | Fri, 13 Jul, 12:28 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-513) suffix-urlfilter.txt does not have a template | Fri, 13 Jul, 12:35 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-513) suffix-urlfilter.txt does not have a template | Fri, 13 Jul, 17:21 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-513) suffix-urlfilter.txt does not have a template | Fri, 13 Jul, 17:23 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation | Sat, 14 Jul, 09:32 |
| Doğacan Güney (JIRA) | [jira] Created: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS | Sat, 14 Jul, 12:10 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS | Sat, 14 Jul, 12:12 |
| Doğacan Güney (JIRA) | [jira] Created: (NUTCH-515) Next fetch time is set incorrectly | Mon, 16 Jul, 12:15 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-515) Next fetch time is set incorrectly | Mon, 16 Jul, 12:17 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-439) Top Level Domains Indexing / Scoring | Mon, 16 Jul, 12:28 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-515) Next fetch time is set incorrectly | Mon, 16 Jul, 21:19 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-506) Nutch should delegate compression to Hadoop | Mon, 16 Jul, 21:29 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-515) Next fetch time is set incorrectly | Tue, 17 Jul, 06:21 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Tue, 17 Jul, 13:53 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Tue, 17 Jul, 14:19 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Tue, 17 Jul, 14:28 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-506) Nutch should delegate compression to Hadoop | Tue, 17 Jul, 15:18 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-506) Nutch should delegate compression to Hadoop | Tue, 17 Jul, 15:20 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Wed, 18 Jul, 08:16 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-517) build encoding should be UTF-8 | Wed, 18 Jul, 18:00 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-517) build encoding should be UTF-8 | Wed, 18 Jul, 18:01 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Wed, 18 Jul, 18:05 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Wed, 18 Jul, 18:05 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Wed, 18 Jul, 18:40 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Thu, 19 Jul, 06:26 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Thu, 19 Jul, 06:30 |
| Doğacan Güney (JIRA) | [jira] Created: (NUTCH-520) A common infrastructure for different index backends | Thu, 19 Jul, 08:49 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-520) A common infrastructure for different index backends | Thu, 19 Jul, 09:26 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-521) Modified injector to allow newly injected CrawlDatum to overwrite original | Thu, 19 Jul, 10:47 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-522) Use URLValidator in the Injector | Thu, 19 Jul, 13:55 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-522) Use URLValidator in the Injector | Fri, 20 Jul, 07:52 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-522) Use URLValidator in the Injector | Fri, 20 Jul, 09:13 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-520) A common infrastructure for different index backends | Fri, 20 Jul, 12:11 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-25) needs 'character encoding' detector | Sat, 21 Jul, 16:02 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-25) needs 'character encoding' detector | Sat, 21 Jul, 16:04 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-25) needs 'character encoding' detector | Sat, 21 Jul, 19:05 |
| Doğacan Güney (JIRA) | [jira] Issue Comment Edited: (NUTCH-25) needs 'character encoding' detector | Sat, 21 Jul, 19:11 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining | Mon, 23 Jul, 08:59 |
| Doğacan Güney (JIRA) | [jira] Assigned: (NUTCH-439) Top Level Domains Indexing / Scoring | Mon, 23 Jul, 10:41 |
| Doğacan Güney (JIRA) | [jira] Assigned: (NUTCH-522) Use URLValidator in the Injector | Mon, 23 Jul, 10:41 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-520) A common infrastructure for different index backends | Mon, 23 Jul, 10:57 |
| Doğacan Güney (JIRA) | [jira] Issue Comment Edited: (NUTCH-520) A common infrastructure for different index backends | Mon, 23 Jul, 15:00 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-524) Generate Problem with Single Node | Tue, 24 Jul, 06:49 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment | Tue, 24 Jul, 07:57 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment | Tue, 24 Jul, 08:43 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment | Tue, 24 Jul, 15:48 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-526) Use a combiner in LinDbMerger to improve the performance as in LinkDb | Wed, 25 Jul, 06:32 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-25) needs 'character encoding' detector | Wed, 25 Jul, 07:27 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-524) Generate Problem with Single Node | Wed, 25 Jul, 11:16 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-527) MapWritable doesn't support all hadoops writable types | Wed, 25 Jul, 12:39 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-25) needs 'character encoding' detector | Wed, 25 Jul, 17:57 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Thu, 26 Jul, 08:37 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment | Thu, 26 Jul, 08:54 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE | Thu, 26 Jul, 12:55 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment | Thu, 26 Jul, 12:55 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-439) Top Level Domains Indexing / Scoring | Thu, 26 Jul, 12:58 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-522) Use URLValidator in the Injector | Fri, 27 Jul, 08:32 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-522) Use URLValidator in the Injector | Fri, 27 Jul, 08:39 |
| Doğacan Güney (JIRA) | [jira] Issue Comment Edited: (NUTCH-522) Use URLValidator in the Injector | Fri, 27 Jul, 13:10 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-526) Use a combiner in LinDbMerger to improve the performance as in LinkDb | Mon, 30 Jul, 10:41 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb | Mon, 30 Jul, 10:48 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-531) Pages with no ContentType cause a Null Pointer exception | Mon, 30 Jul, 10:54 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-533) LinkDbMerger: url normlaized is not updated in the key and inlinks list | Mon, 30 Jul, 10:56 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time | Mon, 30 Jul, 11:00 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS | Mon, 30 Jul, 11:02 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format | Mon, 30 Jul, 11:06 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-529) NodeWalker.skipChildren don't wrok for more than 1 child. | Mon, 30 Jul, 11:08 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child. | Mon, 30 Jul, 18:59 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-533) LinkDbMerger: url normalized is not updated in the key and inlinks list | Mon, 30 Jul, 18:59 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS | Mon, 30 Jul, 19:03 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS | Mon, 30 Jul, 19:03 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb | Tue, 31 Jul, 06:06 |
| Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time | Tue, 31 Jul, 06:15 |
| Doğacan Güney (JIRA) | [jira] Resolved: (NUTCH-533) LinkDbMerger: url normalized is not updated in the key and inlinks list | Tue, 31 Jul, 12:07 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-533) LinkDbMerger: url normalized is not updated in the key and inlinks list | Tue, 31 Jul, 12:12 |
| Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-442) Integrate Solr/Nutch | Tue, 31 Jul, 13:19 |
| Doğacan Güney (JIRA) | [jira] Closed: (NUTCH-520) A common infrastructure for different index backends | Tue, 31 Jul, 13:21 |