Date: Wed, 01 Mar 2006 15:25:48 -0800
From: Doug Cutting
To: hadoop-dev@lucene.apache.org
Subject: Re: scalability limits getDetails, mapFile Readers?

Stefan,

I think you meant to send this to nutch-dev, not hadoop-dev.

Doug

Stefan Groschupf wrote:
> Hi,
>
> We have run into a problem in Nutch with
> MapFileOutputFormat#getReaders and getEntry.
> It shows up during summary generation, where for each segment we
> open as many readers as there are parts (part-0000 to part-n).
> With 80 tasktrackers and 80 segments that means 80 x 80 x 4
> readers (parseData, parseText, content, crawl). A search server
> also has to keep open whatever files its index searcher needs.
> The result is a FileNotFoundException (Too many open files).
>
> Opening and closing a Reader for every detail lookup makes no
> sense. We could limit the number of open readers somehow and
> close the ones that have gone unused the longest. But I'm not
> really happy with that solution, so any thoughts on how we can
> solve this problem in general?
>
> Thanks.
> Stefan
>
>
> ---------------------------------------------
> blog: http://www.find23.org
> company: http://www.media-style.com
>
>
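
As a sketch of the LRU idea Stefan describes above (not actual Nutch
or Hadoop code): cap the number of concurrently open readers and close
the least-recently-used one when the cap is exceeded. LruReaderCache
and ReaderFactory below are hypothetical names; the only assumption
made about a reader is that it exposes a close() method, modeled here
via java.io.Closeable.

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruReaderCache<R extends Closeable> {

      /** Hypothetical open callback; stands in for whatever
          constructs a reader (e.g. a MapFile.Reader) for a path. */
      public interface ReaderFactory<T extends Closeable> {
        T open(String path) throws IOException;
      }

      private final Map<String, R> readers;

      public LruReaderCache(final int maxOpen) {
        // accessOrder=true keeps iteration order least-recently-used
        // first, so the eldest entry is the LRU reader.
        this.readers = new LinkedHashMap<String, R>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
            if (size() > maxOpen) {
              try {
                eldest.getValue().close();  // release the file handle
              } catch (IOException e) {
                // swallow: eviction must not fail the caller's lookup
              }
              return true;                  // drop the mapping after closing
            }
            return false;
          }
        };
      }

      /** Returns the cached reader for path, opening one if necessary. */
      public synchronized R get(String path, ReaderFactory<R> factory)
          throws IOException {
        R reader = readers.get(path);
        if (reader == null) {
          reader = factory.open(path);
          readers.put(path, reader);        // may evict the LRU entry
        }
        return reader;
      }
    }

A summarizer would then fetch readers through cache.get(path, factory)
instead of constructing them directly, so at most maxOpen descriptors
stay open per process; an evicted reader may be reopened later at some
I/O cost, which is the trade-off Stefan is uneasy about.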