Message-ID: <45C33C72.8000901@getopt.org>
Date: Fri, 02 Feb 2007 14:28:18 +0100
From: Andrzej Bialecki
To: nutch-dev@lucene.apache.org
Subject: Re: Generator.java bug?
In-Reply-To: <001901c746c1$16b8a2e0$6403a8c0@GALTOP>

Gal Nitzan wrote:
> Hi,
>
> After many failures of generate "Generator: 0 records selected for fetching,
> exiting ..." I made a post about it a few days back.
>
> I narrowed down to the following function:
>
> public Path generate(Path dbDir, Path segments, int numLists, long topN,
> long curTime, boolean filter, boolean force)
>
> in the following if: if (readers == null || readers.length == 0 ||
> !readers[0].next(new FloatWritable()))
>
> It turns out that the: "!readers[0].next(new FloatWritable())" is the
> culprit.
>

Well, this condition simply checks if the result is not empty. When we
open Reader[] on a SequenceFile, each reader corresponds to a
part-xxxxx. There must be at least one part, so we use the one at
index 0. If we cannot retrieve at least one entry from it, then it
logically follows that the file is empty, and we bail out.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
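
For illustration only, the emptiness check described above can be sketched in plain Java without Hadoop on the classpath. This is a sketch under assumptions, not the Generator code: the class and method names (EmptyCheckSketch, nothingSelected) are hypothetical, and each java.util.Iterator stands in for one reader over a part-xxxxx file. Unlike next(new FloatWritable()), hasNext() does not consume an entry. Like the quoted condition, the sketch inspects only the reader at index 0.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class EmptyCheckSketch {

    // Hypothetical stand-in for the condition quoted from generate():
    // true means "nothing to fetch, bail out".
    static boolean nothingSelected(Iterator<?>[] readers) {
        return readers == null          // no readers could be opened
            || readers.length == 0      // no part files at all
            || !readers[0].hasNext();   // part at index 0 yields no entry
    }

    public static void main(String[] args) {
        // One part that contains no entries.
        Iterator<?>[] emptyPart = { Collections.emptyIterator() };
        // One part that contains two made-up scores.
        Iterator<?>[] filledPart = { Arrays.asList(0.5f, 0.1f).iterator() };

        System.out.println(nothingSelected(null));
        System.out.println(nothingSelected(new Iterator<?>[0]));
        System.out.println(nothingSelected(emptyPart));
        System.out.println(nothingSelected(filledPart));
    }
}
```

Only the last call reports that there is something to fetch; the first three correspond to the three clauses of the condition.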