Date: Thu, 3 Apr 2008 08:20:39 -0700
From: "John Wang"
To: java-user@lucene.apache.org
Subject: Re: payload performance wrt fieldcache

I am loading both from disk. But I found the culprit.

My code:

    while (tp.next()) {
        // assert tp.doc() < maxDoc;
        tp.nextPosition();   <-- this call is the problem
        tp.getPayload(payloadBuffer, 0);
        byter.load(_array, tp.doc(), payloadBuffer);
    }

The way I stored it, there is one position per doc. After I removed the call to tp.nextPosition(), performance improved by a multi-digit factor. I would think this call should be free.

Thanks
-John

On Thu, Apr 3, 2008 at 8:16 AM, Chris Lu wrote:

> If your index size grows larger, the payload method will get slower.
> That is because payloads are read from the hard disk. Fieldcache is in
> memory, which is much faster.
>
> Unless you are going with a solid-state disk, you'd better go with
> Fieldcache for faster search.
>
> --
> Chris Lu
>
> On Thu, Apr 3, 2008 at 7:36 AM, John Wang wrote:
> >  Sorry, gmail was screwy and accidentally sent the msg.
> >  Anyway,
> >
> >  I have a large index, about 30M docs.
> >  I have a date field (by days) and there are about 1000 distinct
> >  values; every doc has the date field filled in.
> >
> >  So out of curiosity I indexed the date field two ways:
> >  1) using "date" as a field, and setting the date value for each doc.
> >  2) a new term, "_payload:_val", with the date (as a long or 8-byte
> >     array) added to the payload of each doc.
> >
> >  Loading the dates into a long[] array of length maxdoc, the
> >  performance was surprising:
> >  using payload is 7 times slower than using fieldcache.
> >
> >  At first I thought it was because of the conversion from byte[8] to
> >  a long for each doc, so I changed it to load into byte[8*maxdoc]
> >  without doing the conversion, and the result is the same.
> >
> >  I then did another experiment:
> >  I lowered the number of dates to a small number, e.g. 50, and timed
> >  the field cache load, and it took much longer than when it had 1000.
> >
> >  I did some profiling and the profiler is pointing to
> >  TermPositions.next, TermPositions.nextPosition and
> >  TermPositions.getPayload as the culprits.
> >
> >  I would think payload would always be faster. Any ideas?
> >
> >  Thanks
> >  -John
> >
> >  On Thu, Apr 3, 2008 at 7:27 AM, John Wang wrote:
> >
> > > Hi:
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
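
For reference, the per-document load step in the loop (the byter.load call) and the byte[8]-to-long conversion mentioned in the quoted message could look roughly like the sketch below. This is not the original code: the class name PayloadLongDecoder, its two methods, and the big-endian byte order are all assumptions made for illustration, and nothing here is a Lucene API. It only shows that the conversion itself is a handful of shifts per document, which fits the observation that skipping the conversion did not change the timing.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of Lucene and
// not the "byter" class from the message.
final class PayloadLongDecoder {

    // Decode 8 bytes starting at offset as a long.
    // Big-endian order is an assumption; the message does not say
    // how the date was encoded into the payload.
    static long toLong(byte[] buf, int offset) {
        long v = 0L;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (buf[offset + i] & 0xFFL);
        }
        return v;
    }

    // Store the decoded value for one document in the per-document array.
    static void load(long[] values, int doc, byte[] payload) {
        values[doc] = toLong(payload, 0);
    }

    public static void main(String[] args) {
        long[] values = new long[4]; // stands in for long[maxDoc]
        // ByteBuffer writes big-endian by default, matching toLong above.
        byte[] payload = ByteBuffer.allocate(8).putLong(13972L).array();
        load(values, 2, payload);
        System.out.println(values[2]);
    }
}
```

In the loop from the message, load would be called once per document with tp.doc() as the index and the buffer filled by tp.getPayload as the payload.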