From: "John Wang"
To: java-user@lucene.apache.org
Date: Thu, 3 Apr 2008 07:36:42 -0700
Subject: Re: payload performance wrt fieldcache

Sorry, Gmail was acting up and accidentally sent the message early.

Anyway, I have a large index, about 30M docs. I have a date field (day granularity) with about 1000 distinct values, and every doc has the date field filled in.

Out of curiosity, I indexed the date field two ways:

1) using "date" as a field, and setting the date value for each doc.
2) adding a new term, "_payload:_val", and storing the date (as a long, i.e. an 8-byte array) in the payload for each doc.

When loading the dates into a long[] array of length maxdoc, the performance was surprising: using the payload is 7 times slower than using FieldCache.

At first I thought it was because of the conversion from byte[8] to a long for each doc, so I changed it to load into a byte[8*maxdoc] without doing the conversion, and the result was the same.

I then did another experiment: I lowered the number of distinct dates to a small number, e.g. 50, and timed the field cache load, and it took much longer than when it had 1000.

I did some profiling, and the profiler points to TermPositions.next, TermPositions.nextPosition and TermPositions.getPayload as the culprits. I would have thought payloads would always be faster.

Any ideas?

Thanks

-John

On Thu, Apr 3, 2008 at 7:27 AM, John Wang wrote:
> Hi:
>
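
For reference, the byte[8]-to-long conversion mentioned above might look roughly like the sketch below. It only illustrates the two variants described (decoding one long per doc, and decoding a packed byte[8*maxdoc] buffer afterwards); it is not the code used in the test. The class and method names (PayloadLongCodec, encode, decode, decodeAll) are made up for illustration, and big-endian byte order is an assumption, since the message does not say how the long was written into the payload. No Lucene calls are shown.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: converts between a long and the
// 8-byte form that could be stored as a payload. Big-endian is assumed.
final class PayloadLongCodec {

    /** Encodes a long as 8 big-endian bytes. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Decodes the 8 big-endian bytes starting at offset into a long. */
    static long decode(byte[] buf, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            // Mask to avoid sign extension of negative bytes.
            v = (v << 8) | (buf[offset + i] & 0xFFL);
        }
        return v;
    }

    /** Bulk variant: packed holds 8 * maxDoc bytes, one long per doc. */
    static long[] decodeAll(byte[] packed, int maxDoc) {
        long[] out = new long[maxDoc];
        ByteBuffer.wrap(packed, 0, 8 * maxDoc).asLongBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        int maxDoc = 3;
        long[] days = {13972L, 13973L, -1L}; // arbitrary sample values
        byte[] packed = new byte[8 * maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            System.arraycopy(encode(days[doc]), 0, packed, 8 * doc, 8);
        }
        long[] decoded = decodeAll(packed, maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            System.out.println(doc + ": " + decoded[doc] + " / " + decode(packed, 8 * doc));
        }
    }
}
```

Either variant is a handful of shifts or a bulk copy per doc, which fits the observation that skipping the conversion did not change the timing.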