From: Mark Miller
To: java-user@lucene.apache.org
Subject: Re: Time of processing hits.doc()
Date: Sun, 18 Nov 2007 18:09:43 -0500
Message-ID: <4740C637.2010809@gmail.com>
In-Reply-To: <718789d20711181332l1eeac315v732a5c6438ad0939@mail.gmail.com>

Hey Haroldo,

The first thing you need to do is *stop* using Hits in your searches. Hits is optimized for some pretty specific use cases, and you will get along much better with a HitCollector.

Hits has three main functions: it caches documents, normalizes scores, and stores the ids associated with scores (a HitDoc). If you try to retrieve a HitDoc past the first 100 from Hits, a new search is issued that grabs double the number of HitDocs needed to satisfy the request. This is repeated every time you ask for a HitDoc beyond the current cache (which starts at 100). So if you need HitDocs beyond the first 100, Hits is not a great choice for you.

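To put rough numbers on this for the 25,000 hits in the original question, here is a small model of the re-search pattern. It assumes the cache simply doubles each time a sequential loop runs past it, which is a simplification of the behaviour described above and not Lucene's actual code. The class and method names (HitsCostModel, searchesToReadAll, hitsCollected) are made up for illustration.

```java
/**
 * Simplified model of the Hits re-search behaviour. This is NOT Lucene code;
 * it only assumes the cache starts at some size and doubles on each miss.
 */
class HitsCostModel {

    /** Number of searches issued while walking every hit in order. */
    static int searchesToReadAll(int totalHits, int initialCache) {
        if (initialCache < 1) {
            throw new IllegalArgumentException("initialCache must be >= 1");
        }
        long cached = Math.min(initialCache, totalHits);
        int searches = 1; // the initial search
        while (cached < totalHits) {
            // A miss triggers a new search for twice as many hits (capped at the total).
            cached = Math.min(cached * 2, (long) totalHits);
            searches++;
        }
        return searches;
    }

    /** Total hits gathered across all searches; earlier hits are gathered again each time. */
    static long hitsCollected(int totalHits, int initialCache) {
        if (initialCache < 1) {
            throw new IllegalArgumentException("initialCache must be >= 1");
        }
        long cached = Math.min(initialCache, totalHits);
        long collected = cached;
        while (cached < totalHits) {
            cached = Math.min(cached * 2, (long) totalHits);
            collected += cached;
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(searchesToReadAll(25000, 100)); // prints 9
        System.out.println(hitsCollected(25000, 100));     // prints 50500
    }
}
```

Under that assumption, a loop over 25,000 hits runs the query 9 times and gathers about twice as many hits as it finally uses, on top of the cost of loading each document.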
You will want to use the HitCollector instead... but remember that you lose the normalized scores (the code is simple to copy if you still want them) and the document caching (I rarely want that anyway).

An issue to watch out for: with Hits you do not have to say how many docs you want back, but with a HitCollector solution you will need to. This is a minor dilemma if you want to go over all of the hits no matter what. You can pass a huge number to make sure you get everything, but that will create large data structures, as structure sizes may be initialized from the number you pass. Also, passing the maximum integer will cause an error (negative init size), because Lucene sizes the data structure that holds the hits as n+1.

- Mark

Haroldo Nascimento wrote:
> I have a performance problem when I need to group the results of a search.
>
> I have the code below:
>
> for (int i = 0; i < hits.length(); i++) {
>     doc = hits.doc(i);
>
>     obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
>     obj2 = doc.get(xxx);
>     ...
> }
>
> I work with a very large volume of data. The search itself runs in 0.300
> seconds, but when the hits object holds many results, the time to get
> all the objects is very large. The hits.doc(i) command is processed in 2
> seconds.
>
> For example, when hits.length() equals 25,000 results, the "post-search"
> time is 7 seconds.
>
> I fetch all the results because I need to group them (remove the
> duplicate results).
>
> Is there any way in Lucene to group the results? I need something
> like SQL's "group by" command.
>
> Thanks.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
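On the "group by" question in the quoted message: one possible approach is to drop duplicates while the hits are being collected, so that no stored document has to be loaded per hit. The sketch below is plain Java and not Lucene code. Its collect(int doc, float score) method only mirrors the shape of HitCollector.collect, and the groupValues array is a hypothetical stand-in for a per-document lookup of the grouping field (a FieldCache array would be one candidate). The class name FirstPerGroupCollector is made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of "group by" style de-duplication done at collection time.
 * Assumption: groupValues[docId] holds the grouping field's value for that doc.
 */
class FirstPerGroupCollector {
    private final String[] groupValues;
    private final Set<String> seen = new HashSet<String>();
    private final List<Integer> keptDocIds = new ArrayList<Integer>();

    FirstPerGroupCollector(String[] groupValues) {
        this.groupValues = groupValues;
    }

    /** Called once per matching doc; keeps only the first doc collected for each group value. */
    void collect(int doc, float score) {
        // Set.add returns false when the value was already present.
        if (seen.add(groupValues[doc])) {
            keptDocIds.add(doc);
        }
    }

    /** Doc ids of the group representatives, in collection order. */
    List<Integer> keptDocIds() {
        return keptDocIds;
    }

    public static void main(String[] args) {
        String[] states = {"SP", "RJ", "SP", "MG", "RJ"};
        FirstPerGroupCollector collector = new FirstPerGroupCollector(states);
        for (int doc = 0; doc < states.length; doc++) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.keptDocIds()); // prints [0, 1, 3]
    }
}
```

This keeps the first document collected for each value, which is not necessarily the best-scoring one. If the top hit per group matters, the collector would need to compare scores before deciding which doc id to keep.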