Date: Fri, 27 Aug 2010 11:34:00 +0800
Subject: Re: instantiated contrib
From: Li Li
To: java-user@lucene.apache.org

If I index only 7k documents, the time comparison is:

time1 (RAMDirectory): 7602331019 ns
time2 (InstantiatedIndex): 4246878035 ns
total1 (RAMDirectory): 10736 ms
total2 (InstantiatedIndex): 7393 ms

So with 7k documents II seems to be faster than RAMDirectory. My indexed texts are all hotel names (Chinese and English, with a little French), and the index has about 100k terms. Terms such as "hotel" are very frequent, while the words of the hotel names themselves are very rare (except for hotel chains). So I guess the distribution is that a few terms are very frequent and all the others are very rare.

2010/8/27 Karl Wettin:
> My mail client died while sending this mail. Sorry for any duplicates.
>
> It is strange that it should take 20 seconds to gather fields; this is the
> only thing that really surprises me. I'd expect it to be instant compared to
> RAMDirectory. It is hard to say from the information you provided. Did you
> perhaps lazy load field values from your RAMDirectory and not retrieve them,
> or something like that?
>
> Why your queries are slow is also hard to say; there can be many reasons. 70k
> documents can be quite a few documents for II if they contain enough text.
> Here are a few questions that may or may not be helpful:
>
> What is the content of the documents? Do they contain a lot of the same
> text? Or are they all rather unique?
> The major thing that makes II faster
> than RAMDirectory is that it does not have to deserialize values from the
> byte stream. As the index grows, binary searching for documents containing a
> given term will start to consume more time than deserializing the index.
>
> What speed do you see if you only load 10% (7k)?
>
> Did you see the graphics in the package-level javadocs?
> http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/store/instantiated/package-summary.html
>
>
>        karl
>
>
> On 26 Aug 2010, at 09:24, Li Li wrote:
>
>> I have about 70k documents; the total indexed size is about 15MB (the
>> original text files' size).
>>        dir = new RAMDirectory();
>>        IndexWriter writer = new IndexWriter(dir, ...);
>>        for (loop) {
>>            writer.addDocument(doc);
>>        }
>>        writer.optimize();
>>        writer.close();
>>
>>        IndexReader ir = IndexReader.open(dir, true);
>>        InstantiatedIndex ii = new InstantiatedIndex(ir);
>>        InstantiatedIndexReader iir = new InstantiatedIndexReader(ii);
>>        is = new IndexSearcher(ir);
>>        is2 = new IndexSearcher(iir);
>>
>> I calculate the time by:
>>        long searchStart = System.nanoTime();
>>        TopDocs docs = is.search(bQuery, Integer.MAX_VALUE);
>>        long searchEnd = System.nanoTime();
>>
>> I searched 10,000 documents; the times for RAMDirectory and instantiated
>> are time1: 21s (21812978000 ns) and time2: 20s (20713817000 ns).
>> I also calculated the time including getting the field values:
>> total1: 23852 ms, total2: 22610 ms.
>> It seems instantiated is not much faster than RAMDirectory.
>> Is there anything wrong with the way I used it? My max memory is 4GB.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
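The figures in this thread come from single `System.nanoTime()` measurements around `is.search(...)`, and the thread does not say whether the searches were warmed up first. In a JVM the first executions also pay for class loading and JIT compilation, so one-shot timings can blur the difference between two searchers. The following is a minimal, stdlib-only sketch (not from the thread; `SearchTiming`, `timeOnce`, `medianNanos` and `speedup` are hypothetical names) of a harness that warms a task up, reports the median of several timed runs, and computes the speedup ratio for the figures reported above.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical timing helper; not part of Lucene or of the code in this thread. */
public class SearchTiming {

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /**
     * Runs the task `warmup` times without measuring, so class loading and JIT
     * compilation happen up front, then returns the median of `runs` timed
     * executions (the upper middle value when runs is even).
     */
    static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeOnce(task);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** How many times faster the second measurement is than the first. */
    static double speedup(long firstNanos, long secondNanos) {
        return (double) firstNanos / secondNanos;
    }

    public static void main(String[] args) {
        // Single-run figures reported in the thread: RAMDirectory vs. InstantiatedIndex.
        System.out.printf(Locale.ROOT, "70k docs: %.2fx%n", speedup(21812978000L, 20713817000L));
        System.out.printf(Locale.ROOT, "7k docs:  %.2fx%n", speedup(7602331019L, 4246878035L));
    }
}
```

Run as-is, it prints a speedup of 1.05x for the 70k-document run and 1.79x for the 7k-document run. To use it for the comparison in the thread, one would wrap `is.search(bQuery, Integer.MAX_VALUE)` and `is2.search(bQuery, Integer.MAX_VALUE)` in `Runnable`s, rethrowing the checked `IOException` that Lucene's `search` declares.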
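Karl's explanation contrasts deserializing postings from a byte stream (RAMDirectory) with searching postings that already exist as Java objects (II). The sketch below is a hypothetical, stdlib-only illustration of that contrast, not code from Lucene or from the instantiated contrib: `PostingsSketch` stores ascending doc ids either as delta-encoded variable-length ints, which have to be decoded in order, or as a plain `int[]` that `Arrays.binarySearch` can probe directly. Real Lucene postings also carry skip data, which this sketch deliberately omits.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical illustration of two postings representations; not Lucene code. */
public class PostingsSketch {

    /** Encodes ascending doc ids as deltas, each written 7 bits per byte (high bit = more bytes follow). */
    static byte[] encodeDeltaVInt(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : docIds) {
            int delta = id - prev;
            prev = id;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Serialized form: decodes entries from the start until it finds the first doc id >= target. */
    static int advanceSerialized(byte[] postings, int target) {
        int pos = 0;
        int doc = 0;
        while (pos < postings.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = postings[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += delta;
            if (doc >= target) {
                return doc;
            }
        }
        return Integer.MAX_VALUE; // no doc id >= target
    }

    /** Object form: doc ids are already an int[], so a binary search finds the first doc id >= target. */
    static int advanceInstantiated(int[] docIds, int target) {
        int i = Arrays.binarySearch(docIds, target);
        if (i < 0) {
            i = -i - 1; // insertion point: index of the first element greater than target
        }
        return i < docIds.length ? docIds[i] : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] ids = {3, 7, 150, 151, 100000};
        byte[] encoded = encodeDeltaVInt(ids);
        System.out.println(advanceSerialized(encoded, 8)); // 150
        System.out.println(advanceInstantiated(ids, 8));   // 150
    }
}
```

Both calls in `main` return 150: the serialized version decodes 3, 7 and then 150 one entry at a time, while the object version binary-searches the array.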