From: Dominik Bruhn
To: java-user@lucene.apache.org
Subject: addIndexes getting slower and slower plus eating up Mem
Date: Fri, 7 Jul 2006 12:13:55 +0200
Message-Id: <200607071213.55963.dominik@dbruhn.de>

Hi,

I use the following code to index about 1 million documents into an empty index:

=============
private static void do_searchindex(Connection target) throws SQLException, IOException {
    int i = 1164;
    PostIndexer.createIndexDir(); // Creates the index directory
    IndexWriter fsWriter = new IndexWriter(PostIndexer.getIndexDir(), PostIndexer.getAnalyser(), false);
    // Index page after page until a page returns no rows.
    while (do_searchindex(fsWriter, target, i) > 0) {
        i++;
    }
    fsWriter.close();
}

// Indexes one page of 500 posts into a fresh in-memory index, merges that
// index into the on-disk index, and returns the number of rows indexed.
private static int do_searchindex(IndexWriter writer, Connection ctarget, int page) throws SQLException, IOException {
    ResultSet rs = ctarget.createStatement().executeQuery(
        "SELECT postid,db_post.threadid,posttext,db_thread.threadtitle FROM db_post"
        + " LEFT JOIN db_thread ON (db_thread.threadid=db_post.threadid)"
        + " ORDER BY postid DESC LIMIT " + (page * 500) + ",500 ;");
    int c = 0;
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, PostIndexer.getAnalyser(), true);
    while (rs.next()) {
        PostIndexer.addToIndex(ramWriter, rs.getInt("postid"), rs.getString("posttext"), rs.getString("threadtitle"));
        c++;
    }
    writer.addIndexes(new Directory[] { ramDir });
    ramWriter.close();
    rs.close();
    System.out.println("Did Page " + page);
    return c;
}
=================

The code for "PostIndexer.addToIndex" is:

===============
Document doc = new Document();
Field title = new Field("title", threadtitle, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.NO);
title.setBoost(2);
doc.add(title);
doc.add(new Field("text", posttext, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.add(new Field("id", "" + postid, Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
============

When I run this code, the first 500 entries are added in about 2 seconds. But the entries from 1167*500 to (1167+1)*500 take more than 10 minutes. RAM usage also increases dramatically.

Is this normal behaviour, a mistake in my code, or a bug in Lucene? I remember someone here on the list talking about this problem, but I can't find the post anymore.

Thanks
-- 
Dominik Bruhn
mailto: dominik@dbruhn.de
http://www.dbruhn.de
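
One rough way to reason about the timings reported above is sketched below. It is a back-of-the-envelope model, not a diagnosis. It assumes that every `addIndexes` call ends up rewriting the whole on-disk index; the message does not state the Lucene version, so this is an assumption, although Lucene releases of this period did, as far as I know, optimize the index before and after merging in `addIndexes(Directory[])`. The class and method names (`MergeCostSketch`, `docsRewrittenForBatch`, `totalDocsRewritten`) are hypothetical and exist only for this illustration.

```java
public class MergeCostSketch {

    // Hypothetical cost model: number of documents rewritten when batch
    // number `batch` (1-based) is merged, ASSUMING the whole index is
    // rewritten on every merge. The cost grows linearly with the batch number.
    static long docsRewrittenForBatch(long batch, long batchSize) {
        return batch * batchSize;
    }

    // Total documents rewritten over all batches under the same assumption.
    // The sum 1 + 2 + ... + n is quadratic in the number of batches.
    static long totalDocsRewritten(long batches, long batchSize) {
        long total = 0;
        for (long b = 1; b <= batches; b++) {
            total += docsRewrittenForBatch(b, batchSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long batchSize = 500;   // page size used in the message
        long batches = 2000;    // about 1 million documents in total

        System.out.println("Docs rewritten for batch 1:    "
                + docsRewrittenForBatch(1, batchSize));
        System.out.println("Docs rewritten for batch 1168: "
                + docsRewrittenForBatch(1168, batchSize));
        System.out.println("Docs rewritten over all batches: "
                + totalDocsRewritten(batches, batchSize));
    }
}
```

Under this assumption a late batch costs roughly a thousand times as much as the first one, which would match the pattern of a steady slowdown. The model says nothing about memory use, and it ignores the growing `LIMIT` offset in the SQL query, which may add its own cost.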