Return-Path: Delivered-To: apmail-jakarta-lucene-user-archive@www.apache.org Received: (qmail 36243 invoked from network); 19 Aug 2004 18:03:58 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur-2.apache.org with SMTP; 19 Aug 2004 18:03:58 -0000 Received: (qmail 94222 invoked by uid 500); 19 Aug 2004 18:03:39 -0000 Delivered-To: apmail-jakarta-lucene-user-archive@jakarta.apache.org Received: (qmail 94139 invoked by uid 500); 19 Aug 2004 18:03:38 -0000 Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Lucene Users List" Reply-To: "Lucene Users List" Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 94078 invoked by uid 99); 19 Aug 2004 18:03:37 -0000 X-ASF-Spam-Status: No, hits=2.7 required=10.0 tests=DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FROM_ENDS_IN_NUMS X-Spam-Check-By: apache.org Received: from [63.196.96.150] (HELO sr16sc81.INFXINC.NET) (63.196.96.150) by apache.org (qpsmtpd/0.27.1) with ESMTP; Thu, 19 Aug 2004 11:03:34 -0700 Received: from WK16SC20 ([10.16.4.98]) by sr16sc81.INFXINC.NET with Microsoft SMTPSVC(5.0.2195.5329); Thu, 19 Aug 2004 11:03:38 -0700 Message-ID: <05e101c48616$daac3d60$6204100a@INFXINC.NET> From: "Rob Jose" To: "Lucene Users List" References: Subject: Re: Index Size Date: Thu, 19 Aug 2004 11:03:38 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1437 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 X-OriginalArrivalTime: 19 Aug 2004 18:03:38.0941 (UTC) FILETIME=[DAAD4ED0:01C48616] X-Virus-Checked: Checked X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N Grant Thanks for your response. I have fixed this issue. I have indexed 5 MB worth of text files and I now only use 224 KB. I was getting 80 MB. The only change I made was to change the way I merge my temp index into my prod index. My code changed from: prodWriter.setUseCompoundFile(true); prodWriter.addIndexes(new IndexReader[] { tempReader }); To: int iNumDocs = tempReader.numDocs(); for (int y = 0; y < iNumDocs; y++) { Document tempDoc = tempReader.document(y); prodWriter.addDocument(tempDoc); } I don't know if this is a bug in the IndexWriter.addIndexes(IndexReader) method or something else I am doing that caused this, but I am getting much better results now. Thanks to everyone who helped, I really appreciate it. Rob ----- Original Message ----- From: "Grant Ingersoll" To: Sent: Thursday, August 19, 2004 10:51 AM Subject: Re: Index Size How many fields do you have and what analyzer are you using? >>> rjose89@comcast.net 8/19/2004 11:54:25 AM >>> Otis I upgraded to 1.4.1. I deleted all of my old indexes and started from scratch. I indexed 2 MB worth of text files and my index size is 8 MB. Would it be better if I stopped using the IndexWriter.addIndexes(IndexReader) method and instead traverse the IndexReader on the temp index and use IndexWriter.addDocument(Document) method? Thanks again for your input, I appreciate it. Rob ----- Original Message ----- From: "Otis Gospodnetic" To: "Lucene Users List" Sent: Thursday, August 19, 2004 8:00 AM Subject: Re: Index Size Just go for 1.4.1 and look at the CHANGES.txt file to see if there were any index format changes. If there were, you'll need to re-index. Otis --- Rob Jose wrote: > Otis > I am using Lucene 1.3 final. Would it help if I move to Lucene 1.4 > final? > > Rob > ----- Original Message ----- > From: "Otis Gospodnetic" > To: "Lucene Users List" > Sent: Thursday, August 19, 2004 7:13 AM > Subject: Re: Index Size > > > I thought this was the case. I believe there was a bug in one of the > recent Lucene releases that caused old CFS files not to be removed > when > they should be removed. This resulted in your index directory > containing a bunch of old CFS files consuming your disk space. > > Try getting a recent nightly build and see if using that takes car > eof > your problem. > > Otis > > --- Rob Jose wrote: > > > Hey George > > Thanks for responding. I am using windows and I don't see any > hidden > > files. > > I have a ton of CFS files (1366/1405). I have 22 F# (F1, F2, etc.) > > files. > > I have two FDT files and two FDX files. And three FNM files. Add > > these > > files to the deletable and segments file and that is all of the > files > > that I > > have. The CFS files are appoximately 11 MB each. The totals I > gave > > you > > before were for all of my indexes together. This particular index > > has a > > size of 21.6 GB. The files that it indexed have a size of 89 MB. > > > > OK - I just removed all of the CFS files from the directory and I > can > > still > > read my indexes. So know I have to ask what are these CFS files? > > Why are > > they created? And how can I get rid of them if I don't need them. > I > > will > > also take a look at the Lucene website to see if I can find any > > information. > > > > Thanks > > Rob > > > > ----- Original Message ----- > > From: "Honey George" > > To: "Lucene Users List" > > Sent: Thursday, August 19, 2004 12:29 AM > > Subject: Re: Index Size > > > > > > Hi, > > Please check for hidden files in the index folder. If > > you are using linx, do something like > > > > ls -al > > > > I am also facing a similar problem where the index > > size is greater than the data size. In my case there > > were some hidden temproary files which the lucene > > creates. > > That was taking half of the total size. > > > > My problem is that after deleting the temporary files, > > the index size is same as that of the data size. That > > again seems to be a problem. I am yet to find out the > > reason.. > > > > Thanks, > > george > > > > > > --- Rob Jose wrote: > > > Hello > > > I have indexed several thousand (52 to be exact) > > > text files and I keep running out of disk space to > > > store the indexes. The size of the documents I have > > > indexed is around 2.5 GB. The size of the Lucene > > > indexes is around 287 GB. Does this seem correct? > > > I am not storing the contents of the file, just > > > indexing and tokenizing. I am using Lucene 1.3 > > > final. Can you guys let me know what you are > > > experiencing? I don't want to go into production > > > with something that I should be configuring better. > > > > > > > > > I am not sure if this helps, but I have a temp index > > > and a real index. I index the file into the temp > > > index, and then merge the temp index into the real > > > index using the addIndexes method on the > > > IndexWriter. I have also set the production writer > > > setUseCompoundFile to true. I did not set this on > > > the temp index. The last thing that I do before > > > closing the production writer is to call the > > > optimize method. > > > > > > I would really appreciate any ideas to get the index > > > size smaller if it is at all possible. > > > > > > Thanks > > > Rob --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org