Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Message-ID: <05e101c48616$daac3d60$6204100a@INFXINC.NET>
From: "Rob Jose" <rjose89@comcast.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
References: <s124b0a0.053@gwia201.syr.edu>
Subject: Re: Index Size
Date: Thu, 19 Aug 2004 11:03:38 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Grant
Thanks for your response.  I have fixed this issue.  I have indexed 5 MB
worth of text files and I now only use 224 KB.  I was getting 80 MB.  The
only change I made was to change the way I merge my temp index into my prod
index.  My code changed from:
prodWriter.setUseCompoundFile(true);

prodWriter.addIndexes(new IndexReader[] { tempReader });

To:

int iNumDocs = tempReader.numDocs();

for (int y = 0; y < iNumDocs; y++) {

Document tempDoc = tempReader.document(y);

prodWriter.addDocument(tempDoc);

}


I don't know if this is a bug in the IndexWriter.addIndexes(IndexReader)
method or something else I am doing that caused this, but I am getting much
better results now.


Thanks to everyone who helped, I really appreciate it.


Rob

----- Original Message ----- 
From: "Grant Ingersoll" <GSIngers@syr.edu>
To: <lucene-user@jakarta.apache.org>
Sent: Thursday, August 19, 2004 10:51 AM
Subject: Re: Index Size


How many fields do you have and what analyzer are you using?

>>> rjose89@comcast.net 8/19/2004 11:54:25 AM >>>
Otis
I upgraded to 1.4.1.  I deleted all of my old indexes and started from
scratch.  I indexed 2 MB worth of text files and my index size is 8
MB.
Would it be better if I stopped using the
IndexWriter.addIndexes(IndexReader) method and instead traverse the
IndexReader on the temp index and use
IndexWriter.addDocument(Document)
method?

Thanks again for your input, I appreciate it.

Rob
----- Original Message ----- 
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, August 19, 2004 8:00 AM
Subject: Re: Index Size


Just go for 1.4.1 and look at the CHANGES.txt file to see if there
were
any index format changes.  If there were, you'll need to re-index.

Otis

--- Rob Jose <rjose89@comcast.net> wrote:

> Otis
> I am using Lucene 1.3 final.  Would it help if I move to Lucene 1.4
> final?
>
> Rob
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 7:13 AM
> Subject: Re: Index Size
>
>
> I thought this was the case.  I believe there was a bug in one of
the
> recent Lucene releases that caused old CFS files not to be removed
> when
> they should be removed.  This resulted in your index directory
> containing a bunch of old CFS files consuming your disk space.
>
> Try getting a recent nightly build and see if using that takes car
> eof
> your problem.
>
> Otis
>
> --- Rob Jose <rjose89@comcast.net> wrote:
>
> > Hey George
> > Thanks for responding.  I am using windows and I don't see any
> hidden
> > files.
> > I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2,
etc.)
> > files.
> > I have two FDT files and two FDX files. And three FNM files.  Add
> > these
> > files to the deletable and segments file and that is all of the
> files
> > that I
> > have.   The CFS files are appoximately 11 MB each.  The totals I
> gave
> > you
> > before were for all of my indexes together.  This particular index
> > has a
> > size of 21.6 GB.  The files that it indexed have a size of 89 MB.
> >
> > OK - I just removed all of the CFS files from the directory and I
> can
> > still
> > read my indexes.  So know I have to ask what are these CFS files?
> > Why are
> > they created?  And how can I get rid of them if I don't need them.
> I
> > will
> > also take a look at the Lucene website to see if I can find any
> > information.
> >
> > Thanks
> > Rob
> >
> > ----- Original Message ----- 
> > From: "Honey George" <honey_george@yahoo.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Thursday, August 19, 2004 12:29 AM
> > Subject: Re: Index Size
> >
> >
> > Hi,
> >  Please check for hidden files in the index folder. If
> > you are using linx, do something like
> >
> > ls -al <index folder>
> >
> > I am also facing a similar problem where the index
> > size is greater than the data size. In my case there
> > were some hidden temproary files which the lucene
> > creates.
> > That was taking half of the total size.
> >
> > My problem is that after deleting the temporary files,
> > the index size is same as that of the data size. That
> > again seems to be a problem. I am yet to find out the
> > reason..
> >
> > Thanks,
> >    george
> >
> >
> >  --- Rob Jose <rjose89@comcast.net> wrote:
> > > Hello
> > > I have indexed several thousand (52 to be exact)
> > > text files and I keep running out of disk space to
> > > store the indexes.  The size of the documents I have
> > > indexed is around 2.5 GB.  The size of the Lucene
> > > indexes is around 287 GB.  Does this seem correct?
> > > I am not storing the contents of the file, just
> > > indexing and tokenizing.  I am using Lucene 1.3
> > > final.  Can you guys let me know what you are
> > > experiencing?  I don't want to go into production
> > > with something that I should be configuring better.
> > >
> > >
> > > I am not sure if this helps, but I have a temp index
> > > and a real index.  I index the file into the temp
> > > index, and then merge the temp index into the real
> > > index using the addIndexes method on the
> > > IndexWriter.  I have also set the production writer
> > > setUseCompoundFile to true.  I did not set this on
> > > the temp index.  The last thing that I do before
> > > closing the production writer is to call the
> > > optimize method.
> > >
> > > I would really appreciate any ideas to get the index
> > > size smaller if it is at all possible.
> > >
> > > Thanks
> > > Rob


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org