lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Jose" <rjos...@comcast.net>
Subject Re: Index Size
Date Thu, 19 Aug 2004 13:36:23 GMT
Hey George
Thanks for responding.  I am using windows and I don't see any hidden files.
I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2, etc.) files.
I have two FDT files and two FDX files. And three FNM files.  Add these
files to the deletable and segments file and that is all of the files that I
have.   The CFS files are appoximately 11 MB each.  The totals I gave you
before were for all of my indexes together.  This particular index has a
size of 21.6 GB.  The files that it indexed have a size of 89 MB.

OK - I just removed all of the CFS files from the directory and I can still
read my indexes.  So know I have to ask what are these CFS files?  Why are
they created?  And how can I get rid of them if I don't need them.  I will
also take a look at the Lucene website to see if I can find any information.

Thanks
Rob

----- Original Message ----- 
From: "Honey George" <honey_george@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, August 19, 2004 12:29 AM
Subject: Re: Index Size


Hi,
 Please check for hidden files in the index folder. If
you are using linx, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temproary files which the lucene
creates.
That was taking half of the total size.

My problem is that after deleting the temporary files,
the index size is same as that of the data size. That
again seems to be a problem. I am yet to find out the
reason..

Thanks,
   george


 --- Rob Jose <rjose89@comcast.net> wrote:
> Hello
> I have indexed several thousand (52 to be exact)
> text files and I keep running out of disk space to
> store the indexes.  The size of the documents I have
> indexed is around 2.5 GB.  The size of the Lucene
> indexes is around 287 GB.  Does this seem correct?
> I am not storing the contents of the file, just
> indexing and tokenizing.  I am using Lucene 1.3
> final.  Can you guys let me know what you are
> experiencing?  I don't want to go into production
> with something that I should be configuring better.
>
>
> I am not sure if this helps, but I have a temp index
> and a real index.  I index the file into the temp
> index, and then merge the temp index into the real
> index using the addIndexes method on the
> IndexWriter.  I have also set the production writer
> setUseCompoundFile to true.  I did not set this on
> the temp index.  The last thing that I do before
> closing the production writer is to call the
> optimize method.
>
> I would really appreciate any ideas to get the index
> size smaller if it is at all possible.
>
> Thanks
> Rob





___________________________________________________________ALL-NEW Yahoo!
Messenger - all new features - even more fun!  http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message