lucene-java-user mailing list archives

From maureen tanuwidjaja <autumn_musi...@yahoo.com>
Subject Re: Urgent : How much actually the disk space needed to optimize the index?
Date Tue, 13 Mar 2007 11:31:11 GMT
Hi Mike..
  
  "One thing that stands out in your listing is: your norms file
  (_1ke1.nrm) is enormous compared to all other files.  Are you indexing
  many tiny docs where each docs has highly variable fields or something?"
  
  Yes, I am also confused about why this .nrm file is so tremendous in size.
  I am indexing a total of 657739 XML documents.
  The total number of fields is 37552 (I am using the XML tags as the fields).
  
  
  OK, this is the listing of the index files before I optimize...
  
  D:\dual_index\DI>dir
   Volume in drive D is SELAB
   Volume Serial Number is 44A7-7D50
  
   Directory of D:\dual_index\DI
  
  03/13/2007  09:29 AM    <DIR>          .
  03/13/2007  09:29 AM    <DIR>          ..
  03/13/2007  05:56  AM                 20 segments.gen
  03/13/2007  05:56 AM               712 segments_34rz
  03/13/2007  01:56 AM     2,491,551,624 _16v6.cfs
  03/13/2007  04:30 AM     2,140,779,671 _1fft.cfs
  03/13/2007  04:42 AM        76,813,296 _1gao.cfs
  03/13/2007  04:53 AM        78,626,916 _1h5j.cfs
  03/13/2007  05:06 AM       101,981,232 _1i0e.cfs
  03/13/2007  05:24 AM       182,544,071 _1iv9.cfs
  03/13/2007  05:43 AM       185,825,480 _1jq4.cfs
  03/13/2007  05:44 AM        10,569,811 _1jt7.cfs
  03/13/2007  05:46 AM        12,100,629 _1jwa.cfs
  03/13/2007  05:48 AM        12,127,317 _1jzd.cfs
  03/13/2007  05:49 AM        11,478,747 _1k2g.cfs
  03/13/2007  05:51 AM        11,483,235 _1k5j.cfs
  03/13/2007  05:53 AM        11,864,730 _1k8m.cfs
  03/13/2007  05:54 AM        10,966,413 _1kbp.cfs
  03/13/2007  05:55 AM           936,961 _1kc0.cfs
  03/13/2007  05:55 AM         1,144,949 _1kcb.cfs
  03/13/2007  05:55 AM         1,314,375 _1kcm.cfs
  03/13/2007  05:55 AM           951,460 _1kcx.cfs
  03/13/2007  05:55 AM         1,175,376 _1kd8.cfs
  03/13/2007  05:55 AM         1,171,232 _1kdj.cfs
  03/13/2007  05:55 AM         1,176,141 _1kdu.cfs
  03/13/2007  05:56 AM           124,219 _1kdv.cfs
  03/13/2007  05:56 AM           117,425 _1kdw.cfs
  03/13/2007  05:56 AM           158,673 _1kdx.cfs
  03/13/2007  05:56 AM           117,591 _1kdy.cfs
  03/12/2007  03:24 PM     5,594,336,501 _8km.cfs
  03/12/2007  06:07 PM     3,322,027,221 _h59.cfs
  03/12/2007  08:51 PM     3,017,631,411 _ppw.cfs
  03/12/2007  11:25 PM     2,383,550,153 _yaj.cfs
                31 File(s) 19,664,647,592 bytes
                 2 Dir(s)  20,398,489,600 bytes free
  
----------------------------------------------------------------------------------------------
  
  And there is another thing I want to ask: is searching an optimized index
significantly faster than searching an unoptimized one?
  
  Searches on this unoptimized index take me anywhere from 40 seconds to 3 minutes.
  
  How about memory consumption? Will searching the optimized index consume more memory
than searching the unoptimized one?
  
  
  
  Thanks a lot
  
  Regards,
  Maureen
  
  
Michael McCandless <lucene@mikemccandless.com> wrote:  
"maureen tanuwidjaja"  wrote:

>   How much disk space is actually needed to optimize the index? The 
>   explanation given in the documentation seems to be very different from 
>   the practical situation.
>   
>   I have an index of size 18.6 G and I am going to optimize it. I 
>   keep this index on a mobile hard disk with a capacity of 100 GB. I did 
>   not use any index reader; I merely call the index writer to optimize 
>   this index. However, to my surprise, while optimizing, the index size 
>   has grown to occupy almost all the free space. I am pretty sure that it 
>   will later terminate because there is not sufficient disk space.
>   
>   These are the contents of the index directory:
>   ------------------------------------------------------------------------------------------
>   03/13/2007  02:14 PM    <DIR>          .
>   03/13/2007  02:14 PM    <DIR>          ..
>   03/13/2007  02:14  PM                 20 segments.gen
>   03/13/2007  02:14  PM                 67 segments_34s4
>   03/13/2007  12:06  PM                  0 write.lock
>   03/13/2007  02:14 PM    41,705,009,152 _1ke1.cfs
>   03/13/2007  12:15 PM     1,638,320,227 _1ke1.fdt
>   03/13/2007  12:15 PM         4,461,912 _1ke1.fdx
>   03/13/2007  12:09 PM         6,295,065 _1ke1.fnm
>   03/13/2007  12:26 PM       232,520,666 _1ke1.frq
>   03/13/2007  02:08 PM    44,927,549,671 _1ke1.nrm
>   03/13/2007  12:26 PM       170,766,513 _1ke1.prx
>   03/13/2007  12:26 PM         1,281,924 _1ke1.tii
>   03/13/2007  12:26 PM       103,094,835 _1ke1.tis
>   03/13/2007  02:14 PM        51,688,575 _1ke1.tvd
>   03/13/2007  02:14 PM       882,304,866 _1ke1.tvf
>   03/13/2007  02:14 PM         4,461,916 _1ke1.tvx
>   03/12/2007  03:24 PM     5,594,336,501 _8km.cfs


As best I know, it should only require 2X the disk space.  In your
case this means you should only have needed 18.6 GB of free space (ie,
1X is the current index, then another 1X in free space).
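
The 2X rule of thumb above can be sketched as a quick pre-flight check. This is only an illustration in Python, not part of Lucene: the helper names (`index_size_bytes`, `has_room_to_optimize`, `check_directory`) and the `factor` parameter are hypothetical, and the byte counts are copied from the pre-optimize `dir` listing in this thread.

```python
import os
import shutil


def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of the regular files directly inside index_dir."""
    with os.scandir(index_dir) as entries:
        return sum(entry.stat().st_size for entry in entries if entry.is_file())


def has_room_to_optimize(index_bytes: int, free_bytes: int, factor: float = 2.0) -> bool:
    """Return True if free space covers the extra (factor - 1) x index size.

    factor=2.0 encodes the "2X" rule of thumb: 1X is the existing index,
    and another 1X must be available as free space.
    """
    extra_needed = (factor - 1.0) * index_bytes
    return free_bytes >= extra_needed


def check_directory(index_dir: str, factor: float = 2.0) -> bool:
    """Apply the rule of thumb to a real directory on disk."""
    size = index_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    return has_room_to_optimize(size, free, factor)


# Figures from the pre-optimize listing: total file size and bytes free.
INDEX_BYTES = 19_664_647_592
FREE_BYTES = 20_398_489_600

print(has_room_to_optimize(INDEX_BYTES, FREE_BYTES))              # 2X rule
print(has_room_to_optimize(INDEX_BYTES, FREE_BYTES, factor=3.0))  # stricter margin
```

By this check the disk had just enough room under the 2X assumption, which is why the observed growth points to something other than ordinary merge overhead.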

So something odd is happening here.

One thing that stands out in your listing is: your norms file
(_1ke1.nrm) is enormous compared to all other files.  Are you indexing
many tiny docs where each docs has highly variable fields or something?

Hmmm.  In fact if you are doing this, then on merge, the norms (which
are not stored "sparsely") could in fact grow far larger than 2X.
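
To make the non-sparse growth concrete, here is a rough back-of-the-envelope estimate in Python. It assumes one norm byte per document for every field that has norms in the merged segment, and that all fields carry norms; both are assumptions for illustration, and the function name is hypothetical. The document and field counts are the ones reported in this thread.

```python
def estimated_norms_bytes(num_docs: int, num_fields_with_norms: int,
                          bytes_per_norm: int = 1) -> int:
    """Dense (non-sparse) norms estimate: every document gets a norm entry
    for every field with norms, whether or not the document has that field."""
    return num_docs * num_fields_with_norms * bytes_per_norm


NUM_DOCS = 657_739    # XML documents reported in this thread
NUM_FIELDS = 37_552   # distinct fields (XML tags) reported in this thread

estimate = estimated_norms_bytes(NUM_DOCS, NUM_FIELDS)
print(f"{estimate:,} bytes (~{estimate / 1e9:.1f} GB)")
# prints "24,699,414,928 bytes (~24.7 GB)"
```

Under these assumptions the norms alone reach tens of gigabytes once everything is merged into a single segment, the same order of magnitude as the _1ke1.nrm file in the listing, even though the estimate does not match its exact size.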

Can you send a listing of the 18.6 GB index before optimizing?

Mike
