lucene-solr-user mailing list archives

From Phillip Farber
Subject Re: Writing optimized index to different storage?
Date Wed, 30 Sep 2009 22:37:15 GMT
Sorry, I should have given more background. At the moment we have 3.8 
million documents averaging 0.7MB/doc, so our shards are extremely 
large.  We build about 400,000 documents into each shard, resulting in 
about 200GB/shard.  We also use LVM snapshots: we serve a snapshot of 
the shard while we continue to build.

In order to optimize the building shard of around 200GB we need 400GB of 
disk space to allow for the 2x size increase during optimize. Due to the 
nature of snapshotting, the volume containing the snapshot has to be as 
large as the build volume, i.e. 400GB.
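
As a rough sketch of that arithmetic, using only the figures above (the 
function name and the idea of passing the 2x factor as a parameter are 
purely illustrative, not anything Solr or LVM provides):

```python
# Figures taken from this message; the 2x factor is the headroom we
# assume an in-place optimize needs.
SHARD_GB = 200
OPTIMIZE_FACTOR = 2


def volume_sizes(shard_gb, optimize_factor=OPTIMIZE_FACTOR):
    """Return (build_volume_gb, snapshot_volume_gb) for an in-place optimize."""
    build_gb = shard_gb * optimize_factor
    # The snapshot volume has to be as large as the build volume.
    snapshot_gb = build_gb
    return build_gb, snapshot_gb


build_gb, snapshot_gb = volume_sizes(SHARD_GB)
print(f"build volume: {build_gb}GB, snapshot volume: {snapshot_gb}GB")
```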

If we could write the optimized build shard elsewhere instead of "in 
place" we could avoid the need for the serving volume to match the size 
of the building volume.

We'd like to avoid the need to have 200GB+ hanging around just to 
optimize.

The responses we got on whether the optimize could write "elsewhere" make 
it clear that's not a solution.

I posted another question to the list a little while ago asking whether 
mergeFactor=1 would give us a single-segment index that is always 
optimized, so that we don't have the 2x overhead.

However, running a build with mergeFactor=1 shows that lots of segments 
still get created and merged, and that the index grows in size but also 
shrinks somewhat at intervals.  It is not clear how big the index is at 
any given point in time.
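
One way to get at that would be to poll the index directory during a 
build and record the peak size. The following is only a minimal sketch: 
the function names, the polling interval, and the assumption that all 
index files sit directly in one flat directory (e.g. 
SOLR_HOME/data/index) are ours, not anything Solr provides.

```python
import os
import time


def index_size_bytes(index_dir):
    """Sum the sizes of the regular files directly inside index_dir."""
    total = 0
    for entry in os.scandir(index_dir):
        try:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
        except FileNotFoundError:
            # A merge can delete a segment file between listing and stat.
            continue
    return total


def watch(index_dir, interval_s=60, samples=None):
    """Print the size every interval_s seconds and return the peak in bytes.

    With samples=None this loops until interrupted.
    """
    peak = 0
    taken = 0
    while samples is None or taken < samples:
        size = index_size_bytes(index_dir)
        peak = max(peak, size)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} size={size / 1e9:.2f}GB peak={peak / 1e9:.2f}GB")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
    return peak
```

Running something like watch("/path/to/SOLR_HOME/data/index") alongside 
the build (the path here is a placeholder) would show how far the index 
grows before each merge shrinks it again. Sampling can miss short-lived 
peaks between polls.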

Chris Hostetter wrote:
> : Is it possible to tell Solr or Lucene, when optimizing, to write the files
> : that constitute the optimized index to somewhere other than
> : SOLR_HOME/data/index or is there something about the optimize that requires
> : the final segment to be created in SOLR_HOME/data/index?
> For what purpose?
> XY Problem
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also:
> -Hoss
