lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: detected corrupted index / performance improvement
Date Thu, 07 Feb 2008 23:35:33 GMT
I don't think that is true - but I'm probably wrong :).

My understanding is that several files are written in parallel
(during the merge), causing random access. After the files are
written, they are all reread and written as a CFS (compound) file -
an essentially sequential pass, although alternating between the
reads and the writes is still going to cause head movement.
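
As an illustration of that second, sequential phase, here is a
minimal sketch of concatenating already-written segment files into a
single compound file. The file names are made up, and a real compound
file also stores a table of entry names and offsets, omitted here:

import java.io.*;

// Illustrative only: each already-written segment file is reread
// front to back and appended to one compound file, so both the reads
// and the writes proceed sequentially within each file.
public class CompoundFileSketch {
    public static void main(String[] args) throws IOException {
        String[] parts = { "_0.fdt", "_0.fdx", "_0.tvx", "_0.tvf", "_0.tvd" };
        try (OutputStream cfs =
                 new BufferedOutputStream(new FileOutputStream("_0.cfs"))) {
            byte[] buf = new byte[8192];
            for (String part : parts) {
                try (InputStream in =
                         new BufferedInputStream(new FileInputStream(part))) {
                    for (int n; (n = in.read(buf)) != -1; ) {
                        cfs.write(buf, 0, n);   // sequential append
                    }
                }
            }
        }
    }
}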

The code:

private IndexOutput tvx, tvf, tvd;              // To write term vectors
private FieldsWriter fieldsWriter;              // To write stored fields

is my clue that several files are written at once.
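
As a toy illustration of that first phase (made-up code, with names
borrowed from the snippet above), interleaving small appends across
several files that are open at once looks like this:

import java.io.*;

// Illustrative only: three streams are open at the same time and each
// document appends a little data to every one of them, so the disk
// alternates between three growing files instead of appending to one.
public class InterleavedWriteSketch {
    public static void main(String[] args) throws IOException {
        try (DataOutputStream tvx = open("_0.tvx");
             DataOutputStream tvd = open("_0.tvd");
             DataOutputStream tvf = open("_0.tvf")) {
            for (int doc = 0; doc < 1000; doc++) {
                tvx.writeLong(doc);           // index entry per document
                tvd.writeInt(doc);            // per-document vector data
                tvf.writeUTF("field" + doc);  // per-field vector data
            }
        }
    }

    private static DataOutputStream open(String name) throws IOException {
        return new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(name)));
    }
}

On a single disk, that alternation is what pushes the access pattern
toward random writes rather than one long append.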

On Feb 7, 2008, at 5:19 PM, Mike Klaas wrote:

>
> On 7-Feb-08, at 2:00 PM, robert engels wrote:
>
>> My point is that commit needs to be used in most applications, and  
>> the commit in Lucene is very slow.
>>
>> You don't have 2x the IO cost, mainly because only the log file
>> needs to be sync'd. The index only has to be sync'd eventually,
>> in order to prune the log file - this can be done in the
>> background, improving the performance of the update and commit
>> cycle.
>>
>> Also, writing the log file is very efficient, because it is an
>> append/sequential operation. Writing the segments means writing
>> multiple files - essentially causing random access writes.
>
> For large segments, multiple sequentially-written large files  
> should perform similarly to one large sequentially-written file.   
> It is only close to random access on the smallest segments (which a  
> sufficiently-large flush-by-ram shouldn't produce).
>
> -Mike
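
For context, here is a minimal sketch of the log-then-sync commit
pattern described in the quoted text. The class and file layout are
hypothetical and not part of Lucene; the point is only that the
commit path touches one sequential file:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Lucene's API: commits append to and
// sync only the sequential log; the index files are sync'd later, in
// the background, after which the log can be pruned.
public class LogCommitSketch {
    private final FileChannel log;

    public LogCommitSketch(Path logPath) throws IOException {
        log = FileChannel.open(logPath,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Fast path run on every commit: one sequential append, one sync.
    public void commit(byte[] record) throws IOException {
        log.position(log.size());           // append at the end
        log.write(ByteBuffer.wrap(record));
        log.force(false);                   // sync the log, not the index
    }

    // Background path: once the index files themselves have been
    // sync'd (not shown), the log is no longer needed for recovery.
    public void pruneLog() throws IOException {
        log.truncate(0);
    }
}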

