lucene-dev mailing list archives

From Dmitry Serebrennikov <>
Subject Re: strange behaviour in CompoundFileReader fileModified and touchFile
Date Fri, 01 Oct 2004 19:59:11 GMT
Bernhard Messer wrote:

> Dmitry,
>> Bernhard Messer wrote:
>>> hi,
>>> CompoundFileReader class contains some code where i can't follow the 
>>> idea behind it. Maybe somebody else can switch on the light for me, 
>>> so i can see the track. There are 2 public methods which definitely 
>>> don't work as expected. I know, extending Directory forces one to 
>>> implement the methods, but in that particular case the 
>>> implementation is just confusing me and maybe other people too.
>>>    public long fileModified(String name) throws IOException {
>>>        return directory.fileModified(fileName);
>>>    }
>>>    public void touchFile(String name) throws IOException {
>>>        directory.touchFile(fileName);
>>>    }
>>> Looking at the implementation, both methods are working on the 
>>> compound filename itself, regardless of what value the filename 
>>> passed in has. It would be much more understandable if these 
>>> methods threw an UnsupportedOperationException. The other way is 
>>> to change them so that the underlying directory method 
>>> calls get the real filename passed in and not the compound 
>>> filename itself.
>> Well, the reason I did it this way is because I thought this would be 
>> the least amount of disruption to the programs out there that might 
>> be using these APIs. You can't really pass the "name" into the 
>> directory since it doesn't know about these as individual files. 
>> Directory only knows about the compound file.
> I'm not sure if this is correct. Looking at the implementation, for 
> example in FSDirectory, every file can be touched, no matter whether 
> it is related to Lucene or not.

Yes, but the usual files that you find in the old-style segment, the 
ones that the CompoundFileReader and the rest of Lucene know about, are 
not present on the file system when the compound files are used. So 
FSDirectory only knows about the compound file, while everything up from 
the CompoundFileReader still thinks that there are multiple files in a 
given segment.
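
To make the two views concrete, here is a minimal sketch. It does not use the real Lucene classes: FlatStore and CompoundLayout are hypothetical stand-ins for FSDirectory and CompoundFileReader, and the names "_1.cfs", "_1.fdt" and "_1.f1" as well as the offsets are only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for what the file system level (FSDirectory) can see.
class FlatStore {
    private final Set<String> names = new TreeSet<>();

    void add(String name) { names.add(name); }

    boolean exists(String name) { return names.contains(name); }
}

// Hypothetical stand-in for the compound reader: it exposes the old-style
// segment file names, but all of them live inside one physical file.
class CompoundLayout {
    private final String compoundName;
    // Logical file name -> offset inside the compound file (illustrative values).
    private final Map<String, Long> offsets = new LinkedHashMap<>();

    CompoundLayout(FlatStore store, String compoundName) {
        this.compoundName = compoundName;
        store.add(compoundName); // the only file that physically exists
    }

    void addEntry(String logicalName, long offset) { offsets.put(logicalName, offset); }

    boolean fileExists(String logicalName) { return offsets.containsKey(logicalName); }

    String physicalName() { return compoundName; }
}

class LayoutDemo {
    public static void main(String[] args) {
        FlatStore store = new FlatStore();
        CompoundLayout layout = new CompoundLayout(store, "_1.cfs");
        layout.addEntry("_1.fdt", 0L);
        layout.addEntry("_1.f1", 128L);

        System.out.println("store sees _1.cfs:  " + store.exists("_1.cfs"));
        System.out.println("store sees _1.fdt:  " + store.exists("_1.fdt"));
        System.out.println("layout sees _1.fdt: " + layout.fileExists("_1.fdt"));
    }
}
```

The lower layer has no entry for "_1.fdt" at all, so a name coming from the upper layers cannot simply be handed down to it.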

>> To implement the fileModified() fully, you could just store 
>> timestamps in the file, but then they would just be the same as the 
>> timestamp on the overall file, unless there was also touchFile() 
>> support. To implement touchFile(), you'd have to open the file in 
>> random access and update the timestamp field of an individual file. 
>> This can certainly be done, but I didn't have a need for it. You 
>> could throw the Unsupported exception, but this could make callers 
>> have to change. Anyway, the compromise I chose was to treat a "touch" 
>> on one file as if a "touch" on all files for the segment. This works 
>> in most usages. The only time this would be a problem is if you 
>> implemented some kind of timestamp set/check that would depend on 
>> files in a segment having different timestamps. This might be 
>> important for updating segments, but since this is never done, I'm 
>> not sure this is really that useful. Do you have a case in mind when 
>> this is proving to be a limitation?
> Agree with you. I don't see the need for a full implementation of 
> touch file and lastModified for the internally used compound file parts 
> or any other file. But the way it is implemented now, it just does 
> something different from what the user of the API would expect. The 
> idea i had in mind was to implement it in a way that the compound file 
> can be touched and lastModified can be read as well. If the user passes 
> in a filename different from the compound file name, either an 
> UnsupportedOperationException or, even better, an IOException could be 
> thrown.

See above. The user knows about the old-style files only; it does not 
know about the .cfs file. On the other hand, FSDirectory knows only of 
the .cfs file and not of the .f1, .f2, .fdt, and so on.

What the implementation is trying to do (unless I'm forgetting 
something) is to accept the .f1, .f2, etc names as input and change the 
timestamp of the .cfs file regardless of which particular segment file 
was requested. This makes it look like your call resulted in the 
expected behavior (in that calling fileModified with the same name will 
give back the same timestamp), but also as if *someone else* had 
called touchFile on all the other files as well. I think this is not 
unreasonable and provides the most compatible behavior for the upper 
layers, short of a full implementation. Does this make sense?
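
Here is a small sketch of that behavior, in case it helps. It is not the actual Lucene code: PhysicalDir and CompoundTouchView are hypothetical stand-ins for the underlying Directory and for CompoundFileReader, the file names are illustrative, and a logical counter replaces real timestamps (and unchecked exceptions replace IOException) so the example stays deterministic and short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal directory: one "last modified" value per physical file.
class PhysicalDir {
    private final Map<String, Long> modified = new HashMap<>();
    private long clock = 0; // logical clock instead of wall-clock time

    void create(String name) { modified.put(name, ++clock); }

    long fileModified(String name) {
        Long t = modified.get(name);
        if (t == null) throw new IllegalArgumentException("no such file: " + name);
        return t;
    }

    void touchFile(String name) {
        if (!modified.containsKey(name)) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        modified.put(name, ++clock);
    }
}

// Hypothetical compound view: the "name" argument is accepted but ignored,
// and every call is redirected to the single compound file.
class CompoundTouchView {
    private final PhysicalDir directory;
    private final String fileName; // name of the compound (.cfs) file

    CompoundTouchView(PhysicalDir directory, String fileName) {
        this.directory = directory;
        this.fileName = fileName;
    }

    long fileModified(String name) { return directory.fileModified(fileName); }

    void touchFile(String name) { directory.touchFile(fileName); }
}

class TouchDemo {
    public static void main(String[] args) {
        PhysicalDir dir = new PhysicalDir();
        dir.create("_1.cfs");
        CompoundTouchView view = new CompoundTouchView(dir, "_1.cfs");

        long before = view.fileModified("_1.fdt");
        view.touchFile("_1.f1");

        // The touched name reports a newer value ...
        System.out.println(view.fileModified("_1.f1") > before);
        // ... and so does every other name in the segment.
        System.out.println(view.fileModified("_1.f1") == view.fileModified("_1.fdt"));
    }
}
```

A caller that touches "_1.f1" and then reads it back gets a consistent answer; the only visible side effect is that "_1.fdt" and the rest appear to have been touched at the same moment.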

> what do you think ?
> Bernhard
>>> just a thought ;-)
>>> bernhard
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
