lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: optimized disk usage when creating a compound index
Date Sat, 07 Aug 2004 10:51:29 GMT
Hi Christoph,

just reviewed TestCompoundFile.java, and you were absolutely right 
that the test would fail on Windows. The test is now changed so that 
a second file with identical data is created; this file can be used 
in the test cases for the comparisons against the compound store. 
The modified test now runs fine on both Windows and Linux platforms.

In the attachment you'll find the new TestCompoundFile source.
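
The essential pattern, excerpted from the attached test, is to open 
the original stream before CompoundFileWriter.close() deletes it:

    createSequenceFile(dir, "d1", (byte) 0, 15);
    createSequenceFile(dir, "d2", (byte) 0, 114);

    // open the originals first; close() below copies and then deletes them
    InputStream expected1 = dir.openFile("d1");
    InputStream expected2 = dir.openFile("d2");

    CompoundFileWriter csw = new CompoundFileWriter(dir, "d.csf");
    csw.addFile("d1");
    csw.addFile("d2");
    csw.close();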

hope this helps
Bernhard

Christoph Goller wrote:

> It will not be lost. I have already reviewed it.
> There are some open issues concerning the changes in
> TestCompoundFile, that I want to discuss with Bernhard
> and then (hopefully next week) I will commit it.
>
> Christoph
>
> Erik Hatcher wrote:
>
>> Bernhard,
>>
>> Impressive work.  In order to prevent this from being lost in
>> e-mail, could you please create a new Bugzilla issue for each of
>> your great patches and attach the differences as CVS patches
>> (cvs diff -Nu)?
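>>
>> For example, from the root of a jakarta-lucene checkout, something
>> along these lines would do (paths as in your patch):
>>
>>     cvs diff -Nu src/java/org/apache/lucene/index/CompoundFileWriter.java > CompoundFileWriter.diff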
>>
>> Many thanks for these contributions.
>>
>>     Erik
>>
>> On Aug 6, 2004, at 3:52 AM, Bernhard Messer wrote:
>>
>>> hi developers,
>>>
>>> I made some measurements of Lucene's disk usage during index
>>> creation. It's no surprise that during index creation, in particular
>>> during index optimization, more disk space is needed than the final
>>> index will occupy. What I didn't expect is how large the difference
>>> in disk usage is when switching the compound file option on or off.
>>> With the compound file option, disk usage during index creation is
>>> more than 3 times the final index size. This can be a pain in the
>>> neck for projects like Nutch, where huge datasets are indexed. The
>>> growth comes from the fact that SegmentMerger creates the complete
>>> compound file first, before deleting the original, now unused files.
>>> So I patched the SegmentMerger and CompoundFileWriter classes so
>>> that each source file is deleted immediately after its data has been
>>> copied into the compound file. The result is that the required peak
>>> disk space drops from a factor of 3 to a factor of 2.
>>> The change also required some modifications to the TestCompoundFile
>>> class. Several test methods compared the original file to its
>>> counterpart in the compound store; with the modified SegmentMerger
>>> and CompoundFileWriter, the original file has already been deleted
>>> and can no longer be opened.
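>>>
>>> The heart of the patch is the copy loop in CompoundFileWriter.close()
>>> (full source below):
>>>
>>>     while(it.hasNext()) {
>>>         FileEntry fe = (FileEntry) it.next();
>>>         fe.dataOffset = os.getFilePointer();
>>>         copyFile(fe, os, buffer);
>>>
>>>         // immediately delete the copied file to save disk space
>>>         directory.deleteFile((String) fe.file);
>>>     }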
>>>
>>> Here are some statistics on disk usage during index creation:
>>>
>>> compound option off:
>>> final index size:    380 KB      max. disk space used:    408 KB
>>> final index size:  11079 KB      max. disk space used:  11381 KB
>>> final index size: 204148 KB      max. disk space used:  20739 KB
>>>
>>> using the compound index:
>>> final index size:    380 KB      max. disk space used:   1145 KB
>>> final index size:  11079 KB      max. disk space used:  33544 KB
>>> final index size: 204148 KB      max. disk space used: 614977 KB
>>>
>>> using the compound index with the patch:
>>> final index size:    380 KB      max. disk space used:    777 KB
>>> final index size:  11079 KB      max. disk space used:  22464 KB
>>> final index size: 204148 KB      max. disk space used: 410829 KB
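>>>
>>> For reference, the compound file option is toggled on the
>>> IndexWriter; a minimal sketch, assuming the current 1.4 API:
>>>
>>>     // assumes: import org.apache.lucene.index.IndexWriter;
>>>     //          import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>     // dir is an org.apache.lucene.store.Directory
>>>     IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
>>>     writer.setUseCompoundFile(true);  // or false for the multi-file format
>>>     // ... add documents ...
>>>     writer.optimize();                // peak disk usage occurs here
>>>     writer.close();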
>>>
>>> The change was tested under Windows and Linux without any negative
>>> side effects; all JUnit test cases pass. In the attachment you'll
>>> find all the necessary files:
>>>
>>> SegmentMerger.java
>>> CompoundFileWriter.java
>>> TestCompoundFile.java
>>>
>>> SegmentMerger.diff
>>> CompoundFileWriter.diff
>>> TestCompoundFile.diff
>>>
>>> keep moving
>>> Bernhard
>>>
>>>
>>> Index: src/java/org/apache/lucene/index/CompoundFileWriter.java
>>> ===================================================================
>>> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileWriter.java,v
>>> retrieving revision 1.3
>>> diff -r1.3 CompoundFileWriter.java
>>> 163a164,166
>>>
>>>>
>>>>                 // immediately delete the copied file to save disk space
>>>>                 directory.deleteFile((String) fe.file);
>>>
>>>
>>> package org.apache.lucene.index;
>>>
>>> /**
>>>  * Copyright 2004 The Apache Software Foundation
>>>  *
>>>  * Licensed under the Apache License, Version 2.0 (the "License");
>>>  * you may not use this file except in compliance with the License.
>>>  * You may obtain a copy of the License at
>>>  *
>>>  *     http://www.apache.org/licenses/LICENSE-2.0
>>>  *
>>>  * Unless required by applicable law or agreed to in writing, software
>>>  * distributed under the License is distributed on an "AS IS" BASIS,
>>>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>  * See the License for the specific language governing permissions and
>>>  * limitations under the License.
>>>  */
>>>
>>> import org.apache.lucene.store.Directory;
>>> import org.apache.lucene.store.OutputStream;
>>> import org.apache.lucene.store.InputStream;
>>> import java.util.LinkedList;
>>> import java.util.HashSet;
>>> import java.util.Iterator;
>>> import java.io.IOException;
>>>
>>>
>>> /**
>>>  * Combines multiple files into a single compound file.
>>>  * The file format:<br>
>>>  * <ul>
>>>  *     <li>VInt fileCount</li>
>>>  *     <li>{Directory}
>>>  *         fileCount entries with the following structure:</li>
>>>  *         <ul>
>>>  *             <li>long dataOffset</li>
>>>  *             <li>UTFString extension</li>
>>>  *         </ul>
>>>  *     <li>{File Data}
>>>  *         fileCount entries with the raw data of the corresponding file</li>
>>>  * </ul>
>>>  *
>>>  * The fileCount integer indicates how many files are contained in
>>>  * this compound file. The {directory} that follows has that many
>>>  * entries. Each directory entry contains a long pointer to the start
>>>  * of this file's data section, and a UTF String with that file's
>>>  * extension.
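>>>  *
>>>  * For example (illustrative names and sizes; ASCII file names
>>>  * assumed), a compound file holding "_1.fnm" (12 bytes) and
>>>  * "_1.frq" (30 bytes) would be laid out as:
>>>  * <pre>
>>>  *   VInt 2           fileCount
>>>  *   long 31          data offset of "_1.fnm"
>>>  *   UTF  "_1.fnm"
>>>  *   long 43          data offset of "_1.frq"
>>>  *   UTF  "_1.frq"
>>>  *   [12 bytes]       raw data of "_1.fnm"
>>>  *   [30 bytes]       raw data of "_1.frq"
>>>  * </pre>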
>>>  *
>>>  * @author Dmitry Serebrennikov
>>>  * @version $Id: CompoundFileWriter.java,v 1.3 2004/03/29 22:48:02 cutting Exp $
>>>  */
>>> final class CompoundFileWriter {
>>>
>>>     private static final class FileEntry {
>>>         /** source file */
>>>         String file;
>>>
>>>         /** temporary holder for the start of directory entry for this file */
>>>         long directoryOffset;
>>>
>>>         /** temporary holder for the start of this file's data section */
>>>         long dataOffset;
>>>     }
>>>
>>>
>>>     private Directory directory;
>>>     private String fileName;
>>>     private HashSet ids;
>>>     private LinkedList entries;
>>>     private boolean merged = false;
>>>
>>>
>>>     /** Create the compound stream in the specified file. The file
>>>      *  name is the entire name (no extensions are added).
>>>      */
>>>     public CompoundFileWriter(Directory dir, String name) {
>>>         if (dir == null)
>>>             throw new IllegalArgumentException("Missing directory");
>>>         if (name == null)
>>>             throw new IllegalArgumentException("Missing name");
>>>
>>>         directory = dir;
>>>         fileName = name;
>>>         ids = new HashSet();
>>>         entries = new LinkedList();
>>>     }
>>>
>>>     /** Returns the directory of the compound file. */
>>>     public Directory getDirectory() {
>>>         return directory;
>>>     }
>>>
>>>     /** Returns the name of the compound file. */
>>>     public String getName() {
>>>         return fileName;
>>>     }
>>>
>>>     /** Add a source stream. If sourceDir is null, it is set to the
>>>      *  same value as the directory where this compound stream exists.
>>>      *  The id is the string by which the sub-stream will be known in
>>>      *  the compound stream. The caller must ensure that the id is
>>>      *  unique. If the id is null, it is set to the name of the source
>>>      *  file.
>>>      */
>>>     public void addFile(String file) {
>>>         if (merged)
>>>             throw new IllegalStateException(
>>>                 "Can't add extensions after merge has been called");
>>>
>>>         if (file == null)
>>>             throw new IllegalArgumentException(
>>>                 "Missing source file");
>>>
>>>         if (! ids.add(file))
>>>             throw new IllegalArgumentException(
>>>                 "File " + file + " already added");
>>>
>>>         FileEntry entry = new FileEntry();
>>>         entry.file = file;
>>>         entries.add(entry);
>>>     }
>>>
>>>     /** Merge files with the extensions added up to now.
>>>      *  All files with these extensions are combined sequentially into
>>>      *  the compound stream. After successful merge, the source files
>>>      *  are deleted.
>>>      */
>>>     public void close() throws IOException {
>>>         if (merged)
>>>             throw new IllegalStateException(
>>>                 "Merge already performed");
>>>
>>>         if (entries.isEmpty())
>>>             throw new IllegalStateException(
>>>                 "No entries to merge have been defined");
>>>
>>>         merged = true;
>>>
>>>         // open the compound stream
>>>         OutputStream os = null;
>>>         try {
>>>             os = directory.createFile(fileName);
>>>
>>>             // Write the number of entries
>>>             os.writeVInt(entries.size());
>>>
>>>             // Write the directory with all offsets at 0.
>>>             // Remember the positions of directory entries so that
>>>             // we can adjust the offsets later
>>>             Iterator it = entries.iterator();
>>>             while(it.hasNext()) {
>>>                 FileEntry fe = (FileEntry) it.next();
>>>                 fe.directoryOffset = os.getFilePointer();
>>>                 os.writeLong(0);    // for now
>>>                 os.writeString(fe.file);
>>>             }
>>>
>>>             // Open the files and copy their data into the stream.
>>>             // Remember the locations of each file's data section.
>>>             byte buffer[] = new byte[1024];
>>>             it = entries.iterator();
>>>             while(it.hasNext()) {
>>>                 FileEntry fe = (FileEntry) it.next();
>>>                 fe.dataOffset = os.getFilePointer();
>>>                 copyFile(fe, os, buffer);
>>>
>>>                 // immediately delete the copied file to save disk space
>>>                 directory.deleteFile((String) fe.file);
>>>             }
>>>
>>>             // Write the data offsets into the directory of the
>>>             // compound stream
>>>             it = entries.iterator();
>>>             while(it.hasNext()) {
>>>                 FileEntry fe = (FileEntry) it.next();
>>>                 os.seek(fe.directoryOffset);
>>>                 os.writeLong(fe.dataOffset);
>>>             }
>>>
>>>             // Close the output stream. Set the os to null before
>>>             // trying to close so that if an exception occurs during
>>>             // the close, the finally clause below will not attempt to
>>>             // close the stream a second time.
>>>             OutputStream tmp = os;
>>>             os = null;
>>>             tmp.close();
>>>
>>>         } finally {
>>>             if (os != null) try { os.close(); } catch (IOException e) { }
>>>         }
>>>     }
>>>
>>>     /** Copy the contents of the file with specified extension into the
>>>      *  provided output stream. Use the provided buffer for moving data
>>>      *  to reduce memory allocation.
>>>      */
>>>     private void copyFile(FileEntry source, OutputStream os, byte buffer[])
>>>     throws IOException
>>>     {
>>>         InputStream is = null;
>>>         try {
>>>             long startPtr = os.getFilePointer();
>>>
>>>             is = directory.openFile(source.file);
>>>             long length = is.length();
>>>             long remainder = length;
>>>             int chunk = buffer.length;
>>>
>>>             while(remainder > 0) {
>>>                 int len = (int) Math.min(chunk, remainder);
>>>                 is.readBytes(buffer, 0, len);
>>>                 os.writeBytes(buffer, len);
>>>                 remainder -= len;
>>>             }
>>>
>>>             // Verify that remainder is 0
>>>             if (remainder != 0)
>>>                 throw new IOException(
>>>                     "Non-zero remainder length after copying: " +  
>>> remainder
>>>                     + " (id: " + source.file + ", length: " + length
>>>                     + ", buffer size: " + chunk + ")");
>>>
>>>             // Verify that the output length diff equals the original file length
>>>             long endPtr = os.getFilePointer();
>>>             long diff = endPtr - startPtr;
>>>             if (diff != length)
>>>                 throw new IOException(
>>>                     "Difference in the output file offsets " + diff
>>>                     + " does not match the original file length " +  
>>> length);
>>>
>>>         } finally {
>>>             if (is != null) is.close();
>>>         }
>>>     }
>>> }
>>> Index: src/java/org/apache/lucene/index/SegmentMerger.java
>>> ===================================================================
>>> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/SegmentMerger.java,v
>>> retrieving revision 1.11
>>> diff -r1.11 SegmentMerger.java
>>> 151c151
>>> <     // Perform the merge
>>> ---
>>>
>>>>     // Perform the merge. Files will be deleted within CompoundFileWriter.close()
>>>
>>>
>>> 153,158c153
>>> <
>>> <     // Now delete the source files
>>> <     it = files.iterator();
>>> <     while (it.hasNext()) {
>>> <       directory.deleteFile((String) it.next());
>>> <     }
>>> ---
>>>
>>>>
>>> package org.apache.lucene.index;
>>>
>>> /**
>>>  * Copyright 2004 The Apache Software Foundation
>>>  *
>>>  * Licensed under the Apache License, Version 2.0 (the "License");
>>>  * you may not use this file except in compliance with the License.
>>>  * You may obtain a copy of the License at
>>>  *
>>>  *     http://www.apache.org/licenses/LICENSE-2.0
>>>  *
>>>  * Unless required by applicable law or agreed to in writing, software
>>>  * distributed under the License is distributed on an "AS IS" BASIS,
>>>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>  * See the License for the specific language governing permissions and
>>>  * limitations under the License.
>>>  */
>>>
>>> import java.util.Vector;
>>> import java.util.ArrayList;
>>> import java.util.Iterator;
>>> import java.io.IOException;
>>>
>>> import org.apache.lucene.store.Directory;
>>> import org.apache.lucene.store.OutputStream;
>>> import org.apache.lucene.store.RAMOutputStream;
>>>
>>> /**
>>>  * The SegmentMerger class combines two or more Segments, each
>>>  * represented by an IndexReader ({@link #add}), into a single
>>>  * Segment.  After adding the appropriate readers, call the merge
>>>  * method to combine the segments.
>>>  *<P>
>>>  * If the compoundFile flag is set, then the segments will be merged
>>>  * into a compound file.
>>>  *
>>>  *
>>>  * @see #merge
>>>  * @see #add
>>>  */
>>> final class SegmentMerger {
>>>   private boolean useCompoundFile;
>>>   private Directory directory;
>>>   private String segment;
>>>
>>>   private Vector readers = new Vector();
>>>   private FieldInfos fieldInfos;
>>>
>>>   // File extensions of old-style index files
>>>   private static final String COMPOUND_EXTENSIONS[] = new String[] {
>>>     "fnm", "frq", "prx", "fdx", "fdt", "tii", "tis"
>>>   };
>>>   private static final String VECTOR_EXTENSIONS[] = new String[] {
>>>     "tvx", "tvd", "tvf"
>>>   };
>>>
>>>   /**
>>>    *
>>>    * @param dir The Directory to merge the other segments into
>>>    * @param name The name of the new segment
>>>    * @param compoundFile true if the new segment should use a compoundFile
>>>    */
>>>   SegmentMerger(Directory dir, String name, boolean compoundFile) {
>>>     directory = dir;
>>>     segment = name;
>>>     useCompoundFile = compoundFile;
>>>   }
>>>
>>>   /**
>>>    * Add an IndexReader to the collection of readers that are to be merged
>>>    * @param reader
>>>    */
>>>   final void add(IndexReader reader) {
>>>     readers.addElement(reader);
>>>   }
>>>
>>>   /**
>>>    *
>>>    * @param i The index of the reader to return
>>>    * @return The ith reader to be merged
>>>    */
>>>   final IndexReader segmentReader(int i) {
>>>     return (IndexReader) readers.elementAt(i);
>>>   }
>>>
>>>   /**
>>>    * Merges the readers specified by the {@link #add} method into the
>>>    * directory passed to the constructor
>>>    * @return The number of documents that were merged
>>>    * @throws IOException
>>>    */
>>>   final int merge() throws IOException {
>>>     int value;
>>>
>>>     value = mergeFields();
>>>     mergeTerms();
>>>     mergeNorms();
>>>
>>>     if (fieldInfos.hasVectors())
>>>       mergeVectors();
>>>
>>>     if (useCompoundFile)
>>>       createCompoundFile();
>>>
>>>     return value;
>>>   }
>>>
>>>   /**
>>>    * close all IndexReaders that have been added.
>>>    * Should not be called before merge().
>>>    * @throws IOException
>>>    */
>>>   final void closeReaders() throws IOException {
>>>     for (int i = 0; i < readers.size(); i++) {  // close readers
>>>       IndexReader reader = (IndexReader) readers.elementAt(i);
>>>       reader.close();
>>>     }
>>>   }
>>>
>>>   private final void createCompoundFile()
>>>           throws IOException {
>>>     CompoundFileWriter cfsWriter =
>>>             new CompoundFileWriter(directory, segment + ".cfs");
>>>
>>>     ArrayList files =
>>>       new ArrayList(COMPOUND_EXTENSIONS.length + fieldInfos.size());
>>>
>>>     // Basic files
>>>     for (int i = 0; i < COMPOUND_EXTENSIONS.length; i++) {
>>>       files.add(segment + "." + COMPOUND_EXTENSIONS[i]);
>>>     }
>>>
>>>     // Field norm files
>>>     for (int i = 0; i < fieldInfos.size(); i++) {
>>>       FieldInfo fi = fieldInfos.fieldInfo(i);
>>>       if (fi.isIndexed) {
>>>         files.add(segment + ".f" + i);
>>>       }
>>>     }
>>>
>>>     // Vector files
>>>     if (fieldInfos.hasVectors()) {
>>>       for (int i = 0; i < VECTOR_EXTENSIONS.length; i++) {
>>>         files.add(segment + "." + VECTOR_EXTENSIONS[i]);
>>>       }
>>>     }
>>>
>>>     // Now merge all added files
>>>     Iterator it = files.iterator();
>>>     while (it.hasNext()) {
>>>       cfsWriter.addFile((String) it.next());
>>>     }
>>>
>>>     // Perform the merge. Files will be deleted within CompoundFileWriter.close()
>>>     cfsWriter.close();
>>>
>>>   }
>>>
>>>   /**
>>>    *
>>>    * @return The number of documents in all of the readers
>>>    * @throws IOException
>>>    */
>>>   private final int mergeFields() throws IOException {
>>>     fieldInfos = new FieldInfos();          // merge field names
>>>     int docCount = 0;
>>>     for (int i = 0; i < readers.size(); i++) {
>>>       IndexReader reader = (IndexReader) readers.elementAt(i);
>>>       fieldInfos.addIndexed(reader.getIndexedFieldNames(true), true);
>>>       fieldInfos.addIndexed(reader.getIndexedFieldNames(false), false);
>>>       fieldInfos.add(reader.getFieldNames(false), false);
>>>     }
>>>     fieldInfos.write(directory, segment + ".fnm");
>>>
>>>     FieldsWriter fieldsWriter = // merge field values
>>>             new FieldsWriter(directory, segment, fieldInfos);
>>>     try {
>>>       for (int i = 0; i < readers.size(); i++) {
>>>         IndexReader reader = (IndexReader) readers.elementAt(i);
>>>         int maxDoc = reader.maxDoc();
>>>         for (int j = 0; j < maxDoc; j++)
>>>           if (!reader.isDeleted(j)) {               // skip deleted docs
>>>             fieldsWriter.addDocument(reader.document(j));
>>>             docCount++;
>>>           }
>>>       }
>>>     } finally {
>>>       fieldsWriter.close();
>>>     }
>>>     return docCount;
>>>   }
>>>
>>>   /**
>>>    * Merge the TermVectors from each of the segments into the new one.
>>>    * @throws IOException
>>>    */
>>>   private final void mergeVectors() throws IOException {
>>>     TermVectorsWriter termVectorsWriter =
>>>       new TermVectorsWriter(directory, segment, fieldInfos);
>>>
>>>     try {
>>>       for (int r = 0; r < readers.size(); r++) {
>>>         IndexReader reader = (IndexReader) readers.elementAt(r);
>>>         int maxDoc = reader.maxDoc();
>>>         for (int docNum = 0; docNum < maxDoc; docNum++) {
>>>           // skip deleted docs
>>>           if (reader.isDeleted(docNum)) {
>>>             continue;
>>>           }
>>>           termVectorsWriter.openDocument();
>>>
>>>           // get all term vectors
>>>           TermFreqVector[] sourceTermVector =
>>>             reader.getTermFreqVectors(docNum);
>>>
>>>           if (sourceTermVector != null) {
>>>             for (int f = 0; f < sourceTermVector.length; f++) {
>>>               // translate field numbers
>>>               TermFreqVector termVector = sourceTermVector[f];
>>>               termVectorsWriter.openField(termVector.getField());
>>>               String [] terms = termVector.getTerms();
>>>               int [] freqs = termVector.getTermFrequencies();
>>>
>>>               for (int t = 0; t < terms.length; t++) {
>>>                 termVectorsWriter.addTerm(terms[t], freqs[t]);
>>>               }
>>>             }
>>>             termVectorsWriter.closeDocument();
>>>           }
>>>         }
>>>       }
>>>     } finally {
>>>       termVectorsWriter.close();
>>>     }
>>>   }
>>>
>>>   private OutputStream freqOutput = null;
>>>   private OutputStream proxOutput = null;
>>>   private TermInfosWriter termInfosWriter = null;
>>>   private int skipInterval;
>>>   private SegmentMergeQueue queue = null;
>>>
>>>   private final void mergeTerms() throws IOException {
>>>     try {
>>>       freqOutput = directory.createFile(segment + ".frq");
>>>       proxOutput = directory.createFile(segment + ".prx");
>>>       termInfosWriter =
>>>               new TermInfosWriter(directory, segment, fieldInfos);
>>>       skipInterval = termInfosWriter.skipInterval;
>>>       queue = new SegmentMergeQueue(readers.size());
>>>
>>>       mergeTermInfos();
>>>
>>>     } finally {
>>>       if (freqOutput != null) freqOutput.close();
>>>       if (proxOutput != null) proxOutput.close();
>>>       if (termInfosWriter != null) termInfosWriter.close();
>>>       if (queue != null) queue.close();
>>>     }
>>>   }
>>>
>>>   private final void mergeTermInfos() throws IOException {
>>>     int base = 0;
>>>     for (int i = 0; i < readers.size(); i++) {
>>>       IndexReader reader = (IndexReader) readers.elementAt(i);
>>>       TermEnum termEnum = reader.terms();
>>>       SegmentMergeInfo smi = new SegmentMergeInfo(base, termEnum, reader);
>>>       base += reader.numDocs();
>>>       if (smi.next())
>>>         queue.put(smi);                  // initialize queue
>>>       else
>>>         smi.close();
>>>     }
>>>
>>>     SegmentMergeInfo[] match = new SegmentMergeInfo[readers.size()];
>>>
>>>     while (queue.size() > 0) {
>>>       int matchSize = 0;              // pop matching terms
>>>       match[matchSize++] = (SegmentMergeInfo) queue.pop();
>>>       Term term = match[0].term;
>>>       SegmentMergeInfo top = (SegmentMergeInfo) queue.top();
>>>
>>>       while (top != null && term.compareTo(top.term) == 0) {
>>>         match[matchSize++] = (SegmentMergeInfo) queue.pop();
>>>         top = (SegmentMergeInfo) queue.top();
>>>       }
>>>
>>>       mergeTermInfo(match, matchSize);          // add new TermInfo
>>>
>>>       while (matchSize > 0) {
>>>         SegmentMergeInfo smi = match[--matchSize];
>>>         if (smi.next())
>>>           queue.put(smi);              // restore queue
>>>         else
>>>           smi.close();                  // done with a segment
>>>       }
>>>     }
>>>   }
>>>
>>>   private final TermInfo termInfo = new TermInfo(); // minimize consing
>>>
>>>   /** Merge one term found in one or more segments. The array
>>>    *  <code>smis</code> contains segments that are positioned at the
>>>    *  same term. <code>n</code> is the number of cells in the array
>>>    *  actually occupied.
>>>    *
>>>    * @param smis array of segments
>>>    * @param n number of cells in the array actually occupied
>>>    */
>>>   private final void mergeTermInfo(SegmentMergeInfo[] smis, int n)
>>>           throws IOException {
>>>     long freqPointer = freqOutput.getFilePointer();
>>>     long proxPointer = proxOutput.getFilePointer();
>>>
>>>     int df = appendPostings(smis, n);          // append posting data
>>>
>>>     long skipPointer = writeSkip();
>>>
>>>     if (df > 0) {
>>>       // add an entry to the dictionary with pointers to prox and
>>>       // freq files
>>>       termInfo.set(df, freqPointer, proxPointer, (int) (skipPointer - freqPointer));
>>>       termInfosWriter.add(smis[0].term, termInfo);
>>>     }
>>>   }
>>>
>>>   /** Process postings from multiple segments all positioned on the
>>>    *  same term. Writes out merged entries into freqOutput and
>>>    *  the proxOutput streams.
>>>    *
>>>    * @param smis array of segments
>>>    * @param n number of cells in the array actually occupied
>>>    * @return number of documents across all segments where this term was found
>>>    */
>>>   private final int appendPostings(SegmentMergeInfo[] smis, int n)
>>>           throws IOException {
>>>     int lastDoc = 0;
>>>     int df = 0;                      // number of docs w/ term
>>>     resetSkip();
>>>     for (int i = 0; i < n; i++) {
>>>       SegmentMergeInfo smi = smis[i];
>>>       TermPositions postings = smi.postings;
>>>       int base = smi.base;
>>>       int[] docMap = smi.docMap;
>>>       postings.seek(smi.termEnum);
>>>       while (postings.next()) {
>>>         int doc = postings.doc();
>>>         if (docMap != null)
>>>           doc = docMap[doc];                      // map around deletions
>>>         doc += base;                              // convert to merged space
>>>
>>>         if (doc < lastDoc)
>>>           throw new IllegalStateException("docs out of order");
>>>
>>>         df++;
>>>
>>>         if ((df % skipInterval) == 0) {
>>>           bufferSkip(lastDoc);
>>>         }
>>>
>>>         int docCode = (doc - lastDoc) << 1;      // use low bit to flag freq=1
>>>         lastDoc = doc;
>>>
>>>         int freq = postings.freq();
>>>         if (freq == 1) {
>>>           freqOutput.writeVInt(docCode | 1);      // write doc & freq=1
>>>         } else {
>>>           freqOutput.writeVInt(docCode);      // write doc
>>>           freqOutput.writeVInt(freq);          // write frequency in doc
>>>         }
>>>
>>>         int lastPosition = 0;              // write position deltas
>>>         for (int j = 0; j < freq; j++) {
>>>           int position = postings.nextPosition();
>>>           proxOutput.writeVInt(position - lastPosition);
>>>           lastPosition = position;
>>>         }
>>>       }
>>>     }
>>>     return df;
>>>   }
>>>
>>>   private RAMOutputStream skipBuffer = new RAMOutputStream();
>>>   private int lastSkipDoc;
>>>   private long lastSkipFreqPointer;
>>>   private long lastSkipProxPointer;
>>>
>>>   private void resetSkip() throws IOException {
>>>     skipBuffer.reset();
>>>     lastSkipDoc = 0;
>>>     lastSkipFreqPointer = freqOutput.getFilePointer();
>>>     lastSkipProxPointer = proxOutput.getFilePointer();
>>>   }
>>>
>>>   private void bufferSkip(int doc) throws IOException {
>>>     long freqPointer = freqOutput.getFilePointer();
>>>     long proxPointer = proxOutput.getFilePointer();
>>>
>>>     skipBuffer.writeVInt(doc - lastSkipDoc);
>>>     skipBuffer.writeVInt((int) (freqPointer - lastSkipFreqPointer));
>>>     skipBuffer.writeVInt((int) (proxPointer - lastSkipProxPointer));
>>>
>>>     lastSkipDoc = doc;
>>>     lastSkipFreqPointer = freqPointer;
>>>     lastSkipProxPointer = proxPointer;
>>>   }
>>>
>>>   private long writeSkip() throws IOException {
>>>     long skipPointer = freqOutput.getFilePointer();
>>>     skipBuffer.writeTo(freqOutput);
>>>     return skipPointer;
>>>   }
>>>
>>>   private void mergeNorms() throws IOException {
>>>     for (int i = 0; i < fieldInfos.size(); i++) {
>>>       FieldInfo fi = fieldInfos.fieldInfo(i);
>>>       if (fi.isIndexed) {
>>>         OutputStream output = directory.createFile(segment + ".f" + i);
>>>         try {
>>>           for (int j = 0; j < readers.size(); j++) {
>>>             IndexReader reader = (IndexReader) readers.elementAt(j);
>>>             byte[] input = reader.norms(fi.name);
>>>             int maxDoc = reader.maxDoc();
>>>             for (int k = 0; k < maxDoc; k++) {
>>>               byte norm = input != null ? input[k] : (byte) 0;
>>>               if (!reader.isDeleted(k)) {
>>>                 output.writeByte(norm);
>>>               }
>>>             }
>>>           }
>>>         } finally {
>>>           output.close();
>>>         }
>>>       }
>>>     }
>>>   }
>>>
>>> }
>>> Index: src/test/org/apache/lucene/index/TestCompoundFile.java
>>> ===================================================================
>>> RCS file: /home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/index/TestCompoundFile.java,v
>>> retrieving revision 1.5
>>> diff -r1.5 TestCompoundFile.java
>>> 20a21,24
>>>
>>>> import java.util.Collection;
>>>> import java.util.HashMap;
>>>> import java.util.Iterator;
>>>> import java.util.Map;
>>>
>>>
>>> 197a202,204
>>>
>>>>
>>>>             InputStream expected = dir.openFile(name);
>>>>
>>> 203c210
>>> <             InputStream expected = dir.openFile(name);
>>> ---
>>>
>>>>
>>> 206a214
>>>
>>>>
>>> 220a229,231
>>>
>>>>         InputStream expected1 = dir.openFile("d1");
>>>>         InputStream expected2 = dir.openFile("d2");
>>>>
>>> 227c238
>>> <         InputStream expected = dir.openFile("d1");
>>> ---
>>>
>>>>
>>> 229,231c240,242
>>> <         assertSameStreams("d1", expected, actual);
>>> <         assertSameSeekBehavior("d1", expected, actual);
>>> <         expected.close();
>>> ---
>>>
>>>>         assertSameStreams("d1", expected1, actual);
>>>>         assertSameSeekBehavior("d1", expected1, actual);
>>>>         expected1.close();
>>>
>>>
>>> 234c245
>>> <         expected = dir.openFile("d2");
>>> ---
>>>
>>>>
>>> 236,238c247,249
>>> <         assertSameStreams("d2", expected, actual);
>>> <         assertSameSeekBehavior("d2", expected, actual);
>>> <         expected.close();
>>> ---
>>>
>>>>         assertSameStreams("d2", expected2, actual);
>>>>         assertSameSeekBehavior("d2", expected2, actual);
>>>>         expected2.close();
>>>
>>>
>>> 270,271d280
>>> <         // Now test
>>> <         CompoundFileWriter csw = new CompoundFileWriter(dir, "test.cfs");
>>> 275a285,292
>>>
>>>>
>>>>         InputStream[] check = new InputStream[data.length];
>>>>         for (int i=0; i<data.length; i++) {
>>>>            check[i] = dir.openFile(segment + data[i]);
>>>>         }
>>>>
>>>>         // Now test
>>>>         CompoundFileWriter csw = new CompoundFileWriter(dir, "test.cfs");
>>>
>>>
>>> 283d299
>>> <             InputStream check = dir.openFile(segment + data[i]);
>>> 285,286c301,302
>>> <             assertSameStreams(data[i], check, test);
>>> <             assertSameSeekBehavior(data[i], check, test);
>>> ---
>>>
>>>>             assertSameStreams(data[i], check[i], test);
>>>>             assertSameSeekBehavior(data[i], check[i], test);
>>>
>>>
>>> 288c304
>>> <             check.close();
>>> ---
>>>
>>>>             check[i].close();
>>>
>>>
>>> 299c315,316
>>> <     private void setUp_2() throws IOException {
>>> ---
>>>
>>>>     private Map setUp_2() throws IOException {
>>>>             Map streams = new HashMap(20);
>>>
>>>
>>> 303a321,322
>>>
>>>>
>>>>             streams.put("f" + i, dir.openFile("f" + i));
>>>
>>>
>>> 305a325,326
>>>
>>>>
>>>>         return streams;
>>>
>>>
>>> 308c329,336
>>> <
>>> ---
>>>
>>>>     private void closeUp(Map streams) throws IOException {
>>>>         Iterator it = streams.values().iterator();
>>>>         while (it.hasNext()) {
>>>>             InputStream stream = (InputStream)it.next();
>>>>             stream.close();
>>>>         }
>>>>     }
>>>>
>>> 364c392
>>> <         setUp_2();
>>> ---
>>>
>>>>         Map streams = setUp_2();
>>>
>>>
>>> 368c396
>>> <         InputStream expected = dir.openFile("f11");
>>> ---
>>>
>>>>         InputStream expected = (InputStream)streams.get("f11");
>>>
>>>
>>> 410c438,439
>>> <         expected.close();
>>> ---
>>>
>>>>         closeUp(streams);
>>>>
>>> 418c447
>>> <         setUp_2();
>>> ---
>>>
>>>>         Map streams = setUp_2();
>>>
>>>
>>> 422,423c451,452
>>> <         InputStream e1 = dir.openFile("f11");
>>> <         InputStream e2 = dir.openFile("f3");
>>> ---
>>>
>>>>         InputStream e1 = (InputStream)streams.get("f11");
>>>>         InputStream e2 = (InputStream)streams.get("f3");
>>>
>>>
>>> 426c455
>>> <         InputStream a2 = dir.openFile("f3");
>>> ---
>>>
>>>>         InputStream a2 = cr.openFile("f3");
>>>
>>>
>>> 486,487d514
>>> <         e1.close();
>>> <         e2.close();
>>> 490a518,519
>>>
>>>>
>>>>         closeUp(streams);
>>>
>>>
>>> 497c526
>>> <         setUp_2();
>>> ---
>>>
>>>>         Map streams = setUp_2();
>>>
>>>
>>> 569a599,600
>>>
>>>>
>>>>         closeUp(streams);
>>>
>>>
>>> 574c605
>>> <         setUp_2();
>>> ---
>>>
>>>>         Map streams = setUp_2();
>>>
>>>
>>> 587a619,620
>>>
>>>>
>>>>         closeUp(streams);
>>>
>>>
>>> 592c625
>>> <         setUp_2();
>>> ---
>>>
>>>>         Map streams = setUp_2();
>>>
>>>
>>> 617a651,652
>>>
>>>>
>>>>         closeUp(streams);
>>>
>>>
>>> package org.apache.lucene.index;
>>>
>>> /**
>>>  * Copyright 2004 The Apache Software Foundation
>>>  *
>>>  * Licensed under the Apache License, Version 2.0 (the "License");
>>>  * you may not use this file except in compliance with the License.
>>>  * You may obtain a copy of the License at
>>>  *
>>>  *     http://www.apache.org/licenses/LICENSE-2.0
>>>  *
>>>  * Unless required by applicable law or agreed to in writing, software
>>>  * distributed under the License is distributed on an "AS IS" BASIS,
>>>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>  * See the License for the specific language governing permissions and
>>>  * limitations under the License.
>>>  */
>>>
>>> import java.io.IOException;
>>> import java.io.File;
>>> import java.util.Collection;
>>> import java.util.HashMap;
>>> import java.util.Iterator;
>>> import java.util.Map;
>>>
>>> import junit.framework.TestCase;
>>> import junit.framework.TestSuite;
>>> import junit.textui.TestRunner;
>>> import org.apache.lucene.store.OutputStream;
>>> import org.apache.lucene.store.Directory;
>>> import org.apache.lucene.store.InputStream;
>>> import org.apache.lucene.store.FSDirectory;
>>> import org.apache.lucene.store.RAMDirectory;
>>> import org.apache.lucene.store._TestHelper;
>>>
>>>
>>> /**
>>>  * @author dmitrys@earthlink.net
>>>  * @version $Id: TestCompoundFile.java,v 1.5 2004/03/29 22:48:06 cutting Exp $
>>>  */
>>> public class TestCompoundFile extends TestCase
>>> {
>>>     /** Main for running test case by itself. */
>>>     public static void main(String args[]) {
>>>         TestRunner.run (new TestSuite(TestCompoundFile.class));
>>> //        TestRunner.run (new TestCompoundFile("testSingleFile"));
>>> //        TestRunner.run (new TestCompoundFile("testTwoFiles"));
>>> //        TestRunner.run (new TestCompoundFile("testRandomFiles"));
>>> //        TestRunner.run (new TestCompoundFile("testClonedStreamsClosing"));
>>> //        TestRunner.run (new TestCompoundFile("testReadAfterClose"));
>>> //        TestRunner.run (new TestCompoundFile("testRandomAccess"));
>>> //        TestRunner.run (new TestCompoundFile("testRandomAccessClones"));
>>> //        TestRunner.run (new TestCompoundFile("testFileNotFound"));
>>> //        TestRunner.run (new TestCompoundFile("testReadPastEOF"));
>>>
>>> //        TestRunner.run (new TestCompoundFile("testIWCreate"));
>>>
>>>     }
>>>
>>>
>>>     private Directory dir;
>>>
>>>
>>>     public void setUp() throws IOException {
>>>         //dir = new RAMDirectory();
>>>         dir = FSDirectory.getDirectory(new File(System.getProperty("tempDir"), "testIndex"), true);
>>>     }
>>>
>>>
>>>     /** Creates a file of the specified size with random data. */
>>>     private void createRandomFile(Directory dir, String name, int size)
>>>     throws IOException
>>>     {
>>>         OutputStream os = dir.createFile(name);
>>>         for (int i=0; i<size; i++) {
>>>             byte b = (byte) (Math.random() * 256);
>>>             os.writeByte(b);
>>>         }
>>>         os.close();
>>>     }
>>>
>>>     /** Creates a file of the specified size with sequential data. The
>>>      *  first byte is written as the start byte provided. All
>>>      *  subsequent bytes are computed as start + offset where offset is
>>>      *  the number of the byte.
>>>      */
>>>     private void createSequenceFile(Directory dir,
>>>                                     String name,
>>>                                     byte start,
>>>                                     int size)
>>>     throws IOException
>>>     {
>>>         OutputStream os = dir.createFile(name);
>>>         for (int i=0; i < size; i++) {
>>>             os.writeByte(start);
>>>             start ++;
>>>         }
>>>         os.close();
>>>     }
>>>
>>>
>>>     private void assertSameStreams(String msg,
>>>                                    InputStream expected,
>>>                                    InputStream test)
>>>     throws IOException
>>>     {
>>>         assertNotNull(msg + " null expected", expected);
>>>         assertNotNull(msg + " null test", test);
>>>         assertEquals(msg + " length", expected.length(),  
>>> test.length());
>>>         assertEquals(msg + " position", expected.getFilePointer(),
>>>                                         test.getFilePointer());
>>>
>>>         byte expectedBuffer[] = new byte[512];
>>>         byte testBuffer[] = new byte[expectedBuffer.length];
>>>
>>>         long remainder = expected.length() - expected.getFilePointer();
>>>         while(remainder > 0) {
>>>             int readLen = (int) Math.min(remainder, expectedBuffer.length);
>>>             expected.readBytes(expectedBuffer, 0, readLen);
>>>             test.readBytes(testBuffer, 0, readLen);
>>>             assertEqualArrays(msg + ", remainder " + remainder,
>>>                 expectedBuffer, testBuffer, 0, readLen);
>>>             remainder -= readLen;
>>>         }
>>>     }
>>>
>>>
>>>     private void assertSameStreams(String msg,
>>>                                    InputStream expected,
>>>                                    InputStream actual,
>>>                                    long seekTo)
>>>     throws IOException
>>>     {
>>>         if(seekTo >= 0 && seekTo < expected.length())
>>>         {
>>>             expected.seek(seekTo);
>>>             actual.seek(seekTo);
>>>             assertSameStreams(msg + ", seek(mid)", expected, actual);
>>>         }
>>>     }
>>>
>>>
>>>
>>>     private void assertSameSeekBehavior(String msg,
>>>                                         InputStream expected,
>>>                                         InputStream actual)
>>>     throws IOException
>>>     {
>>>         // seek to 0
>>>         long point = 0;
>>>         assertSameStreams(msg + ", seek(0)", expected, actual, point);
>>>
>>>         // seek to middle
>>>         point = expected.length() / 2L;
>>>         assertSameStreams(msg + ", seek(mid)", expected, actual, point);
>>>
>>>         // seek to end - 2
>>>         point = expected.length() - 2;
>>>         assertSameStreams(msg + ", seek(end-2)", expected, actual, point);
>>>
>>>         // seek to end - 1
>>>         point = expected.length() - 1;
>>>         assertSameStreams(msg + ", seek(end-1)", expected, actual, point);
>>>
>>>         // seek to the end
>>>         point = expected.length();
>>>         assertSameStreams(msg + ", seek(end)", expected, actual, point);
>>>
>>>         // seek past end
>>>         point = expected.length() + 1;
>>>         assertSameStreams(msg + ", seek(end+1)", expected, actual, point);
>>>     }
>>>
>>>
>>>     private void assertEqualArrays(String msg,
>>>                                    byte[] expected,
>>>                                    byte[] test,
>>>                                    int start,
>>>                                    int len)
>>>     {
>>>         assertNotNull(msg + " null expected", expected);
>>>         assertNotNull(msg + " null test", test);
>>>
>>>         for (int i=start; i<len; i++) {
>>>             assertEquals(msg + " " + i, expected[i], test[i]);
>>>         }
>>>     }
>>>
>>>
>>>     // ===========================================================
>>>     //  Tests of the basic CompoundFile functionality
>>>     // ===========================================================
>>>
>>>
>>>     /** This test creates compound file based on a single file.
>>>      *  Files of different sizes are tested: 0, 1, 10, 100 bytes.
>>>      */
>>>     public void testSingleFile() throws IOException {
>>>         int data[] = new int[] { 0, 1, 10, 100 };
>>>         for (int i=0; i<data.length; i++) {
>>>             String name = "t" + data[i];
>>>             createSequenceFile(dir, name, (byte) 0, data[i]);
>>>
>>>             InputStream expected = dir.openFile(name);
>>>
>>>             CompoundFileWriter csw = new CompoundFileWriter(dir, name + ".cfs");
>>>             csw.addFile(name);
>>>             csw.close();
>>>
>>>             CompoundFileReader csr = new CompoundFileReader(dir, name + ".cfs");
>>>
>>>             InputStream actual = csr.openFile(name);
>>>             assertSameStreams(name, expected, actual);
>>>             assertSameSeekBehavior(name, expected, actual);
>>>
>>>             expected.close();
>>>             actual.close();
>>>             csr.close();
>>>         }
>>>     }
>>>
>>>
>>>     /** This test creates compound file based on two files.
>>>      *
>>>      */
>>>     public void testTwoFiles() throws IOException {
>>>         createSequenceFile(dir, "d1", (byte) 0, 15);
>>>         createSequenceFile(dir, "d2", (byte) 0, 114);
>>>
>>>         InputStream expected1 = dir.openFile("d1");
>>>         InputStream expected2 = dir.openFile("d2");
>>>
>>>         CompoundFileWriter csw = new CompoundFileWriter(dir, "d.csf");
>>>         csw.addFile("d1");
>>>         csw.addFile("d2");
>>>         csw.close();
>>>
>>>         CompoundFileReader csr = new CompoundFileReader(dir, "d.csf");
>>>
>>>         InputStream actual = csr.openFile("d1");
>>>         assertSameStreams("d1", expected1, actual);
>>>         assertSameSeekBehavior("d1", expected1, actual);
>>>         expected1.close();
>>>         actual.close();
>>>
>>>
>>>         actual = csr.openFile("d2");
>>>         assertSameStreams("d2", expected2, actual);
>>>         assertSameSeekBehavior("d2", expected2, actual);
>>>         expected2.close();
>>>         actual.close();
>>>         csr.close();
>>>     }
>>>
>>>     /** This test creates a compound file based on a large number of
>>>      *  files of various lengths. The file content is generated
>>>      *  randomly. The sizes range from 0 to 1Mb. Some of the sizes are
>>>      *  selected to test the buffering logic in the file reading code.
>>>      *  For this the chunk variable is set to the length of the buffer
>>>      *  used internally by the compound file logic.
>>>      */
>>>     public void testRandomFiles() throws IOException {
>>>         // Setup the test segment
>>>         String segment = "test";
>>>         int chunk = 1024; // internal buffer size used by the stream
>>>         createRandomFile(dir, segment + ".zero", 0);
>>>         createRandomFile(dir, segment + ".one", 1);
>>>         createRandomFile(dir, segment + ".ten", 10);
>>>         createRandomFile(dir, segment + ".hundred", 100);
>>>         createRandomFile(dir, segment + ".big1", chunk);
>>>         createRandomFile(dir, segment + ".big2", chunk - 1);
>>>         createRandomFile(dir, segment + ".big3", chunk + 1);
>>>         createRandomFile(dir, segment + ".big4", 3 * chunk);
>>>         createRandomFile(dir, segment + ".big5", 3 * chunk - 1);
>>>         createRandomFile(dir, segment + ".big6", 3 * chunk + 1);
>>>         createRandomFile(dir, segment + ".big7", 1000 * chunk);
>>>
>>>         // Setup extraneous files
>>>         createRandomFile(dir, "onetwothree", 100);
>>>         createRandomFile(dir, segment + ".notIn", 50);
>>>         createRandomFile(dir, segment + ".notIn2", 51);
>>>
>>>         final String data[] = new String[] {
>>>             ".zero", ".one", ".ten", ".hundred", ".big1", ".big2", ".big3",
>>>             ".big4", ".big5", ".big6", ".big7"
>>>         };
>>>
>>>         InputStream[] check = new InputStream[data.length];
>>>         for (int i=0; i<data.length; i++) {
>>>            check[i] = dir.openFile(segment + data[i]);
>>>         }
>>>
>>>         // Now test
>>>         CompoundFileWriter csw = new CompoundFileWriter(dir, "test.cfs");
>>>         for (int i=0; i<data.length; i++) {
>>>             csw.addFile(segment + data[i]);
>>>         }
>>>         csw.close();
>>>
>>>         CompoundFileReader csr = new CompoundFileReader(dir, "test.cfs");
>>>         for (int i=0; i<data.length; i++) {
>>>             InputStream test = csr.openFile(segment + data[i]);
>>>             assertSameStreams(data[i], check[i], test);
>>>             assertSameSeekBehavior(data[i], check[i], test);
>>>             test.close();
>>>             check[i].close();
>>>         }
>>>         csr.close();
>>>     }
>>>
>>>
>>>     /** Setup a larger compound file with a number of components, each
>>>      *  of which is a sequential file (so that we can easily tell that
>>>      *  we are reading in the right byte). The method sets up 20 files,
>>>      *  f0 to f19; the size of each file is 2000 bytes.
>>>      */
>>>     private Map setUp_2() throws IOException {
>>>             Map streams = new HashMap(20);
>>>         CompoundFileWriter cw = new CompoundFileWriter(dir, "f.comp");
>>>         for (int i=0; i<20; i++) {
>>>             createSequenceFile(dir, "f" + i, (byte) 0, 2000);
>>>             cw.addFile("f" + i);
>>>
>>>             streams.put("f" + i, dir.openFile("f" + i));
>>>         }
>>>         cw.close();
>>>
>>>         return streams;
>>>     }
>>>
>>>     private void closeUp(Map streams) throws IOException {
>>>         Iterator it = streams.values().iterator();
>>>         while (it.hasNext()) {
>>>             InputStream stream = (InputStream)it.next();
>>>             stream.close();
>>>         }
>>>     }
>>>
>>>     public void testReadAfterClose() throws IOException {
>>>         demo_FSInputStreamBug((FSDirectory) dir, "test");
>>>     }
>>>
>>>     private void demo_FSInputStreamBug(FSDirectory fsdir, String file)
>>>     throws IOException
>>>     {
>>>         // Setup the test file - we need more than 1024 bytes
>>>         OutputStream os = fsdir.createFile(file);
>>>         for(int i=0; i<2000; i++) {
>>>             os.writeByte((byte) i);
>>>         }
>>>         os.close();
>>>
>>>         InputStream in = fsdir.openFile(file);
>>>
>>>         // This read primes the buffer in InputStream
>>>         byte b = in.readByte();
>>>
>>>         // Close the file
>>>         in.close();
>>>
>>>         // ERROR: this call should fail, but succeeds because the
>>>         // buffer is still filled
>>>         b = in.readByte();
>>>
>>>         // ERROR: this call should fail, but succeeds for some reason
>>>         // as well
>>>         in.seek(1099);
>>>
>>>         try {
>>>             // OK: this call correctly fails. We are now past the 1024
>>>             // internal buffer, so an actual IO is attempted, which fails
>>>             b = in.readByte();
>>>         } catch (IOException e) {
>>>         }
>>>     }
>>>
>>>
>>>     static boolean isCSInputStream(InputStream is) {
>>>         return is instanceof CompoundFileReader.CSInputStream;
>>>     }
>>>
>>>     static boolean isCSInputStreamOpen(InputStream is) throws IOException {
>>>         if (isCSInputStream(is)) {
>>>             CompoundFileReader.CSInputStream cis =
>>>             (CompoundFileReader.CSInputStream) is;
>>>
>>>             return _TestHelper.isFSInputStreamOpen(cis.base);
>>>         } else {
>>>             return false;
>>>         }
>>>     }
>>>
>>>
>>>     public void testClonedStreamsClosing() throws IOException {
>>>         Map streams = setUp_2();
>>>         CompoundFileReader cr = new CompoundFileReader(dir, "f.comp");
>>>
>>>         // basic clone
>>>         InputStream expected = (InputStream)streams.get("f11");
>>>         assertTrue(_TestHelper.isFSInputStreamOpen(expected));
>>>
>>>         InputStream one = cr.openFile("f11");
>>>         assertTrue(isCSInputStreamOpen(one));
>>>
>>>         InputStream two = (InputStream) one.clone();
>>>         assertTrue(isCSInputStreamOpen(two));
>>>
>>>         assertSameStreams("basic clone one", expected, one);
>>>         expected.seek(0);
>>>         assertSameStreams("basic clone two", expected, two);
>>>
>>>         // Now close the first stream
>>>         one.close();
>>>         assertTrue("Only close when cr is closed",  
>>> isCSInputStreamOpen(one));
>>>
>>>         // The following should really fail since we couldn't expect
>>>         // to access a file once close has been called on it
>>>         // (regardless of buffering and/or clone magic)
>>>         expected.seek(0);
>>>         two.seek(0);
>>>         assertSameStreams("basic clone two/2", expected, two);
>>>
>>>
>>>         // Now close the compound reader
>>>         cr.close();
>>>         assertFalse("Now closed one", isCSInputStreamOpen(one));
>>>         assertFalse("Now closed two", isCSInputStreamOpen(two));
>>>
>>>         // The following may also fail since the compound stream is closed
>>>         expected.seek(0);
>>>         two.seek(0);
>>>         //assertSameStreams("basic clone two/3", expected, two);
>>>
>>>
>>>         // Now close the second clone
>>>         two.close();
>>>         expected.seek(0);
>>>         two.seek(0);
>>>         //assertSameStreams("basic clone two/4", expected, two);
>>>
>>>         closeUp(streams);
>>>
>>>     }
>>>
>>>
>>>     /** This test opens two files from a compound stream and verifies
>>>      *  that their file positions are independent of each other.
>>>      */
>>>     public void testRandomAccess() throws IOException {
>>>         Map streams = setUp_2();
>>>         CompoundFileReader cr = new CompoundFileReader(dir, "f.comp");
>>>
>>>         // Open two files
>>>         InputStream e1 = (InputStream)streams.get("f11");
>>>         InputStream e2 = (InputStream)streams.get("f3");
>>>
>>>         InputStream a1 = cr.openFile("f11");
>>>         InputStream a2 = cr.openFile("f3");
>>>
>>>         // Seek the first pair
>>>         e1.seek(100);
>>>         a1.seek(100);
>>>         assertEquals(100, e1.getFilePointer());
>>>         assertEquals(100, a1.getFilePointer());
>>>         byte be1 = e1.readByte();
>>>         byte ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now seek the second pair
>>>         e2.seek(1027);
>>>         a2.seek(1027);
>>>         assertEquals(1027, e2.getFilePointer());
>>>         assertEquals(1027, a2.getFilePointer());
>>>         byte be2 = e2.readByte();
>>>         byte ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Now make sure the first one didn't move
>>>         assertEquals(101, e1.getFilePointer());
>>>         assertEquals(101, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now move the first one again, past the buffer length
>>>         e1.seek(1910);
>>>         a1.seek(1910);
>>>         assertEquals(1910, e1.getFilePointer());
>>>         assertEquals(1910, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now make sure the second set didn't move
>>>         assertEquals(1028, e2.getFilePointer());
>>>         assertEquals(1028, a2.getFilePointer());
>>>         be2 = e2.readByte();
>>>         ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Move the second set back, again cross the buffer size
>>>         e2.seek(17);
>>>         a2.seek(17);
>>>         assertEquals(17, e2.getFilePointer());
>>>         assertEquals(17, a2.getFilePointer());
>>>         be2 = e2.readByte();
>>>         ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Finally, make sure the first set didn't move
>>>         assertEquals(1911, e1.getFilePointer());
>>>         assertEquals(1911, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         a1.close();
>>>         a2.close();
>>>         cr.close();
>>>
>>>         closeUp(streams);
>>>     }
>>>
>>>     /** This test opens two files from a compound stream and verifies
>>>      *  that their file positions are independent of each other.
>>>      */
>>>     public void testRandomAccessClones() throws IOException {
>>>         Map streams = setUp_2();
>>>         CompoundFileReader cr = new CompoundFileReader(dir, "f.comp");
>>>
>>>         // Open two files
>>>         InputStream e1 = cr.openFile("f11");
>>>         InputStream e2 = cr.openFile("f3");
>>>
>>>         InputStream a1 = (InputStream) e1.clone();
>>>         InputStream a2 = (InputStream) e2.clone();
>>>
>>>         // Seek the first pair
>>>         e1.seek(100);
>>>         a1.seek(100);
>>>         assertEquals(100, e1.getFilePointer());
>>>         assertEquals(100, a1.getFilePointer());
>>>         byte be1 = e1.readByte();
>>>         byte ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now seek the second pair
>>>         e2.seek(1027);
>>>         a2.seek(1027);
>>>         assertEquals(1027, e2.getFilePointer());
>>>         assertEquals(1027, a2.getFilePointer());
>>>         byte be2 = e2.readByte();
>>>         byte ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Now make sure the first one didn't move
>>>         assertEquals(101, e1.getFilePointer());
>>>         assertEquals(101, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now move the first one again, past the buffer length
>>>         e1.seek(1910);
>>>         a1.seek(1910);
>>>         assertEquals(1910, e1.getFilePointer());
>>>         assertEquals(1910, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         // Now make sure the second set didn't move
>>>         assertEquals(1028, e2.getFilePointer());
>>>         assertEquals(1028, a2.getFilePointer());
>>>         be2 = e2.readByte();
>>>         ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Move the second set back, again cross the buffer size
>>>         e2.seek(17);
>>>         a2.seek(17);
>>>         assertEquals(17, e2.getFilePointer());
>>>         assertEquals(17, a2.getFilePointer());
>>>         be2 = e2.readByte();
>>>         ba2 = a2.readByte();
>>>         assertEquals(be2, ba2);
>>>
>>>         // Finally, make sure the first set didn't move
>>>         assertEquals(1911, e1.getFilePointer());
>>>         assertEquals(1911, a1.getFilePointer());
>>>         be1 = e1.readByte();
>>>         ba1 = a1.readByte();
>>>         assertEquals(be1, ba1);
>>>
>>>         e1.close();
>>>         e2.close();
>>>         a1.close();
>>>         a2.close();
>>>         cr.close();
>>>
>>>         closeUp(streams);
>>>     }
>>>
>>>
>>>     public void testFileNotFound() throws IOException {
>>>         Map streams = setUp_2();
>>>         CompoundFileReader cr = new CompoundFileReader(dir, "f.comp");
>>>
>>>         // Open two files
>>>         try {
>>>             InputStream e1 = cr.openFile("bogus");
>>>             fail("File not found");
>>>
>>>         } catch (IOException e) {
>>>             /* success */
>>>             //System.out.println("SUCCESS: File Not Found: " + e);
>>>         }
>>>
>>>         cr.close();
>>>
>>>         closeUp(streams);
>>>     }
>>>
>>>
>>>     public void testReadPastEOF() throws IOException {
>>>         Map streams = setUp_2();
>>>         CompoundFileReader cr = new CompoundFileReader(dir, "f.comp");
>>>         InputStream is = cr.openFile("f2");
>>>         is.seek(is.length() - 10);
>>>         byte b[] = new byte[100];
>>>         is.readBytes(b, 0, 10);
>>>
>>>         try {
>>>             byte test = is.readByte();
>>>             fail("Single byte read past end of file");
>>>         } catch (IOException e) {
>>>             /* success */
>>>             //System.out.println("SUCCESS: single byte read past 
>>> end  of file: " + e);
>>>         }
>>>
>>>         is.seek(is.length() - 10);
>>>         try {
>>>             is.readBytes(b, 0, 50);
>>>             fail("Block read past end of file");
>>>         } catch (IOException e) {
>>>             /* success */
>>>             //System.out.println("SUCCESS: block read past end of  
>>> file: " + e);
>>>         }
>>>
>>>         is.close();
>>>         cr.close();
>>>
>>>         closeUp(streams);
>>>     }
>>> }
>>>
>>
>

