lucene-dev mailing list archives

From lucene@ziplip.com <luc...@ziplip.com>
Subject Inconsistent Read and write behavior in TermInfosWriter and Reader
Date Sun, 15 May 2005 17:57:40 GMT
Hi,


When a term with an undefined field is written, it is inserted into the index with field
number -1, and an exception is thrown when the same index is read back.
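
A minimal sketch of why the -1 survives the round trip, assuming the field number is stored
with a 7-bits-per-byte variable-length encoding like the one behind writeVInt/readVInt. The
class and method names here (VIntRoundTrip, encode, decode) are made up for illustration and
are not Lucene code.

```java
import java.io.ByteArrayOutputStream;

public class VIntRoundTrip {
    // Encode an int 7 bits at a time, low-order bits first.
    // The high bit of each byte signals that another byte follows.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so a negative value terminates
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode bytes produced by encode() back into an int.
    static int decode(byte[] bytes) {
        int pos = 0;
        byte b = bytes[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(-1);
        // prints "5 bytes, decoded = -1"
        System.out.println(encoded.length + " bytes, decoded = " + decode(encoded));
    }
}
```

Under that assumption the writer happily stores -1 in five bytes, the reader gets -1 back,
and the failure only shows up later when -1 is used to look up the field.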

In my opinion the behavior should be reversed. Writes should not allow insertion of bad data,
and reads should be very forgiving and try to recover from bad data. Here are the suggested
code changes.


--
TermInfosWriter

  private final void writeTerm(Term term)
       throws IOException {
    int iField = fieldInfos.fieldNumber(term.field);
    if (iField < 0) {
      throw new IOException("Unknown field "+term.field+"; term="+term.text);
    }
    int start = stringDifference(lastTerm.text, term.text);
    int length = term.text.length() - start;

    output.writeVInt(start); // write shared prefix length
    output.writeVInt(length); // write delta length
    output.writeChars(term.text, start, length); // write delta chars

    output.writeVInt(iField); // write field num

    lastTerm = term;
  }

FieldsReader
 final Document doc(int n) throws IOException {
    indexStream.seek(n * 8L);
    long position = indexStream.readLong();
    fieldsStream.seek(position);
    Document doc = new Document();
    int numFields = fieldsStream.readVInt();
    for (int i = 0; i < numFields; i++) {
      int fieldNumber = fieldsStream.readVInt();
      byte bits = fieldsStream.readByte();
      String stFieldValue = fieldsStream.readString();
      if (fieldNumber >= 0) { // skip fields stored with an unknown (negative) field number
          FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
          doc.add(new Field(fi.name, // name
                            stFieldValue, // read value
                            true, // stored
                            fi.isIndexed, // indexed
                            (bits & 1) != 0)); // tokenized
      }
    }
    return doc;
  }