lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject stored field compression
Date Fri, 14 May 2004 16:19:48 GMT
[ Moved discussion from lucene-user. ]

Ype Kingma wrote:
> One place where compression might be useful is in the stored fields [...]

I agree, and this would not be hard to add.

The simplest approach would be to just add the following to Field.java:

   private boolean isCompressed;
   public boolean isCompressed() { return isCompressed; }
   public void setIsCompressed(boolean isCompressed) {
     this.isCompressed = isCompressed;
   }

Perhaps along with additional constructors that permit one to specify 
whether a field is compressed, e.g., Field.Text(String name, String 
value, boolean isCompressed).

Then just change FieldsWriter and FieldsReader to use a bit in the bits 
that are stored with each field to indicate whether the value is 
compressed, and, when it is, compress or decompress it accordingly.
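
A minimal sketch of the flag-bit idea, for illustration only. The class name,
the bit values, and the length-prefixed layout are assumptions, not the actual
FieldsWriter/FieldsReader format; compression uses java.util.zip's Deflater
and Inflater as one plausible choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch; not the real FieldsWriter/FieldsReader code. */
final class StoredFieldCodecSketch {
  // Assumed flag layout: bit 0 = tokenized, bit 1 = compressed.
  static final byte FIELD_IS_TOKENIZED = 0x1;
  static final byte FIELD_IS_COMPRESSED = 0x2;

  /** Writes one flag byte, then a length-prefixed (optionally deflated) value. */
  static void writeField(DataOutputStream out, String value,
                         boolean tokenized, boolean compressed) throws IOException {
    byte bits = 0;
    if (tokenized) bits |= FIELD_IS_TOKENIZED;
    if (compressed) bits |= FIELD_IS_COMPRESSED;
    out.writeByte(bits);
    byte[] data = value.getBytes(StandardCharsets.UTF_8);
    if (compressed) data = deflate(data);
    out.writeInt(data.length);
    out.write(data);
  }

  /** Reads a value written by writeField, inflating it when the flag is set. */
  static String readField(DataInputStream in) throws IOException {
    byte bits = in.readByte();
    byte[] data = new byte[in.readInt()];
    in.readFully(data);
    if ((bits & FIELD_IS_COMPRESSED) != 0) data = inflate(data);
    return new String(data, StandardCharsets.UTF_8);
  }

  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] inflate(byte[] input) throws IOException {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        // No progress and no more input means the data is truncated or corrupt.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IOException("truncated or corrupt compressed field");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IOException("corrupt compressed field", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

Because the flag travels with each stored value, compressed and uncompressed
fields can sit side by side in the same document and the reader needs no
other configuration.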

A more elaborate approach would be to lazily decompress fields when 
values are accessed.  That way, when you only require one field's value, 
you don't decompress all of the values.  This would require changing 
Field.java a bit more, perhaps replacing its stringValue and readerValue 
fields with something like:

   private Object value;

   private class CompressedValue {
     private byte[] data;
     public CompressedValue(byte[] data) { this.data = data; }
     public CompressedValue(String value) { ... code to compress ... }
     public String toString() { ... code to decompress ... }
     public byte[] getData() { return data; }
   }

   public String stringValue() {
     return value instanceof Reader ? null : value.toString();
   }

   public Reader readerValue() {
     return value instanceof Reader ? (Reader)value : null;
   }

   public byte[] compressedValue() {
     return value instanceof CompressedValue
      ? ((CompressedValue)value).getData()
      : null;
   }

   public void setIsCompressed(boolean isCompressed) {
     if (isCompressed && !this.isCompressed) {
       value = new CompressedValue((String)value);
     } else if (!isCompressed && this.isCompressed) {
       value = stringValue();
     }
     this.isCompressed = isCompressed;
   }

   // replace the ctor Field(String, String, ...) with the following
   public Field(String name, Object value,
                boolean store, boolean index,
                boolean token, boolean vector) {
      ...
      if (value instanceof String) {
        this.value = (String)value;
      } else if (value instanceof byte[]) {
        this.value = new CompressedValue((byte[])value);
      } else {
        throw new IllegalArgumentException(...);
      }
      ...
   }

Then change FieldsWriter to write the compressedValue() bytes, when 
non-null, and, finally, change FieldsReader to, when a value is 
compressed, read the bytes and pass them instead of a String to the ctor.
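
The elided compress and decompress steps could look roughly like the sketch
below. It is a standalone stand-in for the CompressedValue inner class, with a
hypothetical name, and it assumes java.util.zip and UTF-8; the cached decoded
string is an addition that is not part of the outline above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical standalone version of the CompressedValue holder. */
final class LazyCompressedValueSketch {
  private final byte[] data;   // compressed bytes, as read from or written to disk
  private String decoded;      // filled in on the first toString() call

  /** Wraps already-compressed bytes; nothing is decompressed here. */
  LazyCompressedValueSketch(byte[] data) {
    this.data = data;
  }

  /** Compresses a string value eagerly and remembers the original. */
  LazyCompressedValueSketch(String value) {
    Deflater deflater = new Deflater();
    deflater.setInput(value.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    this.data = out.toByteArray();
    this.decoded = value;
  }

  byte[] getData() {
    return data;
  }

  /** Decompresses on first use, so untouched fields cost nothing. */
  @Override
  public String toString() {
    if (decoded == null) {
      Inflater inflater = new Inflater();
      inflater.setInput(data);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      try {
        while (!inflater.finished()) {
          int n = inflater.inflate(buf);
          if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
            throw new IllegalStateException("truncated or corrupt compressed value");
          }
          out.write(buf, 0, n);
        }
      } catch (DataFormatException e) {
        throw new IllegalStateException("corrupt compressed value", e);
      } finally {
        inflater.end();
      }
      decoded = new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
    return decoded;
  }
}
```

With something like this, a reader can hand the raw bytes to the byte[]
constructor and the inflate cost is paid only if stringValue() is ever called
on that field.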

Anyone want to take this on?

Doug


