incubator-lucy-dev mailing list archives

From "Marvin Humphrey (JIRA)" <>
Subject [jira] Updated: (LUCY-63) InStream and OutStream
Date Wed, 28 Oct 2009 04:19:59 GMT


Marvin Humphrey updated LUCY-63:


InStream and OutStream are roughly analogous to Lucene's IndexInput and
IndexOutput classes, but there are some differences.

Under Lucy, FileHandle is where alternate "file" treatments are implemented:
RAMFileHandle, FSFileHandle.  InStream and OutStream are not final, but that's
so that it's possible to extend them with new methods.  In contrast, alternate
file treatments are achieved under Lucene by subclassing IndexInput and
IndexOutput directly.
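
To illustrate this division of labor, here is a minimal C sketch. All names in it (a FileHandle with a `read` function pointer, RAMFileHandle, SimpleInStream) are hypothetical simplifications, not Lucy's actual API. The point is only that the stream type is written once against a FileHandle interface, so a new storage back end means a new FileHandle rather than a new stream subclass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified FileHandle "interface": each storage back end
 * supplies its own read function. Illustrative only, not Lucy's API. */
typedef struct FileHandle FileHandle;
struct FileHandle {
    /* Copy `len` bytes starting at `offset` into `dest`; 0 on success. */
    int (*read)(FileHandle *self, char *dest, size_t offset, size_t len);
};

/* RAM-backed implementation: the "file" is a block of memory. */
typedef struct {
    FileHandle  base;   /* first member, so RAMFileHandle* casts to FileHandle* */
    const char *data;
    size_t      size;
} RAMFileHandle;

static int
RAMFH_read(FileHandle *self, char *dest, size_t offset, size_t len) {
    RAMFileHandle *ram = (RAMFileHandle*)self;
    if (offset > ram->size || len > ram->size - offset) { return -1; }
    memcpy(dest, ram->data + offset, len);
    return 0;
}

static void
RAMFH_init(RAMFileHandle *fh, const char *data, size_t size) {
    fh->base.read = RAMFH_read;
    fh->data      = data;
    fh->size      = size;
}

/* The stream is written once against the FileHandle interface and keeps
 * its own position; it never needs to know which back end it is reading. */
typedef struct {
    FileHandle *file_handle;
    size_t      pos;
} SimpleInStream;

static int
SimpleInStream_read_bytes(SimpleInStream *self, char *dest, size_t len) {
    assert(self->file_handle != NULL);
    int rc = self->file_handle->read(self->file_handle, dest, self->pos, len);
    if (rc == 0) { self->pos += len; }   /* advance only on success */
    return rc;
}
```

Under the Lucene model, the logic in RAMFH_read would instead live in an IndexInput subclass.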

Additionally, InStream and OutStream are always buffered.  This allows us to
inline some functionality that would otherwise have to be implemented in terms
of abstract methods like IndexInput.readByte() and IndexOutput.WriteByte().

From Lucene's IndexInput (note the readByte() call in the loop):

public int readVInt() throws IOException {
  byte b = readByte();
  int i = b & 0x7F;
  for (int shift = 7; (b & 0x80) != 0; shift += 7) {
    b = readByte();
    i |= (b & 0x7F) << shift;
  }
  return i;
}

From Lucy's InStream.c (note the static inline function SI_read_u8() in the loop):

u32_t
InStream_read_c32(InStream *self)
{
    u32_t retval = 0;
    while (1) {
        const u8_t ubyte = SI_read_u8(self);
        retval = (retval << 7) | (ubyte & 0x7f);
        if ((ubyte & 0x80) == 0) { break; }
    }
    return retval;
}

static INLINE u8_t
SI_read_u8(InStream *self)
{
    if (self->buf >= self->limit) { S_refill(self); }
    return (u8_t)*self->buf++;
}
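
For a side-by-side comparison, both loops can be ported into one self-contained C sketch. MiniInStream, read_vint() and read_c32() are hypothetical names, and the buffer is assumed to hold the entire input, so the refill stub simply aborts. The port also makes one difference visible: Lucene's VInt puts the lowest-order 7-bit group first, while the C32 loop shown above puts the highest-order group first. The value 300 therefore encodes as 0xAC 0x02 for the former and 0x82 0x2C for the latter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for an InStream whose buffer already holds the whole input
 * (as with a single memory map); running off the end is an error here. */
typedef struct {
    const uint8_t *buf;    /* next byte to read */
    const uint8_t *limit;  /* one past the last readable byte */
} MiniInStream;

static void
S_refill(MiniInStream *self) {
    (void)self;
    fprintf(stderr, "read past end of buffer\n");
    abort();
}

/* The bounds check is the only branch on the hot path; no virtual call. */
static inline uint8_t
SI_read_u8(MiniInStream *self) {
    if (self->buf >= self->limit) { S_refill(self); }
    return *self->buf++;
}

/* Port of the C32 loop above: first byte carries the highest-order bits. */
static uint32_t
read_c32(MiniInStream *self) {
    uint32_t retval = 0;
    while (1) {
        const uint8_t ubyte = SI_read_u8(self);
        retval = (retval << 7) | (ubyte & 0x7f);
        if ((ubyte & 0x80) == 0) { break; }
    }
    return retval;
}

/* Port of Lucene's readVInt(): first byte carries the lowest-order bits. */
static uint32_t
read_vint(MiniInStream *self) {
    uint8_t  b = SI_read_u8(self);
    uint32_t i = b & 0x7f;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = SI_read_u8(self);
        i |= (uint32_t)(b & 0x7f) << shift;
    }
    return i;
}
```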

The fact that OutStream is buffered means an extra memory copy (Lucene has
this too).  Theoretically, it would be nice if we could write to the system
buffer directly, but that requires extending the file first -- see
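
A minimal sketch of where the extra copy comes from, assuming a hypothetical MiniOutStream whose flush target is an in-memory Sink standing in for the OS write path: bytes are copied once into the stream's buffer and again when the buffer is flushed. Writing through a memory mapping instead would avoid the second copy, but POSIX delivers SIGBUS for accesses to whole mapped pages past the end of the file, which is why the file would have to be extended first (for example with ftruncate()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OUTSTREAM_BUF_SIZE 8  /* deliberately tiny, to force flushes */

/* Hypothetical sink standing in for the file / OS write path. */
typedef struct {
    char   data[64];
    size_t len;
} Sink;

typedef struct {
    char   buf[OUTSTREAM_BUF_SIZE];
    size_t buf_len;
    Sink  *sink;
} MiniOutStream;

/* Second copy: stream buffer -> sink (a write() call in a real stream). */
static void
MiniOutStream_flush(MiniOutStream *self) {
    assert(self->sink->len + self->buf_len <= sizeof(self->sink->data));
    memcpy(self->sink->data + self->sink->len, self->buf, self->buf_len);
    self->sink->len += self->buf_len;
    self->buf_len = 0;
}

/* First copy: caller's bytes -> stream buffer, flushing whenever it fills. */
static void
MiniOutStream_write_bytes(MiniOutStream *self, const char *bytes, size_t len) {
    while (len > 0) {
        size_t room  = OUTSTREAM_BUF_SIZE - self->buf_len;
        size_t chunk = len < room ? len : room;
        memcpy(self->buf + self->buf_len, bytes, chunk);
        self->buf_len += chunk;
        bytes         += chunk;
        len           -= chunk;
        if (self->buf_len == OUTSTREAM_BUF_SIZE) { MiniOutStream_flush(self); }
    }
}
```

Writing 12 bytes through the 8-byte buffer flushes once automatically and leaves 4 bytes pending until the next explicit flush.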

The fact that InStream is buffered introduces no extra cost, because there is
no copy: for InStreams which wrap FSFileHandles, the buffer is sourced from a
memory-mapping operation (mmap for Unixen, MapViewOfFile under Windows).
Multiple InStream objects may share the same underlying FileHandle, since they
do not rely on or update the FileHandle's file position or other state
(excluding refcount). 
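
The sharing arrangement can be sketched on POSIX systems with mmap(); the Windows MapViewOfFile() path is not shown. SharedFH and MappedInStream are hypothetical names, and for simplicity the whole file is mapped at once. Each stream keeps its own cursor into the mapping and touches only the handle's refcount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical shared handle: one read-only mapping plus a refcount.
 * Note that it has no file position of its own. */
typedef struct {
    const uint8_t *map;
    size_t         len;
    int            refcount;
} SharedFH;

/* Each stream keeps its own cursor into the shared mapping. */
typedef struct {
    SharedFH      *fh;
    const uint8_t *buf;
    const uint8_t *limit;
} MappedInStream;

static int
SharedFH_open(SharedFH *fh, FILE *fp, size_t len) {
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fileno(fp), 0);
    if (map == MAP_FAILED) { return -1; }
    fh->map      = map;
    fh->len      = len;
    fh->refcount = 1;
    return 0;
}

static void
SharedFH_decref(SharedFH *fh) {
    if (--fh->refcount == 0) {
        munmap((void*)fh->map, fh->len);
        fh->map = NULL;
    }
}

static void
MappedInStream_open(MappedInStream *in, SharedFH *fh) {
    fh->refcount++;            /* the only handle state a stream modifies */
    in->fh    = fh;
    in->buf   = fh->map;       /* no copy: the buffer IS the mapping */
    in->limit = fh->map + fh->len;
}

static void
MappedInStream_close(MappedInStream *in) {
    SharedFH_decref(in->fh);
    in->fh = NULL;
}

/* Returns the next byte, or -1 at end of mapping. */
static int
MappedInStream_read_u8(MappedInStream *in) {
    if (in->buf >= in->limit) { return -1; }
    return *in->buf++;
}
```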

At present, no support is provided for systems which do not support memory
mapping.  Previous experiments included a fallback which read data into a
malloc'd buffer, and it would be possible to reintroduce that functionality if
we have to.  For now, though, it's simpler to leave it out.
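
For reference, a malloc'd-buffer fallback along the lines described might look like the hypothetical read_window() below. It shows the trade-offs relative to mapping: the data is copied into the buffer, and fseek()/fread() move the FILE's shared position, so streams sharing one handle would have to seek before every read. POSIX pread(), which takes an explicit offset, would avoid the shared-position problem.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fallback: copy `len` bytes starting at `offset` into a
 * freshly malloc'd buffer. Returns NULL on allocation failure, seek
 * failure, or a short read. The caller frees the result. */
static unsigned char *
read_window(FILE *fp, long offset, size_t len) {
    unsigned char *buf = malloc(len ? len : 1);  /* avoid malloc(0) */
    if (buf == NULL) { return NULL; }
    /* Moves the FILE's position: not safe to share without re-seeking. */
    if (fseek(fp, offset, SEEK_SET) != 0 || fread(buf, 1, len, fp) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```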

> InStream and OutStream
> ----------------------
>                 Key: LUCY-63
>                 URL:
>             Project: Lucy
>          Issue Type: Sub-task
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>         Attachments: InStream.bp, InStream.c,

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
