lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <>
Subject [jira] [Resolved] (LUCENE-1757) Support adding a "stored" field via a Reader
Date Sat, 13 Apr 2013 21:14:15 GMT


Erick Erickson resolved LUCENE-1757.

    Resolution: Won't Fix

SPRING_CLEANING_2013 JIRAS. I think this has been long since changed.
> Support adding a "stored" field via a Reader
> --------------------------------------------
>                 Key: LUCENE-1757
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/index
>            Reporter: Tim Smith
> All current constructors for Field() that take a Reader explicitly state that the field will not be stored.
> It would be highly desirable to support adding a stored field to a Document using a Reader (or some special interface that can go directly to the source data).
> This could greatly reduce the memory required for adding very large stored fields (if used efficiently by IndexWriter).
> This will support two primary use cases:
> 1. Creating a stored field from an arbitrary CharSequence
> I may internally use a MutableString-type class during document processing to conserve memory; however, I would currently have to convert this to a String prior to adding it as a stored field. If I could just pass a Reader for this mutable string/char sequence, indexing could be smart enough to not require allocating double the space.
> 2. Creating a stored field from a file on disk
> If adding large stored fields, the actual value may be on disk to reduce memory use during indexing. To use it as a stored field, it would currently have to be loaded entirely into memory as a String/byte[] before being added to a Field() (this could be quite large and provoke an OutOfMemoryError).
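
For illustration only, here is a minimal sketch of what the caller's side of use case 1 might look like. `CharSequenceReader` is a hypothetical helper written for this note, not a Lucene class; it streams chars out of any CharSequence so the value never has to be copied into a String first. Whether IndexWriter could consume such a Reader for a stored field is exactly what the issue asks for and is not assumed here.

```java
import java.io.Reader;

/**
 * Hypothetical helper (not part of Lucene): a Reader that streams chars
 * directly from a CharSequence without first converting it to a String.
 */
final class CharSequenceReader extends Reader {
  private final CharSequence source;
  private int pos = 0; // index of the next char to hand out

  CharSequenceReader(CharSequence source) {
    this.source = source;
  }

  @Override
  public int read(char[] buf, int off, int len) {
    if (len == 0) {
      return 0;
    }
    if (pos >= source.length()) {
      return -1; // end of stream
    }
    int n = Math.min(len, source.length() - pos);
    for (int i = 0; i < n; i++) {
      buf[off + i] = source.charAt(pos + i);
    }
    pos += n;
    return n;
  }

  @Override
  public void close() {
    // Nothing to release; the caller still owns the CharSequence.
  }
}
```

For use case 2, the JDK's `java.nio.file.Files.newBufferedReader(path, charset)` already yields a streaming Reader over a file, so no extra helper would be needed on the caller's side.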
> Document retrieval considerations:
> It would then also be ideal if, when fetching a Document from the index, you could specify a "max string size" for the returned stored field.
> If the field were larger than this cutoff, a Reader going directly to disk would be returned instead of a String/byte[]. This would again allow smart applications to save memory during document retrieval (this would be especially nice for highlighting, as the source data could be streamed right into the highlighter).
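
As a rough sketch of the "max string size" cutoff described above, written against a plain file rather than a Lucene index: `StoredValue` and `StoredValueLoader` are hypothetical names invented for this note, and UTF-8 text is assumed. Small values come back as a String; anything over the cutoff comes back as a Reader that streams from disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical result type: exactly one of text or reader is non-null. */
final class StoredValue {
  final String text;
  final Reader reader;

  StoredValue(String text, Reader reader) {
    this.text = text;
    this.reader = reader;
  }
}

/** Hypothetical loader illustrating the cutoff; not a Lucene API. */
final class StoredValueLoader {
  static StoredValue load(Path file, long maxStringSize) throws IOException {
    // For UTF-8, the byte length is an upper bound on the Java char count,
    // so it serves as a cheap, conservative size check.
    if (Files.size(file) <= maxStringSize) {
      String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      return new StoredValue(text, null);
    }
    // Too large to materialize: hand back a streaming Reader instead.
    // The caller is responsible for closing it.
    return new StoredValue(null, Files.newBufferedReader(file, StandardCharsets.UTF_8));
  }
}
```

A highlighter-style consumer would check `text` first and fall back to reading from `reader` in chunks when it is null.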
> It would also be acceptable if some new interface were accepted instead of Reader.
> This could be some form of "sized" input stream that reports the number of bytes/chars that will be produced in total.
> Example:
> {code}
> import java.io.InputStream;
> import java.io.Reader;
>
> public interface FieldSource {
>   /** Size of stored field value (in bytes if isBinary() is true, in chars if isBinary() is false) */
>   public int size();
>   /** If true, use getInputStream(); if false, use getReader() */
>   public boolean isBinary();
>   /** Get the input stream for pulling this from its source (null if isBinary() is false) */
>   public InputStream getInputStream();
>   /** Get the reader for reading character data (null if isBinary() is true) */
>   public Reader getReader();
> }
> {code}
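
To make the proposed interface concrete, here is one way a file-backed binary implementation could look. This is a sketch under assumptions: `FieldSource` is repeated from the issue text so the example compiles on its own (it was never part of Lucene), and `FileFieldSource` is a hypothetical name. Because the proposed methods declare no checked exceptions, I/O failures are wrapped in `UncheckedIOException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** The interface as proposed in the issue, repeated here for self-containment. */
interface FieldSource {
  int size();
  boolean isBinary();
  InputStream getInputStream();
  Reader getReader();
}

/** Hypothetical binary FieldSource that streams its value from a file on disk. */
final class FileFieldSource implements FieldSource {
  private final Path path;

  FileFieldSource(Path path) {
    this.path = path;
  }

  @Override
  public int size() {
    try {
      // size() is an int in the proposal, so fail loudly on files over 2 GB
      // (Math.toIntExact throws ArithmeticException on overflow).
      return Math.toIntExact(Files.size(path));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean isBinary() {
    return true;
  }

  @Override
  public InputStream getInputStream() {
    try {
      return Files.newInputStream(path); // caller closes the stream
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public Reader getReader() {
    return null; // binary source: character access is not offered
  }
}
```

Knowing `size()` up front is the point of the "sized" stream idea: a consumer could write a length prefix before streaming the bytes, rather than buffering the whole value to count it.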
