lucene-solr-user mailing list archives

From "Chris Harris" <rygu...@gmail.com>
Subject Re: Updating and Appending
Date Wed, 23 Jan 2008 21:29:57 GMT
On Jan 23, 2008 9:04 AM, Yonik Seeley <yonik@apache.org> wrote:
> On Jan 22, 2008 4:10 PM, Owens, Martin <Martin.Owens@merrillcorp.com> wrote:
> > We've got some memory constraint worries from using Java RMI, although
> > I can see this problem could affect the XML requests too. The Java code
> > doesn't seem to handle large files as streams.
>
> [...]
>
> If you are talking about a single very large document, you are
> right... there is no way to stream this currently since the XML (and
> CSV) parsers can't give us Readers to various fields.  We perhaps
> could in the future provide a field type that pulled its actual value
> from a URL.
>
> -Yonik

Supposing you could do this (i.e. that you could get Solr to pass a
particular field's data to Lucene without reading it all into memory
first), are there any potential problems on the Lucene end? It's not
going to turn around and slurp the whole field into memory itself, is
it?
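
To make the concern concrete, here is a minimal sketch in plain Java of
what "not slurping" would mean for whatever consumes the field. It is
not Lucene or Solr code: the class name, the method name, and the
whitespace tokenization are all made up for illustration. The point is
only that the consumer reads from a Reader in fixed-size chunks, so
memory use is bounded by the buffer size rather than by the field size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of Lucene or Solr.
public class StreamingTokenCounter {

    // Counts whitespace-separated tokens while holding at most
    // bufferSize chars of the field in memory at any time.
    // Assumes bufferSize > 0.
    static long countTokens(Reader in, int bufferSize) {
        char[] buf = new char[bufferSize];
        long tokens = 0;
        boolean inToken = false; // carried across chunk boundaries
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    boolean ws = Character.isWhitespace(buf[i]);
                    if (!ws && !inToken) {
                        tokens++; // first char of a new token
                    }
                    inToken = !ws;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a stream over a huge field.
        Reader field = new StringReader("a very large field value would go here");
        System.out.println(countTokens(field, 8)); // prints 8
    }
}
```

The opposite pattern, appending every chunk to a StringBuilder and
tokenizing the resulting String, is the "slurp" the question is
worried about.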

That was the indexing side. You also have the searching side, in
particular when you need to retrieve the value of a huge stored field.
It looks like Lucene will give you a stored field's value as a stream
(a Java Reader), but that won't do any good if, behind the scenes, it
brings the whole field into memory first. Then there's the question of
whether Solr needs to slurp that whole stream into memory before
outputting that field's contents as XML. (I doubt it does, but I
haven't looked at any of the code recently.) And then if you're using
a client such as solrsharp, there's the question of whether *it* will
slurp the whole stream into memory.
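
On the output side, the streaming version of "write a field's contents
as XML" might look like the sketch below. Again this is plain Java with
made-up names, not Solr's actual response-writer code (which I haven't
checked): it copies from a Reader to a Writer one chunk at a time,
escaping the XML special characters as it goes, and never builds the
whole value as a String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical illustration only; not Solr's response writer.
public class StreamingXmlFieldWriter {

    // Copies character data from in to out, escaping <, > and &,
    // using a fixed-size buffer. Assumes bufferSize > 0.
    static void writeEscaped(Reader in, Writer out, int bufferSize) {
        char[] buf = new char[bufferSize];
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    char c = buf[i];
                    switch (c) {
                        case '<': out.write("&lt;"); break;
                        case '>': out.write("&gt;"); break;
                        case '&': out.write("&amp;"); break;
                        default:  out.write(c);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In a real server the Writer would wrap the HTTP response
        // stream; a StringWriter is used here only to show the result.
        StringWriter out = new StringWriter();
        writeEscaped(new StringReader("a < b && c"), out, 4);
        System.out.println(out); // prints a &lt; b &amp;&amp; c
    }
}
```

A client would need the same discipline on its end: if it parses the
response into a DOM or reads the field into a single string, the
memory saving is lost at that last hop.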

Maybe this is something to take up on JIRA or solr-dev, rather than
here. I was just trying to get a sense of how difficult the proposed
feature would be.
