lucene-java-user mailing list archives

From Tatu Saloranta
Subject Re: Index and Field.Text
Date Sat, 06 Dec 2003 00:48:06 GMT
On Friday 05 December 2003 10:45, Doug Cutting wrote:
> Tatu Saloranta wrote:
> > Also, shouldn't there be at least 3 methods that take Readers; one for
> > Text-like handling, another for UnStored, and last for UnIndexed.
> How do you store the contents of a Reader?  You'd have to double-buffer
> it, first reading it into a String to store, and then tokenizing the
> StringReader.  A key feature of Reader values is that they're streamed:

Not really: you can pass a Reader to the tokenizer, which then reads and tokenizes 
directly (I think that's how the code works, too). That is because internally a 
String is read using a StringReader, so passing a String looks more like a 
convenience feature?
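
To illustrate what I mean, here is a toy sketch (plain JDK only; the class and 
method names are made up, and this is not the actual Lucene tokenizer code). The 
Reader variant does the real work, and the String variant just wraps its 
argument in a StringReader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy whitespace tokenizer; hypothetical, not Lucene's Tokenizer/Analyzer.
class ReaderTokens {
    // Pulls characters from the Reader one at a time, so the raw text is
    // never held in memory as a whole. The tokens are collected in a list
    // here only to keep the sketch short.
    static List<String> tokenize(Reader in) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // The String variant is only a convenience wrapper around the Reader one.
    static List<String> tokenize(String text) {
        return tokenize(new StringReader(text));
    }
}
```

Calling ReaderTokens.tokenize on a String or on a Reader over the same text 
gives the same tokens; only the Reader form avoids having the full text in RAM.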

> the entire value is never in RAM.  Storing a Reader value would remove
> that advantage.  The current API makes this explicit: when you want
> something streamed, you pass in a Reader, when you're willing to have
> the entire value in memory, pass in a String.

I guess for things that are both tokenized and stored, passing a Reader can't 
really help a lot; if one wants to reduce memory usage, the text needs to be read 
twice, or the analyzer needs to help in writing the output; or the text needs to be 
read into memory, much like what happens now. It'd simplify application code a bit, 
but wouldn't do much more.
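
A rough sketch of the "read into memory" variant, again with made-up names and 
plain JDK classes rather than Lucene code: the Reader is drained into a String 
(the stored value), and that String is then tokenized through a StringReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical holder for a value that is both stored and tokenized.
class BufferedField {
    final String stored;        // full text, held in memory
    final List<String> tokens;  // whitespace-separated tokens of that text

    private BufferedField(String stored, List<String> tokens) {
        this.stored = stored;
        this.tokens = tokens;
    }

    static BufferedField fromReader(Reader in) {
        // Step 1: drain the Reader into a String. The streaming advantage
        // is lost here, because the whole value now sits in RAM.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String stored = sb.toString();

        // Step 2: tokenize the buffered copy through a StringReader.
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(stored))) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return new BufferedField(stored, tokens);
    }
}
```

So the application code gets shorter, but the memory cost is the same as 
passing a String in the first place.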

So... I guess I need to downgrade my suggestion to just 2 
Reader-taking factory methods? :-)
I still think that index-only and store-only versions would both make sense. In 
the latter case, storing could be done in a fully streaming fashion; in the former, 
tokenization could be done straight from the Reader?
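
For the store-only case, what I have in mind is roughly the following (a sketch 
only; the class name is made up, and the Writer stands in for wherever stored 
values would really go): the text is copied in fixed-size chunks, so memory use 
does not grow with the size of the value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper: streams a value from a Reader to a Writer.
class StreamingStore {
    // Copies everything from 'in' to 'out' through one fixed-size buffer
    // and returns the number of characters copied.
    static long copy(Reader in, Writer out) {
        char[] buf = new char[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```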

> Yes, it is a bit confusing that Text(String, String) stores its value,
> while Text(String, Reader) does not, but it is at least well documented.
>   And we cannot change it: that would break too many applications.  But
> we can put this on the list for Lucene 2.0 cleanups.

Yes, I understand that. It would not be reasonable to make such a change. But how 
about adding a more intuitive factory method (UnStored(String, Reader))?
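
Something along these lines, shown on a toy class rather than the real Field 
(ToyField and its flags are made up for illustration; only the method names 
mirror the ones discussed here). The new method would behave like 
Text(String, Reader), but its name would say so:

```java
import java.io.Reader;

// Toy stand-in for a field class; this is not Lucene's Field implementation.
class ToyField {
    final String name;
    final String stringValue;  // non-null when built from a String
    final Reader readerValue;  // non-null when built from a Reader
    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    private ToyField(String name, String stringValue, Reader readerValue,
                     boolean stored, boolean indexed, boolean tokenized) {
        this.name = name;
        this.stringValue = stringValue;
        this.readerValue = readerValue;
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    // As discussed in this thread: the String variant stores its value...
    static ToyField Text(String name, String value) {
        return new ToyField(name, value, null, true, true, true);
    }

    // ...while the Reader variant does not.
    static ToyField Text(String name, Reader value) {
        return new ToyField(name, null, value, false, true, true);
    }

    // Hypothetical addition: same behavior as Text(String, Reader),
    // but the name makes "not stored" explicit.
    static ToyField UnStored(String name, Reader value) {
        return new ToyField(name, null, value, false, true, true);
    }
}
```

Existing callers of Text(String, Reader) would keep working unchanged; new code 
could use the more descriptive name.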

> When I first wrote these static methods I meant for them to be
> constructor-like.  I wanted to have multiple Field(String, String)
> constructors, but that's not possible, so I used capitalized static
> methods instead.  I've never seen anyone else do this (capitalize any
> method but a real constructor) so I guess I didn't start a fad!  This
> should someday too be cleaned up.  Lucene was the first Java program
> that I ever wrote, and thus its style is in places non-standard.  Sorry.

The best standards are created by people doing things that others use, follow or 
imitate... so it was worth a try! :-)

-+ Tatu +-
