From: Tatu Saloranta
Reply-To: tatu@hypermall.net
To: "Lucene Users List"
Subject: Re: Index and Field.Text
Date: Fri, 5 Dec 2003 17:48:06 -0700
Message-Id: <200312051748.06135.tatu@hypermall.net>
In-Reply-To: <3FD0C43D.9030303@lucene.com>
References: <200312050932.58322.tatu@hypermall.net> <3FD0C43D.9030303@lucene.com>

On Friday 05 December 2003 10:45, Doug Cutting wrote:
> Tatu Saloranta wrote:
> > Also, shouldn't there be at least 3 methods that take Readers; one for
> > Text-like handling, another for UnStored, and last for UnIndexed.
>
> How do you store the contents of a Reader?  You'd have to double-buffer
> it, first reading it into a String to store, and then tokenizing the
> StringReader.  A key feature of Reader values is that they're streamed:

Not really, you can pass the Reader to the tokenizer, which then reads and
tokenizes directly (I think that's the way the code also works). This is
because internally a String is read using a StringReader, so passing a
String looks more like a convenience feature?

> the entire value is never in RAM.  Storing a Reader value would remove
> that advantage.  The current API makes this explicit: when you want
> something streamed, you pass in a Reader, when you're willing to have
> the entire value in memory, pass in a String.

I guess for things that are both tokenized and stored, passing a Reader
can't really help a lot; if one wants to reduce memory usage, the text needs
to be read twice, or the analyzer needs to help in writing the output; or
the text needs to be read into memory, much like what happens now.
It'd simplify application code a bit, but wouldn't do much more.

So.... I guess I need to downgrade my suggestion to require just 2
Reader-taking factory methods? :-)
I still think that index-only and store-only versions would both make sense.
In the latter case, storing could be done in a fully streaming fashion; in
the former, tokenization can be done that way?

> Yes, it is a bit confusing that Text(String, String) stores its value,
> while Text(String, Reader) does not, but it is at least well documented.
> And we cannot change it: that would break too many applications.  But
> we can put this on the list for Lucene 2.0 cleanups.

Yes, I understand that. It would not be reasonable to make such a change.
But how about adding a more intuitive factory method
(UnStored(String, Reader))?

> When I first wrote these static methods I meant for them to be
> constructor-like.  I wanted to have multiple Field(String, String)
> constructors, but that's not possible, so I used capitalized static
> methods instead.
> I've never seen anyone else do this (capitalize any
> method but a real constructor) so I guess I didn't start a fad!  This

:-)

> should someday too be cleaned up.  Lucene was the first Java program
> that I ever wrote, and thus its style is in places non-standard.  Sorry.

Best standards are created by people doing things others use, follow or
imitate... so it was worth a try! :-)

-+ Tatu +-
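
To make the streaming argument above a bit more concrete, here is a minimal
Java sketch. It is not Lucene code: the class ReaderFieldSketch and the
methods streamTokens, streamStore and readFully are hypothetical names made
up for illustration, and plain whitespace splitting stands in for a real
analyzer. It only tries to show that index-only and store-only handling can
each consume a Reader in one pass, while doing both with a single Reader
means buffering the whole value first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Consumer;

/** Illustrative only; none of these names exist in Lucene. */
final class ReaderFieldSketch {

    /**
     * Index-only case: hands whitespace-separated tokens to the sink one at
     * a time, so only the current token is held in memory. Returns the
     * number of tokens seen.
     */
    static int streamTokens(Reader in, Consumer<String> sink) {
        StringBuilder current = new StringBuilder();
        int count = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        sink.accept(current.toString());
                        current.setLength(0);
                        count++;
                    }
                } else {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) { // flush the last token, if any
            sink.accept(current.toString());
            count++;
        }
        return count;
    }

    /**
     * Store-only case: copies the value to the output in fixed-size chunks,
     * so only one chunk is held in memory. Returns the number of chars copied.
     */
    static long streamStore(Reader in, Writer out) {
        char[] buf = new char[4096];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /**
     * Stored-and-indexed case: a Reader can be consumed only once, so the
     * whole value is buffered into a String before it can be used twice.
     */
    static String readFully(Reader in) {
        StringWriter all = new StringWriter();
        streamStore(in, all);
        return all.toString();
    }

    public static void main(String[] args) {
        // Index only: tokens are consumed as they are produced.
        streamTokens(new StringReader("index only text"),
                     t -> System.out.println("token: " + t));

        // Store only: the value is copied through without being tokenized.
        StringWriter stored = new StringWriter();
        streamStore(new StringReader("store only text"), stored);
        System.out.println("stored: " + stored);

        // Both: the double-buffering Doug describes, read fully, then tokenize.
        String value = readFully(new StringReader("stored and indexed"));
        streamTokens(new StringReader(value),
                     t -> System.out.println("token: " + t));
    }
}
```

The String convenience path is just readFully's result wrapped in a
StringReader again, which is why only the combined case loses the memory
advantage in this sketch.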