Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 45417 invoked from network); 10 Nov 2009 06:10:38 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 10 Nov 2009 06:10:38 -0000 Received: (qmail 64026 invoked by uid 500); 10 Nov 2009 06:10:37 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 63941 invoked by uid 500); 10 Nov 2009 06:10:37 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 63930 invoked by uid 99); 10 Nov 2009 06:10:37 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Nov 2009 06:10:37 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of clraychen@gmail.com designates 209.85.210.173 as permitted sender) Received: from [209.85.210.173] (HELO mail-yx0-f173.google.com) (209.85.210.173) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Nov 2009 06:10:26 +0000 Received: by yxe3 with SMTP id 3so3259817yxe.20 for ; Mon, 09 Nov 2009 22:10:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=b4RVN9jJ2BXdbzbi/VD+K4CO4wyzm1LnjQxLZKWb+Gc=; b=vgztd62w6LMMErg4GJC/i+yVLtR4Y+HWGeDux9a9OtQyZiE5URbfD0nahNhzSTgA7k LqIDbPxnEXgpvorzFPHB07e4HQ9cW6eIe534i5IX6/L7btlTHOLH8k5pKyHui5Z6jZV7 LFPGcG6W4OlL23MIfLmMqR6N0xm7EylAXaYqk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=hOvH4ZodoBXqULJTTUA0Yny4dUzwEC2qkiwmITF6LQoBYSmJ5ezdbzLG+MOizFdGSZ yqIE5W6uTzUiXrN8++nY8HzmPUWIrrZYKfNQjGPPQAITv5FXWV4tp4tp/1769mCrz+tO TKchxe0dmII6KnfZLmiwYkjnxBG8moPxIE60U= MIME-Version: 1.0 Received: by 10.101.4.22 with SMTP id g22mr1560672ani.40.1257833394758; Mon, 09 Nov 2009 22:09:54 -0800 (PST) In-Reply-To: <4AF90270.6060600@gmail.com> References: <96933a4d0911031444j27a2d414l4bbdab74a04ea9e0@mail.gmail.com> <96933a4d0911051045p61431af5ie2cecb850a62267a@mail.gmail.com> <20091105225701.9CBFB478445@athena.apache.org> <4AF392A7.1030504@gmail.com> <4AF3F7DE.3020400@gmail.com> <20091106102845.710C8478443@athena.apache.org> <96933a4d0911071329p13185c5fh3e6900b7474dd602@mail.gmail.com> <4AF8EDA6.9080703@gmail.com> <4d9fb1a00911092103m3a7489a6i7417daab7e208298@mail.gmail.com> <4AF90270.6060600@gmail.com> Date: Tue, 10 Nov 2009 14:09:54 +0800 Message-ID: <4d9fb1a00911092209l49bb00abo85e872e8df93d393@mail.gmail.com> Subject: Re: [classlib] BufferedReader and FileInputStream.available() From: Ray Chen To: dev@harmony.apache.org Content-Type: multipart/alternative; boundary=001636c92900ae63c60477fe28ff X-Virus-Checked: Checked by ClamAV on apache.org --001636c92900ae63c60477fe28ff Content-Type: text/plain; charset=ISO-8859-1 Shall we store the return value of out.position()-1 to avoid invoking it twice? If it is too tiny for performace improvement, current patch seems good for me. On Tue, Nov 10, 2009 at 2:04 PM, Regis wrote: > Ray Chen wrote: > >> Hi Regis, >> >> Maybe add the z/OS newline, on z/OS 0x15 means the newline. You can refer >> to >> PrintStream.write(int) for this. >> > > how about this: > > if ((out.position() > offset) > && (out.get(out.position() - 1) == '\n' || out > .get(out.position() - 1) == 0x15)) { > > // we could return the result without blocking read > break; > } > > >> On Tue, Nov 10, 2009 at 12:35 PM, Regis wrote: >> >> enh wrote: >>> >>> On Fri, Nov 6, 2009 at 02:28, Mark Hindess >>> > >>>> wrote: >>>> >>>> In message <4AF3F7DE.3020400@gmail.com>, Tim Ellison writes: >>>>> >>>>> On 06/Nov/2009 03:06, Regis wrote: >>>>>> >>>>>> Mark Hindess wrote: >>>>>>> >>>>>>> In message < >>>>>>>> 96933a4d0911051045p61431af5ie2cecb850a62267a@mail.gmail.com>, >>>>>>>> enh writes: >>>>>>>> >>>>>>>> On Wed, Nov 4, 2009 at 23:07, Regis wrote: >>>>>>>>> >>>>>>>>> Mark Hindess wrote: >>>>>>>>>> >>>>>>>>>> In message <4AF0DC18.1000504@gmail.com>, Regis writes: >>>>>>>>>>> >>>>>>>>>>> Regis, these are really two different issues so lets treat them >>>>>>>>>>> as such. I actually think fixing the /proc reading is better >>>>>>>>>>> done in the java code. I've attached a patch that just catches >>>>>>>>>>> and ignores the IOException from the available call. I think >>>>>>>>>>> this is reasonable since if the exception was "real" then the >>>>>>>>>>> read will fail if a similar exception. In fact, since the user >>>>>>>>>>> is doing a read call, having the exception thrown by the read is >>>>>>>>>>> more in keeping with the expectations of the user than having an >>>>>>>>>>> exception thrown by the available call. >>>>>>>>>>> >>>>>>>>>>> Thanks Mark, I ran luni tests with your patch, no regression >>>>>>>>>> found, I'll leave patch to you to apply :) >>>>>>>>>> >>>>>>>>>> invoking in.available before read is a little overhead, in the >>>>>>>>>> worst case, one read need four system calls, three seek and one >>>>>>>>>> read. Actually the condition >>>>>>>>>> >>>>>>>>>> if ((in.available() = 0) && (out.position() > offset)) >>>>>>>>>> >>>>>>>>>> it's only for tty file, for normal file it's not necessary. It's >>>>>>>>>> better if we can remove the check for normal file, but I have no >>>>>>>>>> idea how to do this. >>>>>>>>>> >>>>>>>>>> isatty(3). for the price of one JNI call at construction and a >>>>>>>>> boolean field to record the result, you could avoid any later >>>>>>>>> calls. >>>>>>>>> >>>>>>>>> This might be a good idea, but I'm getting confused... >>>>>>>> >>>>>>>> At the moment there seems to be a special case in >>>>>>>> FileInputStream.available() like this: >>>>>>>> >>>>>>>> // stdin requires special handling >>>>>>>> if (fd == FileDescriptor.in) { >>>>>>>> return (int) fileSystem.ttyAvailable(); >>>>>>>> } >>>>>>>> ... perform seek based available check ... >>>>>>>> >>>>>>>> but I find this confusing because the comment and check implies >>>>>>>> that stdin is special but the fileSystem method name implies >>>>>>>> that it is being a tty that is the distinction. This is >>>>>>>> confusing because *any* file descriptor can be a tty - e.g. new >>>>>>>> FileReader("/dev/tty") - and stdin doesn't have to be a tty - >>>>>>>> "java >>>>>>> >>>>>>>> Good question, I suppose here should be like this: >>>>>>> if (istty(fd)) { >>>>>>> return (int) fileSystem.ttyAvailable(fd); >>>>>>> } >>>>>>> >>>>>>> With apologies for being pedantic, but IMHO it should be fd.istty() >>>>>> since there are likely a few places where this could be useful, and we >>>>>> should cache it in the right place. >>>>>> >>>>>> Agreed. I think that is what elliott was suggesting earlier in this >>>>> thread. Definitely seems like we should fix the semantics of these tty >>>>> checks so they aren't just about stdin. >>>>> >>>>> yes and no. i was restricting myself to the question "how should we >>>> query the number of available bytes?". on Unix, the answer is with the >>>> FIONREAD ioctl. so my claim was really: >>>> >>>> * there should be a "static native int available(FileDescriptor fd)" >>>> method generally available >>>> * the Unix implementation should use the FIONREAD ioctl. >>>> * any platform-specific hacks (such as, say, if Windows really needs >>>> the three-seek hack) should be in the native code for the broken >>>> platforms. >>>> * any Java should use the intention-revealing >>>> "available(FileDescriptor)" method and not concern itself with how >>>> that works on any given platform. >>>> >>>> we've done that for Android. (if Windows forces you to treat files and >>>> ttys completely differently, your life will be more difficult, but i >>>> still think that should be in the platform-specific code. as you >>>> pointed out with the "/dev/tty" example, on Unix it's not just >>>> inefficient, it's wrong.) >>>> >>>> for the bigger question of BufferedReader.readLine, i don't think we >>>> should be calling "available" at all here. i think somewhere there >>>> must be the equivalent of a "readFully" rather than a "read", and >>>> that's the mistake that needs fixing. fix that, and the "available" >>>> hack can go completely. >>>> >>>> I just found a way can remove the in.available() check here. Instead of >>> checking in.available, I check whether the last char read from stream is >>> '\n'. >>> >>> Index: modules/luni/src/main/java/java/io/InputStreamReader.java >>> ===================================================================== >>> --- modules/luni/src/main/java/java/io/InputStreamReader.java >>> +++ modules/luni/src/main/java/java/io/InputStreamReader.java >>> @@ -246,14 +246,9 @@ public class InputStreamReader extends Reader { >>> while (out.hasRemaining()) { >>> // fill the buffer if needed >>> if (needInput) { >>> - try { >>> - if ((in.available() == 0) >>> - && (out.position() > offset)) { >>> - // we could return the result without >>> blocking >>> read >>> - break; >>> - } >>> - } catch (IOException e) { >>> - // available didn't work so just try the read >>> + if ((out.position() > offset) && >>> out.get(out.position() - 1) == '\n') { >>> + // we could return the result without blocking >>> read >>> + break; >>> } >>> >>> int to_read = bytes.capacity() - bytes.limit(); >>> >>> >>> -- >>> Best Regards, >>> Regis. >>> >>> >>> >>> try this: >>>> >>>> #include >>>> int main() { >>>> ssize_t n; >>>> char buf[1024]; >>>> while ((n = read(0, buf, sizeof(buf))) > 0) { >>>> write(2, ">", 1); >>>> write(2, buf, n); >>>> } >>>> return 0; >>>> } >>>> >>>> if you run that and type "hello" then return, then "world" then >>>> return, you'll see (mixing your input and its output): >>>> >>>> hello >>>> >>>> hello >>>>> >>>>> world >>>> >>>> world >>>>> >>>>> because it's the terminal's job to make sure you get data at the end >>>> of a line. (or not, if so configured, but if someone has configured >>>> their terminal that way, that's their decision, they should expect >>>> programs' behavior to change, and we certainly shouldn't go around >>>> reconfiguring the terminal.) >>>> >>>> by contrast, if you run this (where my program above has been built as >>>> "read") like this: >>>> >>>> echo -e "hello\nworld\n" | ./read >>>> >>>> you see: >>>> >>>> hello >>>> world >>>> >>>> i.e. both lines arrive as a single chunk of data. >>>> >>>> (you didn't say exactly _how_ the test fails, so i'm just guessing >>>> it's related to this.) >>>> >>>> --elliott >>>> >>>> Still need to look at the available natives too since elliott suggested >>>> >>>>> that we should replace the 3*seek with a single ioctl for all cases >>>>> (though this might not be as portable). >>>>> >>>>> -Mark. >>>>> >>>>> Regis, can you explain a little more about why the check is needed >>>>> >>>>>> in the available call? And why the available check is needed in the >>>>>>>> FileInputStream read method? >>>>>>>> >>>>>>>> After I removed the check, everything is fine, except following >>>>>>> test is >>>>>>> failed: >>>>>>> >>>>>>> public static void main(String args[]) >>>>>>> { >>>>>>> BufferedReader reader = new BufferedReader(new >>>>>>> InputStreamReader(System.in)); >>>>>>> String line = null; >>>>>>> try { >>>>>>> line = reader.readLine(); >>>>>>> } catch (IOException e) { >>>>>>> e.printStackTrace(); >>>>>>> } >>>>>>> System.out.println(line); >>>>>>> } >>>>>>> >>>>>>> In each case, is it really stdin that is special or any file >>>>>>> >>>>>>>> descriptor >>>>>>>> representing a tty? >>>>>>>> >>>>>>>> I'd like to understand this as there are a whole bunch of ttyRead >>>>>>>> with >>>>>>>> a similar check that only affects stdin not any tty descriptor. >>>>>>>> >>>>>>>> in the meantime, i wondered also whether it's worth swapping the >>>>>>>> two >>>>>>>> >>>>>>>>> conjuncts in the if, since the latter is essentially free while the >>>>>>>>> former is expensive. (i've no idea how often the cheap conjunct is >>>>>>>>> false, though.) >>>>>>>>> >>>>>>>>> I thought about this but held of doing it for the same reason. >>>>>>>> However, >>>>>>>> I guess it is almost certainly a win. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Mark. >>>>>>>> >>>>>>>> >>>>> >>>> >>>> -- >>> Best Regards, >>> Regis. >>> >>> >> >> >> > > -- > Best Regards, > Regis. > -- Regards, Ray Chen --001636c92900ae63c60477fe28ff--