harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Regis <xu.re...@gmail.com>
Subject Re: [classlib] BufferedReader and FileInputStream.available()
Date Tue, 10 Nov 2009 04:35:50 GMT
enh wrote:
> On Fri, Nov 6, 2009 at 02:28, Mark Hindess <mark.hindess@googlemail.com> wrote:
>> In message <4AF3F7DE.3020400@gmail.com>, Tim Ellison writes:
>>> On 06/Nov/2009 03:06, Regis wrote:
>>>> Mark Hindess wrote:
>>>>> In message <96933a4d0911051045p61431af5ie2cecb850a62267a@mail.gmail.com>,
>>>>> enh writes:
>>>>>> On Wed, Nov 4, 2009 at 23:07, Regis <xu.regis@gmail.com> wrote:
>>>>>>> Mark Hindess wrote:
>>>>>>>> In message <4AF0DC18.1000504@gmail.com>, Regis writes:
>>>>>>>>
>>>>>>>> Regis, these are really two different issues so lets treat
them
>>>>>>>> as such.  I actually think fixing the /proc reading is better
>>>>>>>> done in the java code.  I've attached a patch that just catches
>>>>>>>> and ignores the IOException from the available call.  I think
>>>>>>>> this is reasonable since if the exception was "real" then
the
>>>>>>>> read will fail if a similar exception.  In fact, since the
user
>>>>>>>> is doing a read call, having the exception thrown by the
read is
>>>>>>>> more in keeping with the expectations of the user than having
an
>>>>>>>> exception thrown by the available call.
>>>>>>> Thanks Mark, I ran luni tests with your patch, no regression
>>>>>>> found, I'll leave patch to you to apply :)
>>>>>>>
>>>>>>> invoking in.available before read is a little overhead, in the
>>>>>>> worst case, one read need four system calls, three seek and one
>>>>>>> read. Actually the condition
>>>>>>>
>>>>>>> if ((in.available() = 0) && (out.position() > offset))
>>>>>>>
>>>>>>> it's only for tty file, for normal file it's not necessary. It's
>>>>>>> better if we can remove the check for normal file, but I have
no
>>>>>>> idea how to do this.
>>>>>> isatty(3). for the price of one JNI call at construction and a
>>>>>> boolean field to record the result, you could avoid any later
>>>>>> calls.
>>>>> This might be a good idea, but I'm getting confused...
>>>>>
>>>>> At the moment there seems to be a special case in
>>>>> FileInputStream.available() like this:
>>>>>
>>>>>   // stdin requires special handling
>>>>>   if (fd == FileDescriptor.in) {
>>>>>     return (int) fileSystem.ttyAvailable();
>>>>>   }
>>>>>   ... perform seek based available check ...
>>>>>
>>>>> but I find this confusing because the comment and check implies
>>>>> that stdin is special but the fileSystem method name implies
>>>>> that it is being a tty that is the distinction.  This is
>>>>> confusing because *any* file descriptor can be a tty - e.g. new
>>>>> FileReader("/dev/tty") - and stdin doesn't have to be a tty -
>>>>> "java </dev/zero".
>>>> Good question, I suppose here should be like this:
>>>> if (istty(fd)) {
>>>>   return (int) fileSystem.ttyAvailable(fd);
>>>> }
>>> With apologies for being pedantic, but IMHO it should be fd.istty()
>>> since there are likely a few places where this could be useful, and we
>>> should cache it in the right place.
>> Agreed.  I think that is what elliott was suggesting earlier in this
>> thread.  Definitely seems like we should fix the semantics of these tty
>> checks so they aren't just about stdin.
> 
> yes and no. i was restricting myself to the question "how should we
> query the number of available bytes?". on Unix, the answer is with the
> FIONREAD ioctl. so my claim was really:
> 
> * there should be a "static native int available(FileDescriptor fd)"
> method generally available
> * the Unix implementation should use the FIONREAD ioctl.
> * any platform-specific hacks (such as, say, if Windows really needs
> the three-seek hack) should be in the native code for the broken
> platforms.
> * any Java should use the intention-revealing
> "available(FileDescriptor)" method and not concern itself with how
> that works on any given platform.
> 
> we've done that for Android. (if Windows forces you to treat files and
> ttys completely differently, your life will be more difficult, but i
> still think that should be in the platform-specific code. as you
> pointed out with the "/dev/tty" example, on Unix it's not just
> inefficient, it's wrong.)
> 
> for the bigger question of BufferedReader.readLine, i don't think we
> should be calling "available" at all here. i think somewhere there
> must be the equivalent of a "readFully" rather than a "read", and
> that's the mistake that needs fixing. fix that, and the "available"
> hack can go completely.

I just found a way can remove the in.available() check here. Instead of checking 
  in.available, I check whether the last char read from stream is '\n'.

Index: modules/luni/src/main/java/java/io/InputStreamReader.java
=====================================================================
--- modules/luni/src/main/java/java/io/InputStreamReader.java
+++ modules/luni/src/main/java/java/io/InputStreamReader.java
@@ -246,14 +246,9 @@ public class InputStreamReader extends Reader {
              while (out.hasRemaining()) {
                  // fill the buffer if needed
                  if (needInput) {
-                    try {
-                        if ((in.available() == 0)
-                            && (out.position() > offset)) {
-                            // we could return the result without blocking read
-                            break;
-                        }
-                    } catch (IOException e) {
-                        // available didn't work so just try the read
+                    if ((out.position() > offset) && out.get(out.position() -

1) == '\n') {
+                        // we could return the result without blocking read
+                        break;
                      }

                      int to_read = bytes.capacity() - bytes.limit();


-- 
Best Regards,
Regis.

> 
> try this:
> 
> #include <unistd.h>
> int main() {
>   ssize_t n;
>   char buf[1024];
>   while ((n = read(0, buf, sizeof(buf))) > 0) {
>     write(2, ">", 1);
>     write(2, buf, n);
>   }
>   return 0;
> }
> 
> if you run that and type "hello" then return, then "world" then
> return, you'll see (mixing your input and its output):
> 
> hello
>> hello
> world
>> world
> 
> because it's the terminal's job to make sure you get data at the end
> of a line. (or not, if so configured, but if someone has configured
> their terminal that way, that's their decision, they should expect
> programs' behavior to change, and we certainly shouldn't go around
> reconfiguring the terminal.)
> 
> by contrast, if you run this (where my program above has been built as
> "read") like this:
> 
> echo -e "hello\nworld\n" | ./read
> 
> you see:
> 
>> hello
> world
> 
> i.e. both lines arrive as a single chunk of data.
> 
> (you didn't say exactly _how_ the test fails, so i'm just guessing
> it's related to this.)
> 
>  --elliott
> 
>> Still need to look at the available natives too since elliott suggested
>> that we should replace the 3*seek with a single ioctl for all cases
>> (though this might not be as portable).
>>
>> -Mark.
>>
>>>>> Regis, can you explain a little more about why the check is needed
>>>>> in the available call?  And why the available check is needed in the
>>>>> FileInputStream read method?
>>>> After I removed the check, everything is fine, except following test is
>>>> failed:
>>>>
>>>>   public static void main(String args[])
>>>>   {
>>>>     BufferedReader reader = new BufferedReader(new
>>>> InputStreamReader(System.in));
>>>>     String line = null;
>>>>     try {
>>>>       line = reader.readLine();
>>>>     } catch (IOException e) {
>>>>       e.printStackTrace();
>>>>     }
>>>>     System.out.println(line);
>>>>   }
>>>>
>>>>> In each case, is it really stdin that is special or any file descriptor
>>>>> representing a tty?
>>>>>
>>>>> I'd like to understand this as there are a whole bunch of ttyRead with
>>>>> a similar check that only affects stdin not any tty descriptor.
>>>>>
>>>>>> in the meantime, i wondered also whether it's worth swapping the
two
>>>>>> conjuncts in the if, since the latter is essentially free while the
>>>>>> former is expensive. (i've no idea how often the cheap conjunct is
>>>>>> false, though.)
>>>>> I thought about this but held of doing it for the same reason.  However,
>>>>> I guess it is almost certainly a win.
>>>>>
>>>>> Regards,
>>>>>  Mark.
>>
>>
> 
> 
> 


-- 
Best Regards,
Regis.

Mime
View raw message