commons-issues mailing list archives

From "Sebb (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-288) Supply a ReversedLinesFileReader
Date Tue, 15 Nov 2011 00:48:52 GMT

    [ https://issues.apache.org/jira/browse/IO-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150124#comment-13150124 ]

Sebb commented on IO-288:
-------------------------

Good to know that it's easy to unambiguously detect CR and LF.

There seem to be a lot of spurious files in the zip archive.

I'm not sure that getNewLineMatchByteCount() is as efficient as BufferedReader.readLine()
- it seems to process characters multiple times. It could probably be improved by just checking
the current and previous chars. Also, I don't think it's necessary to encode \n or \r - just use
the appropriate characters.
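
As a rough illustration of the "current and previous chars" idea, here is a minimal sketch. It is not the code from the attachment: the class and method names are made up for this example, and it assumes an encoding in which CR and LF are the single bytes 0x0D and 0x0A (for example US-ASCII, ISO-8859-1 or UTF-8).

```java
final class NewlineScan {
    /**
     * Returns the byte length of the line terminator that ends at index i:
     * 2 for CRLF, 1 for a lone CR or LF, 0 if buf[i] is not a terminator.
     * Hypothetical helper; assumes CR and LF are single bytes in the encoding.
     */
    static int terminatorLengthEndingAt(byte[] buf, int i) {
        byte current = buf[i];
        if (current == '\n') {
            // Look at the previous byte only; no re-scanning of earlier data.
            boolean precededByCr = i > 0 && buf[i - 1] == '\r';
            return precededByCr ? 2 : 1;
        }
        return current == '\r' ? 1 : 0;
    }
}
```

A backward scan would call this at each position and skip two bytes whenever it returns 2, so each byte is inspected at most twice.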

There are no tests for multi-block files where there may be lines spanning blocks.
Indeed the CRLF pair may span blocks; I'm not convinced that the code handles that correctly.
In order for getNewLineMatchByteCount() to detect all CRLF pairs, it generally needs at least
2 characters to be present; this does not seem to be guaranteed.
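
To make sure a test file really exercises the case above, a test could first check that some CRLF pair straddles a block boundary for the chosen block size. The helper below is only a sketch with invented names, and it again assumes single-byte CR and LF.

```java
final class BlockBoundaryCheck {
    /**
     * Returns true if, for the given block size, some block ends with CR
     * while the following block starts with LF.
     * Hypothetical test helper; blocks are counted from the start of the data.
     */
    static boolean crLfStraddlesBoundary(byte[] data, int blockSize) {
        for (int boundary = blockSize; boundary < data.length; boundary += blockSize) {
            if (data[boundary - 1] == '\r' && data[boundary] == '\n') {
                return true;
            }
        }
        return false;
    }
}
```

Note that a reader working backwards from the end of the file may align its blocks differently (for instance with a short block at the end rather than at the start), so the boundary positions checked here would need to match whatever alignment the reader actually uses.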

Note: could use a smaller block size to make the test files smaller; probably sensible to
compare the results with a forward line reader. It would then be simple to have a directory
of various different test files - read the file forward and store the lines; ensure that the
reverse reader matches the reversed lines.
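
The comparison could look roughly like the sketch below. ReverseLineSource is a hypothetical one-method stand-in for the reverse reader (only readLine() returning null at the start of the file is assumed), so the check itself does not depend on the attached class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReverseReaderCheck {
    /** Hypothetical minimal view of a reverse reader: null means no more lines. */
    interface ReverseLineSource {
        String readLine() throws IOException;
    }

    /** Reads all lines forward, reverses them, and compares with the reverse reader's output. */
    static boolean reverseMatchesForward(BufferedReader forward, ReverseLineSource reverse) {
        try {
            List<String> expected = new ArrayList<>();
            for (String line = forward.readLine(); line != null; line = forward.readLine()) {
                expected.add(line);
            }
            Collections.reverse(expected);

            List<String> actual = new ArrayList<>();
            for (String line = reverse.readLine(); line != null; line = reverse.readLine()) {
                actual.add(line);
            }
            return expected.equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could then loop over every file in the test directory and several block sizes, passing reversedLinesFileReader::readLine as the second argument.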

The field totalBlockCount needs to be a long, not an int.
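
For illustration, a sketch of the block count computed in long arithmetic (the method name and the ceiling-division formula are assumptions, not taken from the attachment). With the default 4096-byte block an int would only overflow for files of about 8 TiB, but with the small block sizes suggested above for testing it overflows much sooner.

```java
final class BlockCount {
    /**
     * Number of blocks needed to cover fileLength bytes, rounded up.
     * Hypothetical helper; the sum is evaluated as a long because fileLength is a long.
     */
    static long totalBlockCount(long fileLength, int blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }
}
```

For example, a 3,000,000,000-byte file read with a block size of 1 needs 3,000,000,000 blocks, which does not fit in an int; casting that value to int yields a negative number.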

Might simplify the code to use empty arrays rather than null.
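
A small sketch of what that might look like, with invented names: if the bytes carried over from one block to the next are represented by a shared zero-length array instead of null, the joining code needs no null check.

```java
final class LeftoverBytes {
    /** Shared zero-length array used in place of null when nothing is left over. */
    static final byte[] EMPTY = new byte[0];

    /** Returns block followed by leftover; leftover may be empty but is assumed non-null. */
    static byte[] join(byte[] block, byte[] leftover) {
        byte[] joined = new byte[block.length + leftover.length];
        System.arraycopy(block, 0, joined, 0, block.length);
        System.arraycopy(leftover, 0, joined, block.length, leftover.length);
        return joined;
    }
}
```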
                
> Supply a ReversedLinesFileReader 
> ---------------------------------
>
>                 Key: IO-288
>                 URL: https://issues.apache.org/jira/browse/IO-288
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Utilities
>            Reporter: Georg Henzler
>             Fix For: 2.2
>
>         Attachments: ReversedLinesFileReader0.2.zip
>
>
> I needed to analyse a log file today and I was looking for a ReversedLinesFileReader:
> a class that behaves exactly like BufferedReader except that it goes from bottom to top when
> readLine() is called. I didn't find it in IOUtils and the internet didn't help a lot either,
> e.g. http://www.java2s.com/Tutorial/Java/0180__File/ReversingaFile.htm is fairly inefficient
> - the log files I'm analysing are huge and it is not a good idea to load the whole content
> into memory.
> So I ended up writing an implementation myself using little memory and the class RandomAccessFile
> - see attached file. It's used as follows:
> int blockSize = 4096; // only that much memory is needed, no matter how big the file is
> ReversedLinesFileReader reversedLinesFileReader = new ReversedLinesFileReader(myFile,
>     blockSize, "UTF-8"); // encoding is supported
> String line = null;
> while((line=reversedLinesFileReader.readLine())!=null) {
>   ... // use the line
>   if(enoughLinesSeen) {
>      break;  
>   }
> }
> reversedLinesFileReader.close();
> I believe this could be useful for other people as well!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
