commons-issues mailing list archives

From "Georg Henzler (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-288) Supply a ReversedLinesFileReader
Date Tue, 15 Nov 2011 08:13:52 GMT

    [ https://issues.apache.org/jira/browse/IO-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150294#comment-13150294 ]

Georg Henzler commented on IO-288:
----------------------------------

Sorry for the spurious files, I created the zip with the default utility in OS X.

I think the code addresses most of your questions already:
- There are already a few tests (testXxxxSmallBlockSize()) that cover the multi-block behaviour
for lines that span a block. You can go down to a block size of 1 and it still works, which
I think shows that the algorithm is solid.
- I think it's clever to encode the newline characters: that way we automatically get the
correct byte sequence for multi-byte encodings (e.g. UTF-16), and if a one-byte-per-char
encoding chose to use different bytes it would also work. Performance is not an issue here,
as the encoding happens only once.
- I'll think about making getNewLineMatchByteCount() more efficient, although for the standard
ISO case it ends up being just four byte comparisons instead of three. That should make almost
no difference, and on the plus side it keeps the implementation nicely generic.
- It's true, there is an issue with block-spanning newlines that needs to be fixed: if a Windows
newline (\r\n) happens to span a block boundary, a spurious extra empty line is returned.
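
To illustrate the point about encoding the newline characters, here is a minimal sketch of how
the newline byte sequences could be derived once per charset. The class and method names are
made up for this example and are not taken from the attached implementation. Note that Java's
plain "UTF-16" charset writes a byte-order mark when encoding, so the sketch uses UTF-16LE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: NewlineBytes and encodeNewlines are hypothetical names.
class NewlineBytes {

    // Encode each newline variant once, so later matching can work on raw bytes.
    // "\r\n" comes first so that the longest sequence is tried before its parts.
    static byte[][] encodeNewlines(Charset charset) {
        return new byte[][] {
            "\r\n".getBytes(charset),
            "\n".getBytes(charset),
            "\r".getBytes(charset)
        };
    }

    public static void main(String[] args) {
        Charset[] charsets = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16LE };
        for (Charset cs : charsets) {
            for (byte[] newline : encodeNewlines(cs)) {
                System.out.println(cs + ": " + Arrays.toString(newline));
            }
        }
    }
}
```

For a single-byte charset each newline variant is one or two bytes; for UTF-16LE every
character takes two bytes, so a reader that matches on the encoded sequences does not need
charset-specific logic.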

I'll provide a fix for the newline problem and will change totalBlockCount to long.
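
The block-spanning newline problem can be reproduced with a simplified model. The sketch below
is not the attached reader: it scans forwards instead of backwards, assumes a single-byte
encoding, and uses made-up names. It only shows why scanning blocks in isolation counts a split
\r\n twice, and one possible remedy (looking at the last byte of the previous block).

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a forward-scanning model of the block-spanning newline issue.
class BlockSpanningNewline {

    // Counts line breaks block by block. With lookBehind == false each block is
    // scanned in isolation, so a "\r\n" split across two blocks is counted twice.
    static int countLineBreaks(byte[] data, int blockSize, boolean lookBehind) {
        int count = 0;
        for (int start = 0; start < data.length; start += blockSize) {
            int end = Math.min(start + blockSize, data.length);
            for (int i = start; i < end; i++) {
                if (data[i] == '\r') {
                    count++;
                    // Consume the '\n' of a "\r\n" pair when it lies in the same block.
                    if (i + 1 < end && data[i + 1] == '\n') {
                        i++;
                    }
                } else if (data[i] == '\n') {
                    // A '\n' at the start of a block may complete a "\r\n" pair
                    // that began at the end of the previous block.
                    boolean completesPair =
                        lookBehind && i == start && i > 0 && data[i - 1] == '\r';
                    if (!completesPair) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "a\r\nb".getBytes(StandardCharsets.ISO_8859_1);
        // Block size 2 puts '\r' in the first block and '\n' in the second.
        System.out.println("isolated blocks: " + countLineBreaks(data, 2, false));
        System.out.println("with look-behind: " + countLineBreaks(data, 2, true));
    }
}
```

With a block size of 2 the isolated scan reports two line breaks for "a\r\nb" (hence the extra
empty line), while the look-behind variant reports one.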



                
> Supply a ReversedLinesFileReader 
> ---------------------------------
>
>                 Key: IO-288
>                 URL: https://issues.apache.org/jira/browse/IO-288
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Utilities
>            Reporter: Georg Henzler
>             Fix For: 2.2
>
>         Attachments: ReversedLinesFileReader0.2.zip
>
>
> I needed to analyse a log file today and I was looking for a ReversedLinesFileReader:
> a class that behaves exactly like BufferedReader except that it goes from bottom to top when
> readLine() is called. I didn't find it in IOUtils and the internet didn't help much either,
> e.g. http://www.java2s.com/Tutorial/Java/0180__File/ReversingaFile.htm is fairly inefficient:
> the log files I'm analysing are huge and it is not a good idea to load the whole content
> into memory.
> So I ended up writing an implementation myself using little memory and the class RandomAccessFile
> (see attached file). It's used as follows:
> int blockSize = 4096; // only that much memory is needed, no matter how big the file is
> ReversedLinesFileReader reversedLinesFileReader = new ReversedLinesFileReader(myFile,
>     blockSize, "UTF-8"); // encoding is supported
> String line = null;
> while ((line = reversedLinesFileReader.readLine()) != null) {
>   ... // use the line
>   if (enoughLinesSeen) {
>      break;
>   }
> }
> reversedLinesFileReader.close();
> I believe this could be useful for other people as well!
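
The general idea described in the issue (RandomAccessFile plus a fixed block size) can be
sketched roughly as follows. This is not the attached implementation: the class and method
names are invented, it assumes UTF-8 (or another encoding in which the byte 0x0A only ever
means "\n"), treats only \n and \r\n as line endings, and buffers one whole line in memory in
addition to the block.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: returns the last lines of a file, last line first.
class ReverseReadSketch {

    static List<String> lastLines(File file, int blockSize, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long fileLength = raf.length();
            long pos = fileLength;
            byte[] block = new byte[blockSize];
            // Bytes of the line being assembled, stored in reverse order.
            ByteArrayOutputStream pending = new ByteArrayOutputStream();
            boolean atFileEnd = true; // true until the first byte has been looked at
            while (pos > 0 && lines.size() < maxLines) {
                int len = (int) Math.min(blockSize, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(block, 0, len);
                for (int i = len - 1; i >= 0 && lines.size() < maxLines; i--) {
                    if (block[i] == '\n') {
                        // A newline at the very end of the file terminates the last
                        // line; it does not start an additional empty one.
                        if (!atFileEnd) {
                            lines.add(decodeReversed(pending));
                        }
                    } else {
                        pending.write(block[i]);
                    }
                    atFileEnd = false;
                }
            }
            // Start of file reached: whatever is pending is the first line.
            if (fileLength > 0 && lines.size() < maxLines) {
                lines.add(decodeReversed(pending));
            }
        }
        return lines;
    }

    // Reverses the collected bytes, decodes them and drops a trailing '\r'.
    private static String decodeReversed(ByteArrayOutputStream pending) {
        byte[] bytes = pending.toByteArray();
        pending.reset();
        for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {
            byte tmp = bytes[i];
            bytes[i] = bytes[j];
            bytes[j] = tmp;
        }
        String line = new String(bytes, StandardCharsets.UTF_8);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) throws IOException {
        File file = Files.createTempFile("reverse", ".log").toFile();
        Files.write(file.toPath(), "first\nsecond\r\nthird\n".getBytes(StandardCharsets.UTF_8));
        for (String line : lastLines(file, 4, 2)) {
            System.out.println(line);
        }
        file.delete();
    }
}
```

Because the pending bytes survive from one block to the next, lines and multi-byte characters
that span a block boundary are reassembled before decoding.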

