commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (VFS-483) [RAM] Many suggestions to improve the RAM file provider.
Date Thu, 08 Aug 2013 01:55:50 GMT

     [ https://issues.apache.org/jira/browse/VFS-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gary Gregory updated VFS-483:
-----------------------------

    Description: 
[I edited typos and misspellings]

---------- Forwarded message ----------
From: Bernd Eckenfels <ecki@zusammenkunft.net>
Date: Wed, Aug 7, 2013 at 5:36 PM
Subject: [VFS] RAM Provider review and questions
To: dev@commons.apache.org


Hello,

I had a look at the VFS2 RAM Provider in 2.0 and have some questions comments:


RamFileData.java
----------------
#52 byte[] buffer - this is the actual content of the file, I would name it that way. Buffer
sounds transient
(/) {color:green}DONE{color} See:
- buffer -> content
- getBuffer() -> getBuffer() (package private, so no compatibility issue)
- setBuffer() -> setBuffer() (package private, so no compatibility issue)
{noformat}

commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Rename ivar buffer
to content, also rename package private accessors." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
    Transmitting file data ...
    Committed revision 1511554.
{noformat}

#62 Collection<RamFileData> - this keeps references to all childer, aka the whole tree.
When a implementation gets smarter about cache/buffers this could harm a partial (off heap)
storage and similiar things. Change this to String[] basenames? (especially since lookup is
rather fast in RAM)

#71 children = Collections.synchronizedCollection(new ArrayList<RamFileData>()) - there
are multiple contains() lookups on this collection. This can be slow on larger directories.
Would it make sense to have a adaptive data structure here (or provide a general infrastructure
for small-to-large collections?

#136 this.buffer = new byte[0]; - I would use a "final static byte[] EMPTY" to avoid allocation
(/) {color:green}DONE{color} See:
{noformat}
commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Refactor 'new
byte[0]' into a private static final byte[] called EMPTY." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Transmitting file data ...
    Committed revision 1511555.
{noformat}

#186 contains/add - the "if contains throw / add" construct is quite readable however it requires
two instead of one linear search and since it is not synchronized there is a window for race
condition. Using the return boolean of the add() method (on a collection which refuses duplicates)
is more efficient.

#208 contains/remove - same argument first checking then removing has a race and is inefficient.
remove returns null if it was not in there.

#225 return children; - is it safe to return a modifyable collection? See also RamFileSystem#108
which synchronizes across class boundaries.

#244 equals / RamFileData data = (RamFileData) o; - nitpick: i would use "that" or "other",
"data" looks local

#256 this.getName().hashCode() - is it intentional to access field via getter, toString uses
the field

#281: System.arraycopy(this.buffer, 0, newBuf, 0, size); - this will fail if resize() truncates.
Is it guranteed not to happen?

I think the whole resize is generally strange, only using the setBuf() is more atomic and
does not leak uninitilized bytes.


RamFileObject.java
------------------
#95 this.data.getBuffer().length; - use data.size() or even this.size()?

#125 setBuffer(new byte[0]) - use EMPTY constant

#272 does it make sense to put the whole quota calculation into fs. (especially the option
builder)

#276 when "newSize - size" is negative the resize() is a truncation, it should not be rejected
even when the overall size is over maxsize (can this actually happen?)

RamFileOutputStream.java
------------------------
#63 introduce "RamFileObject data = this.file.getData()" as it is dereferenced 3 times in
this method

#75 file.getData().getBuffer() - does not update last update date (if this is intended then
should comment)

#61 I would change the whole logic to use setBuffer()

#87 write(buffer1) - replace with write(buffer1, 0, 1) this safes one method invocation

RamFileProvider.java
--------------------
#34 I think this is an unusual " *" line :)

RamFileSystem.java
------------------
#54 cache - is this actually a cache or is it the actual content store (see above RamFileData#62
for parent/child references). If it is intended to be the real storage it should be names
that way.

#212 the Null argument is a bit misleading as the argument was not null by the state was imaginary.

#263 what a creative way to copy an input stream to an output stream. There is no need to
Buffer the Ramoutput stream and you should never use the byte-wise read if not needed. I also
wonder if there is no IOUtil to do that. Besides in this specific case it would be better
to ask the source for size, resize the byte[] to it, read it fully and setBuffer() it (atomically).
(and 512b is also a very small buffer, it should be more like 32k)

#273 the flush is not needed, close will flush (especially if you not buffer later on the
flush is a nop)

#290 getClass/getMessage is seldom/never used anywhere else. FileSystemExeption should really
have an IOExepction types constructor to be used consistently.

#306 int size() limits the maximum size for the Ram cache to 2GB, should be long (see also
RamFileSystemConfigBuilder#68 which sets default to Integer.MAX_VALUE)

#306 fs.size() is actually used very often (especially when writing small write()s to the
OutputStream. It is a synchronized full-cach-walker, looks like a bad bottleneck. Maybe it
is better if a outputstream requests the size only once and calculates locally or use an AtomicLong
to account all changes instead of walking.

Generally I am not sure how the synchronization/concurrency model of the providers looks like,
so I did not comment on those aspects, but it looks like some methods expect to be executed
atomic.

Greetings
Bernd
-- 
http://bernd.eckenfels.net

  was:
---------- Forwarded message ----------
From: Bernd Eckenfels <ecki@zusammenkunft.net>
Date: Wed, Aug 7, 2013 at 5:36 PM
Subject: [VFS] RAM Provider review and questions
To: dev@commons.apache.org


Hello,

I had a look at the VFS2 RAM Provider in 2.0 and have some questions comments:


RamFileData.java
----------------
#52 byte[] buffer - this is the actual content of the file, I would name it that way. Buffer
sounds transient
(/) {color:green}DONE{color} See:
- buffer -> content
- getBuffer() -> getBuffer() (package private, so no compatibility issue)
- setBuffer() -> setBuffer() (package private, so no compatibility issue)
{noformat}

commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Rename ivar buffer
to content, also rename package private accessors." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
    Transmitting file data ...
    Committed revision 1511554.
{noformat}

#62 Collection<RamFileData> - this keeps references to all childer, aka the whole tree.
When a implementation gets smarter about cache/buffers this could harm a partial (off heap)
storage and similiar things. Change this to String[] basenames? (especially since lookup is
rather fast in RAM)

#71 children = Collections.synchronizedCollection(new ArrayList<RamFileData>()) - there
are multiple contains() lookups on this collection. This can be slow on larger directories.
Would it make sense to have a adaptive data structure here (or provide a general infrastructure
for small-to-large collections?

#136 this.buffer = new byte[0]; - I would use a "final static byte[] EMPTY" to avoid allocation
(/) {color:green}DONE{color} See:
{noformat}
commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Refactor 'new
byte[0]' into a private static final byte[] called EMPTY." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
    Transmitting file data ...
    Committed revision 1511555.
{noformat}

#186 contains/add - the "if contains throw / add" construct is quite readable however it requires
two instead of one linear search and since it is not synchronized there is a window for race
condition. Using the return boolean of the add() method (on a collection which refuses duplicates)
is more efficient.

#208 contains/remove - same argument first checking then removing has a race and is inefficient.
remove returns null if it was not in there.

#225 return children; - is it safe to return a modifyable collection? See also RamFileSystem#108
which synchronizes across class boundaries.

#244 equals / RamFileData data = (RamFileData) o; - nitpick: i would use "that" or "other",
"data" looks local

#256 this.getName().hashCode() - is it intentional to access field via getter, toString uses
the field

#281: System.arraycopy(this.buffer, 0, newBuf, 0, size); - this will fail if resize() truncates.
Is it guranteed not to happen?

I think the whole resize is generally strange, only using the setBuf() is more atomic and
does not leak uninitilized bytes.


RamFileObject.java
------------------
#95 this.data.getBuffer().length; - use data.size() or even this.size()?

#125 setBuffer(new byte[0]) - use EMPTY constant

#272 does it make sense to put the whole quota calculation into fs. (especially the option
builder)

#276 when "newSize - size" is negative the resize() is a truncation, it should not be rejected
even when the overall size is over maxsize (can this actually happen?)

RamFileOutputStream.java
------------------------
#63 introduce "RamFileObject data = this.file.getData()" as it is dereferenced 3 times in
this method

#75 file.getData().getBuffer() - does not update last update date (if this is intended then
should comment)

#61 I would change the whole logic to use setBuffer()

#87 write(buffer1) - replace with write(buffer1, 0, 1) this safes one method invocation

RamFileProvider.java
--------------------
#34 I think this is an unusual " *" line :)

RamFileSystem.java
------------------
#54 cache - is this actually a cache or is it the actual content store (see above RamFileData#62
for parent/child references). If it is intended to be the real storage it should be names
that way.

#212 the Null argument is a bit misleading as the argument was not null by the state was imaginary.

#263 what a creative way to copy an input stream to an output stream. There is no need to
Buffer the Ramoutput stream and you should never use the byte-wise read if not needed. I also
wonder if there is no IOUtil to do that. Besides in this specific case it would be better
to ask the source for size, resize the byte[] to it, read it fully and setBuffer() it (atomically).
(and 512b is also a very small buffer, it should be more like 32k)

#273 the flush is not needed, close will flush (especially if you not buffer later on the
flush is a nop)

#290 getClass/getMessage is seldom/never used anywhere else. FileSystemExeption should really
have an IOExepction types constructor to be used consistently.

#306 int size() limits the maximum size for the Ram cache to 2GB, should be long (see also
RamFileSystemConfigBuilder#68 which sets default to Integer.MAX_VALUE)

#306 fs.size() is actually used very often (especially when writing small write()s to the
OutputStream. It is a synchronized full-cach-walker, looks like a bad bottleneck. Maybe it
is better if a outputstream requests the size only once and calculates locally or use an AtomicLong
to account all changes instead of walking.

Generally I am not sure how the synchronization/concurrency model of the providers looks like,
so I did not comment on those aspects, but it looks like some methods expect to be executed
atomic.

Greetings
Bernd
-- 
http://bernd.eckenfels.net

    
> [RAM] Many suggestions to improve the RAM file provider.
> --------------------------------------------------------
>
>                 Key: VFS-483
>                 URL: https://issues.apache.org/jira/browse/VFS-483
>             Project: Commons VFS
>          Issue Type: Improvement
>    Affects Versions: 2.0
>         Environment: Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19
08:51:28-0500)
> Maven home: C:\Java\apache-maven-3.0.5\bin\..
> Java version: 1.7.0_25, vendor: Oracle Corporation
> Java home: C:\Program Files\Java\jdk1.7.0_25\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows 7", version: "6.1", arch: "amd64", family: "windows"
>            Reporter: Gary Gregory
>            Assignee: Gary Gregory
>
> [I edited typos and misspellings]
> ---------- Forwarded message ----------
> From: Bernd Eckenfels <ecki@zusammenkunft.net>
> Date: Wed, Aug 7, 2013 at 5:36 PM
> Subject: [VFS] RAM Provider review and questions
> To: dev@commons.apache.org
> Hello,
> I had a look at the VFS2 RAM Provider in 2.0 and have some questions comments:
> RamFileData.java
> ----------------
> #52 byte[] buffer - this is the actual content of the file, I would name it that way.
Buffer sounds transient
> (/) {color:green}DONE{color} See:
> - buffer -> content
> - getBuffer() -> getBuffer() (package private, so no compatibility issue)
> - setBuffer() -> setBuffer() (package private, so no compatibility issue)
> {noformat}
> commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Rename ivar
buffer to content, also rename package private accessors." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileObject.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileOutputStream.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileRandomAccessContent.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileSystem.java
>     Transmitting file data ...
>     Committed revision 1511554.
> {noformat}
> #62 Collection<RamFileData> - this keeps references to all childer, aka the whole
tree. When a implementation gets smarter about cache/buffers this could harm a partial (off
heap) storage and similiar things. Change this to String[] basenames? (especially since lookup
is rather fast in RAM)
> #71 children = Collections.synchronizedCollection(new ArrayList<RamFileData>())
- there are multiple contains() lookups on this collection. This can be slow on larger directories.
Would it make sense to have a adaptive data structure here (or provide a general infrastructure
for small-to-large collections?
> #136 this.buffer = new byte[0]; - I would use a "final static byte[] EMPTY" to avoid
allocation
> (/) {color:green}DONE{color} See:
> {noformat}
> commit -m "[VFS-483][RAM] Many suggestions to improve the RAM file provider. Refactor
'new byte[0]' into a private static final byte[] called EMPTY." C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
>     Sending        C:/vcs/svn/apache/commons/trunks-proper/vfs/core/src/main/java/org/apache/commons/vfs2/provider/ram/RamFileData.java
>     Transmitting file data ...
>     Committed revision 1511555.
> {noformat}
> #186 contains/add - the "if contains throw / add" construct is quite readable however
it requires two instead of one linear search and since it is not synchronized there is a window
for race condition. Using the return boolean of the add() method (on a collection which refuses
duplicates) is more efficient.
> #208 contains/remove - same argument first checking then removing has a race and is inefficient.
remove returns null if it was not in there.
> #225 return children; - is it safe to return a modifyable collection? See also RamFileSystem#108
which synchronizes across class boundaries.
> #244 equals / RamFileData data = (RamFileData) o; - nitpick: i would use "that" or "other",
"data" looks local
> #256 this.getName().hashCode() - is it intentional to access field via getter, toString
uses the field
> #281: System.arraycopy(this.buffer, 0, newBuf, 0, size); - this will fail if resize()
truncates. Is it guranteed not to happen?
> I think the whole resize is generally strange, only using the setBuf() is more atomic
and does not leak uninitilized bytes.
> RamFileObject.java
> ------------------
> #95 this.data.getBuffer().length; - use data.size() or even this.size()?
> #125 setBuffer(new byte[0]) - use EMPTY constant
> #272 does it make sense to put the whole quota calculation into fs. (especially the option
builder)
> #276 when "newSize - size" is negative the resize() is a truncation, it should not be
rejected even when the overall size is over maxsize (can this actually happen?)
> RamFileOutputStream.java
> ------------------------
> #63 introduce "RamFileObject data = this.file.getData()" as it is dereferenced 3 times
in this method
> #75 file.getData().getBuffer() - does not update last update date (if this is intended
then should comment)
> #61 I would change the whole logic to use setBuffer()
> #87 write(buffer1) - replace with write(buffer1, 0, 1) this safes one method invocation
> RamFileProvider.java
> --------------------
> #34 I think this is an unusual " *" line :)
> RamFileSystem.java
> ------------------
> #54 cache - is this actually a cache or is it the actual content store (see above RamFileData#62
for parent/child references). If it is intended to be the real storage it should be names
that way.
> #212 the Null argument is a bit misleading as the argument was not null by the state
was imaginary.
> #263 what a creative way to copy an input stream to an output stream. There is no need
to Buffer the Ramoutput stream and you should never use the byte-wise read if not needed.
I also wonder if there is no IOUtil to do that. Besides in this specific case it would be
better to ask the source for size, resize the byte[] to it, read it fully and setBuffer()
it (atomically). (and 512b is also a very small buffer, it should be more like 32k)
> #273 the flush is not needed, close will flush (especially if you not buffer later on
the flush is a nop)
> #290 getClass/getMessage is seldom/never used anywhere else. FileSystemExeption should
really have an IOExepction types constructor to be used consistently.
> #306 int size() limits the maximum size for the Ram cache to 2GB, should be long (see
also RamFileSystemConfigBuilder#68 which sets default to Integer.MAX_VALUE)
> #306 fs.size() is actually used very often (especially when writing small write()s to
the OutputStream. It is a synchronized full-cach-walker, looks like a bad bottleneck. Maybe
it is better if a outputstream requests the size only once and calculates locally or use an
AtomicLong to account all changes instead of walking.
> Generally I am not sure how the synchronization/concurrency model of the providers looks
like, so I did not comment on those aspects, but it looks like some methods expect to be executed
atomic.
> Greetings
> Bernd
> -- 
> http://bernd.eckenfels.net

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message