ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: files vs strings in collection reader
Date Tue, 07 May 2013 19:42:17 GMT
Do you have any numbers on what sort of impact this will actually have? It's not clear to me
where the savings would come from, since we are instantiating objects either way. Should we
just be initializing the ArrayList to something other than the default size?
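
For reference, here is a minimal sketch of the two ideas under discussion: keeping
Strings instead of File objects, and presizing the ArrayList. LazyFileList is a
hypothetical helper, not the actual FilesInDirectoryCollectionReader code, and it
assumes a flat directory with no filtering or recursion. It shows the shape of the
change only and says nothing about whether the change is measurably faster.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper for illustration; not part of cTAKES.
class LazyFileList implements Iterator<File> {
    private final List<String> paths;
    private int index = 0;

    LazyFileList(File directory) {
        // File.list() returns entry names only; it returns null if the
        // path is not a directory or an I/O error occurs.
        String[] names = directory.list();
        int count = (names == null) ? 0 : names.length;
        // Presize the list so it never has to grow while being filled.
        paths = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            paths.add(directory.getPath() + File.separator + names[i]);
        }
    }

    int size() {
        return paths.size();
    }

    @Override
    public boolean hasNext() {
        return index < paths.size();
    }

    // The File object is only created here, when the entry is requested.
    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return new File(paths.get(index++));
    }
}
```

Timing initialization of this against the current File-based list on the ~50k file
set would give the numbers asked for above.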

-- James


> -----Original Message-----
> From: dev-return-1580-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-
> return-1580-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of Tim
> Miller
> Sent: Tuesday, May 07, 2013 2:18 PM
> To: dev@ctakes.apache.org
> Subject: files vs strings in collection reader
> 
> The FilesInDirectoryCollectionReader creates an ArrayList of java.io.File
> objects when it is initialized. For large datasets (~50k files) this adds
> substantial time overhead, and probably memory overhead as well. It seems
> like it would be more efficient to store Strings instead of Files there and
> only create the File object when getNext() is called. It is pretty easy to
> implement. Is there any downside to making this switch?
> Tim
