ctakes-dev mailing list archives

From Tim Miller <timothy.mil...@childrens.harvard.edu>
Subject files vs strings in collection reader
Date Tue, 07 May 2013 19:17:57 GMT
The FilesInDirectoryCollectionReader creates an ArrayList of 
java.io.File objects when it is initialized. For large datasets (~50k 
files) this adds substantial time overhead, and probably memory 
overhead as well. It seems like it would be more efficient to store 
Strings there instead of Files and only create the File object when 
getNext() is called. This is pretty easy to implement. Is there any 
downside to making this switch?
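
A minimal sketch of the idea, to make the proposal concrete. This is not
the actual cTAKES code: the class name LazyFileQueue and its methods are
hypothetical stand-ins for the reader's file bookkeeping, and it assumes
a recursive scan of a single root directory. Only path strings are kept
after initialization; a File is created when getNext() is called.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the collection reader's file list (not the cTAKES class).
class LazyFileQueue {
    private final List<String> paths = new ArrayList<>();
    private int index = 0;

    // Walk the directory tree once, keeping only path strings.
    LazyFileQueue(String rootDir) {
        collect(new File(rootDir));
    }

    private void collect(File dir) {
        String[] names = dir.list(); // null if dir is not a directory or cannot be read
        if (names == null) {
            return;
        }
        Arrays.sort(names); // deterministic order across platforms
        for (String name : names) {
            File child = new File(dir, name); // short-lived, eligible for GC after this loop
            if (child.isDirectory()) {
                collect(child);
            } else {
                paths.add(child.getPath());
            }
        }
    }

    boolean hasNext() {
        return index < paths.size();
    }

    // The File object is only created here, when the document is actually needed.
    File getNext() {
        return new File(paths.get(index++));
    }

    int size() {
        return paths.size();
    }
}
```

One caveat: java.io.File is itself a thin wrapper around a path string, 
so the memory saving may be modest. It would be worth timing both 
versions on a large directory before committing to the change.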
Tim
