ctakes-dev mailing list archives

From Karthik Sarma <ksa...@ksarma.com>
Subject Re: files vs strings in collection reader
Date Tue, 07 May 2013 19:55:21 GMT
Presumably some sort of system call is required to list the files in the
directory, so there is presumably a slight overhead in storing the names once
and then calling the File constructor on each stored filename. That being
said, I agree that the overhead there is likely minuscule.
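
For illustration, a minimal sketch of the "store names, build the File
later" idea discussed in this thread. LazyFileNames is a hypothetical helper,
not the actual cTAKES collection reader, and it assumes a flat directory (no
recursion into subdirectories):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the cTAKES reader.
class LazyFileNames {
    private final File directory;
    private final List<String> names = new ArrayList<>();
    private int index = 0;

    LazyFileNames(File directory) {
        this.directory = directory;
        // One directory listing up front; only the names are kept.
        // list() returns null if the path is not a directory or on I/O error.
        String[] listed = directory.list();
        if (listed != null) {
            Arrays.sort(listed); // deterministic order for the sketch
            names.addAll(Arrays.asList(listed));
        }
    }

    boolean hasNext() {
        return index < names.size();
    }

    // The File object is only constructed when the entry is requested.
    File next() {
        return new File(directory, names.get(index++));
    }
}
```

A caller would loop with hasNext()/next() and open each File as it is
returned, roughly where a reader's getNext() would do its work.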

--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
ksarma@ksarma.com
gchat: ksarma@gmail.com
linkedin: www.linkedin.com/in/ksarma


On Tue, May 7, 2013 at 12:44 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> I don't think that File instantiation is slower than the AE process,
> and Tim is talking about tens of thousands of files in the directory tree.
>
> The only filesystem call that should exist in any new File(..) is a
> normalize(..) or resolve(..) on the passed parameter(s), which should just
> be string manipulation and no actual I/O calls, native or otherwise.  In
> other words, new File(..) should be fast.
>
> -----Original Message-----
> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik
> Sarma
> Sent: Tuesday, May 07, 2013 3:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: files vs strings in collection reader
>
> Hmm, without having actually reviewed the code in cTAKES (I'm not on my
> work computer), my understanding of the "correct" way of doing this is to
> use the listFiles method on the directory File to get an array of Files;
> this should be implemented natively by the JVM and could be faster than
> individual initialization.
>
> On Tue, May 7, 2013 at 12:17 PM, Tim Miller <
> timothy.miller@childrens.harvard.edu> wrote:
>
> > The FilesInDirectoryCollectionReader creates an ArrayList of
> > java.io.File objects when it is initialized. For large datasets (~50k
> > files) this is substantial time overhead and probably memory as well.
> > Seems like it would be more efficient to use Strings instead of Files
> > there and just open the File object when getNext() is called. It is
> > pretty easy to implement; is there any downside to making this switch?
> > Tim
> >
>
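
On the point quoted above that new File(..) does no I/O: a small sketch,
assuming nothing beyond java.io.File. FileConstructionDemo and the
"note-N.txt" names are made up for illustration. Constructing File objects
for paths that do not exist succeeds without touching the disk; only calls
such as exists() go to the file system.

```java
import java.io.File;

// Hypothetical demo class; names are illustrative only.
class FileConstructionDemo {
    // Builds n File objects under parent and counts how many exist on disk.
    static int countExisting(File parent, int n) {
        int existing = 0;
        for (int i = 0; i < n; i++) {
            // Path string handling only; no disk access happens here.
            File f = new File(parent, "note-" + i + ".txt");
            // exists() is the call that actually queries the file system.
            if (f.exists()) {
                existing++;
            }
        }
        return existing;
    }
}
```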
