systemml-issues mailing list archives

From "Glenn Weidner (JIRA)" <>
Subject [jira] [Closed] (SYSTEMML-1396) Enable lazily freeing cuda allocated memory chunks
Date Tue, 02 May 2017 18:45:04 GMT


Glenn Weidner closed SYSTEMML-1396.

> Enable lazily freeing cuda allocated memory chunks
> --------------------------------------------------
>                 Key: SYSTEMML-1396
>                 URL:
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Runtime
>            Reporter: Nakul Jindal
>            Assignee: Nakul Jindal
>             Fix For: SystemML 0.14
> Deallocation of CUDA memory chunks is currently done asynchronously. This approach was
adopted because {{cudaFree}} operations are expensive: the idea was that an asynchronous
{{cudaFree}} could run while the CPU was busy with other work. In tight loops where most
operations run on the GPU, however, the asynchronous {{cudaFree}} calls were not really
asynchronous. Operations waiting to use the GPU paid the penalty for the {{cudaFree}}
operation.
> Extra instrumentation then showed that {{cudaMalloc}} operations are fairly expensive
as well.
> Most GPU operations run in loops that repeatedly allocate and deallocate memory chunks
of the same size on every iteration. It would be more efficient to hold on to a chunk and
"clear it out" (set the memory to 0) for reuse instead of freeing and reallocating it.
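
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the SystemML implementation: the class name LazyFreePool and all of its methods are hypothetical, and a plain byte[] stands in for a device memory chunk so the sketch runs without a GPU. In a real GPU backend, the marked lines would presumably call cudaMalloc, cudaMemset and cudaFree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazy freeing: byte[] stands in for a device chunk.
class LazyFreePool {
    // Lazily freed chunks kept for reuse, grouped by size in bytes.
    private final Map<Integer, Deque<byte[]>> freeChunks = new HashMap<>();
    private int realAllocations = 0;
    private int realFrees = 0;

    // Return a zeroed chunk, reusing a lazily freed chunk of the same size if one exists.
    byte[] allocate(int size) {
        Deque<byte[]> sameSize = freeChunks.get(size);
        if (sameSize != null && !sameSize.isEmpty()) {
            byte[] chunk = sameSize.pop();
            Arrays.fill(chunk, (byte) 0); // stand-in for cudaMemset(ptr, 0, size)
            return chunk;
        }
        realAllocations++;                // stand-in for cudaMalloc
        return new byte[size];
    }

    // Lazy free: keep the chunk in the pool instead of releasing it.
    void free(byte[] chunk) {
        freeChunks.computeIfAbsent(chunk.length, k -> new ArrayDeque<>()).push(chunk);
    }

    // Actually release all pooled chunks, for example under memory pressure.
    void releaseAll() {
        for (Deque<byte[]> chunks : freeChunks.values()) {
            realFrees += chunks.size();   // stand-in for one cudaFree per chunk
        }
        freeChunks.clear();
    }

    int realAllocations() { return realAllocations; }
    int realFrees() { return realFrees; }

    public static void main(String[] args) {
        LazyFreePool pool = new LazyFreePool();
        // A tight loop that requests and returns the same chunk size each iteration.
        for (int i = 0; i < 1000; i++) {
            byte[] chunk = pool.allocate(4096);
            chunk[0] = 1;
            pool.free(chunk);
        }
        System.out.println("real allocations: " + pool.realAllocations());
        pool.releaseAll();
        System.out.println("real frees: " + pool.realFrees());
    }
}
```

In this sketch the loop performs a single real allocation and a single real free, no matter how many iterations run. The trade-off is that pooled chunks stay reserved until releaseAll() is called, so a real implementation would need some policy for releasing them when memory runs low.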

This message was sent by Atlassian JIRA
