ignite-issues mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (IGNITE-2876) IGFS: System pool starvation is possible during data block write.
Date Thu, 24 Mar 2016 07:44:25 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Ozerov reassigned IGNITE-2876:
---------------------------------------

    Assignee: Vladimir Ozerov  (was: Ivan Veselovsky)

> IGFS: System pool starvation is possible during data block write.
> -----------------------------------------------------------------
>
>                 Key: IGNITE-2876
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2876
>             Project: Ignite
>          Issue Type: Bug
>          Components: IGFS
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Vladimir Ozerov
>            Priority: Critical
>             Fix For: 1.6
>
>
> *Problem*
> IGFS has a set of messages to exchange data and signal events between nodes. These are:
> - {{IgfsAckMessage}}
> - {{IgfsBlocksMessage}}
> - {{IgfsDeleteMessage}}
> - {{IgfsFragmentizerRequest}}
> - {{IgfsFragmentizerResponse}}
> Currently these messages are processed in the system pool, which is wrong and may lead to starvation, deadlocks, and incorrect behavior.
> Several examples:
> 1) The {{IgfsBlocksMessage}} handling logic performs a "Cache.putAsync" operation. This operation involves acquiring a semaphore permit. That semaphore, in turn, can only be released from another thread in the same system pool. As a result, all system pool threads could block on permit acquisition forever.
> 2) If the file system size limit is exceeded, the same message handler waits for some time in the hope that free space will appear in the cache. However, if all system pool threads are waiting at this point, concurrent block removal cannot proceed, so these threads are doomed to receive {{IgfsOutOfSpaceException}} whether they wait or not.
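The starvation pattern from example 1 can be reproduced outside Ignite with plain java.util.concurrent primitives. The sketch below is only an illustration under stated assumptions: the class name PoolStarvationDemo, the method acquired(), and the 500 ms timeout are hypothetical and are not taken from the Ignite code base. Each handler occupies a pool thread and waits for a permit that can only be released by a task queued on the releaser pool. When both pools are the same fixed-size pool, the release tasks never get a thread while the handlers are waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Ignite.
class PoolStarvationDemo {
    /**
     * Runs n handlers on the handler pool (assumed to have exactly n threads).
     * Each handler schedules a permit release on the releaser pool and then
     * waits for a permit. Returns how many handlers obtained one in time.
     */
    static int acquired(ExecutorService handlers, ExecutorService releasers, int n) throws Exception {
        Semaphore permits = new Semaphore(0);
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch finished = new CountDownLatch(n);
        List<Future<Boolean>> futs = new ArrayList<>();

        for (int i = 0; i < n; i++) {
            futs.add(handlers.submit(() -> {
                // Wait until every handler thread is occupied.
                started.countDown();
                started.await();

                // The permit can only be released by a task on the releaser pool.
                releasers.execute(permits::release);

                // Bounded wait so the demo terminates; the issue describes an unbounded one.
                boolean ok = permits.tryAcquire(500, TimeUnit.MILLISECONDS);

                // Hold the thread until all handlers have finished waiting.
                finished.countDown();
                finished.await();
                return ok;
            }));
        }

        int cnt = 0;
        for (Future<Boolean> f : futs)
            if (f.get()) cnt++;
        return cnt;
    }

    public static void main(String[] args) throws Exception {
        int n = 4;
        ExecutorService shared = Executors.newFixedThreadPool(n);
        ExecutorService separate = Executors.newFixedThreadPool(n);

        // Same pool for handlers and releases: the releases stay queued.
        System.out.println("shared pool:   " + acquired(shared, shared, n) + " permits acquired");
        // Releases run on their own pool: handlers can make progress.
        System.out.println("separate pool: " + acquired(shared, separate, n) + " permits acquired");

        shared.shutdown();
        separate.shutdown();
    }
}
```

With an unbounded acquire in place of the timed one, the shared-pool case would hang indefinitely, which is the behavior the issue describes.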
> *Solution*
> 1) Introduce a new IO policy for IGFS (see {{GridIoPolicy}}).
> 2) Force all IGFS messages to be processed with this policy. No backward compatibility is needed.
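
As a rough illustration of the proposed solution, the sketch below routes work to an executor chosen by an IO policy, so that IGFS handlers never compete with system tasks for threads. It is not Ignite's implementation: the IoPolicy enum, the PolicyDispatchSketch class, its dispatch() method, and the thread name prefixes are all hypothetical stand-ins, and the real {{GridIoPolicy}} may be structured differently.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names do not correspond to Ignite classes.
class PolicyDispatchSketch {
    /** Assumed policies: one for system messages, one dedicated to IGFS. */
    enum IoPolicy { SYSTEM, IGFS }

    private final Map<IoPolicy, ExecutorService> pools = new EnumMap<>(IoPolicy.class);

    PolicyDispatchSketch(int sysThreads, int igfsThreads) {
        pools.put(IoPolicy.SYSTEM, newPool("sys", sysThreads));
        pools.put(IoPolicy.IGFS, newPool("igfs", igfsThreads));
    }

    /** Creates a fixed pool whose threads are named prefix-1, prefix-2, and so on. */
    private static ExecutorService newPool(String prefix, int size) {
        AtomicInteger idx = new AtomicInteger();
        return Executors.newFixedThreadPool(size,
            r -> new Thread(r, prefix + "-" + idx.incrementAndGet()));
    }

    /** Runs the handler on the pool that belongs to the given policy. */
    <T> Future<T> dispatch(IoPolicy plc, Callable<T> handler) {
        return pools.get(plc).submit(handler);
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        PolicyDispatchSketch d = new PolicyDispatchSketch(2, 2);

        // An IGFS message handler runs on an IGFS pool thread, not a system one.
        String thread = d.dispatch(IoPolicy.IGFS, () -> Thread.currentThread().getName()).get();
        System.out.println("IGFS handler ran on " + thread);

        d.shutdown();
    }
}
```

Keeping one pool per policy means a blocked IGFS handler can exhaust only the IGFS pool, while tasks dispatched under the system policy still find free threads.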



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
