river-dev mailing list archives

From Christopher Dolan <christopher.do...@avid.com>
Subject RE: Implications for Security Checks - SocketPermission, URL and DNS lookups
Date Tue, 13 Dec 2011 15:04:39 GMT
I think you're referring to this: http://support.microsoft.com/kb/314882 ("Inbound connections
limit in Windows XP"). If so, that applies only to WinXP. I understood that Microsoft relaxed
that restriction for Vista and later. As you say it did not apply to the server OS, specifically
Win 2003.

So, I wouldn't bother with a specific Reggie patch for this issue, as it will be less and
less important as time progresses.

Chris

-----Original Message-----
From: Gregg Wonderly [mailto:gregg@wonderly.org] 
Sent: Tuesday, December 13, 2011 8:56 AM
To: dev@river.apache.org
Subject: Re: Implications for Security Checks - SocketPermission, URL and DNS lookups

Also, one simple reminder about "Windows".  The folks at Microsoft want to be 
able to make you buy server class OSes, so the user OSes limit the number of 
simultaneous socket connections as well as other things, so that you can't buy a 
cheap "user" seat and make a "server" of any substance out of it.   But, when 
you put a Jini LUS instance, such as Reggie, on a "user" seat machine, these 
limitations can "help" control overload.  What happens is that Windows will 
send out "RST" packets when too many connections occur, causing the 
connecting machines to back off.

I don't have specific numbers to show, but practically, it will cause a few 
machines at a time to register, and others to retry later when the next 
multicast announcement goes out.

From some perspectives, we might want to look at providing a "setting" for 
reggie which would cause it to limit the total number of inbound registrations 
and lookups in a way which would provide for some good old fashioned resource 
management that worked well to keep what Chris mentions here from happening.
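
As a rough illustration of the kind of "setting" described above, the sketch below caps the number of requests handled at once and rejects the rest so that callers can retry later. The class name InboundThrottle and its methods are hypothetical; nothing like this exists in Reggie as far as this thread shows.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, not part of River/Reggie: caps concurrent inbound work.
final class InboundThrottle {
    private final Semaphore permits;

    InboundThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the request if a slot is free; otherwise rejects it immediately
    // so the caller can back off and retry (for example on the next announcement).
    <T> T tryHandle(Supplier<T> request) {
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("too many concurrent requests");
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }
}
```

The limit would presumably come from configuration; rejecting rather than queueing mirrors the back-off behaviour described for the Windows connection limit.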

Gregg Wonderly

On 12/13/2011 8:31 AM, Christopher Dolan wrote:
> Quite true Gregg, but that doesn't help when Reggie boots and hundreds of hosts contact
> it in a short time span against a cold DNS cache. Prior to resolution of RIVER-396 ("PreferredClassProvider
> classloader cache concurrency improvement") these timeout failures were effectively serial
> and caused long stalls. The resulting OOMEs and failed thread creation events in some isolated
> scenarios were unrecoverable. For me, this was mitigated by the triple solution of 1) turning
> off the SocketPermission check, 2) the RIVER-396 patch and 3) switching JERI to NIO to save
> some threads.
>
> Chris
>
> -----Original Message-----
> From: Gregg Wonderly [mailto:gregg@wonderly.org]
> Sent: Tuesday, December 13, 2011 8:19 AM
> To: dev@river.apache.org
> Cc: Peter Firmstone
> Subject: Re: Implications for Security Checks - SocketPermission, URL and DNS lookups
>
> Remember too, from a general "workaround" perspective, that you can use command
> line options to "lengthen" the time that DNS failure information is retained, to
> keep things moving when no reverse DNS information is available.  The default
> is about 10 seconds, and that is considerably shorter than what you will
> generally experience in a failed lookup.  The end result is that the failure
> cache doesn't serve much purpose unless it is given a much longer lifetime as a
> workaround.   In some cases, I've set it to an hour or more, and some initial
> startup is then "slow", and initial client "connection" can be a little slow,
> but then things move along quite well.
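
For reference, the failure cache lifetime mentioned above is controlled by the security property networkaddress.cache.negative.ttl (default 10 seconds). On Sun/Oracle-derived JVMs the command-line form is -Dsun.net.inetaddr.negative.ttl=3600. A minimal sketch of setting it in code follows; the class name DnsNegativeCacheConfig is made up for illustration.

```java
import java.security.Security;

// Hypothetical helper: lengthens how long failed DNS lookups stay cached.
final class DnsNegativeCacheConfig {
    private DnsNegativeCacheConfig() {}

    // Must be called before the first InetAddress lookup in the JVM,
    // because the cache policy is read once when it is initialised.
    static void cacheFailuresFor(int seconds) {
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(seconds));
    }
}
```

Calling DnsNegativeCacheConfig.cacheFailuresFor(3600) early in startup would correspond to the "an hour or more" setting described above.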
>
> Gregg Wonderly
>
> On 12/13/2011 2:56 AM, Peter Firmstone wrote:
>> In addition, CodeSource.implies() also causes DNS checks. I'm not 100% sure
>> about the JVM code, but Harmony code uses SocketPermission.implies() to check
>> if one CodeSource implies another. I believe the JVM policy implementation
>> also utilises it, because Harmony's implementation is built from Sun's Java spec.
>>
>> So in the existing policy implementations, when parsing the policy files,
>> additional start up delays may be caused by the CodeSource.implies() method
>> making network DNS calls.
>>
>> In my ConcurrentPolicyFile implementation (to replace the standard java
>> PolicyFile implementation), I've created a URIGrant, I've taken code from
>> Harmony to implement implies(ProtectionDomain pd), that performs wildcard
>> matching compliant with CodeSource.implies, the only difference being, that no
>> attempt to resolve URI's is made.
>>
>> Typically most policy files specify file based URL's for CodeSource, however
>> in a network application where many CodeSources may be network URL's, DNS
>> lookup causes added delays.
>>
>> I've also created a CodeSourceGrant which uses CodeSource.implies() for
>> backward compatibility with existing java policy files, however I'm sure that
>> most will simply want to revise their policy files.
>>
>> The standard interface PermissionGrant, is implemented by the following
>> inheritance hierarchy of immutable classes:
>>
>>                   PrincipalGrant
>>           ______________|_______________
>>           |                            |
>> ProtectionDomainGrant          CertificateGrant
>>           |                   _________|_________
>>           |                   |                 |
>>   ClassLoaderGrant        URIGrant       CodeSourceGrant
>>
>> Only PrincipalGrant is publicly visible, a builder returns the correct
>> implementation.
>>
>> ProtectionDomainGrant and ClassLoaderGrant are dynamically granted, by the
>> completely new DynamicPolicyProvider (which has long since passed all tests).
>>
>> CertificateGrant, URIGrant and CodeSourceGrant are used by the File based
>> policy's and RemotePolicy, which is intended to be a service that nodes in a
>> djinn can use to allow an administrator to update the policy (eg to include
>> new certificates or principals), with all the protection of subject
>> authentication and secure connections.  RemotePolicy is idempotent, the policy
>> is updated in one operation, so the current policy state is always known to
>> the administrator (who is a client).
>>
>> Since a File based policy is mostly read and only written when refreshed,
>> PermissionGrant's are held in a volatile array reference, copied (only the
>> reference) by any code that reads the array.  The array reference is updated
>> when the policy is updated, the array is never mutated after publishing.
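
The publication pattern described above (an immutable array behind a volatile reference) can be sketched as follows. This is only an illustration under assumptions: String stands in for PermissionGrant, and the class name GrantTable is hypothetical.

```java
// Hypothetical sketch of the "volatile array reference" pattern.
// String is a stand-in for PermissionGrant.
final class GrantTable {
    // The array is never mutated after publication; it is replaced wholesale.
    private volatile String[] grants = new String[0];

    // Called on policy refresh: publish a private copy in one volatile write.
    void refresh(String[] newGrants) {
        grants = newGrants.clone();
    }

    // Readers copy only the reference, then iterate without any locking.
    boolean contains(String grant) {
        String[] snapshot = grants; // single volatile read
        for (String g : snapshot) {
            if (g.equals(grant)) {
                return true;
            }
        }
        return false;
    }
}
```

Because readers work on whatever snapshot they picked up, a refresh never blocks them and they never see a half-updated array.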
>>
>> A ConcurrentMap<ProtectionDomain, PermissionCollection>  (with weak keys) acts
>> as a cache. I've got ConcurrentPermissions, an implementation that replaces
>> the heterogeneous java.security.Permissions class; this also resolves any
>> unresolved permissions.
>>
>> However I'm starting to wonder if it's wiser to throw away the cache
>> altogether and simply build java.security.Permissions on demand, then throw
>> Permissions away immediately after use for collection in the young generation
>> heap (it's likely to fit in level 2 cache and never even be copied to Ram).
>> This would eliminate contention between existing PermissionCollection's that
>> block, like SocketPermissionCollection.
>>
>> So if you have for instance 100 different AccessControlContext's being checked
>> by different threads, that all contain the same ProtectionDomain's for a
>> SocketPermission, then all will be executed in parallel.  Currently due to
>> blocking, each SocketPermission that performs a DNS check must either resolve
>> or time out, before its SocketPermissionCollection can release its
>> synchronization lock (and there may be multiple SocketPermission's in a
>> SocketPermissionCollection), before another thread can check its context and
>> so on, which explains everything coming to a standstill.
>>
>> If all permission checks execute in parallel independently, without blocking,
>> then the timeout won't be magnified.
>>
>> I am considering going one step further and replacing SocketPermission and
>> SocketPermissionCollection, and implementing DNS checks in the
>> SocketPermissionCollection rather than SocketPermission.  By doing this a
>> matching record will be found in most cases without requiring DNS reverse
>> lookup.  If I keep this as an internal policy implementation detail, then if
>> Oracle fixes SocketPermission, we can return to using the standard java
>> implementation, in fact I could make it a configuration property.
>>
>> It's an unfortunate fact that not all permission checks are performed in the
>> policy, replacing SocketPermission also requires the cooperation of the
>> SecurityManager.  To make matters worse, static ProtectionDomains created
>> prior to my policy implementation being constructed will never consult my
>> policy implementation as such they will still contain SocketPermission.   So
>> the SecurityManager would need to check each ProtectionDomain for both
>> implementations, so reimplementing SocketPermission doesn't eliminate its use
>> entirely.
>>
>> It's worth noting that SocketPermission is implemented rather poorly and the
>> same functionality can be provided with far fewer DNS lookups being performed,
>> since the majority are performed completely unnecessarily.  Perhaps it's worth
>> me donating some time to OpenJDK to fix it, I'd have to check with Apache
>> legal first I suppose.
>>
>> The problems with DNS lookup also affect the CodeSource and URL equals and
>> hashCode methods, so these classes shouldn't be used in collections.
>>
>> Cheers,
>>
>> Peter.
>>
>> Christopher Dolan wrote:
>>> To simulate the problem, go to InetAddress.getHostFromNameService() in your
>>> IDE, set a breakpoint on the "nameService.getHostByAddr" line with a
>>> condition of something like this:
>>>
>>>       new java.util.concurrent.CountDownLatch(1).await(15, java.util.concurrent.TimeUnit.SECONDS)
>>>
>>> then launch your River application from within the IDE. This will cause all
>>> reverse DNS lookups to stall for 15 seconds before succeeding. This will
>>> affect Reggie the worst because it has to verify so many hostnames. In a
>>> large group (a few thousand services) this will drive Reggie's thread count
>>> skyward, perhaps triggering OutOfMemory errors if it's in a 32-bit JVM.
>>>
>>> This problem happens in the real world in facilities that allow client
>>> connections to the production LAN, but do not allow the production LAN to
>>> resolve hosts in the client LAN. This may occur due to separate IT teams or
>>> strict security rules or simple configuration errors. Because most
>>> client-server systems, like web servers, do not require the server to contact
>>> the client this problem does not become immediately visible to IT. Instead,
>>> the question is inevitably "Why is Jini/River so sensitive to reverse DNS?
>>> All of my other services work fine."
>>>
>>> Chris
>>>
>>> -----Original Message-----
>>> From: Tom Hobbs [mailto:tvhobbs@googlemail.com]
>>> Sent: Monday, December 12, 2011 1:43 PM
>>> To: dev@river.apache.org
>>> Subject: Re: RE: Implications for Security Checks - SocketPermission, URL and
>>> DNS lookups
>>>
>>> My biggest concern with such fundamental changes is controlling the impact
>>> they will have.  I'm a pretty good example of this: I haven't experienced the
>>> troubles these changes are intended to overcome.  I also haven't made
>>> any attempt to dive into these areas of the code, for any reason.
>>>
>>> Is it possible to put together a test case which exposes these problems and
>>> also proves the solution?
>>>
>>> Obviously, a test case involving misconfigured networks is daft; in that
>>> instance a handy "is your network misconfigured?" diagnostic tool or
>>> documentation would be a good idea.
>>>
>>> Please don't interpret this concern as a criticism of your work, Peter.
>>> Far from it.  It's just a comment born out of not really having any contact
>>> with the area you're working in!
>>>
>>>
>>> Grammar and spelling have been sacrificed on the altar of messaging via
>>> mobile device.
>>>
>>> On 12 Dec 2011 18:01, "Christopher Dolan"<christopher.dolan@avid.com>
>>> wrote:
>>>
>>>> Specifically for SocketPermission, I experienced severe timeout problems
>>>> with reverse DNS misconfigurations. For some LAN-based deployments, I
>>>> relaxed this criterion via 'new SocketPermission("*",
>>>> "accept,listen,connect,resolve")'. This was difficult to apply to a general
>>>> Sun/Oracle JVM, however, because the default security policy *prepends* a
>>>> ("localhost:1024-","listen") permission that triggers the reverse DNS
>>>> lookup. To avoid this inconvenient setting, I install a new
>>>> java.security.Policy subclass that delegates to the default Policy except
>>>> when the incoming permission is a SocketPermission. That way I don't need
>>>> to modify the policy file in the JVM. The Policy.implies() override method
>>>> is trivial because it just needs to do " if (permission instanceof
>>>> SocketPermission) { ... }". The PermissionCollection methods were trickier
>>>> to override (skip over any SocketPermission elements in the default
>>>> Policy's PermissionCollection), but still only about 50 LOC.
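
A minimal sketch of the delegating Policy described above is shown below. It covers only the implies() override, not the PermissionCollection filtering, and the class name SocketLenientPolicy is hypothetical. Note that java.security.Policy is deprecated in recent JDKs, so this is mainly of interest for JVMs of the era discussed here.

```java
import java.net.SocketPermission;
import java.security.Permission;
import java.security.Policy;
import java.security.ProtectionDomain;

// Hypothetical name; delegates everything except SocketPermission checks.
final class SocketLenientPolicy extends Policy {
    private final Policy delegate;

    SocketLenientPolicy(Policy delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean implies(ProtectionDomain domain, Permission permission) {
        if (permission instanceof SocketPermission) {
            // Granted without consulting the delegate, so no
            // SocketPermission.implies() call (and no DNS lookup) happens here.
            return true;
        }
        return delegate.implies(domain, permission);
    }
}
```

On a JVM that still supports it, installation would look something like Policy.setPolicy(new SocketLenientPolicy(Policy.getPolicy())). This grants every socket permission, so it is only appropriate where that trade-off is acceptable, as in the LAN-based deployments mentioned above.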
>>>>
>>>> Chris
>>>>
>>>> -----Original Message-----
>>>> From: Peter Firmstone [mailto:jini@zeus.net.au]
>>>> Sent: Friday, December 09, 2011 9:28 PM
>>>> To: dev@river.apache.org
>>>> Subject: Implications for Security Checks - SocketPermission, URL and DNS
>>>> lookups
>>>>
>>>> DNS lookups and reverse lookups caused by URL and SocketPermission,
>>>> equals, hashCode and implies methods create some serious performance
>>>> problems for distributed programs.
>>>>
>>>> The concurrent policy implementation I've been working on reduces lock
>>>> contention between threads performing security checks.
>>>>
>>>> When the SecurityManager is used to check a guard, it calls the
>>>> AccessController, which retrieves the AccessControlContext from the call
>>>> stack, this contains all the ProtectionDomain's on the call stack (I
>>>> won't go into privileged calls here), if a ProtectionDomain is dynamic
>>>> it will consult the Policy, prior to checking the static permissions it
>>>> contains.
>>>>
>>>> The problem with the old policy implementation is lock contention caused
>>>> by multiple threads all using multiple ProtectionDomains, when the time
>>>> taken to perform a check is considerable, especially where identical
>>>> security checks might be performed by multiple threads executing the
>>>> same code.
>>>>
>>>> Although concurrent policy reduces contention between ProtectionDomain's
>>>> calls to Policy.implies, there remain some fundamental problems with the
>>>> implementations of SocketPermission and URL, that cause unnecessary DNS
>>>> lookups during equals(), hashCode() and implies() methods.
>>>>
>>>> The following bugs concern SocketPermission (please read before
>>>> continuing) :
>>>>
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6592285
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4975882 - contains a
>>>> lot of valuable comments.
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4671007 - fixed,
>>>> perhaps incorrectly.
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6501746
>>>>
>>>> Anyway to cut a long story short, DNS lookups and DNS reverse lookups
>>>> are performed for the equals and hashCode implementations in
>>>> SocketPermission and URL, with disastrous performance implications for
>>>> policy implementations using collections and caching security permission
>>>> check results.
>>>>
>>>> For example, once a SocketPermission guard has been checked for a
>>>> specific AccessControlContext the result is cached by my SecurityManager,
>>>> avoiding repeat security checks. However, if that cache contains
>>>> SocketPermission, DNS lookups will be required, and the cache will perform
>>>> slower than some other directly performed security checks!  The cache is
>>>> intended to return quickly to avoid reconsulting every ProtectionDomain
>>>> on the stack.
>>>>
>>>> To make matters worse, when checking a SocketPermission guard, the DNS
>>>> may be consulted for every non wild card SocketPermission contained
>>>> within a SocketPermissionCollection, up until it is implied.  DNS checks
>>>> are being made unnecessarily, since the wild card that matches may not
>>>> require a DNS lookup at all, but because the non matching
>>>> SocketPermission's are being checked first, the DNS lookups and reverse
>>>> lookups are still performed.  This could be fixed completely, by moving
>>>> the responsibility of DNS lookups from SocketPermission to
>>>> SocketPermissionCollection.
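
To illustrate the idea of letting the collection look for a purely textual match before any name resolution, here is a rough sketch. It is not the Harmony or JDK code; the class HostPatternMatcher and its method are hypothetical, and port ranges and actions are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: try wildcard and literal host patterns first,
// so DNS is needed only when no textual match exists.
final class HostPatternMatcher {
    private final List<String> patterns = new ArrayList<>();

    HostPatternMatcher(List<String> hostPatterns) {
        for (String p : hostPatterns) {
            patterns.add(p.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if some pattern matches by string comparison alone.
    boolean matchesWithoutDns(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String p : patterns) {
            if (p.equals("*") || p.equals(h)) {
                return true;
            }
            // "*.example.com" matches any host ending in ".example.com".
            if (p.startsWith("*.") && h.endsWith(p.substring(1))) {
                return true;
            }
        }
        return false;
    }
}
```

Only when matchesWithoutDns returns false would a fuller implementation fall back to forward or reverse lookups for the remaining patterns.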
>>>>
>>>> The identity of two SocketPermission's are equal if they resolve to the
>>>> same IP address, but their hashCode's are different! See bug 6592623.
>>>>
>>>> The identity of a SocketPermission with an IP address and a DNS name,
>>>> resolving to identical IP address should not (in my opinion) be equal,
>>>> but is!  One SocketPermission should only imply the other while DNS
>>>> resolves to the same IP address, otherwise the equality of the two
>>>> SocketPermission's will change if the IP address is assigned to a
>>>> different domain!  Object equality / identity shouldn't depend on the
>>>> result of a possibly unreliable network source.
>>>>
>>>> SocketPermission and SocketPermissionCollection are broken, the only
>>>> solution I can think of is to re-implement these classes (from Harmony)
>>>> in the policy and SecurityManager, substituting the existing jvm
>>>> classes.  This would not be visible to client developers.
>>>>
>>>> SocketPermission's may also exist in a ProtectionDomain's static
>>>> Permissions, these would have to be converted by the policy when merging
>>>> the permissions from the ProtectionDomain with those from the policy.
>>>> Since ProtectionDomain attempts to check its own internal permissions
>>>> after the policy permission check fails, DNS checks are currently
>>>> performed by duplicate SocketPermission's residing in the
>>>> ProtectionDomain. This will no longer occur, since the permission being
>>>> checked will be converted to, say for argument's sake,
>>>> org.apache.river.security.SocketPermission.  However because some
>>>> ProtectionDomains are static, they never consult the policy, so the
>>>> Permission's contained in each ProtectionDomain will require conversion
>>>> also, to do so will require extending and implementing a
>>>> ProtectionDomain that encapsulates existing ProtectionDomain's in the
>>>> AccessControlContext, by utilising a DomainCombiner.
>>>>
>>>> For CodeSource grant's, the policy file based grant's are defined by
>>>> URL's, however URL's identity depend upon DNS record results, similar to
>>>> SocketPermission equals and hashCode implementations which we have no
>>>> control over.
>>>>
>>>> I'm thinking about implementing URI based grant's instead, to avoid DNS
>>>> lookups, then allowing a policy compatibility mode to be enabled (with
>>>> logging) for falling back to CodeSource grant's when a URL cannot be
>>>> converted to a URI, this is a much simpler fix than the SocketPermission
>>>> problem.
>>>>
>>>> For Dynamic Policy Grants, because ProtectionDomain doesn't override
>>>> equals (that's a good thing), the contained CodeSource must also be
>>>> checked, again potentially slowing down permission checks with DNS
>>>> lookups, simply because CodeSource uses URL's.  Changing the Dynamic
>>>> Grant's to use URI based comparison would be relatively simple, since
>>>> the URI is obtained dynamically when the dynamic grant is created.
>>>>
>>>> URI based grant's don't use DNS resolution and would have a narrower
>>>> scope of implied CodeSources, an IP based grant won't imply a DNS domain
>>>> URL based CodeSource and vice versa.  Rather than rely on DNS
>>>> resolution, grant's could be made specifically for IPv4, IPv6 and DNS
>>>> names in policy files.  URL.toURI() can be utilised to check if URI
>>>> grant's imply a CodeSource without resorting to DNS.
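
A URI based implies check along the lines proposed above might look roughly like the sketch below. It is an assumption-laden illustration, not the URIGrant code: the class UriGrantMatcher is hypothetical, and the "/-" and "/*" wildcards are only loosely modelled on the CodeSource.implies conventions. The point is that hosts are compared as text, so an IP based grant does not imply a DNS name based CodeSource.

```java
import java.net.URI;

// Hypothetical sketch of URI-based grant matching with no DNS resolution.
final class UriGrantMatcher {
    private UriGrantMatcher() {}

    static boolean implies(URI grant, URI source) {
        if (!sameIgnoreCase(grant.getScheme(), source.getScheme())) return false;
        // Textual host comparison only: no lookup, no reverse lookup.
        if (!sameIgnoreCase(grant.getHost(), source.getHost())) return false;
        if (grant.getPort() != source.getPort()) return false;
        String g = path(grant);
        String s = path(source);
        if (g.endsWith("/-")) {
            // "dir/-" covers everything below dir, recursively.
            return s.startsWith(g.substring(0, g.length() - 1));
        }
        if (g.endsWith("/*")) {
            // "dir/*" covers entries directly in dir only.
            String dir = g.substring(0, g.length() - 1);
            return s.startsWith(dir) && s.indexOf('/', dir.length()) < 0;
        }
        return g.equals(s);
    }

    private static String path(URI u) {
        String p = u.normalize().getPath();
        return p == null ? "" : p;
    }

    private static boolean sameIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }
}
```

A CodeSource location would be converted with URL.toURI() before being passed in as the source argument.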
>>>>
>>>> Any thoughts, comments or ideas?
>>>>
>>>> N.B. It's sad that security is implemented the way it is, it would be
>>>> far better if it was Executor based, since every protection domain could
>>>> be checked in parallel, rather than in sequence.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>>
>>>>
>>
>

