river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Trasuk <tras...@stratuscom.com>
Subject Re: Tests failing due to port in use
Date Tue, 09 Apr 2013 20:32:38 GMT

On Tue, 2013-04-09 at 15:29, Peter Firmstone wrote:
> This was being discussed in two separate threads:
> 
> Greg,
> 
> Are you able to find which process is using a port?  Perhaps jvisualvm 
> might help determine which test it is?  If you can narrow it down to a 
> bunch of test definitions sharing a test class, by identifying the class 
> on the stack, you'll find it.
> 
No, there's no process using the port.  The evidence points to a
configuration error.  The problem is not that the registrars don't
start.  It's that they start on random ports when the port is supposed
to be specified.

If you don't believe that, perhaps you could explain what the properties
file "qa/src/com/sun/jini/test/share/reggie3_2Ports.properties", which
> >> includes properties called 
> >> "net.jini.core.lookup.ServiceRegistrar.port.0=7700",
> >> "net.jini.core.lookup.ServiceRegistrar.port.1=7701",
> >> "net.jini.core.lookup.ServiceRegistrar.port.2=7702", etc. 
and others that are similar, are supposed to be doing?

Randomly changing code is not a troubleshooting technique.  

That probably sounds a little snarky.  Sorry, it's been a long day. 
What I mean is that based on the evidence I'm seeing, I'm pretty sure
that there is a configuration error, that might be local, or might
actually be an error in the config files in the integration tests.  I'm
not going to patch the integration tests in the '2.2' branch without
knowing for sure what's going on.

Relevant part of the trace output for a failed test is below.  Note that
the test is expecting ConstrainableLookupLocator objects, which do in
fact have a constructor that takes a port parameter.

Is it possible that this issue is also present in the Jenkins builds?  

Having asked that, it is certainly a lot more likely (given that the
Jenkins build is on a shared machine) that some other build on the
machine is trying to use one of the ports that the integration tests
want to use for the class server.  I suspect that would cause failures.

Cheers,

Greg.

BaseQATest.startInitLookups FINE:  initial lookups started --
BaseQATest.displayLookupStartInfo FINE:    # of lookups = 3
BaseQATest.displayLookupStartInfo FINE:      locator lookup[0] = ConstrainableLookupLocator[[jini://192.168.1.112:61219/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup0_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup0_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup0_C_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.displayLookupStartInfo FINE:      locator lookup[1] = ConstrainableLookupLocator[[jini://192.168.1.112:61228/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup1_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup1_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup1_C_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[3] = LLDGroup1_D_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.displayLookupStartInfo FINE:      locator lookup[2] = ConstrainableLookupLocator[[jini://192.168.1.112:61231/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup2_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup2_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup2_C_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.startInitLookups FINE:  initial lookups wanted --
BaseQATest.displayLookupStartInfo FINE:    # of lookups = 3
BaseQATest.displayLookupStartInfo FINE:      locator lookup[0] = ConstrainableLookupLocator[[jini://192.168.1.112/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup0_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup0_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup0_C_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.displayLookupStartInfo FINE:      locator lookup[1] = ConstrainableLookupLocator[[jini://192.168.1.112/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup1_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup1_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup1_C_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[3] = LLDGroup1_D_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.displayLookupStartInfo FINE:      locator lookup[2] = ConstrainableLookupLocator[[jini://192.168.1.112/],
[null]]
GroupsUtil.displayGroupSet FINE:        group[0] = LLDGroup2_A_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[1] = LLDGroup2_B_Gregs-MacBook-Pro-3.local_1365366343456
GroupsUtil.displayGroupSet FINE:        group[2] = LLDGroup2_C_Gregs-MacBook-Pro-3.local_1365366343456
BaseQATest.tearDown FINE:  tearDown - terminating lookup service(s)

> Sim wrote:
> > On 09-04-13 11:44, Peter Firmstone wrote:
> >> I've never experienced the issue locally (I see it on Jenkins quite a
> >> lot), but I suspect a stale registrar process left from another test may
> >> be stopping the socket from closing.  Not that registrars are also
> >> simulated for discovery tests, so it may not necessarily be Reggie.
> >>
> >> The code is duplicated in two places in superclasses of the tests that
> >> are failing,  the method portInUse(int port) is supposed to check if the
> >> ports available, but only selects from a list of LookupLocator's known
> >> to have started, so doesn't actually check the port's available.
> >>
> >> Perhaps you can find the stale test process responsible?
> >
> > And other jobs get executed on the same machine at the same time. 
> > Maybe these consume ports?
> >
> > The best way to find the problem looks to me like doing a 'lsof -i' in 
> > the test as soons as a port turns out used unexpectedly.
> >
> > Gr. Simon
> >
> > -- 
> > QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> > Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
> 
> Peter wrote:
> > It's a problem I've been aware of for some time, many tests don't 
> > specify a port, they just want three or four or however many 
> > registrar's started and they have to specify it using a LookupLocator, 
> > unfortunately you can't specify an ephemeral LookupLocator, it's not 
> > allowed, so if you don't specify a particular port, you get 4160 and 
> > the portInUse method is supposed to work around that.
> >
> > When a test finds a port in use, it updates the registrar wanted to an 
> > available port.  If it thinks 4160 is available, when it isn't, boom, 
> > test failed.
> >
> > Regards,
> >
> > Peter.
> >
> > Greg Trasuk wrote:
> >> I hope you're not adding this to try to help my configuration issue,
> >> because I'm pretty sure that's not the problem, and I hate to screw
> >> around with the regression tests unneccessarily.
> >>
> >> Evidence-wise, the "lookup started != initial lookups wanted" issue
> >> looks like this:
> >> When I enable logging at the FINE level, I see three registrars started,
> >> with unicast discovery on random ports - this is Reggie's fallback
> >> position when it attempts to open its unicast port and it's already in
> >> use.  However, the test that's checking for "lookup started == initial
> >> lookups wanted" is testing for three separate registrars on the default
> >> port.  Obviously that's an impossible proposition.  My working theory is
> >> that the test is supposed to be configured to start Reggie on three
> >> separate unicast ports (e.e. 7700, 7701, 7702), and the assertion is
> >> checking to see if that happened correctly.  This theory is supported by
> >> the existence of
> >> "qa/src/com/sun/jini/test/share/reggie3_2Ports.properties", which
> >> includes properties called 
> >> "net.jini.core.lookup.ServiceRegistrar.port.0=7700",
> >> "net.jini.core.lookup.ServiceRegistrar.port.1=7701",
> >> "net.jini.core.lookup.ServiceRegistrar.port.2=7702", etc. 
> >> Hence my suspicion that I don't understand the configuration mechanism
> >> well enough yet.  My theory is that the test is supposed to start up
> >> Reggie's on three different ports, but there is something about my test
> >> environment setup that causes the configuration to be missed.  Anpother
> >> possibility is that the actual test configuration got messed up at some
> >> point (I did see a comment in one of the properties files about "these
> >> entries are commented out because they are now included in the test
> >> descriptions".
> >>
> >> Testing that theory is going to take me a little while to understand the
> >> multi-layer test configuration mechanism.  Unfortunately I've got a full
> >> morning of other work ahead of me.
> >>
> >> Anyhow, I don't really think changing the tests is the correct approach
> >> at this point (at least in my troubleshooting process).
> >>
> >> Cheers,
> >>
> >> Greg.
> >>
> >>
> >>
> >> On Tue, 2013-04-09 at 07:05, peter_firmstone@apache.org wrote:
> >>  
> >>> Author: peter_firmstone
> >>> Date: Tue Apr  9 11:05:32 2013
> >>> New Revision: 1465969
> >>>
> >>> URL: http://svn.apache.org/r1465969
> >>> Log:
> >>> Be more thorough when checking for port availability on localhost 
> >>> during testing, don't assume port free if current test isn't using it.
> >>>
> >>> Modified:
> >>>     
> >>> river/jtsk/skunk/qa_refactor/trunk/qa/src/com/sun/jini/test/share/LookupServices.java

> >>>
> >>>
> >>> Modified: 
> >>> river/jtsk/skunk/qa_refactor/trunk/qa/src/com/sun/jini/test/share/LookupServices.java

> >>>
> >>> URL: 
> >>> http://svn.apache.org/viewvc/river/jtsk/skunk/qa_refactor/trunk/qa/src/com/sun/jini/test/share/LookupServices.java?rev=1465969&r1=1465968&r2=1465969&view=diff
> >>>
> >>> ==============================================================================

> >>>
> >>> --- 
> >>> river/jtsk/skunk/qa_refactor/trunk/qa/src/com/sun/jini/test/share/LookupServices.java

> >>> (original)
> >>> +++ 
> >>> river/jtsk/skunk/qa_refactor/trunk/qa/src/com/sun/jini/test/share/LookupServices.java

> >>> Tue Apr  9 11:05:32 2013
> >>> @@ -27,7 +27,15 @@ import com.sun.jini.test.spec.discoverys
> >>>  import com.sun.jini.test.spec.discoveryservice.AbstractBaseTest;
> >>>  import 
> >>> com.sun.jini.test.spec.discoveryservice.AbstractBaseTest.DiscoveryStruct;

> >>>
> >>>  import 
> >>> com.sun.jini.test.spec.discoveryservice.AbstractBaseTest.RegGroupsPair;
> >>> +import java.io.IOException;
> >>>  import java.net.InetAddress;
> >>> +import java.net.InetSocketAddress;
> >>> +import java.net.ServerSocket;
> >>> +import java.net.Socket;
> >>> +import java.net.SocketAddress;
> >>> +import java.net.SocketException;
> >>> +import java.net.SocketOptions;
> >>> +import java.net.SocketTimeoutException;
> >>>  import java.net.UnknownHostException;
> >>>  import java.rmi.RemoteException;
> >>>  import java.util.ArrayList;
> >>> @@ -660,12 +668,47 @@ public class LookupServices {
> >>>       * @return true if port in use.       */
> >>>      private boolean portInUse(int port) {
> >>> +        if (port == 0) return false; // Ephemeral
> >>>          for(int i=0;i<lookupsStarted.size();i++) {
> >>>              LocatorGroupsPair pair = lookupsStarted.get(i);
> >>>              int curPort = (pair.getLocator()).getPort();
> >>>              if(port == curPort) return true;
> >>>          }//end loop
> >>> -        return false;
> >>> +        // Open a client ephemeral socket and attempt to connect to
> >>> +        // port on localhost to see if someone's listening.
> >>> +        Socket sock = null;
> >>> +        try {
> >>> +            sock = new Socket();
> >>> +            if (sock instanceof SocketOptions){
> >>> +                // Socket terminates with a RST rather than a FIN, 
> >>> so there's no TIME_WAIT
> >>> +                try {
> >>> +                    ((SocketOptions) 
> >>> sock).setOption(SocketOptions.SO_LINGER, Integer.valueOf(0));
> >>> +                } catch (SocketException se) {
> >>> +                    // Ignore, not supported.
> >>> +                    logger.log( Level.FINEST, "SocketOptions set 
> >>> SO_LINGER threw an Exception", se);
> >>> +                }
> >>> +            }
> >>> +            SocketAddress add = new InetSocketAddress(port);
> >>> +            sock.connect(add, 3000); // Try to connect for up to 
> >>> three seconds +            // We were able to connect to a socket 
> >>> listening on localhost
> >>> +            return true;
> >>> +        } catch (SocketTimeoutException e){
> >>> +            // There might be a stale process assume in use.
> >>> +            logger.log( Level.FINEST, "Socket timed out while 
> >>> trying to connect", e);
> >>> +            return true;
> >>> +        } catch (IOException e){
> >>> +            // There was nothing listening on the socket so it's 
> >>> probably free.
> >>> +            // or it timed out.
> >>> +            return false;
> >>> +        } finally {
> >>> +            if (sock != null){
> >>> +                try {
> >>> +                    sock.close();
> >>> +                } catch (IOException ex){
> >>> +                    logger.log( Level.FINEST, "Socket threw 
> >>> exception while attempting to close", ex);
> >>> +                }// Ignore
> >>> +            }
> >>> +        }
> >>>      }//end portInUse
> >>>           private void refreshLookupLocatorListsAt(int index){
> >>>
> >>>     
> >>
> >>
> >>   
> >


Mime
View raw message