vcl-user mailing list archives

From Andy Kurth <andy_ku...@ncsu.edu>
Subject Re: New capture failed attempting Windows post-load tasks
Date Mon, 13 Aug 2012 17:49:08 GMT
I believe there is a bug in the latest version of Cygwin which is
causing update_cygwin.cmd to fail.  As a result, the computer being
loaded never responds to SSH.  When ssh-keygen.exe is run from a
normal non-Cygwin command prompt, the following occurs:

C:\cygwin\home\root\VCL\Scripts>C:\Cygwin\bin\ssh-keygen.exe -t rsa1 -f C:\cygwin\etc\ssh_host_key -N ""
Generating public/private rsa1 key pair.
      8 [main] ssh-keygen 224 exception::handle: Exception: STATUS_ACCESS_VIOLATION
   2114 [main] ssh-keygen 224 open_stackdumpfile: Dumping stack trace to ssh-keygen.exe.stackdump
  61325 [main] ssh-keygen 224 exception::handle: Exception: STATUS_ACCESS_VIOLATION
  68272 [main] ssh-keygen 224 exception::handle: Error while dumping state (probably corrupted stack)

Running rebaseall doesn't help.  The same ssh-keygen.exe command
succeeds if it is run from a Cygwin shell.  I just committed an update
to update_cygwin.cmd which wraps the ssh-keygen.exe commands in
"bash.exe -c" (https://issues.apache.org/jira/browse/VCL-616).

You're going to have to update the file on the management node and in
any images which were captured but aren't loading:

On the management node:
* cd /usr/local/vcl/tools/Windows/Scripts
* rm -f update_cygwin.cmd
* wget https://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd
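
If you would rather not delete the existing file before the download has succeeded, the same three steps can be sketched as a small shell function.  This is only a sketch under assumptions: the function name refresh_update_cygwin, the ".new" and ".bak" suffixes, and the empty-file check are made up for this example and are not part of VCL.  The directory and URL are the ones listed above.

```shell
# Sketch: replace update_cygwin.cmd, keeping a backup of the old copy.
# refresh_update_cygwin, the .new/.bak suffixes and the empty-file check
# are assumptions made for this example; they are not part of VCL.
refresh_update_cygwin() {
    local dir="$1" url="$2" new
    new="$dir/update_cygwin.cmd.new"

    # Download next to the target first, so a failed download never
    # leaves the directory without a copy of the script.
    if ! wget -q -O "$new" "$url"; then
        rm -f "$new"
        echo "download failed: $url" >&2
        return 1
    fi

    # Refuse to install an empty file.
    if [ ! -s "$new" ]; then
        rm -f "$new"
        echo "downloaded file is empty: $url" >&2
        return 1
    fi

    # Keep the previous copy, then move the new one into place.
    if [ -f "$dir/update_cygwin.cmd" ]; then
        cp -p "$dir/update_cygwin.cmd" "$dir/update_cygwin.cmd.bak"
    fi
    mv "$new" "$dir/update_cygwin.cmd"
}
```

On the management node it would be called like this:

refresh_update_cygwin /usr/local/vcl/tools/Windows/Scripts https://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd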

For images which aren't loading correctly, update_cygwin.cmd will need
to be updated within the image and then a new revision of the VCL
image must be created.

* Make an imaging reservation for the problematic image.
* Watch the console as the image is being loaded.  Assuming you're
using ESXi, view the Console tab from the vSphere Client.  You should
see the VM power on and the root account automatically log in, run a
few scripts, and then log off.
* After root is automatically logged off, manually log in as root.
The password will be the value of WINDOWS_ROOT_PASSWORD configured in
/etc/vcl/vcld.conf.
* Once logged in as root, open the Cygwin shell.
* cd ~/VCL/Scripts
* rm -f update_cygwin.cmd
* wget https://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd
* Manually run update_cygwin.cmd: ./update_cygwin.cmd

The vcld process should still be running and waiting for the computer
to respond to SSH (it waits up to 900 seconds before the load fails).
When you run update_cygwin.cmd, the computer should begin responding and the
reservation should finish loading.  You should be able to log in
normally from the information on the Current Reservations page.  Save
a new revision of the image.  It should be saved with the updated copy
of update_cygwin.cmd which was downloaded to the management node.
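
If you want to watch for the computer to start responding yourself from the management node, the wait can be approximated with a simple polling loop.  This is a rough sketch, not the code vcld actually uses: the function name wait_until, the polling interval, and the ssh options in the usage line below are assumptions made for this example.  Only the 900 second limit comes from vcld.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or TIMEOUT seconds have passed.
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout is reached.
# (Hypothetical helper for illustration; not part of VCL.)
wait_until() {
    local timeout="$1" interval="$2" deadline
    shift 2
    deadline=$(( $(date +%s) + timeout ))
    while :; do
        # Success: stop waiting immediately.
        if "$@"; then
            return 0
        fi
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, to wait up to 900 seconds, trying every 10 seconds, where vmwg0-120-57 stands in for the name of the computer being loaded:

wait_until 900 10 ssh -i /etc/vcl/vcl.key -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=10 -l root vmwg0-120-57 true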

-Andy


On Fri, Aug 3, 2012 at 12:25 PM, Basilio, Norvin <nbasilio@odu.edu> wrote:
> I am also experiencing this issue when using Cygwin 1.7. I ran update_cygwin.cmd
> manually and saw that it's unable to regenerate the keys. I decided to try capturing my image
> using the older Cygwin 1.5, and update_cygwin.cmd was able to regenerate the keys correctly,
> allowing the reload process to complete.
>
> Norvin Basilio
> nbasilio@odu.edu
>
>
> -----Original Message-----
> From: Hechler, Adam [mailto:hechla@rpi.edu]
> Sent: Friday, August 03, 2012 12:14 PM
> To: user@vcl.apache.org
> Subject: RE: New capture failed attempting Windows post-load tasks
>
> Hello again,
>
> So, I walked out of the office last night thinking that my re-capture was running smoothly.
> It got about 20 minutes in, I think, and then failed. I think the section below contains the
> fatal error: it failed to configure the firewall to allow SSH. Is that my problem?  If so, any
> idea how to correct that?
>
> Thanks,
> Adam
>
> ----
>
> 2012-08-02 17:39:37|24852|125:125|image|Windows.pm:firewall_enable_ssh_private(4633)|SSH will be enabled on private interface: Local Area Connection 3
> 2012-08-02 17:39:37|24852|125:125|image|utils.pm:run_ssh_command(5380)|executing SSH command on vmwg0-120-57:
> |24852|125:125|image| /usr/bin/ssh -i /etc/vcl/vcl.key  -o StrictHostKeyChecking=no -l root -p 22 -x vmwg0-120-57 'C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 interface = "Local Area Connection 4" ;C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 profile = ALL ;C:/Windows/System32/netsh.exe firewall set portopening name = "Cygwin SSHD" protocol = TCP port = 22 mode = ENABLE interface = "Local Area Connection 3"' 2>&1
> 2012-08-02 17:39:42|24852|125:125|image|utils.pm:run_ssh_command(5464)|run_ssh_command output:
> |24852|125:125|image| The interface was not found.
> |24852|125:125|image| Ok.
> |24852|125:125|image| The interface was not found.
> 2012-08-02 17:39:42|24852|125:125|image|utils.pm:run_ssh_command(5474)|SSH command executed on vmwg0-120-57, command:
> |24852|125:125|image| /usr/bin/ssh -i /etc/vcl/vcl.key  -o StrictHostKeyChecking=no -l root -p 22 -x vmwg0-120-57 'C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 interface = "Local Area Connection 4" ;C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 profile = ALL ;C:/Windows/System32/netsh.exe firewall set portopening name = "Cygwin SSHD" protocol = TCP port = 22 mode = ENABLE interface = "Local Area Connection 3"' 2>&1
> |24852|125:125|image| returning (1, "The interface was not found. O...")
> |24852|125:125|image| ---- WARNING ----
> |24852|125:125|image| 2012-08-02 17:39:42|24852|125:125|image|Windows.pm:firewall_enable_ssh_private(4665)|failed to configure firewall to allow SSH on private interface, exit status: 1, output:
> |24852|125:125|image| The interface was not found. Ok. The interface was not found.
> |24852|125:125|image| ( 0) Windows.pm, firewall_enable_ssh_private (line: 4665)
> |24852|125:125|image| (-1) Windows.pm, reboot (line: 3335)
> |24852|125:125|image| (-2) Windows.pm, disable_pagefile (line: 2077)
> |24852|125:125|image| (-3) Windows.pm, pre_capture (line: 474)
> |24852|125:125|image| (-4) Version_5.pm, pre_capture (line: 105)
> |24852|125:125|image| (-5) VMware.pm, capture (line: 556)
> |24852|125:125|image| ---- WARNING ----
> |24852|125:125|image| 2012-08-02 17:39:42|24852|125:125|image|Windows.pm:reboot(3336)|reboot not attempted, failed to enable ssh from private IP addresses
>
>
>
>> -----Original Message-----
>> From: Hechler, Adam [mailto:hechla@rpi.edu]
>> Sent: Thursday, August 02, 2012 4:38 PM
>> To: user@vcl.apache.org
>> Subject: RE: New capture failed attempting Windows post-load tasks
>>
>> Thanks Dmitri,
>>
>> I was able to ssh to the vm from the management node before I captured.
>>
>> Curious.. because I never thought about it before... I can re-capture
>> an existing vm that's already been captured? I guess it makes logical
>> sense. It's still just a vm existing in VMWare Server.
>>
>> I'll give that a try.
>>
>> Adam
>>
>> > -----Original Message-----
>> > From: dchebota@gmu.edu [mailto:dchebota@gmu.edu]
>> > Sent: Thursday, August 02, 2012 4:35 PM
>> > To: user@vcl.apache.org
>> > Subject: Re: New capture failed attempting Windows post-load tasks
>> >
>> > Adam
>> >
>> > Were you able to 'ssh -i /etc/vcl/vcl.key image-computer-name' before you
>> > captured the image?
>> >
>> > Yes, it seems like a good idea to redo the ssh config, run
>> > gen-node-key.sh from the management node, and re-capture the image.
>> > You will have a new image under Manage Images and can delete the old
>> > image which is not working.
>> >
>> > Reboot the image before you start capture to make sure Cygwin SSH
>> > starts up.
>> >
>> > Thanks
>> >
>> >
>> > On Aug 2, 2012, at 16:18 , "Hechler, Adam" <hechla@rpi.edu> wrote:
>> >
>> > > Hi Dmitri,
>> > >
>> > > I tried that and it's not working. I even went into Cygwin and tried to
>> > > manually start sshd from in there and it's giving me the following error
>> > > messages:
>> > >
>> > > Could not load host key: /etc/ssh_host_rsa_key
>> > > Could not load host key: /etc/ssh_host_dsa_key
>> > > Could not load host key: /etc/ssh_host_ecdsa_key
>> > > Disabling protocol version 2. Could not load host key
>> > > sshd: no hostkeys available -- exiting.
>> > >
>> > > When I check in etc, there are files for the host keys but they're empty
>> > > now.  When I check the sshd log there's a bunch of entries showing that it
>> > > matched host keys and then three sets of "no host keys available" at the
>> > > bottom of the log (presumably from my last three attempts to start sshd
>> > > beginning with the reload).
>> > >
>> > > Can I just run cygwin-sshd-config.sh again on the vm and then run
>> > > gen-node-key again on the management node?  It's already been captured so
>> > > I'm not sure if that would cause havoc.
>> > >
>> > > Adam
>> > >
>> > >
>> > >> -----Original Message-----
>> > >> From: dchebota@gmu.edu [mailto:dchebota@gmu.edu]
>> > >> Sent: Thursday, August 02, 2012 4:03 PM
>> > >> To: user@vcl.apache.org
>> > >> Subject: Re: New capture failed attempting Windows post-load tasks
>> > >>
>> > >> Hi Adam
>> > >>
>> > >> Once you connect to Windows XP using the VI client, can you start the
>> > >> Cygwin SSH service manually under Control Panel -> Services?
>> > >>
>> > >> Thanks.
>> > >> On Aug 2, 2012, at 15:34 , "Hechler, Adam" <hechla@rpi.edu> wrote:
>> > >>
>> > >>> Hi again,
>> > >>>
>> > >>> So after getting the new sshd-config file this morning, I configured it
>> > >>> and all seemed good. I then attempted to capture my base image. The
>> > >>> capture itself completed successfully but then I got an error that the
>> > >>> reload process failed right after this:
>> > >>>
>> > >>> 2012-08-02 12:23:18|21124|124:124|reload|Windows.pm:post_load(583)|beginning Windows post-load tasks on vmwg0-120-57
>> > >>>
>> > >>> After numerous attempts (about 107) to connect to SSH it finally failed,
>> > >>> reporting:
>> > >>>
>> > >>> 2012-08-02 12:38:35|21124|124:124|reload|Module.pm:code_loop_timeout(767)|waiting for vmwg0-120-57 to respond to SSH, code did not return true after waiting 900 seconds
>> > >>>
>> > >>> Since it didn't finish the post-load tasks I was still able to log in as
>> > >>> root to my Windows XP image using the VI Client console. I opened Cygwin
>> > >>> and typed ps -ef to see if sshd was running, but it's not. The only
>> > >>> processes running are ps, bash and mintty. Should I be able to see whether
>> > >>> sshd is running using this method of checking? I know about ps -ef from
>> > >>> very limited unix interactions so I thought I'd try it.
>> > >>>
>> > >>> I know that in the past, when sshd didn't start (before capturing into
>> > >>> VCL) I would have to open a cmd prompt and run the rebaseall, but it looks
>> > >>> like that cmd file gets deleted during the capture, because it's no longer
>> > >>> in C:\cygwin\home\root, which is where it used to be. I was thinking I
>> > >>> would just try to run that again.
>> > >>>
>> > >>> Any clues?
>> > >>>
>> > >>> Thanks,
>> > >>> Adam
>> > >>>
>> > >>>
>> > >>>
>> > >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> > >>> Adam Hechler Senior Analyst /PC Systems Administrator Rensselaer
>> > >>> Polytechnic Institute
>> > >>> 275 Windsor Street
>> > >>> Hartford, CT 06120 USA
>> > >>> Ph: 860-548-2446
>> > >>> Email: hechla@rpi.edu
>> > >>> Web: http://www.ewp.rpi.edu
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Thank you,
>> > >>
>> > >> Dmitri Chebotarov
>> > >> Virtual Computing Lab Systems Engineer, TSD - Ent Servers &
>> > >> Messaging
>> > >> 223 Aquia Building, Ffx, MSN: 1B5
>> > >> Phone: (703) 993-6175
>> > >> Fax: (703) 993-3404
>> > >>
>> > >>
>> > >>
>> > >
>> >
>> >
>> >
>> > --
>> > Thank you,
>> >
>> > Dmitri Chebotarov
>> > Virtual Computing Lab Systems Engineer, TSD - Ent Servers &
>> > Messaging
>> > 223 Aquia Building, Ffx, MSN: 1B5
>> > Phone: (703) 993-6175
>> > Fax: (703) 993-3404
>> >
>> >
>> >
>
>
>
