Discussion:
[Ltsp-discuss] NBD crashing
Charles Barnwell
2015-03-19 11:51:24 UTC
I am having a serious problem that I am finding hard to diagnose.

I am running Ubuntu 14.04.1 LTS Trusty Tahr with LTSP 5.5.1-1ubuntu2.
Both the server and client images are at the latest level and are kept
up to date.

Occasionally, when one client is powered up and negotiates a DHCP
address, all other machines go blank or show error messages
indicating a SQUASHFS error (unable to connect). This happens fairly
randomly and, annoyingly, I can't reproduce it.

Looking at /var/log/syslog I think these messages may relate to when I
have a problem.

Mar 19 11:02:39 BFoE01 dhcpd: DHCPACK on 192.168.0.26 to 00:1e:4f:4d:72:4a via eth0
Mar 19 11:02:39 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:02:39 BFoE01 nbd_server[28560]: virststyle ipliteral
Mar 19 11:02:39 BFoE01 nbd_server[28560]: connect from 192.168.0.26, assigned file is /opt/ltsp/images/i386.img
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Starting to serve
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Size of exported file/device is 503033856
Mar 19 11:02:44 BFoE01 ldminfod[28562]: connect from 192.168.0.26 (192.168.0.26)
Mar 19 11:02:45 BFoE01 nbd_server[28567]: virststyle ipliteral
Mar 19 11:02:45 BFoE01 nbd_server[28567]: connect from 192.168.0.26, assigned file is /opt/ltsp/images/i386.img
Mar 19 11:02:45 BFoE01 nbd_server[28567]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Mar 19 11:02:45 BFoE01 nbd_server[28567]: Starting to serve
Mar 19 11:02:45 BFoE01 nbd_server[28567]: Size of exported file/device is 503033856
Mar 19 11:02:45 BFoE01 nbd_server[28567]: Disconnect request received.
Mar 19 11:02:45 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:02:45 BFoE01 nbd_server[2944]: Child exited with 0
Mar 19 11:03:06 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Negotiation failed/5a: magic mismatch
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Exiting.
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Modern initial negotiation failed
Mar 19 11:03:06 BFoE01 nbd_server[2944]: Child exited with 1

The two error messages:
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Negotiation failed/5a: magic mismatch
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Modern initial negotiation failed

Only seem to occur when the problem happens.
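
To check whether the negotiation failures really line up with a particular client booting, the log can be scanned automatically. The following is only a rough sketch, assuming the syslog line format shown in the excerpt above; the function name `find_failed_negotiations` and both regular expressions are made up for illustration and are not part of LTSP or nbd-server. It pairs every "Negotiation failed" line with the most recent DHCPACK seen before it. A DHCPACK shortly before a failure is a hint, not proof, that the client caused it.

```python
import re

# Syslog prefix as it appears in the excerpt above, e.g.
# "Mar 19 11:03:06 BFoE01 nbd_server[28569]: Negotiation failed/5a: magic mismatch"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<prog>[\w-]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)
# dhcpd message as it appears in the excerpt above.
DHCPACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\S+) to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_failed_negotiations(lines):
    """Return one dict per 'Negotiation failed' line.

    Each dict also carries the last DHCPACK seen earlier in the log
    (or None if there was none).
    """
    last_ack = None
    failures = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog line in the expected format
        if m["prog"] == "dhcpd":
            ack = DHCPACK_RE.search(m["msg"])
            if ack:
                last_ack = {
                    "stamp": m["stamp"],
                    "ip": ack["ip"],
                    "mac": ack["mac"].lower(),
                }
        elif m["prog"] == "nbd_server" and m["msg"].startswith("Negotiation failed"):
            failures.append(
                {
                    "stamp": m["stamp"],
                    "pid": m["pid"],
                    "msg": m["msg"],
                    "last_dhcpack": last_ack,
                }
            )
    return failures
```

One way to use it would be `find_failed_negotiations(open("/var/log/syslog"))`, then compare the `stamp` of each failure with the `stamp` and `ip` of its `last_dhcpack`.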

The client that caused the problem (192.168.0.26 in this case) seems
to renegotiate with the server, and its boot process is successful.
The other machines need to be rebooted, and until they are I get numerous
messages in the log like this one:

Mar 19 11:03:18 BFoE01 ldminfod[28570]: connect from 192.168.0.22 (192.168.0.22)

These keep occurring until 192.168.0.22 is rebooted.
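
A quick way to list the clients that keep hitting ldminfod (and so probably still need a reboot) is to count those connect lines per address. This is a minimal sketch, assuming the ldminfod line format shown above; the function names and the threshold of 5 are arbitrary choices for illustration, and a healthy client will also show up in the log a few times during a normal login.

```python
import re
from collections import Counter

# ldminfod message as it appears in the excerpt above.
LDMINFOD_RE = re.compile(
    r"ldminfod\[\d+\]: connect from (?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def count_ldminfod_connects(lines):
    """Count ldminfod 'connect from' lines per client IP address."""
    counts = Counter()
    for line in lines:
        m = LDMINFOD_RE.search(line)
        if m:
            counts[m["ip"]] += 1
    return counts


def clients_polling_repeatedly(lines, threshold=5):
    """Return the sorted IPs that connected at least `threshold` times.

    The threshold is an assumption; tune it to the time window scanned.
    """
    counts = count_ldminfod_connects(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```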

I am thinking of changing to use NFS rather than NBD, but would prefer
to fix this as is.

Charles Barnwell

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_____________________________________________________________________
Ltsp-discuss mailing list. To un-subscribe, or change prefs, goto:
https://lists.sourceforge.net/lists/listinfo/ltsp-discuss
For additional LTSP help, try #ltsp channel on irc.freenode.net
Funke, Martin
2015-03-20 06:54:10 UTC
Hi,

I checked my syslog when my client boots up:
Mar 19 11:02:39 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:02:39 BFoE01 nbd_server[28560]: virststyle ipliteral
Mar 19 11:02:39 BFoE01 nbd_server[28560]: connect from 192.168.0.26, assigned file is /opt/ltsp/images/i386.img
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Starting to serve
Mar 19 11:02:39 BFoE01 nbd_server[28560]: Size of exported file/device is 503033856

I have the same messages and everything works so far.

What I don’t have is:

Mar 19 11:02:45 BFoE01 nbd_server[28567]: Disconnect request received.
Mar 19 11:02:45 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:02:45 BFoE01 nbd_server[2944]: Child exited with 0
Mar 19 11:03:06 BFoE01 nbd_server[2944]: Spawned a child process
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Negotiation failed/5a: magic mismatch
Mar 19 11:03:06 BFoE01 nbd_server[28569]: Exiting.

How long have you had this issue?

Best regards
Martin

Charles Barnwell
2015-03-24 12:43:56 UTC
Hi Martin,

Thanks for your response.

The system is running smoothly, and then one client (let's call it
machine A) gets powered on. When that client receives an IP address via
DHCP, all other machines die. Usually the machines go blank, or
sometimes they show error messages such as:

SQUASHFS error: Unable to read directory block........
SQUASHFS error: Unable to read metadata cache entry........

When this happens I see a negotiation error in /var/log/syslog:

Mar 23 10:59:14 BFoE01 nbd_server[19452]: Spawned a child process
Mar 23 10:59:14 BFoE01 nbd_server[29108]: Negotiation failed/5a: magic mismatch
Mar 23 10:59:14 BFoE01 nbd_server[29108]: Exiting.
Mar 23 10:59:14 BFoE01 nbd_server[29108]: Modern initial negotiation failed
Mar 23 10:59:14 BFoE01 nbd_server[19452]: Child exited with 1

The boot up of machine A continues successfully but all other machines
need to be rebooted.

I have been trying to generate more logging information but haven't
succeeded. I tried using transactionlog in the config file, but that
does not log negotiations and my problem seems to be with negotiation.

Sorry if I have not made this clear.

I am thinking of moving to NFS instead of NBD, but I am having
problems finding good documentation. Any advice?



Charles
Post by Funke, Martin
Just to be sure I understand your problem correctly:
a client starts up, and when it tries to get an IP address from the
DHCP server, every other client that is currently powered on gets an error
message?
How did you set your server up?
Best regards
Martin
Sent: Saturday, 21 March 2015 11:33
To: Funke, Martin
Subject: Re: AW: [Ltsp-discuss] NBD crashing
Martin,
Thanks for that. I am now pretty sure that the problem occurs when I get the
negotiation failure. Any ideas about how to debug this further?
At the moment I am restarting nbd_server every night to see if that has an
effect.
I'm not sure exactly when it started; I'm trying to find out.
Charles.
Ivan Mincik
2015-03-24 15:00:20 UTC
Post by Charles Barnwell
I am having a serious problem that I am finding hard to diagnose.
Hi Charles, I have had similarly mysterious problems when I had either an IP
address conflict or a MAC address conflict caused by two VirtualBox clients
with the same MAC. Check that neither of these is your problem.
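
One cheap first check for such conflicts is to look at what dhcpd itself has logged. The sketch below assumes the DHCPACK line format from the first post; the function name `find_address_conflicts` is invented for illustration. It reports IP addresses that were acknowledged to more than one MAC, and MACs that were acknowledged on more than one IP. It cannot see statically configured addresses, and two machines sharing one MAC will look like a single client in this log, so an empty result does not rule a conflict out.

```python
import re
from collections import defaultdict

# dhcpd message format as in the first post, e.g.
# "dhcpd: DHCPACK on 192.168.0.26 to 00:1e:4f:4d:72:4a via eth0"
DHCPACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d+\.\d+\.\d+\.\d+) to "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_address_conflicts(lines):
    """Return (shared_ips, multi_ip_macs) found in DHCPACK log lines.

    shared_ips:    {ip: [mac, ...]} for IPs acknowledged to several MACs
    multi_ip_macs: {mac: [ip, ...]} for MACs acknowledged on several IPs
    """
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for line in lines:
        m = DHCPACK_RE.search(line)
        if not m:
            continue
        ip, mac = m["ip"], m["mac"].lower()
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)
    shared_ips = {
        ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1
    }
    multi_ip_macs = {
        mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1
    }
    return shared_ips, multi_ip_macs
```

A MAC that shows up with several IPs can be perfectly normal with a dynamic pool, so treat that half of the result as something to look at rather than as an error.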


--
Ivan Minčík
***@gmail.com GPG: 0x79529A1E http://imincik.github.io/0x79529A1E.key
***@gista.sk GPG: 0xD714B02C http://imincik.github.io/0xD714B02C.key
