Discussion:
[Ltsp-discuss] [PATCH 0/1] Bye bye "No response from server" - add ssh error logging
Jakob Unterwurzacher
2014-09-14 16:16:51 UTC
Permalink
Dear team,

I first posted the patch to add ssh error logging to ldm a while ago to
https://bugs.launchpad.net/ltsp/+bug/1325388 .

It has bit-rotted there for three months without feedback. Due to
whitespace changes it does not apply any more.

I have now un-bit-rotted and tested again on Ubuntu 14.04. It works fine.

I am posting the patch in a follow-up email in the hope to get it merged,
to have ldm finally produce useful error messages on login failure.

Best regards,
Jakob

Jakob Unterwurzacher (1):
Bye bye "No response from server" - add ssh error logging

src/logging.c | 12 +++--
src/plugins/ssh/ssh.c | 124 +++++++++++++++++++++++++++++++++++---------------
2 files changed, 95 insertions(+), 41 deletions(-)
--
1.9.3
Jakob Unterwurzacher
2014-09-14 16:16:52 UTC
Permalink
ldm now reads the full output from ssh even if it dies on the way.
Proper error messages are logged and displayed, wrapped in _() so
they can be localized.

Messages we get:
* Permission denied (publickey,password).
The common case of a password typo or invalid user.
We make sure we read all the output BEFORE checking if the child has
died. Also, we handle EINTR, as this is what select() returns when it
gets SIGCHLD.

* ssh: connect to host server port 22: Connection timed out
For that to work, ConnectTimeout=10 is set on the ssh command line.
Otherwise the 30 second timeout elapses before ssh tells us what is
going on.

* ssh: connect to host server port 22: Connection refused
That message contains a ": " like the password prompt. This is solved
by waiting for 200ms of silence before searching for the prompt (polling
interval reduced from 1s to 200ms to not bore the user).
In the error case, ssh will have exited in the meantime, and we know
for sure that it was NOT a password prompt.
---
src/logging.c | 12 +++--
src/plugins/ssh/ssh.c | 124 +++++++++++++++++++++++++++++++++++---------------
2 files changed, 95 insertions(+), 41 deletions(-)

diff --git a/src/logging.c b/src/logging.c
index 63389dd..d93ff39 100644
--- a/src/logging.c
+++ b/src/logging.c
@@ -29,10 +29,14 @@
#include "logging.h"

FILE *logfile = NULL;
-char *LOG_LEVEL_NAMES[] = { "EMERGENCY", "ALERT", "CRITICAL", "ERROR",
- "WARNING", "NOTICE", "INFO", "DEBUG"
-};
-
+char *LOG_LEVEL_NAMES[] = { "EMERGENCY", /* 0 */
+ "ALERT", /* 1 */
+ "CRITICAL", /* 2 */
+ "ERROR", /* 3 */
+ "WARNING", /* 4 */
+ "NOTICE", /* 5 */
+ "INFO", /* 6 */
+ "DEBUG" }; /* 7 */
int loglevel = 0;

/*
diff --git a/src/plugins/ssh/ssh.c b/src/plugins/ssh/ssh.c
index 0110937..689ec43 100644
--- a/src/plugins/ssh/ssh.c
+++ b/src/plugins/ssh/ssh.c
@@ -16,6 +16,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <utmp.h>
+#include <errno.h>
#include <crypt.h>

#include "../../ldm.h"
@@ -289,7 +290,7 @@ expect(int fd, char *p, int seconds, ...)
char buffer[BUFSIZ];
gchar *arg;
GPtrArray *expects;
- int loopcount = seconds;
+ int loopcount = seconds * 5;
int loopend = 0;

bzero(p, MAXEXP);
@@ -313,58 +314,87 @@ expect(int fd, char *p, int seconds, ...)
* Main loop.
*/

+ log_entry("ssh",7,"expect: entering main loop");
+
while (1) {
- timeout.tv_sec = (long) 1; /* one second timeout */
- timeout.tv_usec = 0;
+ timeout.tv_sec = 0;
+ timeout.tv_usec = 200000; /* 200ms timeout */

FD_ZERO(&set);
FD_SET(fd, &set);
st = select(FD_SETSIZE, &set, NULL, NULL, &timeout);

- if (child_exited) {
- break; /* someone died on us */
- }
-
- if (st < 0) { /* bad thing */
- break;
- }
-
- if (loopcount == 0) {
- break;
- }
+ log_entry("ssh", 7, "expect: select returned %d", st);

- if (!st) { /* timeout */
- loopcount--; /* We've not seen the data we want */
- continue;
- }
+ /* There is data available - we want to read it in even if
+ * the child process has exited - we want to get the error
+ * message
+ */
+ if (st > 0)
+ {
+ size = read(fd, buffer, sizeof buffer);
+ if (size == -1) {
+ log_entry("ssh", 3, "expect: read returned error %d (%s)", errno, strerror(errno));
+ break;
+ }
+ if (size == 0) {
+ log_entry("ssh", 3, "expect: read returned 0 (EOF))");
+ break;
+ }

- size = read(fd, buffer, sizeof buffer);
- if (size <= 0) {
- break;
+ if ((total + size) < MAXEXP) {
+ strncpy(p + total, buffer, size);
+ total += size;
+ log_entry("ssh", 7, "expect: got %d bytes so far", total);
+ }
+ else
+ {
+ log_entry("ssh", 3, "expect: buffer full");
+ return ERROR;
+ }
}
+ else /* No data available */
+ {
+ /* select() was interrupted by a signal,
+ * most likely SIGCHLD. Just try again */
+ if (st == -1 && errno == EINTR)
+ {
+ log_entry("ssh", 7, "expect: retrying");
+ continue;
+ }
+ else if (st == -1) { /* other errors */
+ log_entry("ssh", 3, "expect: select returned error %d (%s)", errno, strerror(errno));
+ break;
+ }
+ /* st == 0 and there was no data for 200ms */

- if ((total + size) < MAXEXP) {
- strncpy(p + total, buffer, size);
- total += size;
- }
+ if (child_exited) { /* someone died on us */
+ log_entry("ssh", 3, "expect: ssh process has died");
+ break;
+ }

- for (i = 0; i < expects->len; i++) {
- if (strstr(p, g_ptr_array_index(expects, i))) {
- loopend = TRUE;
+ if (loopcount == 0) {
+ log_entry("ssh", 3, "expect: timed out after %d seconds", seconds);
break;
}
- }

- if (loopend) {
- break;
+ for (i = 0; i < expects->len; i++) {
+ if (strstr(p, g_ptr_array_index(expects, i))) {
+ log_entry("ssh", 7, "expect: found string %d \"%s\"", i, g_ptr_array_index(expects, i));
+ loopend = TRUE;
+ break;
+ }
+ }
+
+ loopcount--; /* We've not seen the data we want */
}

- if (timeout.tv_sec == 0) {
+ if (loopend) {
break;
}
}

- log_entry("ldm", 7, "expect saw: %s", p);
+ log_entry("ssh", 7, "expect: saw: %s", p);

if (size < 0 || st < 0 || child_exited) {
return ERROR; /* error occured */
@@ -380,6 +410,7 @@ void
ssh_chat(gint fd)
{
int seen;
+ size_t len;
gchar lastseen[MAXEXP];
int first_time = 1;

@@ -407,20 +438,34 @@ ssh_chat(gint fd)
/* If it's not the first time through, or the :'s not at the
* end of a line (password expiry or error), set the message */
if ((!first_time) || (lastseen[i - 1] != ':')) {
+ log_entry("ssh", 4, "ssh_chat: ssh returned \"%d\"", lastseen);
set_message(lastseen);
}
/* If ':' *IS* the last character on the line, we'll assume a
* password prompt is presented, and get a password */
if (lastseen[i - 1] == ':') {
+ log_entry("ssh", 7, "ssh_chat: writing password");
write(fd, sshinfo->password, strlen(sshinfo->password));
write(fd, "\n", 1);
}
first_time = 0;
} else if (seen < 0) {
- set_message(_("No response from server, restarting..."));
+ log_entry("ssh", 3, "ssh_chat: expect returned error %d", seen);
+ g_strstrip(lastseen);
+ len = strlen(lastseen);
+ if (len > 0)
+ {
+ log_entry("ssh", 3, "ssh_chat: ssh returned \"%s\"", lastseen);
+ set_message(_(lastseen));
+ }
+ else
+ {
+ log_entry("ssh", 3, "ssh_chat: did not get an error message from ssh");
+ set_message(_("No response from server, restarting..."));
+ }
sleep(5);
close_greeter();
- die("ssh", "no response, restarting");
+ die("ssh","login failed, restarting");
}
}
}
@@ -454,12 +499,17 @@ ssh_session(void)
command = g_strjoin(" ", "ssh", "-Y", "-t", "-M",
"-S", sshinfo->ctl_socket,
"-o", "NumberOfPasswordPrompts=1",
+ /* ConnectTimeout should be less than the timeout ssh_chat
+ * passes to expect, so we get the error message from ssh
+ * before expect gives up
+ */
+ "-o", "ConnectTimeout=10",
"-l", sshinfo->username,
port ? port : "",
sshinfo->sshoptions ? sshinfo->sshoptions : "",
sshinfo->server,
"echo " SENTINEL "; exec /bin/sh -", NULL);
- log_entry("ssh", 7, "ssh_session: %s", command);
+ log_entry("ssh", 6, "ssh_session: %s", command);

sshinfo->sshpid = ldm_spawn(command, NULL, NULL, ssh_tty_init);

@@ -487,7 +537,7 @@ ssh_endsession(void)
command =
g_strjoin(" ", "ssh", "-S", sshinfo->ctl_socket, "-O", "exit",
sshinfo->server, NULL);
- log_entry("ssh", 7, "closing ssh session: %s"), command;
+ log_entry("ssh", 6, "closing ssh session: %s", command);
pid = ldm_spawn(command, NULL, NULL, NULL);
ldm_wait(pid);
close(sshinfo->sshfd);
--
1.9.3
Vagrant Cascadian
2014-09-15 15:30:15 UTC
Permalink
Post by Jakob Unterwurzacher
ldm now reads the full output from ssh even if it dies on the way.
Proper error messages are logged and displayed, wrapped in _() so
they can be localized.
Thanks for the (updated) patch! It's an issue that's long been a thorn
in the side of many users. Apologies for not getting back to you sooner
on this.

We unfortunately don't have a lot of active developers who understand a
patch of this size; ldm has largely been in maintenance mode for quite
some time.

Would it be possible to usefully break this into maybe 2-4 smaller
patches, and submit to the (old) launchpad bug? That might make it a
little easier to review.
Post by Jakob Unterwurzacher
* Permission denied (publickey,password).
The common case of a password typo or invalid user.
We make sure we read all the output BEFORE checking if the child has
died. Also, we handle EINTR, as this is what select() returns when it
gets SIGCHLD.
* ssh: connect to host server port 22: Connection timed out
For that to work, ConnectTimeout=10 is set on the ssh command line.
Otherwise the 30 second timeout elapses before ssh tells us what is
going on.
What controls the 30 second timeout? Where exactly does it occur?
Post by Jakob Unterwurzacher
* ssh: connect to host server port 22: Connection refused
That message contains a ": " like the password prompt. This is solved
by waiting for 200ms of silence before searching for the prompt (polling
interval reduced from 1s to 200ms to not bore the user).
In the error case, ssh will have exited in the meantime, and we know
for sure that it was NOT a password prompt.
The last time we tried parsing the output like this, it turned out that
the messages returned via ssh were highly variable depending on distro
and local configuration. Maybe your code handles this no worse than the
current code; I can't really say.


live well,
vagrant
Jakob Unterwurzacher
2014-09-15 18:17:50 UTC
Permalink
Post by Vagrant Cascadian
Post by Jakob Unterwurzacher
ldm now reads the full output from ssh even if it dies on the way.
Proper error messages are logged and displayed, wrapped in _() so
they can be localized.
Thanks for the (updated) patch! It's an issue that's long been a thorn
in the side of many users. Apologies for not getting back to you sooner
on this.
We unfortunately don't have a lot of active developers who understand a
patch of this size; ldm has largely been in maintenance mode for quite
some time.
Would it be possible to usefully break this into maybe 2-4 smaller
patches, and submit to the (old) launchpad bug? That might make it a
little easier to review.
Hi Vagrant!

I already suspected it might be a bit big. I will try to split it up
into a few logical steps.
Post by Vagrant Cascadian
Post by Jakob Unterwurzacher
* ssh: connect to host server port 22: Connection timed out
For that to work, ConnectTimeout=10 is set on the ssh command line.
Otherwise the 30 second timeout elapses before ssh tells us what is
going on.
What controls the 30 second timeout? Where exactly does it occur?
The thirty seconds are passed in from ssh_chat():

ssh.c:427: seen = expect(fd, lastseen, 30, SENTINEL, ": ", NULL);

That is, expect gives up after 30 seconds if it cannot find a prompt.

When the server cannot be reached, ssh by default takes longer than that
to timeout, so we would never get the error message.
Post by Vagrant Cascadian
The last time we tried parsing the output like this, it turned out that
the messages returned via ssh were highly variable depending on distro
and local configuration. Maybe your code handles this no worse than the
current code; I can't really say.
Parsing human-readable output is generally fragile, I understand the
concern. However, I am not actually interpreting the output, I am just
passing it on to the user in full. That is, only if a password prompt
cannot be found.
The logic for detecting the password prompt has not changed - a ":" at
the end of output is a prompt.

I'll take a look if the prompt can look differently, and of course I
will be there to fix any issues that may arise.

Best regards,
Jakob
Jakob Unterwurzacher
2014-09-21 18:36:20 UTC
Permalink
Post by Vagrant Cascadian
We unfortunately don't have a lot of active developers who understand a
patch of this size; ldm has largely been in maintenance mode for quite
some time.
Hi Vagrant,

I looked at that thing again. Most of the complexity in the ssh handling
seems to come from the expired password handling.
However, it's broken and deemed unfixable [1, 2].
Ok to rip it out? That will simplify things a lot.

Best regards,
Jakob


[1] https://bugs.launchpad.net/ltsp/+bug/1053447
[2]
https://www.mail-archive.com/ltsp-***@lists.sourceforge.net/msg40865.html
Loading...