[rancid] clogin + ssh: stuck at fingerprint verification

Patrik Lundin patrik at sigterm.se
Thu May 18 14:43:57 UTC 2017


Hello,

I have been trying to figure out an odd problem with clogin over ssh
that appeared the other day. Basically, clogin will (sometimes) get
stuck when the ssh client prompts for fingerprint verification.

OS version: Ubuntu 16.04.2 LTS
RANCID package version: 3.3.0-1
Expect package version: 5.45-7
OpenSSH version: 1:7.2p2-4ubuntu2.2

The .cloginrc looks like this:
===
add autoenable * 1
add method * {ssh}
add user * test
add password * secret
===

The output of running clogin looks like this when it hangs (and
eventually times out):
===
# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)?
Error: TIMEOUT reached
===

The problem is that clogin fails to match the ssh output, so it never
sends the "yes" needed to continue.

What makes this problem tricky is that it seems to be timing related.
Here is an attempt that initially works and then fails on the second
attempt after removing the fingerprint again:
===
# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)? 
Host switch01.example.com added to the list of known hosts.
yes
Warning: Permanently added 'switch01.example.com,10.0.0.10' (RSA) to the list of known hosts.

Password: 
[...]

# ssh-keygen -R switch01.example.com
# Host switch01.example.com found: line 1
/root/.ssh/known_hosts updated.
Original contents retained as /root/.ssh/known_hosts.old

# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)? 
Error: TIMEOUT reached
===

The regex in clogin that is responsible for answering the question looks like this:
===
-re "(Host key not found |The authenticity of host .* be established).* \\(yes/no\\)\\?" {
    send "yes\r"
    send_user "\nHost $router added to the list of known hosts.\n"
    exp_continue
}
===

It requires that all three lines of output are parsed as a single chunk
(starting with "The authenticity of host" and ending with "(yes/no)?").
When things work, this is indeed what happens (heavily trimmed output):
===
# clogin -d switch01.example.com
[...]
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)? 
[...]
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match glob pattern "Host is unreachable"? no
"No address associated with name"? no
"(Host key not found |The authenticity of host .* be established).* \(yes/no\)\?"? (No Gate, RE only) gate=yes re=yes
expect: set expect_out(0,string) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)?"
expect: set expect_out(1,string) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)?"
send: sending "yes\r" to { exp4 }
===

On the specific host where the above output was collected, running
clogin without debug mostly hangs, while running with -d always manages
to send the "yes" (I'm guessing because printing the debug output gives
the ssh binary more time to present its output).

Here is how it can look on a host where running with -d fails, heavily
trimmed:
===
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established." (spawn_id exp4) match regular expression [...]
[...]
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\n" (spawn_id exp4) match regular expression  [..]
[...]
expect: does "" (spawn_id exp4) match regular expression  [...]
[...]
expect: does "RSA key fingerprint is SHA256:<fingerprint>." (spawn_id exp4) match regular expression [...]
[...]
expect: does "RSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match regular expression [...]
[...]
expect: does "Are you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match regular expression [...]
===

As can be seen, instead of receiving the complete output as a single
chunk, expect handles it in pieces, which means the regex that is
supposed to send a "yes" never matches.
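
The difference between the two traces can be reproduced outside of
expect. The following is a minimal Python sketch, not part of clogin: it
applies a transcription of the clogin pattern first to the complete
prompt and then to each buffer content from the failing trace. The names
PROMPT_RE, FULL and FRAGMENTS are made up for this illustration, and
re.DOTALL is an assumption made so that "." also spans "\r\n", as it
does in the working trace.

```python
import re

# Transcription of the clogin pattern. re.DOTALL is an assumption so that
# "." also matches "\r\n", mirroring the working debug trace.
PROMPT_RE = re.compile(
    r"(Host key not found |The authenticity of host .* be established)"
    r".* \(yes/no\)\?",
    re.DOTALL,
)

# The complete prompt, as it appears in the working trace.
FULL = (
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.\r\n"
    "RSA key fingerprint is SHA256:<fingerprint>.\r\n"
    "Are you sure you want to continue connecting (yes/no)? "
)

# Buffer contents copied from the failing trace, in order.
FRAGMENTS = [
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.",
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.\r\n",
    "",
    "RSA key fingerprint is SHA256:<fingerprint>.",
    "RSA key fingerprint is SHA256:<fingerprint>.\r\n"
    "Are you sure you want to continue connecting (yes/no)? ",
    "Are you sure you want to continue connecting (yes/no)? ",
]


def matches(buffer):
    """Return True if the transcribed pattern matches anywhere in buffer."""
    return PROMPT_RE.search(buffer) is not None


full_matches = matches(FULL)
fragment_matches = [matches(fragment) for fragment in FRAGMENTS]

print(full_matches)       # True
print(fragment_matches)   # [False, False, False, False, False, False]
```

Only the complete three-line buffer matches. No single fragment contains
both the opening phrase and the trailing "(yes/no)?".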

It appears I can get around this by increasing the magic "sleep 0.3" in
clogin to something like "sleep 5", but it seems like a pretty brittle
workaround.
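
Another direction might be to make the match depend only on the last
line of the prompt. This is an untested idea and not something clogin
does: the pattern LAST_LINE_RE below is hypothetical, it drops the
"Host key not found" alternative, and the Python sketch only checks it
against the buffer contents from the failing trace above. It says
nothing about how expect itself would behave with such a pattern.

```python
import re

# Hypothetical narrower pattern: requires only the final prompt line.
LAST_LINE_RE = re.compile(
    r"Are you sure you want to continue connecting \(yes/no\)\?"
)

# Buffer contents copied from the failing trace, in order.
FRAGMENTS = [
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.",
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.\r\n",
    "",
    "RSA key fingerprint is SHA256:<fingerprint>.",
    "RSA key fingerprint is SHA256:<fingerprint>.\r\n"
    "Are you sure you want to continue connecting (yes/no)? ",
    "Are you sure you want to continue connecting (yes/no)? ",
]

# True for every buffer state in which the narrower pattern would fire.
hits = [LAST_LINE_RE.search(fragment) is not None for fragment in FRAGMENTS]

print(hits)   # [False, False, False, False, True, True]
```

In this sketch the narrower pattern matches as soon as the final prompt
line is in the buffer, regardless of whether the first line is still
there.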

Has anyone struggled with something like this before?

-- 
Patrik Lundin


