Archived Forum Post

Question:

When the Network Cable is Unplugged / Disconnected...

Jan 04 '17 at 18:52

I'm testing for connection dropouts while the tunnel is open/connected, so that I can reconnect if I detect a disconnect (or hold off sending data until reconnected).

I'm monitoring tunnel.GetCurrentState while I disconnect the PC from the network, but it does not actually report the connection state.

I have encountered a couple of scenarios:

1) While the network cable is disconnected, I try to send MySQL data via the tunnel and VB6 hangs. I do expect this with a blocking call until it times out, but it never timed out (even after 5 minutes). If I reconnect the network cable, hoping it may reconnect, it stays in the hung state. I have to kill VB6.

2) While the network cable is disconnected, I try to send MySQL data via the tunnel. VB6 hangs as expected, but it does time out and I get control back. However, if I reconnect the network cable to the PC and successfully re-establish the tunnel connection (via a reconnect button), then when I try to reconnect to MySQL (ADODB.Connection -> conn.connect), MySQL complains it's not connected.

Any clues on the best way to handle and check the connection state to ensure it's safe to send data?

I did some testing...

Test scenario #1.

1) Create tunnel
2) Monitor tunnel.IsSshConnected (is 1 when connected) and tunnel.GetCurrentState
3) Disconnect network cable
4) While tunnel.IsSshConnected still returns 1
5) Try a MySQL call (within the ~30 seconds before IsSshConnected drops to 0, it is still 1, so the app assumes the connection is OK)
6) App freezes/blocks
7) Wait 5 mins, app still hung
8) Reconnect network cable
9) Wait 5 mins, app still hung

The only way out is to end the task.

Test scenario #2.

1) Create tunnel
2) Monitor tunnel.IsSshConnected (is 1 when connected) and tunnel.GetCurrentState
3) Disconnect network cable
4) Wait until tunnel.IsSshConnected returns 0 (about 30 seconds)
5) Try MySQL function: returns immediately with a connect error (good)
6) Reconnect network cable
7) Re-create tunnel
8) tunnel.IsSshConnected returns 1
9) Try MySQL function: returns immediately with a connect error (bad)

Even though the tunnel is reconnected and running, MySQL functions using the port fail. Maybe I should have called StopAccepting(waitForThreadExit) and CloseTunnel(waitForThreadExit) before re-creating the tunnel! Let's try #3.
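Steps such as "wait until tunnel.IsSshConnected returns 0" are easier to script, and cannot hang a test, when written as a bounded poll. A minimal Python sketch; `wait_until` is a hypothetical helper, and the predicate you pass would wrap however your language binding exposes IsSshConnected.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout  # monotonic: immune to clock changes
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))  # never sleep past the deadline
```

For example, `wait_until(lambda: not ssh_is_connected(), timeout=60)`, where `ssh_is_connected` is a hypothetical wrapper around IsSshConnected, returns False instead of blocking forever if the state never changes.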

Test scenario #3.

1) Create tunnel
2) Monitor tunnel.IsSshConnected (is 1 when connected) and tunnel.GetCurrentState
3) Disconnect network cable
4) Wait until tunnel.IsSshConnected returns 0 (about 30 seconds)
5) Call tunnel.StopAccepting(waitForThreadExit)
6) Call tunnel.CloseTunnel(waitForThreadExit)
7) Re-create tunnel
8) Wait for tunnel.IsSshConnected to return 1
9) Try MySQL function
10) App freezes/blocks

The only way out is to end the task.

As a workaround for now, since my app is unattended and runs on a server, I will write a watchdog program to monitor a heartbeat; if the heartbeat times out, the watchdog will kill the app and restart it. (A kludge, but it will work for what I need.)
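A minimal Python sketch of the check such a watchdog might perform, assuming the monitored app periodically touches a heartbeat file. The file-based heartbeat, the helper names `touch_heartbeat` and `heartbeat_is_stale`, and the timeout value are all hypothetical choices for illustration.

```python
import os
import time
from typing import Optional


def touch_heartbeat(path: str) -> None:
    """Called periodically by the monitored app to signal that it is alive."""
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)  # set the modification time to "now"


def heartbeat_is_stale(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file is missing or older than max_age_seconds."""
    try:
        last_beat = os.path.getmtime(path)
    except FileNotFoundError:
        return True  # no heartbeat at all: treat the app as hung
    if now is None:
        now = time.time()
    return (now - last_beat) > max_age_seconds
```

The watchdog loop would call `heartbeat_is_stale` every few seconds and, when it returns True, kill and relaunch the app (on Windows, for example, with `taskkill`); that part is platform-specific and left out of this sketch.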


Answer

When the network cable is unplugged, a TCP connection is not necessarily broken. In fact, TCP connections can survive temporary network disconnections, even potentially while reading and writing.

Before explaining further, you may wish to read this article: http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html

First and foremost -- you don't want software that abandons a connection so easily, especially with wireless communications. You want to be able to survive connectivity disruptions that are short-lived.

Note: When I say "TCP", this also includes SSL/TLS and SSH connections, because these protocols run on top of TCP; an SSL/TLS or SSH connection is still a TCP connection underneath.

What happens when the network cable is unplugged? If both sides of the connection are idle and nothing is being sent or received, and the network cable is quickly re-connected, then the temporary disconnection is never noticed. The TCP connection remains alive and well.

What happens if the network cable is unplugged, and shortly afterwards one tries to send data on the socket? It depends. On Windows, experience has shown that the "send" function call (see https://linux.die.net/man/2/send or https://msdn.microsoft.com/en-us/library/windows/desktop/ms740506(v=vs.85).aspx) will likely still return a success status. Here's what's going on: the application hands the data to the operating system to be sent. The operating system, provided it has room in the socket's outgoing send buffer, says "OK, I'll take care of it." The "send" system call returns success and the application continues on its merry way, while the operating system works to actually send the data.

It may be that the OS tries to accommodate temporary disruptions, waiting to see whether the network cable is restored before failing completely. TCP involves acknowledgements (ACKs) and retransmission: just because the OS sends the data once doesn't mean it won't need to retry if no ACK arrives.

In other words, just because the app called the "send" system call (either directly or indirectly through Chilkat), doesn't mean the data actually got to the destination. (But if it does reach the destination, then everything sent beforehand is guaranteed delivered and in order.)

So... a network cable can be unplugged, and "send" system calls on the socket can keep succeeding (from the app's point of view) up to a point, depending on the socket's send buffer size (the SO_SNDBUF socket option) and the operating system's TCP implementation.
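An unplugged cable is hard to reproduce in a script, but the same mechanism shows up on a loopback TCP connection whose peer never reads: send() keeps reporting success while the OS buffers the data, and only stops when the buffers are full. A minimal Python sketch; `tcp_loopback_pair` and `bytes_accepted_without_reader` are hypothetical helpers, and a non-blocking socket is used so the final "buffers full" condition surfaces as an exception instead of a hang.

```python
import socket
from typing import Tuple


def tcp_loopback_pair() -> Tuple[socket.socket, socket.socket]:
    """Return (client, server): the two ends of a TCP connection on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
        return client, server
    finally:
        listener.close()  # the accepted connection stays open


def bytes_accepted_without_reader(chunk_size: int = 65536) -> int:
    """Send on a TCP socket whose peer never calls recv, until the OS refuses.

    Every successful send() only means the OS copied the bytes into its
    buffers; the peer application has not read a single byte.
    """
    client, server = tcp_loopback_pair()
    try:
        client.setblocking(False)  # raise instead of blocking once buffers fill
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += client.send(chunk)
            except BlockingIOError:
                return total
    finally:
        client.close()
        server.close()
```

The exact number returned depends on the OS and its buffer sizes; the point is that it is well above zero even though nothing was ever received by the peer application.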

What happens when reading on a connected socket when the network cable is unplugged? Reading from a socket ultimately goes through the "recv" system call (https://linux.die.net/man/2/recv). The programming language doesn't matter: whether it's Java, Objective-C, Perl, Ruby, etc., when you dig down through the layers of code, it's "recv" that's ultimately getting called.

If recv is called just after the cable is unplugged, it won't return immediately with an error. Again, you wouldn't want it to: there could be short-lived network disruptions that get resolved. So recv will hang and eventually return a failure. The typical failure is:

    WindowsError: An existing connection was forcibly closed by the remote host.
    WindowsErrorCode: 0x2746

(0x2746 is 10054, WSAECONNRESET. On Linux, Mac OS X, and other systems, you would see an equivalent error.) The point is that the unplugged network cable is not detected immediately. It takes time for the software to give up on a problem caused by the physical hardware.
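If an application would rather not wait for the OS to give up, it can bound each read itself. This is an application-level read timeout, a technique swapped in for illustration rather than something the answer or Chilkat prescribes. A minimal Python sketch; `recv_with_deadline` is a hypothetical helper.

```python
import socket
from typing import Optional


def recv_with_deadline(sock: socket.socket, nbytes: int, timeout: float) -> Optional[bytes]:
    """Wait at most `timeout` seconds for data on `sock`.

    Returns the bytes read (b"" means the peer closed the connection cleanly),
    or None if nothing arrived in time. A timeout does NOT prove the
    connection is dead; it only means nothing arrived yet. Note that the
    timeout stays set on the socket after this call.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None
```

On None, the caller decides whether to wait again, send an application-level ping, or tear the connection down and reconnect.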


What happens in test scenario #1?

IsSshConnected won't immediately know that the network cable is unplugged. It is impossible to immediately know it.

So... the MySQL call sends data on its connected socket. The other end of that socket is a background client thread on localhost:<someport>, managed by the Chilkat SshTunnel class. This connection is fine even with the network cable unplugged, because both endpoints are on localhost.

The data is received by the client thread and is deposited for the Tunnel Manager Thread (also a background thread managed by Chilkat SshTunnel) to send on the client's channel through the SSH connection. But the network cable is unplugged. However, the Tunnel Manager Thread won't know this immediately. The "send" system call(s) to send the client data through the SSH tunnel succeed (as described above). If the network cable is quickly plugged back in, the send could actually succeed. If not, it will eventually fail. However, the "send" system call is ancient history by now. It returned success, and from the Tunnel Manager's point of view, the data has been sent.

Now the app is waiting for the MySQL response. Meanwhile, the Tunnel Manager Thread is watching the SSH connection for any incoming data. Eventually, the recv on the SSH server connection fails (with "An existing connection was forcibly closed by the remote host.") and now the Tunnel Manager Thread proceeds to tell all the client threads to close their connections and end.

At this point, your application's UI thread of execution, which is inside the MySQL call, should find that its connected peer has disconnected, and it should return with a failure. I don't know why the MySQL code would wait forever. I've tested this scenario with other code, such as Chilkat HTTP, and the HTTP request fails after ~30 seconds (as it should).

The app freezes/blocks because it is making a synchronous call from the UI thread into MySQL, and control has not returned to the VB6 event loop, so UI events are not processed (thus the UI freezes).
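The general remedy for a frozen UI is to keep blocking network calls off the UI thread and wait for them with a deadline. The Python sketch below illustrates that pattern only; it is not a VB6 recipe, and `call_with_timeout` and `CallTimedOut` are hypothetical names.

```python
import threading
from typing import Any, Callable, Dict


class CallTimedOut(Exception):
    """Raised when the blocking call did not finish in time."""


def call_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() on a daemon worker thread and wait at most `timeout` seconds.

    If fn() hangs, the caller regains control, but the worker thread keeps
    running: there is no safe way to kill it, so the stuck call is abandoned,
    not cancelled.
    """
    result: Dict[str, Any] = {}

    def worker() -> None:
        try:
            result["value"] = fn()
        except Exception as exc:  # hand any failure back to the caller
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise CallTimedOut(f"call did not return within {timeout} seconds")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

This keeps the caller responsive, but it does not free a connection stuck inside the driver; recovering still means tearing down and re-establishing the connection.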


Answer

For test scenario #2 and #3...

To recover from a lost SSH server connection, make the following calls:

  1. tunnel.StopAccepting
  2. tunnel.DisconnectAllClients
  3. tunnel.CloseTunnel

Then reconnect, re-authenticate, and start accepting connections again:

  1. tunnel.Connect(...)
  2. tunnel.AuthenticatePw(...) (or another Authenticate method)
  3. tunnel.BeginAccepting(portNum)

I tested it, and it worked fine for me.
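The recovery sequence above can be sketched in Python as follows. The method names are the ones listed in the answer; the argument lists (in particular passing True for the waitForThreadExit flags) and the use of return values are assumptions to check against the Chilkat SshTunnel reference for your language binding. `restart_tunnel` is a hypothetical helper.

```python
from typing import Any


def restart_tunnel(tunnel: Any, ssh_host: str, ssh_port: int,
                   login: str, password: str, listen_port: int) -> bool:
    """Tear down a dead SSH tunnel and bring it back up.

    Argument lists are assumptions; verify them against the Chilkat docs.
    """
    # 1. Tear down. Return values are ignored: on a dead SSH connection,
    #    some of these may report failure, and that is expected.
    tunnel.StopAccepting(True)
    tunnel.DisconnectAllClients(True)  # existing client connections are dropped
    tunnel.CloseTunnel(True)

    # 2. Rebuild: reconnect, re-authenticate, listen again on the same port.
    if not tunnel.Connect(ssh_host, ssh_port):
        return False
    if not tunnel.AuthenticatePw(login, password):
        return False
    return bool(tunnel.BeginAccepting(listen_port))
```

Because step 1 drops all existing client connections, client code such as the MySQL connection has to open a new connection through `listen_port` once this returns True.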