1

Closed

Dispose() should not leave running threads behind!

description

Hi. I have found a really hard to find bug that is very consistently repeatable. Before getting into the details, I think it should be noted that this issue will be resolved if the Renci.SshNet.SshClient.Dispose() and the Renci.SshNet.ForwardedPortLocal.Dispose() (and anything else) actually and honestly dispose everything. But the problem is that after calling those, there are still threads hanging around doing stuff and sometimes they get stuck badly. It looks like someone went crazy with anonymous methods and launching Thread Tasks without leaving a handle to the Thread for management by the parents to kill them when disposed. And I found in Session.NET40.cs this line: Task.Factory.StartNew(action, TaskCreationOptions.LongRunning);which I changed to: Task.Factory.StartNew(action, TaskCreationOptions.AttachedToParent);but it had no positive impact.I have seen posts on this site about Disconnect doing funny stuff and also about the Timeouts not seeming to disconnect things, and some other things like that. I bet they are related to getting over your head in multi-threading or events.The result is that sometimes I still have the process listening on the local port way after the Dispose() is called. And also the Connection is sitting there too. Usually this happens when eliminating both the Forwards and the Connection, but sometimes even just the Forwards. The call to the Forward's Disconnect() and Dispose() when the problem happens will take a much longer time, but then sadly continue going after seemingly giving up.When I pause I can see 2 Renci threads hanging around and stuck at these points:ForwardedPortLocal.NET.cs line 34: var socket = this._listener.AcceptSocket();Session.cs line 1575: var message = this.ReceiveMessage();I have tested this A LOT!!! trying to find ways around it. It happens in the code base (pulled today 12/5/12), it happens in .Net 4.0 DLL from a month ago, and it happens from .Net 3.5 DLL pulled today also. Another result is that sometimes the situation causes the app to die a horrible death with:System.AggregateException was unhandled Message=A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. Source=mscorlib StackTrace: at System.Threading.Tasks.TaskExceptionHolder.Finalize() InnerException: System.NullReferenceException Message=Object reference not set to an instance of an object. Source=Renci.SshNet StackTrace: at Renci.SshNet.ForwardedPortLocal.b__0() at System.Threading.Tasks.Task.InnerInvoke() at System.Threading.Tasks.Task.Execute()WHEN DOES THIS HAPPEN:Here is the kicker. I wrote an app to manage SSH tunnels. And when I first was having problems with Renci, I made it so I can implement other libraries and other means. Via a config file I can add and remove tunnels and have multiple connections. The process reads the config file periodically and applies the changes found. So I can have a connection using Renci, another using Chilkat, another using Plink, Tunnelier, and that's all I finished implementing so far.If I have all Renci connections, it seems that I can add and remove tunnels and connections a bunch of times and not have an issue. (however usually there is a Connection (SshClient) that hangs around in CLOSE_WAIT state). If I bring say Chilkat into the mix, things still are OK.. and I can replace all active connections to one of the other implementations.The problems start happening when I bring in Plink or Tunnelier. They are launched using System.Process. They close themselves properly from what I see in memory and via TCP connections and SSH server. When I have a Renci and Plink (let's say) connection, it is then that if I start removing tunnels from Renci I might have issues. If I start removing the Connection, I almost always have an issue.The issue is that when Renci is told to close up shop, it stalls for a while, and then the code moves on while leaving the listeners in place (though of course not working). And they stay there it seems indefinitely. And I'm also using System.Net.NetworkInformation.IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpListeners() to look at the listeners and it's there. It's because there are those silly Renci threads that are hanging around for some reason trying to read from the socket.And I have verified that the Connection replacement (like Plink) is not actually opening up that tunnel. I have my app now specifically checks for the Active listeners before moving tunnels to another connection.This may be a very complicated specific bug to find, but I think the real root of the problem and the much easier one to fix, is that when you Dispose(), it really should be DISPOSED, leaving no threads behind. I have also verified that calling SshClient.Dispose() does not DISCONNECT, so you have to call both. I think also similar thing with the Forward.. if you don't call Stop().This is how I stop the connections and port forwards: foreach (ForwardedPort forward in this.sshConnection.ForwardedPorts.ToList()) { this.sshConnection.RemoveForwardedPort(forward); Console.WriteLine("just finished removing tunnel"); } this.sshConnection.Disconnect(); Console.WriteLine("just finished conn Disconnect()"); this.sshConnection.Dispose(); Console.WriteLine("just finished conn Dispose()"); public void Dispose() { if (this.localForward != null) { this.localForward.Stop(); Console.WriteLine("just finished tunnel Stop()"); this.localForward.Dispose(); Console.WriteLine("just finished tunnel Disconnect()"); //System.Threading.Thread.Sleep(2000); } }
Closed Apr 6 at 3:56 PM by drieseng

comments

coewar wrote Dec 6, 2012 at 3:34 AM

One correction I'd like to make. The source code I was using in today's testing actually was from about a month ago. So I downloaded today's code and copied the .Net 4.0 version into my project and I still came up with the same results, though it seemed a little harder to make it fail.

Maybe the line numbers are different in where the threads are stuck after the Disconnect() and Stop() and Dispose() calls.

One of the threads (for the Connection I'd guess) that gets stuck seems to always start on this line when I pause:
Session.NET.cs line 99:
                var receivedBytes = this._socket.Receive(buffer, offset + receivedTotal, length - receivedTotal, SocketFlags.None);
but going up the stack, you can see it's stuck in a loop:
Session.cs line 1585 and near by while loop:
            while (this._socket.Connected)
            {
                var message = this.ReceiveMessage();

                if (message == null)
                {
                    throw new NullReferenceException("The 'message' variable cannot be null");
                }

                this.HandleMessageCore(message);
            }

While more than 1 threads (for the tunnels) are stuck in the ForwardedPort.NET.cs line 34:
                    var socket = this._listener.AcceptSocket();

coewar wrote Dec 6, 2012 at 3:36 AM

None the less, these threads are still there after the Dispose() was called and kept going. This ought not to happen. If it all gets disposed for real, this and many other issues likely will be resolved.

coewar wrote Dec 6, 2012 at 4:31 PM

Let me know if I can help you with this coding.

olegkap wrote Dec 19, 2012 at 4:17 PM

Hi,
Thanks for describing this issue.
I think you right, that this problem is likely connected to few others related to hanging the process.
I know in the past I tried numerous times to find the best way to disconnect the connection or even more importantly to discover when connection was dropped.
I will look into task issue and see how can I fix that but if you have any suggestions that works for, please let me know so I could incorporate it into main code.

What I might need help with is testing, since I am on completly different project now and dont have all my test environment set up, and will have to go back to it after New Year :(

olegkap wrote Dec 19, 2012 at 4:46 PM

Hi,

I just checked in some code which I think should improve port forwarding disconnection.

Can you please test it in your environment and let me know if fixed port forwarding problem?

Thanks,
Oleg

coewar wrote Dec 19, 2012 at 4:53 PM

Sweet! Yes I'll give this a try soon.

coewar wrote Dec 19, 2012 at 6:51 PM

Unfortunately there is no change from my test. I did see that you forgot to add a call to this.InternalStop() in the ForwardedPortLocal.Dispose() method (because you did add it to ForwardedPortDynamic.Dispose()).

I also changed in my local copy ForwardedPort.Stop() from:
    public virtual void Stop()
    {
        this.Session.ErrorOccured -= Session_ErrorOccured;
    }
to:
    public virtual void Stop()
    {
        if (this.Session != null)
        {
            this.Session.ErrorOccured -= Session_ErrorOccured;
        }
    }
Because if you ask the SshConnection to remove the port forwarding before disposing the Port Forward, the Port Forward disposing will fail there because the Session will be NULL.


I will make further changes and look more closely.

The very strange thing is that this only happens when I have a Process that's doing something spun up by the same application using the TcpListener. I think I'll write up a simpler test app that does not use anything else to make it easier. But the shocker is that in ForwardedPortLocal.InternalStop(), this part here waits and actually times out and the Dispose call here does nothing:
        this._listenerTaskCompleted.WaitOne(this.Session.ConnectionInfo.Timeout);
        this._listenerTaskCompleted.Dispose();
I have even made changes where I actually contain that calling inside a Thread and then call the Abort() on the thread. The thread is gone but the listening is still there. And it always stays there until I kill the totally unrelated Process that is also running in my app.

I'm going to start looking into where you're setting up Channels and stuff.. I do see some of that stuff is not Disposed, though not saying that it's at all a problem.

However, I will change from using the Threads to using the BeginAcceptSocket() and EndAcceptSocket() because I think it might cause a built-in handling of the events and threads that maybe deals with this better.

Otherwise I'm clueless why it's happening.

olegkap wrote Dec 19, 2012 at 7:54 PM

Thanks,
yea, forgot about localport, will add it now as well as Session null fix.
I thought about doing BeginAccept myself too, so if yo have tested implemention of this fix, please let me know and I will update the code.

Thanks,
Oleg

olegkap wrote Mar 7, 2013 at 9:40 PM

Hi,

Can you give it another try?
I just committed 23481 code where I fixed 2 possible issues for leaving thread running in port forwarding scenario.

Can you try it again and see if it was fixed in your situation?

Thanks,
Oleg