Re: Can't reboot my mud.

From: Jeremy Elson (jelson@blaze.cs.jhu.edu)
Date: 04/22/94


>      This is not being caused by the mud code or autorun. The problem
> that you are experiencing can be caused in one of two ways. The first
> and most likely is that you already have a version of the mud up
> and running and don't realize it.

Or, there is something else being run on that port which you've forgotten
about, such as a sign.

I think the second explanation (below) is more likely though.
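
A quick way to rule out the first explanation (something already holding
the port) is to try binding to it yourself. What follows is only a minimal
sketch in Python, not anything from the mud code; the function name
port_is_taken is made up for illustration, and it assumes a system where a
failed bind reports EADDRINUSE.

```python
import errno
import socket


def port_is_taken(port, host=""):
    """Return True if binding to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration; not part of the mud or autorun.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as err:
        if err.errno == errno.EADDRINUSE:
            return True
        raise  # some other failure, e.g. permission denied on a low port
    finally:
        probe.close()
    return False
```

For the port in the quoted netstat output, that would be
port_is_taken(4224). A True result says only that the bind was refused, not
why: it could be a running mud, a sign, or lingering connections, so netstat
is still needed to tell them apart.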

>     The second cause of this problem can be blamed on your users. Try
> issuing the netstat command in /usr/ucb and you should see a line
> similar to this:
> 
> tcp 0   0  YOUR_HOST_NAME.4224   mail.sni.co.uk.2275    ESTABLISHED
> 
> when the mud is executing properly and you have a PC in from the UK.
> If the mud crashes or reboots you may run across this display:
> 
> tcp 0   0  YOUR_HOST_NAME.4224   mail.sni.co.uk.2275    TIME_WAIT
> 
> The TIME_WAIT is caused by the TCP/IP protocol stack. It won't allow
> you to reboot your mud because it has a lingering network connection
> that may get confused if the server port suddenly came back to life.

This isn't quite correct, though it's very close. :)  As you say, normal
connections to the MUD (or any other TCP/IP port) show up in the netstat
display as being in state ESTABLISHED.  When the server goes down, the
operating system bumps all remaining connections into the "TIME_WAIT" state
for a little while; the purpose of this is to keep another application from
immediately reusing the port, just in case there are a few lagged packets
still heading across the Internet, expecting to find the old application
there.

TIME_WAIT connections are normal, though, and they don't cause problems.
After about 30 to 60 seconds (depending on your operating system), the
TIME_WAIT connections will disappear.

The problem is that sometimes connections can get stuck in the CLOSING
state.  As far as I can tell, this is caused when the machine on the
other end of the line crashes (say, when someone's telnetting from a
PC and just turns the PC off).  The reason for this is that TCP uses a
handshake both to open and to close a connection, and the connection is
stuck there waiting for some step of the closing handshake, which
obviously will never come from a crashed machine.

CLOSING connections are big trouble, because some operating systems will
not allow a new application to use a certain port until all of the CLOSING
connections go away.  And I've seen CLOSING connections stay around for
literally a month before.
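
To see which of these states the connections on your port are in, the
netstat output can be tallied by state. This is a rough sketch that assumes
the BSD-style layout quoted above (host.port addresses, state in the last
column); tally_states is a made-up name, and the sample lines other than
the ones quoted earlier (including pc.example.org) are invented for
illustration.

```python
from collections import Counter


def tally_states(netstat_text, port):
    """Count TCP states for connections whose local port matches `port`."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local = fields[3]
        # BSD-style netstat writes host.port; take what follows the last dot.
        if local.rpartition(".")[2] == str(port):
            states[fields[5]] += 1
    return states


# Invented sample input in the same layout as the quoted netstat lines.
sample = """\
tcp 0   0  YOUR_HOST_NAME.4224   mail.sni.co.uk.2275    ESTABLISHED
tcp 0   0  YOUR_HOST_NAME.4224   mail.sni.co.uk.2276    TIME_WAIT
tcp 0   0  YOUR_HOST_NAME.4224   pc.example.org.1031    CLOSING
tcp 0   0  YOUR_HOST_NAME.23     pc.example.org.1032    ESTABLISHED
"""
print(tally_states(sample, 4224))
```

If the count for CLOSING is nonzero and stays that way across several
minutes, that points at the stuck connections described above rather than
at ordinary TIME_WAIT entries, which should drain on their own.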

So, what's the solution?  I don't know.  The dumb solution is just to reboot
your own machine, which will get rid of any CLOSING connections.  Or, if you
can, move to a site which uses an OS that handles CLOSING connections more
gracefully.

I've heard that it's possible to screw around with the networking code
(specifically, the SO_REUSEADDR socket option) to get the application to
ignore CLOSINGs, but I've never had any success with that.
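
For reference, setting SO_REUSEADDR looks roughly like the sketch below.
It is written in Python rather than taken from the mud's own networking
code, and open_listener is a hypothetical name. The option has to be set
before bind(). On most systems it lets a server bind while old connections
on the port are in TIME_WAIT; whether it also gets past stuck CLOSING
connections depends on the operating system, and as noted above it may not.

```python
import socket


def open_listener(port, host="", backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind().

    Hypothetical helper for illustration; not the mud's actual socket setup.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set before bind() to have any effect on it.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

A server would call open_listener(4224) once at startup and then accept
connections on the returned socket. If the bind still fails with "address
already in use", the option did not help on that system.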

Jeremy


