Socket Error type: 4, WSAGetLastError: 10048

system · April 3, 2010, 7:27am

(imported topic written by Tingram91)

I have a Windows 2003 Server that the agent was just installed on. It processes a few collections and then dies with the following error. It still posts reports but will no longer perform any actions sent from the console.

ShutdownListener

Troubleshooting: Full agent wipe and reinstall, registry, files and folders removed.

Server rebooted

Any ideas?

BenKus · April 5, 2010, 4:32am

(imported comment written by BenKus)

Not sure that I have seen that before… probably best to contact support and send them the log file…

Ben

system · April 5, 2010, 7:51am

(imported comment written by Tingram91)

Thanks for the follow up Ben, always enjoy stumping the master

system · April 9, 2010, 5:25am

(imported comment written by rdamours91)

I ran into the same thing a couple weeks ago. I’ll see where I saved the diagnostics info before I solved the problem with a re-install.

system · April 12, 2010, 6:54pm

(imported comment written by Tingram91)

I was actually able to get this resolved. I believe it had something to do with the client being in Restricted mode.

Once I was able to resolve that problem, this issue went away.

system · May 10, 2010, 10:38pm

(imported comment written by rdamours91)

I am getting a lot more of these than I originally thought. I’ve got about 40 servers that don’t report back completely until a client restart.

Here’s what I’m finding on the net to do with the 10048 errors.

You are overloading the TCP/IP stack. Windows (and I think all socket stacks actually) has a limit on the number of sockets that can be opened in rapid sequence, because of how sockets get closed under normal operation. Whenever a socket is closed, it enters the TIME_WAIT state for a certain time (240 seconds IIRC). Each time you poll, a port is consumed out of the default dynamic range (on Windows Server 2003 this is ports 1025 through 5000, so roughly 4,000 ports), and each time that poll ends, that particular socket goes into TIME_WAIT. If you poll frequently enough, you will eventually consume all of the available ports, which results in Winsock error 10048 (WSAEADDRINUSE, "address already in use").
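One way to check whether this is really what is happening is to count how many connections sit in TIME_WAIT on an affected server. The sketch below tallies connection states from saved `netstat -an` output. The function name and the sample rows are made up for illustration, and it assumes the usual four-column TCP layout (protocol, local address, foreign address, state).

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states found in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: protocol, local address, foreign address, state.
        # UDP rows have no state column and are skipped.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[3]] += 1
    return states


# Hypothetical sample rows, only to show the expected input shape.
SAMPLE = """\
  TCP    10.0.0.5:1034    10.0.0.9:80    TIME_WAIT
  TCP    10.0.0.5:1035    10.0.0.9:80    TIME_WAIT
  TCP    10.0.0.5:1036    10.0.0.9:80    ESTABLISHED
  TCP    0.0.0.0:135      0.0.0.0:0      LISTENING
  UDP    0.0.0.0:500      *:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

If the TIME_WAIT count on a real server is close to the size of the dynamic port range, port exhaustion is a plausible cause; if it is small, the 10048 error probably has a different origin.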

Generally, WCF tries to avoid this problem by pooling connections and things like that. This is usually the case with internal services that are not going over the internet. I am not sure if any of the wsHttp bindings support connection pooling, but the netTcp binding should. I would assume named pipes do not run into this problem. I couldn’t say for the MSMQ binding.

There are two solutions you can use to get around this problem. You can either increase the dynamic port range, or reduce the period of TIME_WAIT. The former is probably the safer route, but if you are consuming an extremely high volume of sockets (which doesn’t sound like the case for your scenario), reducing TIME_WAIT is a better option (or both together.)
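To get a feel for how much each option buys you, here is a rough back-of-the-envelope calculation. It is a simplification, not a model of the real stack: it assumes every closed connection holds its local port for the full TIME_WAIT period and that nothing else is using ports in the range. The 1025-5000 range and the 240-second delay are the Windows Server 2003 defaults as far as I know; the function name is made up.

```python
def max_sustained_rate(first_port: int, last_port: int, time_wait_s: int) -> float:
    """Connections per second that can be opened indefinitely before
    every port in the range is tied up in TIME_WAIT."""
    ports = last_port - first_port + 1
    return ports / time_wait_s


# (first port, last port, TIME_WAIT in seconds) for each option.
SCENARIOS = {
    "defaults (1025-5000, 240 s)": (1025, 5000, 240),
    "larger port range (1025-65534, 240 s)": (1025, 65534, 240),
    "shorter TIME_WAIT (1025-5000, 60 s)": (1025, 5000, 60),
    "both changes (1025-65534, 60 s)": (1025, 65534, 60),
}

if __name__ == "__main__":
    for name, args in SCENARIOS.items():
        print(f"{name}: about {max_sustained_rate(*args):.1f} connections/s")
```

Under these assumptions, widening the port range raises the ceiling by a much larger factor than cutting the delay to 60 seconds, which fits the advice to try the port range first.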

Changing the Dynamic Port Range

Open regedit.

Open key HKLM\System\CurrentControlSet\Services\Tcpip\Parameters

Edit (or create as DWORD) the MaxUserPort value.

Set it to a higher number (e.g. 65534).

Changing the TIME_WAIT delay

Open regedit.

Open key HKLM\System\CurrentControlSet\Services\Tcpip\Parameters

Edit (or create as DWORD) the TcpTimedWaitDelay value.

Set it to a lower number. The value is in seconds (e.g. 60 for a 1 minute delay).
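If you have to do this on dozens of servers, clicking through regedit gets old. The sketch below only builds the equivalent `reg add` command lines and prints them; it does not touch the registry. The helper name is made up, the key path is the one given in the steps above, and the TIME_WAIT value name is written the way Microsoft documents it, `TcpTimedWaitDelay`. Review the output before running it on a server.

```python
from typing import List

TCPIP_PARAMS = r"HKLM\System\CurrentControlSet\Services\Tcpip\Parameters"


def reg_add_command(value_name: str, data: int) -> List[str]:
    """Build a `reg add` command that sets a DWORD under the Tcpip
    Parameters key. /f overwrites an existing value without prompting."""
    return [
        "reg", "add", TCPIP_PARAMS,
        "/v", value_name,
        "/t", "REG_DWORD",
        "/d", str(data),
        "/f",
    ]


# The two settings from the steps above, with the example values given there.
COMMANDS = [
    reg_add_command("MaxUserPort", 65534),
    reg_add_command("TcpTimedWaitDelay", 60),
]

if __name__ == "__main__":
    for cmd in COMMANDS:
        print(" ".join(cmd))
```

As far as I know, changes to these Tcpip parameters only take effect after a reboot, so plan for a restart.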

One of the above solutions should fix your problem. If it persists after changing the port range, I would try increasing your polling interval so that polls happen less frequently… that will give you more leeway to work around the time wait delay. I would change the time wait delay as a last resort.