Why does my large Windows cluster keep crashing?

If your Windows cluster has more than 4000 hosts, a Windows limitation requires you to manually add a registry entry on the master host and on each master candidate host. Follow these steps:

  1. On the master host, start the Windows Registry Editor.

  2. Locate HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

  3. Click Parameters.

  4. On the Edit menu, click New, and then add the following registry entry:

    Value Name: MaxUserPort

    Value Type: DWORD

    Value Data: 65534

    Valid Range: 5000-65534 (decimal)

    Default: 0x1388 (5000 decimal)

    Description: This parameter controls the maximum port number that is used when a program requests any available user port from the system. Typically, short-lived ports are allocated between the values of 1024 and 5000 inclusive.

  5. Quit the Registry Editor.

  6. Restart the master host, and then repeat these steps on each master candidate host.
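
If you need to apply the same change to many candidate hosts, the registry edit in steps 1 through 5 could be scripted. The following Python sketch is illustrative only and is not part of the product: the function names are hypothetical, and it assumes a Windows version that honors MaxUserPort and a Python installation on the host. It validates the value against the range given above, shows how many short-lived ports each setting allows (counting from 1024, as in the description), and renders a .reg file that can be imported with the Registry Editor. The optional apply_max_user_port function writes the value directly using the standard winreg module, which exists only on Windows and needs administrator rights.

```python
"""Sketch: validate a MaxUserPort value and render it as a .reg file."""

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "MaxUserPort"
VALID_RANGE = (5000, 65534)  # decimal range from the steps above
FIRST_USER_PORT = 1024       # short-lived ports start here, per the description


def check_max_user_port(value):
    """Return value unchanged, or raise ValueError if it is out of range."""
    low, high = VALID_RANGE
    if not low <= value <= high:
        raise ValueError(f"{VALUE_NAME} must be between {low} and {high}")
    return value


def available_ports(max_user_port):
    """Count the short-lived ports from 1024 up to max_user_port, inclusive."""
    return check_max_user_port(max_user_port) - FIRST_USER_PORT + 1


def render_reg_file(value=65534):
    """Build the text of a .reg file that sets MaxUserPort as a DWORD."""
    check_max_user_port(value)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[HKEY_LOCAL_MACHINE\\{KEY_PATH}]",
        f'"{VALUE_NAME}"=dword:{value:08x}',  # .reg DWORDs are 8 hex digits
        "",
    ]
    return "\n".join(lines)


def apply_max_user_port(value=65534):
    """Write the value directly. Windows only; run as an administrator."""
    import winreg  # standard library module, available only on Windows

    check_max_user_port(value)
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)


if __name__ == "__main__":
    print(available_ports(5000))   # ports available with the default setting
    print(available_ports(65534))  # ports available with the recommended setting
    print(render_reg_file(65534))
```

With the default of 5000, the count comes to 3977 ports, which is slightly below 4000; this is a plausible reading of why the limit appears at that cluster size, not a statement from the product documentation. A restart of the host is still needed after the value is written, as in step 6.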