High CPU load in IIS web server caused by HttpClient

High CPU load is one of the most common issues web servers struggle with. There could be several root causes, such as deadlocks, insufficient hardware, high traffic, or poor coding. In this post, I will explain the cause of, and the solution for, a high CPU load caused by an object that is not thread-safe.

This issue may surface in many ways. One of them is, obviously, high resource utilization. You can monitor resource usage in Task Manager, Resource Monitor, or Performance Monitor.

Before we go further, let’s remember what thread safety is:

Thread safety is a computer programming concept applicable in the context of multi-threaded programs. A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time.

Thread safety
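
To make the definition concrete, here is a minimal sketch in Python (not taken from the application discussed in this post). The names SafeCounter and run_workers are hypothetical, chosen only for illustration. The counter is shared by several threads, and a lock ensures that only one thread at a time performs the read-modify-write on the shared value.

```python
import threading


class SafeCounter:
    """A shared counter whose increment is protected by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below atomic with respect
        # to other threads calling increment() on the same instance.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, threads=8, increments=10_000):
    """Increment the shared counter from several threads and wait for all."""

    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

Without the lock, two threads could read the same old value and both write back the same result, so some increments could be lost. That kind of interference is what a non-thread-safe object is exposed to when it is shared.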

High CPU load in Performance Monitor and Event Viewer

The Performance Monitor chart below shows that the CPU load is around 80% (red solid line). It also shows how the error count (brown dashed line) increases in parallel with the total request count (green dashed line).

High CPU load displayed in Performance Monitor

This issue may appear as error or warning messages in Event Viewer that seem unrelated to CPU load at first sight. The message below is recorded in the System log of Event Viewer. It reports an unresponsive application pool, which can be a symptom of high CPU load.

Event ID: 5010 (Warning)

A process serving application pool “X” failed to respond to a ping. The process id was “1234”


Root cause of the high CPU load

As I mentioned at the beginning of the post, there could be a variety of reasons why a web server suffers from high CPU load. In this scenario, the cause is sharing a single instance of a non-thread-safe object across threads. More specifically, using the same HttpClient instance from different threads increases CPU usage.

The official documentation for HttpClient recommends using only one instance of this object:

HttpClient is intended to be instantiated once and re-used throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. This will result in SocketException errors.

HttpClient Class

However, in practice, this is not a good idea, especially in scenarios where requests step on each other:

If you have requests that are related (or won’t step on eachother) then using the same HttpClient makes a lot of sense.

Best practice usage of HttpClient

If you are targeting a CPU architecture (32-bit or 64-bit) that doesn’t match your DLL libraries, this may cause an issue as well. Check this post out for a solution: How to find out processor architecture (x86, x64) of dll and exe files?

Solution for high CPU load

In many cases like this one, the application shouldn’t share a single instance of HttpClient. Sharing it causes requests to overlap each other and increases the CPU load.

Instead of reusing the same HttpClient instance (or a function that returns that shared HttpClient object), create a new instance. This should resolve the high CPU usage.

If you don’t have a CPU usage issue but your worker process (w3wp.exe) is crashing periodically, this post will help you fix it: w3wp.exe crashes every 5 minutes with error code 0xc0000374

TLS fatal error code 20. The Windows SChannel error state is 960 (Solved)

You may see “SChannel error state is 960” in Event Viewer when your web server fails to establish secure communication with clients. Users receive certificate errors while you see the event log entry below on your server:

A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 20. The Windows SChannel error state is 960.

Cause

This issue is caused by different or incompatible cipher suites being used on the web server and the load balancer. Cipher suites are sets of cryptographic algorithms, such as RSA and DHE for key exchange, that both ends use to secure a connection.

When there is a conflict or mismatch in the cipher suites used, the web server cannot decrypt the encrypted request coming from the load balancer and logs this error message: “The TLS protocol defined fatal error code is 20. The Windows SChannel error state is 960.”

Looking for a way to redirect all HTTP requests to HTTPS? Check this post out.

How to solve “SChannel error state is 960”

You can fix secure connection failures and make SChannel errors disappear by enabling a custom cipher suite order and editing the list of cipher suites used on your web server. Follow the instructions below in Windows Server:

  1. Log on to the server using an account that is a member of the Local Administrators group
  2. Go to “Start > Run”. Enter: gpedit.msc
  3. In the left pane, expand “Computer Configuration > Administrative Templates > Network > SSL Configuration Settings”
  4. In the right pane, right click “SSL Cipher Suite Order” and choose “Edit”
  5. Click “Enabled”
  6. Copy the content of the “SSL Cipher Suites” text box and paste it into Notepad
  7. Edit this list to make sure it matches the cipher suite list used in your load balancer. As a general recommendation:
    • Move TLS_RSA cipher suites to the top
    • Place TLS_ECDHE ones after them
    • Remove these two cipher suites as they have known interoperability issues:
      TLS_DHE_RSA_WITH_AES_128_CBC_SHA
      TLS_DHE_RSA_WITH_AES_256_CBC_SHA
  8. Paste the edited list back into the “SSL Cipher Suites” text box. In the “SSL Cipher Suite Order” window, click “OK”
  9. Reboot the server

Solve SChannel error state is 960

Note: The list you provide in Step 7 cannot exceed 1023 characters. To shorten it, give priority to the suites at the top of the default cipher list, which is ordered from the strongest cipher suites to the weakest ones. Additionally, you can remove the suites that are in the black list for HTTP/2. Here is more information about the HTTP/2 black list.
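
The editing in Step 7 and the length limit in the note can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names are hypothetical, and the policy value is assumed to be a comma-separated list without spaces. Suites that are neither TLS_RSA nor TLS_ECDHE keep their original relative order at the end.

```python
# Suites this post recommends removing because of interoperability issues.
REMOVED = {
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
}

# Maximum length of the list, as stated in the note above.
MAX_LENGTH = 1023


def reorder_cipher_suites(suites):
    """Return suites with TLS_RSA first, TLS_ECDHE next, the rest last."""
    cleaned = [s.strip() for s in suites]
    kept = [s for s in cleaned if s and s not in REMOVED]
    rsa = [s for s in kept if s.startswith("TLS_RSA_")]
    ecdhe = [s for s in kept if s.startswith("TLS_ECDHE_")]
    others = [s for s in kept if not s.startswith(("TLS_RSA_", "TLS_ECDHE_"))]
    return rsa + ecdhe + others


def build_policy_value(suites):
    """Join the reordered suites and enforce the 1023-character limit."""
    value = ",".join(reorder_cipher_suites(suites))
    if len(value) > MAX_LENGTH:
        raise ValueError(
            f"Cipher suite list is {len(value)} characters; "
            f"the limit is {MAX_LENGTH}"
        )
    return value
```

For example, you could split the text copied in Step 6 on commas, pass the result to build_policy_value, and paste the returned string back into the text box. Make sure the result still matches what your load balancer supports.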

It didn’t work?

A less likely cause of this issue is a change in the MAC (Message Authentication Code) (Source). The web server uses this code to verify that the request hasn’t been altered on the way (request forgery or man-in-the-middle attack). If the web server sees that the MAC has changed, it drops the connection. Make sure that your load balancer doesn’t modify the MAC value.

Another possible cause is a Windows update (KB4457129) that reportedly breaks NLB (Network Load Balancing) clusters. Uninstalling this update or installing the patch (KB4457133) solves the issue (Source).

Recommendations for using dynamic IP address in web servers

It is highly recommended to use a static IP address for servers so that users have consistent access to the applications. However, there might be certain scenarios where you want to use a dynamic IP address on web servers.

Dynamic IP address in web servers

Using a dynamic IP address on a web server means using a dynamic DNS service such as DynDNS or No-IP. Every time the server address changes, the DNS service should be notified of the new IP address so that it updates the DNS records for your applications accordingly. Most dynamic DNS services do this through desktop agents installed on your servers.

Assuming that you have built your architecture and made the necessary configuration, there are a few more things to consider:

  • Check with software vendors to see if this scenario is supported
  • To minimize downtime between IP changes, use a short TTL (time-to-live). With a short TTL, cached DNS records expire sooner, so users pick up the new IP address faster after the web server’s address changes. Note that a short TTL increases the number of DNS queries, which may affect network performance. Discuss it with your network team before making changes.
  • If possible, increase the lease duration of the IP addresses assigned to web servers in your DHCP to reduce the number of IP changes over time.

Recommended TTL value

The recommended TTL value when using a dynamic IP address on web servers depends on the DHCP lease duration in your network. If your lease duration is set to the default value (8 days for Microsoft DHCP), a TTL of 1 hour (the default for Microsoft DNS) is a good choice. If your lease duration is very short, such as 1 or 2 hours, then your TTL should be less than 1 hour. How low it can go depends on your network performance. If you need to lower the TTL below 1 hour, I recommend lowering it gradually while monitoring network performance.

A good rule of thumb is to make your DDNS TTL half the amount of your DHCP lease. If the IP address lease is set to 60 (1 minute), set your TTL to 30 (30 seconds). If the IP address is 3600 (1 hour), set your TTL to 1800 (30 minutes).

Source: 1&1
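
The rule of thumb above is simple arithmetic, sketched here in Python. The function name and the cap_seconds parameter are hypothetical. The 1-hour cap is my assumption, combining the quoted rule with the 1-hour TTL suggested earlier for long leases.

```python
def recommended_ttl(lease_seconds, cap_seconds=3600):
    """Return half of the DHCP lease duration, capped at cap_seconds.

    The cap is an assumption for illustration: it keeps the TTL at
    1 hour for long leases such as the 8-day default.
    """
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    half_lease = max(1, lease_seconds // 2)  # never return a TTL of 0
    return min(half_lease, cap_seconds)


print(recommended_ttl(60))             # 1-minute lease -> 30
print(recommended_ttl(3600))           # 1-hour lease   -> 1800
print(recommended_ttl(8 * 24 * 3600))  # 8-day lease    -> 3600
```

Treat the result as a starting point and adjust it while monitoring network performance.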

Using dynamic IP address in multihomed web servers

If your server is multihomed (connected to multiple networks), you will need multiple NICs (Network Interface Controllers) in your server. Each site hosted on your server can be bound to one NIC. If you have more sites than NICs, make sure to specify unique hostnames or port numbers in the site bindings so they don’t conflict with each other. If 2 sites have the same bindings, one of the sites will stop automatically.

Site bindings in IIS
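
Before applying bindings, a quick check for duplicates can help. The Python sketch below does not use any IIS API. It assumes you have listed each site's bindings yourself as (IP address, port, hostname) tuples, and the function name and sample site names are hypothetical. It only detects exact duplicates, which is a simplification of how IIS resolves bindings.

```python
from collections import defaultdict


def find_binding_conflicts(sites):
    """Return bindings that are claimed by more than one site.

    sites maps a site name to a list of (ip, port, hostname) tuples.
    Hostnames are compared case-insensitively.
    """
    owners = defaultdict(list)
    for site, bindings in sites.items():
        for ip, port, hostname in bindings:
            key = (ip, port, hostname.lower())
            if site not in owners[key]:
                owners[key].append(site)
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical sample data: SiteA and SiteB share the same binding.
sites = {
    "SiteA": [("*", 80, "")],
    "SiteB": [("*", 80, "")],
    "SiteC": [("*", 80, "shop.example.com")],
}
print(find_binding_conflicts(sites))
```

Here SiteC does not conflict because its hostname makes the binding unique, which matches the advice above to use unique hostnames or port numbers.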

Looking for a way to capture client IP address in your IIS logs? Check this post out.
