Troubleshooting Server Load Balancing¶
This section describes how to identify, troubleshoot, and resolve the most common issues encountered by users with server load balancing.
Connections not being balanced¶
Connections not being balanced is almost always a failure of the testing
methodology being used, and is usually specific to HTTP. Web browsers will
commonly keep connections to a web server open, and hitting refresh re-uses the
existing connection. A single connection will never be changed to another
balanced server. Another common issue is the web browser cache, where the
browser never actually requests the page again. It is preferable to use a
command line tool such as curl for testing of this nature, because it
ensures the test is not impacted by the problems inherent in testing with web
browsers. curl has no cache, and opens a new connection to the server each
time it is run. More information on curl can be found in
Verifying load balancing.
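As a minimal sketch of this kind of test, the following shell functions make repeated requests with curl and count how often each distinct response was seen. The address 198.51.100.10 is a placeholder for the virtual server, and the function names are hypothetical. The sketch also assumes each pool server returns a page whose first line identifies that server, which must be arranged on the servers for the test.

```shell
#!/bin/sh
# Placeholder virtual server URL (an assumption); override with VIP_URL=...
VIP_URL="${VIP_URL:-http://198.51.100.10/}"

# Fetch the first line of the page N times. Each curl run opens a new
# connection and uses no cache.
probe() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s --max-time 5 "$VIP_URL" | head -n 1
        i=$((i + 1))
    done
}

# Count identical lines read from stdin, most frequent first.
tally() {
    sort | uniq -c | sort -rn
}
```

Running `probe 20 | tally` prints one line per distinct response, prefixed by its count. If balancing is working and sticky connections are not enabled, more than one server should appear in the output.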
If sticky connections are enabled, ensure testing is performed from multiple source IP addresses. Tests from a single source IP address will go to a single server unless a long period of time elapses between connection attempts.
Down server not marked as offline¶
If a server goes down but is not marked as offline, it is because the monitoring performed by the load balancing daemon believes it is still up and running. If using a TCP monitor, the TCP port must still be accepting connections. The service on that port could be broken in numerous ways and still answer TCP connections. For ICMP monitors, this problem is exacerbated, as servers can be hung or crashed with no listening services at all and still answer to pings.
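The gap between "the port accepts connections" and "the service works" can be seen by comparing a bare TCP check with an actual HTTP request. The sketch below is illustrative only: the function names are hypothetical, the timeouts are arbitrary, and it does not represent how the load balancing daemon performs its own checks.

```shell
#!/bin/sh
# Succeeds if the TCP port accepts a connection, which is all a TCP check sees.
check_tcp() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Succeeds only if the web server answers with an HTTP status below 400.
check_http() {
    curl -sf --max-time 5 -o /dev/null "http://$1:$2/"
}

# Compare the two views of the same server.
diagnose() {
    if ! check_tcp "$1" "$2"; then
        echo "$1:$2 port closed or unreachable"
    elif ! check_http "$1" "$2"; then
        echo "$1:$2 port open but HTTP request failed"
    else
        echo "$1:$2 port open and HTTP request succeeded"
    fi
}
```

For example, `diagnose 10.6.0.12 80` reports which of the three cases applies. A server in the "port open but HTTP request failed" state would still look healthy to a check that only tests the TCP port.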
Live server not marked as online¶
If a server is online, but not marked as online, it is because it isn’t online from the perspective of the load balancing daemon monitors. The server must answer on the TCP port used or respond to pings sourced from the IP address of the firewall interface closest to the server.
For example, if the server is on the LAN, the server must answer requests initiated from the LAN IP address of the firewall. To verify this for ICMP monitors, browse to Diagnostics > Ping and ping the server IP address using the interface where the server is located.
For TCP monitors, use Diagnostics > Test Port, and choose the firewall’s LAN interface as the source, and the web server IP address and port as the target.
Another way to test is from a shell prompt on the firewall, either using the console or ssh menu option 8, and the nc command. Here is an example of a failed connection:
    # nc -vz 10.6.0.12 80
    nc: connect to 10.6.0.12 port 80 (tcp) failed: Operation timed out
And here is an example of a successful connection:
    # nc -vz 10.6.0.12 80
    Connection to 10.6.0.12 80 port [tcp/http] succeeded!
If the connection fails, troubleshoot further on the web server.
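When a pool contains several servers, the same nc test can be repeated for each one. The following is a rough sketch under stated assumptions: `check_pool` and the `PROBE` variable are hypothetical names, the 3 second timeout is an arbitrary choice, and 10.6.0.13 in the usage example is an invented second pool member.

```shell
#!/bin/sh
# Command used to probe a host and port; override PROBE to use another tool.
PROBE="${PROBE:-nc -z -w 3}"

# Print "up" or "down" for each "host port" pair read from stdin.
check_pool() {
    while read -r host port; do
        # $PROBE is intentionally unquoted so it splits into command and flags.
        # stdin is redirected so the probe cannot consume the remaining input.
        if $PROBE "$host" "$port" >/dev/null 2>&1 </dev/null; then
            echo "$host $port up"
        else
            echo "$host $port down"
        fi
    done
}
```

For example, `printf '10.6.0.12 80\n10.6.0.13 80\n' | check_pool` prints one status line per server. Any server reported as down from the firewall will also appear down to the monitors.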
Unable to reach a virtual server from a client in the same subnet as the pool server¶
Client systems in the same subnet as the pool servers will fail to properly
connect using this load balancing method. relayd forwards the connection to
the web server with the source address of the client intact. The server will
then try to respond directly to the client. If the server has a direct path to
the client, e.g. through a locally connected NIC in the same subnet, the reply
will not flow back through the firewall, and the client will receive the reply
from the server’s local IP address and not the virtual server IP address it
connected to. Because the source address of the reply is incorrect from the
perspective of the client, the connection is dropped as being invalid.
One way around this is to switch to manual outbound NAT and craft a rule so that traffic leaving the internal interface (LAN), coming from the LAN subnet and going to the web servers, is translated to the LAN interface address. That way the traffic appears to originate from the firewall, and the server responds to the firewall, which then relays the traffic back to the client using the expected addresses. The original client source IP address is lost in the process, but the only other viable solution is to move the servers to a different network segment.