IPsec Troubleshooting

Due to the finicky nature of IPsec, it isn’t unusual for trouble to arise. Thankfully there are some basic (and some not so basic) troubleshooting steps that can be employed to track down potential problems.

IPsec Logging

Examples presented in this chapter have logs edited for brevity but significant messages remain.

Logging for IPsec may be configured to provide more useful information. To configure IPsec logging for diagnosing tunnel issues pfSense, the following procedure yields the best balance of information:

  • Navigate to VPN > IPsec on the Advanced Settings tab
  • Set IKE SA, IKE Child SA, and Configuration Backend to Diag
  • Set all other log settings to Control
  • Click Save

Note

Changing logging options is not disruptive to IPsec activity and there is no need to enter a specific “debug mode” for IPsec on current versions of pfSense.

Tunnel does not establish

First check the service status at Status > Services. If the IPsec service is stopped, double check that it is enabled at VPN > IPsec. Also, if using mobile clients, ensure that on the Mobile clients tab, the enable box is also checked.

If the service is running, check the firewall logs (Status > System Logs, Firewall tab) to see if the connection is being blocked, and if so, add a rule to allow the blocked traffic. Rules are normally added automatically for IPsec, but that feature can be disabled.

The single most common cause of failed IPsec tunnel connections is a configuration mismatch. Often it is something small, such as a DH group set to 1 on side A and 2 on side B, or perhaps a subnet mask of /24 on one side and /32 on the other. Some routers (Linksys, for one) also like to hide certain options behind “Advanced” buttons or make assumptions. A lot of trial and error may be involved, and a lot of log reading, but ensuring that both sides match precisely will help the most.

Depending on the Internet connections on either end of the tunnel, it is also possible that a router involved on one side or the other does not properly handle IPsec traffic. This is a larger concern with mobile clients, and networks where NAT is involved outside of the actual IPsec endpoints. The problems are generally with the ESP protocol and problems with it being blocked or mishandled along the way. NAT Traversal (NAT- T) encapsulates ESP in UDP port 4500 traffic to work around these issues.

Tunnel establishes but no traffic passes

The top suspect if a tunnel comes up but won’t pass traffic is the IPsec firewall rules. If Site A cannot reach Site B, check the Site B firewall log and rules. Conversely, if Site B cannot contact Site A, check the Site A firewall log and rules. Before looking at the rules, inspect the firewall logs at Status > System Logs, on the Firewall tab. If blocked entries are present which involve the subnets used in the IPsec tunnel, then move on to checking the rules. If there are no log entries indicating blocked packets, revisit the section on IPsec routing considerations in Routing and gateway considerations.

Blocked packets on the IPsec or enc0 interface indicate that the tunnel itself has established but traffic is being blocked by firewall rules. Blocked packets on the LAN or other internal interface may indicate that an additional rule may be needed on that interface ruleset to allow traffic from the internal subnet out to the remote end of the IPsec tunnel. Blocked packets on WAN or OPT WAN interfaces would prevent a tunnel from establishing. Typically this only happens when the automatic VPN rules are disabled. Adding a rule to allow the ESP protocol and UDP port 500 from that remote IP address will allow the tunnel to establish. In the case of mobile tunnels, allow traffic from any source to connect to those ports.

Rules for the IPsec interface can be found under Firewall > Rules, on the IPsec tab. Common mistakes include setting a rule to only allow TCP traffic, which means things like ICMP ping and DNS would not work across the tunnel. See Firewall for more information on how to properly create and troubleshoot firewall rules.

In some cases it is possible that a setting mismatch can also cause traffic to fail passing the tunnel. In one instance, a subnet defined on one non-pfSense firewall was 192.0.2.1/24, and on the pfSense firewall it was 192.0.2.0/24. The tunnel established, but traffic would not pass until the subnet was corrected.

Routing issues are another possibility. Running a traceroute (tracert on Windows) to an IP address on the opposite side of the tunnel can help track down these types of problems. Repeat the test from both sides of the tunnel. Check the Routing and gateway considerations section in this chapter for more information. When using traceroute , traffic which enters and leaves the IPsec tunnel will seem to be missing some interim hops. This is normal, and part of how IPsec works. Traffic which does not properly enter an IPsec tunnel will appear to leave the WAN interface and route outward across the Internet, which would point to either a routing issue such as pfSense not being the gateway (as in Routing and gateway considerations), an incorrectly specified remote subnet on the tunnel definition, or to a tunnel which has been disabled.

Some hosts work, but not all

If traffic between some hosts over the VPN functions properly, but some hosts do not, this is commonly one of four things:

Missing, incorrect or ignored default gateway:
 If the device does not have a default gateway, or has one pointing to something other than the pfSense firewall, it does not know how to properly get back to the remote network on the VPN (see Routing and gateway considerations). Some devices, even with a default gateway specified, do not use that gateway. This has been seen on various embedded devices, including IP cameras and some printers. There isn’t anything that can be done about that other than getting the software on the device fixed. This can be verified by running a packet capture on the inside interface of the firewall connected to the network containing the device. Troubleshooting with tcpdump is covered in Using tcpdump from the command line, and an IPsec- specific example can be found in IPsec tunnel will not connect. If traffic is observed leaving the inside interface of the firewall, but no replies return, the device is not properly routing its reply traffic or could potentially be blocking it via a local client firewall.
Incorrect subnet mask:
 If the subnet in use on one end is 10.0.0.0/24 and the other is 10.254.0.0/24, and a host has an incorrect subnet mask of 255.0.0.0 or /8, it will never be able to communicate across the VPN because it thinks the remote VPN subnet is part of the local network and hence routing will not function properly. The system with the broken configuration will attempt to contact the remote system via ARP instead of using the gateway.
Host firewall:If there is a firewall on the target host, it may not be allowing the connections. Check for things like Windows Firewall, iptables, or similar utilities that may be preventing the traffic from being processed by the host.
Firewall rules on pfSense:
 Ensure the rules on both ends allow the desired network traffic.

Connection Hangs

IPsec does not gracefully handle fragmented packets. Many of these issues have been resolved over the years, but there may be some lingering problems. If hangs or packet loss are seen only when using specific protocols (SMB, RDP, etc.), MSS clamping for the VPN may be necessary. MSS clamping can be activated under VPN > IPsec on the Advanced Settings tab. On that screen, check Enable MSS clamping on VPN traffic and then enter a value. A good starting point would be 1400, and if that works slowly increase the MSS value until the breaking point is hit, then back off a little from there.

“Random” Tunnel Disconnects/DPD Failures on Embedded Routers

If IPsec tunnels are dropped on an ALIX or other embedded hardware that is pushing the limits of its CPU, DPD on the tunnel may need disabled. Such failures tend to correlate with times of high bandwidth usage. This happens when the CPU on a low-power system is tied up with sending IPsec traffic or is otherwise occupied. Due to the CPU overload it may not take the time to respond to DPD requests or see a response to a request of its own. As a consequence, the tunnel will fail a DPD check and be disconnected. This is a clear sign that the hardware is being driven beyond its capacity. If this happens, consider replacing the firewall with a more powerful model.

Tunnels Establish and Work but Fail to Renegotiate

In some cases a tunnel will function properly but once the phase 1 or phase 2 lifetime expires the tunnel will fail to renegotiate properly. This can manifest itself in a few different ways, each with a different resolution.

DPD Unsupported, One Side Drops but the Other Remains

Consider this scenario, which DPD is designed to prevent, but can happen in places where DPD is unsupported:

  • A tunnel is established from Site A to Site B, from traffic initiated at Site A.
  • Site B expires the phase 1 or phase 2 before Site A
  • Site A will believe the tunnel is up and continue to send traffic as though the tunnel is working properly.
  • Only when Site A’s phase 1 or phase 2 lifetime expires will it renegotiate as expected.

In this scenario, the two likely things resolutions are: Enable DPD, or Site B must send traffic to Site A which will cause the entire tunnel to renegotiate. The easiest way to make this happen is to enable a keep alive mechanism on both sides of the tunnel.

Tunnel Establishes When Initiating, but not When Responding

If a tunnel will establish sometimes, but not always, generally there is a mismatch on one side. The tunnel may still establish because if the settings presented by one side are more secure, the other may accept them, but not the other way around. For example if there is an Aggressive/Main mode mismatch on an IKEv1 tunnel and the side set for Main initiates, the tunnel will still establish. However, if the side set to Aggressive attempts to initiate the tunnel it will fail.

Lifetime mismatches do not cause a failure in Phase 1 or Phase 2.

To track down these failures, configure the logs as shown in IPsec Logging and attempt to initiate the tunnel from each side, then check the logs.

IPsec Log Interpretation

The IPsec logs available at Status > System Logs, on the IPsec tab contain a record of the tunnel connection process and some messages from ongoing tunnel maintenance activity. Some typical log entries are listed in this section, both good and bad. The main things to look for are key phrases that indicate which part of a connection worked. If “IKE_SA … established” is present in the log, that means phase 1 was completed successfully and a Security Association was negotiated. If “CHILD_SA … established” is present, then phase 2 has also been completed and the tunnel is up.

In the following examples, the logs have been configured as listen in IPsec Logging and irrelevant messages may be omitted. Bear in mind that these are samples and the specific ID numbers, IP addresses, and so forth will vary.

Successful Connections

When a tunnel has been successfully established both sides will indicate that an IKE SA and a Child SA have been established. When multiple Phase 2 definitions are present with IKEv1, a child SA is negotiated for each Phase 2 entry.

Log output from the initiator:

charon: 09[IKE] IKE_SA con2000[11] established between 192.0.2.90[192.0.2.90]...192.0.2.74[192.0.2.74]
charon: 09[IKE] CHILD_SA con2000{2} established with SPIs cf4973bf_i c1cbfdf2_o and TS 192.168.48.0/24|/0 === 10.42.42.0/24|/0

Log output from the responder:

charon: 03[IKE] IKE_SA con1000[19] established between 192.0.2.74[192.0.2.74]...192.0.2.90[192.0.2.90]
charon: 16[IKE] CHILD_SA con1000{1} established with SPIs c1cbfdf2_i cf4973bf_o and TS 10.42.42.0/24|/0 === 192.168.48.0/24|/0

Failed Connection Examples

These examples show failed connections for varying reasons. In most cases it’s clear from the examples that the initiator does not receive messages about specific items that do not match, so the responder logs are much more informative. This is done to protect the security of the tunnel, it would be insecure to provide messages to a potential attacker that would give them information about how the tunnel is configured.

Phase 1 Main / Aggressive Mismatch

In this example, the initiator is set for Aggressive mode while the responder is set for Main mode.

Log output from the initiator:

charon: 15[IKE] initiating Aggressive Mode IKE_SA con2000[1] to 203.0.113.5
charon: 15[IKE] received AUTHENTICATION_FAILED error notify
charon: 13[ENC] parsed INFORMATIONAL_V1 request 1215317906 [ N(AUTH_FAILED) ]
charon: 13[IKE] received AUTHENTICATION_FAILED error notify

Log output from the responder:

charon: 13[IKE] Aggressive Mode PSK disabled for security reasons
charon: 13[ENC] generating INFORMATIONAL_V1 request 2940146627 [ N(AUTH_FAILED) ]

Note that the logs on the responder state clearly that Aggressive mode is disabled, which is a good clue that the mode is mismatched.

In the reverse case, if the side set for Main mode initiates, the tunnel to a pfSense firewall will establish since Main mode is more secure.

Phase 1 Identifier Mismatch

When the identifier does not match, the initiator only shows that the authentication failed, but does not give a reason. The responder states that it is unable to locate a peer, which indicates that it could not find a matching Phase 1, which implies that no matching identifier could be located.

Log output from the initiator:

charon: 10[ENC] parsed INFORMATIONAL_V1 request 4216246776 [ HASH N(AUTH_FAILED) ]
charon: 10[IKE] received AUTHENTICATION_FAILED error notify

Log output from the responder:

charon: 12[CFG] looking for pre-shared key peer configs matching 203.0.113.5...198.51.100.3[someid]
charon: 12[IKE] no peer config found
charon: 12[ENC] generating INFORMATIONAL_V1 request 4216246776 [ HASH N(AUTH_FAILED) ]

Phase 1 Pre-Shared Key Mismatch

A mismatched pre-shared key can be a tough to diagnose. An error stating the fact that this value is mismatched is not printed in the log, instead this messages is shown:

Log output from the initiator:

charon: 09[ENC] invalid HASH_V1 payload length, decryption failed?
charon: 09[ENC] could not decrypt payloads
charon: 09[IKE] message parsing failed

Log output from the responder:

charon: 09[ENC] invalid ID_V1 payload length, decryption failed?
charon: 09[ENC] could not decrypt payloads
charon: 09[IKE] message parsing failed

When the above log messages are present, check the Pre-Shared Key value on both sides to ensure they match.

Phase 1 Encryption Algorithm Mismatch

Log output from the initiator:

charon: 14[ENC] parsed INFORMATIONAL_V1 request 3851683074 [ N(NO_PROP) ]
charon: 14[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 14[CFG] received proposals: IKE:AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
charon: 14[CFG] configured proposals: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
charon: 14[IKE] no proposal found
charon: 14[ENC] generating INFORMATIONAL_V1 request 3851683074 [ N(NO_PROP) ]

In this case, the log entry tells shows the problem exactly: The initiator was set for AES 128 encryption, and the responder is set for AES 256. Set both to matching values and then try again.

Phase 1 Hash Algorithm Mismatch

Log output from the initiator:

charon: 10[ENC] parsed INFORMATIONAL_V1 request 2774552374 [ N(NO_PROP) ]
charon: 10[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 14[CFG] received proposals: IKE:AES_CBC_256/MODP_1024
charon: 14[CFG] configured proposals: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
charon: 14[IKE] no proposal found
charon: 14[ENC] generating INFORMATIONAL_V1 request 2774552374 [ N(NO_PROP) ]

The Hash Algorithm is indicated by the HMAC portion of the logged proposals. As can be seen above, the received and configured propsals do not have matching HMAC entries.

Phase 1 DH Group Mismatch

Log output from the initiator:

charon: 11[ENC] parsed INFORMATIONAL_V1 request 316473468 [ N(NO_PROP) ]
charon: 11[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 14[CFG] received proposals: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_8192
charon: 14[CFG] configured proposals: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
charon: 14[IKE] no proposal found
charon: 14[ENC] generating INFORMATIONAL_V1 request 316473468 [ N(NO_PROP) ]

DH group is indicated by the “MODP” portion of the listed proposal. As indicated by the log messages, the initiator was set for 8192 (Group 18) and the responder was set for 1024 (Group 2). This error can be corrected by setting the DH group setting on both ends of the tunnel to a matching value.

Phase 2 Network Mismatch

In the following example, the Phase 2 entry on the initiator side is set for 10.3.0.0/24 to 10.5.0.0/24. The responder is not set to match as it lists 10.5.1.0/24 instead.

Log output from the initiator:

charon: 08[CFG] proposing traffic selectors for us:
charon: 08[CFG] 10.3.0.0/24|/0
charon: 08[CFG] proposing traffic selectors for other:
charon: 08[CFG] 10.5.0.0/24|/0
charon: 08[ENC] generating QUICK_MODE request 316948142 [ HASH SA No ID ID ]
charon: 08[NET] sending packet: from 198.51.100.3[500] to 203.0.113.5[500] (236 bytes)
charon: 08[NET] received packet: from 203.0.113.5[500] to 198.51.100.3[500] (76 bytes)
charon: 08[ENC] parsed INFORMATIONAL_V1 request 460353720 [ HASH N(INVAL_ID) ]
charon: 08[IKE] received INVALID_ID_INFORMATION error notify

Log output from the responder:

charon: 08[ENC] parsed QUICK_MODE request 2732380262 [ HASH SA No ID ID ]
charon: 08[CFG] looking for a child config for 10.5.0.0/24|/0 === 10.3.0.0/24|/0
charon: 08[CFG] proposing traffic selectors for us:
charon: 08[CFG] 10.5.1.0/24|/0
charon: 08[CFG] proposing traffic selectors for other:
charon: 08[CFG] 10.3.0.0/24|/0
charon: 08[IKE] no matching CHILD_SA config found
charon: 08[IKE] queueing INFORMATIONAL task
charon: 08[IKE] activating new tasks
charon: 08[IKE] activating INFORMATIONAL task
charon: 08[ENC] generating INFORMATIONAL_V1 request 1136605099 [ HASH N(INVAL_ID) ]

In the responder logs it lists both the networks it received (“child config” line in the log) and what it has configured locally (“proposing traffic selectors for…” lines in the log). By comparing the two, a mismatch can be spotted. The “no matching CHILD_SA config found” line in the log will always be present when this mismatch occurs, and that directly indicates that it could not find a Phase 2 definition to match what it received from the initiator.

Phase 2 Encryption Algorithm Mismatch

Log output from the initiator:

charon: 14[CFG] configured proposals: ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ
charon: 14[ENC] generating QUICK_MODE request 759760112 [ HASH SA No ID ID ]
charon: 14[NET] sending packet: from 198.51.100.3[500] to 203.0.113.5[500] (188 bytes)
charon: 14[NET] received packet: from 203.0.113.5[500] to 198.51.100.3[500] (76 bytes)
charon: 14[ENC] parsed INFORMATIONAL_V1 request 1275272345 [ HASH N(NO_PROP) ]
charon: 14[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 13[CFG] selecting proposal:
charon: 13[CFG] no acceptable ENCRYPTION_ALGORITHM found
charon: 13[CFG] received proposals: ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ
charon: 13[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ
charon: 13[IKE] no matching proposal found, sending NO_PROPOSAL_CHOSEN
charon: 13[IKE] queueing INFORMATIONAL task
charon: 13[IKE] activating new tasks
charon: 13[IKE] activating INFORMATIONAL task
charon: 13[ENC] generating INFORMATIONAL_V1 request 1275272345 [ HASH N(NO_PROP) ]

In this case, the initiator receives a message that the responder could not find a suitable proposal (“received NO_PROPOSAL_CHOSEN”), and from the responder logs it is obvious this was due to the sites being set for different encryption types, AES 128 on one side and AES 256 on the other.

Phase 2 Hash Algorithm Mismatch

Log output from the initiator:

charon: 10[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA2_512_256/NO_EXT_SEQ
charon: 10[ENC] generating QUICK_MODE request 2648029707 [ HASH SA No ID ID ]
charon: 10[NET] sending packet: from 198.51.100.3[500] to 203.0.113.5[500] (188 bytes)
charon: 10[NET] received packet: from 203.0.113.5[500] to 198.51.100.3[500] (76 bytes)
charon: 10[ENC] parsed INFORMATIONAL_V1 request 757918402 [ HASH N(NO_PROP) ]
charon: 10[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 11[CFG] selecting proposal:
charon: 11[CFG] no acceptable INTEGRITY_ALGORITHM found
charon: 11[CFG] received proposals: ESP:AES_CBC_256/HMAC_SHA2_512_256/NO_EXT_SEQ
charon: 11[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ
charon: 11[IKE] no matching proposal found, sending NO_PROPOSAL_CHOSEN
charon: 11[IKE] queueing INFORMATIONAL task
charon: 11[IKE] activating new tasks
charon: 11[IKE] activating INFORMATIONAL task
charon: 11[ENC] generating INFORMATIONAL_V1 request 757918402 [ HASH N(NO_PROP) ]

Similar to a Phase 1 Hash Algorithm mismatch, the HMAC values in the log entries do not line up. However the responder also logs a clearer message “no acceptable INTEGRITY_ALGORITHM found” when this happens in Phase 2.

Phase 2 PFS Mismatch

Log output from the initiator:

charon: 06[ENC] generating QUICK_MODE request 909980434 [ HASH SA No KE ID ID ]
charon: 06[NET] sending packet: from 198.51.100.3[500] to 203.0.113.5[500] (444 bytes)
charon: 06[NET] received packet: from 203.0.113.5[500] to 198.51.100.3[500] (76 bytes)
charon: 06[ENC] parsed INFORMATIONAL_V1 request 3861985833 [ HASH N(NO_PROP) ]
charon: 06[IKE] received NO_PROPOSAL_CHOSEN error notify

Log output from the responder:

charon: 08[CFG] selecting proposal:
charon: 08[CFG] no acceptable DIFFIE_HELLMAN_GROUP found
charon: 08[CFG] received proposals: ESP:AES_CBC_256/HMAC_SHA1_96/MODP_2048/NO_EXT_SEQ
charon: 08[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ
charon: 08[IKE] no matching proposal found, sending NO_PROPOSAL_CHOSEN
charon: 08[ENC] generating INFORMATIONAL_V1 request 3861985833 [ HASH N(NO_PROP) ]

Perfect Forward Secrecy (PFS) works like DH Groups on Phase 1, but is optional. When chosen PFS options do not match, a clear message is logged indicating this fact: “no acceptable DIFFIE_HELLMAN_GROUP found”.

Note

In some cases, if one side has PFS set to off , and the other side has a value set, the tunnel may still establish and work. The mismatch shown above may only be seen if the values mismatch, for example 1 vs. 5.

Mismatched Identifier with NAT

In this case, pfSense is configured for a Peer Identifier of Peer IP address, but the remote device is actually behind NAT. In this case strongSwan expects the actual private before-NAT IP address as the identifier. The racoon daemon used on older versions was much more relaxed and would match either address, but strongSwan is more formal and requires a correct match.

Log output from the responder:

   charon: 10[IKE] remote host is behind NAT
   charon: 10[IKE] IDir '192.0.2.10' does not match to '203.0.113.245'
[...]
   charon: 10[CFG] looking for pre-shared key peer configs matching 198.51.100.50...203.0.113.245[192.0.2.10]

To correct this condition, change the Peer Identifier setting to IP Address and then enter the pre-NAT IP address, which in this example is 192.0.2.10.

Disappearing Traffic

If IPsec traffic arrives but never appears on the IPsec interface (enc0), check for conflicting routes/interface IP addresses. For example, if an IPsec tunnel is configured with a remote network of 192.0.2.0/24 and there is a local OpenVPN server with a tunnel network of 192.0.2.0/24 then the ESP traffic may arrive, strongSwan may process the packets, but they never show up on enc0 as arriving to the OS for delivery.

Resolve the duplicate interface/route and the traffic will begin to flow.

IPsec Status Page Issues

If the IPsec status page prints errors such as:

Warning: Illegal string offset 'type' in /etc/inc/xmlreader.inc on line 116

That is a sign that the incomplete xmlreader XML parser is active, which is triggered by the presence of the file /cf/conf/use_xmlreader. This alternate parser can be faster for reading large config.xml files, but lacks certain features necessary for other areas to function well. Removing /cf/conf/use_xmlreader will return the system to the default parser immediately, which will correct the display of the IPsec status page.