
Debugging IPsec VPNs: Questions and answers

SearchSecurity recently invited networking expert Lisa Phifer to speak about troubleshooting IPsec VPNs. We ran out of time during the Webcast for her to answer several questions from the audience, so she answers those questions here. Phifer is the vice president of Core Competence.

I've heard that I should use the VPN client sold by my VPN gateway vendor, but we would rather use the VPN client included with Windows. Can we do this?
The good news is that many vendors have simplified VPN client installation, policy update, user authentication, and addressing. The bad news is that vendor customizations can inhibit plug-and-play interoperability. You probably can use a third-party VPN client -- but you might have to forgo value-added features to do so.

With the Windows VPN client, there are two possible IPsec scenarios:

If your VPN gateway supports L2TP over IPsec, Win2K and XP users will have little to configure, but you'll have to install client software on other Win32 operating systems. In addition, the default Windows policy requires every PC to have its own certificate.

If your VPN gateway supports IPsec, but not L2TP, you may still be able to work with the IPsec built into Win2K and XP. This requires explicit policy configuration and static IPs on your Windows clients. Many VPN vendors publish "how to" guides to help you.

Our VPN client connections work for about an hour, but then they seem to get "stalled." Where would you start looking for this problem?
A tunnel that breaks consistently after a given interval is probably hitting some kind of timeout. Start with the tunnel lifetime negotiated between your server and client. When a tunnel's soft lifetime is encountered, IKE initiates a new IPsec SA with new keys. When the tunnel's hard lifetime expires, the old SA must be disconnected. Hard lifetime is usually configurable, but a few implementations let you configure both. If you have a rekey problem, try adjusting the lifetime at just one end to see if that makes a difference.
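One rough way to test the timeout theory is to compare how long sessions last before stalling against the timers you know about. The sketch below is illustrative only: the function name `likely_timeout`, the timer names, and the values (3600 s, 28800 s, 300 s) are assumptions, not settings from any particular product.

```python
from statistics import mean, pstdev


def likely_timeout(durations_s, timers_s, tolerance=0.05):
    """Return the name of the timer that best matches the observed durations.

    durations_s: seconds each session lasted before stalling.
    timers_s:    hypothetical mapping of timer name -> configured seconds.
    Returns None when the stalls are not consistent or no timer is close.
    """
    if not durations_s:
        return None
    average = mean(durations_s)
    # A widely varying stall interval does not look like a fixed timer.
    if pstdev(durations_s) > tolerance * average:
        return None
    best = None
    for name, value in timers_s.items():
        gap = abs(average - value)
        if gap <= tolerance * value:
            if best is None or gap < abs(average - timers_s[best]):
                best = name
    return best


# Assumed example values, for illustration only.
timers = {"ipsec_sa_hard_lifetime": 3600, "ike_sa_lifetime": 28800, "nat_idle": 300}
suspect = likely_timeout([3590, 3605, 3598], timers)
print(suspect)
```

If a timer stands out, change that lifetime at one end only and see whether the stall interval moves with it.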

To conserve resources, inactive SAs may be forcibly disconnected by an idle timer running at either endpoint. A timer could also exist somewhere in between, like a NAT binding that expires. If idle timers are bringing your client down prematurely, you can verify this by sending a "keep alive" -- for example, pinging a destination on the far side of the tunnel once per minute.
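The keep-alive test above can be scripted. This is a minimal sketch, assuming the system `ping` utility is on the path; the target address 10.0.0.5 is a placeholder, and the `run` and `sleep` parameters exist only so the loop can be exercised without real network traffic.

```python
import platform
import subprocess
import time


def ping_command(host, system=None):
    """Build a single-echo ping command (-n on Windows, -c elsewhere)."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def keep_alive(host, interval_s=60, rounds=None, run=subprocess.run, sleep=time.sleep):
    """Ping host every interval_s seconds; return one True/False per ping.

    rounds=None loops until interrupted.
    """
    results = []
    sent = 0
    while rounds is None or sent < rounds:
        completed = run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results.append(completed.returncode == 0)
        sent += 1
        if rounds is None or sent < rounds:
            sleep(interval_s)
    return results


if __name__ == "__main__":
    # Placeholder address on the far side of the tunnel.
    keep_alive("10.0.0.5", interval_s=60)
```

If the tunnel stays up while this runs but stalls without it, an idle timer is the likely cause.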

Our VPN tunnels work fine with preshared secrets; what new problems are we likely to encounter when we upgrade to certificates?
Initial debugging for certificate authentication can be harder than preshared secrets, but after certificates are in place, they will not be more problematic -- in fact, they'll be easier to change over time.

When upgrading to certificates, you'll need to set up or contract with a certificate authority. You'll need to get a root certificate for the CA, issue certificates for every user or device, then deliver and install certificates and key pairs. Many VPN products comply with PKI standards -- for example, by generating PKCS#10 request files or sending on-line certificate requests. But interoperability issues are not unusual, so start with a CA that's already been tested with your VPN product(s).

Before you generate certificates, consider how users and devices will be identified. For example, you can identify gateways by IP address, hostname, or both. You need to know which ID types your VPN gateway will support when certificates are used. Also consider constraints imposed by your VPN gateway or client on private keys. Some VPN gateways can only use private keys they generate themselves, while others import "personal certificates" that include the public-private key pair.

Also, think about what happens when employees leave the company or a laptop with a stored certificate gets stolen. CAs put expiration dates on certificates and generate Certificate Revocation Lists (CRLs). Gateways check expiration dates during IKE authentication, and should also check CRLs to be sure the cert has not been revoked. But not all products check CRLs or behave the same way when they cannot get the CRL. These are some of the issues to consider when adding certificate authentication.

What might cause a VPN user to be suddenly booted from inside the VPN router (pinging server shows 10.0.0.x) to outside the VPN router (pinging the same server shows 216.n.n.n)?
You tell me that this user is tunneling between the Windows IPsec client and an access router at your office. The router's public IP is 216.n.n.n, and it is using NAT to hide your 10.0.0.x private subnet. Your IPsec policies were hand-configured into every user's PC. Only one user is experiencing this problem. When he pings 10.0.0.x, he gets a full set of ping success responses from 216.n.n.n. This implies that requests are reaching the server over IPsec but responses are being returned over the Internet. The source address in response packets is being updated by NAT.

Since this occurs consistently for just one client, I suspect a policy configuration error on that PC. Make sure the policy is not permitting incoming ICMP without IPsec, and is not accepting cleartext when IPsec cannot be negotiated. Look for asymmetry between outbound and inbound policies - anything that might cause the tunnel to be established successfully in one direction but not the other. Using MMC, find Local Computer Policy / Computer Configuration / Windows Settings / Security Settings / Local Policies / Audit Policy, enable auditing for logon and object access events, reboot, then use the Event Viewer to see IPsec success and failure events on that user's PC.
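The two policy checks described above (inbound cleartext permits, and asymmetry between directions) can be expressed as a small comparison. This sketch uses a simplified, hypothetical filter model; the `Filter` fields and action names are assumptions for illustration and are not the Windows IPsec policy format.

```python
from typing import NamedTuple


class Filter(NamedTuple):
    src: str
    dst: str
    protocol: str
    action: str  # hypothetical values: "require_ipsec", "permit_clear", "block"


def mirror(f):
    """Return the same filter with source and destination swapped."""
    return f._replace(src=f.dst, dst=f.src)


def unmirrored_filters(outbound, inbound):
    """Outbound filters that have no matching inbound filter in the reverse direction."""
    inbound_set = set(inbound)
    return [f for f in outbound if mirror(f) not in inbound_set]


def cleartext_inbound(inbound):
    """Inbound filters that accept traffic without IPsec."""
    return [f for f in inbound if f.action == "permit_clear"]


# Assumed example policy for the one misbehaving PC.
outbound = [Filter("client", "10.0.0.0/24", "any", "require_ipsec")]
inbound = [
    Filter("10.0.0.0/24", "client", "any", "require_ipsec"),
    Filter("any", "client", "icmp", "permit_clear"),  # lets ping replies in unprotected
]

print(unmirrored_filters(outbound, inbound))
print(cleartext_inbound(inbound))
```

Any filter reported by either function is a candidate for the one-way behavior described in the question.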

If this occurred sporadically, I would suspect a problem on the access router. For example, could the tunnel be torn down right after the ping request is received over IPsec? If the tunnel is no longer up when the server returns the ping response, that response might "leak" back over the Internet. (This could also happen consistently if you had two routers and the server was returning the response back through the wrong router.) Because your access router does not provide you with interface counters or a debug log, I would start by sniffing packets on the public side of the router, looking for incoming ESP, followed by outgoing ICMP.

We have established an IPsec VPN using Cisco routers and OS. The tunnel seems to be functioning; however, our Citrix Term Server can no longer ping or browse across the WAN as it could before we had a VPN.
After further discussion, it is my understanding that you have two offices with a point-to-point T1 between them. Each office also has an Internet uplink used for unprotected Internet browsing and for secure VPN tunneling to an application hosted at a third-party site. When the VPN tunnel was added to the Internet uplink, interoffice traffic stopped flowing consistently over the T1. In your words, "Sometimes we get there and sometimes not."

This feels like a routing problem. You are using a dynamic routing protocol between offices over your T1. When you added the VPN, each office gained a new route to the third-party site. Perhaps these two routers incorrectly believe they now have a better route to each other through the third party router, over the VPN?
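To see how a newly learned route can pull inter-office traffic off the T1, consider a simplified route selection: longest prefix wins, then the lowest preference value. This is a sketch under stated assumptions. The prefixes, interface names, and the single `metric` number are hypothetical; real routers rank routes by administrative distance and then by protocol metric, which this model collapses into one value.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str
    interface: str
    metric: int  # simplified preference: lower wins


def select_route(routes, destination):
    """Pick the route a router would use: longest prefix, then lowest metric."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    return max(
        matches,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, -r.metric),
    )


# Assumed example: the remote office LAN is 192.168.2.0/24.
learned = [
    Route("192.168.2.0/24", "t1", 20),
    Route("192.168.2.0/24", "vpn", 10),  # route learned by way of the third-party router
]
before = select_route(learned, "192.168.2.10")

# A preferred static route forces inter-office traffic back onto the T1.
after = select_route(learned + [Route("192.168.2.0/24", "t1", 1)], "192.168.2.10")
print(before.interface, after.interface)
```

In this example the VPN path is chosen until the static route is added, after which traffic returns to the T1.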

I would start by examining the routing tables of both routers. I would look at interface counters to see whether inter-office packets are heading out the T1 or the VPN. If inter-office packets are heading out the wrong interface, I might try adding static routes to force inter-office traffic over the T1. If that cured the problem, I would look more closely at how your dynamic routing protocol is configured.

Our VPN clients can connect to our network, but they cannot browse shared resources in our Windows domain. How can we fix this?
The NetBIOS broadcast traffic used by Windows PCs to broadcast PC and share names was designed for use in a local area network. Some gateways provide an option to relay NetBIOS broadcasts over VPN tunnels, but most do not.

If your gateway does not relay NetBIOS, there are other alternatives. The client may connect directly to a server on the far side of the tunnel by using IP address instead of name. Alternatively, the server's name and IP address can be defined in the lmhosts file. Or the name and IP address of the primary domain controller can be included in the lmhosts file, so that the client can look up other names. These solutions may not let the client browse the network neighborhood, but they will let the client connect to network shares across a VPN tunnel. The tunnel security policy must of course permit TCP and UDP on ports used by Windows networking.

I want to configure my VPN gateway to accept tunnels from teleworker firewalls, but I'm having trouble because they get dynamic IP addresses with DHCP. Can we make this work and what are the limitations?
Dynamically-addressed teleworker firewalls cannot use the simplest IKE configuration of main mode with preshared secrets, identified by IP address. You'll need to use another ID type - for example, fully qualified domain name (FQDN, the firewall's hostname) or user-FQDN (email address).

With preshared secrets, both of these ID types can only be used in IKE aggressive mode. Most gateways support VPN clients identified by email address in aggressive mode. Some gateways support site-to-site policies that identify peers by hostname instead of IP address. Either way, the VPN tunnel will only be able to come up if the teleworker initiates contact. Your central gateway will not have an IP address to initiate contact and can only be a responder.
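The constraints above can be summarized as a small lookup. This is only a sketch of the rules stated in this answer for preshared secrets; the function name `plan_psk_tunnel` and the ID type labels are hypothetical and do not correspond to any vendor's configuration syntax.

```python
# IKE phase 1 mode required for each ID type when preshared secrets are used,
# as described in the answer above.
PSK_MODE_BY_ID_TYPE = {
    "ip_address": "main",
    "fqdn": "aggressive",
    "user_fqdn": "aggressive",
}


def plan_psk_tunnel(id_type, peer_has_static_ip):
    """Return the IKE mode and whether the central gateway can initiate."""
    if id_type not in PSK_MODE_BY_ID_TYPE:
        raise ValueError(f"unknown ID type: {id_type}")
    if id_type == "ip_address" and not peer_has_static_ip:
        raise ValueError("a dynamically addressed peer cannot be identified by IP address")
    return {
        "ike_mode": PSK_MODE_BY_ID_TYPE[id_type],
        # Without a known peer address the gateway can only respond.
        "gateway_can_initiate": peer_has_static_ip,
    }


teleworker = plan_psk_tunnel("fqdn", peer_has_static_ip=False)
print(teleworker)
```

For a DHCP-addressed teleworker firewall identified by hostname, this yields aggressive mode with the central gateway acting as responder only.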
