How Do I Block Access to a Web Site?

A question we are asked very often is “How do I block access to a web site?”, or more precisely, “How do I block access to Facebook?” It is not always an easy question to answer. There are several possible tactics to accomplish the goal, some of which are discussed elsewhere in the book.

Using DNS

If the built-in DNS Resolver or Forwarder is active, an override can be entered there to resolve the unwanted website to an unusable IP address such as 127.0.0.1, which points each client back at itself.
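As a rough sketch of what such an override can look like, the helper below prints Unbound `local-zone` and `local-data` lines that answer a domain and all of its subdomains with 127.0.0.1. It assumes the DNS Resolver (Unbound) is in use and that the output is pasted into its custom options. The function name `unbound_block` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Print Unbound options that answer every query for a domain and its
# subdomains with 127.0.0.1. unbound_block is a hypothetical helper name.
unbound_block() {
    for domain in "$@"; do
        printf 'local-zone: "%s" redirect\n' "$domain"
        printf 'local-data: "%s A 127.0.0.1"\n' "$domain"
    done
}

# Example: emit a server: clause covering one domain.
echo "server:"
unbound_block facebook.com
```

A plain host override only covers the exact hostname entered, so a zone-level redirect like this may be the better fit when a site uses many subdomains.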

Using Firewall Rules

If a website rarely changes IP addresses, access to it can be blocked by creating an alias containing its IP addresses and then using this alias in firewall rules. This is not a feasible solution for sites that return low TTLs and spread the load across many servers and/or datacenters, such as Google and similar very large sites. Most small to mid-sized websites can be effectively blocked using this method as they rarely change IP addresses.

A hostname can also be placed inside a network alias. The hostname will be resolved periodically and the alias updated as needed. This is more effective than manually looking up the IP addresses, but it will still fall short if the site returns DNS records that change rapidly or randomizes results from a pool of servers on each query, which is common for large sites.

Another option is finding all of a site’s IP subnet allocations, creating an alias with those networks, and blocking traffic to those destinations. This is especially useful with sites such as Facebook that are spread across large amounts of IP space but are constrained within a few net blocks. Using regional registry sites such as ARIN can help track down those networks. For example, all of the networks used by Facebook in the region covered by ARIN can be found at http://whois.arin.net/rest/org/THEFA-3.html under “Related Networks”. Companies may have other addresses in different regions, so check other regional registries as well, such as RIPE and APNIC.

As an alternative to looking up the IP blocks manually, locate the target company’s BGP Autonomous System (AS) number by doing a whois lookup on one of their IP addresses, then use that to find all of their allocations. For example, Facebook’s AS number is AS32934:

# whois -h whois.radb.net -- '-i origin AS32934' | awk '/^route:/ {print $2}' | sort -u

Copy the results of that command into a new alias and it will cover all of their currently allocated networks. Check the results periodically for updates.
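The command above matches only `route:` lines, which carry IPv4 prefixes. If IPv6 also needs to be covered, a small variation can pick up `route6:` objects as well. The following is a minimal sketch, not a definitive tool. The helper name `extract_prefixes` is hypothetical, and it only filters whois output supplied on standard input.

```shell
#!/bin/sh
# Read RADb whois output on stdin and print the unique IPv4 (route:)
# and IPv6 (route6:) prefixes, one per line.
# extract_prefixes is a hypothetical helper name.
extract_prefixes() {
    awk '/^route6?:/ {print $2}' | sort -u
}

# Usage (requires network access and the whois client):
#   whois -h whois.radb.net -- '-i origin AS32934' | extract_prefixes
```

Keeping the filter separate from the lookup makes it easy to save the raw whois output and compare prefix lists between periodic checks.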

The pfBlocker package offers mechanisms which can be useful in this area, such as DNSBL, geographic IP address blocking, and automation of the AS lookup process.

Using a Proxy

If web traffic flows through a proxy server, that proxy server can likely be used to prevent access to such sites. For example, Squid has an add-on called SquidGuard which allows for blocking web sites by URL or other similar criteria. A very brief introduction to Squid and SquidGuard can be found in A Brief Introduction to Web Proxies and Reporting: Squid, SquidGuard, and Lightsquid.

Prevent Bypassing Restrictions

With any of the above methods, there are many ways to get around the defined blocks. The easiest and likely most prevalent is using any number of proxy websites. Finding and blocking all of these individually and keeping the list up to date is impossible. The best way to ensure these sites are not accessible is to use an external proxy or content filter capable of blocking by category.

To further maintain control, use a restrictive egress ruleset and only allow traffic out to specific services and/or hosts. For example, only allow DNS access to the firewall or the DNS servers specifically used for LAN clients. Also, if a proxy is in use on the network, make sure to disallow direct access to HTTP and HTTPS through the firewall and only allow traffic to and/or from the proxy server.
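One way to spot-check an egress ruleset like the one described above is to attempt the forbidden traffic from a LAN client and confirm that it fails. The sketch below assumes `dig` and `curl` are installed on the client. The helper name `expect_blocked`, the outside resolver 8.8.8.8, and the target example.com are illustrative choices, not part of any firewall configuration.

```shell
#!/bin/sh
# Run a command and report whether it succeeded. For an egress test,
# success means the traffic got out, which is the undesired result.
# expect_blocked is a hypothetical helper name.
expect_blocked() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OPEN: $label"
    else
        echo "blocked: $label"
    fi
}

# Usage from a LAN client (requires dig and curl):
#   expect_blocked "DNS to outside resolver" dig +time=2 +tries=1 @8.8.8.8 example.com
#   expect_blocked "direct HTTPS" curl --silent --max-time 5 --output /dev/null https://example.com/
```

If the firewall transparently redirects DNS or web traffic rather than dropping it, these checks may report OPEN even though the traffic is being intercepted, so treat the output as a hint rather than proof.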