A common issue when blocking or allowing access to a site (for example, www.example.newsite.com) is that the other domains the site depends on also need to be accounted for.
If you wanted to block/allow www.bostonglobe.com for instance, it would be great to just put bostonglobe.com in a list.
Unfortunately it's not that simple.
If we run a packet capture on the DNS queries made while loading www.sfgate.com, we see that the following DNS records are needed to render the site completely:
Most modern websites pull some of their content from other domains, and some of those domains may fall into a category you have blocked. When that happens, elements of the page won't load properly, or the page will render without formatting.
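To illustrate the gap, here is a minimal Python sketch that compares the hostnames a page needs against an allow list and reports the ones that are not covered. The hostnames and the list are hypothetical placeholders, and the sketch assumes that a listed domain also covers all of its subdomains, which may not match how your filter behaves.

```python
def is_covered(hostname, allowed_domains):
    """Return True if hostname equals, or is a subdomain of, an allowed domain.

    Assumption: a listed domain covers all of its subdomains.
    """
    hostname = hostname.lower().rstrip(".")
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in allowed_domains
    )


def uncovered_hosts(required_hosts, allowed_domains):
    """List the required hostnames that no allow-list entry covers."""
    allowed = {d.lower().rstrip(".") for d in allowed_domains}
    return sorted(h for h in required_hosts if not is_covered(h, allowed))


# Hypothetical hostnames a page might request while rendering.
required = [
    "www.example.com",
    "static.example.com",
    "cdn.example.net",
    "fonts.example.org",
]
# Hypothetical allow list containing only the site's own domain.
allowed = ["example.com"]

print(uncovered_hosts(required, allowed))
# → ['cdn.example.net', 'fonts.example.org']
```

Even with the site's own domain allowed, the two third-party hostnames are still blocked, which is why the page would render incompletely.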
An easy way to find the domains required is to use Google Chrome.
This browser has a DNS prefetching feature that logs the hostnames it looks up.
More detailed information about the feature is available here: http://www.chromium.org/developers/design-documents/dns-prefetching
Once the feature is turned on in your browser (it's most likely on for you right now), you can visit the site you want to collect information about.
After the site has completely rendered (all elements are downloaded), you can enter the following into the URL bar in the browser:
Now use CTRL-F to find the domain you are looking for. I've used www.bostonglobe.com as an example.
These entries would be required in order to Allow this site to render completely:
This issue is especially common in whitelist-only mode. When everything is blocked by default, it takes a bit of work to get access just right.
Packet captures are also an effective way to find this information. Wireshark is a robust free tool that can easily filter DNS traffic.
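In Wireshark, the display filter `dns` limits the view to DNS traffic. If you would rather work from a text file, one possible approach (an assumption, not part of the original procedure) is to export the queried names with tshark, for example `tshark -r capture.pcap -Y "dns.flags.response == 0" -T fields -e dns.qry.name > queries.txt`, where `capture.pcap` and `queries.txt` are hypothetical file names. The Python sketch below then counts how often each hostname appears, so you can see which domains the site needs. The sample data is made up.

```python
from collections import Counter


def summarize_queries(lines):
    """Count how often each hostname was queried, ignoring blanks and case."""
    names = (line.strip().lower().rstrip(".") for line in lines)
    return Counter(name for name in names if name)


# Hypothetical sample standing in for the contents of queries.txt.
sample = """\
www.example.com
static.example.com
www.example.com

CDN.example.net
"""

counts = summarize_queries(sample.splitlines())
for name, count in sorted(counts.items()):
    print(f"{count:3d}  {name}")
```

To run it against a real export, replace `sample.splitlines()` with the lines read from your own file.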