These days a lot of bloggers have been writing about Google's new API, the Google Safe Browsing API. It is an experimental API that lets client applications check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
I think this is a cool API, and if you want to take advantage of it, you already have it in Firefox. But I like Internet Explorer, so I wanted to use Safe Browsing with it too.
If you take a look at the documentation, you will see that using it is actually very easy. The idea is that you download a list of URL hashes that are marked as phishing or malware; then you only need to check the URLs you want to validate against these hash tables. I'm not going to go into too much detail, since you can read the guide that Google provides.
What I've built is, let's say, a proof of concept showing that it is possible to use Google Safe Browsing with IE. The implementation is written in C#, and the code you can download consists of two main parts:
- A Windows Service that hosts a WCF Service containing the logic that performs the updates and validates URLs against the hash tables.
- A BHO (Browser Helper Object) that queries the WCF Service.
The BHO is configured as an add-on for IE. It just handles the BeforeNavigate2 event and checks the URL by calling the published service: if the URL is safe, nothing happens; otherwise, the navigation is canceled and a warning message is displayed.
The code for the BHO is pretty simple. The first thing you need to do is declare the IObjectWithSite COM interface so that it is available to .NET:
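```csharp
using System;
using System.Runtime.InteropServices;

// IObjectWithSite is the COM interface every BHO implements: IE passes the
// browser object to SetSite when the BHO is loaded, and null when it is unloaded.
[ComImport]
[Guid("FC4801A3-2BA9-11CF-A229-00AA003D7352")] // IID_IObjectWithSite
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IObjectWithSite
{
    void SetSite([In, MarshalAs(UnmanagedType.IUnknown)] Object pUnkSite);
    void GetSite(ref Guid riid, [MarshalAs(UnmanagedType.IUnknown)] out Object ppvSite);
}
```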
The other interesting part is how to cancel the navigation and replace the contents of the HTML document with our own HTML:
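```csharp
void webBrowser_BeforeNavigate2(object pDisp, ref object URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers, ref bool Cancel)
{
    // Ask the WCF service to classify the URL. "serviceClient" is a placeholder
    // name for the generated service proxy; the original call is not shown.
    UrlValidationResult result = serviceClient.ValidateUrl(new Uri(URL.ToString()));

    if (result == UrlValidationResult.Malware || result == UrlValidationResult.BlackList)
    {
        IHTMLDocument2 doc = webBrowser.Document as IHTMLDocument2;

        if (doc != null)
        {
            // Replace the current page with the matching warning text.
            doc.writeln(result == UrlValidationResult.Malware ? Resources.MalwareWarning : Resources.BlackWarning);
            doc.close();
        }

        // Stop IE from loading the flagged URL.
        Cancel = true;
    }
}
```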
The WCF Service exposes only two methods: "public void Update()" and "public UrlValidationResult ValidateUrl(Uri uri)". The service downloads the hash tables and keeps them up to date through incremental updates, and it contains the logic that validates URLs against these tables. The tables are stored in isolated storage so that they don't have to be downloaded from the Internet every time; nevertheless, they are loaded and operated on in memory.
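As a rough illustration of keeping a table in memory while persisting it between runs, here is a minimal sketch. `HashList`, `ApplyUpdate`, `Save` and `Load` are hypothetical names rather than the author's code, and the single integer version is an assumed simplification of whatever versioning the update protocol actually uses. The storage methods take a plain `Stream`, so the same code could be pointed at isolated storage.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical in-memory table of MD5 hashes (not the author's class).
// Assumption: one integer version number per table.
public sealed class HashList
{
    // Case-insensitive so "AB12..." and "ab12..." are treated as the same hash.
    private readonly HashSet<string> hashes = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public int Version { get; private set; }
    public int Count { get { return hashes.Count; } }

    // Incremental update: drop removed hashes, add new ones, remember the new version.
    public void ApplyUpdate(int newVersion, IEnumerable<string> added, IEnumerable<string> removed)
    {
        foreach (string h in removed) hashes.Remove(h);
        foreach (string h in added) hashes.Add(h);
        Version = newVersion;
    }

    public bool Contains(string hash)
    {
        return hashes.Contains(hash);
    }

    // Plain-text format: version on the first line, then one hash per line.
    // The stream is left open so the caller controls its lifetime.
    public void Save(Stream stream)
    {
        using (var writer = new StreamWriter(stream, Encoding.UTF8, 1024, true))
        {
            writer.WriteLine(Version);
            foreach (string h in hashes) writer.WriteLine(h);
        }
    }

    public static HashList Load(Stream stream)
    {
        var list = new HashList();
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, true))
        {
            string first = reader.ReadLine();
            if (first == null) return list; // empty store: start from version 0
            list.Version = int.Parse(first);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0) list.hashes.Add(line);
            }
        }
        return list;
    }
}
```

To persist to isolated storage, the stream could come from `IsolatedStorageFile.GetUserStoreForAssembly()` combined with `new IsolatedStorageFileStream("tables.txt", FileMode.Create, store)`; the file name here is only an example.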
To validate a URL you need to perform a few steps. First, compute the 128-bit MD5 hash of the URL you want to check, then convert the hash to its string (hexadecimal) representation. With .NET this is easy to do:
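```csharp
using System.Security.Cryptography;
using System.Text;

private string GetHash(string url)
{
    // 128-bit (16-byte) MD5 digest of the URL text
    byte[] hashBytes;
    using (MD5 md5 = MD5.Create())
    {
        hashBytes = md5.ComputeHash(Encoding.ASCII.GetBytes(url));
    }

    // Two lowercase hex digits per byte: 32 characters in total
    StringBuilder sb = new StringBuilder(32);
    int length = hashBytes.Length;
    for (int i = 0; i < length; i++)
    {
        sb.Append(hashBytes[i].ToString("x2"));
    }
    return sb.ToString();
}
```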
Google also suggests performing several lookups for the same URL to get an accurate result. For the hostname, try the exact hostname in the URL plus up to 4 hostnames formed by starting with the last 5 components and successively removing the leading component (the top-level domain on its own is skipped). For the path, try at most 6 different strings: the exact path of the URL, including query parameters; the exact path of the URL, without query parameters; and the 4 paths formed by starting at the root (/) and successively appending path components, including a trailing slash. Each hostname is combined with each path. An example shows better what this means.
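For the URL http://a.b.c.d.e.f.g/1.html the lookups are:
- a.b.c.d.e.f.g/1.html
- a.b.c.d.e.f.g/
- c.d.e.f.g/1.html
- c.d.e.f.g/
- d.e.f.g/1.html
- d.e.f.g/
- e.f.g/1.html
- e.f.g/
- f.g/1.html
- f.g/

*(Note that b.c.d.e.f.g is skipped, since we take only the last 5 hostname components plus the full hostname.)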
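The host and path rules described above can be sketched in C# as follows. `LookupExpressions` and its methods are hypothetical names, not part of the downloadable code. The sketch assumes the URL has already been canonicalized, does not special-case IP-address hosts, and assumes each lookup string is simply host followed by path, without the scheme.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds the host/path strings to look up for one URL.
public static class LookupExpressions
{
    // The exact hostname, then up to 4 suffixes taken from the last 5 components,
    // removing the leading component each time and stopping before the bare TLD.
    public static List<string> HostVariants(string host)
    {
        var hosts = new List<string> { host };
        string[] parts = host.Split('.');
        // Start at most 5 components from the end; index 0 would repeat the exact hostname.
        int start = Math.Max(parts.Length - 5, 1);
        for (int i = start; i <= parts.Length - 2; i++)
        {
            hosts.Add(string.Join(".", parts, i, parts.Length - i));
        }
        return hosts;
    }

    // The exact path with query, the exact path without query, and up to 4 directory
    // prefixes starting at the root ("/", "/a/", "/a/b/", ...), each with a trailing slash.
    public static List<string> PathVariants(string path, string query)
    {
        var paths = new List<string>();
        if (!string.IsNullOrEmpty(query)) paths.Add(path + query);
        paths.Add(path);

        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        // The last segment is a file name unless the path ends with '/'.
        int dirCount = path.EndsWith("/", StringComparison.Ordinal) ? segments.Length : segments.Length - 1;
        string prefix = "/";
        for (int i = 0; i < 4 && i <= dirCount; i++)
        {
            if (i > 0) prefix += segments[i - 1] + "/";
            if (!paths.Contains(prefix)) paths.Add(prefix);
        }
        return paths;
    }

    // Every host combined with every path: at most 5 x 6 = 30 strings.
    public static List<string> Expressions(Uri uri)
    {
        var result = new List<string>();
        foreach (string host in HostVariants(uri.Host))
            foreach (string path in PathVariants(uri.AbsolutePath, uri.Query))
                result.Add(host + path);
        return result;
    }
}
```

Each resulting string would then be hashed (for example with the `GetHash` method) and checked against the tables; a URL is flagged if any of them matches.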
Another interesting feature is that you can verify that the tables returned by your requests really come from Google. This works by requesting a pair of keys: a client key and a wrapped key. The wrapped key must be sent along with the update requests, and Google then includes a MAC (Message Authentication Code) in the header of each response, with the structure "[mac=dRalfTU+bXwUhlk0NCGJtQ==]". To validate this MAC you need to compute another 128-bit MD5 hash over the following data: client_key|separator|table data|separator|client_key, where the separator is the string :coolgoog:, that is, a colon followed by "coolgoog" followed by a colon. To be honest, I got a bit stuck here: I've tried a few ways to verify the MAC, but I'm missing something and cannot get the expected result. Maybe I'll try again when I come back from holidays.
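Since the MAC check above never produced a match, the following is only a sketch of the computation as described, not a verified implementation. `MacCheck` and its methods are hypothetical names. It assumes the key and table data are hashed as raw bytes and that the header value is standard Base64 of the 16-byte MD5 digest (the 24-character sample with `+` and `==` is consistent with that). One thing worth checking is whether the client key returned by Google has to be Base64-decoded before hashing; the helper takes raw key bytes so both variants can be tried.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for the MAC check; assumptions are listed in the text above.
public static class MacCheck
{
    private static readonly byte[] Separator = Encoding.ASCII.GetBytes(":coolgoog:");

    // Pulls the Base64 value out of a header such as "[mac=dRalfTU+bXwUhlk0NCGJtQ==]".
    public static string ExtractMac(string header)
    {
        const string prefix = "[mac=";
        string h = header.Trim();
        if (!h.StartsWith(prefix, StringComparison.Ordinal) || !h.EndsWith("]", StringComparison.Ordinal))
            throw new FormatException("Unexpected MAC header: " + header);
        return h.Substring(prefix.Length, h.Length - prefix.Length - 1);
    }

    // MD5(client_key + ":coolgoog:" + data + ":coolgoog:" + client_key), Base64-encoded.
    public static string ComputeMac(byte[] clientKey, byte[] tableData)
    {
        byte[] buffer = Concat(clientKey, Separator, tableData, Separator, clientKey);
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(buffer));
        }
    }

    public static bool Verify(byte[] clientKey, byte[] tableData, string headerMac)
    {
        return string.Equals(ComputeMac(clientKey, tableData), headerMac, StringComparison.Ordinal);
    }

    // Joins several byte arrays into one contiguous buffer.
    private static byte[] Concat(params byte[][] parts)
    {
        int total = 0;
        foreach (byte[] p in parts) total += p.Length;
        byte[] result = new byte[total];
        int offset = 0;
        foreach (byte[] p in parts)
        {
            Buffer.BlockCopy(p, 0, result, offset, p.Length);
            offset += p.Length;
        }
        return result;
    }
}
```

Working at the byte level avoids a subtle source of mismatches: the table data must be hashed exactly as received, so converting it to a string and back with a different encoding could change the bytes.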
You can download the code and test it, but remember that it is provided as is and should not be considered finished code. There is plenty of room for improvement in many areas, including exception handling, which this sample does not address.
To test it, install and start the included Windows service. Before starting it, make sure you add your own key to the appSettings section of the file "BalearesOnNet.GoogleSafeBrowsing.Service.exe.config". The BHO is installed automatically when you build it with Visual Studio; you can disable this by unchecking the "Register for COM interop" option in the project properties.
I hope you like it.