Jul 27, 2013

Making a simple web tracker

A few weeks ago, I had to track whether users on a partner's site open our pop-up. There are plenty of solutions, but the main idea is simple: when an event happens, we send a request to our server.

I should say that I only need to send a GET request, and I want the solution to be as universal as possible.

One of the best solutions, and the one I chose, is a transparent image (remember how to make a transparent PNG work in IE6? No? Well, I remember =). It is as universal as it gets: the img tag has been around for a very long time!

When someone includes our image in a page's markup via an img tag, the browser loads it, which is exactly what I want. So let's take a 1px transparent GIF (small and well supported). I did not have one, so I googled it and found… several. I took the smallest one: 35 bytes.

Of course, this does not stop you from getting requests you do not want: anyone can request the image URL.

I will use the Lift framework, since we use it at InGo. First, I need the bytes of this image:

import java.nio.file.{Files, Paths}

Files.readAllBytes(Paths.get("./empty.gif"))

And save them somewhere:

val emptyGif: Array[Byte] = Array(71, 73, 70, 56, 57, 97, 1, 0, 1, 0, -128, -1, 0, -1, -1, -1, 0, 0, 0, 44, 0, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 68, 1, 0, 59)
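To double-check what those 35 bytes contain, here is a small hypothetical JavaScript helper (inspectGif is my own name, not part of any library) that reads the GIF header and looks for a Graphic Control Extension, the block where the GIF89a format keeps its transparency flag. By my reading of the format, this particular file has no such block, so browsers should draw it as a single white pixel rather than a transparent one; a transparent variant needs that extension, which adds 8 bytes. For a 1px beacon the difference rarely matters, so treat this as a sanity check, not a requirement.

```javascript
// Hypothetical helper: summarizes a GIF file given as an array of byte values.
// Values may be signed (-128..127), as they are in a JVM Array[Byte].
function inspectGif(bytes) {
  // Normalize signed JVM bytes to the 0..255 range
  var b = bytes.map(function (x) { return x & 0xff; });
  var signature = String.fromCharCode.apply(null, b.slice(0, 6)); // "GIF87a" or "GIF89a"
  var width = b[6] | (b[7] << 8);  // logical screen width, little-endian
  var height = b[8] | (b[9] << 8); // logical screen height, little-endian
  var packed = b[10];
  // If bit 7 is set, a global color table follows the 13-byte header;
  // it holds 2^(N+1) RGB entries, where N is the value of the low 3 bits
  var colorTableSize = (packed & 0x80) ? 3 * (1 << ((packed & 0x07) + 1)) : 0;
  var p = 13 + colorTableSize;
  var transparent = false;
  // Walk the extension blocks (introduced by 0x21) that precede the image data
  while (p < b.length && b[p] === 0x21) {
    if (b[p + 1] === 0xf9) {
      // Graphic Control Extension: bit 0 of its packed byte is the transparency flag
      transparent = (b[p + 3] & 0x01) === 1;
    }
    p += 2; // skip the introducer and the label
    // Skip the data sub-blocks: each starts with its length, and a zero length ends them
    while (p < b.length && b[p] !== 0) {
      p += b[p] + 1;
    }
    p += 1; // skip the block terminator
  }
  return { signature: signature, width: width, height: height,
           length: b.length, transparent: transparent };
}

var emptyGif = [71, 73, 70, 56, 57, 97, 1, 0, 1, 0, -128, -1, 0, -1, -1, -1, 0, 0, 0,
                44, 0, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 68, 1, 0, 59];
console.log(JSON.stringify(inspectGif(emptyGif)));
// → {"signature":"GIF89a","width":1,"height":1,"length":35,"transparent":false}
```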

Now I need a REST endpoint that returns the image:

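import net.liftweb.common.Loggable
import net.liftweb.http.InMemoryResponse
import net.liftweb.http.rest.RestHelper

// register in Boot.scala: LiftRules.statelessDispatch.append(TrackApi)
object TrackApi extends RestHelper with Loggable {

    // `headers` (the no-cache list shown below) must be defined before this line,
    // otherwise it is still null when the object is initialized
    val emptyGifResponse = InMemoryResponse(emptyGif, headers, Nil, 200)

    // GET /track/empty.gif returns the 1px image
    serve {
        case "track" :: "empty" :: Nil Get req if req.path.suffix == "gif" =>
            emptyGifResponse
    }
}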

That's it! A working and powerful web analytics platform… almost.

The basic use case is:

<img alt="" src="//example.com/track/empty.gif" />

This is enough to know that someone opened a page. The request headers give us the basic information about the client: Referer, User-Agent, etc.
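As a rough illustration of what the server can read from such a request, here is a hypothetical JavaScript function (describeHit is my name for it) that takes the request headers as a plain object with lower-case keys, the way Node.js exposes them, and pulls out the referring page and the browser. It is a sketch alongside the Lift code, not part of it; the Referer header may be missing, so every field can be null.

```javascript
// Hypothetical helper: extracts basic client information from tracking-request headers.
// Header names are expected in lower case, as Node.js http exposes them.
function describeHit(headers) {
  var referer = headers["referer"] || null; // page that embedded the image, may be absent
  var refererHost = null;
  if (referer) {
    try {
      refererHost = new URL(referer).host; // host of the partner page
    } catch (e) {
      refererHost = null; // malformed Referer value
    }
  }
  return {
    referer: referer,
    refererHost: refererHost,
    userAgent: headers["user-agent"] || null
  };
}

var hit = describeHit({
  "referer": "https://partner.example.org/landing?x=1",
  "user-agent": "Mozilla/5.0"
});
console.log(hit.refererHost); // → partner.example.org
```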

I almost forgot to mention how to stop the browser from caching the image. A minimal set of headers:

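val headers: List[(String, String)] = List(
    "Expires"       -> "0",                                   // invalid date, treated as already expired
    "Cache-Control" -> "no-cache, no-store, must-revalidate", // HTTP/1.1 caches
    "Pragma"        -> "no-cache",                            // legacy HTTP/1.0 caches
    "Content-Type"  -> "image/gif"
)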

If we wanted to stick to a server-side-only solution, we could set a Set-Cookie response header and identify the user on each request. But this can sometimes be problematic because, in general, we cannot trust request headers. That approach did not fit my case, so I added a small JS library for our partner to include on their site.

First we need a good way to load our JS library without interfering with the client's site. Remember how every big site loads its scripts: it creates script tags, inserts them somewhere in the page, and loads them asynchronously. Let's do something similar:

// I expose only two public objects on the host page's window,
// so that the client can change the function name without problems if required.
// SuperAnalyticsObject, the first exposed object,
// holds the name of the tracking function, the second exposed object.
window.SuperAnalyticsObject = name;
// First check whether the library has already been loaded in some way;
// if not, create a small stub that queues tracked events
// until the real script has loaded
window[name] = window[name] || function () {
    (window[name].q = window[name].q || []).push(arguments);
};
// Now build the script tag and insert it before the first script on the page
var script = document.createElement(scriptTag),
    firstScript = document.getElementsByTagName(scriptTag)[0];
script.async = 1;
script.src = src;
firstScript.parentNode.insertBefore(script, firstScript);

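Now let's make it friendly to a JS minifier (I usually use Closure Compiler, which gives the best results in my experience). Wrapping everything in a function that receives its values as parameters lets the minifier shorten every name, and declaring script and firstScript as unused parameters saves a var statement: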

(function (window, document, scriptTag, src, name, script, firstScript) {
    window.SuperAnalyticsObject = name;
    window[name] = window[name] || function () {
        (window[name].q = window[name].q || []).push(arguments)
    };
    script = document.createElement(scriptTag),
    firstScript = document.getElementsByTagName(scriptTag)[0];
    script.async = 1;
    script.src = src;
    firstScript.parentNode.insertBefore(script, firstScript);
})(window, document, 'script', '//example.com/public/tracking.js', '__it');

This snippet loads my JS script and sets the name of the exposed function.

Now for the tracking script itself.

First I need some utility functions: reading and writing a cookie (I want to know whether it is the same user) and generating a user id.

Id generation is the simplest of them (just a funky random string, good enough for a user id):

var generateId = function () {
    return Math.round(Math.random() * new Date().getTime()) + '.' + new Date().getTime();
};

Reading a cookie is not hard either: all cookies available to the page are stored in document.cookie as a single string of name=value pairs separated by semicolons. Let's parse it:

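var getCookieValue = function (name) {
    var res = [],
        cookies = document.cookie.split(";"),
        // matches name=value (ignoring surrounding whitespace) and captures the value;
        // the name is used as-is, so it must not contain regex special characters
        pattern = new RegExp("^\\s*" + name + "=\\s*(.*?)\\s*$");
    for (var i = 0; i < cookies.length; i++) {
        var m = cookies[i].match(pattern);
        if (m) res.push(m[1]);
    }
    // the same name can occur several times (e.g. cookies set for different paths)
    return res;
};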
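Because document.cookie only exists in a browser, here is the same parsing logic as a pure function that takes the cookie string as an argument, which makes it easy to try in Node.js. This variant (parseCookieValues and the __it_uid cookie name are hypothetical) also escapes regular-expression special characters in the cookie name, which getCookieValue can skip as long as names are plain words.

```javascript
// Hypothetical pure variant of getCookieValue: the cookie string is passed in
// instead of being read from document.cookie.
function escapeRegExp(text) {
  // Escape characters that have a special meaning inside a regular expression
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function parseCookieValues(cookieString, name) {
  var pattern = new RegExp("^\\s*" + escapeRegExp(name) + "=\\s*(.*?)\\s*$");
  var values = [];
  var parts = cookieString.split(";");
  for (var i = 0; i < parts.length; i++) {
    var m = parts[i].match(pattern);
    if (m) values.push(m[1]); // the same name can appear more than once
  }
  return values;
}

var cookies = "_ga=GA1.2.3; __it_uid=IT-123.456; theme=dark";
console.log(parseCookieValues(cookies, "__it_uid")); // → [ 'IT-123.456' ]
```

Without the escaping, a name such as "a.b" would also match a cookie called "axb", because "." matches any character.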

Setting a cookie:

var setCookie = function (cookieName, cookieValue, path, domain, expires) {
    var removeWWW = function (host) {
        if (host.indexOf("www.") === 0) host = host.substring(4);
        return host.toLowerCase();
    };
    // I remove the www. prefix because I want the cookie to be available on all subdomains
    domain = domain || removeWWW(document.domain);
    expires = expires || 63072E6; // 2 years in milliseconds by default
    path = path || '/';
    // cookie length is restricted, so cut the value to 2000 characters
    if (cookieValue && cookieValue.length > 2E3) cookieValue = cookieValue.substring(0, 2E3);
    var cookie = cookieName + "=" + cookieValue + "; path=" + path + "; ";
    // toUTCString is the standard name for the deprecated toGMTString
    if (expires) cookie += "expires=" + new Date(new Date().getTime() + expires).toUTCString() + "; ";
    if (domain) cookie += "domain=" + domain + ";";
    document.cookie = cookie;
};
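Again as a testable sketch, here is the string-building part of setCookie separated from document.cookie and from the clock (buildCookieString, its options object and the now parameter are hypothetical). It calls toUTCString, the standard method of which toGMTString is a deprecated alias, and spells out that 63072E6 ms is 730 days.

```javascript
// Hypothetical pure variant of setCookie: returns the string that would be
// assigned to document.cookie instead of assigning it.
var TWO_YEARS_MS = 730 * 24 * 60 * 60 * 1000; // = 63072E6, the default in setCookie

function buildCookieString(name, value, options) {
  options = options || {};
  var path = options.path || "/";
  var expires = options.expires || TWO_YEARS_MS;
  // Injectable clock for testing; 0 is a valid value, so check the type
  var now = typeof options.now === "number" ? options.now : Date.now();
  // Same www. stripping as setCookie, but no document.domain default here
  var domain = options.domain ? options.domain.replace(/^www\./, "").toLowerCase() : null;
  // Cut long values, as setCookie does
  if (value && value.length > 2000) value = value.substring(0, 2000);
  var cookie = name + "=" + value + "; path=" + path + "; ";
  cookie += "expires=" + new Date(now + expires).toUTCString() + "; ";
  if (domain) cookie += "domain=" + domain + ";";
  return cookie;
}

console.log(buildCookieString("__it_uid", "IT-1.2", { domain: "www.Example.com", now: 0 }));
// → __it_uid=IT-1.2; path=/; expires=Sat, 01 Jan 1972 00:00:00 GMT; domain=example.com;
```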

Now let's load our empty GIF, the core of the tracking:

var loadImage = function (query, callback, src) {
    callback = callback || function () {};
    // baseUrl is the tracking endpoint URL, defined elsewhere in the script
    src = (src || baseUrl) + '?' + query;
    // new Image(width, height) creates an <img> element that is never added to the page
    var img = new Image(1, 1);
    // for me it does not matter whether it loads or fails, the request is what counts,
    // so both handlers just clean up and call the callback
    img.onload = img.onerror = function () {
        img.onload = img.onerror = null;
        callback();
    };
    // attach the handlers before setting src, the safe order if an event fires immediately
    img.src = src;
};

That is more than enough to make it work. Finally, let's define the default query parameters and send the events that were generated before the script loaded.

Here is how an event is generated:

// This is the function I expose. It accepts any number of arguments;
// the first one, the event name, is required.
var track = function () {
    var values = Array.prototype.slice.call(arguments, 0),
        name = values.shift();
    // first required parameter: the user id taken from the cookie
    var q = 'u=' + encodeURIComponent(track.options.id) +
        // second required parameter: the event name
        '&e=' + encodeURIComponent(name) +
        // third required parameter: the host where the event happened
        '&h=' + encodeURIComponent(location.host) +
        // fourth parameter: the current timestamp, so the browser never serves the request from cache
        '&r=' + (1 * new Date());
    // the remaining arguments are event details, any relevant information I want to send
    for (var i = 0; i < values.length; i++) {
        q += '&a=' + encodeURIComponent(values[i]);
    }

    loadImage(q);
};
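To see what actually goes over the wire, here is the query-building part of track as a pure function, with the user id, host and time passed in instead of read from track.options, location and Date (buildTrackQuery and parseTrackQuery are hypothetical names). The second function is one way a server could decode such a query, using the standard URLSearchParams class; repeated a parameters come back as an array in their original order.

```javascript
// Hypothetical pure variant of the query building in track()
function buildTrackQuery(userId, eventName, host, args, now) {
  var q = "u=" + encodeURIComponent(userId) +
    "&e=" + encodeURIComponent(eventName) +
    "&h=" + encodeURIComponent(host) +
    "&r=" + now; // timestamp used as a cache buster
  for (var i = 0; i < args.length; i++) {
    q += "&a=" + encodeURIComponent(args[i]); // extra event arguments, in order
  }
  return q;
}

// Decoding the same query, e.g. on the server side
function parseTrackQuery(query) {
  var params = new URLSearchParams(query);
  return {
    userId: params.get("u"),
    event: params.get("e"),
    host: params.get("h"),
    time: Number(params.get("r")),
    args: params.getAll("a")
  };
}

var q = buildTrackQuery("IT-1.2", "popup open", "partner.example.org", ["promo", 42], 1374900000000);
console.log(q);
// → u=IT-1.2&e=popup%20open&h=partner.example.org&r=1374900000000&a=promo&a=42
console.log(parseTrackQuery(q).args); // → [ 'promo', '42' ]
```

Since encodeURIComponent escapes "+" and "&", values containing them survive the round trip unchanged.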

track.options = {};
// try to read the user id (cookieName is defined elsewhere in the script)
var id = getCookieValue(cookieName)[0];
if (!id) {
    // and create one if there is none yet
    id = 'IT-' + generateId();
    setCookie(cookieName, id);
    // mark that the id was just created
    track.options.init = true;
}
track.options.id = id;

// the name under which my function is exposed on the host page
var name = window.SuperAnalyticsObject || '__it';

// replay all events that the stub queued before this script loaded
if (window[name] && window[name].q) {
    var len = window[name].q.length;
    for (var i = 0; i < len; i++) track.apply(undefined, window[name].q[i]);
}
// from now on, calls go directly to track
window[name] = track;
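The hand-off between the loader stub and the real script is easy to try outside a browser. In this sketch a plain object stands in for window, and recordingTrack is a hypothetical stand-in for track that only records its calls; the stub and the replay loop mirror the loader snippet and the replay code from the post.

```javascript
// A plain object plays the role of window in this sketch
var fakeWindow = {};
var trackerName = "__it";

// 1. Loader snippet: expose the name and install the queueing stub
fakeWindow.SuperAnalyticsObject = trackerName;
fakeWindow[trackerName] = fakeWindow[trackerName] || function () {
  (fakeWindow[trackerName].q = fakeWindow[trackerName].q || []).push(arguments);
};

// 2. The host page tracks events before the real script has loaded
fakeWindow[trackerName]("popup open", "promo");
fakeWindow[trackerName]("popup close");

// 3. The real script loads: replay the queue, then replace the stub
var seen = [];
var recordingTrack = function () {
  // Hypothetical stand-in for track(): records arguments instead of loading an image
  seen.push(Array.prototype.slice.call(arguments));
};
var exposedName = fakeWindow.SuperAnalyticsObject || "__it";
if (fakeWindow[exposedName] && fakeWindow[exposedName].q) {
  var queued = fakeWindow[exposedName].q;
  for (var i = 0; i < queued.length; i++) recordingTrack.apply(undefined, queued[i]);
}
fakeWindow[exposedName] = recordingTrack;

// 4. Later calls go straight to the real function
fakeWindow[exposedName]("button click");
console.log(JSON.stringify(seen));
// → [["popup open","promo"],["popup close"],["button click"]]
```

Queued events keep their original order, and nothing is lost between the page's first call and the moment the script arrives.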

That is all: fully functional web tracking code. As an improvement, we could count attempts and limit the request frequency so that we do not overload our server. There are a lot of things that could be done and improved, but code like this solved my task.
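The frequency limit could look roughly like this: a hypothetical sliding-window limiter (createThrottle, maxEvents and windowMs are my names, and the limits in the example are arbitrary) that lets at most a fixed number of events through per time window and drops the rest. The clock is injectable so the behavior can be checked without waiting.

```javascript
// Hypothetical client-side limiter: allows at most maxEvents calls per windowMs milliseconds.
function createThrottle(maxEvents, windowMs, clock) {
  clock = clock || Date.now;
  var timestamps = []; // times of the calls allowed within the current window

  return function allow() {
    var now = clock();
    // Forget calls that have fallen out of the sliding window
    while (timestamps.length && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= maxEvents) return false; // over the limit: drop the event
    timestamps.push(now);
    return true;
  };
}

// Usage with a fake clock: 3 events per second
var fakeNow = 0;
var allow = createThrottle(3, 1000, function () { return fakeNow; });
var results = [allow(), allow(), allow(), allow()];
console.log(results); // → [ true, true, true, false ]
fakeNow = 1000; // one second later the window has moved on
console.log(allow()); // → true
```

In the tracking script, track could then call loadImage(q) only when allow() returns true.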

Some may find it useful to use XHR with CORS to restrict requests to real users' browsers. To send a POST instead, you would need to create a form with inputs and submit it.

I took most of the ideas from Stack Overflow and from existing analytics scripts.