In recent years, real-time bidding (RTB) has emerged as the leading way for publishers to sell their ad inventory to advertisers.

Most often it is so-called demand-side platforms (DSPs) that do the bidding on behalf of advertisers at ad exchanges. Pricing models for publishers are usually cost per mille (CPM, i.e. per thousand impressions), cost per click (CPC) or cost per action (CPA). To set appropriate bid prices on behalf of advertisers, these platforms implement algorithms whose main objective is to estimate the click-through rate (CTR) of ad impressions.

In this article we explore a model for predicting CTR as part of an RTB system.

When training models for this kind of prediction we usually deal with data that has either integer or categorical features. Since we want to build an online model with a minimal memory footprint, one-hot encoding is not an option; the technique we rely on instead is called the hashing trick, or sometimes feature hashing.

One implementation is given below, where we chose a particular hash function (feel free to experiment with your own):
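A minimal sketch of such a function is shown below. It assumes MD5 as the hash and a dictionary of raw field/value pairs as input, with index 0 reserved for a bias term; these choices are illustrative rather than the post's exact code:

```python
import hashlib

N = 2 ** 20  # size of the hash table, i.e. the number of weights


def hash_features(row, n=N):
    """Map a row of categorical features to a list of hashed indices.

    `row` is a dict of {field_name: value}. Each field/value pair is
    hashed into an integer index in [1, n); index 0 is reserved for the
    bias term. MD5 is used here purely for illustration -- any
    reasonably uniform hash function will do.
    """
    indices = [0]
    for field, value in row.items():
        key = f"{field}={value}".encode("utf-8")
        indices.append(int(hashlib.md5(key).hexdigest(), 16) % (n - 1) + 1)
    return indices
```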

Here N is a parameter denoting the size of the hash table; 2**20 is a good size for typical problems.
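As a quick illustration, here is how the function might be called on a single row (the field names and values below are made up, not taken from the dataset):

```python
row = {"site_id": "abc123", "banner_pos": "0", "device_type": "1"}
print(hash_features(row))
# -> something like [0, 834512, 90321, 455678]; the exact indices
#    depend on the hash function and on N
```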

The next step is to implement SGD to minimise the log loss and carry out the training iterations:
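A minimal sketch of such a training loop is given below. It builds on the `hash_features` function above and makes a few assumptions that the post does not spell out: a plain logistic regression model, a constant learning rate `alpha`, a single pass over a CSV file named `train.csv` with a `click` label column plus categorical fields, and no regularisation:

```python
import csv
import math

alpha = 0.1          # learning rate (assumed value; tune on a validation split)
w = [0.0] * N        # one weight per hash bucket, including the bias at index 0


def predict(indices):
    """Sigmoid of the sparse dot product w.x for binary hashed features."""
    wx = sum(w[i] for i in indices)
    return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))


def logloss(p, y):
    """Log loss of prediction p against label y in {0, 1}, clipped for stability."""
    p = max(min(p, 1.0 - 1e-15), 1e-15)
    return -math.log(p) if y == 1.0 else -math.log(1.0 - p)


# One SGD pass over the training file.
running_loss, count = 0.0, 0
with open("train.csv") as f:
    for row in csv.DictReader(f):
        y = float(row.pop("click"))   # label: 1 if the ad was clicked
        row.pop("id", None)           # the row id carries no signal
        x = hash_features(row)
        p = predict(x)
        running_loss += logloss(p, y)
        count += 1
        # The gradient of the log loss w.r.t. each active weight is (p - y).
        for i in x:
            w[i] -= alpha * (p - y)

print(f"average training log loss: {running_loss / count:.4f}")
```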

The dataset examined in this article is from the Avazu Kaggle CTR competition (link).

After training, we checked our model against the test set:
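Evaluation can reuse the same pipeline with the weight updates switched off. The sketch below assumes a labelled hold-out file named `test.csv`; the file name and the split are assumptions, since the official Avazu test set is unlabelled:

```python
# Score the trained model on a held-out set without touching the weights.
test_loss, n_test = 0.0, 0
with open("test.csv") as f:
    for row in csv.DictReader(f):
        y = float(row.pop("click"))
        row.pop("id", None)
        test_loss += logloss(predict(hash_features(row)), y)
        n_test += 1

print(f"test log loss: {test_loss / n_test:.4f}")
```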

The resulting log loss is 0.4129.

For a comparison with the corresponding Kaggle competition results, please visit the link.

Get in Touch

Do you need a consultation or have a project in mind? We would love to hear from you!
