Tutorial Notes: Real-Time Bidding based Display Advertising: Mechanisms and Algorithms
- Slides Link: http://wnzhang.net/slides/ecir16-rtb.pdf
This tutorial is an excellent introduction to real-time bidding by my current adviser Weinan Zhang at ECIR 2016.
Table of Contents
- RTB system
- Auction mechanisms
- User response estimation
- Conversion attribution
- Learning to bid
- Data Management Platform techniques
- Floor price optimization
- Fighting against fraud
RTB system
Why RTB?
Buying banner ads may not be a good idea because people may not be interested in the product. Advertisers want to show ads to interested users.
RTB are used more in displaying advertising (ads on webpages), and less in sponsored search (ads matching your keywords shown in a search engine).
RTB Display Advertising Mechanism
There are 3 roles in RTB system:
- User: browse websites; see ads on the page; might click on ads; might even convert, for example, sign up, buy something, etc
- Demand-Side Platform (DSP): i.e. advertiser; provide ads; hope to increase some key performance indicator (KPI), for example, user sign up, user buying something, etc.
- Ad Exchange: match the user with advertiser; give user info to advertisers; run an auction for each ad impression (= display); charge advertiser some money
- Additionally, advertisers may want to get more information about the web page and the user from some Data Management Platform (DMP).
The mechanism is quite easy to understand:
(My note: RTB system is quite busy. So many users and so many webpages, thus so many auctions, and each ad needs to be display within 100ms. Quite a challenging work! A combination of research and industry!)
Auction mechanisms
The most widely-used auction mechanism in RTB is second-price auction, i.e., the one with highest price wins the auction but he only needs to play the second highest price.
Ad exchange is assumed not to sell below the reserve price and reserve prices are assumed to be known to all advertisers. i.e. = the minimum bids for the impression.
Advertisers may also need to pay the ad exchange some entry fees.
User response estimation
Problem
This is an important task for DSP because user response estimation is a key factor to decide price when bidding. There are many papers on it.
The raw input of this task is information of an impression, such as date, hour, weekday, IP, region, city, country, ad exchange, domain, URL, OS, browser, ad size, ad ID, user tags and so on.
The output is the user response estimation. For example, click or not click, click-through rate (CTR, the probability of clicking) and conversion rate (CVR).
Feature representation is usually concatenation of one-hot categorial data. So, very very high dimensional sparse vector. (My question: I don’t know if embedding is OK?)
My question about feature representation
(My question: Actually, I think it quite interesting that you may need to first scan through your dataset then you can fix your feature representation, i.e., scanning through the dataset to get how many distinct value a field can have, then you can decide how many dimension this one-hot vector has.
I guess maybe there is no need to pre-scan the dataset in real world. First is because this pre-scanning thing sounds like nonsense. Imagine you have a trained model, and now, new data comes in and you want to further train the model. Since possible values of each field may has changed, you need to change the feature representation slightly and also migrate from the old one to the new one. The second reason is that maybe we can just allocate all possible values in advance, for example, keep IPv4 /8
. But how about those values that are even larger, such as IPv6?)
Unique problem properties
This task is actually a binary classification or regression problem. The problem has some properties that need to be taken care of:
- Large binary feature space (> 10m dimension)
- In practice, we need a sparse solution. Vanilla L1 regularization doesn’t work well.
- Follow-The-Regularized-Leader (FTRL) is a good algorithm. See [McMahan et al. Ad Click Prediction : a View from the Trenches. KDD 13].
- Large data instance number (> 10m daily)
- A seriously unbalanced label
- Negative labels may be 1k or 100k times more than positive labels
- When training, use a technique called negative down sampling
- After training, it is necessary to do calibration
Linear model
Linear model is heavily used and works quite well. Pros and cons:
- Pro: Highly efficient and scalable
- Pro: Fast, thus can explore larger feature space and training data
- Con: Lower model capability, feature independence assumption
- Cannot capture feature interactions unless defining high order combination features, for example, create a new category for
(hour, city, browser)
and take values like(hour=10AM, city=London, browser=Chrome)
.
- Cannot capture feature interactions unless defining high order combination features, for example, create a new category for
Conversion attribution
For a converted user, his path to the conversion may be a long journey, for example, he first saw a display ad, then watched a video ad, then a display ad again, a social ad next and finally convert. Conversion attribution is to assign credits to each channel according to contribution.
Current industrial solution: last-touch attribution, i.e., no conversion attribution at all. The last ad takes all the credit.
Learning to bid
Problem
Given a bid request (user, ad, page, context, etc), we need to find a best bidding strategy that optimize KPI within the budget constraint outputting bid price.
Or in a simpler way:
We can easily understand what utility estimation means, but what is bid landscape?
Bid landscape
There are many advertisers bidding a ad request. They gives different prices. For a bidding strategy to win, it also need to consider the market, i.e., guess how much money others will bid.
The problem is like to estimate a probability density function whose input is bid price and output is the probability of winning the bid.
Bidding strategies
There are two heavily used bidding strategies.
- Truthful bidding
- Bid the true value of the impression
- Averaged impression value = value of click * CTR
- Non-truthful linear bidding
- bid = base bid * predicted CTR / base CTR
- (My question: where does the base come from?)
Data Management Platform techniques
(My note: Technologically interesting and appeal to me, but nothing to do with research.)
Floor price optimization
(My note: don’t know :-( )
Fighting against fraud
Fraud is a serious problem. (My note: again, not key research issue)