A New Model To Detect The Thousands of Fake But “IAB Certified” Podcast Downloads I got

Anthony Gourraud
Sep 29, 2020 · 13 min read
It is possible to generate seemingly legitimate podcast downloads with nothing more than a script and a SIM card inserted into a USB modem.

I managed to obtain rigged but “certified” figures. I have not used this for my own benefit, and I do not encourage anyone to do so to make money. My goal is to highlight the fraud. The most important question is how to detect this type of fraud (detailed in the last part of this article).

This is adapted from a French paper / Version française ici

The idea: falsify requests sent to servers

You may remember cases of inflated numbers revealed by Podnews. In those specific cases, the publisher needed significant web traffic on one of its own interfaces, which was then counted as podcast audience. This kind of inflation is easy to detect, because a large share of the downloads comes from web applications (whereas Apple Podcasts would generally account for around 60% of downloads).
But here, we get thousands of “IAB certified” downloads with nothing more than a 4G USB modem connected to a Raspberry Pi running a script! The figures are validated by Podtrac and Chartable as well as by hosting companies like Spreaker (all three having received the IAB certification).

About 1,000 fake-but-certified downloads per day per podcast during the test. And it is possible to generate far more…

To measure podcast audiences, there are specifications from the IAB. They are limited to traffic analysis on the server side, i.e. where the audio files are hosted. Audience metrics on the “client” side are not standardized, because the main platforms (Apple Podcasts, Spotify) have their own methodologies and do not uniformly open access to listening data.
Thus, the podcast industry relies on a single KPI: the “number of downloads”, computed from server logs. The technical document produced by the IAB sets out requirements on excluding bot traffic and ignoring duplicate downloads, among other rules.
To summarize (very) roughly, a download is considered valid if it is a request to the server:

  • From an IP address not listed as belonging to a data center (such as AWS’s, for example). This IP should also not be responsible for an abnormally large number of requests
  • From valid software (an Apple Podcasts User-Agent, for example, and not an obsolete web browser)
  • With a transferred payload representing more than one minute of audio (roughly 1 MB, for MP3 at 128 kbit/s).

The quantity of valid and unique “IP / User Agent / Audio File” triplets determines the number of downloads.
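
To make these rules concrete, here is a minimal sketch of how a host could count downloads from its server logs under this kind of rule (Python, with hypothetical IP and User-Agent lists; real certified implementations are more elaborate):

```python
# Minimal sketch of IAB-style download counting from server logs. The log
# format, the data-center IP list and the accepted User-Agent substrings are
# hypothetical; certified implementations are more sophisticated (they also
# deduplicate within a time window, for example).

MIN_BYTES = 1_000_000  # ~1 minute of audio at 128 kbit/s (MP3)

DATACENTER_IPS = {"52.94.236.248", "3.5.140.2"}          # e.g. known AWS addresses
VALID_UA_SUBSTRINGS = ("AppleCoreMedia", "Podcasts/", "Spotify")

def count_valid_downloads(log_entries):
    """log_entries: iterable of dicts with 'ip', 'user_agent', 'file', 'bytes_sent'."""
    seen = set()
    for entry in log_entries:
        if entry["ip"] in DATACENTER_IPS:
            continue                              # exclude data-center traffic
        if not any(ua in entry["user_agent"] for ua in VALID_UA_SUBSTRINGS):
            continue                              # exclude unknown/obsolete clients
        if entry["bytes_sent"] < MIN_BYTES:
            continue                              # less than one minute of audio
        seen.add((entry["ip"], entry["user_agent"], entry["file"]))
    return len(seen)                              # unique IP / UA / file triplets
```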

How to (concretely) get fake-but-certified results

I carried out this experiment at home (in France). I tried to inflate the audience of three different podcasts that I created myself. These podcasts are not supposed to get any plays other than through the cheat system. They are hosted on three separate solutions: Acast, Ausha and Spreaker. Two of the three podcasts are available on Apple Podcasts & Spotify (with very few real streams), and could therefore be measured with Chartable. The third one is not available on any podcasting platform.

Materials and required subscriptions

Note that this setup can be duplicated as many times as desired to increase the number of downloads.

Setup costs: around $180

Monthly subscriptions: around $30–50/month

The script to cheat

A simple program that executes the following steps in a loop:

  1. Selection of a random number of episodes from one (or more) RSS feed(s). I set up mechanisms to make the simulated listening realistic; for example, the more recent the content, the more likely it is to be chosen.
  2. Selection of a User-Agent from the WhatIsMyBrowser database. The script can be configured to target X% of requests with an Apple Podcasts User-Agent, Y% with Spotify, etc.
    User-Agents from Apple Watch are excluded from the selection, in accordance with the latest IAB guidance.
  3. It requests the URL(s) selected in step 1, with the User-Agent picked in step 2 (as an HTTP header). The script only fetches about 1 MB of data and then terminates the connection. This saves mobile data while making sure to download more than one minute of audio, the minimum required to be valid according to the IAB.
  4. (optional) For monitoring purposes, the process stores request information (IP, UA, episode, etc.) in a database.
  5. Modem restart, to get a new IP address.
  6. It waits until the Internet connection is back up. It can also wait longer depending on the day and time (configured with the OpenStreetMap opening_hours grammar). This makes it possible to control the number of downloads over time and to create download “peaks” in the morning and during commuting periods, with few listeners at night, for example.

To sum up, the service connects to the Internet with many different IP addresses (via the mobile network) and simulates listening from podcasting applications.
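
For illustration, here is a rough Python sketch of such a loop. The helper names (restart_modem, the example feed URL, the User-Agent shares) are mine, not taken from the actual script, and the recency weighting and opening_hours scheduling are omitted:

```python
import random
import time

import requests  # assumed HTTP client; the original script's stack is not specified

MAX_DOWNLOADS_PER_LOOP = 10
BYTES_PER_REQUEST = 1_000_000          # ~1 minute of 128 kbit/s MP3

# Hypothetical data, standing in for the RSS feed(s) and the User-Agent database.
EPISODE_URLS = ["https://example.com/podcast/ep42.mp3"]
USER_AGENTS = {"AppleCoreMedia/1.0.0 (iPhone; U; CPU OS 14_0)": 0.6,
               "Spotify/8.5 iOS/14.0": 0.3,
               "Mozilla/5.0 (podcast web player)": 0.1}

def partial_download(url, user_agent, max_bytes=BYTES_PER_REQUEST):
    """Fetch roughly max_bytes of the audio file, then drop the connection."""
    received = 0
    with requests.get(url, headers={"User-Agent": user_agent},
                      stream=True, timeout=30) as r:
        for chunk in r.iter_content(chunk_size=64 * 1024):
            received += len(chunk)
            if received >= max_bytes:
                break                  # closing the response aborts the transfer
    return received

def restart_modem():
    """Placeholder: power-cycle the 4G USB modem to obtain a new IP address."""
    time.sleep(45)                     # ~45 s observed in my tests

while True:
    n = random.randint(0, MAX_DOWNLOADS_PER_LOOP)        # 0-10 downloads per loop
    for _ in range(n):
        url = random.choice(EPISODE_URLS)                # recency weighting omitted
        ua = random.choices(list(USER_AGENTS), weights=list(USER_AGENTS.values()))[0]
        partial_download(url, ua)
    restart_modem()                                       # new IP for the next loop
    # the real script also waits here according to an opening_hours schedule
```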

The system is limited by the time it takes to get a new IP address: about 45 seconds in my tests. So you need several mobile subscriptions (ideally from different operators) to scale up.
You can also raise the maximum number of requests per loop to increase the total number of downloads. I set a limit of 10 so that the behavior stays realistic (between 0 and 10 downloads per execution), but any limit is possible.

Environment variables (configuration) for the script I developed

How to slightly improve the system

  • Use your own User-Agent database. I used an external service to prototype faster.
  • Carry out requests with varying amounts of transferred data (do not systematically interrupt after a fixed number of bytes)

Let’s analyze the results

I stored each request’s information in a database and built a dashboard giving an overview of the “fake downloads”.

Note: I added Podcast 1 on August 27, and Podcast 2 and Podcast 3 a few days earlier

Figures from two external podcast analytics tools: Podtrac & Chartable

Overall, all the requests executed with the cheat system are counted as “Downloads” by Podtrac and Chartable. We can also see on Chartable that we managed to fake audience “peaks” when an episode is published. I cannot explain why Chartable removes so many downloads on September 7 only.

Acast Open: all requests recorded (except those coming from a Spotify UA)

The figures displayed in the Acast interface correspond to the number of downloads performed with the cheating system, excluding all requests that simulate listens from Spotify.

For now, Acast Open statistics are not IAB-certified; Acast “Pro” has received the certification. I did not try joining the Acast Marketplace to see whether these fake downloads would be sold as IAB Certified Listens.

UPDATE 30/09: I have now tested with Acast Pro too. Fake downloads are counted (IP/UA aggregation). Screenshots here.

Ausha: no cheating detection either

All the cheating requests also show up in the Ausha dashboard. The User-Agents are not always interpreted as expected, but that does not matter. Since I got a significant number of plays, access to automatic monetization (ad placements by the host) was made available to me! But I do not know whether I could actually have made money; maybe some checks would have been performed.

Spreaker: aggregation by IP, UA and time to get “Downloads”

With Spreaker, the number of downloads is roughly equivalent to the number of requests the cheating system executed, but with a specific aggregation: if there are two downloads with the same IP, User-Agent and datetime (same YYYY-MM-DD HH:mm:ss value), only one is counted.
The number of listeners seems to correspond to the number of unique “IP / App” pairs. To explain why the numbers do not match exactly, I suspect my own platform classification from the User-Agent is too basic (App = “Apple Podcasts”, “Spotify”, “Web Browser”, “Alexa”, “Castbox” or “Others”). A sketch of this interpretation follows below.
The figures provided by this host are supposed to be “IAB certified”. However, like Podtrac and Chartable, it often counts requests whose payload is much smaller than 1 MB (one minute of audio). I do note a significant drop on Spreaker’s dashboard on September 12 and 13, when requests were interrupted at a very low limit (100 KB).
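
Expressed as code, my guess about this aggregation looks like the sketch below (this is my interpretation of the dashboard figures, not Spreaker’s actual logic; field names and the app classification are mine):

```python
# My interpretation of Spreaker's aggregation, inferred from the dashboards only.

def classify_app(user_agent):
    """Very rough classification, mirroring the basic mapping described above."""
    ua = user_agent.lower()
    if "applecoremedia" in ua or "podcasts/" in ua:
        return "Apple Podcasts"
    if "spotify" in ua:
        return "Spotify"
    if "alexa" in ua:
        return "Alexa"
    if "castbox" in ua:
        return "Castbox"
    if "mozilla" in ua:
        return "Web Browser"
    return "Others"

def spreaker_like_counts(requests_log):
    """requests_log: iterable of dicts with 'ip', 'user_agent', 'timestamp' (to the second)."""
    downloads = {(r["ip"], r["user_agent"], r["timestamp"]) for r in requests_log}
    listeners = {(r["ip"], classify_app(r["user_agent"])) for r in requests_log}
    return len(downloads), len(listeners)
```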

Less than 1 min of audio… and still considered a valid download?

So I tested several values for this limit, i.e. the number of bytes that triggers the interruption of the request.
Downloads were still “valid” even though, according to the IAB specifications, they should not be.
But maybe it is not a mistake. Say the minimum is 1 MB (= one minute of audio, MP3 at 128 kbit/s). By the time the request is interrupted, almost instantaneously, a larger amount of data has already been transferred by the server. I tested this on my personal server, hosted in France: with several requests executed from the same country, where around 0.2 MB is received (and therefore billed as mobile data), the number of bytes transferred on the server side is much higher, between 0.4 MB and 1.3 MB:

Each row represents a request (payloads targeted at approximately 0.2 MB). The number after “200” is the number of bytes actually transferred by the server.
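
For reference, the server-side figure in this screenshot is simply the byte count that an nginx-style “combined” access log writes right after the HTTP status code; a few lines are enough to extract it (the log line below is a made-up example):

```python
import re

# A made-up access-log line in nginx "combined" format: the number right after
# the "200" status is $body_bytes_sent, i.e. what the server actually transferred.
LINE = '1.2.3.4 - - [12/Sep/2020:09:13:05 +0200] "GET /ep42.mp3 HTTP/1.1" 200 412033 "-" "AppleCoreMedia/1.0.0"'

match = re.search(r'" (\d{3}) (\d+)', LINE)
status, bytes_sent = int(match.group(1)), int(match.group(2))
print(status, f"{bytes_sent / 1e6:.2f} MB transferred by the server")   # 200 0.41 MB
```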

Can this system make a podcast a real/artificial success?

Without “pushing too hard”, this cheating system can generate 30,000 downloads per month per podcast (with ONE mobile subscription), which represents only about 30 GB of mobile data (30,000 × ~1 MB). In France, in theory, a show could easily be ranked at the top of the ACPM podcast chart. However, I would have to pay €1,500 excluding VAT to the organization to test it…

But looking at the Podtrac rankings (for the United States), it would be difficult to reach that audience level…

On the podcast application side (Apple Podcasts or Spotify, to cite the main ones), the charts are not affected by this specific cheating system, since there is no activity on these platforms.

Limits

  • Statistics provided by listening platforms such as Apple Podcasts Connect, Spotify for Podcasters, Castbox and others reveal that the numbers are wrong.
  • This system is better suited to daily podcasts, because for more realism it is necessary to create significant audience peaks when new episodes are published.
  • If all the traffic comes from the same mobile network (same IP range), it is suspicious. Several instances across several mobile network operators make this less obvious.
  • It is not possible to control which IP addresses are assigned. Depending on the operator, the number of distinct IPs and the reassignment frequency of an address can be totally different. The associated location cannot be completely controlled either.
  • There is a risk of being blocked by the mobile network operator for misuse. And if the process stops, there is a big drop in downloads.
  • Such a large number of requests with little data transferred is suspicious. But if the episodes are short (news bulletins for a daily podcast, for example), there are only complete downloads.

A methodology to limit faked numbers?

Compute the total amount of data transferred (a sort of total listening time metric)?

This cheating process is limited by the renewal time of IP addresses, but that can be worked around with multiple mobile subscriptions. The main limitation today is the available mobile data: there are still very few mobile plans with unlimited Internet access, and they remain quite expensive.
If rankings considered the total listening time, derived from the bitrate and the number of bytes the server transfers, the cheat would be much less attractive.

And can this cheat be adapted for web radios? Since radio is streamed, it is possible to rank stations by total listening duration. A 200 GB mobile plan would only provide between 3,000 and 7,000 hours of listening (depending on the audio quality, among other parameters…). Not really efficient.
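
For the orders of magnitude, here is my own back-of-the-envelope calculation, assuming constant-bitrate MP3:

```python
def listening_hours(transferred_bytes, bitrate_kbps):
    """Theoretical listening time represented by a given volume of audio data."""
    bytes_per_hour = bitrate_kbps * 1000 / 8 * 3600
    return transferred_bytes / bytes_per_hour

PLAN = 200e9                               # a 200 GB mobile plan
print(round(listening_hours(PLAN, 128)))   # ~3472 hours at 128 kbit/s
print(round(listening_hours(PLAN, 64)))    # ~6944 hours at 64 kbit/s
```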

In radio, the key metrics are “cume”/“reach”, but also “hours per listener” and “listening share”, which distinguish the number of listeners from the overall listening volume.
What about a similar distinction for podcasts?
Why not, but… note that outbound server data does not really reflect listening time. It represents the theoretical maximum listening time, assuming that a person who downloads a file listens to the episode exactly once.

Client-side audience measurements

Remote Audio Data (RAD) is a specification proposed by NPR’s R&D teams to standardize listening-data collection from audio platforms. However, since payloads can be sent in batches, it seems possible to fake the figures sent to the server in charge of collecting statistics.

It would therefore be necessary to define how to authenticate the listening platform (other than with the User-Agent) and the listening data.
Either the client app has to send real-time requests every X seconds while it is playing (which is quite tedious!), or it could use socket technology, for example.
But in any case, such a client-side standard is far from achievable. The statistics from applications like Apple Podcasts or Spotify may never be aggregatable with each other. And to top it all, it is possible to manipulate the Apple Podcasts charts, and you can also pay to get podcast streams on Spotify.
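
As a purely hypothetical illustration of the “real-time requests every X seconds” idea (the endpoint, payload and interval below are invented, and the hard problem of authenticating the client is not solved here):

```python
import time

import requests

def report_playback(episode_id, get_position,
                    api_url="https://stats.example.com/heartbeat", interval=30):
    """Send a playback heartbeat every `interval` seconds while audio is playing."""
    while True:
        position = get_position()   # current playback position in seconds, or None if stopped
        if position is None:
            break
        requests.post(api_url, json={"episode": episode_id, "position": position}, timeout=5)
        time.sleep(interval)
```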

Ways to detect this type of fraud

There are thus several possible approaches to detect the cheat. Characterizing listening behaviors seems to be the most pragmatic one. Three main points:

  • Compare the numbers computed on the server side with those available on the client side (Alexa, Apple Podcasts, Castbox, Google Podcasts, Spotify, …). The apps do not need to cooperate and open their data to hosting providers: publishers/podcasters can provide their credentials. WARNING: data from podcast applications is not necessarily more “reliable”, and those statistics can also be manipulated using other methods. But if the differences between them and the figures from the podcast host are too large, it is an indicator that can raise a red flag, a suspicion of cheating.
  • Analyze the number of partial requests. If there is a very high share of partial downloads, especially ones equivalent to only a few minutes of audio, it is likely an attempt to cheat. However, for short episodes, the traffic will still look legitimate, because there would only be full downloads. (A sketch of this check and the next one follows the list.)
  • Analyze the distribution of traffic sources. Of course, it is possible to carry out the cheat with several 4G subscriptions from several operators. But if almost all downloads come from mobile network IPs, it is suspicious. I cannot find recent figures: what is the current proportion of Internet traffic over mobile data versus wired/home Internet, satellite, etc.? Note also that such a cheat could be done with VPNs instead of 4G. In that case there is no longer any limit on available data, so it can be done by downloading the episodes in full. It would then make sense to build a list of IPs coming from VPNs and exclude these addresses (or raise a red flag if traffic from these IPs is too large). That is a complicated job, and it needs to be done on a daily basis; I am aware of that.
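
As a starting point, here is what the two log-based checks could look like in code (a sketch only; the thresholds are arbitrary and the mobile/VPN lookup is a placeholder that would require maintained IP databases):

```python
def partial_download_flag(log_entries, episode_sizes, threshold=0.5):
    """Red flag if, on average, downloads cover less than `threshold` of the file size."""
    ratios = [min(e["bytes_sent"] / episode_sizes[e["file"]], 1.0) for e in log_entries]
    return sum(ratios) / len(ratios) < threshold

def traffic_source_flag(log_entries, is_mobile_or_vpn, threshold=0.9):
    """Red flag if almost all downloads come from mobile-network or VPN IP addresses."""
    flagged = sum(1 for e in log_entries if is_mobile_or_vpn(e["ip"]))
    return flagged / len(log_entries) > threshold
```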

So we have a set of imperfect suspicion indicators, but no clear methodology to get an exact number of downloads that resists this type of fraud. But maybe that is precisely the problem: the desire to publish only raw figures, without context, without qualifying the audience.
Audience metrics are essentially used to structure the advertising market, with the aim of harmonization, but above all of transparency. So why not keep the IAB rules, which can hardly be improved anyway, and complement the number of downloads with a sort of audience qualification “score”?

A system that would give a result like:

=> You have XXX downloads, with XX% confidence. We estimate the number of *legit* downloads to be between XXX and XXX

(*this is an estimate, according to our audience qualification rules. You can consult the open methodology, freely accessible, to understand the results by yourself)

[[Client-side correlation::Red flag]] The publisher has provided its credentials for the following podcast applications: XX, XX, XX. Based on the available data (from APIs and/or proprietary dashboards), we found large differences between the server-side numbers and those provided by the different podcast applications.
OR
[[Client-side correlation::Orange/neutral flag]] We do not have access to client-side data / we only get it from the XX podcast app. The publisher has not set up connections and/or there is no available data. There is a risk of cheating.

[[Partial downloads::Red flag]] Downloads represent on average XX% of the episode sizes, which seems far too low.
OR
[[Short episodes::Orange flag]] The podcast only contains media files with small sizes, which can be a vector for fraud.

[[Diversity of network traffic sources::Red flag]] We measured a large number of downloads from mobile networks and/or VPNs.

(… and many other indicators & classifications to define. I give examples with ‘red flags’, but there would also be ‘green flags’ for healthy qualification scores)
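
One possible (entirely hypothetical) shape for such a qualified report, with invented values:

```python
from dataclasses import dataclass, field

@dataclass
class AudienceReport:
    downloads: int                      # raw IAB-style download count
    legit_estimate: tuple               # (low, high) estimated range of legit downloads
    confidence: float                   # e.g. 0.8 for "80% confidence"
    flags: dict = field(default_factory=dict)  # indicator name -> "green" / "orange" / "red"

report = AudienceReport(
    downloads=30_000,
    legit_estimate=(500, 3_000),
    confidence=0.6,
    flags={"client-side correlation": "red",
           "partial downloads": "red",
           "traffic sources": "red"},
)
```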

This is far from perfect, but it provides more confidence in the numbers and clearly indicates what you can expect from the measured audience. The rules of calculation and scoring must be established with care and total transparency; collegial work is needed.

Download numbers and “qualification scores” must be explicit, and anyone should be able to reproduce the results from the server logs. Or it might extend Podsights’ “oDL” project.

Contact

Anthony GOURRAUD, Media Innovation Engineer.
French “Creative technologist”, passionate about radio (broadcast & podcasting).
I produce a monthly podcast about radio: Des Ondes Vocast. Each episode covers news, innovation and technology for the radio and podcast industry, along with a historical segment accompanied by archives.
Mail: anthony@vocast.fr
