Challenges of Offering an API


Last week, I dove into my ideal customer. Now that I have chosen that elusive ideal customer profile, there are consequences. I have found the right people to build for, but what do they need? I’ll walk you through my product challenges today.

Let’s recap. I decided to turn Podscan into the most comprehensive podcast data platform it can be. My ideal customer is anyone who wants to build a product, a service, or a business on top of that data platform.

And that means that I’m selling something that is extremely easy to copy, clone, and abuse. The thing that I want to freely give to my paying customers —transcripts, rankings, metadata, and all other kinds of things related to podcasts— is also the thing I have to protect at all costs.

That’s the bizarre thing about APIs: the easier it is to grab data, the more people want to use them. Yet the easier it is to grab a LOT of data, the riskier it gets for the business offering it.


There are a few problematic kinds of behaviors that software businesses have to contend with, and they are exacerbated for an API-centric business:

Scraping

The biggest threat is someone just grabbing the whole database in one go. Every single podcast, every single transcript, all connections, all ratings, the whole thing. Duplicating a valuable database is what the Internet was built for. Every time we visit a website, a small copy is made on our computers, and most of the time, website owners want that. That’s how it works. But a fully-fledged database that costs hundreds of hours and tens of thousands of dollars to create?

Yeah, not so much.

So I need to prevent this. From the start, I need to stay ahead of those who would want to siphon this treasure trove into their own systems. With that in mind, I need to think defensively in a few ways:

I need to make it hard to iterate over my database entries. If you’re downloading record #4287, you know that there probably is a #4288 as well, so a scraper could be automated to grab every single record in sequence. That’s why I created encoded IDs in my API, just like Stripe, that both obfuscate the underlying ID and make the record more recognizable. Podcast #4287 turns into pod_a8625b, something that looks more like a podcast and less like a random number. If someone were to get their hands on a list of these, of course, they could still scrape them, but all this needs to do is deter people from seeing an easy opportunity.
Any API I offer needs to be severely rate-limited. Podcast information, particularly historical data, doesn’t change after the fact. Even with mild scraping, someone could eventually explore the whole API within a few months. That’s where rate limits come in. My trial plan allows a measly 100 requests per day. For a scraper, this is used up within seconds. For someone evaluating the product, it’s more than enough. Paid plans get liberal but still sensible limits. If someone needs more, they can get in touch and buy an enterprise plan. For everyone else, these limits will be sufficient, and if they’re not, I can modify them as I learn more.
Finally: no freemium! I cannot and will not allow non-paying customers to access this data. If they can’t afford the $19/month plan, they can’t have it. People go to great lengths to automate account creation and data extraction in freemium products. Not going to happen here: Podscan is pay to play.
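To make the first two defenses a bit more concrete, here is a minimal Python sketch. It is not Podscan’s actual implementation: the multiplier, the XOR key, the function and class names, and the "essentials" limit are all assumptions made up for illustration. Only the "pod" prefix and the 100-requests-per-day trial limit come from the text above. The ID scheme is a simple reversible scramble (multiply by an odd number modulo 2**32, then XOR), which hides the sequence but is obfuscation, not encryption.

```python
from collections import defaultdict
from datetime import date, timedelta

# --- Encoded IDs (all constants below are illustrative assumptions) --------
MASK = 2**32 - 1                      # this sketch only supports IDs below 2**32
MULTIPLIER = 1_580_030_173            # any odd number is invertible mod 2**32
INVERSE = pow(MULTIPLIER, -1, 2**32)  # modular inverse, needs Python 3.8+
SECRET = 0x5BF03635                   # hypothetical XOR key, kept server-side


def encode_id(internal_id: int, prefix: str = "pod") -> str:
    """Turn a sequential database ID into a prefixed, non-sequential token."""
    scrambled = ((internal_id * MULTIPLIER) & MASK) ^ SECRET
    return f"{prefix}_{scrambled:08x}"


def decode_id(public_id: str, prefix: str = "pod") -> int:
    """Recover the database ID from a token produced by encode_id."""
    tag, _, hex_part = public_id.partition("_")
    if tag != prefix or not hex_part:
        raise ValueError(f"not a {prefix} ID: {public_id!r}")
    return ((int(hex_part, 16) ^ SECRET) * INVERSE) & MASK


# --- Daily request quota ----------------------------------------------------
# "trial" is the limit mentioned in the article; "essentials" is a made-up value.
DAILY_LIMITS = {"trial": 100, "essentials": 2_000}


class DailyQuota:
    """Count requests per API key and calendar day, in memory."""

    def __init__(self, limits: dict):
        self.limits = limits
        self.used = defaultdict(int)  # (api_key, day) -> requests served

    def allow(self, api_key: str, plan: str, today: date = None) -> bool:
        """Return True and count the request if the key still has quota."""
        day = today or date.today()
        if self.used[(api_key, day)] >= self.limits[plan]:
            return False  # caller would answer with HTTP 429
        self.used[(api_key, day)] += 1
        return True
```

In this sketch, neighboring database IDs produce tokens that look unrelated, and decode_id(encode_id(n)) returns n again for any n below 2**32. A real service would keep the counters in a shared store rather than in process memory, so that every API server sees the same count.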

Copycats

I do this mostly because the easiest part of Podscan that a copycat founder could clone is the interface. The complicated and expensive stuff is all in the backend and the database. And that’s what people are after.

And product limitations aren’t the only barriers I can throw into their path.

Of course, I drafted terms & conditions for the API. I had them in place before I even activated it. The first sentence of these terms should make it absolutely clear what’s okay and what is not: “You can not use the Podscan API [to] create an application or service that competes directly with Podscan’s core products.”

I also added a few sentences about storing the data, which is also not allowed unless it’s meant for immediately serving their customers. That’s a limitation that every API user agrees to when connecting to the Podscan APIs.

The Problem with Limiting Access

When you limit access like this, you also limit opportunity, and that’s the hard balance to strike here. I want my users to feel they can build anything they want on top of the APIs, but I also want to very much stay in control of the data that powers these products.

I got a message earlier this week on my helpdesk chat widget from a founder who wondered just how much they could cache the data they receive from the API. Is a few seconds fine? Can they go into a cache to be sent out in an email later that day?

It got quite specific, and it reminded me just how much of running a software business is “just-in-time” decision-making. I found a solution that both the user and I were happy with, and we took it from there (after all, the phrase “we may be able to offer an exemption to these rules in certain circumstances” is part of the terms & conditions too).

The more data Podscan ingests, transcribes, and analyzes, the more critical these choices and partnership agreements will become. Right now, my users have personal access to me (and often a personal history from prior conversations on Twitter). But some day, these will be bigger and bigger businesses trying to get their hands on as much as they can.

What to Share and What to Hide

Which brings me to another conundrum. There are some kinds of data that I collect from a wide variety of sources that I might not want to share on the API at all. Audience size data is one of the best-kept secrets of the podcasting world. No hosting provider, no podcast player creator gives away even a glimpse at the actual numbers behind the podcasts they work with. The only people who know how many listeners they have are the owners of the podcasts themselves. And they don’t share.

In such a situation, what does one do? Guesstimates! One checks the Apple Podcast charts, looks for review counts and the size of social media profiles, and then compiles them into some kind of score. Podchaser does this, as does ListenNotes, and I’m working on something similar.
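As a rough illustration of what “compiling them into some kind of score” could look like, here is a small Python sketch. It is not how Podchaser, ListenNotes, or Podscan actually compute anything: the function name, the weights, and the saturation points are all assumptions chosen only to show the shape of such a guesstimate.

```python
import math
from typing import Optional


def reach_score(chart_rank: Optional[int], review_count: int,
                social_followers: int) -> int:
    """Blend three public signals into a coarse 1-10 score.

    The weights (0.4 / 0.3 / 0.3) and the saturation points are
    illustrative assumptions, not measured values.
    """
    # Chart position: rank 1 counts fully, lower ranks decay; None = not charting.
    chart = 0.0 if chart_rank is None else 1.0 / math.log2(chart_rank + 1)
    # Log scaling so that 10x more reviews adds a fixed step, capped near 100k.
    reviews = min(math.log10(review_count + 1) / 5, 1.0)
    # Same idea for social following, capped near one million followers.
    social = min(math.log10(social_followers + 1) / 6, 1.0)
    combined = 0.4 * chart + 0.3 * reviews + 0.3 * social
    return max(1, min(10, round(combined * 10)))
```

The log scaling reflects the guess that the jump from 10 to 100 reviews says about as much as the jump from 1,000 to 10,000. Any real version would need to be calibrated against podcasts whose audience numbers are actually known.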

But I could share these metrics on my API. I have a full history of review counts on Apple. Why not add it to the API?

I struggle with this a lot. I want my users to be able to get as much as they can from the platform. But I also want to keep some secret sauce to myself. So I’ve been looking at how other platforms solve this. Most of them just don’t. If anything at all, they share a rough score — a simple ranking like “4/10” or “Top 10%”.

And even that tends to be only available in the more expensive tiers.

I think that’s what I’ll do with Podscan. Gathering audience information is probably the most expensive non-AI work I do for Podscan. It involves constantly scanning the web and parsing websites. Occasionally, I need proxies to reliably get results. And that has a cost.

For that reason, I think I’ll make anything indicating reach, audience, or listener data a Premium-and-higher feature. The API will not return these fields for Essentials customers and only return example data or rounded numbers for trial accounts. I’ll have to figure out how I can communicate this in the documentation and inside the product, but I think that’s the way forward. If it costs me to create, it should cost to consume.
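Here is a minimal Python sketch of that kind of plan-based gating. The field names, the lowercase plan identifiers, and the rounding rule are assumptions for illustration. The only parts taken from the text are the tiers themselves: Premium and higher see everything, Essentials sees no audience fields, and trial accounts see rounded numbers.

```python
# Hypothetical field names for anything indicating reach or audience.
AUDIENCE_FIELDS = {"audience_size", "reach_score"}
# Plan identifiers are assumed; "Premium-and-higher" maps to these two here.
FULL_ACCESS_PLANS = {"premium", "enterprise"}


def round_down(n: int) -> int:
    """Keep only the leading digit, e.g. five-digit numbers become multiples of 10000."""
    if n <= 0:
        return 0
    magnitude = 10 ** (len(str(n)) - 1)
    return (n // magnitude) * magnitude


def shape_podcast(record: dict, plan: str) -> dict:
    """Return the view of a podcast record that a given plan may see."""
    if plan in FULL_ACCESS_PLANS:
        return dict(record)
    # Everyone else loses the audience fields entirely...
    visible = {k: v for k, v in record.items() if k not in AUDIENCE_FIELDS}
    # ...except trial accounts, which get coarse, rounded values as a preview.
    if plan == "trial":
        for field in AUDIENCE_FIELDS:
            if field in record:
                visible[field] = round_down(record[field])
    return visible
```

Unknown plan names fall through to the most restrictive view in this sketch, so a typo in a plan identifier fails closed rather than leaking the expensive fields.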

Of course, I’ll have to make sure that all these limitations and protections are also present in the user-facing website. Scraping often happens right at that level, and I can already feel that my eagerness to present all kinds of interesting data might lead to a kind of data extraction that isn’t easily fought with rate limits and IP blocks.

No doubt I’ll run into other API- and data-related issues in the future. You might even think of one that I missed right now. Please feel free to send me a Twitter DM or an email at arvid@podscan.fm. I really appreciate all the wonderful feedback I have been getting over the last week as I’ve shared the Podscan journey in public.


Published by Arvid Kahl
