The part I didn’t think about
Using an external API felt like the right move early on. I didn’t have to deal with models, which meant no infra, no optimization, and no real headaches. Just call the API and move on.
But I ignored a few things. I didn’t control the system. I had no fallback. I couldn’t tweak how it behaved.
It worked because everything was stable. The moment it wasn’t, I had a problem.
When it broke
Moderation sat right in the middle of my flow. Every post and comment went through it.
When the API stopped being usable, I couldn’t safely accept content. I had no backup system, and the pipeline was effectively broken.
This wasn’t a bug. It was a missing system.
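In hindsight, even a thin fail-closed wrapper would have kept the pipeline defined when the API went away. A minimal sketch of that missing layer — the function names and the fail-closed policy are my illustration, not code from the original service:

```python
from typing import Callable, Dict

Scorer = Callable[[str], Dict[str, float]]


def moderate_with_fallback(text: str, primary: Scorer, fallback: Scorer) -> Dict[str, float]:
    """Try the external API first; fall back to a local scorer if it fails.

    Each scorer maps text to a dict of label -> score. If every scorer
    fails, fail closed: report maximum toxicity so the content is held
    rather than silently published.
    """
    for scorer in (primary, fallback):
        try:
            return scorer(text)
        except Exception:
            continue  # try the next scorer in line
    return {"toxicity": 1.0}  # fail closed: hold the content for review
```

The point isn't the specific policy — it's that the decision of what happens when the scorer is unavailable exists at all.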
Looking for alternatives
My first instinct was simple. Replace the API.
I checked a few options. Some were expensive. Some were slow. Some didn’t give enough control. Nothing felt like a clean fit.
At that point, I had two choices. Keep depending on another service, or take ownership.
I went with the second.
Moving to a self-hosted model
I switched to a local model and picked Detoxify, a BERT-based model from Unitary trained for toxicity detection.
This solved one problem. I was no longer dependent on an API.
But it created another one. Now I had to serve the model.
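For reference, Detoxify exposes a simple predict interface that returns a dict of scores per label. Wrapping it behind one function kept the rest of the service ignorant of the backend — a sketch, with the lazy import and helper names being my own choices rather than the original code:

```python
def load_scorer():
    """Load the Detoxify model once at startup.

    Imported lazily here so the pure helper below can be read and
    tested without the package (or its model weights) installed.
    """
    from detoxify import Detoxify  # pip install detoxify

    model = Detoxify("original")  # BERT-based toxicity model
    # For a single string, predict returns e.g.
    # {"toxicity": 0.97, "insult": 0.91, ...}
    return model.predict


def is_toxic(scores: dict, threshold: float = 0.5) -> bool:
    """Flag content if any label crosses the threshold."""
    return any(value >= threshold for value in scores.values())
```

Loading the model once and passing the bound `predict` around matters on small machines: model load is the expensive step, inference is the cheap one.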
The real problem started here
Running a model on your machine is easy. Running it in a backend is different.
I had to deal with model size, inference time, CPU limits, and memory limits. I wasn’t using any high-end infra either. This had to run on a free tier.
At this point, moderation was no longer just a feature. It became a system I had to design.
Making it fast enough
The raw PyTorch model was too heavy. Inference was slow, and memory usage was high.
So I moved to ONNX.
I didn’t use a prebuilt export. I created my own using ONNX and onnx-script. This gave me more control over how the model runs.
After that, inference got faster, memory usage dropped, and performance became predictable.
Now it was usable.
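Serving the exported graph then comes down to an onnxruntime session. Roughly — the input names, the single-output assumption, and the sigmoid post-processing reflect my export, not a universal recipe:

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability; the exported graph emits logits."""
    return 1.0 / (1.0 + math.exp(-x))


def make_session(path: str = "detoxify.onnx"):
    """Create a CPU-only inference session.

    Imported lazily so the pure helper above stays testable without
    onnxruntime installed.
    """
    import onnxruntime as ort  # pip install onnxruntime

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def score(session, input_ids, attention_mask):
    """Run one forward pass; input names must match the exported graph."""
    (logits,) = session.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )
    return [sigmoid(x) for x in logits[0]]
```

Pinning the provider to `CPUExecutionProvider` makes the no-GPU constraint explicit instead of something discovered at runtime.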
Deploying it on cheap infra
I deployed the service on the Oracle Ampere free tier. No GPU, limited CPU, and tight memory.
So everything had to be efficient.
The ONNX version made it possible to run inference within those limits. It wasn’t perfect, but it was stable, and that mattered more.
Current flow
This is what it looks like now.
A user submits content. It goes to the moderation service. The model checks toxicity, and the system decides whether to allow or block it.
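That flow reduces to a small function with an injectable scorer — the names and the max-over-labels policy are my sketch of it:

```python
from typing import Callable, Dict


def handle_submission(
    text: str,
    scorer: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> str:
    """Run content through the moderation model and decide its fate.

    The scorer returns per-label scores; the decision keys off the
    worst one.
    """
    scores = scorer(text)
    worst = max(scores.values())
    return "block" if worst >= threshold else "allow"
```

Taking the scorer as a parameter is what made the API-to-local-model swap a one-line change instead of a rewrite.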
The flow is simple. The work behind it is not.
This is just one layer of my moderation system, not the entire thing.
What changed for me
Before this, I treated APIs as building blocks.
Now I think differently. If something is core to the product, I want control over it.
External services are still useful, but not for everything.
What I would improve next
There are still gaps.
I’d move moderation to async queues, batch requests instead of handling them one by one, add proper metrics and monitoring, and explore smaller models to reduce cost.
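Batching, the first item on that list, is mostly a matter of draining a queue into fixed-size chunks so one forward pass scores several submissions. A minimal sketch, with the batch size purely illustrative:

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Group pending texts into fixed-size batches.

    Each yielded batch can go through the model in a single inference
    call instead of one call per item.
    """
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```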
This system works. It can be better.
Closing
I started with one API call.
Now I have a moderation service.
I didn’t plan for it. I was forced into it.
That’s how most real systems get built.