Building a Notification Service: Top Technical Mistakes

Product-to-user notifications seem easy, right? Integrate an email or push service, and you are done.

Well… not really.

This blog post lists the top technical mistakes software teams make when building their own notification service, followed by the solutions!

Summary - mistakes & consequences

Not implementing a suppression list - increasing bounces (and risk of getting banned)
Poorly thought-out automated tests - increasing bounces
Not implementing user email verification - increasing bounces
Not implementing notification preferences - increasing spam complaints
Not separating notifications from business logic - performance issues and risk of outage
Not implementing a job/queue mechanism - notifications not being sent
Lack of monitoring and control - angry users and PMs
Poor sender verification - reduced deliverability and throttling

Let’s get into it:

Not implementing a suppression list

Consequence: Increased bounces and risk of getting banned by SES, Twilio, SendGrid, Mailgun, or similar services

Let’s understand two simple concepts first:

Bounce: When you send an email/text to an address that doesn’t exist
Complaint: When a user complains about your notification (this happens even with users who signed up for your software)

Notification delivery services (SES, SendGrid, Twilio, …) monitor your bounces and complaints and act on them by temporarily or permanently banning your account. You may be surprised by how easy it is to reach their thresholds. For example, AWS SES recommends keeping your complaint rate below 1/1000 (0.1%). That means ONE complaint for every THOUSAND emails already puts you at the limit.
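To make the arithmetic concrete, here is a minimal Python sketch that computes a complaint rate and compares it to the 1-in-1,000 figure quoted above. The function names, and the choice to treat "at the limit" as already over, are our own assumptions. Check your provider's documentation for its exact thresholds and how it measures them.

```python
# Hypothetical threshold: the 1/1000 (0.1%) complaint ratio quoted above for AWS SES.
COMPLAINT_THRESHOLD = 0.001

def complaint_rate(complaints: int, sent: int) -> float:
    """Return complaints as a fraction of messages sent (0.0 if nothing was sent)."""
    if sent <= 0:
        return 0.0
    return complaints / sent

def at_or_over_threshold(complaints: int, sent: int,
                         threshold: float = COMPLAINT_THRESHOLD) -> bool:
    """True when the complaint rate has reached the threshold."""
    return complaint_rate(complaints, sent) >= threshold

print(complaint_rate(1, 1000))        # 0.001
print(at_or_over_threshold(1, 1000))  # True  -> one complaint in a thousand is already the limit
print(at_or_over_threshold(1, 2000))  # False
```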

What to do:

A suppression list is a list of email addresses or phone numbers that your notification service will never use again. You can add addresses that produced a bounce/complaint to this list to avoid getting more bounces/complaints. Some services, such as SES or SendGrid, have built-in suppression lists, which you should configure. NotificationAPI comes with a suppression list automatically.

If you are not using any of these, consider implementing a suppression list using a database table.
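A database-backed suppression list can be very small. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration. The table name, the helper functions, and the case-insensitive normalization are all hypothetical choices, not a prescribed schema. The key idea is that bounce/complaint handling writes to the table, and every send checks it first.

```python
import sqlite3

def create_suppression_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS suppression_list (
            address  TEXT PRIMARY KEY,  -- normalized email address or phone number
            reason   TEXT NOT NULL,     -- e.g. 'bounce' or 'complaint'
            added_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )

def normalize(address: str) -> str:
    # Assumption: addresses are compared case-insensitively after trimming whitespace.
    return address.strip().lower()

def suppress(conn: sqlite3.Connection, address: str, reason: str) -> None:
    """Record a bounced/complaining address (e.g. from a provider webhook handler)."""
    # INSERT OR IGNORE keeps the first recorded reason if the address is already listed.
    conn.execute(
        "INSERT OR IGNORE INTO suppression_list (address, reason) VALUES (?, ?)",
        (normalize(address), reason),
    )
    conn.commit()

def is_suppressed(conn: sqlite3.Connection, address: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM suppression_list WHERE address = ?", (normalize(address),)
    ).fetchone()
    return row is not None

def send_unless_suppressed(conn: sqlite3.Connection, address: str, send) -> bool:
    """Check the list before every send; returns False if the send was skipped."""
    if is_suppressed(conn, address):
        return False
    send(address)
    return True

conn = sqlite3.connect(":memory:")
create_suppression_table(conn)
suppress(conn, "Bounced@Example.com", "bounce")
print(is_suppressed(conn, "bounced@example.com"))                  # True
print(send_unless_suppressed(conn, "bounced@example.com", print))  # False
```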


Poorly thought-out automated tests

Consequence: Increased bounces and risk of getting banned

What do automated tests and notifications have in common? Test users! If you have integration tests, stress tests, or end-to-end tests, chances are you are creating new users in your tests. What happens if these test users use imaginary email addresses or phone numbers? You get BOUNCES at every test run!

What to do:

Make sure your automated tests use real email addresses and phone numbers that your team controls, or add your test users to the suppression list.
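One way to keep test users deliverable is to derive every test address from a real inbox your team owns. The sketch below assumes your mail provider supports plus-addressing (delivering `name+tag@domain` to `name@domain`, as Gmail does). Confirm that yours does. `TEST_INBOX` and `yourcompany.com` are hypothetical placeholders.

```python
import uuid

# Hypothetical: a real inbox your team controls. Assumes the mail provider
# delivers "local+tag@domain" to "local@domain" (plus-addressing).
TEST_INBOX = "qa-team@yourcompany.com"

def make_test_email(inbox: str = TEST_INBOX) -> str:
    """Return a unique, deliverable address for one test user."""
    local, domain = inbox.split("@", 1)
    # A random tag keeps each test user unique without inventing a fake mailbox.
    return f"{local}+test-{uuid.uuid4().hex[:12]}@{domain}"

print(make_test_email())  # e.g. qa-team+test-<12 hex chars>@yourcompany.com
```

Every generated address lands in one mailbox that actually exists, so test runs stop producing bounces.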

Not implementing user verification

Consequence: Increased bounces and risk of getting banned

Users can sign up for new services using fake or temporary email addresses. If you don’t verify new users’ email addresses and continue to send notifications to them, you will have more bounces.

What to do:

Enabling user verification is pretty straightforward if you use an Authentication-as-a-Service solution like Auth0 or Cognito. If you don’t (you really should), consider implementing user verification using a service such as Twilio Verify.
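If you do roll your own email verification, the core is a single-use, expiring token sent in a link. Only mark the address as verified when the link comes back. Below is a minimal sketch using Python's `secrets` and `hashlib`. The 24-hour expiry and the function names are hypothetical. Storing only a hash of the token is a common precaution, not a requirement from this post.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)  # hypothetical expiry window

def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def issue_token(now: datetime) -> tuple:
    """Return (token to email to the user, hash to store, expiry time)."""
    token = secrets.token_urlsafe(32)
    return token, _hash(token), now + TOKEN_TTL

def check_token(presented: str, stored_hash: str,
                expires_at: datetime, now: datetime) -> bool:
    """True only if the token matches the stored hash and has not expired."""
    # compare_digest avoids leaking information through comparison timing.
    return secrets.compare_digest(_hash(presented), stored_hash) and now < expires_at

now = datetime.now(timezone.utc)
token, stored_hash, expires_at = issue_token(now)
print(check_token(token, stored_hash, expires_at, now))                       # True
print(check_token("wrong", stored_hash, expires_at, now))                     # False
print(check_token(token, stored_hash, expires_at, now + timedelta(days=2)))  # False
```

Until `check_token` succeeds, the notification service should treat the address as unverified and skip it.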

Not implementing notification preferences

Consequence: Increased spam complaints and risk of getting banned

The suppression list only stops repeat complaints from the same address. Ideally, you never frustrate users enough that they report you as spam in the first place. The easiest solutions are an unsubscribe option in your notifications or a full user notification preferences page.

What to do:

Refer to our blog on Notification Service Design for architectural insights and diagrams for this. If you use NotificationAPI, it comes with the unsubscribe option and user notification preferences out-of-the-box.
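At its simplest, a preferences model is a per-user lookup keyed by notification type and channel, with a default for anything the user has not touched. The Python sketch below is a hypothetical data shape, not NotificationAPI's model. An unsubscribe link would call `unsubscribe` for the type and channel it was sent on, and the sender would check `allows` before every send.

```python
from dataclasses import dataclass, field

@dataclass
class NotificationPreferences:
    """Hypothetical per-user preferences keyed by (notification type, channel)."""
    opted_in: dict = field(default_factory=dict)

    def allows(self, notification_type: str, channel: str, default: bool = True) -> bool:
        # Anything the user has not explicitly changed falls back to `default`.
        return self.opted_in.get((notification_type, channel), default)

    def unsubscribe(self, notification_type: str, channel: str) -> None:
        # What an unsubscribe link for one type/channel pair would call.
        self.opted_in[(notification_type, channel)] = False

prefs = NotificationPreferences()
prefs.unsubscribe("weekly_digest", "email")
print(prefs.allows("weekly_digest", "email"))   # False
print(prefs.allows("weekly_digest", "push"))    # True
print(prefs.allows("password_reset", "email"))  # True
```

Keying by both type and channel lets a user silence the email digest without losing, say, push alerts.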

Not separating notifications from business logic

Consequence: Performance issues and risk of outage

You will notice that notifications require a lot of logic and network calls; for example, checking user preferences, suppression lists, and account status, or batching into daily/weekly jobs. Separating notification logic from business logic has two benefits:

In most cases, you want your business logic to continue even if the notification network calls or logic fail
A cleaner, more maintainable codebase

What to do:

For a broad picture of notification service design, read our Notification Service Design Guide, or
For the design requirements of a serverless notification service, read our Serverless Notification Service Design article
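The separation can be illustrated in a few lines. The business logic only records that a notification is needed, and a separate consumer does the risky work. In this sketch, Python's in-process `queue.Queue` stands in for a durable message broker, and `place_order`, `process_notifications`, and the event shape are hypothetical. A production system would also retry or dead-letter failed events instead of just counting them.

```python
import logging
import queue

logger = logging.getLogger("notifications")

# Hypothetical in-process queue standing in for a durable message broker.
notification_events: "queue.Queue[dict]" = queue.Queue()

def place_order(order_id: str, user_id: str, orders: dict) -> None:
    """Business logic: save the order and only record that a notification is needed."""
    orders[order_id] = {"user_id": user_id, "status": "placed"}
    # No provider call here, so a provider outage cannot fail the order.
    notification_events.put({"type": "order_placed", "user_id": user_id, "order_id": order_id})

def process_notifications(send) -> int:
    """Notification side: drain pending events and return how many sends failed."""
    failures = 0
    while True:
        try:
            event = notification_events.get_nowait()
        except queue.Empty:
            return failures
        try:
            send(event)  # preference/suppression checks and provider calls belong here
        except Exception:
            failures += 1  # a real system would retry or dead-letter the event
            logger.exception("sending %s failed", event["type"])

# Example: the provider is down, but the order is still placed.
orders: dict = {}
place_order("o-1", "u-42", orders)

def provider_down(event: dict) -> None:
    raise ConnectionError("provider unavailable")

print(orders["o-1"]["status"])               # placed
print(process_notifications(provider_down))  # 1
```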

Not implementing a job/queue mechanism

Consequence: Notifications not being sent

As your user base grows and your notifications grow in number and complexity, you will be putting a lot of stress on your notification service. For example, it is common for software companies to implement a weekly digest. At some point, sending a weekly digest to X thousand users grows beyond the capability of one worker/function, resulting in function timeouts or out-of-memory errors.

What to do:

If you plan to implement a notification service, it’s easier to implement a job/queue mechanism at the beginning. If you already have a notification service in place, make sure to implement some monitoring to at least know when you need to refactor. This brings us to the next point:
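The usual fix for the weekly-digest problem is fan-out. Split the audience into fixed-size jobs, put them on a queue, and let several workers process them, so no single worker has to handle every user. In the sketch below, Python threads stand in for separate workers or functions and `queue.Queue` stands in for a managed queue. The batch size and all names are hypothetical.

```python
import queue
import threading

BATCH_SIZE = 500  # hypothetical: size each job to fit well within one worker's limits

def enqueue_digest_jobs(user_ids: list, jobs: queue.Queue, batch_size: int = BATCH_SIZE) -> int:
    """Split the digest into fixed-size batches; returns the number of jobs enqueued."""
    count = 0
    for start in range(0, len(user_ids), batch_size):
        jobs.put(user_ids[start:start + batch_size])
        count += 1
    return count

def worker(jobs: queue.Queue, send_digest, sent: list, lock: threading.Lock) -> None:
    """Process one batch at a time until the None sentinel arrives."""
    while True:
        batch = jobs.get()
        if batch is None:
            return
        for user_id in batch:
            send_digest(user_id)
            with lock:
                sent.append(user_id)

def run_weekly_digest(user_ids: list, send_digest, num_workers: int = 4) -> list:
    jobs: queue.Queue = queue.Queue()
    sent: list = []
    lock = threading.Lock()
    enqueue_digest_jobs(user_ids, jobs)
    threads = [
        threading.Thread(target=worker, args=(jobs, send_digest, sent, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued behind all batches
    for t in threads:
        t.join()
    return sent

users = [f"user-{i}" for i in range(1234)]
print(len(run_weekly_digest(users, send_digest=lambda user_id: None)))  # 1234
```

Because each job is bounded by `BATCH_SIZE`, adding users adds jobs rather than making any single job longer.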

Lack of monitoring and control

Consequence: Angry users and PMs

Here’s an easy test to know if you have good monitoring in place:

When an issue happens - let’s say with notifications - is the engineering team the first to know?

Here are two common problems we keep seeing with teams implementing their own notification service:

Notifications not being sent, or sent too often, generally due to software or logic bugs
Poor notification UX/UI - this happens when PMs don’t have control over the notification system

What to do:

Set up monitoring dashboards for your outgoing notifications, ideally per notification type
Create controls and visual editors for your non-technical team to improve the notifications
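Both failure modes above show up as volume anomalies per notification type: too few sends (notifications silently not going out) or too many (a loop or logic bug). The minimal Python sketch below counts one time window's sends by type and flags counts outside an expected range. The types, ranges, and alert wording are hypothetical. In practice these checks would live in your metrics and alerting tool.

```python
from collections import Counter

# Hypothetical expected ranges (min, max) per notification type for one hour.
EXPECTED_HOURLY = {
    "password_reset": (1, 500),
    "weekly_digest": (0, 20000),
}

def check_volumes(sent_types: list, expected: dict = EXPECTED_HOURLY) -> list:
    """Return alert messages for notification types outside their expected range."""
    counts = Counter(sent_types)
    alerts = []
    for ntype, (low, high) in expected.items():
        n = counts[ntype]  # Counter returns 0 for unseen types
        if n < low:
            alerts.append(f"{ntype}: only {n} sent (expected at least {low})")
        elif n > high:
            alerts.append(f"{ntype}: {n} sent (expected at most {high})")
    return alerts

print(check_volumes(["weekly_digest"] * 3))
# ['password_reset: only 0 sent (expected at least 1)']
```

Alerting on these checks is what lets engineering, rather than users or PMs, be the first to know.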

Poor sender verification

Consequence: Reduced deliverability and throttling

To reliably send a large volume of notifications, you need to prove who you are to mailbox providers, carriers, and cloud providers in several ways:

DMARC using both DKIM and SPF for email
Verifying with The Campaign Registry for telecom (SMS/MMS/Calls)
A good reputation and relationship with your cloud provider for increasing quotas
A professional website, active social networks, and communication channels (there are almost always manual verification checks in place)
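A DMARC policy is published as a DNS TXT record at `_dmarc.<your-domain>`, for example `v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com`. The Python sketch below is a hypothetical lint for such a record. It checks only the two basics from RFC 7489: `v=DMARC1` must be the first tag, and `p` must be `none`, `quarantine`, or `reject`. It does not look up DNS or verify that SPF and DKIM actually pass and align, so treat it as a sanity check, not a deliverability test.

```python
def parse_dmarc(record: str) -> dict:
    """Split 'tag=value; tag=value' into an ordered dict of lowercased tags."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[key.strip().lower()] = value.strip()
    return tags

def check_dmarc(record: str) -> list:
    """Return a list of problems; an empty list means the basics look right."""
    try:
        tags = parse_dmarc(record)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    # The version tag must come first and read exactly DMARC1.
    if next(iter(tags), None) != "v" or tags.get("v") != "DMARC1":
        problems.append("v=DMARC1 must be the first tag")
    if tags.get("p") not in {"none", "quarantine", "reject"}:
        problems.append("p= must be none, quarantine, or reject")
    return problems

print(check_dmarc("v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"))  # []
print(check_dmarc("p=reject; v=DMARC1"))  # ['v=DMARC1 must be the first tag']
```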

Final Thoughts

I am sure you are thinking:

Really? Is building a good notification system this much work?

Let me ask you a question: Are you happy with the notifications you receive from reputable applications you use?

NotificationAPI exists because this is difficult. Most engineering teams we talk to have spent weeks, if not months, building a disappointing notification feature. Delight your users and avoid having your business team wonder where your time went.
