Building a Notification Service: Top Technical Mistakes

Sending product-to-user notifications looks easy. This blog post lists the critical technical mistakes software teams make when implementing their notification service.

October 17, 2022

Product-to-user notifications seem easy, right? Integrate an email or push service, and you are done.

Well... not really.

This blog post lists the top technical mistakes software teams make when implementing their notification service, followed by their solutions!

Summary (mistake => consequence)

  1. Not implementing a suppression list => Getting banned
  2. Poorly thought-out automated tests => Getting banned
  3. Not implementing user email verification => Getting banned
  4. Not implementing notification preferences => Getting banned
  5. Not separating notifications from business logic => Performance issues
  6. Not implementing a job/queue mechanism => Notifications not being sent
  7. Lack of monitoring and control => Angry users
  8. Poor sender verification => Reduced deliverability and throttling

Let's get into it:

Not implementing a suppression list

Consequence: Getting banned on SES, Twilio, SendGrid, MailGun, or similar

Let's understand two simple concepts first:

  • Bounce: When you send an email/text to an address that doesn't exist
  • Complaint: When a user complains about your notification (this happens even with users who signed up for your software)

Notification delivery services (SES, SendGrid, Twilio, ...) monitor your bounces and complaints and can temporarily or permanently ban your account. You may be surprised by how easy it is to reach their thresholds. For example, AWS SES recommends keeping your complaint rate below 0.1%. That means you hit the threshold with just ONE complaint out of A THOUSAND emails.
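To make the arithmetic concrete, here is a minimal sketch of a complaint-rate check. The function names and the default threshold of 1 in 1,000 are assumptions for illustration; use the numbers your own provider documents.

```python
# Hypothetical helpers: the names and the default threshold are illustrative only.
def complaint_rate(complaints: int, delivered: int) -> float:
    """Return complaints as a fraction of delivered messages."""
    if delivered <= 0:
        return 0.0
    return complaints / delivered


def at_or_above_threshold(complaints: int, delivered: int,
                          threshold: float = 1 / 1000) -> bool:
    """True when the complaint rate has reached or passed the threshold."""
    return complaint_rate(complaints, delivered) >= threshold
```

With these defaults, one complaint in a thousand emails already trips the check, while one complaint in two thousand does not.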

What to do:

A suppression list is a list of email addresses or phone numbers that your notification service will never use again. Add every address that produced a bounce or complaint to this list so it cannot produce another one. Some services, such as SES or SendGrid, have a built-in suppression list, which you should configure. NotificationAPI comes with a suppression list automatically.

If you are not using any of these, consider implementing a suppression list using a database table.
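As a rough idea of what that table could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are all assumptions made for this example, not part of any provider's API.

```python
import sqlite3


def open_suppression_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) suppression table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS suppression_list (
            address    TEXT PRIMARY KEY,
            reason     TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def _normalize(address: str) -> str:
    # Store addresses trimmed and lowercased so lookups are consistent.
    return address.strip().lower()


def suppress(conn: sqlite3.Connection, address: str, reason: str) -> None:
    """Add an address after a bounce or complaint; duplicates are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO suppression_list (address, reason) VALUES (?, ?)",
        (_normalize(address), reason),
    )
    conn.commit()


def is_suppressed(conn: sqlite3.Connection, address: str) -> bool:
    """Check this before every send."""
    row = conn.execute(
        "SELECT 1 FROM suppression_list WHERE address = ?",
        (_normalize(address),),
    ).fetchone()
    return row is not None
```

The important part is the call order: your sending code asks is_suppressed first and skips the send when it returns True.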

Poorly thought-out automated tests

Consequence: Getting banned

What do automated tests and notifications have in common? Test users! If you have integration tests, stress tests, or end-to-end tests, chances are you are creating new users in your tests. What happens if these test users use imaginary email addresses or phone numbers? You get BOUNCES at every test run!

What to do:

Check your automated tests to ensure they use real email addresses and phone numbers. Alternatively, add your test users to the suppression list.
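A variation on the second option is to treat any address on a reserved test domain as suppressed, so test users never reach your provider. The sketch below assumes your tests create users on the domains that RFC 2606 reserves for examples and testing; the function name is hypothetical.

```python
# Domains and TLDs reserved for documentation and testing (RFC 2606).
RESERVED_TEST_DOMAINS = ("example.com", "example.org", "example.net")
RESERVED_TEST_TLDS = (".test", ".invalid", ".example", ".localhost")


def is_test_address(email: str) -> bool:
    """Return True when the address belongs to a reserved test domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in RESERVED_TEST_DOMAINS or domain.endswith(RESERVED_TEST_TLDS)
```

Your sending code can then skip the provider call whenever is_test_address returns True, while the rest of the test flow runs as usual.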

Not implementing user verification

Consequence: Getting banned

Users can sign up for new services using fake or temporary email addresses. If you don't verify new users' email addresses and continue to send notifications to them, you will have more bounces.

What to do:

Enabling user verification is pretty straightforward if you use an Authentication-as-a-Service solution like Auth0 or Cognito. If you don't (you really should), consider implementing user verification using a service such as Twilio Verify.
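If you are curious what such a service does for you, the core idea is a signed, expiring token that you email to the user and check when they click the link. The sketch below is only an illustration of that idea under assumed names (SECRET_KEY, make_token, verify_token); it is not how Auth0, Cognito, or Twilio Verify are implemented.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption for this sketch: a server-side secret loaded from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(email: str, expires: int) -> str:
    message = f"{email.strip().lower()}|{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(email: str, ttl_seconds: int = 86400,
               now: Optional[float] = None) -> str:
    """Create a token of the form '<expiry>.<signature>' for the given email."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    return f"{expires}.{_sign(email, expires)}"


def verify_token(email: str, token: str, now: Optional[float] = None) -> bool:
    """Return True only for an unexpired token that was signed for this email."""
    try:
        expires_text, signature = token.split(".", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current > expires:
        return False  # token expired
    expected = _sign(email, expires)
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Only mark the address as verified, and only start sending notifications to it, once verify_token returns True.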

Not implementing notification preferences

Consequence: Getting banned (again)

The suppression list only stops repeat complaints from the same address. Ideally, you don't frustrate your users into reporting you as spam in the first place. The easiest solution is to include an unsubscribe option in your notifications or to implement a full-on user notification preferences page.

What to do:

Refer to our blog on Notification Service Design for architectural insights and diagrams for this. If you use NotificationAPI, it comes with the unsubscribe option and user notification preferences out-of-the-box.
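At its simplest, a preferences check is a lookup before every send. The sketch below stores opt-outs per notification type and channel; the class and method names are assumptions for illustration, and a real service would persist this per user.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Preferences:
    """Hypothetical per-user preferences: a set of opted-out (type, channel) pairs."""
    opted_out: Set[Tuple[str, str]] = field(default_factory=set)

    def unsubscribe(self, notification_type: str, channel: str) -> None:
        self.opted_out.add((notification_type, channel))

    def resubscribe(self, notification_type: str, channel: str) -> None:
        self.opted_out.discard((notification_type, channel))

    def allows(self, notification_type: str, channel: str) -> bool:
        """Call this before sending; skip the notification when it returns False."""
        return (notification_type, channel) not in self.opted_out
```

Keying on both type and channel lets a user silence the weekly digest email without losing, say, security alerts.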

Not separating notifications from business logic

Consequence: Performance Issues

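As you may have noticed, notifications require their own logic and network calls, e.g., checking user preferences and suppression lists. Running all of that inline slows down the business operation that triggers the notification. Separating the notifications also has two other benefits: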

  • In most cases, you want your business logic to continue even if the notification network calls or logic fails
  • A cleaner, more maintainable codebase

What to do:
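One way to act on the first benefit is to route every notification through a single wrapper that catches failures, so the business operation finishes either way. This is a minimal sketch; notify_safely, place_order, and the order dictionary are hypothetical names for illustration.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("notifications")


def notify_safely(send: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Run a notification call and report success without ever raising."""
    try:
        send(*args, **kwargs)
        return True
    except Exception:
        # Log the failure for monitoring, but do not break the caller.
        logger.exception("Notification failed; business logic continues")
        return False


def place_order(order_id: str, send_confirmation: Callable[[str], Any]) -> dict:
    """Hypothetical business operation that must succeed even if email is down."""
    order = {"id": order_id, "status": "placed"}  # the actual business logic
    order["notified"] = notify_safely(send_confirmation, order_id)
    return order
```

The same boundary is a natural place to later swap the direct call for a message on a queue, which also takes the network calls out of the request path.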

Not implementing a job/queue mechanism

Consequence: Notifications not being sent

As your user base grows and your notifications grow in number and complexity, you will be putting a lot of stress on your notification service. For example, it is common for software companies to implement a weekly digest. At some point, sending a weekly digest to X thousand users grows beyond the capability of one worker/function, resulting in function timeouts or out-of-memory errors.

What to do:

If you plan to implement a notification service, it's easier to implement a job/queue mechanism at the beginning. If you already have a notification service in place, make sure to implement some monitoring to at least know when you need to refactor. This brings us to the next point:
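To illustrate the idea, here is a minimal in-process sketch that splits a large send into batches and lets several workers drain a queue. The function names, batch size, and worker count are assumptions for this example, and a production setup would normally use a durable external queue instead of an in-memory one.

```python
import queue
import threading
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_digest(user_ids: Sequence[int],
               send_batch: Callable[[List[int]], None],
               batch_size: int = 100,
               workers: int = 4) -> List[List[int]]:
    """Send to all users in batches; return the batches that failed."""
    jobs: "queue.Queue[List[int]]" = queue.Queue()
    for batch in chunked(user_ids, batch_size):
        jobs.put(batch)

    failed: List[List[int]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                batch = jobs.get_nowait()
            except queue.Empty:
                return  # no work left
            try:
                send_batch(batch)
            except Exception:
                # Keep the failed batch so it can be retried or inspected later.
                with lock:
                    failed.append(batch)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return failed
```

Because each job is a small batch, no single worker has to hold the whole user list or finish the whole digest within one timeout.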

Lack of monitoring and control

Consequence: Angry users

Here's an easy test to know if you have good monitoring in place:

When an issue happens - let's say with notifications - is the engineering team the first to know?

Here are the two common problems we keep seeing with teams implementing their own notification service:

  • Notifications not being sent - happens due to a bug or hitting a service quota
  • Users being bombarded by notifications due to faulty code - often caused by handling notification failures poorly

What to do:

  • Set up alerts on your outgoing notifications, which should be triggered when the volume is too low or too high
  • Create a kill switch so you can stop sending out notifications when there is faulty code bombarding people with notifications
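Both ideas can start out very small. The sketch below counts sends per time window, reports when the count falls outside an expected range, and refuses to send while a kill switch is on. The class name, thresholds, and alert strings are assumptions; in practice the alerts would go to your paging or chat tool and the switch would live in shared configuration.

```python
from typing import List


class NotificationGuard:
    """Hypothetical guard combining volume alerts with a kill switch."""

    def __init__(self, min_per_window: int, max_per_window: int) -> None:
        self.min_per_window = min_per_window
        self.max_per_window = max_per_window
        self.kill_switch_on = False
        self.sent_in_window = 0

    def allow_send(self) -> bool:
        """Call before each send; returns False while the kill switch is on."""
        if self.kill_switch_on:
            return False
        self.sent_in_window += 1
        return True

    def close_window(self) -> List[str]:
        """Call at the end of each window (e.g. hourly); returns any alerts."""
        alerts = []
        if self.sent_in_window < self.min_per_window:
            alerts.append("volume too low")
        if self.sent_in_window > self.max_per_window:
            alerts.append("volume too high")
        self.sent_in_window = 0  # start counting the next window
        return alerts
```

A "volume too low" alert catches bugs and exhausted quotas, while "volume too high" is your cue to flip kill_switch_on before more users get bombarded.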

Poor sender verification

Consequence: Reduced deliverability and throttling

To reliably send a large volume of notifications, you need to prove who you are to mailbox providers, carriers, and your cloud provider in several ways:

  • DMARC using both DKIM and SPF for email
  • Verifying with The Campaign Registry for telecom (SMS/MMS/Calls)
  • A good reputation and relationship with your cloud provider for increasing quotas
  • A professional website, active social networks, and communication channels (there are almost always manual verification checks in place)
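For the email part, a DMARC policy is published as a DNS TXT record at _dmarc.<your domain>, made of tag=value pairs separated by semicolons. As a small sanity check, the sketch below parses such a record and tells you whether it asks receivers to act on failures. The function names and the sample record are assumptions for illustration, and fetching the record from DNS is left out.

```python
from typing import Dict


def parse_dmarc(record: str) -> Dict[str, str]:
    """Split a DMARC TXT record into a dictionary of its tags."""
    tags: Dict[str, str] = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def enforces_policy(record: str) -> bool:
    """True when the record is DMARC1 and its policy is quarantine or reject."""
    tags = parse_dmarc(record)
    return (tags.get("v") == "DMARC1"
            and tags.get("p", "").lower() in ("quarantine", "reject"))
```

For example, enforces_policy("v=DMARC1; p=none") returns False, since a policy of none only asks for reports and does not tell receivers to reject or quarantine anything.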

Final Thoughts

I am sure you are thinking:

So much work for sending a notification... WTF

Well, that is precisely why we built NotificationAPI. With it, you avoid months of engineering effort and job dissatisfaction, delight your users, and don't leave your business team wondering where your time goes.
