Serverless Notification Service Design in AWS - with diagrams

In this article, we design a fully Serverless Notification Service using AWS to send product-to-user notifications. The design is fully scalable and extensible, with minimal cost.

Engineering | 13 min read | June 7, 2022

Background and Motivation

We discussed why some software development teams consider an internal or external notification service in our Ultimate Guide to Notification Services, including reasons such as:

  • Complexity of notifications (number of channels, number of notifications) demanding a service-oriented approach
  • Reliability and scalability
  • Creating internal tools for observability and monitoring

We also proposed a generic Notification Service Design compatible with any cloud provider, with critical technical constraints and business requirements. This article proposes a Serverless design for a notification microservice with AWS services, given the same objectives mentioned in the previous article.

High-Level Architecture

[Figure: Serverless Notification Service Design]

Step by Step Architecture:

1. POST /send

  • The /send end-point is a Lambda function with AWS's new Function URL capability.
  • The Function URL authorizes incoming requests based on the IAM role of the caller. This lets you skip API Gateway and avoid writing any authorization code. So, for example:
    A) Your code runs on EC2 with a specific IAM Role
    B) You modify the policy on your EC2 role to allow invoking the /send end-point from your code
    C) When making an API call to /send from your code, you follow this article to sign your requests
  • The request to this end-point contains the userId of the user. This way, the service that sends the notification needs no knowledge of your user's attributes.
  • The request also contains a notificationId, which indicates the type of notification, such as alert or new_customer.
  • The request to this end-point also contains all the channels the service wishes to send. These channels will get filtered in the next step based on the user preferences.
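To make the request shape concrete, here is a minimal sketch of how the /send Lambda might parse and validate the incoming body. The field names userId, notificationId, and channels follow the description above, but the exact JSON layout, the SendRequest class, and the KNOWN_CHANNELS set are assumptions made for illustration. The event keys "body" and "isBase64Encoded" are the ones Lambda Function URLs pass to the handler.

```python
import base64
import json
from dataclasses import dataclass

# Assumed channel names; the article only mentions email and SMS explicitly.
KNOWN_CHANNELS = {"email", "sms"}


@dataclass
class SendRequest:
    user_id: str
    notification_id: str
    channels: list


def parse_send_request(event: dict) -> SendRequest:
    """Turn a Lambda Function URL event into a validated SendRequest."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("body must be valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")

    user_id = payload.get("userId")
    notification_id = payload.get("notificationId")
    channels = payload.get("channels")

    if not isinstance(user_id, str) or not user_id:
        raise ValueError("userId is required")
    if not isinstance(notification_id, str) or not notification_id:
        raise ValueError("notificationId is required")
    if not isinstance(channels, list) or not channels:
        raise ValueError("channels must be a non-empty list")
    if not all(isinstance(c, str) for c in channels):
        raise ValueError("channels must contain only strings")
    unknown = set(channels) - KNOWN_CHANNELS
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")

    return SendRequest(user_id, notification_id, channels)
```

Authorization is deliberately absent from this sketch: with IAM-based authorization on the Function URL, unsigned or unauthorized requests are rejected before the handler runs.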

2. Match Notification Preferences

  • A DynamoDB table stores records that indicate whether a user should receive a specific notification on a specific channel. Record structure:
    KEY: user_id:notification_id, VALUE: [{channel: "email", state: true}, {channel: "sms", state: false}]
  • If the /send end-point reads a "false" value from the records, it ignores that channel. If a record for a channel does not exist, the user has not explicitly set their preferences. In this case, you need to agree on a default.
  • Notice how we allow users to update the data in this table directly. How does this work? You can allow your front-end to update records in a DynamoDB table directly by using a fine-grained IAM policy for DynamoDB. For example, you only allow users to update records if the KEY contains their userId.
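The matching rule described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the default of True (send when no preference is stored) is an assumption that you should replace with whatever default your team agrees on.

```python
# Assumed default: deliver on a channel when the user has not stored a preference.
DEFAULT_STATE = True


def preference_key(user_id: str, notification_id: str) -> str:
    """Build the record key in the user_id:notification_id form shown above."""
    return f"{user_id}:{notification_id}"


def allowed_channels(requested: list, preferences: list, default: bool = DEFAULT_STATE) -> list:
    """Keep only the requested channels that the user has not switched off.

    `preferences` is the VALUE of the record, for example
    [{"channel": "email", "state": True}, {"channel": "sms", "state": False}].
    Channels without a stored preference fall back to `default`.
    """
    states = {p["channel"]: p["state"] for p in preferences}
    return [channel for channel in requested if states.get(channel, default)]
```

For example, with the record shown above, a request for email and SMS is reduced to email only, because the SMS entry has state false.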

3. Get User Attributes

  • The /send end-point reads the user's email address, phone number, and other attributes from the Cognito User Pool.
  • Notice that you can only make Cognito's getUser() request 120 times per second. If you wish to send at higher scales, you may store this information in a DynamoDB table.
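As a rough sketch of this step, the helper below pulls the contact details out of a Cognito user response. It assumes the response has the shape returned by the Cognito GetUser and AdminGetUser operations, where attributes arrive as a list of Name/Value pairs under "UserAttributes", and that the pool uses the standard attribute names email and phone_number. The function name is hypothetical, and the actual Cognito call is left out so the example stays self-contained.

```python
def contact_details(cognito_user: dict) -> dict:
    """Extract email and phone number from a Cognito user response.

    Missing attributes come back as None so the caller can drop
    channels that have no destination address.
    """
    attributes = {
        item["Name"]: item["Value"]
        for item in cognito_user.get("UserAttributes", [])
    }
    return {
        "email": attributes.get("email"),
        "phone_number": attributes.get("phone_number"),
    }
```

If you move these attributes into a DynamoDB table to get around the request limit, the same helper can be replaced by a plain item lookup that returns the same two fields.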

4. Publish to Fanout

  • The /send end-point forms a message similar to the image above (top-right) and sends it to SNS, only containing channels that the user is subscribed to.
  • The SNS topic is configured to fan out this message to multiple queues.
  • The SNS subscriptions are configured to filter destinations based on the list of channels in the message, so each queue only receives messages meant for its channel.
  • Example:
    A) The caller requests sending a notification using email and SMS
    B) The /send end-point determines from the DynamoDB records that the user is not subscribed to SMS, so it only includes the email data in the SNS message
    C) SNS looks at the channels in the message and only duplicates the message to the email queue
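One way to set up this filtering is with SNS message attributes and subscription filter policies. The sketch below builds the arguments for an SNS publish call and shows the filter policy each queue subscription could use. The message layout, the attribute name "channels", and the function names are assumptions for illustration. The matches() function is only a simplified local stand-in for SNS filtering (exact string matching), included so the behavior can be checked without AWS.

```python
import json


def build_publish_args(topic_arn: str, user: dict, notification_id: str, channels: list) -> dict:
    """Build keyword arguments for an SNS publish call (assumed message shape)."""
    message = {"notificationId": notification_id, "user": user, "channels": channels}
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(message),
        "MessageAttributes": {
            # String.Array lets one attribute carry several channel names.
            "channels": {"DataType": "String.Array", "StringValue": json.dumps(channels)},
        },
    }


# Filter policies attached to each queue's subscription (hypothetical queues).
EMAIL_QUEUE_FILTER = {"channels": ["email"]}
SMS_QUEUE_FILTER = {"channels": ["sms"]}


def matches(filter_policy: dict, publish_args: dict) -> bool:
    """Simplified local stand-in for SNS filtering: exact string matches only."""
    attributes = publish_args["MessageAttributes"]
    for name, accepted in filter_policy.items():
        if name not in attributes:
            return False
        attribute = attributes[name]
        if attribute["DataType"] == "String.Array":
            values = json.loads(attribute["StringValue"])
        else:
            values = [attribute["StringValue"]]
        # A policy key matches when any attribute value is in the accepted list.
        if not any(value in accepted for value in values):
            return False
    return True
```

In the example above, a message published with channels ["email"] matches EMAIL_QUEUE_FILTER but not SMS_QUEUE_FILTER, so only the email queue receives a copy.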

5. Fanout to Job Processors

  • The messages go into SQS queues until they are processed.
  • You can configure complicated or straightforward failure and retry mechanisms at this stage.
  • We recommend setting the retry count to 1. We also recommend setting up a dead-letter queue (DLQ) with a 24-hour retention window. It holds on to failed messages so that you can reroute them later, once the issues are resolved.
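The recommendation above could translate into SQS queue attributes along these lines. This is a sketch under assumptions: the function names are hypothetical, and "retry of 1" is interpreted here as one redelivery after the first failed attempt, which corresponds to a maxReceiveCount of 2 in the redrive policy. If you prefer no redelivery at all, use a maxReceiveCount of 1 instead.

```python
import json


def source_queue_attributes(dlq_arn: str, retries: int = 1) -> dict:
    """Attributes for a channel queue that forwards failed messages to a DLQ.

    SQS moves a message to the DLQ once its receive count exceeds
    maxReceiveCount, so one retry means two allowed receives (assumed reading).
    """
    redrive_policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": retries + 1}
    return {"RedrivePolicy": json.dumps(redrive_policy)}


def dlq_attributes(retention_hours: int = 24) -> dict:
    """Attributes for the DLQ; SQS expects the retention period in seconds, as a string."""
    return {"MessageRetentionPeriod": str(retention_hours * 3600)}
```

These dictionaries are meant to be passed as the Attributes argument when creating or updating the queues, for example through the SQS API or your infrastructure-as-code tool of choice.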

6. Process Delivery Jobs

  • These are simple Lambda functions that take the message from the queue and send it to the appropriate service. For example, the email sender Lambda would read the message and send it to SendGrid or a similar service.
  • You can limit the number of concurrent Lambda invocations (for example, with reserved concurrency) to avoid sending too many emails or SMS messages and getting throttled.
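A delivery Lambda for the email queue might look like the sketch below. It assumes the queue is subscribed to SNS without raw message delivery, so each SQS record body is an SNS envelope whose "Message" field holds the original JSON. It also assumes a message with "notificationId" and "user.email" fields, and that partial batch responses (ReportBatchItemFailures) are enabled on the SQS trigger. The send_email function is a hypothetical placeholder for a call to SendGrid or a similar provider.

```python
import json


def send_email(to: str, notification_id: str) -> None:
    """Hypothetical placeholder: call SendGrid or a similar provider here."""
    raise NotImplementedError("plug in your email provider")


def handler(event, context=None, send=send_email):
    """Process a batch of SQS records and report only the failed ones.

    Returning batchItemFailures tells Lambda which messages to put back
    on the queue, so successful sends in the same batch are not repeated.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            envelope = json.loads(record["body"])      # SNS envelope
            message = json.loads(envelope["Message"])  # original payload
            send(message["user"]["email"], message["notificationId"])
        except Exception:
            # Any failure (bad payload, provider error) is retried by SQS
            # and ends up in the DLQ once the redrive limit is reached.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The send parameter exists only to make the sketch easy to test locally; in a deployed function, Lambda calls handler(event, context) and the default sender is used.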

Further Improvements

Here are a few things that are possible but that we haven't covered. If you need any of these capabilities, read the next section.

  • In-App Notifications: They are complex. They require tables, APIs, and UIs
  • Centralizing the notification UI elements and providing a visual editor for them
  • Feature flags for notifications
  • Monitoring and reporting

An external Notification-as-a-Service

Building an internal notification service is a lot of (annoying) work. We know because we have built this a few times in our careers.

That is why we made NotificationAPI: a plug-and-play notification-as-a-service solution. It takes 10 minutes to configure and integrate and gives you a lot more functionality, including in-app notifications, feature flags, and logs.

"If I had to describe NotificationAPI in one word, it would be simple!"

-- Jacob Brown, Software Team Lead

Watch Jacob's interview, in which he discusses how they implemented their own notifications and then switched to NotificationAPI.
