Efficiently Delete Inactive User Data Using TypeScript and AWS Lambda

July 7, 2024

11 min read

Efficiently Delete Inactive User Data Using TypeScript and AWS Lambda
Watch on YouTube

Deleting Inactive User Data in Increaser

In this article, we will tackle a common issue: deleting inactive user data from our system. Our work will be within the TypeScript repository of the productivity app, Increaser. Although the source code for this project is housed in a private repository, all the reusable code can be found in the RadzionKit repository. Throughout this article, you will see the relevant code snippets needed to understand and implement the solution.

The Problem of Inactive Users

Increaser is a paid web application that offers a free trial period. During this trial, only a fraction of registered users convert to paying customers. Many users try the app and never return, leaving their data stored in our database and S3 storage, which incurs costs. To address this, we will create a Lambda function that runs daily to clean up inactive user data.

import { deleteInactiveAccounts } from "./deleteInactiveAccounts"
import { AWSLambda } from "@sentry/serverless"
import { getEnvVar } from "./getEnvVar"

AWSLambda.init({
  dsn: getEnvVar("SENTRY_KEY"),
})

export const handler = AWSLambda.wrapHandler(deleteInactiveAccounts)

Environment Variables

The getEnvVar function serves as the source of truth for all environment variables needed in this package. It throws an error if any required variable is missing.

type VariableName = "SENTRY_KEY" | "APP_URL"

export const getEnvVar = (name: VariableName): string => {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing ${name} environment variable`)
  }

  return value
}

Cleanup Logic

Our logic will be straightforward: if a non-paying user hasn't opened the app in the last 60 days, we will send them a warning email about . If they still haven't opened the app in the last 90 days, we will delete their account.

import { Days } from "@lib/utils/time/types"

export const deleteInactiveAccountAfter: Days = 90
export const notifyInactiveAccountAfter: Days = 60

Tracking Last Visit

Since the app always queries the user state when the user opens the app, tracking the last visit is straightforward. We can update the user with the lastVisitAt field within the user resolver. If you're interested in learning how to implement a backend within a TypeScript monorepo, check out this post.

import { assertUserId } from "../../auth/assertUserId"
import { getUser, updateUser } from "@increaser/db/user"
import { ApiResolver } from "../../resolvers/ApiResolver"

export const user: ApiResolver<"user"> = async ({
  input: { timeZone },
  context,
}) => {
  const userId = assertUserId(context)

  await updateUser(userId, { timeZone, lastVisitAt: Date.now() })

  return getUser(userId)
}

Fetching Users

First, we fetch all the users from the database, limiting the query to only the necessary fields. Currently, the system doesn't have many users, but as the data grows, it would be more efficient to process users in batches. We use DynamoDB as our database and employ the scan operation to retrieve all users.

import { getAllUsers, updateUser } from "@increaser/db/user"
import { convertDuration } from "@lib/utils/time/convertDuration"

import { isActiveSubscription } from "@increaser/entities-utils/subscription/isActiveSubscription"
import { deleteUser } from "@increaser/data-services/users/deleteUser"
import { notifyAboutAccountDeletion } from "./notifyAboutAccountDeletion"
import { reportError } from "@lib/lambda/reportError"
import {
  deleteInactiveAccountAfter,
  notifyInactiveAccountAfter,
} from "@increaser/config"

export const deleteInactiveAccounts = async () => {
  const users = await getAllUsers([
    "id",
    "email",
    "accountDeletionEmailSentAt",
    "lastVisitAt",
    "lifeTimeDeal",
    "subscription",
    "freeTrialEnd",
  ])
  const now = Date.now()
  await Promise.all(
    users.map(
      async ({
        id,
        email,
        accountDeletionEmailSentAt,
        lastVisitAt,
        subscription,
        lifeTimeDeal,
        freeTrialEnd,
      }) => {
        try {
          if (freeTrialEnd > now) return

          if (lifeTimeDeal) return

          if (subscription && isActiveSubscription(subscription)) return

          if (
            accountDeletionEmailSentAt &&
            lastVisitAt < accountDeletionEmailSentAt
          ) {
            const shouldBeDeletedAt =
              accountDeletionEmailSentAt +
              convertDuration(
                deleteInactiveAccountAfter - notifyInactiveAccountAfter,
                "d",
                "ms"
              )
            if (shouldBeDeletedAt < now) {
              await deleteUser(id)
            }
          } else if (
            convertDuration(now - lastVisitAt, "ms", "d") >
            notifyInactiveAccountAfter
          ) {
            await notifyAboutAccountDeletion({ email })
            await updateUser(id, {
              accountDeletionEmailSentAt: now,
            })
          }
        } catch (error) {
          reportError(error, {
            id,
            email,
            msg: "deleteInactiveAccounts: Failed to handle user",
          })
        }
      }
    )
  )
}

Generating Query Parameters

We use DynamoDB as our database and employ the scan operation to retrieve all users. With the help of the getPickParams function from the RadzionKit repository, we can generate all the necessary parameters to apply the ProjectionExpression and query only the required attributes.

import { User } from "@increaser/entities/User"
import { tableName } from "./tableName"
import { getPickParams } from "@lib/dynamodb/getPickParams"
import { totalScan } from "@lib/dynamodb/totalScan"

export const getAllUsers = async <T extends (keyof User)[]>(attributes: T) => {
  return totalScan<Pick<User, T[number]>>({
    TableName: tableName.users,
    ...getPickParams(attributes),
  })
}

Handling Pagination

The totalScan function handles pagination and waits until there are no more items to fetch by checking the LastEvaluatedKey field in the response. You can find the implementation of the totalScan and fetchAll functions in the RadzionKit repository.

import { ScanCommand, ScanCommandInput } from "@aws-sdk/lib-dynamodb"
import { fetchAll } from "@lib/utils/query/fetchAll"
import { dbDocClient } from "./client"

export const totalScan = <T,>(
  params: Omit<ScanCommandInput, "ExclusiveStartKey">
): Promise<T[]> => {
  return fetchAll({
    fetch: (lastEvaluatedKey: ScanCommandInput["ExclusiveStartKey"]) => {
      return dbDocClient.send(
        new ScanCommand({
          ExclusiveStartKey: lastEvaluatedKey,
          ...params,
        })
      )
    },
    getItems: (response) => response.Items as T[],
    getNextPageParam: (response) => response.LastEvaluatedKey ?? null,
  })
}

Processing Users

Once we have all the users, we proceed by iterating over each user and wrapping the operations within a Promise.all to wait until all the promises are resolved. If we fail to handle one of the items, we won't stop the process. Instead, we will report the error to Sentry via the reportError function, which supports adding context to the error for better understanding of the issue.

import { ErrorWithContext } from "@lib/utils/errors/ErrorWithContext"
import { getErrorMessage } from "@lib/utils/getErrorMessage"
import { isRecordEmpty } from "@lib/utils/record/isRecordEmpty"
import * as Sentry from "@sentry/serverless"

export const reportError = (
  err: unknown,
  errorContext?: Record<string, any>
) => {
  const context = {
    ...errorContext,
    ...(err instanceof ErrorWithContext ? err.context : {}),
  }
  console.log(
    `Reporting an error: ${getErrorMessage(err)}${
      !isRecordEmpty(context)
        ? ` with context: ${JSON.stringify(errorContext)}`
        : ""
    }}`
  )

  Sentry.withScope((scope) => {
    Object.entries(context).forEach(([key, value]) => {
      scope.setExtra(key, value)
    })

    Sentry.captureException(err)
  })
}

Skipping Active Users

First, we check if the user has a free trial, an active subscription, or a lifetime deal. If any of these conditions are met, we skip the user. Next, we check if the user has received an account deletion email and if they haven't visited the app after the email was sent. In our logic, we ignore accountDeletionEmailSentAt if the user visited the app after the email was sent. Then, we calculate the date when the account should be deleted and compare it with the current date. If the date has passed, we delete the user. If the user hasn't visited the app for a certain period, we send them an email about the account deletion and update the accountDeletionEmailSentAt field.

Deleting User Data

When deleting a user, we first remove the user item from the users table. Next, we proceed with removing the user's folder from the S3 bucket. Finally, we update the scoreboard tables and features table, as they might contain references to the user.

import { getAllFeatures, updateFeature } from "@increaser/db/features"
import { getScoreboard, updateScoreboard } from "@increaser/db/scoreboard"
import * as userDb from "@increaser/db/user"
import { scoreboardPeriods } from "@increaser/entities/PerformanceScoreboard"
import { deletePublicBucketFolder } from "@increaser/public/deletePublicBucketFolder"
import { getPublicBucketUserFolder } from "@increaser/public/getPublickBucketUserFolder"

export const deleteUser = async (id: string) => {
  console.log(`Deleting user with id: ${id}`)

  await userDb.deleteUser(id)

  await deletePublicBucketFolder(getPublicBucketUserFolder(id))

  const features = await getAllFeatures(["id", "proposedBy"])
  const featuresToUpdate = features.filter(
    (feature) => feature.proposedBy === id
  )
  await Promise.all(
    featuresToUpdate.map((feature) =>
      updateFeature(feature.id, {
        proposedBy: undefined,
      })
    )
  )

  const scoreboards = await Promise.all(
    scoreboardPeriods.map((period) => getScoreboard(period))
  )
  await Promise.all(
    scoreboards.map((scoreboard) => {
      const isUserInScoreboard = scoreboard.users.find((user) => user.id === id)
      if (isUserInScoreboard) {
        return updateScoreboard(scoreboard.id, {
          users: scoreboard.users.filter((user) => user.id !== id),
        })
      }
    })
  )
}

Saving User Emails

While Increaser doesn't have email marketing campaigns, we still save the user's email to a table dedicated to user emails. This way, even if we delete the user, we can still reach out to them if needed.

import { getUserByEmail, putUser } from "@increaser/db/user"
import { AuthenticationResult } from "./AuthenticationResult"
import { getAuthSession } from "./getAuthSession"
import { getUserInitialFields } from "@increaser/entities-utils/user/getUserInitialFields"
import { AuthSession } from "@increaser/entities/AuthSession"
import { CountryCode } from "@lib/countries"
import { putEmail } from "@increaser/db/email"
import { asyncAttempt } from "@lib/utils/asyncAttempt"

interface AuthorizeParams extends AuthenticationResult {
  timeZone: number
  country?: CountryCode
}

export const authorize = async ({
  email,
  name,
  country,
  timeZone,
}: AuthorizeParams): Promise<AuthSession> => {
  const existingUser = await getUserByEmail(email, ["id"])
  if (existingUser) {
    return {
      ...(await getAuthSession(existingUser.id)),
      isFirst: false,
    }
  }

  const newUser = getUserInitialFields({
    email,
    name,
    country,
    timeZone,
  })

  await putUser(newUser)
  await asyncAttempt(() => putEmail({ id: email }), undefined)

  const session = await getAuthSession(newUser.id)

  return {
    ...session,
    isFirst: true,
  }
}

Notifying Users

To notify the user, we send a plain text email informing them that their account will be deleted if they don't visit the app within the next 30 days.

import {
  deleteInactiveAccountAfter,
  notifyInactiveAccountAfter,
  productName,
} from "@increaser/config"
import { getEnvVar } from "../api/getEnvVar"

import { sendEmail } from "@increaser/email/utils/sendEmail"

type NotifyAboutAccountDeletionParams = {
  email: string
}

export const notifyAboutAccountDeletion = async ({
  email,
}: NotifyAboutAccountDeletionParams) => {
  console.log(`Notifying user about account deletion: ${email}`)

  const appUrl = getEnvVar("APP_URL")

  const body = `
  <p>Dear User,</p>
  <p>We hope this message finds you well. We are writing to inform you that your ${productName} account has been inactive for ${notifyInactiveAccountAfter} days. As per our policy, accounts that remain inactive for more than ${deleteInactiveAccountAfter} days are subject to deletion.</p>
  <p>To avoid having your account deleted, please visit the app within the next ${
    deleteInactiveAccountAfter - notifyInactiveAccountAfter
  } days:</p>
  <p><a href="${appUrl}">${appUrl}</a></p>
  <p>If you have any questions or need further assistance, feel free to contact our support team.</p>
  <p>Best regards,<br>The ${productName} Team</p>
`

  await sendEmail({
    email,
    body,
    subject: `Important: Your ${productName} Account is Scheduled for Deletion`,
    source: "Increaser <noreply@increaser.org>",
  })
}

Rate Limiting Email Sending

We send emails using AWS SES. Since we might trigger a large number of emails at once, we use the makeFunctionRateLimitProtected function from the RadzionKit repository to rate-limit the function, ensuring we do not send more than 14 emails per second.

import { SESv2Client, SendEmailCommand } from "@aws-sdk/client-sesv2"
import { getEnvVar } from "./getEnvVar"
import { makeFunctionRateLimitProtected } from "@lib/utils/makeFunctionRateLimitProtected"

interface SendEmailParameters {
  email: string
  body: string
  subject: string
  source: string
}

const client = new SESv2Client({ region: getEnvVar("SES_AWS_REGION") })

export const sendEmail = makeFunctionRateLimitProtected({
  func: ({ email, body, subject, source }: SendEmailParameters) => {
    console.log(`Sending email to ${email} with a subject: ${subject}`)
    const command = new SendEmailCommand({
      Destination: {
        ToAddresses: [email],
      },
      Content: {
        Simple: {
          Body: {
            Html: {
              Data: body,
            },
          },
          Subject: {
            Data: subject,
          },
        },
      },
      FromEmailAddress: source,
    })

    return client.send(command)
  },
  delay: 100,
})

Hosting the Lambda Function

We will host our Lambda on AWS using Terraform to manage our AWS infrastructure. The configuration begins with defining the S3 backend for Terraform's state. We create an S3 bucket for the Lambda function code and set up the aws_lambda_function resource with details such as memory size, handler, runtime, and environment variables. An IAM role and policy grant necessary permissions to access S3, DynamoDB, and CloudWatch logs. A CloudWatch Events rule triggers the Lambda function daily, ensuring the cleanup runs automatically.

terraform {
  backend "s3" {
    bucket = "infrastructure-remote-state"
    key    = "pomodoro/accounts-cleaner.tfstate"
    region = "eu-central-1"
  }
}

resource "aws_s3_bucket" "lambda_code_storage" {
  bucket = "tf-${var.name}-storage"
}

resource "aws_lambda_function" "service" {
  function_name = "tf-${var.name}"

  s3_bucket   = aws_s3_bucket.lambda_code_storage.bucket
  s3_key      = "lambda.zip"
  memory_size = "1024"

  handler = "index.handler"
  runtime = "nodejs20.x"
  timeout = "50"
  role          = aws_iam_role.service_role.arn

  environment {
    variables = {
      SENTRY_KEY : var.sentry_key,
      APP_URL : var.app_url,
      EMAIL_DOMAIN : var.email_domain,
      SES_AWS_REGION : var.ses_aws_region,
      PUBLIC_BUCKET_NAME: var.public_bucket_name,
      PUBLIC_BUCKET_REGION: var.public_bucket_region
    }
  }
}

resource "aws_iam_role" "service_role" {
  name = "tf_${var.name}_lambda_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_policy" "service_permissions" {
  name = "tf_${var.name}_service_permissions"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid = "AllowDynamoDBActionsUsersTable",
        Effect = "Allow",
        Action = "dynamodb:*",
        Resource = "${var.users_table_arn}"
      },
      {
        Sid = "AllowDynamoDBActionsFeaturesTable",
        Effect = "Allow",
        Action = "dynamodb:*",
        Resource = "${var.features_table_arn}"
      },
      {
        Sid = "AllowDynamoDBActionsScoreboardsTable",
        Effect = "Allow",
        Action = "dynamodb:*",
        Resource = "${var.scoreboards_table_arn}"
      },
      {
        Sid = "AllowSpecificS3ActionsOnPublicBucket",
        Effect = "Allow",
        Action = [
          "s3:PutObject",
          "s3:GetObject",
          "s3:DeleteObject"
        ],
        Resource = [
          "arn:aws:s3:::${var.public_bucket_name}",
          "arn:aws:s3:::${var.public_bucket_name}/*"
        ]
      },
      {
        Sid = "AllowSendEmailSES",
        Effect = "Allow",
        Action = [
          "ses:SendEmail"
        ],
        Resource = "*"
      },
      {
        Sid = "AllowCloudWatchLogs",
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Effect = "Allow",
        Resource = "*"
      }
    ]
  })
}


resource "aws_iam_role_policy_attachment" "service_role_attachment" {
  policy_arn = aws_iam_policy.service_permissions.arn
  role       = aws_iam_role.service_role.name
}

resource "aws_cloudwatch_event_rule" "cron" {
  name = "tf-${var.name}"
  schedule_expression = "rate(1 day)"
}

resource "aws_cloudwatch_event_target" "cron" {
  rule = "${aws_cloudwatch_event_rule.cron.name}"
  target_id = "tf-${var.name}"
  arn = "${aws_lambda_function.service.arn}"
}

resource "aws_lambda_permission" "cron_cloudwatch" {
  statement_id = "AllowExecutionFromCloudWatch"
  action = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.service.function_name}"
  principal = "events.amazonaws.com"
  source_arn = "${aws_cloudwatch_event_rule.cron.arn}"
}

resource "aws_cloudwatch_log_group" "lambda_log_group" {
  name              = "/aws/lambda/tf-${var.name}"
  retention_in_days = 14
}