The Ops Community ⚙️

Avital Trifsik for Memphis{dev}

Posted on • Originally published at memphis.dev

Task scheduling with a message broker

Introduction

Task scheduling is essential in modern applications for maximizing resource utilization and keeping the user experience responsive (non-blocking task fulfillment).
A queue is a powerful tool that lets your application manage and prioritize tasks in a structured, persistent, and scalable way.
While there are multiple possible solutions, a queue (also the natural data structure for this type of work) can ensure that tasks are completed in the order they were created, without the risk of forgetting, overlooking, or double-processing critical tasks.

An interesting story about how this need evolves as scale grows can be found in a blog post by one of DigitalOcean's co-founders:
From 15,000 database connections to under 100.


Any other solutions besides a queue?

Multiple. Each with its own advantages and disadvantages.

Cron
You can use cron job schedulers to automate such tasks. The issue with cron is that the job and its execution time have to be defined explicitly, ahead of the actual execution, which makes your architecture highly static and not event-driven. Cron is mainly suitable for a well-defined, known set of tasks that have to take place either way, not ones triggered by a user action.
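For illustration, a static cron entry looks like this (the script path and schedule here are hypothetical, just to show that both must be declared up front):

```
# Run a cleanup script every day at 02:00.
# The job and its schedule are fixed in advance,
# regardless of any user activity in the application.
0 2 * * * /usr/local/bin/cleanup-old-sessions.sh
```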

Database
A database can be a good and simple choice as a place to store tasks, and it is often used that way in the early days of a product's MVP,
but there are multiple issues with that approach, for example:

  1. Insertion order is not guaranteed, so tasks might not be handled in the order in which they were actually created.
  2. Double processing can happen: a database does not delete a record once it is read, so the same task can be read and processed twice, and the results of that can be catastrophic to a system's behavior.
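To make the second pitfall concrete, here is a minimal Node.js sketch (an in-memory array stands in for a database table; the task fields are made up) showing how two workers that read without removing can process the same task twice, and how a queue's atomic dequeue avoids it:

```javascript
// An in-memory array stands in for a "tasks" table.
const tasks = [{ id: 1, action: "send-welcome-email" }];

// A naive worker read: like a plain SELECT, it returns the first
// pending row but neither deletes nor locks it.
function naiveWorkerRead() {
  return tasks[0]; // the row stays in the "table" after being read
}

// Two workers poll the same table concurrently...
const processedBy = [];
const seenByA = naiveWorkerRead();
const seenByB = naiveWorkerRead();
if (seenByA) processedBy.push("worker-A");
if (seenByB) processedBy.push("worker-B");

// ...and the same task gets handled twice.
console.log(processedBy.length); // 2, i.e. double processing

// A queue avoids this: dequeue reads AND removes in one step,
// so a second dequeue cannot see the same task again.
function dequeue() {
  return tasks.shift();
}
const first = dequeue();
const second = dequeue();
console.log(first !== undefined, second === undefined); // true true
```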

Traditional queues

Often, for task scheduling, the chosen queue would be a pub/sub system like RabbitMQ.

Choosing RabbitMQ over a log-based broker such as Kafka does make sense in the context of task scheduling: Kafka's natural behavior is to retain records (or tasks) until a specific point in time, whether they have been acknowledged or not, which makes it less suitable for this type of work.

The downside of choosing RabbitMQ would be its limited scale, robustness, and performance, which over time become increasingly needed.

With that idea in mind, Memphis is a broker that provides scale, robustness, and high throughput, alongside a type of retention that fully enables task scheduling over a message broker.


Memphis Broker is a perfect queue for task scheduling

In v1.2, Memphis released support for ack-based retention through Memphis Cloud. Read more here.

Messages will be removed from a station only when acknowledged by all the connected consumer groups. For example:

  • If only one consumer group is connected, a message/record will be removed from the station automatically as soon as that group acknowledges it.

  • If two consumer groups are connected, the message will be removed from the station (= queue) only once both CGs have acknowledged it.
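That removal rule can be sketched as a small simulation (a toy model, not the Memphis SDK): each message tracks which consumer groups have acked it and is removed only once every connected group has.

```javascript
// Toy model of ack-based retention, not the Memphis SDK:
// a message is deleted only when all connected CGs have acked it.
class Station {
  constructor(consumerGroups) {
    this.consumerGroups = consumerGroups; // names of connected CGs
    this.messages = new Map();            // message id -> Set of acking CGs
  }
  produce(id) {
    this.messages.set(id, new Set());
  }
  ack(id, consumerGroup) {
    const acks = this.messages.get(id);
    if (!acks) return; // already removed
    acks.add(consumerGroup);
    // remove the message only once every connected CG has acked it
    if (this.consumerGroups.every((cg) => acks.has(cg))) {
      this.messages.delete(id);
    }
  }
  has(id) {
    return this.messages.has(id);
  }
}

const station = new Station(["cg-billing", "cg-audit"]);
station.produce("task-1");

station.ack("task-1", "cg-billing");
console.log(station.has("task-1")); // true: one CG still hasn't acked

station.ack("task-1", "cg-audit");
console.log(station.has("task-1")); // false: all CGs acked, removed
```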

We mentioned earlier the advantages and disadvantages of traditional queues such as RabbitMQ compared to log-based brokers such as Kafka in the context of task scheduling. When comparing both tools to Memphis, it's all about getting the best of both worlds.

A few of Memphis.dev's advantages:

  1. Ordering
  2. Exactly-once delivery guarantee
  3. Highly scalable, serving data at high throughput with low latency
  4. Ack-based retention
  5. Many-to-many pattern

Getting started with Memphis Broker as a tasks queue

  1. Sign up to Memphis Cloud.
  2. Connect your task producer.

Producers are the entities that insert new records or tasks; consumers are the entities that read and process them.
A single client with a single connection object can act as both at the same time, meaning it can be both a producer and a consumer. It should not do both against the same station, though, because that would create an infinite loop; it's doable, but it doesn't make much sense. That pattern exists mainly to reduce footprint and the number of needed "workers": a single worker can produce tasks to a specific station while also acting as a consumer, or processor, for another station serving a different use case.

The code example below creates an ack-based station and initiates a producer in Node.js:
const { memphis } = require("memphis-dev");

(async function () {
  let memphisConnection;

  try {
    memphisConnection = await memphis.connect({
      host: "MEMPHIS_BROKER_HOSTNAME",
      username: "CLIENT_TYPE_USERNAME",
      password: "PASSWORD",
      accountId: ACCOUNT_ID
    });

    // create the station via the connection object, with ack-based retention
    const station = await memphisConnection.station({
      name: "tasks",
      retentionType: memphis.retentionTypes.ACK_BASED,
    });

    const producer = await memphisConnection.producer({
      stationName: "tasks",
      producerName: "producer-1",
    });

    const headers = memphis.headers();
    headers.add("Some_KEY", "Some_VALUE");
    await producer.produce({
      message: { taskID: 123, task: "deploy a new instance" }, // accepts a JS object or a Buffer
      headers: headers,
    });

    memphisConnection.close();
  } catch (ex) {
    console.log(ex);
    if (memphisConnection) memphisConnection.close();
  }
})();
  3. Connect your task consumer.

The consumer group below will consume tasks, process them, and, once finished, acknowledge them. By acknowledging a task, the broker makes sure to remove its record, ensuring exactly-once processing. We use the station entity here as well in case the consumer starts before the producer; no need to worry, it only takes effect if the station does not exist yet.
Another thing to remember is that a consumer group can contain multiple consumers to increase parallelism and read throughput. Within each consumer group, only a single consumer will read and ack a specific message, not all the contained consumers. If every consumer needs to receive every message, use multiple consumer groups instead.
const { memphis } = require("memphis-dev");

(async function () {
  let memphisConnection;

  try {
    memphisConnection = await memphis.connect({
      host: "MEMPHIS_BROKER_HOSTNAME",
      username: "APPLICATION_TYPE_USERNAME",
      password: "PASSWORD",
      accountId: ACCOUNT_ID
    });

    // create the station via the connection object, in case the consumer starts first
    const station = await memphisConnection.station({
      name: "tasks",
      retentionType: memphis.retentionTypes.ACK_BASED,
    });

    const consumer = await memphisConnection.consumer({
      stationName: "tasks",
      consumerName: "worker1",
      consumerGroup: "cg_workers",
    });

    consumer.setContext({ key: "value" });
    consumer.on("message", (message, context) => {
      console.log(message.getData().toString());
      message.ack();
      const headers = message.getHeaders();
    });

    consumer.on("error", (error) => console.error(error));
  } catch (ex) {
    console.log(ex);
    if (memphisConnection) memphisConnection.close();
  }
})();

If you liked this tutorial and want to learn what else you can do with Memphis, head here.


Join 4500+ others and sign up for our data engineering newsletter.


Follow us to get the latest updates!
Github • Docs • Discord (https://discord.com/invite/DfWFT7fzUu)


Originally published at Memphis.dev by Shay Bratslavsky, Software Engineer at Memphis.dev

Top comments (1)

Romain Billot

Have you tried task (workflow) management tools such as Netflix Conductor, Uber Cadence, or the shiny one, Temporal?

If so, what would be the benefit of manually orchestrating all of this? Would it be for better lower-level control over specific tasks?