diff --git a/content/blog/webhook-notifications-on-systemd-service-failure.md b/content/blog/webhook-notifications-on-systemd-service-failure.md new file mode 100644 index 0000000..afc8207 --- /dev/null +++ b/content/blog/webhook-notifications-on-systemd-service-failure.md @@ -0,0 +1,124 @@ +--- +title: "Webhook notifications on systemd service failure" +date: 2024-09-04T21:03:38-07:00 +draft: false +tags: [] +math: false +medium_enabled: false +--- + +Every morning like every good system administrator, I log onto all my machines and type the following command + +```bash +systemctl --failed +``` + +This gives me a list of all my systemd services that have failed. I pray that it's empty + +``` + UNIT LOAD ACTIVE SUB DESCRIPTION +0 loaded units listed. + +``` + +Except, I don't. + +Instead, I have it set up so that I receive a webhook notification via [Zulip](https://zulip.com) whenever a service fails. With the right infrastructure in place, it's as simple as adding a `OnFailure` line to all the services you want to monitor. + +**Step 1:** Setting up the webhook. + +On Zulip, I use the [Slack incoming webhook](https://zulip.com/integrations/doc/slack_incoming) integration. (Note the [URL specification](https://zulip.com/api/incoming-webhooks-overview#url-specification)) + +As you might guess, this style of webhook works on Slack and on Discord as well. + +For our notification script we'll need two environmental variables + +| Name | Description | +| ----------- | ------------------------------------------------------------ | +| SERVICE | The name of the `systemd` service. We will automatically populate this | +| WEBHOOK_URL | The URL to send the webhook to. This is chat application specific. | + +We'll need the following CLI applications installed + +| Name | Description | +| ------ | --------------------------------------------------- | +| `curl` | Sends the POST request. | +| `jq` | Sanitizes the log output before sending it to curl. | + +The script `/bin/webhook-notify.sh` + +```bash +#!/bin/bash + +if [ -z "$SERVICE" ]; then + echo "SERVICE variable not set or empty" + exit 1 +fi + +if [ -z "$WEBHOOK_URL" ]; then + echo "WEBHOOK_URL variable not set or empty" + exit 1 +fi + +if ! command -v jq &> /dev/null; then + echo "jq is not installed" + exit 1 +fi + +LOG_CONTENTS=$(systemctl status --full --no-pager ${SERVICE} | jq -Rsa .) + +curl -X POST --data-urlencode "payload={\"text\": $LOG_CONTENTS}" ${WEBHOOK_URL} +``` + +Make the script executable + +```bash +chmod u+x /bin/webhook-notify.sh +``` + +At this point you should be able to test out the script and make sure you get notifications. Set the two environmental variables and run the script. + +Example: + +```bash +export WEBHOOK_URL="https://INSERT-NAMESPCE.zulipchat.com/api/v1/external/slack_incoming?api_key=INSERT-API-KEY&stream=INSERT-STREAM-ID&topic=Systemd" +export SERVICE=NetworkManager +/bin/webhook-notify.sh +``` + +**Step 2:** Setup the Systemd Service + +When a systemd unit fails, we are able to call another systemd service. The service that we'll call will run our script from the last step. + +In `/etc/systemd/system/webhook-notify@.service` + +```ini +[Unit] +Description=Send Systemd Notifications via Webhook + +[Service] +Type=oneshot +Environment=WEBHOOK_URL="INSERT-WEBHOOK-URL-HERE" +Environment=SERVICE=%i +ExecStart=/bin/webhook-notify.sh + +[Install] +WantedBy=multi-user.target +``` + +Note the `@` in the filename. This is important since this service will run with the failed unit name as the argument that appears after the `@`. Within the script, this is the `%i` variable. + +Example test: + +```bash +sudo systemctl start webhook-notify@NetworkManager +``` + +**Step 3:** Add `OnFailure` to all the services we want to monitor + +Within the `[Unit]` section of our Systemd service, add the following + +```ini +OnFailure=webhook-notify@%i.service +``` +