mirror of
https://github.com/Brandon-Rozek/website.git
synced 2025-07-11 06:43:10 +00:00
New post
parent
303a045f87
commit
6173816ab3
1 changed files with 108 additions and 0 deletions
108
content/blog/please-monitor-disk-usage.md
Normal file
@ -0,0 +1,108 @@
---
title: "Please monitor disk usage"
date: 2024-11-26T19:59:10-05:00
draft: false
tags: []
math: false
medium_enabled: false
---

You know one of the worst errors to deal with on Linux?

> No space left on device

Why? Because recovery becomes really annoying. Depending on your luck, programs may still try to write caches or temporary files to the full disk, causing several otherwise unrelated commands to fail.

If you're already in this situation, the best thing you can do is locate files to remove. You can run `du -sh *` in any given directory to see the sizes of its files and subfolders. Note that the `*` glob skips hidden entries.

Common places that hold temporary files which can likely be removed are:

- `/tmp`
- `~/.cache`
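
As a rough sketch of how to hunt for space in those directories, the helper below lists the largest entries directly under a directory, hidden ones included. The function name `largest_entries` and its two arguments are my own invention for illustration, not part of the script introduced later.

```sh
#!/bin/sh
# Hypothetical helper (not from the original post): print the largest
# entries directly under a directory, biggest first, sizes in KiB.
largest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # The second glob picks up hidden entries, which a bare * skips.
    # A glob that matches nothing is passed through literally; the
    # resulting du error is discarded along with permission errors.
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries ~/.cache 5` would show the five biggest entries in the cache directory.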

An even better solution is to avoid this situation in the first place. For that, I introduce a shell script which sends a notification when the disk is getting full!

To get the available and total space for a given `$MOUNTPOINT` (for example, `/`), we run the following:

```bash
# -P forces the POSIX output format, so row 2 is always the data row
available_space=$(df -P "$MOUNTPOINT" | awk 'NR==2 {print $4}')
total_space=$(df -P "$MOUNTPOINT" | awk 'NR==2 {print $2}')
```

Add a couple of if statements and we have ourselves a full-blown script:

```bash
#!/bin/sh

set -o errexit
set -o nounset

MAX_USAGE_PERCENT=90

# Default to empty so that nounset does not abort before we can report the problem
if [ -z "${MOUNTPOINT:-}" ]; then
    echo "MOUNTPOINT variable not set or empty"
    exit 1
fi

# Get the available and total disk space for the specified mount point.
# -P forces the POSIX format, which keeps each filesystem on a single line.
available_space=$(df -P "$MOUNTPOINT" | awk 'NR==2 {print $4}')
total_space=$(df -P "$MOUNTPOINT" | awk 'NR==2 {print $2}')

# Check if the df command was successful
if [ -z "$available_space" ] || [ -z "$total_space" ]; then
    echo "Error: Could not retrieve disk space for $MOUNTPOINT"
    sendMsg "Error: Could not retrieve disk space for $MOUNTPOINT"
    exit 1
fi

usage_percent=$(( (total_space - available_space) * 100 / total_space ))

if [ "$usage_percent" -ge "$MAX_USAGE_PERCENT" ]; then
    host_name=$(hostname)
    echo "Low Disk on $host_name at mountpoint $MOUNTPOINT. Currently using ${usage_percent}% of available space."
    sendMsg "Low Disk on $host_name at mountpoint $MOUNTPOINT. Currently using ${usage_percent}% of available space."
fi

echo "Mountpoint $MOUNTPOINT is currently using ${usage_percent}% of available space."
```
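
To make the arithmetic concrete, here is a small sketch that runs the same parsing and percentage calculation on a made-up `df -P` output instead of a real disk. The `sample` variable and all of the numbers in it are assumptions for illustration only.

```sh
#!/bin/sh
# Made-up output in the `df -P` layout; sizes are in 1024-byte blocks.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 1000000 870000 80000 92% /home'

# Same field extraction as the script: column 2 is total, column 4 is available
total_space=$(printf '%s\n' "$sample" | awk 'NR==2 {print $2}')
available_space=$(printf '%s\n' "$sample" | awk 'NR==2 {print $4}')

# (1000000 - 80000) * 100 / 1000000 = 92
usage_percent=$(( (total_space - available_space) * 100 / total_space ))
echo "$usage_percent"
```

Shell arithmetic is integer-only, so the result is truncated. Because the formula subtracts the available space from the total, blocks reserved for root count as used, and the result can differ slightly from the percentage that `df` prints.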

The only part left undefined here is the `sendMsg` function, which needs to be defined in the script before it is called. I send a [webhook notification](https://brandonrozek.com/blog/webhook-notifications-on-systemd-service-failure/) to Zulip so that I both get notified and have a log of these messages.
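
If you don't have a notification setup yet, a minimal sketch of `sendMsg` could look like the following. It is not the implementation from the linked post: the `WEBHOOK_URL` variable and the `content` field name are placeholders I made up, and the service you use will define its own payload format.

```sh
#!/bin/sh
# Hypothetical sendMsg: POSTs the message as form data to a webhook.
# WEBHOOK_URL and the "content" field name are assumptions; adapt both
# to whatever your notification service expects.
sendMsg() {
    # ${WEBHOOK_URL:-} keeps this safe under `set -o nounset`
    if [ -z "${WEBHOOK_URL:-}" ]; then
        echo "WEBHOOK_URL not set; message was: $1" >&2
        return 0
    fi
    # --fail makes curl return non-zero on HTTP errors;
    # --data-urlencode lets curl handle the escaping of the message
    curl --silent --show-error --fail --max-time 10 \
        --data-urlencode "content=$1" \
        "$WEBHOOK_URL" > /dev/null
}
```

Falling back to standard error when no URL is configured means the script still runs under `errexit`, and the message ends up in the journal when the script is started by systemd.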

To run this check regularly and automatically, we create systemd service and timer files.

`/etc/systemd/system/lowdiskcheck.service`

```ini
[Unit]
Description=Check for low disk space
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
# Feel free to change the mountpoint to one that you care about
Environment=MOUNTPOINT=/home
ExecStart=/usr/local/bin/lowdiskcheck.sh

[Install]
WantedBy=multi-user.target
```

`/etc/systemd/system/lowdiskcheck.timer`

```ini
[Unit]
Description=Check for low disk space daily
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
```

Then enable and start the timer:

```bash
sudo systemctl enable --now lowdiskcheck.timer
```
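
To confirm that everything is wired up, something like the sketch below can help. The function name `check_lowdisk_timer` is hypothetical; the unit names are the ones from the files above.

```sh
#!/bin/sh
# Hypothetical helper: inspect the timer and trigger one run by hand.
check_lowdisk_timer() {
    # Show when the timer last ran and when it fires next
    systemctl list-timers lowdiskcheck.timer --no-pager
    # Run the check once right now instead of waiting for the timer
    sudo systemctl start lowdiskcheck.service
    # Print the output of the most recent runs
    journalctl --unit lowdiskcheck.service --no-pager --lines 20
}
```

Running the service by hand once is a cheap way to find out whether `MOUNTPOINT` and `sendMsg` behave as expected before the first scheduled run.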