L2 Support Engineer · Fintech · Extra Learning

L2 Scripts & Commands Reference

Six essential scripts every L2 engineer writes and runs — from auto-restarting a failed service to scheduling health checks with cron. Every script is broken down line by line so you know exactly what each part does and when to use it.

Service Restart · Log Error Count · Disk Health · CPU Check · Backup Logs · Cron Health Check
01 How Scripts Are Structured Here
Reading Guide

Each script card gives you four things:

What it does — plain English explanation of the purpose.
Full script — the complete, runnable bash code you copy to Kali.
Line-by-line breakdown — what every part of the script means.
When to use it — the real L2 scenario where this script saves you time.

02 The 6 Core L2 Scripts
01
Service Restart
Auto-restart a service if it is down
Auto-recovery
This script checks whether a service is running. If it is down — it restarts it automatically. This is the foundation of any service monitoring script. In production this would run via crontab every 5 minutes so crashed services recover without manual intervention.
service-restart.sh
#!/bin/bash
# Checks if a service is running — restarts it if not

SERVICE="app" # replace with your service name e.g. nginx, rabbitmq

if ! systemctl is-active --quiet "$SERVICE"; then
  echo "$SERVICE is down. Restarting..."
  systemctl restart "$SERVICE"
  echo "$SERVICE restarted at $(date)"
else
  echo "$SERVICE is running. No action needed."
fi
Line-by-Line Breakdown
SERVICE="app"
Variable — set the service name here. Change to nginx, rabbitmq, payment-service etc.
if ! systemctl is-active
if ! means "if NOT." systemctl is-active --quiet silently checks whether the service is running: it prints nothing and returns only an exit code (0 = active). If NOT running — go into the if block.
systemctl restart
The restart command. Sends a stop signal then starts the service again cleanly.
$(date)
Inserts the current date and time into the log message so you know exactly when the restart happened.
else ... fi
If the service IS running — print "running" and do nothing. fi closes the if block.
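The if ! pattern hinges entirely on exit codes. A minimal sketch of the same branching, using a stand-in function since systemctl may not be available in every test shell (the stand-in and its return value are assumptions for the demo):

```shell
#!/bin/bash
# is_active stands in for: systemctl is-active --quiet "$SERVICE"
# Returning 1 simulates a stopped service
is_active() { return 1; }

if ! is_active; then
  echo "service is down, restart branch runs"
else
  echo "service is up, nothing to do"
fi
# prints: service is down, restart branch runs
```

Swap the function body for the real systemctl call and the control flow is identical to the script above.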
When to use this
Schedule this via crontab every 5 minutes. When a service crashes at 3 AM — it auto-recovers before the next check cycle. You wake up to a log showing "restarted at 03:22" instead of an outage that lasted until 9 AM. Also useful on bridge calls — run it manually to confirm a service restart completed.
02
Log Error Count
Count ERROR lines in a log file
Log Analysis
The most-used log command in daily L2 work. It searches the application log for the word ERROR and counts how many lines contain it. The result tells you in one number how many errors occurred — the starting point of every investigation.
log-error-count.sh
#!/bin/bash
# Count ERROR lines in a log file

LOG="/var/log/app.log"

# Basic one-liner — count and show the number
grep -i error "$LOG" | wc -l

# Full version — with verdict
COUNT=$(grep -i "error" "$LOG" | wc -l)
echo "Error count: $COUNT"

[ "$COUNT" -gt 5 ] && echo "ACTION REQUIRED" || echo "OK"

# Bonus — show what types of errors appeared
grep -i "error" "$LOG" | awk '{print $4}' | sort | uniq -c | sort -rn
Line-by-Line Breakdown
grep -i error
grep searches for a pattern. -i makes it case-insensitive — finds ERROR, Error, error. Without -i it only finds exact case matches.
| wc -l
The pipe | sends grep's output to wc -l which counts the number of lines. Together: count how many error lines.
[ "$COUNT" -gt 5 ]
A condition check — -gt means "greater than." If COUNT is more than 5 → print ACTION REQUIRED, otherwise print OK.
awk '{print $4}'
awk prints the 4th word of each line — in a structured log that is usually the error type (e.g. DB_CONNECTION_TIMEOUT).
sort | uniq -c | sort -rn
Sort alphabetically, count duplicates with uniq -c, then sort numerically reversed so the most common error appears first.
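The whole pipeline can be tried safely on synthetic data. The log format below (date, time, level, error type, message) is an assumption — adjust the $4 field number to match your real log layout:

```shell
#!/bin/bash
# Build a small sample log (format is illustrative)
printf '%s\n' \
  "2024-03-15 14:00:01 ERROR DB_TIMEOUT query exceeded 30s" \
  "2024-03-15 14:00:05 ERROR DB_TIMEOUT query exceeded 30s" \
  "2024-03-15 14:00:09 ERROR AUTH_FAIL bad token" \
  "2024-03-15 14:00:12 INFO HEALTH ok" > /tmp/app-sample.log

# Count error lines (-c counts matching lines directly)
grep -ic "error" /tmp/app-sample.log        # prints: 3

# Break errors down by type (4th field), most common first
grep -i "error" /tmp/app-sample.log | awk '{print $4}' | sort | uniq -c | sort -rn
```

The breakdown output puts DB_TIMEOUT (2 occurrences) above AUTH_FAIL (1), which is exactly the "what kind of error dominates" answer you want first.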
When to use this
Run every morning as part of the health checklist. Also run immediately when an alert fires — the error count and type breakdown tells you within 5 seconds whether it is a DB issue, a network issue, or an application bug. Forms the first line of every RCA.
03
Disk Health
Check disk usage and alert if above threshold
Resource Check
Disk filling up is the most common silent killer in fintech — it fills slowly and then everything breaks at once. This script checks the disk usage percentage and prints a warning if it is above the threshold. The awk command extracts just the number so you can compare it in an if statement.
disk-health.sh
#!/bin/bash
# Check disk usage — alert if above 80%

# One-liner: show any partition above 80%
df -h | awk 'NR>1 && $5+0 > 80 {print $0}'

# Full version — with threshold and verdict
THRESHOLD=80
DISK_USED=$(df / | awk 'NR==2 {print $5}' | tr -d '%')

echo "Disk usage: $DISK_USED%"

if [ "$DISK_USED" -ge 90 ]; then
  echo "CRITICAL: Disk at $DISK_USED% — act immediately"
elif [ "$DISK_USED" -ge $THRESHOLD ]; then
  echo "WARNING: Disk at $DISK_USED% — plan cleanup"
else
  echo "OK: Disk at $DISK_USED%"
fi

# Find what is eating disk space
du -sh /var/log/* 2>/dev/null | sort -rh | head -5
Line-by-Line Breakdown
df -h
df = disk free. -h = human-readable sizes (GB, MB). Shows all partitions.
awk 'NR>1 && $5+0 > 80 {print $0}'
The filtering one-liner. $5 is the 5th column (Use%). NR>1 skips the header row, and $5+0 forces awk to treat "81%" as the number 81 (a bare $5 > 80 compares strings and gives wrong answers). If usage is above 80 — print the whole line ($0). This filters df output to only show over-80% partitions.
NR==2 {print $5}
NR==2 = "only the second row" (skip the header). $5 = the Use% column. Gets just the percentage number for the root partition.
tr -d '%'
Removes the % character from the number so bash can do math with it. Without this — "81%" cannot be compared to 80.
du -sh /var/log/*
du = disk usage. -s = summary (one line per folder). -h = human-readable. Shows the size of every folder in /var/log.
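The percent-stripping step can be checked in isolation with a hard-coded value (the 81% figure below is made up for the demo):

```shell
#!/bin/bash
USED="81%"    # pretend output of: df / | awk 'NR==2 {print $5}'

# Strip the % so bash can do integer comparison
USED_NUM=$(printf '%s' "$USED" | tr -d '%')

if [ "$USED_NUM" -ge 80 ]; then
  echo "over threshold: $USED_NUM"
fi
# prints: over threshold: 81
```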
When to use this
Run first thing every morning. Always check disk before CPU or memory — a full disk causes everything else to break. When disk is critical, run the du command to find which folder is the culprit — almost always old log files in /var/log.
04
CPU Check
Show top CPU-consuming processes
Process Analysis
When CPU is high, you need to find which process is eating it — fast. This command lists all running processes sorted by CPU usage, highest first. The top entry is your suspect. You then check how long it has been running and decide whether to wait, kill, or escalate.
cpu-check.sh
#!/bin/bash
# Show top CPU processes

# The core command from the screenshot
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head

# Full version — with load average and verdict
echo "=== CPU Check — $(date) ==="
echo "Load average: $(awk '{print $1, $2, $3}' /proc/loadavg)"
echo ""
echo "Top 5 processes by CPU:"
ps -eo pid,user,cmd,%cpu,%mem --sort=-%cpu | head -6

# Check CPU idle — below 10% means overloaded
echo ""
top -bn1 | grep "Cpu(s)" | awk '{print "CPU idle: "$8}'

# How long has the top process been running?
TOP_PID=$(ps -eo pid --sort=-%cpu | sed -n '2p')
ps -p "$TOP_PID" -o pid,etime,pcpu,cmd 2>/dev/null
Line-by-Line Breakdown
ps -eo
ps = process status. -e = show every process. -o = choose which columns to display. You choose pid, ppid, cmd, %mem, %cpu.
--sort=-%cpu
Sort by CPU percentage descending (the - before %cpu means highest first). The top row = the most CPU-hungry process.
| head
Shows only the first 10 lines (header + top 9 processes). Use head -6 for the header plus top 5. Stops the screen from filling with hundreds of processes.
cat /proc/loadavg
Reads the system load average from the kernel. The three numbers are 1-minute, 5-minute, 15-minute averages. Above 1.0 per CPU core = system is overloaded.
ps -p "$TOP_PID" -o etime
Gets the elapsed time (how long running) for a specific PID. etime shows hours:minutes:seconds. A process running for 3 hours when it should take 5 minutes = stuck.
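The load-average line can also be read with plain shell instead of cat and awk — a sketch that assumes a Linux /proc filesystem:

```shell
#!/bin/bash
# /proc/loadavg looks like: 0.42 0.35 0.30 1/233 4567
# read splits on whitespace; "rest" soaks up the trailing fields
read one five fifteen rest < /proc/loadavg
echo "load: 1m=$one 5m=$five 15m=$fifteen"
```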
When to use this
Run immediately when CPU is above 90% on the dashboard. The top process is almost always your root cause. Check the elapsed time — if it has been running far longer than expected, it is stuck in a loop. Get bridge approval then kill -15 PID to stop it gracefully.
05
Backup Logs
Compress and archive log files
Backup
Log files grow continuously. Left unchecked they fill the disk. This script uses tar and gzip to compress the log folder into a single timestamped archive, freeing space without losing the data. The archived file can be restored at any time for incident investigation.
backup-logs.sh
#!/bin/bash
# Compress logs — from the screenshot
tar -czf logs.tar.gz /var/log/app

# Full version — timestamped backup with cleanup
LOG_DIR="/var/log/app"
BACKUP_DIR="$HOME/backups"
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
BACKUP_FILE="$BACKUP_DIR/logs-$TIMESTAMP.tar.gz"

mkdir -p "$BACKUP_DIR"

echo "Creating backup: $BACKUP_FILE"
tar -czf "$BACKUP_FILE" "$LOG_DIR"

echo "Size: $(du -sh "$BACKUP_FILE" | awk '{print $1}')"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "logs-*.tar.gz" -mtime +7 -delete
echo "Done. Old backups cleaned."
Line-by-Line Breakdown
tar -czf
tar = tape archive. Flags: -c create, -z compress with gzip, -f filename follows. Together = create a compressed archive.
logs.tar.gz
The output filename. .tar = the bundle, .gz = gzip compressed. Together = compressed archive format used everywhere in Linux.
$(date +"%Y-%m-%d_%H-%M-%S")
Inserts the current timestamp into the filename. Creates unique filenames like logs-2024-03-15_14-30-00.tar.gz so backups never overwrite each other.
mkdir -p
Creates the backup folder if it does not exist. -p means no error if the folder already exists.
find -mtime +7 -delete
Find files modified more than 7 days ago and delete them. Prevents the backup folder itself from filling up with old archives.
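A safe way to verify the archive logic before pointing it at real logs is a round trip in a temporary directory: create, list, restore, compare. Paths here are illustrative:

```shell
#!/bin/bash
SRC=$(mktemp -d)
echo "sample line" > "$SRC/app.log"

ARCHIVE="/tmp/logs-demo.tar.gz"
tar -czf "$ARCHIVE" -C "$SRC" .    # -C stores paths relative to $SRC
tar -tzf "$ARCHIVE"                # list contents to confirm the archive

DEST=$(mktemp -d)
tar -xzf "$ARCHIVE" -C "$DEST"
cat "$DEST/app.log"                # prints: sample line
```

The -C flag matters in both directions: on create it keeps absolute paths out of the archive, and on extract it controls where files land.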
When to use this
Run this when disk is above 80% and logs are the cause. Also schedule it nightly via crontab (midnight) so logs are automatically archived before they cause a disk full incident. The 7-day cleanup ensures you always have a week of history for investigations.
06
Cron Health Check
Schedule a health script to run automatically
Automation
Cron is the Linux scheduler. It runs scripts automatically at defined intervals — without you doing anything. This is what turns a one-time script into a continuous monitoring system. The cron expression defines exactly when the script runs.
crontab entry — from the screenshot
# The screenshot's cron entry
*/5 * * * * /home/ops/healthcheck.sh

# What each field means:
# ┌── minute (*/5 = every 5 minutes)
# │ ┌── hour (* = every hour)
# │ │ ┌── day of month (* = every day)
# │ │ │ ┌── month (* = every month)
# │ │ │ │ ┌── day of week (* = every day)
# │ │ │ │ │
# */5 * * * * /home/ops/healthcheck.sh

# Common cron schedules
*/5 * * * * /home/ops/healthcheck.sh # every 5 min
0 * * * * /home/ops/log-check.sh # every hour
0 9 * * 1-5 /home/ops/morning-drill.sh # 9 AM weekdays
0 0 * * * /home/ops/backup-logs.sh # midnight daily

# Add a cron job
crontab -e # opens cron editor
crontab -l # lists all scheduled jobs
crontab -r # removes all jobs (use carefully)
Cron Expression Breakdown
*/5
*/5 in the minute field = every 5 minutes. The * means "every" and /5 means "in steps of 5" — so minute 0, 5, 10, and so on.
*
An asterisk alone means "every possible value." In the hour field it means every hour. In the day field it means every day.
0 9 * * 1-5
Minute 0, Hour 9, every day, every month, days 1–5 (Monday to Friday). Reads as: run at 09:00 every weekday.
/home/ops/healthcheck.sh
Always use the full absolute path in cron jobs. Cron does not have the same PATH as your terminal so relative paths fail silently.
crontab -e
Opens the cron editor (vi or nano). Add your cron line, save and exit. The job is now scheduled. Use crontab -l to confirm it was saved.
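Because cron runs with a minimal environment, a common pattern is to set PATH at the top of the crontab and redirect the script's output to a log file so silent failures leave a trace. The paths below are illustrative:

```shell
# Illustrative crontab (edit with: crontab -e) — paths are assumptions
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
*/5 * * * * /home/ops/healthcheck.sh >> /home/ops/healthcheck.log 2>&1
```

The >> appends stdout to the log file and 2>&1 sends stderr to the same place, so the next morning you can read exactly what every run did.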
When to use this
Use crontab after every script you write that should run automatically. The service restart script → every 5 minutes. The log error count → every hour. The backup → midnight. The morning drill → 07:45 weekdays. Cron is what turns manual checks into a self-running monitoring system.
03 Bonus — Additional Commands from the Training
💡 These are the most-used individual commands from Weeks 2–6.5. Each one appears repeatedly across different topics. Knowing these from memory makes investigation 10x faster.
Log Investigation

Trace One Transaction

grep "TXN-501" payment-service.log
grep -B 3 "ERROR" payment-service.log

Find all log lines for a specific TXN ID. The -B 3 flag shows 3 lines before each error — reveals the build-up.

Process Control

Kill a Heavy Process

kill -15 PID # graceful stop
kill -9 PID # force stop (last resort)

-15 sends a clean stop signal. The process shuts down properly. -9 forces it immediately — never use on a DB process.
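The difference matters because a process can catch -15 and clean up, while -9 gives it no chance. A minimal sketch using a background sleep as the "heavy process":

```shell
#!/bin/bash
sleep 30 &                # stand-in for a stuck process
PID=$!

kill -15 "$PID"           # graceful stop: the process may clean up first
wait "$PID" 2>/dev/null   # exit status 143 = 128 + signal 15 (SIGTERM)
echo "process $PID stopped"
```

On a real service you would check the PID with ps first and get bridge approval before sending the signal.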

Memory Check

RAM and Swap Status

free -h
free -m | awk 'NR==2{print $7}'
free -m | awk 'NR==3{print $3}'

free -h shows the full picture. The awk commands extract just available RAM and swap used for scripting.
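When free is not installed, the same numbers can be read straight from the kernel — a sketch that assumes a Linux /proc/meminfo (values there are reported in kB):

```shell
#!/bin/bash
# Available RAM, converted from kB to MB
awk '/^MemAvailable/ {printf "Available RAM: %d MB\n", $2/1024}' /proc/meminfo

# Swap used = SwapTotal - SwapFree
awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2} END {printf "Swap used: %d MB\n", (t-f)/1024}' /proc/meminfo
```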

Network

Connectivity and Connections

ping -c 4 8.8.8.8
netstat -an | grep ESTABLISHED | wc -l
netstat -an | grep TIME_WAIT | wc -l

Ping tests basic connectivity. netstat counts open and stuck connections — TIME_WAIT above normal = connection leak.

Backup / Restore

tar Commands

tar -czf backup.tar.gz /folder # create
tar -tzvf backup.tar.gz # list
tar -xzf backup.tar.gz -C /dest # restore

-c create, -z gzip, -f filename, -t list, -x extract, -C destination. These three commands cover virtually every day-to-day backup operation.

SSL / Cert

Check Certificate Expiry

openssl s_client -connect host:443 -servername host 2>/dev/null \
| openssl x509 -noout -dates

Checks the TLS certificate of a live server and shows notBefore and notAfter dates. Immediately tells you if a cert is expired or about to expire.
