L2 Support Engineer · Fintech · Extra Learning

L2 Scripts & Commands Reference

Six essential scripts every L2 engineer writes and runs — from auto-restarting a failed service to scheduling health checks with cron. Every script is broken down line by line so you know exactly what each part does and when to use it.

Service Restart · Log Error Count · Disk Health · CPU Check · Backup Logs · Cron Health Check
01 How Scripts Are Structured Here
Reading Guide

Each script card gives you four things:

What it does — plain English explanation of the purpose.
Full script — the complete, runnable bash code you copy to Kali.
Line-by-line breakdown — what every part of the script means.
When to use it — the real L2 scenario where this script saves you time.

02 The 6 Core L2 Scripts
01
Service Restart
Auto-restart a service if it is down
Auto-recovery
This script checks whether a service is running. If it is down — it restarts it automatically. This is the foundation of any service monitoring script. In production this would run via crontab every 5 minutes so crashed services recover without manual intervention.
service-restart.sh
#!/bin/bash
# Checks if a service is running — restarts it if not

SERVICE="app" # replace with your service name e.g. nginx, rabbitmq

if ! systemctl is-active --quiet "$SERVICE"; then
  echo "$SERVICE is down. Restarting..."
  systemctl restart "$SERVICE"
  echo "$SERVICE restarted at $(date)"
else
  echo "$SERVICE is running. No action needed."
fi
Line-by-Line Breakdown
SERVICE="app"
Variable — set the service name here. Change to nginx, rabbitmq, payment-service etc.
if ! systemctl is-active
if ! means "if NOT." systemctl is-active --quiet silently checks whether the service is running: it prints nothing and returns only an exit code (0 = active). If NOT running — go into the if block.
systemctl restart
The restart command. Sends a stop signal then starts the service again cleanly.
$(date)
Inserts the current date and time into the log message so you know exactly when the restart happened.
else ... fi
If the service IS running — print "running" and do nothing. fi closes the if block.
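The if ! pattern hinges entirely on exit codes. A minimal sketch of the same branching, using a stand-in function since systemctl may not be available in every test shell (the stand-in and its return value are assumptions for the demo):

```shell
#!/bin/bash
# is_active stands in for: systemctl is-active --quiet "$SERVICE"
# Returning 1 simulates a stopped service
is_active() { return 1; }

if ! is_active; then
  echo "service is down, restart branch runs"
else
  echo "service is up, nothing to do"
fi
# prints: service is down, restart branch runs
```

Swap the function body for the real systemctl call and the control flow is identical to the script above.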
When to use this
Schedule this via crontab every 5 minutes. When a service crashes at 3 AM — it auto-recovers before the next check cycle. You wake up to a log showing "restarted at 03:22" instead of an outage that lasted until 9 AM. Also useful on bridge calls — run it manually to confirm a service restart completed.
02
Log Error Count
Count ERROR lines in a log file
Log Analysis
The most-used log command in daily L2 work. It searches the application log for the word ERROR and counts how many lines contain it. The result tells you in one number how many errors occurred — the starting point of every investigation.
log-error-count.sh
#!/bin/bash
# Count ERROR lines in a log file

LOG="/var/log/app.log"

# Basic one-liner — count and show the number
grep -i error "$LOG" | wc -l

# Full version — with verdict
COUNT=$(grep -i "error" "$LOG" | wc -l)
echo "Error count: $COUNT"

[ "$COUNT" -gt 5 ] && echo "ACTION REQUIRED" || echo "OK"

# Bonus — show what types of errors appeared
grep -i "error" "$LOG" | awk '{print $4}' | sort | uniq -c | sort -rn
Line-by-Line Breakdown
grep -i error
grep searches for a pattern. -i makes it case-insensitive — finds ERROR, Error, error. Without -i it only finds exact case matches.
| wc -l
The pipe | sends grep's output to wc -l which counts the number of lines. Together: count how many error lines.
[ "$COUNT" -gt 5 ]
A condition check — -gt means "greater than." If COUNT is more than 5 → print ACTION REQUIRED, otherwise print OK.
awk '{print $4}'
awk prints the 4th word of each line — in a structured log that is usually the error type (e.g. DB_CONNECTION_TIMEOUT).
sort | uniq -c | sort -rn
Sort alphabetically, count duplicates with uniq -c, then sort numerically reversed so the most common error appears first.
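The whole pipeline can be tried safely on synthetic data. The log format below (date, time, level, error type, message) is an assumption — adjust the $4 field number to match your real log layout:

```shell
#!/bin/bash
# Build a small sample log (format is illustrative)
printf '%s\n' \
  "2024-03-15 14:00:01 ERROR DB_TIMEOUT query exceeded 30s" \
  "2024-03-15 14:00:05 ERROR DB_TIMEOUT query exceeded 30s" \
  "2024-03-15 14:00:09 ERROR AUTH_FAIL bad token" \
  "2024-03-15 14:00:12 INFO HEALTH ok" > /tmp/app-sample.log

# Count error lines (-c counts matching lines directly)
grep -ic "error" /tmp/app-sample.log        # prints: 3

# Break errors down by type (4th field), most common first
grep -i "error" /tmp/app-sample.log | awk '{print $4}' | sort | uniq -c | sort -rn
```

The breakdown output puts DB_TIMEOUT (2 occurrences) above AUTH_FAIL (1), which is exactly the "what kind of error dominates" answer you want first.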
When to use this
Run every morning as part of the health checklist. Also run immediately when an alert fires — the error count and type breakdown tells you within 5 seconds whether it is a DB issue, a network issue, or an application bug. Forms the first line of every RCA.
03
Disk Health
Check disk usage and alert if above threshold
Resource Check
Disk filling up is the most common silent killer in fintech — it fills slowly and then everything breaks at once. This script checks the disk usage percentage and prints a warning if it is above the threshold. The awk command extracts just the number so you can compare it in an if statement.
disk-health.sh
#!/bin/bash
# Check disk usage — alert if above 80%

# One-liner: show any partition above 80%
df -h | awk 'NR>1 && $5+0 > 80 {print $0}'

# Full version — with threshold and verdict
THRESHOLD=80
DISK_USED=$(df / | awk 'NR==2 {print $5}' | tr -d '%')

echo "Disk usage: $DISK_USED%"

if [ "$DISK_USED" -ge 90 ]; then
  echo "CRITICAL: Disk at $DISK_USED% — act immediately"
elif [ "$DISK_USED" -ge $THRESHOLD ]; then
  echo "WARNING: Disk at $DISK_USED% — plan cleanup"
else
  echo "OK: Disk at $DISK_USED%"
fi

# Find what is eating disk space
du -sh /var/log/* 2>/dev/null | sort -rh | head -5
Line-by-Line Breakdown
df -h
df = disk free. -h = human-readable sizes (GB, MB). Shows all partitions.
awk 'NR>1 && $5+0 > 80 {print $0}'
The filtering one-liner. $5 is the 5th column (Use%). NR>1 skips the header row, and $5+0 forces awk to treat "81%" as the number 81 (a bare $5 > 80 compares strings and gives wrong answers). If usage is above 80 — print the whole line ($0). This filters df output to only show over-80% partitions.
NR==2 {print $5}
NR==2 = "only the second row" (skip the header). $5 = the Use% column. Gets just the percentage number for the root partition.
tr -d '%'
Removes the % character from the number so bash can do math with it. Without this — "81%" cannot be compared to 80.
du -sh /var/log/*
du = disk usage. -s = summary (one line per folder). -h = human-readable. Shows the size of every folder in /var/log.
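The percent-stripping step can be checked in isolation with a hard-coded value (the 81% figure below is made up for the demo):

```shell
#!/bin/bash
USED="81%"    # pretend output of: df / | awk 'NR==2 {print $5}'

# Strip the % so bash can do integer comparison
USED_NUM=$(printf '%s' "$USED" | tr -d '%')

if [ "$USED_NUM" -ge 80 ]; then
  echo "over threshold: $USED_NUM"
fi
# prints: over threshold: 81
```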
When to use this
Run first thing every morning. Always check disk before CPU or memory — a full disk causes everything else to break. When disk is critical, run the du command to find which folder is the culprit — almost always old log files in /var/log.
04
CPU Check
Show top CPU-consuming processes
Process Analysis
When CPU is high, you need to find which process is eating it — fast. This command lists all running processes sorted by CPU usage, highest first. The top entry is your suspect. You then check how long it has been running and decide whether to wait, kill, or escalate.
cpu-check.sh
#!/bin/bash
# Show top CPU processes

# The core command from the screenshot
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head

# Full version — with load average and verdict
echo "=== CPU Check — $(date) ==="
echo "Load average: $(awk '{print $1, $2, $3}' /proc/loadavg)"
echo ""
echo "Top 5 processes by CPU:"
ps -eo pid,user,cmd,%cpu,%mem --sort=-%cpu | head -6

# Check CPU idle — below 10% means overloaded
echo ""
top -bn1 | grep "Cpu(s)" | awk '{print "CPU idle: "$8}'

# How long has the top process been running?
TOP_PID=$(ps -eo pid --sort=-%cpu | sed -n '2p')
ps -p "$TOP_PID" -o pid,etime,pcpu,cmd 2>/dev/null
Line-by-Line Breakdown
ps -eo
ps = process status. -e = show every process. -o = choose which columns to display. You choose pid, ppid, cmd, %mem, %cpu.
--sort=-%cpu
Sort by CPU percentage descending (the - before %cpu means highest first). The top row = the most CPU-hungry process.
| head
Shows only the first 10 lines (header + top 9 processes). Use head -6 for the header plus top 5. Stops the screen from filling with hundreds of processes.
cat /proc/loadavg
Reads the system load average from the kernel. The three numbers are 1-minute, 5-minute, 15-minute averages. Above 1.0 per CPU core = system is overloaded.
ps -p "$TOP_PID" -o etime
Gets the elapsed time (how long running) for a specific PID. etime shows hours:minutes:seconds. A process running for 3 hours when it should take 5 minutes = stuck.
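The load-average line can also be read with plain shell instead of cat and awk — a sketch that assumes a Linux /proc filesystem:

```shell
#!/bin/bash
# /proc/loadavg looks like: 0.42 0.35 0.30 1/233 4567
# read splits on whitespace; "rest" soaks up the trailing fields
read one five fifteen rest < /proc/loadavg
echo "load: 1m=$one 5m=$five 15m=$fifteen"
```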
When to use this
Run immediately when CPU is above 90% on the dashboard. The top process is almost always your root cause. Check the elapsed time — if it has been running far longer than expected, it is stuck in a loop. Get bridge approval then kill -15 PID to stop it gracefully.
05
Backup Logs
Compress and archive log files
Backup
Log files grow continuously. Left unchecked they fill the disk. This script uses tar and gzip to compress the log folder into a single timestamped archive, freeing space without losing the data. The archived file can be restored at any time for incident investigation.
backup-logs.sh
#!/bin/bash
# Compress logs — from the screenshot
tar -czf logs.tar.gz /var/log/app

# Full version — timestamped backup with cleanup
LOG_DIR="/var/log/app"
BACKUP_DIR="$HOME/backups"
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
BACKUP_FILE="$BACKUP_DIR/logs-$TIMESTAMP.tar.gz"

mkdir -p "$BACKUP_DIR"

echo "Creating backup: $BACKUP_FILE"
tar -czf "$BACKUP_FILE" "$LOG_DIR"

echo "Size: $(du -sh "$BACKUP_FILE" | awk '{print $1}')"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "logs-*.tar.gz" -mtime +7 -delete
echo "Done. Old backups cleaned."
Line-by-Line Breakdown
tar -czf
tar = tape archive. Flags: -c create, -z compress with gzip, -f filename follows. Together = create a compressed archive.
logs.tar.gz
The output filename. .tar = the bundle, .gz = gzip compressed. Together = compressed archive format used everywhere in Linux.
$(date +"%Y-%m-%d_%H-%M-%S")
Inserts the current timestamp into the filename. Creates unique filenames like logs-2024-03-15_14-30-00.tar.gz so backups never overwrite each other.
mkdir -p
Creates the backup folder if it does not exist. -p means no error if the folder already exists.
find -mtime +7 -delete
Find files modified more than 7 days ago and delete them. Prevents the backup folder itself from filling up with old archives.
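A safe way to verify the archive logic before pointing it at real logs is a round trip in a temporary directory: create, list, restore, compare. Paths here are illustrative:

```shell
#!/bin/bash
SRC=$(mktemp -d)
echo "sample line" > "$SRC/app.log"

ARCHIVE="/tmp/logs-demo.tar.gz"
tar -czf "$ARCHIVE" -C "$SRC" .    # -C stores paths relative to $SRC
tar -tzf "$ARCHIVE"                # list contents to confirm the archive

DEST=$(mktemp -d)
tar -xzf "$ARCHIVE" -C "$DEST"
cat "$DEST/app.log"                # prints: sample line
```

The -C flag matters in both directions: on create it keeps absolute paths out of the archive, and on extract it controls where files land.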
When to use this
Run this when disk is above 80% and logs are the cause. Also schedule it nightly via crontab (midnight) so logs are automatically archived before they cause a disk full incident. The 7-day cleanup ensures you always have a week of history for investigations.
06
Cron Health Check
Schedule a health script to run automatically
Automation
Cron is the Linux scheduler. It runs scripts automatically at defined intervals — without you doing anything. This is what turns a one-time script into a continuous monitoring system. The cron expression defines exactly when the script runs.
crontab entry — from the screenshot
# The screenshot's cron entry
*/5 * * * * /home/ops/healthcheck.sh

# What each field means:
# ┌── minute (*/5 = every 5 minutes)
# │ ┌── hour (* = every hour)
# │ │ ┌── day of month (* = every day)
# │ │ │ ┌── month (* = every month)
# │ │ │ │ ┌── day of week (* = every day)
# │ │ │ │ │
# */5 * * * * /home/ops/healthcheck.sh

# Common cron schedules
*/5 * * * * /home/ops/healthcheck.sh # every 5 min
0 * * * * /home/ops/log-check.sh # every hour
0 9 * * 1-5 /home/ops/morning-drill.sh # 9 AM weekdays
0 0 * * * /home/ops/backup-logs.sh # midnight daily

# Add a cron job
crontab -e # opens cron editor
crontab -l # lists all scheduled jobs
crontab -r # removes all jobs (use carefully)
Cron Expression Breakdown
*/5
*/5 in the minute field = every 5 minutes. The * means "every" and /5 means "in steps of 5" — so minute 0, 5, 10, and so on.
*
An asterisk alone means "every possible value." In the hour field it means every hour. In the day field it means every day.
0 9 * * 1-5
Minute 0, Hour 9, every day, every month, days 1–5 (Monday to Friday). Reads as: run at 09:00 every weekday.
/home/ops/healthcheck.sh
Always use the full absolute path in cron jobs. Cron does not have the same PATH as your terminal so relative paths fail silently.
crontab -e
Opens the cron editor (vi or nano). Add your cron line, save and exit. The job is now scheduled. Use crontab -l to confirm it was saved.
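Because cron runs with a minimal environment, a common pattern is to set PATH at the top of the crontab and redirect the script's output to a log file so silent failures leave a trace. The paths below are illustrative:

```shell
# Illustrative crontab (edit with: crontab -e) — paths are assumptions
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
*/5 * * * * /home/ops/healthcheck.sh >> /home/ops/healthcheck.log 2>&1
```

The >> appends stdout to the log file and 2>&1 sends stderr to the same place, so the next morning you can read exactly what every run did.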
When to use this
Use crontab after every script you write that should run automatically. The service restart script → every 5 minutes. The log error count → every hour. The backup → midnight. The morning drill → 07:45 weekdays. Cron is what turns manual checks into a self-running monitoring system.
03 Bonus — Additional Commands from the Training
💡 These are the most-used individual commands from Weeks 2–6.5. Each one appears repeatedly across different topics. Knowing these from memory makes investigation 10x faster.
Log Investigation

Trace One Transaction

grep "TXN-501" payment-service.log
grep -B 3 "ERROR" payment-service.log

Find all log lines for a specific TXN ID. The -B 3 flag shows 3 lines before each error — reveals the build-up.

Process Control

Kill a Heavy Process

kill -15 PID # graceful stop
kill -9 PID # force stop (last resort)

-15 sends a clean stop signal. The process shuts down properly. -9 forces it immediately — never use on a DB process.
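The difference matters because a process can catch -15 and clean up, while -9 gives it no chance. A minimal sketch using a background sleep as the "heavy process":

```shell
#!/bin/bash
sleep 30 &                # stand-in for a stuck process
PID=$!

kill -15 "$PID"           # graceful stop: the process may clean up first
wait "$PID" 2>/dev/null   # exit status 143 = 128 + signal 15 (SIGTERM)
echo "process $PID stopped"
```

On a real service you would check the PID with ps first and get bridge approval before sending the signal.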

Memory Check

RAM and Swap Status

free -h
free -m | awk 'NR==2{print $7}'
free -m | awk 'NR==3{print $3}'

free -h shows the full picture. The awk commands extract just available RAM and swap used for scripting.
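When free is not installed, the same numbers can be read straight from the kernel — a sketch that assumes a Linux /proc/meminfo (values there are reported in kB):

```shell
#!/bin/bash
# Available RAM, converted from kB to MB
awk '/^MemAvailable/ {printf "Available RAM: %d MB\n", $2/1024}' /proc/meminfo

# Swap used = SwapTotal - SwapFree
awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2} END {printf "Swap used: %d MB\n", (t-f)/1024}' /proc/meminfo
```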

Network

Connectivity and Connections

ping -c 4 8.8.8.8
netstat -an | grep ESTABLISHED | wc -l
netstat -an | grep TIME_WAIT | wc -l

Ping tests basic connectivity. netstat counts open and stuck connections — TIME_WAIT above normal = connection leak.

Backup / Restore

tar Commands

tar -czf backup.tar.gz /folder # create
tar -tzvf backup.tar.gz # list
tar -xzf backup.tar.gz -C /dest # restore

-c create, -z gzip, -f filename, -t list, -x extract, -C destination. These three commands cover virtually every day-to-day backup operation.

SSL / Cert

Check Certificate Expiry

openssl s_client -connect host:443 -servername host 2>/dev/null \
| openssl x509 -noout -dates

Checks the TLS certificate of a live server and shows notBefore and notAfter dates. Immediately tells you if a cert is expired or about to expire.
