#001 · SMS Monitoring · Node.js Ops · Complete Jan 2026

SMS-Based Server Downtime Alert System

using the 46elks API · one npm dependency · zero platforms

Every production system goes down eventually. The question is whether you find out in thirty seconds or thirty minutes.

I had a small VPS running a few personal services — nothing critical, but things I wanted to stay up. I was already paying for uptime monitoring via a third-party dashboard, but the alert channel was email. Email requires me to have a client open. It requires me to be at a desk. For a 3am incident, it's effectively silent.

PagerDuty and similar platforms solve this, but they introduce a managed layer between my system and my phone. I don't want a platform. I want an SMS that fires the moment a threshold is crossed.

The 46elks SMS API is a single authenticated POST request. The question was: can I build a reliable, low-noise downtime alerting system on top of it in an afternoon, with no external dependencies beyond the API itself?
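To show how small that API surface is, here is the whole integration as a sketch using Node 18+'s built-in global fetch (the credentials and sender name are placeholders; the monitor code later in this post uses the built-in https module instead):

```javascript
// Minimal sketch of the 46elks integration: one authenticated POST.
// user/pass/to are placeholders; requires Node 18+ for global fetch.
async function sendSmsOnce(user, pass, to, message) {
  const res = await fetch('https://api.46elks.com/a1/sms', {
    method: 'POST',
    headers: {
      'Authorization': 'Basic ' + Buffer.from(`${user}:${pass}`).toString('base64'),
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    // The API takes form-encoded fields: from, to, message
    body: new URLSearchParams({ from: 'Monitor', to, message }).toString(),
  });
  if (!res.ok) throw new Error(`46elks responded ${res.status}`);
  return res.json();
}
```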

A Node.js health check loop with a consecutive-failure threshold, combined with a direct call to the 46elks SMS API, can deliver a downtime alert to my phone within 90–120 seconds of a real outage — with a low enough false-positive rate to be trustworthy.

The threshold matters. A single failed check shouldn't send an alert. Three consecutive failures should.

  ┌───────────────────────────────────────────────────┐
  │  Monitoring Process (Node.js, separate VPS)       │
  │                                                   │
  │  setInterval(30s)                                 │
  │       │                                           │
  │       ▼                                           │
  │  HTTP GET → target/health                         │
  │       │                                           │
  │       ├── 200 OK ──────────────────► reset state  │
  │       │                                           │
  │       └── timeout / 5xx / error                  │
  │               │                                   │
  │               ▼                                   │
  │       consecutiveFailures++                       │
  │               │                                   │
  │     ┌─────────┴─────────┐                         │
  │     │ failures < 3      │ failures >= 3           │
  │     │ do nothing        │ AND isDown === false    │
  │     └───────────────────┘         │               │
  │                                   ▼               │
  │                    POST api.46elks.com/a1/sms     │
  │                                   │               │
  │                         isDown = true             │
  │                         downSince = Date.now()    │
  └───────────────────────────────────────────────────┘
                                      │
                                      ▼
                             46elks SMS gateway
                                      │
                                      ▼
                          +46 XXX XXX XXX (my phone)
                          SMS delivered ~1.5–3s

  Recovery path:
  Next successful check after isDown === true
       │
       ▼
  POST /a1/sms — "[RECOVERED] back online after N min"
  isDown = false, consecutiveFailures = 0

── STACK ──

  Component        Choice                Reason
  Runtime          Node.js 20 LTS        No dependency overhead, built-in HTTPS
  HTTP client      Node built-in https   Zero dependencies — the API doesn't need a library
  Scheduler        setInterval           No cron daemon needed, restarts cleanly
  Config           dotenv                Only external dependency. No credentials in code.
  Alert channel    46elks SMS API        Direct REST, basic auth, no platform
  Monitor host     Separate VPS          Monitor must survive what it monitors
  Process manager  pm2                   Auto-restart, log rotation, startup on boot

  1. Provision the monitor host on separate infrastructure. Running the monitor on the same server as the monitored service is the first mistake. I used a $4/mo VPS from a different provider. If the primary host dies, the monitor survives.
  2. Define a /health endpoint on the target server. Don't ping the root path. A /health endpoint returns a deliberate 200 with no side effects. Mine returns {"status":"ok","ts":1736330400}. If the endpoint is slow or broken, that's a real signal.
  3. Write the check loop with hysteresis. The failure threshold absorbs transient blips — a DNS hiccup, a brief network stall. Three consecutive failures over 90 seconds means something is genuinely wrong. Without this, false positives make the alert untrustworthy within 24 hours.
  4. Implement a cooldown on repeat alerts. If the server stays down, you don't want an SMS every 30 seconds. A 10-minute cooldown sends a "still down" reminder after sustained outages without flooding your phone.
  5. Always send a recovery alert. The down alert opens a loop. The recovery alert closes it. Without it you're left manually checking whether the server came back, or whether your intervention actually worked.
  6. Run with pm2. pm2 start monitor.js --name monitor then pm2 save && pm2 startup. Auto-restart on crash, log rotation included, survives server reboots.
monitor.js
// monitor.js — adham46elks.com/experiments/001
require('dotenv').config();
const https = require('https');
const { URL } = require('url');

const CONFIG = {
  targetUrl:        process.env.TARGET_URL,
  alertPhone:       process.env.ALERT_PHONE,
  elksUser:         process.env.ELKS_API_USER,
  elksPassword:     process.env.ELKS_API_PASSWORD,
  fromName:         process.env.ALERT_FROM        || 'Monitor',
  checkIntervalMs:  Number(process.env.CHECK_INTERVAL_MS)  || 30_000,
  failureThreshold: Number(process.env.FAILURE_THRESHOLD)  || 3,
  timeoutMs:        Number(process.env.TIMEOUT_MS)         || 5_000,
  cooldownMs:       Number(process.env.COOLDOWN_MS)        || 10 * 60_000,
};

let consecutiveFailures = 0;
let isDown              = false;
let downSince           = null;
let lastAlertAt         = 0;

// ── HTTP check ──────────────────────────────────────────────

function httpCheck(targetUrl, timeoutMs) {
  return new Promise((resolve, reject) => {
    const parsed = new URL(targetUrl);
    const lib    = parsed.protocol === 'https:' ? https : require('http');
    const req    = lib.get(targetUrl, (res) => {
      resolve(res.statusCode);
      res.resume(); // drain to free socket
    });
    const timer = setTimeout(() => {
      req.destroy(new Error('timeout'));
    }, timeoutMs);
    req.on('error', (err) => { clearTimeout(timer); reject(err); });
    req.on('close', ()    => clearTimeout(timer));
  });
}

// ── 46elks SMS ──────────────────────────────────────────────

function sendSms(message) {
  return new Promise((resolve, reject) => {
    const body = new URLSearchParams({
      from:    CONFIG.fromName,
      to:      CONFIG.alertPhone,
      message,
    }).toString();

    const auth = Buffer
      .from(`${CONFIG.elksUser}:${CONFIG.elksPassword}`)
      .toString('base64');

    const req = https.request({
      hostname: 'api.46elks.com',
      path:     '/a1/sms',
      method:   'POST',
      headers: {
        'Authorization':  `Basic ${auth}`,
        'Content-Type':   'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body),
      },
    }, (res) => {
      let raw = '';
      res.on('data', (chunk) => { raw += chunk; });
      res.on('end', () => {
        // Reject non-2xx responses and guard against non-JSON error bodies
        if (res.statusCode >= 400) {
          return reject(new Error(`46elks ${res.statusCode}: ${raw}`));
        }
        try { resolve(JSON.parse(raw)); } catch (err) { reject(err); }
      });
    });

    req.on('error', reject);
    req.write(body);
    req.end();
  });
}

// ── Main check loop ─────────────────────────────────────────

async function check() {
  const ts = new Date().toISOString();
  let statusCode;
  let ok = false;

  try {
    statusCode = await httpCheck(CONFIG.targetUrl, CONFIG.timeoutMs);
    ok = statusCode >= 200 && statusCode < 400;
  } catch (err) {
    statusCode = err.message;
  }

  if (!ok) {
    consecutiveFailures++;
    console.log(`[${ts}] FAIL — ${statusCode} (${consecutiveFailures}/${CONFIG.failureThreshold})`);

    const shouldAlert     = consecutiveFailures >= CONFIG.failureThreshold;
    const cooldownExpired = Date.now() - lastAlertAt > CONFIG.cooldownMs;

    if (shouldAlert && !isDown) {
      isDown    = true;
      downSince = new Date();
      const msg = `[DOWN] ${CONFIG.targetUrl} is unreachable. Detected ${ts}.`;
      console.log('→ Sending alert:', msg);
      await sendSms(msg).catch(console.error);
      lastAlertAt = Date.now();

    } else if (isDown && cooldownExpired) {
      const mins = Math.round((Date.now() - downSince.getTime()) / 60_000);
      const msg  = `[STILL DOWN] ${CONFIG.targetUrl} unreachable for ${mins} min.`;
      console.log('→ Sending reminder:', msg);
      await sendSms(msg).catch(console.error);
      lastAlertAt = Date.now();
    }

  } else {
    if (isDown) {
      const mins = Math.round((Date.now() - downSince.getTime()) / 60_000);
      const msg  = `[RECOVERED] ${CONFIG.targetUrl} is back after ${mins} min.`;
      console.log('→ Sending recovery:', msg);
      await sendSms(msg).catch(console.error);
    }
    consecutiveFailures = 0;
    isDown              = false;
    downSince           = null;
    console.log(`[${ts}] OK — ${statusCode}`);
  }
}

// ── Start ────────────────────────────────────────────────────

console.log(`Monitor starting. Target: ${CONFIG.targetUrl}`);
console.log(`Interval: ${CONFIG.checkIntervalMs / 1000}s | Threshold: ${CONFIG.failureThreshold} failures`);

check();
setInterval(check, CONFIG.checkIntervalMs);
.env
# Target
TARGET_URL=https://yourserver.com/health
ALERT_PHONE=+46700000000

# 46elks credentials — dashboard.46elks.com
ELKS_API_USER=your_api_user
ELKS_API_PASSWORD=your_api_password

# Tuning
ALERT_FROM=Monitor
CHECK_INTERVAL_MS=30000
FAILURE_THRESHOLD=3
TIMEOUT_MS=5000
COOLDOWN_MS=600000

Tested over 14 days of continuous operation · 2 real incidents captured

  Measurement                                 Observed
  Threshold breach → API call                 < 100 ms
  46elks API response time                    180–320 ms
  SMS delivery to handset (Swedish number)    1.4–3.2 s
  Total: outage detected → SMS received       ~92–95 s worst case
  False positives over 14 days                Zero
  Missed real alerts                          Zero
  npm dependencies                            1 (dotenv)

── INCIDENT LOG ──

  Incident 1 · 2026-01-11 03:17 UTC
    Cause:                 VPS OOM kill, nginx process died
    Detected:              03:18:47 (91s after nginx stopped responding)
    Alert SMS received:    03:18:49
    Resolved:              03:24:12 (manual: pm2 restart nginx)
    Recovery SMS received: 03:24:25
    Total downtime:        6m 25s

  Incident 2 · 2026-01-19 14:02 UTC
    Cause:     Brief network partition on host provider
    Duration:  ~45 seconds
    Result:    Never hit failure threshold — resolved between checks
    Alert:     none sent (correct behaviour)

The 46elks API leg is not the bottleneck. The bottleneck is the intentional 90-second detection window — 3 checks at 30-second intervals. That's a design decision, not an API limitation. The API call itself completes in under 500ms.
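One way to reconstruct that worst-case figure from the measured upper bounds in the results table (the decomposition is my back-of-envelope, not from the experiment):

```javascript
// Worst case: the outage begins just after a successful check, so the third
// consecutive failure lands a full 3 × 30 s later.
const intervalS  = 30;
const threshold  = 3;
const detectionS = intervalS * threshold;   // 90 s detection window
const apiS       = 0.32;                    // 46elks API response, upper bound
const smsS       = 3.2;                     // handset delivery, upper bound
console.log(detectionS + apiS + smsS);      // ≈ 93.5 s, inside the observed 92–95 s
```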

  1. No acknowledgment mechanism. When the SMS arrives, there's no way to reply and signal "I'm handling this." That requires inbound SMS and webhook handling — a different experiment.
  2. Polling is not event-driven. The 90-second worst-case detection window is a direct consequence of polling. A dead server can't announce itself. This is an inherent constraint, not an implementation flaw.
  3. SMS delivery is best-effort by protocol. 46elks has strong delivery rates, but SMS is not a guaranteed delivery protocol at the network level. Not suitable as a sole alerting channel for life-critical systems.
  4. Single point of failure on the monitor host. If the monitoring VPS goes down, all alerts stop. You need either a second independent monitor or a periodic heartbeat SMS to catch this.
  5. No escalation path. If the first SMS goes unread, nothing follows up except the 10-minute cooldown reminder. Escalation to a voice call is the subject of Experiment #002.
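Limitation 4 could be softened with a few lines on the monitor itself. A hypothetical sketch reusing sendSms() from monitor.js; the message format and 24-hour interval are my choices, not part of the experiment:

```javascript
// Hypothetical heartbeat add-on: a daily "monitor alive" SMS, so silence
// from the monitor itself gets noticed within a day.
function heartbeatMessage(now = new Date()) {
  return `[HEARTBEAT] monitor alive at ${now.toISOString()}`;
}

function startHeartbeat(sendSms, intervalMs = 24 * 60 * 60 * 1000) {
  // Returns the timer handle so callers can cancel it with clearInterval()
  return setInterval(() => {
    sendSms(heartbeatMessage()).catch(console.error);
  }, intervalMs);
}
```

A stronger variant is a dead man's switch: the monitor pings an external service that alerts when the pings stop, removing the monitor host as a single point of failure entirely.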