Ezmon — Network Monitoring

Documentation

Introduction

Ezmon is a cloud-based infrastructure monitoring platform. It checks your services as often as every 30 seconds (on the Pro plan) over seven protocols and sends an alert when a check fails or degrades, on average within 45 seconds of the failure.

Everything lives in the dashboard: create a monitor, pick an alert channel, and you are done. No agent required for public endpoints. The optional private agent extends monitoring to hosts that are never exposed to the internet.

Protocols: HTTP · HTTPS · TCP · PING · DNS · SNMP · SSL

Alert speed: < 45s (average time from failure to notification)

Check interval: 30s (minimum on Pro; 5 min on Free)

Quick start

Get a monitor running in under 60 seconds.

Create an account

Go to /register and sign up with your email. No credit card required on the Free plan.

Add your first monitor

Open the Monitors page and click "New Monitor". Enter a name, pick a type (HTTP is the most common), paste the URL, and hit Create.

Set up an alert rule

Navigate to Alerts → New Rule. Choose the condition "Goes down", select the Email channel, and save. You will receive an email the next time the monitor fails.

Watch the dashboard

The Monitors page refreshes every 30 seconds. Status dots, latency figures, and sparkline trends update live as checks come in.

Tip: After you log in, the dashboard is available at /dashboard. Bookmark it; it is the single pane of glass for your infrastructure.

Plans & limits

Three tiers. All plans include the core monitoring engine; higher tiers unlock shorter intervals, more monitors, and more team members.

Feature

Free

Starter — €19/mo

Pro — €49/mo

Monitors

200

Check interval

5 min

1 min

30 sec

Users

Alert channels

Email + Slack

All channels

SNMP monitoring

—

✓

History retention

7 days

30 days

90 days

REST API access

—

✓

Status pages

Unlimited

Note: Plan limits are enforced only when you try to create a new resource. Existing monitors and rules are never deleted when you downgrade.

Monitors

A monitor is a periodic check against a single target. Every result is stored and used to compute uptime, feed the sparkline trend, and trigger alert rules.

Status values

Status | Meaning
UP | The last check passed within the expected response criteria.
WARN | The check passed but latency exceeded the degraded threshold.
DOWN | The last check failed (connection refused, timeout, wrong status code, etc.).
UNKNOWN | No results have been received yet (new monitor or just re-enabled).
PAUSED | The monitor is disabled and no checks are running.

Bulk actions

Click Select on the Monitors page to enter multi-select mode. You can then enable, pause, or delete multiple monitors at once using the bulk action bar that appears above the list.

Uptime calculation

Uptime is the ratio of successful checks to total checks over the displayed window. A check counts as successful when it returns the status UP. Intervals in which the monitor is paused or inside a maintenance window are excluded, so planned work does not count against your SLA.
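
The calculation described above can be sketched as follows. This is a minimal illustration, not Ezmon's implementation: the function name, the (timestamp, status) tuple shape, and the way excluded intervals are passed in are all assumptions made for the example.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

Result = Tuple[datetime, str]        # (timestamp, status), hypothetical shape
Window = Tuple[datetime, datetime]   # (start, end): start inclusive, end exclusive


def uptime_percent(results: Iterable[Result],
                   excluded: Iterable[Window] = ()) -> Optional[float]:
    """Ratio of "up" checks to counted checks, as a percentage.

    Results whose timestamp falls inside an excluded window are ignored.
    Returns None when no checks remain to be counted.
    """
    windows = list(excluded)
    counted = [
        status for ts, status in results
        if not any(start <= ts < end for start, end in windows)
    ]
    if not counted:
        return None
    up = sum(1 for status in counted if status == "up")
    return 100.0 * up / len(counted)
```

With four counted checks of which three are "up", the function returns 75.0; a failed check that falls inside an excluded window does not lower the figure.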

Monitor types

HTTP / HTTPS

Sends an HTTP GET to the target URL and evaluates the response. Two optional assertions can be configured:

Expected status code — If set, the check fails unless the HTTP response matches exactly (e.g. 200). Leave blank to accept any 2xx or 3xx.
Keyword check — A string that must appear somewhere in the response body. Useful for detecting blank pages or maintenance screens that still return 200.
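
The two assertions above can be expressed as a small predicate. This is a sketch under assumptions: the function and parameter names are hypothetical, and the real check runner may evaluate responses differently (for example around redirects or body encoding).

```python
from typing import Optional


def http_check_passes(status_code: int, body: str,
                      expected_status: Optional[int] = None,
                      keyword: Optional[str] = None) -> bool:
    """Apply the optional status-code and keyword assertions to a response."""
    if expected_status is not None:
        # Exact match required when an expected status code is configured.
        status_ok = status_code == expected_status
    else:
        # Blank setting: accept any 2xx or 3xx response.
        status_ok = 200 <= status_code < 400
    # The keyword, if configured, must appear somewhere in the body.
    keyword_ok = keyword is None or keyword in body
    return status_ok and keyword_ok
```

A maintenance page that returns 200 but lacks the configured keyword fails the check, which is the case the keyword assertion is meant to catch.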

TCP

Opens a TCP connection to host:port and measures the time to establish it. Useful for databases, mail servers, game servers, and any service that does not speak HTTP. Specify the port in the dedicated Port field or embed it in the target (db.internal:5432).
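
For illustration, a TCP check of this kind could look like the sketch below, using only the Python standard library. The function names and the 5-second timeout are assumptions, and the target parser deliberately ignores IPv6 literals.

```python
import socket
import time
from typing import Optional, Tuple


def split_target(target: str, port: Optional[int] = None) -> Tuple[str, int]:
    """Accept "host:port", or a bare host plus a separately supplied port.

    Simplification: IPv6 literals such as "::1" are not handled.
    """
    host, sep, embedded = target.rpartition(":")
    if sep and embedded.isdigit():
        return host, int(embedded)
    if port is None:
        raise ValueError("no port given for target " + target)
    return target, port


def tcp_connect_ms(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the time to establish a connection in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return None
```

Calling split_target("db.internal:5432") yields ("db.internal", 5432), which can then be passed to tcp_connect_ms.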

PING (ICMP)

Sends ICMP echo requests and measures round-trip time. Ideal for network devices, firewalls, and hosts where you only need to verify reachability. Requires the target to respond to ICMP — some cloud providers block it by default.

DNS

Performs a DNS lookup for the target domain and checks that a record exists. Additional options:

Option | Description
Record type | A, AAAA, CNAME, MX, NS, or TXT. Defaults to A.
Expected value | If set, the resolved value must match exactly (e.g. an IP address for A records). Leave blank to accept any valid response.

SNMP

Polls an SNMP agent using the standard sysUpTime.0 OID to verify the device is reachable and responding. Available on Pro only. Options:

Option | Values
Community string | The SNMP community (default: public)
Version | v1 or v2c

SSL certificate

Connects to the target host over TLS and inspects the certificate expiry date. The check goes to WARN when the certificate expires within the configured threshold (days), and DOWN when expired or unreachable. Pair with an ssl_expiry_below alert rule to get notified before your certificate lapses.
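
The WARN and DOWN rules above can be sketched as follows. The function names are hypothetical and the code is an illustration rather than Ezmon's check runner. Note that with Python's default TLS context an already expired certificate fails the handshake, which a caller would treat as DOWN.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> datetime:
    """Connect over TLS and return the certificate's expiry time (UTC)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expiry = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(expiry, tz=timezone.utc)


def ssl_status(not_after: datetime, now: datetime, warn_days: int) -> str:
    """Classify a certificate by how close it is to expiry."""
    if now >= not_after:
        return "down"   # already expired
    if not_after - now <= timedelta(days=warn_days):
        return "warn"   # expires within the configured threshold
    return "up"
```

Keeping the classification separate from the network call makes the threshold logic easy to test with fixed dates.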

Alerts

Alert rules define what condition to watch, which monitors to watch, and where to send the notification. You can have as many rules as you need.

Conditions

Condition | Triggers when…
status_down | The monitor transitions to DOWN (connection failure, timeout, wrong status, etc.)
status_warn | The monitor transitions to WARN (latency above the degraded threshold)
status_unknown | The monitor becomes unreachable (no response from the check runner)
latency_above | Measured latency exceeds the configured threshold in milliseconds
ssl_expiry_below | SSL certificate expires in fewer than N days

Channels

Email · Slack · Discord · Microsoft Teams · Webhook · SMS

Multiple channels can be selected per rule. The Webhook channel posts a JSON payload to any URL — useful for integrating with PagerDuty, custom automation, or anything with an HTTP endpoint.

Custom notify email

By default, alerts go to the workspace owner. Enter a different address in the Notify email field to route that rule's notifications elsewhere — useful for on-call rosters or team distribution lists.

Tag-based routing

Set Monitor scope to "All monitors with tag…" to apply a rule only to monitors sharing a given tag. This lets you route database alerts to the DB team's Slack channel while sending API alerts to the backend team, all from a single workspace.

Cooldown

The cooldown window (default 30 minutes) prevents duplicate notifications for the same ongoing incident. After the first alert fires, subsequent failures for the same monitor are silenced until either the monitor recovers or the cooldown expires.
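
One way to model the cooldown behaviour described above is a small gate that remembers when each monitor last alerted. The class and method names are hypothetical, and this is a sketch of the stated rule, not the actual alerting engine.

```python
from datetime import datetime, timedelta
from typing import Dict


class CooldownGate:
    """Suppress repeat alerts for a monitor until it recovers or the cooldown expires."""

    def __init__(self, cooldown: timedelta = timedelta(minutes=30)) -> None:
        self.cooldown = cooldown
        self.last_alert: Dict[str, datetime] = {}

    def should_notify(self, monitor_id: str, failing: bool, now: datetime) -> bool:
        if not failing:
            # Recovery ends the silence, so the next failure alerts immediately.
            self.last_alert.pop(monitor_id, None)
            return False
        last = self.last_alert.get(monitor_id)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window
        self.last_alert[monitor_id] = now
        return True
```

With the default 30-minute window, a failure 10 minutes after the first alert is silenced, while a failure 31 minutes later, or any failure after a recovery, notifies again.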

Testing an alert rule

Click Test on any saved rule to fire a test notification immediately. This verifies your webhook URL, Slack integration, or email deliverability without waiting for a real failure.

Incidents

An incident record is created whenever a monitor transitions to DOWN or WARN, and it is resolved when the monitor returns to UP. Incidents give you a historical timeline of outages, including the duration, the affected monitor, and who acknowledged it.

Lifecycle

OPENED

Created automatically when a monitor first fails.

ONGOING

The monitor is still in a failed state.

RESOLVED

The monitor recovered. Duration is locked in.

ACKNOWLEDGED

A team member has seen the incident and optionally left a note. Acknowledgement does not resolve the incident.

Acknowledging an incident

Open the Incidents page and click Acknowledge next to an unacknowledged incident. You can optionally add a note (e.g. "investigating — DB migration in progress") that is saved with the incident for the post-mortem.

Filtering

Use the filter tabs at the top of the Incidents page to show only ongoing incidents, unacknowledged incidents, or all. The monitor name in each row links directly to that monitor's detail page.

Maintenance windows

Schedule a maintenance window on any monitor to suppress alerts and exclude downtime from uptime calculations during planned work.

Creating a window

Open a monitor's detail page and scroll to the Maintenance section. Pick a start time, end time, and an optional label. While the window is active, the monitor badge shows MAINT in the list, and alert rules skip firing for that monitor.

Note: Maintenance windows are per-monitor. If you are taking down an entire service that spans multiple monitors, create a window on each one or use a shared tag to quickly locate and update them.

Status pages

A public status page at ezmon.dev/status/your-slug gives your customers a single URL to check during an outage — no need to field support tickets.

What is shown

Overall system status banner (All systems operational / Degraded / Major outage)
Per-component status with the current state for each monitor
90-day uptime bar chart — each bar represents one day, color-coded green/yellow/red
Recent incident history — resolved incidents from the past 90 days with duration

Component status mapping

Monitor status | Shown as
UP | Operational (green)
WARN | Degraded (yellow)
DOWN | Major Outage (red)
In maintenance window | Under Maintenance (blue)
UNKNOWN / PAUSED | Unknown (grey)
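
The mapping above amounts to a simple lookup. In the sketch below the function name is hypothetical, and the choice to let an active maintenance window take precedence over the monitor's own status is an assumption, since the table does not state the order of evaluation.

```python
def component_label(monitor_status: str, in_maintenance: bool = False) -> str:
    """Translate a monitor status into the label shown on the status page."""
    if in_maintenance:
        # Assumption: an active maintenance window overrides the check status.
        return "Under Maintenance"
    labels = {
        "up": "Operational",
        "warn": "Degraded",
        "down": "Major Outage",
    }
    # UNKNOWN, PAUSED, and anything unrecognised fall back to "Unknown".
    return labels.get(monitor_status.lower(), "Unknown")
```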

Auto-refresh

The status page polls for updates every 60 seconds, so visitors always see fresh data without refreshing manually.

SLA reports

The Reports page generates a per-monitor uptime breakdown for the last three calendar months, making it easy to demonstrate SLA compliance to clients or leadership.

Metrics per monitor per month

Column | Description
Uptime % | Percentage of checks that passed (up_count / total_count × 100)
Checks passed | Number of successful check results
Total checks | Total checks executed in the period
Avg latency | Mean response time across all results
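
To make the four metrics concrete, the sketch below derives them from raw check results grouped by calendar month. It assumes results shaped like the objects returned by GET /monitors/:id/results (status, timestamp, latency_ms) with UTC timestamps; the function name and output keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def monthly_metrics(results: Iterable[Mapping]) -> Dict[str, dict]:
    """Group results by "YYYY-MM" and compute the four report columns."""
    buckets = defaultdict(list)
    for r in results:
        # The first seven characters of an ISO 8601 timestamp are the month.
        buckets[r["timestamp"][:7]].append(r)

    report = {}
    for month, rows in sorted(buckets.items()):
        total = len(rows)
        passed = sum(1 for r in rows if r["status"] == "up")
        latencies = [r["latency_ms"] for r in rows if r["latency_ms"] is not None]
        report[month] = {
            "uptime_pct": round(100.0 * passed / total, 2),
            "checks_passed": passed,
            "total_checks": total,
            "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        }
    return report
```

Results without a latency value are skipped when averaging, which is a design choice of this sketch rather than documented behaviour.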

Workspace summary

A headline figure at the top of the page shows the overall workspace uptime for the current month — the combined result across all monitors.

CSV export

Click Export CSV to download the full report as a comma-separated file. The CSV includes one row per monitor per month with all four metrics, suitable for import into a spreadsheet or BI tool.

Note: History depth depends on your plan: 7 days (Free), 30 days (Starter), 90 days (Pro). Upgrading does not backfill historical data, so upgrade before you need the longer history.

Network map

The Map page provides a free-form canvas where you can arrange your monitors visually, exactly as they appear in your network diagram or server room.

Node groups

A group is a named container node on the canvas. Drag it to any position; it snaps to a 28 px grid. The node circle pulses in the colour of the worst-status monitor linked to it — green when all is well, red when something is down.

Widgets

Place resizable widgets next to groups or anywhere on the canvas for at-a-glance metrics:

Widget type | Shows
Latency | Live latency line chart for the chosen monitor
Uptime | Uptime percentage bar chart over the last 20 checks
Status | Current status badge with monitor name and last-seen time

Three sizes are available: Small (240 × 148), Medium (340 × 195), Large (480 × 250). Drag any widget to reposition it; use the resize handle to change size.

Edit mode

Toggle Edit in the top-right of the Map page to unlock drag-and-drop, add/remove groups and widgets, and rename groups. Positions are saved automatically.

Private agent

The private agent is a lightweight process that runs inside your network and executes checks on behalf of Ezmon. It is the only way to monitor hosts that are never exposed to the public internet — internal databases, private APIs, network printers, industrial controllers, and so on.

How it works

The agent polls the Ezmon API at regular intervals, pulls the list of monitors assigned to it, runs each check locally, and posts the results back. All traffic is outbound from your network — you do not need to open any inbound firewall ports.
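
The poll, check, and report cycle described above can be sketched as below. Every name here is hypothetical: the agent's real API endpoints and payloads are not documented in this section, so the three steps are passed in as callables instead of being hard-coded, and the 30-second poll interval is an assumption.

```python
import time
from typing import Callable, Iterable, List


def run_agent_cycle(fetch_monitors: Callable[[], Iterable[dict]],
                    run_check: Callable[[dict], dict],
                    post_results: Callable[[List[dict]], None]) -> int:
    """One cycle: pull assigned monitors, run each check locally, post results."""
    results = [run_check(monitor) for monitor in fetch_monitors()]
    if results:
        post_results(results)  # outbound call only; nothing listens for inbound traffic
    return len(results)


def run_forever(cycle: Callable[[], int], poll_seconds: float = 30.0) -> None:
    """Repeat the cycle at a fixed interval until the process is stopped."""
    while True:
        cycle()
        time.sleep(poll_seconds)
```

Because both the fetch and the post are initiated by the agent, the sketch matches the outbound-only traffic pattern the section describes.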

Setup

Go to Agent in the sidebar and generate an agent token. Copy the Docker command shown and run it on any host inside your network:

shell

docker run -d \
  --name ezmon-agent \
  --restart unless-stopped \
  -e EZMON_TOKEN=your_agent_token_here \
  ghcr.io/ezmon/agent:latest

Tip: The agent container needs outbound HTTPS access to the Ezmon API and must be able to reach your targets over whatever protocols you are checking (ICMP, SNMP, etc.). Run it on a host that can reach all the private targets you want to monitor.

Multiple agents

You can run multiple agents — one per network segment, data centre, or VLAN. Each agent token is independent. The monitor list view shows which agent last ran a check next to each monitor row, so you always know which probe is responsible.

Revoking a token

Delete the token from the Agent page to immediately stop that agent from submitting results. The agent process will begin failing its API calls and log an authentication error. Rotate tokens regularly as part of your security hygiene.

Team & roles

Workspaces support multiple users. All monitors, alert rules, and data within a workspace are shared across all members.

Inviting members

Go to Team in the sidebar and enter the invitee's email address. They will receive an invitation link valid for 7 days. When they accept, they are added to the workspace and can log in immediately.

Roles

Role | Can do
Owner | Full access including billing, workspace settings, and removing members
Admin | Create and delete monitors, manage alert rules, acknowledge incidents, invite members
Member | View all data, acknowledge incidents, create monitors

Removing a member

Open the Team page, find the member, and click Remove. Their session tokens are invalidated immediately. The workspace data they created (monitors, rules) is retained.

REST API

The Ezmon REST API lets you read monitor state, pull check results, and list incidents programmatically. Available on the Pro plan.

Base URL

text

https://ezmon.dev/api/v1

Authentication

Pass your API token in the Authorization header as a Bearer token:

shell

curl https://ezmon.dev/api/v1/monitors \
  -H "Authorization: Bearer ezm_your_token_here"
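
The same authenticated request can be made from Python with the standard library alone. This is a sketch: the helper names are hypothetical, and the Accept header is an assumption rather than a documented requirement.

```python
import json
import urllib.request

BASE_URL = "https://ezmon.dev/api/v1"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Create a GET request carrying the token as a Bearer credential."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",  # assumption: not stated in the docs
        },
    )


def get_json(path: str, token: str, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, token), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (requires a valid token and network access):
#   monitors = get_json("/monitors", "ezm_your_token_here")
```

A revoked or invalid token causes urlopen to raise urllib.error.HTTPError with code 401, so callers may want to catch that case explicitly.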

GET /monitors

List all monitors in the workspace.

Parameter | Type | Description
limit | integer | Maximum results to return (1–200, default 200)
enabled | boolean | Filter by enabled state (true / false)
type | string | Filter by monitor type (http, tcp, ping, dns, snmp, ssl)
tag | string | Filter to monitors that include this tag

Response: an array of monitor objects.

json

[
  {
    "id": "mon_abc123",
    "name": "Production API",
    "type": "http",
    "target": "https://api.example.com/health",
    "interval": 60,
    "enabled": true,
    "tags": ["production", "api"],
    "last_status": "up",
    "last_latency_ms": 42,
    "created_at": "2025-01-15T10:30:00Z"
  }
]
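
As an illustration of the query parameters and response shape above, the sketch below builds a filtered URL and picks out monitors whose last status is not "up". The helper names are hypothetical; the parameter and field names are taken from the table and sample response.

```python
from urllib.parse import urlencode


def monitors_url(base: str = "https://ezmon.dev/api/v1", **filters) -> str:
    """Build the GET /monitors URL, skipping filters that are None."""
    params = {
        # Booleans are sent as lowercase true / false.
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in filters.items()
        if value is not None
    }
    query = urlencode(params)
    return f"{base}/monitors" + (f"?{query}" if query else "")


def names_not_up(monitors: list) -> list:
    """Return the names of monitors whose last_status is anything but "up"."""
    return [m["name"] for m in monitors if m["last_status"] != "up"]
```

For example, monitors_url(type="http", enabled=True) produces a URL ending in /monitors?type=http&enabled=true.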

GET /monitors/:id

Fetch a single monitor by ID.

GET /monitors/:id/results

Retrieve raw check results for a monitor.

Parameter | Type | Description
limit | integer | Maximum results (1–1000, default 100)
from | ISO 8601 | Only return results at or after this timestamp
 | ISO 8601 | Only return results at or before this timestamp

Response: an array of result objects.

json

[
  {
    "id": "res_xyz789",
    "monitor_id": "mon_abc123",
    "check_type": "http",
    "status": "up",
    "timestamp": "2025-05-18T14:22:00Z",
    "latency_ms": 38,
    "message": null,
    "metrics": { "status_code": 200 }
  }
]
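
A short sketch of querying this endpoint: it builds a URL using the limit and from parameters and filters the returned results for failures. The helper names are hypothetical, and timestamps are assumed to be accepted in the same UTC "Z" form that the sample response uses.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def results_url(monitor_id: str, since: datetime, limit: int = 100,
                base: str = "https://ezmon.dev/api/v1") -> str:
    """Build the results URL for checks at or after the given time."""
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"limit": limit, "from": stamp})
    return f"{base}/monitors/{monitor_id}/results?{query}"


def failures(results: list) -> list:
    """Keep only the results whose status is not "up"."""
    return [r for r in results if r["status"] != "up"]
```

The since argument should be timezone-aware; urlencode percent-encodes the colons in the timestamp.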

GET /incidents

List incidents in the workspace. Add ?ongoing=true to return only open incidents.

Note: The API currently supports read operations only. Write operations (create / delete monitors, manage alert rules) are on the roadmap.

API tokens

API tokens authenticate requests to the REST API. They are workspace-scoped — a single token grants access to all data in the workspace.

Creating a token

Go to Settings → API Tokens and click New Token. Enter a descriptive name (e.g. "Grafana integration" or "CI pipeline") and copy the token value immediately — it will not be shown again.

Token format

text

ezm_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Tokens are 32-character hex strings prefixed with ezm_. Store them in a secrets manager (AWS Secrets Manager, HashiCorp Vault, GitHub Actions secrets, etc.) — never commit them to source control.
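
A client can sanity-check a token's shape before using it, for instance after reading it from an environment variable. The sketch below assumes lowercase hex digits and a hypothetical variable name EZMON_API_TOKEN; neither is specified by the documentation.

```python
import os
import re
from typing import Mapping

# Assumption: the 32 hex characters are lowercase.
TOKEN_RE = re.compile(r"ezm_[0-9a-f]{32}")


def looks_like_token(value: str) -> bool:
    """True if the value has the ezm_ prefix followed by 32 hex characters."""
    return TOKEN_RE.fullmatch(value) is not None


def load_token(env: Mapping[str, str] = os.environ,
               name: str = "EZMON_API_TOKEN") -> str:
    """Read the token from the environment and reject malformed values."""
    value = env.get(name, "")
    if not looks_like_token(value):
        raise ValueError(f"{name} is missing or not a valid-looking token")
    return value
```

This check only catches copy-and-paste mistakes; whether a token is actually valid is decided by the API.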

Revoking a token

Click the delete icon next to a token in Settings. Revocation takes effect immediately — any in-flight requests using that token will receive a 401 Unauthorized response.

Warning: A deleted token cannot be recovered. If you lose a token's value, delete the token and generate a new one.
