Forwarding Error Logs: Tracking Down Why Mail Isn't Reaching Your Upstream

The Hidden Failure Mode

A spam filter proxy sits between the public internet and your real mail server. Inbound mail arrives at the proxy, gets scored and tagged, and is then forwarded upstream. Most of the time this works invisibly. But when the upstream is down, slow, or rejecting messages, the proxy is in an awkward position: once it has accepted a message from the sender, it can no longer reject it within the SMTP session, but it also can't deliver it.

Different proxies handle this differently. Some queue and retry indefinitely, building up a hidden backlog. Some answer the sender with a temporary 4xx response and hope the sender retries. Some silently drop messages and log nothing useful, leaving you to discover days later that mail has been disappearing.

The right answer is to log every forwarding failure with enough detail to investigate, in a format your tools can actually parse. That's what the dedicated forwarding error log provides.

JSONL: One Failure Per Line

The forward error log uses JSONL format — one JSON object per line, no wrapping array, no commas between entries. Each line represents one failed delivery attempt:

{"timestamp":"2026-04-13T14:32:18Z","from":"sender@example.com","to":["user@yourdomain.com"],"upstream":"mail.yourdomain.com:25","error":"Connection refused","retry_count":0,"message_id":"<abc123@example.com>","size":4823}

Why JSONL? Three reasons:

Append-friendly. New entries are appended to the end of the file with no need to rewrite the previous content. Concurrent writes from multiple worker threads stay safe as long as each entry is written as one complete line.
Stream-parseable. jq, grep, and any line-oriented tool can process it. You don't need to load the whole file to extract specific entries.
Schema-stable. New fields can be added without breaking existing parsers — JSON consumers ignore unknown keys.
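
The three properties above can be illustrated with a minimal Python sketch. It is not the proxy's actual writer; the helper names append_entry and iter_entries and the extra "tls" field are assumptions made for the example.

```python
import json
import os
import tempfile


def append_entry(path, entry):
    """Append one failure record as a single JSON line."""
    line = json.dumps(entry, separators=(",", ":")) + "\n"
    # Append mode never rewrites earlier content; the whole line is
    # handed to the file in one write call.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)


def iter_entries(path):
    """Yield records one at a time without loading the whole file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Temporary file used only for this demonstration.
    path = os.path.join(tempfile.mkdtemp(), "forward-errors.log")
    append_entry(path, {"timestamp": "2026-04-13T14:32:18Z",
                        "error": "Connection refused", "retry_count": 0})
    # A later record carries a hypothetical extra key ("tls"); a reader
    # that only looks at "timestamp" and "error" is unaffected.
    append_entry(path, {"timestamp": "2026-04-13T14:33:02Z",
                        "error": "Connection timed out", "tls": False})
    for entry in iter_entries(path):
        print(entry["timestamp"], entry["error"])
```

The reader only touches the keys it knows about, which is why adding a field to newer records does not break it.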

Querying With spam-filter-stats

The bundled spam-filter-stats CLI (renamed from the older Python forward_errors.py) understands the forward error log natively. Common queries:

# Show errors from the last 24 hours
spam-filter-stats --errors --since 1d

# Show errors filtered by sender
spam-filter-stats --errors --from "@suspicious-domain.com"

# Show errors filtered by recipient domain
spam-filter-stats --errors --to "@yourdomain.com"

# Show errors filtered by upstream server
spam-filter-stats --errors --upstream "mail.yourdomain.com"

# Show errors matching a specific error string
spam-filter-stats --errors --error "Connection refused"

# Combine filters
spam-filter-stats --errors --since 6h --error "timeout"

For one-off investigation, raw jq works fine:

# Match "timeout" as well as "timed out", in any letter case
jq -r 'select(.error | test("timeout|timed out"; "i")) | "\(.timestamp) \(.from) -> \(.to[0]): \(.error)"' \
  /var/log/spam-filter/forward-errors.log
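
If jq is not available, the same kind of filtering can be sketched in a few lines of Python. This is an illustration only, not how spam-filter-stats is implemented; the function names filter_errors and summarize are made up for the example, and it assumes timestamps always use the UTC "Z" form shown in the sample entry.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse the log's UTC timestamps, e.g. 2026-04-13T14:32:18Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def filter_errors(lines, needle=None, since=None, now=None):
    """Yield entries whose error contains `needle` (case-insensitive)
    and whose age is at most `since` (a timedelta)."""
    now = now or datetime.now(timezone.utc)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if needle and needle.lower() not in entry.get("error", "").lower():
            continue
        if since and now - parse_ts(entry["timestamp"]) > since:
            continue
        yield entry


def summarize(entry):
    """Format one entry as 'timestamp from -> first recipient: error'."""
    to = entry.get("to") or ["?"]
    return f'{entry["timestamp"]} {entry.get("from")} -> {to[0]}: {entry["error"]}'
```

Passing an open file object as lines, for example filter_errors(fh, needle="refused", since=timedelta(hours=6)), processes the log one line at a time.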

Common Errors and What They Mean

Connection refused: The upstream server is down or not listening on the configured port. Check that the upstream is running and that its IP/port matches your upstream config. Check firewall rules between the proxy and upstream.

Connection timed out: The upstream accepted the TCP connection but didn't send its SMTP greeting banner within the configured timeout (default 30 seconds). Usually indicates the upstream is overloaded or its SMTP daemon is stuck.

451 4.7.1 Temporary failure or other 4xx codes: The upstream is asking for a retry. Spam Killer doesn't retry internally (queueing is the upstream's job), so the original sender has to retry these messages. If you see large numbers of these, the upstream is probably rate-limiting or greylisting the proxy itself, and the usual fix is to whitelist the proxy's IP on the upstream.

550 5.1.1 User unknown or other 5xx codes: The upstream rejected the message permanently. Often this is a recipient that was deleted upstream but is still receiving mail through the public-facing MX. The proxy can't fix this; either the upstream needs to accept the address, or whatever still routes mail to it (an alias, a forwarding rule, a mailing list) needs to be updated.

STARTTLS handshake failed: TLS negotiation failed. Check upstream certificate validity and TLS version compatibility. Older upstream servers may not support TLS 1.3.
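
When a log contains many entries, it can help to bucket them by the categories above before reading individual lines. The following is a rough sketch; the category names and the classify and tally helpers are invented for illustration, and the matching relies only on the error strings listed in this section.

```python
import re
from collections import Counter


def classify(error):
    """Map an error string to a rough category (names are illustrative)."""
    text = error.lower()
    # SMTP replies start with a three-digit code: 4xx temporary, 5xx permanent.
    code = re.match(r"\s*([45])\d\d\b", error)
    if code:
        return "upstream-temporary" if code.group(1) == "4" else "upstream-permanent"
    if "refused" in text:
        return "upstream-unreachable"
    if "timed out" in text or "timeout" in text:
        return "upstream-slow"
    if "tls" in text:
        return "tls"
    return "other"


def tally(entries):
    """Count parsed log entries per category."""
    return Counter(classify(entry.get("error", "")) for entry in entries)


if __name__ == "__main__":
    for sample in ("Connection refused", "451 4.7.1 Temporary failure",
                   "STARTTLS handshake failed"):
        print(sample, "->", classify(sample))
```

A tally dominated by one category usually points at a single cause, whereas a mix of categories suggests looking at the upstream's overall health first.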

Log Rotation and Retention

The forward error log uses the same rotation system as the main log file. When it grows past the configured max_size, it's rotated to forward-errors.log.1, then .2, and so on, with a configurable retention count.

If compress_rotated: true is set, rotated files are gzipped automatically, which is useful for keeping more history without consuming as much disk space. Compressed logs are still queryable by piping through zcat:

zcat /var/log/spam-filter/forward-errors.log.5.gz | jq 'select(.error | test("timeout|timed out"; "i"))'
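
To search the live log and every rotated file in one pass, a small Python helper can open plain and gzipped files alike. This is a sketch under the assumption that rotated files sit next to the live log and share its name as a prefix (forward-errors.log.1, forward-errors.log.5.gz, and so on); the function names are hypothetical.

```python
import glob
import gzip
import json
import os


def open_log(path):
    """Open a plain or gzipped log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def iter_all_entries(base_path):
    """Yield entries from base_path and every rotated sibling.

    Files are read in whatever order glob returns them, so sort the
    results by timestamp if chronological order matters.
    """
    paths = [base_path] + glob.glob(glob.escape(base_path) + ".*")
    for path in paths:
        if not os.path.exists(path):
            continue
        with open_log(path) as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```

Calling iter_all_entries("/var/log/spam-filter/forward-errors.log") then yields one dictionary per logged failure, regardless of which file it came from.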

Alerting on Forward Errors

For production deployments, you'll want to alert on a sustained increase in forward errors — they typically mean either the upstream is down or your mail config is broken in some way. Two practical approaches:

Prometheus metric. The spamfilter_forward_errors_total counter increments on every forward failure. Set up a Prometheus alert rule: increase(spamfilter_forward_errors_total[5m]) > 10 to fire when more than 10 errors occur in a 5-minute window.

Log tail check. A simple cron job that tails the forward error log and counts entries in the last N minutes is sufficient for smaller deployments. The advantage is that it works without setting up Prometheus and is easy to explain to anyone.
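
A log tail check along those lines might look like the sketch below. The function names, the 5-minute window, and the threshold of 10 are assumptions chosen to mirror the Prometheus rule above; the log path is the one used in the jq examples. For simplicity it scans the whole file rather than only the tail, which is fine while the rotation size keeps the file small.

```python
import json
from datetime import datetime, timedelta, timezone


def count_recent(lines, window, now=None):
    """Count entries whose timestamp falls within `window` before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            ts = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        except (ValueError, KeyError, TypeError):
            continue  # skip partial or malformed lines
        if ts.replace(tzinfo=timezone.utc) >= cutoff:
            count += 1
    return count


def check(path="/var/log/spam-filter/forward-errors.log", minutes=5, threshold=10):
    """Return 1 (and print an alert) if the threshold is exceeded, else 0."""
    try:
        with open(path, encoding="utf-8") as fh:
            count = count_recent(fh, timedelta(minutes=minutes))
    except FileNotFoundError:
        count = 0  # no log file yet means no recorded failures
    if count > threshold:
        print(f"ALERT: {count} forward errors in the last {minutes} minutes")
        return 1
    return 0
```

A cron wrapper can pass the return value of check() to sys.exit, so that a non-zero exit status triggers whatever notification mechanism cron is already wired to.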

Either way, the alert should be loud — sustained forward errors mean mail is being lost (or at least delayed beyond what senders will tolerate), and that's the kind of failure mode you want to know about within minutes, not hours.
