Connectivity & OTA

Connectivity State Machine

stateDiagram-v2
    [*] --> WiFiScan: boot
    WiFiScan --> WiFiConnected: SSID found
    WiFiScan --> FallbackAP: all SSIDs failed
    WiFiConnected --> MQTT_TLS: PubSubClient
    FallbackAP --> CaptivePortal: client joins AP
    FallbackAP --> WiFiScan: timeout, retry
    WiFiConnected --> CellularStandby: WiFi ok

    state "WiFi Path" as wp {
        MQTT_TLS --> Publishing: connected
        Publishing --> MQTT_TLS: reconnect
    }

    CellularStandby --> CellularActive: WiFi drops
    CellularActive --> ModemMQTT: AT+SMCONN
    ModemMQTT --> CellularPublishing: connected
    CellularActive --> WiFiScan: periodic recheck

WiFi is always tried first. Each SSID is retried up to wifi.retries times (default 2) before moving on. If no WiFi network is in range, the fallback AP starts a captive portal for local configuration. Cellular activates in parallel when WiFi fails.

OTA Update (OTAUpdate)

HTTP(S) pull-based OTA. The device fetches a JSON manifest, compares its semver against the running firmware version, and streams the firmware binary while verifying its SHA256.

flowchart LR
    A[Trigger] --> B[Fetch manifest]
    B --> C{Newer version?}
    C -->|No| D[Skip]
    C -->|Yes| E[Download .bin]
    E --> F{SHA256 match?}
    F -->|No| G[Abort]
    F -->|Yes| H[Flash]
    H --> I[Reboot]
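The "Newer version?" decision has to compare numerically, because as plain strings "1.10.0" sorts before "1.9.0". A minimal C++ sketch, assuming plain MAJOR.MINOR.PATCH strings with no pre-release tags; parseSemver and isNewer are hypothetical names, not the firmware's actual API:

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// Parse "MAJOR.MINOR.PATCH", tolerating a leading 'v'. Anything else
// (pre-release tags, "1.x", trailing text) is rejected in this sketch.
std::optional<std::array<int, 3>> parseSemver(const std::string& s) {
    std::array<int, 3> v{};
    const char* p = s.c_str();
    if (*p == 'v') ++p;
    char extra = '\0';
    // A 4th successful conversion means there was trailing text after PATCH.
    if (std::sscanf(p, "%d.%d.%d%c", &v[0], &v[1], &v[2], &extra) != 3) return std::nullopt;
    if (v[0] < 0 || v[1] < 0 || v[2] < 0) return std::nullopt;
    return v;
}

// True only if `candidate` is strictly newer than `current`.
// An unparseable version never triggers an update.
bool isNewer(const std::string& candidate, const std::string& current) {
    const auto a = parseSemver(candidate);
    const auto b = parseSemver(current);
    if (!a || !b) return false;
    return *a > *b;  // std::array compares element by element: major, minor, patch
}
```

Treating an unparseable version (such as "1.x") as "not newer" means a malformed manifest leads to Skip rather than Flash.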

Manifest format (build/firmware.json, generated by scripts/generate_manifest.py):

{
  "version": "1.x",
  "url": "https://example.com/firmware.bin",
  "sha256": "abc123..."
}

Triggers:

  1. Periodic interval check (default 6 h, configurable via ota.check_interval_s)
  2. First check runs 30 seconds after boot
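  3. MQTT message (any payload) to ota.cmd_topic, if it is explicitly set in config.json. The former default, <topic_prefix>/cmd/ota, was removed in v1.2.3; use cli/ota.check otherwise.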
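The timing rules above can be modelled as a small scheduler. This is a sketch under stated assumptions: OtaSchedule is a hypothetical class, times are millis()-style 32-bit counters, and an MQTT-triggered check also restarts the periodic interval (the firmware may handle that differently).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFirstCheckDelayMs = 30u * 1000u;  // first check 30 s after boot

class OtaSchedule {
public:
    // intervalS = ota.check_interval_s (must stay below ~49 days to fit in ms)
    explicit OtaSchedule(std::uint32_t intervalS) : intervalMs_(intervalS * 1000u) {}

    // MQTT trigger: run a check on the next due() call.
    void requestNow() { pending_ = true; }

    // Poll from loop(); returns true when a check should run now.
    // Unsigned subtraction keeps this correct across the millis() wrap.
    bool due(std::uint32_t nowMs) {
        const std::uint32_t wait = checkedOnce_ ? intervalMs_ : kFirstCheckDelayMs;
        if (pending_ || nowMs - lastCheckMs_ >= wait) {
            pending_ = false;
            checkedOnce_ = true;
            lastCheckMs_ = nowMs;  // the next periodic check counts from here
            return true;
        }
        return false;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastCheckMs_ = 0;  // boot time until the first check
    bool checkedOnce_ = false;
    bool pending_ = false;
};
```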

Trigger manually from the shell without an external MQTT client (this relies on ota.cmd_topic being set, as in the config.json example below):

lua.exec MQTT.publish("thesada/node/cmd/ota", "check")

TLS: loads /ca.crt from LittleFS; falls back to setInsecure() with a warning if absent.

config.json keys:

"ota": {
  "enabled":          true,
  "manifest_url":     "https://github.com/Thesada/thesada-fw/releases/latest/download/firmware.json",
  "check_interval_s": 21600,
  "cmd_topic":        "thesada/node/cmd/ota"
}

MQTT Subscriptions

MQTTClient::subscribe(topic, callback) stores subscriptions and re-applies them automatically on reconnect. Callbacks are dispatched by exact topic match or trailing /# wildcard in onMessage().
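A sketch of that subscribe, re-apply, and dispatch pattern in portable C++. SubscriptionRegistry, topicMatches, and the doSubscribe hook are hypothetical stand-ins for MQTTClient internals; following the MQTT specification, "a/#" here also matches "a" itself, which the firmware may or may not do.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Callback = std::function<void(const std::string& topic, const std::string& payload)>;

// Exact match, or a filter ending in "/#" that covers its base topic and
// everything below it (MQTT spec semantics for the multi-level wildcard).
bool topicMatches(const std::string& filter, const std::string& topic) {
    const std::string wild = "/#";
    const bool isWildcard = filter.size() >= wild.size() &&
        filter.compare(filter.size() - wild.size(), wild.size(), wild) == 0;
    if (!isWildcard) return filter == topic;
    const std::string base = filter.substr(0, filter.size() - wild.size());
    return topic == base || topic.compare(0, base.size() + 1, base + "/") == 0;
}

class SubscriptionRegistry {
public:
    // Remember the subscription so it survives reconnects.
    void subscribe(const std::string& filter, Callback cb) {
        subs_.emplace_back(filter, std::move(cb));
    }
    // Call after every (re)connect; doSubscribe stands in for the broker SUBSCRIBE.
    void reapply(const std::function<void(const std::string&)>& doSubscribe) const {
        for (const auto& s : subs_) doSubscribe(s.first);
    }
    // Invoke every matching callback; returns how many fired.
    int dispatch(const std::string& topic, const std::string& payload) const {
        int fired = 0;
        for (const auto& s : subs_) {
            if (topicMatches(s.first, topic)) {
                s.second(topic, payload);
                ++fired;
            }
        }
        return fired;
    }

private:
    std::vector<std::pair<std::string, Callback>> subs_;
};
```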

MQTT CLI (v1.0.19+)

The primary interface for remote management. The node subscribes to <prefix>/cli/#: the topic suffix is the command and the payload is its arguments. The response is published to <prefix>/cli/response as JSON.

thesada/node/cli/sensors          payload: ""              -> all sensors
thesada/node/cli/sensors          payload: "temp_1"        -> specific sensor
thesada/node/cli/config.set       payload: "mqtt.ha_discovery true"
thesada/node/cli/config.reload    payload: ""
thesada/node/cli/ota.check        payload: ""
thesada/node/cli/restart          payload: ""
thesada/node/cli/battery          payload: ""
thesada/node/cli/version          payload: ""

Response format:

{"cmd": "sensors", "ok": true, "output": ["temp_1: 65.2C", "temp_2: 57.1C"]}
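Mapping an incoming topic to a command and building the response JSON can be sketched as follows. cliCommand, jsonEscape, and cliResponse are hypothetical helpers; the sketch also assumes the node must ignore <prefix>/cli/response, since that topic falls under its own <prefix>/cli/# subscription.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// "<prefix>/cli/<command>" -> "<command>"; anything else (including the
// node's own response topic) is not a command.
std::optional<std::string> cliCommand(const std::string& prefix, const std::string& topic) {
    const std::string head = prefix + "/cli/";
    if (topic.compare(0, head.size(), head) != 0) return std::nullopt;
    std::string cmd = topic.substr(head.size());
    if (cmd.empty() || cmd == "response") return std::nullopt;
    return cmd;
}

// Minimal JSON string escaping: quotes, backslashes, and control characters.
std::string jsonEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"') out += "\\\"";
        else if (c == '\\') out += "\\\\";
        else if (c == '\n') out += "\\n";
        else if (static_cast<unsigned char>(c) < 0x20) {
            char buf[7];
            std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(static_cast<unsigned char>(c)));
            out += buf;
        } else out += c;
    }
    return out;
}

// Builds a response in the format shown above.
std::string cliResponse(const std::string& cmd, bool ok, const std::vector<std::string>& lines) {
    std::string j = "{\"cmd\": \"" + jsonEscape(cmd) + "\", \"ok\": " + (ok ? "true" : "false") + ", \"output\": [";
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i) j += ", ";
        j += "\"" + jsonEscape(lines[i]) + "\"";
    }
    return j + "]}";
}
```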

Any shell command works: the same 30+ commands are available over serial, WebSocket, HTTP, and MQTT.

Special command: cli/file.write - the payload is <path>\n<content>. All three mosquitto_pub payload modes work (-m message, -f file, -s stdin):

mosquitto_pub ... -t '<prefix>/cli/file.write' -m '/test.txt
hello world'
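Splitting only at the first newline is what lets <content> itself span several lines, such as a full /config.json. A hypothetical parser sketch; the absolute-path check and the CRLF tolerance are assumptions, not documented behaviour:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Split a cli/file.write payload "<path>\n<content>" at the FIRST newline only.
std::optional<std::pair<std::string, std::string>> parseFileWrite(const std::string& payload) {
    const auto nl = payload.find('\n');
    if (nl == std::string::npos) return std::nullopt;              // no path/content separator
    std::string path = payload.substr(0, nl);
    if (!path.empty() && path.back() == '\r') path.pop_back();      // tolerate CRLF (assumption)
    if (path.empty() || path.front() != '/') return std::nullopt;   // expect an absolute LittleFS path (assumption)
    return std::make_pair(path, payload.substr(nl + 1));            // content keeps its own newlines
}
```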

Notes:

  • All CLI commands execute in loop() via deferred processing (not inside the PubSubClient callback). This prevents keepalive timeouts on slow operations like LittleFS writes.
  • config.set saves to flash but does not auto-reload. Run config.reload afterwards to apply the change.
  • Full config replacement: write /config.json with cli/file.write, then send cli/config.reload.
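The deferred-processing note can be sketched as a bounded queue: the MQTT callback only enqueues, and loop() executes. CliQueue, PendingCmd, and the depth limit are hypothetical. Because PubSubClient invokes its message callback from inside client.loop(), which itself runs in loop(), producer and consumer share one task and this sketch needs no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

struct PendingCmd {
    std::string topic;
    std::string payload;
};

class CliQueue {
public:
    explicit CliQueue(std::size_t maxDepth) : max_(maxDepth) {}

    // Called from the MQTT message callback: copy and return immediately.
    bool enqueue(std::string topic, std::string payload) {
        if (q_.size() >= max_) return false;  // drop rather than grow the heap unbounded
        q_.push_back({std::move(topic), std::move(payload)});
        return true;
    }

    // Called from loop(): run at most one command per pass (a sketch choice)
    // so the MQTT client is serviced between slow commands.
    template <typename Fn>
    bool runOne(Fn&& execute) {
        if (q_.empty()) return false;
        PendingCmd cmd = std::move(q_.front());
        q_.pop_front();
        execute(cmd);
        return true;
    }

    std::size_t size() const { return q_.size(); }

private:
    std::deque<PendingCmd> q_;
    std::size_t max_;
};
```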

Legacy cmd/* topics (removed in v1.2.3)

The cmd/config/set, cmd/config/push, and default cmd/ota topics have been removed. Use the CLI equivalents:

Old topic          CLI replacement
cmd/config/set     cli/config.set, payload: <key> <value>
cmd/config/push    cli/file.write, payload: /config.json\n<json>, then cli/config.reload
cmd/ota            cli/ota.check

If ota.cmd_topic is explicitly set in config.json, that dedicated subscription still works for backwards compatibility.

C++ modules can add further subscriptions via MQTTClient::subscribe(); Lua scripts can publish with MQTT.publish() and react to events with EventBus.subscribe().


HA MQTT Auto-Discovery

On every MQTT connect, the firmware publishes retained discovery config messages to homeassistant/sensor/<device_id>/... and homeassistant/binary_sensor/<device_id>/.... Home Assistant picks these up automatically - no manual YAML sensor config needed.

Enabled by default. Disable with mqtt.ha_discovery: false in config.json.

Each sensor publishes on its own topic with a simple value (no JSON parsing needed by HA):

<prefix>/sensor/temperature/<slug>   -> "65.20"
<prefix>/sensor/current/<slug>       -> "0.70"
<prefix>/sensor/power/<slug>         -> "84.0"
<prefix>/sensor/battery/percent      -> "100"
<prefix>/sensor/battery/voltage      -> "4.19"
<prefix>/sensor/battery/charging     -> "Charging"
<prefix>/sensor/wifi/rssi            -> "-52"
<prefix>/sensor/wifi/ssid            -> "MyNetwork"
<prefix>/sensor/wifi/ip              -> "172.16.0.100"
<prefix>/status                      -> "online" / "offline" (LWT)
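For reference, one retained discovery message could be assembled like this. The topic layout follows Home Assistant's <discovery_prefix>/<component>/<node_id>/<object_id>/config convention and the keys are standard MQTT discovery fields, but the object IDs, the unique_id scheme, and the use of enabled_by_default for diagnostics are assumptions of this sketch. Values are not JSON-escaped, so they must not contain quotes or backslashes.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-entity inputs.
struct Entity {
    std::string objectId;     // e.g. "temperature_temp_1"
    std::string name;         // e.g. "temp_1"
    std::string stateTopic;   // e.g. "<prefix>/sensor/temperature/temp_1"
    std::string unit;         // e.g. "°C" (empty to omit)
    std::string deviceClass;  // e.g. "temperature" (empty to omit)
    bool diagnostic = false;  // WiFi RSSI/SSID/IP
};

std::string discoveryTopic(const std::string& deviceId, const Entity& e) {
    return "homeassistant/sensor/" + deviceId + "/" + e.objectId + "/config";
}

std::string discoveryPayload(const std::string& deviceId, const std::string& prefix,
                             const std::string& friendlyName, const std::string& swVersion,
                             const Entity& e) {
    std::string j = "{\"name\":\"" + e.name + "\"";
    j += ",\"unique_id\":\"" + deviceId + "_" + e.objectId + "\"";
    j += ",\"state_topic\":\"" + e.stateTopic + "\"";
    // LWT topic; "online"/"offline" match HA's default availability payloads.
    j += ",\"availability_topic\":\"" + prefix + "/status\"";
    if (!e.unit.empty()) j += ",\"unit_of_measurement\":\"" + e.unit + "\"";
    if (!e.deviceClass.empty()) j += ",\"device_class\":\"" + e.deviceClass + "\"";
    if (e.diagnostic) j += ",\"entity_category\":\"diagnostic\",\"enabled_by_default\":false";
    // Every entity carries the same device block, so HA groups them under one device.
    j += ",\"device\":{\"identifiers\":[\"" + deviceId + "\"],\"name\":\"" + friendlyName +
         "\",\"manufacturer\":\"Thesada\",\"model\":\"Base Node\",\"sw_version\":\"" + swVersion + "\"}";
    return j + "}";
}
```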

All entities use the same discovery value template, which takes the raw payload as the state (no JSON extraction). WiFi diagnostics are entity_category: diagnostic (disabled by default in HA).

Combined JSON payloads are still published on the original topics for backwards compatibility (Lua scripts, SD logging, cellular relay).

All entities are grouped under a single HA device (device name from device.friendly_name, manufacturer “Thesada”, model “Base Node”, sw_version from firmware).



MQTT Connection Reliability (v1.2.7+)

Several mechanisms prevent connection drops after extended uptime:

Connection watchdog (10 min) - if no successful _client.loop() or publish in 10 minutes, the client force-disconnects and reconnects. Catches half-open TCP sockets that PubSubClient reports as connected.
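The watchdog reduces to one timestamp and an unsigned comparison. ConnWatchdog is a hypothetical class; the usage comment assumes PubSubClient's bool loop() and disconnect() calls.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kWatchdogMs = 10u * 60u * 1000u;  // 10 minutes

class ConnWatchdog {
public:
    explicit ConnWatchdog(std::uint32_t nowMs) : lastOkMs_(nowMs) {}
    // Call after every successful client loop or publish.
    void markActivity(std::uint32_t nowMs) { lastOkMs_ = nowMs; }
    // Unsigned subtraction stays correct when millis() wraps after ~49.7 days.
    bool expired(std::uint32_t nowMs) const { return nowMs - lastOkMs_ >= kWatchdogMs; }

private:
    std::uint32_t lastOkMs_;
};

// Usage in loop(), assuming a PubSubClient instance named `client`:
//   if (client.loop()) wd.markActivity(millis());
//   if (wd.expired(millis())) { client.disconnect(); wd.markActivity(millis()); /* reconnect */ }
```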

TCP keepalive - enabled on the MQTT socket after connect via setsockopt(). Sends OS-level TCP probes after 30s of silence, every 10s, and declares dead after 3 failed probes. Detects NAT table timeouts and router reboots faster than MQTT-level keepalive.
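With those values a silent dead peer is detected roughly 30 s + 3 × 10 s = 60 s after the last traffic. The sketch below uses the Linux socket API so it can run on a desktop; lwIP on ESP32 accepts the same option names when TCP keepalive support is compiled in, but how the firmware obtains the MQTT socket descriptor is not shown on this page. enableTcpKeepalive is a hypothetical name.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Enable OS-level keepalive: first probe after idleS seconds of silence,
// then every intervalS seconds; the connection is dropped after `count`
// unanswered probes. Returns true if every option was accepted.
bool enableTcpKeepalive(int fd, int idleS = 30, int intervalS = 10, int count = 3) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idleS, sizeof(idleS)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intervalS, sizeof(intervalS)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == 0;
}
```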

Telegram HTTPS timeout - all outbound HTTPS requests (Telegram Bot API, webhooks) are capped at 10 seconds. Without this, a slow DNS lookup or TLS handshake can block loop() long enough for the MQTT keepalive (60s) to expire.

Persistent WiFiClientSecure (v1.2.8+) - the Telegram HTTPS client uses a static WiFiClientSecure instance instead of allocating a new one per request. On ESP32, just three new/delete cycles of the ~30KB TLS buffer fragment the heap beyond recovery. The persistent instance avoids this and keeps Telegram reliable after extended uptime.

NTP-aware TLS (v1.3.0+) - on cold boot, before NTP has synced, the system clock sits at the epoch (Jan 1970), so certificate validation fails because every cert looks not yet valid. MQTTClient therefore connects without certificate validation on cold boot, then forces a reconnect with full validation once NTP syncs. This eliminates the ~10 minute initial connect delay caused by retrying TLS handshakes with a bad clock.

TLS heap guard - on boards with limited RAM (e.g. CYD/WROOM-32), the cert upgrade is skipped when free heap is below 40KB. WiFiClientSecure’s SSL allocation needs a large contiguous block - attempting it on a tight heap causes OOM crashes. The connection stays insecure with a warning logged.
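The cold-boot and heap rules combine into one decision. A sketch, assuming a hypothetical sanity threshold of 1700000000 (2023-11-14 UTC) to detect an unsynced clock and reading 40KB as 40 × 1024 bytes of free heap; chooseTlsMode and shouldUpgradeOnNtpSync are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>

enum class TlsMode { Verified, Insecure };

// Assumed threshold: any clock earlier than 2023-11-14T22:13:20Z is treated
// as "NTP not synced yet" (a cold-boot clock starts at 1970).
constexpr std::time_t kClockValidAfter = 1700000000;
// Heap guard from the text, read here as 40 * 1024 bytes.
constexpr std::size_t kMinHeapForVerifiedTls = 40 * 1024;

TlsMode chooseTlsMode(std::time_t now, std::size_t freeHeap, bool haveCaCert) {
    if (!haveCaCert) return TlsMode::Insecure;                        // no /ca.crt: nothing to verify against
    if (now < kClockValidAfter) return TlsMode::Insecure;             // unsynced clock: certs look not yet valid
    if (freeHeap < kMinHeapForVerifiedTls) return TlsMode::Insecure;  // too little heap for the SSL buffers
    return TlsMode::Verified;
}

// Called when NTP syncs: reconnect only if that would actually upgrade the link.
bool shouldUpgradeOnNtpSync(TlsMode current, std::time_t now, std::size_t freeHeap, bool haveCaCert) {
    return current == TlsMode::Insecure &&
           chooseTlsMode(now, freeHeap, haveCaCert) == TlsMode::Verified;
}
```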

Connection uptime is logged on disconnect to help diagnose patterns: a consistent ~3600s points to a NAT timeout, while random durations point to WiFi instability.


WiFi Path (normal)

  • Multi-SSID: configure a list of networks; ranked by RSSI at scan time
  • Configurable retries per SSID before fallback: wifi.retries (default 2)
  • NTP synced on connect (pool.ntp.org by default, configurable)
  • PubSubClient MQTT over TLS (port 8883)
  • Optional minimum send interval: mqtt.send_interval_s
  • Optional static IP: wifi.static_ip / gateway / subnet / dns

"wifi": {
  "retries": 2
}
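Ranking and retries can be sketched as two small functions. ScanResult, rankConfigured, and connectFirst are hypothetical; the sketch reads wifi.retries as the number of attempts per SSID and de-duplicates SSIDs broadcast by several access points.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ScanResult {
    std::string ssid;
    int rssi;  // dBm; closer to 0 is stronger
};

// Keep only configured SSIDs that were seen in the scan, strongest first.
std::vector<std::string> rankConfigured(const std::vector<std::string>& configured,
                                        std::vector<ScanResult> seen) {
    std::sort(seen.begin(), seen.end(),
              [](const ScanResult& a, const ScanResult& b) { return a.rssi > b.rssi; });
    std::vector<std::string> order;
    for (const auto& s : seen) {
        const bool wanted = std::find(configured.begin(), configured.end(), s.ssid) != configured.end();
        const bool dup = std::find(order.begin(), order.end(), s.ssid) != order.end();
        if (wanted && !dup) order.push_back(s.ssid);
    }
    return order;
}

// Try each ranked SSID up to `retries` times; tryConnect stands in for the
// real connect call. Returns the SSID that connected, or "" if all failed
// (the point where the fallback AP and cellular take over).
template <typename ConnectFn>
std::string connectFirst(const std::vector<std::string>& ranked, int retries, ConnectFn tryConnect) {
    for (const auto& ssid : ranked)
        for (int attempt = 0; attempt < retries; ++attempt)
            if (tryConnect(ssid)) return ssid;
    return "";
}
```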

Fallback AP (captive portal)

When no configured WiFi network is in range (or none are configured), the node starts a SoftAP for local configuration. Previously the firmware would skip straight to cellular fallback - now the AP always starts first so you can configure WiFi locally.

  • SSID: <device.name>-setup (e.g. thesada-owb-setup)
  • Password: from wifi.ap_password (min 8 chars for WPA2; open if empty or shorter)
  • Captive portal: all DNS queries redirect to 192.168.4.1, and unknown HTTP requests redirect to the dashboard. Phones and laptops auto-open the config page on connect.
  • Timeout: after wifi.ap_timeout_s (default 300s) the AP stops and WiFi scan retries. This cycles until WiFi connects.

The web interface is fully functional in AP mode - you can view sensors, edit config, and upload firmware.

"wifi": {
  "ap_password":  "my-setup-pass",
  "ap_timeout_s": 300
}
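The SoftAP naming and password rules reduce to a few lines. makeApConfig and ApConfig are hypothetical names for illustration:

```cpp
#include <cassert>
#include <string>

struct ApConfig {
    std::string ssid;
    std::string password;  // empty for an open AP
    bool open;
};

// WPA2 needs a passphrase of at least 8 characters, so anything shorter
// (including an empty wifi.ap_password) yields an open AP.
ApConfig makeApConfig(const std::string& deviceName, const std::string& apPassword) {
    const bool open = apPassword.size() < 8;
    return {deviceName + "-setup", open ? std::string() : apPassword, open};
}
```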

Cellular Fallback (LTE-M/NB-IoT)

  • Activates when all WiFi networks fail (in parallel with fallback AP)
  • SIM7080G modem-native MQTT over TLS via AT+SM* commands
  • Periodic WiFi recheck every 15 min (configurable); reverts to WiFi when available
  • If /ca.crt is present, the modem uses it for TLS verification (AT+SMSSL). If absent, the AT+SMSSL and AT+CSSLCFG CONVERT commands are skipped entirely - the modem connects without CA verification and a warning is logged.

CA Certificate

No certificate is compiled in. Place a CA cert PEM bundle as data/ca.crt and upload to LittleFS (pio run --target uploadfs). Multiple certs can be concatenated in one file. Both WiFi MQTT and the OTA HTTPS client load it at boot. If absent, TLS connects without certificate verification and a warning is logged.

Important: use self-signed root certificates only. Cross-signed intermediates will not work - the ESP32 TLS stack needs the actual trust anchor. For example, the USERTrust ECC Certification Authority root is self-signed. The “Sectigo Root E46 signed by USERTrust ECC” cert looks similar but is a cross-signed intermediate - using it causes silent OTA and TLS failures. Always verify your CA cert is self-signed (issuer == subject) before uploading.

The production bundle contains two roots:

  • ISRG Root X1 - covers Let’s Encrypt (MQTT broker, release-assets.githubusercontent.com)
  • USERTrust ECC Certification Authority (self-signed root) - covers github.com (OTA manifest fetch, Sectigo chain)

You can also upload ca.crt at runtime via POST /api/file?path=/ca.crt&source=littlefs without reflashing the filesystem.


AsyncTCP (vendored)

AsyncTCP v3.3.2 is vendored in lib/AsyncTCP/ with null-pointer guards added to _accept, _s_accept, and _s_accepted. These prevent LoadProhibited crashes (EXCVADDR 0x00000030) when lwIP calls TCP callbacks with a null PCB or freed server pointer.


Thesada - CERN-OHL-P-2.0 / GPL-3.0-only / CC BY 4.0
