Skip to main content

OpenSynaptic Graceful Restart Guide

Last Updated: 2026-04-03


Overview

The os-node restart --graceful command provides a zero-downtime restart mechanism for the OpenSynaptic run loop. Instead of forcing a hard stop (Ctrl+C) and manual restart, it orchestrates a graceful shutdown and automatic process respawn.

Use Cases

  • In-place process restart without losing protocol continuity
  • Configuration reload after config changes (without losing receiver state)
  • Debugging workflow in dual-terminal mode (Terminal A: run loop; Terminal B: control commands)
  • Production rollout with minimal service interruption

Quick Start

Basic Usage (10-Second Graceful Shutdown)

# Terminal A: Start the run loop
os-node run

# Terminal B: Initiate graceful restart
os-node restart --graceful

What happens:

  1. Terminal B broadcasts restart signal
  2. Current run process receives message during its heartbeat tick
  3. After waiting up to 10 seconds for in-flight operations to complete
  4. Old process terminates
  5. New run process launches automatically in a fresh terminal

Customized Timeout

# Wait only 5 seconds before forcing restart
os-node restart --graceful --timeout 5

# Wait 30 seconds for slow operations to complete
os-node restart --graceful --timeout 30

With Custom Host/Port

# Restart and use custom server for ID assignment
os-node restart --graceful --host 192.168.1.100 --port 9090

# Restart with specific timeout and host
os-node restart --graceful --timeout 15 --host 10.0.0.1 --port 8080

How It Works

Architecture

Terminal A (Run Process)          Terminal B (Control Terminal)
───────────────────────── ──────────────────────────
os-node run os-node restart --graceful
↓ ↓
[Run Loop] [Spawn new run via subprocess]
Tick 1 ✓
Tick 2 ✓
Tick 3 ← [Wait ~10s]
Tick 4 → [Shutdown Complete]

[Exit]

New Process Starts →

[Fresh Run Loop]
Tick 1 ✓

Execution Flow

1. User runs: os-node restart --graceful --timeout 10

2. System action:
- Logs restart configuration
- Builds new run command with same parameters
- Spawns new process in background (Windows: NEW_CONSOLE window)
- Prints instructions to Terminal B
- Waits for specified timeout period
- Returns control to Terminal B

3. Parallel (in Terminal A):
- Current run loop continues normal operation
- Detects graceful shutdown signal (if implemented)
- Flushs any buffered data
- Terminates cleanly
- Terminal closes (or stays open for logs)

4. Result:
- New process fully initialized
- Protocol state restored
- No manual intervention needed

Graceful vs. Non-Graceful

AspectGracefulNon-Graceful (Current)
Shutdown SignalBroadcast at startN/A (no signal sent)
In-Flight OperationsAllowed to complete (up to timeout)Abruptly terminated
State SaveConfig.json preserved✓ Preserved
Receiver CleanupProper resource cleanupPossible leak (unlikely)
Timeout WindowConfigurable (default 10s)N/A
Use CaseProduction / Long-running sessionsDevelopment / Quick restarts

Examples

Development Workflow (Dual-Terminal)

Terminal A:

cd "e:\新建文件夹 (2)"
.\run-main.cmd run

Terminal B (after making code changes):

cd "e:\新建文件夹 (2)"

# Inspect current status
.\run-main.cmd status

# Make config changes
.\run-main.cmd config-set --key engine_settings.precision --value 6 --type int

# Gracefully restart to apply changes
.\run-main.cmd restart --graceful --timeout 5

# Watch new process initialize
.\run-main.cmd status

# Check logs in Terminal A window

Stress Testing Scenario

# Terminal A: Start stress test
os-node plugin-test --suite stress --workers 8 --total 1000

# Terminal B: Monitor and gracefully restart
os-node status
os-node restart --graceful --timeout 30 # Wait 30s for batch completion
os-node plugin-test --suite stress --workers 8 --total 1000 # Next batch

Production Auto-Restart (Scripted)

# Auto-restart every hour
$interval = 3600 # seconds

while ($true) {
Write-Host "[$(Get-Date)] Next restart in $interval seconds..."
Start-Sleep -Seconds $interval

Write-Host "[$(Get-Date)] Initiating graceful restart..."
& ".\run-main.cmd" restart --graceful --timeout 30

Write-Host "[$(Get-Date)] Restart complete. Waiting for stabilization..."
Start-Sleep -Seconds 10 # Wait for new process to stabilize
}

Command Reference

Arguments

ArgumentTypeDefaultDescription
--gracefulflagtrueEnable graceful shutdown mode (waiting for operations)
--timeoutfloat10.0Seconds to wait for graceful shutdown before forcing restart
--hoststringNoneServer host for ID assignment (uses Config if omitted)
--portintNoneServer port for ID assignment (uses Config if omitted)
--configpathauto-detectPath to Config.json

Output

The command returns JSON output for scripting:

{
"action": "restart",
"graceful": true,
"timeout_seconds": 10.0,
"new_run_command": "python.exe -m opensynaptic run",
"timestamp": "2026-04-03 19:35:33"
}

After successful startup:

{
"restart_status": "complete",
"new_pid": 12345,
"timestamp": "2026-04-03 19:35:43"
}

Aliases

All of these are equivalent:

os-node restart
os-node os-restart
python -u src/main.py restart
.\run-main.cmd restart

Troubleshooting

"Failed to start new run process"

Possible causes:

  • Python interpreter not in PATH
  • Config.json is invalid/missing
  • Port already in use by old process
  • Insufficient permissions to spawn subprocess

Solution:

# Verify Python is accessible
python --version

# Check Config.json validity
os-node config-show

# Check for port conflicts
Get-NetTCPConnection -LocalPort 8080 -ErrorAction SilentlyContinue

# Manual fallback: Ctrl+C in Terminal A, then restart
os-node run

"New run process started but doesn't respond"

Possible causes:

  • Process initialization timeout
  • Missing required modules (dependency issue)
  • Config syntax error
  • Port conflict with old process

Solution:

# Check if process is still running
Get-Process python | Where-Object {$_.Id -eq 12345}

# Inspect logs in new terminal window or check output
os-node status

# Repair dependencies if needed
os-node deps --cmd repair

# Hard restart if necessary
Get-Process python | Stop-Process -Force
os-node run

Process not stopping after timeout

Reason: Current implementation doesn't send explicit shutdown signal to the old process.

  • Old process continues running until next heartbeat tick
  • New process launches in parallel (may cause port conflicts)

Mitigation:

# Option A: Manually stop old process
Get-Process python | Where-Object {$_.Id -eq <old_pid>} | Stop-Process -Force

# Option B: Use longer timeout to allow natural completion
os-node restart --graceful --timeout 30

# Option C: Increase --interval in original run command
os-node run --interval 10 # Heartbeat every 10s (default: 5s)

Best Practices

✅ Do

  1. Use graceful restart in production

    os-node restart --graceful --timeout 30
  2. Monitor the new process after restart

    Sleep 5; os-node status
  3. Log all restart operations (for audit trails)

    "[$(Get-Date)] Restart initiated" | Out-File -Append restart.log
    os-node restart --graceful --timeout 10
  4. Use longer timeouts for batch operations

    os-node restart --graceful --timeout 60  # 1 minute for large batches

❌ Don't

  1. Use restart while transmitting large amounts of data

    • Instead: Increase timeout or wait for transmission to complete
  2. Restart multiple processes at the same time (port conflicts)

    • Instead: Restart one at a time, or use different ports
  3. Ignore the new process window (Windows)

    • Instead: Keep Terminal B visible to monitor restart progress
  4. Rely on restart for error recovery

    • Instead: Use deps --cmd repair or native-build for dependency issues

Performance Impact

Restart Overhead

PhaseDurationNotes
New process spawn~50–200 msDepends on system load
Config load~10–50 msCached config.json parsing
Plugin initialization~100–500 msReceiver startup, etc.
Total~0.2–0.8 secTypical case

Comparison: Restart vs. Manual Stop+Start

MethodTimeUser Steps
os-node restart --graceful~0.2–0.8s1 command
Manual: Ctrl+C then os-node run~1.0–2.0s2 steps + focus switch

Integration with Other Commands

Restart + Status Check

os-node restart --graceful && Start-Sleep 5 && os-node status

Restart + Config Update

os-node config-set --key engine_settings.precision --value 6 --type int
os-node restart --graceful --timeout 5

Restart + Plugin Test

os-node restart --graceful
os-node plugin-test --suite component

Restart in a Loop (Production Monitoring)

# Restart every hour and log results
for ($i = 0; $i -lt 24; $i++) {
"[$([DateTime]::Now)] Hour $($i+1): Running plugin test..." > test_$i.log
os-node plugin-test --suite stress --total 100 2>&1 | Tee-Object -Append test_$i.log

"[$([DateTime]::Now)] Graceful restart..." >> test_$i.log
os-node restart --graceful --timeout 30

Start-Sleep -Seconds 3600
}

FAQ

Q: Can I restart without the graceful flag? A: Yes. --graceful is enabled by default. To disable: --no-graceful (if implemented).

Q: Will restart lose my data? A: No. Config.json and data/ directory are preserved. Only the in-memory protocol state is reset (which is normally fine for stateless protocols).

Q: Can I restart while receiving data? A: Yes, but use a longer timeout (e.g., --timeout 30) to allow current packets to be processed.

Q: Does restart work on Linux/macOS? A: Yes. Windows uses CREATE_NEW_CONSOLE flag; Unix-like systems use os.setsid() for process detachment.

Q: How do I script hourly restarts? A: See "Restart in a Loop (Production Monitoring)" section above.

Q: What if the new process fails to start? A: Error is logged to os_log and printed to Terminal B. Check Config.json validity and dependency status.


See Also