Session Monitoring

Every time a signal triggers a job in Atlas, a session is created. Sessions are the execution contexts in which agents do their work. Learning to monitor them is key to understanding what your AI agents are doing.

What is a Session?

A session represents:
  • 🎯 A single job execution
  • 🤖 One or more agents working together
  • 📊 Status, logs, and results
  • ⏱️ Start time, duration, and completion
  • 🧠 Memory context and learnings

Session Lifecycle

  1. Created - Signal triggers job, session initialized
  2. Running - Agents actively working
  3. Completed - All agents finished successfully
  4. Failed - Error occurred during execution
  5. Stopped - Manually terminated
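
The lifecycle above can be sketched as a small state machine. The state names mirror the list; the transition table itself is illustrative, not Atlas internals:

```python
from enum import Enum

class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    STOPPED = "stopped"

# A session moves forward only: CREATED -> RUNNING, then RUNNING
# ends in exactly one terminal state (completed, failed, or stopped).
TRANSITIONS = {
    SessionState.CREATED: {SessionState.RUNNING},
    SessionState.RUNNING: {SessionState.COMPLETED, SessionState.FAILED, SessionState.STOPPED},
    SessionState.COMPLETED: set(),
    SessionState.FAILED: set(),
    SessionState.STOPPED: set(),
}

def can_transition(src: SessionState, dst: SessionState) -> bool:
    """True if a session in state `src` may move to state `dst`."""
    return dst in TRANSITIONS[src]
```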

Monitoring Commands

List Active Sessions

# Quick list (shorthand)
atlas ps

# Detailed list
atlas session list

# Output example:
SESSION ID                          WORKSPACE    JOB              STATUS    DURATION
8f3d2a1b-4c5e-6f7a-8b9c-0d1e2f3a4b5c  my-project   analyze-data     running   00:01:23
7e2c1a0b-3c4d-5e6f-7a8b-9c0d1e2f3a4b  my-project   chat-session     completed 00:05:47
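
If you script against this output, a minimal parser for the whitespace-separated layout shown above might look like the following. The column names come from the header row; prefer a structured output format if your Atlas version offers one:

```python
def parse_sessions(output: str) -> list[dict]:
    """Parse the tabular `atlas session list` output shown above.

    Assumes five whitespace-separated columns per data row; this is
    a sketch based on the example output, not a stable format.
    """
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        session_id, workspace, job, status, duration = line.split()
        rows.append({
            "id": session_id,
            "workspace": workspace,
            "job": job,
            "status": status,
            "duration": duration,
        })
    return rows
```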

View Session Details

atlas session get <session-id>

# Or in interactive mode
/session get 8f3d2a1b

Shows:
  • Session metadata
  • Triggered by (signal and data)
  • Agents involved
  • Current status
  • Execution timeline

Stream Session Logs

# Real-time log streaming
atlas logs <session-id>

# Follow mode (like tail -f)
atlas logs <session-id> --follow

# Filter by log level
atlas logs <session-id> --level info

Log Levels and Colors

Atlas uses colored output for easy scanning:
  • 🟣 TRACE - Very detailed trace logs
  • 🔵 DEBUG - Detailed debugging information
  • 🟢 INFO - General information
  • 🟡 WARN - Warning messages
  • 🔴 ERROR - Error messages
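
When post-processing logs yourself, the severity ordering behind the --level filter can be encoded like this. The ordering is the conventional one (TRACE most verbose), assumed here rather than taken from Atlas internals:

```python
# Severity ordering, most verbose first.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]

def at_least(level: str, threshold: str) -> bool:
    """True if `level` is at or above `threshold` severity,
    mirroring what a --level filter would keep."""
    return LEVELS.index(level) >= LEVELS.index(threshold)
```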

Understanding Session Output

Agent Execution Logs

[2024-01-15 10:23:45] INFO  [session:8f3d2a1b] Starting session for job 'analyze-data'
[2024-01-15 10:23:46] INFO  [supervisor] Creating execution plan with 2 agents
[2024-01-15 10:23:47] INFO  [agent:researcher] Starting execution
[2024-01-15 10:23:48] DEBUG [agent:researcher] Received input: {"file": "data.csv"}
[2024-01-15 10:23:52] INFO  [agent:researcher] Analysis complete, found 3 key insights
[2024-01-15 10:23:53] INFO  [agent:reporter] Starting execution
[2024-01-15 10:23:58] INFO  [agent:reporter] Report generated successfully
[2024-01-15 10:23:59] INFO  [session:8f3d2a1b] Session completed successfully

Supervisor Activity

Watch for supervisor decisions:
[SUPERVISOR] Analyzing job requirements...
[SUPERVISOR] Selected execution strategy: sequential
[SUPERVISOR] Agent 'researcher' assigned task: "Analyze the CSV data"
[SUPERVISOR] Agent 'reporter' will receive output from 'researcher'

Memory Operations

[MEMORY] Loading relevant context from previous sessions
[MEMORY] Found 2 similar analyses from past 7 days
[MEMORY] Storing session learnings for future reference

Interactive Monitoring

In Atlas interactive mode (launched by running atlas):

Real-Time Dashboard

The right panel shows live logs from active sessions:
  • Auto-scrolls with new entries
  • Color-coded by level
  • Clickable session IDs

Quick Actions

# List all sessions
/ps

# Focus on specific session
/session logs 8f3d2a1b

# Stop a runaway session
/session stop 8f3d2a1b

Session Filtering

By Status

# Only running sessions
atlas ps --status running

# Failed sessions
atlas ps --status failed

By Time Range

# Sessions from last hour
atlas session list --since 1h

# Sessions from today
atlas session list --since today
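
If you need the same shorthand in your own tooling, a sketch that parses numeric durations like 1h or 7d might look like this. It deliberately ignores keywords such as today, and the exact grammar Atlas accepts may differ:

```python
import re
from datetime import timedelta

# Unit suffixes for shorthand durations like "30m" or "7d".
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_since(value: str) -> timedelta:
    """Parse a numeric shorthand duration into a timedelta."""
    m = re.fullmatch(r"(\d+)([smhd])", value)
    if not m:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = int(m.group(1)), m.group(2)
    return timedelta(**{UNITS[unit]: amount})
```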

By Workspace

# Sessions in specific workspace
atlas ps --workspace my-project

Performance Monitoring

Session Metrics

Track performance indicators:
  • Execution time per agent
  • Token usage (for LLM agents)
  • Memory consumption
  • API calls made

Long-Running Sessions

Identify and investigate slow sessions:
# Sessions running over 5 minutes
atlas ps --min-duration 5m
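
To apply the same threshold to durations you already have (for example, from atlas ps output), a small helper can convert the HH:MM:SS strings and flag slow sessions; the 5-minute default mirrors the example above:

```python
def duration_seconds(hms: str) -> int:
    """Convert an HH:MM:SS duration string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def is_slow(hms: str, threshold_s: int = 300) -> bool:
    """True if the session has run longer than the threshold (default 5 min)."""
    return duration_seconds(hms) > threshold_s
```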

Debugging Failed Sessions

View Error Details

atlas session get <failed-session-id>

# Look for:
# - Error messages
# - Stack traces
# - Failed agent ID
# - Input data that caused failure

Common Issues

  1. Timeout - Agent took too long
  2. API Error - External service failed
  3. Configuration - Invalid agent config
  4. Input Validation - Bad signal data

Debug Mode

Run jobs with debug logging:
atlas signal trigger analyze --debug

Session History

View Past Sessions

# Last 10 sessions
atlas session history --limit 10

# Sessions from specific job
atlas session history --job analyze-data

Export Session Data

# Export logs
atlas session export <session-id> --format json > session.json

# Export metrics
atlas session metrics <session-id> --format csv > metrics.csv
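
Once exported, the data is easy to post-process. As a sketch, the following sums a hypothetical tokens column from the metrics CSV; the actual column names depend on your export, so inspect the file first:

```python
import csv
import io

def total_tokens(metrics_csv: str) -> int:
    """Sum a `tokens` column from an exported metrics CSV.

    The column name is an assumption for illustration; adjust it
    to match the schema your Atlas version actually exports.
    """
    reader = csv.DictReader(io.StringIO(metrics_csv))
    return sum(int(row["tokens"]) for row in reader)
```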

Best Practices

1. Use Descriptive Job Names

Makes sessions easier to identify:
jobs:
  analyze-customer-feedback:  # Clear purpose
    name: "Customer Feedback Analysis"

2. Add Progress Logging

In agent prompts:
prompts:
  system: |
    Provide progress updates:
    - "Starting analysis..."
    - "Processing 50% complete..."
    - "Generating final report..."

3. Set Appropriate Timeouts

Prevent zombie sessions:
jobs:
  quick-task:
    config:
      timeout: "60s"  # 1 minute max

4. Monitor Resource Usage

Watch for:
  • Excessive token usage
  • Memory spikes
  • Too many parallel sessions

5. Clean Up Old Sessions

Periodically review and clean up:
# Remove completed sessions older than 7 days
atlas session cleanup --older-than 7d --status completed

Integration with External Tools

Webhook Notifications

Configure session notifications:
jobs:
  important-job:
    config:
      on_complete:
        webhook: "https://slack.webhook.url"
      on_failure:
        webhook: "https://pagerduty.webhook.url"

Metrics Export

Send session metrics to monitoring systems:
  • Prometheus
  • Datadog
  • CloudWatch
  • Custom endpoints
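
As a sketch of the custom-endpoint route, session metrics can be rendered in the Prometheus text exposition format. The metric names and label here are illustrative, not an Atlas convention:

```python
def to_prometheus(metrics: dict[str, int], session_id: str) -> str:
    """Render session metrics as Prometheus exposition-format lines,
    one `atlas_session_<name>{session_id="..."} <value>` line each."""
    return "\n".join(
        f'atlas_session_{name}{{session_id="{session_id}"}} {value}'
        for name, value in metrics.items()
    )
```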

Next Steps