This document outlines the technical architecture of Casino of Life, providing insights into its components, data flow, and integration points.
System Overview
Casino of Life consists of several interconnected components that work together to enable natural language-driven AI training for retro fighting games.
┌─────────────────┐     ┌───────────────────┐     ┌────────────────┐
│                 │     │                   │     │                │
│  Natural        │────▶│  Training         │────▶│  Game          │
│  Language       │     │  Pipeline         │     │  Environment   │
│  Interface      │     │                   │     │                │
│                 │     │                   │     │                │
└────────▲────────┘     └───────┬───────────┘     └────────┬───────┘
         │                      │                          │
         │                      │                          │
         │                      ▼                          ▼
┌────────┴────────┐     ┌───────────────────┐     ┌────────────────┐
│                 │     │                   │     │                │
│  Web            │◀───▶│  Reward           │◀────│  Observation   │
│  Interface      │     │  System           │     │  Processor     │
│                 │     │                   │     │                │
└─────────────────┘     └───────────────────┘     └────────────────┘
Core Components
1. Natural Language Interface
The natural language interface, powered by CaballoLoko, translates human instructions into training configurations.
Key Classes:
CaballoLoko: Main interface for natural language processing
IntentProcessor: Identifies training intents from text
ParameterExtractor: Extracts specific training parameters
ResponseGenerator: Creates human-readable responses
Example Flow:
User input is processed by CaballoLoko.chat()
IntentProcessor identifies the training intent
ParameterExtractor pulls specific parameters
Configuration is passed to the training pipeline
ResponseGenerator creates a human-readable response
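The flow above can be sketched in a few lines of Python. This is an illustration only, not CaballoLoko's actual implementation: the function names `identify_intent`, `extract_parameters`, and `handle_message`, the keyword rules, and the configuration keys are all assumptions made for the example.

```python
import re

# Hypothetical keyword rules; the real IntentProcessor may work very differently.
INTENT_KEYWORDS = {
    "train": {"train", "teach", "learn"},
    "evaluate": {"evaluate", "test", "benchmark"},
}


def identify_intent(text: str) -> str:
    """Return the first intent whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "unknown"


def extract_parameters(text: str) -> dict:
    """Pull an algorithm name and a timestep count out of free text."""
    params = {}
    algo = re.search(r"\b(ppo|a2c)\b", text, re.IGNORECASE)
    if algo:
        params["algorithm"] = algo.group(1).upper()
    steps = re.search(r"(\d[\d,]*)\s*(?:time)?steps", text, re.IGNORECASE)
    if steps:
        params["timesteps"] = int(steps.group(1).replace(",", ""))
    return params


def handle_message(text: str) -> tuple:
    """Build a training configuration and a human-readable reply."""
    config = {"intent": identify_intent(text), **extract_parameters(text)}
    details = ", ".join(f"{k}={v}" for k, v in config.items() if k != "intent")
    reply = f"Intent '{config['intent']}' recognized ({details or 'no parameters'})."
    return config, reply


config, reply = handle_message("Train an aggressive fighter with PPO for 100,000 steps")
print(config)
print(reply)
```

In this sketch the configuration dictionary is what would be handed to the training pipeline, while the reply string is what the user sees.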
2. Game Environment
Built on top of the Stable-Retro library, the game environment component handles game emulation and state management.
Key Classes:
RetroEnv: Main environment class for game emulation
ObservationProcessor: Processes raw game frames
ActionSpace: Defines available actions for the agent
StateManager: Handles game state loading and saving
Technical Details:
Stochastic frame skipping (2-4 frames)
84x84 grayscale observation processing
4-frame stacking for temporal information
Multi-player support (2 players)
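Stochastic frame skipping and frame stacking can be illustrated with a small wrapper. The sketch below is dependency-free and uses a toy environment whose frames are plain integers; `CountingEnv` and `SkipAndStack` are hypothetical names, and the 84x84 grayscale conversion is left out because it requires an image library. It is not the Stable-Retro implementation.

```python
import random
from collections import deque


class CountingEnv:
    """Toy stand-in for an emulator: each frame is just an increasing integer."""

    def __init__(self, max_frames=100):
        self.max_frames = max_frames
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        reward = 1.0
        done = self.frame >= self.max_frames
        return self.frame, reward, done


class SkipAndStack:
    """Repeat each action for a random 2-4 frames and keep the last 4 observations."""

    def __init__(self, env, min_skip=2, max_skip=4, stack_size=4, seed=None):
        self.env = env
        self.min_skip = min_skip
        self.max_skip = max_skip
        self.frames = deque(maxlen=stack_size)
        self.rng = random.Random(seed)

    def reset(self):
        first = self.env.reset()
        # Fill the whole stack with the first frame so the shape is constant.
        for _ in range(self.frames.maxlen):
            self.frames.append(first)
        return tuple(self.frames)

    def step(self, action):
        skip = self.rng.randint(self.min_skip, self.max_skip)  # inclusive bounds
        total_reward, done = 0.0, False
        for _ in range(skip):
            frame, reward, done = self.env.step(action)
            total_reward += reward  # rewards from skipped frames are summed
            if done:
                break
        self.frames.append(frame)  # only the last frame of the skip is stacked
        return tuple(self.frames), total_reward, done
```

Stacking gives the agent a short history, so motion (for example, the direction of an attack) is visible in a single observation even though each frame is a still image.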
3. Training Pipeline
The training pipeline integrates with Stable-Baselines3 to provide reinforcement learning capabilities.
Key Classes:
DynamicAgent: Main agent class with adaptive learning
TrainingManager: Handles training configuration and execution
ModelRegistry: Manages saved models and checkpoints
HyperparameterOptimizer: Optimizes training parameters
Supported Algorithms:
PPO (Proximal Policy Optimization)
A2C (Advantage Actor Critic)
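One way an algorithm name could be validated and resolved to a Stable-Baselines3 class is sketched below. `stable_baselines3.PPO` and `stable_baselines3.A2C` are real classes in that library; the helper names `normalize_algorithm` and `load_algorithm_class` are hypothetical and are not part of Casino of Life's documented API. The import is deferred so that validation works even where the library is not installed.

```python
import importlib

# Algorithms listed as supported in this document.
SUPPORTED_ALGORITHMS = {
    "PPO": "Proximal Policy Optimization",
    "A2C": "Advantage Actor Critic",
}


def normalize_algorithm(name: str) -> str:
    """Return the canonical algorithm key or raise ValueError if unsupported."""
    key = name.strip().upper()
    if key not in SUPPORTED_ALGORITHMS:
        supported = ", ".join(sorted(SUPPORTED_ALGORITHMS))
        raise ValueError(f"Unsupported algorithm {name!r}; expected one of: {supported}")
    return key


def load_algorithm_class(name: str):
    """Look up the Stable-Baselines3 class with the same name (requires the library)."""
    key = normalize_algorithm(name)
    module = importlib.import_module("stable_baselines3")
    return getattr(module, key)
```

With the library installed, the returned class would then be constructed in the usual Stable-Baselines3 way, with a policy and an environment.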
4. Reward System
The modular reward system allows for flexible definition of success criteria.
Key Classes:
BaseRewardEvaluator: Abstract base class for reward evaluators
MultiObjectiveRewardEvaluator: Combines multiple reward sources
RewardEvaluatorManager: Manages and switches between reward systems
RewardScaler: Scales and normalizes rewards
See the Reward System documentation for details.
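A minimal sketch of how such a modular reward system might fit together follows. The class names `BaseRewardEvaluator`, `MultiObjectiveRewardEvaluator`, and `RewardScaler` come from the list above, but the method names, constructor arguments, and the two concrete evaluators are assumptions made for illustration; the actual interfaces may differ.

```python
from abc import ABC, abstractmethod


class BaseRewardEvaluator(ABC):
    """Assumed interface: map a game-state dictionary to a scalar reward."""

    @abstractmethod
    def evaluate(self, state: dict) -> float:
        ...


class DamageDealtEvaluator(BaseRewardEvaluator):
    """Hypothetical evaluator that rewards damage inflicted on the opponent."""

    def evaluate(self, state: dict) -> float:
        return float(state.get("damage_dealt", 0))


class DamageTakenEvaluator(BaseRewardEvaluator):
    """Hypothetical evaluator that penalizes damage received."""

    def evaluate(self, state: dict) -> float:
        return -float(state.get("damage_taken", 0))


class MultiObjectiveRewardEvaluator(BaseRewardEvaluator):
    """Weighted sum of several evaluators (one possible way to combine them)."""

    def __init__(self, evaluators, weights):
        if len(evaluators) != len(weights):
            raise ValueError("evaluators and weights must have the same length")
        self.evaluators = list(evaluators)
        self.weights = list(weights)

    def evaluate(self, state: dict) -> float:
        return sum(w * e.evaluate(state) for e, w in zip(self.evaluators, self.weights))


class RewardScaler:
    """Multiply a reward by a constant, then clip it to [-clip, clip]."""

    def __init__(self, scale=1.0, clip=1.0):
        self.scale = scale
        self.clip = clip

    def __call__(self, reward: float) -> float:
        return max(-self.clip, min(self.clip, reward * self.scale))


combined = MultiObjectiveRewardEvaluator(
    [DamageDealtEvaluator(), DamageTakenEvaluator()], weights=[1.0, 0.5]
)
scaler = RewardScaler(scale=0.1, clip=1.0)
raw = combined.evaluate({"damage_dealt": 10, "damage_taken": 4})
print(raw, scaler(raw))
```

Keeping each objective in its own evaluator means a new success criterion can be added by writing one small class and adjusting the weights, without touching the training loop.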
5. Web Interface
The web interface provides visualization and control capabilities.
Key Classes:
TrainingServer: Main server class for the web interface
DashboardManager: Manages dashboard components and views
WebSocketHandler: Handles real-time data streaming
APIEndpoints: Defines RESTful API endpoints
See the Web Interface documentation for details.
Data Flow
Training Initialization:
User provides natural language instruction
CaballoLoko processes instruction into training parameters
Training pipeline configures agent and environment
Training begins with specified parameters
Training Loop:
Environment produces observation
Observation processor converts raw frames to agent input
Agent selects action based on policy
Environment executes action and returns next observation and reward
Reward evaluators calculate composite reward
Agent updates its policy based on experience
Metrics are collected and sent to web interface
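The loop above can be sketched with a toy environment and a very simple agent. Everything here is a stand-in: `ToyEnv`, `BanditAgent`, and `run_training` are hypothetical names, and the agent uses an epsilon-greedy value estimate rather than PPO or A2C, purely to keep the example short and dependency-free.

```python
import random


class ToyEnv:
    """Stand-in environment: action 1 earns a reward, action 0 does not."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        return self.t, reward, done


class BanditAgent:
    """Epsilon-greedy agent keeping a running mean reward per action."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select_action(self, observation):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run_training(env, agent, episodes,
                 process_observation=lambda obs: obs,
                 evaluate_reward=lambda obs, reward: reward,
                 on_metrics=print):
    """Run the observe -> act -> reward -> update cycle and report metrics."""
    for episode in range(episodes):
        observation = process_observation(env.reset())
        episode_reward, done = 0.0, False
        while not done:
            action = agent.select_action(observation)
            raw_obs, env_reward, done = env.step(action)
            observation = process_observation(raw_obs)
            reward = evaluate_reward(raw_obs, env_reward)  # composite reward hook
            agent.update(action, reward)
            episode_reward += reward
        on_metrics({"episode": episode, "reward": episode_reward})
```

The `on_metrics` callback marks the point where a real system would push data to the web interface; here it defaults to `print`.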
Model Persistence:
Checkpoints are saved at configured intervals
Models can be loaded for continued training
Trained agents can be exported for deployment
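Interval-based checkpointing and resuming can be sketched as follows. The file naming scheme, the JSON format, and the helper names are all assumptions for illustration; Stable-Baselines3 models have their own `save` and `load` methods, and this example avoids them only to stay dependency-free.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(directory: Path, step: int, state: dict) -> Path:
    """Write the training state to a zero-padded, sortable file name."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"checkpoint_{step:08d}.json"
    path.write_text(json.dumps({"step": step, "state": state}))
    return path


def latest_checkpoint(directory: Path):
    """Return the most recent checkpoint path, or None if there is none."""
    checkpoints = sorted(directory.glob("checkpoint_*.json"))
    return checkpoints[-1] if checkpoints else None


def load_checkpoint(path: Path) -> dict:
    return json.loads(path.read_text())


def train(directory: Path, total_steps: int, save_every: int, state=None, start_step=0):
    """Toy training loop that saves a checkpoint every `save_every` steps."""
    state = dict(state) if state else {"updates": 0}
    for step in range(start_step + 1, total_steps + 1):
        state["updates"] += 1  # stand-in for one policy update
        if step % save_every == 0:
            save_checkpoint(directory, step, state)
    return state


def demo():
    """Train, stop partway, then resume from the latest checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        train(directory, total_steps=25, save_every=10)
        latest = latest_checkpoint(directory)
        resumed = load_checkpoint(latest)
        final = train(directory, total_steps=40, save_every=10,
                      state=resumed["state"], start_step=resumed["step"])
        return latest.name, resumed["step"], final["updates"]
```

Note that work done after the last checkpoint (steps 21 to 25 in `demo`) is lost on resume, which is the usual trade-off when choosing the checkpoint interval.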
Integration Points
External Libraries
Casino of Life integrates with several key libraries:
Stable-Retro: Game emulation and environment
Stable-Baselines3: Reinforcement learning algorithms
PyTorch: Neural network backend
FastAPI: Web server and API
React: Frontend dashboard
Custom Integration
Casino of Life can be extended with custom components. Its architecture also provides the following performance features:
Memory Management: Automatic garbage collection for efficient memory use
Vectorized Environments: Support for parallel environment execution
Frame Skipping: Reduces computational load while maintaining learning capability
Checkpointing: Efficient model saving and loading
Observation Caching: Reduces redundant processing
The modular architecture of Casino of Life allows for flexible configuration and extension while maintaining performance and stability.