Technical Architecture

This document describes the technical architecture of Casino of Life: its core components, the data flow between them, and its integration points.

System Overview

Casino of Life consists of several interconnected components that work together to enable natural language-driven AI training for retro fighting games.

┌─────────────────┐     ┌───────────────────┐     ┌────────────────┐
│                 │     │                   │     │                │
│  Natural        │────▶│  Training         │────▶│  Game          │
│  Language       │     │  Pipeline         │     │  Environment   │
│  Interface      │     │                   │     │                │
│                 │     │                   │     │                │
└────────▲────────┘     └───────┬───────────┘     └────────┬───────┘
         │                      │                          │
         │                      │                          │
         │                      ▼                          ▼
┌────────┴────────┐     ┌───────────────────┐     ┌────────────────┐
│                 │     │                   │     │                │
│  Web            │◀───▶│  Reward           │◀────│  Observation   │
│  Interface      │     │  System           │     │  Processor     │
│                 │     │                   │     │                │
└─────────────────┘     └───────────────────┘     └────────────────┘

Core Components

1. Natural Language Interface

The natural language interface, powered by CaballoLoko, translates human instructions into training configurations.

Key Classes:

CaballoLoko: Main interface for natural language processing
IntentProcessor: Identifies training intents from text
ParameterExtractor: Extracts specific training parameters
ResponseGenerator: Creates human-readable responses

Example Flow:

User input is processed by CaballoLoko.chat()
IntentProcessor identifies the training intent
ParameterExtractor pulls specific parameters
Configuration is passed to the training pipeline
ResponseGenerator creates a human-readable response
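
The flow above can be sketched in a few lines. This is a minimal illustration only: `TrainingRequest`, `identify_intent`, `extract_parameters`, `parse_instruction`, and `describe` are hypothetical names, and the keyword and regex matching merely stands in for whatever CaballoLoko does internally.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; not the casino_of_life API.


@dataclass
class TrainingRequest:
    intent: str
    params: dict = field(default_factory=dict)


def identify_intent(text: str) -> str:
    """Rough keyword-based intent detection (role of IntentProcessor)."""
    lowered = text.lower()
    if re.search(r"\btrain\b", lowered):
        return "train"
    if re.search(r"\b(evaluate|test)\b", lowered):
        return "evaluate"
    return "unknown"


def extract_parameters(text: str) -> dict:
    """Pull a few illustrative parameters out of the text (role of ParameterExtractor)."""
    params = {}
    # Matches phrases such as "for 100,000 steps" or "50000 timesteps".
    steps = re.search(r"(\d[\d,]*)\s*(?:time)?steps", text, re.IGNORECASE)
    if steps:
        params["timesteps"] = int(steps.group(1).replace(",", ""))
    # Matches a small, assumed vocabulary of play styles.
    strategy = re.search(r"\b(aggressive|defensive|balanced)\b", text, re.IGNORECASE)
    if strategy:
        params["strategy"] = strategy.group(1).lower()
    return params


def parse_instruction(text: str) -> TrainingRequest:
    """Combine intent detection and parameter extraction into one configuration."""
    return TrainingRequest(identify_intent(text), extract_parameters(text))


def describe(request: TrainingRequest) -> str:
    """Build a human-readable summary (role of ResponseGenerator)."""
    details = ", ".join(f"{k}={v}" for k, v in sorted(request.params.items()))
    return f"Intent: {request.intent}. Parameters: {details or 'defaults'}."


request = parse_instruction("Train an aggressive fighter for 100,000 steps")
print(describe(request))
```

A real implementation would likely rely on a language model rather than regular expressions, but the shape of the hand-off (text in, structured configuration out) stays the same.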

2. Game Environment

Built on top of the Stable-Retro library, the game environment component handles game emulation and state management.

Key Classes:

RetroEnv: Main environment class for game emulation
ObservationProcessor: Processes raw game frames
ActionSpace: Defines available actions for the agent
StateManager: Handles game state loading and saving

Technical Details:

Stochastic frame skipping (2-4 frames)
84x84 grayscale observation processing
4-frame stacking for temporal information
Multi-player support (2 players)
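
To make the preprocessing steps above concrete, here is a dependency-free sketch of grayscale conversion, resizing to 84x84, 4-frame stacking, and sampling a frame skip between 2 and 4. All function and class names are hypothetical, and the nearest-neighbour resize is an assumption chosen for brevity; the actual ObservationProcessor may use a different interpolation and an array library.

```python
import random
from collections import deque

FRAME_SIZE = 84   # target width and height
STACK_SIZE = 4    # number of consecutive frames given to the agent


def to_grayscale(frame):
    """Convert rows of (r, g, b) tuples to rows of luminance values (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def resize_nearest(gray, size=FRAME_SIZE):
    """Nearest-neighbour resize of a 2D grid to size x size."""
    height, width = len(gray), len(gray[0])
    return [
        [gray[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


def preprocess(frame):
    """Raw RGB frame -> 84x84 grayscale observation."""
    return resize_nearest(to_grayscale(frame))


class FrameStacker:
    """Keeps the most recent STACK_SIZE frames so the agent can infer motion."""

    def __init__(self, stack_size=STACK_SIZE):
        self.frames = deque(maxlen=stack_size)

    def reset(self, frame):
        # Fill the stack with copies of the first frame of an episode.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return list(self.frames)

    def push(self, frame):
        # The oldest frame is dropped automatically by the bounded deque.
        self.frames.append(frame)
        return list(self.frames)


def sample_frame_skip(rng, low=2, high=4):
    """Pick how many emulator frames the next action is repeated for."""
    return rng.randint(low, high)


# Usage with a dummy 320x224 white frame (the size is only an example).
rng = random.Random(0)
raw = [[(255, 255, 255)] * 320 for _ in range(224)]
stacker = FrameStacker()
stacked = stacker.reset(preprocess(raw))
print(len(stacked), len(stacked[0]), len(stacked[0][0]), sample_frame_skip(rng))
```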

3. Training Pipeline

The training pipeline integrates with Stable-Baselines3 to provide reinforcement learning capabilities.

Key Classes:

DynamicAgent: Main agent class with adaptive learning
TrainingManager: Handles training configuration and execution
ModelRegistry: Manages saved models and checkpoints
HyperparameterOptimizer: Optimizes training parameters

Supported Algorithms:

PPO (Proximal Policy Optimization)
A2C (Advantage Actor-Critic)
DQN (Deep Q-Network)
SAC (Soft Actor-Critic)
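
These algorithms do not all accept the same action spaces. According to the Stable-Baselines3 documentation, DQN supports only `Discrete` actions and SAC only continuous `Box` actions, while PPO and A2C also handle `MultiDiscrete` and `MultiBinary`. Stable-Retro environments typically expose a `MultiBinary` button space, so DQN and SAC would need some form of action wrapper. How Casino of Life handles this is not described here; the helper below is a hypothetical sketch of a compatibility check, not part of the library.

```python
# Action-space support per algorithm, as documented by Stable-Baselines3.
SUPPORTED_ACTION_SPACES = {
    "PPO": {"Box", "Discrete", "MultiDiscrete", "MultiBinary"},
    "A2C": {"Box", "Discrete", "MultiDiscrete", "MultiBinary"},
    "DQN": {"Discrete"},
    "SAC": {"Box"},
}


def compatible_algorithms(action_space_type: str) -> list:
    """Return the algorithms that can train directly on the given action space."""
    return sorted(
        name
        for name, spaces in SUPPORTED_ACTION_SPACES.items()
        if action_space_type in spaces
    )


def check_algorithm(name: str, action_space_type: str) -> None:
    """Raise early, before training starts, if the pairing cannot work."""
    if name not in SUPPORTED_ACTION_SPACES:
        raise ValueError(f"Unknown algorithm: {name}")
    if action_space_type not in SUPPORTED_ACTION_SPACES[name]:
        raise ValueError(
            f"{name} does not support {action_space_type} actions; "
            f"compatible choices: {compatible_algorithms(action_space_type)}"
        )


print(compatible_algorithms("MultiBinary"))
```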

4. Reward System

The modular reward system allows for flexible definition of success criteria.

Key Classes:

BaseRewardEvaluator: Abstract base class for reward evaluators
MultiObjectiveRewardEvaluator: Combines multiple reward sources
RewardEvaluatorManager: Manages and switches between reward systems
RewardScaler: Scales and normalizes rewards
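
One way such a modular reward system could fit together is shown below. The sketch reuses class names from the list above, but the method name `evaluate`, the constructor signatures, and the `info` keys (`damage_dealt`, `damage_taken`) are assumptions made for illustration. `DamageDealtReward` and `DamageTakenPenalty` are hypothetical evaluators.

```python
from abc import ABC, abstractmethod


class BaseRewardEvaluator(ABC):
    """Common interface: map a step's info dict to a scalar reward."""

    @abstractmethod
    def evaluate(self, info: dict) -> float:
        ...


class DamageDealtReward(BaseRewardEvaluator):
    def evaluate(self, info: dict) -> float:
        return float(info.get("damage_dealt", 0))


class DamageTakenPenalty(BaseRewardEvaluator):
    def evaluate(self, info: dict) -> float:
        return -float(info.get("damage_taken", 0))


class MultiObjectiveRewardEvaluator(BaseRewardEvaluator):
    """Weighted sum of several evaluators."""

    def __init__(self, weighted_evaluators):
        # Each entry is an (evaluator, weight) pair.
        self.weighted_evaluators = list(weighted_evaluators)

    def evaluate(self, info: dict) -> float:
        return sum(
            weight * evaluator.evaluate(info)
            for evaluator, weight in self.weighted_evaluators
        )


class RewardScaler:
    """Scale a reward, then clip it to [-clip, clip]."""

    def __init__(self, scale: float = 1.0, clip: float = 1.0):
        self.scale = scale
        self.clip = clip

    def __call__(self, reward: float) -> float:
        return max(-self.clip, min(self.clip, reward * self.scale))


combined = MultiObjectiveRewardEvaluator(
    [(DamageDealtReward(), 1.0), (DamageTakenPenalty(), 0.5)]
)
scaler = RewardScaler(scale=0.1, clip=1.0)
raw_reward = combined.evaluate({"damage_dealt": 10, "damage_taken": 4})
print(raw_reward, scaler(raw_reward))
```

Keeping scaling separate from evaluation means the weights express relative priorities, while the scaler alone controls the magnitude the learning algorithm sees.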

5. Web Interface

The web interface provides visualization and control capabilities.

Key Classes:

TrainingServer: Main server class for the web interface
DashboardManager: Manages dashboard components and views
WebSocketHandler: Handles real-time data streaming
APIEndpoints: Defines RESTful API endpoints
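
Real-time streaming usually comes down to fanning each metrics update out to every connected client. The sketch below shows that pattern using only the standard library; `MetricsBroadcaster` is a hypothetical name, and each `asyncio.Queue` stands in for one WebSocket connection. The actual WebSocketHandler may be organized quite differently.

```python
import asyncio
import json


class MetricsBroadcaster:
    """Fan out JSON-encoded metrics to every subscribed client queue."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        # One queue per connected client.
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, metrics: dict) -> None:
        # Serialize once, then hand the same message to every client.
        message = json.dumps(metrics)
        for queue in self._subscribers:
            queue.put_nowait(message)


async def demo():
    broadcaster = MetricsBroadcaster()
    client_a = broadcaster.subscribe()
    client_b = broadcaster.subscribe()
    broadcaster.publish({"step": 100, "episode_reward": 3.5})
    # A real handler would forward each message over its WebSocket here.
    return json.loads(await client_a.get()), json.loads(await client_b.get())


print(asyncio.run(demo()))
```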

Data Flow

Training Initialization:
- User provides natural language instruction
- CaballoLoko processes instruction into training parameters
- Training pipeline configures agent and environment
- Training begins with specified parameters

Training Loop:
- Environment produces observation
- Observation processor converts raw frames to agent input
- Agent selects action based on policy
- Environment executes action and returns next observation and reward
- Reward evaluators calculate composite reward
- Agent updates its policy based on experience
- Metrics are collected and sent to web interface

Model Persistence:
- Checkpoints are saved at configured intervals
- Models can be loaded for continued training
- Trained agents can be exported for deployment
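
The training loop and checkpointing steps above can be sketched with a toy environment and a random agent. Everything here (`ToyFightEnv`, `RandomAgent`, `run_training`, and the callback parameters) is hypothetical and only mirrors the order of operations. With Stable-Baselines3 the equivalent loop runs inside `model.learn()`, and policy updates happen in batches rather than after every step.

```python
import random


class ToyFightEnv:
    """Stand-in for the game environment; episodes end after a fixed length."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        observation = [float(self.t)]
        info = {"damage_dealt": 1 if action == 1 else 0}
        done = self.t >= self.episode_length
        return observation, info, done


class RandomAgent:
    """Picks random actions and only counts how often it was asked to update."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.updates = 0

    def act(self, observation):
        return self.rng.choice([0, 1])

    def update(self, observation, action, reward, next_observation, done):
        self.updates += 1


def run_training(env, agent, total_steps, checkpoint_every,
                 reward_fn, on_metrics, on_checkpoint):
    observation = env.reset()
    episode_reward = 0.0
    for step in range(1, total_steps + 1):
        action = agent.act(observation)                      # select action
        next_observation, info, done = env.step(action)      # execute action
        reward = reward_fn(info)                             # composite reward
        agent.update(observation, action, reward, next_observation, done)
        episode_reward += reward
        observation = next_observation
        if done:
            # Report per-episode metrics, e.g. to the web interface.
            on_metrics({"step": step, "episode_reward": episode_reward})
            observation = env.reset()
            episode_reward = 0.0
        if step % checkpoint_every == 0:
            on_checkpoint(step)                              # save a checkpoint


metrics, checkpoints = [], []
agent = RandomAgent()
run_training(ToyFightEnv(10), agent, total_steps=25, checkpoint_every=10,
             reward_fn=lambda info: float(info["damage_dealt"]),
             on_metrics=metrics.append, on_checkpoint=checkpoints.append)
print(checkpoints, len(metrics), agent.updates)
```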

Integration Points

External Libraries

Casino of Life integrates with several key libraries:

Stable-Retro: Game emulation and environment
Stable-Baselines3: Reinforcement learning algorithms
PyTorch: Neural network backend
FastAPI: Web server and API
React: Frontend dashboard

Custom Integration

You can extend Casino of Life with custom components:

# Example: Custom observation processor
from casino_of_life.environment import ObservationProcessor, RetroEnv


class CustomObservationProcessor(ObservationProcessor):
    """Resizes and normalizes observations to a custom resolution."""

    def __init__(self, resolution=(96, 96)):
        super().__init__()
        self.resolution = resolution

    def process(self, observation):
        # Resize to the target resolution, then normalize pixel values.
        # _resize and _normalize are assumed to be inherited from ObservationProcessor.
        processed_obs = self._resize(observation, self.resolution)
        processed_obs = self._normalize(processed_obs)
        return processed_obs


# Pass the custom processor to the environment
env = RetroEnv(
    game='MortalKombatII-Genesis',
    observation_processor=CustomObservationProcessor(resolution=(96, 96))
)

Performance Considerations

Memory Management: Automatic garbage collection for efficient memory use
Vectorized Environments: Support for parallel environment execution
Frame Skipping: Reduces computational load while maintaining learning capability
Checkpointing: Efficient model saving and loading
Observation Caching: Reduces redundant processing
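
As an illustration of the observation caching idea, the sketch below memoizes the result of an expensive processing function, keyed by the raw frame bytes, with least-recently-used eviction. `ObservationCache` and its parameters are hypothetical; the library's actual caching strategy is not specified here.

```python
from collections import OrderedDict


class ObservationCache:
    """LRU cache that avoids reprocessing identical raw frames."""

    def __init__(self, process_fn, max_entries=256):
        self.process_fn = process_fn
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, raw_frame: bytes):
        if raw_frame in self._cache:
            # Mark as most recently used and return the stored result.
            self._cache.move_to_end(raw_frame)
            self.hits += 1
            return self._cache[raw_frame]
        self.misses += 1
        result = self.process_fn(raw_frame)
        self._cache[raw_frame] = result
        if len(self._cache) > self.max_entries:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return result


# Usage: summing bytes stands in for real frame processing.
cache = ObservationCache(process_fn=lambda frame: sum(frame), max_entries=2)
cache.get(b"\x01\x02")
cache.get(b"\x01\x02")
print(cache.hits, cache.misses)
```

Identical frames are most common during pauses, menus, and round transitions, so the benefit depends heavily on the game. Keying on the full frame also costs memory, which is why the cache is bounded.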

The modular architecture of Casino of Life allows for flexible configuration and extension while maintaining performance and stability.


Last updated 3 months ago
