The reward system is a crucial component of reinforcement learning in Casino of Life. This guide explains how to create, customize, and optimize reward functions for your AI agents.
## Reward System Basics
In reinforcement learning, rewards guide the agent toward desired behaviors. Casino of Life provides a flexible and modular reward system that allows you to precisely define what constitutes "success" for your fighting game AI.
## Built-in Reward Evaluators
Casino of Life comes with several pre-built reward evaluators:
### BasicRewardEvaluator
Handles fundamental fighting game metrics:
```python
from casino_of_life.reward_evaluators import BasicRewardEvaluator

basic_reward = BasicRewardEvaluator(
    health_reward=1.0,    # Reward for maintaining health
    damage_penalty=-1.0,  # Penalty for taking damage
    hit_reward=0.5,       # Reward for landing hits
    block_reward=0.2,     # Reward for successful blocks
    move_penalty=-0.01    # Small penalty to discourage button mashing
)
```
### StageCompleteRewardEvaluator
Provides rewards for level progression:
```python
from casino_of_life.reward_evaluators import StageCompleteRewardEvaluator

stage_reward = StageCompleteRewardEvaluator(
    stage_complete_reward=100.0,  # Large reward for completing a stage
    round_win_reward=25.0,        # Reward for winning a round
    time_bonus_factor=0.5         # Additional reward based on remaining time
)
```
### SpecialMoveRewardEvaluator
Encourages the use of specific techniques:
```python
from casino_of_life.reward_evaluators import SpecialMoveRewardEvaluator

special_move_reward = SpecialMoveRewardEvaluator(
    moves={
        "fireball": 1.0,  # Reward for using fireball
        "uppercut": 1.5,  # Reward for uppercut
        "sweep": 0.8      # Reward for sweep
    },
    successful_hit_multiplier=2.0  # Double reward if the move connects
)
```
## Combining Reward Evaluators

Multiple reward evaluators can be combined using the MultiObjectiveRewardEvaluator, which aggregates the outputs of several evaluators into a single reward signal.
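The constructor of MultiObjectiveRewardEvaluator is not shown in this guide, so the sketch below illustrates the general pattern instead: weight each evaluator's output and sum the results. WeightedRewardCombiner and its helper functions are stand-in names for illustration, not part of the library:

```python
# Self-contained sketch of multi-objective reward combination: each component
# returns a scalar reward, and the combiner sums them with per-component weights.
class WeightedRewardCombiner:
    def __init__(self, evaluators):
        # evaluators: list of (evaluator_fn, weight) pairs
        self.evaluators = evaluators

    def evaluate(self, state, next_state, action, info):
        # Weighted sum of every component's reward for this transition
        return sum(weight * fn(state, next_state, action, info)
                   for fn, weight in self.evaluators)

# Toy component rewards standing in for BasicRewardEvaluator, etc.
health_term = lambda s, ns, a, info: info.get("health_delta", 0.0)
hit_term = lambda s, ns, a, info: 0.5 if info.get("hit") else 0.0

combined = WeightedRewardCombiner([(health_term, 1.0), (hit_term, 2.0)])
print(combined.evaluate(None, None, None, {"health_delta": -1.0, "hit": True}))
# 1.0 * (-1.0) + 2.0 * 0.5 = 0.0
```

The weights let you trade off objectives (e.g. survival vs. aggression) without rewriting the individual evaluators.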
## Creating Custom Reward Evaluators

You can create custom reward evaluators by extending the BaseRewardEvaluator class:
```python
from casino_of_life.reward_evaluators import BaseRewardEvaluator

class ComboRewardEvaluator(BaseRewardEvaluator):
    def __init__(self, combo_thresholds=None):
        super().__init__()
        self.combo_thresholds = combo_thresholds or {
            2: 1.0,  # 2-hit combo: 1.0 reward
            3: 2.0,  # 3-hit combo: 2.0 reward
            5: 5.0   # 5+ hit combo: 5.0 reward
        }
        self.current_combo = 0
        self.last_hit_time = 0

    def evaluate(self, state, next_state, action, info):
        reward = 0
        # Check if a hit was registered
        if info.get("hit", False):
            current_time = info.get("frame_count", 0)
            # If the hit is within the combo window (20 frames), extend the combo
            if current_time - self.last_hit_time < 20:
                self.current_combo += 1
            else:
                self.current_combo = 1
            self.last_hit_time = current_time
            # Apply rewards based on combo thresholds
            for threshold, reward_value in sorted(self.combo_thresholds.items()):
                if self.current_combo >= threshold:
                    reward = max(reward, reward_value)
        # Reset the combo if the hit streak ends
        elif self.current_combo > 0:
            current_time = info.get("frame_count", 0)
            if current_time - self.last_hit_time >= 30:
                self.current_combo = 0
        return reward
```
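To see the combo-window logic in isolation, here is a self-contained sketch that mirrors the evaluator above without depending on the casino_of_life package. ComboTracker and its defaults are illustrative only:

```python
# Stand-alone version of the combo-window logic: hits landing within 20 frames
# of each other extend the combo; the reward is the highest threshold reached.
class ComboTracker:
    def __init__(self, window=20, thresholds=None):
        self.window = window
        self.thresholds = thresholds or {2: 1.0, 3: 2.0, 5: 5.0}
        self.current_combo = 0
        self.last_hit_time = 0

    def on_hit(self, frame):
        # Extend the combo if this hit is inside the window, else restart it
        if frame - self.last_hit_time < self.window:
            self.current_combo += 1
        else:
            self.current_combo = 1
        self.last_hit_time = frame
        # Pay out the largest threshold reward the current combo qualifies for
        reward = 0.0
        for threshold, value in sorted(self.thresholds.items()):
            if self.current_combo >= threshold:
                reward = max(reward, value)
        return reward

tracker = ComboTracker()
print(tracker.on_hit(0))   # -> 0.0  (combo of 1, below every threshold)
print(tracker.on_hit(10))  # -> 1.0  (2-hit combo within the window)
print(tracker.on_hit(25))  # -> 2.0  (3-hit combo, 15 frames after the last hit)
print(tracker.on_hit(60))  # -> 0.0  (gap of 35 frames resets the combo)
```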
## Managing Reward Evaluators
Use the RewardEvaluatorManager to organize and switch between different reward systems:
```python
from casino_of_life.client_bridge import RewardEvaluatorManager

# Initialize the reward manager
reward_manager = RewardEvaluatorManager()

# Register different reward systems for different scenarios
# (tournament_reward_system etc. are evaluators constructed beforehand)
reward_manager.register_evaluator("tournament", tournament_reward_system)
reward_manager.register_evaluator("practice", practice_reward_system)
reward_manager.register_evaluator("aggressive", aggressive_reward_system)

# Use the appropriate reward system for the current task
agent = DynamicAgent(
    env=env,
    reward_evaluator=reward_manager.get_evaluator("tournament"),
    frame_stack=4,
    learning_rate=0.0003
)
```
## Reward Scaling and Balancing
Proper scaling of rewards is essential for effective learning:
```python
from casino_of_life.reward_evaluators import RewardScaler

# Scale a reward system to prevent value explosions
scaled_reward = RewardScaler(
    reward_evaluator=reward_system,
    scale_factor=0.1,
    clip_min=-10,
    clip_max=10
)
```
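Assuming RewardScaler multiplies the raw reward by `scale_factor` and then clamps the result to `[clip_min, clip_max]` (the usual order for reward scaling; the library's exact behavior isn't shown here), the transformation can be sketched standalone:

```python
# Sketch of the scale-then-clip transformation applied to each raw reward.
def scale_and_clip(reward, scale_factor=0.1, clip_min=-10, clip_max=10):
    # Shrink the reward first, then clamp extreme values into a fixed range
    return max(clip_min, min(clip_max, reward * scale_factor))

print(scale_and_clip(250.0))  # -> 10   (25.0 clipped to clip_max)
print(scale_and_clip(-5.0))   # -> -0.5 (scaled, inside the clip range)
```

Clipping after scaling keeps rare, very large rewards (like a stage-complete bonus) from dominating the value estimates.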
## Best Practices for Reward Design

- **Start simple**: Begin with basic health/damage rewards before adding complexity.
- **Balance immediate vs. delayed rewards**: Mix short-term feedback with long-term goals.
- **Avoid reward hacking**: Test for unintended behaviors that might exploit your reward system.
- **Normalize reward scales**: Keep different reward components on similar scales.
- **Introduce curriculum learning**: Gradually increase the complexity of the reward system as the agent improves.
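As an illustration of the last point, a curriculum can be as simple as switching between progressively richer reward systems based on a training metric. The stage names and win-rate thresholds below are hypothetical, not part of the library:

```python
# Illustrative curriculum schedule: choose a named reward stage from the
# agent's recent win rate, e.g. to look up an evaluator registered with
# RewardEvaluatorManager under the same name.
def select_reward_stage(win_rate):
    if win_rate < 0.3:
        return "basic"         # health/damage rewards only
    elif win_rate < 0.6:
        return "intermediate"  # add stage-completion rewards
    return "advanced"          # add special moves and combo rewards

print(select_reward_stage(0.2))  # -> basic
print(select_reward_stage(0.5))  # -> intermediate
print(select_reward_stage(0.8))  # -> advanced
```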
By mastering the reward system in Casino of Life, you can create sophisticated AI agents with complex, nuanced behaviors that reflect your intended fighting game strategies.