The reward system is a crucial component of reinforcement learning in Casino of Life. This guide explains how to create, customize, and optimize reward functions for your AI agents.
## Reward System Basics
In reinforcement learning, rewards guide the agent toward desired behaviors. Casino of Life provides a flexible and modular reward system that allows you to precisely define what constitutes "success" for your fighting game AI.
## Built-in Reward Evaluators
Casino of Life comes with several pre-built reward evaluators:
### BasicRewardEvaluator
Handles fundamental fighting game metrics:
```python
from casino_of_life.reward_evaluators import BasicRewardEvaluator

basic_reward = BasicRewardEvaluator(
    health_reward=1.0,    # Reward for maintaining health
    damage_penalty=-1.0,  # Penalty for taking damage
    hit_reward=0.5,       # Reward for landing hits
    block_reward=0.2,     # Reward for successful blocks
    move_penalty=-0.01    # Small penalty to discourage button mashing
)
```
### StageCompleteRewardEvaluator
Provides rewards for level progression:
```python
from casino_of_life.reward_evaluators import StageCompleteRewardEvaluator

stage_reward = StageCompleteRewardEvaluator(
    stage_complete_reward=100.0,  # Large reward for completing a stage
    round_win_reward=25.0,        # Reward for winning a round
    time_bonus_factor=0.5         # Additional reward based on remaining time
)
```
### SpecialMoveRewardEvaluator
Encourages the use of specific techniques:
```python
from casino_of_life.reward_evaluators import SpecialMoveRewardEvaluator

special_move_reward = SpecialMoveRewardEvaluator(
    moves={
        "fireball": 1.0,  # Reward for using fireball
        "uppercut": 1.5,  # Reward for uppercut
        "sweep": 0.8      # Reward for sweep
    },
    successful_hit_multiplier=2.0  # Double reward if the move connects
)
```
## Combining Reward Evaluators

Multiple reward evaluators can be combined using the MultiObjectiveRewardEvaluator, which aggregates the outputs of several evaluators into a single reward signal.
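The constructor of MultiObjectiveRewardEvaluator is not shown in this guide, so the sketch below illustrates the general pattern instead: weight each evaluator's output and sum the results. WeightedRewardCombiner and its helper functions are stand-in names for illustration, not part of the library:

```python
# Self-contained sketch of multi-objective reward combination: each component
# returns a scalar reward, and the combiner sums them with per-component weights.
class WeightedRewardCombiner:
    def __init__(self, evaluators):
        # evaluators: list of (evaluator_fn, weight) pairs
        self.evaluators = evaluators

    def evaluate(self, state, next_state, action, info):
        # Weighted sum of every component's reward for this transition
        return sum(weight * fn(state, next_state, action, info)
                   for fn, weight in self.evaluators)

# Toy component rewards standing in for BasicRewardEvaluator, etc.
health_term = lambda s, ns, a, info: info.get("health_delta", 0.0)
hit_term = lambda s, ns, a, info: 0.5 if info.get("hit") else 0.0

combined = WeightedRewardCombiner([(health_term, 1.0), (hit_term, 2.0)])
print(combined.evaluate(None, None, None, {"health_delta": -1.0, "hit": True}))
# 1.0 * (-1.0) + 2.0 * 0.5 = 0.0
```

The weights let you trade off objectives (e.g. survival vs. aggression) without rewriting the individual evaluators.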
## Creating Custom Reward Evaluators

You can create custom reward evaluators by extending the BaseRewardEvaluator class:
```python
from casino_of_life.reward_evaluators import BaseRewardEvaluator

class ComboRewardEvaluator(BaseRewardEvaluator):
    def __init__(self, combo_thresholds=None):
        super().__init__()
        self.combo_thresholds = combo_thresholds or {
            2: 1.0,  # 2-hit combo: 1.0 reward
            3: 2.0,  # 3-hit combo: 2.0 reward
            5: 5.0   # 5+ hit combo: 5.0 reward
        }
        self.current_combo = 0
        self.last_hit_time = 0

    def evaluate(self, state, next_state, action, info):
        reward = 0
        # Check if a hit was registered
        if info.get("hit", False):
            current_time = info.get("frame_count", 0)
            # If the hit is within the combo window (20 frames), extend the combo
            if current_time - self.last_hit_time < 20:
                self.current_combo += 1
            else:
                self.current_combo = 1
            self.last_hit_time = current_time
            # Apply rewards based on combo thresholds
            for threshold, reward_value in sorted(self.combo_thresholds.items()):
                if self.current_combo >= threshold:
                    reward = max(reward, reward_value)
        # Reset the combo if the hit streak ends
        elif self.current_combo > 0:
            current_time = info.get("frame_count", 0)
            if current_time - self.last_hit_time >= 30:
                self.current_combo = 0
        return reward
```
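To see the combo-window logic in isolation, here is a self-contained sketch that mirrors the evaluator above without depending on the casino_of_life package. ComboTracker and its defaults are illustrative only:

```python
# Stand-alone version of the combo-window logic: hits landing within 20 frames
# of each other extend the combo; the reward is the highest threshold reached.
class ComboTracker:
    def __init__(self, window=20, thresholds=None):
        self.window = window
        self.thresholds = thresholds or {2: 1.0, 3: 2.0, 5: 5.0}
        self.current_combo = 0
        self.last_hit_time = 0

    def on_hit(self, frame):
        # Extend the combo if this hit is inside the window, else restart it
        if frame - self.last_hit_time < self.window:
            self.current_combo += 1
        else:
            self.current_combo = 1
        self.last_hit_time = frame
        # Pay out the largest threshold reward the current combo qualifies for
        reward = 0.0
        for threshold, value in sorted(self.thresholds.items()):
            if self.current_combo >= threshold:
                reward = max(reward, value)
        return reward

tracker = ComboTracker()
print(tracker.on_hit(0))   # -> 0.0  (combo of 1, below every threshold)
print(tracker.on_hit(10))  # -> 1.0  (2-hit combo within the window)
print(tracker.on_hit(25))  # -> 2.0  (3-hit combo, 15 frames after the last hit)
print(tracker.on_hit(60))  # -> 0.0  (gap of 35 frames resets the combo)
```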
## Managing Reward Evaluators
Use the RewardEvaluatorManager to organize and switch between different reward systems:
```python
from casino_of_life.client_bridge import RewardEvaluatorManager

# Initialize the reward manager
reward_manager = RewardEvaluatorManager()

# Register different reward systems for different scenarios
# (tournament_reward_system etc. are evaluators constructed beforehand)
reward_manager.register_evaluator("tournament", tournament_reward_system)
reward_manager.register_evaluator("practice", practice_reward_system)
reward_manager.register_evaluator("aggressive", aggressive_reward_system)

# Use the appropriate reward system for the current task
agent = DynamicAgent(
    env=env,
    reward_evaluator=reward_manager.get_evaluator("tournament"),
    frame_stack=4,
    learning_rate=0.0003
)
```
## Reward Scaling and Balancing
Proper scaling of rewards is essential for effective learning:
```python
from casino_of_life.reward_evaluators import RewardScaler

# Scale a reward system to prevent value explosions
scaled_reward = RewardScaler(
    reward_evaluator=reward_system,
    scale_factor=0.1,
    clip_min=-10,
    clip_max=10
)
```
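Assuming RewardScaler multiplies the raw reward by `scale_factor` and then clamps the result to `[clip_min, clip_max]` (the usual order for reward scaling; the library's exact behavior isn't shown here), the transformation can be sketched standalone:

```python
# Sketch of the scale-then-clip transformation applied to each raw reward.
def scale_and_clip(reward, scale_factor=0.1, clip_min=-10, clip_max=10):
    # Shrink the reward first, then clamp extreme values into a fixed range
    return max(clip_min, min(clip_max, reward * scale_factor))

print(scale_and_clip(250.0))  # -> 10   (25.0 clipped to clip_max)
print(scale_and_clip(-5.0))   # -> -0.5 (scaled, inside the clip range)
```

Clipping after scaling keeps rare, very large rewards (like a stage-complete bonus) from dominating the value estimates.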
## Best Practices for Reward Design

- **Start simple**: Begin with basic health/damage rewards before adding complexity.
- **Balance immediate vs. delayed rewards**: Mix short-term feedback with long-term goals.
- **Avoid reward hacking**: Test for unintended behaviors that might exploit your reward system.
- **Normalize reward scales**: Keep different reward components on similar scales.
- **Introduce curriculum learning**: Gradually increase the complexity of the reward system as the agent improves.
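As an illustration of the last point, a curriculum can be as simple as switching between progressively richer reward systems based on a training metric. The stage names and win-rate thresholds below are hypothetical, not part of the library:

```python
# Illustrative curriculum schedule: choose a named reward stage from the
# agent's recent win rate, e.g. to look up an evaluator registered with
# RewardEvaluatorManager under the same name.
def select_reward_stage(win_rate):
    if win_rate < 0.3:
        return "basic"         # health/damage rewards only
    elif win_rate < 0.6:
        return "intermediate"  # add stage-completion rewards
    return "advanced"          # add special moves and combo rewards

print(select_reward_stage(0.2))  # -> basic
print(select_reward_stage(0.5))  # -> intermediate
print(select_reward_stage(0.8))  # -> advanced
```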
By mastering the reward system in Casino of Life, you can create sophisticated AI agents with complex, nuanced behaviors that reflect your intended fighting game strategies.