Create a Pandemic Simulation with Unity III: Adding Artificial Intelligence
In the last two articles we created an epidemic simulation within Unity and analyzed the spread of an epidemic disease. In this part we will convert our epidemic simulation into a reinforcement learning environment and train artificially intelligent agents in both single-agent and multi-agent scenarios. In the end, the agents will learn how to practice social distancing. This article series is based on my MSc dissertation in Advanced Computer Science at the University of Sussex, titled “Simulation-Based Reinforcement Learning for Social Distancing”.
The source code of the project is available on GitHub. You can download or clone it from here.
Since we have already created our epidemic simulation, we can start adding AI and spice things up even more. We will use the Unity ML-Agents Toolkit Release 6 to create our environment and train agents with reinforcement learning techniques. If you haven't completed the first two tutorials, I strongly recommend taking a look at them and then returning to this one.
- Create an Epidemic Simulation in Unity I: Simulating an epidemic spread
- Create a Pandemic Simulation with Unity II: Analyzing SIR graphs
Background: Reinforcement Learning
Reinforcement Learning (RL) is an area of Machine Learning in which an agent learns the best behavior by interacting with its environment. Creating these complex environments and artificially intelligent agents that solve complex human-relevant tasks has been a long-standing challenge for RL researchers. The environment is a crucial component of RL and broadly determines the task that the agent has to solve. Agents and environments cannot be considered separately; it is only a design choice for the researcher where the environment starts and the agent ends. Most of the time the environment is defined as anything the agent cannot directly control. The agent perceives its environment through sensor observations and acts upon that environment through effectors. For each action selected by the agent, the environment provides a reward, and the agent aims to maximize the total reward it receives. At every step, the agent performs an action by following a strategy called a policy. The agent starts with a random policy, and as training continues the policy is optimized by a learning algorithm.
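The interaction loop described above can be sketched as pseudocode. Everything here is illustrative (the `environment` and `agent` objects and their methods are hypothetical, not part of any library):

```csharp
// Conceptual agent-environment loop (pseudocode, all names hypothetical).
bool done = false;
while (!done)
{
    var obs = environment.Observe();            // agent perceives through sensors
    var action = agent.Policy(obs);             // policy maps observation -> action
    (float reward, done) = environment.Step(action); // environment returns a reward
    agent.Update(obs, action, reward);          // learning algorithm improves the policy
}
```

In ML-Agents this loop is hidden from you: the toolkit calls your agent's methods at the right moments, and the learning algorithm runs in a separate Python process.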
Unity ML-Agents Toolkit
The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables Unity scenes to serve as environments for training intelligent agents. Agents can be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods through a simple-to-use Python API. In this article we use standard reinforcement learning techniques to train our agents. We will create one new script, PandemicAgent.cs, and update one of the previous scripts, PandemicArea.cs.
We will create two different scenarios: single-agent and multi-agent. In the single-agent scenario, the agent will be trained to avoid infected bots while trying to catch as many rewards as possible. In the multi-agent scenario there will be many agents in the same environment, and they will all try to avoid each other.
Let's start with the single-agent scenario, since it is easier to iterate on and keep the code clean; besides, there are only a few differences between the two scenarios.
The ML-Agents Toolkit makes it extremely easy to create agents: you simply inherit from the abstract class Agent. An agent is an entity that can observe its environment, decide on the best course of action using those observations, and execute those actions within its environment. In order to use the ML-Agents package, you need to install it via Window > Package Manager > ML Agents.
After adding the needed packages we are ready to use the ML-Agents Toolkit. Agents can be created in Unity by inheriting from the Agent class. A few of its methods need to be implemented in order to use it; we will go through each of them below.
Create a new script called PandemicAgent.cs and change its base class from MonoBehaviour to Agent.
- Agent.OnEpisodeBegin() — Called at the beginning of an agent's episode, including at the beginning of the simulation. In our project it is called from the resetPandemicArea() method of the PandemicArea script.
- Agent.Initialize() — Called when the agent is created. Inside this method we initialize the objects we will need during training, such as the pandemic area, the agent's Rigidbody, etc.
- Agent.CollectObservations(VectorSensor sensor) — Called every step that the agent requests a decision. Our agent has 4 observations: its local velocity on the x and z axes, plus a one-hot encoded health status represented with 2 binary values, 4 in total.
- Agent.OnActionReceived() — Called every time the agent receives an action to take. It is also common to assign rewards in this method. We created another method, MoveAgent(), which is called inside this function.
- Agent.Heuristic() — When the Behavior Type in the agent's Behavior Parameters is set to Heuristic Only, the agent uses the Heuristic() method to generate its actions. The method writes the actions into the array of floats it receives as an argument.
- AddReward() — Assigning rewards is mostly a design choice. The important thing for the designer to remember is that the rewards must encourage the behavior you actually want the agent to learn.
Here is the script: PandemicAgent.cs
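A condensed sketch of what PandemicAgent.cs implements is shown below. It is illustrative only: field names, speeds, and reward values are assumptions, not the repository's exact code (the method signatures match ML-Agents Release 6, which still uses float arrays for actions):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Sketch of PandemicAgent.cs -- assumes 2 continuous actions (move, turn).
public class PandemicAgent : Agent
{
    private Rigidbody rb;
    private PandemicArea area;      // assumed reference to the area script
    public bool isInfected;         // assumed health flag
    public float moveSpeed = 2f;    // assumed movement speed

    public override void Initialize()
    {
        // Cache the objects we communicate with during training.
        rb = GetComponent<Rigidbody>();
        area = GetComponentInParent<PandemicArea>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // 4 observations: local velocity on x and z, plus a one-hot
        // encoded health status (2 binary values).
        Vector3 localVelocity = transform.InverseTransformDirection(rb.velocity);
        sensor.AddObservation(localVelocity.x);
        sensor.AddObservation(localVelocity.z);
        sensor.AddObservation(isInfected ? 1f : 0f);
        sensor.AddObservation(isInfected ? 0f : 1f);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        MoveAgent(vectorAction);
        // Example reward shaping: a small per-step penalty (assumption).
        AddReward(-1f / MaxStep);
    }

    public override void Heuristic(float[] actionsOut)
    {
        // Manual control for testing: write keyboard input into the action array.
        actionsOut[0] = Input.GetAxis("Vertical");
        actionsOut[1] = Input.GetAxis("Horizontal");
    }

    private void MoveAgent(float[] act)
    {
        // Apply the chosen actions as movement and rotation.
        rb.AddForce(transform.forward * act[0] * moveSpeed, ForceMode.VelocityChange);
        transform.Rotate(transform.up, act[1] * 90f * Time.fixedDeltaTime);
    }
}
```

Refer to the repository for the full script, including the collision handling and infection rewards.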
Designing an Agent
In this tutorial we use the same model as dummyBot for the agent, with only a change in color. Create a new dummyBot, remove the dummyBot.cs component, and add PandemicAgent.cs. After that, you should see that a script called Behavior Parameters has been automatically added to your model. We should also add DecisionRequester.cs; this script makes the agent periodically request a decision. If you forget to add it, the agents will not take any actions.
Lastly, add the Ray Perception Sensor 3D component. It creates vector observations that let our agent see the area. We used 16 rays per direction, but you can optimize this to a lower number of rays. With the rays and the other observations we defined in the PandemicAgent script, our RL agent is created and ready for an epidemic outbreak.
Most of the script is the same as in the first two articles, but we added a reset function for the agents at the start of every episode. The spawning of agents is similar to that of the dummyBots, with one difference: we place the agents in the scene before running the simulation, so PandemicArea.cs does not create new agents. Instead, it finds all existing agents and stores them in a single list where we can control them easily.
Here is the updated script: PandemicArea.cs
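The agent-handling additions described above look roughly like this. The field names, the spawn range, and the exact reset logic are assumptions for illustration, not the repository's exact code:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of the agent-related additions to PandemicArea.cs.
public class PandemicArea : MonoBehaviour
{
    public List<PandemicAgent> agents = new List<PandemicAgent>();
    public float range = 20f;   // assumed spawn range of the area

    private void Awake()
    {
        // Agents are placed in the scene beforehand, so instead of
        // instantiating them we collect every agent in this area once.
        agents.AddRange(GetComponentsInChildren<PandemicAgent>());
    }

    public void resetPandemicArea()
    {
        // Respawn every agent at a random position at the start of an episode.
        foreach (PandemicAgent agent in agents)
        {
            agent.transform.position = transform.position +
                new Vector3(Random.Range(-range, range), 1f, Random.Range(-range, range));
            agent.OnEpisodeBegin();
        }
        // ...the bot and reward respawning from the previous articles stays here.
    }
}
```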
Training the environment
We have finished the implementation in Unity. Let's train our first agent using a terminal.
- Open a command or terminal window.
- Navigate to the folder where you cloned the Pandemic_Simulation repository. Note: if you followed the default installation, you should be able to run mlagents-learn from any directory.
- Run the following command:
mlagents-learn config/PPO/trainer_config_MVP.yaml --run-id=firstRun
config/PPO/trainer_config_MVP.yaml is the path to the training configuration file provided in the repository. The config/PPO folder includes training configuration files for many different configurations of the epidemic simulation. run-id is a unique name for this training session.
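The schema of these YAML files depends on your ML-Agents version. For Release 6 a PPO configuration looks roughly like the following; the behavior name and all values below are illustrative placeholders, not the repository's actual settings:

```yaml
behaviors:
  PandemicAgent:           # must match the name in Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-3
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```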
- When the message “Start training by pressing the Play button in the Unity Editor” is displayed on the screen, you can press the Play button in Unity to start training in the Editor.
If mlagents-learn runs correctly and training starts, you should see messages similar to these in the terminal:
Congrats, you just started your first training for the epidemic simulation in Unity 👏 Now, from another terminal, you can inspect the results of your training with TensorBoard.
- Open another command or terminal window.
- Navigate to the folder where your configuration file is. You should see that a new folder called “results” has appeared in the config folder.
- Run the following command:
tensorboard --logdir results
Then navigate to localhost:6006 in your browser to view the TensorBoard summary statistics. For the purposes of this section, the most important statistic is Environment/Cumulative Reward, which should increase throughout training and eventually converge to some value. On your localhost you should see the following graphs.
Fantastic, you have succeeded in creating your own reinforcement learning environment! From now on, you can try to optimize the hyperparameters of PPO and our deep neural network by changing the values in the configuration file. This was the last article of the series “Create a Pandemic Simulation with Unity”. I learned many things while doing this project, both practical and theoretical. I hope I was able to share some of my know-how and make life a little bit easier for anyone who follows this tutorial series. I know the path is tough, but I always remind myself: success comes with hard work. See you next time 👋