Introduction
ETA Ctrl is the rolling horizon optimization module, which combines the functionality of the other modules. It is based on the Farama gymnasium framework and uses algorithms and functions from the stable_baselines3 package. The ETA Ctrl module also contains extensions for stable_baselines3, including additional policies, extractors, schedules and agents.
The module contains functions that simplify the process of creating rolling horizon optimization models. Its EtaCtrl class brings these functions together so that you can start simple optimizations in just a few lines. For example, to start the pendulum example (which is taken from the gymnasium framework):
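from eta_ctrl import EtaCtrl

# root_path (a pathlib.Path) and overwrite (a mapping or None) must be defined beforehand.
experiment = EtaCtrl(
    config_name="config_learning", root_path=root_path, config_overwrite=overwrite, config_relpath="."
)
# Train the agent, then execute the trained agent in the environment.
experiment.learn(series_name="learning_series", run_name="run1", reset=True)
experiment.play(series_name="learning_series", run_name="run1")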
The resulting optimization will have full configuration support, logging, support for multiple series of optimization runs and many other things.
Note
It is not necessary to use the EtaCtrl class to utilize the other tools provided by this module. For example, you can utilize the functionality provided by Experiment configuration and Common Functions to build completely custom optimization scripts while still benefitting from centralized configuration, management of file paths, additional logging features and so on.
The algorithms in ETA Ctrl extend those provided by stable_baselines3. In particular, they include some algorithms that are not from the field of reinforcement learning but can be employed in generalized rolling horizon optimization settings.
The functions available in eta_ctrl.envs make it easy to create new, custom environments (see stable_baselines3 custom environments). For example, they provide functionality for integrating FMU simulation models, communicating with real assets in factories, or integrating Pyomo models as environments.
The EtaCtrl class is built on top of this functionality and orchestrates the interaction between the agent, the environment, and any optional external models. For a detailed description of the control loop, time indexing, scenario data availability, and timing assumptions, see Time Management and Control Flow.
Take a look at the examples folder in the ETA Ctrl repository to see some of the possibilities.
What are series and runs?
ETA Ctrl builds on the concept of experiments. An experiment can be configured to perform optimizations using specific environments and agents. An example of this concept is shown in the figure below.
Example of the ETA Ctrl experiment concept.
As shown in the figure, the process starts with the configuration (setup) file, which is written in JSON format (see Experiment configuration). Based on this configuration, the environment and corresponding agent can be initialized and executed.
An experiment with a single configuration can consist of a series of different optimization runs. Each optimization run could for example have different external conditions for the environment (such as being performed at a different time of the year).
What is an algorithm or agent?
The control algorithm receives inputs from the environment and follows a strategy to control a system. This strategy can be rule-based or determined by mathematical optimization, machine learning (reinforcement learning) or other methods such as metaheuristics.
The agent receives observations from an environment and determines actions to control the environment based on those observations.
What is an environment?
The environment is a dynamic system, which receives inputs (actions) from the control algorithm. Observations made in the environment are passed to the agent.
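The interaction between agent and environment described above can be sketched in a few lines of plain Python. This is only an illustration of the observation/action loop: the classes TankEnvironment and RuleBasedAgent, their parameters and the simple heat balance are hypothetical and are not part of ETA Ctrl, gymnasium or stable_baselines3.

```python
"""Minimal observation/action loop between a rule-based agent and an environment."""


class TankEnvironment:
    """Hypothetical dynamic system: a tank whose temperature drifts towards ambient."""

    def __init__(self, temperature: float = 15.0, ambient: float = 10.0) -> None:
        self.temperature = temperature
        self.ambient = ambient

    def step(self, heating_power: float) -> float:
        """Apply the action (heating power) and return the new observation."""
        losses = 0.1 * (self.temperature - self.ambient)
        self.temperature += heating_power - losses
        return self.temperature


class RuleBasedAgent:
    """Hypothetical rule-based strategy: heat at full power below the setpoint."""

    def __init__(self, setpoint: float = 20.0, max_power: float = 2.0) -> None:
        self.setpoint = setpoint
        self.max_power = max_power

    def predict(self, observation: float) -> float:
        """Map an observation to an action."""
        return self.max_power if observation < self.setpoint else 0.0


def run_episode(n_steps: int = 50) -> list:
    """Run one episode and return the observed temperatures."""
    env = TankEnvironment()
    agent = RuleBasedAgent()
    observation = env.temperature
    history = [observation]
    for _ in range(n_steps):
        action = agent.predict(observation)  # the agent decides based on the observation
        observation = env.step(action)       # the environment reacts to the action
        history.append(observation)
    return history


history = run_episode()
```

A reinforcement learning agent or an optimization-based controller would take the place of RuleBasedAgent in this loop, while the structure of the loop stays the same.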
How to get started
Usually you want to use the EtaCtrl class as shown above to initialize your experiment. This will automatically load a JSON configuration file (see also Experiment configuration). The file to load the configuration from is specified during class instantiation:
- class eta_ctrl.EtaCtrl(config_name: str, root_path: Path | None = None, config_overwrite: Mapping[str, Any] | None = None, config_relpath: Path | None = None)
Initialize an optimization model and provide interfaces for optimization, learning and execution (play).
- Parameters:
config_name – Name of configuration file in configuration directory (should be JSON format).
root_path – Root path of the application (the configuration will be interpreted relative to this).
config_overwrite – Dictionary to overwrite selected configurations.
config_relpath – Path to configuration file, relative to root path.
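To illustrate what overwriting selected configurations can look like, the following sketch merges an overwrite mapping into a loaded configuration dictionary. Whether EtaCtrl merges nested sections recursively in exactly this way is an assumption, and apply_overwrite is a hypothetical helper, not a function of the library.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def apply_overwrite(config: Mapping[str, Any], overwrite: Mapping[str, Any] | None) -> dict[str, Any]:
    """Return a copy of config in which values from overwrite take precedence.

    Nested mappings are merged key by key (assumed behavior), other values are replaced.
    """
    merged = dict(config)
    for key, value in (overwrite or {}).items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = apply_overwrite(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"settings": {"n_episodes_play": 5, "seed": 321}}
overwrite = {"settings": {"n_episodes_play": 1}}
merged = apply_overwrite(base, overwrite)
```

In this sketch only n_episodes_play changes, while seed keeps the value from the base configuration and the base dictionary itself is left untouched.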
After the class is instantiated, you can use the play and learn methods to execute the experiment:
- EtaCtrl.learn(self, *, series_name: str | None = None, run_name: str | None = None, run_description: str = '', reset: bool = False, callbacks: MaybeCallback = None) -> None
Start the learning job for an agent with the specified environment.
- Parameters:
series_name – Name for a series of runs.
run_name – Name for a specific run.
run_description – Description for a specific run.
reset – Whether existing models should be reset. If a model exists and reset is false, learning continues with that model.
callbacks – Provide additional callbacks to send to the model.learn() call.
- EtaCtrl.play(self, *, series_name: str | None = None, run_name: str | None = None, run_description: str = '') -> None
Play with a previously learned agent model in the environment.
- Parameters:
series_name – Name for a series of runs.
run_name – Name for a specific run.
run_description – Description for a specific run.
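The effect of the reset parameter of learn, as described above, can be pictured with a small decision helper. This is a sketch of the documented behavior only: learning_mode and the file name model.zip are hypothetical and do not reflect how EtaCtrl stores or detects models internally.

```python
import tempfile
from pathlib import Path


def learning_mode(model_path: Path, reset: bool) -> str:
    """Return 'continue' if a stored model exists and reset is false, otherwise 'new'."""
    if model_path.exists() and not reset:
        return "continue"
    return "new"


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.zip"  # hypothetical file name
    first = learning_mode(model_file, reset=False)   # no model stored yet
    model_file.touch()
    second = learning_mode(model_file, reset=False)  # model exists, learning continues
    third = learning_mode(model_file, reset=True)    # model exists, but start from scratch
```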
Experiment configuration
The central part of the ETA Ctrl module is the experiment configuration. This configuration can be read from a JSON file and determines the setup of the entire experiment, including which agent and environment to load and how to set each one up. The configuration is defined by the Config dataclass and its subsidiaries ConfigSetup and ConfigSettings.
When you are using the EtaCtrl class, the configuration will be read automatically.
Use Config.from_file() to read the configuration from a JSON, TOML or YAML file:
- Config.from_file(config_name: str, root_path: StrPath | None = None, config_relpath: StrPath | None = None, overwrite: Mapping[str, Any] | None = None) -> Config
Configuration example
The following is the configuration for the pendulum example in this repository. It is relatively minimal in that it makes extensive use of the defaults defined in the Config classes.
{
    "setup": {
        "environment_import": "examples.pendulum.pendulum.PendulumEnv",
        "agent_import": "stable_baselines3.ppo.PPO",
        "vectorizer_import": "stable_baselines3.common.vec_env.SubprocVecEnv",
        "policy_import": "stable_baselines3.ppo.MultiInputPolicy",
        "tensorboard_log": true,
        "monitor_wrapper": true,
        "norm_wrapper_obs": true,
        "norm_wrapper_reward": true
    },
    "settings": {
        "sampling_time": 0.05,
        "episode_duration": 15,
        "save_model_every_x_episodes": 100,
        "n_episodes_learn": 1000,
        "n_episodes_play": 5,
        "n_environments": 4,
        "seed": 321,
        "environment": {
            "max_speed": 8,
            "max_torque": 2.0,
            "g": 10,
            "mass": 1,
            "length": 1
        },
        "agent": {
            "gamma": 0.99,
            "n_steps": 256,
            "ent_coef": 0.01,
            "learning_rate": 0.00020,
            "vf_coef": 0.5,
            "max_grad_norm": 0.5,
            "gae_lambda": 0.95,
            "batch_size": 4,
            "n_epochs": 4,
            "clip_range": 0.2,
            "policy_kwargs": {
                "net_arch": [500, 400, 300]
            }
        }
    }
}
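Before handing such a file to the framework, it can be useful to check it with the standard library alone. The sketch below parses a shortened version of the pendulum configuration and verifies the keys that the JSON schemas of the setup and settings sections list as required. The helper check_required is hypothetical, and deriving the number of samples per episode as episode_duration divided by sampling_time is an assumption based on the field descriptions.

```python
import json

# Required keys as listed in the JSON schemas of ConfigSetup and ConfigSettings.
REQUIRED = {
    "setup": ("agent_import", "environment_import"),
    "settings": ("sampling_time", "episode_duration"),
}


def check_required(config: dict) -> list:
    """Return the required keys (as 'section.key') that are missing from config."""
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if key not in config.get(section, {}):
                missing.append(f"{section}.{key}")
    return missing


config_text = """
{
    "setup": {
        "environment_import": "examples.pendulum.pendulum.PendulumEnv",
        "agent_import": "stable_baselines3.ppo.PPO"
    },
    "settings": {
        "sampling_time": 0.05,
        "episode_duration": 15
    }
}
"""

config = json.loads(config_text)
missing = check_required(config)

# Assumed relationship: one sample every sampling_time seconds for episode_duration seconds.
settings = config["settings"]
samples_per_episode = round(settings["episode_duration"] / settings["sampling_time"])
```

All other options fall back to the defaults defined in the Config classes, which is why the pendulum configuration can stay short.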
Config section ‘setup’
The settings configured in the setup section are the following:
- pydantic model eta_ctrl.config.config_setup.ConfigSetup
Helper class, which is part of Config, for import and setup parameters.
JSON schema:
{ "title": "ConfigSetup", "description": "Helper class, which is part of `Config`, for import and setup parameters.", "type": "object", "properties": { "agent_import": { "description": "Import description string for the agent class.", "title": "Agent Import", "type": "string" }, "environment_import": { "description": "Import description string for the environment class.", "title": "Environment Import", "type": "string" }, "vectorizer_import": { "default": "stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv", "description": "Import description string for the environment vectorizer\n(default: stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv).", "title": "Vectorizer Import", "type": "string" }, "policy_import": { "default": "eta_ctrl.common.NoPolicy", "description": "Import description string for the policy class (default: eta_ctrl.agents.common.NoPolicy).", "title": "Policy Import", "type": "string" }, "monitor_wrapper": { "default": false, "description": "Flag which is true if the environment should be wrapped for monitoring (default: False).", "title": "Monitor Wrapper", "type": "boolean" }, "norm_wrapper_obs": { "default": false, "description": "Flag which is true if the observations should be normalized (default: False).", "title": "Norm Wrapper Obs", "type": "boolean" }, "norm_wrapper_reward": { "default": false, "description": "Flag which is true if the rewards should be normalized (default: False).", "title": "Norm Wrapper Reward", "type": "boolean" }, "tensorboard_log": { "default": false, "description": "Flag to enable tensorboard logging (default: False).", "title": "Tensorboard Log", "type": "boolean" } }, "additionalProperties": false, "required": [ "agent_import", "environment_import" ] }
- field agent_import: str [Required]
Import description string for the agent class.
- Validated by:
_resolve_classes
- field agent_class: type[BaseAlgorithm] [Required]
Agent class (automatically determined from agent_import).
- Validated by:
_resolve_classes
- field environment_import: str [Required]
Import description string for the environment class.
- Validated by:
_resolve_classes
- field environment_class: type[BaseEnv] [Required]
Imported Environment class (automatically determined from environment_import).
- Validated by:
_resolve_classes
- field vectorizer_import: str = 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'
Import description string for the environment vectorizer (default: stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv).
- Validated by:
_resolve_classes
- field vectorizer_class: type[DummyVecEnv | SubprocVecEnv] [Required]
Environment vectorizer class (automatically determined from vectorizer_import).
- Validated by:
_resolve_classes
- field policy_import: str = 'eta_ctrl.common.NoPolicy'
Import description string for the policy class (default: eta_ctrl.agents.common.NoPolicy).
- Validated by:
_resolve_classes
- field policy_class: type[BasePolicy] [Required]
Policy class (automatically determined from policy_import).
- Validated by:
_resolve_classes
- field monitor_wrapper: bool = False
Flag which is true if the environment should be wrapped for monitoring (default: False).
- Validated by:
_resolve_classes
- field norm_wrapper_obs: bool = False
Flag which is true if the observations should be normalized (default: False).
- Validated by:
_resolve_classes
- field norm_wrapper_reward: bool = False
Flag which is true if the rewards should be normalized (default: False).
- Validated by:
_resolve_classes
- field tensorboard_log: bool = False
Flag to enable tensorboard logging (default: False).
- Validated by:
_resolve_classes
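The import description strings above are dotted paths of the form module.ClassName. One common way to turn such a string into a class is importlib, as in the following sketch. The function resolve_import is hypothetical and the actual _resolve_classes validator may work differently; the standard library class collections.OrderedDict is used here only so that the example runs without additional packages.

```python
from importlib import import_module


def resolve_import(import_string: str) -> type:
    """Import and return the class named by a dotted string such as 'package.module.ClassName'."""
    module_name, _, class_name = import_string.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got '{import_string}'.")
    module = import_module(module_name)  # raises ImportError if the module is missing
    return getattr(module, class_name)   # raises AttributeError if the class is missing


# Stand-in for strings like "stable_baselines3.ppo.PPO" from the setup section.
ordered_dict_cls = resolve_import("collections.OrderedDict")
```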
Config section ‘paths’
The optional paths section can contain the following optional relative paths:
- pydantic model eta_ctrl.config.config_paths.ConfigPaths
JSON schema:
{ "title": "ConfigPaths", "type": "object", "properties": { "state_file_relpath": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Relative path to the state_config file (default: [environment_classname]_state_config).\nThe method :py:meth:`~eta_ctrl.envs.StateConfig.from_file` will first try at the root path,\nthen search in the /environment folder if not successful.", "title": "State File Relpath" }, "results_relpath": { "default": "results", "description": "Relative path to the results folder.", "format": "path", "title": "Results Relpath", "type": "string" }, "scenarios_relpath": { "default": "scenarios", "description": "Relative path to the scenarios folder.", "format": "path", "title": "Scenarios Relpath", "type": "string" } }, "additionalProperties": false }
- field state_file_relpath: Path | None = None
Relative path to the state_config file (default: [environment_classname]_state_config). The method from_file() will first try at the root path, then search in the /environment folder if not successful.
- field results_relpath: Path = PosixPath('results')
Relative path to the results folder.
- field scenarios_relpath: Path = PosixPath('scenarios')
Relative path to the scenarios folder.
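The relative paths above are interpreted against the root path of the experiment. The following sketch shows how the defaults could resolve and how the described search order for the state file (root path first, then the environment folder) might look. The helper find_state_file, the assumption that the environment folder lies directly below the root path, and the file name PendulumEnv_state_config.json including its extension are all hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_state_file(root_path: Path, relpath: str) -> Path | None:
    """Look for the state file at the root path first, then in the 'environment' folder."""
    for base in (root_path, root_path / "environment"):
        candidate = base / relpath
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    results_path = root / "results"      # default results_relpath
    scenarios_path = root / "scenarios"  # default scenarios_relpath

    # Place a (hypothetical) state file in the environment folder only.
    (root / "environment").mkdir()
    state_file = root / "environment" / "PendulumEnv_state_config.json"
    state_file.touch()

    found_in_environment = find_state_file(root, "PendulumEnv_state_config.json") == state_file
    not_found = find_state_file(root, "does_not_exist.json")
```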
Config section ‘settings’
The configuration options in the settings section are the following.
Note
The configuration options “environment” and “agent” are separate sections. They are loaded into the settings object as dictionaries. To determine which options are valid for these sections, look at the arguments required to instantiate the agent or environment. These arguments must be specified as parameters in the corresponding section.
- pydantic model eta_ctrl.config.config_settings.ConfigSettings
Helper class, which is part of Config, for settings parameters.
JSON schema:
{ "title": "ConfigSettings", "description": "Helper class, which is part of `Config`, for settings parameters.", "type": "object", "properties": { "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Seed for random sampling (default: None).", "title": "Seed" }, "verbose": { "default": 2, "description": "Logging verbosity of the framework (default: 2).", "maximum": 3, "minimum": 0, "title": "Verbose", "type": "integer" }, "n_environments": { "default": 1, "description": "Number of vectorized environments to instantiate (if not using DummyVecEnv) (default: 1).", "minimum": 1, "title": "N Environments", "type": "integer" }, "n_episodes_play": { "anyOf": [ { "minimum": 1, "type": "integer" }, { "type": "null" } ], "default": 1, "description": "Number of episodes to execute when the agent is playing (default: None).", "title": "N Episodes Play" }, "n_episodes_learn": { "anyOf": [ { "minimum": 1, "type": "integer" }, { "type": "null" } ], "default": 1, "description": "Number of episodes to execute when the agent is learning (default: None).", "title": "N Episodes Learn" }, "save_model_every_x_episodes": { "default": 10, "description": "How often to save the model during training (default: 10 - after every ten episodes).", "minimum": 1, "title": "Save Model Every X Episodes", "type": "integer" }, "plot_interval": { "default": 10, "description": "How many episodes to pass between each render call (default: 10 - after every ten episodes).", "minimum": 1, "title": "Plot Interval", "type": "integer" }, "scenario_time_begin": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Beginning time of the scenario.", "title": "Scenario Time Begin" }, "scenario_time_end": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Ending time of the scenario.", "title": "Scenario Time End" }, "use_random_time_slice": { "default": false, "description": "Boolean flag 
whether to use a random time slice when the difference of\nscenario_time_end and scenario_time_begin is greater than the episode duration (default: False).", "title": "Use Random Time Slice", "type": "boolean" }, "sampling_time": { "description": "Duration between time samples in seconds (can be a float value).", "exclusiveMinimum": 0, "title": "Sampling Time", "type": "number" }, "episode_duration": { "description": "Duration of an episode in seconds (can be a float value).", "exclusiveMinimum": 0, "title": "Episode Duration", "type": "number" }, "prediction_horizon": { "anyOf": [ { "exclusiveMinimum": 0, "type": "number" }, { "type": "null" } ], "default": null, "description": "Total duration of one prediction/optimization run when used with the MPC agent.", "title": "Prediction Horizon" }, "sim_steps_per_sample": { "anyOf": [ { "minimum": 1, "type": "integer" }, { "type": "null" } ], "default": null, "description": "Simulation steps for every sample.", "title": "Sim Steps Per Sample" }, "scale_actions": { "anyOf": [ { "type": "number" }, { "type": "null" } ], "default": null, "description": "Multiplier for scaling the agent actions before passing them to the environment (default: None).", "title": "Scale Actions" }, "round_actions": { "anyOf": [ { "minimum": 1, "type": "integer" }, { "type": "null" } ], "default": null, "description": "Number of digits to round actions to before passing them to the environment (default: None).", "title": "Round Actions" }, "env": { "additionalProperties": true, "description": "Settings dictionary for specifically the environment.", "title": "Env", "type": "object" }, "agent": { "additionalProperties": true, "description": "Settings dictionary for specifically the agent.", "title": "Agent", "type": "object" }, "log_to_file": { "default": true, "description": "Flag which is true if the log output should be written to a file (default: True).", "title": "Log To File", "type": "boolean" }, "scenario_files": { "anyOf": [ { "items": { 
"$ref": "#/$defs/ConfigCsvScenario" }, "type": "array" }, { "type": "null" } ], "default": null, "title": "Scenario Files" } }, "$defs": { "ConfigCsvScenario": { "additionalProperties": false, "properties": { "path": { "description": "Relative path to the scenario.", "title": "Path", "type": "string" }, "interpolation_method": { "anyOf": [ { "enum": [ "ffill", "bfill", "interpolate", "asfreq" ], "type": "string" }, { "type": "null" } ], "default": null, "description": "Pandas method to use for filling missing data [\"ffill\", \"bfill\", \"interpolate\", \"asfreq\"].", "title": "Interpolation Method" }, "scale_factors": { "anyOf": [ { "additionalProperties": { "type": "number" }, "type": "object" }, { "type": "null" } ], "default": null, "description": "Scale factors for each column.", "title": "Scale Factors" }, "prefix": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Prefix for all column names.", "title": "Prefix" }, "infer_datetime_cols": { "anyOf": [ { "enum": [ "string", "dates" ], "type": "string" }, { "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "integer" }, { "type": "integer" } ], "type": "array" } ], "default": "dates", "description": "Methof of datetime parsing", "title": "Infer Datetime Cols" }, "time_conversion_str": { "default": "%Y-%m-%d %H:%M", "description": "Time conversion string used when ``infer_datetime_cols`` is set to 'string'.\n\nShould specify the format for Python ``strptime``.", "title": "Time Conversion Str", "type": "string" }, "rename_cols": { "anyOf": [ { "additionalProperties": { "type": "string" }, "type": "object" }, { "type": "null" } ], "default": null, "description": "Dictionary for renaming column names.\n\n.. note::\n\n The column names are stripped of illegal characters and underscores are added in place of spaces.\n \"Water Temperature #2 [\u00b0C]\" becomes \"Water_Temperature_2_C\". 
If you want to rename the column,\n you need to specify the processed name, for example: {\"Water_Temperature_2_C\": \"T_W\"}.", "title": "Rename Cols" }, "scenarios_path": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Directory for the scenarios. Not included in config declaration, passed by main Config object.", "title": "Scenarios Path" } }, "required": [ "path" ], "title": "ConfigCsvScenario", "type": "object" } }, "additionalProperties": true, "required": [ "sampling_time", "episode_duration" ] }
- field seed: int | None = None
Seed for random sampling (default: None).
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field verbose: int = 2
Logging verbosity of the framework (default: 2).
- Constraints:
ge = 0
le = 3
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field n_environments: int = 1
Number of vectorized environments to instantiate (if not using DummyVecEnv) (default: 1).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field n_episodes_play: int | None = 1
Number of episodes to execute when the agent is playing (default: 1).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field n_episodes_learn: int | None = 1
Number of episodes to execute when the agent is learning (default: 1).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field save_model_every_x_episodes: int = 10
How often to save the model during training (default: 10 - after every ten episodes).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field plot_interval: int = 10
How many episodes to pass between each render call (default: 10 - after every ten episodes).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field scenario_time_begin: datetime | None = None
Beginning time of the scenario.
- Validated by:
_check_duplicate_aliases, _convert_datetimes, _validate_time_params
- field scenario_time_end: datetime | None = None
Ending time of the scenario.
- Validated by:
_check_duplicate_aliases, _convert_datetimes, _validate_time_params
- field use_random_time_slice: bool = False
Boolean flag whether to use a random time slice when the difference of scenario_time_end and scenario_time_begin is greater than the episode duration (default: False).
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field sampling_time: float [Required]
Duration between time samples in seconds (can be a float value).
- Constraints:
gt = 0
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field episode_duration: float [Required]
Duration of an episode in seconds (can be a float value).
- Constraints:
gt = 0
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field prediction_horizon: float | None = None
Total duration of one prediction/optimization run when used with the MPC agent.
- Constraints:
gt = 0
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field sim_steps_per_sample: int | None = None
Simulation steps for every sample.
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field scale_actions: float | None = None
Multiplier for scaling the agent actions before passing them to the environment (default: None).
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field round_actions: int | None = None
Number of digits to round actions to before passing them to the environment (default: None).
- Constraints:
ge = 1
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field environment: dict[str, Any] [Optional]
Settings dictionary for specifically the environment.
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field agent: dict[str, Any] [Optional]
Settings dictionary for specifically the agent.
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field log_to_file: bool = True
Flag which is true if the log output should be written to a file (default: True).
- Validated by:
_check_duplicate_aliases, _validate_time_params
- field scenario_files: list[ConfigCsvScenario] | None = None
- Validated by:
_check_duplicate_aliases, _validate_time_params
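Two of the settings above describe small computations: use_random_time_slice selects an episode window inside a longer scenario, and scale_actions and round_actions post-process the agent actions. The sketch below illustrates one possible reading of these descriptions. The functions episode_start and process_action are hypothetical, and the uniform draw, the error for scenarios that are too short and the order "scale first, then round" are assumptions rather than documented behavior.

```python
from __future__ import annotations

import random
from datetime import datetime, timedelta


def episode_start(
    begin: datetime,
    end: datetime,
    episode_duration: float,
    use_random_time_slice: bool,
    rng: random.Random,
) -> datetime:
    """Pick the start of an episode within the scenario window [begin, end]."""
    slack = (end - begin).total_seconds() - episode_duration
    if slack < 0:
        raise ValueError("The scenario window is shorter than the episode duration.")
    if not use_random_time_slice or slack == 0:
        return begin
    # Assumed: the start is drawn uniformly so that the episode still fits the window.
    return begin + timedelta(seconds=rng.uniform(0, slack))


def process_action(action: float, scale_actions: float | None = None, round_actions: int | None = None) -> float:
    """Scale the action, then round it to the given number of digits (assumed order)."""
    if scale_actions is not None:
        action = action * scale_actions
    if round_actions is not None:
        action = round(action, round_actions)
    return action


begin = datetime(2017, 1, 24, 0, 0)
end = datetime(2017, 1, 25, 0, 0)
rng = random.Random(321)  # seeded for reproducibility, like the seed setting

random_start = episode_start(begin, end, 3600.0, True, rng)
fixed_start = episode_start(begin, end, 3600.0, False, rng)
scaled = process_action(0.12345, scale_actions=10.0, round_actions=2)
```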
Configuration for optimization runs
An optimization run must also be configured. This is done through the ConfigRun class, which
is initialized with the series name and the run name. It provides facilities to
create the paths required for optimization and to store information about the environments.
Below, you can see the parameters that ConfigRun offers. Full documentation is in the API
docs: eta_ctrl.config.ConfigRun.
Note
EtaCtrl instantiates an object of this class automatically from the JSON configuration file. You do not need to specify any of the parameters listed here. They are listed here to show what is available for use during the optimization run.
- class eta_ctrl.config.ConfigRun(*, series: str, name: str, description, root_path, results_path, scenarios_path)
Configuration for an optimization run, including the series and run names, descriptions, and paths for the run.
- series: str
Name of the series of optimization runs.
- name: str
Name of an optimization run.
- description: str
Description of an optimization run.
- root_path: Path
Root path of the framework run.
- results_path: Path
Path to results of the optimization run.
- scenarios_path: Path
Path to scenarios used for the optimization run.
- series_results_path: Path
Path for the results of the series of optimization runs.
- run_model_path: Path
Path to the model of the optimization run.
- run_info_path: Path
Path to information about the optimization run.
- run_monitor_path: Path
Path to the monitoring information about the optimization run.
- vec_normalize_path: Path
Path to the normalization wrapper information.
- net_arch_path: Path
Path to the neural network architecture file.
- log_output_path: Path
Path to the log output file.
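To give an idea of how series and run names can translate into a directory structure, the following sketch derives a few of the paths listed above from a results folder. RunPaths is a hypothetical stand-in for ConfigRun, and the concrete layout and file names (a folder per series, files prefixed with the run name) are assumptions for illustration only; consult the API documentation for the actual values.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunPaths:
    """Hypothetical stand-in for the path attributes of ConfigRun."""

    series: str
    name: str
    results_path: Path

    @property
    def series_results_path(self) -> Path:
        # Assumed layout: one folder per series below the results folder.
        return self.results_path / self.series

    @property
    def run_model_path(self) -> Path:
        # Assumed file name: run name followed by a fixed suffix.
        return self.series_results_path / f"{self.name}_model.zip"

    @property
    def log_output_path(self) -> Path:
        # Assumed file name, analogous to the model path.
        return self.series_results_path / f"{self.name}_log_output.log"


paths = RunPaths(series="learning_series", name="run1", results_path=Path("results"))
model_path = paths.run_model_path
```

Keeping all files of a series in one folder and prefixing them with the run name is one way to let several runs of the same experiment coexist without overwriting each other.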