Environments
ETA Ctrl environments are based on the interfaces offered by stable_baselines3 which are in turn based on the Farama gymnasium environments. The ETA Ctrl environments are provided as abstract classes which must be subclassed to create useful implementations. For the specific use cases they are intended for, these base classes make the creation of new environments much easier.
Custom environments should follow the interface for custom environments discussed in the stable_baselines3 documentation. The following describes the functions available to simplify implementation of specific functionality in custom environments. See the Usage examples for inspiration on what custom environments can look like.
For simulation environments using FMU files, see the FMU Workflow documentation for a streamlined approach to initially create FMU-based environments.
The custom environments created with the utilities described here can be used directly with stable_baselines3 or
gymnasium. However, using the EtaCtrl class is recommended (see Introduction).
When using the EtaCtrl class for your optimization runs, the parameters required for environment instantiation must
be configured in the environment_specific section of the configuration.
Environment State Configuration
The most important concept to understand when working with the environment utilities provided by ETA Ctrl
is the handling and configuration of the environment state. The state is represented by an
eta_ctrl.envs.StateConfig object. Each StateConfig contains eta_ctrl.envs.StateVar objects, each of which
corresponds to one variable of the environment. From the StateConfig object we can
determine most other aspects of the environment, such as the observation space and action space. The
gymnasium documentation provides more information about Spaces.
ETA Ctrl supports simple definition of StateConfigs in .toml, .yaml, or .json files. By default a config file is expected in the same folder as the environment with the name environment_class*_state_config.*suffix. See the examples for details.
A minimal state TOML structure might look like this:
[[actions]]
name = "heater_power"
low_value = 0.0
high_value = 1.0
is_agent_action = true
ext_id = "heater_u"
[[observations]]
name = "room_temp"
low_value = -50.0
high_value = 80.0
is_agent_observation = true
ext_id = "temp"
Each state variable is represented by a StateVar object:
- class eta_ctrl.envs.StateVar(*, name: str, is_agent_action: bool = False, is_agent_observation: bool = False, add_to_state_log: bool = True, ext_id: str | None = None, is_ext_input: bool = False, is_ext_output: bool = False, ext_scale_add: float = 0.0, ext_scale_mult: float = 1.0, scenario_id: str | None = None, from_scenario: bool = False, scenario_scale_add: float = 0.0, scenario_scale_mult: float = 1.0, low_value: float = -3.4028234663852886e+38, high_value: float = 3.4028234663852886e+38, abort_condition_min: float = -inf, abort_condition_max: float = inf, index: int = 0, duration: int = 1)[source]
A variable in the state of an environment.
For example, the variable “tank_temperature” might be part of the environment’s state. Let’s assume it represents the temperature inside the tank of a cleaning machine. This variable could be read from an external source. In this case it must have is_ext_output = True and the name of the external variable to read from must be specified: ext_id = "T_Tank". If this value should also be passed to the agent as an observation, set is_agent_observation = True. For observations and actions, you also need to set the low and high values, which determine the size of the observation and action spaces. In this case, something like low_value = 20 and high_value = 80 (if we are talking about water temperature measured in Celsius) might make sense.
If you want the environment to safely abort the optimization when certain values are exceeded, set the abort conditions to sensible values such as abort_condition_min = 0 and abort_condition_max = 100. This can be especially useful if you have simulation models which do not support certain values (for example, in this case they might not be able to handle water temperatures higher than 100 °C). Note that all arguments, including name, are keyword-only:
v1 = StateVar(
    name = "tank_temperature",
    ext_id = "T_Tank",
    is_ext_output = True,
    is_agent_observation = True,
    low_value = 20,
    high_value = 80,
    abort_condition_min = 0,
    abort_condition_max = 100,
)
As another example, you could set up an agent action with name = "set_heater" which the environment uses to set the state of the tank heater. In this case, the state variable should be configured with is_agent_action = True and you might want to pass this on to a simulation model or an actual machine by setting is_ext_input = True:
v2 = StateVar(
    name = "set_heater",
    ext_id = "u_tank",
    is_ext_input = True,
    is_agent_action = True,
)
Finally, let’s create a third variable which is read from a scenario file and converted from kilowatts to watts (multiplied by 1000). Additionally, this variable needs to be offset by a value of -10 due to measurement errors:
v3 = StateVar(
    name = "outside_temperature",
    scenario_id = "T_ouside",
    scenario_scale_add = -10,
    scenario_scale_mult = 1000,
    is_agent_observation = True,
    low_value = 0,
    high_value = 40,
)
- model_config = {'extra': 'forbid', 'frozen': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: str
Name of the state variable (This must always be specified).
- is_agent_action: bool
Should the agent specify actions for this variable? (default: False).
- is_agent_observation: bool
Should the agent be allowed to observe the value of this variable? (default: False).
- add_to_state_log: bool
Should the state log of this episode be added to state_log_longtime? (default: True).
- ext_id: str | None
Name of the variable in the external model (e.g.: environment or FMU) (default: StateVar.name if (is_ext_input or is_ext_output) else None).
- is_ext_input: bool
Should this variable be passed to the external model as an input? (default: False).
- is_ext_output: bool
Should this variable be parsed from the external model output? (default: False).
- ext_scale_add: float
Value to add to the output from an external model (default: 0.0).
- ext_scale_mult: float
Factor by which the output from an external model is multiplied (default: 1.0).
- scenario_id: str | None
Name of the scenario variable this value should be read from (default: None).
- from_scenario: bool
Should this variable be read from imported timeseries data? (default: False).
- scenario_scale_add: float
Value to add to the value read from a scenario file (default: 0.0).
- scenario_scale_mult: float
Factor by which the value read from a scenario file is multiplied (default: 1.0).
- low_value: float
Lowest possible value of the state variable (default: -np.finfo(np.float32).max).
- high_value: float
Highest possible value of the state variable (default: np.finfo(np.float32).max).
- abort_condition_min: float
If the value of the variable dips below this, the episode should be aborted (default: -np.inf).
- abort_condition_max: float
If the value of the variable rises above this, the episode should be aborted (default: np.inf).
- index: int
Determine the index, where to look (useful for mathematical optimization, where multiple time steps could be returned). In this case, the index values might be different for actions and observations.
- duration: int
For scenario StateVars: Length of StateVars horizon in state, e.g. the prediction horizon length (unit: steps).
All state variables are combined into the StateConfig object:
- class eta_ctrl.envs.StateConfig(*state_vars: StateVar, source_file: Path | None = None)[source]
The configuration for the action and observation spaces. The values are used to control which variables are part of the action space and observation space. Therefore, the StateConfig is very important for the functionality of EtaCtrl.
Using the examples above, we could create the StateConfig object by passing our three state variables to the constructor:
state_config = StateConfig(v1, v2, v3)
If you are creating an environment, assign the StateConfig object to self.state_config. This will sometimes even be sufficient to create a fully functional environment.
- vars
Mapping of the variables names to their StateVar instance with all associated information.
- df_vars: pandas.DataFrame
- ext_inputs: list[str]
List of variables that should be provided to an external source (such as an FMU).
- ext_outputs: list[str]
List of variables that can be received from an external source (such as an FMU).
- classmethod from_file(root_path: pathlib.Path, filename: Path, extra_params: Mapping[str, float] | None = None) Self[source]
Load a StateConfig from a config file.
- Parameters:
file – Path of the config file.
- Returns:
StateConfig object.
- store_file(file: Path) None[source]
Save the StateConfig to a comma separated file.
- Parameters:
file – Path to the file.
- within_abort_conditions(state: Mapping[str, float]) bool[source]
Check whether the given state is within the abort conditions specified by the StateConfig instance.
- Parameters:
state – The state array to check for conformance.
- Returns:
Result of the check (False if the state does not conform to the required conditions).
The state config object and its attributes (such as the observations) are used by the environments to determine which values to update during steps, which values to read from scenario files, which values to accept from the agent as actions and which values to pass to the agent as observations.
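As a rough illustration of what ext_inputs, ext_outputs and within_abort_conditions() provide, the sketch below reimplements the idea on plain dictionaries. The table STATE_VARS and both helper functions are hypothetical and do not reflect the actual implementation. The check assumes inclusive limits and skips variables that are missing from the state.

```python
import math

# Hypothetical, simplified variable table: name -> settings.
STATE_VARS = {
    "tank_temperature": {
        "is_ext_output": True,
        "abort_condition_min": 0.0,
        "abort_condition_max": 100.0,
    },
    "set_heater": {"is_ext_input": True},
}


def ext_names(state_vars, flag):
    """Names of all variables for which the given boolean flag is set."""
    return [name for name, cfg in state_vars.items() if cfg.get(flag, False)]


def within_abort_conditions(state_vars, state):
    """Return False as soon as one value leaves its [min, max] interval."""
    for name, cfg in state_vars.items():
        if name not in state:
            continue  # assumption: variables missing from the state are skipped
        low = cfg.get("abort_condition_min", -math.inf)
        high = cfg.get("abort_condition_max", math.inf)
        if not low <= state[name] <= high:
            return False
    return True


print(ext_names(STATE_VARS, "is_ext_input"))
print(ext_names(STATE_VARS, "is_ext_output"))
print(within_abort_conditions(STATE_VARS, {"tank_temperature": 60.0}))
print(within_abort_conditions(STATE_VARS, {"tank_temperature": 120.0}))
```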
Base Environment
- class eta_ctrl.envs.BaseEnv(env_id: int, config_run: ConfigRun, state_config: StateConfig, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, seed: int | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, scenario_manager: ScenarioManager | None = None, render_mode: str | None = None, path_env: Path | None = None, **kwargs: Any)[source]
Abstract environment definition, providing some basic functionality for concrete environments to use.
The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA Ctrl framework and should be used as the starting point for new environments.
The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.
There are some class attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required class attributes are:
version: Version number of the environment.
description: Short description string of the environment.
The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.
step()
reset()
close()
render()
Note
Subclasses should implement the private _step and _reset methods rather than overriding the public step and reset methods. The public methods handle the Gymnasium interface and state management automatically.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – callback that should be called after each episode.
state_modification_callback – callback that should be called after state setup, before logging the state.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Render the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
path_env – Explicit path to the environment directory. If not provided, the path will be automatically detected from the call stack. If detection fails, falls back to current working directory.
kwargs – Other keyword arguments (for subclasses).
- abstract property version: str
Version of the environment.
Needs to be implemented for each subclass as a class attribute.
- abstract property description: str
Long description of the environment.
Needs to be implemented for each subclass as a class attribute.
- verbose: int
Verbosity level used for logging.
- config_run: ConfigRun
Information about the optimization run and information about the paths. For example, it defines results_path and scenarios_path.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- state_config: StateConfig
State Configuration for defining State Variables.
- action_space
- observation_space
- scenario_manager
Manager to load scenario data into the state
- property run_name: str
- property results_path: Path
- property series_results_path: Path
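The relationship between episode_duration, sampling_time, n_episode_steps and sim_steps_per_sample can be made concrete with a little arithmetic. The helper below is a hypothetical sketch, not the framework's code. It assumes integer durations in seconds and that the number of episode steps is rounded down.

```python
def episode_timing(episode_duration, sampling_time, sim_steps_per_sample=1):
    """Return (n_episode_steps, width of one simulation step in seconds).

    Hypothetical helper; the rounding behaviour is an assumption.
    """
    if sampling_time % sim_steps_per_sample != 0:
        raise ValueError("sim_steps_per_sample must be a divisor of sampling_time")
    n_episode_steps = int(episode_duration // sampling_time)
    sim_step = sampling_time / sim_steps_per_sample
    return n_episode_steps, sim_step


# A one-hour episode sampled every 60 s gives 60 steps;
# with 4 simulation steps per sample, each simulation step covers 15 s.
n_steps, sim_step = episode_timing(3600, 60, 4)
print(n_steps, sim_step)
```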
- abstractmethod _step() tuple[float, bool, bool, dict][source]
Abstract method to perform one internal time step.
This private method must be implemented by subclasses to update the internal state dictionary and return step results. It should work with the internal state rather than returning observations directly.
- Returns:
Tuple of (reward, terminated, truncated, info)
- step(action: np.ndarray) StepResult[source]
Proceed one time step and return the reward for the action provided as well as the new observation.
This method handles the public interface for the step operation. It validates actions, executes actions by calling the private _step method implemented by subclasses, increments n_steps, manages state updates, and returns the formatted results (reward of the previous action taken, new environment state).
It also updates the state log and calls the state modification callback.
- Parameters:
action – Actions taken by the agent.
- Returns:
The return value represents the state of the environment after the step was performed:
- observations: A dictionary with new observation values as defined by the observation space, automatically extracted from the internal state.
- reward: The value of the reward function. This is just one floating point value.
- terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
- truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
- info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
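The division of labour between the public step() and the private _step() described above can be sketched with a small toy class. SketchEnv and HeaterEnv are hypothetical stand-ins and not the real BaseEnv. The exact order of the bookkeeping operations is an assumption, and the state modification callback is omitted.

```python
from abc import ABC, abstractmethod


class SketchEnv(ABC):
    """Hypothetical, heavily simplified stand-in for the step()/_step() split."""

    action_names = ("set_heater",)
    observation_names = ("tank_temperature",)

    def __init__(self):
        self.state = {"set_heater": 0.0, "tank_temperature": 20.0}
        self.state_log = []
        self.n_steps = 0

    @abstractmethod
    def _step(self):
        """Update self.state and return (reward, terminated, truncated, info)."""

    def step(self, action):
        if len(action) != len(self.action_names):
            raise ValueError("action has the wrong length")
        # Write the agent's actions into the state before the transition.
        self.state.update(zip(self.action_names, action))
        reward, terminated, truncated, info = self._step()
        self.n_steps += 1
        self.state_log.append(dict(self.state))
        # Only variables marked as observations are handed back to the agent.
        observations = {name: self.state[name] for name in self.observation_names}
        return observations, reward, terminated, truncated, info


class HeaterEnv(SketchEnv):
    def _step(self):
        # Toy dynamics: the heater adds up to 5 K per step.
        self.state["tank_temperature"] += 5.0 * self.state["set_heater"]
        reward = -abs(self.state["tank_temperature"] - 60.0)
        return reward, False, False, {}


env = HeaterEnv()
obs, reward, terminated, truncated, info = env.step([1.0])
print(obs, reward)
```

The subclass only describes the transition and the reward; filtering observations and logging stay in one place in the base class.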
- abstractmethod _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) dict[str, Any][source]
Abstract reset method that must be implemented by subclasses.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]][source]
Reset the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter.
- Parameters:
seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
- abstractmethod close() None[source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.
- abstractmethod render() None[source]
Render the environment.
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- classmethod get_info() tuple[str, str][source]
Get info about environment.
- Returns:
Tuple of version and description.
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None[source]
Extension of csv_export to include timeseries on the data.
- Parameters:
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
- get_external_inputs() dict[str, int | float | bool | str][source]
Gather external inputs from the state. Uses scalar values instead of numpy arrays for values.
- set_external_outputs(external_outputs: Mapping[str, int | float | bool | str]) None[source]
Set external outputs in the state. Accepts scalars instead of numpy arrays as values.
- set_scenario_state(reset: bool = False) None[source]
Set scenario output values for the current timestep in the state.
- Parameters:
reset – Indicator whether this was called from the reset method
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed: int
Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
int: the seed of the current np_random or -1, if the seed of the rng is unknown
- set_wrapper_attr(name: str, value: Any, *, force: bool = True) bool
Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance
Pyomo Simulation Environment
PyomoSimEnv is a class for representing environments with the Pyomo modelling language.
- class eta_ctrl.envs.PyomoSimEnv(*args: Any, **kwargs: Any)[source]
Gymnasium environment that simulates state transitions using a Pyomo model without a solver.
Instead of optimizing over a full prediction horizon, PyomoSimEnv instantiates the model with prediction_horizon = sampling_time (i.e. one time step) and evaluates Pyomo Expression components to compute the next state.
The model must define a start_value_mapping that maps initial-condition Param names to their corresponding Expression names. Each step, the environment:
Fixes agent actions in the model at t=0.
Evaluates the mapped Expressions at t=1 to obtain the next state.
Updates the initial-condition Params via pyo_update_params() for the following step.
This allows reusing the same Pyomo model definition for both MPC optimization (with MpcAgent) and step-by-step simulation.
- Parameters:
args – Positional arguments forwarded to BaseEnv.
kwargs – Keyword arguments forwarded to BaseEnv. May include model_parameters (dict) which is extracted and passed to the model constructor.
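The three-part stepping scheme described above (fix actions, evaluate the mapped expressions, feed the results back as initial conditions) can be illustrated without Pyomo. Everything in this sketch is a hypothetical stand-in: the dictionary params plays the role of the initial-condition Params, evaluate_expressions replaces the evaluation of Pyomo Expressions at t=1, and the final params.update() mimics what pyo_update_params() is described as doing.

```python
# Hypothetical one-step "model": an initial-condition parameter and a mapping
# from Param names to the Expression names that provide their next value.
params = {"T_init": 20.0}
start_value_mapping = {"T_init": "T_next"}


def evaluate_expressions(params, actions):
    """Stand-in for evaluating the model's Expressions at t=1."""
    # Toy dynamics: heating adds 5 K per step, losses remove 1 K per step.
    return {"T_next": params["T_init"] + 5.0 * actions["set_heater"] - 1.0}


def sim_step(params, actions):
    """Advance the toy model by one time step and return the new start values."""
    # The actions are treated as fixed at t=0 while the expressions are evaluated.
    expressions = evaluate_expressions(params, actions)
    next_state = {
        param: expressions[expr] for param, expr in start_value_mapping.items()
    }
    # The evaluated values become the initial conditions of the following step.
    params.update(next_state)
    return next_state


first = sim_step(params, {"set_heater": 1.0})
second = sim_step(params, {"set_heater": 0.0})
print(first, second)
```

No solver is involved at any point, which is the reason the same model definition can serve both simulation and MPC optimization.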
- abstract property model_import: str
Dotted import path to the PyomoModel subclass.
- _step() tuple[float, bool, bool, dict][source]
Perform one internal time step and return core step results.
This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).
- Returns:
A tuple containing:
- reward: The value of the reward function. This is just one floating point value.
- terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
- truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
- info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
Note
Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
- _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) dict[str, Any][source]
Reset the internal state of the environment and return info dictionary.
This private method initializes the internal self.state dictionary by reading initial parameter values from the PyomoModel. It does not use the seed parameter since the initial state is determined by the user configuration.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.
- Parameters:
seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.
Note
The base implementation initializes observations from the pyomo model without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
- close() None[source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.
- static create_state(model: pyo.ConcreteModel, model_name: str, output_dir: pathlib.Path | str | None = None) None[source]
Create both state config and parameters files from a Pyomo model.
This method creates both a state configuration TOML file (containing variables/observations) and a parameters TOML file from a Pyomo ConcreteModel, providing a complete setup for Pyomo-based environments.
- Parameters:
model – Pyomo ConcreteModel instance.
model_name – Name of the model for identification.
output_dir – Directory where files should be created. If None, uses current working directory.
Simulation (FMU) Environment
The SimEnv supports the control of environments represented as FMU simulation models. Make sure to set the fmu_name attribute when subclassing this environment. The FMU file will be loaded from the same directory as the environment itself.
- class eta_ctrl.envs.SimEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]
Base class for FMU simulation model environments.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – callback which should be called after each episode.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
sim_steps_per_sample – Number of simulation steps to perform during every sample.
render_mode – Render the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- abstract property fmu_name: str
Name of the FMU file.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- path_fmu: pathlib.Path
The FMU is expected to be placed in the same folder as the environment
- model_parameters: Mapping[str, int | float] | None
Configuration of the FMU model parameters that need to be set to initialize the model.
- simulator: FMUSimulator
Instance of the FMU. This can be used to directly access the eta_ctrl.FMUSimulator interface.
- simulate() tuple[bool, float][source]
Perform a simulator step.
Updates the state with new external outputs from the simulation results.
- Returns:
Boolean showing whether all simulation steps were successful and time elapsed during simulation.
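A possible shape for such a simulation step is sketched below: the sample is split into sim_steps_per_sample sub-steps, the outputs are written back into the state, and the function reports success together with the elapsed wall-clock time. ToySimulator and its advance() method are hypothetical; the real FMUSimulator interface may look different, and treating any exception as a failed step is an assumption.

```python
import time


class ToySimulator:
    """Hypothetical simulator; the real FMUSimulator interface may differ."""

    def __init__(self):
        self.time = 0.0
        self.temperature = 20.0

    def advance(self, inputs, step_size):
        # Toy dynamics: the heater input raises the temperature linearly.
        self.temperature += 0.01 * step_size * inputs["u_tank"]
        self.time += step_size
        return {"T_Tank": self.temperature}


def simulate(simulator, state, sampling_time, sim_steps_per_sample):
    """Run all sub-steps of one sample; return (success, elapsed seconds)."""
    step_size = sampling_time / sim_steps_per_sample
    start = time.perf_counter()
    success = True
    for _ in range(sim_steps_per_sample):
        try:
            outputs = simulator.advance({"u_tank": state["set_heater"]}, step_size)
        except Exception:  # assumption: any failure marks the step as unsuccessful
            success = False
            break
        # Map the external output back onto the state variable.
        state["tank_temperature"] = outputs["T_Tank"]
    return success, time.perf_counter() - start


sim = ToySimulator()
state = {"set_heater": 1.0, "tank_temperature": 20.0}
ok, elapsed = simulate(sim, state, sampling_time=60, sim_steps_per_sample=4)
print(ok, state["tank_temperature"])
```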
- _step() tuple[float, bool, bool, dict][source]
Perform one internal time step and return core step results.
This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).
Note
This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to work with modified actions (e.g., discretized or shaped actions), ensure they are processed before reaching this method or handle them within this method using the values in self.state. If you need to manipulate observations afterwards, you can do this using the state modification callback.
- Returns:
A tuple containing:
- reward: The value of the reward function. This is just one floating point value.
- terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
- truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
- info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
Note
Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
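Since the inherited _step() always returns a reward of 0, a subclass used for reinforcement learning has to supply its own reward. One way to do this is to call the parent implementation and replace only the reward, as sketched below. ZeroRewardEnv is a hypothetical stand-in for the base class, and the target temperature and the deviation entry in info are purely illustrative.

```python
class ZeroRewardEnv:
    """Hypothetical stand-in for a base class whose _step() returns 0 reward."""

    def __init__(self):
        self.state = {"tank_temperature": 55.0}

    def _step(self):
        # The real base class would run the simulation and update self.state here.
        return 0.0, False, False, {}


class RewardedEnv(ZeroRewardEnv):
    target = 60.0  # illustrative setpoint in °C

    def _step(self):
        # Keep the parent's transition and flags, replace only the reward.
        _, terminated, truncated, info = super()._step()
        reward = -abs(self.state["tank_temperature"] - self.target)
        info = {**info, "deviation": -reward}
        return reward, terminated, truncated, info


result = RewardedEnv()._step()
print(result)
```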
- _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) dict[str, Any][source]
Reset the internal state of the environment and return info dictionary.
This private method initializes the internal self.state dictionary by reading initial values directly from the FMU/simulator. It does not use the seed parameter since the initial state is determined by the simulator configuration.
The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.
- Parameters:
seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.
Note
The base implementation initializes external outputs from the FMU without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
Live Connection Environment
The LiveEnv is an environment which creates direct (live) connections to actual devices. It utilizes
eta_nexus.ConnectionManager to achieve this. Please also read the corresponding documentation
because ConnectionManager needs additional configuration.
- class eta_ctrl.envs.LiveEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, max_errors: int = 10, render_mode: str | None = None, **kwargs: Any)[source]
Base class for Live environments.
The class will create an ETA Nexus ConnectionManager instance and provide facilities to automatically read step results and reset the connection.
In addition to the class attributes required by BaseEnv, LiveEnv requires the name of the connection manager configuration file as a class attribute:
config_name: Name of the connection manager configuration.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – callback which should be called after each episode.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
max_errors – Maximum number of connection errors before interrupting the optimization process.
render_mode – Render the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- abstract property config_name: str
Name of the connection manager configuration.
Needs to be implemented for each subclass as a class attribute.
- connection_manager: ConnectionManager
Instance of the Live Connector.
- connection_manager_config: Path | Sequence[Path] | dict[str, Any] | None
Path or Dict to initialize the live connector.
- max_error_count: int
Maximum error count when connections in live connector are aborted.
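The idea of an error budget for live connections can be sketched as follows. This is not the LiveEnv implementation: the function read_with_error_budget, the use of ConnectionError and the choice to count errors cumulatively (rather than consecutively) are all assumptions made for illustration.

```python
def read_with_error_budget(read, n_reads, max_errors):
    """Call read() n_reads times, tolerating at most max_errors failures.

    Hypothetical helper; errors are counted cumulatively in this sketch.
    """
    errors = 0
    values = []
    for _ in range(n_reads):
        try:
            values.append(read())
        except ConnectionError as err:
            errors += 1
            if errors > max_errors:
                raise RuntimeError("too many connection errors") from err
    return values, errors


# Simulated device: the second read fails, the others succeed.
outcomes = iter([1.0, ConnectionError("connection lost"), 2.0])


def flaky_read():
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


values, errors = read_with_error_budget(flaky_read, n_reads=3, max_errors=10)
print(values, errors)
```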
- _step() tuple[float, bool, bool, dict][source]
Perform one internal time step and return core step results.
This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment, which are available in the state dictionary.
This also updates self.state and self.state_log to store current state information.
Note
This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.
- Returns:
A tuple containing:
- reward: The value of the reward function. This is just one floating point value.
- terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
- truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
- info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
Note
Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
- _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) dict[str, Any][source]
Reset the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. Otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.
- Parameters:
seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.
Note
The base implementation initializes external outputs from the live connector without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.