Environments

ETA Ctrl environments are based on the interfaces offered by stable_baselines3 which are in turn based on the Farama gymnasium environments. The ETA Ctrl environments are provided as abstract classes which must be subclassed to create useful implementations. For the specific use cases they are intended for, these base classes make the creation of new environments much easier.

Custom environments should follow the interface for custom environments discussed in the stable_baselines3 documentation. The following describes the functions available to simplify implementation of specific functionality in custom environments. You can look at the Usage examples for inspiration on what custom environments can look like.

For simulation environments using FMU files, see the FMU Workflow documentation for a streamlined approach to initially create FMU-based environments.

The custom environments created with the utilities described here can be used directly with stable_baselines3 or gymnasium. However, using the EtaCtrl class is recommended (see Introduction). When using the EtaCtrl class for your optimization runs, the parameters required for environment instantiation must be configured in the environment_specific section of the configuration.

Environment State Configuration

The most important concept to understand when working with the environment utilities provided by ETA Ctrl is the handling and configuration of the environment state. The state is represented by an eta_ctrl.envs.StateConfig object. Each StateConfig contains eta_ctrl.envs.StateVar objects, each of which corresponds to one variable of the environment. From the StateConfig object we can determine most other aspects of the environment, such as the observation space and action space. The gymnasium documentation provides more information about Spaces.

ETA Ctrl supports simple definition of StateConfigs in .toml, .yaml, or .json files. By default, a config file is expected in the same folder as the environment with the name <environment_class>_state_config.<suffix>. See the examples for details.

A minimal state TOML structure might look like this:

[[actions]]
name = "heater_power"
low_value = 0.0
high_value = 1.0
is_agent_action = true
ext_id = "heater_u"

[[observations]]
name = "room_temp"
low_value = -50.0
high_value = 80.0
is_agent_observation = true
ext_id = "temp"

Each state variable is represented by a StateVar object:

class eta_ctrl.envs.StateVar(*, name: str, is_agent_action: bool = False, is_agent_observation: bool = False, add_to_state_log: bool = True, ext_id: str | None = None, is_ext_input: bool = False, is_ext_output: bool = False, ext_scale_add: float = 0.0, ext_scale_mult: float = 1.0, scenario_id: str | None = None, from_scenario: bool = False, scenario_scale_add: float = 0.0, scenario_scale_mult: float = 1.0, low_value: float = -3.4028234663852886e+38, high_value: float = 3.4028234663852886e+38, abort_condition_min: float = -inf, abort_condition_max: float = inf, index: int = 0, duration: int = 1)[source]

A variable in the state of an environment.

For example, the variable “tank_temperature” might be part of the environment’s state. Let’s assume it represents the temperature inside the tank of a cleaning machine. This variable could be read from an external source. In this case it must have is_ext_output = True and the name of the external variable to read from must be specified: ext_id = "T_Tank". If this value should also be passed to the agent as an observation, set is_agent_observation = True. For observations and actions, you also need to set the low and high values, which determine the bounds of the observation and action spaces. In this case, something like low_value = 20 and high_value = 80 (for water temperature measured in degrees Celsius) might make sense.

If you want the environment to safely abort the optimization when certain values are exceeded, set the abort conditions to sensible values such as abort_condition_min = 0 and abort_condition_max = 100. This can be especially useful for example if you have simulation models which do not support certain values (for example, in this case they might not be able to handle water temperatures higher than 100 °C):

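from eta_ctrl.envs import StateVar

# All StateVar arguments are keyword-only, so the name must be passed as name=...
v1 = StateVar(
    name = "tank_temperature",
    ext_id = "T_Tank",
    is_ext_output = True,
    is_agent_observation = True,
    low_value = 20,
    high_value = 80,
    abort_condition_min = 0,
    abort_condition_max = 100,
)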

As another example, you could set up an agent action named name = "set_heater" which the environment uses to set the state of the tank heater. In this case, the state variable should be configured with is_agent_action = True and you might want to pass this on to a simulation model or an actual machine by setting is_ext_input = True:

v2 = StateVar(
    name = "set_heater",
    ext_id = "u_tank",
    is_ext_input = True,
    is_agent_action = True,
)

Finally, let’s create a third variable which is read from a scenario file and converted from kilowatts to watts (multiplied by 1000). Additionally, this variable needs to be offset by a value of -10 due to measurement errors:

v3 = StateVar(
    name = "outside_temperature",
    scenario_id = "T_outside",
    scenario_scale_add = -10,
    scenario_scale_mult = 1000,
    is_agent_observation = True,
    low_value = 0,
    high_value = 40,
)
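The effect of the two scaling parameters can be sketched as a small function. The order of operations is an assumption made for this sketch (offset first, then factor); this section does not state which order ETA Ctrl applies, so check the implementation before relying on it. The function name scale_scenario_value is hypothetical.

```python
def scale_scenario_value(raw: float, scale_add: float = 0.0, scale_mult: float = 1.0) -> float:
    """Apply an offset and a factor to a raw scenario value.

    ASSUMPTION: the offset is added before the factor is applied.
    """
    return (raw + scale_add) * scale_mult


# Raw value 12.5 with the parameters of the third example variable.
scaled = scale_scenario_value(12.5, scale_add=-10, scale_mult=1000)
print(scaled)

# With the default parameters the value passes through unchanged.
unchanged = scale_scenario_value(3.0)
```

If the factor were applied first, the same inputs would give a different result, which is why the order matters whenever both parameters are set.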

model_config = {'extra': 'forbid', 'frozen': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str: Name of the state variable (This must always be specified).

is_agent_action: bool: Should the agent specify actions for this variable? (default: False).

is_agent_observation: bool: Should the agent be allowed to observe the value of this variable? (default: False).

add_to_state_log: bool: Should the state log of this episode be added to state_log_longtime? (default: True).

ext_id: str | None: Name of the variable in the external model (e.g.: environment or FMU) (default: StateVar.name if (is_ext_input or is_ext_output) else None).

is_ext_input: bool: Should this variable be passed to the external model as an input? (default: False).

is_ext_output: bool: Should this variable be parsed from the external model output? (default: False).

ext_scale_add: float: Value to add to the output from an external model (default: 0.0).

ext_scale_mult: float: Value to multiply the output from an external model by (default: 1.0).

scenario_id: str | None: Name of the scenario variable this value should be read from (default: None).

from_scenario: bool: Should this variable be read from imported timeseries data? (default: False).

scenario_scale_add: float: Value to add to the value read from a scenario file (default: 0.0).

scenario_scale_mult: float: Value to multiply the value read from a scenario file by (default: 1.0).

low_value: float: Lowest possible value of the state variable (default: -np.finfo(np.float32).max).

high_value: float: Highest possible value of the state variable (default: np.finfo(np.float32).max).

abort_condition_min: float: If the value of the variable dips below this, the episode should be aborted (default: -np.inf).

abort_condition_max: float: If the value of the variable rises above this, the episode should be aborted (default: np.inf).

index: int: Determine the index, where to look (useful for mathematical optimization, where multiple time steps could be returned). In this case, the index values might be different for actions and observations.

duration: int: For scenario StateVars: Length of StateVars horizon in state, e.g. the prediction horizon length (unit: steps).

model_post_init(context: Any) → None[source]: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

All state variables are combined into the StateConfig object:

class eta_ctrl.envs.StateConfig(*state_vars: StateVar, source_file: Path | None = None)[source]

The configuration for the action and observation spaces. The values are used to control which variables are part of the action space and observation space. Therefore, the StateConfig is very important for the functionality of EtaCtrl.

Using the examples above, we could create the StateConfig object by passing our three state variables to the constructor:

state_config = StateConfig(v1, v2, v3)

If you are creating an environment, assign the StateConfig object to self.state_config. This will sometimes even be sufficient to create a fully functional environment.

vars: Mapping of the variable names to their StateVar instances with all associated information.

source_file: Path | None: Attribute to store the source file path (if loaded from file).

df_vars: pandas.DataFrame

actions: list[str]: List of variables that are agent actions. Needs to be ordered.

observations: list[str]: List of variables that are agent observations.

add_to_state_log: list[str]: List of variables that should be logged.

ext_inputs: list[str]: List of variables that should be provided to an external source (such as an FMU).

ext_outputs: list[str]: List of variables that can be received from an external source (such as an FMU).

map_ext_ids: dict[str, str]: Mapping of variable names to their external IDs.

rev_ext_ids: dict[str, str]: Reverse mapping of external IDs to their corresponding variable names.

scenario_outputs: list[str]: List of variables which are loaded from scenario files.

map_scenario_ids: dict[str, str]: Mapping of internal environment names to scenario IDs.

abort_conditions_min: list[str]: List of variables that have minimum values for an abort condition.

abort_conditions_max: list[str]: List of variables that have maximum values for an abort condition.

classmethod from_file(root_path: pathlib.Path, filename: Path, extra_params: Mapping[str, float] | None = None) → Self[source]

Load a StateConfig from a config file.

Parameters:: file – Path of the config file.
Returns:: StateConfig object.

store_file(file: Path) → None[source]

Save the StateConfig to a comma separated file.

Parameters:: file – Path to the file.

within_abort_conditions(state: Mapping[str, float]) → bool[source]

Check whether the given state is within the abort conditions specified by the StateConfig instance.

Parameters:: state – The state array to check for conformance.
Returns:: Result of the check (False if the state does not conform to the required conditions).
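The check can be pictured as a comparison of each state value against its lower and upper abort limit. The following is a minimal sketch under that assumption, not the method's actual implementation; the function name within_bounds and the two limit dictionaries are hypothetical. Values exactly on a limit count as valid here, in line with the "dips below" and "rises above" wording of the abort conditions.

```python
from collections.abc import Mapping


def within_bounds(
    state: Mapping[str, float],
    minimums: Mapping[str, float],
    maximums: Mapping[str, float],
) -> bool:
    """Return False as soon as one value leaves its [minimum, maximum] range."""
    for name, lower in minimums.items():
        if state[name] < lower:
            return False
    for name, upper in maximums.items():
        if state[name] > upper:
            return False
    return True


# Limits matching the tank temperature example (0 and 100 degrees Celsius).
minimums = {"tank_temperature": 0.0}
maximums = {"tank_temperature": 100.0}

ok = within_bounds({"tank_temperature": 65.0}, minimums, maximums)
too_hot = within_bounds({"tank_temperature": 104.0}, minimums, maximums)
print(ok, too_hot)
```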

continuous_action_space() → Box[source]

Generate a numpy ndarray action space.

Returns:: Action space.

continuous_observation_space() → Dict[source]

Generate a dictionary observation space.

Returns:: Observation Space.

continuous_spaces() → tuple[Box, Dict][source]

Generate continuous action and observation spaces according to the OpenAI specification.

Returns:: Tuple of action space and observation space.
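Both spaces are derived from the low_value and high_value of the flagged variables. The sketch below shows only that bookkeeping with plain Python containers, assuming the action bounds are collected in definition order and the observation bounds are keyed by variable name. It does not create gymnasium Box or Dict objects; the Var dataclass and both helper functions are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical, reduced stand-in for a state variable."""

    name: str
    low_value: float
    high_value: float
    is_agent_action: bool = False
    is_agent_observation: bool = False


def action_bounds(variables):
    """Return ordered lower and upper bounds of all action variables."""
    acts = [v for v in variables if v.is_agent_action]
    return [v.low_value for v in acts], [v.high_value for v in acts]


def observation_bounds(variables):
    """Return (low, high) pairs of all observation variables, keyed by name."""
    return {v.name: (v.low_value, v.high_value) for v in variables if v.is_agent_observation}


variables = [
    Var("set_heater", 0.0, 1.0, is_agent_action=True),
    Var("tank_temperature", 20.0, 80.0, is_agent_observation=True),
    Var("outside_temperature", 0.0, 40.0, is_agent_observation=True),
]
lows, highs = action_bounds(variables)
obs_bounds = observation_bounds(variables)
print(lows, highs, obs_bounds)
```

The two bound lists correspond to the low and high arguments a Box space needs, while the dictionary mirrors the per-key structure of a Dict space.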

The state config object and its attributes (such as the observations) are used by the environments to determine which values to update during steps, which values to read from scenario files and which values to pass to the agent as observations.

Base Environment


class eta_ctrl.envs.BaseEnv(env_id: int, config_run: ConfigRun, state_config: StateConfig, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, seed: int | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, scenario_manager: ScenarioManager | None = None, render_mode: str | None = None, path_env: Path | None = None, **kwargs: Any)[source]

Bases: Env, ABC

Abstract environment definition, providing some basic functionality for concrete environments to use.

The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA Ctrl framework and should be used as the starting point for new environments.

The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.

There are some class attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required class attributes are:

version: Version number of the environment.

description: Short description string of the environment.

The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.

step()

reset()

close()

render()

Note

Subclasses should implement the private _step and _reset methods rather than overriding the public step and reset methods. The public methods handle the Gymnasium interface and state management automatically.
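The split between public and private methods follows the template method pattern. The sketch below illustrates the idea with a hypothetical MiniEnv class that is independent of eta_ctrl and gymnasium: the public step stores the actions, calls the subclass hook, counts the step and extracts the observations. It is a simplified picture under these assumptions, not the BaseEnv implementation.

```python
from abc import ABC, abstractmethod


class MiniEnv(ABC):
    """Hypothetical stand-in for a base environment (not eta_ctrl.envs.BaseEnv)."""

    def __init__(self, action_names, observation_names):
        self.action_names = list(action_names)
        self.observation_names = list(observation_names)
        self.state = {}
        self.n_steps = 0

    @abstractmethod
    def _step(self):
        """Update self.state and return (reward, terminated, truncated, info)."""

    def step(self, action):
        # Validate the action and store it in the state before the hook runs.
        if len(action) != len(self.action_names):
            raise ValueError("action has the wrong length")
        self.state.update(zip(self.action_names, action))
        reward, terminated, truncated, info = self._step()
        self.n_steps += 1
        # Observations are filtered from the state, not returned by the hook.
        observations = {name: self.state[name] for name in self.observation_names}
        return observations, reward, terminated, truncated, info


class HeaterEnv(MiniEnv):
    def _step(self):
        # Toy dynamics: a full heater setting raises the temperature by 2 K.
        temperature = self.state.get("tank_temperature", 20.0)
        self.state["tank_temperature"] = temperature + 2.0 * self.state["set_heater"]
        reward = -abs(60.0 - self.state["tank_temperature"])
        return reward, False, False, {}


env = HeaterEnv(["set_heater"], ["tank_temperature"])
obs, reward, terminated, truncated, info = env.step([1.0])
print(obs, reward)
```

Because the subclass only implements the hook, bookkeeping such as step counting stays consistent across all environments derived from the same base class.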

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – callback that should be called after each episode.
state_modification_callback – callback that should be called after state setup, before logging the state.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Render mode used to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).
path_env – Explicit path to the environment directory. If not provided, the path will be automatically detected from the call stack. If detection fails, falls back to current working directory.
kwargs – Other keyword arguments (for subclasses).

abstract property version: str

Version of the environment.

Needs to be implemented for each subclass as a class attribute.

abstract property description: str

Long description of the environment.

Needs to be implemented for each subclass as a class attribute.

verbose: int: Verbosity level used for logging.

config_run: ConfigRun: Information about the optimization run and information about the paths. For example, it defines results_path and scenarios_path.

callback: Callable | None: Callback can be used for logging and plotting.

state_modification_callback: Callable | None: Callback can be used for modifying the state at each time step.

env_id: int: ID of the environment (useful for vectorized environments).

render_mode: str | None = None: Render mode for rendering the environment

episode_duration: float: Duration of one episode in seconds.

sampling_time: float: Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int: Number of time steps (of width sampling_time) in each episode.

sim_steps_per_sample: int: Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

state_config: StateConfig: State Configuration for defining State Variables.

action_space

observation_space

scenario_manager: Manager to load scenario data into the state

property run_name: str

property results_path: Path

property scenarios_path: Path | None

property series_results_path: Path
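The timing attributes episode_duration, sampling_time, n_episode_steps and sim_steps_per_sample are related by simple arithmetic. The sketch below assumes that n_episode_steps is the integer quotient of episode_duration and sampling_time, and that the divisor requirement is checked with a remainder test. Both points are assumptions for illustration, and the function name timing is hypothetical.

```python
def timing(episode_duration: float, sampling_time: float, sim_steps_per_sample: int = 1):
    """Return (n_episode_steps, simulation step size in seconds)."""
    # ASSUMPTION: the divisor requirement is enforced via the remainder.
    if sampling_time % sim_steps_per_sample != 0:
        raise ValueError("sim_steps_per_sample must be a divisor of sampling_time")
    n_episode_steps = int(episode_duration // sampling_time)
    sim_step_size = sampling_time / sim_steps_per_sample
    return n_episode_steps, sim_step_size


# One hour per episode, one sample per minute, four simulation steps per sample.
n_steps, dt = timing(episode_duration=3600, sampling_time=60, sim_steps_per_sample=4)
print(n_steps, dt)
```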

abstractmethod _step() → tuple[float, bool, bool, dict][source]

Abstract method to perform one internal time step.

This private method must be implemented by subclasses to update the internal state dictionary and return step results. It should work with the internal state rather than returning observations directly.

Returns:: Tuple of (reward, terminated, truncated, info)

step(action: np.ndarray) → StepResult[source]

Proceed one time step and return the reward for the action provided as well as the new observation.

This method handles the public interface for the step operation. It validates actions, executes actions by calling the private _step method implemented by subclasses, increments n_steps, manages state updates, and returns the formatted results (reward of the previous action taken, new environment state).

It also updates the state log and calls the state modification callback.

Parameters:

action – Actions taken by the agent.

Returns:

The return value represents the state of the environment after the step was performed:

observations: A dictionary with new observation values as defined by the observation space, automatically extracted from the internal state.
reward: The value of the reward function. This is just one floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task) which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

abstractmethod _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Abstract reset method that must be implemented by subclasses.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObservationType, dict[str, Any]][source]

Reset the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state init
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
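The role of the seed can be shown with a small reproducibility sketch. It uses Python's standard-library random module as a stand-in for the environment's np_random generator, and the function name initial_state and the value range are hypothetical.

```python
import random


def initial_state(seed=None):
    """Draw a randomized starting temperature; equal seeds give equal values."""
    rng = random.Random(seed)
    return {"tank_temperature": rng.uniform(20.0, 30.0)}


# Two resets with the same seed start from the same state.
a = initial_state(seed=42)
b = initial_state(seed=42)
print(a == b)
```

Passing no seed leaves the starting state randomized, which is the usual choice during training.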

abstractmethod close() → None[source]: Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

abstractmethod render() → None[source]

Render the environment.

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.

rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

classmethod get_info() → tuple[str, str][source]

Get info about environment.

Returns:: Tuple of version and description.

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') → None[source]

Extension of csv_export to include timeseries on the data.

Parameters:

names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.

get_observations() → dict[str, ndarray][source]

Gather observations from the state.

Raises:: KeyError – Observation is not available in state
Returns:: Filtered observations as a dictionary.
Return type:: dict[str, np.ndarray]

get_external_inputs() → dict[str, int | float | bool | str][source]

Gather external inputs from the state. Uses scalar values instead of numpy arrays for values.

Raises:

KeyError – External input is not available in state
ValueError – External input value is not scalar

Returns:

Filtered external inputs with external id as keys.

Return type:

dict[str, int | float | bool | str]

set_action(action: ndarray | dict[str, ndarray]) → None[source]

Set action values in the state.

Parameters:: action (np.ndarray | dict[str, np.ndarray]) – Actions to be set.

set_external_outputs(external_outputs: Mapping[str, int | float | bool | str]) → None[source]

Set external outputs in the state. Accepts scalars instead of numpy arrays as values.

Parameters:: external_outputs (Mapping[str, int | float | bool | str]) – Dict of external outputs with external_ids as keys.
Raises:: KeyError – Received an unknown external id
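The translation between variable names and external IDs in both directions can be sketched with two dictionaries that play the role of map_ext_ids and rev_ext_ids. The functions external_inputs and apply_external_outputs are hypothetical simplifications written for this example, not the BaseEnv methods themselves.

```python
# Variable name -> external id, and the reverse lookup derived from it.
map_ext_ids = {"tank_temperature": "T_Tank", "set_heater": "u_tank"}
rev_ext_ids = {ext_id: name for name, ext_id in map_ext_ids.items()}


def external_inputs(state, ext_input_names):
    """Collect input values keyed by external id (KeyError if missing in state)."""
    return {map_ext_ids[name]: state[name] for name in ext_input_names}


def apply_external_outputs(state, outputs):
    """Write outputs keyed by external id back into the state by variable name."""
    for ext_id, value in outputs.items():
        if ext_id not in rev_ext_ids:
            raise KeyError(f"Unknown external id: {ext_id}")
        state[rev_ext_ids[ext_id]] = value


state = {"set_heater": 1.0}
inputs = external_inputs(state, ["set_heater"])
apply_external_outputs(state, {"T_Tank": 61.5})
print(inputs, state)
```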

set_scenario_state(reset: bool = False) → None[source]

Set scenario output values for the current timestep in the state.

Parameters:: reset – Indicator whether this was called from the reset method

get_wrapper_attr(name: str) → Any: Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool: Checks if the attribute name exists in the environment.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns:: Instances of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:: int: the seed of the current np_random or -1, if the seed of the rng is unknown

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool: Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:: Env: The base non-wrapped gymnasium.Env instance

Pyomo Simulation Environment

PyomoSimEnv is a class for using the Pyomo modelling language for environment representation.

class eta_ctrl.envs.PyomoSimEnv(*args: Any, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Gymnasium environment that simulates state transitions using a Pyomo model without a solver.

Instead of optimizing over a full prediction horizon, PyomoSimEnv instantiates the model with prediction_horizon = sampling_time (i.e. one time step) and evaluates Pyomo Expression components to compute the next state.

The model must define a start_value_mapping that maps initial-condition Param names to their corresponding Expression names. Each step, the environment:

Fixes agent actions in the model at t=0.
Evaluates the mapped Expressions at t=1 to obtain the next state.
Updates the initial-condition Params via pyo_update_params() for the following step.

This allows reusing the same Pyomo model definition for both MPC optimization (with MpcAgent) and step-by-step simulation.
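The three steps listed above can be sketched in plain Python, without Pyomo. In this illustration a dictionary stands in for the initial-condition Params and an ordinary function stands in for a Pyomo Expression; the toy tank model, its constants and the function sim_step are hypothetical.

```python
# Param name -> Expression name, in the spirit of start_value_mapping.
start_value_mapping = {"temp_start": "temp_next"}

params = {"temp_start": 20.0}  # initial-condition "Params"
HEAT_GAIN = 5.0                # hypothetical: kelvin gained per step at full power
LOSS_FACTOR = 0.1              # hypothetical: share of excess temperature lost per step
AMBIENT = 20.0                 # hypothetical ambient temperature


def temp_next(params, action):
    """'Expression' at t=1: temperature after one step with the fixed action."""
    start = params["temp_start"]
    return start + HEAT_GAIN * action - LOSS_FACTOR * (start - AMBIENT)


expressions = {"temp_next": temp_next}


def sim_step(params, action):
    # 1. The action is fixed at t=0 (here: passed in as a constant).
    # 2. Evaluate every mapped expression at t=1.
    next_values = {
        param: expressions[expr](params, action)
        for param, expr in start_value_mapping.items()
    }
    # 3. Write the results back as initial conditions for the following step.
    params.update(next_values)
    return dict(next_values)


first = sim_step(params, action=1.0)
second = sim_step(params, action=0.0)
print(first, second)
```

No solver is involved at any point: each step is a direct evaluation, which is what keeps this mode cheap compared to optimizing over a full horizon.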

Parameters:

args – Positional arguments forwarded to BaseEnv.
kwargs – Keyword arguments forwarded to BaseEnv. May include model_parameters (dict) which is extracted and passed to the model constructor.

abstract property model_import: str: Dotted import path to the PyomoModel subclass.

_step() → tuple[float, bool, bool, dict][source]

Perform one internal time step and return core step results.

This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).

Returns:

A tuple containing:

reward: The value of the reward function. This is just one floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task) which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
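How the two flags differ, and how they combine into a reset decision, can be sketched with a toy heating task. The functions episode_flags and needs_reset, the target temperature and the step limit are hypothetical and only illustrate the distinction described in the note.

```python
def episode_flags(temperature, n_steps, n_episode_steps, target=60.0):
    """Return (terminated, truncated) for a toy heating task."""
    terminated = temperature >= target      # goal reached: part of the task itself
    truncated = n_steps >= n_episode_steps  # time limit: outside the task
    return terminated, truncated


def needs_reset(terminated, truncated):
    """Either flag ends the episode (logical OR)."""
    return terminated or truncated


flags_goal = episode_flags(61.0, n_steps=3, n_episode_steps=10)
flags_time = episode_flags(40.0, n_steps=10, n_episode_steps=10)
flags_running = episode_flags(40.0, n_steps=3, n_episode_steps=10)
print(needs_reset(*flags_goal), needs_reset(*flags_time), needs_reset(*flags_running))
```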

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Reset the internal state of the environment and return info dictionary.

This private method initializes the internal self.state dictionary by reading initial parameter values from the PyomoModel. It does not use the seed parameter since the initial state is determined by the user configuration.

For Custom environments, the first line of reset() should be super().reset(seed=seed) which implements the seeding correctly.

The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state init
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes observations from the pyomo model without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.

close() → None[source]: Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

static create_state(model: pyo.ConcreteModel, model_name: str, output_dir: pathlib.Path | str | None = None) → None[source]

Create both state config and parameters files from a Pyomo model.

This method creates both a state configuration TOML file (containing variables/observations) and a parameters TOML file from a Pyomo ConcreteModel, providing a complete setup for Pyomo-based environments.

Parameters:

model – Pyomo ConcreteModel instance.
model_name – Name of the model for identification.
output_dir – Directory where files should be created. If None, uses current working directory.

Simulation (FMU) Environment

The SimEnv supports the control of environments represented as FMU simulation models. Make sure to set the fmu_name attribute when subclassing this environment. The FMU file will be loaded from the same directory as the environment itself.

class eta_ctrl.envs.SimEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for FMU simulation model environments.

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – callback which should be called after each episode.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
sim_steps_per_sample – Number of simulation steps to perform during every sample.
render_mode – Render mode used to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).

abstract property fmu_name: str: Name of the FMU file.

sim_steps_per_sample: int: Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

path_fmu: pathlib.Path: The FMU is expected to be placed in the same folder as the environment

model_parameters: Mapping[str, int | float] | None: Configuration for the FMU model parameters, that need to be set for initialization of the Model.

simulator: FMUSimulator: Instance of the FMU. This can be used to directly access the eta_ctrl.FMUSimulator interface.

simulate() → tuple[bool, float][source]

Perform a simulator step.

Updates the state with new external outputs from the simulation results.

Returns:: Boolean showing whether all simulation steps were successful and time elapsed during simulation.
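The return value can be pictured with a small loop that splits one sample into sim_steps_per_sample sub-steps. This sketch does not use the FMUSimulator interface: the advance callable is a hypothetical stand-in for whatever advances the model, and treating a RuntimeError as a failed step is an assumption made for illustration.

```python
import time


def simulate(advance, sampling_time: float, sim_steps_per_sample: int):
    """Run all sub-steps of one sample; return (success, elapsed seconds)."""
    step_size = sampling_time / sim_steps_per_sample
    start = time.perf_counter()
    success = True
    for _ in range(sim_steps_per_sample):
        try:
            advance(step_size)
        except RuntimeError:
            # ASSUMPTION: a failed sub-step is reported via RuntimeError.
            success = False
            break
    return success, time.perf_counter() - start


# Record the step sizes instead of advancing a real model.
calls = []
ok, elapsed = simulate(calls.append, sampling_time=60.0, sim_steps_per_sample=4)
print(ok, calls)
```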

_step() → tuple[float, bool, bool, dict][source]

Perform one internal time step and return core step results.

This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to work with modified actions (e.g., discretized or shaped actions), ensure they are processed before reaching this method or handle them within this method using the values in self.state. If you need to manipulate observations afterwards, you can do this using the state modification callback.

Returns:

A tuple containing:

reward: The value of the reward function. This is just one floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task) which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Reset the internal state of the environment and return info dictionary.

This private method initializes the internal self.state dictionary by reading initial values directly from the FMU/simulator. It does not use the seed parameter since the initial state is determined by the simulator configuration.

The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state init
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes external outputs from the FMU without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
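The following sketch illustrates one way a subclass could use the seed for an additional randomized state variable. StubEnv is a hypothetical stand-in for the base class, and the variable names (tank_level, ambient_temperature) are assumptions. It uses a plain random.Random generator for self-containment; a real subclass may prefer the generator that the public reset method seeds.

```python
import random
from typing import Any, Dict, Optional


class StubEnv:
    """Hypothetical stand-in for the base class; not the real ETA Ctrl implementation."""

    def __init__(self) -> None:
        self.state: Dict[str, float] = {}

    def _reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        # Stand-in for reading initial values from the simulator; the seed is unused here.
        self.state = {"tank_level": 0.5}
        return {}


class NoisyStartEnv(StubEnv):
    def __init__(self) -> None:
        super().__init__()
        self._rng = random.Random()

    def _reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        info = super()._reset(seed=seed, options=options)
        # Re-seed only when a seed is given, so seed=None keeps the current RNG state.
        if seed is not None:
            self._rng.seed(seed)
        # Additional randomized state variable (assumed name).
        self.state["ambient_temperature"] = self._rng.uniform(15.0, 30.0)
        # Return only the info dictionary, never the observations.
        info["seed"] = seed
        return info
```

Resetting twice with the same seed produces the same ambient_temperature, which is the reproducibility the seed parameter is meant to provide.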

close() → None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the Simulation environment is to close the FMU object.

Live Connection Environment

The LiveEnv is an environment which creates direct (live) connections to actual devices. It utilizes eta_nexus.ConnectionManager to achieve this. Please also read the corresponding documentation because ConnectionManager needs additional configuration.

class eta_ctrl.envs.LiveEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, max_errors: int = 10, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for Live environments.

The class will create an ETA Nexus ConnectionManager instance and provide facilities to automatically read step results and reset the connection.

In addition to the class attributes required by BaseEnv, LiveEnv requires the name of the connection manager configuration file as a class attribute:

config_name: Name of the connection manager configuration.

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback which should be called after each episode.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
max_errors – Maximum number of connection errors before interrupting the optimization process.
render_mode – Renders the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array”, and “ansi” for text.
kwargs – Other keyword arguments (for subclasses).

abstract property config_name: str

Name of the connection manager configuration.

Needs to be implemented for each subclass as a class attribute.
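A minimal sketch of how an abstract property can be satisfied by a plain class attribute is shown below. StubLiveEnv is a hypothetical stand-in that reproduces only the abstract config_name property, and the configuration name "pump_station" is an assumed example value.

```python
from abc import ABC, abstractmethod


class StubLiveEnv(ABC):
    """Hypothetical stand-in for LiveEnv, reduced to the abstract property."""

    @property
    @abstractmethod
    def config_name(self) -> str:
        """Name of the connection manager configuration."""


class PumpEnv(StubLiveEnv):
    # A plain class attribute overrides the abstract property,
    # which makes the subclass instantiable.
    config_name = "pump_station"  # assumed example name
```

Instantiating StubLiveEnv directly raises a TypeError because config_name is still abstract, whereas PumpEnv() works and exposes the configured name.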

connection_manager: ConnectionManager: Instance of the Live Connector.

connection_manager_config: Path | Sequence[Path] | dict[str, Any] | None: Path or Dict to initialize the live connector.

max_error_count: int: Maximum error count when connections in live connector are aborted.

_step() → tuple[float, bool, bool, dict][source]

Perform one internal time step and return core step results.

This is called for every event or for every time step during the simulation/optimization run. It should use the actions supplied by the agent, which are available in the state dictionary, to determine the new state of the environment.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.

Returns:

A tuple containing:

reward: The value of the reward function. This is just one floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava in the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
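The ordering described in the first note (manipulate actions before the base call, compute rewards and flags after it) could be sketched as follows. StubLiveEnv is a hypothetical stand-in that fakes the device response, and SETPOINTS, the state variable names and max_pressure are assumptions for illustration only.

```python
from typing import Any, Dict, Tuple

# Assumed mapping from a discrete action index to a pump speed in percent.
SETPOINTS = (0.0, 50.0, 100.0)


class StubLiveEnv:
    """Hypothetical stand-in for LiveEnv; the real class talks to actual devices."""

    def __init__(self) -> None:
        self.state: Dict[str, float] = {"action_index": 0, "pump_speed": 0.0, "pressure": 2.0}

    def _step(self) -> Tuple[float, bool, bool, Dict[str, Any]]:
        # Stand-in for writing actions and reading values: pressure follows pump speed.
        self.state["pressure"] = 2.0 + 0.05 * self.state["pump_speed"]
        return 0.0, False, False, {}


class PumpEnv(StubLiveEnv):
    max_pressure = 6.0  # assumed safety limit

    def _step(self) -> Tuple[float, bool, bool, Dict[str, Any]]:
        # Before the base call: translate the discrete action into a setpoint.
        self.state["pump_speed"] = SETPOINTS[int(self.state["action_index"])]
        _, terminated, truncated, info = super()._step()
        # After the base call: compute the reward and the termination flag.
        reward = -self.state["pump_speed"] / 100.0
        terminated = terminated or self.state["pressure"] > self.max_pressure
        return reward, terminated, truncated, info
```

In this sketch, exceeding the pressure limit is treated as a terminal state of the task, so it sets terminated rather than truncated. A time limit would instead be reported through truncated; either flag ends the episode once the two are combined.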

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Reset the environment to an initial internal state and return the info dictionary.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes external outputs from the live connector without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.

close() → None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the Live Connection environment is to do nothing.
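One way to make sure close() is always called at the end of a run, even when a step fails, is the standard library helper contextlib.closing. The sketch below uses a hypothetical StubEnv that only records whether close() ran; the step method and the simulated failure are assumptions for illustration.

```python
from contextlib import closing


class StubEnv:
    """Hypothetical stand-in for an environment; only tracks whether close() ran."""

    def __init__(self) -> None:
        self.closed = False
        self.steps = 0

    def step(self) -> None:
        self.steps += 1
        if self.steps == 3:
            # Simulated failure, e.g. a lost connection.
            raise RuntimeError("simulated connection failure")

    def close(self) -> None:
        self.closed = True


def run(env: StubEnv, n_steps: int) -> None:
    # closing() calls env.close() on exit, even if a step raises.
    with closing(env):
        for _ in range(n_steps):
            env.step()
```

The exception from the third step still propagates to the caller, but the environment is closed before it does.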