eta_ctrl.envs.base_env module

class eta_ctrl.envs.base_env.BaseEnv(env_id: int, config_run: ConfigRun, state_config: StateConfig, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, seed: int | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, scenario_manager: ScenarioManager | None = None, render_mode: str | None = None, path_env: Path | None = None, **kwargs: Any)

Bases: Env, ABC

Abstract environment definition, providing some basic functionality for concrete environments to use.

The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA Ctrl framework and should be used as the starting point for new environments.

The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed from this starting point.

There are some class attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required class attributes are:

version: Version number of the environment.

description: Short description string of the environment.

The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.

step()

reset()

close()

render()

Note

Subclasses should implement the private _step and _reset methods rather than overriding the public step and reset methods. The public methods handle the Gymnasium interface and state management automatically.
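The division of labour described in the note can be illustrated with a minimal sketch. It does not import eta_ctrl: SketchBaseEnv is a hypothetical stand-in for BaseEnv, and details such as storing the action under the key "u" or treating the dictionary returned by _reset as the info dictionary are assumptions made only for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class SketchBaseEnv(ABC):
    """Hypothetical stand-in for BaseEnv (not the real eta_ctrl class)."""

    def __init__(self) -> None:
        self.n_steps = 0
        self.state: dict[str, float] = {}

    def step(self, action: float) -> tuple[dict[str, float], float, bool, bool, dict]:
        # Public method: write the action into the state, delegate to _step,
        # count the step and build the observation from the internal state.
        self.state["u"] = action
        reward, terminated, truncated, info = self._step()
        self.n_steps += 1
        return dict(self.state), reward, terminated, truncated, info

    def reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None):
        # Public method: clear bookkeeping, then delegate to _reset.
        self.n_steps = 0
        self.state = {}
        info = self._reset(seed=seed, options=options)  # assumed to be the info dict
        return dict(self.state), info

    @abstractmethod
    def _step(self) -> tuple[float, bool, bool, dict]: ...

    @abstractmethod
    def _reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None) -> dict[str, Any]: ...


class Integrator(SketchBaseEnv):
    """Concrete subclass: only the private methods are implemented."""

    def _step(self) -> tuple[float, bool, bool, dict]:
        # Update the internal state; no observation is returned here.
        self.state["x"] = self.state.get("x", 0.0) + self.state["u"]
        reward = -abs(self.state["x"])
        truncated = self.n_steps + 1 >= 3  # episode length of three steps
        return reward, False, truncated, {}

    def _reset(self, *, seed=None, options=None) -> dict[str, Any]:
        self.state["x"] = 0.0
        return {}
```

The subclass never touches step() or reset(); it only edits the shared state dictionary, which is the pattern the note recommends.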

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback that should be called after each episode.
state_modification_callback – Callback that should be called after state setup, before logging the state.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Render mode used to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
path_env – Explicit path to the environment directory. If not provided, the path is detected automatically from the call stack. If detection fails, the current working directory is used.
kwargs – Other keyword arguments (for subclasses).

abstract property version: str

Version of the environment.

Needs to be implemented for each subclass as a class attribute.

abstract property description: str

Long description of the environment.

Needs to be implemented for each subclass as a class attribute.

verbose: int: Verbosity level used for logging.

config_run: ConfigRun: Information about the optimization run and information about the paths. For example, it defines results_path and scenarios_path.

callback: Callable | None: Callback can be used for logging and plotting.

state_modification_callback: Callable | None: Callback can be used for modifying the state at each time step.

env_id: int: ID of the environment (useful for vectorized environments).

episode_duration: float: Duration of one episode in seconds.

sampling_time: float: Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int: Number of time steps (of width sampling_time) in each episode.

sim_steps_per_sample: int: Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

state_config: StateConfig: State Configuration for defining State Variables.

scenario_manager: Manager to load scenario data into the state

property run_name: str

property results_path: Path

property scenarios_path: Path | None

property series_results_path: Path

abstractmethod _step() → tuple[float, bool, bool, dict]

Abstract method to perform one internal time step.

This private method must be implemented by subclasses to update the internal state dictionary and return step results. It should work with the internal state rather than returning observations directly.

Returns: Tuple of (reward, terminated, truncated, info)

step(action: np.ndarray) → StepResult

Proceed one time step and return the reward for the action provided as well as the new observation.

This method handles the public interface for the step operation. It validates actions, executes actions by calling the private _step method implemented by subclasses, increments n_steps, manages state updates, and returns the formatted results (reward of the previous action taken, new environment state).

It also updates the state log and calls the state modification callback.

Parameters:

action – Actions taken by the agent.

Returns:

The return value represents the state of the environment after the step was performed:

observations: A dictionary with new observation values as defined by the observation space, automatically extracted from the internal state.
reward: The value of the reward function. This is a single floating point value.
terminated (bool): Whether the agent has reached a terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava in the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether a truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Additional information about the state of the environment. The contents may be used for logging purposes in the future but currently do not typically serve a purpose.
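How a caller might consume these five return values is sketched below. CountdownEnv is a hypothetical toy environment, not part of eta_ctrl, and the sketch assumes that StepResult unpacks like a five-element tuple in the order listed above.

```python
class CountdownEnv:
    """Hypothetical toy environment with a Gymnasium-style reset/step interface."""

    def __init__(self, max_steps: int = 5, goal: float = 3.0) -> None:
        self.max_steps = max_steps
        self.goal = goal
        self.n_steps = 0
        self.x = 0.0

    def reset(self, *, seed=None, options=None):
        self.n_steps = 0
        self.x = 0.0
        return {"x": self.x}, {}

    def step(self, action: float):
        self.x += action
        self.n_steps += 1
        terminated = self.x >= self.goal          # terminal state of the task
        truncated = self.n_steps >= self.max_steps  # time limit outside the task
        reward = -abs(self.goal - self.x)
        return {"x": self.x}, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Step until the episode ends and report why it ended."""
    observation, info = env.reset()
    total_reward = 0.0
    while True:
        observation, reward, terminated, truncated, info = env.step(policy(observation))
        total_reward += reward
        # Either flag ends the episode; the caller is then expected to reset.
        if terminated or truncated:
            return total_reward, terminated, truncated


print(run_episode(CountdownEnv(), lambda obs: 1.0))  # ends by reaching the goal
print(run_episode(CountdownEnv(), lambda obs: 0.5))  # ends by hitting the time limit
```

Keeping terminated and truncated separate lets a learning algorithm distinguish a genuine terminal state from an episode that was merely cut short.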

abstractmethod _reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any]

Abstract reset method that must be implemented by subclasses.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObservationType, dict[str, Any]]

Reset the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
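The effect of the seed parameter can be sketched as follows. SeededResetSketch is a hypothetical class that uses the standard library random module; the real environment may use a different random number generator, and the "temperature" state variable is invented for illustration.

```python
import random


class SeededResetSketch:
    """Hypothetical environment whose initial state is drawn at random."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def reset(self, *, seed=None, options=None):
        # Re-seed only when a seed is given, so that later resets without a
        # seed continue the same random sequence.
        if seed is not None:
            self._rng = random.Random(seed)
        start_temperature = self._rng.uniform(18.0, 22.0)
        return {"temperature": start_temperature}, {}


env = SeededResetSketch()
first, _ = env.reset(seed=42)
second, _ = env.reset(seed=42)
print(first == second)  # identical seeds give identical initial states
```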

abstractmethod close() → None: Close the environment. This should always be called when an entire run is finished. It should be used to release any resources (e.g. simulation models) used by the environment.

abstractmethod render() → None

Render the environment.

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.

rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

classmethod get_info() → tuple[str, str]

Get info about environment.

Returns: Tuple of version and description.

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') → None

Extension of csv_export to include timeseries on the data.

Parameters:

names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
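The roles of sep and decimal can be illustrated with a small standard library sketch. write_log is a hypothetical helper that writes to a string instead of a file; it is not the implementation of export_state_log, and the column names are invented.

```python
import csv
import io


def write_log(rows, names, *, sep=";", decimal="."):
    """Hypothetical helper: format rows as CSV text with a custom separator and decimal sign."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(names)  # header row built from the field names
    for row in rows:
        # Only floats carry a decimal point that may need replacing.
        writer.writerow(
            [str(value).replace(".", decimal) if isinstance(value, float) else value for value in row]
        )
    return buffer.getvalue()


log = [[0, 1.5], [1, 2.25]]
print(write_log(log, ["step", "temperature"]))
print(write_log(log, ["step", "temperature"], decimal=","))
```

A semicolon separator combined with a comma as decimal sign is a common choice for spreadsheet programs with European locale settings.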

get_observations() → dict[str, ndarray]

Gather observations from the state.

Raises: KeyError – Observation is not available in state
Returns: Filtered observations as a dictionary.
Return type: dict[str, np.ndarray]

get_external_inputs() → dict[str, int | float | bool | str]

Gather external inputs from the state. Uses scalar values instead of numpy arrays for values.

Raises:

KeyError – External input is not available in state
ValueError – External input value is not scalar

Returns:

Filtered external inputs with external id as keys.

Return type:

dict[str, int | float | bool | str]

set_action(action: ndarray | dict[str, ndarray]) → None

Set action values in the state.

Parameters: action (np.ndarray | dict[str, np.ndarray]) – Actions to be set.

set_external_outputs(external_outputs: Mapping[str, int | float | bool | str]) → None

Set external outputs in the state. Accepts scalars instead of numpy arrays as values.

Parameters: external_outputs (Mapping[str, int | float | bool | str]) – Dict of external outputs with external_ids as keys.
Raises: KeyError – Received an unknown external id

set_scenario_state(reset: bool = False) → None

Set scenario output values for the current timestep in the state.

Parameters: reset – Indicator whether this was called from the reset method.