eta_ctrl.envs.sim_env module

class eta_ctrl.envs.sim_env.SimEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)

Bases: BaseEnv, ABC

Base class for environments based on FMU simulation models.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – Callback that is called after each episode.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • model_parameters – Parameters for the mathematical model.

  • sim_steps_per_sample – Number of simulation steps to perform during every sample.

  • render_mode – Render mode used to visualise what the agent sees. Example modes are “human”, “rgb_array”, and “ansi” (text).

  • kwargs – Other keyword arguments (for subclasses).
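Because of the bare * in the signature, everything after callback is keyword-only. The sketch below collects such keyword arguments for a hypothetical concrete subclass and derives the number of agent steps per episode. The subclass, the parameter name "T_start", and the use of plain seconds (assuming TimeStep accepts plain numbers) are illustrative assumptions, as is the rule that an episode consists of episode_duration / sampling_time samples.

```python
from typing import Any

# Hypothetical keyword arguments for a concrete SimEnv subclass (example values only).
env_kwargs: dict[str, Any] = {
    "episode_duration": 3600,  # seconds (one hour)
    "sampling_time": 60,  # seconds between two agent steps
    "sim_steps_per_sample": 4,  # simulation sub-steps per sample
    "model_parameters": {"T_start": 293.15},  # hypothetical FMU parameter name
    "render_mode": None,
}


def samples_per_episode(episode_duration: float, sampling_time: float) -> int:
    """Hypothetical helper: agent steps per episode, assuming one step per sampling_time."""
    return int(episode_duration // sampling_time)


n = samples_per_episode(env_kwargs["episode_duration"], env_kwargs["sampling_time"])
print(n)  # → 60
```

A real call would then look like MyEnv(env_id, config_run, **env_kwargs), where MyEnv and config_run are placeholders for your own subclass and run configuration.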

abstract property fmu_name: str

Name of the FMU file.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
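The divisor constraint means each sample is split into equally long simulation sub-steps. A minimal sketch of that check, using a hypothetical helper sim_step_size (not part of eta_ctrl):

```python
def sim_step_size(sampling_time: float, sim_steps_per_sample: int) -> float:
    """Hypothetical helper: length of one simulation sub-step in seconds."""
    if sim_steps_per_sample < 1:
        raise ValueError("sim_steps_per_sample must be a positive integer")
    # Documented constraint: sim_steps_per_sample must be a divisor of sampling_time.
    if sampling_time % sim_steps_per_sample != 0:
        raise ValueError(
            f"sim_steps_per_sample={sim_steps_per_sample} does not divide sampling_time={sampling_time}"
        )
    return sampling_time / sim_steps_per_sample


print(sim_step_size(60, 4))  # → 15.0
```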

path_fmu: pathlib.Path

Path to the FMU file. The FMU is expected to be placed in the same folder as the environment.
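Subclasses supply the abstract fmu_name, and the FMU path is resolved relative to the environment's folder. The stand-in below only illustrates that pattern; FMUEnvSketch, HeatPumpEnv, the path_env argument, and the appended ".fmu" extension are assumptions, not the eta_ctrl implementation.

```python
import pathlib
from abc import ABC, abstractmethod


class FMUEnvSketch(ABC):
    """Hypothetical stand-in showing the fmu_name / path_fmu pattern (not eta_ctrl code)."""

    @property
    @abstractmethod
    def fmu_name(self) -> str:
        """Name of the FMU file, assumed here to be given without the .fmu extension."""

    def __init__(self, path_env: pathlib.Path) -> None:
        # Assumption: the FMU sits in the environment's folder and ends in ".fmu".
        self.path_fmu: pathlib.Path = path_env / f"{self.fmu_name}.fmu"


class HeatPumpEnv(FMUEnvSketch):
    @property
    def fmu_name(self) -> str:
        return "heat_pump"  # hypothetical FMU name


env = HeatPumpEnv(pathlib.Path("my_envs"))
print(env.path_fmu.as_posix())  # → my_envs/heat_pump.fmu
```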

model_parameters: Mapping[str, int | float] | None

Configuration of the FMU model parameters that need to be set when initializing the model.

simulator: FMUSimulator

Instance of the FMU. This can be used to directly access the eta_ctrl.FMUSimulator interface.

simulate() → tuple[bool, float]

Perform a simulator step.

Updates the state with new external outputs from the simulation results.

Returns:

A tuple of a boolean indicating whether all simulation steps were successful and the time elapsed during the simulation.
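The sketch below illustrates the described behaviour: run the simulation sub-steps of one sample, copy the external outputs into the state, and report success plus elapsed time. FakeSimulator and its step() method are hypothetical stand-ins, not the FMUSimulator API; measuring wall-clock time and signalling failure by raising RuntimeError are also assumptions.

```python
import time
from typing import Any


class FakeSimulator:
    """Hypothetical stand-in for the FMU simulator (not the eta_ctrl FMUSimulator API)."""

    def __init__(self) -> None:
        self.time = 0.0

    def step(self, inputs: dict[str, float], step_size: float) -> dict[str, float]:
        self.time += step_size
        return {"temperature": 20.0 + inputs["power"] * self.time / 1000}


def simulate(
    simulator: FakeSimulator, state: dict[str, Any], sampling_time: float, sim_steps_per_sample: int
) -> tuple[bool, float]:
    """Sketch: run all sub-steps of one sample and copy the outputs into state."""
    step_size = sampling_time / sim_steps_per_sample
    start = time.perf_counter()
    success = True
    try:
        for _ in range(sim_steps_per_sample):
            outputs = simulator.step({"power": state["power"]}, step_size)
            state.update(outputs)  # external outputs overwrite the previous values
    except RuntimeError:
        success = False  # assumption: the simulator signals failure by raising
    return success, time.perf_counter() - start


state: dict[str, Any] = {"power": 100.0}
ok, elapsed = simulate(FakeSimulator(), state, sampling_time=60, sim_steps_per_sample=4)
print(ok, state["temperature"])  # → True 26.0
```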

_step() → tuple[float, bool, bool, dict]

Perform one internal time step and return core step results.

This private method implements the actual environment transition logic. It operates on the internal self.state dictionary, which already contains the actions, and returns the core step results without observations (observations are handled by the public step method).

Note

This function always returns a reward of 0. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to work with modified actions (e.g., discretized or shaped actions), either ensure they are processed before reaching this method or handle them within this method using the values in self.state. If you need to manipulate observations afterwards, you can do so using the state modification callback.

Returns:

A tuple containing:

  • reward: The value of the reward function, a single floating point value.

  • terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().

  • truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().

  • info: Additional information about the state of the environment. Its contents may be used for logging purposes in the future but currently serve no specific purpose.
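Since the base _step always returns a reward of 0, a subclass typically calls it and then computes its own reward and flags. The sketch uses a hypothetical stand-in base class instead of SimEnv; ComfortEnv, the "temperature" state key, the 21.0 setpoint, and the 30.0 limit are illustrative assumptions.

```python
from typing import Any


class SimEnvStandIn:
    """Hypothetical stand-in mimicking the documented base _step (always 0 reward)."""

    def __init__(self, n_episode_steps: int) -> None:
        self.n_episode_steps = n_episode_steps
        self.n_steps = 0
        self.state: dict[str, Any] = {}

    def _step(self) -> tuple[float, bool, bool, dict]:
        self.n_steps += 1  # the real base class would also run the simulation here
        return 0.0, False, False, {}


class ComfortEnv(SimEnvStandIn):
    def _step(self) -> tuple[float, bool, bool, dict]:
        reward, terminated, truncated, info = super()._step()
        temperature = self.state.get("temperature", 20.0)
        reward = -abs(temperature - 21.0)  # penalise deviation from a hypothetical setpoint
        terminated = temperature > 30.0  # hypothetical terminal condition inside the MDP
        truncated = self.n_steps >= self.n_episode_steps  # time limit outside the MDP
        return reward, terminated, truncated, info


env = ComfortEnv(n_episode_steps=2)
env.state["temperature"] = 22.5
print(env._step())  # → (-1.5, False, False, {})
```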

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any]

Reset the internal state of the environment and return info dictionary.

This private method initializes the internal self.state dictionary by reading initial values directly from the FMU/simulator. It does not use the seed parameter since the initial state is determined by the simulator configuration.

The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.

Parameters:
  • seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes external outputs from the FMU without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.

close() → None

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g., simulation models) used by the environment.

By default, the simulation environment closes the FMU object.
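To make sure close() runs even if a run fails, the standard library's contextlib.closing can wrap any object that has a close() method. The environment and FMU classes below are hypothetical stand-ins used to keep the sketch self-contained.

```python
import contextlib


class FakeFMU:
    """Hypothetical stand-in for the simulator object held by the environment."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class EnvStandIn:
    """Hypothetical environment whose close() releases the FMU, as documented above."""

    def __init__(self) -> None:
        self.simulator = FakeFMU()

    def close(self) -> None:
        self.simulator.close()


# contextlib.closing calls env.close() on exit, even if the body raises an exception.
with contextlib.closing(EnvStandIn()) as env:
    pass  # run episodes here

print(env.simulator.closed)  # → True
```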