eta_ctrl.envs.pyomo_env module

class eta_ctrl.envs.pyomo_env.PyomoEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any], prediction_horizon: TimeStep | str | None = None, render_mode: str | None = None, **kwargs: Any)

Bases: BaseEnv, ABC

Base class for mathematical MPC models. This class can be used in conjunction with the MathSolver agent. You need to implement the _model method in a subclass and return a pyomo.AbstractModel from it.

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback that is called after each episode.
episode_duration – Duration of a single episode.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
prediction_horizon – Duration of the prediction horizon (usually a fraction of the episode duration).
render_mode – Renders the environment to help visualize what the agent sees; example modes are “human”, “rgb_array”, and “ansi” for text.
kwargs – Other keyword arguments (for subclasses).

n_prediction_steps: int: Number of steps in the prediction (prediction_horizon/sampling_time).

model_parameters: Configuration for the MILP model parameters.

time_var: str | None: Name of the “time” variable/set in the model (i.e. “T”). This is if the pyomo sets must be re-indexed when updating the model between time steps. If this is None, it is assumed that no reindexing of the timeseries data is required during updates - this is the default.

nonindex_update_append_string: str | None: Updating indexed model parameters can be achieved either by updating only the first value of the actual parameter itself or by having a separate handover parameter that is used for specifying only the first value. The separate handover parameter can be denoted with an appended string. For example, if the actual parameter is x.ON then the handover parameter could be x.ON_first. To use x.ON_first for updates, set the nonindex_update_append_string to “_first”. If the attribute is set to None, the first value of the actual parameter (x.ON) would be updated instead.

use_model_time_increments: bool: Some models may not use the actual time increment (sampling_time). Instead, they would translate into model time increments (each sampling time increment equals a single model time step). This means that indices of the model components simply count 1,2,3,… instead of 0, sampling_time, 2*sampling_time, … Set this to true, if model time increments (1, 2, 3, …) are used. Otherwise, sampling_time will be used as the time increment. Note: This is only relevant for the first model time increment, later increments may differ.

property model: tuple[ConcreteModel, list]

The model property is a tuple of the concrete model and the order of the action space. The order allows the MPC algorithm to re-sort the action output, since pyomo offers no other way to convey this ordering.

Returns:: Tuple of the concrete model and the order of the action space.
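The re-sorting step can be pictured as follows. This is a sketch assuming the solver values are available as a name-to-value dictionary and the action order is a list of names; the function sort_actions and the component names are hypothetical.

```python
def sort_actions(solution: dict[str, float], action_order: list[str]) -> list[float]:
    """Hypothetical helper: return solver values in the order of the action space."""
    return [solution[name] for name in action_order]


# Illustrative values, e.g. read from the solved concrete model.
solution = {"pump.ON": 1.0, "heater.P": 250.0}
print(sort_actions(solution, ["heater.P", "pump.ON"]))  # [250.0, 1.0]
```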

abstract _model() → AbstractModel

Create the abstract pyomo model. This is where the pyomo model description should be placed.

Returns:: Abstract pyomo model.

_step() → tuple[float, bool, bool, dict]

Perform one internal time step and return core step results.

This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to work with modified actions (e.g., discretized or shaped actions), ensure they are processed before reaching this method or handle them within this method using the values in self.state. If you need to manipulate observations afterwards, you can do this using the state modification callback.

Returns:

A tuple containing:

reward: The value of the reward function. This is just one floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info: Additional information about the state of the environment. Its contents may be used for logging purposes in the future but currently do not serve a particular purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
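The following sketch mirrors the return contract of the base _step and the OR combination described in the note. The step-counter rule for truncation and the function name base_step_result are assumptions for illustration; the documented base method only guarantees the 0 reward.

```python
def base_step_result(n_steps: int, n_episode_steps: int) -> tuple[float, bool, bool, dict]:
    """Hypothetical stand-in for the (reward, terminated, truncated, info) tuple."""
    reward = 0.0  # the base _step always returns 0 reward
    terminated = False  # no terminal MDP state is modelled in this sketch
    truncated = n_steps >= n_episode_steps  # assumed time limit: episode length reached
    info: dict = {}
    return reward, terminated, truncated, info


reward, terminated, truncated, info = base_step_result(96, 96)
# Stable Baselines3 triggers the automatic reset if either flag is set.
needs_reset = terminated or truncated
print(needs_reset)  # True
```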

update(observations: Sequence[Sequence[float | int]] | None = None) → None

Update the optimization model with observations from another environment. New observations are stored in self.state.

Parameters:: observations – Observations from another environment.

solve_failed(model: pyo.ConcreteModel, result: SolverResults) → None

This method tries to render the result if the model could not be solved. It is intended to be called automatically by the agent.

Parameters:

model – Current model.
result – Result of the last solution attempt.

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any]

Reset the internal state of the environment and return info dictionary.

This private method initializes the internal self.state dictionary by reading the initial values from the pyomo model. It does not use the seed parameter, since the initial state is determined by the model configuration.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes observations from the pyomo model without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
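A subclass that adds randomized initial observations could derive them from the seed as sketched below. In a Gymnasium environment the generator would normally be self.np_random after calling super().reset(seed=seed); here the standard library random.Random stands in for it so the sketch runs on its own. The function add_random_initial_state and the observation names are hypothetical.

```python
from __future__ import annotations

import random
from typing import Any


def add_random_initial_state(state: dict[str, Any], seed: int | None = None) -> dict[str, Any]:
    """Hypothetical helper: extend the model-derived state with a seeded random observation."""
    rng = random.Random(seed)  # same seed -> same draws -> reproducible episodes
    new_state = dict(state)  # keep the observations read from the pyomo model unchanged
    new_state["T_ambient"] = rng.uniform(270.0, 300.0)  # illustrative randomized observation
    return new_state


base = {"storage.T": 330.0}
print(add_random_initial_state(base, seed=42) == add_random_initial_state(base, seed=42))  # True
```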

close() → None

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the MPC environment is to do nothing.

Retrieve the parameters for the named component and convert them into the pyomo dict format. If required, timeseries can be added to the parameters and re-indexed. Timeseries are handled by the pyo_convert_timeseries function.

Parameters:

component_name – Name of the component.
ts – Timeseries for the component.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.

Returns:

Pyomo parameter dictionary.

Convert timeseries data into the pyomo format. Data will be re-indexed if a new index is provided.

Parameters:

ts – Timeseries to convert.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.
component_name – Name of a specific component that the timeseries is used for. This limits which timeseries are returned.
_add_wrapping_none – Add a “None” indexed dictionary as the top level.

Returns:

Pyomo parameter dictionary.
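The target structure is the nested dictionary accepted by pyomo when instantiating an abstract model, where the optional top-level None key is the default namespace. The sketch below builds that structure from plain lists instead of the timeseries objects PyomoEnv works with. The function convert_timeseries, the prefix-based component filter ("storage." for component "storage"), and the relabeling interpretation of the new index are assumptions for illustration.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def convert_timeseries(
    ts: Mapping[str, Sequence[float]],
    index: Sequence[Any] | None = None,
    component_name: str | None = None,
    add_wrapping_none: bool = False,
) -> dict:
    """Hypothetical helper: build {name: {index: value}} dicts, optionally wrapped in {None: ...}."""
    converted: dict[str, dict[Any, float]] = {}
    for name, values in ts.items():
        # Assumed naming convention: series of a component are prefixed "<component>.".
        if component_name is not None and not name.startswith(f"{component_name}."):
            continue
        idx = list(index) if index is not None else list(range(len(values)))
        if len(idx) != len(values):
            raise ValueError(f"index length does not match timeseries {name!r}")
        converted[name] = dict(zip(idx, values))  # relabel values with the (new) index
    return {None: converted} if add_wrapping_none else converted


ts = {"storage.T_amb": [280.0, 281.5], "pump.price": [0.3, 0.35]}
print(convert_timeseries(ts, index=[1, 2], component_name="storage", add_wrapping_none=True))
# {None: {'storage.T_amb': {1: 280.0, 2: 281.5}}}
```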

pyo_update_params(updated_params: MutableMapping[str | None, Any], nonindex_param_append_string: str | None = None) → None

Update model parameters and indexed parameters of a pyomo instance with values given in a dictionary. It assumes that the dictionary supplied in updated_params has the correct pyomo format.

Parameters:

updated_params – Dictionary with the updated values.
nonindex_param_append_string – String to be appended to values which are not indexed. This can be used if indexed parameters need to be set with values that do not have an index.

Returns:

None. The pyomo instance is updated in place.

pyo_get_solution(names: set[str] | None = None) → dict[str, float | int | dict[int, float | int]]

Convert the pyomo solution into a more usable format for plotting.

Parameters:: names – Names of the model parameters that are returned.
Returns:: Dictionary of {parameter name: value} pairs. Value may be a dictionary of {time: value} pairs which contains one value for each optimization time step.
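Since values in the returned dictionary are either scalars or {time: value} mappings, a consumer typically separates the two before plotting. This sketch works on a dictionary of that shape; the function split_solution and the example names and values are hypothetical.

```python
from __future__ import annotations

from typing import Any


def split_solution(solution: dict[str, Any]) -> tuple[dict[str, Any], dict[str, tuple[list, list]]]:
    """Hypothetical helper: separate scalar values from {time: value} timeseries."""
    scalars: dict[str, Any] = {}
    series: dict[str, tuple[list, list]] = {}
    for name, value in solution.items():
        if isinstance(value, dict):
            times = sorted(value)  # order the optimization time steps
            series[name] = (times, [value[t] for t in times])
        else:
            scalars[name] = value
    return scalars, series


solution = {"cost": 12.5, "heater.P": {900: 200.0, 0: 150.0}}
print(split_solution(solution))
# ({'cost': 12.5}, {'heater.P': ([0, 900], [150.0, 200.0])})
```

Each (times, values) pair can then be passed directly to a plotting function that takes x and y sequences.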

pyo_get_component_value(component: Component, *, at: int = 1, allow_stale: bool = False) → float | int | None

classmethod create_state(model: pyo.ConcreteModel, model_name: str, output_dir: pathlib.Path | str | None = None) → None

Create both state config and parameters files from a Pyomo model.

This method creates both a state configuration TOML file (containing variables/observations) and a parameters TOML file from a Pyomo ConcreteModel, providing a complete setup for Pyomo-based environments.

Parameters:

model – Pyomo ConcreteModel instance.
model_name – Name of the model for identification.
output_dir – Directory where files should be created. If None, uses current working directory.