eta_ctrl.envs.pyomo_sim_env module

class eta_ctrl.envs.pyomo_sim_env.PyomoSimEnv(*args: Any, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Gymnasium environment that simulates state transitions using a Pyomo model without a solver.

Instead of optimizing over a full prediction horizon, PyomoSimEnv instantiates the model with prediction_horizon = sampling_time (i.e. one time step) and evaluates Pyomo Expression components to compute the next state.

The model must define a start_value_mapping that maps initial-condition Param names to their corresponding Expression names. Each step, the environment:

Fixes agent actions in the model at t=0.
Evaluates the mapped Expressions at t=1 to obtain the next state.
Updates the initial-condition Params via pyo_update_params() for the following step.

This allows reusing the same Pyomo model definition for both MPC optimization (with MpcAgent) and step-by-step simulation.
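
The per-step cycle described above (fix actions, evaluate the mapped expressions, update the initial-condition parameters) can be sketched without Pyomo. This is only an illustration of the control flow, not the library's implementation: plain dicts and callables stand in for Pyomo Param and Expression components, and the names T_start, T_next, heat and sim_step as well as the dynamics are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: in a real PyomoModel these would be Pyomo Param and
# Expression components; here they are plain dict entries and callables.
params: Dict[str, float] = {"T_start": 20.0}   # initial-condition parameter
actions: Dict[str, float] = {"heat": 0.0}      # agent action, fixed at t=0

# Expression for the state at t=1 (assumed first-order heating dynamics).
expressions: Dict[str, Callable[[], float]] = {
    "T_next": lambda: params["T_start"] + 0.5 * actions["heat"] - 1.0,
}

# Maps each initial-condition parameter name to the expression name that
# yields its value for the following step.
start_value_mapping = {"T_start": "T_next"}


def sim_step(action: Dict[str, float]) -> Dict[str, float]:
    # 1. Fix the agent actions at t=0.
    actions.update(action)
    # 2. Evaluate the mapped expressions at t=1 to obtain the next state.
    next_state = {
        param_name: expressions[expr_name]()
        for param_name, expr_name in start_value_mapping.items()
    }
    # 3. Write the results back into the initial-condition parameters.
    params.update(next_state)
    return next_state


state_1 = sim_step({"heat": 4.0})  # 20.0 + 2.0 - 1.0
state_2 = sim_step({"heat": 0.0})  # 21.0 + 0.0 - 1.0
```

No solver is involved at any point: each step is a direct evaluation of expressions whose inputs are already fixed.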

Parameters:

args – Positional arguments forwarded to BaseEnv.
kwargs – Keyword arguments forwarded to BaseEnv. May include model_parameters (dict) which is extracted and passed to the model constructor.
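
One way to picture the handling of model_parameters is the sketch below. It is an assumption about the general pattern, not the actual constructor code: split_model_parameters is a hypothetical helper, and capacity and verbose are made-up keys used only for illustration.

```python
from typing import Any, Dict, Tuple


def split_model_parameters(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate model_parameters from the other kwargs."""
    remaining = dict(kwargs)  # copy so the caller's dict is not mutated
    model_parameters = remaining.pop("model_parameters", None) or {}
    return model_parameters, remaining


# model_params would go to the model constructor, base_kwargs to BaseEnv.
model_params, base_kwargs = split_model_parameters(
    {"model_parameters": {"capacity": 5.0}, "verbose": 1}
)
```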

abstract property model_import: str: Dotted import path to the PyomoModel subclass.
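
A concrete subclass has to supply this property. The sketch below uses a stand-in base class so it runs without eta_ctrl installed. SimEnvSketch, TankEnv and load_class are hypothetical names, and resolving the path with importlib is an assumption about how a dotted path is typically turned into a class, not a description of the library's loader.

```python
import importlib
from abc import ABC, abstractmethod


class SimEnvSketch(ABC):
    """Stand-in for PyomoSimEnv; only the abstract property is modelled."""

    @property
    @abstractmethod
    def model_import(self) -> str:
        """Dotted import path to the model class."""


class TankEnv(SimEnvSketch):
    @property
    def model_import(self) -> str:
        # A stdlib class is used so the sketch runs anywhere; a real
        # environment would point at its own PyomoModel subclass.
        return "collections.OrderedDict"


def load_class(dotted_path: str) -> type:
    """Split 'package.module.ClassName' and import the class (assumed resolution)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


model_cls = load_class(TankEnv().model_import)
```

Because model_import is abstract, instantiating a subclass that does not define it raises a TypeError.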

_step() → tuple[float, bool, bool, dict][source]

Perform one internal time step and return core step results.

This private method implements the actual environment transition logic. It works with the internal self.state dictionary that already includes actions and returns the core step results without observations (which are handled by the public step method).

Returns:

A tuple containing:

reward (float): The value of the reward function, a single floating point value.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().
info (dict): Additional information about the state of the environment. Its contents may be used for logging in the future but typically do not currently serve a purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
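
The distinction between the two flags, and the way they are combined, can be sketched as follows. The thermal example, the 90.0 threshold and both function names are hypothetical and serve only to show one condition inside the MDP and one outside it.

```python
from typing import Tuple


def step_flags(temperature: float, n_steps: int, max_steps: int) -> Tuple[bool, bool]:
    """Hypothetical flag logic for a thermal environment."""
    terminated = temperature > 90.0   # terminal state inside the MDP (overheating)
    truncated = n_steps >= max_steps  # time limit, outside the MDP
    return terminated, truncated


def needs_reset(terminated: bool, truncated: bool) -> bool:
    # Either flag ends the episode, mirroring the logical OR described above.
    return terminated or truncated


flags_running = step_flags(temperature=60.0, n_steps=10, max_steps=100)
flags_timeout = step_flags(temperature=60.0, n_steps=100, max_steps=100)
flags_overheat = step_flags(temperature=95.0, n_steps=10, max_steps=100)
```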

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Reset the internal state of the environment and return info dictionary.

This private method initializes the internal self.state dictionary by reading initial parameter values from the PyomoModel. It does not use the seed parameter since the initial state is determined by the user configuration.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

The public reset method handles the Gymnasium interface including observation filtering and proper seeding mechanism.

Parameters:

seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes observations from the Pyomo model without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
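
A subclass that adds a randomized state entry might use the seed as sketched below. BaseSketch is a stand-in for the base implementation so the example runs without eta_ctrl. The state keys T_start and disturbance are hypothetical, and random.Random is just one possible seeded generator.

```python
import random
from typing import Any, Dict, Optional


class BaseSketch:
    """Stand-in for the base implementation: state comes from model parameters."""

    def __init__(self) -> None:
        self.state: Dict[str, float] = {}

    def _reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        self.state = {"T_start": 20.0}  # deterministic, the seed is not used
        return {}


class NoisyEnv(BaseSketch):
    def _reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        info = super()._reset(seed=seed, options=options)
        rng = random.Random(seed)  # seeded generator for reproducibility
        self.state["disturbance"] = rng.uniform(-1.0, 1.0)
        info["seed"] = seed  # info only; observations are not returned here
        return info


env_a, env_b = NoisyEnv(), NoisyEnv()
info_a = env_a._reset(seed=7)
info_b = env_b._reset(seed=7)
```

Resetting two instances with the same seed produces identical internal states, which is the reproducibility the seed parameter is meant to provide.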

close() → None[source]: Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

static create_state(model: pyo.ConcreteModel, model_name: str, output_dir: pathlib.Path | str | None = None) → None[source]

Create both state config and parameters files from a Pyomo model.

This method creates both a state configuration TOML file (containing variables/observations) and a parameters TOML file from a Pyomo ConcreteModel, providing a complete setup for Pyomo-based environments.

Parameters:

model – Pyomo ConcreteModel instance.
model_name – Name of the model for identification.
output_dir – Directory where files should be created. If None, uses current working directory.