eta_ctrl.envs.live_env module

class eta_ctrl.envs.live_env.LiveEnv(env_id: int, config_run: ConfigRun, verbose: int = 2, callback: Callable | None = None, *, episode_duration: TimeStep | str, sampling_time: TimeStep | str, max_errors: int = 10, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for Live environments.

The class will create an ETA Nexus ConnectionManager instance and provide facilities to automatically read step results and reset the connection.

In addition to the class attributes required by BaseEnv, LiveEnv requires the name of the connection manager configuration file as a class attribute:

  • config_name: Name of the connection manager configuration.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – Callback to be called after each episode.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • max_errors – Maximum number of connection errors before interrupting the optimization process.

  • render_mode – Renders the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array”, and “ansi” for text.

  • kwargs – Other keyword arguments (for subclasses).

abstract property config_name: str

Name of the connection manager configuration.

Needs to be implemented for each subclass as a class attribute.
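As a minimal sketch of what this looks like in practice, a subclass can satisfy the abstract property with a plain class attribute. FakeLiveEnv below is a stand-in for LiveEnv so that the snippet runs without eta_ctrl installed, and the configuration name "my_connections" is hypothetical.

```python
from abc import ABC, abstractmethod


class FakeLiveEnv(ABC):
    """Stand-in for LiveEnv (assumption): models only the abstract config_name."""

    @property
    @abstractmethod
    def config_name(self) -> str:
        """Name of the connection manager configuration."""


class MyLiveEnv(FakeLiveEnv):
    # A plain class attribute overrides the abstract property,
    # so the subclass becomes instantiable.
    config_name = "my_connections"  # hypothetical configuration name


env = MyLiveEnv()
print(env.config_name)
```

Without the class attribute, instantiating the subclass would raise a TypeError, which is how a missing configuration name is caught early.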

connection_manager: ConnectionManager

Instance of the Live Connector.

connection_manager_config: Path | Sequence[Path] | dict[str, Any] | None

Path or Dict to initialize the live connector.

max_error_count: int

Maximum error count when connections in live connector are aborted.

_step() → tuple[float, bool, bool, dict][source]

Perform one internal time step and return core step results.

This is called for every event or for every time step during the simulation/optimization run. It should use the actions supplied by the agent, which are available in the state dictionary, to determine the new state of the environment.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.
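The extension pattern described in this note could be sketched as follows. FakeLiveEnv is a stand-in for LiveEnv (an assumption, so the snippet runs on its own), and the "temperature" state key, the setpoint attribute, and the reward formula are all hypothetical.

```python
from __future__ import annotations

from typing import Any


class FakeLiveEnv:
    """Stand-in for LiveEnv (assumption): _step always returns a reward of 0."""

    def __init__(self) -> None:
        # Hypothetical state entry; a real environment fills this from its connections.
        self.state: dict[str, float] = {"temperature": 21.5}

    def _step(self) -> tuple[float, bool, bool, dict[str, Any]]:
        return 0.0, False, False, {}


class RewardingEnv(FakeLiveEnv):
    setpoint = 20.0  # hypothetical target value

    def _step(self) -> tuple[float, bool, bool, dict[str, Any]]:
        # Any action manipulation would go here, before the base call.
        _, terminated, truncated, info = super()._step()
        # Observations and rewards are adjusted after the base call.
        reward = -abs(self.state["temperature"] - self.setpoint)
        return reward, terminated, truncated, info


result = RewardingEnv()._step()
print(result)
```

The base reward is discarded on purpose, since it is always 0 and carries no information.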

Returns:

A tuple containing:

  • reward: The value of the reward function. This is just one floating point value.

  • terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava from the Sutton and Barto Gridworld. If true, the Vectorizer will call reset().

  • truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied (i.e. the episode ended). Typically, this is a time limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the Vectorizer will call reset().

  • info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

Note

Stable Baselines3 combines terminated and truncated with a logical OR to trigger the automatic environment reset. Implement both flags for compatibility.
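To illustrate the logical OR mentioned in this note, the sketch below counts how many steps run before either flag would request a reset. It is an illustration of the rule only, not Stable Baselines3 code; steps_until_reset and the sample episode data are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any

StepResult = tuple[float, bool, bool, dict[str, Any]]


def steps_until_reset(results: Iterable[StepResult]) -> int:
    """Count the steps taken until either flag would trigger a reset."""
    count = 0
    for _reward, terminated, truncated, _info in results:
        count += 1
        # Either flag alone is enough to end the episode.
        if terminated or truncated:
            break
    return count


# Hypothetical step results: the second step is truncated.
episode: list[StepResult] = [
    (0.0, False, False, {}),
    (0.0, False, True, {}),
    (0.0, False, False, {}),
]
steps = steps_until_reset(episode)
print(steps)
```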

_reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → dict[str, Any][source]

Reset the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

Parameters:
  • seed – The seed for initializing any randomized components of the state. Subclasses should use this for reproducible randomness in their state initialization.

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Info dictionary containing information about the initial state. The initial observations are automatically filtered from the internal state by the public reset method and must not be returned here.

Note

The base implementation initializes external outputs from the live connector without using the seed. Subclasses should use the seed parameter for any additional randomized state observations they implement.
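One way a subclass might use the seed for its own randomized state is sketched below. FakeLiveEnv is a stand-in for LiveEnv (an assumption), the "disturbance" state key is hypothetical, and the standard library's random.Random is used here only to keep the snippet dependency-free; a real environment may use a different generator.

```python
from __future__ import annotations

import random
from typing import Any


class FakeLiveEnv:
    """Stand-in for LiveEnv (assumption): _reset ignores the seed and returns info."""

    def __init__(self) -> None:
        self.state: dict[str, float] = {}

    def _reset(
        self, *, seed: int | None = None, options: dict[str, Any] | None = None
    ) -> dict[str, Any]:
        return {}


class SeededEnv(FakeLiveEnv):
    def __init__(self) -> None:
        super().__init__()
        self._rng = random.Random()

    def _reset(
        self, *, seed: int | None = None, options: dict[str, Any] | None = None
    ) -> dict[str, Any]:
        info = super()._reset(seed=seed, options=options)
        # Re-seed only when a seed is given; with seed=None the RNG is kept.
        if seed is not None:
            self._rng = random.Random(seed)
        # Hypothetical randomized state component.
        self.state["disturbance"] = self._rng.uniform(-1.0, 1.0)
        # Only the info dictionary is returned, not the observations.
        return info


env = SeededEnv()
env._reset(seed=42)
first = env.state["disturbance"]
env._reset(seed=42)
second = env.state["disturbance"]
print(first == second)
```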

close() → None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the connection_manager environment is to do nothing.
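Because close() should always be called when a run is finished, one option is to wrap the environment in the standard library's contextlib.closing, which calls close() even if the run raises an exception. FakeEnv below is a hypothetical stand-in that merely records whether close() was called.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for an environment with a close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real environment would release its resources here.
        self.closed = True


env = FakeEnv()
with closing(env):
    pass  # the run (reset and step calls) would happen here

print(env.closed)
```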