fragile.core.env#
This module contains the PlangymEnv and Function classes.
Module Contents#
Classes#
PlangymEnv – Wrapper class for running a Swarm to solve planning tasks in a Plangym environment.
Function – Environment that represents an arbitrary mathematical function bounded in a given interval.
- class fragile.core.env.PlangymEnv(plangym_env, swarm=None)[source]#
Bases:
fragile.core.api_classes.EnvironmentAPI
Wrapper class for running a Swarm to solve planning tasks in a Plangym environment.
This class allows running gymnasium simulations compatible with the plangym API. It can be used as an interface for passing states and actions, and receiving observation and reward information from the environment.
- Parameters
plangym_env (plangym.core.PlangymEnv) –
swarm (Optional[fragile.core.api_classes.SwarmAPI]) –
- plangym_env#
The Plangym environment this instance is wrapped around.
- Type
Env
- Return type
plangym.core.PlangymEnv
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
- property states_shape#
Returns the shape of the states tensor.
- Returns
The shape of the states tensor.
- Return type
tuple
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.states_shape
(4,)
- property states_dtype#
Returns the data type of the states tensor.
- Returns
The data type of the states tensor.
- Return type
judo.dtype
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.states_dtype
dtype('float64')
- property plangym_env#
Returns the underlying Plangym environment.
- Returns
The underlying plangym.PlangymEnv environment.
- Return type
plangym.PlangymEnv
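Examples
A minimal sketch, assuming the wrapper stores the given instance unchanged:
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.plangym_env is plangym_env
True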
- property inputs#
Returns a dictionary of input data for the environment.
- Returns
A dictionary of input data for the environment.
- Return type
InputDict
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.inputs
{'actions': {}, 'states': {'clone': True}, 'dt': {'optional': True, 'default': 1}}
- property outputs#
Returns a tuple of output variables for the environment.
- Returns
A tuple of output variables for the environment.
- Return type
Tuple[str, ...]
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> tuple(sorted(env.outputs))
('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')
- property param_dict#
Returns a dictionary of parameters for the environment.
- Returns
A dictionary of parameters for the environment.
- Return type
StateDict
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> print(env.param_dict)
{'observs': {'shape': (4,), 'dtype': dtype('float32')}, 'rewards': {'dtype': <class 'numpy.float32'>}, 'oobs': {'dtype': <class 'numpy.bool_'>}, 'actions': {'shape': (), 'dtype': dtype('int64')}, 'n_steps': {'dtype': <class 'numpy.int32'>}, 'infos': {'shape': None, 'dtype': <class 'dict'>}, 'states': {'shape': (4,), 'dtype': dtype('float64')}}
- property has_rgb#
Return whether the environment includes RGB color data or not.
- Returns
True if the environment includes RGB color data, False otherwise.
- Return type
bool
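Examples
A minimal sketch; the exact value depends on the wrapped environment's render settings:
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> isinstance(env.has_rgb, bool)
True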
- step(actions, states, dt=1)[source]#
Takes an action in the environment and returns information about the new state.
- Parameters
actions (Tensor) – The actions to take. Shape should be (n_walkers,) + actions_shape.
states (Tensor) – The states to act on. Shape should be (n_walkers,) + state_shape.
dt (int, optional) – The number of simulation steps to take per gym step. Defaults to 1.
- Returns
- A dictionary containing the following keys:
- observs (Tensor): The observations from the last step.
Has shape (n_walkers,) + observation_shape.
- rewards (Tensor): The rewards received from the last step.
Has shape (n_walkers,).
- oobs (List[Any]): List of episode endings from the last step.
Length is n_walkers.
- infos (List[Dict[str, Any]]): Additional information about the last step.
Length is n_walkers.
- n_steps (List[int]): The number of simulation steps taken in the last step.
Length is n_walkers.
- states (Tensor): The new states after taking the given actions.
Has shape (n_walkers,) + state_shape.
- terminals (List[bool], optional): List of raw terminal values if available.
Length is n_walkers. Only returned if the environment has terminal signals.
- rgb (Tensor, optional): The rendered RGB output of the last step. Only returned
by some environments. Has shape (n_walkers, h, w, c).
- Return type
Dict[str, Any]
Examples
>>> import numpy as np
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> n_walkers = 3
>>> actions = np.array([plangym_env.sample_action() for i in range(n_walkers)])
>>> state_dict = env.reset(n_walkers=n_walkers, inplace=False)
>>> result = env.step(actions, state_dict["states"])
>>> type(result)
<class 'dict'>
>>> tuple(sorted(result.keys()))
('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')
- reset(inplace=True, root_walker=None, states=None, n_walkers=None, **kwargs)[source]#
Resets the environment(s) to their initial state.
This method resets the states and observables of the environment(s) stored in the object’s swarm.state attribute to their initial values.
- Parameters
inplace (bool) – If True, updates the current instance state with the reset values. If False, returns a new dictionary with the states, observables and info dicts.
root_walker (Optional[StateData]) – The state information to reset from, if not using default initial state. Defaults to None.
states (Optional[StateData]) – The states to use as initial values. If provided, root_walker is ignored. Defaults to None.
n_walkers (Optional[int]) – The number of walkers to reset. Defaults to None.
**kwargs – Other parameters that might be necessary depending on the specific implementation of the class.
- Returns
A StateDict containing the states, observables, and info dict after the reset. Only returned when inplace is False.
- Return type
Optional[StateDict]
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> n_walkers = 3
>>> # Reset environment and update the state of the swarm:
>>> env = PlangymEnv(plangym_env, swarm=None)
>>> env.reset(n_walkers=n_walkers)  # Fails because there is no swarm
Traceback (most recent call last):
    ...
AttributeError: 'NoneType' object has no attribute 'state'
>>> # Get reset data without modifying current instance:
>>> env = PlangymEnv(plangym_env)
>>> reset_data = env.reset(n_walkers=n_walkers, inplace=False)
>>> tuple(sorted(reset_data.keys()))
('infos', 'observs', 'rewards', 'states')
- class fragile.core.env.Function(function, bounds, custom_domain_check=None, actions_as_perturbations=True, start_same_pos=False, x0=None)[source]#
Bases:
fragile.core.api_classes.EnvironmentAPI
Environment that represents an arbitrary mathematical function bounded in a given interval.
- Parameters
function (Callable[[judo.typing.Tensor], judo.typing.Tensor]) –
bounds (Union[judo.Bounds, gym.spaces.box.Box]) –
custom_domain_check (Callable[[judo.typing.Tensor, judo.typing.Tensor, int], judo.typing.Tensor]) –
actions_as_perturbations (bool) –
start_same_pos (bool) –
x0 (fragile.core.typing.Tensor) –
- default_inputs#
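Examples
A minimal sketch of building a Function, assuming judo.Bounds accepts scalar low/high values together with a shape; the sphere function is a hypothetical example:
>>> import numpy as np
>>> from judo import Bounds
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> bounds = Bounds(low=-10, high=10, shape=(2,))
>>> env = Function(function=sphere, bounds=bounds)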
- property action_space#
Action space with the same characteristics as self.bounds.
- Return type
gym.spaces.box.Box
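Examples
A minimal sketch: for a two-dimensional domain the action space is expected to be a gym Box with a matching shape (the sphere function is a hypothetical example):
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> env.action_space.shape
(2,)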
- classmethod from_bounds_params(function, shape=None, high=numpy.inf, low=numpy.NINF, custom_domain_check=None)[source]#
Initialize a function defining its shape and bounds without using a Bounds.
- Parameters
function (Callable) – Callable that takes a batch of vectors (batched across the first dimension of the array) and returns a vector of typing.Scalar. This function is applied to a batch of walker observations.
shape (tuple) – Input shape of the solution vector without taking into account the batch dimension. For example, a two-dimensional function applied to a batch of 5 walkers will have shape=(2,), even though the observations will have shape (5, 2).
high (Union[int, float, judo.typing.Tensor]) – Upper bound of the function domain. If it’s a typing.Scalar it will be the same for all dimensions. If it’s a numpy array it will be the upper bound for each dimension.
low (Union[int, float, judo.typing.Tensor]) – Lower bound of the function domain. If it’s a typing.Scalar it will be the same for all dimensions. If it’s a numpy array it will be the lower bound for each dimension.
custom_domain_check (Callable[[judo.typing.Tensor], judo.typing.Tensor]) – Callable that checks points inside the bounds to know if they are in a custom domain when it is not just a set of rectangular bounds.
- Returns
Function with its Bounds created from the provided arguments.
- Return type
Function
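Examples
A minimal sketch using the documented arguments; the sphere function is a hypothetical example:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> isinstance(env, Function)
True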
- step(actions, observs, **kwargs)[source]#
Add the actions to the observations to obtain the new points, then evaluate the reward and boundary conditions.
- Returns
Dictionary containing the information of the new points evaluated.
{"states": new_points, "observs": new_points, "rewards": typing.Scalar array, "oobs": boolean array}
- Return type
fragile.core.typing.StateData
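Examples
A minimal sketch with a batch of 3 walkers and a hypothetical sphere function; the returned keys follow the StateData layout described above:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> actions = np.zeros((3, 2))
>>> observs = np.zeros((3, 2))
>>> result = env.step(actions, observs)
>>> tuple(sorted(result.keys()))
('observs', 'oobs', 'rewards', 'states')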
- reset(inplace=True, root_walker=None, states=None, **kwargs)[source]#
Reset the Function to the start of a new episode. When inplace is False, returns a StateData dictionary describing its internal state.
- Parameters
inplace (bool) –
root_walker (Optional[fragile.core.typing.StateData]) –
states (Optional[fragile.core.typing.StateData]) –
- Return type
Union[None, fragile.core.typing.StateData]
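Examples
A minimal sketch, assuming n_walkers is forwarded through **kwargs as in PlangymEnv.reset and that observations use the default numpy backend:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> state_data = env.reset(n_walkers=3, inplace=False)
>>> state_data["observs"].shape
(3, 2)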
- calculate_oobs(points, rewards)[source]#
Determine whether each vector in a given batch lies inside the function domain.
- Parameters
points (judo.typing.Tensor) – Array of batched vectors that will be checked to lie inside the Function bounds.
rewards (judo.typing.Tensor) – Array containing the rewards of the current walkers.
- Returns
Array of booleans of length batch_size (points.shape[0]) that will be True if a given point of the batch lies outside the bounds, and False otherwise.
- Return type
judo.typing.Tensor
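Examples
A minimal sketch with a hypothetical sphere function: the second point lies outside the [-10, 10] bounds, so its oobs flag should be True (assuming the default numpy backend):
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> points = np.array([[0.0, 0.0], [100.0, 100.0]])
>>> rewards = np.zeros(2)
>>> oobs = env.calculate_oobs(points, rewards)
>>> [bool(x) for x in oobs]
[False, True]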