fragile.core.env#
This module contains the PlangymEnv and Function classes.
Module Contents#
Classes#
PlangymEnv – Wrapper class for running a Swarm to solve planning tasks in a Plangym environment.
Function – Environment that represents an arbitrary mathematical function bounded in a given interval.
- class fragile.core.env.PlangymEnv(plangym_env, swarm=None)[source]#
Bases:
fragile.core.api_classes.EnvironmentAPI
Wrapper class for running a Swarm to solve planning tasks in a Plangym environment.
This class allows running gymnasium simulations compatible with the plangym API. It can be used as an interface for passing states and actions, and receiving observation and reward information from the environment.
- Parameters
plangym_env (plangym.core.PlangymEnv) –
swarm (Optional[fragile.core.api_classes.SwarmAPI]) –
- plangym_env#
The Plangym environment this instance is wrapped around.
- Type
Env
- Return type
plangym.core.PlangymEnv
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
- property states_shape#
Returns the shape of the states tensor.
- Returns
The shape of the states tensor.
- Return type
tuple
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.states_shape
(4,)
- property states_dtype#
Returns the data type of the states tensor.
- Returns
The data type of the states tensor.
- Return type
judo.dtype
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.states_dtype
dtype('float64')
- property plangym_env#
Returns the underlying Plangym environment.
- Returns
The underlying plangym.PlangymEnv environment.
- Return type
plangym.PlangymEnv
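Examples
A minimal sketch, assuming the wrapper stores the given instance unchanged:
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.plangym_env is plangym_env
True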
- property inputs#
Returns a dictionary of input data for the environment.
- Returns
A dictionary of input data for the environment.
- Return type
InputDict
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> env.inputs
{'actions': {}, 'states': {'clone': True}, 'dt': {'optional': True, 'default': 1}}
- property outputs#
Returns a tuple of output variables for the environment.
- Returns
A tuple of output variables for the environment.
- Return type
Tuple[str, ...]
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> tuple(sorted(env.outputs))
('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')
- property param_dict#
Returns a dictionary of parameters for the environment.
- Returns
A dictionary of parameters for the environment.
- Return type
StateDict
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> print(env.param_dict)
{'observs': {'shape': (4,), 'dtype': dtype('float32')}, 'rewards': {'dtype': <class 'numpy.float32'>}, 'oobs': {'dtype': <class 'numpy.bool_'>}, 'actions': {'shape': (), 'dtype': dtype('int64')}, 'n_steps': {'dtype': <class 'numpy.int32'>}, 'infos': {'shape': None, 'dtype': <class 'dict'>}, 'states': {'shape': (4,), 'dtype': dtype('float64')}}
- property has_rgb#
Return whether the environment includes RGB color data or not.
- Returns
True if the environment includes RGB color data, False otherwise.
- Return type
bool
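Examples
A minimal sketch; the exact value depends on the wrapped environment's render settings:
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> isinstance(env.has_rgb, bool)
True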
- step(actions, states, dt=1)[source]#
Takes an action in the environment and returns information about the new state.
- Parameters
actions (Tensor) – The actions to take. Shape should be (n_walkers,) + actions_shape.
states (Tensor) – The states to act on. Shape should be (n_walkers,) + state_shape.
dt (int, optional) – The number of simulation steps to take per gym step. Defaults to 1.
- Returns
- A dictionary containing the following keys:
- observs (Tensor): The observations from the last step.
Has shape (n_walkers,) + observation_shape.
- rewards (Tensor): The rewards received from the last step.
Has shape (n_walkers,).
- oobs (List[Any]): List of episode endings from the last step.
Length is n_walkers.
- infos (List[Dict[str, Any]]): Additional information about the last step.
Length is n_walkers.
- n_steps (List[int]): The number of simulation steps taken in the last step.
Length is n_walkers.
- states (Tensor): The new states after taking the given actions.
Has shape (n_walkers,) + state_shape.
- terminals (List[bool], optional): List of raw terminal values if available.
Length is n_walkers. Only returned if the environment has terminal signals.
- rgb (Tensor, optional): The rendered RGB output of the last step. Only returned
by some environments. Has shape (n_walkers, h, w, c).
- Return type
Dict[str, Any]
Examples
>>> import numpy as np
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> env = PlangymEnv(plangym_env)
>>> n_walkers = 3
>>> actions = np.array([plangym_env.sample_action() for i in range(n_walkers)])
>>> state_dict = env.reset(n_walkers=n_walkers, inplace=False)
>>> result = env.step(actions, state_dict["states"])
>>> type(result)
<class 'dict'>
>>> tuple(sorted(result.keys()))
('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')
- reset(inplace=True, root_walker=None, states=None, n_walkers=None, **kwargs)[source]#
Resets the environment(s) to their initial state.
This method resets the states and observables of the environment(s) stored in the object’s swarm.state attribute to their initial values.
- Parameters
inplace (bool) – If True, updates the current instance state with the reset values. If False, returns a new dictionary with the states, observables and info dicts.
root_walker (Optional[StateData]) – The state information to reset from, if not using default initial state. Defaults to None.
states (Optional[StateData]) – The states to use as initial values. If provided, root_walker is ignored. Defaults to None.
n_walkers (Optional[int]) – The number of walkers to reset. Defaults to None.
**kwargs – Other parameters that might be necessary depending on the specific implementation of the class.
- Returns
A StateDict containing the states, observables, and info dict after the reset. Only returned when inplace is False.
- Return type
Optional[StateDict]
Examples
>>> import plangym
>>> plangym_env = plangym.make("CartPole-v0")
>>> n_walkers = 3
>>> # Reset environment and update the state of the swarm:
>>> env = PlangymEnv(plangym_env, swarm=None)
>>> env.reset(n_walkers=n_walkers)  # Fails because there is no swarm
Traceback (most recent call last):
    ...
AttributeError: 'NoneType' object has no attribute 'state'
>>> # Get reset data without modifying current instance:
>>> env = PlangymEnv(plangym_env)
>>> reset_data = env.reset(n_walkers=n_walkers, inplace=False)
>>> tuple(sorted(reset_data.keys()))
('infos', 'observs', 'rewards', 'states')
- class fragile.core.env.Function(function, bounds, custom_domain_check=None, actions_as_perturbations=True, start_same_pos=False, x0=None)[source]#
Bases:
fragile.core.api_classes.EnvironmentAPI
Environment that represents an arbitrary mathematical function bounded in a given interval.
- Parameters
function (Callable[[judo.typing.Tensor], judo.typing.Tensor]) –
bounds (Union[judo.Bounds, gym.spaces.box.Box]) –
custom_domain_check (Callable[[judo.typing.Tensor, judo.typing.Tensor, int], judo.typing.Tensor]) –
actions_as_perturbations (bool) –
start_same_pos (bool) –
x0 (fragile.core.typing.Tensor) –
- default_inputs#
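Examples
A minimal sketch of building a Function, assuming judo.Bounds accepts scalar low/high values together with a shape; the sphere function is a hypothetical example:
>>> import numpy as np
>>> from judo import Bounds
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> bounds = Bounds(low=-10, high=10, shape=(2,))
>>> env = Function(function=sphere, bounds=bounds)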
- property action_space#
Action space with the same characteristics as self.bounds.
- Return type
gym.spaces.box.Box
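Examples
A minimal sketch: for a two-dimensional domain the action space is expected to be a gym Box with a matching shape (the sphere function is a hypothetical example):
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> env.action_space.shape
(2,)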
- classmethod from_bounds_params(function, shape=None, high=numpy.inf, low=numpy.NINF, custom_domain_check=None)[source]#
Initialize a function defining its shape and bounds without using a Bounds.
- Parameters
function (Callable) – Callable that takes a batch of vectors (batched across the first dimension of the array) and returns a vector of typing.Scalar. This function is applied to a batch of walker observations.
shape (tuple) – Input shape of the solution vector without taking into account the batch dimension. For example, a two-dimensional function applied to a batch of 5 walkers will have shape=(2,), even though the observations will have shape (5, 2).
high (Union[int, float, judo.typing.Tensor]) – Upper bound of the function domain. If it’s a typing.Scalar it will be the same for all dimensions. If it’s a numpy array it will be the upper bound for each dimension.
low (Union[int, float, judo.typing.Tensor]) – Lower bound of the function domain. If it’s a typing.Scalar it will be the same for all dimensions. If it’s a numpy array it will be the lower bound for each dimension.
custom_domain_check (Callable[[judo.typing.Tensor], judo.typing.Tensor]) – Callable that checks points inside the bounds to know if they are in a custom domain when it is not just a set of rectangular bounds.
- Returns
Function with its Bounds created from the provided arguments.
- Return type
Function
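Examples
A minimal sketch using the documented arguments; the sphere function is a hypothetical example:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> isinstance(env, Function)
True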
- step(actions, observs, **kwargs)[source]#
Add the actions to the observations to obtain the new points, then evaluate the reward and boundary conditions.
- Returns
Dictionary containing the information of the new points evaluated.
{"states": new_points, "observs": new_points, "rewards": typing.Scalar array, "oobs": boolean array}
- Return type
fragile.core.typing.StateData
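Examples
A minimal sketch with a batch of 3 walkers and a hypothetical sphere function; the returned keys follow the StateData layout described above:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> actions = np.zeros((3, 2))
>>> observs = np.zeros((3, 2))
>>> result = env.step(actions, observs)
>>> tuple(sorted(result.keys()))
('observs', 'oobs', 'rewards', 'states')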
- reset(inplace=True, root_walker=None, states=None, **kwargs)[source]#
Reset the Function to the start of a new episode. When inplace is False, returns a StateData dictionary describing its internal state.
- Parameters
inplace (bool) –
root_walker (Optional[fragile.core.typing.StateData]) –
states (Optional[fragile.core.typing.StateData]) –
- Return type
Union[None, fragile.core.typing.StateData]
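Examples
A minimal sketch, assuming n_walkers is forwarded through **kwargs as in PlangymEnv.reset and that observations use the default numpy backend:
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> state_data = env.reset(n_walkers=3, inplace=False)
>>> state_data["observs"].shape
(3, 2)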
- calculate_oobs(points, rewards)[source]#
Determine whether each vector in a given batch lies inside the function domain.
- Parameters
points (judo.typing.Tensor) – Array of batched vectors that will be checked to lie inside the Function bounds.
rewards (judo.typing.Tensor) – Array containing the rewards of the current walkers.
- Returns
Array of booleans of length batch_size (points.shape[0]) that will be True if a given point of the batch lies outside the bounds, and False otherwise.
- Return type
judo.typing.Tensor
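Examples
A minimal sketch with a hypothetical sphere function: the second point lies outside the [-10, 10] bounds, so its oobs flag should be True (assuming the default numpy backend):
>>> import numpy as np
>>> def sphere(x):
...     return -np.sum(x ** 2, axis=1)
>>> env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)
>>> points = np.array([[0.0, 0.0], [100.0, 100.0]])
>>> rewards = np.zeros(2)
>>> oobs = env.calculate_oobs(points, rewards)
>>> [bool(x) for x in oobs]
[False, True]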