:py:mod:`fragile.core.env`
==========================

.. py:module:: fragile.core.env

.. autoapi-nested-parse::

   This module contains the :class:`PlangymEnv` class.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   fragile.core.env.PlangymEnv
   fragile.core.env.Function


.. py:class:: PlangymEnv(plangym_env, swarm = None)

   Bases: :py:obj:`fragile.core.api_classes.EnvironmentAPI`

   Wrapper class for running a Swarm to solve planning tasks in a Plangym environment.

   This class runs gymnasium simulations that are compatible with the plangym API.
   It can be used as an interface for passing states and actions, and for receiving
   observation and reward information from the environment.

   .. attribute:: plangym_env

      The Plangym environment this instance wraps.

      :type: Env

   .. attribute:: _has_terminals

      ``True`` if the environment has terminal signals, ``False`` otherwise.

      :type: bool

   .. attribute:: _has_rgb

      ``True`` if the environment includes RGB color data, ``False`` otherwise.

      :type: bool

   .. rubric:: Examples

   >>> import plangym
   >>> plangym_env = plangym.make("CartPole-v0")
   >>> env = PlangymEnv(plangym_env)

   .. py:method:: states_shape()
      :property:

      Return the shape of the states tensor.

      :returns: The shape of the states tensor.
      :rtype: tuple

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> env.states_shape
      (4,)

   .. py:method:: states_dtype()
      :property:

      Return the data type of the states tensor.

      :returns: The data type of the states tensor.
      :rtype: judo.dtype

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> env.states_dtype
      dtype('float64')

   .. py:method:: plangym_env()
      :property:

      Return the underlying Plangym environment.

      :returns: The underlying :class:`plangym.PlangymEnv` environment.
      :rtype: plangym.PlangymEnv

   .. py:method:: inputs()
      :property:

      Return a dictionary of input data for the environment.

      :returns: A dictionary of input data for the environment.
      :rtype: InputDict

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> env.inputs  # doctest: +NORMALIZE_WHITESPACE
      {'actions': {},
       'states': {'clone': True},
       'dt': {'optional': True, 'default': 1}}

   .. py:method:: outputs()
      :property:

      Return a tuple of output variables for the environment.

      :returns: A tuple of output variables for the environment.
      :rtype: Tuple[str, ...]

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> tuple(sorted(env.outputs))
      ('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')

   .. py:method:: param_dict()
      :property:

      Return a dictionary of parameters for the environment.

      :returns: A dictionary of parameters for the environment.
      :rtype: StateDict

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> print(env.param_dict)  # doctest: +NORMALIZE_WHITESPACE
      {'observs': {'shape': (4,), 'dtype': dtype('float32')},
       'rewards': {'dtype': <class 'numpy.float32'>},
       'oobs': {'dtype': <class 'numpy.bool_'>},
       'actions': {'shape': (), 'dtype': dtype('int64')},
       'n_steps': {'dtype': <class 'numpy.int64'>},
       'infos': {'shape': None, 'dtype': <class 'dict'>},
       'states': {'shape': (4,), 'dtype': dtype('float64')}}

   .. py:method:: has_rgb()
      :property:

      Return whether the environment includes RGB color data.

      :returns: ``True`` if the environment includes RGB color data, ``False`` otherwise.
      :rtype: bool

   .. py:method:: __getattr__(item)
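
      .. rubric:: Examples

      A minimal sketch, assuming ``__getattr__`` forwards unknown attribute
      lookups to the wrapped plangym environment:

      .. code-block:: python

          import plangym

          from fragile.core.env import PlangymEnv

          plangym_env = plangym.make("CartPole-v0")
          env = PlangymEnv(plangym_env)

          # ``sample_action`` is not defined on PlangymEnv itself; the lookup
          # is assumed to resolve on the wrapped plangym environment instead.
          action = env.sample_action()
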
   .. py:method:: step(actions, states, dt = 1)

      Take an action in the environment and return information about the new state.

      :param actions: The actions to take. Shape should be (n_walkers,) + actions_shape.
      :type actions: Tensor
      :param states: The states to act on. Shape should be (n_walkers,) + state_shape.
      :type states: Tensor
      :param dt: The number of simulation steps to take per gym step. Defaults to 1.
      :type dt: int, optional

      :returns: A dictionary containing the following keys:

                - observs (Tensor): The observations from the last step.
                  Has shape (n_walkers,) + observation_shape.
                - rewards (Tensor): The rewards received from the last step.
                  Has shape (n_walkers,).
                - oobs (List[Any]): List of episode endings from the last step.
                  Length is n_walkers.
                - infos (List[Dict[str, Any]]): Additional information about the
                  last step. Length is n_walkers.
                - n_steps (List[int]): The number of simulation steps taken in the
                  last step. Length is n_walkers.
                - states (Tensor): The new states after taking the given actions.
                  Has shape (n_walkers,) + state_shape.
                - terminals (List[bool], optional): List of raw terminal values, if
                  available. Length is n_walkers. Only returned if the environment
                  has terminal signals.
                - rgb (Tensor, optional): The rendered RGB output of the last step.
                  Only returned by some environments. Has shape (n_walkers, h, w, c).
      :rtype: Dict[str, Any]

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> env = PlangymEnv(plangym_env)
      >>> n_walkers = 3
      >>> actions = np.array([plangym_env.sample_action() for i in range(n_walkers)])
      >>> state_dict = env.reset(n_walkers=n_walkers, inplace=False)
      >>> result = env.step(actions, state_dict["states"])
      >>> type(result)
      <class 'dict'>
      >>> tuple(sorted(result.keys()))
      ('infos', 'n_steps', 'observs', 'oobs', 'rewards', 'states')

   .. py:method:: reset(inplace = True, root_walker = None, states = None, n_walkers = None, **kwargs)

      Reset the environment(s) to an initial state.

      This method resets the states and observables of the environment(s) stored
      in the object's `swarm.state` attribute to their initial values.

      :param inplace: If True, updates the current instance state with the reset
                      values. If False, returns a new dictionary with the states,
                      observables, and info dicts.
      :type inplace: bool
      :param root_walker: The state information to reset from, if not using the
                          default initial state. Defaults to None.
      :type root_walker: Optional[StateData]
      :param states: The states to use as initial values. If provided,
                     `root_walker` is ignored. Defaults to None.
      :type states: Optional[StateData]
      :param n_walkers: The number of walkers to reset. Defaults to None.
      :type n_walkers: Optional[int]
      :param \*\*kwargs: Other parameters that might be necessary depending on the
                         specific implementation of the class.

      :returns: A StateDict containing the states, observables, and info dict
                after the reset. Only returned when `inplace` is False.
      :rtype: Optional[StateDict]

      .. rubric:: Examples

      >>> import plangym
      >>> plangym_env = plangym.make("CartPole-v0")
      >>> n_walkers = 3

      >>> # Reset environment and update the state of the swarm:
      >>> env = PlangymEnv(plangym_env, swarm=None)
      >>> env.reset(n_walkers=n_walkers)  # Fails because there is no swarm
      Traceback (most recent call last):
      ...
      AttributeError: 'NoneType' object has no attribute 'state'

      >>> # Get reset data without modifying current instance:
      >>> env = PlangymEnv(plangym_env)
      >>> reset_data = env.reset(n_walkers=n_walkers, inplace=False)
      >>> tuple(sorted(reset_data.keys()))
      ('infos', 'observs', 'rewards', 'states')
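
   As an end-to-end illustration, the following sketch chains :meth:`reset` and
   :meth:`step` into a short batched rollout. The loop is an assumed usage
   pattern, not part of the documented API:

   .. code-block:: python

       import numpy as np
       import plangym

       from fragile.core.env import PlangymEnv

       plangym_env = plangym.make("CartPole-v0")
       env = PlangymEnv(plangym_env)
       n_walkers = 3

       # Get an initial batch of states without touching any swarm state.
       state_dict = env.reset(n_walkers=n_walkers, inplace=False)
       states = state_dict["states"]

       for _ in range(5):
           # One action per walker, sampled from the wrapped environment.
           actions = np.array([plangym_env.sample_action() for _ in range(n_walkers)])
           result = env.step(actions, states)
           states = result["states"]  # feed the new states into the next step
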
.. py:class:: Function(function, bounds, custom_domain_check = None, actions_as_perturbations = True, start_same_pos = False, x0 = None)

   Bases: :py:obj:`fragile.core.api_classes.EnvironmentAPI`

   Environment that represents an arbitrary mathematical function bounded in a
   given interval.

   .. py:attribute:: default_inputs

   .. py:method:: n_dims()
      :property:

      Return the number of dimensions of the function to be optimized.

   .. py:method:: shape()
      :property:

      Return the shape of the environment.

   .. py:method:: action_space()
      :property:

      Action space with the same characteristics as ``self.bounds``.

   .. py:method:: from_bounds_params(function, shape = None, high = numpy.inf, low = numpy.NINF, custom_domain_check = None)
      :classmethod:

      Initialize a function, defining its shape and bounds without using a
      :class:`Bounds`.

      :param function: Callable that takes a batch of vectors (batched across the
                       first dimension of the array) and returns a vector of
                       typing.Scalar. This function is applied to a batch of
                       walker observations.
      :param shape: Input shape of the solution vector without taking into account
                    the batch dimension. For example, a two-dimensional function
                    applied to a batch of 5 walkers will have shape=(2,), even
                    though the observations will have shape (5, 2).
      :param high: Upper bound of the function domain. If it's a typing.Scalar, it
                   will be the same for all dimensions. If it's a numpy array, it
                   will be the upper bound for each dimension.
      :param low: Lower bound of the function domain. If it's a typing.Scalar, it
                  will be the same for all dimensions. If it's a numpy array, it
                  will be the lower bound for each dimension.
      :param custom_domain_check: Callable that checks points inside the bounds to
                                  know whether they are in a custom domain when it
                                  is not just a set of rectangular bounds.

      :returns: :class:`Function` with its :class:`Bounds` created from the
                provided arguments.

   .. py:method:: __repr__()

      Return repr(self).

   .. py:method:: step(actions, observs, **kwargs)

      Sum the target action to the observations to obtain the new points, and
      evaluate the reward and boundary conditions.

      :returns: Dictionary containing the information of the new points evaluated.

                ``{"states": new_points, "observs": new_points, "rewards": typing.Scalar array, "oobs": boolean array}``

   .. py:method:: reset(inplace = True, root_walker = None, states = None, **kwargs)

      Reset the :class:`Function` to the start of a new episode and return a
      :class:`StatesEnv` instance describing its internal state.

   .. py:method:: calculate_oobs(points, rewards)

      Determine whether a given batch of vectors lies inside the function domain.

      :param points: Array of batched vectors that will be checked to lie inside
                     the :class:`Function` bounds.
      :param rewards: Array containing the rewards of the current walkers.

      :returns: Array of booleans of length batch_size (points.shape[0]) that will
                be ``True`` if a given point of the batch lies outside the bounds,
                and ``False`` otherwise.

   .. py:method:: sample_action(batch_size)

      Return a matrix of points sampled uniformly from the :class:`Function`
      domain.

      :param batch_size: Number of points that will be sampled.

      :returns: Array containing ``batch_size`` points that lie inside the
                :class:`Function` domain, stacked across the first dimension.
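
   .. rubric:: Examples

   A minimal sketch of evaluating a two-dimensional sphere function with this
   environment. It assumes the defaults shown in the signatures above
   (``actions_as_perturbations=True``, so actions are added to the observations)
   and that the rewards are the raw function values:

   .. code-block:: python

       import numpy as np

       from fragile.core.env import Function

       def sphere(x: np.ndarray) -> np.ndarray:
           # Return one scalar per batched vector (one value per walker).
           return np.sum(x ** 2, axis=1)

       env = Function.from_bounds_params(function=sphere, shape=(2,), low=-10, high=10)

       # Sample batches of points inside the domain to use as starting
       # observations and as perturbations.
       observs = env.sample_action(batch_size=4)
       actions = env.sample_action(batch_size=4)

       # Actions are summed to the observations and the new points evaluated;
       # "oobs" flags points that left the domain.
       result = env.step(actions=actions, observs=observs)
       print(result["rewards"], result["oobs"])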