fragile.optimize.utils#

Module Contents#

Classes#

GradFuncWrapper

Create a function-like object that combines provided lnp and grad(lnp)

Functions#

numerical_grad_single_vector(theta, f[, dx, order])

Return a numerical estimate of the local gradient.

numerical_grad(theta, f[, dx, order])

Return a numerical estimate of the local gradient.

leapfrog(init_pos, init_momentum, grad, step_size, ...)

Perform a leapfrog jump in Hamiltonian space.

fai_leapfrog(init_pos, init_momentum, grad, step_size, ...)

Perform a leapfrog jump in Hamiltonian space.

fragile.optimize.utils.numerical_grad_single_vector(theta, f, dx=0.001, order=1)[source]#

Return a numerical estimate of the local gradient. The gradient is computed using the Taylor expansion approximation along each dimension:

function(t + dt) = function(t) + dt df/dt(t) + dt^2/2 d^2f/dt^2(t) + …

The first order gives the forward formula:

df/dt = (function(t + dt) - function(t)) / dt + O(dt)

Note that we could also compute the backward version by subtracting dt instead:

df/dt = (function(t) - function(t - dt)) / dt + O(dt)

A better approach is a 3-step formula: evaluate the derivative on both sides of a chosen point t using the forward and backward two-step formulae above, then take the average. This requires the Taylor expansion to higher order:

function(t +/- dt) = function(t) +/- dt df/dt(t) + dt^2/2 d^2f/dt^2(t) +/- dt^3/6 d^3f/dt^3(t) + O(dt^4)

df/dt = (function(t + dt) - function(t - dt)) / (2 * dt) + O(dt^2)

Note that the error is now of the order of dt^2 instead of dt. In the same manner we can obtain the next order by also using function(t +/- 2 * dt):

df/dt = (function(t - 2 * dt) - 8 * function(t - dt) + 8 * function(t + dt) - function(t + 2 * dt)) / (12 * dt) + O(dt^4)

At the second order, two additional function evaluations are required per dimension, implying a more time-consuming algorithm. However, the approximation error is of the order of dt^4.

theta: ndarray[float, ndim=1]

vector value around which to estimate the gradient

f: callable

function from which to estimate the gradient

dx: float

perturbation to apply in each direction during the gradient estimation

order: int in [1, 2]

order of the estimates: 1 uses the central average over 2 points; 2 uses the central average over 4 points

df: ndarray[float, ndim=1]

gradient vector estimated at theta

COST: the gradient estimation needs to evaluate ndim * (2 * order) points (see above). CAUTION: if dx is very small, the limited numerical precision can result in large errors.
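The first-order (order=1) scheme above can be sketched as a plain central-difference loop. This is an illustrative standalone helper, not the library's implementation; the name central_diff_grad is hypothetical:

```python
import numpy as np

def central_diff_grad(theta, f, dx=1e-3):
    """Estimate the gradient of f at theta with the 2-point central formula.

    Hypothetical helper illustrating the order=1 scheme; it costs
    2 * ndim evaluations of f.
    """
    theta = np.asarray(theta, dtype=float)
    df = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = dx
        # df/dt_i = (f(t + dt) - f(t - dt)) / (2 * dt) + O(dt^2)
        df[i] = (f(theta + step) - f(theta - step)) / (2 * dx)
    return df

# Example: gradient of f(x) = sum(x^2) at (1, 2) is (2, 4);
# the central formula is exact for quadratics.
grad = central_diff_grad([1.0, 2.0], lambda x: np.sum(x ** 2))
```

Note how halving dx would quarter the truncation error, but below machine-precision scales the subtraction in the numerator loses accuracy, which is the CAUTION above.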

fragile.optimize.utils.numerical_grad(theta, f, dx=0.001, order=1)[source]#

Return a numerical estimate of the local gradient. The gradient is computed using the Taylor expansion approximation along each dimension:

function(t + dt) = function(t) + dt df/dt(t) + dt^2/2 d^2f/dt^2(t) + …

The first order gives the forward formula:

df/dt = (function(t + dt) - function(t)) / dt + O(dt)

Note that we could also compute the backward version by subtracting dt instead:

df/dt = (function(t) - function(t - dt)) / dt + O(dt)

A better approach is a 3-step formula: evaluate the derivative on both sides of a chosen point t using the forward and backward two-step formulae above, then take the average. This requires the Taylor expansion to higher order:

function(t +/- dt) = function(t) +/- dt df/dt(t) + dt^2/2 d^2f/dt^2(t) +/- dt^3/6 d^3f/dt^3(t) + O(dt^4)

df/dt = (function(t + dt) - function(t - dt)) / (2 * dt) + O(dt^2)

Note that the error is now of the order of dt^2 instead of dt. In the same manner we can obtain the next order by also using function(t +/- 2 * dt):

df/dt = (function(t - 2 * dt) - 8 * function(t - dt) + 8 * function(t + dt) - function(t + 2 * dt)) / (12 * dt) + O(dt^4)

At the second order, two additional function evaluations are required per dimension, implying a more time-consuming algorithm. However, the approximation error is of the order of dt^4.

theta: ndarray[float, ndim=1]

vector value around which to estimate the gradient

f: callable

function from which to estimate the gradient

dx: float

perturbation to apply in each direction during the gradient estimation

order: int in [1, 2]

order of the estimates: 1 uses the central average over 2 points; 2 uses the central average over 4 points

df: ndarray[float, ndim=1]

gradient vector estimated at theta

COST: the gradient estimation needs to evaluate ndim * (2 * order) points (see above). CAUTION: if dx is very small, the limited numerical precision can result in large errors.
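The order=2 variant uses the 4-point stencil given above. Again a hypothetical illustrative helper, not the library's code:

```python
import numpy as np

def central_diff_grad4(theta, f, dx=1e-3):
    """Estimate the gradient with the 4-point central formula (order=2).

    Hypothetical helper illustrating the O(dx^4) scheme; it costs
    4 * ndim evaluations of f.
    """
    theta = np.asarray(theta, dtype=float)
    df = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = dx
        # (f(t-2dt) - 8 f(t-dt) + 8 f(t+dt) - f(t+2dt)) / (12 dt) + O(dt^4)
        df[i] = (f(theta - 2 * step) - 8 * f(theta - step)
                 + 8 * f(theta + step) - f(theta + 2 * step)) / (12 * dx)
    return df

# Example: f(x) = sum(x^3), gradient at (1, 2) is (3, 12);
# the 4-point rule is exact for cubics up to floating-point error.
grad4 = central_diff_grad4([1.0, 2.0], lambda x: np.sum(x ** 3))
```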

fragile.optimize.utils.leapfrog(init_pos, init_momentum, grad, step_size, function, force=None)[source]#

Perform a leapfrog jump in Hamiltonian space.

init_pos: ndarray[float, ndim=1]

initial parameter position

init_momentum: ndarray[float, ndim=1]

initial momentum

grad: float

initial gradient value

step_size: float

step size

function: callable

it should return the log probability and gradient evaluated at theta: logp, grad = function(theta)

new_position: ndarray[float, ndim=1]

new parameter position

new_momentum: ndarray[float, ndim=1]

new momentum

gradprime: float

new gradient

new_potential: float

new lnp
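A leapfrog jump of this kind is conventionally the half-kick / drift / half-kick update used in Hamiltonian Monte Carlo. The sketch below shows that standard scheme under the documented convention logp, grad = function(theta); it is an assumption about the update order, not the library's exact code, and the helper name leapfrog_step is hypothetical:

```python
import numpy as np

def leapfrog_step(pos, momentum, grad, step_size, function):
    """One standard HMC leapfrog jump (a sketch, not fragile's code)."""
    # half-step momentum update using the current gradient
    new_momentum = momentum + 0.5 * step_size * grad
    # full-step position update
    new_pos = pos + step_size * new_momentum
    # recompute log-probability and gradient at the new position
    new_logp, new_grad = function(new_pos)
    # second half-step momentum update with the new gradient
    new_momentum = new_momentum + 0.5 * step_size * new_grad
    return new_pos, new_momentum, new_grad, new_logp

# Example: standard Gaussian, logp = -0.5 * sum(x^2), grad = -x
f = lambda x: (-0.5 * np.dot(x, x), -x)
pos, mom, grad, logp = leapfrog_step(
    np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0]),
    0.1, f)
```

The symmetric half-steps make the map time-reversible and volume-preserving, which is what makes it usable inside a Metropolis accept/reject step.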

fragile.optimize.utils.fai_leapfrog(init_pos, init_momentum, grad, step_size, function, force=None)[source]#

Perform a leapfrog jump in Hamiltonian space.

init_pos: ndarray[float, ndim=1]

initial parameter position

init_momentum: ndarray[float, ndim=1]

initial momentum

grad: float

initial gradient value

step_size: float

step size

function: callable

it should return the log probability and gradient evaluated at theta: logp, grad = function(theta)

new_position: ndarray[float, ndim=1]

new parameter position

new_momentum: ndarray[float, ndim=1]

new momentum

gradprime: float

new gradient

new_potential: float

new lnp

class fragile.optimize.utils.GradFuncWrapper(function, grad_func=None, dx=0.001, order=1)[source]#

Create a function-like object that combines provided lnp and grad(lnp) functions into one, as required by nuts6. Both functions are stored as partial functions, allowing arguments to be fixed. If the gradient function is not provided, a numerical gradient will be computed. By default, arguments are assumed identical for the gradient and the likelihood; however, you can change this behavior using set_xxxx_args (keywords are also supported). If the verbose property is set, each call will print the log-likelihood value and the theta point.

__call__(theta)[source]#
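The core idea, a callable returning (logp, grad) with a numerical-gradient fallback, can be sketched as below. This is a simplified stand-in (it omits the partial-argument and verbose machinery described above), and the class name GradFuncWrapperSketch is hypothetical:

```python
import numpy as np

class GradFuncWrapperSketch:
    """Minimal sketch of the wrapper's core behavior (not the fragile class):
    combine lnp and grad(lnp) into one callable, falling back to a
    central-difference gradient when grad_func is None."""

    def __init__(self, function, grad_func=None, dx=1e-3):
        self.function = function
        self.grad_func = grad_func
        self.dx = dx

    def _numerical_grad(self, theta):
        # order=1 central difference, as in numerical_grad above
        theta = np.asarray(theta, dtype=float)
        df = np.empty_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = self.dx
            df[i] = (self.function(theta + step)
                     - self.function(theta - step)) / (2 * self.dx)
        return df

    def __call__(self, theta):
        logp = self.function(theta)
        grad = (self.grad_func(theta) if self.grad_func is not None
                else self._numerical_grad(theta))
        return logp, grad

# Example: standard Gaussian log-density, gradient estimated numerically
wrapper = GradFuncWrapperSketch(lambda x: -0.5 * np.dot(x, x))
logp, grad = wrapper(np.array([1.0, 2.0]))
```

The (logp, grad) return matches the callable contract that leapfrog and nuts6 expect, so such a wrapper can be passed directly as their function argument.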