ISequentialDecisionPolicy.cs

Timestamp:

08/24/15 13:56:27

Author:

gkronber

Message:

#2283: experiments on grammatical optimization algorithms (maxreward instead of avg reward, ...)

File:

r11850	r12893
12	12	// we also assume that the policy can fail to select one of the followStates
13	13	public interface ISequentialDecisionPolicy<in TState> {
14		- bool TrySelect(Random random, TState curState, IEnumerable<TState> afterStates, out int selectedStateIdx); // selectedState \in afterStates
	14	+ bool TrySelect(System.Random random, TState curState, IEnumerable<TState> afterStates, out int selectedStateIdx); // selectedState \in afterStates
15	15
16	16	// state-trajectory are the states of the episode, at the end we recieved the reward (only for the terminal state)
