In the book [Wild problems : a guide to the decisions that define us](https://www.worldcat.org/title/1321820629), Russ Roberts analyzes decision making on difficult life problems. These types of problems don't have straightforward or measurable goals, and there's no set procedure for success. Examples of these problems are choosing whether to marry, have kids, or switch jobs.
Russ Roberts is an economist; however, he starts the book by showing how economic principles cannot be easily applied to arrive at a decision for these problems.
Economic principles are grounded in the utilitarian approach to ethics. In it, a utility function is defined that measures how much the benefits outweigh the costs in a given situation.
In a mathematical formulation, the utility function maps a state to a real number:
$$
u_\mathfrak{a} : s \mapsto u_\mathfrak{a}(s) \in \mathbb{R}
$$
If the utility is positive, the benefits outweigh the downsides; if it is negative, the downsides outweigh the benefits. Because different agents have different preferences, the utility function depends on the agent $\mathfrak{a}$. The input of the function is some *state* $s$, which represents the environment that the agent is in. The output is a real number.
A rational action in this context is the one that provides the highest utility in the next state.
$$
a_{rational} = \arg\max_a \, u_\mathfrak{a}(P(s,a))
$$
In the equation above, $P(s, a)$ gives the state that the agent finds themselves in after performing the action $a$ in their current state $s$.
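As a minimal sketch of this greedy choice, here is a toy job-switching scenario in Python. The states, actions, transition table, and utility values are all made up for illustration; nothing here comes from the book.

```python
# Hypothetical toy example: every state, action, and number is illustrative.

def transition(state, action):
    """Deterministic P(s, a): the state reached by taking `action` in `state`."""
    table = {
        ("current_job", "stay"): "current_job",
        ("current_job", "switch"): "new_job",
    }
    return table[(state, action)]


def utility(state):
    """u_a(s) for one particular agent: a real number for each state."""
    return {"current_job": 1.0, "new_job": 2.5}[state]


def rational_action(state, actions):
    """Return the action whose next state has the highest utility."""
    return max(actions, key=lambda a: utility(transition(state, a)))


best = rational_action("current_job", ["stay", "switch"])
print(best)  # switch
```

The whole calculation hinges on being able to write down `utility` for the next state, which is exactly what becomes hard for wild problems.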
In sequential decision making, a popular formalism is the Markov Decision Process. This defines a tuple $\langle S,A, P, R \rangle$ where:
- $S$ represents the set of states
- $A$ represents the set of actions
- $P$ represents the transition function between states via the actions
- $R$ represents the reward function based on a state and action
Initially, I'll talk about deterministic MDPs. This means that performing a given action in a given state always leads to the same next state.
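One way to picture the tuple is as a small data structure. The sketch below is only an illustration: the class name `DeterministicMDP`, the helper `rollout`, and the light-switch example are my own assumptions, not a standard API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class DeterministicMDP:
    states: Tuple[str, ...]                # S
    actions: Tuple[str, ...]               # A
    transition: Callable[[str, str], str]  # P(s, a) -> next state
    reward: Callable[[str, str], float]    # R(s, a) -> immediate reward


def rollout(mdp, start, plan):
    """Follow a fixed list of actions; return visited states and total reward."""
    state, total, visited = start, 0.0, [start]
    for action in plan:
        total += mdp.reward(state, action)
        state = mdp.transition(state, action)
        visited.append(state)
    return visited, total


# Hypothetical light switch: "flip" toggles the state, "wait" keeps it.
toy = DeterministicMDP(
    states=("off", "on"),
    actions=("flip", "wait"),
    transition=lambda s, a: ("on" if s == "off" else "off") if a == "flip" else s,
    reward=lambda s, a: 1.0 if s == "on" else 0.0,
)

visited, total = rollout(toy, "off", ["flip", "wait", "flip"])
print(visited, total)  # ['off', 'on', 'on', 'off'] 2.0
```

Because the transition is a plain function of the state and action, running the same plan from the same start always produces the same trajectory.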
One popular approach to solving these problems is through value iteration. This approach is characterized by the Bellman Equation:
$$
V(s) = \max_a \left( R(s,a) + \gamma V(P(s,a)) \right)
$$
This says that the value of the current state is the best achievable sum, over all actions $a$, of the immediate reward for taking $a$ and the value of the resulting next state, discounted by the discount factor $\gamma$.
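A minimal sketch of value iteration for a deterministic MDP might look like the following. It repeatedly applies the Bellman update until the values stop changing. The two-state example, the reward numbers, and the parameters `tol` and `max_iters` are assumptions chosen for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-9, max_iters=10_000):
    """Apply V(s) = max_a (R(s, a) + gamma * V(P(s, a))) until convergence."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_V = {
            s: max(R(s, a) + gamma * V[P(s, a)] for a in actions)
            for s in states
        }
        # Stop once no state's value changed by more than the tolerance.
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V
        V = new_V
    return V


# Hypothetical example: moving from "start" to "goal" costs 1 once,
# and every step taken while in "goal" pays 1.
def P(state, action):
    if state == "start" and action == "move":
        return "goal"
    return state


def R(state, action):
    if state == "goal":
        return 1.0
    if action == "move":
        return -1.0
    return 0.0


V = value_iteration(["start", "goal"], ["wait", "move"], P, R, gamma=0.9)
print(V)  # roughly 8 for "start" and 10 for "goal"
```

With $\gamma = 0.9$, staying in the goal forever is worth $1 / (1 - 0.9) = 10$, so paying the one-time cost to get there is worth $-1 + 0.9 \cdot 10 = 8$.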
Russ Roberts doesn't mention MDPs in his book, but I think he had something similar in mind when he wrote about the difficulty of computing utility for a wild problem. One example he uses comes from L.A. Paul's book [Transformative Experience](https://www.worldcat.org/title/872342141).
**The Vampire Problem**
> Before you become a vampire, you can't really imagine what it will be like. Your current experience doesn't include what it's like to subsist on blood and sleep in a coffin when the sun is shining. Sounds dreary? But most, maybe all, of the vampires you meet speak quite highly of the experience. Surveys of vampires reveal a high degree of happiness.
Let's look at it in terms of value iteration. The value of becoming a vampire is equal to the reward during the initial transformation plus the discounted value of living life after the transformation.
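To make this concrete, here is a rough sketch of the Bellman equation for this choice. The state names $s_{human}$ and $s_{vampire}$ and the action $a_{decline}$ are labels I'm assuming for illustration; they don't come from either book. I also assume that declining leaves the agent in $s_{human}$.

$$
V(s_{human}) = \max \left( R(s_{human}, a_{transform}) + \gamma V(s_{vampire}), \; R(s_{human}, a_{decline}) + \gamma V(s_{human}) \right)
$$

Here $s_{vampire} = P(s_{human}, a_{transform})$. The trouble is the term $V(s_{vampire})$: before transforming, the agent has no experience from which to estimate it, so the comparison inside the $\max$ can't really be carried out.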
When an agent performs the action $a_{transform}$, does it know that it performed that specific action? One way of looking at this is to ask whether the agent can tell apart any two arbitrary actions, say $a_{transform}$ and $a_{morph}$. I don't believe this problem is well studied in the literature. Do let me know if you know of any work covering an agent confusing actions.
Two reasons immediately come to mind for why partial observability exists.
- Compromised Perception: A state in which the agent's perception only captures a subset of the total information. For example, heavy fog covers the roadway, limiting visibility of other traffic.
- Group Decision Making: Agents often don't have insight into others' thought processes or perceptions.
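The compromised-perception case can be sketched as an observation function that hides part of the state. This is only a toy illustration: the function `observe`, the dictionary layout, and the field name `other_traffic` are assumptions of mine.

```python
def observe(state, foggy):
    """Return what the agent perceives of `state`.

    In fog, the hypothetical 'other_traffic' field is hidden from the agent.
    """
    visible = dict(state)
    if foggy:
        visible.pop("other_traffic", None)
    return visible


clear_road = {"speed_limit": 50, "other_traffic": []}
busy_road = {"speed_limit": 50, "other_traffic": ["truck_ahead"]}

# In fog, two different underlying states yield the same observation,
# so the agent cannot tell them apart from perception alone.
same_in_fog = observe(clear_road, foggy=True) == observe(busy_road, foggy=True)
same_when_clear = observe(clear_road, foggy=False) == observe(busy_road, foggy=False)
print(same_in_fog, same_when_clear)  # True False
```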