Consider a young woman who pursues a career as a doctor because she looks forward to “being the source of somebody’s good news” every day. In effect, she places value on states in which she believes another person has received information from her and labeled it “positive,” and she uses those states to define her goals. To pursue this career, she must reason at a wide range of temporal resolutions (e.g., short-term and long-term) over goal-directed options that will lead her into a consistent pattern of states delivering her desired reward (telling people good news every day).
How are our choices and actions motivated?
How does she use her value for states to define her goals?
How does she represent her desired “terminal” state sequence?
How does she reason that particular middle-range options will lead her to her goal?
How does she coordinate planning at different temporal resolutions?
Does her function for state-value estimation depend on her plan’s current temporal resolution?
When predicting future states to evaluate current prospects, the components used to represent those states are prospect-contingent. For example, when predicting and assessing future states corresponding to different medical-school choices, location may be a useful state-component, but it is far less useful for future states corresponding to meal choices. Because these components feed into state-value estimation, this raises the question of how the values of individual state-components contribute to overall state-values.
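The idea of prospect-contingent components can be made concrete with a small sketch: a state-value estimate that sums the values of only those components relevant to the current prospect. All component names, weights, and the two prospect categories below are illustrative assumptions, not claims from the text.

```python
# Hypothetical component values learned from experience (illustrative numbers).
component_values = {
    "location": 0.8,
    "training_quality": 1.5,
    "cost": -0.6,
    "taste": 0.9,
    "nutrition": 0.4,
}

# Which components matter depends on the prospect being evaluated.
relevant_components = {
    "medical_school_choice": ["location", "training_quality", "cost"],
    "meal_choice": ["taste", "nutrition", "cost"],
}

def estimate_state_value(prospect, component_activations):
    """Sum the values of only the components relevant to this prospect."""
    total = 0.0
    for name in relevant_components[prospect]:
        total += component_values[name] * component_activations.get(name, 0.0)
    return total

# A candidate medical school: good location and training, but costly.
school_state = {"location": 1.0, "training_quality": 1.0, "cost": 1.0}
print(estimate_state_value("medical_school_choice", school_state))  # 0.8 + 1.5 - 0.6 = 1.7
```

Under this toy formulation, the questions below about hidden-variable composition amount to asking how the component values and the relevance sets themselves are learned, rather than given.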
How do we represent states and estimate their values?
Does the woman represent states as a composition of hidden variables?
If so, when predicting future states, how does she learn to decide which variables are useful for value-estimation?
Does she impose some sort of structure over the hidden variable composition?
How does she learn to place value on a state’s hidden-variable constituents?
If learning occurs by updating state-values, how is value propagated to a state’s constituents? Both questions are closely related to the credit-assignment problem in reinforcement learning.
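One standard answer from reinforcement learning is temporal-difference learning with a linear value function, where a state’s value is a weighted sum of its constituent features and a prediction error updates every active constituent’s weight in proportion to its activation. This is a minimal sketch of that mechanism, not a model of the woman’s actual learning; all numbers are illustrative.

```python
# TD(0) with a linear value function: V(s) = w · x(s).
# The TD error is shared among a state's constituent features in
# proportion to their activations — one concrete answer to how value
# updates reach a state's constituents.

def td0_update(w, x_s, reward, x_next, alpha=0.1, gamma=0.9):
    """One TD(0) step on weight vector w for the transition (s, r, s')."""
    v_s = sum(wi * xi for wi, xi in zip(w, x_s))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v_s              # prediction error
    # Each constituent receives credit proportional to its activation.
    return [wi + alpha * delta * xi for wi, xi in zip(w, x_s)]

# Two-feature states; only features active in s receive credit.
w = [0.0, 0.0]
w = td0_update(w, x_s=[1.0, 0.0], reward=1.0, x_next=[0.0, 1.0])
print(w)  # [0.1, 0.0] — only the active constituent's weight moved
```

The design choice worth noting is that credit flows through the state representation itself: which constituents are updated, and by how much, is determined entirely by the feature vector, which ties the credit-assignment question directly to the earlier question of which hidden variables compose a state.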