Dissociable neural signals for reward and emotion prediction errors

Joseph Heffner, Romy Frömer, Matthew R. Nassar, Oriel FeldmanHall (oriel.feldmanhall@brown.edu)

Affiliations: Carney Institute for Brain Science, Brown University, United States; Centre for Human Brain Health, School of Psychology, University of Birmingham; Department of Cognitive, Linguistic, Psychological Sciences, Brown University, United States; Department of Neuroscience, Brown University, United States; Department of Psychology, Yale University, United States

January 2024

Significance

For nearly a century, scientists have asked how humans learn about their worlds. Learning models borrowed from computer science, namely reinforcement learning, provide an elegant and simple framework that showcases how reward prediction errors are used to update one's knowledge about the environment. However, a fundamental question persists: what exactly is 'reward'? This gap in knowledge is problematic, especially when we consider the multiplicity of social contexts where external rewards must be contextualized to gain value and meaning. We leverage electroencephalography to interrogate the role of emotion prediction errors (violations of emotional expectations) during learning. We observe distinct neural signals for reward and emotion prediction errors, suggesting that emotions may act as a bridge between external rewards and subjective value.

Introduction

Reward prediction errors (PEs), defined as the difference between expected and experienced reward, serve as the dominant mechanism for explaining how people learn to make adaptive, value-based decisions [1-6]. Reward PEs function within a reinforcement learning (RL) framework, illustrating how people adjust their actions based on past experiences to achieve more successful outcomes [7]. This framework has been successfully applied to explain behaviors ranging from the simple, such as avoiding financial losses [1] and navigating new environments [8], to the more complex, such as determining who can be trusted [9]. While the term reward is frequently used to explain learning, the fact that 'rewards' in the social world are abstract, difficult to quantify, and shaped by multiple features of a social situation suggests a gap between the externally 'rewarding' reinforcers encountered (e.g., money, smiles) and how the brain interprets them as value. A critical unresolved question thus centers on what exactly 'reward' is and how the brain represents it. Here we examine how the human brain transforms external rewards into internal value during a social learning paradigm.

An intuitive possibility for how external reinforcers are transformed into internal value comes from the field of emotion. Decades of work indicate that emotions play a vital role in the decision-making process [10,11], where stress inductions [12], mood inductions [13], and emotion regulation [14] can all impact choice. Indeed, several affective theories propose that emotions are the evaluation of external rewards, and thus possess the capacity to influence future behavior [15-18]. Building on this theory, our previous research operationalized affect within an RL framework, which led to the hypothesis that violations of emotion expectations, known as affective prediction errors (PEs), influence choice. By formally quantifying affective PEs as the difference between expected and experienced emotion, we observed that affective PEs exhibit an independent effect that is stronger than monetary reward PEs in predicting one-shot social choices [19]. The distinction between emotion and reward was further demonstrated in an independent sample, where individuals at risk of depression showed selectively impaired use of affective PEs but fully intact use of reward PEs in a social exchange task.

Although this dissociation suggests affective PEs have a central role in guiding socially adaptive behaviors, it remains unknown whether affective PEs also act as critical signals during trial-by-trial learning, where knowledge about others must be continually updated to adjust future choices. Additionally, there is no evidence for the separability between affective and reward PEs at the neural level. Neural separability between affective and reward PEs (at the temporal, functional, or localization levels) would be strong evidence that emotion serves as a distinct learning signal separate from reward, one that may transform external rewards into internal, subjective value. To test whether affective and reward PEs are critical for learning and are separable at the neural level, we used a repeated social exchange paradigm in conjunction with electroencephalography (EEG). EEG was chosen for its superior temporal resolution, capable of capturing unfolding neural processes on the order of milliseconds. We a priori identified three potential event-related potentials (ERPs) known to reflect changes in EEG activity in response to feedback or reward processing. In particular, the feedback-related negativity (FRN) is an ERP thought to reflect the evaluation of surprising events [20,21] and is theorized to be the neural basis of reward PE processing [22]. Additionally, the P3a and P3b are commonly linked to various aspects of feedback processing, including reward magnitude [23], reward valence [24], and rare, surprising outcomes [25]. Given the mixed evidence of how these ERPs map onto the construct of reward, we were agnostic as to which neural signal would preferentially index reward or affective PEs.

EEG was recorded during a repeated Ultimatum Game (UG), where participants (N = 41) interacted with three different partner types offering a range of fair to unfair monetary offers (fair, unfair, neutral; see Methods and Fig. 1B). The repeated nature of the UG, five trials in a row per partner, allowed participants to update their expectations of a partner based on the history of offers with that person. We included two key measurements to enhance our understanding of how rewards and emotions influence feedback processing and updating (Fig. 1A). First, rather than using computational models to infer participants' reward expectations [26,27], we asked them to report the amount of money they expected to receive on each trial (ranging from $0 to $10). This allowed us to compute trial-by-trial reward PEs as the discrepancy between the actual offer and the expected one. Second, participants used a 2D affect grid to predict how they thought they would feel after receiving an offer (affect expectation), and to report how they actually felt once the offer was received (affective experience) [19,28]. This measure captures participants' core affect, a consciously accessible facet of subjective feelings that categorizes feelings into core dimensions of valence (pleasurableness) and arousal (alertness/intensity). We computed affective PEs for both arousal and valence dimensions as the difference between participants' expectation and experience on a trial-by-trial basis (Fig. 1C). Together, these measurements allow us to map all three empirical PEs (reward, valence, and arousal) onto distinct neural EEG signatures expressed during social interactions (Fig. 1D).

Results

[Figure 1 placeholder]

Reward and affective PEs have separable contributions to learning. Given that our prior work found that valence PEs, compared to reward PEs, exert a stronger influence on one-off decisions to punish norm violators [19], we began by examining the marginal strength of reward, valence, and arousal PE signals when deciding to punish in a social learning context. This additive linear mixed-effects model (LMM) represents a strict test of our theory since each PE type competes with all others to explain variance during learning. Replicating our prior research, we found that both valence PEs (β = -0.76 ± 0.11, z = -6.65, P < 0.001) and reward PEs (β = -0.67 ± 0.14, z = -4.75, P < 0.001) have independent contributions when learning when to punish. That is, participants punished at higher rates when experiencing more unpleasantness (valence) or less reward than expected. Unlike our prior results, however, we observed no unique contribution of arousal PEs on decisions to punish (β = 0.13 ± 0.11, z = 1.14, P = 0.26) once valence and reward PEs are accounted for.

To examine how the relationship between each PE type and choice changes over time, we interacted PE type with round number. We observe that the strength of reward and valence PEs changes in opposite directions over time (Table 1; Fig. 2A). Valence PEs exert the strongest relationship to choice on the first round, when uncertainty is greatest, and weaken significantly over time as uncertainty about a partner's behavior is slowly resolved. In contrast, reward PEs show a marginally significant interaction with round number such that they weakly predict choice in the beginning but become more predictive over time. Directly pitting both PE types against one another reveals that valence PEs have a significantly stronger impact on motivating punitive choices on the first round when compared to reward PEs (coefficient test: z = -1.70, P = 0.04), while reward PEs have a stronger, albeit non-significant, influence on the final round when compared to valence PEs (z = 1.05, P = 0.15). This reversal reveals how valence PEs are more impactful early on, when uncertainty is greatest. Once participants have better estimates about the type of offer to expect, reward PEs become more useful for informing whether to accept or reject an unfair offer.
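To make the link between the reported z statistics and P values concrete, the following Python sketch converts a z value into a tail probability under a standard normal distribution. It assumes one-tailed tests, which is an assumption of this sketch rather than something stated in the text; the function names `normal_cdf` and `one_tailed_p` are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_tailed_p(z: float) -> float:
    """Tail probability beyond z, taken in the direction of z.

    Assumption: a one-tailed test. The text does not state whether the
    coefficient tests were one- or two-tailed.
    """
    return normal_cdf(-abs(z))


# z values reported for the first-round and final-round comparisons.
p_first_round = one_tailed_p(-1.70)
p_final_round = one_tailed_p(1.05)
print(f"first round: {p_first_round:.3f}, final round: {p_final_round:.3f}")
```

Under this one-tailed assumption the two tail probabilities come out close to the reported values of 0.04 and 0.15; a two-tailed test would double them.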

[Table 1 placeholder] [Figure 2 placeholder]

Reward and emotion PEs are indexed by separate and dissociable neural signals. We next investigated whether each PE (reward, valence, and arousal) is represented by distinct and separable neural architecture. We first standardized all PEs at the group level and used each PE type as an independent variable in separate linear mixed-effects regressions predicting our a priori ERPs of interest: the FRN, P3a, and P3b (see Methods). The FRN is known to encode both signed and unsigned PEs [22,29], and reward PE effects on the P3 component are also somewhat ambiguous as to what they index [30], potentially reflecting the magnitude of the PE rather than the valence [23]. Given the lack of clarity around what exactly these ERPs index, we first tested whether absolute or signed PEs predicted the ERPs of interest, and found more evidence for absolute, compared to signed, PEs (see Supplement). Using the absolute value of each PE as the predictor, we found that trial-by-trial FRN amplitudes were uniquely predicted by reward PEs (β = -0.23 ± 0.07, t = -3.17, P = 0.003) but not valence (β = 0.06 ± 0.07, t = 0.91, P = 0.36) or arousal PEs (β = 0.04 ± 0.08, t = 0.44, P = 0.67; Fig. 3). In contrast, trial-by-trial P3b amplitudes were only predicted by valence PEs (β = 0.06 ± 0.02, t = 3.98, P < 0.001), but not arousal (β = -0.001 ± 0.01, t = -0.71, P = 0.48) or reward PEs (β = 0.03 ± 0.02, t = 1.37, P = 0.18; Fig. 3), and a beta-coefficient test showed that the valence PE effect was marginally greater than the reward PE effect (z = 1.44, P = 0.07). Finally, trial-by-trial P3a amplitudes were only weakly linked to arousal PEs (β = 0.04 ± 0.02, t = 1.90, P = 0.07), and not associated with either valence (β = 0.01 ± 0.02, t = 0.18, P = 0.86) or reward PEs (β = 0.04 ± 0.02, t = 1.69, P = 0.10; Fig. 3; see Supplement for an exploratory model showing that the P3a is best tracked by offer extremity). To ensure that we were capturing all possible electrophysiological signatures of reward, valence, and arousal PEs (which we might have missed with a pure a priori ERP approach), we additionally employed a data-driven method that does not rely on predefined signals [31]. Results from this data-driven approach showed converging evidence for separate spatiotemporal clusters in response to offers for reward and valence PEs (see Supplement). Taken together, these results are the first to illustrate that emotion and reward learning signals are separately encoded in the brain.

[Figure 3 placeholder]

To assess the neural learning effects on choice, especially given our behavioral findings showing that the relationship between PEs and choice varies with round number, we next allowed each ERP to interact with round. Results reveal a significant simple effect of P3b on choice (Table 2), as well as an interactive effect between P3b and round on choice (Table 2; Fig. 2B). This mirrors our behavioral finding, revealing that the P3b, which is uniquely associated with valence PEs, is the sole neural signal driving choice, and that the strength of this neural signal diminishes over time as more information is gleaned about a partner's behavior. Although not statistically significant, the interaction between FRN and round also reflects the behavioral pattern, where the relationship between FRN and choice strengthens with time (Table 2; Fig. 2B). Collectively, these findings illustrate that both FRN and P3b uniquely track reward and emotion learning signals, but that only emotion PEs, indexed by the P3b, are relevant for choice.

[Table 2 placeholder]

Reward and affective PEs are resolved through different mechanisms. Although most PEs are no longer predictive by the final round, indicating rapid learning, it remains unclear which component of the PE drives the error signal. On one hand, participants might adjust their expectations to make upcoming events less surprising. This dovetails with reinforcement learning accounts, which predominantly emphasize adjusting PEs by altering expectations (i.e., increasing the Q-value of an action to anticipate a greater reward next time). On the other hand, participants could alter experiences (perhaps by employing emotion regulation tactics) to lessen an event's impact. While research on affective forecasting suggests that accurately predicting future emotional events is challenging [32,33], one could use emotion regulation strategies to modify responses to events like unfair offers [34]. These two accounts present divergent theories about how affective and reward PEs might drive learning.

To explore the theory that modifying expectations can reduce both affective and reward PEs, we examined how reward, valence, and arousal expectations changed throughout the task. As expected, participants show the largest update in both reward and emotion expectations between the first and second rounds (Table 3, Fig. 4). We then probed how expectations change between rounds two through five to understand whether participants continue to adapt after an initial surprise. Reward expectations for this period reveal that participants continue to refine beliefs about their partners, bearing evidence of continued learning through reward PEs. In contrast, expectations about valence and arousal remain largely consistent across all partner types from rounds two through five, suggesting participants do not continue to adjust their affective expectations after the initial round. In other words, only reward PEs, and not affective PEs, are resolved by altering expectations. Next, we examined whether experiences change across the task. We found that participants' subjective reports of their valence and arousal experiences changed significantly over rounds. Specifically, negative reactions to unfair offers wane over time, and the intense positive feelings of receiving a fair offer also diminish as participants learn more about the fair partner (Table 3). Thus, these results paint a clear distinction in how affective and reward PEs are leveraged for learning: after an initial large update of expectations, reward PEs are resolved by adjusting reward expectations (dovetailing with the RL literature), whereas affective PEs are managed by aligning emotional experiences with prior predictions.

[Table 3 placeholder] [Figure 4 placeholder]

Discussion

While emotions clearly influence learning and decision-making [35], most reinforcement learning models do not incorporate emotions as an error signal driving social learning. Instead, these models predominantly emphasize reward PEs as the central driver of behavior, with some exceptions [36,37]. In this study, we leverage EEG to determine whether emotions serve as a key learning signal in a repeated economic game, and whether these emotion signals can be differentiated from reward PEs. Our behavioral results show that affective PEs have an independent and stronger contribution to choice, especially when there is significant uncertainty about a partner's actions. As this uncertainty decreases with experience, reward PEs begin to play a more dominant role in guiding choice. At the neural level, distinct ERPs were found to index each PE: the FRN corresponds most closely with reward PEs, the P3b is associated with valence PEs, and the P3a is indexed by arousal about offer extremity. Collectively, these findings suggest that reward and emotion learning signals have dissociable neural locations with distinct temporal trajectories. At the trial level, the fact that the FRN comes online first and is then followed by the P3b suggests that monetary offers are likely initially evaluated based on how surprising the reward is (e.g., "How does this monetary offer differ from my reward expectations?") and only later are violations of emotion expectations incorporated (e.g., "I feel better/worse than anticipated"). An examination of social learning over time, however, shows that affective PEs are most powerful during early trials and attenuate as uncertainty is reduced. In contrast, reward PEs follow the opposite pattern, growing in predictive power over time.

In contrast to reward-centric accounts that dominate the learning literature [38], our findings underscore that violations of affect expectations play a particularly privileged role in social learning, a role that cannot be solely subsumed by rewards. Although rewards may be sufficient for building successful artificial agents, there is growing concern about whether external reward functions are enough to explain the breadth and flexibility of human decision-making, and it has been suggested that emotional signals might bridge this gap [39-41]. While prior affective research has primarily explored long-term affective forecasting errors [33] or contexts absent of trial-by-trial learning [19], here we provide a direct assessment of affective learning signals at the neural level. Specifically, we find that the P3b component, which tracks valence PEs, stands as the sole neural predictor of social choice. This aligns with prior work showing that the P3b integrates information from multiple learning mechanisms relevant for decision policy changes [31,42]. While far less influential, reward PEs still contribute to learning [22,29,43,44], and our findings show that the FRN uniquely indexes monetary reward PEs, even when controlling for affective signals. In summary, a comprehensive explanation of human decision-making necessitates consideration of both rewards and emotions.

While research indicates that emotional stimuli, such as faces or words, are sometimes processed relatively early [45-47], our results indicate that emotionally charged social exchanges are initially evaluated based on reward. While at first blush this might appear to be a discrepancy, emotional experiences unfold over time, and include initial attention to the event and subsequent evaluation [48]. Because we probe participants' self-reports of their affective experience following the offer, our measure may align more closely with the evaluation of an affective experience than with initial attention. This account aligns with other EEG work showing that late neural components, such as the Late Positive Potential (LPP), are associated with emotional stimuli [49]. Interestingly, the LPP shares similar morphology with the P3b, is sensitive to a stimulus's emotional saliency [50], can be influenced by cognitive reappraisal or attentional shifts [51], and is also known to modulate attention during late-stage processing [52,53]. Taken together, this suggests that emotional processing that unfolds on later temporal trajectories can still be influential for higher cognition. Our data also suggest that affective experiences habituate with repeated experiences, giving rise to smaller prediction errors over time. The consequence of blunted affective responses through experience (i.e., reduced prediction errors) is that the emotional experience of an initial interaction with a social partner is amplified compared to subsequent encounters. This pattern is consistent with normative prescriptions for learning in uncertain and dynamic environments [54-56], highlighting the possibility that the attenuation of affective experience might serve an important role in optimizing learning under uncertainty. Future work could explore this further by extending our paradigm to manipulate a broader array of factors that influence normative learning dynamics, such as outcome stochasticity [55,57], volatility [54,57], and temporal structure [31,58].

By taking the simple, albeit novel, step of incorporating emotion as an error signal into a reinforcement learning framework, we reveal the pivotal role of affective PEs in driving social learning. With the temporal precision of EEG, we provide evidence for early processing of reward PEs and later processing of affective PEs when learning about other people. Although violations of emotion expectations are integrated relatively late in the decision process, they play the strongest role in predicting social choice, providing evidence of a neurobiologically plausible, distinct emotion error signal.

Materials and methods

Participants

Participants (N = 41, 25 female, mean age = 20.8 ± 4.4) received either monetary compensation ($15 per hour) or course credits and provided informed consent in a manner approved by Brown University's Institutional Review Board under protocol 1607001555. A power analysis of the unique effect of valence prediction error on choice in our prior work [19] revealed that 18 participants would be sufficient to detect this effect with an alpha of 0.05 and power (1 - β) of 0.80. Accordingly, we aimed to exceed this and collected a sample of 40 participants, which matches sample sizes of recent EEG studies focusing on the FRN and P300 [31,42].

Task and procedure

Participants played an adapted repeated Ultimatum Game [59,60] that included subjective emotion and reward ratings [19,28]. Participants were told they were playing with past participants who gave offers across five rounds, conditional on participants' choices to accept or reject each offer, similar to strategy methods used in economics [61]. Unknown to participants, their partners' offers were generated from one of three normal distributions representing three types of proposers: 1) "unfair" proposers gave offers according to a normal distribution with a mean of $1 and SD of 0.50; 2) "neutral" proposers gave $3 on average with a SD of 0.50; and 3) "fair" proposers gave $5 on average with a SD of 0.50. Participants played with 36 unique partners, 12 of each type, and all five offers from each partner were randomly drawn from their respective normal distribution. We used faces from the 74-image MR2 database [62] to represent partners, and 36 faces were pseudorandomly pulled from this database per participant to achieve a balanced distribution of images of men and women of European, African, and East Asian ancestry. Subjective affective predictions and experiences were reported using a 500 × 500 pixel two-dimensional affect grid where the horizontal axis was valence (unpleasant/pleasant feelings) and the vertical axis was arousal (low/high intensity feelings). Both dimensions range from -250 to +250. To familiarize participants with this affect grid, participants completed an emotion classification task prior to the repeated UG. Participants made affect ratings of 20 canonical emotion words (for example, angry, sad, and surprised) on the grid, twice for each word, in a randomized order. Training participants to interpret this subjective affect grid has shown strong convergent validity with other approaches for emotion ratings [63].
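The offer-generation scheme described above can be sketched in Python as follows. This is a minimal illustration, not the task code (the experiment itself was delivered in Matlab); the names `PROPOSER_MEANS` and `generate_partners` are hypothetical, and clipping offers to the $0 to $10 range and rounding to the cent are assumptions of this sketch.

```python
import random

# Proposer types and mean offers in dollars, as described in the text.
PROPOSER_MEANS = {"unfair": 1.0, "neutral": 3.0, "fair": 5.0}
OFFER_SD = 0.50
PARTNERS_PER_TYPE = 12
ROUNDS_PER_PARTNER = 5


def generate_partners(seed=None):
    """Return 36 partners, each with five offers drawn from its type's normal distribution."""
    rng = random.Random(seed)
    partners = []
    for partner_type, mean_offer in PROPOSER_MEANS.items():
        for _ in range(PARTNERS_PER_TYPE):
            offers = []
            for _ in range(ROUNDS_PER_PARTNER):
                draw = rng.gauss(mean_offer, OFFER_SD)
                # Assumption: keep offers inside $0-$10 and round to cents.
                offers.append(round(min(max(draw, 0.0), 10.0), 2))
            partners.append({"type": partner_type, "offers": offers})
    # Assumption: partner order is shuffled; the actual blocking is not modeled here.
    rng.shuffle(partners)
    return partners


partners = generate_partners(seed=1)
total_trials = sum(len(p["offers"]) for p in partners)
print(len(partners), total_trials)
```

With 36 partners and five rounds each, the sketch yields 180 offers, matching the total number of trials reported for the task.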

We calculated affective prediction errors (PEs) on a trial-by-trial basis by measuring the discrepancy between participants' actual affective experiences and their affect expectations. Emotion PEs can be defined on both valence and arousal dimensions. A valence PE was computed by subtracting the predicted level of (un)pleasantness of an offer from the actual experienced (un)pleasantness, while an arousal PE was computed by subtracting the expected arousal from the actual experienced arousal. For instance, if a participant felt unpleasant about receiving an offer (e.g., rating it -200) but had anticipated feeling slightly pleasant (e.g., rating it +40), the valence PE would be -240 (-200 minus +40). Similarly, reward PEs were calculated by subtracting the predicted monetary reward from the actual offer given to the participant for each trial. The UG comprised nine blocks of four partners each (20 trials per block), with self-paced rests between blocks, for a total of 180 trials. Participants additionally completed pre-UG and post-UG likability ratings for all 36 partners on a visual analog scale (0 to 10, in increments of 0.01). The experiment was delivered in Matlab (The MathWorks, Inc.) using the Psychtoolbox-3 package and included stimulus presentation, event, and response logging. A standard computer mouse and keyboard were used for response registration.
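As an illustration of the prediction-error definitions above, the following Python sketch computes the three PEs for a single trial as experience minus expectation. The `Trial` container and the `prediction_errors` function are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Ratings for one trial (hypothetical container for illustration)."""
    expected_reward: float      # dollars, $0 to $10
    offer: float                # dollars actually offered
    expected_valence: float     # affect grid, -250 to +250
    experienced_valence: float
    expected_arousal: float     # affect grid, -250 to +250
    experienced_arousal: float


def prediction_errors(trial: Trial) -> dict:
    """Each PE is the experienced value minus the expected value."""
    return {
        "reward": trial.offer - trial.expected_reward,
        "valence": trial.experienced_valence - trial.expected_valence,
        "arousal": trial.experienced_arousal - trial.expected_arousal,
    }


# Worked example from the text: expected valence +40, experienced valence -200.
# The reward and arousal values below are made up for illustration.
example = Trial(
    expected_reward=5.00, offer=2.37,
    expected_valence=40.0, experienced_valence=-200.0,
    expected_arousal=10.0, experienced_arousal=120.0,
)
pes = prediction_errors(example)
print(pes["valence"])
```

A negative reward or valence PE therefore means the outcome was worse than expected, which is the direction associated with more punishment in the Results.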

During the UG, participants were first shown a picture of their partner for 1000 ms, followed by a fixation cross (500 ms, same timing for all fixations). Participants were then given cues for reward predictions ($?) or affect predictions (E?; 1000 ms each); these cues indicated that participants would make the required response on the next screen. Reward predictions (how much participants expected the partner to offer) were reported on a visual analog scale ($0 to $10), and participants had unlimited time to respond. Affect predictions (how participants expected to feel after the offer) were reported on the valence-arousal grid, and participants were required to answer within 5 s. The order of predictions was counterbalanced and separated by fixations. Following predictions, the offer was given (2000 ms) in dollar and cent format (e.g., $2.37), followed by another fixation. Participants were then given an affective experience rating cue (E; 1000 ms), which indicated that they should rate how they felt about the offer using the affect grid (required within 5 s). A fixation followed, and then participants were given a choice cue (C; 1000 ms) indicating they would need to make their choice. The choices to accept or reject were presented on the screen (e.g., [A] [R]), and the order of these options was counterbalanced. When participants were matched with a new partner, they were presented with a waiting screen for 1-4 s before starting the next trial. Prior to the experiment, participants filled in the following two personality questionnaires: the 20-item Toronto Alexithymia Scale [64] and the Temporal Experience of Pleasure Scale [65]. These measures were registered as potential control variables and for other purposes not addressed here. Participants were then seated in a shielded EEG cabin. Prior to completing the emotion classification and UG task, participants performed practice trials.

Psychophysiological recording and processing

EEG was recorded using BrainVision recorder software (Brain Products, München, Germany) at a sampling rate of 500 Hz from 64 Ag/AgCl electrodes mounted in an electrode cap (ECI Inc.). Data were collected using Cz as a reference channel and re-referenced to average reference offline. Electrodes below the eyes (IO1, IO2) and at the outer canthi (LO1, LO2) recorded vertical and horizontal ocular activity. At the end of the experiment we recorded prototypical eye movements (20 trials of each: up, down, left, and right) for offline ocular artifact correction. We kept electrode impedance below 10 kΩ.

EEG data were processed in Matlab (The MathWorks Inc.) using the EEGlab toolbox [66] as previously described [42], and processing included the following steps: (1) re-referencing to average reference and retrieving the Cz channel, (2) removal of blink and eye movement artifacts using BESA [67], (3) bandpass filtering from 0.1 to 40 Hz, (4) epoching the ongoing EEG from -200 to 800 ms relative to offer onset, and (5) removal of segments containing artifacts, based on values exceeding ±150 µV and gradients larger than 50 µV between two adjacent sampling points. Baselines were corrected to the 200 ms pre-stimulus interval (offer onset) using the regression method in subsequent analyses [68].
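The artifact criteria in step (5) can be sketched in Python as below. This is an illustrative re-expression of the two thresholds only, not the EEGlab pipeline the authors used; the function name `has_artifact` and the list-of-lists epoch layout are assumptions of the sketch.

```python
def has_artifact(epoch_uv, max_abs_uv=150.0, max_step_uv=50.0):
    """Flag an epoch that violates either rejection criterion.

    epoch_uv: list of channels, each a list of voltage samples in microvolts
              (hypothetical layout for illustration).
    max_abs_uv: absolute amplitude threshold (values beyond +/-150 uV).
    max_step_uv: largest allowed difference between adjacent samples (50 uV).
    """
    for channel in epoch_uv:
        # Criterion 1: any sample exceeds the absolute amplitude threshold.
        if any(abs(v) > max_abs_uv for v in channel):
            return True
        # Criterion 2: any jump between adjacent samples exceeds the gradient threshold.
        if any(abs(b - a) > max_step_uv for a, b in zip(channel, channel[1:])):
            return True
    return False


clean_epoch = [[0.0, 10.0, 20.0, 15.0], [-5.0, -8.0, -2.0, 0.0]]
noisy_epoch = [[0.0, 10.0, 75.0, 15.0], [-5.0, -8.0, -2.0, 0.0]]
print(has_artifact(clean_epoch), has_artifact(noisy_epoch))
```

In this sketch an epoch is rejected as a whole if any channel violates either criterion; whether rejection in the original pipeline was applied per channel or per segment is not specified beyond the description above.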

To define the time windows for single-trial analyses of FRN, P3a, and P3b amplitudes, we first determined the grand average peak latencies at FCz, FCz, and Pz, respectively. Accordingly, the FRN was quantified on single trials as the average voltage within an interval from 315 to 415 ms after offer onset across all electrodes within a fronto-central region of interest including F3, Fz, F4, FC3, FCz, FC4, C3, Cz, C4 [23]. To control for P2 effects on the FRN, the P2 amplitude was also extracted within each trial as the average voltage between 199 and 299 ms across fronto-central electrodes F1, Fz, F2, FC1, FCz, FC2, C1, Cz, C2, and included as a regressor in the analyses. P3a amplitude was quantified on single trials as the average voltage within a 363-463 ms interval post-offer across fronto-central electrodes F1, Fz, F2, FC1, FCz, FC2, C1, Cz, C2 [42]. P3b amplitude was quantified on single trials as the average voltage within a 530-630 ms interval post-offer within a parietally-focused region of interest including CP1, CPz, CP2, P1, Pz, P2, PO3, POz, PO4 [42].
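The single-trial quantification described above amounts to averaging voltage over a time window and a set of electrodes. A minimal Python sketch follows, assuming the 500 Hz sampling rate and the -200 ms epoch start given in the text; the names `ERP_WINDOWS_MS` and `window_mean` are hypothetical, and including both window endpoints is an assumption of the sketch.

```python
# Time windows in ms after offer onset, taken from the text.
ERP_WINDOWS_MS = {
    "P2": (199.0, 299.0),
    "FRN": (315.0, 415.0),
    "P3a": (363.0, 463.0),
    "P3b": (530.0, 630.0),
}


def window_mean(roi_epoch_uv, window_ms, epoch_start_ms=-200.0, srate_hz=500.0):
    """Mean voltage across all ROI electrodes and all samples inside the window.

    roi_epoch_uv: list of channels (only the ROI electrodes), each a list of
                  samples in microvolts, starting at epoch_start_ms.
    """
    step_ms = 1000.0 / srate_hz  # 2 ms per sample at 500 Hz
    start_ms, end_ms = window_ms
    values = []
    for channel in roi_epoch_uv:
        for k, voltage in enumerate(channel):
            sample_time_ms = epoch_start_ms + k * step_ms
            # Assumption: both window endpoints are inclusive.
            if start_ms <= sample_time_ms <= end_ms:
                values.append(voltage)
    if not values:
        raise ValueError("window does not overlap the epoch")
    return sum(values) / len(values)


# Toy check: a channel whose voltage equals its own time stamp in ms,
# so the window mean should equal the window's center.
ramp = [-200.0 + 2.0 * k for k in range(501)]  # -200 ms to 800 ms
p3b_amplitude = window_mean([ramp, ramp], ERP_WINDOWS_MS["P3b"])
print(p3b_amplitude)
```

Because every ROI channel contributes the same number of samples, pooling all values is equivalent to averaging the per-electrode window means.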

Analyses

Reward PEs (RPEs) were determined as the offer minus the reward prediction given by participants. Valence and arousal PEs were determined similarly: the affective experience participants reported upon receiving the offer minus participants' affective prediction for how they would feel after the offer. Prior to analyses, reward, valence, and arousal PEs were standardized but not mean centered, as zero represents a meaningful value on these scales (predicted and actual experiences are the same). Inspection of the behavioral data identified four trials in which impossible affect ratings were given (valence or arousal ratings outside of the 500 by 500 pixel grid), and these data were excluded from relevant analyses.
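One way to standardize a variable without mean centering is to divide it by its standard deviation, which rescales the units while leaving zero (no prediction error) at zero. The Python sketch below illustrates this along with the out-of-grid exclusion rule; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the exact scaling procedure.

```python
import statistics


def scale_without_centering(values):
    """Divide by the standard deviation but do not subtract the mean.

    Assumption: sample SD (n - 1 denominator); the text does not say which
    variant was used.
    """
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [v / sd for v in values]


def rating_in_grid(valence, arousal, half_width=250.0):
    """True if a rating lies inside the 500 by 500 pixel grid (-250 to +250)."""
    return abs(valence) <= half_width and abs(arousal) <= half_width


raw_pes = [0.0, 2.0, 4.0]
scaled_pes = scale_without_centering(raw_pes)
print(scaled_pes)
```

After scaling, a PE of zero still means that prediction and experience matched, which is the property the standardization described above is meant to preserve.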

References 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

Xiang, T., Lohrenz, T. & Montague, P. R. Computational Substrates of Norms and Their Violations during Social Exchange. The Journal of Neuroscience 33, 1099 (2013). https://doi.org/10.1523/JNEUROSCI.1642-12.2013

Gu, X. et al. Necessary, Yet Dissociable Contributions of the Insular and Ventromedial Prefrontal Cortices to Norm Adaptation: Computational and Lesion Evidence in Humans. The Journal of Neuroscience 35, 467 (2015). https://doi.org/10.1523/JNEUROSCI.2906-14.2015

Heffner, J. & FeldmanHall, O. A probabilistic map of emotional experiences during competitive social interactions. Nature Communications 13, 1718 (2022). https://doi.org/10.1038/s41467-022-29372-8

Sambrook, T. D. & Goslin, J. A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychological Bulletin 141, 213 (2015).

San Martín, R. Event-related potential studies of outcome processing and feedback-guided learning. Front Hum Neurosci 6, 304 (2012). https://doi.org/10.3389/fnhum.2012.00304

Nassar, M. R., Bruckner, R. & Frank, M. J. Statistical context dictates the relationship between feedback-related EEG signals and learning. eLife 8, e46975 (2019). https://doi.org/10.7554/eLife.46975

Gilbert, D. T., Pinel, E. C., Wilson, T. D., Blumberg, S. J. & Wheatley, T. P. Immune neglect: A source of durability bias in affective forecasting. Journal of Personality and Social Psychology 75, 617-638 (1998). https://doi.org/10.1037/0022-3514.75.3.617

FeldmanHall, O. & Heffner, J. A generalizable framework for assessing the role of emotion during choice. American Psychologist 77, 1017-1029 (2022). https://doi.org/10.1037/amp0001108

Frömer, R. et al. Response-based outcome predictions and confidence regulate feedback processing and learning. eLife 10, e62825 (2021). https://doi.org/10.7554/eLife.62825

Holroyd, C. B., Nieuwenhuis, S., Yeung, N. & Cohen, J. D. Errors in reward prediction are reflected in the event-related brain potential. Neuroreport 14, 2481-2484 (2003).

Cavanagh, J. F. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205-216 (2015). https://doi.org/10.1016/j.neuroimage.2015.02.007

Willis, J. & Todorov, A. First Impressions: Making up Your Mind after a 100-Ms Exposure to a Face. Psychological Science 17, 592-598 (2006).

Baum, J. & Abdel Rahman, R. Emotional news affects social judgments independent of perceived media credibility. Social Cognitive and Affective Neuroscience 16, 280-291 (2021). https://doi.org/10.1093/scan/nsaa164

Phelps, E. A., Ling, S. & Carrasco, M. Emotion facilitates perception and potentiates the perceptual benefits of attention. Psychological Science 17, 292-299 (2006). https://doi.org/10.1111/j.1467-9280.2006.01701.x

Barrett, L. F. & Gross, J. J. in Emotions: Current issues and future directions. Emotions and social behavior. 286-310 (The Guilford Press, 2001).

Abdel Rahman, R. Facing good and evil: early brain signatures of affective biographical knowledge in face recognition. Emotion 11, 1397-1405 (2011). https://doi.org/10.1037/a0024717

Brown, S., van Steenbergen, H., Band, G., de Rover, M. & Nieuwenhuis, S. Functional significance of the emotion-related late positive potential. Frontiers in Human Neuroscience 6 (2012). https://doi.org/10.3389/fnhum.2012.00033

DeCicco, J. M., O'Toole, L. J. & Dennis, T. A. The late positive potential as a neural signature for cognitive reappraisal in children. Dev Neuropsychol 39, 497-515 (2014). https://doi.org/10.1080/87565641.2014.959171

Bagby, R. M., Parker, J. D. A. & Taylor, G. J. The twenty-item Toronto Alexithymia scale-I. Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research 38 (1994).

Alday, P. M. How much baseline correction do we need in ERP research? Extended GLM model can replace baseline correction while lifting its limits. Psychophysiology 56, e13451 (2019). https://doi.org/10.1111/psyp.13451

$$
\begin{aligned}
\text{Punish}_{i,t} \sim{} & \beta_0 + \beta_1\,\text{Reward PE}_{i,t} + \beta_2\,\text{Valence PE}_{i,t} + \beta_3\,\text{Arousal PE}_{i,t} + \beta_4\,\text{Round}_{i,t} \\
& + \beta_5\,\text{Reward PE}_{i,t}{:}\text{Round}_{i,t} + \beta_6\,\text{Valence PE}_{i,t}{:}\text{Round}_{i,t} + \beta_7\,\text{Arousal PE}_{i,t}{:}\text{Round}_{i,t} + \varepsilon
\end{aligned}
$$

Outcome: Punish

| Variable | Estimate (SE) | z | p |
|---|---|---|---|
| Intercept | -1.61 (0.32) | -5.03 | <.001*** |
| Reward PE | -0.50 (0.17) | -3.02 | .003** |
| Valence PE | -0.99 (0.15) | -6.42 | <.001*** |
| Arousal PE | 0.05 (0.14) | 0.34 | .737 |
| Round | 0.04 (0.03) | 1.66 | .10 |
| Reward PE×Round | -0.06 (0.03) | -1.87 | .06 |
| Valence PE×Round | 0.08 (0.03) | 2.30 | .02* |
| Arousal PE×Round | 0.03 (0.03) | 0.98 | .33 |

Note. Reward PEs are calculated by taking the difference between the experienced and predicted reward. Valence PEs and Arousal PEs are calculated by taking the difference between the experienced and predicted emotion on the relevant affect dimension. All variables were scaled but not mean-centered, as the zero point on each scale refers to the meaningful instance where expectations matched experience. The model includes subject-specific random intercepts and slopes for Reward PE, Valence PE, and Arousal PE. The dataset includes 7,376 observations from 41 participants. *** p < .001, ** p < .01, * p < .05.
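The prediction errors described in the table note can be illustrated with a minimal Python sketch. It computes a signed PE as experienced minus predicted, then rescales the values without mean-centering so that zero still marks the case where expectation matched experience. The function names, the toy ratings, and the choice of root mean square as the scaling factor are assumptions made for illustration; the note does not state which scaling factor was used.

```python
import math


def prediction_error(experienced, predicted):
    """Signed prediction error: experienced minus predicted."""
    return experienced - predicted


def scale_without_centering(values):
    """Divide by the root mean square (an assumed scaling factor).

    No mean is subtracted, so a PE of zero stays at zero.
    """
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    if rms == 0:
        return list(values)
    return [v / rms for v in values]


# Hypothetical toy ratings on a single affect dimension (not data from the study).
predicted = [5.0, 3.0, 4.0, 2.0]
experienced = [5.0, 1.0, 6.0, 2.0]

pes = [prediction_error(e, p) for e, p in zip(experienced, predicted)]
scaled = scale_without_centering(pes)
print(pes)
print(scaled)
```

Because nothing is subtracted before dividing, trials where the prediction was exactly right remain at zero after scaling, which is the property the note describes as the reason for not mean-centering.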

| Partner | Measure | Rounds 1-2 update (expectations) | Rounds 2-5 update (expectations) | Rounds 1-5 update (experience) |
|---|---|---|---|---|
| Fair | Reward | 9.53 (<.001***) | 0.02±0.01 (.02*) | - |
| Fair | Valence | 9.23 (<.001***) | 0.01±0.01 (.44) | -0.04±0.01 (<.001***) |
| Fair | Arousal | 5.00 (<.001***) | -0.001±0.02 (.94) | -0.08±0.02 (<.001***) |
| Neutral | Reward | -3.00 (.005**) | -0.02±0.01 (.09) | - |
| Neutral | Valence | -1.44 (.16) | -0.02±0.01 (.16) | 0.02±0.01 (.05) |
| Neutral | Arousal | -1.45 (.15) | -0.03±0.01 (.02*) | -0.01±0.01 (.35) |
| Unfair | Reward | -14.19 (<.001***) | -0.05±0.02 (.006**) | - |
| Unfair | Valence | -10.36 (<.001***) | 0.01±0.02 (.51) | 0.07±0.02 (<.001***) |
| Unfair | Arousal | -2.28 (.03*) | -0.01±0.02 (.55) | -0.003±0.01 (.82) |

Note. Rounds 1-2 update (expectations) shows the result of a paired t-test comparing expectation values from round 2 to round 1. Rounds 2-5 update (expectations) shows the result of a LMM comparing how expectation values change between rounds 2-5. Rounds 1-5 update (experience) shows the result of LMMs comparing how emotional experiences change between rounds 1-5. Reward experiences were defined by task parameters and are stable across rounds (see Methods). ***P<.001, **P<.01, *P<.05.

... beta coefficient from LMMs modeling the marginal contributions of the absolute value of each PE type on separate ERPs: the FRN, P3a, and P3b. Error bars reflect ±1 S.E. ***P<.001, **P<.01, *P<.05.

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J. & Frith, C. D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042-1045 (2006). https://doi.org/10.1038/nature05051

King-Casas, B. et al. Getting to know you: reputation and trust in a two-person economic exchange. Science 308, 78-83 (2005). https://doi.org/10.1126/science.1108062

Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593-1599 (1997). https://doi.org/10.1126/science.275.5306.1593

Schultz, W. & Dickinson, A. Neuronal coding of prediction errors. Annual Review of Neuroscience 23, 473-500 (2000). https://doi.org/10.1146/annurev.neuro.23.1.473

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-Based Influences on Humans' Choices and Striatal Prediction Errors. Neuron 69, 1204-1215 (2011). https://doi.org/10.1016/j.neuron.2011.02.027

Rouhani, N. & Niv, Y. Signed and unsigned reward prediction errors dynamically enhance learning and memory. eLife 10, e61077 (2021). https://doi.org/10.7554/eLife.61077

Sutton, R. S. & Barto, A. G. Reinforcement learning: An introduction, 2nd ed. (The MIT Press, 2018).

Sosa, M. & Giocomo, L. M. Navigating for reward. Nature Reviews Neuroscience 22, 472-487 (2021). https://doi.org/10.1038/s41583-021-00479-z

Lamba, A., Frank, M. J. & FeldmanHall, O. Anxiety Impedes Adaptive Social Learning Under Uncertainty. Psychological Science 31, 592-603 (2020). https://doi.org/10.1177/0956797620910993

Lerner, J. S., Li, Y., Valdesolo, P. & Kassam, K. S. Emotion and Decision Making. Annual Review of Psychology 66, 799-823 (2015). https://doi.org/10.1146/annurev-psych-010213-115043

Phelps, E. A. Emotion and Cognition: Insights from Studies of the Human Amygdala. Annual Review of Psychology 57, 27-53 (2005). https://doi.org/10.1146/annurev.psych.56.091103.070234

FeldmanHall, O., Raio, C. M., Kubota, J. T., Seiler, M. G. & Phelps, E. A. The Effects of Social Context and Acute Stress on Decision Making Under Uncertainty. Psychological Science 26, 1918-1926 (2015). https://doi.org/10.1177/0956797615605807

Stanton, S. J., Reeck, C., Huettel, S. A. & LaBar, K. S. Effects of induced moods on economic choices. Judgment and Decision Making 9, 167-175 (2014).

Martin, L. N. & Delgado, M. R. The influence of emotion regulation on decision-making under risk. J Cogn Neurosci 23, 2569-2581 (2011). https://doi.org/10.1162/jocn.2011.21618

Frijda, N. H. Emotion, cognitive structure, and action tendency. Cognition and Emotion 1, 115-143 (1987). https://doi.org/10.1080/02699938708408043

Lowe, R. The Feeling of Action Tendencies: On the Emotional Regulation of Goal-Directed Behavior. Frontiers in Psychology 2 (2011). https://doi.org/10.3389/fpsyg.2011.00346

Barrett, L. F. The theory of constructed emotion: an active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience 12, 1-23 (2017). https://doi.org/10.1093/scan/nsw154

Heimer, O., Kron, A. & Hertz, U. Temporal dynamics of the semantic versus affective representations of valence during reversal learning. Cognition 236, 105423 (2023). https://doi.org/10.1016/j.cognition.2023.105423

Heffner, J., Son, J.-Y. & FeldmanHall, O. Emotion prediction errors guide socially adaptive behaviour. Nature Human Behaviour 5, 1391-1401 (2021). https://doi.org/10.1038/s41562-021-01213-6

Hauser, T. U. et al. The feedback-related negativity (FRN) revisited: New insights into the localization, meaning and network organization. NeuroImage 84, 159-168 (2014). https://doi.org/10.1016/j.neuroimage.2013.08.028

Hayden, B. Y., Heilbronner, S. R., Pearson, J. M. & Platt, M. L. Surprise signals in anterior cingulate cortex: neuronal encoding of unsigned reward prediction errors driving adjustment in behavior. J Neurosci 31, 4178-4187 (2011). https://doi.org/10.1523/JNEUROSCI.4652-10.2011

Holroyd, C. B. & Coles, M. G. H. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol Rev 109, 679-709 (2002). https://doi.org/10.1037/0033-295x.109.4.679

Yeung, N. & Sanfey, A. Independent coding of reward magnitude and valence in the human brain. Journal of Neuroscience 24, 6258-6264 (2004).

Hajcak, G., Holroyd, C. B., Moser, J. S. & Simons, R. F. Brain potentials associated with expected and unexpected good and bad outcomes. Psychophysiology 42, 161-170 (2005). https://doi.org/10.1111/j.1469-8986.2005.00278.x

Mars, R. B. et al. Trial-by-Trial Fluctuations in the Event-Related Electroencephalogram Reflect Dynamic Changes in the Degree of Surprise. The Journal of Neuroscience 28, 12539 (2008). https://doi.org/10.1523/JNEUROSCI.2925-08.2008

Wilson, T. D. & Gilbert, D. T. Affective Forecasting: Knowing What to Want. Current Directions in Psychological Science 14, 131-134 (2005). https://doi.org/10.1111/j.0963-7214.2005.00355.x

Xiao, E. & Houser, D. Emotion expression in human punishment behavior. Proc Natl Acad Sci U S A 102, 7398-7401 (2005). https://doi.org/10.1073/pnas.0502399102

Phelps, E. A., Lempert, K. M. & Sokol-Hessner, P. Emotion and Decision Making: Multiple Modulatory Neural Circuits. Annual Review of Neuroscience 37, 263-287 (2014). https://doi.org/10.1146/annurev-neuro-071013-014119

Eldar, E., Rutledge, R. B., Dolan, R. J. & Niv, Y. Mood as Representation of Momentum. Trends in Cognitive Sciences 20, 15-24 (2016). https://doi.org/10.1016/j.tics.2015.07.010

Bennett, D., Davidson, G. & Niv, Y. A model of mood as integrated advantage. Psychol Rev 129, 513-541 (2022). https://doi.org/10.1037/rev0000294

Silver, D., Singh, S., Precup, D. & Sutton, R. S. Reward is enough. Artificial Intelligence 299, 103535 (2021). https://doi.org/10.1016/j.artint.2021.103535

Liu, Y., Huang, H., McGinnis-Deweese, M., Keil, A. & Ding, M. Neural substrate of the late positive potential in emotional processing. J Neurosci 32, 14563-14572 (2012). https://doi.org/10.1523/JNEUROSCI.3109-12.2012

Dennis, T. A. & Hajcak, G. The late positive potential: a neurophysiological marker for emotion regulation in children. J Child Psychol Psychiatry 50, 1373-1383 (2009). https://doi.org/10.1111/j.1469-7610.2009.02168.x

Behrens, T. E., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. Learning the value of information in an uncertain world. Nat Neurosci 10, 1214-1221 (2007). https://doi.org/10.1038/nn1954

Nassar, M. R., Wilson, R. C., Heasly, B. & Gold, J. I. An Approximately Bayesian Delta-Rule Model Explains the Dynamics of Belief Updating in a Changing Environment. The Journal of Neuroscience 30, 12366-12378 (2010). https://doi.org/10.1523/jneurosci.0822-10.2010

Yu, L. Q., Wilson, R. C. & Nassar, M. R. Adaptive learning is structure learning in time. Neuroscience & Biobehavioral Reviews 128, 270-281 (2021). https://doi.org/10.1016/j.neubiorev.2021.06.024

Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Nature Communications 12, 6587 (2021). https://doi.org/10.1038/s41467-021-26731-9

Razmi, N. & Nassar, M. R. Adaptive Learning through Temporal Dynamics of State Representation. The Journal of Neuroscience 42, 2524-2538 (2022). https://doi.org/10.1523/jneurosci.0387-21.2022

Cooper, D. J. & Dutcher, E. G. The dynamics of responder behavior in ultimatum games: a meta-study. Experimental Economics 14, 519-546 (2011). https://doi.org/10.1007/s10683-011-9280-x

Güth, W., Schmittberger, R. & Schwarze, B. An experimental analysis of ultimatum bargaining. Journal of Economic Behavior & Organization 3, 367-388 (1982). https://doi.org/10.1016/0167-2681(82)90011-7

Bahry, D. L. & Wilson, R. K. Confusion or fairness in the field? Rejections in the ultimatum game under the strategy method. Journal of Economic Behavior & Organization 60, 37-54 (2006).

Strohminger, N. et al. The MR2: A multi-racial, mega-resolution database of facial stimuli. Behavior Research Methods 48, 1197-1204 (2016). https://doi.org/10.3758/s13428-015-0641-9

Russell, J., Weiss, A. & Mendelsohn, G. Affect Grid: A Single-Item Scale of Pleasure and Arousal. Journal of Personality and Social Psychology 57, 493-502 (1989). https://doi.org/10.1037/0022-3514.57.3.493

Gard, D. E., Gard, M. G., Kring, A. M. & John, O. P. Anticipatory and consummatory components of the experience of pleasure: A scale development study. Journal of Research in Personality 40, 1086-1102 (2006). https://doi.org/10.1016/j.jrp.2005.11.001

Delorme, A. & Makeig, S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134, 9-21 (2004). https://doi.org/10.1016/j.jneumeth.2003.10.009

Ille, N., Berg, P. & Scherg, M. Artifact correction of the ongoing EEG using spatial filters based on artifact and brain signal topographies. J Clin Neurophysiol 19, 113-124 (2002). https://doi.org/10.1097/00004691-200203000-00002