Sebastijan Veselic, Timothy H. Muller, Elena Gutierrez, Timothy E. J. Behrens, Laurence T. Hunt, James L. Butler, Steven W. Kennerley

*, ¬ Equal contribution

Clinical and Movement Neurosciences, Department of Motor Neuroscience, University College London, London, UK
Department of Experimental Psychology, University of Oxford, Oxford, UK
Department of Psychiatry, University of Oxford, Oxford, UK
Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London, UK
Wellcome Centre for Human Neuroimaging, University College London, London, UK
Wellcome Centre for Integrative Neuroimaging, University of Oxford, FMRIB, John Radcliffe Hospital, Oxford, UK

December 16, 2023

The prefrontal cortex is crucial for economic decision-making and representing the value of options. However, how such representations facilitate flexible decisions remains unknown. We reframe economic decision-making in prefrontal cortex in line with representations of structure within the medial temporal lobe, because such cognitive map representations are known to facilitate flexible behaviour. Specifically, we framed choice between different options as a navigation process in value space. Here we show that choices in a 2D value space defined by reward magnitude and probability were represented with a grid-like code, analogous to that found in spatial navigation. The grid-like code was present in the theta-frequency local field potential of ventromedial prefrontal cortex (vmPFC), and the result replicated in an independent dataset. Neurons in vmPFC similarly contained a grid-like code, in addition to encoding the linear value of the chosen option. Importantly, both signals were modulated by theta frequency: they occurred at theta troughs, but on separate theta cycles. Furthermore, we found sharp-wave ripples, a key neural signature of planning and flexible behaviour, in vmPFC, which were modulated by accuracy and reward. These results demonstrate that multiple cognitive map-like computations are deployed in vmPFC during economic decision-making, suggesting a new framework for the implementation of choice in prefrontal cortex.

Main text

The prefrontal cortex (PFC) is fundamental for learning and choice1–10. A dominant idea in economic decision-making has been that value is represented in a common currency format, which facilitates efficient action selection1,5,11. While these studies have led to great progress towards understanding the PFC's neural code in relatively simple and overlearned decision contexts, humans and animals often face novel choices where they must infer or construct value based on previous experience12,13. Here, we investigated whether making choices of this kind requires a new framework for representing choice options within a value space, one in which the representation of choice task structure becomes crucial.

In contrast to research on the PFC, research on the medial temporal lobe (MTL) has investigated representations of structure and representations supporting inference14. In the MTL, cognitive maps encoding the relationships between entities in the world are built, supporting flexible behaviour15,16. Two key neural substrates allow for this: the ability to infer vectors between different locations using a grid-like code14, and planning-related signals observed during replay and coinciding with sharp wave ripples (SWR)17,18. Grid cells represent the structure of space19–21, and such structural representations are thought to be necessary for inferential choices that go beyond direct experience14,16,21. Relatedly, ripples in MTL may reflect planning signals during model-based reinforcement learning18, as place cells encode sequences of locations during replay17, which may underlie our ability to compositionally bind information and help facilitate novel choice22.

While research on cognitive map-like representations and computations has historically focussed on physical space and the MTL, such signals have now been found in abstract spaces and outside the MTL12,13,15,23–27. fMRI work in human and non-human primates has implicated the ventral and medial parts of PFC (mPFC, vmPFC)12,13,15, perhaps due to their strong anatomical links with the MTL28,29. This suggests that the computations supporting inference, novel choice, and planning in spatial navigation may reflect a general neural mechanism; however, a link to choice itself has not yet been shown. Motivated by recent non-human primate fMRI findings of a grid-like code in a value space as subjects passively navigated between trials12, we asked whether the same neural code is present during choice itself, which would suggest it is used for navigation between possible (choice) locations in abstract space.

Here, we demonstrate the presence of a grid-like code at choice, defined by the trajectory between choice options in a two-dimensional (2D) value space. This grid-like code is present in the local field potential (LFP) and in single neurons in vmPFC, suggesting such a code is used for making choices in a similar fashion to navigating routes between locations in physical space. In addition, we show that while this code is stable within a stimulus set, it realigns across new stimulus sets, analogous to grid realignment observed in physical space, deepening the parallel between spatial and abstract map-like representations. Finally, we report the first evidence of ripples in non-human primate vmPFC and show these ripples occur primarily at choice and outcome, suggesting a role in binding choices and outcomes. Jointly, these results suggest the neural code in vmPFC underlying economic decision-making resembles the well-characterised cognitive map representations in the MTL that support inferential choices in space. This bridges two seemingly disparate fields, one traditionally focused on space and memory in the MTL and one focused on value and the PFC, in a more unified perspective.

Behaviour and chosen value representations in PFC

Two male rhesus macaques (Macaca mulatta) made decisions between two options (Figure 1A), where each 'option' had to be constructed compositionally from two previously learned cues (images) that never formed an option during training (see Methods). Each cue in one set of five mapped onto one of five reward probability levels, while each cue in another set of five mapped onto one of five reward magnitude levels. Subjects' choice accuracy was above chance (Figure 1BC). In line with how data from such experiments are typically analysed, we first looked for canonical value signals in single neurons across the four regions we recorded from (see Supplementary Figure S1): anterior cingulate cortex (ACC, n = 198), dorsolateral prefrontal cortex (dlPFC, n = 156), orbitofrontal cortex (OFC, n = 195), and vmPFC (n = 160). Neurons in ACC, dlPFC, and OFC significantly encoded chosen value (Figure 1DE) and chosen value difference (Figure S2) in a 300-millisecond window before subjects initiated their choice. The signal was strongest overall in ACC neurons, matching similar studies3,30. In contrast, we found no evidence for a chosen value or chosen value difference signal in vmPFC neurons during the same time period. This mirrors a frequent divergence between fMRI research, where choice-related value signals in vmPFC are commonly found12,31–33, and electrophysiology, where choice-related activity in vmPFC is often weaker or absent relative to neurons in other regions34–36.

A cognitive map of value space in vmPFC emerges at choice

While we did not find canonical value signals in vmPFC, we hypothesised the presence of map-like representations, based on previous work showing a grid-like code within vmPFC and adjacent (medial PFC) regions12,13,15. We predicted that the cues representing individual attributes would be composed into options and mapped onto a reward magnitude by reward probability value space (Figure 2A, left panel), where a choice option corresponds to a location in 2D space. Each choice between two options then reflects a trajectory between two locations in this 2D space, analogous to vector-based navigation in physical space14. We tested the hypothesis that subjects construct a cognitive map of value space using a grid-like code by looking for the periodic modulation of neural activity predicted to arise from the firing properties of grid cells37,38. Specifically, based on previous work in physical space39, we looked for a grid-like code in theta-frequency local field potential activity. Such hexadirectional analyses rely on trajectories that are aligned with the grid field eliciting stronger neural activity than those that are misaligned (Figure 2A, middle and right panels).
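The hexadirectional logic can be sketched as a two-step regression. This is a minimal sketch assuming the quadrature-regression approach common in the grid-like code literature (the Hexadirectional Analysis section of the Methods is not part of this excerpt, so the authors' exact pipeline may differ): the grid orientation φ is estimated on one half of the trials by regressing the signal on sin(6θ) and cos(6θ), and the hexadirectional beta is then estimated on held-out trials by regressing on cos(6(θ − φ)). The function names and the simulated data are hypothetical; setting `fold` to 4, 5, 7 or 8 gives control symmetries.

```python
import numpy as np


def estimate_grid_orientation(theta, signal, fold=6):
    """Estimate grid orientation (radians) from navigation angles theta.

    Fits signal ~ b0 + b_sin*sin(fold*theta) + b_cos*cos(fold*theta);
    the orientation is the phase of (b_cos, b_sin) divided by fold.
    """
    X = np.column_stack([np.ones_like(theta), np.sin(fold * theta), np.cos(fold * theta)])
    (_, b_sin, b_cos), *_ = np.linalg.lstsq(X, signal, rcond=None)
    return np.arctan2(b_sin, b_cos) / fold


def hexadirectional_beta(theta, signal, phi, fold=6):
    """Regress signal on cos(fold*(theta - phi)); a positive beta means
    trials aligned with the grid orientation phi show a larger signal."""
    X = np.column_stack([np.ones_like(theta), np.cos(fold * (theta - phi))])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return beta[1]


# Simulated example: per-trial theta power with a true orientation of 20 degrees.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 400)  # navigation angles between options
signal = np.cos(6 * (theta - np.deg2rad(20))) + 0.5 * rng.standard_normal(400)

phi_hat = estimate_grid_orientation(theta[:200], signal[:200])          # training half
beta_test = hexadirectional_beta(theta[200:], signal[200:], phi_hat)    # held-out half
print(np.rad2deg(phi_hat), beta_test)
```

Estimating φ and testing on separate trials keeps the test unbiased; with sixfold symmetry the recovered orientation is only defined modulo 60°.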

We first regressed out all canonical value signals and reaction times, ensuring any obtained signals would not be confounded by them (see Methods). In the resulting residuals we found a cognitive map of value space, represented with a grid-like code in vmPFC theta frequency in a 300-millisecond window before subjects initiated their choice (Figure 2B, t(15) = 3.20, p = 0.0061) across recording sessions. The effect appeared before subjects made their choice (Figure 2C) and had distinct sixfold periodicity (Figure 2D), as in previous work investigating grid-like coding15,38,39. In contrast, there was no grid-like code in control LFP frequencies (Figure S3C), as in ref39. Individually, 25% of sessions (4/16) showed significant 6-fold modulation (Figure 2E, Binomial test, p < .001, see Figure 2FG for an example individual session). We found no grid-like code in ACC (t(40) = 1.54, p = 0.13) or OFC (t(33) = 1.73, p = 0.09), despite previous reports15,40, nor any grid-like signal in dlPFC (t(31) = 0.12, p = 0.182; see Figure S3ABD for regional comparisons).

¹ pBonferroni = 0.03.
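Two of the statistics reported above can be checked with simple arithmetic. The sketch below assumes (i) that each session's chance rate of passing the 99th-percentile permutation criterion (Figure 2E) is 0.01, and (ii) that the Bonferroni correction spans five symmetries (4- to 8-fold). The second assumption is our inference, consistent with 0.0061 × 5 ≈ 0.03 in the footnote, and is not stated in this excerpt; the helper names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bonferroni(p, m):
    """Bonferroni-corrected p-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


# 4 of 16 sessions significant; assumed per-session chance rate of 0.01
p_sessions = binomial_upper_tail(4, 16, 0.01)
# group-level sixfold test, assuming 5 symmetries were corrected for
p_group = bonferroni(0.0061, 5)
print(f"{p_sessions:.2e}", round(p_group, 4))
```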

In additional analyses, we observed two suggestive trends. The first trend implied the grid-like code in vmPFC became stronger as a stimulus set became more familiar (Figure S3EFG), perhaps suggesting the cognitive map is constructed during choice and further consolidated during sleep by hippocampal sharp wave ripples41. The second trend suggested the neural geometry of the cognitive map in vmPFC is better explained by a veridical compared to a distorted representation of the cognitive map (Figure S4).

Finally, to demonstrate the robustness of our result, we replicated the main effect from Figure 2B using the same analysis approach in an independent dataset that is currently being collected (Figure 2H, t(10) = 2.74, p = 0.02; see Figure S5 for a description of the task in the independent dataset). The effect similarly occurred before choice (Figure 2I) and exhibited sixfold periodicity (Figure 2J).

These results show that subjects construct a cognitive map of value space with a grid-like code in vmPFC, echoing previous reports12,13, and suggest how it may be used for choice. Subjects construct a cognitive map by compositionally binding cues into options and embedding these options as "locations" in a value space spanning choice-relevant attributes. Because this embedding occurs at choice on a trial-by-trial basis, it allows navigation trajectories between choice locations in the value space to be computed. Crucially, the navigation trajectory between choice options and its length within a value space allow the decision variable to be computed, and thus allow for optimal choice (Figure 2A). This suggests a grid-like code may be used for the choice process itself, where it could facilitate inference of novel choices13,26, analogous to how grid cells facilitate inference of spatial shortcuts14.
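As an illustration of the geometry, the sketch below computes the navigation angle and trajectory length between two options placed on a 5 × 5 grid of (magnitude level, probability level). The coordinate convention and function names are our assumptions. The toy decision rule assumes, purely for illustration, that value increases additively with the two attribute levels, so the better option is indicated by the sign of the trajectory's projection onto the 45° diagonal; this simplification is ours, not the paper's value model.

```python
import math


def navigation_vector(option_a, option_b):
    """Angle (degrees in [0, 360)) and length of the trajectory from
    option_a to option_b, each given as (magnitude_level, probability_level)."""
    dm = option_b[0] - option_a[0]
    dp = option_b[1] - option_a[1]
    angle = math.degrees(math.atan2(dp, dm)) % 360
    return angle, math.hypot(dm, dp)


def b_is_better_additive(angle, length):
    """Toy rule (assumes additive value): option B is better when the
    trajectory's projection onto the 45-degree direction is positive."""
    return length * math.cos(math.radians(angle - 45)) > 0


angle, length = navigation_vector((2, 4), (4, 2))  # angle ≈ 315°, length ≈ 2.83
print(angle, length)
print(b_is_better_additive(*navigation_vector((1, 2), (3, 4))))  # B dominates A
```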

Figure 2: The value space is represented with a grid-like code at choice in vmPFC. a) Left panel: A value space is organised along the reward magnitude and reward probability values used for choice. Within this value space, the left and right options are embedded as "locations". A trajectory or navigation angle can be computed between each pair of possible locations. Middle & right panels: Navigation angles falling along the grid field of a hypothetical grid cell (aligned) are predicted to elicit stronger oscillatory activity than angles that do not fall along the grid field (unaligned). b) Significant hexadirectional (sixfold) modulation in vmPFC, but not for control symmetries, across recording sessions. Each session represents the average of several channels recorded within that session. See also Figure S3A. Error bars represent SEM across sessions (n = 16). * pBonferroni < .05, corrected for symmetries. c) Time course of hexadirectional (sixfold) modulation in vmPFC and other brain regions. Blue shading denotes the original time window in Figure 2B and Figure 1E. Lines above brain regions denote significant hexadirectional (sixfold) encoding at p < .05. See also Figure S3B. Error bars represent SEM across sessions within brain region (dlPFC = 32, OFC = 34, ACC = 41). d) Sixfold periodicity in vmPFC across sessions, as predicted by Figure 2A, right panel. Error bars represent SEM across sessions. e) Percentage of significant sessions for individual symmetries, obtained through permutation testing (n = 1000) by comparing session signal averages to the 99th percentile of a null distribution. f) Sixfold periodicity for an example session. Error bars represent SEM across channels within that session (n = 3). g) Average signal for aligned compared to unaligned trials. Error bars represent SEM averaged across recorded channels (n = 3) within that session. h) Significant hexadirectional (sixfold) modulation in vmPFC, but not for control symmetries, in an independent dataset, occurring 100–500 msec after Option 2 onset. See Figure S5 for the task description. * p < .05. Error bars represent SEM across sessions (n = 11). i) Time course of hexadirectional (sixfold) modulation in vmPFC and another brain region (OFC) in an independent dataset. Error bars represent SEM across sessions within brain region (OFC, n = 8). j) Sixfold periodicity in vmPFC across sessions in an independent dataset. Error bars represent SEM across sessions.

Grid orientations realign across stimulus sets but are stable within stimulus sets

To deepen the parallels with spatial cognition, we next tested whether the observed grid-like code was consistent across sessions and stimulus sets15. The orientations of grid cells are typically stable within the environment in which they are recorded but realign across different environments42. Therefore, if the signal we observed indeed arose from populations of grid cells, we would predict systematic differences in grid orientation across sessions, but only when the stimulus sets (i.e. the "environment") differed. Our study was unique in that the stimuli denoting reward magnitude and reward probability values were changed every few sessions (Figure 3A). Subjects had to learn these in a separate experiment interleaved with the choice task reported here. This meant identical value spaces (i.e. identical task structure) were decorrelated from the sensory properties of the stimuli subjects observed, analogous to having different sensory experiences in different spatial environments with identical underlying spatial structure42. As predicted, when we used grid orientations from one session to estimate a grid-like code in other sessions, we observed a consistent grid-like code only within (Figure 3B, t(8) = 4.77, p = .00142) but not across stimulus sets (Figure 3C, t(15) = 0.06, p = .95).

We further demonstrated that differences in grid orientations were not due to noise by computing the angular distances between grid orientations from different sessions, comparing within- vs. between-stimulus-set session pairs. Average distances between grid orientations within stimulus sets were significantly smaller than those across stimulus sets (Figure 3DE, z = 2.42, p < .01). Neither the result in Figure 3B nor that in Figure 3E was explained by potential within-session correlations of grid orientation across channels, as these were removed from the analysis beforehand. Nor was the result explained by differences in the time elapsed between the pairs of sessions from which the distances were computed (i.e. degree of exposure to a stimulus set), which we tested in a control analysis (Figure S6).

Overall, these results are the first demonstration of non-spatial, or abstract, grid realignment. Given that grid cell realignment occurs when generalising across spatial contexts42, our results suggest similar mechanisms may be at play in non-spatial generalisation. This deepens the connection between known computational properties of grid cells measured in rodents during spatial navigation42 and our understanding of grid-like codes in non-spatial environments in other species. It also provides further evidence that the hexadirectional analysis used to estimate grid-like codes likely measures underlying grid cell activity, and hence supports the idea that prefrontal cortex contains grid cells constructing cognitive maps of abstract spaces.

Figure 3 caption (fragment): … from different sessions with shared stimulus sets was smaller than the average difference in grid orientation from different sessions with different stimulus sets. The black line denotes the empirical value obtained for the difference of between-within grid orientation distances. The gray histogram denotes the shuffled null distribution (n = 1000 permutations).

A grid-like code in vmPFC neurons and its theta phase-dependency

Having established a cognitive map of value space exists in the vmPFC LFP signal, we next sought evidence for such a representation in vmPFC neurons. Despite multiple studies demonstrating population-level grid-like codes in BOLD and LFP activity12,13,15,38,39, it has not been shown whether neurons represent information with a grid-like code as well.

Similar to cells in the hippocampal formation and OFC43,44, we observed that vmPFC neurons exhibit theta modulation (Figure 4ABC, F(9, 159) = 8.87, p = 4.20 × 10⁻¹³; see also Figure S7ABC), firing most at theta troughs and least at theta peaks. We therefore restricted our hexadirectional analyses of vmPFC neurons to specific phases of several theta cycles, using grid orientations estimated from vmPFC channels (Figure 4D). This approach yielded a significant grid-like code across vmPFC neurons, but only at the trough of the average theta phase (Figure 4E, t(159) = 3.08, p = 0.0023), which was when cells fired the most (Figure 4A, right panel).

To further validate that this signal was specific to theta troughs, we aligned theta phases in time on a trial-by-trial basis for each channel (see example for one channel in Figure 4F) and shifted the firing rates of vmPFC neurons according to their corresponding channels. This ensured that, across neurons, neuronal firing at different timepoints was precisely synchronized to the theta phase of the channel on which each neuron was recorded. This analysis similarly revealed a grid-like code in vmPFC neurons at one theta trough (Figure 4G), which was temporally localized (Figure S7G) to approximately the same time period as the theta trough labelled '2' in Figure 4DE. This temporal pattern suggests the observed grid-like code occurred predominantly within one theta cycle before choice; however, it remained weakly significant when we averaged over all theta troughs occurring in the 300-millisecond window before subjects initiated their choice from Figure 2B (Figure S7D, t(159) = 2.26, p = 0.025). When we plotted vmPFC neurons with high grid-like coding estimates (Figure 4HIJK), their rate maps had irregular firing fields, exhibiting higher firing rates for multiple non-neighbouring states in the value space.

Because we found a theta phase-dependent grid-like code in vmPFC neurons, and given previous work45 and the fact that other recorded regions encoded chosen value, we tested whether theta phase-dependency might explain the lack of chosen value signals in vmPFC in Figure 1DE. Using the theta phase-aligned neuronal firing rates described above, we indeed observed the chosen value signal occurring in a theta phase-dependent manner (Figure 4L). vmPFC neurons encoded chosen value at the trough of the theta cycle following the one containing the grid-like code, suggesting the cognitive map must first be constructed, and the relevant navigation angle computed, before estimates of chosen value can be represented. As an additional test, we used the same seed windows from Figure 4D and replicated this observation: vmPFC neurons weakly encoded chosen value at the subsequent theta trough (Figure S7E; corresponding to the time period labelled '4' in Figure 4DE) but not at the one before. Across neurons, the chosen value signal did not correlate with the strength of the grid-like code (Figure S7F, S7I), suggesting that neurons supporting cognitive map-like representations are independent of neurons representing chosen value. This raises the possibility that the two neuronal subpopulations interact to facilitate optimal choice.

More broadly, these results show that vmPFC neurons may employ a theta phase-dependent neural code, which could explain the divergence between vmPFC findings in fMRI12,31–33 and electrophysiology34–36. Furthermore, finding a cognitive map of value space in vmPFC neuronal activity strongly suggests this region could contain …

Figure 4 caption (fragment): … chosen value code in separate theta cycles. a) Left panel: a sample theta cycle (blue) with its corresponding phase (black) showing the cycle–phase convention used throughout this …

Sharp wave ripple events are present in vmPFC

Considering that subjects had constructed a cognitive map of value that was used during choice, we looked for evidence of other MTL cognitive map-related computations in vmPFC. One such phenomenon is the sharp wave ripple (SWR), together with its associated replay, which occurs at choice points in awake animals17,41,46. SWR-associated replay is thought to be a neural mechanism for the retrieval of information41,47,48 and for model-based planning18.

We used a ripple detector similar to previously published work investigating ripples in MTL and cortex47–50 to demonstrate the presence of oscillatory events within the ripple band (80–180 Hz) comparable to those previously reported in non-human primates51. After identifying candidate events, we further filtered for events with a high signal-to-noise ratio (see Figure S8 and Methods) to avoid issues with artefacts or noisily defined events52. This procedure revealed candidate MTL-like sharp wave ripples (Figure 5A, n = 4162, median duration = 103 msec, SD = 42.08 msec; see also Figure S8-9). Crucially, despite our wide frequency range, the ripples were localised to a narrow band spanning 90–140 Hz (Figure 5B, bottom). This concurs with previous findings24,48,51,53 and suggests these ripple events were generated by a specialised circuit rather than reflecting spurious high-frequency noise spanning the entire frequency range. The simultaneously recorded vmPFC neurons elevated their firing rate around the ripple window (Figure 5C), and more so on trials with ripples (Figure S10A), which coincided with a ripple sharp wave component (Figure 5D). While ripples have previously been described in human cortex and rodent mPFC, this provides the first evidence of ripples in non-human primate vmPFC within a value-based decision-making context48.

vmPFC ripples are modulated by value-guided choice

Having identified the presence of ripples in vmPFC, we next asked whether these were modulated by value-guided choice. We localized ripple events in time relative to task events and compared the ripple event frequency in vmPFC with that of the other brain regions. We observed a temporal specificity of ripple events: they mostly occurred shortly before and after choice, and at feedback, and they occurred at a much higher frequency in vmPFC than in the other brain regions (Figure 6A). Furthermore, the periods in which vmPFC ripples occurred most did not coincide with periods of a general tonic increase in vmPFC population firing rate, such as occurs in the other brain regions (Figure S10B); such a coincidence would have suggested that the events reported above simply indexed a tonic increase in vmPFC firing.
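The detection step described earlier in this section (band-limiting to 80–180 Hz, then thresholding candidate events) can be sketched with SciPy. This is a minimal sketch, not the authors' detector: the 4th-order Butterworth filter, the 3 SD envelope threshold, the 20 ms minimum duration and the name `detect_ripples` are illustrative assumptions, and the signal-to-noise filtering and spectral checks described in the text (Figure S8) are not reproduced.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert


def detect_ripples(lfp, fs, band=(80.0, 180.0), thresh_sd=3.0, min_dur_ms=20.0):
    """Return (start, stop) sample indices of candidate ripple events.

    Band-pass the LFP, take the Hilbert amplitude envelope, z-score it,
    and keep supra-threshold runs lasting at least min_dur_ms.
    """
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    envelope = np.abs(hilbert(sosfiltfilt(sos, lfp)))
    z = (envelope - envelope.mean()) / envelope.std()
    above = np.concatenate([[False], z > thresh_sd, [False]])
    edges = np.flatnonzero(np.diff(above.astype(int)))  # alternating rise/fall indices
    starts, stops = edges[::2], edges[1::2]
    min_len = int(round(min_dur_ms * fs / 1000))
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_len]


# Simulated example: 5 s of noise at 1 kHz with a 60 ms, 110 Hz burst at t = 2 s.
fs = 1000
rng = np.random.default_rng(1)
t = np.arange(0, 5, 1 / fs)
lfp = rng.standard_normal(t.size)
burst = (t >= 2.0) & (t < 2.06)
lfp[burst] += 5 * np.sin(2 * np.pi * 110 * t[burst])
events = detect_ripples(lfp, fs)
print(events)
```

Counting such events in time bins aligned to choice and feedback would give event-frequency curves of the kind shown in Figure 6A.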

Because ripples have previously been associated with task performance (e.g. memory retrieval)48,49, and considering their potential role in model-based planning17,18, we predicted that the frequency of these events would be related to behavioural performance in the form of choice accuracy. Consistent with this, ripple event frequency before choice predicted session accuracy (Figure 6BC): subjects performed better on sessions in which more ripples were observed across channels. This result was not explained by average session reaction time, stimulus set familiarity, or subject identity, as these were regressed out before performing this analysis.
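Regressing nuisance variables out before relating ripple frequency to accuracy can be sketched as a partial correlation. Here both variables are residualised against the covariates (Frisch–Waugh–Lovell style); whether the authors residualised accuracy, ripple frequency, or both, and which test they used, is not stated in this excerpt, so treat this as an assumption. All data below are simulated and the helper name is hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def residualize(y, covariates):
    """Remove the least-squares fit of covariates (plus an intercept) from y."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated sessions (hypothetical numbers): covariates are mean RT,
# stimulus-set familiarity (days of exposure) and a subject dummy code.
rng = np.random.default_rng(2)
n = 30
rt = rng.normal(0.5, 0.1, n)
familiarity = rng.integers(1, 6, n).astype(float)
subject = (np.arange(n) % 2).astype(float)
covs = np.column_stack([rt, familiarity, subject])

ripple_rate = rng.normal(0.0, 1.0, n)  # pre-choice ripple frequency (z-scored)
accuracy = 0.7 + 0.1 * ripple_rate + 0.05 * rt + 0.02 * subject + 0.01 * rng.standard_normal(n)

r, p = pearsonr(residualize(ripple_rate, covs), residualize(accuracy, covs))
print(round(r, 3), p)
```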

Recent work has shown that theta stimulation during the fixation period disrupts learning in an RL task, which is thought to be due to impaired communication between PFC (OFC) and hippocampus44. If ripples play a comparable communication-related role in our task, we should find increased ripple frequency in the fixation period after subjects make correct choices or receive reward. Because the probability of detecting these events on channels was lower during fixation than during choice, we performed a median split on channels, keeping those on which more ripples were detected, to improve our signal-to-noise ratio. We then tested our prediction and found a higher proportion of ripple events on trials after subjects were rewarded for their choice (Figure 6D), similar to work showing ripple event modulation by previous reward54.
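The channel selection and the comparison in Figure 6D can be sketched as follows. This is a hypothetical sketch: it assumes the relevant fixation period is the one that starts the trial following each outcome, keeps channels strictly above the median ripple count, and omits the statistical test, which is not specified in this excerpt. The simulated data and the function name are invented for illustration.

```python
import numpy as np


def ripples_by_previous_reward(ripple_present, rewarded):
    """Per-channel proportion of fixation periods containing a ripple,
    split by whether the preceding trial was rewarded.

    ripple_present: bool array (n_channels, n_trials); ripple detected in the
        fixation period of each trial.
    rewarded: bool array (n_trials,); outcome of each trial.
    Only channels above the median ripple count are kept (median split).
    """
    counts = ripple_present.sum(axis=1)
    kept = ripple_present[counts > np.median(counts)]
    fixation_next = kept[:, 1:]      # fixation on trial t+1 ...
    prev_rewarded = rewarded[:-1]    # ... follows the outcome of trial t
    after_reward = fixation_next[:, prev_rewarded].mean(axis=1)
    after_no_reward = fixation_next[:, ~prev_rewarded].mean(axis=1)
    return after_reward, after_no_reward


# Simulated data: 5 ripple-rich channels whose fixation ripples depend on the
# previous reward, and 5 ripple-poor channels.
rng = np.random.default_rng(3)
n_trials = 200
rewarded = rng.random(n_trials) < 0.5
ripple_present = np.zeros((10, n_trials), dtype=bool)
p_rich = np.where(rewarded[:-1], 0.3, 0.1)
ripple_present[:5, 1:] = rng.random((5, n_trials - 1)) < p_rich
ripple_present[5:, 1:] = rng.random((5, n_trials - 1)) < 0.02

after_reward, after_no_reward = ripples_by_previous_reward(ripple_present, rewarded)
print(after_reward.mean(), after_no_reward.mean())
```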

Therefore, in addition to demonstrating the presence of ripples in vmPFC, we show that these are modulated by task events during value-based choice, and that vmPFC ripples correlate with task performance and reward. This suggests vmPFC ripples may be involved in value-based learning and decision-making, and offers a further parallel to the well-established planning and spatial navigation processes of the medial temporal lobe.

Discussion

We investigated neural representations that facilitate flexible, value-guided choice in prefrontal cortex, taking inspiration from the well-characterised map-like representations of the medial temporal lobe and spatial cognition. We demonstrated, for the first time, a grid-like code of value space at choice in the LFP of two separate datasets, showed the same grid-like code in neurons, and demonstrated how such encoding in neurons may be theta phase-dependent. Furthermore, we found that grid orientations realign across stimulus sets. Finally, we found sharp wave ripples in the vmPFC of non-human primates and showed that they are modulated by task events in a value-guided choice task.

The grid-like code was found at choice and represented a navigation angle between both currently relevant choice options, embedded as locations in a value space. That this navigation angle revealed a grid-like code suggests the value space is traversed during choice, as in spatial navigation14,38. This is important because this navigation angle, together with the length of the vector between the options, contains all the information necessary to determine the correct choice (the decision variable) in such a value space. For example, any option in the upper right quadrant of a value map (Figure 2A) is more valuable than any option in the bottom left. Hence spatial navigation problems and optimal choice in value-guided decision-making tasks may be solved using common neural mechanisms, such as vector-based navigation.

Furthermore, we demonstrate that when subjects must choose between options that themselves have to be composed from distinct visual cues, the attributes indexed by those cues, such as probability and magnitude, are abstracted away from the cues' sensory properties to form a latent cognitive map of value. This abstraction is crucial because such a representation of the task structure could enable generalisation across sensory instantiations of this task, or of similar tasks16,21. Notably, this result also corroborates a …

$$\Delta_{12} = \frac{w}{2} - \left|\, \lvert \varphi_1 - \varphi_2 \rvert - \frac{w}{2} \right|$$

where w is a constant denoting the width of the grid field for a given symmetry level, which was set to 60 (degrees, for sixfold symmetry), φ1 is the grid orientation from one channel, and φ2 is the grid orientation from another channel, both taken within one period [0, w). This yielded a channel-by-channel distance matrix, of which only the upper triangle was used for analyses because the matrix is symmetric. From this matrix, we computed the average distance for each session for vmPFC channels as a function of whether the distance was obtained from other sessions with the same stimulus set or with another stimulus set. Crucially, we did not use within-stimulus-set distances that came from the same session, as signal correlations across channels within a session could artificially shrink within-stimulus-set distances (i.e. increase our false positive rate). In the final step, we created two averages, one from the same stimulus set and another from different stimulus sets. The difference between these averages was compared to a null distribution obtained by shuffling the labels from the previous step (same stimulus set, different stimulus set) 1000 times, and we used a Fisher Z transformation to assess statistical significance. The subsequent analysis, in which we estimated grid consistency, followed a procedure identical to the one described in Hexadirectional Analysis. As we observed a weaker grid-like code on the first day a stimulus set was presented, we excluded channels from such days in this analysis to avoid false negatives that could have arisen from poorly estimated grid orientations.
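The orientation distance and the label-shuffling null can be sketched as follows. This is a simplified sketch under stated assumptions: it operates on a flat list of pairwise channel distances rather than on the per-session averages described in the text, assumes same-session pairs have already been excluded, and summarises the observed statistic as a z-score against the shuffled null plus a permutation p-value, which may differ from the exact Fisher Z step the authors used. Function names and data are hypothetical.

```python
import numpy as np


def orientation_distance(phi1, phi2, w=60.0):
    """Circular distance between grid orientations (degrees) with period w:
    Delta = w/2 - | |phi1 - phi2| - w/2 |, after wrapping both into [0, w)."""
    d = np.abs(np.mod(phi1, w) - np.mod(phi2, w))
    return w / 2 - np.abs(d - w / 2)


def permutation_test(distances, same_set, n_perm=1000, seed=0):
    """Compare mean between-set minus mean within-set distance against a
    null distribution built by shuffling the same/different labels."""
    rng = np.random.default_rng(seed)

    def stat(labels):
        return distances[~labels].mean() - distances[labels].mean()

    observed = stat(same_set)
    null = np.array([stat(rng.permutation(same_set)) for _ in range(n_perm)])
    z = (observed - null.mean()) / null.std()
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, z, p


# Simulated pairwise distances: within-set pairs small, between-set pairs larger.
rng = np.random.default_rng(6)
distances = np.concatenate([rng.uniform(0, 10, 40), rng.uniform(10, 30, 60)])
same_set = np.concatenate([np.ones(40, dtype=bool), np.zeros(60, dtype=bool)])
observed, z, p = permutation_test(distances, same_set)
print(observed, z, p)
```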

Neuronal phase-aligning, grid-code, and chosen value analyses

To estimate phase-dependent firing in neurons, we first estimated theta phase on every trial in the same frequency range in which we found the grid-like signal (5–8 Hz). On every trial, we then binned theta phase into 10 non-overlapping phase bins, within which the raw neuronal firing rates were binned. For each phase bin, we slid a larger 300-millisecond window over time and averaged the neuronal firing rates within this window for each neuron and trial at each window increment. This allowed us to estimate neuronal firing rates in a theta phase-specific manner, to estimate how theta phase-dependency may change over time, and, crucially, to capture more than one theta cycle within the larger 300-millisecond time window, which increased the robustness of the phase-dependent firing estimates. Figure 4A shows the phase-dependent average across vmPFC neurons collapsed across the choice epoch, while Figure 4B shows the phase-dependent average of an example neuron without collapsing across the choice epoch. To determine whether neurons exhibited phase-dependent firing, we first mean-subtracted each neuron's activity at individual phase bins using the average across all phase bins and examined the effect of phase in a one-way ANOVA. In the supplementary materials, we performed further post-hoc paired t-tests across neurons, in which we compared neuronal firing at each phase bin. These post-hoc tests were used to determine the 'Preferred' and 'Non-Preferred' phases used to illustrate phase-dependent firing in Figure 4C: the 'Preferred' phase corresponded to the phase bin with the largest relative firing across neurons compared to other phase bins, while the 'Non-Preferred' phase corresponded to the phase bin with the lowest relative firing across neurons.
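The phase-binning and ANOVA steps can be sketched as follows. This is a minimal sketch under assumptions: the 3rd-order Butterworth filter and the Hilbert-transform phase estimate are our choices, the data are simulated, and the sliding 300-millisecond window averaging described above is omitted for brevity. `f_oneway` treats each phase bin as a group of neurons, matching the one-way ANOVA described in the text; a repeated-measures ANOVA would be an alternative that respects the within-neuron design.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import f_oneway


def theta_phase(lfp, fs, band=(5.0, 8.0)):
    """Instantaneous theta phase in radians, in [-pi, pi]; a trough sits at +/-pi."""
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, lfp)))


def phase_binned_rate(phase, spikes, n_bins=10):
    """Mean spike count per sample in each of n_bins non-overlapping phase bins."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    return np.array([spikes[idx == k].mean() for k in range(n_bins)])


# Simulated example: 20 neurons firing most near the theta trough.
fs = 1000
rng = np.random.default_rng(4)
t = np.arange(0, 60, 1 / fs)
lfp = np.sin(2 * np.pi * 6.5 * t) + 0.3 * rng.standard_normal(t.size)
phase = theta_phase(lfp, fs)
rate = 0.01 * (1 - 0.8 * np.cos(phase))  # spikes per sample, highest at the trough
rates = np.array([phase_binned_rate(phase, rng.poisson(rate)) for _ in range(20)])

centred = rates - rates.mean(axis=1, keepdims=True)  # mean-subtract each neuron
F, p = f_oneway(*centred.T)                           # phase bin as the factor
preferred_bin = int(np.argmax(centred.mean(axis=0)))
print(F, p, preferred_bin)
```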

We compared the relationship between neurons and oscillatory data in three different ways. First, we band-pass filtered LFP activity across all vmPFC channels within the theta range in which we found the grid-like encoding of value (5–8 Hz). We then averaged each channel across trials and further averaged the data across all channels. This yielded the average theta oscillatory activity across all channels shown in Figure 4D. We used this average to define theta-based seed windows (labelled 1 to 4) within the original 300-millisecond window before subjects initiated their choice (Figure 2B), in which we report the grid-like code. Each seed window was a 60-msec bin centred on a maximum (peak: 1, 3) or minimum (trough: 2, 4) of the average theta oscillatory activity. This bin length minimized temporal overlap: an adjacent peak and trough of an 8 Hz oscillation (1 cycle = 125 msec) are 62.5 msec apart, so two 60-msec windows centred on them do not overlap, and 8 Hz was the highest frequency tested and reported in the main text. These seed windows were used to compute the grid orientations of vmPFC channels, which were then used to compute the strength of grid-like encoding in vmPFC neurons. The grid orientations therefore differed only in the part of the average vmPFC theta cycle in which they were estimated. For each seed window, all control symmetries were computed.
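Locating the seed windows can be sketched as below. This is a hedged illustration that assumes a 1 kHz average theta trace and detects extrema from sign changes of the first difference. The function name `seed_windows` is hypothetical.

```python
import numpy as np


def seed_windows(avg_theta, times_ms, width_ms=60):
    """Windows centred on local extrema of the trial- and channel-averaged theta.

    Returns a list of (label, kind, start_ms, end_ms); labels are assigned in
    temporal order, so an alternating sequence gives peak 1, trough 2, ...
    """
    d = np.diff(avg_theta)
    # An extremum sits where the first difference changes sign.
    peaks = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    troughs = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    extrema = sorted([(i, "peak") for i in peaks] + [(i, "trough") for i in troughs])
    half = width_ms / 2
    return [(n + 1, kind, times_ms[i] - half, times_ms[i] + half)
            for n, (i, kind) in enumerate(extrema)]
```

For example, a 5 Hz cosine sampled from -300 to -1 ms has an interior peak at -200 ms and a trough at -100 ms, giving windows of [-230, -170] and [-130, -70] ms.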

Next, instead of relying on the average theta oscillatory cycle, we phase-aligned neuronal firing rates in time on a trial-by-trial basis within the original time window. For each channel, we computed the theta phase on every trial and used the temporal differences in phase peaks across trials to align the firing rates of the vmPFC neurons recorded on that channel. After alignment, each neuron's firing rate was smoothed with a 25-msec boxcar filter to minimize the possibility of phase-specific signals being contaminated across cycles. We repeated this for all channels and their corresponding neurons. The strength of this approach is that it allows us to examine neuronal firing rates fully aligned to theta oscillatory cycles. Its limitation is that it jitters the precise timing of each time point with respect to task epochs (e.g. choice onset). We therefore computed a histogram of the times used to align these theta cycles (i.e. where peaks occurred) to localize the aligned time points with respect to task epochs (Figure S7G). This showed a correspondence between this approach and the 'seed window' approach above. As in that approach, we used the grid orientations of vmPFC channels to compute the strength of grid-like encoding in vmPFC neurons. This allowed us to hold grid orientations constant and examine the emergence of a grid-like code in neuronal firing rates. We determined whether a grid-like code exists in neuronal firing rates by estimating the hexadirectional beta for each neuron and performing a one-sample t-test against zero across all neurons. If there is a grid-like code in neurons, this test should yield a significant positive t-value.
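The hexadirectional test can be sketched as follows. This assumes the common two-step approach associated with ref 6. First, a grid orientation φ is estimated on one signal (here, LFP) from sin(6θ) and cos(6θ) regressors. Then, for each neuron, firing rate is regressed on cos(6(θ − φ)) and the resulting betas are t-tested against zero. How θ is defined is not given in this section, so the angle of the vector between the two options in (magnitude, probability) space is a hypothetical choice, as are all function names.

```python
import numpy as np


def trajectory_angles(opt_a, opt_b):
    """Hypothetical trial angle: direction of the vector from option A to
    option B in a 2-D (magnitude, probability) space; inputs (n_trials, 2)."""
    d = np.asarray(opt_b, float) - np.asarray(opt_a, float)
    return np.arctan2(d[:, 1], d[:, 0])


def grid_orientation(signal, theta, fold=6):
    """Grid orientation phi from a training signal (e.g. LFP theta power).

    b_sin*sin(6θ) + b_cos*cos(6θ) = A*cos(6(θ - phi)), so 6*phi = atan2(b_sin, b_cos).
    """
    X = np.column_stack([np.ones_like(theta), np.sin(fold * theta), np.cos(fold * theta)])
    b = np.linalg.lstsq(X, signal, rcond=None)[0]
    return np.arctan2(b[1], b[2]) / fold


def hexadirectional_beta(rate, theta, phi, fold=6):
    """Beta of cos(6(θ - phi)) in a regression (with intercept) of one neuron's rate."""
    X = np.column_stack([np.ones_like(theta), np.cos(fold * (theta - phi))])
    return np.linalg.lstsq(X, rate, rcond=None)[0][1]


def one_sample_t(betas):
    """One-sample t-statistic of betas against zero (across neurons)."""
    betas = np.asarray(betas, float)
    return betas.mean() / (betas.std(ddof=1) / np.sqrt(betas.size))
```

Estimating φ on LFP channels and testing betas on neurons keeps orientation estimation and testing on separate signals, as the text describes.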
To determine whether such significant positive t-values could be obtained by chance, we used cluster-based permutation testing. For each neuron, we randomly shuffled the firing rates across trials 1000 times before computing the betas at each time point, using identical shuffles across time points to preserve temporal smoothness. We then performed t-tests across neurons for each shuffle and compared the empirically obtained t-values against the null distribution (97.5th percentile of a length-corrected null distribution).
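The shuffling scheme can be sketched as below. Each neuron receives one trial permutation per shuffle, applied identically at every time point. The text does not define "length-corrected", so this sketch assumes it means correcting for the number of time points tested. It implements that as a max-statistic null: the maximum t over time in each shuffle. A cluster-mass or cluster-length null would be an equally plausible reading. Function names are hypothetical.

```python
import numpy as np


def slope_over_time(rates, x):
    """OLS slope (with intercept) of rates (n_trials, n_times) on x (n_trials), per time point."""
    xc = x - x.mean()
    return xc @ (rates - rates.mean(axis=0)) / (xc @ xc)


def t_over_time(all_rates, regressors):
    """t-statistic across neurons of per-neuron slopes, at each time point."""
    betas = np.stack([slope_over_time(r, x) for r, x in zip(all_rates, regressors)])
    return betas.mean(0) / (betas.std(0, ddof=1) / np.sqrt(betas.shape[0]))


def permutation_threshold(all_rates, regressors, n_perm=1000, q=97.5, seed=0):
    """q-th percentile of the max-over-time t under trial shuffling (assumed null)."""
    rng = np.random.default_rng(seed)
    null_max = np.empty(n_perm)
    for p in range(n_perm):
        # One permutation of trial order per neuron, shared by all time points.
        shuffled = [r[rng.permutation(r.shape[0])] for r in all_rates]
        null_max[p] = t_over_time(shuffled, regressors).max()
    return np.percentile(null_max, q)
```

Here `regressors` would hold each neuron's cos(6(θ − φ)) values, with φ taken from its channel.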

To examine the firing properties of neurons with a strong grid-like code, we averaged the firing of two neurons within the window showing a significant grid-like code in Figure 4G. The averaged firing rates were z-scored across all trials. They were then further averaged as a function of each state presentation (i.e. each presentation of a reward magnitude and reward probability combination), separately for the left and right options. The resulting rate map for each neuron and each option was then convolved with a 3x3 Gaussian kernel (sigma = 0.5).
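A minimal sketch of the rate-map construction follows. The number of magnitude and probability levels is an assumption passed in by the caller. The kernel is normalised to sum to one, and convolution uses zero padding with 'same' output size, which mirrors MATLAB's `conv2(..., 'same')`. Both choices are assumptions, and all names are hypothetical.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=0.5):
    """Normalised 2-D Gaussian kernel (size x size)."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()


def conv2_same(img, k):
    """2-D convolution, zero-padded, output the same size as img."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    kf = k[::-1, ::-1]  # flip for true convolution (a no-op for a symmetric kernel)
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * kf).sum()
    return out


def rate_map(rates, mag_idx, prob_idx, n_mag, n_prob):
    """z-score per-trial rates, average per (magnitude, probability) state, smooth.

    Unvisited states are NaN and will propagate NaN through the smoothing.
    """
    z = (rates - rates.mean()) / rates.std()
    total = np.zeros((n_mag, n_prob))
    count = np.zeros((n_mag, n_prob))
    np.add.at(total, (mag_idx, prob_idx), z)
    np.add.at(count, (mag_idx, prob_idx), 1)
    mean_map = np.divide(total, count, out=np.full_like(total, np.nan), where=count > 0)
    return conv2_same(mean_map, gaussian_kernel())
```

Calling `rate_map` once with the left-option indices and once with the right-option indices gives the two per-option maps.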

The phase-aligned neuronal firing rates were also used to examine the emergence of a chosen value signal using a CPD approach like the one described in Neuronal CPD analyses. To determine the significance of the chosen value CPD, we used the same length-corrected null distribution permutation testing approach as described in this section.
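The coefficient of partial determination (CPD) for a regressor i is commonly defined as CPD_i = (SSE_reduced − SSE_full) / SSE_reduced. Here SSE_reduced is the residual sum of squares of the model without regressor i. The sketch below assumes that definition. The design matrix (intercept, chosen value, other regressors) is hypothetical, because the Neuronal CPD analyses section is not reproduced here.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares of an OLS fit; y may be (n_trials,) or (n_trials, n_times)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return (resid ** 2).sum(axis=0)


def cpd(X, y, col):
    """CPD of column `col` of X (X must include the intercept column)."""
    full = sse(X, y)
    reduced = sse(np.delete(X, col, axis=1), y)
    return (reduced - full) / reduced
```

Passing a (n_trials, n_times) matrix of phase-aligned rates as `y` returns one CPD per time point, which can then enter the same permutation scheme.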

Sharp wave ripple detection

To detect sharp wave ripples, we pre-processed the raw LFP signal by removing line noise and band-pass filtering it between 1 and 250 Hz. This signal was further filtered to the ripple band (80–180 Hz), as in previous non-human primate work16. To detect candidate ripple events, we used ripple detectors similar to those of previous work16-18. In brief, the ripple band-filtered LFP signal was smoothed with a 4SD Gaussian. The envelope of the smoothed trace was then computed using the Hilbert transform, and this envelope was z-scored. On the z-scored envelope, we looked for events with a total length of at least 25 msec, of which at least 15 msec needed to exceed 3SD throughout. If the peaks of several events were separated by less than 40 msec, these were combined into one ripple event. Because of recent concerns about the signal-to-noise ratio of ripple estimation19, these candidate events were further processed with a non-negative matrix factorisation. The aim was to improve our signal-to-noise ratio and to select events with clearer spectral properties16. We subjected the wideband spectral signal of each ripple event to a non-negative matrix factorisation with three clusters and 10 replicates. We then computed, for each of the three clusters, its low-rank approximation of the original spectral signal. For each cluster, we computed the estimated signal-to-noise ratio at each frequency by dividing the spectral signal by the summed squared error between that cluster's projection and the original spectral signal. Events below a minimum signal-to-noise threshold were discarded; we used a threshold of 5, compared to the threshold of 3 used in ref16.
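The threshold-based first stage of the detector can be sketched as follows. This is a hedged illustration under several assumptions. The sampling rate is 1 kHz. The Gaussian smoothing step is omitted. An event's "total length" is taken as a contiguous run above a hypothetical lower bound of 1 SD. The 15 msec above 3SD must be contiguous. Merged events keep the larger of the two peaks. Function names are hypothetical. `analytic_envelope` builds the analytic signal the same way `scipy.signal.hilbert` does.

```python
import numpy as np


def analytic_envelope(x):
    """Amplitude envelope |x + iH(x)| via the FFT construction of the analytic signal."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def longest_run(mask):
    """Length of the longest run of True values."""
    best = cur = 0
    for m in mask:
        cur = cur + 1 if m else 0
        best = max(best, cur)
    return best


def detect_ripples(z, fs=1000, low=1.0, high=3.0, min_len_ms=25, min_high_ms=15, merge_ms=40):
    """Candidate events on a z-scored envelope as (start, stop_exclusive, peak) indices."""
    above = np.concatenate([[False], z > low, [False]]).astype(int)
    edges = np.flatnonzero(np.diff(above))
    events = []
    for s, e in zip(edges[0::2], edges[1::2]):
        seg = z[s:e]
        if e - s >= min_len_ms * fs / 1000 and longest_run(seg > high) >= min_high_ms * fs / 1000:
            events.append([s, e, s + int(np.argmax(seg))])
    merged = []
    for ev in events:
        if merged and ev[2] - merged[-1][2] < merge_ms * fs / 1000:
            prev = merged[-1]  # peaks closer than 40 ms: combine into one event
            peak = prev[2] if z[prev[2]] >= z[ev[2]] else ev[2]
            merged[-1] = [prev[0], ev[1], peak]
        else:
            merged.append(ev)
    return [tuple(ev) for ev in merged]
```

Usage: `detect_ripples((env - env.mean()) / env.std())`, where `env = analytic_envelope(ripple_band_lfp)`.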
Events passing the signal-to-noise criterion then had to pass a final criterion: an event was selected as a ripple only when the projection of the cluster with the highest signal-to-noise ratio had a clear unimodal peak within the ripple band, as defined in ref16. Crucially, this still allowed ripple events in which the projection of a high signal-to-noise cluster could also be observed at other frequencies (e.g. due to theta activity).
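The second-order screening can be sketched as below. NMF uses Lee–Seung multiplicative updates with three components and 10 random restarts, in place of whatever solver was actually used. The SNR follows one literal reading of the text: the time-summed spectral signal divided by the time-summed squared error of each cluster's rank-1 projection. It is therefore scale-dependent, so the threshold of 5 would only transfer under the original normalisation. Taking the principal cluster's spectral profile as W[:, j] is also an assumption. All names are hypothetical.

```python
import numpy as np


def nmf(V, k=3, n_rep=10, n_iter=500, seed=0):
    """Multiplicative-update NMF (Frobenius loss); best of n_rep random restarts."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_rep):
        W = rng.random((V.shape[0], k)) + 1e-3
        H = rng.random((k, V.shape[1])) + 1e-3
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
            W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
        err = np.linalg.norm(V - W @ H)
        if best is None or err < best[0]:
            best = (err, W, H)
    return best[1], best[2]


def cluster_snr(V, W, H):
    """Assumed SNR per cluster and frequency: signal / squared error of its projection."""
    snr = []
    for j in range(W.shape[1]):
        proj = np.outer(W[:, j], H[j])  # rank-1 approximation from cluster j
        snr.append(V.sum(axis=1) / (((V - proj) ** 2).sum(axis=1) + 1e-12))
    return np.array(snr)  # (k, n_freqs)


def is_unimodal_in_band(profile, freqs, band=(80, 180)):
    """True if the profile has exactly one interior local maximum, lying inside `band`."""
    p = np.asarray(profile)
    peaks = np.flatnonzero((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])) + 1
    return peaks.size == 1 and band[0] <= freqs[peaks[0]] <= band[1]


def accept_event(V, freqs, snr_threshold=5.0):
    """V: (n_freqs, n_times) non-negative power spectrogram of one candidate event."""
    W, H = nmf(V)
    snr = cluster_snr(V, W, H)
    j = int(np.argmax(snr.max(axis=1)))  # principal cluster: highest peak SNR
    return bool(snr[j].max() >= snr_threshold and is_unimodal_in_band(W[:, j], freqs))
```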

To determine whether the observed ripples were associated with increased oscillatory power in the ripple range, we computed scalograms from band-pass filtered, line-noise-removed LFP data using Morlet wavelets between 1 and 200 Hz. These were always centred on the ripple peak identified by the ripple detector. We computed power by squaring the magnitude of the wavelet coefficients and then applied decibel baseline normalisation with respect to the first 300 msec of the fixation period. Similarly, we computed the peak-centred firing rate of all vmPFC neurons across all channels by concatenating all trials in which ripples occurred for each neuron. We then computed averages on a per-neuron basis and smoothed the raw data with a 50-msec boxcar filter. Finally, we baseline-subtracted the firing rates using activity in the first 300 msec of the same trials. The ripple band oscillation and sharp wave components were averaged in the same manner. The ripple band oscillation was band-pass filtered to the ripple band described above, while the sharp wave component was low-pass filtered as in ref16.
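The wavelet power and decibel baseline steps can be sketched as follows. The number of wavelet cycles (7), the ±3.5 SD wavelet support, unit-energy normalisation and the 1 kHz sampling rate are assumptions not stated in the text. The dB step computes 10·log10(P / mean baseline P) separately for each frequency. Function names are hypothetical.

```python
import numpy as np


def morlet_power(x, freqs, fs=1000, n_cycles=7):
    """Power |W(f, t)|^2 from convolution with unit-energy complex Morlet wavelets."""
    out = np.empty((len(freqs), x.size))
    for i, f in enumerate(freqs):
        sd_t = n_cycles / (2 * np.pi * f)  # temporal SD of the Gaussian envelope (s)
        t = np.arange(-3.5 * sd_t, 3.5 * sd_t, 1 / fs)
        w = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * sd_t ** 2))
        w /= np.sqrt((np.abs(w) ** 2).sum())
        full = np.convolve(x, w, mode="full")
        start = (w.size - 1) // 2  # centre the output on x even if w is longer than x
        out[i] = np.abs(full[start:start + x.size]) ** 2
    return out


def db_baseline(power, baseline_idx):
    """Decibel change relative to mean baseline power, per frequency."""
    base = power[:, baseline_idx].mean(axis=1, keepdims=True)
    return 10 * np.log10(power / base)
```

With 1 kHz data, `baseline_idx = np.arange(300)` would select the first 300 msec of the fixation period.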

Ripple analyses

To determine the relationship between ripple frequency and choice accuracy, we computed subjects' choice accuracy on a session-by-session basis. We then examined whether it was related to the probability of detecting ripples on LFP channels across sessions. To do this, we used a GLM that predicted session-level choice accuracy from channel-level ripple frequency. Crucially, this GLM contained co-regressors for subject identity, session-level reaction time, and the number of days a stimulus set had been presented. These co-regressors were included to rule out confounding explanations of choice accuracy across sessions. To determine the significance of the relationship between ripple probability and choice accuracy, we asked whether the t-values obtained from the GLM exceeded the 97.5th percentile of a length-corrected null distribution.
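The GLM can be sketched as an ordinary least-squares fit with classical t-values. In this sketch, each row is assumed to be one channel, carrying its ripple frequency and its session's accuracy, reaction time and stimulus-set day. Subject identity is dummy-coded with the first subject as reference. The row layout and function names are hypothetical.

```python
import numpy as np


def design(ripple_freq, subject, rt, days):
    """Intercept, ripple frequency, subject dummies, reaction time, stimulus-set day."""
    levels = np.unique(subject)
    dummies = [(subject == s).astype(float) for s in levels[1:]]  # first subject = reference
    return np.column_stack([np.ones(len(ripple_freq)), ripple_freq, *dummies, rt, days])


def glm_t(y, X):
    """OLS coefficients and their t-values (X includes the intercept column)."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, b / se
```

The t-value in column 1 (ripple frequency) is the statistic that would be compared against the permutation null.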

To determine the relationship between reward and ripple occurrence, we split trials according to whether they were rewarded or unrewarded on trial t. We then averaged the ripple rasters over the subsequent fixation/rest period, before the next trial started. To determine significance, we used cluster-based permutation testing: we shuffled trials and asked whether ripple differences exceeded the 97.5th percentile of the length-corrected null distribution.
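The reward split and the trial-shuffling null can be sketched as follows. Rasters are assumed to be binary (n_trials, n_times) arrays over the post-trial fixation/rest period. As with the neuronal test, "length-corrected" is read here as a max-over-time null, which is an assumption. Function names are hypothetical.

```python
import numpy as np


def reward_ripple_difference(rasters, rewarded):
    """Ripple rate on rewarded minus unrewarded trials, per time point."""
    return rasters[rewarded].mean(axis=0) - rasters[~rewarded].mean(axis=0)


def label_shuffle_threshold(rasters, rewarded, n_perm=1000, q=97.5, seed=0):
    """q-th percentile of the max-over-time difference after shuffling trial labels."""
    rng = np.random.default_rng(seed)
    null = [reward_ripple_difference(rasters, rng.permutation(rewarded)).max()
            for _ in range(n_perm)]
    return np.percentile(null, q)
```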

Supplementary Figures

Figure S1. Neuronal recording locations. Taken from ref2. Images show each monkey's MRI at various coronal planes. Numbers signify the distance in millimetres from anterior-posterior 0. Each dot represents one electrode from one recording location.

Figure S3. Grid-like code across regions, frequencies, and stimulus set repetitions. a) Hexadirectional (sixfold) modulation across the four recorded regions. The height of each bar represents the average across sessions for each brain region. Note that the vmPFC bar is identical to the one in Figure 2B. Error bars represent SEM across sessions for each brain region. * p < .05. b) Hexadirectional (sixfold) modulation in vmPFC for sixfold (green) and control (4-8) symmetries (gray). Blue shading denotes the 300-millisecond window before subjects initiated their choice. The thick gray and green lines denote session averages of betas obtained from the regression used to determine the existence of the grid-like code. Shaded areas denote SEM across sessions. c) Grid-like code in vmPFC across frequency bands defined based on ref9. We find no significant grid-like code in control frequencies, although other work has reported a grid-like code in gamma20. Note that the t-statistic for theta is identical to the one reported for Figure 2B. d) The effect reported in Figure 2B for other regions. Bar height and SEM conventions are identical to Figure 2B. e) The hexadirectional (sixfold) modulation effect reported in Figure 2B, split by the number of days of stimulus set exposure (a stimulus set was swapped every few days). There was a trend for the grid-like code to become stronger on subsequent days of a stimulus set presentation compared with the first day it was presented (t(14) = 2.12, p = 0.052). f) The hexadirectional (sixfold) modulation effect reported in Figure 2B shown for Day 1 and the average of Days 2–4. ** pBonferroni < .01. g) Sixfold periodicity reported in Figure 2D shown for Day 1 and the average of Days 2–4.
h) Decrease in grid orientation variance in vmPFC compared with other brain regions. During the period in which we observed grid-like encoding of value, the average standard deviation of grid orientation across the three folds was significantly lower in vmPFC than in other regions (ACC: t(55) = 2.59, p = 0.01; dlPFC: t(46) = 2.42, p = 0.02; OFC: t(48) = 2.35, p = 0.02).

Distorting the cognitive map for value

The analyses presented in Figure 2 make a key assumption: a fully veridical representation of the cognitive map for value. That is, the hexadirectional analysis used to test for grid-like encoding assumes that the representational geometry for neighbouring magnitude or probability levels corresponds to how the experimenter defines those quantities. However, human and non-human primates exhibit choice biases12,21,22, which have been especially well studied in the human decision-making literature through prospect theory21.

If the cognitive map contains representational distortions, rather than being represented veridically, these distortions may be related to distortions in behaviour. We tested this idea by first fitting different prospect theory models on a session-by-session basis, as in ref11, using equations from ref10. The prospect theory model with the lowest BIC indicated that, across recording sessions, low magnitude and probability values were overweighted and high magnitude and probability values were underweighted, relative to an undistorted representation (Figure S4A).
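The model-fitting ingredients can be sketched as follows. The sketch uses the Prelec weighting function w(x) = exp(−β(−ln x)^α) from ref10 and BIC = k·ln(n) − 2·ln L (ref13). With α < 1 it produces the inverse-S shape described above, overweighting low values and underweighting high ones. Several choices are assumptions, not the fitted models from ref11: Prelec weighting is also applied to range-normalised magnitude, subjective value is the product of the two weighted attributes, and choice follows a logistic rule. Function names are hypothetical.

```python
import numpy as np


def prelec(x, alpha, beta=1.0):
    """Prelec weighting exp(-beta * (-ln x)^alpha) for x in (0, 1]."""
    x = np.clip(np.asarray(x, float), 1e-12, 1.0)
    return np.exp(-beta * (-np.log(x)) ** alpha)


def choice_loglik(params, mag, prob, chose_left):
    """Log-likelihood of left/right choices; mag and prob are (n_trials, 2) in (0, 1]."""
    a_mag, a_prob, temp = params
    v = prelec(mag, a_mag) * prelec(prob, a_prob)  # assumed subjective value
    p_left = 1.0 / (1.0 + np.exp(-temp * (v[:, 0] - v[:, 1])))
    p = np.where(chose_left, p_left, 1.0 - p_left)
    return np.log(np.clip(p, 1e-12, 1.0)).sum()


def bic(loglik, n_params, n_trials):
    """Bayesian information criterion; lower is better."""
    return n_params * np.log(n_trials) - 2.0 * loglik
```

Parameters could be fitted per session by minimising `-choice_loglik` with a general-purpose optimiser. Models differing in which attributes are distorted would then be compared by BIC.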

Next, we asked whether behavioural distortions are related to neural distortions (i.e. a distorted representation of the cognitive map). Previous work reported grid-like representations in ACC, OFC, and vmPFC7,23, and the vmPFC effect was significantly stronger only compared with dlPFC, not the other two regions (Figure S3A). We therefore initially collapsed our analysis across these three regions on a per-session basis. We then asked whether distorting the magnitude and probability values using the best-fitting behavioural parameters would strengthen the grid-like code in these regions. Instead, distorting both magnitude and probability according to behaviour (Figure S4B) reduced the grid-like code, an effect driven specifically by vmPFC (Figure S4C, t(14) = 2.30, p < .05). Importantly, this effect did not replicate in the second independent dataset (Figure S5). It should therefore be treated as preliminary and warrants further investigation. Several factors may have contributed to the lack of replication, e.g. lower statistical power in Dataset 2 (fewer total sessions), differences in task design, and/or differences in training regimes.
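Distorting the map amounts to warping each option's coordinates before computing the angles that enter the hexadirectional regressor. The sketch below assumes that angles are directions between the two options in (magnitude, probability) space and that the warp is the fitted Prelec function on each axis. Both are assumptions, and the function names are hypothetical. Because the two axes are warped separately, a horizontal trajectory stays horizontal, while oblique trajectories change direction.

```python
import numpy as np


def prelec(x, alpha):
    """One-parameter Prelec weighting exp(-(-ln x)^alpha) for x in (0, 1]."""
    return np.exp(-(-np.log(np.clip(x, 1e-12, 1.0))) ** alpha)


def angles(opt_a, opt_b):
    """Direction (rad) of the vector from option A to option B; inputs (n_trials, 2)."""
    d = opt_b - opt_a
    return np.arctan2(d[:, 1], d[:, 0])


def distorted_angles(opt_a, opt_b, alpha_mag, alpha_prob):
    """Angles after warping magnitude and probability with behavioural parameters."""
    def warp(o):
        return np.column_stack([prelec(o[:, 0], alpha_mag), prelec(o[:, 1], alpha_prob)])
    return angles(warp(opt_a), warp(opt_b))
```

The hexadirectional regression is then rerun with these angles in place of the veridical ones, and the two betas are compared.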

If the result is replicated in future work, it would suggest that vmPFC contains a veridical representation of the cognitive map, while other regions downstream in the choice process may distort this representation. The exact relationship between neural and behavioural distortions of this kind is currently unclear. Nevertheless, this approach provides a way of linking choice biases, defined relative to rational choice theory, to distortions in the representational geometry of the structure of value-based decision-making tasks.

Figure S4. Distorting the cognitive map. a) Difference in expected value (EV) between the theoretical and empirical EV spaces. The theoretical EV space has no distortions in either magnitude or probability, while the empirical EV space has distortions in both. The coloured plane shows the difference between theoretical and empirical EV (theoretical – empirical) across sessions. High values (red) indicate behavioural underweighting of option combinations with high EV. Low values (black) indicate behavioural overweighting of option combinations with low EV. For visual purposes, the flat striped plane indicates no difference between theoretical and empirical EV. b) Comparison of the average hexadirectional effect across vmPFC, ACC, and OFC for the veridical (blue) cognitive map (analogous to averaging Figure 2C across the three regions) and the distorted (red) cognitive map. The thick blue line marks a significant difference between the veridical and distorted conditions, obtained from cluster-based permutation testing (exceeding the 97.5th percentile of a length-corrected null distribution). Error bars represent SEM across sessions, averaged over vmPFC, ACC, and OFC. c) Comparison of the veridical and distorted hexadirectional effects across the three regions within the time window of Figure 2B.

Error bars represent SEM across sessions within brain regions. * p < .05.

Figure S5. Independent dataset task. Two options were presented sequentially for 700 ms (Option 1 onset, Option 2 onset). The subject was allowed to freely fixate each of them. Once central fixation was re-acquired (after Option 2 onset), both options were presented simultaneously for choice (Choice onset). A choice was made by fixating the chosen option for 500 ms. The chosen option was rewarded according to the corresponding magnitude and probability (Feedback). Choice feedback was provided through a change in background colour.


Figure S7. Firing rate properties of vmPFC neurons and additional control tests for a grid-like code in neurons. a) A phase-by-phase t-value map showing the theta phases at which vmPFC neurons have significantly higher or lower firing rates. This map was obtained by performing t-tests on firing rates for every pairwise phase comparison. For example, the first column shows that neurons fired less at the theta peak (0) than at phases progressing towards the trough (π). b) Firing activity of an example neuron with a firing pattern opposite to that of the majority of cells, firing most at the theta peak instead of the trough. Preferred phase refers to the portion of the theta cycle at which cells, on average, exhibited the highest firing activity. Non-Preferred phase refers to the portion of the theta cycle at which cells, on average, exhibited the lowest firing activity.

Error bars represent SEM across trials for that neuron. c) Firing activity of an example neuron showing no discernible modulation by phase. d) A grid-like code can be found in vmPFC neurons when firing rates are averaged across the same phase of all theta cycles within the 300-millisecond window before subjects initiated their choice (Figure 2B). Error bars represent SEM across vmPFC neurons. Dotted black lines correspond to significance thresholds based on one-sample t-tests. e) Chosen value CPD before choice. The numbering on the x-axis follows the convention of Figure 4D. Significance was obtained through permutation testing (n = 1000). * p < .05. f) The CPD values in the significant seed window (4) from e) do not correlate with vmPFC neuron grid-like code betas. g) Temporal localization of the phase-aligned signal from Figure 4G. Negative values denote milliseconds before subjects made a choice. The histogram peak approximately corresponds to the time of the peak LFP signal. h) vmPFC neurons exhibit theta modulation over time. Firing rates are aligned to one theta cycle and plotted over time, keeping neuronal firing aligned to all phases of theta. Error bars are SEM across vmPFC neurons. i) The significant clusters in which chosen value was represented in Figure 4L do not correlate with the significant cluster in which the grid-like code was observed in Figure 4G.

Figure S8. Second-order filtering of candidate events. a) Example of a candidate ripple event that was initially detected but rejected by our second-order filtering procedure. Top left panel: signal-to-noise ratio of the three identified clusters as a function of frequency. Top middle panel: original power spectrum of the candidate ripple event. Top right panel: reconstruction of the original power spectrum from the obtained clusters. Bottom left panel: ripple band oscillatory activity (black) with superimposed candidate ripple events detected by the ripple detector (red). Bottom right panel: wideband oscillatory activity (black) with the superimposed candidate ripple event detected by the ripple detector (red). We computed the power spectrogram (top middle panel) for candidate events (bottom left panel: ripple band activity; bottom right panel: wideband signal). We then performed non-negative matrix factorisation on the power spectrograms with three clusters, as in ref51. For each cluster, we computed its signal-to-noise ratio across frequencies, marginalising over time. The ripple events used in the analyses presented in the manuscript were events in which the principal cluster (the cluster with the highest signal-to-noise ratio) exceeded our predefined threshold and had a clear unimodal peak. Such events could then reconstruct where the dominant spectral signal occurs (e.g. in the ripple frequency range). b) Example of an event accepted at the second level. Panel conventions are identical to a). c) Probability of detecting ripple events across vmPFC channels.

Figure S9. Example session with all recorded channels organised by brain region. All simultaneously recorded channels from one session at identical timepoints, highlighting cases in which events could be detected simultaneously on several channels within vmPFC but not outside vmPFC.

Figure S10. Firing rate on ripple and no-ripple trials and firing rate over time across regions. a) The mean firing rate of vmPFC neurons was higher on ripple trials than on non-ripple trials (t(159) = 3.36, p = 9.67e-04). This was assessed by comparing the average firing rate of vmPFC neurons within a 100-msec window centred on detected ripple peaks with the firing rate aligned to oscillatory troughs within the same time period on trials with no detected ripples. Error bars represent SEM across vmPFC neurons. b) Mean firing rate before and throughout choice across all regions. Error bars represent SEM across neurons within a brain region. The colour convention is identical to Figure 1D,E. c) All regions except vmPFC increased mean firing rate by more than 1 Hz before choice relative to fixation. Error bars represent SEM across neurons within a brain region.

1. Hunt, L. T. et al. Triple dissociation of attention and decision computations across prefrontal cortex. Nat Neurosci 21, 1471–1481 (2018).

2. Butler, J. L. et al. Covert valuation for information sampling and choice. bioRxiv.

3. Cavanagh, S. E., Malalasekera, W. M. N., Miranda, B., Hunt, L. T. & Kennerley, S. W. Visual fixation patterns during economic choice reflect covert valuation processes that emerge with learning. Proc Natl Acad Sci U S A 116, 22795–22801 (2019).

4. Muller, T. H. et al. Distributional reinforcement learning in prefrontal cortex. bioRxiv.

5. Oostenveld, R., Fries, P., Maris, E. & Schoffelen, J. FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data. 2011, (2011).

6. Doeller, C. F., Barry, C. & Burgess, N. Evidence for grid cells in a human memory network. Nature 463, 657–661 (2010).

7. Constantinescu, A. O., O'Reilly, J. X. & Behrens, T. E. J. Organising conceptual knowledge in humans with a gridlike code. Science 352, 1464–1468 (2016).

8. Chen, D. et al. Theta oscillations coordinate grid-like representations between ventromedial prefrontal and entorhinal cortex. Sci Adv 7, 1–13 (2021).

9. Maidenbaum, S., Miller, J., Stein, J. M. & Jacobs, J. Grid-like hexadirectional modulation of human entorhinal theta oscillations. PNAS 115, 10798–10803 (2018).

10. Prelec, D. The Probability Weighting Function. vol. 66 (1998).

11. Bongioanni, A. et al. Activation and disruption of a neural mechanism for novel choice in monkeys. Nature 591, 270–274 (2021).

12. Imaizumi, Y., Tymula, A., Tsubo, Y., Matsumoto, M. & Yamada, H. A neuronal prospect theory model in the brain reward circuitry. Nat Commun 13 (2022).

13. Schwarz, G. Estimating the Dimension of a Model. Ann Stat 6, 461–464 (1978).

14. Park, S. A., Miller, D. S. & Boorman, E. D. Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci 24, 1292–1301 (2021).

15. Fyhn, M., Hafting, T., Treves, A., Moser, M. B. & Moser, E. I. Hippocampal remapping and grid realignment in entorhinal cortex. Nature 446, 190–194 (2007).

16. Logothetis, N. K. et al. Hippocampal-cortical interaction during periods of subcortical silence. Nature 491, 547–553 (2012).

17. Vaz, A. P., Inati, S. K., Brunel, N. & Zaghloul, K. A. Coupled ripple oscillations between the medial temporal lobe and neocortex retrieve human memory. Science 363, 975–978 (2019).

18. Karlsson, M. P. & Frank, L. M. Awake replay of remote experiences in the hippocampus. Nat Neurosci 12, 913–918 (2009).

19. Liu, A. A. et al. A consensus statement on detection of hippocampal sharp wave ripples and differentiation from other fast oscillations. Nat Commun 13 (2022). https://doi.org/10.1038/s41467-022-33536-x

20. Staudigl, T. et al. Hexadirectional Modulation of High-Frequency Electrophysiological Activity in the Human Anterior Medial Temporal Lobe Maps Visual Space. Current Biology.

21. Kahneman, D. & Tversky, A. Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263–291 (1979).

22. Lakshminarayanan, V. R., Chen, M. K. & Santos, L. R. The evolution of decision-making under risk: Framing effects in monkey risk preferences. J Exp Soc Psychol 47, 689–693 (2011).

23. Jacobs, J. et al. Direct recordings of grid-like neuronal activity in human spatial navigation.