Extensive childhood experience with Pokémon suggests eccentricity drives organization of visual cortex

Jesse Gomez, Michael Barnett, and Kalanit Grill-Spector



The functional organization of human high-level visual cortex, such as the face- and place-selective regions, is strikingly consistent across individuals. An unanswered question in neuroscience concerns which dimensions of visual information constrain the development and topography of this shared brain organization. To answer this question, we used functional magnetic resonance imaging to scan a unique group of adults who, as children, had extensive visual experience with Pokémon. These animal-like, pixelated characters are dissimilar from other ecological categories, such as faces and places, along critical dimensions (foveal bias, rectilinearity, size, animacy). We show not only that adults who have Pokémon experience demonstrate distinct distributed cortical responses to Pokémon, but also that the experienced retinal eccentricity during childhood can predict the locus of Pokémon responses in adulthood. These data demonstrate that inherent functional representations in the visual cortex (retinal eccentricity), combined with consistent viewing behaviour of particular stimuli during childhood, result in a shared functional topography in adulthood.

Humans possess the remarkable ability to rapidly recognize a wide array of visual stimuli. This ability is thought to occur as a result of cortical computations in the ventral visual stream1 : a processing hierarchy that extends from the primary visual cortex to the ventral temporal cortex (VTC). Previous research has shown that VTC responses are key for visual recognition because (1) distributed VTC responses contain information about objects2–4 and categories5 , and (2) responses in category-selective regions in the VTC, such as the face-, body-, word- and place-selective regions6–12, are linked to the perception of these categories13–15. Distributed VTC response patterns to different visual categories are distinct from one another and are arranged with remarkable spatial consistency along the cortical sheet across individuals16–23. For example, peaks in distributed VTC response patterns to faces are consistently found on the lateral fusiform gyrus (FG). Although several theories have been suggested to explain the consistent spatial topography of the VTC24–26, developmental studies suggest that experience may be key for the normal development of the VTC and recognition abilities. For example, behavioural studies suggest that the typical development of recognition abilities is reliant on viewing experience during childhood27–32. However, the nature of childhood experience that leads to consistent spatial functional topography of the VTC—whether it is the way stimuli such as faces or places are viewed, or the image-level statistics of the stimuli themselves—remains unknown. Several theories have proposed attributes that may underlie the functional topography of human high-level visual cortex. 
These include: (1) the eccentricity bias of retinal images associated with the typical viewing of specific categories19,24, for example, face discrimination is thought to require high visual acuity supported by foveal vision, but peripheral vision is believed to be more important for processing places, as in the real world they occupy the entire visual field; (2) the average rectilinearity of stimuli from particular categories25,33, for example, faces are curvilinear, but man-made places tend to be rectilinear; (3) the perceived animacy of stimuli5,34–37, for example, faces are perceived to be animate whereas places are not; and (4) the real-world size of stimuli38, for example, faces are physically smaller than places and buildings. Each of these theories proposes an underlying principle to describe the coarse functional topography of the VTC relative to its cortical macroanatomy. That is, inherent in all these theories is the idea that a physical or perceived dimension of a stimulus maps onto a physical dimension along the cortical surface. For example, in the human VTC, small, curvy, animate and foveal stimuli elicit stronger responses lateral to the mid-fusiform sulcus (MFS), whereas large, linear, inanimate and peripherally extending stimuli elicit stronger responses in the cortex medial to the MFS.

However, which of these dimensions drives the development of the functional organization of the VTC is unknown. Research on cortical plasticity in animals has made two key discoveries related to this question. First, eccentricity representations in the early and intermediate visual cortex are probably established in infancy39,40, as they may be constrained by both wiring41 and neural activity that starts in utero42. For example, research on ferret and mouse development suggests that retinal waves during gestation and before eye-opening are sufficient to establish eccentricity representations in the visual cortex40. An eccentricity proto-architecture is also detectable early in infant macaque development39. Second, visual development has a critical period during which the brain is particularly malleable and sensitive to visual experience32,43–48. For example, previous research suggests that new category representations in high-level visual cortex emerge with experience only in juvenile macaques, but not if the same experience happens to adult macaques43. Furthermore, visual deprivation of a category (for example, faces) in infancy results in a lack of development of a cortical representation for that category32. Together, these findings support the following predictions regarding human development: first, if eccentricity representations in high-level visual cortex are present early in development, then eccentricity stands to be a strong developmental constraint for the later emergence of object representations; second, testing theories of VTC development requires the measurement of the effects of childhood experience on the formation of new brain representations.

Figure 1
Fig. 1 | Localizer stimuli and behavioural naming performance. a, Distributions of participant accuracies from a five-alternative-choice Pokémon naming task outside the scanner. Experienced participants (blue; n= 11) significantly outperformed novices (grey; n= 9). b, Example stimuli from each of the categories used in the fMRI experiment. In each 4 s trial, participants viewed 8 different stimuli from each category at a rate of 2 Hz while performing an oddball task to detect a phase-scrambled stimulus with no intact object overlaid. Participants completed 6 runs, of 3 min 38 s each, using different stimuli. See https://www.pokemon.com/us/pokedex/ for more general examples of Pokémon and Supplementary Fig. 9 for more examples of the pixelated GameBoy Pokémon stimuli.

Results

Childhood experience with Pokémon results in distinct and reproducible information across the VTC. Experienced participants were adults (n=11, mean age 24.3±2.8 yr, 3 female) initially chosen through self-reporting, who began playing Pokémon between the ages of 5 and 8 yr. Experienced participants were included in the study if they continued to play the game throughout childhood and revisited the game as adults. Novice participants were chosen as similarly aged and educated adults who never played Pokémon (n=11, mean age 29.5±5.4 yr, 7 female). We validated their self-reported experience with Pokémon with data from a behavioural experiment, in which participants viewed 40 Pokémon images from the original Nintendo game and identified each image by name (from 5 choices). Experienced participants (n=11) significantly outperformed novices (n=9) in their ability to name Pokémon (Student’s t-test, t(18)=18.2, P<0.01, Cohen’s d=8.18; Fig. 1a). Despite not being able to name Pokémon, novices are capable of visually distinguishing and individuating Pokémon characters (Supplementary Fig. 1).
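The reported effect size can be cross-checked against the reported t statistic. The sketch below assumes that Cohen's d was computed with a pooled standard deviation, in which case d = t·sqrt(1/n1 + 1/n2) for an independent-samples Student's t-test. The function name is illustrative and is not taken from the original analysis code.

```python
from math import sqrt

def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d.

    Assumes a pooled-standard-deviation d (an assumption; the text does
    not state which variant of d was used).
    """
    return t * sqrt(1.0 / n1 + 1.0 / n2)

# Values reported in the text: t(18) = 18.2 with 11 experienced
# and 9 novice participants in the naming task.
d = cohens_d_from_t(18.2, 11, 9)
print(round(d, 2))
```

Under that assumption the result rounds to 8.18, which agrees with the value reported for the naming task.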

All participants underwent fMRI while viewing faces, bodies, cartoons, pseudowords, Pokémon, animals, cars and corridors (Fig. 1b). Cartoons and animals were chosen to create a strict comparison to the Pokémon stimuli and the other categories were included as they have well-established and reproducible spatial topography across the VTC49. Stimuli were randomly presented at a rate of 2 Hz in 4 s blocks, each containing 8 images from a category. Participants performed an oddball detection task to ensure continuous attention throughout the scan. Participants completed six runs with different stimuli from these categories. We first examined whether childhood experience affects the representation of category information in the VTC, which was anatomically defined in each participant’s native brain (see the MVPA section in the Methods). To do so, at the individual level, we measured the representational similarity among distributed VTC responses to the eight categories across runs. Each cell in the representational similarity matrix (RSM) is the voxelwise correlation between the distributed VTC responses to different images of the same category (diagonal) or different categories (off-diagonal) across split halves of the data. Then, we averaged the RSMs across the participants of each group and compared across groups to examine the representational structure of distributed VTC responses in experienced and novice participants. Averaging across participants of each group allowed us to visualize consistency within a given group. We then used decoding approaches to quantify these representational structures in individual participants, described below.
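The split-half RSM described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each category's distributed response in each half of the data is already available as a list of (z-scored) voxel values, and the toy data, category subset and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def split_half_rsm(half_a, half_b, categories):
    """rsm[i][j] = correlation of category i in half A with category j in half B.

    Diagonal cells measure within-category reproducibility; off-diagonal
    cells measure between-category similarity.
    """
    return [[pearson(half_a[ci], half_b[cj]) for cj in categories]
            for ci in categories]

# Hypothetical toy data: 3 categories, 4 voxels per pattern.
categories = ["faces", "corridors", "pokemon"]
half_a = {"faces": [1.0, 0.5, -0.5, -1.0],
          "corridors": [-1.0, -0.5, 0.5, 1.0],
          "pokemon": [0.5, -1.0, 1.0, -0.5]}
half_b = {"faces": [0.9, 0.6, -0.4, -1.1],
          "corridors": [-1.1, -0.4, 0.6, 0.9],
          "pokemon": [0.4, -0.9, 1.1, -0.6]}

rsm = split_half_rsm(half_a, half_b, categories)
for name, row in zip(categories, rsm):
    print(name, [round(r, 2) for r in row])
```

In this toy case the diagonal is strongly positive for all three categories, which is the pattern the distinctiveness hypothesis would predict for Pokémon in experienced participants.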

Figure 2
Fig. 2 | Experienced participants demonstrate a consistent and distinct representation for Pokémon compared to novices. a, RSMs calculated by correlating distributed responses (z-scored voxel betas) from an anatomical VTC ROI across split halves of the fMRI experiment. Positive values are presented in orange, negative values in green and near-zero values in white (see the colour scale, which applies to all four RSMs). b, The decoding performance from the winner-takes-all classifier trained and tested on split halves of the fMRI data from the bilateral VTC. The shaded region shows s.e.m. across participants within a group (experienced participants, n= 11; novices, n= 11). The dashed line indicates the chance level performance; decoding performance is represented as a fraction of 1, with 1 corresponding to 100% decoding accuracy. c,d, The decoding performance from distributed bilateral VTC responses for experienced (n= 4) and novice (n= 5) participants in the original oddball task (c) and when brought back to undergo an additional fMRI experiment with an attention-demanding two-back task (d). The same participants are shown in c and d.
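A winner-takes-all classifier of the kind named in the caption can be sketched as below. This is a generic correlation-based version written under stated assumptions: each test pattern is assigned the label of the training pattern it correlates with most strongly, and accuracy is averaged over both train/test directions of the split halves. The Methods are not part of this section, so the details (and all names and toy values) are illustrative only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def wta_decode(train, test):
    """Return {label: bool}: was each test pattern assigned its own label?"""
    outcome = {}
    for label, pattern in test.items():
        # The "winner" is the training category with the highest correlation.
        predicted = max(train, key=lambda name: pearson(pattern, train[name]))
        outcome[label] = (predicted == label)
    return outcome

def decoding_accuracy(half_a, half_b):
    """Per-category accuracy averaged over both train/test directions."""
    ab = wta_decode(half_a, half_b)
    ba = wta_decode(half_b, half_a)
    return {label: (ab[label] + ba[label]) / 2 for label in ab}

# Hypothetical toy patterns (4 voxels per category and half).
half_a = {"faces": [1.0, 0.5, -0.5, -1.0],
          "corridors": [-1.0, -0.5, 0.5, 1.0],
          "pokemon": [0.5, -1.0, 1.0, -0.5]}
half_b = {"faces": [0.9, 0.6, -0.4, -1.1],
          "corridors": [-1.1, -0.4, 0.6, 0.9],
          "pokemon": [0.4, -0.9, 1.1, -0.6]}

accuracy = decoding_accuracy(half_a, half_b)
print(accuracy)
```

With eight categories, as in the experiment, chance performance for such a classifier would be 1/8.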

We hypothesized that the representational similarity of distributed VTC responses would follow one of four outcomes. The first is the null hypothesis: Pokémon will not elicit a consistent response pattern in the VTC in any group and will have near-zero correlation with other items of this and other categories. Second, the animate hypothesis: Pokémon, which have faces and limbs and resemble animals to some extent, will have positive correlations with animate categories, such as faces, bodies and animals. Third, the expertise hypothesis: if Pokémon are processed as a category of expertise, then distributed responses to Pokémon will be most correlated with distributed responses to faces, as the expertise hypothesis predicts that expert stimuli are processed in face-selective regions50,51. Fourth, the distinctiveness hypothesis: as Pokémon constitute a category of their own, they will elicit a unique response pattern. Thus, correlations among distributed responses to different Pokémon will be positive and substantially higher than the correlation between Pokémon and items from other categories.

Experienced participants differ markedly from novices in their distributed VTC response patterns to Pokémon (see the RSM in Fig. 2a). Unlike novices, who demonstrate little to no reproducible pattern for Pokémon in the VTC, consistent with the null hypothesis (mean Pearson correlation±s.d., r=0.1±0.06, n=11), experienced participants demonstrate a significantly more reproducible response pattern for Pokémon (r=0.27±0.11; significant between-group difference: t(20)=4, P< 0.001, d=1.8). Furthermore, distributed responses to Pokémon were distinct from those of other categories in experienced participants. We quantify this effect by calculating the mean dissimilarity (D=1−r) of distributed responses to Pokémon from other categories. Distributed responses to Pokémon are significantly more different from distributed responses to the other categories in experienced participants than in novices (t(20)=4.4, P< 0.001, d=1.8). Interestingly, in experienced participants, Pokémon response patterns are significantly dissimilar (all t(20)< 4.2, all P< 0.001, all d>0.89) from those of faces (D±s.d.=0.97±0.08), bodies (1.1±0.08) and animals (0.9±0.07) despite Pokémon having faces, bodies and animal-like features themselves. In contrast, when excluding Pokémon, groups do not have a significantly different D between distributed responses to other pairs of categories (t(20)=0.52, P=0.6, d=0.19).
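The dissimilarity measure D=1−r can be illustrated with a short sketch. It assumes a split-half RSM stored as a nested list and averages the two off-diagonal cells for each category pair, since a split-half RSM need not be symmetric; that averaging step, the function name and the toy matrix are assumptions made for illustration.

```python
def mean_dissimilarity(rsm, categories, target):
    """Mean D = 1 - r between `target` and every other category."""
    t = categories.index(target)
    others = [i for i in range(len(categories)) if i != t]
    # Average the (target, other) and (other, target) cells before converting to D.
    ds = [1.0 - (rsm[t][i] + rsm[i][t]) / 2.0 for i in others]
    return sum(ds) / len(ds)

# Hypothetical 3 x 3 split-half RSM (rows: half A, columns: half B).
categories = ["faces", "bodies", "pokemon"]
rsm = [[0.60, 0.20, 0.05],
       [0.20, 0.50, -0.10],
       [0.01, -0.10, 0.30]]

D = mean_dissimilarity(rsm, categories, "pokemon")
print(round(D, 3))
```

Values of D near 1 correspond to near-zero correlation, and values above 1 correspond to negatively correlated response patterns.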

Figure 3
Fig. 3 | Distinct cortical representation for Pokémon in experienced participants. a,b, Unthresholded parameter maps displayed on the inflated ventral cortical surface zoomed on VTC (see inset for the location on a whole-brain map) in an example novice participant (26-year-old female; a) and an example experienced participant (26-year-old male; b) for the contrasts of Pokémon, faces and corridors, each versus all other stimuli. Dashed lines delineate cortical folds; OTS, occipitotemporal sulcus; FG, fusiform gyrus; CoS, collateral sulcus.

Visual features make different predictions for the emergent location of cortical responses. Results of the previous MVPA suggest that intense childhood experience with a novel visual category results in a reproducible distributed response across the human VTC that is distinct from other categories. An open question is whether Pokémon generate distributed response patterns with similar topographies across experienced participants. Therefore, we generated statistical parametric maps that contrast the response to each category versus all others (units: T values) and compared across groups. In typical adults, stimulus dimensions such as eccentricity19,55, animacy37,56, size26 and curvilinearity25 are mapped to a physical, lateral−medial axis across the VTC49.

As the responses to faces on the lateral VTC and places on the medial VTC generate the most differentiated topographies, we analysed the properties of Pokémon stimuli for these attributes relative to faces and places. Thus, we used these metrics to generate predictions for the emergent topography of a Pokémon representation in experienced participants. As expected, in both groups, unthresholded contrast maps demonstrating preferences for faces and places showed the typical topography in relation to major anatomical landmarks. That is, despite both anatomical and functional variability between participants, preference for faces was found in the lateral FG and preference for places in the collateral sulcus (CoS), as illustrated in Fig. 3. However, striking differences can be observed when examining VTC responses to Pokémon. In novices, Pokémon do not elicit preferential responses in the VTC, as with faces or corridors (Fig. 3a). In contrast, an example experienced participant demonstrated a robust preference for Pokémon in the lateral FG and occipitotemporal sulcus (OTS; Fig. 3b). This pattern was readily observable in all the other experienced participants (Supplementary Fig. 3). Given that these data suggest childhood experience with Pokémon results in a spatially consistent topography for Pokémon across individuals, we next asked: what attributes of Pokémon drive this topography? The stimuli of faces, corridors and Pokémon used in the localizer experiments were submitted to a variety of analyses with the goal of ordering these categories linearly along different feature spaces (see the Image statistics analyses section in the Methods). Stimuli were analysed for physical attributes of foveal bias, that is, the retinal size of images when fixated on across a range of typical viewing distances, and rectilinearity, using the Rectilinearity Toolbox25, which evaluates the presence of linear and curved features at a range of spatial scales. 
Stimuli were also rated for attributes of size and animacy, by independent raters, described below.
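The foveal-bias measure rests on the standard relation between physical size, viewing distance and retinal image size: an object of size s viewed at distance d subtends 2·atan(s / 2d). The sketch below applies that relation across a range of distances. The object sizes and distances are hypothetical placeholders, not the values used in the study's simulation.

```python
from math import atan, degrees

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size and distance."""
    return degrees(2.0 * atan(size_cm / (2.0 * distance_cm)))

# Hypothetical placeholder values: object sizes in cm and viewing distances in cm.
object_sizes_cm = {"small_object": 2.0, "medium_object": 20.0, "large_object": 200.0}
viewing_distances_cm = [30.0, 60.0, 120.0]

for name, size in object_sizes_cm.items():
    angles = [visual_angle_deg(size, d) for d in viewing_distances_cm]
    print(name, [round(a, 1) for a in angles])
```

For a fixed object size the subtended angle shrinks as viewing distance grows, so a category's distribution of retinal image sizes depends on both its physical size and how it is typically viewed.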


Location of novel cortical responses in experienced participants supports eccentricity bias theory. To test these predictions, we produced contrast maps for Pokémon versus all other stimuli in each participant. Using cortex-based alignment (CBA), we transformed each participant’s map to the FreeSurfer57 average cortical space, where we generated a group average Pokémon-contrast map. For visualization, we projected the map onto an individual experienced participant’s cortical surface. These results revealed four main findings. First, in experienced, but not novice participants, we observed that preference for Pokémon reliably localized in the OTS. As illustrated in the average experienced participants’ Pokémon contrast map (Fig. 5a), higher responses to Pokémon versus other stimuli were observed in the OTS and demonstrated two peaks on the posterior and middle portions of the sulcus. Second, we compared Pokémon activations to those of faces by delineating the peaks of face selectivity in the average contrast maps for faces in each group (Fig. 5a, white outlines). This comparison reveals that for experienced participants, Pokémon-preferring voxels partially overlapped face-selective voxels on the lateral FG and extended laterally to the OTS, but never extended medially to the CoS, where place-selective activations occur (Fig. 3).

Third, we compared the volume of category selectivity for Pokémon in each group. Pokémon-selective voxels were any voxels within the VTC that were above the threshold (T>3) for the contrast of Pokémon versus all other stimuli. Pokémon-selective voxels were observed in the lateral FG and OTS of all 11 individual experienced participants. In contrast, although some scattered selectivity could be observed for Pokémon in four novice participants, it was not anatomically consistent. Thus, it did not yield any discernible selectivity for Pokémon in the average novice selectivity map (Fig. 5a). An ANOVA run with the factors group and hemisphere on the volume of Pokémon-selectivity in the VTC revealed a main effect of group (F(1,1)=32.75, P< 0.001, η2=0.45), but no effects of hemisphere, nor any interaction (Fs(1,1)< 0.67, Ps>0.41, η2s< 0.016). The median volume in experienced participants was sixfold larger than that in novices (Fig. 5a), with most (7 of 11) novice participants having close to zero voxels selective for Pokémon. This difference in volume between groups was not driven by gender differences (Supplementary Fig. 2a). Fourth, we compared the lateral−medial location of Pokémon selectivity relative to face and place selectivity to directly assess theoretical predictions. To do so, we partitioned the VTC in each experienced participant into four anatomical bins from lateral (OTS) to medial (CoS); see the inset in Fig. 5b. Within each bin we extracted the mean T value for the contrast of either Pokémon, faces or corridors. Curves fitted to average T values across bins demonstrate that (1) peaks in these curves are the most lateral for Pokémon, intermediately lateral for faces and medial for corridors (Fig. 5b) and (2) Pokémon-selectivity peaks are located significantly more laterally in the VTC than those of faces (t(20)=2.88, P=0.009, d=1.23). 
Together, this pattern of results is consistent with only the predictions of the eccentricity bias theory for the development of VTC topography (Fig. 4a).
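The two measurements in this passage, selectivity volume and lateral−medial peak location, can be sketched as follows. The sketch makes simplifying assumptions: voxel volume is a hypothetical parameter, and the peak is taken as the bin with the highest mean T value rather than the peak of a fitted curve as in the text. Bin names follow the labels used in the Fig. 5 caption.

```python
def selective_volume_mm3(t_values, voxel_volume_mm3, threshold=3.0):
    """Volume of voxels whose contrast T value exceeds the threshold (T > 3 in the text)."""
    n_selective = sum(1 for t in t_values if t > threshold)
    return n_selective * voxel_volume_mm3

def peak_bin(bin_means):
    """Name of the anatomical bin with the highest mean T value.

    Simplification: the text fits curves across bins; this takes the
    maximum of the bin means instead.
    """
    return max(bin_means, key=bin_means.get)

# Hypothetical T values for a handful of VTC voxels and a hypothetical voxel volume.
t_values = [3.5, 2.9, 4.1, 3.0, -1.0]
volume = selective_volume_mm3(t_values, voxel_volume_mm3=8.0)

# Hypothetical mean T values per bin, ordered lateral (OTS) to medial (CoS).
pokemon_bins = {"OTS": 2.1, "latFG": 1.4, "medFG": 0.2, "CoS": -0.8}
corridor_bins = {"OTS": -1.0, "latFG": -0.6, "medFG": 0.9, "CoS": 2.5}

print(volume, peak_bin(pokemon_bins), peak_bin(corridor_bins))
```

A voxel with T exactly equal to 3 is not counted, because the criterion in the text is a strict inequality.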

Figure 4
Fig. 4 | Distinct cortical representation for Pokémon in experienced participants. a, Distributions of retinal image sizes produced by Pokémon (blue), faces (orange) and corridors (grey) in a simulation that varied viewing distance across a range of sample stimuli. DVA, degrees of visual angle. X axis shows log-scale DVA. b, Distributions of the relative rectilinearity scores of faces, corridors and Pokémon, as measured using the Rectilinearity Toolbox25 (0, least linear; 1, most linear). c, Distributions of the perceived physical size of Pokémon (from 28 raters) and of the physical sizes of faces and corridor stimuli. The distributions of face and corridor size were produced using Gaussian distributions with standard deviations derived from either anatomical or physical variability within the stimulus category (see Methods). The face distribution extends to a value near 100% (the natural variation of face size is very narrow compared to other stimuli). d, Distributions of the scores of perceived animacy collected from a group of 42 independent raters who rated the stimuli of faces, Pokémon and corridors for how ‘living or animate’ these stimuli were perceived to be (1, animate; 5, inanimate).

How does experience affect the amplitude of responses to Pokémon? To further understand how novel childhood experience has impacted cortical representations in the VTC, we asked two questions. First, are the emergent responses to Pokémon in experienced participants specific to the Pokémon characters that participants have learned to individuate, or will similar patterns emerge for any Pokémon-related stimulus from the game? Second, how does visual experience change the responsiveness of the OTS to visual stimuli? To address the first question, a subset of experienced participants (n=5) participated in an additional fMRI experiment in which they viewed other images from the Pokémon game (see the Pokémon scenes and pixelated faces fMRI control experiment section in the Methods). In this experiment, participants completed a blocked experiment with two categories of images: images of places (for example, navigable locations) from the Pokémon game as well as downsampled face stimuli (to resemble 8-bit game imagery). Images were presented in 4 s blocks at a rate of 2 Hz while participants performed an oddball task. Results show that places from the Pokémon game drive responses in the CoS, not the OTS or the lateral FG. In other words, they produce the typical pattern of place-selective activations (Fig. 7). In contrast, Pokémon-selective voxels in each participant (Fig. 7, black contours) have minimal to no selectivity for places from the Pokémon game, further demonstrating the specificity of Pokémon-selective voxels to Pokémon characters. To answer the second question, namely, how experience shapes responses in the OTS, we quantified the response amplitude of Pokémon-selective voxels in both experienced and novice participants. 
To ensure that the region of interest (ROI) was defined independently from the individual’s data, we employed a leave-one-out approach60 in which we produced a group-defined Pokémon ROI from ten experienced participants by transforming ROIs from individuals to the FreeSurfer cortical average using CBA and then using CBA to project the group ROI to the left-out individual’s brain and examining its responses. This procedure was repeated for each experienced participant. For novice participants, we transformed the group ROI produced from all the experienced participants into each individual novice’s brain. To ensure that we did not extract a signal from cortex that was already selective for another category, we removed any voxels that were selective for other categories from the group Pokémon ROI for each participant. From this independently defined Pokémon ROI, we extracted the percent signal change from the eight-category fMRI experiment.
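The leave-one-out logic above can be sketched on a common surface, once alignment has been done. In this sketch each participant's Pokémon ROI is a set of vertex indices in the shared (cortex-based-aligned) space. The rule for combining ROIs into a group ROI (keep vertices present in at least `min_fraction` of the remaining participants) is an assumption, since this section does not state the rule, and all names and toy values are hypothetical.

```python
def leave_one_out_group_rois(subject_rois, min_fraction=0.5):
    """For each participant, build a group ROI from all *other* participants.

    subject_rois maps a participant id to a set of vertex indices on the
    common surface. min_fraction is an assumed overlap criterion.
    """
    group_rois = {}
    for left_out in subject_rois:
        others = [roi for s, roi in subject_rois.items() if s != left_out]
        counts = {}
        for roi in others:
            for vertex in roi:
                counts[vertex] = counts.get(vertex, 0) + 1
        group_rois[left_out] = {v for v, c in counts.items()
                                if c / len(others) >= min_fraction}
    return group_rois

def exclude_other_selectivity(group_roi, other_selective_vertices):
    """Drop vertices that are already selective for another category."""
    return group_roi - other_selective_vertices

# Hypothetical toy ROIs for three participants.
rois = {"s1": {1, 2, 3}, "s2": {2, 3, 4}, "s3": {3, 4, 5}}
loo = leave_one_out_group_rois(rois)

# Hypothetical: vertex 2 is face-selective in participant s1.
s1_roi = exclude_other_selectivity(loo["s1"], {2})
print(sorted(loo["s1"]), sorted(s1_roi))
```

Because the ROI used for each participant never includes that participant's own data, response amplitudes extracted from it are not biased by the selection step.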

Figure 5
Fig. 5 | Average contrast maps for Pokémon and anatomical localization reveal lateral VTC responses in experienced participants. a, Average contrast maps for Pokémon in novice and experienced participants. For each participant, T-value maps were produced for the contrast of Pokémon versus all other stimuli. These maps were aligned to the FreeSurfer average brain using cortex-based alignment (CBA). On this common brain surface we generated a group-average contrast map by averaging maps across all novice participants and all experienced participants. Group-average maps are shown on an inflated right hemisphere of one of our participants, zoomed in on the VTC. White outlines show group-average face-preferring voxels (average T> 1) from each respective group. Grey arrows show two peaks in the Pokémon-selectivity maps of experienced participants; the same arrows are shown next to the novice map for comparison. Inset: box plots show the mean (white line), 25% and 75% quartiles (boxes) and range (black dotted line) of the selectivity volume in novices and experienced participants. b, Curves fitted to the mean selectivity for Pokémon, faces or corridors, averaged in one of four anatomically defined regions extending from the lateral to medial VTC (illustrated in the inset for an example participant). Each line represents a participant and the triangles show the peak selectivity values. The peaks for the Pokémon-selectivity curves are significantly more lateral than the peaks for face selectivity. The most lateral ROI is the OTS extending from the inferior temporal gyrus (ITG) to the medial aspect of the OTS. The lateral FG (latFG) ROI includes the lateral FG and ends medially at the MFS; the medial FG (medFG) bin extends from the MFS to the lateral edge of the CoS; the CoS bin includes the CoS up to the lateral edge of the parahippocampal gyrus.

Discussion

By examining cortical representations in adults who have had visual experience with a specific, artificial visual category since childhood, we found that participants who have extensive experience with Pokémon characters, beginning as early as 5 yr old, demonstrate distinct response patterns in high-level visual cortex that are consistent across participants. Category-selective responses to Pokémon in all experienced participants occupied the posterior and middle extent of the OTS, largely lateral to face-selective cortex, and responded selectively to learned Pokémon characters rather than general imagery from the Pokémon game. Note that we do not assert that our results should be interpreted as a new Pokémon functional module in the OTS of experienced participants on par with the face-selective cortex7 . Instead, our data underscore that prolonged experience starting in childhood can lead to the emergence of a new representation in the VTC for a novel category with a surprisingly consistent functional topography across individuals. We demonstrated that this topography is consistent with the predictions of the eccentricity bias theory for two reasons: the small stimuli that required foveal vision during learning biased the emergent representations towards the lateral VTC; and Pokémon-selective voxels in experienced participants show a foveal bias. Together, our data show that shared, patterned visual experience during childhood, combined with the inherent retinotopic representation of the visual system, results in the shared brain organization observed in the adult high-level visual cortex.

The nature of Pokémon as a stimulus category is an interesting one, because it could be seen as similar to other ecological stimuli such as faces or words: the game entails repeated, prolonged and rewarded experience individuating visually similar, but semantically distinct exemplars. This is similar to other stimulus categories for which there is ecological pressure, or interest, to individuate among a visually homogeneous category such as faces, birds or cars. Our data suggest that individuals who have had life-long experience individuating Pokémon characters develop a novel representation for this learned category, demonstrating the plasticity of high-level visual cortex outside of the face-selective regions. This was supported by robust decoding results and a consistent spatial topography of selectivity for Pokémon in experienced but not novice participants. Our findings suggest that early childhood visual experience shapes the functional architecture of high-level visual cortex, resulting in a unique representation whose spatial topography is predictable. One should exercise caution when comparing the current findings with the effects of visual expertise that was acquired in adulthood62–64, as the current study focuses on participants whose visual experience began at a young age. There are several differences that distinguish our current results from such previous research on expertise. First, childhood experience led to the development of a new representation for Pokémon that was anatomically consistent across participants and was coupled with increased responses and selectivity for Pokémon in the OTS. As the same piece of cortex did not show selectivity in novices, this suggests that extensive experience individuating a novel stimulus beginning in childhood is necessary and sufficient for the development of a new representation in the VTC. 
Although previous investigations of category training in adults also demonstrated increases in voxel selectivity for the learned category, in contrast to our data, these activations were not anatomically consistent across individuals and occurred either in the object-selective cortex or outside the visual cortex entirely (in the prefrontal cortex). Second, Pokémon elicited numerically, but not significantly, higher responses in the face-selective cortex of our experienced participants than in our novices. Although these data are consistent with previous reports of increased responses in face-selective areas on the FG to stimuli of expertise gained in adulthood, it is unclear from our data whether these increased responses are due to experience or due to the fact that Pokémon have faces (Supplementary Fig. 9). Third, previous research has shown that learning contextual and semantic features of novel objects in adulthood (for example, this object is found in gardens) can influence VTC representations66. Thus, part of the emergent representation for Pokémon in experienced participants may have stemmed not only from visual features, but also from contextual and semantic information learned about Pokémon. In other words, the representation of Pokémon may include additional semantic and contextual information, such as its habitat and characteristics, that can be investigated in future studies. Lastly, developmental work in humans with parametrically morphed stimuli has shown that improved perceptual discrimination among face identities in adulthood is linked with increased neural sensitivity (lesser adaptation) to face identity in the face-selective cortex from childhood to adulthood67. Higher neural sensitivity is thought to be due to narrower neural tuning. Future research can test whether childhood experience with Pokémon also affects neural tuning by measuring adaptation to parametrically morphed Pokémon in experienced versus novice participants.

Our findings also have interesting parallels with research on the development of reading abilities in children. First, visual experience with Pokémon began between the ages of 5 and 8 yr old, similar to the ages during which reading ability rapidly improves69. Second, like looking at Pokémon, reading words requires foveation, and words typically subtend small retinal images. Third, similar to our findings, research on the development of reading has shown that the word-selective cortex emerges during childhood in the lateral OTS and distributed representations in the OTS and the lateral FG become more informative from childhood to adulthood with increasing reading experience52,70. Thus, our data, together with the research on the development of reading, suggest that the critical window for sculpting unique response patterns in the human VTC extends to at least school age.

Our results converge with previous research in macaques that offers compelling evidence for three important developmental aspects of high-level visual cortex. First, its organization is sensitive to the timing of visual experience43. In macaques, learning new visual categories in juveniles but not adults resulted in new category-selective regions for the trained stimuli. This suggests that there may be a critical developmental period for cortical plasticity in high-level visual cortex. Second, in macaques, early visual experience led to the formation of category selectivity in consistent anatomical locations33 and deprivation of visual experience with stimuli such as faces results in no face-selective cortex32. Third, eccentricity may be a strong prior that constrains development in high-level visual cortex. For example, in the macaque visual cortex, a protoeccentricity map is evident early in infant development39,40. However, our data also highlight key developmental differences across species. First, the critical window of cortical plasticity in high-level visual cortex may be more extended in humans than macaques. In humans, extensive discrimination training in adults results in changes in amplitudes62,71 and distributed representations63 in high-level visual cortex, but in adult macaques, responses43 and distributed representations72 do not change, even as the monkeys become behaviourally proficient at the task (but see ref. 73, which shows increases in the number of inferotemporal neurons responsive to trained stimuli in adult macaques). Second, the anatomical locus of the effects of childhood experience differs across species. In humans, the most prominent functional developments have been reported in the FG20,74–78 and OTS52,77, but in macaques they are largely around the superior temporal sulcus and adjacent gyri32,33. Notably, the FG is a hominoid-specific structure79, which underscores why development and training effects may vary across species. 
Third, the features of visual stimuli and how they interact with cognitive strategies that sculpt the brain during childhood may differ across species. That is, developmental predictions about the perceived animacy or size of a visual stimulus are readily queried only in humans.

The unique opportunity presented by Pokémon as a stimulus is the manner in which they vary from other visual stimuli in their physical (retinal image size, rectilinearity) and perceived (animacy, size) properties. Furthermore, the topography of the responses in experienced participants was consistent across individuals, allowing us to ask which potential dimension of Pokémon visual features, either perceived or physical, may determine the anatomical localization of Pokémon responses in the VTC. The lateral location of this emergent representation, and its foveally biased pRFs, suggests that the act of foveating on images that subtend a small retinal image during childhood biases input towards regions in the lateral VTC that have pRFs that overlap the fovea. We posit that individuals experienced with Pokémon had enough patterned visual experience for this biased input to result in category selectivity.

Although this by no means invalidates observations of other large-scale patterns describing the functional topography of high-level vision, our data suggest that retinal eccentricity is a key developmental factor in determining the consistent functional topography of the VTC across individuals, for several reasons. First, analyses of the physical and perceived properties of Pokémon characters suggest that the attribute of Pokémon stimuli that best predicted the location of peak selectivity for responses to Pokémon in the VTC was the experienced retinal eccentricity of Pokémon during childhood. Second, pRFs of VTC voxels selective for Pokémon in experienced participants showed a foveal bias. Future research examining how these representations emerge during childhood as participants learn these stimuli will be important for verifying that the foveal bias in the OTS precedes the emerging selectivity to Pokémon. Indeed, although Pokémon have almost linear statistics at the image level, individuals are capable of integrating holistically over a large number of pixels to perceive a curve. The extent to which this may be modulated by experience and potentially impact pRFs during learning is an interesting focus for future research. Third, although the representation of animacy also exhibits a lateral–medial organization in the human VTC, Pokémon are perceived as less animate than faces, but their representation appeared lateral to face-selective regions. In other words, if the animacy axis is continuous across the VTC, with animate representations in the OTS and inanimate on the CoS, one would expect Pokémon-selective voxels to be located medial to the face-selective cortex.

Although we observe that such a medial region is capable of responding to animate stimuli (Supplementary Fig. 8), it does not become selective for Pokémon across development. Instead, Pokémon-selective voxels were found in the OTS, lateral to the face-selective cortex. Although high-level visual cortex is capable of distinguishing animate from inanimate stimuli23, it might not be a continuous graded representation across the cortical sheet per se. Thus, the framework that human high-level visual cortex has a representation of retinal eccentricity, probably inherited from retinotopic input in earlier visual field maps, offers a parsimonious explanation of the development of Pokémon representations in the VTC.

Future research can examine how becoming a Pokémon expert in childhood, which probably entails learning optimal fixation patterns on Pokémon stimuli, may further sculpt pRFs throughout development as observed in face- and word-selective cortex31. In conclusion, these findings shed light on the plasticity of the human brain and how experience at a young age can alter cortical representations.

An intriguing implication of our study is that a common extensive visual experience in childhood leads to a common representation with a consistent functional topography in the brains of adults. This suggests that how we look at an item and the quality with which we see it during childhood affects the way that visual representations are shaped in the brain. Our data raise the possibility that if people do not share common visual experiences of a stimulus during childhood, whether because of disease, as is the case in cataracts, or because of cultural differences in viewing patterns, then an atypical or unique representation of that stimulus may result in adulthood. This possibility has important implications for learning disabilities83,84 and social disabilities85. Overall, our study underscores the utility of developmental research, showing that visual experience beginning in childhood results in functional brain changes that are qualitatively different from plasticity in adulthood. Future research to examine the amount of visual experience necessary to induce distinct cortical specialization and determine the extent of the critical window during which such childhood plasticity is possible will further deepen our understanding of the development of the human visual system and its behavioural ramifications.