Wednesday, October 15, 2014

Memory, Learning, and Modularity: An Interview with Randy Gallistel

Have you ever asked yourself, how does the brain store a number in memory? 

It might surprise you to hear that the neuroscience community doesn’t really have a story about how this basic operation happens. Or so one fellow by the name of Gallistel is saying. 

I caught up with Randy Gallistel in May to chat about his new book Memory & the Computational Brain: Why cognitive science will transform neuroscience.

The reason that I think linguists ought to be paying attention to this chap is that he’s been working out the details of symbolic/representational/computational theories of learning & perception for decades now, and the results seem promising (from a generative grammar point of view). 

This is a longer interview and it covers a lot of ground, but I’d like to pick at a couple of things that stood out for me in text. (A long interview deserves an equally long ranty post?)

Thing the first: the relationship between learning & memory 

Under the associationistic view, learning and memory are of a kind: learning is forming and strengthening associations; memory is a reflection of the strengths of those associations. You can gloss the above in your contemporary associationistic jargon of choice (e.g. connection strengths). I think that this way of dividing the labour is deeply confused. I won’t go into it here, though stand by for a post devoted to the topic. 

Under the computational view, learning is the extracting of information from experience. And memory is the carrying forward of information through time. This way of carving up the apple seems sound to me. But not without a couple of caveats. 

Fair Warning: if you continue past this point, you’re in for it.  

Caveat about “experience”: the concept of experience should be approached with caution. This is because there’s a difference between an organism being exposed to some stimuli, and an exposure to stimuli that strikes an organism as an experience. That is to say, the sort of stimulation that a particular creature will treat as an experience depends on the kind of a creature it is, and the kind of perceiving and thinking it is capable of. 

Maybe the most common example of this distinction in linguistics is that of synthetic speech. It appears that if you expose a human to synthetic speech (when they are under the impression that what they are hearing is not speech) they report hearing whistles and squeaks. In other words, they do not have a linguistic experience. Alternatively, if you tell the subject that they are listening to speech, they will hear the same stimuli as speech. Now, they have a linguistic experience.1 

Or maybe you’d prefer the following example: supposedly, the ape auditory system is attuned to the same distinctive features as the human one. But observe that even though apes organize the acoustic stimuli along the lines of features, the data doesn’t seem to result in anything; they do not have a linguistic experience per se.

Anyway, if you allow yourself to be puzzled by how it is that infants identify language-related data in the environment to begin with, you will see why this caveat is neat. To put it another way, why is it that my pet cat (or monkey, or seal) doesn’t have a “linguistic experience” from the same stimuli as my niece? Or to put it yet another way, if you expose your eyeballs to acoustic data, we can say that this constitutes a type of stimulus, but it is not a visual experience (which is what eyes are all about, isn’t it?). Only visual stimuli of the proper kind will cause us to have a visual experience from which we can then draw information. 

More importantly, this observation isn’t limited to perceptual systems such as vision. It can be extended to encompass the conceptual apparatus (as Fodor has been trying to do for the past million years or so). This is a timely issue, especially as work in generative grammar shifts to investigating the Conceptual-Intentional Interface. To re-gloss: a conceptual repertoire depends on how things in the world strike a particular creature (and often what intentional history a thing strikes them as having). Anyway, look out for more on this in the nebulous future. 

Caveat about learning: at some point in the interview, Gallistel says that learning is continuous with perception, but it seems to me that this is not always the case. First, let’s say, for the sake of argument, that learning is the process by which we fix our beliefs about the world. Well, there are patently at least two kinds of beliefs to be fixed: perceptual beliefs about the status of the world, and conceptual-intensional beliefs that are fixed by comparing new potential beliefs with your current belief system. The whole belief system. 

The former kind of belief fixation is mediated by modules. Modules are fast, dumb, topic-specific, informationally encapsulated, and generally quite badass. Consider for instance human perceptual performance under laboratory conditions. Reportedly, even when visual and auditory stimuli are presented at many times the speed at which humans would generally be exposed to them in natural experience, memory recall and analysis are exceedingly reliable. That is, if you have a subject listen to a stream of speech being produced at speeds that aren’t physically possible for a human to produce, the subject is able to parse and understand the utterance. Mutatis mutandis for visual scenes. I haven’t the references on hand, but I will likely edit this post to include them in the near future. 

Digression for B.D. Mitchell: The kind of learning that modules mediate (of course) reflects the innate architecture of the modules, and so the information they can extract is contingent and peculiar, providing us with richly structured perspectives from which to view the world (as Noam is wont to put it). 

Returning to the main point, this kind of learning is continuous with perception. Your language module demands, all things being equal, that you hear the sentence you’re paying attention to. You can’t not hear the sentence. Mutatis mutandis, your language module demands that you learn that all human grammars have the property of being structure-dependent. Your perception and your learning, in these instances, are reflexive, virtually instantaneous, and generalizable across the species. 

The latter kind of belief fixation is mediated by... who knows what? It is slow(er?), and holistic. I think Lila Gleitman once called this kind of learning hell on wheels. Consider for instance the act of doing science. Hypothesis testing and confirmation of this variety is long, arduous, and generally speaking can be quite lame. Or consider, if you prefer, the far less glamorous example of the popular distinction between a simple linguistic joke & an intellectual joke. This is something that occurred to me at the last linguistics salon that we host here in the wonderful city of Toronto. 

I observe that people generally make a distinction between linguistic jokes that rely on a tacit knowledge of one’s grammar (such as those that play on homophony), and intellectual jokes. The former aren’t taken to reflect on one’s intellectual prowess but rather they are taken to reflect on one’s linguistic aptitude. The latter require you to check the information presented in the joke against your entire belief system. What you’ve got in your central belief box, and how fast you can search it, seems to be what makes ‘getting’ an intellectual joke impressive compared with a simple linguistic joke, which any human can do with ease. 

Compare for example: 
  1. Linguistic joke (Aarons, 121):
    "If it ducks like a quack it probably is one." 
  2. Intellectual joke:
    “Werner Heisenberg, Kurt Gödel, and Noam Chomsky walk into a bar. Heisenberg turns to the other two and says, ‘Clearly this is a joke, but how can we figure out if it's funny or not?’ Gödel replies, ‘We can't know that because we're inside the joke.’ Chomsky says, ‘Of course it's funny. You're just telling it wrong.’”

Thing the second: memory (& attention) & modularity 

There are two ways of thinking about the relationship between memory & modularity (if you buy the modularity story). The first is that modules carry out their proprietary business by drawing on general resources of memory and attention (they all draw on one and the same memory mechanism). 

The second is that modules carry out their proprietary business by deploying proprietary mechanisms of memory & attention. As Gallistel rightly points out in the interview, the neurological evidence strongly suggests that there is one memory mechanism at the neurological level. This leaves open, in my opinion, the possibility that there are different deployment conditions (or what have you) at the computational level. Franz Joseph Gall held something similar, and Fodor’s Modularity of Mind has a great discussion of this topic. 

The importance of this distinction becomes apparent when we consider syntactic theory. I suppose that just about everybody in the house believes that the language faculty builds mental representations. And that these mental representations are bona fide mental particulars with all the rights and responsibilities accorded to such things (they have causal powers & they are subject to peculiar conditions on well-formedness). 

One such condition is that of locality (think displacement in all its various instantiations: binding, raising, control, feature checking, etc.). This condition is interesting because a number of people have attempted to explain its presence by appealing to facts about the memory mechanism. (I think maybe I read a paper by Gary Marcus that was trying to do this?) 

I’m generally in favour of this approach, but I think that the facts about the memory limitations of the language faculty (and the conditions it imposes, such as the kind of locality we find in natural language) are going to turn out to be specific to that faculty. I think this is so not only because there isn’t any a priori reason to think otherwise, occasional empiricist kink notwithstanding, but also because it seems to me that different modules build, store, and address different varieties of mental representations. I’m quite open to having my mind changed about this, but I have a strong intuition that the visual system builds mental representations that have forms which reflect the demands of the task at hand. 

To be clear: I am not saying that there are fundamentally different species of mental representations. Maybe all mental representations are formed by the same operations (Merge, Label?), and all share the notion of locality that is implicit in all computational-symbolic representational systems at play today. But any additional constraints on well-formedness (like maybe having uninterpretable features) are, I think, likely to be peculiar to their domains. And thus, paying attention to and keeping a record of those various structures when all of your modules are firing at once seems to me to suggest a variegated memory/attention mechanism. At least at the computational level. 

Okay... that’s all for now. Feel free to share your thoughts about all this in the comments section below. 

Notes, Admissions, Qualifications, Apologies: 
  1. Fodor, Jerry. (1983) Modularity of Mind. Page 49
  2. The post above draws mostly on my recent reading of Gallistel, Hornstein, Chomsky, and Fodor. Though I doubt they'd approve of their ideas being run together quite like this.  
  3. Apologies for some unintended noise in the recording. I’m not sure where it’s coming from, or how to get rid of it. 
  4. The linguistic joke above is taken from Jokes and the Linguistic Mind by Debra Aarons (2014)
  5. If you'd like to hear more Gallistel, check out this bloggingheads video interview.


  1. Interesting interview; thanks for doing it, and the Fodor one too. I was struck by the very broad definition of learning he uses in contrast to Fodor's very narrow definition (rational, hypothesis testing etc.). The claim "Under the computational view, learning is the extracting of information from experience." seems strange to me. Computational views of learning tend to focus more on the narrower problems of generalisation, rather than this definition which is so broad that it almost covers perception. If I stick my finger in a bowl of ice water, sure I have extracted some information about the environment.

    I don't know whether this is just a terminological issue or if it hints at some broader problem or miscommunication. How widespread is Gallistel's broad view of learning?

  2. Hey -- thanks!

    I think Gallistel sees learning as continuous with perception.

    I think this way of thinking does a lot of work for him in helping to carve up the distinction between memory & learning.

    I think Fodor & Chomsky (& Norbert?) would probably be pretty squicked out by the idea of making learning & perception the same sort of thing in all cases.

    But insofar as this view helps to distinguish between the extracting of information from experience, the processing of that information, and the storage of the information (and the results of processing) for further use, I think they could live with it.

    I labelled this perspective 'computational' more to contrast it with the 'associational' perspective. But with all of its explicit assumptions it falls into the computation camp pretty neatly anyway.

    // I couldn't say how widespread Gallistel's view of learning is. From my experience learning is such a poorly defined term that we should probably just ditch it altogether. Instead we could talk about extracting information, computing it, drawing generalizations, etc.?