Neural Constructivism and Language Acquisition
Centre for Cognitive Science
University of Edinburgh
2 Buccleuch Place
Edinburgh EH8 9LW
November 10 1997
Notes on conventions used:
1. *footnote* indicates where the footnote number should go; the body of the footnote is given as a separate paragraph.
Universal grammar -- the hypothesis that the acquisition device and parts of the grammar of all human languages are innate -- is a theory of human language learning and computation and is thus a theory of natural computation as well. It is a kind of nativist theory, since it attributes genetically determined properties to the language learner.
However, newer connectionist writings (many of which are reviewed in Elman et al., 1996) have challenged nativism and the language acquisition aspect of modern linguistics. In fact, recent research on constructivist neural networks -- connectionist networks that can add new units to their architectures -- has resuscitated Piagetian constructivism, a rival theory to nativism which posits that cognitive development (including language acquisition) is very much environmentally driven and does not substantially depend on innate capabilities. It is important, when considering issues of natural computation in the domain of language acquisition, to examine the new constructivism and the established nativism to see what each theory can contribute. It turns out that constructivist neural networks do not refute previous arguments against constructivism as a learning theory.
In addition, constructivist neural networks must rely on nativist assumptions to work. Therefore, nativism remains the only well-articulated theory of natural computation in the field of language acquisition.
Nativism and Constructivism
Nativism about language is the claim that linguistic competence is determined by the genetic endowment of every human being (Cook and Newson, 1996), and it has been pursued as the best hypothesis in language acquisition for some time now (roughly since the publication of Chomsky, 1959). Previously, a domain-specific "language acquisition device" or LAD had been posited as the actual mechanism that underlies the ability to learn one's native tongue (Cook and Newson, 1996:79-80). This was mainly due to the failure to show the feasibility of domain-general learning. Under current Principles and Parameters theory (Chomsky, 1995, Haegeman, 1991), universal grammar (UG) is the set of properties that are common to all languages (principles) and the set of properties that can vary in certain finite ways between languages (parameters). Thus, in more recent thinking, UG itself is sufficient to determine the target grammar given language input. The devices for explaining language acquisition and for characterizing what is common to all human languages have been integrated. But, since the principles and parameters of UG are specific to language and are putatively part of the genetic endowment of humans, the hypothesis still amounts to saying that language is innate and that language learning involves mechanisms that are specific to the domain of language.
Although no uncontroversial genetic determiner for language has yet been isolated *footnote*, most people who care about language acquisition have simply bought into the nativist hypothesis because it seems to be the only way that we can explain how children, who are pretty stupid by adult standards, can acquire language effortlessly and quickly from noisy input, while linguists, who are pretty smart (at least when compared to children), have not been able to write a single grammar for any language that can generate all and only the sentences of the given language. This is despite the fact that linguists have all the same data available to them as children do and that they have been trying for at least forty years, depending on who's keeping score.
At one point it seemed that such a genetic determiner had been discovered.
Hurst et al. (1990) first reported on the KE family, who seem to have a specific language impairment (SLI) that is transmitted by Mendelian inheritance. Although the family were subsequently reported (Gopnik, 1990, Gopnik and Crago, 1991) as having only certain specific language impairments (e.g. inability to use inflectional suffixes properly), further work by Vargha-Khadem et al. (1995) demonstrated that the SLI sufferers in the KE family also performed poorly on other cognitive tests and had speech difficulties as well. Therefore it was unclear that the inherited gene was responsible solely for language. However, van der Lely (1997) has recently reported the apparent existence of such genetic inheritance in a much more closely controlled sample.
But UG was maintained as the best hypothesis of language acquisition only at the expense of other contenders. One of the major rival hypotheses to nativism has been the constructivist hypothesis of Jean Piaget (Piaget, 1980).
In Piagetian constructivism it is contended that a large part of language is constructed in league with the environment and therefore a large part of human language acquisition is not innate, contrary to UG. Another claim is that cognitive abilities, including language, are constructed from sensorimotor skills that the child has acquired at earlier stages. Clearly, if language learning is dependent on other stages of cognition, then the language learning device is not domain specific, since it could not function without the presence of other cognitive abilities. Furthermore, the contention is that learning involves the ability to represent concepts that were previously unrepresentable. Thus, cognitive abilities and their concomitant conceptual repertoires are all constructed from previous abilities and the environment the organism finds itself in. The "previous abilities" buck has to stop somewhere, so there is some innateness attributed to children.
But language is not taken to be innate in the sense of there being previously established linguistic abilities that are triggered by the environment and which unfold separately from other aspects of cognition.
Eventually, constructivism was generally rejected and thus the only hypothesis left (UG) became largely unassailable (in the minds of many linguists). The principal argument against constructivism, which is due to Jerry Fodor, is that constructivism is based on a concept of learning which could not in principle be accurate. If constructivism could not provide a coherent picture of general learning, there would be no reason to believe that constructivist learning should be taken seriously as a theory of language acquisition. Along with this nativist position came a view of mental organization that posited domain-specific mechanisms for aspects of cognition other than language (Chomsky, 1980, Fodor, 1983). Recently, though, as a result of research carried out using connectionist models, most of the tenets of the nativist hypothesis have been challenged (Elman et al., 1996). Furthermore, if innateness and domain specificity can be shown not to hold for language, there is more reason to doubt that these claims hold for other aspects of cognition. Spurred by these doubts, some researchers (Quartz, 1993, Quartz and Sejnowski, to appear) have argued that Piagetian constructivism is viable after all. This is due to a number of new neural network algorithms which allow the networks to grow new nodes as required to handle the input data (from the environment) *footnote*.
Such a process bears more than a passing resemblance to Piagetian constructivism and in this paper I will not dispute that the two are compatible. However, constructivist neural networks will be shown to have two crucial problems: first, they do not, after all, refute Fodor's argument against constructivism;
second, due to the nature of construction algorithms, the networks that use them cannot offer an account of language acquisition that is different from the nativist account. Finally, the positive contributions of constructivist neural networks and their place in a theory of language acquisition will be briefly discussed.
There are various algorithms that add units to a network based on input. Three examples are the cascade-correlation (Fahlman and Lebiere, 1990; Fahlman, 1991), upstart (Frean, 1990) and node splitting (Wynne-Jones, 1993) algorithms.
Constructivist Learning and Language Acquisition
Recently, Steven Quartz (1993) has argued that constructivist neural networks have learning properties that enable them to refute Fodor's (1975, 1980) objections to Piagetian constructivism. Of course, a rebuttal of Fodor's arguments will cast doubt on nativism as the only hypothesis of language acquisition, and there will be more reason to contend that the domain specificity and the innateness of other mental processes are questionable. In this section, I will argue that although Quartz succeeds in refuting the contention that it is in principle impossible to "acquire 'more powerful' structures" (Piattelli-Palmarini, 1980:142), a neural constructivist learner still cannot learn new concepts in a realistic manner. The reason is that although neural constructivist networks skirt some of Fodor's critiques, his main argument, which regards hypothesis testing and confirmation and the origin of concepts, is not refuted.
I will also attempt to show that the construction algorithms that constructivist networks use are not plausible in the domain of language acquisition unless one makes nativist assumptions regarding these algorithms. But I shall begin by characterizing Fodor's and Quartz's respective arguments.
Jerry Fodor has arguably the most extreme nativist view on language acquisition. The following quote, with which he opened his paper at the famous Royaumont meeting *footnote*, summarizes his position nicely: "It seems to me that there is a sense in which there isn't any theory of learning, and this is quite compatible with Chomsky's point that maybe there is no general learning mechanism that applies equally to perception, language, and so on. I'll argue not only that there is no learning theory but that in a certain sense there certainly couldn't be; the very idea of concept learning is, I think, confused." (Fodor, 1980:143. His italics.)
Fodor argues for the claim that no new concepts can be learned. When learning occurs, what our theories tell us is how our beliefs about some of our concepts change. But, Fodor argues, none of our theories of learning tell us where the concepts (that these beliefs are beliefs about) come from. It is in this sense that "there is no learning theory". So, what Fodor is saying is not that our current theories of language learning (Gold's being the best example at the time he was writing; see Gold, 1967) are inadequate to escape the nativist conclusion. He is actually making the much stronger claim that no learning of any kind could in principle escape the nativist conclusion about the origin of concepts. It is worthwhile to go through Fodor's main points in more detail, because it is precisely in failing to deal with one of these points that Quartz's argument against Fodor goes wrong.
In 1975 a debate between Noam Chomsky, the chief proponent of nativist linguistic theories, and Jean Piaget, the chief proponent of constructivist theories, was held at the Abbaye de Royaumont near Paris. Many leading biologists, psychologists, linguists, philosophers and anthropologists of the day attended.
Fodor's first point is that there is a difference between a theory of concept acquisition and a theory of fixation of belief. By fixation of belief Fodor means something like learning by induction: the learner is presented with several occurrences of something and by generalizing over the relevant properties of the incidents the learner concludes that all future instances will have these same properties. Of course, the next occurrence can always destroy even the soundest inductive generalization.
Thus, sometimes induction just fails to get the right results.
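As a minimal illustration of inductive generalization in Fodor's sense of fixation of belief, the sketch below uses a Find-S style conjunctive learner (a standard textbook technique, not one discussed by Fodor), assuming that instances are described by a fixed set of hypothetical attributes such as `colour` and `habitat`. It keeps only the attribute values shared by all observed instances, and a single new instance is enough to contradict the resulting generalization.

```python
def generalize(instances):
    """Find-S style induction: keep only attribute values shared by all instances."""
    hypothesis = dict(instances[0])
    for inst in instances[1:]:
        for attr in list(hypothesis):
            if inst.get(attr) != hypothesis[attr]:
                del hypothesis[attr]  # drop any value not seen in every instance
    return hypothesis


def predicts(hypothesis, instance):
    """True if the instance has every attribute value the generalization requires."""
    return all(instance.get(attr) == value for attr, value in hypothesis.items())


# Hypothetical observations: every swan seen so far has been white.
swans_seen = [
    {"colour": "white", "habitat": "lake"},
    {"colour": "white", "habitat": "river"},
]
belief = generalize(swans_seen)  # {"colour": "white"}: "all swans are white"

# The next occurrence overturns the generalization.
black_swan = {"colour": "black", "habitat": "lake"}
print(belief, predicts(belief, black_swan))
```

Note that the learner can only drop conditions from a description stated in the attributes it was given; it never introduces a new attribute.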
Fodor grants that this kind of learning is uncontroversial and he really has nothing to say about it. However, it is what underlies the ability to learn this way that makes Piagetian constructivism impossible. The ability to learn inductively, Fodor argues, is at least partly based on hypothesis formation and this is what leads to the inability to learn new concepts.
So, Fodor's second point is that theories of learning tell us what hypothesis the subject could, must, or should entertain, but they don't tell us where the hypothesis comes from. Most theories that do venture to say where it comes from say that it's a property of the learner (and this is indeed what most theories in language learnability do say; see Pinker 1995, Gibson and Wexler 1994, Osherson, Stob and Weinstein 1989).
But is this a necessary conclusion?
It is, given the form that hypotheses take. In forming a hypothesis the learner has to know what confirms or disconfirms the hypothesis.
Of course, the learner need not consciously know this information.
It is enough that she recognize instances that verify or disprove the hypothesis. To know what confirms or disconfirms the hypothesis, the learner has to understand what the hypothesis means. Again, the learner need not consciously understand the concepts; but she must be able to at least implement them operationally.
To understand what the hypothesis means the learner has to understand all of the concepts in the hypothesis. But since all hypotheses can be characterized as a biconditional containing the concept to be learned (Fodor, 1980:145-148) *footnote*, it follows that the concept (or at least its component concepts; but note that sooner or later you'll run out of components) must already be present for the hypothesis to be confirmed. Of course, hypothesis confirmation must be doing something. The something that it's doing is fixing the learner's beliefs about the concepts in the hypothesis. Fodor's main point is that there is no theory that tells us where the concepts that make up our hypotheses come from. He argues that it is impossible to form a hypothesis without presupposing the concepts that are necessary. Since concepts must be presupposed, they cannot be learned and must be innate.
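The dependence of hypothesis confirmation on prior concepts can be sketched with a toy learner for Fodor's arbitrary concept "miv". Assuming a hypothetical vocabulary of primitive predicates (`red`, `round`, `small`) and biconditionals of the form "X is miv iff X is P1 and ... and Pk", the learner can confirm a hypothesis only if it is built from predicates it already has. When the concept depends on a property outside that vocabulary (here a hypothetical `weight` attribute), no biconditional is confirmed.

```python
from itertools import combinations

# Hypothetical primitive predicates the learner already possesses.
PRIMITIVES = {
    "red": lambda x: x["colour"] == "red",
    "round": lambda x: x["shape"] == "round",
    "small": lambda x: x["size"] == "small",
}


def candidate_hypotheses(primitives, max_len=2):
    """Enumerate conjunctions P1 & ... & Pk of known primitives, k <= max_len."""
    names = sorted(primitives)
    for k in range(1, max_len + 1):
        yield from combinations(names, k)


def confirmed(conjunction, primitives, labelled):
    """'X is miv iff X is P1 and ... and Pk' agrees with every labelled example."""
    return all(
        all(primitives[p](x) for p in conjunction) == is_miv
        for x, is_miv in labelled
    )


def learn_miv(primitives, labelled):
    return [h for h in candidate_hypotheses(primitives)
            if confirmed(h, primitives, labelled)]


def item(colour, shape, size, weight="light"):
    return {"colour": colour, "shape": shape, "size": size, "weight": weight}


# "miv" = red and round: expressible in the primitive vocabulary.
miv_data = [
    (item("red", "round", "small"), True),
    (item("red", "square", "small"), False),
    (item("blue", "round", "big"), False),
    (item("red", "round", "big"), True),
]
print(learn_miv(PRIMITIVES, miv_data))  # [('red', 'round')]

# "miv" = heavy: no available predicate mentions weight, so nothing is confirmed.
heavy_data = [
    (item("red", "round", "small", weight="heavy"), True),
    (item("red", "round", "small", weight="light"), False),
]
print(learn_miv(PRIMITIVES, heavy_data))  # []
```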
Due to its postulation of innate concepts, Fodor's theory is also referred to as "the language of thought hypothesis".
Fodor's example of such a biconditional for learning some arbitrary concept "miv" is "X is miv if and only if X is ... " (Fodor, 1980:145).
Standard connectionist networks (i.e. ones with fixed architectures) cannot refute this argument. Quartz goes to great pains to show that fixed architecture neural networks are not only incapable of learning new concepts, but are actually strongly nativist.
Although this seems to be a strange conclusion, given the fact that networks are highly sensitive to environmental data, it can be shown using the Valiant model of learning that the architecture of the network is equivalent to the initial hypothesis space in a learning task (Quartz, 1993:231-232). Thus, by specifying the architecture of a network, we specify the hypothesis space that the network can learn in. The reason for this should be intuitively clear. The only things in a fixed architecture network that can be adjusted are the weights between the nodes and possibly the threshold functions of the nodes. But if readjusting these two parameters does not result in learning the task, such a network is basically just stuck. Given this, it is not surprising, as Quartz points out, that "finding the appropriate initial state of a network so that it lies close to the target function is the fundamental problem of neural network research" (Quartz, 1993:232). And by deciding on the hypothesis space beforehand, one assumes that the domain of learning is given somehow in advance. In other words, the domain of learning is innate.
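The intuition that a fixed architecture network is "stuck" can be illustrated, under simplifying assumptions, with a single threshold unit (a perceptron) whose only adjustable parameters are its two input weights and its threshold (bias). This is not Quartz's Valiant-model argument, only a toy instance of the same point: the unit learns AND, which lies inside its hypothesis space of linearly separable functions, but no amount of weight adjustment lets it learn XOR, which lies outside that space.

```python
def train_perceptron(data, epochs=100, lr=1.0):
    """Train a single threshold unit; return (converged, weights, bias)."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            delta = target - out
            if delta:  # only the weights and the threshold can change
                errors += 1
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                b += lr * delta
        if errors == 0:
            return True, (w1, w2), b
    return False, (w1, w2), b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[0])  # True
print(train_perceptron(XOR)[0])  # False
```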
Quartz goes on to argue that constructivist neural networks yield a plausible model of learning (including language acquisition) and do not require these same nativist assumptions.
Quartz (1993:234) writes:
Informally, what is required to refute Fodor's position is some sense in which a system may increase its representational power -- defined as the set of concepts it may express -- as a function of learning ...
But, since the architecture of a network is identified with the class of concepts it may represent, this suggests that a network with the ability to alter its architecture in appropriate ways as a function of learning will be capable of extending its representation class beyond its initial state and will therefore be capable of acquiring novel concepts.
There are three interrelated ways in which Quartz's position, characterized by this quote, fails to redress the impossibility of acquiring new concepts.
First, he fails to address the real issue raised by Fodor, concerning the origin of concepts.
Second, he conflates two different algorithms, only one of which can properly be called the constructivist network's learning algorithm.
Third, the construction algorithm in constructivist learning is either stopped by a signal extrinsic to the algorithm or is failure-driven, and such an algorithm cannot be involved in domain-general learning because it yields (at the very least) an untenable procedure for language acquisition.
First, it is not sufficient to show that a network can represent something that it could not previously represent to refute the language of thought hypothesis. Obviously, an organism (or network) can learn to express new concepts. To deny this would be to say that, at any given time, an organism (or network) can express anything that it could ever express.
This is plainly absurd: babies aren't born consciously knowing about freedom, asparagus, or red army ants and nativists aren't so stupid as to believe that they are. And clearly, if one goes on a representational theory of mind, the ability to express a new concept is based on the ability to increase representational power. The language of thought hypothesis does not deny any of this. Rather, the hypothesis is about the origin of concepts.
Fodor asks the question, "Where do concepts come from?" and Quartz effectively answers "Concepts come from not having concepts and then having concepts. This network didn't know what asparagus was yesterday and today it does." Representing a new concept is (being literal-minded) like painting a little mental image of the thing the concept stands for. But a painting of asparagus is not a theory about the origin of the concept "asparagus".
Representations presuppose that what they are representations of come from somewhere. But the representation itself has no story to tell about where the thing that it represents comes from. So adding a new representation does not tell you where the concept instantiated by the representation comes from. We need a separate theory for that. In fact, this is precisely why Fodor's question is an interesting one. We know that we learn new concepts in the sense of learning to represent previously unrepresented things, thinking about previously unthought-about things, talking about previously unmentioned things, etc. This is precisely what Fodor calls fixation of belief, and Quartz has made the exact mistake that Fodor warns against:
Yet it seems to me that nobody has ever presented anything that would have even the basic characteristics of a theory of learning in the sense of a theory of concept acquisition. The reason people think that there is such a theory is that they confuse a theory of concept acquisition with a theory that has quite a different logical structure, what I call a theory of "fixation of belief". (1980:144. His italics)
Indeed, a theory of fixation of belief presupposes a theory of concept acquisition and, logically speaking, no theory that presupposes a second theory can be an explanation of that second theory.
Second, even if we give Quartz the benefit of the doubt, he is conflating two separate forms of "learning" (I use scare quotes because I will attempt to show that one of these things is not learning at all). The first form of learning is that performed by all neural networks. This involves adjusting the weights between nodes and possibly the threshold values of the nodes (Churchland and Sejnowski, 1992:96ff). As Quartz himself notes (1993:227-231), this kind of learning is inadequate to learn new concepts:
But, the effect of learning is to reduce the possible concepts that the system may represent, as learning reduces the set of elements of [the hypothesis space] G that are consistent with the training examples to a proper subset of G (and ultimately to a particular element of G).
Thus, a network learns by converging on a solution and convergence cannot lead to the acquisition of new concepts by (Fodor's) definition.
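The sense in which weight-adjustment learning only ever shrinks the hypothesis space can be made concrete with a toy version space. Assuming, for illustration, that the hypothesis space G is the set of all 16 Boolean functions of two inputs, each training example eliminates the hypotheses inconsistent with it, so the consistent set shrinks to a proper subset of G and finally to a single element; nothing outside G ever becomes available.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)

# Hypothesis space G: every possible truth table over the four inputs.
G = [dict(zip(INPUTS, outputs)) for outputs in product([0, 1], repeat=4)]


def consistent(hypotheses, examples):
    """Keep only the hypotheses that agree with every training example."""
    return [h for h in hypotheses if all(h[x] == y for x, y in examples)]


# Training examples for AND, presented one at a time.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
sizes = [len(consistent(G, examples[:n])) for n in range(len(examples) + 1)]
print(sizes)  # [16, 8, 4, 2, 1]
```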
In fact, the worst thing a network could do would be to overfit the data. Overfitting occurs when, due to the architecture of the network being too large for the problem space it is trained on, the network not only learns the significant regularities in the input, but also learns all the noise (Churchland and Sejnowski, 1992:105ff). Effectively, the network learns the training data by rote and cannot generalize to new data. Clearly, overfitting is what happens when a network does not "reduce the possible concepts that [it] may represent".
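Overfitting can be caricatured with two toy learners, assuming a one-dimensional task whose true concept is "x >= 5" on the integers 0 to 9 and a training set with one noisy label. A lookup table stands in for a network whose architecture is too large: it can represent every labelling, so it memorizes the noise and says nothing about unseen inputs. A learner restricted to single thresholds stands in for a network with an appropriately small hypothesis space. Both learners are hypothetical illustrations, not models from the cited literature.

```python
# True concept: x >= 5. The label for x = 0 is noise (it should be 0).
train = [(0, 1), (1, 0), (3, 0), (4, 0), (5, 1), (6, 1), (8, 1), (9, 1)]
unseen = [2, 7]

# Over-large hypothesis space: a lookup table that memorizes every example.
rote = dict(train)


def fit_threshold(data, candidates=range(11)):
    """Small hypothesis space: pick the threshold t (x >= t) with fewest training errors."""
    return min(candidates, key=lambda t: sum((x >= t) != bool(y) for x, y in data))


t = fit_threshold(train)
print(rote[0], [rote.get(x) for x in unseen])  # 1 [None, None]
print(t, [int(x >= t) for x in unseen])        # 5 [0, 1]
```

The rote learner reproduces the noisy label and has no answer for new inputs, whereas the threshold learner ignores the single noisy point and generalizes correctly.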
Therefore, not only does a fixed architecture network, as a matter of fact, reduce the possible concepts it can represent, it has to.
Otherwise such a network will not learn at all, since all it will have done is translate the input into some coding that is not reusable in the future. The algorithm that Quartz improperly calls learning is the construction algorithm that adds new units to the constructivist learner's architecture. I believe that it is clear from what he writes that Quartz indeed takes the construction algorithm to be a learning algorithm.
For example: "... the addition of structure must be non-trivial by being describable as a process of learning." (Quartz, 1993:234). He goes on to cite several construction algorithms where the addition of units can indeed be considered "non-trivial" (see footnote 2 for references). Granting that the addition of units in these algorithms is non-trivial, I will argue that the addition of structure is neither a necessary nor a sufficient condition for learning.
To show that the addition of structure is not necessary for learning, we must take a short diversion into another theory of brain plasticity and development: selectionism. This theory, first put forward by Changeux and his colleagues in the early seventies (Changeux et al., 1973), essentially postulates initial overgeneration of neurons and neural connectivity followed by a loss of neurons and connectivity which is partially determined by sensory input (use it or lose it *footnote*) and partially maturationally determined. There has been some evidence collected for this kind of development. Johnson (1997:35-39) mentions a study by Huttenlocher and colleagues in which it was found that, depending on the area of cortex, there is a significant period of initial overgeneration of synapses followed by a pruning down to adult levels.
In the visual cortex, synaptic density returns to adult levels by age four at the latest, whereas in the prefrontal cortex reduction to adult levels does not happen until much later (between the ages of ten and twenty, roughly). Although it is clear that there is some kind of construction phase initially, it seems that this construction phase (in the sense of adding new neurons) is over, depending on the region of cortex involved, by early adulthood at the absolute latest. Now, it is also patently clear that there is plenty of learning, in every sense of this horribly loosely-defined word, that takes place well into adulthood.
Therefore, it seems that the construction phase is somehow necessary for normal development but there is no obvious way in which it seems to be necessary for learning.
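The selectionist picture of overgeneration followed by "use it or lose it" pruning can be sketched as follows. The sketch assumes, purely for illustration, hypothetical whisker inputs and barrel outputs, a simple usage counter per connection, and a fixed pruning threshold; it is not a model proposed by Changeux and colleagues. Connections start all-to-all, and those that are not used often enough during exposure (including every connection from an input that never fires) are eliminated.

```python
from itertools import product


def overgenerate(inputs, outputs):
    """Initial phase: connect every input to every output, with zero use."""
    return {(i, o): 0 for i, o in product(inputs, outputs)}


def expose(connections, events):
    """Count how often each connection is co-activated by sensory input."""
    for pair in events:
        if pair in connections:
            connections[pair] += 1


def prune(connections, min_use):
    """Use it or lose it: drop connections used fewer than min_use times."""
    return {pair: n for pair, n in connections.items() if n >= min_use}


whiskers = ["w1", "w2", "w3"]  # w3 is removed early, so it never fires
barrels = ["b1", "b2", "b3"]
connections = overgenerate(whiskers, barrels)
expose(connections, [("w1", "b1")] * 3 + [("w2", "b2")] * 3 + [("w1", "b3")] * 2)
surviving = prune(connections, min_use=2)
print(len(connections), sorted(surviving))  # 9 [('w1', 'b1'), ('w1', 'b3'), ('w2', 'b2')]
```

The surviving `("w1", "b3")` connection loosely mirrors a neighbouring input taking over territory that a deprived input would otherwise have occupied.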
Cortical regions that are deprived of their normal sensory input are taken over by other cortical functions and do not develop as they would have, had the normal sensory input been available. For example, in rodent somatosensory cortex there is a region of barrel fields (so-called because there is a barrel-shaped area of cortex that responds to input from only one whisker) and if a whisker is removed at a very early age, the barrel that would be responsible for that whisker in a normal rodent is not present and the space it would have occupied is taken over by adjacent barrels, which now become more responsive to their inputs than they otherwise would have been (Johnson, 1997:49-50, Schlaggar and O'Leary, 1993).
Similarly, Hubel and Wiesel (1963, 1965, 1970) have famously demonstrated that if a kitten is deprived of a certain kind of visual input from a very young age, the cells that normally respond to that sort of visual input do not develop properly and the kitten cannot effectively sense the visual stimulus of which it was deprived.
These same results should also make it clear that, in real human learners, the addition of structure cannot be sufficient for learning either (no matter how "non-trivial" the addition) because there also seems to be an elimination process that is necessary for normal development as well.
There is a second, more problematic way in which the addition of structure is not sufficient and it ties in with the previous discussion of overfitting. As mentioned, if a network has an architecture that is too large for its problem domain, the network will not learn properly.
Since the construction algorithm in a constructivist network adds units based on some external impetus, the network must be prevented from growing too large. If the network needs some extrinsic signal to avoid overfitting, then it is clear that the addition of structure, even if it is principled and not arbitrary, is not sufficient for learning by itself, because the network needs at least to know when not to grow and this piece of knowledge is separate from the growth algorithm. Furthermore, without this constraint on adding units, the network will in principle not learn properly.
Another important consideration is just how the construction algorithm is stopped and this leads to the third problem for constructivist algorithms: language acquisition. There seem to be two sensible ways of doing this:
(a) the algorithm has an actual stop signal or
(b) the algorithm is failure-driven.
Option (a) can be exercised in two ways:
either the network has a limit on the number of nodes it can add or some other interlocutor must stop it. The first option seems unlikely to be biologically plausible because of individual differences in brain size and connectivity.
The second option is more biologically plausible. Note that "interlocutor" here just means something external to the algorithm, not something external to the organism. There are ways for this to be instantiated in a biological system. One immediate way to stop the construction algorithm is for it to work on some maturational basis, such that at a certain maturational stage in a biological organism, the adding of structure is stopped no matter what.
There have been proposals like this to account for selectionism (Ebbesson, 1988) and there seems to be no reason why this shouldn't in principle work for the termination of construction as well. A second sort of biologically plausible stop signal is nutrition (or abstract encoding of the availability of resources in the case of networks). If the brain does not have the fuel to form new connections, then it obviously cannot do so. However, nutrition can really only be invoked to explain cases of premature stopping, as an organism with a healthy diet will not experience a sudden lack of nutrition.
Speculatively, what might be plausible is to postulate a maturational change in the use of nutrients. But this is just pushing the problem back a step and basically amounts to invoking a maturational stop signal.
Now, having established that the stop signal seems likely to be maturationally determined, it is hard to see how a construction algorithm using option (a) is not just an instantiation of the biologically-triggered, maturational account of Lenneberg (1967).
In Lenneberg's account, which is in accord with nativist theories of language acquisition, it is a genetic predisposition that is responsible for the way that language unfolds. But, the account is not purely maturational, in that there has to be at least sufficient environmental input to the system to get it started and to maintain its functioning.
It is not controversial in this account that language learners have to wait for a certain maturity in the brain before they can achieve certain stages in linguistic competence. And it is not surprising that the development of mental behaviour is contingent on the development of the brain structures that underlie it.
However, if the construction algorithm is maturationally affected, it is hard to see how constructivism amounts to a departure from the innatist picture of language development.
Option (b), that the algorithm is failure-driven, means that the construction of new units only occurs if the present network architecture cannot succeed in the learning task. If the architecture that the network has at time step i can learn the task then the architecture will not be added to. But if the architecture is inadequate for the learning task, then it is added to and the network is retested at time step i+1 (Westermann, 1997). In the sense of being driven by negative feedback, the construction phase instantiates a cybernetic system (Wiener, 1948): failure to be in the target state means changes to the present state that bring it closer to the target state. Now, for a system to be failure-driven it must have some input to signal failure.
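A failure-driven construction loop of the kind just described can be sketched as follows. This is a toy stand-in, not Westermann's (1997) algorithm: the network is a perceptron over a list of feature units, "rejigging the weights" is ordinary perceptron training for a fixed number of epochs, and the only units that can be added are hypothetical pairwise-conjunction units. A unit is added only when training on the current architecture fails, after which the network is retested. The `max_units` cap is an assumed extrinsic stop signal.

```python
from itertools import combinations


def train(data, features, epochs=50):
    """Perceptron training over the current units; True if it reaches zero errors."""
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            phi = [f(x) for f in features]
            out = 1 if sum(wi * p for wi, p in zip(w, phi)) + b > 0 else 0
            delta = target - out
            if delta:
                errors += 1
                w = [wi + delta * p for wi, p in zip(w, phi)]
                b += delta
        if errors == 0:
            return True
    return False


def construct(data, n_inputs, max_units=10):
    """Failure-driven growth: add a unit only if the current architecture fails."""
    features = [lambda x, i=i: x[i] for i in range(n_inputs)]
    # Hypothetical pool of units the algorithm may add (pairwise conjunctions).
    candidates = [lambda x, i=i, j=j: x[i] * x[j]
                  for i, j in combinations(range(n_inputs), 2)]
    while True:
        if train(data, features):
            return True, len(features)   # success at step i: stop growing
        if not candidates or len(features) >= max_units:
            return False, len(features)  # extrinsic limit reached
        features.append(candidates.pop(0))  # failure: add a unit, retest at i+1


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(construct(AND, 2))  # (True, 2): no unit needed
print(construct(XOR, 2))  # (True, 3): one unit added after failure
```

Even in this toy, the loop relies on a fixed candidate pool and an outside bound on growth, both of which are supplied in advance rather than learned.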
It is important to think about what this input could mean in a real system. Since the domain of real interest in this paper is language acquisition, let us look specifically at the domain of syntax. The most obvious way to signal failure in syntax is the inability of the current network architecture to parse the input string (Gibson and Wexler, 1994). So, let us suppose that the network fails to parse an input string. There are two different ways in which an architecture might be inadequate for parsing the string. The first form of inadequacy involves an improper setting of the network's weights. To try and set up a new architecture that would parse the input string the network has to start rejigging its weights. Only if this rejigging of weights fails to parse the string should the network add a new unit. The first thing to note is that, in a network containing a lot of units, this resetting of weights could take quite a while.
It seems unlikely that such an undirected procedure could work in a network as large as even just the language areas of the human brain. We can gloss over this problem by assuming that the human brain has other mechanisms for coping with what is instantiated as the changing of weights in connectionist models and that these mechanisms are fast.
However, even if such a mechanism were found and if this mechanism were attributable to failure-driven constructivist neural networks, the more difficult problem would lie in the second sort of architectural inadequacy.
The second way in which the network could be inadequate is by not having a large enough architecture for the problem domain, such that, no matter how the free parameters of the current architecture are changed, the problem cannot be learned (the string cannot be parsed in this case). Plainly this kind of failure is contingent on the first kind. Therefore a new unit should not be added unless the current architecture fails to deal with the task. But, it is not enough to add a unit indiscriminately. This is especially true if we give the benefit of the doubt to Quartz and concede that (contrary to Fodor) the network can learn new concepts. Learnability theory tells us why this should be so: if a language learner accidentally gets into a grammar space that contains the target grammar as a proper subset, the language learning domain does not provide the right sort of evidence to get back to the subset grammar (Pinker, 1989:9ff., 1995:153ff). Therefore, the learner would be stuck in this grammar that not only contains all the grammatical strings of the target grammar, but also generates a bunch of strings that the target grammar considers ill-formed. This is referred to as the problem of negative evidence in the language learnability literature.
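The subset problem can be illustrated with a toy error-driven learner, assuming that languages are small finite sets of hypothetical strings and that "failing to parse" simply means that an input string is not in the current language. Because every string of the target language is also in the superset language, positive evidence never produces a failure once the learner is in the superset, so it never retreats, even though it keeps generating a string (`"ba"`) that the target rules out.

```python
TARGET = frozenset({"a", "ab", "aab"})
SUPERSET = TARGET | {"ba"}  # also generates "ba", ill-formed in the target
CANDIDATES = [TARGET, SUPERSET]


def failure_driven_learner(hypothesis, candidates, positive_data):
    """Change grammar only when an input cannot be parsed by the current one."""
    for s in positive_data:
        if s not in hypothesis:
            # Move to the first candidate grammar that accepts the string.
            hypothesis = next(lang for lang in candidates if s in lang)
    return hypothesis


data = ["a", "ab", "aab"] * 3  # positive evidence only, drawn from TARGET

print(failure_driven_learner(frozenset({"a"}), CANDIDATES, data) == TARGET)  # True
print(failure_driven_learner(SUPERSET, CANDIDATES, data) == SUPERSET)       # True
```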
Negative evidence is some kind of signal to the learner that its grammar is generating strings that are not allowed in the target grammar. Although there is debate about exactly how much negative evidence is available to children, and about whether some forms of speech directed at children might provide indirect negative evidence (for a review of these issues see Pinker, 1995:153ff.), it does seem well established that little negative evidence is available and that parents do not consistently correct the syntax of their children.
Unfortunately, a huge amount of negative evidence would have to be available if it were possible in principle for the child to get into superset grammars. If the child can get into one superset grammar, what is to stop him or her from getting into a superset of the superset, and so on?
If such a case were possible, there would have to be masses of negative evidence available to guide the child back, not just the little that has occasionally been unearthed.
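Why positive evidence alone cannot undo such a mistake can be seen in a toy case. The sketch below assumes three hypothetical nested languages and a generic error-driven learner (not any specific published model) that changes its grammar only when it meets a string it cannot generate, and then moves to the smallest hypothesis consistent with everything seen so far.

```python
from typing import FrozenSet, Iterable, List

Grammar = FrozenSet[str]


def error_driven_learner(hypotheses: List[Grammar], start: int,
                         sample: Iterable[str]) -> int:
    """Return the index of the learner's final hypothesis.

    Only positive evidence is used: the grammar changes only when the
    current one fails to generate an observed string, and the learner then
    moves to the first (smallest) hypothesis generating everything seen.
    """
    current = start
    seen: List[str] = []
    for s in sample:
        seen.append(s)
        if s not in hypotheses[current]:
            current = next(i for i, g in enumerate(hypotheses)
                           if all(t in g for t in seen))
    return current


# Hypothetical nested languages: L0 is a subset of L1, L1 a subset of L2.
hypotheses = [frozenset({"a"}),
              frozenset({"a", "ab"}),
              frozenset({"a", "ab", "abb"})]
target = 1
sample = ["a", "ab", "a", "ab"]        # positive examples from L1 only

print(error_driven_learner(hypotheses, 0, sample) == target)  # True
print(error_driven_learner(hypotheses, 2, sample) == target)  # False
```

Starting in the subset, the learner is pushed up to the target by the first string it cannot generate. Starting in the superset, it never meets such a string, since every target string is also in L2; only being told that "abb" is ill-formed, that is, negative evidence, could make it retreat.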
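Note that a standard constructivist algorithm may get into a superset grammar, because such a network is constantly in serious danger of overgeneralizing. As we noted, the node-adding algorithm is failure-driven.
However, the data observed by the language learner is full of noise, speech errors, foreign languages, and so forth, so not every parsing failure signals a genuine inadequacy in the architecture. If the network added a unit at every failure, it would grow to accommodate ill-formed strings as well, which is precisely how it would end up in a superset grammar. Thus, the network would have to wait until the relevant aspects of its environment became salient enough (i.e. statistically significant).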
But it could only do this by keeping track of a large part of what has happened before. One way for the network to achieve this would be to adjust its weights in favour of such inputs. But weight adjustment is just how the network learns in general!
This would lead precisely to learning the noise, etc. Another solution is to keep some kind of buffer of the preceding linguistic context and then compute over it. Two points arise here. First, neural networks are not very good at encoding the order in which they received information. As soon as the information is assimilated, it becomes distributed across the nodes and weights.
The second point is that there does not seem to be any evidence in the literature that children are under this kind of stringent memory constraint (assuming that is what it would amount to in the real world).
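The buffer solution makes the memory demand explicit. The sketch below is a hypothetical illustration, not a proposal from the literature: a parse failure counts as grounds for growth only when the same unparsed construction recurs at least `min_count` times among the last `window` inputs. The class name, its parameters and the frequency threshold (a crude stand-in for statistical significance) are all assumptions. Note that it has to store an ordered record of recent input, which is exactly the kind of sequence information a network does not keep once the input has been assimilated into its weights.

```python
from collections import deque
from typing import Deque, Optional


class SalienceBuffer:
    """Hypothetical bookkeeping for a failure-driven learner in noisy input."""

    def __init__(self, window: int, min_count: int) -> None:
        # ordered record of the most recent inputs; the oldest drops out
        self.recent: Deque[Optional[str]] = deque(maxlen=window)
        self.min_count = min_count

    def observe(self, failed: Optional[str]) -> bool:
        """`failed` is None for an input that parsed, otherwise a label for
        the construction that could not be parsed. Returns True when that
        failure has recurred often enough to justify adding a unit."""
        self.recent.append(failed)
        if failed is None:
            return False
        return self.recent.count(failed) >= self.min_count


# "q" stands for a recurring construction, "slip" for a one-off speech error.
inputs = ["q", None, "slip", "q", None, "q"]
buffer = SalienceBuffer(window=6, min_count=3)
print([buffer.observe(x) for x in inputs])
# [False, False, False, False, False, True]
```

With `window=5` the same input never fires, because the first "q" has dropped out of the buffer by the time the third one arrives; how much must be remembered depends directly on how sparse the genuine evidence is.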
An obvious way for the adding of new units to be constrained is by restricting the kind of evidence that results in a unit being added.
Such an approach has already been taken in Principles and Parameters theory and in learnability theory (particularly Gibson and Wexler's (1994) theory of triggers).
These theories are detailed specifications of the properties of universal grammar. Together they postulate a number of abstract properties that are intrinsic to the organism and a way in which the relevant properties can be selected based on evidence from the language input (without the need for negative evidence).
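A minimal sketch of the kind of trigger-based learner Gibson and Wexler (1994) describe, their Triggering Learning Algorithm, is given below: learning is error-driven, at most one binary parameter is changed at a time, and a change is kept only if it lets the current sentence parse. The parser is a hypothetical toy in which each "sentence" simply lists the parameter values it requires, so sentences that constrain a single parameter play the role of triggers; the fixed step count and the random seed are likewise assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Grammar = Tuple[int, ...]    # one binary value per parameter
Sentence = Dict[int, int]    # hypothetical toy: parameter index -> required value


def parses(grammar: Grammar, sentence: Sentence) -> bool:
    """Toy parser: a sentence parses iff the grammar has every value it requires."""
    return all(grammar[i] == v for i, v in sentence.items())


def tla(sample: List[Sentence], n_params: int,
        steps: int = 2000, seed: int = 0) -> Grammar:
    rng = random.Random(seed)
    grammar: Grammar = tuple([0] * n_params)   # arbitrary initial setting
    for _ in range(steps):
        sentence = rng.choice(sample)          # positive evidence only
        if parses(grammar, sentence):
            continue                           # error-driven: no change on success
        i = rng.randrange(n_params)            # change a single parameter
        candidate = grammar[:i] + (1 - grammar[i],) + grammar[i + 1:]
        if parses(candidate, sentence):        # greedy: keep it only if it helps
            grammar = candidate
    return grammar


target = (1, 0, 1)
sample = [{0: 1}, {1: 0}, {2: 1},              # triggers: each fixes one parameter
          {0: 1, 2: 1}, {1: 0, 2: 1}]
print(tla(sample, 3) == target)                # True
print(tla([{0: 1, 2: 1}], 3))                  # (0, 0, 0): no triggers, no progress
```

The second call shows why the specification of triggers matters: a sentence that needs two parameters changed at once can never be accommodated by single greedy changes, so a learner exposed only to such sentences stays where it started.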
Indeed, in establishing an unrelated claim, Quartz admits that such constraints are necessary:
An important insight into learning from [Valiant's] model was that feasible learning -- learning that is achieved within some realistic time bounds -- was seen to require (at least) a significant restriction of the possible conjectures that the learner must evaluate. (Quartz, 1993:229)
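The force of this point can be seen in the standard sample-complexity bound for a learner that outputs any hypothesis consistent with its data, drawn from a finite hypothesis class $H$. This is the textbook form of the result in Valiant's framework, assuming independently drawn examples and a target inside $H$; it is not a formula Quartz gives. The number of examples needed grows with $\ln|H|$, so restricting the conjectures directly reduces the evidence required.

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
\quad\Longrightarrow\quad \text{error} \le \varepsilon \text{ with probability} \ge 1-\delta

% Example: n binary parameters give |H| = 2^n, so \ln|H| = n \ln 2.
% With n = 20, \varepsilon = 0.1, \delta = 0.05:
m \;\ge\; 10\,(20 \ln 2 + \ln 20) \approx 10\,(13.86 + 3.00) \approx 168.6
```

so about 169 examples suffice for a learner restricted to $2^{20}$ grammars, whereas the bound becomes useless as $|H|$ grows without limit.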
However, if these kinds of restrictions are built into a construction algorithm, as they must be to avoid the superset problem, it is hard to see how the algorithm is not just an instantiation of a highly nativist position.
Therefore, in the domain of language acquisition, it seems that for a construction algorithm to be failure-driven it must also have many nativist assumptions built in. Option (b) is therefore nothing but a restatement of the nativist position.
It seems, then, that no matter how the construction algorithm in a constructivist neural network functions, in the domain of language acquisition such an algorithm is forced to invoke some kind of nativism.
Neither the option of using a stopping mechanism nor the option of being failure-driven is viable without assuming the network has innate, language-specific properties. In fact, the constructivist position is only consistent insofar as it fits in with previous theories of language acquisition, all of them highly nativist.
Quartz has argued that a constructivist neural network can learn new concepts, but it was shown that such networks only represent new things and no one denies humans have the ability to do this. However, the constructivist network in no way offers an explanation of the origin of concepts and in this sense Fodor's position still holds perfectly consistently. Next, it was demonstrated that even if we did assume that constructivist networks can learn new concepts, once the learning algorithm is extracted from the construction algorithm it can be shown that, in the domain of language acquisition, the construction algorithm must be nativist.
Constructivist neural networks are therefore nativist, as are fixed architecture neural networks.
The question now is what constructivist neural networks can contribute to the study of language acquisition.
Constructivist Neural Networks and Language Acquisition
It has been shown that neural networks, whether constructivist or not, must rely on nativist assumptions about the origins of concepts and about the nature of language acquisition. However, this is in conflict with recent arguments from developmental cognitive neuroscience and connectionist research which claim that innate representations are not biologically plausible and that connectionist networks can get a lot more out of the environment than was previously thought possible (Johnson, 1997, Elman et al., 1996).
The first contribution of constructivism is just the obvious one of adding a new tool to the modelling toolbox. Models and discussions such as Westermann's (1997) can give us important insights into how different algorithms and architectures behave in the domain of language acquisition.
Not only is this important for modelling the natural computation that occurs in human language acquisition, but it can give us insights into issues in the implementation of cognitive functioning in general. The second, and more interesting, contribution of constructivist neural networks is to give us a new paradigm to examine the soundness of arguments like those of Johnson and Elman and his co-writers.
Constructivism is a plausible biological process (Quartz and Sejnowski, to appear). And, like selectionism, it is quite environmentally sensitive. Therefore, constructivist neural networks are to some degree biologically instantiable, environmentally driven models. By examining their properties carefully, we can learn more about just what parts of language acquisition must be innate (given our current knowledge) and which parts might be environmentally determined after all. The only important thing to bear in mind is that constructivist neural networks do need substantial innate assumptions to get going. Thus, it seems that, contrary to the recent research cited above, from a learning-theoretical point of view it is not possible to have learning of new concepts or learning of language without substantial assumptions about the given properties of the learner (i.e. its innate properties), even in a highly environmentally driven, biologically plausible model.
The question now is whether we are at an impasse. The answer is no, for two reasons. First of all, even though Elman and his colleagues argue against the biological plausibility of literally having innate representations of principles, parameters or anything else, they do allow for innate architectural constraints and for developmental timing effects that occur because of and in tandem with these architectural constraints (Elman et al., 1996:357ff). The claim is that language develops through a combination of these architectural and timing constraints and the interaction of the child with his or her environment. They have to make this claim because it is an established fact that there are regularities across languages (Comrie, 1989) and that there is regularity in language development. However, given the vast variability in linguistic, social, cultural, physical and familial environments, it does not seem tenable to locate most of these regularities in the environment.
Therefore, it seems that these universals should be attributed to the architectural and timing constraints in development. But this is still consistent with the nativist tradition in language acquisition, because the properties are still attributed to the organism as opposed to the environment, and this is the only stance that matters as far as nativist theories of language acquisition are concerned.
The second reason why this does not lead to an impasse is that biology is a growing science and hypotheses in language acquisition are about behavioural data. It is therefore not enough, in order to prove the behavioural hypotheses false, to say that our current biological science cannot explain how these hypotheses could have biological correlates. To prove these hypotheses (qua hypotheses about behaviour) false, one must present behavioural data.
The biological data can give us an impetus for looking at other solutions, but the fact remains that new findings in biology might equally well substantiate these hypotheses. In fact, Elman and his colleagues themselves rely on the incompleteness of biology:
"there is no known evidence of any biological system which implements backpropagation learning" (Elman et al., 1996:105). They then go on to describe comparable theoretical connectionist properties that have indeed been found in biological systems since they were first proposed. By the same courtesy, then, these allowances should be made for nativist hypotheses in language acquisition.
Thus constructivism is useful in investigating language acquisition, not just as another modelling paradigm, but specifically as a modelling paradigm that has the properties used in arguments against nativist theories of language acquisition. By looking at constructivist neural networks as instantiations of the positive proposals in these arguments, we can examine the arguments' strengths and weaknesses. More importantly, we can examine the compatibility of these arguments with existing theories of language acquisition.
Language acquisition has always presented a difficult problem, because it has never been an easy task to characterize the properties of the system performing the computation. One answer to this problem is nativism: the system performing language acquisition has predetermined properties that allow it to do so. This position has been articulated by Noam Chomsky in his theory of universal grammar and has been the predominant mode of thinking in linguistics in the second half of this century. Another possible answer is Piagetian constructivism: the system has few predispositions or innate concepts, and new concepts and abilities in language acquisition develop in interaction with the environment and with other aspects of cognitive development. This position was argued against by Fodor on the grounds that new concepts cannot be learned.
New advances in connectionist research, which have resulted in constructivist networks that can add units to their architectures, have led Quartz to argue that such networks can overcome Fodor's objections. It was shown here that these networks cannot refute Fodor's argument and that, like fixed architecture networks, they are in fact nativist.
The contribution of constructivist neural networks to language acquisition is not a refutation of nativism but rather a new paradigm for examining claims for and against nativism. Indeed, as instantiations of the properties that have been put forward as reasons to doubt nativism, constructivist neural networks show that these arguments against nativism must be re-examined. Finally, it is clear that, since constructivist neural networks do not refute Fodor's arguments and are in fact nativist, Piagetian constructivism is still not tenable as a theory of language acquisition. Constructivist neural networks actually lend further support, directly and indirectly, to nativism as the best theory of natural computation for language acquisition.
Changeux, J-P., Courrege, P. and Danchin, A. (1973) A theory of the epigenesis of neuronal networks by selective stabilization of synapses. Proceedings of the National Academy of Sciences of the USA 70, 2974-8.
Chomsky, N. (1959) Review of B.F. Skinner Verbal Behavior. Language 35, 26-58.
Chomsky, N. (1980). Rules and representations. Oxford: Blackwell.
Chomsky, N. (1995) The minimalist program. Cambridge, Mass.: MIT Press.
Churchland, P.S. and Sejnowski, T.J. (1992) The computational brain. Cambridge, Mass.: MIT Press.
Comrie, B. (1989) Language universals and linguistic typology (2nd edition). Oxford: Blackwell.
Cook, V.J. and Newson, M. (1996) Chomsky's universal grammar (2nd edition). Oxford: Blackwell.
Ebbesson, S.O.E. (1988) Ontogenetic parcellation: dual processes. Behavioral and Brain Sciences 11, 548-9.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996) Rethinking Innateness: A connectionist perspective on development. Cambridge, Mass.: MIT Press.
Fahlman, S.E. (1991) The recurrent cascade-correlation architecture. In R.P. Lippmann and J.E. Moody (eds.) Advances in neural information processing systems 3. San Mateo: Morgan Kaufmann.
Fahlman, S.E. and Lebiere, C. (1990) The cascade-correlation learning architecture. In D.S. Touretzky (ed.) Advances in neural information processing systems 2. San Mateo: Morgan Kaufmann.
Fodor, J. A. (1975) The language of thought. Cambridge, Mass.: Harvard University Press.
Fodor, J. A. (1980) Fixation of belief and concept acquisition. In M. Piattelli-Palmarini (ed.) Language and learning: The debate between Jean Piaget and Noam Chomsky. London: Routledge and Kegan Paul.
Fodor, J. A. (1983) The modularity of mind. Cambridge, Mass.: MIT Press.
Frean, M. (1990) The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 2, 198-209.
Gibson, E. and Wexler, K. (1994) Triggers. Linguistic Inquiry 25, 407-454.
Gold, E. M. (1967) Language identification in the limit. Information and Control 10, 447-474.
Gopnik, M. (1990) Feature-blind grammar and dysphasia. Nature 344, 715.
Gopnik, M. and Crago, M.B. (1991) Familial aggregation of a developmental language disorder. Cognition 39, 1-50.
Hubel, D.H. and Wiesel, T.N. (1963) Receptive fields of cells in striate cortex of very young, visually inexperienced kittens. Journal of Neurophysiology 26, 994-1002.
Hubel, D.H. and Wiesel, T.N. (1963) Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology 28, 229-289.
Hubel, D.H. and Wiesel, T.N. (1970) The period of susceptibility to the physiological effects of unilateral eye closure in kittens. Journal of Physiology 206, 419-436.
Hurst, J.A., Baraitser, M., Auger, E., Graham, F. and Norell, S. (1990) An extended family with a dominantly inherited speech disorder. Developmental Medicine and Child Neurology 32, 352-355.
Johnson, M. H. (1997) Developmental cognitive neuroscience. Oxford: Blackwell.
Lenneberg, E. H. (1967) Biological foundations of language. New York: John Wiley & Sons.
Osherson, D. N., Stob, M. and Weinstein, S. (1989) Learning theory and natural language. In R. J. Matthews and W. Demopoulos (eds.) Learnability and linguistic theory. Dordrecht: Kluwer Academic Publishers.
Piaget, J. (1980) The psychogenesis of knowledge and its epistemological significance. In M. Piattelli-Palmarini (ed.) Language and learning: The debate between Jean Piaget and Noam Chomsky. London: Routledge and Kegan Paul.
Piattelli-Palmarini, M. (ed.) (1980) Language and learning: The debate between Jean Piaget and Noam Chomsky. London: Routledge and Kegan Paul.
Pinker, S. (1989) Learnability and cognition: The acquisition of argument structure. Cambridge, Mass.: MIT Press.
Pinker, S. (1995) Language acquisition. In L. Gleitman and M. Liberman (eds.) An invitation to cognitive science (vol. 1): Language. Cambridge, Mass.: MIT Press.
Quartz, S. R. (1993) Neural networks, nativism, and the plausibility of constructivism. Cognition 48, 223-242.
Quartz, S. R. and Sejnowski, T. J. (to appear) The neural basis of cognitive development: a constructivist manifesto. Behavioral and Brain Sciences.
Schlaggar, B.L. and O'Leary, D.D.M. (1993) Patterning of the barrel field in somatosensory cortex with implications for the specification of neocortical areas. Perspectives on Developmental Neurobiology 1, 81-91.
Sorace, A., Heycock, C. and Shillcock, R. (eds.) (1997) Proceedings of GALA '97: Language Acquisition, Knowledge Representation and Processing. University of Edinburgh.
Van der Lely, H.K.J. (1997) Modularity and innateness: Insight from a grammatical specific language impairment. In Sorace, Heycock and Shillcock (eds.). Paper delivered at GALA '97: Language Acquisition, Knowledge Representation and Processing; Edinburgh, April 4-6, 1997.
Vargha-Khadem, F., Watkins, K., Alcock, K., Fletcher, P. and Passingham, R. (1995) Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences of the USA 92, 930-933.
Westermann, G. (1997) A constructivist neural network learns the past tense of English verbs. In Sorace, Heycock and Shillcock (eds.). Paper delivered at GALA '97: Language Acquisition, Knowledge Representation and Processing; Edinburgh, April 4-6, 1997.
Wiener, N. (1948) Cybernetics, or control and communication in the animal and the machine. New York: John Wiley & Sons.
Wynne-Jones, M. (1993) Node splitting: A constructive algorithm for feed-forward neural networks. Neural Computing and Applications 1, 17-22.