Description in Linguistic Theory
copyright 1995
by Margaret Magnus
all rights reserved


Those of you familiar with GB or with some background in philosophy of science
may want to skip to
The Punch Line

"Instead of preexisting ideas, then, we find values emanating from the system. When they are said to correspond to concepts, it is understood that the concepts are purely differential and defined not by their positive content but negatively by their relation with other terms in the system... Putting it in another way, language is a form, not a substance. This truth could not be overstressed, for all the mistakes in our terminology, all our incorrect ways of naming things that pertain to language, stem from the involuntary supposition that the linguistic phenomenon must have substance."
Ferdinand de Saussure

"If we take in our hands any volume of school metaphysics, for instance, let us ask, 'Does it contain any abstract reasoning concerning quantity and number?' No. 'Does it contain any experimental reasoning concerning matter of fact existence?' No. Commit it to the flames, for it can contain nothing but sophistry and illusion."
David Hume


In this essay, I have combined several papers, essays and attempted dissertations on the same topic, written during my time at MIT between 1983 and 1986. The methodological error that seems to me to be made by most linguistic traditions was recognized by de Saussure when he said that linguistics is the study of language, not grammar. His primary thesis was overlooked by the generative tradition, and so I believe it deserves revisiting. It is not a trivial thesis, and the penalty for misunderstanding it seems to me very large.

My primary objection to research conducted in this way is that whatever proves with time to be substantive becomes buried deep beneath a ponderous theoretical apparatus which, as I try to show here, has very little predictive or explanatory power. People end up investing a tremendous amount of energy in developing a theoretical apparatus, and thereby become attached to it. The empirical observations -- the raw data -- which are expressed in terms of this apparatus can be and have been expressed in many different ways. The argument is that although these various theories make more or less the same predictions, this one is more 'explanatory'. But is it? The effect of this mode of research on the linguistics community is to divide the field into factions warring for theoretical supremacy at the expense of investigation into the real substantive findings which survive the collapse of this or that instantiation of the theory. Chomsky has initiated many major revisions to his theory, and very little remains for posterity of those that get dropped. Because the primary object of research is grammar, not language, when the grammar is drastically revised, all efforts of the past disappear. What always remains are the substantive empirical observations, but these findings have been put on the back row, even in private circles scorned as 'mere' taxonomy. So you find in the linguistics community a bunch of Chomsky followers, or followers of some other leader. These followers take on whatever theoretical changes the master dictates and work on his or her terms. And if the terms or changes are questioned, the follower falls outside the circle. And there is a war over which 'school' will have supremacy -- who gets the best jobs at the best institutions. Chomsky wins this war. But what does that have to do with understanding language? In my opinion, little. You can pursue the one or pursue the other, but you can't pursue both effectively.

So this essay addresses this issue by analyzing the actual structure of GB theory and then showing piece by piece how it fails to say much, and how substantive findings by various linguists are stolen by superimposing an essentially meaningless theoretical apparatus on these basic findings and claiming implicitly that the finding is nothing and the theoretical apparatus is everything. I approach the subject this way also because the argument is frequently made that the critic is just an idiot and doesn't understand the fine points of the theory. So I outline the theory for you. I do understand it. I could teach it competently at the graduate level. I completed all the coursework and qualifying exams for a PhD in linguistics at MIT with a 4.9/5.0 GPA, including some courses which Chomsky taught and evaluated. Chomsky was also on my qualifying exam committee, and he passed me. Had he felt me incompetent in his theory, one must assume he would have felt morally obliged to fail me.

I begin the presentation by outlining Government and Binding Theory much as it was presented in publications in 1981-86, except that I strip it down to its most invariant parts, and leave out a lot of detail. In this presentation, I have added information about basic syntax and also left out discussions of less central issues in GB in the hopes of making this readable also to non-linguists.

In section 2, I make some general philosophical and theoretical observations, using this outline of GB as a base, and present the notion of surface grammar. I do this in preparation for the next section in which I exemplify the reinterpretation by clearing away some of those parts of the grammar which have no predictive power.

My intention in this essay is to bring the reader in my own way to an understanding of what I perceive as an important methodological error, to give a sense for its consequences, as well as a sense for what follows its correction. I use GB, which is an outdated version of the grammar. I do this intentionally, so that the reader will not confuse my writing with politics. The error I address seems to me to be made by most linguistic traditions in most periods of time. The reasons for the shifts away from GB to subsequent instantiations of that framework did not have to do with the issues presented here.

The science that has been developed around the facts of language passed through three stages before finding its true and unique object. First something called grammar was studied. This study, initiated by the Greeks and continued mainly by the French, was based on logic. It lacked the scientific approach and was detached from language itself.
Ferdinand de Saussure

Outline of Government and Binding Theory as of ca. 1985

Philosophical Foundation

First let me mention some assumptions that are common to much of modern linguistics. The most complete discussions of the philosophical foundations of generative grammar can be found in the work of Chomsky. Especially the first chapter of Aspects of the Theory of Syntax is helpful. In The Logical Structure of Linguistic Theory, Chomsky defines the primary question of linguistic theory as:

"How can speakers produce and understand new sentences?"

that is, to:

"Provide an explanation for the general process of projection by which speakers extend their limited linguistic experience to new and immediately acceptable forms."

The first step in solving this problem is to assume that every speaker has internalized a grammar that expresses his or her knowledge of the language. The object of study of generative linguistics in Chomsky's view is grammar, not language. A grammar is defined as "a description of the ideal speaker-hearer's intrinsic competence."

We could conceivably come up with a number of different ways to describe the data that we have already considered in a language. These descriptions are called observationally adequate grammars. In the process of writing grammars, we consider a certain amount of data and draw generalizations over it. If these generalizations are correct in some deeper sense, they will apply correctly to data not yet considered. A grammar which consists of correct generalizations of this kind is called descriptively adequate. Chomsky proposes an explanatorily adequate theory, one which picks out, from among all of the observationally adequate grammars, the one which is descriptively adequate.

All theories which have exactly the same predictions over an infinite set of data can be reduced to one another mathematically, and are therefore notational variants of one another. Chomsky speaks of a simplicity metric which would choose only the simplest of these notational variants.

Basic Syntax

In all of syntax, not just GB, the syntactic structure of a sentence is shown by parsing it, and the parse is done by labelled bracketing. The purpose of parsing is to group together words which hang closely together semantically and syntactically. These groups are called constituents. An example of labelled bracketing follows:

[S [NP [NP A [A fragrant] [N rose]] [S' which [VP [V stood] [PP in [NP a vase]] [PP on [NP the table]]]]] [VP lost [NP a petal]]]

A notational variant of labelled bracketing is a syntactic tree, which is a major part of the structural description of a sentence:


Each of the points in the tree with labels NP, S and so forth, are called nodes and the lines which come out are called branches. A node which has more than one branch below it is called a branching node. A node which is above another node in the tree is said to dominate it. Each node stands for a constituent in the sentence. A constituent is a group of words that hang together and have a head. An S constituent is similar to what we think of as a sentence or clause. An NP is a noun phrase, which is that part of a sentence which has a noun as its head. An example of a noun phrase is 'the enormous redwood tree on that hillside that has no lower branches'. The head of that noun phrase is 'tree', because everything else in the phrase only describes the tree. The head of the subject NP in the above sentence is 'rose'.
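For readers who think more easily in code, the tree terminology above (nodes, branches, domination, constituents) can be sketched as a small data structure. This is only an illustration; the class and method names are my own, not part of any linguistic formalism.

```python
# A minimal constituent-tree sketch. Node labels (S, NP, VP, ...) are
# plain strings; leaves carry the word itself.
class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label              # e.g. "NP"
        self.children = children or []  # the branches below this node
        self.word = word                # filled in only for leaves

    def is_branching(self):
        # A branching node has more than one branch below it.
        return len(self.children) > 1

    def dominates(self, other):
        # A node dominates every node below it in the tree.
        return any(c is other or c.dominates(other) for c in self.children)

# "A rose lost a petal", stripped down to essentials:
rose = Node("N", word="rose")
subj = Node("NP", [Node("Art", word="a"), rose])
vp = Node("VP", [Node("V", word="lost"),
                 Node("NP", [Node("Art", word="a"), Node("N", word="petal")])])
s = Node("S", [subj, vp])

print(s.dominates(rose))    # the S node dominates the head noun
print(subj.is_branching())
```

Domination here is nothing but reachability downward through the branches, which is all the definition above says.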

In some grammars, the tree is generated by a set of phrase structure rules. For example, the PS-rule which generates the first branch is S -> NP VP. This rule should be interpreted "A sentence analyzes into an NP (the subject) followed by a VP (the predicate)." The phrase structure rules which can generate the above sentence are:

S -> NP VP
NP -> NP S
NP -> Art N
VP -> V NP
PP -> P NP

In general PS rules are ordered.
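The way PS rules rewrite symbols can itself be sketched in a few lines of code. This toy version keeps only non-recursive rules (dropping NP -> NP S, which would loop forever under naive top-down expansion) and allows just one expansion per symbol, so it is a caricature, not a grammar:

```python
# A sketch of top-down rewriting with PS rules like those listed above.
# Each rule maps a node label to one expansion; real grammars allow
# alternatives and recursion, which this toy version omits.
PS_RULES = {
    "S":  ["NP", "VP"],
    "NP": ["Art", "N"],
    "VP": ["V", "NP"],
    "PP": ["P", "NP"],
}

def expand(symbol):
    """Rewrite a symbol until only lexical categories remain."""
    if symbol not in PS_RULES:
        return [symbol]                  # Art, N, V, P: stop here
    result = []
    for child in PS_RULES[symbol]:
        result.extend(expand(child))
    return result

print(expand("S"))   # the lexical-category skeleton of a simple sentence
```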

X-bar Theory

This theory predates GB, and is outlined most thoroughly in Jackendoff (1977). X'-Theory concerns the structure of constituents in a sentence. Briefly, a sentence is constructed out of several major constituents, or maximal projections, such as clauses (S' and S), verb phrases, noun phrases, adjective phrases, and so on. The categories are projections of their heads (the main noun, verb, etc.). The major insight of X'-Theory is that all maximal projections have basically the same structure. In English this could be stated using the phrase structure rules:

X'' -> (Specx) X'
X' -> X (Compx)

where Spec (specifier) and Comp (complement) are again X'' (or maximal projections). X is a sort of variable ranging over the parts of speech and can therefore be replaced by a word of any part of speech (noun, verb, etc.) X'' is also written XP, and X is the head of XP. Spec tends to correspond to the subject of a projection, and Comp tends to be the objects. X' often corresponds to the predicate of a predication. Some examples are:

Sample Maximal Projections

Noun Phrase (NP), headed by a noun:
    Example: [NP John's [N' [N friend] [PP in the store]]]
    PS Rules: NP (or N'') -> NP N'
              N' -> N P'' (or PP)

Adjective Phrase (AP), headed by an adjective:
    Example: [AP more [A' [A beautiful]]]
    PS Rules: AP (or A'') -> Adverb A'
              A' -> A

Prepositional Phrase (PP), headed by a preposition:
    Example: [PP [P in] [NP the house]]
    PS Rules: PP (or P'') -> P'
              P' -> P N'' (or NP)

Verb Phrase (VP), headed by a verb:
    Example: [VP [V' [V hit] [NP the ball]]]
    PS Rules: VP (or V'') -> V'
              V' -> V N'' (or NP)

There are also a number of variations on this in the literature. Jackendoff, for example, argues for three X-bar levels, so X''' is the maximal projection.
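As a rough illustration of the X-bar schema, projection can be written as a single function that builds the same two-layer structure for any head category. The encoding (nested lists, quoted strings standing in for specifier and complement phrases) is purely my own convenience:

```python
# The X-bar schema as a function over category labels: every maximal
# projection X'' is built the same way, with optional specifier and
# complement slots. Purely illustrative, not a formal implementation.
def project(head_category, specifier=None, complement=None):
    x_bar = [head_category] + ([complement] if complement else [])
    xp = ([specifier] if specifier else []) + [x_bar]
    return {head_category + "''": xp}

# "John's friend in the store": an NP with specifier and PP complement.
np = project("N", specifier="John's", complement="in the store (PP)")
# "in the house": a PP with a complement but no specifier.
pp = project("P", complement="the house (NP)")
print(np)
print(pp)
```

The point of the schema is exactly what the function makes visible: nothing in the structure depends on which part of speech X is.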

General Observations

Given this background in basic syntax, consider what kinds of data a theory of syntax would be expected to account for:

1. How do you map the head, subjects and objects into a sentence? For example, what are the rules governing who smacked whom in sentences like 'Horatio smacked Alexander' vs. 'Alexander smacked Horatio' vs. 'Horatio got smacked' vs. 'Horatio got smacked by Alexander'?

2. Given several constituents, like the prepositional phrase 'in the vase', and the noun phrase 'a fragrant rose', how do you combine them to form a legal sentence in a language?

3. There are types of syntactic processes which don't change the argument structure of a verb, such as asking a question. When you rearrange subjects and objects, the changes generally stay very local, bound inside a single constituent:

How do you know that John kissed the gorilla?
How do you know that the gorilla was kissed by John?

You can't say:

*John how do you know kissed that the gorilla.

(Stars in linguistics are used to indicate that a sentence is ungrammatical.)

By contrast in English, you often do ask a question by moving the question word way out of its original clause to the front of the sentence:

William was under the impression that Andrew planted roses along the fence.
What was William under the impression that Andrew planted along the fence?

The rules governing processes like question formation also need to be addressed in a theory of syntax. What sorts of semantic processes involve long-range movement, and what sorts involve short-range movement?

4. The last major issue addressed by most theories of syntax, GB included, is the rules governing coreference, or what words in a sentence refer to the same thing. For example, under what circumstances are you required to use 'himself' to refer to 'William', and in what circumstances do you use 'him'?

William thought he might have poisoned *him/himself.
William thought Mary might have poisoned him/*himself.

In the process of answering these questions, you have to define terms such as subject, object, constituent, coreference and so forth. These definitions determine how each sentence is given a syntactic labelling or structural description. The criterion determining the structural description for each sentence is its efficiency in answering questions like 1-4. GB's answers to these questions take the form of a grammar which fails to assign ungrammatical sentences a legal structural description or phrase structure tree. Those sentences for which a structural description or phrase structure tree can be generated by the phrase structure rules are thus grammatical.

Case Theory

The answers to question (1) above:

1. How do you map the head, subjects and objects into a sentence? For example, what are the rules governing who smacked whom in sentences like 'Horatio smacked Alexander' vs. 'Alexander smacked Horatio' vs. 'Horatio got smacked' vs. 'Horatio got smacked by Alexander'?

are provided in GB primarily by the notions of Case and Government. Abstract Case in GB is different from case in traditional grammar, but presumably a reasonable mapping exists from Case to case to account for case in languages that have it. We say a transitive verb subcategorizes for (or requires inherently, sort of) an object, and in a sentence, this verb is said to govern the object (or tell it what case it has and what syntactic position it occupies). Alternatively, the verb is the governor of the NP which is the object. The subject and objects in a sentence are called the arguments of the verb, and the positions in which they occur in the sentence are called argument positions. A head assigns Case to its arguments. It can assign Case to the right or left, and the Case assigned by each head (verb, preposition) is in general determined individually by that head. In English, for example, a transitive verb is held to assign accusative Case to the right. So in a sentence like:

Benjamin removed the tape.

'the tape' has accusative Case assigned by 'remove', because remove is the head of the verb phrase. In Chomsky's version of GB, however, unlike traditional grammars, a subject is not assigned nominative Case by the verb. In other words, he does not see the verb as the head of a sentence. Instead, he postulates a theoretical category INFL which is responsible for tense, mood, auxiliaries and subject-verb agreement in the sentence. INFL is the head of S, and it is the tense part of INFL which assigns nominative Case to the subject.

The most important principle of Case Theory is the Case Filter stated informally as:

Every overt (meaning non-null, containing words, not merely a structural position) noun phrase must have (abstract) Case.

This mechanism is the primary means to insure that NPs don't appear in random positions in the sentence. The sentence:

Mary Jeff hung a picture on the wall a poster.

is automatically marked ungrammatical by GB, because 'Mary' and 'a poster' violate the Case Filter. They violate the Case Filter because they are not governed by any head, and therefore do not get abstract Case assigned to them.
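The Case Filter itself is a trivial check once government is settled; the real work in GB goes into defining which heads govern which positions. In the sketch below the Case assignments are simply given as a table, standing in for the structural definition of government:

```python
# A toy Case Filter check: every overt NP must be assigned abstract
# Case by some governor. The assignment tables below are stand-ins
# for GB's structural definition of government.
def case_filter_ok(nps, case_assignments):
    """nps: the overt NPs in a sentence; case_assignments: NP -> Case."""
    return all(np in case_assignments for np in nps)

# 'Benjamin removed the tape.': INFL Case-marks the subject,
# and the transitive verb Case-marks its object.
good = case_filter_ok(
    ["Benjamin", "the tape"],
    {"Benjamin": "nominative", "the tape": "accusative"})

# 'Mary Jeff hung a picture on the wall a poster.': 'Mary' and
# 'a poster' occupy ungoverned positions, so no Case reaches them.
bad = case_filter_ok(
    ["Mary", "Jeff", "a picture", "the wall", "a poster"],
    {"Jeff": "nominative", "a picture": "accusative", "the wall": "oblique"})

print(good, bad)   # only the second sentence violates the Case Filter
```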

In addition, GB assumes that certain lexical processes, like the passive, 'absorb accusative Case'. If this is true, a direct object can't appear behind a passive verb:

*There was hit John.

The passive verb 'was hit' no longer assigns abstract Case to the object position, because the process of making the verb passive (from 'hit' to 'was hit') also results in the 'absorption of accusative Case'. 'John' therefore cannot get Case in the position it now appears in, and has to move to a Case-marked position to avoid a violation of the Case Filter. The only such position left available by the PS (phrase structure) rules is the subject:

John was hit.

Government Theory

The mechanism of Case assignment to subjects and objects nearly insures that any rearrangements of argument structure will stay in a small area surrounding the head of a constituent. Indeed, if rearranging argument structure were marked by moving elements far away from their heads, it would be very difficult to figure out which objects went with which verbs. For example, in English, we form the passive by moving the object to subject position. Imagine how it would be if we moved it somewhere else:

I was nearly convinced when Bill told me that Gracie hit John.
-> I was nearly convinced when Bill told me that John was hit by Gracie. (object 'John' moved to subject position of 'hit')
vs. *John I was nearly convinced when Bill told me that Gracie was hit. ???
or *I was nearly convinced John when Bill told me that Gracie was hit. ???

There is a notion in GB defining this domain around a head to which certain types of processes are limited. This is called a governing category, and is defined as follows:

a is the governing category for b iff a is the minimal category containing b, a governor of b, and a SUBJECT accessible to a. (Chomsky, 1981)

The basic idea is that there is a fairly narrow neighborhood around the head of a phrase within which words have to stay when the sentence is altered by becoming passive or something. Abstract Case is assigned under government and thus limited to the governing category. The formal mechanism limiting rearrangements of argument structure to the confines of a governing category is the Binding Theory. The Binding Theory also answers question (4) above, namely when to use 'him' to refer to another noun in the sentence, and when to use 'himself'. However before I can outline Binding Theory, I need to define a few other terms and concepts in GB.

Levels of the Grammar

GB is structured into modules, such as Case Theory, Government Theory, X'-Theory and Binding Theory, which define the constraints on various aspects of syntax. Each of these modules should be simple, but their interaction should predict the complex array of facts we encounter in language. This philosophy is perhaps familiar to the reader as the ground rules for good computer programming. You divide your program into simple subroutines which when combined can perform a wide array of complex tasks. In addition, the grammar has levels, at which each of these modules may or may not apply. Within most theories of GB, the structure of the levels looks like this (from Chomsky and Lasnik (1977)):

S-structure is the central level from which the others radiate out. The levels symbolize the conceptual locations at which constraints of the grammar apply.


In most versions of GB, D-structure is the level which most directly reflects the structure of the lexicon. The lexicon is the list of morphemes (prefixes, suffixes and roots) and words in a language together with all the relevant information pertaining to them. The lexicon, for example, tells how each word is pronounced in isolation, what its suffix and prefix structure is, and in the case of verbs, what objects it requires, and what Cases it assigns. These syntactic requirements of words in the lexicon determine an underlying structure and word order (or D-structure) for each sentence. For example, some verbs are transitive, and require both a subject and object, and some are intransitive and require only a subject.

D-Structure to S-Structure

The word order in S-structure is the order in which the sentence actually appears in the language. D-structure is mapped onto S-structure by the function Move-alpha. Move-alpha means more or less "move any element or constituent anywhere", or "take the D-structure and scramble it freely to map it into S-structure". That is, the default in a language is completely free word order. Move-alpha can be applied repeatedly to a D-structure in order to produce an S-structure. The history of applications of Move-alpha is called a derivation. English obviously does not allow completely free word order. Constraints on movement within a sentence are not imposed by Move-alpha itself, however, but by the modules of the grammar like Case Theory, and so on. Therefore most linguists assume that Move-alpha can be thought of simply as the expression of a relation between two arbitrary positions in a sentence. Some manifestations of Move-alpha are: (Consider 'i's and 'j's in strange places to be subscripted. Two different words subscripted by an 'i' are considered to refer to the same thing.)

a. [S[NPe] was hit the ball] (D-structure) (The e in [NPe] is called an 'empty NP' and marks a place not filled by a word.)
[SThe balli was hit ti] (S-structure) (That t is called a trace, and marks the position from which the NP 'the ball' was moved.)
Type of Movement: Passivization

b. John looked up the word. (D-structure and S-structure)
John looked the word up. (S-structure)
Type of Movement: Particle Movement (optional)

c. [S[NPe] sank the ship] (D-structure)
[SThe shipi sank ti] (S-structure)
Type of Movement: Unaccusative

d. [S'[SBill can see who]]? (D-structure)
[S'[Who]i [can]j [SBill tj see ti]]? (S-structure)
Type of Movement: Wh-movement

e. I know that man. (D-structure and S-structure)
That mani I know ti (S-structure)
Type of Movement: Topicalization (optional)

f. This is the man [S'[SI know who]] (D-structure)
This is the man [S'whoi[SI know ti]] (S-structure)
Type of Movement: Relativization

What distinguishes these various forms of Move-alpha is not the rule itself, but the constraints on the rule. We still tend to think in terms of these various types of movements and constructions (passive, topicalization, etc.). This is a hold-over from previous versions of generative grammar, but in principle these constructions do not exist as separate entities in GB.

The t's are called traces and mark the D-structure position of an element that has been moved by Move-alpha. The indexes (i,j) indicate which other positions in a sentence a word is to be associated or coindexed with, that is, refer to the same thing. A series of elements coindexed as a result of Move-alpha makes up a chain. These traces and indexes are part of the structural description of a sentence in GB.
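A single derivation step of Move-alpha, with its trace and coindexing, can be sketched over a flat list of positions. The list-of-slots encoding and the underscore subscripts are my own shorthand, not GB notation:

```python
# Move-alpha as an operation on a flat list of positions: the moved
# element lands in an empty target slot and leaves a coindexed trace
# behind in its old position.
def move_alpha(positions, source, target, index):
    moved = positions[:]                   # leave the D-structure intact
    word = moved[source]
    moved[source] = "t_" + index           # trace marks the old position
    moved[target] = word + "_" + index     # moved word is coindexed with it
    return moved

# Passivization: D-structure '[e] was hit the ball'
d_structure = ["[e]", "was", "hit", "the ball"]
s_structure = move_alpha(d_structure, source=3, target=0, index="i")
print(" ".join(s_structure))   # the ball_i was hit t_i
```

The trace and the moved NP share the index "i": together they form a chain in the sense just described.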

Movements internal to S (a sentence), such as passivization (a), particle movement (b), unaccusative and middle movement (c), and many others are called bounded movement. Bounded movement in general rearranges the argument structure of a verb and is confined to a governing category. When an NP is subject to bounded movement within a clause, it is called NP-movement.

Movements external to S are called unbounded movement. Unbounded movement does not affect the argument structure (or what counts as subject, object, etc.) of the verb. It usually moves a constituent to a COMP position which is defined to be at the head of the constituent S'. The S' is a constituent which was not introduced until later in the history of generative grammar (by Joan Bresnan). COMP is the position at the front of a sentence or clause into which go question words (what, who, when,...) and auxiliaries (have eaten, am eating) in wh-movement (d), topicalization (e), relativization (f), and others. In the following sentences, the fronted element at the beginning of the sentence is in the COMP position:

What did he say the cost of that vase in the display window was?
That ridiculous hat I can't believe he still wears.

Bounded and unbounded movement are treated differently in most modules of the grammar. Bounded movement takes place before unbounded movement in the derivation of a sentence. That is, in creating a sentence, first you settle who did what to whom, and then you can focus, topicalize, and ask questions about the established situation.

Logical Form

Logical Form is that part of the grammar that expresses logical relations within the sentence. For example, sentence (d) would (coarsely) have the LF representation:

For which X, X a person, [can Bill see X]?

The wh-word 'who' is translated into a variable X bound by a wh-operator ('for which X'), which is said to have scope over X. Logically this means that the operator determines the domain within the sentence within which X has force. The frontings of operators in LF are unbounded movements, but you never see them. They are theoretical movements, not real movements of real words.

Phonetic Form

Finally PF, or Phonetic Form, is the least discussed in GB. PF is the level at which the phonological properties of a sentence are determined.

Before we can introduce the Binding Theory, we have to introduce one more module of GB, namely Theta-Theory.

Theta-Theory

Theta-Theory concerns the relationship between thematic roles (theta-roles), such as 'agent' (the doer) and 'patient' (the do-ee), and the syntactic structure of the sentence (including 'subject', 'object', etc.). Theta roles are essentially semantic (involve meaning), and 'subject', 'object' are syntactic roles (involve formal sentence structure). The most basic principle of Theta-Theory is the Theta-Criterion, originally conceived of by Fillmore (1968):

Each argument (subject, object...) bears one and only one Theta-role, and each Theta-role is assigned to one and only one argument.

In GB, subjects and objects are defined structurally: subjects are the NP immediately dominated by S ([NP,S]), and objects are the NP immediately dominated by VP ([NP,VP]). Each lexical entry (for simplicity's sake, I'll use verbs) has associated with it a particular syntactic structure (for example, transitive verbs are followed by an NP), and a thematic structure (for example, the verb 'hit' has an agent (the hitter) and a patient (the hittee)). Objects of a verb get their theta-roles directly from the verb, whereas the theta-role of a subject is assigned indirectly, or compositionally, by the VP as a whole. The idea of the Theta-Criterion is that every theta-role assigned by a verb must be assigned to some syntactic argument, and every syntactic argument must bear a theta-role.
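The Theta-Criterion is in effect a one-to-one requirement between arguments and theta-roles, and that much can be checked mechanically. This sketch assumes the role assignments are already given as a list of pairs; nothing here derives them:

```python
# The Theta-Criterion as a bijection check: each argument bears exactly
# one theta-role, and each theta-role goes to exactly one argument.
def theta_criterion_ok(assignments):
    """assignments: list of (argument, theta_role) pairs for one clause."""
    args = [a for a, _ in assignments]
    roles = [r for _, r in assignments]
    return len(args) == len(set(args)) and len(roles) == len(set(roles))

# 'Horatio smacked Alexander': agent and patient each land on one NP.
print(theta_criterion_ok([("Horatio", "agent"), ("Alexander", "patient")]))

# One NP bearing two roles (as a bare 'John washed' would suggest)
# fails the check, unless an empty NP is postulated to carry one role.
print(theta_criterion_ok([("John", "agent"), ("John", "patient")]))
```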

There are numerous apparent counterexamples to the Theta-Criterion. For the sake of perspective, let me list here some of the major types of exceptions to the Theta-Criterion:

Passive - We have an intuited Agent thematic role, but no NP for the Agent:
The door was closed. vs.
The door closed. (no implied agent)

Implied Objects - We have an intuited object, but no corresponding NP:
John shaved (himself).

Pleonastics - We have an NP, but no corresponding thematic role:
It's snowing.
It's cold outside.
It's fun to play basketball.

Infinitives - A single element with two thematic roles:
John convinced Bill to play tennis.
(Bill is both object of "convince" and subject of "play".)

Parasitic Gaps - Again we have one element with two thematic roles:
This book is hard to put down without reading. ("Book" is object of "put down" and "read".)

Pro Drop - Optional subjects and objects in most languages other than the Germanic ones. When the pronoun is missing, you have a thematic role with no corresponding NP:
Idët. ((He/She/It) goes.)

Imperatives - An implied subject with no corresponding NP:
Eat your dinner!

Across the Board Extractions - Again one element playing several roles in the sentence:
What were you looking at and wondering about?

Optional Arguments - The sentence where the argument is omitted must have a thematic role with no corresponding NP:
John ate (the fishpudding).


Let us consider again the inherent reflexive verb, which appears to assign both agent and patient theta-role to its subject:

John washed. = John washed himself. (inherent reflexive)
John ate. ≠ John ate himself. (intransitive)

The way that this type of exception to the Theta-Criterion is generally handled within GB, so that the sentence does not violate the Theta-Criterion, is to postulate an empty NP following 'washed' which receives the patient theta-role:

John washed [NPe].

These are empty NPs of the same kind as the traces from movement mentioned above. GB theorists spend a great deal of energy studying these empty categories. A fundamental characteristic of GB is that empty categories are defined as legitimate noun phrases and treated as such by the Binding Theory and other modules of the grammar. Empty categories can carry thematic roles and Case.

Now recall that we have a number of levels in the grammar. Every time you postulate a rule, you have to think in terms of which levels it applies at. The Theta-Criterion is assumed to hold at all levels of the syntax (D-Structure, S-Structure and Logical Form) with the exception of Phonetic Form, where Theta-roles are held to be invisible. The constraint that the Theta-Criterion holds everywhere is called the Projection Principle.

The Projection Principle combined with the Theta-Criterion most strongly affects the nature of the grammar, and it distinguishes GB from all the other theories of grammar. The idea of the Projection Principle is that the thematic relations assigned in the lexicon have to be visible at every other level of the syntax. These two do not directly answer any of questions 1-4, but they form the backbone on which Binding Theory is formulated.

Binding Theory

Argument positions (A-positions) are subject and object positions which are governed by the head of the constituent, and A'-positions (A-bar-positions) are COMP, adjunct positions, and various post-sentential positions. Adjunct positions are those in which time and place adverbs appear after the object:

He was writing the book [under a tree] [in the garden] [during the afternoon].

The Binding Theory refers only to binding from A-positions. In one fell swoop, it is GB's answer to questions (3) and (4) above, and also answers the parts of (1) left unanswered by Case Theory and Government Theory:

1. How do you map the head, subjects and objects into a sentence?
3. What sorts of semantic processes involve long-range movement, and what sorts involve short-range movement?
4. What words in a sentence refer to the same thing?

To understand Binding Theory, the reader must understand the notion of coreference. When two elements refer to the same object in the world, they are coreferent, and this is expressed in syntactic notation by giving them the same subscript:

a. Johni admires himselfi.

The higher NP 'John' is said to be the antecedent, and the lower NP, 'himself' is in this case an anaphor bound to the antecedent by coreference.

When a trace is left by an application of Move-alpha, as we have seen, the trace is defined to be coreferent with the overt NP which moved and is now in a new position:

b. Johni was removed ti from the list.

This coreference resulting from NP movement in sentence (b) is considered to be in essence the same as the coreference between two overt elements, as in sentence (a), and the Binding Theory treats them the same. Thus the Binding Theory offers one solution to (a), which refers to coreference of overt elements, and (b), which defines the movement restrictions on elements in the sentence. Binding Theory A, which defines bounded coreference, answers question (1), and Binding Theory B, which defines unbounded coreference, answers question (3).

One major requirement on binding is not stated in the Binding Theory itself, namely that antecedents must c-command (see below) their anaphors. The notion of c-command was first proposed by Tanya Reinhart (1976) as a constraint on movement, and was there defined as:


a c-commands b if a does not dominate b, and the first branching node which dominates a also dominates b.

This requirement means that elements must move upward in the syntactic tree during a derivation, and that antecedents must be higher in the tree than the reflexives (like 'himself' and 'yourself') that they bind. Antecedents are indeed also semantically superior to their reflexives:

a. Johni [was seen ti].
*ti [was seen Johni].

b. Whoi [did John tell ti [to give the potatoes to Bill]]?
*John [told ti [to give the potatoes to whoi]]?

c. Johni saw himselfi.
*Himselfi saw Johni.
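Reinhart's definition of c-command is concrete enough to implement directly. The following sketch (in Python, using a minimal hypothetical tree representation that is not part of the original discussion) checks the definition exactly as stated: a c-commands b if a does not dominate b and the first branching node dominating a also dominates b.

```python
class Node:
    """A bare phrase-structure tree node (hypothetical helper class)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """True if a properly dominates b (a is an ancestor of b)."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def first_branching_ancestor(a):
    """The first node above a with more than one daughter."""
    n = a.parent
    while n is not None and len(n.children) < 2:
        n = n.parent
    return n

def c_commands(a, b):
    """Reinhart (1976): a c-commands b iff a does not dominate b and
    the first branching node which dominates a also dominates b."""
    if a is b or dominates(a, b):
        return False
    branching = first_branching_ancestor(a)
    return branching is not None and dominates(branching, b)

# [S [NP John] [VP saw [NP himself]]]
john = Node("NP-John")
himself = Node("NP-himself")
vp = Node("VP", [Node("V-saw"), himself])
s = Node("S", [john, vp])

print(c_commands(john, himself))  # the antecedent c-commands the anaphor
print(c_commands(himself, john))  # but not vice versa
```

Run on a tree for sentence (c), the check confirms the asymmetry: the first branching node above 'John' is S, which dominates 'himself', but the first branching node above 'himself' is VP, which does not dominate 'John'.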

A number of other facts fall out from the Binding Theory. Only the most obvious and elementary are mentioned here.

Thus Binding Theory characterizes the domains within which various elements of a language must find their antecedents, i.e. the domains within which they must be bound. Again, Binding Theory:

A. An anaphor is bound in its minimal governing category.
B. A pronominal is free in its minimal governing category.
C. An R-expression is free.

There is some discussion as to where the Binding Theory applies. The consensus seems to be that it applies at least at S-structure and probably also at LF.

Now let us define the empty and overt anaphors, pronominals and R-expressions in turn. Both overt NPs and empty categories can be anaphors, pronominals and R-expressions. Overt NPs are classified by their lexical form. For example, 'himself' is an anaphor. Empty NPs are defined functionally.

Binding Theory A:

Overt anaphors are reflexives (himself, herself) and reciprocals (each other). Thus Binding Theory A as applied to overt anaphors allows the first sentence and disallows the second:

They saw themselves.
*They thought [S'that[She saw themselves.]]

Recall that the governing category requires a SUBJECT accessible to the element in question. When a subject is missing, as can be the case in an NP, the governing category gets bigger:

Mary saw a picture of herself.
*Mary saw Michael's picture of herself.

Empty anaphors are the trace of NP-movement. NP-trace is defined functionally as an empty category in an A-position which is not bound from an A'-position. Binding Theory A as applied to empty anaphors accounts for this:

Johni was seen ti.
The boati sank ti.

Recall that NP-movement is triggered by Case-absorption. In other words, 'seen' and 'sank' in these sentences absorb Case, and thereby force their objects to move to avoid a violation of the Case Filter. It is therefore assumed that NP-trace is Caseless.

Binding Theory B:

Overt pronominals, subject to Binding Theory B, are pronouns. Binding Theory B applied to overt pronominals accounts for this:

*Theyi saw themi.
Theyi think [that [Mary saw themi]].

An example of an empty pronominal is pro, found in the empty subject position in languages like Italian and Russian which don't require an overt subject. (Govoryu = 'I speak' is a complete sentence in Russian. Because of the ending on the verb, one knows that the subject is 'I', but it never appears overtly.) Empty pronominals in A-positions are defined functionally as those which are either free, or locally A-bound by an antecedent with an independent theta-role.

Binding Theory C:

Overt R-expressions are any other overt referring NPs. They can never be bound to anything, according to Binding Theory C:

*Johni saw Johni.
*Johni thought that Mary saw Johni.

And empty R-expressions are the traces left by unbounded movement, which are also called variables. They are defined functionally as categories which are in an A-position and locally A'-bound:

Whoi did John think that Mary asked Bill to fire ti?

Here the trace is considered free, because the question word 'Who' is an operator, and the binding is therefore not coreference.

Control Theory

If we think in terms of two features, [+,- anaphoric] and [+,- pronominal], elements such as reflexives (himself, herself), reciprocals (each other) and NP-trace are pure anaphors - [+anaphoric, -pronominal]. Pure pronominals ([-anaphoric, +pronominal]) are pronouns and the null element 'pro' which appears as the null subject in pro-drop languages. And finally, R-expressions (ordinary nouns) are neither ([-anaphoric, -pronominal]).
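This two-feature cross-classification, together with the Binding Conditions A-C given above, can be tabulated directly. The following sketch (hypothetical helper names; the table itself just restates the classification in the text, including the fourth combination discussed next) makes explicit which conditions govern each combination:

```python
# GB's cross-classification of NP types by [+/-anaphoric, +/-pronominal],
# with the overt and empty instantiations of each combination.
ELEMENTS = {
    # (anaphoric, pronominal): (overt form, empty form)
    (True,  False): ("reflexives, reciprocals", "NP-trace"),        # pure anaphors
    (False, True):  ("pronouns",                "pro"),             # pure pronominals
    (False, False): ("R-expressions (names)",   "wh-trace (variable)"),
    (True,  True):  (None,                      "PRO"),             # no overt form
}

def binding_conditions(anaphoric, pronominal):
    """Which of Binding Conditions A/B/C an element is subject to."""
    conds = []
    if anaphoric:
        conds.append("A")  # must be bound in its minimal governing category
    if pronominal:
        conds.append("B")  # must be free in its minimal governing category
    if not anaphoric and not pronominal:
        conds.append("C")  # must be free everywhere
    return conds

print(binding_conditions(True, True))    # both A and B apply
print(binding_conditions(False, False))  # condition C applies
```

The [+anaphoric, +pronominal] case is subject to both A and B at once, which is exactly the contradiction Control Theory resolves by denying that element a governing category.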

Control theory is the theory of the element PRO, which is [+anaphoric, +pronominal]. Since it is both anaphoric and pronominal, it is subject to both Binding Theory A and Binding Theory B, hence both bound and free in its minimal governing category. This would result in a contradiction unless the element had no governing category at all. Since Case is assigned under government, and overt NPs are subject to the Case Filter, a [+anaphoric, +pronominal] element can generally not be an overt NP. PRO is assumed to be the subject of infinitivals, which Chomsky(1981) argues is an ungoverned position.

John expected Bill [PRO to come].
John promised Bill [PRO to come].

PRO is not governed from outside its clause, since a maximal sentence boundary (S' -> COMP S) intervenes. Nor is it governed from inside the clause if we assume Tense to be the part of INFL which governs the subject. Thus PRO appears only as the subject of tenseless infinitival clauses. PRO acts differently from traces in a number of respects as well. Its antecedent has an independent theta-role. In fact, it need not even have an antecedent:

It's unclear [what [PRO to do]].

In this case, the reference of PRO is arbitrary. And PRO need not be c-commanded by its antecedent:

PROi to support himselfi, Johni got a job as a carpenter.

If PRO were not the subject of an infinitive, then the sentence:

Johni promised Mary [Sto make himselfi at home]

would constitute a violation of Binding Theory A, assuming S is the governing category for 'himself': 'himself' would be bound outside its governing category. But we do intuit that 'make oneself at home' has a subject in this sentence, and that subject is 'John'. GB therefore needs an empty category in the subject position of the infinitival, both to satisfy the Theta-Criterion, which requires that the intuited subject correspond to an NP in the sentence, and to satisfy Binding Theory A, which requires that 'himself' be bound inside its minimal governing category.

Unbounded Movement

Unbounded movement is almost always to a COMP position at the head of a clause or sentence. A question word can't move to a new A-position, because these positions are assigned a thematic role under government, which would conflict with the original thematic role assigned to the word and result in a violation of the Theta-Criterion, giving one NP two thematic roles. It has to move to an A'-position. Furthermore, it has to move to a c-commanding A'-position, which eliminates adjunct positions in higher clauses. This leaves COMP as the only available option. There can be several COMP positions in a sentence:

[S'[COMP ][SAdrian told Jeff[S'[COMPthat][SSam said[S'[COMP ][SWilliam ate the cookies]]]]]]

It is my understanding that, at least at a certain point in the development of GB, it had to be independently stipulated to which COMP position an element moved. In regular wh-movement, it moved to the highest COMP:

What did Adrian tell Jeff that Sam said William ate?

Whereas in other instances, it stopped at intermediate COMP positions:

One can never tell what William is going to do next.

In addition to the constraint on movement to COMP, unbounded movement has other constraints. The primary one of these is formulated in GB as Subjacency.

Bounding Theory

Subjacency - A'-binding may apply across at most one bounding node.

Bounding nodes are subject to parametric variation - that is, speaking loosely, they are language dependent. In English they are generally assumed to be NP and S, whereas in Norwegian and Italian, they are NP and S'. Subjacency is used to account for facts like the following:

a. [S'Whoi did [SJohn believe [S'ti that [SBill saw ti]]]]?
b. *[S'Whoi did [SJohn believe [NPthe claim [S'ti that [SBill saw ti]]]]]?

In (a), the direct relationship between 'who' and the lower trace would violate Subjacency, since it crosses two S nodes. Therefore a trace must be present in the lower COMP position to mediate the relationship between the lower trace and 'who'. In this manner, COMP acts as an 'escape hatch' for Subjacency: neither the relationship between 'who' and the intermediate trace, nor the relationship between the two traces, violates Subjacency. In (b), however, the relationship between the higher trace and 'who' crosses both an NP and an S node, and so still violates Subjacency; the grammar therefore marks the sentence as ungrammatical.

Subjacency is used to account for some of the facts previously accounted for by Ross'(1967) Island Constraints, including the Complex NP Constraint, which blocks movement out of the structure [NP[S'___]], and the Sentential Subject Constraint, which blocks movement out of a sentence in subject position.
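Subjacency amounts to counting bounding nodes on each link of a movement chain. A small sketch (hypothetical representation: each chain link is given as the list of category labels it crosses; the parametric bounding-node sets follow the text):

```python
# Subjacency: an A'-binding relation may cross at most one bounding node.
# Bounding nodes vary parametrically: NP and S in English; NP and S' in
# Norwegian and Italian, as stated above.
BOUNDING = {"English": {"NP", "S"}, "Italian": {"NP", "S'"}}

def subjacent(crossed_nodes, language="English"):
    """crossed_nodes: category labels crossed between a trace and its
    nearest binder. The link is licit iff at most one is a bounding node."""
    bounding = BOUNDING[language]
    return sum(1 for label in crossed_nodes if label in bounding) <= 1

# (a) Who_i did John believe [t_i that [Bill saw t_i]]?
#     Each link of the chain crosses only one S, via the COMP escape hatch:
print(subjacent(["S"]))        # lower trace to intermediate COMP: licit
print(subjacent(["S"]))        # intermediate COMP to 'who': licit

# (b) *Who_i did John believe [NP the claim [t_i that [Bill saw t_i]]]?
#     The upper link crosses both an NP and an S:
print(subjacent(["NP", "S"]))  # violates Subjacency
```

Note how the intermediate trace in COMP does the real work: it splits what would be a two-bounding-node crossing into two licit one-node hops, which is exactly the 'escape hatch' behavior described above.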

Cross-Over Phenomena

Consider the sentence:

John saw himself
-> *Whoi did Johni see ti?

in which 'John' and 'who' are coreferenced.

GB explains this violation thus: This trace is bound both from an A-position and from an A'-position. Therefore from the perspective of the Binding Theory it is both an anaphor and an R-expression, hence both A-bound and A-free, a contradiction. Thus the sentence cannot be assigned a structural description and is marked ungrammatical by the grammar.

As we can see from the above example, classifying wh-traces as R-expressions helps us handle the constraint on unbounded movement that disallows an NP from crossing over its antecedent. These facts were discussed in detail in Postal(1971). Wh-trace is generally assumed to be Case marked.

Limitations on Extraction from Subject Position

There is a type of government called proper government, which differs slightly from the government discussed above, under which Case is assigned and governing categories are formed. Only overt categories such as verbs and prepositions can properly govern their objects. INFL is not a proper governor. Basically what this means is that objects are properly governed, but subjects, generally speaking, are not. GB has a constraint that movement can take place only from properly governed positions:

Whoi did you think that Bill saw ti?
*Whoi did you think that ti saw Bill?

To express this fact, we have the Empty Category Principle (ECP):

Traces must be properly governed.

This means that movement can take place from subject position only if the subject is properly governed by some other element in the sentence, such as a matrix verb:

Who do you believe -G-> t married Bill?

An adjacent coindexed trace in COMP also counts as a proper governor:

Whoi did you tell Bill [S'[COMPti] -G-> [ti would come]]?

Sample Analysis

This is the core of GB. But there are hundreds of minor rules, constraints and considerations not discussed here. For example, COMP is held to have quite a complex structure allowing for two elements under certain conditions. More of the basic terms and definitions are included in the Appendix.

Let me analyze a particular construction to illustrate how this grammar is used. This is taken from Chomsky's (1982) analysis of 'parasitic gap' constructions. A parasitic gap construction is one in which an operator appears to bind two gaps:

a. Whati couldn't [SJohn [VPput ei down][PPwithout finishing ei]]?

Some of the properties of the parasitic gap construction (noticed by Engdahl(1983)) are:

1. Neither of the gaps c-commands the other:

b. *Whoi did ei [like ei]? (meaning Who liked himself?)

The theory outlined so far predicts that this should be the case. If one of the gaps were c-commanded by the other, it would be A-bound by that gap. However, a parasitic gap is also by definition an R-expression, since it is a variable bound to a wh-word. R-expressions by Binding Theory C must be A-free. Therefore neither of the gaps may c-command the other.

2. One of the gaps generally violates Subjacency and is replaceable by a pronoun (or removable altogether). The other gap does not violate Subjacency and cannot be replaced by a pronoun:

c. Whati couldn't [SJohn [VPput ei down][PPwithout [Sfinishing iti/his tea/ti]]]?
d. *Whati couldn't John put iti down without finishing ei?
e. Here is a mani [S'whoi [NPeveryone [S'whoj tj knows ti] likes ti]]

The wh-word can be moved from only one gap. The other gap (which is replaceable by a pronoun) is the base generated parasitic gap (i.e. it was already there in D-structure). Since Move-alpha is subject to Subjacency, only the trace left by Move-alpha needs to be subjacent to the wh-word. The parasitic gap is A-free, so it cannot be an anaphor (NP-trace). It is in a governed position, so it cannot be PRO. It is A'-bound, so it is not pro. Therefore it must be a variable (R-expression). Since variables are inherently non-referring, it must be bound to something in an A'-position, and the only available binder in (c) is 'what'. This explains why one wh-word can bind two gaps. Since the variable is base generated, it need not be Subjacent to its antecedent. However, (d) is ungrammatical, because the wh-word cannot be base generated in COMP (D-structure being a reflection of the lexicon). Therefore 'what' must be moved by Move-alpha from the only available gap. This movement violates Subjacency, so (d) is ungrammatical. Reasons which I won't go into are given why we can't just randomly base generate gaps that are not Subjacent to the operator... the parasitic gap has to license this base generated gap.
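The elimination reasoning just given - A-free, so not an anaphor; governed, so not PRO; A'-bound, so not pro; therefore a variable - is mechanical enough to state as a decision procedure. A sketch (hypothetical function name; it simplifies the functional definitions above, in particular ignoring the independent theta-role qualification on pro):

```python
# Functional determination of empty categories in GB: an empty NP's type
# is read off its binding configuration, not stipulated lexically.
def classify_empty_category(governed, a_bound, locally_a_bar_bound):
    """Classify an empty NP by the elimination argument in the text."""
    if a_bound:
        return "NP-trace (anaphor)"            # A-bound empty category
    if locally_a_bar_bound:
        return "wh-trace (variable)"           # A-free, locally A'-bound
    if not governed:
        return "PRO"                           # ungoverned position
    return "pro"                               # governed, free pronominal

# The parasitic gap in (c): A-free, in a governed position, A'-bound
# by 'what' - so it must be a variable:
print(classify_empty_category(governed=True, a_bound=False,
                              locally_a_bar_bound=True))
```

This is why one wh-word can bind both gaps in (c): since both empty categories come out as variables, both must be A'-bound, and 'what' is the only available A'-binder.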



GB consists of a number of assumptions like the Case Filter, which says that every overt NP must have Case. The basic facts of language are held to fall out from these assumptions. They are assumptions, because their truth or falsity cannot be tested directly and independently - they have to be argued for within the context of the whole grammar, and only their consequences are observable. The theory, which is made up of this big structure of assumptions, is held to have independent existence, probably in the mind, and a causal relationship is set up between the theory and the data. It is in this way that questions 1-4 are answered in GB. The analysis is not preoccupied with a mere characterization of the phenomenon, but with an explanation for it, achieved by showing that the existing grammar is compatible with the characterization.

The idea is that the grammar should tie this phenomenon in with a number of other apparently unrelated phenomena. So what matters is whether our grammar which explains the characterization is the right one, whether it ties in many apparently unrelated phenomena. Almost all GB papers argue on abstract grounds that given a broad characterization of a phenomenon, one or another analysis is preferable. For example, one paper might argue that modals are not verbs by listing ways in which they differ from verbs. And another paper might argue that they are verbs by listing ways in which they act the same as verbs. One might argue that S' is a bounding node for Subjacency, and another might argue it is not.

For example, a simple characterization of the above phenomenon of parasitic gaps would just state that in a case where one wh-word has two thematic roles, 1) the two relevant empty categories as defined by GB do not c-command each other, 2) one of the empty categories violates Subjacency relative to the wh-word, and 3) the empty category which violates Subjacency can optionally be replaced by a pronoun. This characterization, if correct, is sufficient to make all necessary predictions about wh-words with more than one thematic role. But the characterization has been made in the terminology or 'mindscape' of GB. And this mindscape represents the essence of GB which will be preserved in time. But if what matters is predictive power, we must ask why we then needed the grammar to analyze the characterization.

In the first place, I would like to make one thing quite clear... I never explain anything.
Mary Poppins

Observations Concerning Government and Binding Theory

Surface Grammar

I would like to introduce a different methodology and philosophical base from which to do linguistics. I will call it surface grammar. It is not a new idea by any means - just a reformulation of some rather ancient ones. This methodology will allow us to reformulate and 'clean up' GB without loss of predictive or explanatory power. Before I begin, let me mention that many of the thoughts presented in this section have been outlined in Dyvik(1982). Surface grammar is grammar which is subject to the Transparency Principle:

Transparency Principle - Every distinction or concept in grammar must either be intuitively obvious or defined in terms of elements that are intuitively obvious.

Intuitively Obvious - A distinction or element is intuitively obvious if everyone can agree on whether or not any given phenomenon represents an example of that distinction.

Informally, surface grammar is grammar in which all the principles are directly falsifiable. In this sense, it is purely descriptive of surface facts. A non-descriptive element in the grammar cannot be observed or intuited directly from the surface. Nor is it defined within the grammar in terms of anything intuitively obvious. Surface grammar is grammar with no non-descriptive elements.

In surface grammar, then, everything is as it seems. I will try to make clear that everything that does not fit the criteria of surface grammar is superfluous in that it cannot be shown to be true, it has no predictive power, and can never make the grammar smaller, only bigger.

In order to speak at all, of course, we have to use words or concepts that are either intuitively obvious or defined in terms of other concepts that are intuitively obvious. This means that the Transparency Principle only makes explicit the requirement that we are by necessity operating under anyway. Since GB can be understood at all, it can be reinterpreted surface grammatically by making its assumptions explicit. Non-descriptive elements can be thought of as assumptions and in surface grammar are reinterpreted as definitions.

The GB grammar just presented is not surface grammar. The Case Filter, for example, is not intuitively obvious. This is because we can't look at an English NP - even if it is overt and in a sentence - and see whether it has accusative Case. Accusative Case, unlike traditional accusative case, by definition cannot be seen, heard or intuited. The only way we know an NP has accusative Case is that the Case Filter said it had to: the Case Filter defined it to have accusative Case. A claim that an NP has abstract accusative Case cannot in principle be directly falsified.

Since Case is non-descriptive, every principle that refers to Case automatically becomes non-descriptive. Any grammar containing at least one non-descriptive element or principle is therefore also a non-surface grammar. Non-descriptive elements draw everything they touch into this domain of unfalsifiability.

This invisible structure which includes the Case Filter and which is held to cause the data is illusion. It is this we intend to clear away in our reinterpretation of GB. In a surface theory, there can be no such abstract semantic or syntactic representation. A sentence is the semantic representation of the thought it expresses. Since we have no verifiable test for what the meaning of a sentence is or what the structural description of a sentence is, we would have no direct way to verify whether some abstract representation is the right one. Thus whereas GB is a grammar, surface grammar is simply grammar. It is not a thing, but an activity.

Unlike GB, Surface Grammar neither tries to capture how we actually produce speech, nor is it concerned with what we know consciously or subconsciously. It is only concerned with description of the phenomenon of language. In trying to make predictions about the human mind using sentences as data, transformational grammar is confusing levels. I think Heidegger would say that sense can't be derived from facticity. Put another way, if we examine sentences, we will only be able to make predictions about sentences, but not about the mind. Surface grammar implies that the description of the data does not have independent existence, and does not cause the data.

The object of study is, after all, language, and not grammar. Grammar is only our conception of language. In studying grammar, we are studying the theory we just made up. We have used the data to determine the theory, and then considered the theory to be the cause of the data. To the extent that we are studying grammar, we are studying the contents of our own heads, and have thus divorced ourselves from reality. If grammar is seen as truly causing language, then it would make sense to think of it as an object of study. But as soon as you realize this grammar is hypothesized from beginning to end and that you have no means to establish that it exists, then it only makes sense to study what you can see, hear, and intuit directly.

When you just describe language as it is, you are forced to look at what generalizations really are there. You are forced to look at language, and not argue about what you think the causer of language might be like. You look at language, not at a concept you have about language. This places language again on the throne where it should be. The stance toward it is once again reverent.

Abstract Arguments and Proofs

Clearly we can never absolutely prove any positive generalization to be true, be it descriptive or not, because the data over which you make a generalization is infinite. (Herein lies the distinction between empirical science and mathematics. In mathematics, things can be proven true.) If you make the generalization that every time I drop my pencil from my desk, it will fall down, the day may come when it levitates or rises. Observable generalizations can be proven false by providing a counterexample.

Anything unobservable, however, can also never be proven false. You can in principle not prove the Case Filter true or false. It is defined in such a way that there can be no example or counterexample. Therefore we 'argue' abstractly on theory internal grounds for the existence of this abstract structure which causes the data. But we argue without ever having any hope, even in principle, of proving anything. That is, the best we can hope for in the unfalsifiable domain is to convince someone of the likelihood that some representation causes the data. It is a foregone conclusion that we will in principle never prove anything. In the falsifiable domain, however, we can verify whether a generalization holds for all known data.

Consider generalization S, which says that the sun rises in the east and sets in the west. If tomorrow it rises in the west and sets in the east, we will have to rewrite our theory of cosmology, since our old theory has a counterexample. The difference between generalization S and most of the principles of GB is that S is an observation, whereas most of GB is not. GB is made up of assumptions. This means that our evidence in favor of S is in the form of repeated observations (sunrises and sunsets). Surface grammar is grammar made up of observations. Our evidence in favor of GB is not an observation - it is in the form of linguistic arguments.

An argument can take any number of forms. For the most part, we argue by analogy:

Modals (will, can, must, may,...) behave differently from main verbs in the following ways... Therefore they are not main verbs.

These types of arguments are always concerned with the ontological status of some abstract concept. They are always about the reality or non-reality of certain aspects of this unobservable underlying structure. The question as to whether or not modals really are verbs, and whether there really are four abstract Cases in English, is the wrong question. The question is, "What definition of 'modal' and 'verb' will result in the most straightforward description of the phenomenon?" Once that is decided, it will be easy to decide whether our particular grammar defines modals as verbs or not. This may seem like an innocent enough imprecision, but its consequences are particularly pervasive and insidious.

We can also argue for underlying principles on other grounds. Subjacency is formulated as a constraint on Move-alpha. The fact that Subjacency holds implies that Move-alpha is a rule in the grammar. Again the concern is whether there really is Move-alpha or not. The argument is for the ontological status of our own theory. The structure of the grammar that we wrote begins to force us further and further into specific analyses of phenomena. In this case, we no longer merely describe what is the case - what constructions exist and how they are related semantically - it becomes critical that we describe it within the confines of a particular thought system.

If, as in surface grammar, every principle must be verifiable without reference to other principles, then your theory never forces you to say anything. At any moment you are then free to call it like you see it using any concepts that suit your purposes, because you are no longer building a theory. You are just describing what you see. Since you are studying language and not grammar, it is not your theory which has to be coherent. Rather, you are in the process of understanding the coherence in the phenomenon of language, and are free to use any means that work.

We can give descriptive evidence for or against things that we have not yet observed - things that are observable in principle. This is done by extending repeated observations into other domains, such as the future. For example, I can testify to the fact that the sun has risen in the east for the past 26 years. Thus I am predisposed to believe that it will also rise there tomorrow. The fact that all languages I know of have vowels is evidence in favor of all other languages having vowels. But we cannot argue descriptively for things which we in principle can never observe.

Deep and Observable Generalizations

The more general a principle is, the 'deeper' it is. 'Deep' in this context is sometimes understood as being further from the surface, i.e. less directly falsifiable, rather than simply capturing more facts. A generalization can be clearly descriptive, yet profound and general. For example, a more profound generalization than S is that the earth rotates toward the east in an elliptical path around the sun (generalization S'). Generalization S' can be said to 'explain' generalization S, as well as a number of other phenomena, such as the motion of the stars relative to us, winds, seasons, and so on. But still S' is descriptive. We can fly into outer space and watch the earth rotate. If it doesn't rotate as defined in S', we have falsified S'. S', unlike the Case Filter, can be directly falsified. And S' is not trivial.

If all statements in a theory are descriptive, then we have no need of linguistic arguments, only examples and counterexamples. That is, we need only data, not arguments to support or falsify a theory. If a sentence then looks like a counterexample to the Theta-Criterion, it is a counterexample. Notice that if every apparent counterexample to a theory is a real counterexample, the job of constructing a grammar becomes much more difficult. It becomes harder to find generalizations, because you actually have to go through reams of data and find correlations, and you can't get rid of even the smallest counterexample other than by rewriting everything from the ground up.


GB's stated goal is to explain language. Consequently, its purpose is not simply to characterize any given language, but to explain why the language is this way. The explanation comes in the form of a theory. If the underlying principles are true, then they explain the data by causing it to be the case.

Notice that surface grammar does not 'explain' things. All its 'explanations' are in terms of more general observations (in the sense that S' explains S). And this process of explanation is regressive and as infinite as the data we are describing. The result of this infinitely regressive descriptive process is a big description of the universe, nothing more.

Notice that a non-surface grammar also involves an infinitely regressive process of explanation. Binding Theory may be thought to explain coreference facts, but we still lack an explanation for Binding Theory, and so forth. The result of the infinitely regressive process of non-descriptive explanation is an invention of infinite explanations for explanations for why language is as it is. The explanations in a non-surface grammar stand in a causal relationship to the data. They are not simply more and more general descriptions of the same phenomenon. In surface grammar, there is no causation implied. All the extra invented explanation over and above the description is unnecessary to our understanding, because we have no way to show that it is true. Fortunately, GB does not only explain language. It also characterizes language before it explains it, and its characterizations are its essence.

Shorter Theories

True predictive power in a theory takes the form of observable correlations concerning observable data. Non-descriptive explanations explain no more than descriptive explanations of the same facts. Therefore, the only reason we might prefer a non-surface grammar to a surface one is if it is shorter or more elegant in some way. However, a non-surface grammar can never be shorter than corresponding observations in surface grammatical form. It has to do the description required of surface grammar, and then it has to explain the description.

Those parts of a grammar that are extraneous have two characteristics: 1) They are exclusively used to explain 'why', and 2) They are unobservable and unintuitable. By means of unobservable entities, we can in principle generalize anything to anything. So the value of a theory should not be judged in terms of its unifying principles, but in its compactness and falsifiability.

Otherwise, given two variants of a grammar, such that predictive power cannot distinguish them, the one we choose will depend primarily on what we use the grammar for. Some computer programs are smaller and slower, and some which do the same thing are larger and faster. Which you choose depends on whether size or processing time matters more for a particular application. But here we are already in the realm of engineering - application - and not scientific truth. That is, whether we use filters or generate things by means of rules is not a matter of Truth, but of what is more convenient for a particular practical application.

Notational Variants

Our problems would be solved if the data so restricted the set of descriptively adequate grammars so as to leave us with only one. But this is clearly impossible. Given one descriptively adequate grammar, we create others in one of two ways.

By the first method, we can rearrange it mathematically/combinatorially so as to produce another grammar which conforms to exactly the same output. Suppose F=ma is part of the "grammar" of physics. We can change the grammar but maintain the same output by replacing F=ma with a=F/m. These two grammars of physics are called notational variants in linguistics. In fact, any two grammars which are descriptively adequate over the same set of data must be notational variants of one another, that is, mathematically reducible to one another. Feynman(1967) points out that alternative formulations of a given phenomenon can give very different views of what is "really" going on, and depending on which we choose to work from, we will be predisposed differently toward further investigation.
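The F=ma example can be made concrete in a few lines of code. This is only an illustrative sketch: the two functions below are hypothetical stand-ins for the two "grammars", mathematically reducible to one another, so over any set of observations they license exactly the same output.

```python
# Two notational variants of the same "grammar" of physics.
# Each formulation is recoverable from the other, so they are
# descriptively equivalent over any set of observations.

def force(m, a):
    """Variant 1: F = ma."""
    return m * a

def accel(f, m):
    """Variant 2: a = F/m."""
    return f / m

# Over any observations, variant 2 recovers exactly what variant 1 encodes.
observations = [(2.0, 3.0), (5.0, 0.5), (1.5, 4.0)]  # (mass, acceleration)
for m, a in observations:
    assert abs(accel(force(m, a), m) - a) < 1e-12
print("the two variants agree on all observations")
```

Which variant we work from changes nothing about what is predicted - only, as Feynman notes, how we are predisposed to think about it.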

The other way of generating an infinite set of grammars producing the same output from any given descriptively adequate grammar is with the help of imagination and non-descriptive elements. Earlier we took unobservable elements out of parts of the grammar. We could also put them in. The following excerpt from Rogers(1960) describes the method. Faustus is arguing for the demon theory of friction:

You: I don't believe in demons.
Faustus: I do.
You: Anyway, I don't see how demons can make friction.
Faustus: They just stand in front of things and push them to stop them from moving.
You: I can't see any demons, even on the roughest table.
Faustus: They are too small. Also transparent.
You: But there is more friction on rough surfaces.
Faustus: More demons.
You: Oil helps.
Faustus: Oil drowns demons.
You: If I polish the table, there is less friction, and also the ball rolls farther.
Faustus: You are wiping the demons off. There are fewer to push.
You: A heavier ball experiences more friction.
Faustus: More demons push it, and it crushes their bones more.
You: I cannot feel them.
Faustus: Rub your finger along the table.

The demon theory of friction must still include the description that the standard theory of friction gives. It must still account for the fact that oil reduces friction, that heavier things move more slowly, etc. But it must also add a great deal of detail about the personal lives of the demons. That additional explanation cannot in principle make the theory shorter. And the same holds of a great many theories of science which contain elements that only explain without helping to describe. Linguistics is not alone in making this mistake. Rogers was writing for physics students... there are theories of physics that make the same mistake. Clearly what we are interested in is the set of descriptively adequate grammars which don't have demons.

Rules, Filters and Linguistic Processes

We distinguish between rules and filters in linguistics as means of expressing generalizations. Rules are generative by nature. They change something into something else. A filter is static. It simply accepts, rejects, or states a correlation that holds statically between two or more elements. The actual process of change in generation can not be observed to exist, but only defined to exist. Therefore in surface grammar, rules are reinterpreted as filters.
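The contrast can be sketched in code. The toy 'rule' below generates a new form from an old one, while the 'filter' merely accepts or rejects a form as it stands. The triple representation and the passive template are inventions for illustration, not any actual formalism.

```python
# A rule is generative: it maps one form onto another.
def passive_rule(sentence):
    subj, verb, obj = sentence            # toy clause: (subject, verb, object)
    return (obj, "was " + verb, "by " + subj)

# A filter is static: it accepts or rejects a form without changing it.
def subject_filter(sentence):
    return sentence[0] != ""              # reject subjectless clauses

active = ("Johann", "heard", "the music")
print(passive_rule(active))               # a new object: a process is implied
print(subject_filter(("", "heard", "the music")))  # rejected in place
```

The rule defines a process of change into existence; the filter only states a condition that either holds of the surface form or does not.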

In GB, rules map sentences from one level to another. In surface grammar there is no such mapping, so the only level of the grammar is the surface, hence the name. All of the principles or filters hold on this level. Only in this way can the grammar be completely falsifiable. Any levels deeper than the surface are not directly observable, and hence these levels are derived levels - they are defined out from the surface structure by the Transparency Principle, since only that which is on the surface (i.e. directly observable) is directly falsifiable.

Linguistic Frameworks

Surface grammar is something one practices, not something one writes. If one is creating a grammar which is held to really exist, rather than simply trying to come to an understanding of language, then someone must have the right grammar and someone must have the wrong one. If the grammars are unfalsifiable, there is no way to prove which is right and which is wrong. The result is war. There are the linguists who are creating GB, those creating Relational Grammar and Lexical Functional Grammar, and so on and so forth. If GB is right, the others must be wrong. One argues abstractly that one's grammar is more likely to be right, without in principle ever being able to prove it. This causes the battle to rage on and on.

However, if you are simply writing observations about language that can be directly falsified, then there can be no grounds for argument. If a counterexample to your generalization is found, the generalization is wrong. There is nothing to argue about.

That is to say, the unfalsifiable domain is divisive in nature, and the falsifiable realm, which I like to think of as the truth, is uniting in nature.

In the next two sections, I will try to strengthen the reader's intuition about what surface grammar is by means of two examples.

Example 1 - Einstein's Clock

Einstein is reported to have said that the universe is like a big clock, and the job of the scientist is to determine how it works. Suppose, in fact, that we have a big clock, but that we in principle can only see the face of it. Our job would be to find the simplest explanation of how the clock works.

We might begin by noticing that it is round, has numbers on the face and has three hands. Then we could notice that the second hand goes around 60 times for every once the minute hand goes around. Similarly, the hour hand goes around once for every twelve times the minute hand goes around, etc.

We could make several hypotheses:

1. There are three gears in back, each attached to a hand. The circumference of the hour hand gear is twelve times that of the minute hand gear, and that of the minute hand gear is sixty times that of the second hand gear. The circumference of each of these gears is attached to another gear that rotates at a constant speed.

2. The gears attached to the hands are all the same size, but the circumferences of the gears are attached to different gears which rotate at various speeds.

3. Every time the gear for the second hand goes around, it pushes the gear of the minute hand one sixtieth of a rotation, and every time the gear for the minute hand goes around, it pushes the gear of the hour hand one twelfth of a rotation.

We can make several observations about these hypotheses. First, they make different predictions. Hypotheses 1 and 2 predict that the hands will move continuously, while hypothesis 3 predicts that the minute hand will make a little jump each minute. Second, we observe that it would be difficult to construct and maintain a clock like 1 without some modifications. A clock like 2 at first looks easier to build, but then it is unclear how the hands come to move at different speeds.

In fact, it's not obvious how any of the gears move ultimately, just as it is not clear how anything is or moves at all.

Given this, we immediately see the solution to Einstein's clock. The simplest explanation of how the clock moves is simply a description of the face of the clock. It is not anything like 1, 2, or 3. The simplest explanation is that there is nothing on the back side of the clock. We can make all kinds of hypotheses about what is really making the clock run, but ultimately, we will still have no idea. And any hypotheses we make about the back of the clock based on other clocks we know must involve demons - gears we will never see.

So our explanation of how the clock runs is just a description of the clock. The clock can be described in many ways. We can say that the hour hand moves 1/12 as fast as the minute hand, or we can say that the minute hand moves 12 times as fast as the hour hand. Once a generalization of this kind has been made, we can use it to make predictions by assuming it will hold in the future. For example, we can use the clock to tell time. Generalizations make descriptions shorter and more complete, but they do not ultimately explain why the clock works. The best description is one that makes the greatest number of generalizations, and if a description misses a generalization, it isn't a full description. Some facts seem more significant than others. For example, it seems more significant to us that the hands move at proportional speeds than that there is a speck of dust over the number '2'. To the clock, however, these facts are equally relevant or irrelevant.
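The descriptive generalizations about the face can be written down directly, with no commitment to anything behind it. The sketch below assumes the hands start from 12:00:00; the speed ratios 1:60 and 1:12 are the whole "theory".

```python
# A description of the face of the clock: the second hand sweeps 6 degrees
# per second, the minute hand moves at 1/60 the second hand's speed, and
# the hour hand at 1/12 the minute hand's speed. Nothing here models the
# back of the clock, yet the description suffices to predict (tell time).

def hand_angles(seconds):
    """Degrees swept by the (hour, minute, second) hands after `seconds`."""
    second = (seconds * 6) % 360
    minute = (seconds / 60 * 6) % 360
    hour = (seconds / 3600 * 30) % 360
    return hour, minute, second

print(hand_angles(3600))   # after one hour the hour hand has moved 30 degrees
```

The description predicts every future position of the hands without saying a word about gears.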

In the case where we can turn the clock around, we would apply the same methodology as we did in determining how the face of the clock worked, but we would use the additional data presented by the mechanisms on the back. This additional data would allow us to make further predictions. We believe we can determine the structure of the gears on the back, even if we will never see them. In attempting this, we always create demons - illusion.

Example 2 - Kepler's Laws

Kepler formulated 3 laws of planetary motion based on the data collected by the Danish astronomer Tycho Brahe. It was later discovered that the laws didn't quite work: the orbit of Neptune was slightly irregular. However, this irregularity would be consistent with the laws if a planet of certain dimensions existed at a certain point. Astronomers turned their telescopes that way and, sure enough, found Pluto.

But what if they had found no planet? We could, of course, maintain Kepler's laws by demons, by inventing an invisible planet. What is its status? We would have no way to prove its existence. If we postulate that it exists, we have cut back on the predictive power of Kepler's laws: we would then have a principled way of maintaining them no matter how many exceptions we find. In such cases, we should simply document the fact as an exception. Kepler's laws simply don't hold. They may still be very useful, as are Newton's laws, even though Newton's laws have been shown to have exceptions which are observable at speeds approaching the speed of light.

So when we encounter counterexamples, we can proceed in one of two ways:

1) We can allow our laws to hold both of perceivable and invisible elements and postulate the existence of elements which are in principle imperceivable. (These invisible elements are those which can in principle neither be perceived nor intuited... we can, for example, make judgements about the meaning of sentences although we don't perceive meaning through our senses.)

2) We can simply list the counterexamples. Time may then show that these counterexamples can be classified nicely.

If we choose the first option, we replace an object by a concept, and confuse our theory with our data. That is, we are no longer clear about what our object of study is. Thus it is no coincidence that this kind of linguistics views grammar, and not language, as the object of study. Grammar is obviously our theory of language, no matter how much we insist that we've argued for it so beautifully that it must really exist out there. When we cut to the quick, it is still obviously a product of our own minds. It is not an object in the world. Linguists don't observe grammars. Linguists write grammars. Positing grammars as the object of study is an obvious case of bootstrapping.

We will either find a way or make one.

Reanalyzing Government and Binding Theory

Using this philosophical base, let me now examine the structures of GB outlined above in detail. The purpose of this analysis will be to show which parts of the grammar lie within the unfalsifiable realm, and to prepare the ground for reinterpreting GB into surface grammar in the last section.

The Theta-Criterion

The Theta-Criterion is a descriptive principle. It can be directly falsified. Basically it states that the number of NPs we find in a sentence exactly matches the number of thematic roles we intuit. We mentioned a number of counterexamples to the Theta-Criterion. Consider the inherent reflexives which constitute a surface grammatical counterexample:

1. Aaron washed. = Aaron washed Aaron.
2. Aaron ate. ≠ Aaron ate Aaron.

In cases like (1), both the subject and object thematic roles appear to be assigned to the subject position. Frequently such sentences are analyzed as having an empty category in the object position which receives the object thematic role:

3. Aaroni washed ei/himselfi.

This empty category is not observable. However, we treat it just like an ordinary observable NP. By giving an observable and an unobservable element the same status, we avoid a violation of the Theta-Criterion just like we did with Kepler's invisible planet. In avoiding a violation of the Theta-Criterion, we make the Theta-Criterion itself unfalsifiable, and therefore non-descriptive. We can no longer use sentences like (1) as evidence for or against the Theta-Criterion, because we have invented an unobservable way of making the counterexample not a counterexample. We now have a principled and untestable mechanism for handling every counterexample to the Theta-Criterion in which two or more thematic roles are assigned to a single argument position - we invent an invisible argument position and coindex it with the visible one.

The more we do this, the more impossible it will be to test the Theta-Criterion. We have defined the Theta-Criterion to be true by redefining NPs from what we always knew them to be to something which exists whenever the Theta-Criterion doesn't work. We then use these counterexamples as "abstract arguments" for the existence of the empty NPs in those constructions. We can in principle never show to be true what we define to be true.

The pattern we see, then, is this: A counterexample is found to a generalization. The terms of the generalization are redefined to create an entity some of whose members are abstract and some of whose members are not. By this redefinition and reanalysis of sentences, the generalization is made true again, and its status is changed from a prediction to a definition. And at this point, the theory is also mushed in with the data and simultaneously granted independent reality and the power to cause.

Once many of the present grammars have been rewritten as surface grammars, one finds that there are terms defined that are never put to surface grammatical or purely descriptive use. This is due to the process I just described in which predictions are turned into definitions by changing their status to handle counterexamples. We can see that when something that once was a generalization relating two phenomena is changed into a definition which defines a phenomenon to conform to the old generalization, then the new definition is not automatically used for anything within the grammar, and it can easily become what amounts to an unused variable... like in a computer program, a variable that is defined, but never used in any calculation. Formally, pure explanations always have that status. A definition that is never used can be eliminated with no loss of predictive power.
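The unused-variable analogy can be made literal. In the hypothetical sketch below, the flag `case_absorbed` is defined from the same information that already determines the prediction, and it is never consulted; deleting it changes no output.

```python
# A definition that is never used in any calculation adds no predictive
# power. 'case_absorbed' below stands in for a pure explanation: it is
# defined, but the prediction never consults it.

def predict(verb_is_passive):
    case_absorbed = verb_is_passive       # defined... and never used again
    if verb_is_passive:
        return "object surfaces as subject"
    return "object stays after the verb"

def predict_trimmed(verb_is_passive):
    if verb_is_passive:
        return "object surfaces as subject"
    return "object stays after the verb"

# Identical predictions over every input: the extra definition added nothing.
assert all(predict(v) == predict_trimmed(v) for v in (True, False))
```

Eliminating the unused definition leaves every prediction intact, which is exactly the point about formally pure explanations.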

Binding Theory, Empty Categories and Predication

Since empty categories are not overt and cannot be intuited directly, they cannot be axiomatic categories, but they must be defined. GB defines them functionally in terms of the environment in which they are found, which is in spirit compatible with the requirements of surface grammar.

What we are really trying to get at with empty categories is some understanding of the counterexamples listed above to the Theta-Criterion. What we want to understand is given that we intuit certain subjects and objects in a clause, how do we tell where they will actually occur in the sentence? What generalizations govern this mapping?

In the cases where there is an intuited object or subject which does not appear on the surface in the expected position next to its governor, we define an empty category in this position. This is a notation. If the empty category is in subject position, we are expressing that there is an intuited subject associated with this verb. If there is an empty category in object position, this is a notation expressing there is an intuited object associated with this verb or preposition. When the arguments don't actually occur in the predicted position, we ask what it is that determines where they do occur by postulating binding relations and constraints on movement. These generalizations are all stated in terms of syntactic constructions.

Williams(1980) and Hellan(1988) take a different tack. They suggest that it is a semantic property, namely predication, which determines which arguments of a governor occur where. A sentence is constructed around a base word, and then lots of descriptors are added to that word. Very informally, the subject is what is described, and the predicate describes it. So in the following examples, the bracketed constituents are predicated of 'the man'.

the man [in the park]
The man [walked in the park]
the man [eating ice cream]
the man [for the job]
the man [to help in a situation like this]
I considered the man [very helpful]

We could formulate a version of the Binding Theory based on predication which says that an anaphor is bound to the subject of the predicate it finds itself in. Hellan and Williams do not formulate it as flippantly as this, but my thoughts are inspired by theirs. If Binding Theory were stated purely in terms of predication, we would have no need to postulate PRO, for example, to make anaphoric binding appear to be limited to a syntactic constituent. Recall the examples:

Billi promised Mary [to take care of himselfi]
Billi promised Mary [PROi to take care of himselfi]

PRO makes it appear that the binding occurs in the syntactic category rather than in the predication. By postulating this invisible planet, PRO, we have blinded ourselves to what is really going on, namely that binding is occurring not inside a syntactic category - the governing category - but inside a semantic category - the predication. (The fact that the 'explanation' shifted from syntax - purely formal structures - to semantics is not a fluke. It will happen every time, because the theories that we try to preserve from destruction by positing the existence of undetectable planets or empty categories will always refer to form (syntax) rather than content (semantics).) Also, the fact that an antecedent must c-command its anaphor would follow from the fact that subjects in general appear next to their predicates.

Looked at another way, GB asks the right questions, but gives illusory answers. GB asked the question, "What are the limitations on the location of antecedents for anaphors like 'himself' and 'each other'?" And it answers the question artificially in terms of syntactic constructions using empty categories to make it appear that anaphoric binding stays within a syntactic constituent. But the right answer is not syntactic at all, and by inventing PRO to preserve the Theta-Criterion from having a counterexample, we have made the Theta-Criterion unfalsifiable and blinded ourselves to the fact that the thing driving restrictions on coreference has nothing to do with syntax, but with a semantic phenomenon - predication.

Put another way, postulating empty categories like PRO makes it possible to formulate the Binding Theory in terms of syntactic categories. Since we have a formulation that pretty much accounts for the facts, we think we have found the answer, and are thus blinded to the fact that we are not looking for what is really going on. If we disallow elements like PRO, we are forced into a different formulation of the facts, and only then do we see that it is a semantic category, the predication, and not a syntactic category, which defines the domain for anaphora.


The most basic English data that the idea of NP-movement is expected to address is the following: (A star in front of a sentence indicates that it is ungrammatical.)

1. Johann heard the music.
2. The music was heard (by Johann).
3. *The music hears.
4. *It hears the music. (an empty or pleonastic it, as in 'It's cold'.)
5. *Hears the music.
6. Johann bakes the bread.
7. The bread is baked (by Johann).
8. The bread bakes.
9. *It bakes the bread.
10. *Bakes the bread.

An account for this within GB runs as follows. Assume we have the following constraint:

A. The Case Filter - Every overt NP must have Case.

Assume also the following process occurs:

B. Case Absorption - Passive morphology, as in (2) and (7) and unaccusatives (8) (as well as some other morphological processes not mentioned here like middles, ergatives and some causatives) absorb accusative Case.

Assume finally:

C. Accusative Case - is assigned to the right in English by verbs and prepositions to their objects.

Then in sentences (1) and (2), the verb 'hear' assigns accusative Case to 'the music'. In (2), passive morphology absorbs the Case. Then the direct object 'the music' is left in an un-Case-marked position, violating the Case Filter:

11. *Was heard the music.

Assume that we have the rule:

D. Move-alpha - moves any constituent anywhere (subject to other constraints)

The constituent 'the music' is then free to move to any Case-marked position. The only Case-marked position in (11) is the subject:

E. Nominative Case - is assigned to the position governed by AGR (usually the subject)
F. AGR - is the agreement element found under the node INFL.

The reason AGR assigns nominative Case is that nominative case appears to correlate with verb agreement in the languages of the world. A-F account for the grammaticality of (1), (2), (6), (7), and (8). However, the object-to-subject movement violates a principle of the grammar which has been instituted for independent reasons:

G. Theta-Criterion - Every argument is assigned to exactly one thematic role and every thematic role is assigned to exactly one argument.

When 'the music' moves to the subject position, it carries its object thematic role. Subject thematic role is assigned independently to the subject position. The subject position then gets two thematic roles, a violation of the Theta-Criterion. However, the passive and unaccusative sentences in (2), (7) and (8) are grammatical, so we expect no violations. The following generalization, attributed to Burzio accounts for this:

H. Burzio's Generalization - A thematic role is assigned to the subject if and only if Case is assigned to the object.

For now, assume H is formally a part of the grammar. Then not only are passives and unaccusatives accounted for, but sentences (4), (5), (9), and (10) are also marked ungrammatical, as they should be. In these sentences, the Theta-Criterion requires an NP with a subject theta-role, and in none of them do we find the required subject. Thus the sentences are marked ungrammatical by the Theta-Criterion together with the requirement:

I. English sentences must have a subject.
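The interaction of principles A-I can be mimicked, very loosely, as a set of static checks over hand-invented features. The encoding below (verb forms, a pleonastic 'it', an object-presence flag) is a drastic simplification made up for illustration; it is not GB machinery, only a demonstration that the predictions over sentences (1)-(10) can be computed from a few such statements.

```python
# Toy checks loosely mirroring the principles above:
#   I:   every English sentence needs an overt subject
#   B/H: passives and unaccusatives absorb object Case, so (by Burzio's
#        Generalization) no thematic role is assigned to subject position
#   G:   every role needs a real argument, and every argument a role

def grammatical(subject, verb_form, object_present):
    if subject is None:                            # I: no subject at all
        return False
    assigns_subject_role = verb_form == "active"   # B + H
    if assigns_subject_role and subject == "it":
        return False                               # G: pleonastic 'it' bears no role
    if verb_form == "active" and not object_present:
        return False                               # G: the object role goes unassigned
    return True

assert grammatical("Johann", "active", True)             # (1), (6)
assert grammatical("the music", "passive", False)        # (2), (7)
assert grammatical("the bread", "unaccusative", False)   # (8)
assert not grammatical("it", "active", True)             # (4), (9)
assert not grammatical(None, "active", True)             # (10)
```

Note that the checks are all filters over observable features; nothing in the sketch moves anything anywhere.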

H did have to be added to the grammar solely to account for this set of facts, but all of the other principles are used in conjunction with other parts of the grammar to account for other types of phenomena. Furthermore, these other principles are simple and believable.

The intuition behind GB, as I see it, is that there is a set of very simple and broad principles which don't have to refer to specific constructions, but whose interaction accounts gracefully for the phenomena in the language. This is a large advance over the previous incarnation, transformational grammar, which depended on masses of smaller rules. TG was in turn a big advance over earlier work. Transformations caused us to think in terms of syntactic constructions and the relationships between them. In this way many new constructions were observed and analyzed, and we could for the first time think in terms of mappings of arguments to syntactic positions. The degree of detailed investigation into syntactic phenomena that resulted from the transformational tradition has also leaked over into phonology, morphology and semantics.

In the above analysis, GB asks the question, "What characteristic is common to those constructions in which the semantic object appears in subject position?" Or more generally, "What are the constraints governing the mapping of thematic roles to argument positions?"

There are a finite number of morphological and lexical processes which involve 'object-to-subject movement'. Examples have been given of passives and unaccusatives. There are also middles, so-called ergatives, inherent reflexives, and some others. All of these processes have one observable feature in common - they all have object thematic role assigned to subject position - call this characteristic X. They differ semantically and morphologically, and to some degree also syntactically. The question we ask is whether they also have other characteristic(s) in common from which we can generalize and create an explanation for characteristic X.

GB answers by saying that the additional characteristic all of these constructions have in common is [+Case-absorption]. Then it is argued that all we need to say about these verbs is that they absorb Case, and we don't need to mention characteristic X in the grammar - it 'falls out' from Case-absorption and all these other principles.

Surface grammar says the opposite. It is Case-absorption, not characteristic X that is unnecessary in the grammar. The Case Filter is not a generalization relating two observable linguistic phenomena. It is simply a definition of abstract Case. GB grammarians do not want to see the Case Filter as a definition, but as a fact in some abstract realm whose existence can be deduced logically from symmetries in the verifiable realm. The unverifiable realm is what psychologists call 'postdictive'. Once you have the data, it provides you with a story for why it is that way, but it has no predictive power.

We could, for example, replace [+Case-absorption] with [+NP-movement]. In fact, we want to remove Case-absorption from the grammar, because we cannot verify whether it is true or false. We can look at a passive sentence and see that it has characteristic X, but we cannot verify that the passive verb absorbs Case or case or anything else. Case-absorption is an unobservable element which we have invented so that it will seem to us that characteristic X has fallen out from a deeper principle. The whole analysis involving Case-absorption explains and predicts no more than the fact itself - than characteristic X, that lots of times the object appears in subject position. The statement 'The object appears in the subject position in the following cases: passive, ergative, unaccusative,..." has just as much predictive power as that whole explanatory apparatus, and it explains just as much. When it comes down to it, that's still all we really know... that in passive, ergative, etc., the object appears in subject position. The rest is illusion.

Consider what kind of evidence we would have to look for in order to show whether Case-absorption does or does not occur in conjunction with a given verb. We can't use any morphological or lexical process in which the object doesn't appear in subject position, because those are already defined as not having Case-absorption. We can't use any morphological or lexical process in which the object does appear in subject position, because those are already defined as undergoing Case-absorption. So Case-absorption does not explain anything. It is simply defined to be the case exactly for verbs with characteristic X when they have characteristic X.

Now we can see in another way that non-descriptive elements in a grammar are superfluous. The non-descriptive explanation of Case-absorption adds no predictive power to the grammar. So by postulating it, we have added nothing aside from the phenomenon itself. Nor have we explained anything. Suppose nonetheless it is 'true'. And then suppose that we find a semantic property Z which does indeed correlate completely with all instances of characteristic X. In surface grammar, we simply state this correlation directly, but in the non-surface grammar, we are still obliged to state this correlation, because it is a real fact, and we have to add Case-absorption besides, because we have decided it is a 'fact' in our abstract world. Case-absorption and other non-observable elements can only make the grammar longer without adding any information.

Case-absorption is the illusion which hides the fact that we haven't found the real, observable characteristic which all verbs with characteristic X do or don't have in common. That characteristic, once found, will turn out to be semantic in nature, not syntactic. By building an involved grammar around Case-absorption, we become preoccupied with the illusion, which prevents us from looking for the real reason why objects sometimes appear in subject position. To find the real reason is much harder than to postulate [+Case-absorption]. You have to go through all possible verbs in all possible constructions with and without characteristic X and look for some (undoubtedly) semantic property which they have in common, and then test that it really holds of them and no others.

Beth Levin(1991) and others have done just this kind of work. She has given the beginning of the real answer to the question posed by the generative tradition. The stage was also set for her by others such as Joan Bresnan, whose work allowed for a more general view of the mappings of thematic roles to argument positions. Levin gives evidence that there are indeed semantic properties of verbs which determine the mapping of thematic roles to argument positions. She lists verbs with common semantic properties and shows that they also have common syntactic properties.

For example, the class of scribble verbs, which includes carve, chalk, copy, doodle, draw, paint, pencil,... is transitive and allows for a location to be mentioned:

The jeweller printed the name (on the ring).

There is a class of verbs which involve impressing an image on something. They include applique, emboss, embroider, stamp,... They are also transitive and allow for a location to be mentioned:

Smith inscribed his name on the ring.

The image impression class also allows for the following alternation, which the scribble class does not:

Smith inscribed the ring with his name.
*The jeweller printed the ring with his name.

Levin(1991) lists thousands of such semantic/syntactic correlations.


The phenomena that Subjacency is meant to account for in GB are roughly the same phenomena that were accounted for by two of Ross' Island Constraints in Transformational Grammar: the Complex NP Constraint and the Sentential Subject Constraint. These constraints listed classes of constructions from which unbounded movement is impossible, at least in English. A statement of and examples of each of these constraints follows:

Complex NP Constraint: (related to an earlier constraint proposed by Klima)

No element contained in a sentence dominated by a noun phrase with a lexical head noun may be moved out of that noun phrase by a transformation. [NP[S...]]


I read a statement which is about that man.
*Who did I read a statement which is about?

Sentential Subject Constraint: No element dominated by an S may be moved out of that S if that node S is dominated by an NP which is itself immediately dominated by S. [S[NP...[S...]]]


That the principal would fire some teacher was expected by the reporters.
*Who that the principal would fire was expected by the reporters?

In both cases, the construction which is resistant to movement has NPs and Ss immediately dominating one another. Subjacency defines NP, S and S' as bounding nodes, depending on the language, and states that A'-binding may apply across at most one bounding node. There are many apparent counterexamples to Subjacency:

John thought [S Mary said [S Bill hoped [S Allison would eat the ice cream]]]
What_i did John think Mary said Bill hoped Allison would eat t_i?

The latter sentence appears to allow 'what_i' and 't_i' to be coreferenced across three bounding nodes. In order to remedy the situation, GB proposes that there is invisible structure, with intermediate traces which intervene in this relationship:

John thought [S' [S Mary said [S' [S Bill hoped [S' [S Allison would eat the ice cream]]]]]]
What_i did John think [S' t_i [S Mary said [S' t_i [S Bill hoped [S' t_i [S Allison would eat t_i]]]]]]?

Since S' is not a bounding node in English, we no longer have a violation of Subjacency. We can't do the same trick in complex NP and sentential subject constructions, because the COMP node, where the trace is in the above sentences, is already filled. The relative pronoun 'which' or 'that' is held to count as filling the COMP node, because it is coreferenced to something and therefore has lexical content, but the complementizer 'that' does not, so a trace can share a COMP node with the 'that' which is not a relative pronoun:

What_i did John think [S' t_i that [S Mary said [S' t_i that [S Bill hoped [S' t_i that [S Allison would eat t_i]]]]]]?

Subjacency does make somewhat different predictions from the abovementioned Island Constraints, and an S' node is defined to be present in any full clause. What has happened is that two existing constraints, which were well defined in terms of surface structure, were replaced by another, namely Subjacency, which would also be well defined were it not for the intervention of invisible structure. The counterexamples to Subjacency were eliminated by postulating invisible structure, so Subjacency could no longer be tested directly. If Subjacency is tested without reference to invisible structure, it is wrong, with counterexamples like those mentioned above. Removing invisible structure forces us back, for the time being, into a characterization much like the one originally formulated by Ross. Moreover, the real observation about the limitations on movement was Ross' in the first place. Reformulating it as Subjacency makes it appear as if he didn't get it fully, when just the opposite is true. Subjacency adds no predictive power over Ross' constraints and only clouds the issue as to what the empirically testable constraints are. In a sense, Ross' observations are stolen from him by this mechanism of invisible structure, and not through a clearer understanding of the phenomenon or through testable generalizations. This is one of many examples of why the field of linguistics has become so political and fractured into warring camps. People get upset when their thunder is stolen in this way.
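The logic of the argument can be made concrete. The sketch below is my own toy rendering, not GB's formal machinery: it encodes the Subjacency statement above (a single movement step may cross at most one bounding node, with NP and S bounding in English and S' not) as a simple count over node labels, and shows how the same surface sentence passes or fails depending on whether invisible intermediate traces are assumed.

```python
# Toy check of Subjacency as stated above (my own sketch, with
# hypothetical data structures): NP and S are bounding nodes in
# English; S' is not.
BOUNDING = {"NP", "S"}

def bounding_nodes_crossed(step_labels):
    """Count bounding nodes along the path of one movement step."""
    return sum(1 for label in step_labels if label in BOUNDING)

def obeys_subjacency(steps):
    """Each step is the list of node labels separating a trace from
    the position it moved to. Subjacency holds iff every step
    crosses at most one bounding node."""
    return all(bounding_nodes_crossed(step) <= 1 for step in steps)

# Without intermediate traces, 'what_i' binds 't_i' across three
# S nodes in one step -- an apparent Subjacency violation:
one_step = [["S", "S", "S"]]

# With an invisible trace in each (non-bounding) S'/COMP position,
# movement proceeds in three steps, each crossing only one S node:
successive = [["S"], ["S"], ["S"]]
```

The point of the example is exactly the methodological worry in the text: the sentence itself is the same in both cases, and which input list you feed the checker is decided by the invisible structure you postulate, not by any observable fact. The constraint is saved by choosing the representation, which is what makes it untestable.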



I first outlined the basics of Government and Binding Theory. I showed that this was not a grammar which consisted of observations which could be immediately falsified by the data, but rather that it consisted of a number of assumptions which were viewed as causing the data. This resulted directly from viewing the object of linguistics as the creation of a grammar rather than the understanding of language.

In the next section I discussed why a grammar consisting exclusively of directly falsifiable observations was preferable to the type of grammar of which GB is an example, and called this type of grammar a surface grammar. Since a surface grammar requires only what is required for communication in any case, it followed that any grammar could be reinterpreted as a surface grammar. In the process of this reinterpretation, the 'grammar' as an object melts away. It becomes a mass of descriptive statements about language, and the coherence is seen to be in language, not in the grammar. In this way, we become more aware of the mental landscape which our terminology has created and asked us to operate within, and the terminology is seen to be relative. Thus surface grammar is not a thing, but a methodology. Practicing surface grammar means practicing description of the phenomenon of language, rather than practicing explanation of the phenomenon.

In the last section, I applied these thoughts directly to GB and showed how portions of the grammar could be eliminated without loss of explanatory adequacy or predictive power. I showed that these extraneous parts could be viewed as illusory answers to questions implicitly posed by the structure of the grammar, and gave a couple of examples of real answers which have been found to the questions GB implicitly posed. I showed that by disallowing invisible elements in the grammar, one was forced into alternative analyses. That is, invisible elements in a grammar afforded a principled way of changing counterexamples into examples. This made the generalization in question unverifiable, and changed its status from a generalization with predictive power to a definition with no predictive power.


There are a number of sources which have influenced me and which I would like to cite. Many of these issues were raised in Dyvik (1982). In particular, Dyvik discusses the object of study as grammar vs. language, issues of falsifiability within GB, the ontological status of theoretical constructs, the infinitely regressive nature of explanation, changing the status of a statement in the grammar from a generalization to a definition, and others. He recognizes that the issue is one of correct interpretation of the results, and that his objections do not undermine the entire pursuit.

Rudolf Botha (1979, 1981) also has a lengthy discussion of Chomskian mentalism and methods. His thoughts are very clearly written and overlap to a very high degree with those presented here. He does not completely abandon mentalism as I do, believing that the result would be a purely taxonomic rather than an analytic theory of linguistics, but his work has also been very important to me. Quine (1972) long ago pointed out that you don't have to build grammars to do linguistics. Mel'chuk (1988) and Chvany (1996) have expressed similar sentiments on explanation. Ross (1986) helped me get a feel for the stance toward language that arises when you practice surface grammar. Bresnan (1978) and Lightfoot (1980) express some similar thoughts regarding falsifiability. Lawler (1980) draws an analogy with proscriptive linguistics, calling the practice of illusion 'proscriptive metalinguistics'. It is this which arises when you take the theory more seriously than the data. It seems to me that de Saussure's understanding of structuralism has reappeared in linguistics today in a different form under the name of connectionism. In general, current work of Ross, Lawler, Langacker, Levin and Lakoff, among others, tends to be consistent with the principles presented here, although I have not yet encountered specific references to these issues in their work.

Issues similar to these have been addressed in other fields as well. Popper (1965) has outlined a philosophical base which is largely consistent with what is expressed here. As mentioned above, I was helped particularly by some of Feynman's work for beginning physicists (Feynman (1963), (1967), (1985)). In biology, there are thoughts related to structuralism in linguistics: Varela and Maturana (1970), Gould (1981), and Bateson (1972) have written work which is not directly related to this, but which nonetheless influenced me. Furthermore, some understanding of mathematics and computer programming gave me a familiarity with the extent to which the same thing can be represented with very different formalisms, and helped me take formalisms per se less seriously.



Bateson, G. (1972) Steps to an Ecology of Mind, Ballantine Books, New York.
Botha, R. (1979) "Methodological Bases of a Progressive Mentalism", Stellenbosch Papers in Linguistics, #3.
Botha, R. (1981) "On the Galilean Style of Linguistic Inquiry", Stellenbosch Papers in Linguistics, #7.
Bresnan, J.(1978) "A Realistic Transformational Grammar" in Halle, Bresnan and Miller Linguistic Theory and Psychological Reality, MIT Press.
Bresnan, J.(1982) "Control and Complementation" Linguistic Inquiry 13.
Burzio, L. (1981) Intransitive Verbs and Italian Auxiliaries, MIT PhD dissertation.
Chomsky, N. (1957) Syntactic Structures, Mouton, The Hague.
Chomsky, N. (1965) Aspects of the Theory of Syntax, MIT Press.
Chomsky, N. (1981) Lectures on Government and Binding, Foris Publications, Dordrecht.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, MIT Press.
Chomsky, N. and Lasnik, H. (1977) "Filters and Control", Linguistic Inquiry 8.3.
Chvany, C. V. (1996) "Explain and Explain", in Olga Yokoyama and Emily Klenin (eds.), Selected Essays of Catherine V. Chvany, p. 62.
Dyvik, H. (1982) "Epistemological Problems in the 'Government and Binding' Programme", University of Bergen Department of Linguistics and Phonetics, ms.
Engdahl, E. (1983) "Parasitic Gaps", Linguistics and Philosophy.
Feynman, R., Leighton, R. and Sands, M. (1963) The Feynman Lectures on Physics, Addison-Wesley Publishing Company.
Feynman, R. (1967) The Character of Physical Law, MIT Press.
Feynman, R. (1985) QED, Princeton University Press.
Fiengo, R. (1977) "On Trace Theory" Linguistic Inquiry 8.1.
Fillmore, C.J. (1968) "The Case for Case", in Bach and Harms, 1-88.
Gould, S. J. (1981) The Mismeasure of Man, W. W. Norton and Company, New York, London.
Hellan, L. (1983) Anaphora in Norwegian and the Theory of Binding, Foris Publications.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar, MIT.
Jackendoff, R. (1977) X'-Syntax: A Study of Phrase Structure, Linguistic Inquiry Monograph 2, MIT.
Kuhn, T. (1962) The Structure of Scientific Revolutions, University of Chicago Press.
Lawler, J. (1980) "Remarks on [J. Ross on [G. Lakoff on Cognitive Grammar [and Metaphors]]]" in Kac (ed) Current Syntactic Theories, Indiana University Linguistics Club.
Levin, B. (1991) English Verb Classes and Alternations: A Preliminary Investigation, ms.
Lightfoot, D. (1980) Principles of Diachronic Syntax, Cambridge University Press.
Mel'chuk, I. (1988) Dependency Syntax: Theory and Practice, State University of New York Press.
Popper, K. (1965) The Logic of Scientific Discovery, Harper and Row, New York.
Postal, P. (1971) Crossover Phenomena, Holt, Rinehart and Winston.
Quine, W. V. (1972), "Methodological Reflections on Current Linguistic Theory", Semantics of Natural Language, D. Davidson and G. Harman, eds., Reidel Publishing, Dordrecht.
Reinhart, T. (1976) The Syntactic Domain of Anaphora, MIT PhD dissertation.
Rogers, E. (1960) Physics for the Inquiring Mind, Princeton University Press.
Ross, J.R. (1967) Constraints on Variables in Syntax, MIT PhD dissertation.
Ross, J.R. (1986) "Languages as Poems", from D. Tannen and J. Alatis (eds.) Georgetown Roundtable 1985 Language and Linguistics and the Interdependence of Theory, Data and Application, Georgetown University Press, Washington, D.C..
de Saussure, F. (1913, 1972) Cours de linguistique générale, Éditions Payot.
Varela, F. and Maturana, H. (1970) "Mechanism and Biological Explanation", _________, 378-382.
Williams, E. (1980) "Predication", Linguistic Inquiry 11.1.
Wow-mann, G. "Superduperstuff of the Universe", Institute of Innerspace/Outerspace Interfarce, ms!
