How I (almost) won an argument using corpus linguistics

I’m taking a class called “Morphology and Syntax.” It’s currently the third week, and the class has been engaging so far. In the first week, I got into an interesting debate with another student.

The argument was about the usefulness of an intrusion test to determine phrasal boundaries (1). He argued that testing where an adverb can be placed within a sentence is not a completely reliable way to determine the boundaries of a phrase because, although it is incorrect according to a prescriptive grammar, some discourse strategies allow intrusion in a phrasal verb (2). In principle, I agree with him. I feel strongly that how people actually use language should supersede any prescriptive grammar. The question then becomes “How do people actually use the language?” and this is where corpus linguistics comes in.

The examples my classmate used are below:

(A) The cat will eat its, I suspect, lamb chops …
(B) John rang, almost certainly, up his accountant.
(C) John rang, almost certainly, his accountant up.

I left (A) alone in the debate that followed.

Then I turned to COCA (Corpus of Contemporary American English) and discovered after an hour of messing around that I had absolutely no idea how to do what I wanted to do. So I did what any rational tweep would do and turned to our friendly neighborhood #TESOLgeek and #corpusexpert, @muranava.

This led to some more research on corpus search commands, and also to an answer to my question:

Most commonly, a direct object is inserted between ‘rang’ and ‘up’. Results 9 and 10 look strange. Judging by the context (“her laughter rang shrill up and down the river”), the phrasal verb is not involved in result 10. Result 9 is a mystery: “I rang turn up on the field phone.”

Next I searched for adverbs:

This image shows that adverbs appear before ‘rang up’. This was not surprising. Next I checked to see if people ever use adverbs within the phrasal verb:

Unsurprisingly, I came up empty.
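
For anyone who wants to try a similar check on their own plain-text data, here is a minimal Python sketch. It is not how COCA works and it does not use COCA's query syntax; it just uses a regular expression to pull out whatever appears between ‘rang’ and ‘up’. The sample sentences and the names SAMPLE, PATTERN, and intervening_words are assumptions for illustration: the first two fragments are the ones quoted above, and the rest are invented.

```python
import re

# Hypothetical mini-sample, NOT real COCA output. The first two fragments are
# quoted in this post; the remaining three are invented for illustration.
SAMPLE = [
    "her laughter rang shrill up and down the river",
    "I rang turn up on the field phone.",
    "She rang the doctor up before lunch.",
    "He rang up his accountant.",
    "They rang him up twice.",
]

# 'rang', then zero to three intervening words (as few as possible), then 'up'.
PATTERN = re.compile(r"\brang((?:\s+\w+){0,3}?)\s+up\b", re.IGNORECASE)


def intervening_words(sentence):
    """Return the words between 'rang' and 'up', or None if there is no match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.group(1).split()


for sentence in SAMPLE:
    print(intervening_words(sentence), "<-", sentence)
```

Because there is no part-of-speech tagging here, the sketch cannot tell an adverb from a noun. That is the part a tagged corpus does for you.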

Of course there are problems with my informal research. For one thing, COCA is limited to American English, and the phrasal verb ‘rang up’ is perhaps not very common in American English (‘rang up’ has only 46 entries, compared to 549 for ‘called up’). Incidentally, I did the same research for ‘called up’ and discovered that ‘called back up’ has a couple of entries.

A second consideration is that COCA’s results, while drawn from a variety of genres, are nevertheless limited. My classmate suggested that a search of a non-dialect-specific corpus might give better results. Ah, if only the BNC were free! The random 50 entries for ‘rang _ up’ in the BNC all insert direct objects, and I cannot do a more advanced search from the “simple search” page.

In the end, my classmate conceded that his specific example might not have been the best one, but his point still held that intrusion tests are not sufficient to determine phrasal boundaries – and I agreed.

The argument ended when the professor chimed in to say that wherever this sort of interruption might occur within a phrasal verb, one is also likely to find disfluency features like throat-clearing or false starts. His post made it clear that he personally finds the intrusion plausible.

Now the more I think about it, the more I realize that I was being a bit of a prescriptivist myself. I was trying to use a descriptive resource to prove my prescriptive point. It didn’t work quite the way I hoped it would, but I learned a few things!

From this experience I learned a lot about the power and limitations of corpus linguistics. I see the value in having access to how people actually use English as an alternative to the prescriptive approach (how we should use it). I also see the limitations, since corpora are not exactly a cross-section of English use yet.

I also started to learn how to search a corpus (two, in fact; thanks to Mura for getting me started!). These are tools I want to use more often.


Have you ever used corpora? How do you use them?
Feedback of all kinds is appreciated. 



Notes on terminology and/or grammar:
(1) A bit more about phrases and intrusion tests:
Phrases include, for example, noun phrases (“the big red dog”), prepositional phrases (“through the garden”), etc. There are tests for determining where the boundaries of a phrase are. The intrusion test says that an adverb (for instance) can only be inserted at the boundaries of a phrase.
So in the sentence, “The big red dog ran through the garden,” we can say
“Quickly the big red dog ran through the garden”
and “The big red dog quickly ran through the garden”
and “The big red dog ran quickly through the garden”
and “The big red dog ran through the garden quickly.”
But we can’t say “*The quickly big red dog ran through the garden”
or “*The big red dog ran through quickly the garden.”
This shows that “the big red dog” and “through the garden” are phrases.
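
The prediction the intrusion test makes can be sketched in a few lines of Python. This is only an illustration: the bracketing in PHRASES is entered by hand to match the example above (with “ran” treated as its own unit), and the names PHRASES, boundary_positions, and insert_adverb are hypothetical. The code marks an insertion with * when it falls inside a phrase; it encodes the test's prediction and does not judge real acceptability.

```python
# Hand-entered bracketing of the example sentence (an assumption, not parser output).
PHRASES = [["The", "big", "red", "dog"], ["ran"], ["through", "the", "garden"]]


def boundary_positions(phrases):
    """Word indices at the sentence edges and between consecutive phrases."""
    positions = [0]
    for phrase in phrases:
        positions.append(positions[-1] + len(phrase))
    return positions


def insert_adverb(phrases, adverb, position):
    """Insert the adverb before the word at `position`.

    The result is starred if the position is inside a phrase rather than
    at a phrase boundary.
    """
    words = [word for phrase in phrases for word in phrase]
    variant = " ".join(words[:position] + [adverb] + words[position:])
    if position in boundary_positions(phrases):
        return variant
    return "*" + variant


# Try every possible slot, from before the first word to after the last.
total_words = sum(len(phrase) for phrase in PHRASES)
for position in range(total_words + 1):
    print(insert_adverb(PHRASES, "quickly", position))
```

The unstarred lines correspond to the four acceptable sentences listed above, and the starred lines include the two unacceptable ones.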

(2) Verbs often stand alone, but phrasal verbs do not. Phrasal verbs, like “rang up,” “called up,” and “tried out,” are verbs that are inseparable from their particles.
The only thing that can be inserted between the verb and particle is a noun phrase containing the direct object, so “rang the doctor up” or “called the speedy taxi driver up” or “tried the new piano out” are all okay.
On the other hand, “*rang quickly the doctor up” and “*called spontaneously the speedy taxi driver up” and “*tried yesterday the new piano out” – not so much. Thus the intrusion test once again shows that these verbs are phrases.


