Last year I was very lucky to be invited to try out ELT writing. I was extremely grateful for the opportunity, and for the people who believed in me enough to give me the chance. I marvel at how lucky I am (and what amazing friends I have!).
So anyway, I joined a project writing reading texts for a middle school book. I started learning immediately.
I was given an Excel file with a vocabulary list: words in white were level 1, words in green were level 0, and words in yellow were level 2. I was asked to use 70% of the headwords from the white list. I was also given a grammar point to include, along with a specified format, topic, and word count.
Now I’m a bit of a tech-dunce, but not a technophobe, and I saw a couple of problems.
1) The words were all mixed together (arranged alphabetically and not separated by color). How on earth was I going to compare a 200-word text with the vocabulary list without painfully going through it word by word? Particularly since words like ‘I’, ‘a/an’, and ‘the’ are on the green list!
2) They weren’t all lemmas! Some words appeared on the list in multiple inflected forms; others in only one. But ‘headwords’, they said, so I assumed inflection would be okay.
What I needed was a way to compare the texts with the word lists. And before I could do that, I needed distinct word lists.
Did you know Excel can sort by color? That’s the first thing I learned. This website explains how to do it very well. But because of the way the Excel file was set up, I had to do it column by column. Each column was a letter of the alphabet, so that meant sorting 26 times and then grabbing the words from each level and putting them into new sheets.
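For anyone facing the same column-by-column slog, the separation step can also be scripted. Here’s a minimal sketch in Python (pure Python, simulating the spreadsheet as (word, color) pairs; with the real file you’d read the cell fill colors using a library like openpyxl — the color names and level mapping below just mirror the ones in this post):

```python
# Hypothetical example: separate a mixed vocabulary list into per-level lists
# by cell color (white = level 1, green = level 0, yellow = level 2).
from collections import defaultdict

COLOR_TO_LEVEL = {"green": 0, "white": 1, "yellow": 2}

def split_by_color(cells):
    """cells: iterable of (word, color) pairs, e.g. read from a spreadsheet."""
    levels = defaultdict(list)
    for word, color in cells:
        levels[COLOR_TO_LEVEL[color]].append(word)
    # Sort each level's words alphabetically for easy comparison later.
    return {lvl: sorted(words) for lvl, words in levels.items()}

cells = [("the", "green"), ("apple", "white"), ("astronaut", "yellow"),
         ("a", "green"), ("banana", "white")]
print(split_by_color(cells))
# {0: ['a', 'the'], 1: ['apple', 'banana'], 2: ['astronaut']}
```

One pass over the whole sheet replaces all 26 rounds of sorting, since the script doesn’t care which column a word came from.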
I already knew about some vocabulary tools. Lextutor, for instance, can compare a passage with the General Service List and tell you how difficult it is (by showing which words appear in the first 1,000 or 2,000 high-frequency words). I needed something that worked a little differently: I needed to compare against the lists I’d been given, not the GSL. Was there something that could do that?
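The comparison I was after can be sketched in a few lines: tokenize a passage and report what share of its tokens appear on a given list. (A toy illustration only, not how Lextutor or AntWordProfiler actually work — real profilers also handle lemmas and word families, which is exactly the wrinkle from problem 2.)

```python
import re

def coverage(text, word_list):
    """Fraction of tokens in `text` found in `word_list` (case-insensitive)."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    known = sum(1 for t in tokens if t in word_list)
    return known / len(tokens) if tokens else 0.0

level1 = {"i", "a", "the", "like", "apples"}
print(coverage("I like the apples", level1))      # 1.0
print(coverage("I like the astronauts", level1))  # 0.75
```

Swap in any word list you like — which is precisely what the GSL-based tools wouldn’t let me do.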
To find the answer, I took to Twitter. Costas Gabrielatos came to my aid right away. He is a corpus linguistics expert and a really helpful person. He introduced me to AntConc and showed me how to make a corpus out of the texts I had and compare them with the Excel file to find out how many times each word appeared in which text.
I may have mentioned that I’m a bit of a tech dunce. Even with the screenshots of how this would look and what it could do, I couldn’t really understand how it would solve my problem. Reading his suggestions again, I see now that he was solving my problem very neatly. But at the time, I didn’t get it.
Luckily, there was a simpler way. Mura Nava came to my rescue with a patient, dunce-level explanation: “Have you tried AntWordProfiler? That’s exactly what it does.” So off I went to the AntWordProfiler website to watch the helpful video tutorials. This was exactly what I needed.
Now, AntWordProfiler comes with GSL 1 and 2 and the AWL already installed. I had my own word lists to compare against, though, and needed to replace them. Fortunately, Mura solved that problem for me, too. He directed me to his Google+ community on corpus linguistics, and to a post about how to deal with specialized or technical vocabulary. His post showed how to extract the off-list words into an Excel file and from there turn them into a txt file to add alongside the GSL files. I already had Excel files, so I just used the latter part of the process. Once my word lists were loaded, I deleted the GSL files.
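Getting a column of words into a plain-text list file mostly boils down to writing one word per line in a UTF-8 txt file (I’m assuming the simplest level-list layout here; AntWordProfiler’s lists can also group word families, which this sketch ignores). A small sketch, assuming the words have already been pulled out of Excel into a Python list:

```python
import os
import tempfile
from pathlib import Path

def write_word_list(words, path):
    """Write one word per line, lowercased and de-duplicated, keeping order."""
    seen = set()
    lines = []
    for w in words:
        w = w.strip().lower()
        if w and w not in seen:
            seen.add(w)
            lines.append(w)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)

# Hypothetical filename; point this wherever your profiler expects its lists.
out = os.path.join(tempfile.gettempdir(), "level1.txt")
n = write_word_list(["The", "a", "apple", "the "], out)
print(n)  # 3
```

The de-duplication matters because, as noted above, the original list mixed several forms of the same word.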
Finally, I put my reading passages into txt files and ran the program. It worked.
I made adjustments to make my texts closer to 70% on the second list, and felt very techy indeed. Problem solved. I proudly sent in my first five passages and waited for feedback.
And anyone who has worked in this field can probably predict what happened next.
“Please consider the difficulty of the passages,” I was told. “They should be easier than level 2.”
I wish I could say that that’s when I figured out that ‘headwords’ to them meant the 7~10 vocabulary items they would highlight and pull out of the text, but I actually only figured that out just now, reflecting back 8 months later. So they meant 70% of those 7~10 words, not 70% of the whole text. AntWordProfiler would still be useful, but maybe I should have stuck with the GSL.
On the plus side, now I know how to sort by color in Excel and how to use AntWordProfiler, and I can start learning AntConc. And I think that’s pretty cool. 🙂