Comprised of

I have just read an article about an extremely dedicated and efficient WikiGnome who has edited tens of thousands of Wikipedia articles to fix a single grammatical error – “is comprised of”.

https://medium.com/backchannel/meet-the-ultimate-wikignome-10508842caad

My initial reaction was that the phrase was perfectly acceptable and the journalist’s dread of having published work that included the phrase was overblown.

I checked a dictionary, which lists examples of the phrase:

http://dictionary.cambridge.org/dictionary/british/comprise

It is sufficiently common that it doesn’t jar.

However, the Wikipedian’s own explanation of why he opposes the phrase is logically sound:

https://en.wikipedia.org/wiki/User:Giraffedata/comprised_of

As is this note from the dictionary:

http://dictionary.cambridge.org/grammar/british-grammar/consist-comprise-or-compose

These sentences are standard English:

“Air consists of nitrogen, oxygen and other gases.”

“Air is composed of nitrogen, oxygen and other gases.”

“Air comprises nitrogen, oxygen and other gases.”

Sentences of this sort are commonly accepted but, I now accept, illogical:

“Air is comprised of nitrogen, oxygen and other gases.”

The phrase “is comprised of” merges the correct uses of consist and compose. It attempts to mean the same thing in the passive and active voice. It’s similar to “irregardless” being used to mean “regardless”: an opposite with the same meaning.

Whether we should use “is comprised of” is a contentious question, and many people feel that it’s fine. However, why do we instantly reject this sentence?

“Air is consisted of nitrogen, oxygen and other gases.”

This sentence is logically equivalent to one with “is comprised of”. Searching for the phrase “is consisted of” in the English Wikipedia turns up only 69 instances, mainly in user talk pages rather than the actual articles:

https://www.google.co.uk/#q=site:en.wikipedia.org+%22is+consisted+of%22

It seems likely that our acceptance of sentences as grammatical or not is based more on recognition and familiarity than on a strict logical evaluation of the words.

Finding Reversed Words

Following on from coding experiments with using the decorator pattern to reverse strings in Java and with palindromes and string reversal in C#, I decided to search for the words in the English language that are the reverse of other words. The Devil lived, but Black Sabbath gave us Live Evil. To beat Sega takes ages. You’re upping the ante living near Etna.

I had a copy of a dictionary file from Ubuntu on my hard drive. To make life a little simpler, I used some PowerShell magic to remove the many words that end in “’s” (there are no words that start with “s’”) and to make the list all lower case:

PS M:data> Get-Content .\british-english.csv | ?{ $_ -NotMatch "'s$" } | %{ $_.ToLower() } | Sort-Object | Get-Unique | Set-Content british-english-without-apostrophe-s.txt

The simplest algorithm for finding words that are the reverse of other words is to simply go through the whole list and then check whether any of the other words are the reverse of that word. To do this, we need a method to test if two strings are a reversed pair. Pushing the characters of the first string onto a stack and then popping them off as you check them against each character of the second string can achieve this:

public bool AreReversedPair(string firstString, string secondString)
{
    if (firstString.Length != secondString.Length)
    {
        return false;
    }

    // Stack<char> (System.Collections.Generic) pops the characters in reverse order.
    var firstStringStack = new Stack<char>();

    foreach (var c in firstString)
    {
        firstStringStack.Push(c);
    }

    foreach (var currentCharFromSecond in secondString)
    {
        var currentCharFromFirst = firstStringStack.Pop();

        if (currentCharFromFirst != currentCharFromSecond)
        {
            return false;
        }
    }

    return true;
}

Next, the code for finding the pairs:

public IEnumerable<ReversedWordPair> FindReversedWords(IQueryable<string> allWords)
{
    var reversedWords = new LinkedList<ReversedWordPair>();
    var reversedStringChecker = new StackReversedStringChecker();

    foreach (var firstWord in allWords)
    {
        foreach (var secondWord in allWords)
        {
            if (reversedStringChecker.AreReversedPair(firstWord, secondWord))
            {
                reversedWords.AddLast(new ReversedWordPair(firstWord, secondWord));
            }
        }
    }

    return reversedWords;
}

This code is very straightforward, but not fast at all. The time complexity of the algorithm is O(n^2) because for each word, we need to check every other word. For a list with almost 73,000 words, this takes a long time. In spite of this, it does finish. I left the program running, went to dinner and found a list waiting for me when I came back.

reversed-word-pairs.txt

As the list of words is not going to change very often and I only need to run the algorithm once, I could just leave the program as it is. However, it would be nice to reduce the time complexity of the algorithm. The simplest way to do this would be to reverse each word and then check whether the list of all the words contains the reversed word (readers who are familiar with .NET may have noticed that I passed the list of words as an IQueryable rather than an IEnumerable). Depending on the data structure, checking whether the reversed string is a member of the set of words should take less than O(n) time. For example, retrieval from a trie takes O(m), where m is the length of the string. This would leave us with O(nm) time overall. As with all optimisations, you’ve got to time the code to see whether you get a boost or not.
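As a rough sketch of the reverse-and-look-up idea, the version below uses a HashSet rather than a trie, since an average-case hash lookup also avoids scanning the whole list. It is not the code from the repository: the class name HashSetReversedWordFinder and the tuple return type are made up for illustration, and it assumes a plain word list where reversing the characters one by one is safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original project.
public static class HashSetReversedWordFinder
{
    // Reverses the UTF-16 characters of a word. This is fine for a simple
    // lower-case dictionary but would mangle surrogate pairs.
    public static string Reverse(string word)
    {
        var chars = word.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Reverses each word once and checks whether the result is also a word.
    public static List<(string First, string Second)> FindReversedWords(
        IEnumerable<string> allWords)
    {
        // Building the set also removes any duplicate words.
        var wordSet = new HashSet<string>(allWords, StringComparer.Ordinal);
        var pairs = new List<(string First, string Second)>();

        foreach (var word in wordSet)
        {
            var reversed = Reverse(word);

            if (wordSet.Contains(reversed))
            {
                pairs.Add((word, reversed));
            }
        }

        return pairs;
    }
}
```

Like the nested-loop version, this reports each pair in both orders and pairs every palindrome with itself. Whether it is actually faster on the real list is something only timing it will tell.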

My next task is to come up with a meaningful sentence with all words the reverse of the corresponding word at the other end of the sentence, with a palindrome at the centre. There must be something that can be made of Dennis, who sinned in the straw, only to get warts to stun his nuts.

See if you can make a mirror sentence of your own.

The complete code, along with some tests, can be found at:

https://github.com/robert-impey/CodingExperiments/tree/master/C-Sharp/FindReversedWords

New TEFL Site

I’ve put together a few pages for a TEFL materials site:

http://tefl.impey.info/

My aim for this site is to be able to produce materials for TEFL lessons more quickly. The first page that I’ve put up generates materials for a missing information game:

http://tefl.impey.info/TEFL_FindTheWordsInCommonGameHTMLPage

I’ve been playing this game for a few weeks in the classroom, but I have grown tired of writing out the cards using MS Word.

As always, I’ve written the site using the Haddock CMS. It’s the first site to make use of the Sky theme plug-in:

http://haddock-cms.googlecode.com/svn/plug-ins/public-html-sky-theme/trunk/

The aim of theme plug-ins is to be able to make giving a style to a site simply a case of checking out a plug-in and then getting the HTML page class to extend a class in the theme plug-in directory.

It’s also the first site to make use of the new “Site Texts” plug-in:

http://haddock-cms.googlecode.com/svn/plug-ins/site-texts/trunk/

This separates all texts from the code of the project. The texts are saved in files in a separate folder to the project-specific code. At the moment they need to be created and edited by hand, but a web interface in the admin section may follow.

Large Heads

As an English teacher, I have to be careful about the things that I say to my students. A few days ago, one of my co-teachers told me that one of our students would not be coming to school that day, because he was ill. Normal enough at this time of year, I thought. She went on to say that his mother had mentioned to her that the student did not want to come to school, because I had said something about him having a big face. I insisted that I had said no such thing and that I had never commented in any way on a single student’s appearance and never would.

I taught the class, without the student, a little concerned about how I could have upset him. The other students seemed happy enough. As I finished the class, I worked out the problem and went down to the staff room to explain what had happened to my co-teacher so that she could relay it to the student’s mother in Korean.

In a previous lesson, I had tried to encourage the students to complete a lengthy homework assignment by telling them that they were very clever. As they were quite young (8 years old), they did not understand the word “clever”. To try to explain the word, I had drawn a quick cartoon on the board of a balding professor with an enormous, domed cranium bursting at the seams with brains. The students seemed to understand at the time, but I guess that kids sometimes get the wrong end of the stick, and this one had thought that I was calling him and the rest of the class mutants.

I hope that we have fun in the lesson tomorrow.

Snippets of Latin

I’m currently reading Edmund Burke’s Reflections on the Revolution in France.

I find the text very interesting and feel that a lot of the content pertains to open source software. Repeatedly, Burke argues for gradual improvements to existing systems that have been constructed over time and shaped by actual demands, rather than sudden revolutions led by men of ideas but little experience. At its best, the open source movement offers software that has evolved in this gradual way, with bugs eroded over time. Something that has attracted me to free software more and more recently is that it can’t be taken off the market. Microsoft have stopped selling XP, sort of. I don’t want to be forced to move to Vista. The designers of the original UNIX probably never thought that the OS would still be in such widespread use in the 21st century, but here we are. Are there better systems? No doubt. Do I want to “upgrade” all my servers to anything else? Not on your nelly.

Of course, many in the open source movement might see themselves more like the revolutionaries, sweeping away a corrupt monopoly and replacing that with a free utopia. Reality doesn’t reflect that. Where the advocates of free software have presented themselves this way, they have succeeded the least.

A problem that I have encountered whilst reading has been translating the frequent quotations in Latin. Although I studied Latin for six years at school, I can’t remember much more than how to parrot “Bellum, Bellum, Bellum”. This is a typical problem of a language education focussed almost exclusively on syntax. I cut and paste the sentences into Google, but more often than not the only results returned are other copies of “Reflections” (of which there are plenty).

Does anyone know of a good repository of Latin quotations? It could make quite an interesting CRUD page (e.g. quotation, original text, original author, texts in which it appears, possible translations, votes for translations and so on). But I don’t want to build such a page as I don’t know enough Latin and I’m sure that something similar must exist already.

Common Language Errors

I recently started working as an EFL teacher in Seoul, South Korea. I like this work a lot and I am planning to work as a teacher for the foreseeable future.

However, you can’t go from being a web programmer to another career and leave everything behind. You see possible computer programs wherever you go. I can see the need for Haddock CMS (or RoR or whatever…) projects all over the place. My new employer’s time-tabling system and vocabulary database would be a lot simpler if they were web-based systems rather than being built on MS Access and Excel.

Another project that I thought might be quite interesting to start is one to track common written language errors. From marking book reports and tests, I’ve seen quite a few common errors in language already. For example, “The mother was see the child” and variations along those lines are extremely common. I don’t know enough about the Korean language to be able to say why so many of my students should make that mistake but my assumption is that there is probably a grammatical structure similar to that in the students’ native tongue.

If I get time, I would like to start a project that allows EFL teachers to log these sorts of errors on a web site. There are EFL teachers in every corner of the world now and they are a growing online community. It would be interesting to see how a student’s native language affects his production of English. This might help teachers to decide on which areas of language to focus their classes.

Does programming turn you into a grammar Nazi?

Recently, I had an exchange on 43things.com about the use of the passive voice.

Something that I love about the internet is that it allows you to find people who are up for a discussion of the most arcane things. I can never talk about grammar with my friends or family, which is perhaps quite healthy and for the best.

What’s more, because of the setting, you have a chance to check your facts and develop your responses in a way that is impossible in real life. Thank you Google and Wikipedia.

One of the points of this discussion was that the passive voice can obfuscate the meaning and is less simple than the active voice and should, therefore, be avoided. I disagree with both reasons. A sentence in the passive shouldn’t be much of a challenge for any native speaker of English.

But I thought about this and wondered about the effect that spending so many hours a week programming has on the linguistic centres of a brain.

Consider a fragment of code like this:

my_div.appendChild(document.createTextNode(my_var.get_foo()));

A translation of this into English might be:

Take the value returned by the get_foo method of the my_var object and, using the createTextNode method of the document object, generate a TextNode object for that value. Append this object to the my_div object using the appendChild method.

Any programmer might deal with hundreds of lines similar to this on any given day. Trust me (if you’re not a JavaScript programmer), this is simple stuff.

I dislike the fussiness and delusions of superiority that go with correcting other people’s language. I saw a group on Facebook recently for people who always carry red pens so that they can correct menus which contain grammatical and spelling mistakes. Not only is this rude (and possibly xenophobic in some of the situations listed by the group), it also misunderstands the point of language and its evolution. Communication is the aim of speech and writing. I’m very much opposed to the idea of linguistic prescription. To quote a line usually attributed to dear old Winston Churchill, it’s “the sort of bloody nonsense up with which I will not put.”

However, programming languages (for very good reasons) do have strict rules of syntax.

Consider:

my_div.appendChild(document.createTextNode(my_var.get_foo());

Hopefully, any programmers reading this will have spotted the mistake more or less instantly.
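For anyone who didn’t spot it, the second line is missing a closing parenthesis. A toy check like the one below, written in C# to match the earlier code on this blog, shows how mechanical this kind of rule is. It is only a sketch: ParenthesisChecker is a made-up name, and a real parser does far more (this ignores brackets, string literals and comments).

```csharp
// Hypothetical helper for illustration only; not how a real compiler works.
public static class ParenthesisChecker
{
    // Returns true when every '(' is closed by a later ')'
    // and no ')' appears before its matching '('.
    public static bool IsBalanced(string code)
    {
        var depth = 0;

        foreach (var c in code)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;

                if (depth < 0)
                {
                    return false; // closed more than we opened
                }
            }
        }

        return depth == 0; // anything left open is an error
    }
}
```

The first fragment passes this check and the second fails it, with one parenthesis still open at the end of the line.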

Just by force of habit, programmers tend to end up being fairly precise when it comes to language. Or perhaps, it’s that people who are precise about language end up working as programmers. I just hope that bashing my mind against compilers for years won’t give me any silly ideas about how human languages work.

I also have to admit that writing about grammar scares me. I hope that I haven’t made any speling mistakes and that the passive voice hasn’t been used too much.