‘W’ Considered Harmful

Not the magazine and not even the former president. But the letter ‘W’ itself. The letter ‘W’, 23rd in the English alphabet, is unique in two ways: it is the only letter whose name is more than one syllable, and also the only letter whose name doesn’t include the sound it makes.

The fact that ‘W’ takes 3 syllables to say bothers me. Even Wikipedia’s entry on ‘W’ points out, twice, that the abbreviation www requires nine syllables to say. Crazy. So I wondered, how often is it the case that words that start with W (hereafter W-words) have fewer syllables than the letter W (double-yew)?

Syllabification in general is a hard problem in English, but fortunately I don’t have to solve it. The Carnegie Mellon University (CMU) Pronouncing Dictionary provides the pronunciations for over 125,000 words. I say pronunciations, plural, because words can be pronounced in a variety of different ways (e.g. fire can be pronounced to rhyme with higher, or in a single syllable). Only 41 W-words in the CMU dict have pronunciations with different numbers of syllables (e.g. warrior). Using the CMU Pronouncing Dictionary, it’s possible to count syllables in a word in a short (if cryptic) Python function, courtesy of Jordan Boyd-Graber (I found it on the nltk-users Google group):

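from nltk.corpus import cmudict  # needs the corpus: nltk.download('cmudict')

dictionary = cmudict.dict()  # Get the CMU Pronouncing Dictionary

def nsyl(entry):
    """Return the max syllable count in the case of multiple pronunciations."""
    # Vowel phonemes end in a stress digit (0, 1, or 2); each one marks a syllable.
    return max(len([s for s in p if s[-1].isdigit()]) for p in entry)

# Example: look up a word's entry (keys are lowercase) and count its syllables.
word = "warrior"
print(nsyl(dictionary[word.lower()]))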
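To make the stress-digit trick less cryptic, here is a minimal sketch that runs without nltk. The two entries are typed by hand in the CMU dictionary's format for illustration and may not match the real dictionary exactly; `sample` and `syllable_counts` are names made up for this example.

```python
# Hand-typed entries in the CMU dictionary's format (an assumption for
# illustration, not loaded from nltk). Each pronunciation is a list of
# phonemes, and every vowel phoneme ends in a stress digit: 0, 1, or 2.
sample = {
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
    "web": [["W", "EH1", "B"]],
}

def syllable_counts(entry):
    """Return the syllable count of each pronunciation in an entry."""
    return [sum(1 for phoneme in pron if phoneme[-1].isdigit()) for pron in entry]

fire_counts = syllable_counts(sample["fire"])
print(fire_counts)       # one count per pronunciation
print(max(fire_counts))  # the value a max-based counter would report
```

Taking the max is a conservative choice for this question: a word only counts as shorter than "double-yew" if even its longest pronunciation is.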

So, now that we’ve got a syllable counter, let’s get all the W-words in the CMU dictionary, and see what the syllable distribution looks like.

import pylab

src = "The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]"
# Map each W-word to its syllable count.
w_words = {w: nsyl(entry) for (w, entry) in dictionary.items() if w.startswith('w')}

# One bin per integer syllable count; align='left' centers each bar on its integer.
counts = list(w_words.values())
pylab.hist(counts, bins=range(1, max(counts) + 2), align='left')
fig = pylab.gcf()
pylab.xlabel("Number of Syllables")
pylab.ylabel("Count")
pylab.title("Number of Syllables in {:,} words starting with 'W'".format(len(w_words)))
pylab.figtext(0.99, 0.01, 'Data Source: ' + src, ha='right', color=fig.get_edgecolor())
pylab.savefig('w.png', facecolor=fig.get_facecolor())

# W-words with more than 3 syllables: abbreviating these to "W" actually saves syllables.
worth_abbreviating = [(w, n) for (w, n) in w_words.items() if n > 3]

[Figure: histogram, "Number of Syllables in words starting with 'W'"]

Only 101 W-words in the CMU dictionary (of 3,805 total W-words) have more than 3 syllables. That’s about 2.7%. Here’s a sampling of the words where using W to abbreviate them actually saves syllables: wagnerian, wallpapering, washingtonians, weatherperson, workaholic. So, by all means, call a meeting of the Wagnerian Wallpapering Workaholic Weatherperson Washingtonians the WWWWW. It will save time. Otherwise, consider not using an abbreviation. Or looking for synonyms.
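For the curious, the savings on WWWWW can be worked out in a few lines. The per-word syllable counts below are hand-counted assumptions for illustration, not lookups from the CMU dictionary, and the variable names are made up for this sketch.

```python
W_NAME_SYLLABLES = 3  # "double-yew"

# Hand-counted syllables per word (assumed for illustration, not CMU lookups).
phrase = {
    "wagnerian": 4,
    "wallpapering": 4,
    "workaholic": 4,
    "weatherperson": 4,
    "washingtonians": 5,
}

spoken_phrase = sum(phrase.values())            # saying every word in full
spoken_abbrev = W_NAME_SYLLABLES * len(phrase)  # saying "W" once per word
saved = spoken_phrase - spoken_abbrev

print(spoken_phrase, spoken_abbrev, saved)
```

Under those counts the abbreviation comes out ahead, but only because every word in the phrase is longer than the letter that stands for it.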

Suggestions for further work:

Take the data from the Google Ngram Viewer and count syllables for W-words, using nltk’s ~90%-accurate syllabification code for words that may not be in the CMU dictionary.
Get a list of acronyms (maybe from NetLingo) and see how many of them require more syllables to say than the phrase they stand for.
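As a starting point for the acronym idea, here is a rough sketch. It assumes acronyms are spelled out letter by letter, and it estimates phrase syllables with a crude vowel-run heuristic of my own (not nltk's syllabifier and not the CMU dictionary), so treat its answers as approximate. All function names here are hypothetical.

```python
import re

# Every English letter name is one syllable, except W ("double-yew").
LETTER_SYLLABLES = {"w": 3}

def spoken_letters(acronym):
    """Syllables needed to spell an acronym aloud, letter by letter."""
    return sum(LETTER_SYLLABLES.get(ch, 1) for ch in acronym.lower() if ch.isalpha())

def rough_syllables(word):
    """Crude estimate: count runs of vowels, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def acronym_costs_more(acronym, phrase):
    """True if spelling the acronym takes more syllables than the phrase."""
    phrase_syllables = sum(rough_syllables(w) for w in phrase.split())
    return spoken_letters(acronym) > phrase_syllables

print(acronym_costs_more("www", "world wide web"))
print(acronym_costs_more("FBI", "Federal Bureau of Investigation"))
```

Swapping `rough_syllables` for a CMU dictionary lookup, with the heuristic kept only as a fallback for missing words, would likely give more trustworthy counts.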

_{Originally published: Mon, 28 Feb 2011 to https://runningwithdata.tumblr.com/post/3576752158}