David Abel


Silly Game 2.0: The Madison Property


I tend to enjoy games that explore the structure of language -- I've previously written about a little game I called "word shift", where the goal is to find the longest English word such that each "rotation" of the word is also a word (more on that here). Here's another game I've been having fun thinking about: consider the "Madison" property.

Definition (Madison Property): The Madison property is said to obtain of a word \(w\) with respect to a corpus \(C\) if there exist \(n > 2\) subwords \(w_1, \ldots, w_n\) such that \(w_1 \circ w_2 \circ \ldots \circ w_n = w\) and \(\forall i \in [1:n] : w_i \in C\).

So, the Madison property says that a given word consists of other words concatenated together, where each subword is also in the corpus. I call it the Madison property since the word "Madison" is a nice example: "mad", "i", and "son" are all likely to be in most English corpora (and it's the word that got me thinking about this -- based on a street sign in Portland, ME).
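To make the definition concrete, here is a minimal sketch of a checker in Python. It is not necessarily how my own code does it: the function names `max_parts` and `has_madison_property` are made up for illustration, and the sketch assumes the corpus is a set of lowercase strings. It uses a small dynamic program to find the largest number of corpus words that concatenate to the given word, then compares that count against the \(n > 2\) requirement.

```python
def max_parts(word, corpus):
    """Largest number of corpus words that concatenate to `word`, or -1 if none.

    best[i] holds the best split count for the prefix word[:i].
    """
    best = [-1] * (len(word) + 1)
    best[0] = 0  # the empty prefix needs zero subwords
    for end in range(1, len(word) + 1):
        for start in range(end):
            # Extend any valid split of word[:start] with the piece word[start:end].
            if best[start] >= 0 and word[start:end] in corpus:
                best[end] = max(best[end], best[start] + 1)
    return best[len(word)]


def has_madison_property(word, corpus, min_parts=3):
    """True if `word` splits into at least `min_parts` corpus words (n > 2 by default)."""
    return max_parts(word.lower(), corpus) >= min_parts


# Assumed toy corpus, lowercase, just for illustration.
toy_corpus = {"mad", "i", "son", "madison"}
print(has_madison_property("Madison", toy_corpus))
print(has_madison_property("mad", toy_corpus))
```

Tracking the maximum number of parts (rather than just whether some split exists) matters here because the word itself is usually in the corpus, which would otherwise count as a trivial one-part split.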

The question of the game explores the prevalence of the Madison property across different corpora: how common is it for this property to be true in a given language? What fraction of our words are decomposable into other words? I wrote some code here that explores these questions.
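One way to phrase the "what fraction" question in code is sketched below. This is an illustration under my own assumptions rather than the exact script linked above: `madison_fraction` is a hypothetical name, the corpus is treated as a set of words compared case-insensitively, and every word in the corpus is tested against the corpus itself.

```python
def madison_fraction(corpus, min_parts=3):
    """Fraction of words in `corpus` that split into >= min_parts corpus words."""
    words = {w.lower() for w in corpus}  # assumption: case-insensitive comparison

    def decomposes(word):
        # best[i] = max number of corpus words concatenating to word[:i], -1 if none
        best = [-1] * (len(word) + 1)
        best[0] = 0
        for end in range(1, len(word) + 1):
            for start in range(end):
                if best[start] >= 0 and word[start:end] in words:
                    best[end] = max(best[end], best[start] + 1)
        return best[-1] >= min_parts

    if not words:
        return 0.0
    return sum(decomposes(w) for w in words) / len(words)


# Assumed toy corpus: only "madison" splits into three or more corpus words.
print(madison_fraction({"mad", "i", "son", "madison"}))
```

To try it on a real word list, something like `madison_fraction(line.strip() for line in open(path))` should work, where `path` points at a one-word-per-line file.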

With the usual dictionary located at /usr/share/dict/words on most Unix systems, I found (to my surprise!) that nearly every word satisfies the property. On closer inspection, however, most of the individual letters count as words in that dictionary (so "z" is listed as a word). To make things more interesting, I grabbed a corpus consisting of the 3000 most common English words (from here).

Here are the results:

I tried it out on Alice in Wonderland too:

The code is available here if you'd like to play around with different texts!