# David Abel

david_abel@brown.edu

### Silly Game 2.0: The Madison Property

##### 10/17/2018

I tend to enjoy games that explore the structure of language -- I've previously written about a little game I called "word shift", with the goal of finding the longest English word such that each "rotation" of the word remains a word (more on that here). Here's another game I've been having fun thinking about: consider the "Madison" property.

Definition (Madison Property): The Madison property is said to obtain of a word $$w$$ with respect to a corpus $$C$$ if there exist $$n \geq 2$$ non-empty subwords $$w_1, \ldots, w_n$$ such that $$w_1 \circ w_2 \circ \ldots \circ w_n = w$$ and $$w_i \in C$$ for all $$i \in \{1, \ldots, n\}$$.

So, the Madison property says that a given word consists of other words concatenated together, where each subword is also in the corpus. I call it the Madison property since the word "Madison" is a nice example: "mad", "i", and "son" are all likely to be in most English corpora (and it's the word that got me thinking about this -- based on a street sign in Portland, ME).
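To make the definition concrete, here is a minimal Python sketch of one way to test it. This is not the code linked from this post: the function name `madison_split` and the toy corpus are made up for illustration, and it assumes the word and the corpus are already lowercased. It uses a small dynamic program over prefixes and returns one valid decomposition if any exists.

```python
def madison_split(word, corpus):
    """Return one list of two or more corpus words that concatenate to
    `word`, or None if no such decomposition exists."""
    n = len(word)
    # best[i] holds one split of word[:i] into corpus words, or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            # The whole word does not count as its own subword,
            # so every split found here has at least two parts.
            if best[start] is not None and piece in corpus and piece != word:
                best[end] = best[start] + [piece]
                break
    return best[n] or None


# Hypothetical toy corpus, just for illustration.
toy_corpus = {"mad", "i", "son", "madison", "a", "go", "ago"}
print(madison_split("madison", toy_corpus))  # ['mad', 'i', 'son']
print(madison_split("ago", toy_corpus))      # ['a', 'go']
print(madison_split("go", toy_corpus))       # None
```

Storing the corpus as a `set` keeps each membership test cheap, so the check is roughly quadratic in the length of the word.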

The question of the game explores the prevalence of the Madison property across different corpora: how common is it for this property to be true in a given language? What fraction of our words are decomposable into other words? I wrote some code here that explores these questions.

With the usual dictionary located at /usr/share/dict/words on most Unix systems, I found (to my surprise!) that nearly every word satisfies the property. On closer inspection, though, most of the individual letters count as words in that dictionary (so "z" is listed as a word), which makes almost any word trivially decomposable. To make things more interesting, I grabbed a corpus consisting of the 3000 most common English words (from here).

Here are the results:

• Shortest satisfying word: "ago" ("a", "go").
• Longest satisfying word: "transformation" ("transform", "at", "i", "on").
• Satisfying word ratio: 0.082 (247 / 3000).
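For anyone who wants to reproduce numbers like these on their own word list, the three summary numbers above could be computed roughly as follows. This is a sketch under assumptions, not the code linked from this post: `has_madison_property` and `madison_stats` are hypothetical names, the corpus is treated as a set of unique lowercased words, and ties for shortest and longest are broken alphabetically.

```python
def has_madison_property(word, corpus):
    """True if `word` splits into two or more non-empty words from `corpus`."""
    n = len(word)
    # reachable[i] is True when word[:i] can be built from corpus words.
    reachable = [True] + [False] * n
    for end in range(1, n + 1):
        reachable[end] = any(
            reachable[start]
            and word[start:end] in corpus
            and word[start:end] != word  # the whole word is not a subword
            for start in range(end)
        )
    return n > 0 and reachable[n]


def madison_stats(words):
    """Shortest and longest decomposable word, plus the fraction of
    unique words in `words` that are decomposable."""
    corpus = set(words)
    hits = sorted(
        (w for w in corpus if has_madison_property(w, corpus)),
        key=lambda w: (len(w), w),
    )
    return {
        "shortest": hits[0] if hits else None,
        "longest": hits[-1] if hits else None,
        "ratio": len(hits) / len(corpus) if corpus else 0.0,
    }


# Hypothetical toy word list, just for illustration.
toy_words = ["a", "go", "ago", "mad", "i", "son", "madison", "tree"]
print(madison_stats(toy_words))
# {'shortest': 'ago', 'longest': 'madison', 'ratio': 0.25}
```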

I tried it out on Alice in Wonderland too:

• Shortest: "ii" ("i", "i") -- this is from "Chapter II".
• Longest: "frontispiece" ("front", "is", "piece").
• Satisfying word ratio: 0.066 (170 / 2559).

The code is available here if you'd like to play around with different texts!

-Dave