I tend to enjoy games that explore the structure of language -- I've previously written about a little game I called "word shift", where the goal is to find the longest English word such that each "rotation" of the word is also a word (more on that here). Here's another game I've been having fun thinking about: consider the "Madison" property.
The Madison property says that a given word can be split into two or more smaller words, each of which is also in the corpus. I call it the Madison property since the word "Madison" is a nice example: "mad", "i", and "son" are all likely to be in most English corpora (it's also the word that got me thinking about this, thanks to a street sign in Portland, ME).
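To make the definition concrete, here is a minimal sketch of how one might test a single word. It is not necessarily how my linked code does it: the function name `has_madison_property`, the lowercasing, and the requirement of at least two pieces (so a word can't count as its own decomposition) are all assumptions made for illustration.

```python
def has_madison_property(word, corpus):
    """Return True if `word` is a concatenation of two or more words in `corpus`.

    `corpus` is assumed to be a set of lowercase strings (an illustrative choice).
    """
    word = word.lower()
    n = len(word)
    # reachable[i] is True when word[:i] can be written as corpus words.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            # Skip the trivial "split" that uses the whole word as one piece.
            if start == 0 and end == n:
                continue
            if reachable[start] and word[start:end] in corpus:
                reachable[end] = True
                break
    return n > 0 and reachable[n]


toy_corpus = {"mad", "i", "son", "madison"}
print(has_madison_property("Madison", toy_corpus))
print(has_madison_property("son", toy_corpus))
```

With this toy corpus, "Madison" passes (mad + i + son) while "son" does not, since none of its pieces are words in the set.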
The game asks how prevalent the Madison property is across different corpora: how common is it for a word in a given language to have this property? What fraction of our words are decomposable into other words? I wrote some code here that explores these questions.
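As a rough sketch of the "what fraction" question (again an illustration under my own assumptions, not a copy of the linked code), something like the following computes the share of distinct words in a word list that split into other words from the same list. The name `madison_fraction` is hypothetical, and folding everything to lowercase is an assumption that may or may not suit a given corpus.

```python
from functools import lru_cache


def madison_fraction(words):
    """Fraction of distinct words that split into two or more corpus words.

    `words` can be any iterable of strings, e.g. the lines of a word-list file.
    """
    # Assumption: strip whitespace and ignore case and blank lines.
    corpus = frozenset(w.strip().lower() for w in words if w.strip())

    @lru_cache(maxsize=None)
    def splits(rest):
        # True when `rest` is a concatenation of one or more corpus words.
        return rest in corpus or any(
            rest[:i] in corpus and splits(rest[i:]) for i in range(1, len(rest))
        )

    def decomposable(word):
        # Require a proper first piece so a word cannot stand in for itself.
        return any(
            word[:i] in corpus and splits(word[i:]) for i in range(1, len(word))
        )

    if not corpus:
        return 0.0
    return sum(decomposable(w) for w in corpus) / len(corpus)


print(madison_fraction(["mad", "i", "son", "madison"]))
```

To try it on a real word list, pass an open file directly, for example `with open(path) as f: print(madison_fraction(f))`, where `path` points at a one-word-per-line file.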
With the usual dictionary located at /usr/share/dict/words on most Unix systems, I found (to my surprise!) that nearly every word satisfies the property. On closer inspection, though, most of the individual letters count as words in that dictionary (so "z" is listed as a word), which makes almost any word trivially decomposable. To make things more interesting, I grabbed a corpus consisting of the 3000 most common English words (from here).
Here are the results:
I tried it out on Alice in Wonderland too:
The code is available here if you'd like to play around with different texts!