The Azed Archive WordStats
WordStats is a special feature of the Archive that looks at all the words that make up the clues, and their length and frequency. By breaking down clues into their component words it’s possible to see and compare them in a new way. Azed is of course the sole arbiter of quality in the clue-writing competition, but WordStats is a great source when it comes to quantity.
How to find WordStats in the Archive
WordStats are collected for all clues in the Azed archive from 1972 onwards.
- Competitor and Competition WordStats are linked from the competitor lists, the Clues by Competitor, and the Clues by Competition pages via the icon. They’re available for every competitor and competition that has clues in the Archive and include their average clue length, their longest and shortest clues, and all the unique words they have contributed through their clues.
- From the Competitor and Competition WordStats you can also create a Wordle wordcloud out of the clues. Wordle is a popular web application that generates a colourful image representing the distribution of words in a piece of text, like the one below. Instructions are on the WordStats pages.
- And if you want to know more about the WordStats of the Archive, read on …
WordStats shows us that there is a remarkable consistency of clue lengths, even though clues vary in length from 4 to 230 letters. In each of the past 45 competition years, the average ‘normal’ clue has contained between 9 and 10 words and between 40 and 48 letters. The average clue length over the whole archive (more than 14,400 clues of all types) is 9.6 words and 46 letters. There has been no trend to longer or shorter clues over the life of the Azed series.
Clue length is most highly correlated with clue type, for the obvious reason that types such as ‘Right and Left’ and DLM demand longer clues.
In normal clues there’s a strong relationship between the number of letters in the word clued and the number of letters in the clue. In clues for words from 4 to 12 letters the average clue length increases by about one letter for each extra letter in the word clued. 4-letter words have an average clue length of 40 letters, rising to an average of 47 letters for 12-letter words (competitions 1 to 2395).
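The "about one letter per extra letter" figure can be checked from the two end points quoted above. This is only a rough sketch: it uses the 4-letter and 12-letter averages alone, not the full data for every word length, and the function name is made up for illustration.

```python
def letters_per_extra_letter(short_word, short_avg, long_word, long_avg):
    """Average growth in clue length for each extra letter in the word clued.

    Uses only two end points, so it is an approximation, not a fitted trend.
    """
    return (long_avg - short_avg) / (long_word - short_word)


# Figures quoted in the text: 4-letter words average 40 letters per clue,
# 12-letter words average 47 letters per clue.
slope = letters_per_extra_letter(4, 40, 12, 47)
print(slope)  # 0.875
```

A slope of 0.875 is consistent with the description of roughly one extra letter of clue per extra letter of word.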
Azed competitors can be extremely brief when the opportunity arises. Here are the normal clues – 3 of them Cup-winners – that are shorter than the words they clue:
| Competition | Word clued | Clue | Letters |
|---|---|---|---|
| 1372 | MASTERSTROKE | A major coup | 10 |
and some others that weigh in at no more than ten letters:
| Competition | Word clued | Clue | Letters |
|---|---|---|---|
| 1711 | MINUS | Dash in sum? | 9 |
| 788 | CROW | Jimmy or Jim? | 10 |
| 1181 | SLADE | ‘Noddy’ leads —— | 10 |
Competitors’ brevity is at least equalled by their prolixity. The longest ‘clue’ in the archive is this 230-letter Jingle for the Three Kings by C. H. Hudson from competition 143:
You were three chiefs of Middle-East,
You the three Kings of Orient were –
The modern Sheik’s not in the least
Concerned with frankincense and myrrh.
You journeyed through the winter’s cold
To mark with gifts the angels’ news –
Your counterpart, agog for gold,
Just grabs his oily revenues
Definition and Letter Mixtures also discourage terseness. The only 3-word DLM competition (1810 HARE / POSEUR / SERINETTE) produced these 165 letters from Rev Canon C. M. Broun:
Be a better preacher, a sensible one – move quickly to conclusion, then shut up – or sermons will seem a show-off of theology and congregations lose interest, glad to substitute hymns and organ for teaching.
The longest cryptic clue is a ‘Right and Left’ of 30 words and 149 letters by P. F. Henderson for URTICARIA / APOGRAPHS in competition 447:
Rash primaries in USA – really, those idiots Carter and Reagan (ignoring Anderson) – you can’t choose between them! Perhaps hostages will be released – then you could say: ‘He’s set these free’
As a normal clue, C. M. Edmunds’ work from 1985 (696 ANTIMNEMONIC) of 122 letters and 22 words (10 of which are the definition) has never been surpassed in length:
One leader of morris men experiencing volte-face over performing in grotesque pageant – I’m opposed to knotted handkerchiefs, I shan’t ring any bells!
though R. J. Heald came close with his tribute to the Azed 1750 lunch (1750 PLOUGHMAN), at 120 letters and 19 words:
Characters foremost among puzzlers love Oxford University get-togethers with regular doses of champagne (my lunches are far less extravagant!)
You might expect cup-winning normal clues to be somewhat briefer, and they are (8.8 words and 42 letters on average), though in competition 2064 D. K. Arnott got through 79 letters and 13 words clueing the 10 letters of COLD TURKEY:
Boxing Day depression, annual excitement over, essential problem being crackers with no crack?
The Printer’s Devilry clue with most letters is from I. Carr (2040 EASTER), at 12 words and 80 letters:
Global warming and oceanic pollution are what? Many scientists’ se/minal planetary afflictions
There are 138,911 words altogether in the clues of the Archive, and 21,499 distinct words. Just as in everyday language, a few words occur very frequently, a large number very rarely, and the rest somewhere in between. But this distribution makes the clues much more diverse than everyday language or literature, as this rather selective table shows:
| Text | Total words | Distinct words | Distinct words as % of total |
|---|---|---|---|
| King James Bible | 788,258 | 14,565 | 1.8% |
| Azed Slip Archive Clues | 138,911 | 21,499 | 15.5% |
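The percentages in the table are simply distinct words divided by total words. A minimal sketch that reproduces them from the counts quoted above (the function and variable names are illustrative only):

```python
def distinct_share(total_words, distinct_words):
    """Distinct words as a percentage of all words in a text."""
    return 100 * distinct_words / total_words


# (total words, distinct words) as quoted in the table
corpora = {
    "King James Bible": (788_258, 14_565),
    "Azed Slip Archive Clues": (138_911, 21_499),
}

for name, (total, distinct) in corpora.items():
    print(f"{name}: {distinct_share(total, distinct):.1f}%")
# King James Bible: 1.8%
# Azed Slip Archive Clues: 15.5%
```

This ratio tends to fall as a text gets longer, which is one reason the comparison is described as selective.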
The most popular words in the Archive clues are unsurprisingly also some of the most frequently occurring words in English (competitions 1 to 2395).
Behind them we find some cryptic staples
and a little further on, some thematic favourites
which come just ahead of the judge and setter in several guises
and, of course, references to the art of clue-writing
Over half of the distinct words in clues (11,390 out of 21,499) occur only once in the entire Archive. Every competition and most competitors have contributed some unique words. Each competitor is credited with their unique contributions in the competitor WordStats pages.
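Finding the words that occur only once is a matter of counting every word and keeping those with a count of one. The sketch below shows the idea on a tiny hand-made word list. It is not the Archive's own code, and it assumes the words have already been split out of the clues.

```python
from collections import Counter


def words_used_once(words):
    """Return, in alphabetical order, the words that appear exactly once."""
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)


# Toy word list for illustration only, not real Archive data.
sample = ["one", "likes", "to", "go", "on", "a", "binge", "one", "a"]
print(words_used_once(sample))  # ['binge', 'go', 'likes', 'on', 'to']

# Share of distinct Archive words that occur only once, from the figures above.
print(f"{11_390 / 21_499:.1%}")  # 53.0%
```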
The very first clue in the Azed Archive, S. L. Paton’s
Before the heart ensnares one, one likes to go on a binge
contains the Archive’s only instance of ‘ensnares’ and Competition 1’s clues include ‘Doughboy’, ‘unhealthy’, ‘groats’, ‘lunch-time’, ‘Borgia’, ‘fascinate’, ‘mysteries’, ‘strychnine’s’, ‘corgi’, ‘tigs’, ‘Tollesbury’, ‘promiscuously’ and ‘Bacchae’, none of which has been repeated in the ensuing 45 years of competitions.
New additions to the word list in the latest competition in the WordStats, 2395, include ‘turnips’, ‘precepts’, ‘Trundling’, ‘courtroom’, ‘Ironside’, ‘naturism’, ‘severity’, ‘Crucible’, ‘repressed’, ‘precept’, ‘pietist’, ‘Arminius’, ‘athletically’, ‘self-control’, ‘equivocal’, ‘influenced’, ‘Milton’, ‘Bunyan’, ‘lied’, ‘silenced’, ‘professing’, ‘Anti-establishment’, ‘pamphlets’, ‘invective’, ‘God-driven’, ‘Cromwell’, ‘pontificating’, ‘Protestantism’, ‘believing’, ‘handbag’, ‘Framework’, ‘same-sex’, ‘relationships’, ‘anathema’, ‘Royalists’ and ‘morality’.
And which clue has contributed the most unique words? The answer lies in that unique ‘Jingle’ competition 143, and the following verse from W. Jackson:
Caspar rex et Melchior
claro iam aenigmatos.
Vos, observatores Stellae,
Stellae nunc Observatoris,
die Salvatoris nostri
What’s in a word?
In these WordStats a ‘letter’ is any character from A to Z (including accented letters) or any digit from 0 to 9.
A ‘word’ is any contiguous sequence of letters, digits, hyphens or apostrophes, terminated by any other character (space, punctuation) or by either end of the clue. Every form of a word, including differently hyphenated or apostrophised forms, is considered distinct (e.g. ITS and IT’S are each counted as distinct single words).
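These two definitions translate fairly directly into code. The following is a minimal sketch, not the Archive's actual implementation. It makes some assumptions of its own: any Unicode letter counts as an accented letter, both the straight (') and curly (’) apostrophe are accepted, and a run consisting only of hyphens or apostrophes is not counted as a word. The function names are illustrative.

```python
import re

# A word: a run of letters, digits, hyphens or apostrophes.
# [^\W_] matches any Unicode letter or digit (word characters minus "_").
WORD_RE = re.compile(r"(?:[^\W_]|[-'\u2019])+")


def count_letters(clue):
    """Count letters and digits, ignoring spaces and punctuation."""
    return sum(1 for ch in clue if ch.isalnum())


def split_words(clue):
    """Split a clue into words, keeping hyphenated and apostrophised forms whole."""
    # Assumption: drop runs with no letter or digit, such as a lone hyphen.
    return [w for w in WORD_RE.findall(clue) if any(ch.isalnum() for ch in w)]


print(split_words("Dash in sum?"))    # ['Dash', 'in', 'sum']
print(count_letters("Dash in sum?"))  # 9
print(split_words("its it's"))        # ['its', "it's"]
```

The letter count of 9 for "Dash in sum?" agrees with the figure given for that clue earlier on this page. One limitation of the sketch: a closing single quotation mark typed as ’ cannot be told apart from an apostrophe, so a quoted word would keep its closing mark. How the Archive handles that case is not stated here.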