The Azed Archive WordStats

WordStats is a special feature of the Archive that looks at all the words that make up the clues, and their length and frequency. By breaking down clues into their component words it’s possible to see and compare them in a new way. Azed is of course the sole arbiter of quality in the clue-writing competition, but WordStats is a great source when it comes to quantity.

How to find WordStats in the Archive

J. R. Tozer’s clues. Image generated by wordle.net

Clue lengths

WordStats shows us that there is a remarkable consistency of clue lengths, even though clues vary in length from 4 to 230 letters. In each of the past 39 competition years, the average ‘normal’ clue has contained between 9 and 10 words and between 40 and 48 letters. The average clue length over the whole archive (more than 12,500 clues of all types) is 9.6 words and 45 letters. There has been no trend to longer or shorter clues over the life of the Azed series.

Clue length is most highly correlated with clue type, for the obvious reason that competition types such as ‘Right and Left’ and DLM (Definition and Letter Mixture) demand longer clues.

In normal clues there’s a strong relationship between the number of letters in the word clued and the number of letters in the clue. In clues for words from 4 to 12 letters the average clue length increases by about one letter for each extra letter in the word clued. 4-letter words have an average clue length of 40 letters, rising to an average of 47 letters for 12-letter words (competitions 1 to 2065).

Word length   Words in clue   Letters in clue   No. of comps
4             8.3             40                9
5             8.6             41                26
6             8.9             42                34
7             9.1             44                58
8             9.2             44                65
9             9.5             45                53
10            9.6             46                47
11            9.7             46                34
12            9.8             47                51
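
The ‘about one letter’ figure can be checked roughly against the table. The sketch below fits an ordinary least-squares line through the nine rows. It is unweighted (it ignores the number of competitions behind each row) and works from the rounded averages, so the result is only indicative; the names `ROWS` and `least_squares_slope` are illustrative and not part of WordStats.

```python
# Rows from the table above: (letters in word clued, average letters in clue).
ROWS = [(4, 40), (5, 41), (6, 42), (7, 44), (8, 44),
        (9, 45), (10, 46), (11, 46), (12, 47)]

def least_squares_slope(rows):
    """Slope of the ordinary least-squares line through (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    return sxy / sxx

slope = least_squares_slope(ROWS)
print(f"{slope:.2f} extra clue letters per extra letter in the word clued")
```

On these figures the slope comes out a little under one, which is consistent with the ‘about one letter’ description.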
 

Shortest clues

Azed competitors can be extremely brief when the opportunity arises. Here are the normal clues – 3 of them Cup-winners – that are shorter than the words they clue:

Comp   Word            Clue           Letters
576    TOP-NOTCH       A1 V1?         4
709    GINGER          Go pop         5
508    POSTURE-MAKER   Proteus?       7
701    BALUSTRADE      Bears cope?    9
221    PADDY-WHACK     Ire-lander?    9
1372   MASTERSTROKE    A major coup   10
 

and some others that weigh in at no more than ten letters:

Comp   Word       Clue               Letters
788    CROW       B-r-ag?            4
1021   BEARD      Face down          8
88     BLOOMERY   Hothouse?          8
709    GINGER     Pop group?         8
891    CAT        Dandy lion?        9
1711   MINUS      Dash in sum?       9
464    SIMKIN     I’m bottled        9
735    MALIGN     Knock Fell         9
1797   HEART      Meat balls         9
709    GINGER     Beer bottle        10
709    GINGER     Buck Rogers?       10
1026   LET-OFF    Fire escape        10
788    CROW       Jimmy or Jim?      10
203    BOGY       Poker fiend        10
401    GO-AHEAD   Smart try-on!      10
1181   SLADE      ‘Noddy’ leads ——   10
 

Longest clues

Competitors’ brevity is at least equalled by their prolixity. The longest ‘clue’ in the archive is this 230-letter Jingle for the Three Kings by C. H. Hudson from competition 143:

You were three chiefs of Middle-East,
You the three Kings of Orient were –
The modern Sheik’s not in the least
Concerned with frankincense and myrrh.
You journeyed through the winter’s cold
To mark with gifts the angels’ news –
Your counterpart, agog for gold,
Just grabs his oily revenues

Definition and Letter Mixtures also discourage terseness. The only 3-word DLM competition (1810 HARE / POSEUR / SERINETTE) produced these 165 letters from Rev Canon C. M. Broun:

Be a better preacher, a sensible one – move quickly to conclusion, then shut up – or sermons will seem a show-off of theology and congregations lose interest, glad to substitute hymns and organ for teaching.

The longest cryptic clue is a ‘Right and Left’ of 30 words and 149 letters by P. F. Henderson for URTICARIA / APOGRAPHS in competition 447:

Rash primaries in USA – really, those idiots Carter and Reagan (ignoring Anderson) – you can’t choose between them! Perhaps hostages will be released – then you could say: ‘He’s set these free’

As a normal clue, C. M. Edmunds’ work from 1985 (696 ANTIMNEMONIC) of 122 letters and 22 words (10 of which are the definition) has never been surpassed in length:

One leader of morris men experiencing volte-face over performing in grotesque pageant – I’m opposed to knotted handkerchiefs, I shan’t ring any bells!

though R. J. Heald came close with his tribute to the Azed 1750 lunch (1750 PLOUGHMAN), at 120 letters and 19 words:

Characters foremost among puzzlers love Oxford University get-togethers with regular doses of champagne (my lunches are far less extravagant!)

You might expect cup-winning normal clues to be somewhat briefer, and they are (8.8 words and 41 letters on average), though in competition 2064 D. K. Arnott got through 79 letters and 13 words clueing the 10 letters of COLD TURKEY:

Boxing Day depression, annual excitement over, essential problem being crackers with no crack?

The Printer’s Devilry clue with most letters is from I. Carr (2040 EASTER), at 12 words and 80 letters:

Global warming and oceanic pollution are what? Many scientists’ se/minal planetary afflictions

Word frequencies

There are 120,588 words altogether in the clues of the Archive, and 19,709 distinct words. Just as in everyday language, a few words occur very frequently, a large number very rarely, and the rest somewhere in between. But this distribution makes the clues much more diverse than everyday language or literature, as this rather selective table shows:

Source                    Total words   Distinct words   Diversity
King James Bible          788,258       14,565           1.8%
Shakespeare’s Sonnets     60,431        4,169            6.9%
Hamlet                    39,476        4,686            11.9%
Azed Slip Archive Clues   120,588       19,709           16.3%
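
The Diversity column is not defined explicitly, but the published percentages are consistent with reading it as distinct words divided by total words. The sketch below recomputes the column on that assumption; `SOURCES` and `diversity` are illustrative names, not part of WordStats.

```python
# (source, total words, distinct words) as given in the table above.
SOURCES = [
    ("King James Bible", 788_258, 14_565),
    ("Shakespeare's Sonnets", 60_431, 4_169),
    ("Hamlet", 39_476, 4_686),
    ("Azed Slip Archive Clues", 120_588, 19_709),
]

def diversity(total, distinct):
    """Distinct words as a percentage of total words (assumed definition)."""
    return 100 * distinct / total

for name, total, distinct in SOURCES:
    print(f"{name}: {diversity(total, distinct):.1f}%")
```

A ratio of this kind tends to fall as a text gets longer, because common words keep recurring, which is one reason the table is described as ‘rather selective’.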
 

Popular words

The most popular words in the Archive clues are unsurprisingly also some of the most frequently occurring words in English (competitions 1 to 2065).

Word   Freq
a      3454
in     3399
of     3389
to     2431
the    2343
with   1814
for    1435
and    1311
one    1296
 
 

Behind them we find some cryptic staples

Word        Freq
end         258
see         254
old         254
bit         214
round       202
time        176
possibly    163
new         163
left        150
head        149
man         130
half        129
good        128
short       126
love        124
Men         123
English     120
cut         116
heart       111
find        106
top         104
back        99
initially   96
 
 

and a little further on, some thematic favourites

Word        Freq
work        91
run         84
Party       82
hard        82
playing     73
play        71
French      71
big         68
Christmas   64
red         64
power       61
King        61
endless     61
character   61
lost        59
bar         57
girl        56
energy      54
air         54
 
 

which come just ahead of the judge and setter in several guises

Word     Freq
Azed     54
Azed’s   34
AZ       4
AZ’s     1
 
 

and, of course, references to the art of clue-writing

Word     Freq
Clue     51
clues    16
clued    10
cluers   5
Clue’s   5
cluing   5
cluer    3
 
 

Unique words

Over half of the distinct words in clues (10,595 out of 19,709) occur only once in the entire Archive. Every competition and most competitors have contributed some unique words. Each competitor is credited with their unique contributions in the competitor WordStats pages.
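
Finding words that occur only once is a simple counting exercise. The sketch below shows one way to do it with a frequency count; it is not the Archive’s own code. The function name `once_only` is illustrative, and the sample is lower-cased by hand on the assumption that case is ignored when comparing words, which the page does not state.

```python
from collections import Counter

def once_only(words):
    """Return the distinct words that occur exactly once, in first-seen order."""
    counts = Counter(words)
    return [w for w, n in counts.items() if n == 1]

# Words from four short clues quoted earlier ("Go pop", "Pop group?",
# "Beer bottle", "I'm bottled"), lower-cased by hand (assumption: case is ignored).
sample = ["go", "pop", "pop", "group", "beer", "bottle", "i'm", "bottled"]
unique = once_only(sample)
print(len(unique), "of", len(set(sample)), "distinct words occur once")
```

Note that ‘bottle’ and ‘bottled’ count as different words here, in line with the rule that every form of a word is distinct.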

The very first clue in the Archive, S. L. Paton’s

Before the heart ensnares one, one likes to go on a binge

contains the Archive’s only instance of ‘ensnares’, and Competition 1’s clues include ‘Doughboy’, ‘unhealthy’, ‘groats’, ‘lunch-time’, ‘Borgia’, ‘fascinate’, ‘mysteries’, ‘strychnine’s’, ‘corgi’, ‘tigs’, ‘fears’, ‘Tollesbury’, ‘promiscuously’ and ‘Bacchae’, none of which has been repeated in the ensuing 38 years of competitions.

New additions to the word list in the latest competition in the WordStats, 2065, include ‘fells’, ‘Idea’s’, ‘metres’, ‘munching’, ‘broadloom’, ‘deet’, ‘stair’, ‘astir’, ‘sited’, ‘destroyers’, ‘Destructive’, ‘spilling’, ‘Household’, ‘maitre’, ‘seated’, ‘uncomfortably’, ‘swept’, ‘unwisely’, ‘spattering’, ‘coleopterans’, ‘poo’, ‘carpet’s’, ‘smiled’, ‘Kidderminster’ and ‘mustiness’.

And which clue has contributed the most unique words? The answer lies in that unique ‘Jingle’ competition 143, and the following verse from W. Jackson:

Caspar rex et Melchior
Balthazarque lumine
lucent Alphabetici
claro iam aenigmatos.
Vos, observatores Stellae,
Stellae nunc Observatoris,
die Salvatoris nostri
Salutamus hodie.

What’s in a word?

In these WordStats a ‘letter’ is any character from A to Z (including accented letters) or digit from 0 to 9.

A ‘word’ is any contiguous sequence of letters, digits, hyphens or apostrophes, terminated by any other character (a space or punctuation mark) or by either end of the clue. Every form of a word, including differently hyphenated or apostrophised forms, is considered distinct (e.g. ITS and IT’S are counted as two different words).
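
These two definitions translate fairly directly into regular expressions. The following is a minimal sketch, not the Archive’s actual implementation. The names `LETTER`, `WORD`, `words` and `letter_count` are illustrative, and two details are assumptions: that the typographic apostrophe (’) counts as an apostrophe, and that a run containing no letter or digit (a stray hyphen, say) is not a word.

```python
import re

# A "letter" is any Unicode letter or digit, so accented letters are included.
LETTER = re.compile(r"[^\W_]")
# A "word" is a run of letters, digits, hyphens or apostrophes
# (assumption: both ' and the typographic \u2019 count as apostrophes).
WORD = re.compile(r"(?:[^\W_]|[-'\u2019])+")

def words(clue):
    """Split a clue into words, keeping hyphenated and apostrophised forms whole."""
    # Assumption: a run with no letter or digit is not counted as a word.
    return [w for w in WORD.findall(clue) if LETTER.search(w)]

def letter_count(clue):
    """Count the letters and digits in a clue, ignoring all other characters."""
    return len(LETTER.findall(clue))

for clue in ("A1 V1?", "B-r-ag?", "Ire-lander?"):
    print(clue, len(words(clue)), "word(s),", letter_count(clue), "letters")
```

On the short clues listed earlier this gives the published letter counts: 4 for ‘A1 V1?’ and ‘B-r-ag?’, and 9 for ‘Ire-lander?’, which is a single word because of its hyphen.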