Tuesday, 7 October 2014

More bad statistics

Every time I open a newspaper I seem to come across badly presented or misleading statistics. Often the reporting of statistical findings is so bad that it is impossible to work out what is meant. Here are two examples I have noticed recently of dubious statistical inference.

1. House prices in National Parks in England and Wales.
A report in The Times yesterday was about the extra premium that people pay for a house in a national park. It noted that, of all the National Parks in England and Wales, Snowdonia has the lowest premium compared with house prices in the surrounding area. The findings were based on the average prices of houses in the various parks. I assume the 'average' being referred to is the arithmetic mean. The report concluded with the following puzzling statement:

Homes in the New Forest, Hampshire, were found to be the least affordable in the national parks, commanding the highest price premium ... (so far so good, I can understand that) ... with an average price starting at more than £500,000. 

What? An average price 'starting at'? Surely an average price is an average price; a set of data cannot have a range of averages!

So, do they actually just mean 'an average price of more than £500,000'? If so, then the 'starting at' is misleading. Or do they mean that they have worked out the average prices for a number of different categories of houses and the cheapest category has an average price of more than £500,000? That might be what is meant, but no other references to average prices in the article suggest this.

So, once again I am left not knowing what is being asserted by a statistical statement, other than a general sense that I probably could not afford a house in the New Forest!

2. Incidents of sexual abuse on trains in the UK.
A recent report in my daily newspaper invited me to be horrified at the rise in incidents of sexual abuse on trains. I am aware that this is, rightly, a sensitive subject, so let me state quite clearly that, of course, even a single case of sexual abuse on a train journey is one too many and should be condemned, and steps should be taken to prevent such a thing happening. But this report claimed that there had been a 20% increase in such cases over the five years from 2008 to 2013, and it was at this huge increase that the reader was being encouraged to be horrified.

But the data provided in the article did not support the conclusion that there was a huge increase in such incidents. There may well have been, but there was no way of actually drawing this conclusion from the information given. For a start, there was insufficient detail about how the data was collected to know whether the comparison being used was valid. It could be, for example, that there have been social or procedural changes since 2008 that make it easier for victims of sexual abuse to report what has happened. 'More reported incidents' is not the same as 'more incidents'. Indeed, an increase in the number of incidents being reported could be seen as a good result, because it could lead to a decrease in the number of incidents.

But there was also an inbuilt misunderstanding of the idea of a statistical variable in this newspaper report. Closer reading of the article revealed that they were just comparing the number of incidents reported in 2008 with the number reported in 2013. Is a rise of 20% from one to the other such a dramatic event as the headline suggested? Purely in statistical terms, no ... it is not, for two reasons.

First, no information was provided about the actual numbers being compared. So this means we have no idea as to whether this rise of 20% is significant or not. For example, if there had been only 5 incidents in 2008 and then 6 in 2013, that would have been a 20% increase, but somehow just one more incident does not seem like a dramatic increase requiring a headline.
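A minimal Python sketch of this point, using the hypothetical counts above (the article gave no actual numbers). A percentage change is measured relative to the starting value, so the same 20% can describe one extra incident or a hundred.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to the old value."""
    # Multiply before dividing so that whole-number inputs give exact results.
    return (new - old) * 100 / old


# Hypothetical counts only: the same 20% rise, very different absolute changes.
for old, new in [(5, 6), (500, 600)]:
    print(f"{old} -> {new}: {percent_change(old, new):.0f}% rise, "
          f"{new - old} more incident(s)")
```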

Second, we are not told anything about how much variation there is in this statistic (the number of incidents per year). If, for example, there happens to be a very high variance in the number of incidents, then it could be that 2008 was one of the years towards the lower end of the range of values of this statistic and 2013 was one of the years towards the upper end. In the years in between the statistic might have gone up and down quite a bit. The article asserted that there had been an increase over the five years, giving the impression that the number of incidents has been gradually going up year by year. But that conclusion cannot be drawn from the result of comparing 2008 with 2013. A cynical reader (what me?) might even wonder if the two years being used for the comparison had been chosen to generate the greatest possible difference and therefore the most dramatic headline.
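To illustrate, here is a Python sketch with entirely made-up yearly counts (the article gave none). The two end years show a 20% rise, but the year-on-year figures bounce up and down, and a different choice of years would give an even bigger 'rise'.

```python
# HYPOTHETICAL yearly counts, invented for illustration only;
# the article did not give the actual numbers.
counts = {2008: 100, 2009: 125, 2010: 95, 2011: 130, 2012: 105, 2013: 120}


def pct(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) * 100 / old


# What the article did: compare only the two end years.
print(f"2008 -> 2013: {pct(counts[2008], counts[2013]):+.1f}%")

# Year-on-year changes show the ups and downs in between.
years = sorted(counts)
for prev, cur in zip(years, years[1:]):
    print(f"{prev} -> {cur}: {pct(counts[prev], counts[cur]):+.1f}%")

# The biggest 'rise' available by picking an earlier and a later year.
best = max(
    ((a, b) for a in years for b in years if a < b),
    key=lambda pair: pct(counts[pair[0]], counts[pair[1]]),
)
print(f"Largest rise: {best[0]} -> {best[1]}: "
      f"{pct(counts[best[0]], counts[best[1]]):+.1f}%")
```

With these invented figures the end-to-end comparison gives +20.0%, while picking 2010 and 2011 would give a headline of +36.8%.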

Saturday, 20 September 2014

Scotland referendum results

The reporting of the victory of the Nosers over the Yessers in the referendum about Scottish independence provides another example of the potential confusion when people compare percentages.

The result was reported as 55% for No and 45% for Yes. So was the Nosers' vote 10% higher than the Yessers', as I heard someone say? Well, no! There are two errors here. Let me explain.

When you compare two quantities you can do this either by using the difference between them or the ratio of one to the other. For example, let's say that I earn £45 an hour and you earn £55 an hour. Using difference I could say that you earn £10 an hour more than me. But using ratio, I could say that your hourly rate is 22.2% higher than mine (to one decimal place). This is because that extra £10 you earn is 22.2% of what I earn (22.2% of £45). This is similar to saying that when a price goes up from £45 to £55 that is a 22.2% increase.
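The two kinds of comparison can be sketched in a few lines of Python, using the hourly rates from the example. The last line is an extra illustration of my own: a ratio comparison depends on which quantity you treat as the base.

```python
mine, yours = 45, 55  # hourly rates in pounds, from the example above

# Comparison by difference.
difference = yours - mine

# Comparison by ratio: the difference as a percentage of MY rate.
ratio_increase = difference / mine * 100

# The same difference as a percentage of YOUR rate gives a different figure.
reverse = difference / yours * 100

print(f"You earn £{difference} an hour more than me")
print(f"Your rate is {ratio_increase:.1f}% higher than mine")
print(f"My rate is {reverse:.1f}% lower than yours")
```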

This gets tricky when you are comparing percentages using percentages!

Let's look at the actual data, reported in The Times this morning:

No votes:     2 001 926
Yes votes:    1 617 989
Total votes:  3 619 915

Calculate the No vote as a percentage of the Total vote:   2001926 ÷ 3619915 × 100 = 55.30%
Calculate the Yes vote as a percentage of the Total vote:  1617989 ÷ 3619915 × 100 = 44.70%
(percentages given to 2 decimal places).

The difference between these two percentages is 10.6%. Note that this is closer to 11% than to the 10% that was reported. So, that's the first error: a classic rounding error!

But it would still be confusing to report this by saying that the No vote was 10.6% higher than the Yes vote. What we can say, correctly, is:

'The No vote was 10.6 percentage points greater than the Yes vote.'

This is understood to mean that the comparison being used is the difference between the two percentages.

If we compare the actual figures, using ratio, by what percentage is 2001926 greater than 1617989? (Compare my example above for comparing two hourly rates). The No vote is 383937 more than the 1617989 votes for Yes. As a percentage that is 23.73% higher than the Yes vote (383937 ÷ 1617989 × 100). So a correct reporting would be:

'The number of people who voted No was 23.73% greater than the number who voted Yes.'

That makes the extent of the victory clearer!
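For anyone who wants to check the arithmetic, here is a short Python sketch using the figures reported in The Times. It computes both comparisons: the difference in percentage points and the ratio of the two vote counts.

```python
# Vote counts as reported in The Times.
no_votes = 2_001_926
yes_votes = 1_617_989
total = no_votes + yes_votes  # 3 619 915, matching the reported total

no_pct = no_votes / total * 100
yes_pct = yes_votes / total * 100

# Comparison by difference: percentage points.
points = no_pct - yes_pct

# Comparison by ratio: the extra No votes as a percentage of the Yes votes.
relative = (no_votes - yes_votes) / yes_votes * 100

print(f"No: {no_pct:.2f}%  Yes: {yes_pct:.2f}%")
print(f"Difference: {points:.1f} percentage points")
print(f"No vote was {relative:.2f}% greater than the Yes vote")
```

Running it prints 55.30% and 44.70%, a difference of 10.6 percentage points, and a relative difference of 23.73%.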

Wednesday, 13 August 2014

Golden Ratio and Beethoven's 5th

I had an email yesterday from a student in Spain that began:

'I am currently carrying out a Research Project about the Golden Ratio. In fact, what I am particularly interested in is its appearance in classical music and its use by some musicians in their compositions. It has been brought to my attention that you have been studying this matter. If it is not too much to ask, given your expertise, it would be of great help to me if you could share some of your thoughts about this particular topic ...'

In September 1978 (yes, 36 years ago!) I wrote a brief one-off article that was published in Mathematics Teaching (volume 84) on this subject, mainly focussing on some interesting occurrences of the 'golden section' in the structure of the first movement of Beethoven's 5th symphony. My suggestion was that this might make an interesting investigation for arts sixth-formers doing general studies courses. I have never returned to this subject and have not written any more about it. But that little article (from my perspective probably the least significant piece I have ever had published) continues to get cited, and references to it often turn up in internet searches. So, 36 years later, once again I get someone contacting me as though I am an expert in this field! Which I am not.

However, this is what the article was about.

First some revision. A 'golden rectangle' is one with the lengths of the sides in a particular proportion: approximately 1.618 (or, precisely, half of '1 plus the square root of 5'). So, for example, imagine a rectangle with sides 1 cm and 1.618 cm, as shown. This is a golden rectangle. If you cut off a 1-cm square, as shown, the piece that you are left with has sides of 1 cm and 0.618 cm. The ratio of the lengths of these sides is 1 ÷ 0.618 which, remarkably, equals 1.618 (to three decimal places)! So this smaller rectangle is also a golden rectangle!
This is the defining property of a golden rectangle: that if you cut off a square, as shown, you are left with a rectangle with sides in the same ratio as the original rectangle. From this, with a little bit of algebraic manipulation, you can work out that the ratio of the sides is 1.618 to three decimal places. Dividing the rectangle like this - or the division of any quantity into two bits that are in this ratio - is also called a 'golden section'. 
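For those who want the 'little bit of algebraic manipulation', here is one way to do it. Call the ratio of the sides φ, with the short side of length 1. Cutting off a 1 × 1 square leaves a rectangle with sides 1 and φ − 1, and the defining property says that the two ratios are equal:

```latex
\frac{\varphi}{1} = \frac{1}{\varphi - 1}
\quad\Longrightarrow\quad \varphi^{2} - \varphi - 1 = 0
\quad\Longrightarrow\quad \varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618,
\qquad \frac{1}{\varphi} = \varphi - 1 \approx 0.618
```

The other root of the quadratic is negative, so it is discarded. The last identity is why 1.618 and 0.618 keep turning up together.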

For example, if I had 1618 counters and put them into 2 piles with 1000 in one pile and 618 in another, then I would have applied a golden section to the set of counters. If I were then to apply a golden section to the 1000 counters, I would get 618 in one pile and 382 in the other. And so on.

If the smaller golden rectangle in the diagram is divided into a square and an even smaller golden rectangle, the 1 cm length is divided into 0.618 cm and 0.382 cm. So a golden section can be thought of as dividing 1 unit into two parts in the ratio 0.618:0.382 (or, to two decimal places, 0.62:0.38).
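The counters example can be sketched in Python. This is only an illustration under my own assumption about how to handle whole counters: each 'golden section' puts n ÷ 1.618 counters (rounded to a whole number) in the larger pile and the rest in the smaller one, and the process is then repeated on the larger pile.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, about 1.618


def golden_section(n: int) -> tuple[int, int]:
    """Split n counters into two piles in (approximately) the golden ratio.

    The larger pile is n / PHI (about 0.618 of n), rounded to a whole number;
    the smaller pile is whatever is left (about 0.382 of n).
    """
    larger = round(n / PHI)
    return larger, n - larger


# Start with 1618 counters and keep applying a golden section to the larger pile.
n = 1618
for _ in range(4):
    larger, smaller = golden_section(n)
    print(f"{n} -> {larger} + {smaller}")
    n = larger
```

With this rounding rule the first two splits reproduce the figures in the text: 1000 and 618, then 618 and 382.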

This is a magical ratio that keeps turning up in all kinds of situations, in mathematical contexts, but also in art, architecture, the natural world, psychology, and, indeed, music. Bartók, for example, deliberately incorporated the golden ratio into his musical compositions.

My discovery was that there are a number of occurrences of this golden ratio in the first movement of Beethoven's fifth symphony – that's the one that starts with the famous motto theme: 'da, da, da, daah'! In Beethoven's original score there are 600 bars before the final statement of the opening motto. But a statement of the opening motto also appears at bar 372. So, we have this structure for the three main statements of the motto: motto starts ... 372 bars ... motto starts... 228 bars ... motto starts. 

Now what do we get if we split the 600 bars into 2 sections using the approximation to the golden section 0.62:0.38? Well, amazingly, 0.62 of 600 is 372, and 0.38 of 600 is 228. So the three statements of the opening motto theme begin at points that divide the score using a golden section!
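A quick Python check of this arithmetic. It uses the two-decimal approximation 0.62:0.38, as in the text, and for comparison also divides the 600 bars using the exact golden ratio, which gives about 370.8 bars rather than 372.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, about 1.618

total_bars = 600  # bars before the final statement of the motto, as given above

# Using the two-decimal approximation 0.62 : 0.38 from the text.
approx = (0.62 * total_bars, 0.38 * total_bars)

# Using the exact golden ratio, for comparison.
exact = (total_bars / PHI, total_bars - total_bars / PHI)

print(f"0.62:0.38 split: {approx[0]:.0f} + {approx[1]:.0f}")
print(f"exact split:     {exact[0]:.1f} + {exact[1]:.1f}")
```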

Is this just something that happened by accident, or did Beethoven do it deliberately? We shall probably never know. I have no view on this. But there are other instances in this movement. I will mention just two.

1. The 'exposition' in the first movement (the statements of the two main musical subjects) is in three sections: 24 bars of the first subject rounded off with a version of the motto; then 38 bars with an extended restatement of the first subject rounded off with a version of the motto; then a further 62 bars that begin with the statement of the second subject. 24:38 is a golden section of the first 62 bars; and 38:62 is a golden section of the last 100 bars!

2. The movement has the most extraordinary 'coda'. A coda is usually just a few bars of music to round off a movement. In this case Beethoven gives us a coda that is 129 bars long! Divide this coda up using the golden section and you get 49 bars and 80 bars. So what happens after 49 bars of the coda? This is the point where Beethoven actually introduces a completely new tune that has not appeared in the movement so far. Before Beethoven no-one would ever introduce new material in a coda! So this is a very significant point in the coda. Is Beethoven signalling his piece of radical creativity in this very long coda by linking it with the golden section? Again, we shall never know. But either way, it is interesting.
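The same kind of check can be made for the exposition and the coda, again as a rough Python sketch using the bar counts given above; `golden_parts` is just a helper name I have made up for the illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, about 1.618


def golden_parts(total: float) -> tuple[float, float]:
    """Return the (smaller, larger) parts of a golden section of total."""
    larger = total / PHI
    return total - larger, larger


# Exposition: sections of 24, 38 and 62 bars (figures from the text).
for a, b in [(24, 38), (38, 62)]:
    smaller, larger = golden_parts(a + b)
    print(f"{a}:{b} of {a + b} bars; "
          f"golden section would be {smaller:.1f}:{larger:.1f}")

# Coda: 129 bars.
smaller, larger = golden_parts(129)
print(f"Coda of 129 bars splits at about {smaller:.1f} + {larger:.1f} bars")
```

The golden-ratio splits come out at about 23.7:38.3, 38.2:61.8 and 49.3 + 79.7, which round to the 24:38, 38:62 and 49:80 described above.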

Tuesday, 5 August 2014

Number or amount

Here comes another little rant about language. This one is about the distinction between 'number' and 'amount'. Almost every day I hear the word 'amount' used where the word 'number' is what is required (never the other way round). I am getting increasingly irritated by the number of people who talk about 'the amount of people', for instance.

The distinction is so simple – and actually significant in mathematical terms. Some variables are discrete and measured by counting the number of items in a set. The value of such a variable is 'the number of items'. Other variables are continuous and measured by the size of the quantity, often in units of measurement such as grams or litres. The value of this kind of variable is 'the amount'.

It's easy to spot the difference. If you are using a plural noun then it is 'the number' of items. If you are using a singular noun then it is 'the amount'.

The number of bottles of wine, but the amount of wine in each bottle.
The number of cups of coffee, but the amount of coffee in each cup.

Here are some other examples.

I would talk about the large number of slices of toast I had for breakfast (slices: plural); but the huge amount of toast (toast: singular).

I would talk about the number of people in my garden on Sunday lunchtime (people: plural); but the amount of food that they ate (food: singular).

I would talk about the number of cars on the road (cars: plural); but the amount of traffic (traffic: singular).

Wednesday, 23 July 2014

Learning curves

If I hear another person say that they have been on a steep learning curve I shall explode! Here is another example (see previous post) of a phrase from a field of knowledge being taken up by those who do not understand it, and then misused – and, of course, over-used! So people tell me they've started a new job and they've been on a learning curve, when all they mean is they have been learning.

'Learning curve' is a term from the field of the psychology of learning. It refers to a graphical representation of the progress of an individual's learning over time. So, if you are learning some new skill your progress in learning can be represented by a graph, with one axis the cumulative learning achieved (going from 0% to 100%) and the other axis the time devoted to learning.

So 'being on a learning curve' really does not mean anything other than 'learning'! So, why not just say 'I have been learning'!!? Being on a learning curve does not mean that the learning has been especially difficult, demanding, rapid or slow! Anyone who is learning anything can have their progress in learning modelled by a learning curve! Here's an example.

The learning curve simply attempts to model the progress of the learning. In this case, initially the learning curve is shallow. This is because to start with the learner makes slow progress, probably because learning is difficult. Then the rate of learning picks up, as learning gets easier. Then it gets shallow again, as towards the end of the process there is not much more to learn and a lot of time is given to consolidating what has been learnt already.
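The shape described here is often modelled by an S-shaped (logistic) curve. The sketch below is purely illustrative: the logistic formula and the values chosen for `k` and `midpoint` are my assumptions, not taken from any real learner's data. It measures the steepness of the curve at the start, in the middle and at the end.

```python
import math


def s_shaped(t: float, k: float = 1.0, midpoint: float = 5.0) -> float:
    """Cumulative learning (0-100%) after time t, modelled as a logistic curve.

    The logistic form, k and midpoint are illustrative assumptions only.
    """
    return 100 / (1 + math.exp(-k * (t - midpoint)))


def slope(f, t: float, h: float = 1e-4) -> float:
    """Approximate the steepness of the curve f at t (central difference)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Shallow at the start, steep in the middle, shallow again at the end.
for t in [0, 5, 10]:
    print(f"t = {t:2}: learnt {s_shaped(t):5.1f}%, "
          f"slope {slope(s_shaped, t):5.2f}% per unit time")
```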

Here's another example: to start with this learner is on a 'steep (part of a) learning curve'!

Initially this learning task is found to be easy. Learning is rapid, and the learner makes quick progress. Again there is a tailing off in the rate of learning towards the end, as in the previous example.

So, the learner in the initial stages here might well say (correctly) 'I have been on a steep part of a learning curve'. But in saying this they would mean that the learning has been really easy! Yet people talk about 'steep learning curves' as though the steeper the curve, the more challenging the learning has been!
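A learning curve with a steep start could be modelled by a 'diminishing returns' curve of the form 100(1 − e^(−kt)). Again this is an illustrative Python sketch, with the formula and the rate `k` chosen by me; it simply confirms that such a curve is steepest at the start, when the learning is easiest, and flattens out later.

```python
import math


def fast_start(t: float, k: float = 0.5) -> float:
    """Cumulative learning (0-100%) modelled as 100 * (1 - e^(-k t)).

    This 'diminishing returns' form and the rate k are illustrative assumptions.
    """
    return 100 * (1 - math.exp(-k * t))


def slope(f, t: float, h: float = 1e-4) -> float:
    """Approximate the steepness of the curve f at t (central difference)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Steepest (easiest, fastest learning) at the start, tailing off later.
for t in [0, 2, 4, 8]:
    print(f"t = {t}: learnt {fast_start(t):5.1f}%, "
          f"slope {slope(fast_start, t):5.2f}% per unit time")
```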

So, listen up!
1. Please don't say 'I've been on a learning curve' when you just mean 'I have been learning'!
2. Don't say 'I have been on a steep learning curve'. No learning curve in practice is 'steep' in every part.
3. You may say 'I have been on a steep part of a learning curve'. But be aware that this probably means the learning has been really easy for you!
4. If you want to imply that the learning has been difficult for you, say 'I have been on a really shallow part of a learning curve'!

Better still, just forget all about learning curves. Just talk about learning.

Tuesday, 22 July 2014


What has happened to the powerful mathematical word 'parameter'? More and more we hear people, particularly those engaged in management-speak, who clearly have not a clue what the word really means, using it to mean 'perimeter'! So they talk about 'having to work within certain parameters', meaning within certain boundaries. This has become so commonplace that some dictionaries seem to have conceded this as a secondary meaning. This is all very irritating to a mathematician! Either leave our concepts alone, please, or use them properly!

Parameters are not things that you work within. The word is a mathematical term for what is sometimes called, enigmatically, a variable constant. To take a simple example, the equation y = mx is the general form of the equation for a straight line passing through the origin. The x and y are the independent and dependent variables, respectively. But m represents the slope of the line. For any given line, it is a constant. But m can take any value, so it is also a variable. So the value it takes determines which straight line through the origin is being considered. This m is therefore a parameter.
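In programming terms, a parameter behaves rather like the `m` in this small Python sketch: fix its value and you get one particular line; change it and you get a different member of the same family. The function name `line_through_origin` is just an illustrative choice.

```python
from typing import Callable


def line_through_origin(m: float) -> Callable[[float], float]:
    """Return the line y = m x.

    The parameter m is fixed for any one line, but choosing a
    different m picks out a different line through the origin.
    """
    def y(x: float) -> float:
        return m * x  # x is the independent variable, y the dependent one
    return y


# Three members of the family y = m x, for three values of the parameter m.
for m in [0.5, 1, 2]:
    line = line_through_origin(m)
    print(f"m = {m}: y(3) = {line(3)}")
```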

If you want to transfer this idea to a business or social context, then you could say that the parameters are the variables whose values determine precisely the nature of the situation you are in. It would be perfectly acceptable, for example, to say that 'two of the parameters for an architectural project are budget and completion time'. But not to say that 'the parameters of the site are clearly marked on this map'. That's talking about perimeter!

Tuesday, 15 July 2014

"Come in Number 24, your time is up!"

So, education bids farewell to Michael Gove!

In the years of my career in education I have now seen off 24 secretaries of state for education (or variations on the title of the office)! Michael Gove is number 24 on this list, ordered both chronologically and in relation to my personal enthusiasm for his policies!

Looking back, it's quite a list! There are such enduring political names as Margaret Thatcher, Shirley Williams, Keith Joseph, Kenneth Baker, Kenneth Clarke, David Blunkett, Ed Balls ... all have come for a while, dabbled in education, and moved on.

So, we know who would be at the bottom of this list of twenty-four education supremos. But who would I put at the top?

Surprisingly (to me at least), probably the secretary of state who made the most significant impact on education in this country was Kenneth Baker, who brought in a National Curriculum for the first time in the 1988 Education Reform Act, and instituted in-service training days for teachers (still called Baker days). He has since won brownie points for his outspoken criticism of Gove, accusing him of basing his policies too narrowly on his own experiences. And he is currently overseeing what looks like a first-rate initiative in education: the university technical colleges.