Coin Flips with Go

Following on from my little experiment with flipping coins millions of times, I thought that it would be interesting to write the same program in Go for comparison.

package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"
)

func main() {
    rand.Seed(time.Now().UTC().UnixNano())

    var competitionSize, runs int
    flag.IntVar(&competitionSize, "competitionSize", 10, "The size of each coin tossing competition.")
    flag.IntVar(&runs, "runs", 10, "The number of runs of the competition")

    flag.Parse()

    fmt.Println("heads, tails, flips")
    for i := 0; i < runs; i++ {
        countHeads := 0
        for j := 0; j < competitionSize; j++ {
            countHeads += rand.Intn(2)
        }

        fmt.Printf("%d, %d, %d\n", countHeads, competitionSize-countHeads, competitionSize)
    }
}

The basic idea of how to write the program is unchanged in this language. The output showed a similar distribution to the C# program.

However, Go has straightforward command-line argument parsing built into the standard library (like Python or Perl). In C#, there are a number of third-party libraries that do this and can be installed via NuGet. I went with Command Line Parser.

To use this, I needed to add a class with properties with attributes to describe my expected options:

class Options
{
    // Taken from http://www.bbc.co.uk/news/politics/eu_referendum/results
    [Option('s', "competitionSize", DefaultValue = 33551983, HelpText = "The size of each coin tossing competition.")]
    public int CompetitionSize { get; set; }

    [Option('r', "runs", DefaultValue = 1000, HelpText = "The number of runs of the competition")]
    public int Runs { get; set; }
}

The command line arguments are then parsed in the main method:

var commandLineArgs = new Options();
if (Parser.Default.ParseArguments(args, commandLineArgs))
{
    var runs = commandLineArgs.Runs;
    var flips = commandLineArgs.CompetitionSize;
    ...
}
else
{
    WriteLine("Unable to parse the command line arguments!");
}

While this is not difficult to understand, it is more complicated than it needs to be and requires quite a bit more typing than in the Go program.

10 runs of the referendum simulation in both programs on Windows 10 with an Intel i7 4500U running at 1.8GHz took 7 seconds for the C# program and 13 seconds for the Go program. I wouldn’t take such simple programs very seriously for drawing conclusions about the relative speed of these languages.

The updated C# program and the complete Go program are on GitHub:
https://github.com/robert-impey/CodingExperiments/blob/master/C-Sharp/CoinFlipsAreRandom/CoinFlipsAreRandom/Program.cs
https://github.com/robert-impey/CodingExperiments/blob/master/Go/coinToss.go

Was the EU Referendum Random?

One of the claims that I saw on social media in the aftermath of the recent EU referendum here in the UK was that the result (52% to 48%) was so close that it was little different from tossing a coin.

Without getting bogged down in the politics of that referendum, or the various campaigns that led up to it, I want to consider whether this claim holds any water. How similar was the result to millions of people each tossing a coin and voting accordingly?

According to the BBC, there were 17,410,742 votes to leave and 16,141,241 votes to remain, giving a total of 33,551,983 votes. If we were to make each of these people toss a coin and count up the results, the ratio of heads/tails or remains/leaves could be anywhere between all heads and all tails. However, we would expect the counts to be about equal if the coins were all fair. Of course, any ratio is possible, but if we were to run the coin tossing game repeatedly, we would expect the mean ratio to converge on 1:1. How likely would a 52:48 ratio be?

The leave side got a share of the vote equal to 0.51891842. Therefore, their absolute deviation from the expectation (the mean, or 0.5) is 0.01891842. The difference between the two counts is 1,269,501. Would we expect to see a deviation of this magnitude in a coin tossing competition?
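The arithmetic in the paragraph above can be checked with a few lines of Go. This is only a sketch: the vote counts are the ones quoted in the text, and the function names leaveShare and deviation are made up for the illustration.

```go
package main

import "fmt"

// Vote counts as quoted in the text (attributed there to the BBC).
const (
	leaveVotes  = 17410742
	remainVotes = 16141241
)

// leaveShare returns the fraction of all votes that went to leave.
func leaveShare(leave, remain int) float64 {
	return float64(leave) / float64(leave+remain)
}

// deviation returns the absolute distance of a share from the
// fair-coin expectation of 0.5.
func deviation(share float64) float64 {
	d := share - 0.5
	if d < 0 {
		return -d
	}
	return d
}

func main() {
	share := leaveShare(leaveVotes, remainVotes)
	fmt.Printf("total votes: %d\n", leaveVotes+remainVotes)
	fmt.Printf("difference:  %d\n", leaveVotes-remainVotes)
	fmt.Printf("leave share: %.8f\n", share)
	fmt.Printf("deviation:   %.8f\n", deviation(share))
}
```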

Being a computer programmer rather than a mathematician, I’m going to look at this using a simple program.

WriteLine("heads, tails, flips, heads share");

var runs = 1000;

// Taken from http://www.bbc.co.uk/news/politics/eu_referendum/results
var flips = 33551983;

var randomNumberGenerator = new Random();

for (var run = 0; run < runs; run++)
{
    var heads = 0;

    for (var coinToss = 0; coinToss < flips; coinToss++)
    {
        if (randomNumberGenerator.Next(2) > 0)
        {
            heads++;
        }
    }

    WriteLine($"{heads}, {flips - heads}, {flips}, {(0.0 + heads) / flips}");
}

This program simulates 1,000 coin tossing competitions with 33,551,983 players and writes the counts as comma separated values.

Putting the output into Excel, the largest deviation was 0.000275081 (a difference between counts of 18,459) and the smallest was 1.49022E-08 (16,775,992 heads and 16,775,991 tails, which happened twice in 1,000 runs!). The largest difference between heads and tails was about 69 times smaller than the result from the referendum.
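For a rough cross-check of these figures without running the simulation, the heads share of n fair coin tosses has a standard deviation of sqrt(0.25 / n). The sketch below (in Go, with a made-up function name shareStdDev) expresses the two deviations quoted in this post in units of that standard deviation. It assumes independent, fair tosses, which is exactly the model being tested.

```go
package main

import (
	"fmt"
	"math"
)

// shareStdDev returns the standard deviation of the heads share for n
// independent tosses of a fair coin: sqrt(0.5 * 0.5 / n).
func shareStdDev(n int) float64 {
	return math.Sqrt(0.25 / float64(n))
}

func main() {
	const flips = 33551983 // total votes quoted in the text

	sd := shareStdDev(flips)
	fmt.Printf("standard deviation of the heads share: %.3e\n", sd)

	// Deviations quoted in the text, measured in standard deviations.
	fmt.Printf("largest simulated deviation: %.1f standard deviations\n", 0.000275081/sd)
	fmt.Printf("referendum deviation:        %.1f standard deviations\n", 0.01891842/sd)
}
```

On these assumptions, the largest simulated deviation comes out at roughly three standard deviations, which is unremarkable for the most extreme of 1,000 runs, while the referendum deviation is more than two hundred.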

Plotting the shares in decreasing order, we see how quickly larger deviations fall off:

[Chart: deviations in decreasing order]

Putting the deviations into bins and counting the competitions by deviation from the expectation, we see that smaller deviations are more common:

[Chart: number of competitions in each deviation bin]

Whatever else we might say about the result, we cannot seriously claim that the result was random.

The full program can be found here:

https://github.com/robert-impey/CodingExperiments/blob/master/C-Sharp/CoinFlipsAreRandom/CoinFlipsAreRandom/Program.cs

The Excel file can be found here:
http://www.reversing-entropy.com/wp-content/uploads/2016/09/EuRef.xlsx

People don’t sit on Park Benches at Random

As I was approaching Berkeley Square at lunchtime on Thursday, I noticed a pattern in the way that the people were sitting on the park benches.

[Photo: people sitting on the park benches in Berkeley Square]

Even though it was a sunny day, the square was not very full, and there were about twice as many bench seats as there were people. Nobody wanted to sit too close to another stranger (this is London, after all), so they sat the maximum distance apart that they could. This resulted in evenly distributed people – Empty, Person, Empty, Person, and so on. The sequence cannot be called random because the people are actively deciding where to sit.

I decided to look at a simple random sequence in F#.

To start off with, I decided to define a type for my program. This might not sound like a promising starting point for a program, but in F# it’s so painless that it is a good way to begin thinking about the domain and what states make sense in your domain. For more on this, see Scott Wlaschin’s slides on Domain Driven Design in F#.

My domain is coin tossing. Therefore, I created the following discriminated union in F# interactive:

> type CoinToss = Tails | Heads;;

type CoinToss =
  | Tails
  | Heads

This allows a coin toss to be either heads or tails but not both or neither. Static typing allows the program to make definite statements about the world.

I also needed a random number generator:

> let r = System.Random();;

val r : System.Random

F# allows you to define infinite sequences that can be evaluated lazily. The following is a potentially infinite sequence of random coin tosses.

> let coins = Seq.initInfinite (fun _ -> if r.Next(2) = 0 then Heads else Tails);;

val coins : seq<CoinToss>

r.Next(2) means the next random integer less than 2 but greater than or equal to 0, that is a random 0 or 1.

If the idea of an infinite sequence of coin tosses sounds weird and unruly, here’s how we can make use of them. The take function allows us to tame infinity. Or at least take a little bit of it.

The following says “Take 10 of the randomly generated coins and pass them along (|>) to a function that iterates through them one at a time and prints them out”

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Heads
Heads
Tails
Heads
Heads
Tails
Tails
Heads
Tails
Tails
val it : unit = ()

That sequence is not very dissimilar from the lunchers on the park benches in the photo above. There are no long clusters of one thing (person, heads, etc.) or another (empty seat, tails, etc.).

However, this isn’t always the case:

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Tails
Tails
Heads
Heads
Heads
Tails
Heads
Tails
Heads
Heads
val it : unit = ()

or even:

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Tails
Tails
Tails
Tails
Tails
Heads
Heads
Heads
Heads
Heads
val it : unit = ()

Five tails followed by five heads looks completely fixed. In a casino, you might start to get worried. But it happened. Trust me.

One way to check if the coin is being tossed fairly is to count how many times each face lands. This function does that:

> let rec countCoins coins (heads, tails) = 
    match coins with 
    | [] -> heads, tails
    | hd :: tl ->
        let newCounts =
            match hd with
            | Heads -> heads + 1, tails
            | Tails -> heads, tails + 1
        countCoins tl newCounts;;

val countCoins : coins:CoinToss list -> heads:int * tails:int -> int * int

The way to read this function is as follows.

If the argument coins matches an empty list, return the counts for heads and tails as they are.

Otherwise, coins must be a list with a first element (hd) and the remaining part of the list (tl), which might be an empty list. We need to find the new counts of heads or tails by adding 1 to either count. Finally, we call the countCoins function again with the remainder of the list (tl). This will continue until we have stripped all the first elements from the list and tl is an empty list. The function then terminates as we have satisfied the first condition above.

For convenience, I defined a function that gets a number of coin tosses and puts them into a list to be counted:

> let getCoins numberOfCoins = Seq.take numberOfCoins coins |> Seq.toList;;

val getCoins : numberOfCoins:int -> CoinToss list

For a short list, the counts for each face might not be equal:

> countCoins (getCoins 10) (0, 0);;
val it : int * int = (6, 4)

Note that the counts for heads and tails start off as zeroes.

However, with a larger number of coins (in this case, pown 10 7 or 10 to the power of 7 or 10 million), the counts should both approach half the number of tosses:

> countCoins (getCoins (pown 10 7)) (0, 0);;
val it : int * int = (4997646, 5002354)

Using F# to minimise a function

Finding the minimum of functions is at the heart of optimisation. Mathematicians, engineers and programmers have come up with a large number of approaches to solving this problem, including differentiation, genetic algorithms and even exhaustive search.

Consider a quadratic function that could be written in F# as

let f x = (x ** 2.0) - (2.0 * x) + 1.0

Finding the minimum means finding the input value for which the function returns the lowest value. If we plot the curve, then the minimum is the lowest point of the curve.

One way to find this is to find the derivative of the function:

(2.0 * x) - 2.0

and solve the equation where the derivative is equal to 0, which is when x is 1. This is probably the best way to solve this problem in practice.

However, not all functions have derivatives that are easy to find. Therefore, an alternative way to minimise a function is an exhaustive search of the inputs in some range. This is not an elegant or generally efficient solution, but it works.

In F#, we might proceed as follows.

We define the range that we want to search and the step between candidate inputs:

let min = 0.0
let max = 10.0
let step = 0.01

We know where to start searching: with the lowest candidate input.

let firstCandidate = f min, min

This creates a tuple of two floats, the first being the output and the second being the lowest candidate input.

We also want a sequence of tuples of two floats that are the remaining candidate solutions:

let remainingCandidates = seq { for c in min + step .. step .. max do yield f c, c }

Note that at this point the sequence has not been enumerated and the function has not been run with each of the candidate inputs. Because sequences are evaluated lazily, the function will not be run for each of the candidate inputs until it is asked for.

We are trying to minimise the function, therefore we need a function that can compare the first elements of two candidate solutions:

let findMin currentMinSln candidateSln =
    match fst currentMinSln < fst candidateSln with
    | true -> currentMinSln
    | false -> candidateSln

Running this code in fsi shows its type to be:

val findMin : 'a * 'b -> 'a * 'b -> 'a * 'b when 'a : comparison

The F# compiler can infer that the function takes in two tuples, each with two elements. The tuples must be of the same type and the first element of each tuple must be comparable. Way to go, type inference!

Everything has been set up at this point. We just need to run the code:

let minSln = Seq.fold findMin firstCandidate remainingCandidates

The fold function reduces a sequence to a single value, starting with a given accumulator. In this case, the first candidate solution that we created earlier is our starting point. The output of the function for each candidate input is compared to that accumulator.

Running the code finds the same answer (1) that we found with calculus.
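For comparison, here is a loose sketch of the same exhaustive search in Go, using a loop where the F# version uses a fold. The candidate type and the gridMin function are invented for this illustration. One small difference: on a tie this version keeps the earlier candidate.

```go
package main

import "fmt"

// f is the quadratic from the text: x^2 - 2x + 1.
func f(x float64) float64 {
	return x*x - 2*x + 1
}

// candidate pairs an output with the input that produced it.
type candidate struct {
	output, input float64
}

// gridMin evaluates fn at lo, lo+step, ... up to hi and keeps the
// candidate with the smallest output (the earlier one wins ties).
func gridMin(fn func(float64) float64, lo, hi, step float64) candidate {
	best := candidate{fn(lo), lo}
	// Count the steps once, rounding so that floating point drift
	// does not drop the last grid point.
	steps := int((hi-lo)/step + 0.5)
	for i := 1; i <= steps; i++ {
		x := lo + float64(i)*step
		if y := fn(x); y < best.output {
			best = candidate{y, x}
		}
	}
	return best
}

func main() {
	best := gridMin(f, 0.0, 10.0, 0.01)
	fmt.Printf("minimum %.6f at x = %.2f\n", best.output, best.input)
}
```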

This code runs pretty quickly, but this approach is generally slow. Our range is only 10 wide and the step is 0.01, so there are only about 1,000 candidate inputs. Increasing the size of the range or decreasing the step would increase the number of inputs. More worryingly, the size of the search space increases exponentially as the number of inputs to the function increases. So a function that took two inputs for the same range for each input would have a million candidate inputs, three inputs would take a billion and so on.

However, it is also quite straightforward to parallelise this approach. The sequence of candidate solutions could be split up into partitions and each partition could be sent off to a different computer. Each batch would find a local minimum for the range it was given. Finding the minimum for the whole of our range of inputs is simply a matter of finding the minimum in the returned list.
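The partitioning idea can be sketched on a single machine in Go, with goroutines standing in for the separate computers. Everything named here (candidate, searchIndices, parallelGridMin, the choice of four workers) is an assumption made for the illustration, and the sketch assumes there are at least as many grid points as workers.

```go
package main

import (
	"fmt"
	"sync"
)

// f is the quadratic from the text: x^2 - 2x + 1.
func f(x float64) float64 { return x*x - 2*x + 1 }

// candidate pairs an output with the input that produced it.
type candidate struct{ output, input float64 }

// searchIndices finds the best candidate among grid points first..last
// (inclusive), where grid point i is lo + i*step.
func searchIndices(fn func(float64) float64, lo, step float64, first, last int) candidate {
	x0 := lo + float64(first)*step
	best := candidate{fn(x0), x0}
	for i := first + 1; i <= last; i++ {
		x := lo + float64(i)*step
		if y := fn(x); y < best.output {
			best = candidate{y, x}
		}
	}
	return best
}

// parallelGridMin splits the grid into one partition per worker, searches
// each partition in its own goroutine, then takes the minimum of the
// local minima. It assumes points >= workers.
func parallelGridMin(fn func(float64) float64, lo, step float64, points, workers int) candidate {
	locals := make([]candidate, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		first := w * points / workers
		last := (w+1)*points/workers - 1
		wg.Add(1)
		go func(w, first, last int) {
			defer wg.Done()
			locals[w] = searchIndices(fn, lo, step, first, last)
		}(w, first, last)
	}
	wg.Wait()

	best := locals[0]
	for _, c := range locals[1:] {
		if c.output < best.output {
			best = c
		}
	}
	return best
}

func main() {
	// 1001 grid points cover 0.00, 0.01, ..., 10.00.
	best := parallelGridMin(f, 0.0, 0.01, 1001, 4)
	fmt.Printf("minimum %.6f at x = %.2f\n", best.output, best.input)
}
```

Each goroutine writes only to its own slot in locals, so no locking is needed beyond waiting for all of them to finish.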

The complete code for this can be found here:

https://github.com/robert-impey/CodingExperiments/blob/master/F%23/Loose/MinimiseQuadratic.fs

Trying F#

As part of my ongoing interest in functional programming languages, I have decided to try F#. I have been looking at the Try F# page:

http://www.tryfsharp.org/

This provides an online REPL environment for exploring the language. I have a number of students at school who are exploring JavaScript and Python as first languages through Code Academy. I thought it would be a useful experiment to try this sort of educational activity with an unfamiliar language myself.

At first glance, I quite like the learning experience of reading through explanations and running some code, although it’s not as much fun as trying to make something yourself. In order to consolidate my learning, I tried to solve the first of the problems in Project Euler, which is to find the sum of the natural numbers less than 1000 that are multiples of 3 or 5. (I have already done this in Haskell.)

I came up with this:

[0..999]
|> List.filter (fun x -> x % 3 = 0 || x % 5 = 0)
|> List.sum

Project Euler 1

I took a look at the first problem on the Project Euler site earlier this afternoon:

http://projecteuler.net/problem=1

This asks you to find the sum of the natural numbers that are less than 1000 and are multiples of 3 or 5.

I decided that Haskell’s filter function and a lambda would make this problem trivially easy to solve. In GHCi:

Prelude> let sumMultiplesOf3Or5 max = sum(filter(\x -> (mod x 3 == 0) || (mod x 5 == 0))[1 .. max - 1])
Prelude> sumMultiplesOf3Or5 1000
233168

As I played around with other values for max, something unexpected appeared before my eyes. A pattern of repeated digits emerges in the sums of the multiples of 3 or 5 less than 10 to the power of i as i increases.

Prelude> map sumMultiplesOf3Or5 (map (\i -> 10 ^ i) [1..6])
[23,2318,233168,23331668,2333316668,233333166668]

Something similar happens with the sums of the multiples of 3 or 5 less than 20, 200, …, 30, 300, and so on:

Prelude> map (\j -> map sumMultiplesOf3Or5 (map (\i -> j * 10 ^ i) [1..6])) [1..9]
[[23,2318,233168,23331668,2333316668,233333166668],[78,9168,931668,93316668,9333166668,933331666668],[195,20850,2098500,209985000,20999850000,2099998500000],[368,37268,3732668,373326668,37333266668,3733332666668],[543,57918,5829168,583291668,58332916668,5833329166668],[810,83700,8397000,839970000,83999700000,8399997000000],[1133,114218,11432168,1143321668,114333216668,11433332166668],[1428,148668,14926668,1493266668,149332666668,14933326666668],[1845,188550,18895500,1889955000,188999550000,18899995500000]]

Reformatted:

[[23,2318,233168,23331668,2333316668,233333166668], -- [10,100,1000,10000,100000,1000000]
[78,9168,931668,93316668,9333166668,933331666668], -- [20,200,2000,20000,200000,2000000]
[195,20850,2098500,209985000,20999850000,2099998500000], -- and so on
[368,37268,3732668,373326668,37333266668,3733332666668],
[543,57918,5829168,583291668,58332916668,5833329166668],
[810,83700,8397000,839970000,83999700000,8399997000000],
[1133,114218,11432168,1143321668,114333216668,11433332166668],
[1428,148668,14926668,1493266668,149332666668,14933326666668],
[1845,188550,18895500,1889955000,188999550000,18899995500000]]

I’m not sure if this continues indefinitely. I wonder if similar patterns emerge with numbers in different bases. I’m curious about trying multiples of numbers other than 3 or 5. At this point, I’ve really no idea why this happens, but my curiosity is aroused.
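One way to poke at the pattern for much larger limits, without filtering every number, is to use the formula for an arithmetic series together with inclusion and exclusion: add the multiples of 3 and the multiples of 5, then subtract the multiples of 15, which were counted twice. The Go sketch below does that; the function names are made up and it assumes a positive limit small enough not to overflow int64.

```go
package main

import "fmt"

// sumOfMultiplesBelow returns k + 2k + ... + mk for the multiples of k
// below limit, using k * m * (m + 1) / 2 with m = (limit - 1) / k.
func sumOfMultiplesBelow(k, limit int64) int64 {
	m := (limit - 1) / k
	return k * m * (m + 1) / 2
}

// sumMultiplesOf3Or5 adds the multiples of 3 and of 5 below limit, then
// removes the multiples of 15, which would otherwise be counted twice.
func sumMultiplesOf3Or5(limit int64) int64 {
	return sumOfMultiplesBelow(3, limit) +
		sumOfMultiplesBelow(5, limit) -
		sumOfMultiplesBelow(15, limit)
}

func main() {
	limit := int64(10)
	for i := 1; i <= 6; i++ {
		fmt.Println(limit, sumMultiplesOf3Or5(limit))
		limit *= 10
	}
}
```

For a limit of 10 to the power of i, the count of multiples of 3 is itself a string of threes (3, 33, 333, ...), which may be part of where the repeated digits come from, though this sketch does not prove anything about the pattern.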

I’m also really impressed with Haskell, a language that I barely know. The only other time I’ve used functional programming seriously is with C# and VB.Net for LINQ, of which I am a huge fan.

GHCN Monthly and MS Access

For some of the last few evenings, I have been learning about MS Access’s data import functionality in order to interrogate the data of the Global Historical Climatology Network-Monthly dataset. This dataset holds records of temperature readings dating back to the beginning of the eighteenth century from more than seven thousand stations around the globe.

GHCN Monthly

The data are in text files with a fixed width format. It’s very straightforward to set up the import format (although there are many columns!) and save the import specification so that future datasets can be imported as they are released with ease. The largest data files have almost half a million rows, yet Access can import the data in a few seconds. The resulting table of readings of average, minimum and maximum temperatures for each month for each station has more than one million rows. I have not experimented with adding indexes yet but, in spite of this, I can run queries that are not painfully slow.

Now that I have the data in an Access database, I hope to start analysing it. I have produced a couple of charts for a single station but have yet to run any serious calculations. I have read of a similar project at:

A Quick and Dirty Analysis of GHCN Surface Temperature Data

The algorithm that caerbannog (the blogger) uses in his C++ program to smooth the data and calculate averages is fairly simple. Something similar should be possible (or even easy) using the MS Office tools.

Ultimately, my aim for this is to make this the basis of an ICT lesson or project. As caerbannog notes “Never in history has science been more accessible to the general public than it is now.” The quantities of data that can be accessed for free are as enormous as the power of the computers that are now cheaply available. I hope that my students will be inspired to look deeply into the issue and this will help develop their sense of empirical curiosity.

Richard Feynman on Tuva

We can find Richard Feynman’s Messenger Lectures on physics at the intriguingly named Tuva site:

http://research.microsoft.com/apps/tools/tuva/#data=4%7C0%7C%7C%7C%7C

Dr. Feynman is an engaging lecturer; it is perhaps regrettable that all lectures are not so entertaining.

At one point Dr. Feynman says that “It is impossible, when picking one particular example of anything, to avoid picking one that is atypical in some sense.” Of course, this is true by definition. If we were to find an example that was typical in every sense, it would be atypical in that it was not atypical in some sense, and so it would be atypical in some sense. Oh, the joy of school boy pedantry!

The video is rendered with a Silverlight player, which is perhaps not available on all platforms. It also used 100% of my CPU’s clock cycles and caused the laptop to crash three times. I guess that Silverlight has a long way to go before it can seriously compete with Flash. On the one hand, it’s a good thing that Flash has some more competition (not that I am accusing the Adobe engineers of laziness, mind). On the other hand, the internet will not be as rich a place as it might be if a lot of content is only available to Microsoft’s customers. I thought that that war had been won a long time ago.

King Canute got his feet wet

In an article to do with G8 leaders and climate change

G8 leaders to set emissions goals

the journalist reports that “Leaders of G8 nations are to set a target to cut greenhouse gases by 80% by 2050”. This seems like something that is difficult but possible. It is something that can be controlled by humans. I doubt that the leaders of the G8 actually have such power. They might represent the populations that emit the greatest amount of carbon dioxide, and measures that they take might reduce carbon dioxide emissions in their countries. However, their power does not extend to every other nation on the planet, which will probably produce a greater share of the world’s carbon dioxide emissions by the middle of the century. Even so, global carbon dioxide emissions are controllable by humans.

The article goes on to say that “[the G8 leaders] will also call for any human-induced temperature rise to be held below 2 degrees Celsius”. Is this going to be a legally binding limit? What sanctions will they take against the earth’s climate if it disobeys their proclamations? Xerxes once had the Hellespont whipped after a storm washed away a bridge; will future leaders resort to such tactics? King Canute ordered the tides to stop but ended up getting his feet wet.

I do not want to appear defeatist. I think that there are measures that we can take in order to reduce the impact of human activity. There are clearly much better ways of producing energy than burning coal and petrol. Governments have a role to play in moving to newer technologies. But politicians are no more able to ban global warming than they are able to set the ratio of the circumference of a circle to its diameter.

http://www.agecon.purdue.edu/crd/Localgov/Second%20Level%20pages/indiana_pi_bill.htm

Soju and prime numbers

Earlier this evening, I went with my girlfriend to eat Samgyeopsal, a barbecue pork dish. The traditional accompaniment for this pork dish is Soju, a type of sweet vodka from Korea. One normally drinks the liquor from a two ounce shot glass, so a 360 ml bottle will almost fill seven glasses. Traditionally, one fills the glass of one’s dining partner whenever their glass is empty. If two people drink one bottle, then one ends up drinking four glasses and the other drinks three. If three people drink one bottle, then one person drinks three glasses, but the other two just have two glasses. Because it is awkward for one person to drink alone, one often ends up buying a second bottle to keep the solitary drinker company. The only time that there would not be one shot left over is a table of one (how sad!) or a table of seven (who would certainly order more than one bottle, anyway). Choosing a size of bottle that is a prime multiple of the size of a standard glass is clearly a clever trick from a marketing point of view.
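The sharing arithmetic can be spelled out in a few lines of Go. This sketch takes the figure of seven glasses per bottle from the paragraph above at face value; shareOut is a made-up name.

```go
package main

import "fmt"

// shareOut splits glasses among drinkers as evenly as possible and reports
// how many whole glasses each person gets and how many are left over.
func shareOut(glasses, drinkers int) (each, leftOver int) {
	return glasses / drinkers, glasses % drinkers
}

func main() {
	const glassesPerBottle = 7 // the figure used in the text

	for drinkers := 1; drinkers <= glassesPerBottle; drinkers++ {
		each, leftOver := shareOut(glassesPerBottle, drinkers)
		fmt.Printf("%d drinkers: %d each, %d left over\n", drinkers, each, leftOver)
	}
}
```

Because seven is prime, the remainder is zero only for one drinker or seven; every table size in between leaves at least one glass that cannot be shared evenly.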

I’m reminded of the periodicity of cicada migrations that Daniel C Dennett writes about in “Darwin’s Dangerous Idea”. Apparently, colonies of different species of cicadas return to different sites at different periods. However, the periods are always a prime number of years, sometimes as long as seventeen years. The explanation that he offers is that if there is a predator that returns to that site regularly (say once every two years) then the cicadas will avoid that predator more often if the period of their return is a prime number. If they returned with a periodicity that was a composite, non-prime number of years, then one of the factors of that number of years might be the frequency that the predator returned, which would ensure that they met up regularly.
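A toy model of that argument can be sketched in Go. It is deliberately crude: it assumes cicadas and predators both start from the same year and return on fixed periods, and the periods compared (12, 13, 16 and 17 years for the cicadas, 2 to 6 years for the predators) and the 1,000 year horizon are picked purely for illustration.

```go
package main

import "fmt"

// meetings counts the years up to horizon in which a brood with the given
// cicada period and a predator with the given predator period both appear,
// assuming both start from year 0.
func meetings(cicadaPeriod, predatorPeriod, horizon int) int {
	count := 0
	for year := 1; year <= horizon; year++ {
		if year%cicadaPeriod == 0 && year%predatorPeriod == 0 {
			count++
		}
	}
	return count
}

func main() {
	const horizon = 1000

	// Compare composite periods (12, 16) with nearby prime ones (13, 17).
	for _, cicada := range []int{12, 13, 16, 17} {
		total := 0
		for predator := 2; predator <= 6; predator++ {
			total += meetings(cicada, predator, horizon)
		}
		fmt.Printf("cicada period %2d: %d meetings with predators of period 2 to 6\n", cicada, total)
	}
}
```

In this model a twelve year brood meets a two year predator on every return, while a thirteen year brood meets it only on every second return.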