Finding the Nearest Tube Station with LINQ

I recently wrote a little command line program in C# that can tell you the name of the nearest tube station. I’ve put the program on GitHub in my Coding Experiments repository:

https://github.com/robert-impey/CodingExperiments/tree/master/C%23/NearestTube

The interface is very simple- this more of an experiment than something that I intend for mass consumption! You enter a location via the command line as the latitude and longitude:

Nearest Tube CLI

The starting point of the program was to calculate the distances between points. To do this I adapted some JavaScript code that John D. Cook wrote:

http://www.johndcook.com/lat_long_distance.html

Mathematical code looks very similar in many languages, and the translation from JavaScript to C# was trivially simple as the languages are very similar in this case. This code is part of the Point class:

        /// 
        /// Porting JavaScript code from
        /// http://www.johndcook.com/lat_long_distance.html
        /// 
        /// 
        /// 
        public double Distance(Point that)
        {
            // Compute spherical coordinates

            double rho = 6373;

            // convert latitude and longitude to spherical coordinates in radians
            // phi = 90 - latitude
            double phi1 = (90.0 - this.Latitude) * Math.PI / 180.0;
            double phi2 = (90.0 - that.Latitude) * Math.PI / 180.0;
            // theta = longitude
            double theta1 = this.Longitude * Math.PI / 180.0;
            double theta2 = that.Longitude * Math.PI / 180.0;

            // compute spherical distance from spherical coordinates
            // arc length = \arccos(\sin\phi\sin\phi'\cos(\theta-\theta') + \cos\phi\cos\phi')
            // distance = rho times arc length
            return rho * Math.Acos(
                    Math.Sin(phi1) * Math.Sin(phi2) * Math.Cos(theta1 - theta2)
                    + Math.Cos(phi1) * Math.Cos(phi2)) * 1000;
        }

Once I was able to calculate the distances between points on the map, I needed to be able to sort the tube stations of London by distance from the given point. This is the sort of code that LINQ to objects handles very naturally. In the SequentialTubeStationFinder class, I have the following method:

        public TubeStation FindNearestTubeStation(Point point)
        {
            return (from tubeStation in tubeStations
                    orderby tubeStation.Point.Distance(point)
                    select tubeStation).First();
        }

tubeStations is a generic list of TubeStation objects. Each has a Point property that is used to sort the tube stations and find the nearest one to our location.

The declarative style of programming that LINQ uses is much easier to read (and therefore maintain) than the equivalent code written with a for loop.

Trying F#

As part of my ongoing interest in functional programming languages, I have decided to try F#. I have been looking at the Try F# page:

http://www.tryfsharp.org/

This provides an online REPL environment for exploring the language. I have a number of students at school who are exploring JavaScript and Python as first languages through Code Academy. I thought it would be a useful experiment to try this sort of educational activity with an unfamiliar language myself.

At first glance, I quite like the learning experience of reading through explanations and running some code, although it’s not as much fun as trying to make something yourself. In order to consolidate my learning, I tried to solve the first of the problems in Project Euler, which is to find the sum of the natural numbers less than 1000 that are multiples of 3 or 4. (I have already done this in Haskell).

I came up with this:

[0..999]
|> List.filter (fun x -> x % 3 = 0 || x % 5 = 0)
|> List.sum

One liner to show logs without Rotated Backup files

I’ve been looking at the Apache log files on a web server this morning. There are many virtual hosts on the machine and the log rotation scripts have created many numbered backup files of the logs. To make the directory listing more readable, I have been using the following one-liner:

ls -l /var/log/apache2 | grep -Ev -e '[[:digit:]]+(\.gz)?$'

This will only display the current log files, assuming that /var/log/apache2 is the directory in which you store your Apache logs and that you do not store any other files there.

I hope it helps.

Project Euler 1

I took a look at the first problem on the Project Euler site earlier this afternoon:

http://projecteuler.net/problem=1

This asks you to find the sum of the natural numbers that are less than 1000 and are multiples of 3 or 5.

I decided that Haskell’s filter function and a lambda would make this problem trivially easy to solve. In GHCi:

Prelude> let sumMultiplesOf3Or5 max = sum(filter(\x -> (mod x 3 == 0) || (mod x 5 == 0))[1 .. max - 1])
Prelude> sumMultiplesOf3Or5 1000
233168

As I played around with other values for max, something unexpected appeared before my eyes. A pattern of repeated digits begins to emerge in the sums of the multiples of 3 and 5 less than 10 to the power of i as i increases.

Prelude> map sumMultiplesOf3Or5 (map (\i -> 10 ^ i) [1..6])
[23,2318,233168,23331668,2333316668,233333166668]

Something similar happens with the sums of the multiples of 3 or 5 less than 20, 200,…,30,300, and so on:

Prelude> map (\j -> map sumMultiplesOf3Or5 (map (\i -> j * 10 ^ i) [1..6])) [1..9]
[[23,2318,233168,23331668,2333316668,233333166668],[78,9168,931668,93316668,9333166668,933331666668],[195,20850,2098500,209985000,20999850000,2099998500000],[368,37268,3732668,373326668,37333266668,3733332666668],[543,57918,5829168,583291668,58332916668,5833329166668],[810,83700,8397000,839970000,83999700000,8399997000000],[1133,114218,11432168,1143321668,114333216668,11433332166668],[1428,148668,14926668,1493266668,149332666668,14933326666668],[1845,188550,18895500,1889955000,188999550000,18899995500000]]

Reformatted:

[[23,2318,233168,23331668,2333316668,233333166668], -- [10,100,1000,10000,100000,1000000]
[78,9168,931668,93316668,9333166668,933331666668], -- [20,200,2000,20000,200000,2000000]
[195,20850,2098500,209985000,20999850000,2099998500000], -- and so on
[368,37268,3732668,373326668,37333266668,3733332666668],
[543,57918,5829168,583291668,58332916668,5833329166668],
[810,83700,8397000,839970000,83999700000,8399997000000],
[1133,114218,11432168,1143321668,114333216668,11433332166668],
[1428,148668,14926668,1493266668,149332666668,14933326666668],
[1845,188550,18895500,1889955000,188999550000,18899995500000]]

I’m not sure if this continues indefinitely. I wonder if similar patterns emerge with numbers in different bases. I’m curious about trying multiples of numbers other than 3 or 5. At this point, I’ve really no idea why this happens, but my curiosity is aroused.

I’m also really impressed with Haskell, a language that I barely know. The only other time I’ve used functional programming seriously is with C# and VB.Net for LINQ, of which I am a huge fan.

Generics in VB.Net

Taking a lead (again) from Java generic types for beginners, I decided to experiment with generic types in VB.Net. The ICT Cool blog covers the idea behind generics and the MSDN documentation on generic types in VB.Net gives more details on the topic, so I won’t repeat what has been explained elsewhere.

My code shows how generics can make code more robust by enforcing compile time type checking. The type safety of the various objects can be checked in Visual Studio by hovering the mouse over any of the variables in the code. This brings up a pop up that says the type of the object. Many of the lines of the main sub have been commented out because the project would not build otherwise. Try uncommenting them to see the errors that the IDE finds.

The code also shows that generic types allow coders to build and use type safe, polymorphic collections. Put simply, we can create a list (for example) that will only allow us to add an object of any type, so long as it implements an interface given at the time of construction.

GenericsInLists

String Manipulation with the Decorator Pattern

Inspired by the Java program to sort and reverse a string at ICTCool.com, I decided to try to solve the problem using Java and the Decorator Pattern.

StringDecoratorDemo.tar.gz

My solution required considerably more lines of code than the code at ICT Cool. For a problem as simple as this, using the Decorator Pattern is probably overkill. However, the advantage of this pattern is that multiple decorators can be stacked up at run time in any order that we choose. Also, if we wanted to add more operations later, we simply add a new class, which is cleanly separate from the other operations.

I’ve written the program using NetBeans, which creates quite a few files other than the source files, which are in src. To run the program, cd to the dist directory and type java -jar StringDecoratorDemo.jar.

Script for Listing Etexts from Project Gutenberg

local-etext-listing

I’m a big fan of Project Gutenberg and have downloaded many of their etexts over the years. However, their etexts have numeric file names, which aren’t very human friendly. In order to keep track of the etexts that I have saved on my computer, I’ve written a little perl script to extract the author and title from the etexts and generate an HTML file to list them.

The code’s release under the GPL, so feel free to tinker with the code and share alike.

Percidae

For the last few weeks I have been looking for a job as a programmer. One of the jobs that interests me involves writing VB.net. Many of my friends who are also programmers have howled with anguish at the mere mention of this technology.

In order to get a better idea of how the language works, I’ve written a small program for editing the PATH environment variable on Windows machines:

http://code.google.com/p/percidae/

I find myself editing this variable on various Windows machines pretty regularly. Every time, I also am annoyed by the small text box in the dialog in the Control Panel.

So far I’ve enjoyed the experience of writing with Visual Basic and can’t see what all the fuss is about. Having written PHP for years, I’m used to ignoring the comments of language purists. I’m much more interested in getting something working than any imagined superiority of different languages.

Richard Feynman on Tuva

We can find Richard Feynman’s Messenger Lectures on physics at the intriguingly named Tuva site:

http://research.microsoft.com/apps/tools/tuva/#data=4%7C0%7C%7C%7C%7C

Dr. Feynman is an engaging lecturer; it is perhaps regrettable that all lectures are not so entertaining.

At one point Dr. Feynman says that “It is impossible, when picking one particular example of anything, to avoid picking one that is atypical in some sense.” Of course, this is true by definition. If we were to find an example that was typical in every sense, it would be atypical in that it was not atypical in some sense, and so it would be atypical in some sense. Oh, the joy of school boy pedantry!

The video is rendered with a Silverlight player, which is perhaps not available on all platforms. It also used 100% of my CPU’s clock cycles and caused the laptop to crash three times. I guess that Silverlight has a long way to go before it can threateningly compete with Flash. On the one hand, it’s a good thing that Flash has some more competition (not that I am accusing the Adobe engineers of laziness, mind). On the other hand, the internet will not be as rich a place as it might be if a lot of content is only available to Microsoft’s customers. I thought that that war had been won a long time ago.

Is there an algorithm for Wikipedia?

Google’s latest offering,

http://www.google.com/squared

is rather fun, but I’m not convinced that I will use it very often.

Compare search results like this:

http://www.google.com/squared/search?q=premiership+clubs

with

http://en.wikipedia.org/wiki/List_of_Premier_League_clubs

The page on Wikipedia is much more useful. It seems that humans are better at making tables of data from diverse sources of information that computers are at this point. Will it always be this way?

Wikipedia has strict guidelines on how articles are written and how propositions should be backed by reliable sources. Could these guidelines be further formalised and pave the way for an algorithm that could write something like Wikipedia from scratch? Google seem to be attempting to build a system that can produce the pages on Wikipedia with names like “List_of_*”. For all I know, Google might have looked at all the articles on Wikipedia whose names match that pattern and used them to get their tables started.

Sport is a popular subject. It’s safe to say that there are lot of people who are willing to give up their free time to collate data on the subject. If some joker changed the Wikipedia table to say that Manchester United were relegated at the end of the previous season, this error would be corrected quickly as there is no lack of people who care deeply about the matter.

During a presentation for Wolfram Alpha, Stephen Wolfram was asked whether he had taken data from Wikipedia. He denied it and said that the problem with Wikipedia was that one user might conscientiously add accurate data for 200 or so different chemical compounds in various articles. Over the course of a couple of years, ever single article would get edited by different groups. The data diverged. He argued that these sorts of projects needed a director, such as himself. However, he said that his team had used Wikipedia to find out what people were interested in. If the article on carbon dioxide is thousands of characters long, is edited five times a day, has an extensive talk page, is available in dozens of languages, and has 40 references, it is safe to say that carbon dioxide is a chemical compound that people are interested in. This is true regardless of the accuracy content of the article. It would be pretty trivial for Google (or any Perl hacker with a couple of hours to spare and a few gigs of hard disk space) to rank all of the pages on Wikipedia according to public interest using the criteria that I just listed.

In many ways, an algorithmic encyclopaedia is to be preferred because of the notorious problems of vandalism and bias. However, tasks like condensing and summarising are not straightforward. The problem of deciding what to write about could analysing Wikipedia, as described above, and tracking visitor trends. Is there going to be a move to unseat Wikipedia in the coming years? How long before humans can be removed from the algorithm completely?