I’ve been looking at the Apache log files on a web server this morning. There are many virtual hosts on the machine and the log rotation scripts have created many numbered backup files of the logs. To make the directory listing more readable, I have been using the following one-liner:
ls -l /var/log/apache2 | grep -Ev -e '[[:digit:]]+(\.gz)?$'
This will only display the current log files, assuming that /var/log/apache2 is the directory in which you store your Apache logs and that you do not store any other files there.
I’m a big fan of Project Gutenberg and have downloaded many of their etexts over the years. However, their etexts have numeric file names, which aren’t very human friendly. To keep track of the etexts that I have saved on my computer, I’ve written a little Perl script that extracts the author and title from each etext and generates an HTML file listing them.
The code is released under the GPL, so feel free to tinker with it and share alike.
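For readers who want to try the idea without the original script, here is a minimal sketch in Python (not the Perl script itself). It assumes that each etext is a .txt file whose header contains lines beginning with "Title:" and "Author:", which is common in Project Gutenberg files but not guaranteed. The function names extract_metadata and build_index are made up for this illustration.

```python
import html
import re
from pathlib import Path

# Assumption: the etext header has lines like "Title: Hard Times"
# and "Author: Charles Dickens". Files without them get placeholders.
FIELD = re.compile(r"^(Title|Author):[ \t]*(.+)$", re.MULTILINE)


def extract_metadata(text):
    """Return a dict holding the first Title and Author found in text."""
    meta = {"Title": "Unknown title", "Author": "Unknown author"}
    seen = set()
    for key, value in FIELD.findall(text):
        if key not in seen:
            meta[key] = value.strip()
            seen.add(key)
    return meta


def build_index(directory):
    """Build an HTML list with one entry per .txt etext in directory."""
    items = []
    for path in sorted(Path(directory).glob("*.txt")):
        # The header sits at the top, so the first few kilobytes suffice.
        with open(path, encoding="utf-8", errors="replace") as handle:
            head = handle.read(4096)
        meta = extract_metadata(head)
        items.append(
            '<li><a href="{0}">{1}</a> by {2}</li>'.format(
                html.escape(path.name, quote=True),
                html.escape(meta["Title"]),
                html.escape(meta["Author"]),
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling build_index on the folder that holds the etexts returns an HTML fragment that can be written to a file and opened in a browser.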
Dr. Feynman is an engaging lecturer; it is perhaps regrettable that not all lectures are so entertaining.
At one point Dr. Feynman says that “It is impossible, when picking one particular example of anything, to avoid picking one that is atypical in some sense.” Of course, this is true by definition. If we were to find an example that was typical in every sense, it would be atypical in that it was not atypical in some sense, and so it would be atypical in some sense. Oh, the joy of schoolboy pedantry!
The video is rendered with a Silverlight player, which is perhaps not available on all platforms. It also used 100% of my CPU’s clock cycles and caused the laptop to crash three times. I guess that Silverlight has a long way to go before it can seriously compete with Flash. On the one hand, it’s a good thing that Flash has some more competition (not that I am accusing the Adobe engineers of laziness, mind). On the other hand, the internet will not be as rich a place as it might be if a lot of content is only available to Microsoft’s customers. I thought that that war had been won a long time ago.
I’ve been using the Teach-nology generator for a while for making bingo cards. My generator makes a few improvements to the workflow. In particular, the user doesn’t have to hit ‘Shuffle’ and print for each student.
My kids tend to enjoy bingo. I let them play a game as a reward after a test. It’s more suited to less experienced learners, especially ones learning to match sounds to the words that they read. With more experienced learners, one can say the definition of the word, draw a picture on the board or do a charade instead of just saying the word.
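The idea of producing every student’s card in one go, rather than shuffling and printing one at a time, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind my generator: the 3x3 grid size, the function names make_cards and format_card, and the plain-text output are all made up for the example.

```python
import random


def make_cards(words, num_cards, size=3, seed=None):
    """Return num_cards bingo cards, each a size x size grid of distinct words."""
    cells = size * size
    if len(words) < cells:
        raise ValueError("need at least %d words for a %dx%d card" % (cells, size, size))
    rng = random.Random(seed)
    cards = []
    for _ in range(num_cards):
        # sample() picks without replacement and returns the words in random order
        picked = rng.sample(list(words), cells)
        cards.append([picked[row * size:(row + 1) * size] for row in range(size)])
    return cards


def format_card(card):
    """Render one card as plain text with aligned columns."""
    width = max(len(word) for row in card for word in row)
    return "\n".join(" | ".join(word.ljust(width) for word in row) for row in card)


if __name__ == "__main__":
    vocabulary = ["cat", "dog", "hat", "sun", "pen", "bus",
                  "map", "cup", "bed", "fox", "jam", "leg"]
    # One call produces a differently shuffled card for each of ten students.
    for card in make_cards(vocabulary, num_cards=10):
        print(format_card(card))
        print()
```

Passing a seed makes the set of cards reproducible, which is handy if a sheet needs to be reprinted.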
The page on Wikipedia is much more useful. It seems that humans are better at making tables of data from diverse sources of information than computers are at this point. Will it always be this way?
Wikipedia has strict guidelines on how articles are written and how propositions should be backed by reliable sources. Could these guidelines be further formalised and pave the way for an algorithm that could write something like Wikipedia from scratch? Google seem to be attempting to build a system that can produce the pages on Wikipedia with names like “List_of_*”. For all I know, Google might have looked at all the articles on Wikipedia whose names match that pattern and used them to get their tables started.
Sport is a popular subject. It’s safe to say that there are a lot of people who are willing to give up their free time to collate data on the subject. If some joker changed the Wikipedia table to say that Manchester United were relegated at the end of the previous season, this error would be corrected quickly as there is no lack of people who care deeply about the matter.
During a presentation for Wolfram Alpha, Stephen Wolfram was asked whether he had taken data from Wikipedia. He denied it and said that the problem with Wikipedia was that one user might conscientiously add accurate data for 200 or so different chemical compounds in various articles. Over the course of a couple of years, every single article would get edited by different groups, and the data would diverge. He argued that these sorts of projects needed a director, such as himself. However, he said that his team had used Wikipedia to find out what people were interested in. If the article on carbon dioxide is thousands of characters long, is edited five times a day, has an extensive talk page, is available in dozens of languages, and has 40 references, it is safe to say that carbon dioxide is a chemical compound that people are interested in. This is true regardless of the accuracy of the article’s content. It would be pretty trivial for Google (or any Perl hacker with a couple of hours to spare and a few gigs of hard disk space) to rank all of the pages on Wikipedia according to public interest using the criteria that I just listed.
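A rough sketch of such a ranking, in Python rather than Perl, might look like the following. Everything specific here is an assumption made for illustration: the ArticleStats record, the equal weighting of the five criteria, and the scaling of each criterion by its maximum. Gathering the figures from a Wikipedia dump is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    # Hypothetical record holding the five criteria listed above
    title: str
    length_chars: int
    edits_per_day: float
    talk_page_chars: int
    language_count: int
    reference_count: int


CRITERIA = ("length_chars", "edits_per_day", "talk_page_chars",
            "language_count", "reference_count")


def rank_by_interest(articles):
    """Sort articles from most to least interesting by an equal-weight score."""
    if not articles:
        return []
    # Divide each criterion by its maximum so that no single unit dominates;
    # a maximum of zero is replaced by 1 to avoid dividing by zero.
    maxima = {name: max(getattr(a, name) for a in articles) or 1 for name in CRITERIA}

    def score(article):
        return sum(getattr(article, name) / maxima[name] for name in CRITERIA)

    return sorted(articles, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up figures, for illustration only
    sample = [
        ArticleStats("Obscure compound", 900, 0.01, 0, 2, 1),
        ArticleStats("Carbon dioxide", 60000, 5.0, 20000, 60, 40),
        ArticleStats("Methane", 30000, 1.0, 5000, 40, 20),
    ]
    for article in rank_by_interest(sample):
        print(article.title)
```

Equal weights are the simplest choice; whether article length should count for as much as edit frequency is an open question that the sketch does not try to answer.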
In many ways, an algorithmic encyclopaedia is to be preferred because of the notorious problems of vandalism and bias. However, tasks like condensing and summarising are not straightforward. The problem of deciding what to write about could be solved by analysing Wikipedia, as described above, and tracking visitor trends. Is there going to be a move to unseat Wikipedia in the coming years? How long before humans can be removed from the algorithm completely?
Above is a link to an article on the carbon footprint of the internet. In the comments, we can find the usual Luddite opinions. If only people didn’t like the modern world, we could live in pre-industrial simplicity.
It seems embarrassingly obvious to me that if we have any hope of survival, it is in moving forward, rather than backwards. If we think that we can solve the world’s environmental problems by rejecting technology, then we’re sunk. Do the troglodyte commenters on the Guardian really think that the world is going to be able to implement the sort of engineering projects that are going to be necessary for a revolution in the world’s energy industry without the internet? How do they imagine engineers study and design things like solar panels, wind turbines or smart electricity grids? Using pencils, recycled paper and 30-year-old textbooks?
The Met Office apparently predicted the same thing last year and got it completely wrong. My guess is that they will be wrong again this year. My thinking is not really very scientific, just a willful and immature contrariness coupled with a lifetime of disappointing summers in England.
The sun is at the bottom of its sunspot cycle as well:
Noticing correlations between seemingly unrelated data has always been a rich source of new knowledge. I understand that correlation is not causation, but the reverse does appear to be true: causation should show up as correlation. If the expected outcome of a hypothesis does not coincide with the recorded data, then the hypothesis should not be trusted, or should at least be questioned thoroughly.
I imagine that when this goes live, there will be a lot of bloggers who use the data to show relationships that are very questionable. The old line about statistics being used like a drunkard uses a lamp post, for support rather than illumination, will no doubt apply. I am interested to see what all the amateur climatologists who have sprung up since climate change has obsessed the world will come up with. Like everything in life, more than 99% of it will be banal and worthless. It’s the rest that’s interesting. Luckily, the internet provides incredible filters for sorting through enormous amounts of information and finding the most interesting things to think about.
One of the problems that I can see with W|A is that it is closed and proprietary, although users will be able to access the data for free. The company may be able to run profitably. The search engines have done well being run this way. As far as I know, this is a new sort of service that has not been tried before. I hope that the likes of Google, Yahoo! and Microsoft try to build rivals to this quickly.
I hope that the diverse open source communities of the world try to come up with something to compete with it. At this stage, it is clear that a lot of human intervention is needed to get the data into the system. Wikipedia has shown that this is something that people are willing to give up their free time to do. Providing the vast computing resources for an open source version of this project is also a hurdle. I would certainly consider giving up some of my computer’s spare cycles if a distributed and open source version of this came along, and I am sure that I would not be alone. However, the popularity of Google compared to, for example, Yacy shows how difficult it is for these sorts of things to be fully open.
Another development that I hope that W|A brings about is to force universities and other publicly funded research institutions to do more to make all of their experimental data available in machine readable formats. A single open source project that can absorb all of the scientific, engineering, technical, sociological, economic and financial data in the world might not come about, but lots of smaller projects that each try to solve part of the problem might. No doubt, such projects would take pains to cooperate with the other projects.
What developments will occur in this field in the next few years is anyone’s guess. As Dickens pointed out in Hard Times, facts alone do not make a person educated or complete. However, I imagine that everyone being able to ask lots of little questions involving data and the relationships between them will have a similar impact to that of Google and Wikipedia. In the past, if we wondered to ourselves “What is the name of the Aztec sun god?” we might not have bothered to go to a public library or even take an encyclopedia off the shelf to find out. Now, we are much more likely to find out about
This separates all texts from the code of the project. The texts are saved in files in a separate folder to the project-specific code. At the moment they need to be created and edited by hand, but a web interface in the admin section may follow.
Literacy is a particular interest of mine, and I have never heard of this. I would recommend deletion.
This seems to be an odd way of thinking for someone helping to write an encyclopedia: I’m an expert; I’ve never heard of this; this, therefore, cannot exist.
One truth that I am continually confronted with (especially when I visit Wikipedia) is that there are more things that exist than I have heard of. This is especially true in the areas in which I consider myself an expert.