Almost a Blog

Month: October, 2004

bOb7waRe b0b7waRe

I got a text message today from someone and the phone number was displayed as
bob7ware or b0b7ware
Weird. I can only assume it’s spam.
I have also been receiving spam from some dodgy company: a text message saying that someone close to me fancies me and to dial this number. Why does our government tolerate this type of nonsense? It’s harassment on a global scale, and now that it’s working its way onto the mobile phone we are not going to be able to get away from the crap.


What is a DSO Exploit

I have noticed that a few people came here to find out information on what exactly a DSO Exploit is so I put together the following. If you need more leave me a comment and I will see what I can do.
Most of you are wondering why Spybot is reporting a DSO Exploit. First, there is a bug in Spybot at the moment that means it will always report this error. The bug will be fixed in a newer version of Spybot.
Don’t panic, your system may be as clean as a whistle.
What is a DSO Exploit?
DSO stands for Data Source Object. A Data Source Object exploit can be very severe when you consider that your hard drive, or pretty much anything else for that matter, is a data source and can be accessed using a method called data binding. A DSO Exploit is where someone maliciously uses data binding techniques to gain access to material they are not meant to access. This was a bug in some versions of Internet Explorer, Outlook Express etc. Note I said old versions; the new versions no longer have this problem and I suggest you upgrade to these to avoid the bug.
This does not mean you have to go and buy the latest Microsoft software. Microsoft release service packs that come with the patches required to fix this problem, so get the latest service pack for your system and install it and you will be safe from this particular bug, or at least until some smart arse finds another way to crack it.
To stop Spybot reporting the error do the following:
Open Spybot in advanced mode
Select: Settings
Select: Ignore Products
On the “All Products” tab scroll to “DSO Exploit” and check it.

Secure Email Obfuscator / CAPTCHA

I am sick to death of spam! It’s a pain in the ass but there does not seem to be much we can do about it.
One thing spammers do is harvest emails from the internet. This is surprisingly easy to do because people want to put their emails online and it’s very easy to write a spider. This is further compounded by some applications requiring that you put your email online. Movable Type is one; the requirement can be turned off, but that would mean I get more spam.
One partial solution would be to store emails as a security image. I wrote this utility to create my own email images as PNGs. If other people find it useful then I will extend its functionality. I know it is not unbreakable, as has been proven by those clever clogs at Berkeley who have managed to crack the Gimpy CAPTCHA.
There are a few other methods to do this, but the hardest ones to crack are those that use some form of CAPTCHA. Mine is almost there but has a bit to go. Any suggestions welcome.
Using a secure image is another way to make it harder for spammers to collect emails.

The image above was created by the email obfuscator tool.
The following email image was not produced by me but by BobG, one of the guys who left a comment. I have displayed the image here because I don’t want to allow images to be displayed in the comments, otherwise we would give the blog spammers another avenue of attack. Anyway, Bob suggested I add the facility to set a color for the background, so I suppose I will have to do this over the next few days.



Throwing someone out a window is not something I often do. In fact I cannot remember the last time I threw someone out of, or through, a window. I imagine if I was to take part in such an event I would be able to remember it taking place. How do I know this? Well, throwing someone through a window is commonly known as defenestration. How could one forget such a term?
I wonder how many judges have been regaled with accounts of defenestration. I can see it now, some smart arse lawyer:
Well, your honor, my client did not partake in the defenestration itself; he was occupied with the carrying of the 3rd party. The defenestration took place after the defendee left the client’s hands. My client has no knowledge of what happened to the defendant after the launch but can recount a crashing sound in or around the time of the event.

Creating a Debian Package

Ever since I started using Debian I have meant to try and create a Debian package for various reasons,
a) I am just a curious bugger.
b) I am just a curious bugger.
Today was my chance to have a go and see exactly what it involves, or should I say what it involves to create a very minimal package.
The reason for this is that I have written a Content Management System that uses about 20 Perl modules, some internal, some external. Rather than worry when it comes to an install or an upgrade, we have decided to stick the whole thing in Subversion and then wrap each release in a Debian package. We have several sites to run this from, so the more we can automate the better, particularly if I can get the automatic testing sorted. Using Subversion and the Debian packages, the whole system should be relatively low maintenance as far as upgrades are concerned, and this is important. We don’t want to paint ourselves into the upgrade corner and find we have neither the time nor the budget to spend on an upgrade. We want it automated for us, and although it might be a pain in the arse to put in place it will pay dividends when we come to change things later.
We also have a Postgres schema and some config files that need taking care of, but as I found out today this is relatively simple using the Debian packaging tools.
I suppose I should write a simple howto about how I did it, because the Debian New Maintainers’ Guide is not really the best tutorial for those wishing to package their own application for internal use. I imagine there are some other tutorials around but I didn’t find them.


I have not been able to do any work on the search engine for quite a while due to commitments with maths etc, but I now have some free time so I have started spidering again.
What I am aiming for is about 100 million pages as a base to start working on. I am probably going to implement the whole thing using Postgres because I do not have the time to write the software required to handle the storage. The files themselves are stored as flat files on disk; it’s the metadata that I will be storing in Postgres. I will let Postgres do all the nitty gritty work so that I can concentrate on the ranking and search algorithms.
I am also looking at just using plain text, ie stripping out the HTML completely and relying on the text and not the formatting of the document to rank each document. The reasons for this are:
a) It is much much simpler. I started writing an HTML parser in flex and believe me it’s a pain in the ass.
b) Plain text is also where the information is, and it is this that I am interested in. Dealing with the formatting is not something I want to have to do. I intend to store each document raw to disk in case I change my mind later though 😉
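To give a rough idea of the plain-text approach, here is a minimal sketch in Python using the standard library’s html.parser module. It is not the code behind the search engine, which is not shown here. The names TextExtractor and html_to_text are made up for the example, and dropping the contents of script and style elements is an assumption about what counts as formatting rather than information.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside script or style."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: never useful for ranking

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML document with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Pieces are joined with a space, so a word split by inline markup
    # (wor<b>ld</b>) would come out as two tokens.
    return " ".join(" ".join(parser.parts).split())
```

Called on a fetched page, html_to_text returns one long string of words that can be tokenised for ranking, while the raw document stays on disk untouched.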

Robots.txt file

There appears to be some misunderstanding surrounding the usage of the robots.txt file.
The following is just a fraction of the stuff I have found while spidering websites.
Noarchive: /
The “noarchive” statement should be part of a meta tag; it should not be in the robots.txt file. It’s not part of the standard.
I believe that “Crawl-delay”, or something similar, should be in the standard, but it isn’t yet.
Crawl-Delay: 10
Crawl-delay: 1
Crawl-delay: 60
It is implemented by a few crawlers, but people insist on doing the following:
User-agent: *
Crawl-delay: 1
The proper way should be as follows,
User-agent: Slurp
Crawl-delay: 1
I know Yahoo’s crawler (Slurp) adheres to the Crawl-delay directive, but here we are endorsing a non-standard method. Whether this is a good or bad thing is left up to the reader to decide. Having been hammered once by MSN’s bot, I think there needs to be a delay type option in the robots.txt file.
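For anyone writing their own spider, here is a small sketch of how a crawler could read a Crawl-delay value. It uses Python’s standard urllib.robotparser module, whose crawl_delay method is available from Python 3.6. The robots.txt content and the ExampleBot name are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: a delay for Slurp only, plus a record for everyone else.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def load_rules(text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Seconds to wait between requests, or None when no delay applies.
slurp_delay = rules.crawl_delay("Slurp")
other_delay = rules.crawl_delay("ExampleBot")

# The ordinary Disallow checks work as usual alongside the delay.
slurp_may_fetch = rules.can_fetch("Slurp", "/private/page.html")
other_may_fetch = rules.can_fetch("ExampleBot", "/index.html")
```

A polite spider would sleep for the returned number of seconds between requests to the same host, and fall back to its own default when the value is None.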
Then we have the people who think that they need to authorize a spider to spider their website.
Allow: /pages/
Allow: /2003/
Allow: /services/xml/
Allow: /Research/Index
Allow: /ER/Research
Allow: /
The reason for not having an Allow directive is simple. Hardly any of the internet would be indexed, because only a fraction of the websites online actually use the robots.txt file. An Allow directive would imply that websites are closed for business to the spiders unless they say otherwise. For instance, take the following directive
Allow: /index_me/
Is the spider then to assume that only that directory is available to it on the entire website? What can the spider assume about the above directive? To me it reads that only the “index_me” directory is to be indexed. What then is the point of the Disallow directive?
The Disallow directive was chosen because the internet is for all intents and purposes a public medium, so we all opt in when we put our websites up, then we opt out of the things we don’t want indexed.
My favorites though are the following, the honest mistakes:
Diasallow: /editor/
Diasallow: /halcongris/
Disallow /katrina/
Diasallow: /reluz/
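Mistakes like these are easy to catch mechanically. The following is a minimal sketch of a checker, not a tool used for the spidering described here. The name lint_robots and the two field sets are assumptions for the example: User-agent and Disallow are treated as the standard fields, and Allow and Crawl-delay as extensions.

```python
STANDARD_FIELDS = {"user-agent", "disallow"}
EXTENSION_FIELDS = {"allow", "crawl-delay"}  # assumption: flagged, not rejected


def lint_robots(text):
    """Return (line_number, problem, line) tuples for suspicious lines."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments and surrounding whitespace before checking.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing colon", line))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field in STANDARD_FIELDS:
            continue
        if field in EXTENSION_FIELDS:
            problems.append((number, "non-standard extension", line))
        else:
            problems.append((number, "unknown field", line))
    return problems
```

Running it over a file containing the lines above would report each “Diasallow” as an unknown field and the line without a colon as a missing colon.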

M203 Exam

Is over. Thank god I have that out of the way. I now have 3 months before it all starts again and my scheduling is back at the mercy of the Open University.
I am not too sure how I did in the exam. I missed a couple of small bits out and I think I really botched one of the last ten point questions. I know what I needed to do for it but the notation completely escaped me, bollix.
We will see just how I did in a few months’ time. Now I need to figure out what would be the most beneficial thing to do with all this free time.

From Here To Infinity

I just finished
From Here To Infinity
Author: Ian Stewart
This was tough going. It is not really suitable for people without a decent amount of maths, and I have to say that a lot of the topics went clean over my head, and I am meant to have a fair bit of maths beneath my belt, or at least more than most. I would be hard pushed to recommend it because I did not feel as if I got as much from the book as I hoped. It would not put me off reading any of his other books; I just don’t think this book was my cup of tea.

Copyright by Blunderbuss or Creative Commons

I had heard of Prof. Lessig from general browsing on the internet, so I know he’s got some clout with the online community, blogosphere, whatever you want to call it, but I had never really taken the time to find out what he does that seems to cause such a stir. He seems to have an almost religious following in some circles, so I thought that I should go and see exactly what all the fuss is about.
I had heard of the Creative Commons before, so off I went, umbrella in hand, to University College London’s Edward Lewis Theatre and grabbed myself a seat. I immediately recognized him because I had visited his website before I went to the talk for a general nosey.
This is just the way I heard it; I am sure I have probably got some of the ideas and concepts wrong 😉
I loved the way he started his talk, ie he took us back to the days when George Eastman was setting about pioneering the camera and how a law passed then enabled the camera business to flourish the way it did. He then described a few things that we take for granted, ie cultural remix (first time I had heard this phrase): the act of taking something like a song and putting your own spin on it, or having watched a movie, how we describe it to our friends and embellish it the way we see it. This goes on every day and there are no copyrights on this and there shouldn’t be.
He then moved this onto the digital age and casually pointed out that our cultural remix, which we take for granted every day, was now, in part, a digital phenomenon and no longer limited by distance. Kids today are growing up in this digital age and are making friends across the world without even meeting up, so our once limited cultural remix has set new boundaries on a global scale. The way we think, eat and speak and go about our business is now wrapped up online in this huge boiling menagerie of digital stuff. People are expressing themselves in ways we would not have dreamed about a few years ago, ie we have a new age cultural remix going on and this is a good thing. What is not good is that we have the middle men, ie the lawyers, trying to stifle this. The lawyers and some corporations are doing this by making vast areas of our new remix illegal.
“Using DDT to kill a gnat”
(from memory, used by Prof Lessig in the talk, probably slightly misquoted)
was the way Prof Lessig described it, and this is wrong. It was quite clear that Prof Lessig believes in copyright, and so do I, but it was also clear that he does not believe in applying it blindly. The normal blunderbuss approach to copyright seems to get his goat, and quite rightly so; it’s bloody stupid.
Anyway, the talk centered around the creative commons license and what it means to us and what we can use it for and why we need it.
At the moment everything written down is copyright to the author or creator of it, regardless of whether they have stuck the big C on it somewhere. This means that everything on the internet is automatically copyright unless stated otherwise. People who want to use something they find on the internet, ie a DJ finding a sample from a song, cannot do so unless they have permission from the owner of the copyright, so they have to get lawyers (middlemen) involved to sort out the legal stuff before they can carry on with their mixing. What the Creative Commons enables us to do is release a piece of work and mark it so that people know what they can and cannot do with it without having to get the lawyers involved, ie cutting out the middleman. I am all for this; it’s a wonderful idea.
Can I prove that it’s a wonderful idea? Yes I can. During the talk Prof Lessig played part of a track that had been released under the Creative Commons license, “My Life” by Colin Mutchler, which was then edited by Cora Beth, and the editing certainly added something to the track. It was brilliant. This is not an isolated incident either.
Anyway, I have just found some of the material from the talks online, so your time would be better spent watching these Flash movies than reading this.
You might also be interested in
Learning More
Creative commons website
Find CC content
George Eastman