Few million links more 09 Oct 03

by paddyjackson

Another day, another few million links. Since moving some of the data around and recreating the database there has definitely been an increase in performance. I vacuum the database regularly because of the number of updates that take place.
I am off into London on Saturday in search of some more hardware; I never intend to use http://www.scan.co.uk again. I am going to have a trawl around the computer fairs to see what I can find. I would really like to get a dual-processor motherboard and run a couple of the new Opterons on it. I will have to see what I can afford first, then decide what to do. At the moment it is not really processing power that is limiting me, it is the I/O on the system. I currently have 1GB of RAM installed, which is the most my motherboard can handle. The disks I am using are not the quickest in the world either, so I need to get some decent 80-conductor cables for the IDE disks. If there were more RAM in the PC and a few more disks to move some of the database files onto, the Athlon XP1700 would start to suffer. I have been looking at the MSI and Tyan motherboards with onboard SATA. They are expensive, but they would be the perfect choice for what I am doing. I really wish I had more room; I could then build some smaller PCs to run more robots.
I am going to start rethinking the layout of the tables. For instance, at the moment the links_found table stores links that are already in the home_page table, so the same text is held twice. These links vary in size from fairly small to massive, so I think that an integer value taken from the home_page(url_id) column would be a more efficient use of space. I am also thinking that, because the CPU is being underutilised, I should separate the downloading of the pages from the parsing. This would mean I could make more efficient use of all the resources currently available to me.
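To make the table idea a bit more concrete, here is a rough sketch of what storing an integer instead of the full link text might look like. It is only an illustration: SQLite (via Python's sqlite3 module) stands in for the real database, the table names home_page and links_found and the url_id column are the ones mentioned above, and everything else (the url and found_on_id columns, the helper function names, the example URLs) is made up for the example.

```python
import sqlite3


def build_schema(conn):
    # home_page holds each URL's text exactly once.
    # links_found only stores integer references to it.
    # The url and found_on_id columns are hypothetical.
    conn.executescript("""
        CREATE TABLE home_page (
            url_id INTEGER PRIMARY KEY,
            url    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE links_found (
            found_on_id INTEGER NOT NULL REFERENCES home_page(url_id),
            url_id      INTEGER NOT NULL REFERENCES home_page(url_id)
        );
    """)


def url_id_for(conn, url):
    # Insert the URL if it is new, then return its integer id.
    # INSERT OR IGNORE is SQLite-specific; another database
    # would need its own "insert if missing" idiom.
    conn.execute("INSERT OR IGNORE INTO home_page (url) VALUES (?)", (url,))
    row = conn.execute(
        "SELECT url_id FROM home_page WHERE url = ?", (url,)
    ).fetchone()
    return row[0]


def record_link(conn, found_on_url, link_url):
    # Store the link as a pair of integers rather than as text.
    conn.execute(
        "INSERT INTO links_found (found_on_id, url_id) VALUES (?, ?)",
        (url_id_for(conn, found_on_url), url_id_for(conn, link_url)),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    long_url = "http://b.example/some/very/long/path?with=a&big=query"
    record_link(conn, "http://a.example/", long_url)
    record_link(conn, "http://c.example/", long_url)
    record_link(conn, "http://a.example/", "http://c.example/")
    pages = conn.execute("SELECT COUNT(*) FROM home_page").fetchone()[0]
    links = conn.execute("SELECT COUNT(*) FROM links_found").fetchone()[0]
    return pages, links


if __name__ == "__main__":
    print(demo())
```

In the demo the long URL is linked to twice but its text is stored only once in home_page; each row in links_found costs two integers no matter how long the link is. The price is an extra lookup (or join) whenever the text of a found link is needed.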
44.8 Million links found
6.44 Million unique links found
