Programming

MySQL as a Spider database


The program I am writing is an Internet crawler/spider, a lot like Google Web Search and Google Images Search. I hope to release the Spider under the GPL when it is done. I also want to release the database, because I believe that even the information itself should be Open Source (under an OSI-approved licence) as well. (A lot like Wikipedia is, because Google is way too proprietary with information they don't even own in the first place :( )

In my latest experiments I have been creating MySQL tables with over 2 million rows. (Yes, I have indexed over 2 million pages in under 2 days!) The problem I have been running into is that the data length for that table is around 300 MB, but the index length is around 1.8 GB. Right now all URLs are indexed into one table, and the Crawler takes forever to fetch a batch of URL rows to work on, because the entire database needs to be sorted. I need to come up with a faster solution.

The solution I'm working on right now is to create an archive database to put revisions of URLs into, thus leaving a smaller database for the Crawler to work from.

But the unanswered questions are:

  1. How big can the archive get?
  2. How hard/slow will it be to serve out the database through a client? (A generated web page of search results or an image blob search, for example.)
  3. Is MySQL truly the right choice for the database?

Well, it looks like it's time for a benchmarking experiment!
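A first benchmark could be as small as the sketch below, which times the "fetch a batch of URL rows" query with and without an index on the sort column. Again this is an assumption-heavy illustration, not the real setup: it runs against an in-memory SQLite database rather than MySQL, and the `urls` table, the `last_fetch` column, and the function names are made up for the example. Numbers from it say nothing about MySQL itself; the point is only the shape of the experiment.

```python
import sqlite3
import time


def build_table(rows, indexed):
    # Hypothetical single-table layout: every URL lives in one table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, last_fetch INTEGER)"
    )
    # Scatter last_fetch values so the rows are not already in sorted order.
    conn.executemany(
        "INSERT INTO urls (url, last_fetch) VALUES (?, ?)",
        ((f"http://example.com/{i}", (i * 7919) % rows) for i in range(rows)),
    )
    if indexed:
        conn.execute("CREATE INDEX idx_urls_last_fetch ON urls (last_fetch)")
    conn.commit()
    return conn


def time_batch_fetch(conn, batch_size=100, repeats=5):
    # Return the best wall-clock time over several runs, plus the row count.
    best = float("inf")
    fetched = 0
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(
            "SELECT url FROM urls ORDER BY last_fetch LIMIT ?",
            (batch_size,),
        ).fetchall()
        best = min(best, time.perf_counter() - start)
        fetched = len(rows)
    return best, fetched


if __name__ == "__main__":
    for indexed in (False, True):
        conn = build_table(200_000, indexed)
        seconds, _ = time_batch_fetch(conn)
        print(f"indexed={indexed}: best of 5 = {seconds * 1000:.2f} ms")
```

The same harness could be pointed at the archive layout later, to compare batch times on the small working table against the one big table.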
Last Updated on Sunday, 26 July 2009 19:04
 

Exploring the world of Java.


I am setting out today to freshen up my Java skills. Here is the story. I love the Eclipse IDE, for many reasons. It supports many programming and scripting languages, like C++, Java, PHP, and Flex/Flash/ActionScript 3. Lately I have been developing in both PHP and Flex, and I can use Eclipse as my only IDE, allowing for faster deployment and compiling.

I installed Zend Studio for Eclipse (in the past I used PDT) and then installed the Adobe Flex Studio plug-in for Eclipse, so I had both PHP and Flex in Eclipse. I found out that I also needed C++ to write some custom socket servers and Windows services, so I installed the Eclipse CDT (along with the MinGW compiler). But I found out the hard way that, because I used Zend Studio for Eclipse as my Eclipse base, I had no Java :(

After a bit of research I found out that the "Eclipse SDK" (the Java development version of Eclipse) was nothing more than a hidden-away project called JDT. Here is how I added it: I went to the "update manager" and chose "The Eclipse Project Update"; then, once the updates list was displayed, I checked the entire update tree for "Eclipse SDK Eclipse 3.4.2" and clicked Finish. After the update and a restart of Eclipse, voilà! I could now create new Java projects. :>

So that's how you add Java support to Zend Studio for Eclipse, Adobe Flex Studio, or any other port, if anyone is interested.

Last Updated on Saturday, 25 July 2009 00:50