As the Web keeps growing, general-purpose search engines such as Yahoo! and Google may be too broad for building applications that focus on a particular domain of information. To address this, Alexa provides a web search platform that lets people define their own search engine.
Although you have to pay for the service, it definitely looks promising. Alexa's crawl covers over 100 terabytes of Web content spanning 4 billion pages and 8 million sites, and supports a wide variety of content types from the Web (jpgs, gifs, mp3s, movies, text/html, and even metadata). How does Alexa work?