Posted Saturday, April 9, 2011 in Old JamesCMS Posts
I'm currently working on a project to set up a vertical search engine, in other words a search engine that crawls only a specific portion of the web. I'm working with a team, and we haven't yet determined what criteria to narrow our search field to, but we were able to find a great open source search engine built in C#. Arachnode.NET supports keyword filtering and even Bayesian classification. The search index is built with Lucene.Net, which provides modern indexing functions, and there's even support for MS SQL full-text indexing. It's nice to see an open source .NET project on par with those of the Apache Software Foundation.
The project involved setting up a vertical search engine to crawl the web for pages related to steganography. The crawler and query page come from the open source C# project Arachnode.NET.
The search page is a “Google-like” search engine that can search the crawl results.
The full-text index can be downloaded and viewed using Luke.
Screenshots have been taken during the configuration, web crawling and setup of the project.
A Google Custom Search has also been set up so that its results can be compared with those of our own vertical search engine.
There were a total of 793 domains crawled.
There were a total of 15,570 webpages crawled, all of them from the 793 domains above. These pages were stored because they were found to be related to steganography.
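To give a rough idea of what "found to be related to steganography" can mean in practice, here is a minimal sketch of a keyword filter. It is not Arachnode.NET's code or API (that project is C#, and this is Python for brevity); the keyword list, the function names and the threshold of two hits are all assumptions made up for illustration.

```python
import re

# Hypothetical keyword list: the terms actually used in the project are not
# recorded in this post.
KEYWORDS = {"steganography", "steganalysis", "stego"}


def keyword_score(text: str) -> int:
    """Count the words in the text that match a keyword, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in KEYWORDS)


def is_relevant(text: str, min_hits: int = 2) -> bool:
    """Treat a page as on-topic once it has at least min_hits keyword matches.

    The default threshold of 2 is an assumed value, not a project setting.
    """
    return keyword_score(text) >= min_hits
```

A crawler using something like this would call `is_relevant` on the text of each fetched page and store only the pages that pass. A Bayesian classifier, which Arachnode.NET also supports, would replace the fixed keyword count with probabilities learned from example pages.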