Open Source Search Engine Software!
What can Yioop do?
Yioop software provides many of the same features as larger search portals:
- Search Results. Yioop comes with a crawler which can be used to crawl the open web or a selection of URLs of your choice. It can also index popular archive formats like Wikipedia XML-dumps, arc, warc, Open Directory Project-RDF, as well as dumps of emails or databases. Once you have created Yioop indexes of your desired data sources, Yioop can serve as a search engine for your data. It supports “crawl mixes” of different data sources. Yioop also provides tools to classify and sculpt your data before it is used in search results.
- News Service. News is best when it is still fresh. Yioop has a media updater process that can be used to re-index RSS and Atom feeds on an hourly basis. This more timely information can then be incorporated into Yioop search results.
- Social Groups, Blogs, and Wikis. Yioop can be configured to allow users to create discussion groups, blogs, and wikis. If Yioop is configured to allow multiple users, then users can share mixes of crawls they create. Blogs and discussion groups can be made public or private, and posts can be made to expire if desired. Public ones have public RSS feeds, and the better among these can be chosen for inclusion in what Yioop’s news service indexes. Each group also comes with its own wiki. Images and video can be uploaded to both feeds and wiki pages, and Yioop can be configured to automatically convert video to web-viewable formats.
- Web Sites. Yioop’s wiki mechanism can be used to build websites. It also has a Model View Adapter framework which can be easily extended to build customized search portal websites. Yioop can also be integrated into existing sites to provide search functionality either through an API, Open Search RSS, or JSON services. Yioop comes with stemmers, summarizers and other natural language processing tools that you can use as a package in your project via Composer.
The software and hardware requirements for Yioop are relatively low. At a minimum, you only need a web server such as Apache and PHP 5.4 or later. A test set-up consisting of three 2011 Mac Minis, each with 8GB RAM, a single name server, and five fetchers can crawl about 100 million pages in a month.
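
To put the crawl figure in perspective, here is a rough back-of-the-envelope calculation of the sustained fetch rate it implies. It assumes "a month" means 30 days and that the load is spread evenly across machines and fetchers. Neither assumption comes from the Yioop documentation, so treat the results as ballpark numbers only.

```python
# Rough throughput implied by the quoted test set-up.
PAGES_CRAWLED = 100_000_000  # "about 100 million pages" (from the text)
DAYS = 30                    # assumption: "a month" taken as 30 days
MACHINES = 3                 # three Mac Minis (from the text)
FETCHERS = 5                 # five fetchers (from the text)


def crawl_rates(pages, days, machines, fetchers):
    """Return average pages per second overall, per machine, and per fetcher."""
    seconds = days * 24 * 60 * 60
    overall = pages / seconds
    return {
        "overall": overall,
        # Assumption: work is divided evenly, which a real crawl will not do exactly.
        "per_machine": overall / machines,
        "per_fetcher": overall / fetchers,
    }


rates = crawl_rates(PAGES_CRAWLED, DAYS, MACHINES, FETCHERS)
for name, value in rates.items():
    print(f"{name}: {value:.1f} pages/sec")
```

Under these assumptions the set-up averages a little under 40 pages per second in total, or roughly 8 pages per second per fetcher.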