A research project I worked on during my master’s required me to scrape, index and rerank a largish number of websites. While Google would certainly return better results for most of the queries that we were interested in, it no longer offers a cheap and convenient way of creating custom search engines.
This need, along with the desire to own and manage my own data, spurred me to find a workflow for retrieving decent results for search queries made against a predefined list of websites. This post describes that workflow, and I hope it serves as a useful reference for setting up a small search engine using free and open-source tools.
Note:
• The instructions here assume that you use a UNIX-like operating system (Linux, macOS, *BSD).
• Any Python code has been tested only with Python 3.7 (the clock is ticking).
• Any code provided is free to use under the MIT license.