Compile helps companies build applications by giving them deep-linked company data. We get this data by crawling millions of publicly available datasets on the internet. One of the core parts of Compile’s infrastructure is a web crawler which we’ve affectionately nicknamed ‘Mordor’.
While working on Mordor, we found testing changes locally painful. The changes were often subtle, so we needed the crawler to crawl a list of URLs to make sure everything was working as intended. This ate into our development time because we had to wait for Mordor to finish crawling this test set. Since laziness is a trait of our engineering team, we decided to look into how we could test Mordor faster.
One option was to put a caching web proxy in front of Mordor so that responses would be cached. We tried Squid, and although it was easy to set up (we used a Docker image), configuring it had too steep a learning curve, so we decided to hack together a simple caching proxy server of our own.
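Putting a proxy ‘in front of’ a crawler simply means routing the crawler's HTTP requests through it. The sketch below assumes a Python crawler using only the standard library (this post doesn't say what Mordor is written in) and a proxy listening at a hypothetical address, localhost:3030. make_proxied_opener and fetch are illustrative names, not part of Mordor or Cappy.

```python
import urllib.request

# Hypothetical proxy address; adjust to wherever the caching proxy listens.
PROXY_URL = "http://localhost:3030"


def make_proxied_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests through the given proxy."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url})
    )


def fetch(url: str, opener: urllib.request.OpenerDirector) -> bytes:
    # With a proxy configured, urllib puts the absolute URL in the request
    # line (e.g. "GET http://example.com/page"), which a forward proxy expects.
    with opener.open(url, timeout=30) as resp:
        return resp.read()
```

Only http:// URLs are routed in this sketch. HTTPS through a forward proxy is normally tunnelled with CONNECT, which a caching proxy cannot inspect without terminating TLS. Note also that urllib skips the proxy for any host listed in the no_proxy environment variable.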
Taking ideas from sharebear’s article on how he built a caching server, we wrote a simple script and baked it into Mordor. To date, it has saved hundreds of hours of developer time and a great deal of bandwidth.
Once we released the code, other teams at Compile found it helpful too and began contributing to it. We thought it could benefit others as well, so in keeping with the open source ethos, we’re proud to announce Cappy!
Cappy stands for ‘Caching Proxy in Python’. It caches responses and saves them locally to a directory you specify. You can check out the code over here.
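To illustrate the general idea (this is not Cappy's source code), here is a minimal caching forward proxy built on Python's standard library. It stores each response body in a cache directory and replays it on later requests for the same URL. The CachingProxyHandler class, the SHA-256 file naming, and caching only the bodies of successful plain-HTTP GET responses are all assumptions made to keep the sketch short.

```python
import hashlib
import http.server
import os
import urllib.error
import urllib.request

# Direct opener: ignore HTTP(S)_PROXY environment variables so the proxy
# never forwards requests to another proxy (or back to itself).
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class CachingProxyHandler(http.server.BaseHTTPRequestHandler):
    """Minimal caching forward proxy for plain-HTTP GET requests (illustrative)."""

    cache_dir = "proxy_cache"  # hypothetical default; override before serving

    def do_GET(self):
        # Clients using a forward proxy send the absolute URL in the request
        # line, so self.path looks like "http://example.com/page".
        url = self.path
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        cached = os.path.join(self.cache_dir, key)

        if os.path.exists(cached):
            # Cache hit: replay the stored body without touching the network.
            with open(cached, "rb") as f:
                body = f.read()
        else:
            try:
                with _opener.open(url, timeout=30) as resp:
                    body = resp.read()
            except (urllib.error.URLError, ValueError):
                # Upstream failure (HTTPError subclasses URLError) or a
                # malformed URL: answer 502 and cache nothing.
                self.send_error(502)
                return
            # Cache miss: store the body for the next identical request.
            os.makedirs(self.cache_dir, exist_ok=True)
            with open(cached, "wb") as f:
                f.write(body)

        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To run it, set CachingProxyHandler.cache_dir and call http.server.HTTPServer(("127.0.0.1", 3030), CachingProxyHandler).serve_forever(). A production proxy would also keep status codes and headers, handle other methods and HTTPS, and write cache files atomically.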
pip install cappy
Running cappy run starts the server. It accepts a few optional flags:
--port: optional (default: 3030)
--cache_dir: optional (default: a temporary, platform-specific folder)
--cache_timeout: optional (default: 864000 seconds, i.e. 10 days; use 0 to cache indefinitely)
--cache_compress: optional (default: False); compress cached responses before storing them
P.S.: Cappy sounds a lot like Kappi, which means coffee in South India. We hope this answers your questions about the coffee image.