Our engine is built to index at a friendly rate. We’d rather score URLs and websites slowly than risk affecting your site’s performance for real customers.
We typically won’t retrieve more than one URL per second. For larger sites, our system may tentatively increase that rate while monitoring for any sign of impact on performance.
Our user agent is “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Readable/1.1.4”.
Please note that the version number following “Readable/” will change periodically and the version number shown here may not be current.
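Because the version number changes, anything that looks for our requests in server logs is better off matching the “Readable/” token than the full string. The following Python snippet is a minimal sketch of that idea; the function name readable_version and the regular expression are our illustration only, not part of any official tooling.

```python
import re
from typing import Optional

# Matches the "Readable/<version>" product token without pinning a version.
# The pattern is an assumption based on the user agent format shown above.
READABLE_TOKEN = re.compile(r"\bReadable/(\d+(?:\.\d+)*)")


def readable_version(user_agent: str) -> Optional[str]:
    """Return the Readable version found in a user agent string, or None."""
    match = READABLE_TOKEN.search(user_agent)
    return match.group(1) if match else None


example = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Readable/1.1.4"
)
print(readable_version(example))
```

A user agent without the “Readable/” token returns None, so ordinary browser traffic is left alone.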
Many websites will serve different versions of text to search engine spiders, and some will block access to anything they don’t recognise. Since our job is to measure the readability of the page a typical user sees, we do our best to make sure that’s what we are seeing.
We respect instructions from the robots.txt file.
If you want to specifically exclude anything from being indexed by the Readable.io engine, you can add rules for the user-agent “ReadableBot” to your robots.txt file. We have included an example of this in our own robots.txt file.
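To check how a “ReadableBot” rule would be interpreted before publishing it, you can run it through Python’s standard urllib.robotparser module. This is only a sketch: the /drafts/ path, the example.com URLs and the helper name readable_allowed are hypothetical, and the standard-library parser may not match our engine’s behaviour in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep ReadableBot out of /drafts/, allow everyone else.
ROBOTS_TXT = """\
User-agent: ReadableBot
Disallow: /drafts/

User-agent: *
Disallow:
"""


def readable_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given rules let the ReadableBot user-agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ReadableBot", url)


print(readable_allowed(ROBOTS_TXT, "https://example.com/drafts/post"))
print(readable_allowed(ROBOTS_TXT, "https://example.com/blog/post"))
```

With these rules, the first URL is reported as blocked and the second as allowed.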