I Don’t Need No Stinking API – Web Scraping in 2016 and Beyond
franciskim.co
Social media APIs and their rate limits have not been nice to me recently, especially Instagram’s. Who needs them anyway?
Sites are getting increasingly smart about blocking scraping and data mining attempts. AngelList even detects PhantomJS (I have not seen other sites do this). But if you automate the exact actions you would perform in a browser, can that really be blocked?
First off, in terms of concurrency, or the amount of horsepower you get for your hard-earned $$$, Selenium sucks. It’s simply not built for what you would consider ‘scraping’. But with sites being built with more and more smarts these days, the only truly reliable way to mine data off the internets is browser automation.
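To make "browser automation" a bit more concrete, here is a minimal sketch using Selenium's Python bindings. It is only an illustration and makes some assumptions: the `selenium` package and a local Chrome install are available, and the helper names `make_driver` and `scrape_text`, the URL, and the selector are all made up for the example.

```python
from typing import List


def make_driver():
    """Start a headless Chrome session (assumes selenium and Chrome are installed)."""
    # Imported lazily so scrape_text() below can be used with any driver-like object.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape_text(driver, url: str, css_selector: str) -> List[str]:
    """Load `url` in the browser and return the non-empty text of matching elements."""
    driver.get(url)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", css_selector)
    texts = [el.text.strip() for el in elements]
    return [t for t in texts if t]


# Hypothetical usage (placeholder URL and selector):
#
#   driver = make_driver()
#   try:
#       print(scrape_text(driver, "https://example.com", "h1"))
#   finally:
#       driver.quit()
```

Each driver is a full browser process, which is exactly why the horsepower-per-dollar story is so poor: running many of these in parallel gets expensive fast compared with plain HTTP requests.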