
What are you using for the scraping? Playwright… Selenium? I wanted to do something similar as a hobby, but my IP kept getting reported lol. Also, when you say companies… where are you getting the information from? Data brokers? Anyway, it's an interesting topic to me.


Selenium, although I'm using a wrapper library built on top of it. I only query each company every few days or so, which probably helps avoid IP bans, but I also rotate IPs. Many of the company job links go through external sources too (Lever, Greenhouse, etc.), which don't seem to mind.

The company data was gathered from around the web for a long time, until I found https://www.thecompaniesapi.com/ (which is now the source for much of that data).
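
For anyone wanting to try the same cadence, here is a rough sketch of the "every few days plus rotating IPs" idea. It is not the actual setup described above: the three-day interval, the proxy strings, and the names `due_companies` and `plan_batch` are all assumptions made up for illustration.

```python
import itertools
from datetime import datetime, timedelta

# Assumed interval; the comment above only says "every few days or so".
RECHECK_AFTER = timedelta(days=3)


def due_companies(last_checked, now):
    """Return companies never checked, or last checked at least RECHECK_AFTER ago."""
    return [
        name
        for name, seen in last_checked.items()
        if seen is None or now - seen >= RECHECK_AFTER
    ]


def plan_batch(last_checked, now, proxies):
    """Pair each due company with a proxy, cycling through the pool round-robin."""
    return list(zip(due_companies(last_checked, now), itertools.cycle(proxies)))


if __name__ == "__main__":
    # Hypothetical data: company names and proxy URLs are placeholders.
    last_checked = {
        "acme": datetime(2025, 1, 9),
        "globex": datetime(2025, 1, 1),
        "initech": None,
    }
    proxies = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
    for company, proxy in plan_batch(last_checked, datetime(2025, 1, 10), proxies):
        print(company, "via", proxy)
```

Keeping the schedule separate from the fetching makes it easy to check what a run would hit before any request goes out.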


I tried using my own desktop machine to process some of these tasks. My fans went into jet mode whenever the scraping was running lol. Do you have Discord or any other way to connect? Would love to chat about this topic. Feel free to drop any social media handles. I’ll ping you.


Ohhh yeah, I run into this memory issue very quickly when scraping (especially with a large URL dataset, since it will inevitably hit a website with a giant blob of markup). So I have to set timeouts and blacklist time-consuming requests, and also completely reset the (headless) browser every 2-3 requests (which is overkill, but I'm restricted on memory for those workers). Feel free to drop me an email sometime (it should be on my HN profile).
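
A minimal sketch of that pattern (page-load timeout, blacklisting slow URLs, and recycling the headless browser every few requests) might look like the following. This is not the wrapper library mentioned earlier in the thread: `RecyclingScraper`, `make_headless_chrome`, and the 20-second threshold are hypothetical, and blacklisting on any exception is a simplification.

```python
import time


class RecyclingScraper:
    """Fetch pages with a browser that is thrown away every few requests."""

    def __init__(self, make_driver, restart_every=3, slow_after=20.0):
        self.make_driver = make_driver      # callable returning a fresh driver
        self.restart_every = restart_every  # recycle after this many requests
        self.slow_after = slow_after        # seconds before a URL counts as slow
        self.blacklist = set()
        self._driver = None
        self._served = 0

    def _reset(self):
        """Quit the current browser (if any) so its memory is released."""
        if self._driver is not None:
            try:
                self._driver.quit()
            except Exception:
                pass  # a wedged browser may fail to quit cleanly
        self._driver = None
        self._served = 0

    def fetch(self, url):
        """Return the page HTML, or None if the URL is blacklisted or failed."""
        if url in self.blacklist:
            return None
        if self._driver is None or self._served >= self.restart_every:
            self._reset()
            self._driver = self.make_driver()
        self._served += 1
        started = time.monotonic()
        try:
            self._driver.get(url)
            html = self._driver.page_source
        except Exception:
            # Simplification: any failure (including a page-load timeout)
            # blacklists the URL and forces a fresh browser next time.
            self.blacklist.add(url)
            self._reset()
            return None
        if time.monotonic() - started > self.slow_after:
            self.blacklist.add(url)  # loaded, but too slow to retry later
        return html

    def close(self):
        self._reset()


def make_headless_chrome(page_load_timeout=20.0):
    """Hypothetical factory: a headless Chrome with a page-load timeout."""
    from selenium import webdriver  # imported lazily; requires selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(page_load_timeout)
    return driver
```

Usage would be roughly `scraper = RecyclingScraper(make_headless_chrome)`, then `scraper.fetch(url)` per URL and `scraper.close()` at the end. Passing the driver factory in keeps the recycling logic testable without a real browser.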



