Equifax is seeking a driven and experienced Python Developer to join our team. In this role, you will be responsible for the development, maintenance, and optimization of Python-based web scrapers. You will use libraries such as Beautiful Soup, Scrapy, requests, Playwright, and Selenium to efficiently collect, clean, and organize both structured and unstructured data from diverse sources, ensuring its accuracy and completeness for analysis or automation purposes.
This role involves extracting, cleaning, and processing data from various online sources, while also handling challenges like dynamic content, anti-bot measures, and data accuracy. The ideal candidate will have a solid foundation in Python, practical experience with web scraping tools and libraries and with handling large data sets, as well as a strong understanding of HTML, CSS, and browser behavior.
What you'll do
- Develop and maintain Python-based web scrapers to efficiently extract structured and unstructured data from various websites and sources.
- Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron or Airflow.
- Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
- Utilize tools and techniques to bypass CAPTCHAs, IP blocking, and other challenges encountered during web scraping.
- Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
- Adhere to web scraping best practices and ensure compliance with legal standards.
- Process and clean data: Transform raw scraped data into structured formats (e.g., CSV, JSON) and ensure data quality through validation and cleaning processes.
- Collaborate with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
What experience do you need
- A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
- 3+ years of professional experience in software engineering with a strong focus on Python development and proven experience writing Python code to extract data from websites, ensuring efficiency, accuracy, and adherence to best practices.
- 2+ years of experience with web technologies, including a solid understanding of JavaScript, HTML, CSS, and XML for effective entity extraction, plus hands-on experience designing, querying, and managing data in both SQL and NoSQL databases.
- 2+ years of experience with core Python web scraping libraries such as Scrapy and BeautifulSoup for HTML parsing, and with browser automation tools like Selenium or Playwright for handling dynamic, JavaScript-rendered content, plus experience working with data formats like JSON and CSV and with data cleaning and validation techniques.
- English proficiency of B2 or higher.
What could set you apart
- Understanding the importance of respecting website terms of service and avoiding harmful scraping practices.
- Experience with cloud platforms like AWS, Google Cloud, or Azure.
- Network traffic understanding or experience.
- Experience with the software development life cycle (SDLC) and testing.
- Proficiency with version control systems, particularly Git, for collaborative development and code management.
- Familiarity with CI/CD pipelines.
Primary Location:
CRI-Sabana
Function:
Function - Tech Dev and Client Services
Schedule:
Full time