Job Title: Data Engineer
Location: Karachi, Lahore, Islamabad (Hybrid)
Experience: 5+ Years
Job Type: Full-Time
Job Overview:
We are looking for a highly skilled and experienced Data Engineer with a strong foundation in Big Data, distributed computing, and cloud-based data solutions. The role demands a deep understanding of end-to-end data pipelines, data modeling, and advanced data engineering practices across diverse data sources and environments. You will play a pivotal role in building, deploying, and optimizing data infrastructure and pipelines within a scalable cloud-based architecture.
Key Responsibilities:
- Design, develop, and maintain large-scale data pipelines using modern Big Data technologies and cloud-native tools.
- Build scalable and efficient distributed data processing systems using Hadoop, Spark, Hive, and Kafka.
- Work extensively with cloud platforms (preferably AWS) and services such as EMR, Glue, Lambda, Athena, and S3.
- Design and implement data integration solutions pulling from multiple sources into a centralized data warehouse or data lake.
- Develop pipelines using dbt (data build tool) and manage workflows with Apache Airflow or AWS Step Functions.
- Write clean, maintainable, and efficient code using Python, PySpark, or Scala for data transformation and processing.
- Build and manage relational, columnar, and NoSQL data stores such as PostgreSQL, MySQL, Redshift, Snowflake, HBase, and ClickHouse.
- Implement CI/CD pipelines using Docker, Jenkins, and other DevOps tools.
- Collaborate with data scientists, analysts, and other engineering teams to deploy data models into production.
- Drive data quality, integrity, and consistency across systems.
- Participate in Agile/Scrum ceremonies and utilize JIRA for task management.
- Provide mentorship and technical guidance to junior team members.
- Contribute to continuous improvement by making recommendations to enhance data engineering processes and architecture.
Required Skills & Experience:
- 5+ years of hands-on experience as a Data Engineer.
- Deep knowledge of Big Data technologies: Hadoop, Spark, Hive, and Kafka.
- Expertise in Python, PySpark, and/or Scala.
- Proficient with data modeling, SQL scripting, and working with large-scale datasets.
- Experience with distributed storage like HDFS and cloud storage (e.g., AWS S3).
- Hands-on experience with data orchestration tools such as Apache Airflow or AWS Step Functions.
- Experience working in AWS environments with services such as EMR, Glue, Lambda, Athena.
- Familiarity with data warehousing concepts and experience with tools such as Redshift or Snowflake (preferred).
- Exposure to tools such as Informatica, Ab Initio, or Apache Iceberg is a plus.
- Knowledge of Docker, Jenkins, and other CI/CD tools.
- Strong problem-solving skills, initiative, and a continuous learning mindset.
Preferred Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
- Experience with open table formats such as Apache Iceberg.
- Hands-on experience with Ab Initio (GDE, Conduct>It) or Informatica tools.
- Knowledge of Agile methodology and working experience with JIRA.
Soft Skills:
- Self-driven, proactive, and a strong team player.
- Excellent communication and interpersonal skills.
- Passion for data and technology innovation.
- Ability to work independently and manage multiple priorities in a fast-paced environment.