Responsibilities:
Design, build, and maintain efficient and scalable data pipelines that collect, transform, and load data.
Develop logical and physical data models, including data warehouse designs.
Implement monitoring and alerting systems to ensure data pipeline reliability, and perform routine maintenance tasks.
Work with upstream and downstream teams to ensure end-to-end execution of the data pipelines.
Set up processes and streamline workflows to support the team's growth, and collaborate with other teams.
Optimize data processing and storage solutions for efficiency and scalability, working with large volumes of data.
Requirements:
Must Have:
- Should have 2-5 years of experience working on ETL pipelines.
- Strong programming skills in Python for data manipulation and transformation.
- Should have in-depth knowledge of a NoSQL or relational database and be able to model data on it.
- Should have experience with workflow management tools like Apache Airflow.
- Should help the team with all phases of development, including design, implementation, and operation of production systems.
- Should know what good code looks like and be able to write it, with a solid understanding of object-oriented design and concepts.
- Should be able to deal well with ambiguous/undefined problems.
- Understanding of data security and privacy principles.
Good to Have:
- Proficiency in AWS services such as Glue, EMR, Redshift, S3, Athena, and Lambda.
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Prior experience mentoring or leading a small team is a plus.
- Proficiency with distributed data processing systems like Spark and Flink.