Overview:
We are looking for a Data Engineer with a strong background in PySpark/Big Data development and Hadoop, with a minimum of 5 years of relevant Big Data experience and 7+ years of total experience. Proficiency in Hadoop, Hive, Spark, Unix, Scala, SQL, and Python is required.
Roles and Responsibilities:
- Design, develop, and implement data pipelines and ETL processes to efficiently ingest, transform, and load large volumes of data.
- Collaborate with cross-functional teams to understand data requirements and design scalable solutions for data storage, processing, and retrieval.
- Tune and optimize data processes to ensure high performance, reliability, and data integrity.
- Utilize PySpark, Spark, and Hadoop to build robust data solutions.
- Keep abreast of the latest industry best practices and emerging technologies in data engineering.
- Troubleshoot and resolve issues related to data pipelines and processing.
- Participate actively in code reviews and offer constructive feedback to enhance code quality.
Qualifications:
- Strong experience with Hadoop and its ecosystem tools such as Spark, Kafka, Hive, and Sqoop
- Proficiency in SQL for data analysis, querying, and performance optimization
- Hands-on experience with Unix/Linux environments
- Programming experience in Scala and Python for data processing and pipeline development
- Experience working with large-scale datasets and distributed data processing frameworks
- Good understanding of ETL processes, data pipelines, and big data architecture
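To illustrate the ingest-transform-load pattern this role centers on, here is a minimal sketch in plain Python (standard library only; the field names and validation rules are hypothetical, and a production pipeline would run such logic at scale in PySpark rather than in-process):

```python
import csv
import io

def run_etl(source_csv: str) -> list[dict]:
    """Minimal extract-transform-load sketch: parse CSV text,
    normalize fields, and return load-ready records."""
    # Extract: parse raw CSV rows into dicts
    rows = csv.DictReader(io.StringIO(source_csv))

    # Transform: trim whitespace, cast amounts, drop malformed rows
    records = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip bad rows rather than failing the whole batch
        records.append({"id": row["id"].strip(), "amount": round(amount, 2)})

    # Load: return the cleaned records; a real pipeline would write
    # them to a warehouse table or distributed store instead
    return records

raw = "id,amount\n a1 ,10.456\na2,not-a-number\na3,3.5\n"
print(run_etl(raw))
```

The same extract/transform/load stages map directly onto a PySpark job, where the transform step becomes DataFrame operations distributed across the cluster.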