Overview:
We are looking for a Data Engineer with a strong background in PySpark/Big Data development and Hadoop, with a minimum of 5 years of relevant Big Data experience and 7+ years of total experience. Proficiency in Hadoop, Hive, Spark, Unix, Scala, SQL, and Python is required.
Roles and Responsibilities:
- Design, develop, and implement data pipelines and ETL processes to efficiently ingest, transform, and load large volumes of data.
- Collaborate with cross-functional teams to understand data requirements and design scalable solutions for data storage, processing, and retrieval.
- Tune and optimize data processes to ensure high performance, reliability, and data integrity.
- Utilize PySpark, Spark, and Hadoop to build robust data solutions.
- Keep abreast of the latest industry best practices and emerging technologies in data engineering.
- Troubleshoot and resolve issues related to data pipelines and processing.
- Participate actively in code reviews and offer constructive feedback to enhance code quality.
Qualifications:
- Strong experience with Hadoop and its ecosystem tools such as Spark, Kafka, Hive, and Sqoop
- Proficiency in SQL for data analysis, querying, and performance optimization
- Hands-on experience with Unix/Linux environments
- Programming experience in Scala and Python for data processing and pipeline development
- Experience working with large-scale datasets and distributed data processing frameworks
- Good understanding of ETL processes, data pipelines, and big data architecture
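To illustrate the ingest-transform-load pattern this role centers on, here is a minimal sketch in plain Python (standard library only; the field names and validation rules are hypothetical, and a production pipeline would run such logic at scale in PySpark rather than in-process):

```python
import csv
import io

def run_etl(source_csv: str) -> list[dict]:
    """Minimal extract-transform-load sketch: parse CSV text,
    normalize fields, and return load-ready records."""
    # Extract: parse raw CSV rows into dicts
    rows = csv.DictReader(io.StringIO(source_csv))

    # Transform: trim whitespace, cast amounts, drop malformed rows
    records = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip bad rows rather than failing the whole batch
        records.append({"id": row["id"].strip(), "amount": round(amount, 2)})

    # Load: return the cleaned records; a real pipeline would write
    # them to a warehouse table or distributed store instead
    return records

raw = "id,amount\n a1 ,10.456\na2,not-a-number\na3,3.5\n"
print(run_etl(raw))
```

The same extract/transform/load stages map directly onto a PySpark job, where the transform step becomes DataFrame operations distributed across the cluster.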