Able to deploy a Hadoop cluster, add and remove nodes, keep track of jobs, monitor critical parts of the cluster, configure high availability, and schedule, configure, and take backups.
Strong experience with Hadoop ETL/data ingestion: Sqoop, Flume, Hive, Spark, HBase
Experience in Real Time Data Ingestion using Kafka, Storm, Spark or Complex Event Processing (CEP)
Experience in Hadoop Data Consumption and Other Components: Hive, Hue, HBase, Phoenix, Spark, Mahout, Pig, Impala, Presto
Experience monitoring, troubleshooting, and tuning services and applications, with operational expertise including strong troubleshooting skills and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networks.
Experience with open source configuration management and deployment tools such as Puppet or Chef, and scripting using Python/Shell/Perl/Ruby/Bash/PowerShell
Good understanding of distributed computing environments
We are looking for Big Data architects who will lead the design and implementation of our next generation of best-in-class solutions. As a dynamic and talented Big Data architect, you will have proven hands-on experience leading the architectural design of large-scale, high-availability Big Data solutions. You will understand the technical opportunities and limitations of the various technologies at your disposal and will be able to specify tailored hardware for the recommended solutions. Along with the immediate architecture, you will have an innate understanding of how to provision, monitor, support, evolve, and evangelize the chosen technology stack(s). The focus of this position is to provide solutions within the Hadoop environment using technologies such as HDFS, Spark, Storm, Kafka, MapReduce, Pig, Hive, HBase, ZooKeeper, and other Big Data technologies for both batch-oriented and real-time analytics.
Responsible for setup, administration, monitoring, tuning, optimizing, and governing large-scale Hadoop clusters and Hadoop components, on-premises or in the cloud, to meet high availability/uptime requirements.
Design and implement new components and emerging technologies in the Hadoop ecosystem, and successfully execute Proofs of Technology (PoT) and Proofs of Concept (PoC)
Collaborate with cross-functional teams (infrastructure, network, database, application) on activities such as deployment of new hardware/software, environments, capacity uplift, etc.
Work with various teams to set up new Hadoop users, security, and platform governance
Create and execute a capacity planning strategy and process for the Hadoop platform
Work on cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Ambari etc.
Performance tuning of Hadoop clusters and various Hadoop components and routines.
Monitor job performance, file system/disk-space management, cluster and database connectivity, and log files; manage backups and security; and troubleshoot user issues.
Harden the cluster to support use cases and self-service in a 24x7 model, and apply advanced troubleshooting techniques to critical, highly complex customer problems
· Contribute to the evolving Hadoop architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and price.
· Set up monitoring and alerts for the Hadoop cluster, create dashboards, and produce weekly status reports on uptime, usage, issues, etc.
· Design, implement, test, and document a performance benchmarking strategy for the platform as well as for each use case
· Act as a liaison between the Hadoop cluster administrators and the Hadoop application development team to identify and resolve issues impacting application availability, scalability, performance, and data throughput.
· Research Hadoop user issues in a timely manner and follow up directly with the customer with recommendations and action plans
Work with project team members to help propagate knowledge and efficient use of the Hadoop tool suite, and participate in technical communications within the team to share best practices and learn about new technologies and other ecosystem applications
· Automate deployment and management of Hadoop services including implementing monitoring
· Drive customer communication during critical events and participate/lead various operational improvement initiatives
2+ years of strong Hadoop/Big Data experience, with overall industry experience of 8+ years at various levels and a good amount of Java programming.