DevOps (Cloud Solutions Engineer)
With your DevOps expertise, you will have a direct impact on the entire Databricks team. Our services have to scale, stay highly available, and "just work." If you love learning, designing, engineering, and running systems and infrastructure that will help us grow together, then this is the place for you! You will be challenged, and you should be ready to challenge your colleagues with constant feedback, while working as part of a small but rapidly growing, high-energy team delivering incredible, creative improvements to the company. No two days will be the same here.
As a professional in this position, you will have the following responsibilities:
- Operate and maintain our CI/CD setup based on GitHub Actions and Jenkins, and continuously improve it.
- Run proofs of concept (POCs) for newer DevOps tools that could make our setup more robust and error-free.
- Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement.
- Work with the Engineering team on the handover of operations activities.
- You may be asked to be on call to assist with deployments and other support activities.
- Be ready to write notebooks using your Python and SQL skills.
- Participate in product testing, and if you are lazy, bring in some automation.
Requirements:
- 4+ years of hands-on experience in a DevOps role.
- You should have significant experience with Kubernetes and Docker, going well beyond basic commands.
- You have a good understanding of two or more tools such as GitHub Actions, Jenkins, Argo CD, or Terraform.
- Decent knowledge of at least one public cloud, preferably Azure.
- Scripting knowledge in Python is essential.
- Expertise in Linux and shell scripting.
- You are familiar with log aggregation and other monitoring tools.
Good to have:
- Awareness and insight into industry trends (technology, methods, and tooling).
- Automation mindset.
We’re growing fast and attracting the best talent in the world. Bricksters — as we call ourselves — are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you’ll likely hear about our culture.
We are seeking an experienced NOC Engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks platform.
The impact you will have here:
- Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents.
- Investigate incidents and propose solutions to improve platform reliability and stability.
- Perform root cause analysis for recurring incidents and provide proactive solutions.
- Develop tooling or automate processes to improve platform monitoring and alerting.
- Contribute to software development efforts to improve overall service reliability and stability.
- Communicate effectively with internal stakeholders, including executive staff, to provide incident analysis.
- Participate in war rooms and temporary communication channels during outages.
- Demonstrate cross-functional leadership and establish ownership of incidents and outages.
- Possess the ability to multitask on several incidents and/or projects at once.
What are we looking for?
- Minimum of 4 years of experience as a NOC, SRE, or DevOps engineer.
- Strong knowledge of cloud technologies such as Azure, AWS, and GCP.
- Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, and PagerDuty.
- Hands-on experience with containers and orchestration technologies such as Docker and Kubernetes.
- Strong software development skills and experience.
- Proficiency in automation and scripting (Python, Bash, Terraform).
- Understanding of CI/CD principles.
- Linux systems administration skills.
- Incident management experience.
- Excellent communication skills.
- Ability to work well under pressure in a fast-paced environment.
- A high degree of integrity, accountability, attention to detail, execution, and planning expertise.
- Bachelor's degree in Computer Science or a related field.
- Willingness to learn Databricks products.
Benefits:
- Benefits allowance
- Employee's Provident Fund
- Equity awards
- Gym reimbursement
- Annual personal development fund
- Work headphones reimbursement
- Business travel insurance
Databricks is the data and AI company. More than 9,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.
Our Commitment to Diversity and Inclusion
At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.