Sr. Data Engineer
CLARA Analytics
About CLARA:
CLARA Analytics ("CLARA") is revolutionizing the insurance industry with AI-based products that dramatically reduce costs and complexity for both insurers and the insured. Our products include the award-winning CLARA Providers, a provider scoring engine that helps rapidly connect injured workers with top-performing doctors and gets them on a path to speedy recovery; CLARA Litigation, which uses AI insights to detect litigation risk, manage attorney performance, and resolve claims effectively; and CLARA Claims, an early warning system that helps frontline claims teams manage claims efficiently, reduce escalations, and understand the drivers of complexity. CLARA’s customers span a broad spectrum, from the world’s top 25 insurance carriers to small, self-insured organizations.
This is a chance to get in early with a rapidly growing Silicon Valley company in the AI/ML and InsureTech space and to participate in developing the next generation of truly game-changing products. Job title and compensation will be adjusted as appropriate to meet the experience level of the right candidate.
About the role:
CLARA is looking for an experienced Senior Data Engineer who has built data processing systems, data warehouses, and ecosystems that support data science. The ideal candidate will be highly familiar with working in a cloud environment such as AWS or GCP and comfortable in a technical lead role, working with other data engineers, application engineers, data scientists, product managers, and product delivery teams. This person will also mentor the current Data Engineering team.
Key Responsibilities
- Help provide technical leadership in CLARA’s data engineering team by driving technology decisions, mentoring others, and contributing significantly as an individual
- Work with data tools such as AWS Redshift, AWS SageMaker, AWS Athena/Presto, AWS EMR, and Spark/PySpark
- Use exploration and analytics tools such as Amazon QuickSight, AWS Athena/Presto, or other BI tools
- Knowledge and experience with Informatica is a plus.
- Build robust data processing pipelines using Airflow, Kubeflow/MLflow, or similar tools, and integrate them with multiple components, data sources, and data sinks
- Design and architect new product features, champion the right cutting-edge technologies and tools, and mentor the team in adopting them.
- Collaborate with the data science and analyst teams to ensure that data processing, structure, and accessibility maximize model performance while minimizing costs.
- Work on a biweekly sprint schedule in a fast-paced startup environment. Participate in and contribute to scrum meetings, including daily stand-ups, sprint planning, and retrospectives
- Deliver value in the form of timely, high-quality, performant software components and services
- Collaborate with product owners and stakeholders to plan and define requirements
Qualifications & Experience
- 7+ years of experience in software engineering, building large-scale systems.
- Experience with the following software/tools is highly desired:
  - Redshift and RDS (high proficiency)
  - Apache Spark, Hive, and similar tools
  - SQL and NoSQL databases like MySQL, Postgres, DynamoDB, and Elasticsearch
  - Workflow management tools like Airflow
  - AWS cloud services: RDS, AWS Lambda, AWS Glue, AWS Athena, and EMR (equivalent tools in the GCP stack will also suffice)
- Strong programming skills in at least one of the following languages: Java, Scala, Python, or C++
- Strong analytical skills and advanced SQL knowledge, including indexing and query optimization techniques
- Experience implementing software for data processing, metadata management, and ETL pipelines using tools like Airflow
- Experience working with cross-functional teams in a fast-paced environment
- Experience working with Data Science, Machine Learning Engineering, and MLOps teams to build and deploy robust, scalable production systems.
- Ability to translate data needs into detailed functional and technical designs for development, testing and implementation
- Ability to serve as a liaison among technical, quality assurance, and non-technical stakeholders throughout the development and deployment process