Senior Site Reliability Engineer
Paxos
About Paxos
Today’s financial infrastructure is archaic, expensive, inefficient and risky — supporting a system that leaves out more people than it lets in. So we’re rebuilding it.
We’re on a mission to open the world’s financial system to everyone by enabling the instant movement of any asset, any time, in a trustworthy way. For over a decade, we’ve built blockchain infrastructure that tokenizes, custodies, trades and settles assets for the world’s leading financial institutions, like PayPal, Venmo, Mastercard and Interactive Brokers.
About the team
The Site Reliability Engineering team at Paxos builds and maintains the secure cloud infrastructure that supports our mission to reshape the financial industry. The team improves engineering efficiency by creating standardized frameworks across product lines and developing self-service tools that streamline processes, making it easier for teams to get things done.
About the role
As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. You will work closely with our development and operations teams to build and maintain robust systems that support our applications. Your expertise in AWS cloud technologies and database management, particularly with RDS, PostgreSQL, and Aurora, will be essential to our success.
What you’ll do
- Design, implement, and maintain highly available and scalable AWS infrastructure.
- Manage and optimize database technologies, with a focus on Amazon RDS and Amazon Aurora.
- Monitor system performance, identify potential issues, and implement solutions to ensure system reliability.
- Collaborate with development teams to ensure seamless deployment and integration of new features and updates.
- Develop and maintain automation scripts to improve efficiency and reduce manual intervention.
- Implement security best practices to protect data and ensure compliance with industry standards.
- Conduct root cause analysis of incidents and implement preventive measures to avoid recurrence.
- Participate in on-call rotations to provide 24/7 support for critical systems.
About you
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
- Minimum of 5 years of experience in site reliability engineering or a related role.
- Extensive experience with AWS cloud technologies, including EC2, S3, Lambda, CloudFormation, and CloudWatch.
- Strong expertise in database technologies, particularly RDS, PostgreSQL, and Aurora.
- Proficiency in scripting/programming languages such as Python, Bash, or Go.
- Experience with infrastructure as code (IaC) tools such as Terraform.
- Familiarity with containerization and orchestration technologies such as Docker and Kubernetes.
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Strong communication skills and the ability to work effectively in a remote team environment.