Lead Site Reliability Engineer at IMVU
Mountain View, California, United States (Posted Aug 7 2014)
About the company
IMVU, Inc. (http://www.imvu.com) is an online social entertainment destination where members use 3D avatars to meet new people, chat, create and play games with their friends. Our developers are given significant autonomy over their work: we hire smart people and trust them to use their creativity and talents to do right by their peers and our customers. We are longstanding evangelists of Continuous Deployment and strong supporters of open source software, and we spawned the Lean Startup movement. We value humility and open collaboration, and we foster an environment that supports learning from and mentoring others. We want you to grow your talent and career. We'll provide plenty of interesting challenges and strong support to ensure your success. At IMVU, we solve challenging problems and create amazing products together every day! Check out some of what we're doing on our Engineering blog: http://im.vu/engineering
As a Lead Site Reliability Engineer for IMVU's Operations team, you will be a champion for service quality and modernization. Working side by side with our development teams and Operations engineers, you will build and refine the tools that make both them and you happy. From analyzing cluster performance to debugging live applications, you'll be given autonomy in your role of developing and integrating production-ready software. We trust you to treat service quality as a vigil, and the most challenging scalability issues as wonderfully engaging puzzles. We want you to stand with your teammates, proud of what you have built and eager to share the results. From high-availability load balancing to data integrity, where milliseconds and bytes matter, your choices make the difference between a product and a great product.
Building and scaling production software to power IMVU's servers and applications
Auditing and improving existing processes, software and hardware
Bridging the gap between engineering and system administration, making it seamless
Periodic on-call duty
Solving the pain points of 24x7 monitoring, maximizing service quality and ease of life for our Operations team
Ultimately responsible for IMVU's quality of service and production scaling
A diverse role, ranging from first response and performance analysis to service architecture, scaling and more
Power through autonomy: You will be a self-driven individual who is keen on finding problems and fixing them
Skills & requirements
3+ years experience in 24x7 Operations / cluster management environment
2+ years experience in site reliability development / software engineer role
Solid knowledge of L2/L3 networking fundamentals and technologies
Solid knowledge of CS fundamentals
Strongly experienced in GNU/Linux system administration
Fluent in a variety of open source compiled and scripting languages (Go, Perl, C, Python, etc.)
Highly approachable, capable of interacting with many different people and teams with differing goals
Must be willing to take periodic on-call duty, where you will be responsible as a first responder for all production issues 24x7. Please do not apply for this position if you are unable to meet this requirement.
Strong Go and Perl development experience
Strong understanding of TCP/IP, L4 and up load balancing and common routing protocols (BGP & OSPF)
MySQL database administration (master/master replication, real-time backups, recovery, monitoring, performance tuning)
24x7 Monitoring software experience, such as Nagios
Virtualization technology experience, including KVM & libvirt
Source control experience, including SVN & git
Strong LAMP stack experience
Experience with medium-sized production clusters focused on five-nines (99.999%) uptime.
Excellent oral and written communication skills.
An exceptionally positive attitude.
How to apply
see the job website
Let them know you found the job via http://www.golangprojects.com
(Companies love to know which recruiting strategies work)