Maxim Fedorov
Performance & scalability engineer
Maxim Fedorov is a software engineer at WhatsApp, the largest messaging app. Maxim’s work is focused on performance and scalability of the server side.
Before WhatsApp, Maxim developed low-latency TCP/IP applications at NetAlliance (Sydney, Australia), designed Kaspersky Enterprise Security Endpoint (Moscow, Russia), improved Parallels Virtual Automation (now called Odin) at Parallels (formerly SWsoft), and, before that, developed network security software.
Past Activities
Code BEAM America 2021
12.25 - 13.05
Fireside chat on BEAM security
Join Maxim Fedorov and Bram Verburg to discuss security for BEAM-based applications. How can industry best practices for secure coding, testing and deployment hardening be applied to the Erlang ecosystem? What has been achieved over the last few years and what challenges remain? How can the community collaborate on moving things forward? Audience participation, through questions and comments in the session chat, is encouraged!
Code BEAM America 2021
11.05 - 11.45
Harnessing OTP through Continuous Integration
Erlang/OTP is a living, breathing repository containing over two million lines of code. Over 4,300 commits were made between R23 and R24. Five years ago we started with a heavily patched OTP 16 fork powering our servers. Now we run the latest version before it’s officially released, getting the most out of Erlang. This talk is about the road we took, how we ended up running OTP tests in our CI pipeline, and how we made it faster & friendlier for developers.
AUDIENCE:
Technical leads, release engineers, developers, infrastructure specialists
Code BEAM V Europe 2021
15.20 - 15.50
Ask me anything about Erlang Ecosystem Foundation
A short update from the Erlef team, after which you can ask them any question you like about their work.
Code Mesh LDN
11.25 - 12.10
The art of challenging assumptions
We spent countless hours and sleepless nights bringing up and keeping up the server side of the most successful messaging service in the world. Looking back, how many choices would we change? And how do we ensure we make the right one next time? "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil" (Donald Knuth). But why does it happen? Why did we do something we didn't really want? Because we acted on assumptions. This talk will guide you through a number of war stories where assumptions were made and acted on. There were regrets and disappointments, and we learned to challenge assumptions the hard way. Now it's time to share what we have learnt so far.
OBJECTIVES
- identify sources of human error in software development
- discuss instruments and routines that help to challenge assumptions
- provide advice for an improved decision-making process
AUDIENCE
Tech leads, software architects, systems designers and everyone else involved in making technical decisions and facing the consequences.
Code BEAM SF 2019
13.50 - 14.35
Mid-air airplane repair: troubleshooting at WhatsApp
Simple, reliable messaging. It takes a lot to support this statement. For 10 years WhatsApp has demonstrated unprecedented reliability and availability, serving over 1.5B users. There is absolutely no way to reproduce interactions between all of them within a cluster spanning over 10,000 nodes and multiple datacenters. Investigations must be done on a live system without disturbing connected users. If repairs are needed, they have to be done on the fly.
This talk will guide you through debugging and troubleshooting techniques used at WhatsApp. Maxim will share a few case studies and explain monitoring, introspection, performance analysis, and tools.
Some knowledge of Erlang and C is necessary.
OBJECTIVES
Share processes, best practices, tools and war stories from 10 years of running a reliable messaging service.
TARGET AUDIENCE
Software developers, DevOps, Site Reliability Engineers, System Administrators and everyone else interested in troubleshooting live production systems.
Code Mesh LDN 2018
15.25 - 16.10
Scaling Erlang cluster to 10,000 nodes
A user population growing beyond 1.5B leaves no chance of keeping the server footprint as small as it used to be. Adding new capabilities requires more and more processing power. When it becomes impossible to keep everything on just ten servers, we have to scale the cluster to a hundred. When a hundred gets too tight, we expand it to 1,000. What’s next? 10,000? And how is it possible, considering the current scalability limits of a single Erlang cluster?
This talk will guide you along the way we took to improve Erlang scalability, remove bottlenecks and increase the efficiency of our Erlang-based applications.
OBJECTIVES
Demonstrate an example of a live Erlang cluster being scaled from just a few nodes to 10,000 machines with no service interruption.
TARGET AUDIENCE
Scalability engineers, people interested in optimising Erlang for large-scale server applications.
Media
Articles: 1
How to serve 1.5 billion active users at the same time - scaling Erlang cluster to 10,000 nodes
A growing user population beyond 1.7B, whilst simultaneously adding new capabilities, does not leave much chance to keep the server footprint as small as it used to be.