Reducing Cold Start Delays by up to 50% in Serverless and FaaS Environments with eBPF
Eliminating FaaS cold starts with eBPF
It was back in 2019, just before the COVID-19 crisis erupted, that I first encountered some personal use cases that could benefit from the concepts of Serverless and FaaS.
What I really liked about it was that, in theory, you only pay for what you use, and when you’re not utilizing it, the resources are scaled down to zero until the next time an event triggers them.
While solutions like AWS Lambda or Google Cloud Run are easy to use, behind this abstraction, they run your code in containers. And with that comes an initial delay—the container needs to initialize before your code can actually execute.
Just think about when you want to run a Docker container locally—it always takes around a second or so before your code starts executing.
For most users, this delay is acceptable since it’s still better than keeping a service running 24/7.
But when it comes to latency-critical applications like web services, industrial monitoring, or stock trading, these delays can significantly impact performance or user experience.
A major contributor to these delays is the so-called cold start—the time it takes for a serverless function or container to initialize when it has been inactive for some time or is launching for the first time.
While exploring this issue during one of my research projects, I started thinking: Could eBPF help mitigate some of these cold start overheads?
At a high level, Serverless usage patterns can be categorized into three types:
Stream pattern: A steady flow of events where FaaS resources scale dynamically based on metrics like memory or CPU usage.
Burst pattern: Predictable events, such as from a scheduled cron job, allowing the provider to anticipate demand and pre-initialize resources just in time.
Random pattern: The most common pattern, where invocations occur sporadically, making deterministic pre-warming quite challenging.
💡 FaaS is a subset of Serverless (computing) that runs event-driven functions, while Serverless is a broader concept where infrastructure management, including databases, storage, and APIs, is abstracted from the user.
And while FaaS providers offer different mitigations, such as (AI-based) prediction or advanced container state caching, none of them really eliminates the issue without adding extra cost—something we're trying to avoid in the first place.
One of my colleagues confirmed the impact of cold starts in his thesis (in Slovene), benchmarking different FaaS providers by running a simple function that just stored event data in various databases within the same geographical region.
The best way to interpret the following graphs is to observe how both client-side response time (top row) and backend execution time (FaaS function, bottom row) drop significantly for subsequent requests.
💡 Behind the scenes, Vercel Serverless Functions instantiate AWS Lambda functions.
Vercel also provides Vercel Edge Functions, enabling users to deploy FaaS functions in two distinct settings:
Global: Availability across multiple geographical locations for optimal performance and reduced latency
Regional: Deployment within a specific region to optimize performance for users in that region
Another interesting FaaS solution comes from Cloudflare, which instead of containers uses so-called Workers—lightweight primitives built on V8 isolates. In theory, these isolates require less time to instantiate than containers.
💡 Cloudflare employs a pre-warming technique where, based on the SNI field in the first TLS packet, it begins initializing the corresponding function early.
While the graphs clearly show the cold start effect, it's also important to consider the choice of database, as it significantly impacts response time—but this is beyond the scope of this post.
Additionally, FaaS and database providers are not always deployed in the same region. Often, the FaaS provider is positioned closer to the client at the edge, while the data is stored in a distant region, e.g., due to regulatory or compliance constraints—further increasing delays.
But when you think about FaaS functions running in the cloud, delays aren't just about container initialization—network latency is also a major factor.
It’s just unlikely that your FaaS function is used exclusively by people near the data center where it is hosted.
💡 Latency data is sourced from Global Ping Statistics, while also considering that TCP + TLS requires more round trips compared to a simple ICMP ping request-response.
In other words, when a client sends an event to a cold FaaS function, basic networking principles tell us that before any application traffic can be exchanged, a TCP handshake must first be completed, followed by a TLS handshake.
Seems unavoidable, right?
I thought so too—until I discovered that by leveraging eBPF, we can cut response times by up to 50%.
eBPF Solution
The core idea behind using eBPF to reduce cold start impact revolves around network latency.
Here’s what typically happens:
The client sends a TCP SYN packet.
The server responds with a TCP SYN+ACK packet.
The client completes the handshake with a TCP ACK packet and initiates a TLS handshake.
Only then is the HTTP request sent.
For a FaaS provider, it's the HTTP request that triggers container (FaaS function) initialization. But what if we could start the process much earlier?
That’s exactly what we can achieve with eBPF.
I’ve developed two different solutions to make this possible.
The first solution involves embedding a function ID into the TCP options of the TCP SYN packet—the first packet in the TCP handshake.
When this packet is received by the FaaS provider, an eBPF program extracts the function ID and sends it to a FaaS Gateway in user space.
This process is asynchronous, meaning the FaaS function begins initialization in parallel with the ongoing TCP handshake.
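To make this concrete, here is a minimal, illustrative sketch of what the kernel side could look like. This is not the code from the research work (which I can't share); it assumes an XDP hook, an experimental TCP option kind (253, from the RFC 6994 range) carrying a 4-byte function ID, and a BPF ring buffer for handing events to the user-space FaaS Gateway:

```c
// Illustrative sketch only: parse a hypothetical function-ID TCP option
// from incoming SYN packets and hand it to user space via a ring buffer.
// The option kind, event layout, and map name are assumptions.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define FUNC_ID_OPT_KIND 253 /* experimental option kind (RFC 6994) */

struct event {
    __u32 func_id;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 16);
} events SEC(".maps");

SEC("xdp")
int prewarm_on_syn(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
        return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return XDP_PASS;

    /* Only the initial SYN (not the SYN+ACK) carries the function ID. */
    if (!tcp->syn || tcp->ack)
        return XDP_PASS;

    /* Walk the TCP options; the loop is bounded for the verifier. */
    __u8 *opt = (__u8 *)(tcp + 1);
    for (int i = 0; i < 10; i++) {
        if ((void *)(opt + 2) > data_end)
            break;
        __u8 kind = opt[0];
        if (kind == 0)            /* end of option list */
            break;
        if (kind == 1) {          /* NOP padding */
            opt++;
            continue;
        }
        __u8 len = opt[1];
        if (len < 2)
            break;
        if (kind == FUNC_ID_OPT_KIND && len == 6 &&
            (void *)(opt + 6) <= data_end) {
            struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
            if (e) {
                /* The 4-byte function ID follows the kind/length bytes. */
                __builtin_memcpy(&e->func_id, opt + 2, sizeof(__u32));
                bpf_ringbuf_submit(e, 0);
            }
            break;
        }
        opt += len;
    }

    return XDP_PASS; /* never interfere with the handshake itself */
}

char LICENSE[] SEC("license") = "GPL";
```

The key property is on the last line: the program always returns XDP_PASS, so the kernel's TCP stack completes the handshake normally while the gateway pre-warms the function in parallel.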
💡 For the sake of this proof-of-concept, we won’t include TLS in the discussion, although it is, of course, always present.
The limitation of this solution is that it requires modifications on the client-side, which isn’t always the most practical approach.
After all, you're serving an application, and clients prefer minimal configuration changes—they just want it to work.
So, I tried to develop a version that achieves the same optimization with changes only on the server side.
In this second solution, instead of modifying the TCP SYN packet, we:
Assign Unique Port Numbers: Assign a unique port number to each function (e.g., hello-world on port 8081, some-other-function on port 8082, etc.).
Capture the TCP SYN Packet Using eBPF: The eBPF program is triggered by every TCP SYN packet received and extracts the port number from the TCP header.
Transfer the Port Number to User Space: The port number is then transferred to the FaaS Gateway.
Initialize the Function Based on the Port Number: Using the port number, we know which function to initialize.
As with the first approach, the TCP handshake remains unaffected, since the server-side processing happens in parallel.
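For the user-space side, here is an equally hedged sketch (libbpf, C) of how a FaaS Gateway could consume such events in the port-based variant. The pin path, event layout, and port-to-function mapping are illustrative assumptions, not details from the actual implementation:

```c
// Illustrative gateway sketch: read destination ports pushed by the eBPF
// program from a ring buffer and kick off function initialization early.
#include <stdio.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

struct event {
    unsigned short dst_port; /* assumed converted to host order in-kernel */
};

/* Example port-to-function mapping from the list above. */
static const char *function_for_port(unsigned short port)
{
    switch (port) {
    case 8081: return "hello-world";
    case 8082: return "some-other-function";
    default:   return NULL;
    }
}

static int handle_event(void *ctx, void *data, size_t len)
{
    const struct event *e = data;
    const char *fn = function_for_port(e->dst_port);

    /* A real gateway would asynchronously start (or reuse) the
     * function's container here, racing the TCP/TLS handshakes. */
    if (fn)
        printf("SYN on port %u -> pre-warming %s\n", e->dst_port, fn);
    return 0;
}

int main(void)
{
    /* Assumes the eBPF object pinned its ring buffer map at this path. */
    int map_fd = bpf_obj_get("/sys/fs/bpf/faas_events");
    if (map_fd < 0) {
        perror("bpf_obj_get");
        return 1;
    }

    struct ring_buffer *rb = ring_buffer__new(map_fd, handle_event, NULL, NULL);
    if (!rb)
        return 1;

    while (ring_buffer__poll(rb, 100 /* ms */) >= 0)
        ; /* consume events until interrupted */

    ring_buffer__free(rb);
    return 0;
}
```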
Below is a diagram illustrating the optimization.
What do we achieve with this?
With sufficient network latency, by the time the handshakes are complete, the container running the function is already prepared to handle the client’s requests.
More specifically, this happens when the combined network latency (TCP three-way handshake + potential TLS handshake) is equal to or greater than the container initialization time.
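Written as a rough back-of-the-envelope model (my notation, not from the original write-up):

```latex
% T_hs   : client-visible handshake latency (TCP 3-way + TLS)
% T_init : container initialization time
% T_exec : actual function execution time
\[
T_{\text{cold}} = T_{\text{hs}} + T_{\text{init}} + T_{\text{exec}}
\quad\longrightarrow\quad
T_{\text{eBPF}} = \max(T_{\text{hs}},\, T_{\text{init}}) + T_{\text{exec}}
\]
% The saving is min(T_hs, T_init); once T_hs >= T_init, the cold start
% is fully masked and the function appears always warm.
```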
env is one of the FaaS function names I used for testing, with an initialization time of around 250 ms.
The graph clearly illustrates how increasing network latency mitigates cold start impacts and reduces the client's response time.
Notably, the blue line in the graph has a “breaking point” at 250ms on the X axis—where network latency matches the container initialization time. Beyond this threshold (> 250ms), any additional latency ensures the container behaves as if it were always ready.
Think about it—if this solution were implemented, functions with sufficiently high network latency would appear always ready, yet in the cloud, they would only run when needed and immediately scale to zero afterward.
Due to the nature of research papers and the potential of this topic, I'm unable to share any code details. Still, I've done my best to provide a thorough description of the process.
I expect some comments raising concerns about potential vulnerabilities, such as TCP SYN Flood attacks and other threats.
We've considered this as well.
A hint: this solution pairs well with Single Packet Authentication (SPA).
While a deep dive into this is beyond the scope of this post, I'm happy to share further research findings if there's interest.
Also, keep in mind that this approach is particularly well-suited for cloud providers, where the network latency between the client and server is relatively high.
I hope you find this resource helpful. Keep an eye out for more updates and developments in eBPF in next week's newsletter.
Until then, keep 🐝-ing!
Warm regards, Teodor