More

erebe__ · on Jan 19, 2025

Hello,

You can see my other comment https://news.ycombinator.com/item?id=42708904#42756072 for more details.

But yes, the cache does persist after the first call, the resolved symbols stay in the cache to speed up the resolution of next calls.

Regarding the why, it is mainly because

1. this app is a gRPC server and contains a lot of generated code (you can investigate binary bloat with rust with https://github.com/RazrFalcon/cargo-bloat)

2. and that we ship our binary with debug symbols, with those options ``` ENV RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes" ```

For the panic, indeed, I had the same question on Reddit. For this particular service, we don't expect panics at all, it is just that by default we ship all our rust binaries with backtrace enabled. And we have added an extra api endpoint to trigger a catched panic on purpose for other apps to be sure our sizing is correct.

erebe__ · on Jan 19, 2025

Sorry if the article is misleading.

The first increase of the memory limit was not 4G, but something roughly around 300Mb/400Mb, and the OOM did happen again with this setting.

Thus leading to a 2nd increase to 4Gi to be sure the app would not get OOM killed when the behavior get triggered. We needed the app to be alive/running for us to investigate the memory profiling.

Regarding the increase of 400MiB, yeah it is a lot, and it was a surprise to us too. We were not expecting such increase. There are, I think 2 reasons behind this.

1. This service is a grpc server, which has a lot of code generated, so lots of symbols

2. we compile the binary with debug symbols and a flag to compress the debug symbols sections to avoid having huge binary. Which may part be of this issue.

the8472 · on Jan 19, 2025

> 2. we compile the binary with debug symbols

symbols are usually included even with debuglevel 0, unless stripped[0]. And debuginfo is configurable at several levels[1]. If you've set it to 2/full try dropping to a lower level, that might also result in less data to load for the backtrace implementation.

[0] https://users.rust-lang.org/t/difference-between-strip-symbo... [1] https://doc.rust-lang.org/cargo/reference/profiles.html#debu...

erebe__ · on Jan 19, 2025

Thanks, was not aware there was granularity for debuginfo ;)

delusional · on Jan 19, 2025

> Sorry if the article is misleading.

I don't think the article is misleading, but I do think it's a shame that all the interesting info is saved for this hackernews comment. I think it would make for a more exciting article if you included more of the analysis along with the facts. Remember, as readers we don't know anything about your constraints/system.

erebe__ · on Jan 19, 2025

It was a parti pris by me, I wanted the article to stay focus on the how, not much on the why. But I agree, even while the context is specific to us, many people wanted more interest of the surrounding, and why it happened. I wanted to explain the method ¯\_(ツ)_/¯

CodesInChaos · on Jan 19, 2025

> we compile the binary with debug symbols and a flag to compress the debug symbols sections to avoid having huge binary.

How big are the uncompressed debug symbols? I'd expected processing uncompressed debug symbols to happen via a memory mapped file, while compressed debug symbols probably need to be extracted to anonymous memory.

https://github.com/llvm/llvm-project/issues/63290

erebe__ · on Jan 19, 2025

Normal build

  cargo build --bin engine-gateway --release
      Finished `release` profile [optimized + debuginfo] target(s) in 1m 00s

  ls -lh target/release/engine-gateway 
  .rwxr-xr-x erebe erebe 198 MB Sun Jan 19 12:37:35 2025    target/release/engine-gateway

what we ship

  export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes" 
  cargo build --bin engine-gateway --release
      Finished `release` profile [optimized + debuginfo] target(s) in 1m 04s

  ls -lh target/release/engine-gateway
  .rwxr-xr-x erebe erebe 61 MB Sun Jan 19 12:39:13 2025  target/release/engine-gateway

The diff is more impressive on some bigger projects

adastra22 · on Jan 19, 2025

The compressed symbols sounds like the likely culprit. Do you really need a small executable? The uncompressed symbols need to be loaded into RAM anyway, and if it is delayed until it is needed then you will have to allocate memory to uncompress them.

erebe__ · on Jan 19, 2025

I will give it a shot next week to try out ;P

For this particular service, the size does not matter really. For others, it makes more diff (several hundred of Mb) and as we deploy on customers infra, we want images' size to stay reasonable. For now, we apply the same build rules for all our services to stay consistent.

adastra22 · on Jan 19, 2025

Maybe I'm not communicating well. Or maybe I don't understand how the debug symbol compression works at runtime. But my point is that I don't think you are getting the tradeoff you think you are getting. The smaller executable may end up using more RAM. Usually at the deployment stage, that's what matters.

Smaller executables are more for things like reducing distribution sizes, or reducing process launch latency when disk throughput is the issue. When you invoke compression, you are explicitly trading off runtime performance in order to get the benefit of smaller on-disk or network transmission size. For a hosted service, that's usually not a good tradeoff.

erebe__ · on Jan 19, 2025

It is most likely me reading too quickly. I was caught off guard by the article gaining traction in a Sunday, and as I have other duties during the weekend, I am reading/responding only when I can sneak in.

For your comment, I think you are right regarding compression of debug symbols that add up to the peak memory, but I think you are misleading when you think the debug symbols are uncompressed when the app/binary is started/loaded. Decompression only happens for me when this section is accessed by debugger or equivalent. It is not the same thing as when the binary is fully compressed, like with upx for example.

I have done a quick sanity check on my desktop, I got.

  [profile.release]
  lto = "thin"
  debug = true
  strip = false

  export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
  cargo build --bin engine-gateway --release

From rss memory at startup I get ~128 MB, and after the panic at peak I get ~627 MB.

When compiled with those flags

  export RUSTFLAGS="-C force-frame-pointers=yes" 
  cargo build --bin engine-gateway --release

From rss memory at startup I get ~128 MB, and after the panic at peak I get ~474 MB.

So the peak is taller indeed when the debug section is compressed, but the binary in memory when started is roughly equivalent. (virtual mem too)

I had some hard time getting a source that may validate my belief regarding when the debug symbol are uncompressed. But based on https://inbox.sourceware.org/binutils/20080622061003.D279F3F... and the help of claude.ai, I would say it is only when those sections are accessed.

for what is worth, the whole answer of claude.ai

  The debug sections compressed with --compress-debug-sections=zlib are decompressed:

  At runtime by the debugger (like GDB) when it needs to access the debug information:

  When setting breakpoints
  When doing backtraces
  When inspecting variables
  During symbol resolution


  When tools need to read debug info:

  During coredump analysis
  When using tools like addr2line
  During source-level debugging
  When using readelf with the -w option


  The compression is transparent to these tools - they automatically handle the decompression when needed.    The sections remain compressed on disk, and are only decompressed in memory when required.
  This helps reduce the binary size on disk while still maintaining full debugging capabilities, with only a small runtime performance cost when the debug info needs to be accessed.
  The decompression is handled by the libelf/DWARF libraries that these tools use to parse the ELF files.

adastra22 · on Jan 19, 2025

Thanks for running these checks. I’m learning from this too!

the8472 · on Jan 19, 2025

Container images use compression too, so having the debug section uncompressed shouldn't actually make the images any bigger.

xgb84j · on Jan 19, 2025

Thank you for this in-depth reply! Your answer makes a lot of sense. Also thank you for writing the article!

erebe__ · on July 6, 2024

Shameless plug, you can use wstunnel which disguise your traffic as websocket to tunnel any traffic you want. I had most success with it, as it uses TCP, than with QUIC/HTTP3 as usually UDP is more heavily restricted. It works behind GFW and let you use your wireguard for example... I had also good feedback from people in Turkey and Iran

wstunnel: https://github.com/erebe/wstunnel

ChocolateGod · on July 6, 2024

Isn't UDP over TCP a generally bad idea?

g1sm · on July 6, 2024

As far as I know, TCP over TCP is less than ideal. I don’t see why UDP over TCP (or the other way around) would be bad.

fragmede · on July 6, 2024

Packet size is dependent on the MTU, so practically speaking you're trying to put something 1460 bytes into a 1460 byte container and the only way for that to fit is the split the packet or tell the packet generator to make smaller packets. both of which are reasonable options but they're not the most efficient, leading to slower connections when tunneling one inside the other. It's less of a deal theses days, but that's the why of it.

g1sm · on July 6, 2024

Well, sure, but then any kind of encapsulation is less than ideal. However, here the context is VPNs and this means there’s always going to be some sort of encapsulation. And if the choices are between encapsulating something in TCP and encapsulating something in UDP, the latter should always be chosen.

erebe__ · on July 7, 2024

I use the word tunnel but it would be more correct to use "proxy"

There is no wrappring of udp packet into another layer of TCP. Wstunnel unpack the data at the client forward it using tcp/websocket, and after re-take this data to put it back into its original form (i.e: udp)

so there is no encapsulation of many protocol.

The only place where there is encapsulation, is for tls. if your client use tls to connect to wstunnel server. And that your data is already encrypted with tls (i.e: https) there will be 2 tls encryption

erebe__ · on May 23, 2024

WsTunnel improved a lot since last shared/communicated version, with the addition of:

- Reverse tunneling \o/ - Added support of unix socket, transparent proxy/ip, UDP for socks5 - Support of mTLS - Config file to restrict the allowed tunnels - Cerfificate and config auto-reload - Added http2 as transport protocol (websocket is more performant) - Complete rewrite fron Haskell to Rust

Hope you enjoy and find it useful :-)

https://github.com/erebe/wstunnel?tab=readme-ov-file#underst...

erebe__ · on April 8, 2024

Nice project :)

HTTP3 may not be suitable for all environments, as UDP is pretty commonly filtered.

If you are in such scenario, you may want to take a look at wstunnel, it allows you to do the same (and more) over websocket or HTTP2.

https://github.com/erebe/wstunnel

ongy · on April 9, 2024

I just saw, judging by Nick you are probably the author :)

Do you have the protocol defined somewhere? Wstunnel is one of our options, and we'll likely add a golang library for the solution we chose. Would be easier if I don't have to figure out the frame format from code.

ongy · on April 9, 2024

We recently ran into issues with http2. Specifically the lack of support in zscaler.

We are still looking into something like wstunnel and websockets, though I'm preparing myself for the day when we have to add "normal" http1.1 support :(

erebe__ · on Jan 20, 2024

Shameless plug, there is also wstunnel (i am its author) https://github.com/erebe/wstunnel/, hope you enjoy.

erebe__ · on Jan 20, 2024

You can use wstunnel to bypass firewall. I had many feedbacks from chinese/turkish/iranian people using it with success. Easy to setup also with static binaries.

https://github.com/erebe/wstunnel/

erebe__ · on Dec 13, 2023

The article share some common pitfalls people stumble upon when starting using Async in Rust.

If you know some others that are not listed, let's share it in comment :)

P.s: Other tuto article, if you are starting with Async https://www.qovery.com/blog/a-guided-tour-of-streams-in-rust

erebe__ · on Oct 6, 2023

Hello, Just sharing an article that explains how to build your own cloud network with Linux (ntftables) and wireguard. IPv6 is a first class citizen in the article, hope you enjoy :)

ttyprintk · on Oct 7, 2023

Good job—-this is very clearly written, and must have taken quite a bit of time.

erebe__ · on Oct 5, 2023

This article explains how to build your own cloud network, how to build your own place on the Internet, carve your little network cave reachable from others over the Internet!