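as i understand it, having multiple threads read a single shared copy of some data is faster than having each thread read its own copy of the same data, as long as the readers don't have to lock it or increment its reference count. it's faster because one copy takes up less cache than many copies, which reduces cache pressure. the caveat matters because taking a lock or bumping a refcount is itself a write to shared memory, so the cache line bounces between cores even though the readers only wanted to read. rcu (read-copy-update) is a widely used, high-performance way to let many threads read a single piece of data without locking it, even if it might change: a writer copies the data, modifies the copy, and atomically publishes a pointer to the new version, while readers that are already in progress keep using the old one. often you can even reclaim the old versions (a form of garbage collection) without readers ever having to take read locks on the things being collected.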
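here's a rough sketch of the read/copy/publish pattern in go, in case it helps. this is not kernel rcu: there is no explicit grace period, and old versions are simply left for go's garbage collector to reclaim once no reader holds them. `Config`, `current`, `read`, and `update` are names i made up for the example, and it assumes go 1.19 or later for `atomic.Pointer`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical piece of shared, read-mostly data.
type Config struct {
	Version int
	Name    string
}

// current points at the latest snapshot. A snapshot is never
// modified after it has been published.
var current atomic.Pointer[Config]

// read returns the latest snapshot with a single atomic load:
// no lock is taken and no reference count is touched.
func read() *Config {
	return current.Load()
}

// update copies the latest snapshot, changes the copy, and publishes
// it with compare-and-swap, retrying if another writer got in first.
func update(mutate func(*Config)) {
	for {
		old := current.Load()
		next := *old // copy; readers may still be using old
		mutate(&next)
		if current.CompareAndSwap(old, &next) {
			return // old is reclaimed by the GC once unreferenced
		}
	}
}

func main() {
	current.Store(&Config{Version: 1, Name: "initial"})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c := read() // lock-free read of a consistent snapshot
				_ = c.Version
			}
		}()
	}

	for i := 0; i < 10; i++ {
		update(func(c *Config) { c.Version++ })
	}
	wg.Wait()

	fmt.Println("final version:", read().Version) // prints: final version: 11
}
```

the point of the design is that readers only ever load a pointer, so the shared cache line holding the data is never written by a reader. the cost moves to the writer, which has to allocate and copy on every update, so this fits data that is read far more often than it changes.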