virexene's comments | Hacker News

I may have missed something when skimming the paper, but it sounds like Xor filters are constructed offline and can't be modified efficiently afterwards, whereas Bloom filters support efficient insertion. So they don't seem to be an exact replacement.


No, you are correct. I was misremembering cuckoo filters.


I wonder if the reason the escape analysis fails could be that, for small enough types, the concrete value is directly inlined inside the interface value, instead of the latter being "a smart pointer" as the author said. So when the compiler needs to take a reference to the concrete value in `vs.chunkStore`, that ends up as an internal pointer inside the `vs` allocation, requiring it to be on the heap.

Either that or the escape analysis just isn't smart enough; taking a pointer to an internal component of an interface value seems like a bit of a stretch.


I'll have to disagree here.

Leaking resources in Rust is pretty trivial: create an Rc cycle with interior mutability. By contrast, in languages with a GC, cycles will be properly collected.
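
Something like this minimal sketch (the Node type is made up for illustration):

    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        next: RefCell<Option<Rc<Node>>>,
    }

    fn main() {
        let a = Rc::new(Node { next: RefCell::new(None) });
        let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
        // close the cycle: a -> b -> a, so neither strong count can reach zero
        *a.next.borrow_mut() = Some(b.clone());
    } // a and b go out of scope here, but the two nodes are never freed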

And there is nothing preventing GC languages from tying their non-memory resources to the lifetime of a value like Rust does. See Python files, which are closed when the file handle is collected (i.e. when all variables holding it go out of scope), and if that's not explicit enough, there's the "with" keyword to explicitly introduce a scope at the end of which the file is closed.


> Leaking resources in Rust is pretty trivial: create an Rc cycle with interior mutability.

It's trivial to do on purpose, but difficult to do accidentally.


> See Python files, which are closed when the file handle is collected (i.e. when all variables holding it go out of scope)

That's a CPython implementation detail.

https://docs.python.org/3/reference/datamodel.html

> Do not depend on immediate finalization of objects when they become unreachable (so you should always close files explicitly).


In Rust, the default behavior when you use resources like any other value is correct: they are released when they go out of scope. You have to reach for a specific type to leak them.
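
For example (the file name is made up):

    use std::fs::File;
    use std::io::Read;

    fn read_config() -> std::io::Result<String> {
        let mut s = String::new();
        let mut f = File::open("config.toml")?; // hypothetical file
        f.read_to_string(&mut s)?;
        Ok(s)
    } // `f` is dropped here (even on an early return via `?`) and the fd is closed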

In Python, you need to handle resources differently and use a dedicated language feature to avoid leaking them.

That's the difference you're pointing out.


In Zig's case, you explicitly do what Rust/C++ do implicitly and create a table of function pointers yourself.
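
Roughly this shape, spelled out by hand (sketched in Rust since that's the syntax I'll get right; the names are made up):

    // the explicit "vtable": one function pointer per operation
    struct WriterVTable {
        write: fn(ctx: *mut (), buf: &[u8]) -> usize,
        flush: fn(ctx: *mut ()),
    }

    // the hand-rolled equivalent of a fat pointer / trait object:
    // an opaque context pointer plus the table of function pointers
    struct AnyWriter {
        ctx: *mut (),
        vtable: &'static WriterVTable,
    }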


I think "packaging" here refers to the process of putting the silicon die in its plastic casing and connecting the die's pad to the case's pins, see https://en.wikipedia.org/wiki/Integrated_circuit_packaging


The author claims that mmap is exposed by glibc whereas memfd_create is not, so they reimplemented it with syscall, but looking at the man page, did they just forget to #define _GNU_SOURCE?


When I originally wrote this, it indeed was not.


Ah, you're right, I didn't see that it's a more recent addition.


in what way is unicode similar to html, docx, or a file format? the only features I can think of that are even remotely similar to what you're describing are emoji modifiers.

and no, this webpage is not the result of "carefully cutting out the complicated stuff from Unicode". i'm pretty sure it's just the result of not supporting Unicode in any meaningful way.


If you want that projection to still be expressible as a matrix transformation, I don't think that's possible.


indeed, Rust can be fast, if you know what you're doing. this means, among other things, not making unnecessary allocations, but in my opinion, it also means... using byte-based indices instead of insisting on char-based indices, i.e. Version 0, which OP so quickly dismissed.

using char-based indices means you have to convert back to byte-based indices, a linear time operation, every time you want to do much of anything with them. this is a silly performance loss, since you probably got those char-based indices by iterating over the string in the first place: you're doing redundant work, which could be avoided by using char_indices() in your initial iteration and keeping those byte-based indices for later manipulation. this is why that iterator exists, really.
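
e.g. something like this (just a sketch):

    fn main() {
        let s = "naïve café";
        // one pass over the string gives us each char *and* its byte offset
        let accented: Vec<usize> = s.char_indices()
            .filter(|&(_, c)| !c.is_ascii())
            .map(|(i, _)| i)
            .collect();
        // later, slicing with those byte offsets is O(1), no re-counting from the start
        for &i in &accented {
            println!("rest of the string from that char: {}", &s[i..]);
        }
    }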

you might ask: "but then if I do +1 to get the index of the next character, it might fall in the middle of a multibyte character and the substringing will panic!" yes, you will need to use char::len_utf8 or char_indices to offset your indices (forwards or backwards: CharIndices is a DoubleEndedIterator!). but this is less work than adding one... and then recounting characters from the beginning.

and importantly, +1 isn't even really appropriate to do with char-based indices either. there are many "characters" in the user-perceived sense of the word that are made up of multiple chars, and while cutting in the middle of one won't panic, it also won't give the user-friendly cutting you're expecting. just try your code with a flag emoji and see what happens: you'll split it into two weird "residual" characters.

if you care about that in your specific application, the solution is to iterate on an even bigger unit than chars: (extended) grapheme clusters, or EGCs. and because EGC segmentation is quite a bit more demanding than simple char-based iteration, using EGC-based numerical indices (which I believe Swift does by default?) is an even bigger waste of CPU time. in my opinion, you need to fully let go of the assumption that characters can be given consecutive numerical indices in a performant way, and once again, use byte-based indices along with the appropriate EGC-aware methods for acquiring and offsetting them.
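
for example, with the unicode-segmentation crate (a sketch; assumes that crate as a dependency):

    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // the flag is one user-perceived character but two chars (8 bytes)
        let s = "a🇫🇷b";
        // grapheme_indices yields byte offsets that are safe places to cut for display
        for (i, g) in s.grapheme_indices(true) {
            println!("{}: {}", i, g);
        }
    }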


given that unix time specifically does not handle leap seconds well, i'm not sure i would say they "had it right" all along.

