I explained why this doesn't work -- what do you do when you combine two heaps with different internal orderings?
You need some way to enforce that the orderings are the _same_. A canonical ordering per type is how Haskell does it; dependent types are a more complicated alternative method.
I think GP is suggesting... well, dependent types, but in a restricted enough way that I don't believe it adds additional complexity over DataKinds. At least, I read
> Instead of MaxHeap(T), make it MaxHeap(T, Ord:(T,T)->bool)
as saying the kind of MaxHeap should be, in Coq syntax, (forall (T : Set), (T -> T -> bool) -> Set). I think this doesn't add any additional complexity vs DataKinds since the function doesn't need to be evaluable at typechecking time, just unified against at construction time.
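For the Haskell angle, here's a minimal sketch (a toy pairing heap; Heap, insert, and merge are just illustrative names, not a real library) of the canonical-ordering-per-type approach: merge only type-checks when both heaps share an element type, and that type's single Ord instance fixes the ordering, so a reversed ordering has to live behind a different type such as Down from Data.Ord:

    -- Toy sketch: the ordering comes from the element type's single Ord
    -- instance, so two heaps can only be merged if they agree on the element
    -- type, and therefore on the ordering.
    import Data.Ord (Down (..))

    data Heap a = Empty | Node a [Heap a]

    insert :: Ord a => a -> Heap a -> Heap a
    insert x = merge (Node x [])

    -- Pairing-heap style merge for a min-heap: smaller root wins.
    merge :: Ord a => Heap a -> Heap a -> Heap a
    merge Empty h = h
    merge h Empty = h
    merge h1@(Node x hs1) h2@(Node y hs2)
      | x <= y    = Node x (h2 : hs1)
      | otherwise = Node y (h1 : hs2)

    findMin :: Heap a -> Maybe a
    findMin Empty      = Nothing
    findMin (Node x _) = Just x

    -- A min-heap of Int and a "max-heap" of Int are different types:
    minHeap :: Heap Int
    minHeap = insert 3 (insert 1 Empty)

    maxHeap :: Heap (Down Int)
    maxHeap = insert (Down 3) (insert (Down 1) Empty)

    -- broken = merge minHeap maxHeap   -- rejected by the type checker

    main :: IO ()
    main = print (findMin (merge minHeap (insert 0 Empty)))   -- Just 0

The MaxHeap(T, Ord) formulation above pushes the same constraint into an explicit type index instead of relying on the one canonical instance, which is where the DataKinds/dependent-types comparison comes in.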
What a ridiculous assertion! From everything I have personally seen (online discussions, presentations, and in-person encounters) about the core team, I would say each of them is in the top percentile of "knowing Rust well".
Yes, some of them may not have much open-source Rust code out there (as they use Rust for commercial/proprietary products), and what they do have may not be the kind of "ingenious" solutions that other individual contributors in the Rust ecosystem have created, but they all write code, and they all know Rust very well.
I wonder how they manage to keep the FP64 units busy. This seems to be an HPC product, but many HPC apps are memory bound, so to improve FP64 perf by 4x one might need to improve DRAM bandwidth by 8-16x. Otherwise the units would only be stalled waiting for memory.
But it seems they did not improve bandwidth by much?
I don't know anything about the details here, but with usual linear algebra stuff, bandwidth depends on the size of the kernel that fits into the local memory inside whatever IC you use for your floating-point computation.
E.g., matrix multiplication of n×n square matrices has a computational cost of n³ but a bandwidth cost of n². Usually a big m×m matrix is split into many blocks of n×n matrices (with m = k×n). If an n×n matrix fits into the local store of your CPU (cache or registers), then the bandwidth cost for the m×m matrix product is k³×n² = m³/n, so the bigger the block size n that you can process inside the CPU, the less bandwidth you need.
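A rough back-of-the-envelope version of that arithmetic (an idealized count of elements moved, not a benchmark of any real hardware; m = 4096 is just a hypothetical problem size):

    -- Idealized blocked-matmul traffic model: an n x n block product costs
    -- ~n^3 flops but only ~n^2 elements of memory traffic, so an m x m
    -- product split into (m/n)^3 block products moves ~m^3 / n elements.
    blockFlops, blockTraffic :: Double -> Double
    blockFlops n   = n ** 3
    blockTraffic n = n ** 2

    totalTraffic :: Double -> Double -> Double
    totalTraffic m n = (m / n) ** 3 * blockTraffic n   -- k^3 * n^2 = m^3 / n

    main :: IO ()
    main = mapM_ report [32, 128, 512]
      where
        m = 4096   -- hypothetical problem size
        report n = putStrLn $ "block " ++ show n
                   ++ ": traffic ~ " ++ show (totalTraffic m n)
                   ++ " elements, flops per element ~ "
                   ++ show (blockFlops n / blockTraffic n)

Larger blocks make the flops-per-element ratio grow linearly with n, which is exactly the "bigger block size, less bandwidth" point above.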
I disagree. The website you linked to shows speed-ups on MI250X between 1.6x and 3x higher than A100. The theoretical memory bandwidth speed-up between MI250X and A100 is only 1.6X (3.2 TB/s vs 2.0 TB/s). Thus, I'd say they are seeing the advantage of higher FP64 compute in those applications.
> Seems this is an HPC product, but many HPC apps are memory bound.
The point of a supercomputer is to throw so much compute at a problem that everything else is the bottleneck.
If an HPC app is memory bound, then the GPU / supercomputer was successful at its job. So many HPC apps are memory bound because... well... it turns out our machines are actually quite good at compute.
In any case, the MI200 has 1.6x the bandwidth of the A100. So if you have a massively parallel use case that is memory bound, the MI200 line should have an advantage.
-------
The main issue, IMO, is that the MI200's 1.6x bandwidth is really 80% of the A100's bandwidth per die, spread over two dies connected with an incredible amount of "infinity fabric" links to share the data. I have to imagine that the A100's larger monolithic design wins in some cases over the MI200's chiplet design.
I agree with you, which is why I don't really understand what the point of improving FP64 perf by 4x is, if that is not the bottleneck for many apps.
Per node, a 4x MI250X node has more or less the same BW as a DGX-A100 (8x A100).
It has 2x more FP64 compute, but for most science and engineering apps, which are memory bound, 2x more FP64 compute does not make these apps any faster.
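For concreteness, plugging in only the peak per-GPU figures quoted earlier in the thread (3.2 TB/s per MI250X, 2.0 TB/s per A100; theoretical numbers, not measurements):

    -- Node-level bandwidth comparison using only the peak figures quoted above.
    main :: IO ()
    main = do
      let mi250xNode = 4 * 3.2 :: Double   -- 4x MI250X node, TB/s
          dgxA100    = 8 * 2.0 :: Double   -- DGX-A100 (8x A100), TB/s
      putStrLn $ "4x MI250X node: " ++ show mi250xNode ++ " TB/s"
      putStrLn $ "DGX-A100:       " ++ show dgxA100    ++ " TB/s"
      putStrLn $ "ratio:          " ++ show (mi250xNode / dgxA100)   -- ~0.8

So "more or less the same BW" here means roughly 80% on paper, which also matches the 80%-per-die remark above.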
Ideally it would be a constitutional amendment stating that privacy is a right of every citizen, and a human right.
Legislation on the right to encryption would then just follow from it, but having it in the constitution would make it harder for future governments to pass certain kinds of legislation, or, if they pass them, the supreme court would declare them unconstitutional.
That is a good example of a misdirection play: a polarizing discussion point that political opponents get hung up on, which you never intended to actually pass and will happily drop later, but which helped relieve political pressure on other potential hot topics. We see typical conservative tactics being employed against the conservative base :)
The problem is defining "privacy" though, isn't it? What, specifically, would the amendment require or prohibit under the banner of "privacy"? How could this definition, if enshrined into nearly immutable law, be modified to account for a changing technological environment? How do you prevent such an amendment from suffering from the fate of the US second amendment, which in many places is honored mostly in the breach?
This should be the goal. In the days of the internet, mass surveillance, and nearly eradicated privacy, we need a right to have and use strong encryption without any backdoors.
You have widening operations, e.g. 16x16->32-bit multiplications, and you can reduce the number of available registers to get longer vectors, but among the really interesting ones are fault-only-first loads and masked instructions that enable the vector unit to work on things like null-terminated strings. The specification includes vectorized strlen/strcmp/strcpy/strncpy implementations as examples. Most existing (packed) SIMD instruction sets aren't useful for these common functions.
Now compare how many different versions of the functions are required for the dozens of possible x86 extensions (and combinations of them), all the prologue/epilogue code required to watch out for page boundaries and unaligned pointers, and the length of the inner loop that handles all the packing/unpacking, cobbles together horizontal operations into the required masks, and then somehow uses them for flow control where needed. It's enough code to put painful pressure on the instruction cache, and it requires wide OoO superscalar CPU cores to be worth the overhead. Compare the code in the RISC-V vector spec with this strcmp https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/m... and tell me it's a clean and straightforward implementation using the instruction set as intended and not an ugly hack around its limitations.
I'm not going to dispute that x86's approach leads to a lot of duplication for each vector size, but your statement was that the fixed-size vector approach isn't "useful for these common functions," which implies to me that it couldn't be used at all.
The CPU executes the two (or more) dependent instructions "as if" they were one, e.g., in 1 cycle.
The CPU has a frontend, which has a decoder, which is the part that "reads" the program instructions. When it "sees" a certain pattern, like "instruction x writing register r followed by instruction y consuming r", it can treat it "as if" it was a single instruction, provided the CPU has hardware for executing that single instruction (even if the ISA doesn't have a name for that instruction).
This lets the people building the CPU choose whether this is something they want to add hardware for. If they don't, the sequence runs in e.g. 2 cycles; if they do, it runs in 1. A server CPU might want to pay the cost of running it in 1 cycle, but a microcontroller CPU might not.
Do RISC-V specs document which instruction combinations they recommend be fused? Sounds like the fused instructions are an implementation detail that must be well-documented for compiler writers to know to emit the magic instruction combinations.
For this specific case, yes, the RISC-V ISA document recommends instruction sequences for checking for overflow -- sequences that are both amenable to fusion and relatively high-performing on implementations that don't fuse.
It generally goes the other way around -- programmers and compilers settle on a few idiomatic ways to do something, and new cores are built to execute those quickly. Because RISC-V is RISC, it seems likely that those few ways would be less idiomatic and more 'the only real way to do x', which would aid in the applicability of the fusions.
Excess mortality in the USA and the EU in 2020 was ~470k and ~580k deaths, respectively.
The population of the USA and the EU in Jan 2020 was ~329 million and ~447 million, respectively.
That works out to ~143 vs ~130 excess deaths per 100,000 inhabitants.
The USA had ~10% more excess deaths per capita than the EU.
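Checking the arithmetic with the figures as quoted above (a quick sanity check, not new data):

    -- Per-capita excess mortality from the numbers quoted above.
    main :: IO ()
    main = do
      let per100k deaths pop = deaths / pop * 100000
          usa = per100k 470e3 329e6   -- ~142.9
          eu  = per100k 580e3 447e6   -- ~129.8
      putStrLn $ "USA: " ++ show usa ++ " excess deaths per 100,000"
      putStrLn $ "EU:  " ++ show eu  ++ " excess deaths per 100,000"
      putStrLn $ "USA/EU: " ++ show (usa / eu)   -- ~1.10, i.e. ~10% higher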
People living in the EU during 2020 had a statistically significantly better chance of not dying of COVID than people living in the USA, even though COVID hit the EU first, which gave the USA more time to prepare.
This doesn't really answer your question, because the answer is very personal. Some people were really scared and preferred to trade some freedom for more safety. And, well, we have many examples of vocal, famous people who traded safety for freedom and died of COVID. These people would probably have been better off had they lived in the EU, even if they would have been breaking the law and paying fines.
There was no pan-European response, nor was there a pan-American response, so these groupings are arbitrary. Unless you have an objective "score" showing that European lockdowns and policies on average went further than American lockdown policies, these numbers are meaningless.
You could easily justify the differences as deriving from Europe having an objectively healthier population and better healthcare systems than the US.
Excess mortality is the only fact that we know for sure given that every country counts "COVID deaths" differently.
Given that there is a statistical difference, the only thing we can probably know for sure is that an individual's chances were slightly better in the EU.
The difference is small, like others have mentioned, so it doesn't seem like EU policies did a lot for the whole EU.
This makes sense: for example, Germany had pretty harsh policies, but because some of its neighboring countries had pretty lax policies (e.g. Austria), some parts of Germany (e.g. Bavaria) were severely affected.
This hints that it doesn't really matter whether individual states have harsher policies as long as neighboring states do not, at least for states the size of Germany.
If we look at China, which had a very cohesive policy in all its provinces, the story differs.
This hints that if cohesive policies had been adopted at EU and USA scale, the outcome might have been different.
That didn't happen, so we will never know for sure.
This doesn't answer the OP question, but if someone claims to have an answer, they are probably lying, because we don't really have facts to back that answer up.
The EU didn't act as one entity in its epidemic/lockdown/furlough policies. In particular, Eastern European countries went with US-like policies (and had less vaccine provision and even more anti-vax sentiment) and suffered more than the US.
The Nordics, France and German-speaking countries have half the death rate of the US or less, in some cases even negative excess deaths.
Well, I'd say the biggest difference was that in the EU, many countries would pay you 80% of your salary to stay home for many months.
In the US, you'd get a few thousand dollars, many months later. So yeah, you might have had a similar chance of catching COVID, but your livelihood and finances were much more secure in Europe.
Policy-wise w/r/t COVID, the EU and US are nowhere near NZ, Taiwan, or China. Now that's containment taken seriously, and it shows.
Many in the US made significantly more from the combination of enhanced unemployment and stimulus checks than they had made working -- greater than 100% of their previous wage.
They want to study how websites handle GDPR and CCPA, and that's probably what they submitted.
The IRB reasoned that "websites are not people", which is true, but failed to consider that "websites are operated by people", and that certain measures should therefore be taken.
The IRB bears some responsibility for this.