Hacker News | SpikeGronim's comments

ChaCha20 is implemented in hardware on many mobile platforms. It's often a preferred TLS cipher on Android. AES is common in hardware as well.


What mobile platforms implement ChaCha20? Can you point to any? I'm not aware of any widely available handset that claims to do this.

In fact, the whole reason ChaCha20/Poly1305 was added to the TLS profile in the first place is that Google originally added it to their own OpenSSL fork, BoringSSL, as well as to Android, and it was later proposed for inclusion in the standard. Google wanted a cipher that performed better in software than AES did, because the vast majority of mobile platforms and handsets do not support AES acceleration either. (ARMv8 does introduce cryptographic extensions for AES and the SHA family, but 99% of handsets aren't ARMv8. I'm also not sure whether ARMv8 has a PCLMUL equivalent for fast GCM computation, which is also a critical component of that scheme.)

That costs energy and battery life, because AES is very difficult to implement efficiently and securely in software, and even the fastest, most secure implementations are relatively slow. In contrast, ChaCha20 is incredibly simple to implement securely in software; even an efficient version is well within the grasp of mortals (I've managed to do it myself).
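To make the "simple in software" point concrete, here is the ChaCha quarter-round in Python. This is a sketch of the core operation only, not a complete or vetted cipher: it is nothing but 32-bit adds, XORs, and fixed-distance rotations, with no data-dependent branches or table lookups.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    # Rotate a 32-bit word left by a fixed distance n.
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    # The ChaCha quarter-round: four add/XOR/rotate steps on
    # 32-bit words, using only constant-time operations.
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d
```

The quarter-round test vector in RFC 7539 (section 2.1.1) is a handy first check for an implementation like this.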

That's why your Android phone uses ChaCha20 - not because it has hardware acceleration, but because it's fast in spite of not having it.

I'd be interested to know if any actual hardware implements this in the wild. Generally, a combination of AES-256 with GCM for systems with hardware acceleration, coupled with ChaCha20/Poly1305 as a software fallback, seems to be the way people are going. And ChaCha20/Poly1305, with enough effort, can get very close to rivaling AES performance in hardware on a contemporary x86 machine (ignoring actual ASICs and endpoint devices with hardware offload). Against non-hardware AES implementations, ChaCha should absolutely crush it in terms of performance.


It's actually the opposite AFAIK. One of the selection criteria for the Advanced Encryption Standard (AES) was cheap hardware implementations, and it's one reason why Rijndael was chosen over some of the stronger ciphers.

DJB has criticized the selection criteria for both AES and SHA3 as being too focused on hardware efficiency. In his opinion it was much more important for software implementations to be simple and efficient. His algorithms tend to be elegant in software but complex in hardware, pretty much guaranteeing his candidates would never be chosen.

I'm not an EE so feel free to correct me, but I closely followed the standards process both times and that's my recollection of things.


> It's actually the opposite AFAIK. One of the selection criteria for the Advanced Encryption Standard (AES) was cheap hardware implementations, and it's one reason why Rijndael was chosen over some of the stronger ciphers.

Oh, I was aware of that bit (vaguely; to be fair, I was a child during the AES competition, so I only remember a small part of the history). I just meant that AES is a bit slow in software relative to ChaCha today, which I could have clarified.

EDIT: I think I realized now what you meant. When I said ChaCha20/Poly1305 could, with effort, rival AES-256 in hardware in the last paragraph of my post, what I meant was: a software version of ChaCha20 can get very close to a hardware version of AES, providing you put in a lot of effort.

I can see how that sentence is easy to mis-parse, sorry about that.

> DJB has criticized the selection criteria for both AES and SHA3 as being too focused on hardware efficiency. In his opinion it was much more important for software implementations to be simple and efficient. His algorithms tend to be elegant in software but complex in hardware, pretty much guaranteeing his candidates would never be chosen.

Yes, this is the basic impression I've gotten as well from all his work. To be fair, software implementations are much more agile and easier to deploy, so I think putting some focus on this is a good thing.

I am also not an EE, but I've heard similar things before (e.g. that ChaCha/Poly would be much more expensive in hardware compared to AES, which is truly a con, not a pro). I'd be interested if any actual EEs would chime in here.

But yes, given all that, I think AES-GCM + ChaCha/Poly1305 is a good pair that should cover most of your bases for an AEAD, for fast hardware and software implementations.


Not sure about ChaCha, but I implemented Salsa20 on a microcontroller. It looked to me like you could generate a mechanical proof that it's 'secure', i.e., doesn't have a hole in the design, and that the microcontroller isn't going to expose you to an oddball timing attack. The adds, XORs, and rotations ought to be single-cycle, and the code paths never change based on any of the results.


Is ChaCha20 actually implemented in hardware on any platforms? I was under the impression that the algorithm itself is just really really fast in software (especially so with SIMD).

I implemented ChaCha20 in AArch64 assembly, and it was possible to encrypt/decrypt 6 blocks at once.


The Cryptech project uses ChaCha as the CSPRNG in our TRNG. We decided on ChaCha because of its performance and good security margin. I know of at least one more project that uses our ChaCha core.

https://cryptech.is/

ChaCha can be efficiently implemented in HW, esp. in FPGAs that support carry chains, which basically means most FPGAs.

It is somewhat hard to compare size and speed since both ChaCha and AES are so scalable. In ChaCha there are many places where you can trade operator reuse for performance. But the fundamental operator size is 64-bits.

AES, in comparison, works on bytes, and you can go from a single S-box (implemented as a table, as logic, as part of a T-box, etc.) that is reused in the datapath as well as in key expansion, all the way to a fully pipelined (10-14 rounds) humongous implementation. Very flexible and easy to adapt to system requirements. One additional thing to note with AES is that for many cipher modes, the decryption functionality can be removed.

But with all this said: if I compare my implementation of AES (which includes decryption) with my implementation of ChaCha20, I get about 4x better performance with ChaCha using roughly the same number of resources.

https://github.com/secworks/chacha https://github.com/secworks/aes

The ChaCha core requires more registers, esp. for the API. This is due to the bigger block size (512 vs 128 bits).

I like ChaCha in HW and think it's a good choice. I'm currently working on a ChaCha20-Poly1305 core compatible with RFC 7539 to make it easier for HW projects to use good AEAD ciphers.

https://tools.ietf.org/html/rfc7539


Thanks for the perspective. One small correction/clarification: ChaCha operates on 32-bit words, not 64-bit ones, which makes it nice for 32-bit-only systems in software. I really wish ChaCha20/Poly1305 were included in the benchmarks for the CAESAR AEAD contest, since my understanding is that it would do a little better than NORX (at least in software; it would be interesting to see how it compares in hardware), which is generally the fastest of the secure non-AES options (e.g., disqualifying MORUS due to the BRUTUS-identified adaptive chosen-plaintext issue).

For those wondering why this came up now, the third round CAESAR candidates will be announced any day now. DJB's choices in Salsa20/ChaCha are still looking very good.

The ability to do relatively efficient masking/blinding in LRX algorithms is a major advantage at least, but with NORX you need 64-bit operations to get a 256-bit key, which is frustrating. I wonder if NORX32-f could be used to make a Salsa20/ChaCha-style stream cipher where you operate on block-size data (say, use the pseudo-addition to incorporate the start state).


Agreed, having ChaCha20-Poly1305 in the benchmarks would be good. RFC 7539 has been published and there are already several applications using this combination (as has been mentioned).

Any winning algorithm(s) from CAESAR will compete with ChaCha20-Poly1305 and should be chosen to provide some clear advantage: better performance, agility, scalability, or security, including resistance to side-channel leakage and other attacks on implementations, for example.

Really looking forward to seeing the round-three announcement.


Sorry, the brain mistyped 32 with 64. Thanks for pointing it out.


This is inaccurate. AES is a block cipher, whereas Salsa/ChaCha are stream ciphers. Block ciphers are easy to accelerate in hardware, as they act on "blocks" of data at a time, whereas stream ciphers often go almost byte by byte.


A typical stream cipher produces a stream of bits (not even bytes) and is often described in a manner that can be readily converted into hardware; also, most such ciphers are non-trivial to implement efficiently in software.

Stream ciphers done in software that are actually somewhat widely used are either based on iterating some block-cipher-like primitive (which may be purpose-designed, as in Salsa/ChaCha) or are related to RC4.
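As a toy illustration of the "iterate a primitive in counter mode" idea: you hash or encrypt a (key, nonce, counter) tuple to produce keystream blocks, then XOR them with the data. SHA-256 here is just a stand-in for whatever PRF or block primitive you actually trust; this sketch is illustrative, not a vetted cipher.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    # Derive a keystream by hashing (key || nonce || counter) blocks,
    # the same shape as a block cipher run in CTR mode.
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        out += block
        counter += 1
    return bytes(out[:nbytes])

def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stream-cipher encryption and decryption are the same operation:
    # XOR the data with the keystream.
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Applying `xor_crypt` twice with the same key and nonce round-trips the plaintext, which is exactly the property CTR-style constructions rely on.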

IIRC, the fact that you can derive a stream cipher by iterating essentially any cryptographic primitive (e.g. a hash function) was one of the arguments DJB used in his court case against the US government.


What do you mean by typical stream ciphers? AFAIK the most common stream ciphers are A5/1 (and A5/2) used in GSM, Snow3G used in 3G and LTE, E0 used in Bluetooth, and RC4 used for WPA in Wi-Fi.

Of these A5/1 generates bursts of 114 bits, E0 generates two bits at a time, Snow3G generates 32-bit words and RC4 generates bytes.

Implementing A5/1 in SW is not easy, but Snow3G can be efficiently implemented in SW. For RC4 there are many high performance implementations in SW.

I do agree, though, that A5/1, E0 and Snow3G are designed to be efficiently implemented in HW.

Besides these algorithms, block ciphers in stream-cipher modes (esp. CTR) are used a lot: KASUMI in 3G and LTE, and AES in IEEE 802.15.4 (CCM mode) and WPA2, for example.


A5/1 is probably a perfect example of what I had in mind, as it generates output one bit at a time and its output has a quite large period; the fact that in GSM it's used to generate a pair of 114-bit keystreams is somewhat irrelevant to that. All three ciphers in the eSTREAM hardware profile are specified in the same way (although all of them are designed in a way that allows more output bits to be computed in parallel).
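To illustrate the bit-at-a-time shape of these designs: a hardware-profile stream cipher core is typically one or more linear feedback shift registers clocked once per output bit. Here is a toy 4-bit maximal-length LFSR in Python, purely illustrative with no cryptographic strength; the polynomial x^4 + x^3 + 1 is just a convenient maximal-length choice.

```python
def lfsr_step(state: int) -> int:
    # One clock of a 4-bit Fibonacci LFSR for x^4 + x^3 + 1:
    # the feedback bit is the XOR of the two most significant bits,
    # shifted in at the bottom.
    bit = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | bit) & 0xF

def lfsr_keystream(state: int, n: int) -> list:
    # Emit one bit per clock (the MSB), as a bit-serial cipher would.
    out = []
    for _ in range(n):
        out.append(state >> 3)
        state = lfsr_step(state)
    return out
```

A maximal-length 4-bit LFSR cycles through all 15 non-zero states before repeating. A5/1 itself combines three such registers with irregular clocking, but the per-bit, shift-register-plus-XOR shape is the same, which is why these designs are so cheap in hardware and so awkward in software.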


ChaCha generates keystream blocks of 512 bits, so a comparison to block ciphers in CTR mode is fairly apt.


In fact, I'd go as far as saying that stream ciphers can't be accelerated as much as block ciphers can, even with GPGPU techniques.


I respectfully claim that statement is wrong. Some of the most commonly used stream ciphers, A5/1, E0, Snow3G are explicitly designed to be efficiently implemented in HW.

Further, if you look at eSTREAM, the profile-two algorithms can be very efficiently implemented in HW. And to be honest, the profile-one algorithms can also be efficiently implemented in HW. I have implemented them all in HW and get good performance.

The stream cipher HC-128/256, for example, is very fast in SW. But in HW I can parallelize the state reads and updates in ways you can't in SW, due to the lack of multiple read and write ports. Doing this you get multi-Gbps performance in HW even at low clock frequencies.

https://en.wikipedia.org/wiki/ESTREAM

If you look at the stream cipher RC4, it was not designed for HW implementation. But in HW I can implement RC4 to do three reads and two updates in parallel and reach 1 cycle/byte. In a low-cost FPGA I reach 500 Mbps, which is pretty OK. Not that I'm promoting the use of RC4; my implementation was just an experiment to see if such a parallel implementation was possible. Oh, and it is not debugged, so don't use it anyway. ;-)

https://github.com/secworks/rc4


This is the subject of an excellent book on Archer Blood:

https://www.amazon.com/Blood-Telegram-Gary-J-Bass/dp/0307744...


You can pressurize the gas and put a barometer in the tube to detect any breach of the tube. This was used during the Cold War, and the NSA tapped the communications cable anyway by filling the space around the tube with gas as well. Source: James Bamford's books on the NSA.


I love Brazil and Apollo. As an ex-Amazonian I miss them on a weekly basis.


If you are on a team that does nothing but Java on a MAWS or corp platform, Brazil was built for you and I wouldn't be surprised that you like it. But if you deviate slightly, you run the chance of not getting what you need. If you stray away from the JVM or Ruby, chances are you will never get anything at all, let alone what you need.


Ruby, Java, Perl :S, Python, roll-your-own: I've done it all. :) Yes, the learning curve can be pretty steep, but one thing that always helps is that the probability that somebody out there has already done it is very high. You just need to be curious and invest in learning it and looking at what others are doing.


Check out nix / nixpkgs. :-) That's the closest to Brazil IMO.


Looks pretty awesome. Never heard of this before. I'm sure it has some caveats...


Yep. Definitely! One caveat, in short: everything involved in the build must be placed under the nix store (/nix/store), which, for some software, can cause issues; they might have a hardcoded runtime path that references /usr/lib. Nix provides tools to resolve these assumptions, but they can still be a sticking point at times. There is also, more or less, an unsafePerformIO that can be used, which is discouraged. Still, for private builds the cost can be acceptable.


do you know if there is a way to cache things if used across multiple apps?


Assuming I understand the question correctly: Yes.

Nix uses a pure evaluation model, so it enjoys this property: an artifact in the nix store is uniquely identified by the closure used to build it. For any artifacts A and B, the artifact file paths are equal if and only if the closures used to build them are equal.

This creates opportunities for sharing between builds that can be hard to achieve in other systems. One form of sharing "referencing a derivation in multiple apps" works as expected, just like other systems: Each app will reference the same artifact.

(a derivation is the closure to be evaluated to build an artifact in the nix store. Well, attribute set of closures.)

Suppose a derivation is assigned to the variable "commonData" and two derivations "appX" and "appY" reference this closure. "commonData" will be built once and both app derivations will receive a path to the same file in the nix store.

The other form of sharing comes from the equality comparison being based on the closure and not the name used to reference the closure.

Ehh.. I'm butchering the explanation... I think there is a succinct PL term that covers this.

Suppose we have a derivation:

let x = mkDerivation { name = "foo"; builder = aBuilder; src = /share/src/foo; }

which is referenced by another derivation

let y = mkDerivation { name = "bar"; builder = aBuilder; src = /share/src/bar; inherit x; }

"y" will force the evaluation of the "x" derivation's closure. The source directories, since they are not in the nix store, will be copied to the nix store first. (By an implicit conversion between local files and nix store paths)

So far so good, but what happens if there is another derivation like so?

let z = mkDerivation { name = "zab"; builder = aBuilder; src = /share/src/zab; somethingLikeX = mkDerivation { name = "foo"; builder = aBuilder; src = /share/src/foo; }; }

"somethingLikeX"'s equation is equal to "x" but not the same reference.

What will happen if z is evaluated after y (assuming aBuilder is the same)? First, the derivation "somethingLikeX" will be evaluated. Aha! That closure is equal to the closure for "x" above, which has already been evaluated, so that evaluation result will be shared, even though "z" does not directly reference "x".

This can result in more sharing than the developer explicitly requested: Equal closures are shared.
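The sharing rule above can be pictured as content-addressing: the store path is derived from a hash of the normalized derivation attributes, so two structurally equal derivations collapse to the same path. A rough Python analogy (this is not Nix's actual hashing scheme; the names and path format here are made up for illustration):

```python
import hashlib
import json

def store_path(drv: dict) -> str:
    # Hypothetical stand-in for Nix's derivation hashing: the "store path"
    # depends only on the normalized contents of the derivation, not on
    # which variable happens to reference it.
    digest = hashlib.sha256(json.dumps(drv, sort_keys=True).encode()).hexdigest()
    return f"/nix/store/{digest[:32]}-{drv['name']}"

x = {"name": "foo", "builder": "aBuilder", "src": "/share/src/foo"}
something_like_x = {"name": "foo", "builder": "aBuilder", "src": "/share/src/foo"}

# Equal contents yield the same path, so the build result is shared
# even though the two expressions are distinct objects in the program.
```

The real mechanism hashes the full build closure (inputs, builder, environment), but the consequence is the same: equality is decided by content, not by name.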



I don't think your attack violates the security model of this protocol. One of the proofs in the paper is that if you have a network of good and bad nodes the bad nodes can use tricks like you posted to discover whether a good or bad node sent the message. I think that's all they can discover: did an attacker send this or not? But then again they already knew that, didn't they? So I think your attack is only interesting for 3 nodes.


It is a side channel that breaks the intention of the protocol. Bob and Charles are 'good nodes', one of whom paid for the meal. Alice (a bad node) is not supposed to be able to find out whether it was Bob or Charles who paid.

If the participants don't share or talk about the outcome at all, this particular attack is avoided, so it is a side channel and not a direct failure of the protocol itself.


You generally use Unicorn etc. behind something like nginx. The last time I did that, I used nginx to handle thousands of concurrent connections that were forwarded to one Unicorn instance per CPU. Nginx is very good at handling lots of connections; Unicorn is good at handling a Rack app.


To be a bit more explicit: using separate httpd and application servers allows a division of labor between the resource-bound task of handling the request and building the response, and the network-bound task of dibbling bytes back to the original requestor.

Nginx (and the general class of highly concurrent servers) is good at handling lots of connections largely because it tries to minimize the resources (memory, process scheduler time, etc) required to manage each connection as it slowly feeds the result down the wire.

The application server generally wants an instance per CPU so that it can crank through a memory-, CPU-, or database-hungry calculation in as few microseconds as possible, hand the resulting data back to the webserver, and proceed to put the memory, DB, and CPU to the task of processing the next request.

This is in contrast to the (simplified here) old-school CGI way, where, say, ancient Apache would receive a request, then fork off a copy of PHP or Perl for each one, letting the app block on writing to the stdio pipe to Apache, and Apache to the requesting socket, all the while maintaining a full OS process for each request in play.
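Concretely, the division of labor usually looks something like this in the nginx config (a minimal sketch; the socket path and header choices are illustrative, not taken from the comment above):

```nginx
upstream unicorn_app {
    # Unicorn workers all listen on one local Unix socket;
    # fail_timeout=0 retries immediately after a worker restart.
    server unix:/tmp/unicorn.sock fail_timeout=0;
}

server {
    listen 80;

    location / {
        # nginx buffers the slow client connection; Unicorn only sees
        # fully received requests and hands back complete responses.
        proxy_pass http://unicorn_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The `proxy_pass` line is where the handoff happens: nginx keeps the thousands of cheap client connections, while each Unicorn worker stays busy on exactly one request at a time.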


Several things make it easier to maintain:

- super fast compile times for fast developer iterations

- you can create an interface (a set of methods) that your module owns and apply it to objects created by other modules without recompiling. If you only need one method, you can take just that one. It encourages decoupling from your dependencies.

- the code has one correct formatting convention and a tool that will auto-format your code

- many complex numeric-conversion rules that are implicit in C and cause no end of trouble are explicit and much simpler in Go.

- treating concurrency as a set of sequential processes connected with channels makes it MUCH easier to reason about.


Don't the first three apply just as well to Python? It compiles fast, it's duck-typed so interfaces are always there implicitly, and the indent-based blocking ends many arguments about formatting.


The difference is that if you're not careful when you cut and paste indented blocks in Python, you can easily change the logic.


If you cut and paste, misaligned indentation is probably the least of your problems. Cut and paste should be avoided unless it's part of a refactoring (i.e., actually moving code for a better design).

Beginners usually think Python's strict indentation is a weakness of the language. I actually find C++'s freedom more error-prone:

    if (a < b);
        a = b;

It's a not very frequent bug, but when it happens it takes you hours to spot. ;)


Yes, and in Go (and hopefully all curly-brace languages of the future) this is actually a syntax error; you need the braces:

  if (a < b) {
     a = b;
  }


The thing I don't get: if braces are mandatory (good), why keep the now completely, unambiguously irrelevant () around the condition?


If you ran the code through gofmt, it would remove the () around the condition for you. I code Go in SublimeText with GoSublime. On every save it runs the file through gofmt and reformats it for me. Keeps my code looking pretty with very little effort.


You don't need those parens in Go, and go fmt will in fact remove them.


They're actually not required in Go.


And you don't need the semicolons!


This would never happen to me because I frequently format my doc, and it wouldn't be indented.


Python executes; it does not compile.

EDIT: Whoops, it's actually compiled into bytecode and then executed by the VM.


It's generous to call the CPython interpreter a VM. The binary encoding of Python isn't some crazy IL bytecode; it's really just Python-as-binary. Language constructs are converted into opcodes, strings get pre-calculated hashes, and local variables within a scope become a sort of vector... but otherwise, it's a pretty much 1:1 mapping between Python language constructs and the bytecode form.
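You can see that near-1:1 mapping for yourself with the standard-library dis module:

```python
import dis

def add(a, b):
    return a + b

# Disassembling shows the direct mapping from source to opcodes:
# a LOAD for each local, one BINARY operation, and a RETURN.
dis.dis(add)

# The compiled form is just a bytes object hanging off the code object.
print(type(add.__code__.co_code))
```

The exact opcode names vary by CPython version (BINARY_ADD became BINARY_OP in 3.11), but the shape is the same: each language construct shows up more or less directly as an opcode.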


This is not true.



It actually "compiles" into bytecode, then consumed by the interpreter.


I thought you had meant that Go was "easier to maintain" than Python. Most of these don't apply versus Python: Python has fast compilation times, interfaces without recompiling, and formatting conventions, and the numeric-conversions point was explicitly a comparison with C. Channels are the only one that applies to Python.


As someone who generally prefers Python to Go quite strongly, the formatting situation is nowhere NEAR equivalent. With Go it's trivial to set up your editor to perfectly reformat every time you save, or even on carriage return, and it does so in a way that is 100% consistent and will never alter the logic of your code.


I'm not a Python expert by any measure, but I've written a couple of small projects with it. Python is surprisingly well documented. In this specific case, PEP 8 lays down most formatting conventions, so you'd expect some tooling to be available. Is this not the case?

A quick Google search brought up this, for instance: https://bitbucket.org/StephaneBunel/pythonpep8autoformat


The thing is, while this kind of thing is available for pretty much every language, it is standard for Go. As in, if you don't gofmt your code, everyone who looks at it will bug you to format it. That's a huge difference. There's basically zero code in the wild that you'd ever want to use that /isn't/ gofmted.


Exactly the point. Go is a VERY opinionated language. You see the term "idiomatic Go" used very regularly. The Go team and community push hard for everyone to follow a set way of doing things. This means all Go code must be formatted a certain way, handle errors in a certain way, etc. People even put pre-commit hooks into their VCS system that will reject .go commits unless they are properly formatted before commit.


I use Vim and there's a plugin just for that: making sure my code is PEP8 compliant.


Maybe he meant that having a typed language that is checked at compile time is a huge plus for maintenance, and the usual downsides that come with compilation (mainly slowness) are pretty much nonexistent with Go.


I think he meant one single big-ass executable is way easier to maintain than pip install and virtualenv everywhere.

Yes, we can also make a single big-ass executable out of any Python code, but it's slower and not supported out of the box.


And IMHO that's what's killing Python: the fragmentation.


You are incorrect about compile times in several ways. First, Python byte-compiles to its own VM's bytecode, then executes that. Second, Go compilation times are comparable to Python byte-compilation times, but the result is a native executable.


Python is interpreted, not compiled... Of course it has 0 compile time.


They say no evidence of illegality, but there is evidence of poor judgement. Without commenting on the specifics of this incident that's a valid reason to fire a CEO/co-founder/exec. I have no particular knowledge of what happened but if GitHub thinks he showed poor judgement then it is reasonable to ask him to resign.


For example, if you didn't actually do anything legally wrong, but made it look an awful lot like you did by screwing up and being a dick about it, and you're a company executive, then you can resign or get fired.

I suspect that this is not a million miles from what happened here: not actual discrimination, but enough of a blunder to look like it.


I am an ex-AWS employee. I can verify that AWS was founded as a new line of business using new servers purchased for AWS. The idea that AWS was started using Amazon's spare capacity is widely repeated but false. Amazon is definitely leveraging their expertise and not their physical machines. Source: I've seen the original pitch deck for Amazon S3.


AWS may not have used Amazon's spare hardware, but it certainly used Amazon's spare code.


Prime is considered a marketing expense. Amazon absorbs hundreds of millions or even a billion dollars in shipping charges through prime. It's a huge reason they have a thin profit margin.


They shouldn't even have a profit margin; they pay no tax inside the UK.


That's populist propaganda. Businesses are taxed on profit, not revenue. Even if they were 100% based in the UK, they'd still probably pay very low corporate taxes (if any) due to the thin margin. They actually do pay an enormous amount of tax in the UK in the form of payroll taxes, VAT collections, etc.

It's sad to see people propagate this crap. It's really about veiled protectionism. A big, efficient player comes in and is willing to forego profit indefinitely. Can't tax them legally, so shame them. Tax on 0 is 0. That's how it works for UK companies too. Try reading between the lines...


I mean they don't pay corporation tax; they're based in Luxembourg for that reason. Why should I have to pay a ridiculous amount of money when they don't pay anything?


They pay corp tax on the profit they earn in their UK operations. Just like every other company with operations in the UK.

What percentage of your revenue do you pay in corporate taxes? None! You only pay a percentage of profit (extremely thin for Amazon). They base in Luxembourg to benefit from UK laws and EU regulation. The UK media and government quoting billions in revenue is meaningless; they're trying to stir you up. They would pay relatively nothing in corp tax relative to their other tax footprint (payroll, VAT, etc.) and would absolutely spend it all to avoid doing so.


Then why is Luxembourg such a magnet? It's not just Amazon, companies that exclusively cater for the UK market moved their base there.


Luxembourg is a magnet because the UK government is stupid and creates taxes that are easily avoidable.


Exactly my point: if Amazon is making little now through tax loopholes, what happens when they are fixed?


How is it a "loophole"?


Amazon runs a business in the UK; they are owned by a parent company in Luxembourg to which they pay the majority of their revenue, thus making almost no profit and paying very little in tax.

They run a business in the UK but pay almost nothing in tax. It should be fixed; it's not fair on small business owners who have to pay all of their tax and get hammered for it.

Can you not see how it's a loophole? It can so easily be fixed.


Revenue sent to a parent company abroad is deducted as an expense? I find that hard to believe.

According to this article[1], what they actually do is send the payments directly to the Luxembourg company, and Amazon UK is just classified as a delivery company. That makes sense, and frankly, it doesn't shock me. I'm from Portugal and I just bought some comics from a US company. Should they start paying Portuguese corporate taxes?


Yeah, the reason AMZN doesn't pay UK tax is NOT because they are not profitable. It is because they legally avoid taxes by basing all their EU operations out of Luxembourg. Tax codes need reform worldwide to avoid this kind of bad behavior.


So if Etsy sells something to Portugal, should you pay Portuguese corporate taxes?

I'm sure we won't mind the income ;)


That's different: Amazon actually has property in the UK, large warehouses and staff, among other things.


Etsy also ships physical things; the only difference is that they subcontract to a delivery company, which obviously doesn't pay taxes on Etsy's profits.

But actually, it's not different, because Amazon also subcontracts, the only difference is that the subcontracted delivery company is named... Amazon. So what they're doing is the same as any companies that ships products to the UK, except that they happen to own both companies and they have similar names.

I fail to see the reason why should Amazon UK pay for the profits earned by Amazon Luxembourg, when DHL or whoever don't pay for Etsy's profits that come from the UK.


Shall we remove all their distribution centres then? Since they can just have a global warehouse in Luxembourg? Since they don't need them.


A business shouldn't make money? Surely that isn't what you meant...

