Friday, May 27, 2016

Upcoming F# struct tuples: are they always faster?

Don Syme has been working on struct tuples for F# language. Let's see if they are more performant than "old" (heap allocated) tuples in simple scenario: returning tuple from function. The code is very simple:

Decompiled code in Release configuration:



Everything we need to change to switch to struct tuples, is adding "struct" keyword in front of constructor and pattern matching:


Decompiled code in Release configuration:

I don't know about you, but I was surprised with those results. The performance roughly the same. GC is not a bottleneck as no objects were promoted to generation 1.

Conclusions:

  • Using struct tuples as a faster or "GC-friendly" alternative to return multiple values from functions does not make sense.
  • Building in release mode erases away heap allocated tuples, but not struct tuples.
  • Building in release mode inlines the "foo" function, which makes the code 10x faster.
  • You can fearlessly allocate tens of millions of short-living object per second, performance will be great.


Sunday, May 22, 2016

Hash maps: Rust, F#, D, Go, Scala

Let's compare performance of hashmap implementation in Rust, .NET, D (LDC) and Go.
Rust:
F#:
As you can see, Rust is slower at about 17% on insersions and at about 21% on lookups.

Update

As @GolDDranks suggested on Twitter, since Rust 1.7 it's possible to use custom hashers in HashMap. Let's try it:
Yes, it's significantly faster: additions is only 5% slower than .NET implementation, and lookups are 32% *faster*! Great.

Update: D added

LDC x64 on windows
It's very slow at insertions and quite fast on lookups.

Update: Go added


Update: Scala added



Compared to Scala all the other languages looks equally fast :) What's worse, Scala loaded all four CPU cores at almost 100% during the test, while others used roughly single core. My guess is that JVM allocated so many objects (each Int is an object, BTW), that 3/4 of CPU time was spend for garbage collecting. However, I'm a Scala/JVM noob, so I just could write the whole benchmark in a wrong way. Scala developers, please review the code and explain why it's so slow (full IDEA/SBT project is here). Thanks!















Wednesday, May 4, 2016

Akka.NET Streams vs Hopac vs AsyncSeq

Akka.NET Streams is a port of its Scala/Java counterpart and intended to execute complex data processing graphs, optionally in parallel and even distributed. It has quite different semantics compared to Hopac's one and it's wrong to compare them feature-by-feature, but it's still interesting to benchmark them in a scenario which both of them supports well: read lines of a file asynchronously, filter them by a regex in controlled degree of parallelism, then normalize the lines with a simple string manipulation algorithm, also in parallel, then count the number of lines.

Firts, Akka.NET:

Note that I have to use the empty string as indication that the regular expression does not match. I should use `option` of course (just like I do in the Hopac snippet below), but Akka.NET Streams is strict about what is allowed to be returned by its combinators like `Map` or `Filter`, in particular, you cannot return `null`, doing so makes Akka.NET unhappy and it will throw exception at you. In F#, expressions like `fun x -> printfn "%O" x` and `fun x -> None` returns `()` and `None` values respectively, which are represented as `null` at runtime, so you have to be very careful `Map`ping and `Filter`ing (and using all the combinators actually) over side effecting functions or returning `Options` (just do not do either).

Now, Hopac:

And finally AsyncSeq:


Number of allocations is roughly identical for Hopac and Akka, but it's an order of magnitude larger for AsyncSeq.

Conclusions

  • Use Hopac if you need the best performance available on .NET, or if you need to implement arbitrary complex concurrent scenarios.
  • Akka.NET is quite fast and has a full blown graph definition DSL, so it's great for implementing complex stream processing, which can run on a cluster of nodes. However, it has a typical "fluent" C#-targeted API, so it's necessary to write a thin layer over it in order to make it usable from F#.
  • AsyncSeq has the most "F# friendly" API - it's just a combination of two computation expressions which every F# programmer knows: Async and Seq.

Update 6 May, 2016


Marc Piechura suggested a way to exclude materialization phase from the benchmark, here is the modified code:

It turns out it takes Akka.NET about 3 seconds to materialize the graph.

Thanks Vesa Karvoven for help with fixing the Hopac version and Lev Gorodinski for fixing AsyncSeq performance (initially it works awfully slow).