Showing posts from 2015

Regular expressions: Rust vs F# vs Scala

Let's implement the following task: read first 10M lines from a text file of the following format:

then find all lines containing Microsoft namespace in them, and format the type names the usual way, like "Microsoft.Win32.IAssemblyEnum".

First, F#:

Now Rust:

After several launches the file was cached by the OS and both implementations became non IO-bound. F# one took 29 seconds and 31MB of RAM at peak; Rust - 11 seconds and 18MB.

The Rust code is as twice as long as F# one, but it's handling all possible errors explicitly - no surprises at runtime at all. The F# code may throw some exceptions (who knows what kind of them? Nobody). It's possible to wrap all calls to .NET framework with `Choice.attempt (fun _ -> ...)`, then define custom Error types for regex related code, for IO one and a top-level one, and the code'd be even longer then Rust's, hard to read and it would still give no guarantee that we catch all possible exceptions.

Update 4 Jan 2016: Sc…

Elixir: first look

I don't have a clear impression about Elixir language yet. I don't like it has Ruby like syntax, but do like it has pipe operator and macros. So, Fibonacci:

It executes in about 13 seconds which is on pair (even faster for unknown reason) with Erlang, no surprises here.

D (GDC) - 0.990C# - 1.26D (DMD) - 1.3C++ - 1.33F# - 1.38Nemerle - 1.45Rust - 1.66Go - 2.38Haskell - 2.8Clojure - 9Elixir - 13Erlang - 17Ruby - 60Python - 120

SHA1 compile time checked literals: F# vs Nemerle vs D

I've always been interested in metaprogramming. Sooner or later, I'm starting to feel constrained within a language without it. F# is a really nice language, but I'm afraid I'd have got bored with it if it'd not have Type Providers, for example. Why metaprogramming is so important? Because it allows changing a language without cracking the compiler. It allows making things which seemed to be impossible to implement.

I'm dealing with cryptography hashes a lot at work, nothing rocket since, just MD5, SHA-1 and so on. And I write tons of tests where such hashes are used in form of string literals, like this:

The problem with this code is that the compiler cannot guarantee that the hex string in the last line represents a valid SHA-1. If it does not, the test will fail at runtime for a reason it's not intended to.

OK, now we can formulate our task: provide a language construct to enforce a string literal being a valid SHA-1 hexadecimal, at compile time. We wil…

Fib: C++, C# and GDC

As a reference implementation, I added C++ one:

It's execution time is 1.33 seconds, which surprisingly is not the best result so far.
A C# version:

Also, I compiled this D code with GDC compiler and it executed in 990 ms, which is the best result:

D (GDC) - 0.990C# - 1.26D (DMD) - 1.3C++ - 1.33F# - 1.38Nemerle - 1.45Rust - 1.66Go - 2.38Haskell - 2.8Clojure - 9Erlang - 17Ruby - 60Python - 120
Unfortunately, I have not managed to compile the D code with LDC compiler, it returns the following error:
Building: DFib (Release) Performing main compilation... Current dictionary: d:\git\DFib\DFib D:\ldc2-0.15.2-beta1-win64-msvc\bin\ldc2.exe -O3 -release "main.d"   "-od=obj\Release" "-of=d:\git\DFib\DFib\bin\Release\DFib.exe" LINK : fatal error LNK1181: cannot open input file 'kernel32.lib' Error: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\link.exe failed with status: 1181 Exit code 1181

Composing custom error types in F#

I strongly believe that we should keep code as referential transparent as possible. Unfortunately, F# language does not encourage programmers to use Either monad to deal with errors. The common practice in the community is using common in the rest .NET (imperative) world exception based approach. From my experienced, almost all bugs found in production are caused by unhandled exceptions. 
The problem
In our project we've used the Either monad for error handling with great success for about two years. ExtCore is a great library making dealing with Either, Reader, State and other monads and their combinations really easy. Consider a typical error handling code, which make use Choice computation expression from ExtCore:

The code is a bit hairy because of explicit error mapping. We could introduce an operator as a synonym for Choice.mapError, like <!>, after which the code could become a bit cleaner:

(actually it's the approach we use at in our team).

Rust composable errors

Go: fib

Go code is relatively low-level since it does not have "foreach over range" syntax construct:

Results are not as impressive for a systems language: 2.38 seconds. And it lays below Rust but under Haskell:
C# - 1.26D (DMD) - 1.3F# - 1.38Nemerle - 1.45Rust - 1.66Go - 2.38Haskell - 2.8Clojure - 9Erlang - 17Ruby - 60Python - 120

Computing cryptography hashes: Rust, F#, D and Scala

Let's compare how fast Rust, D and F# (.NET actually) at computing cryptography hashes, namely MD5, SHA1, SHA256 and SHA512. We're going to use rust-crypto cargo:

MD5 - 3.39s SHA1 - 2.89s SHA256 - 6.97sSHA512 - 4.47s
Now the F# code:

Results (.NET 4.5, VS 2013, F# 3.1):
MD5CryptoServiceProvider - 2.32s (32% faster)SHA1CryptoServiceProvider - 2.92s (1% slower)SHA256Managed - 16.50s (236% slower)SHA256CryptoServiceProvider - 11.50s (164% slower)SHA256Cng - 11.71s (168% slower)SHA512Managed - 61.04s (1365% slower)SHA512CryptoServiceProvider - 21.88s (489% slower)SHA512Cng - 22.19s (496% slower) (.NET 4.6, VS 2015, F# 4.0):

MD5CryptoServiceProvider elapled 2.55SHA1CryptoServiceProvider elapled 2.89SHA256Managed elapled 17.01SHA256CryptoServiceProvider elapled 8.74SHA256Cng elapled 8.75SHA512Managed elapled 23.42SHA512CryptoServiceProvider 5.81SHA512Cng elapled 5.79


MD5 - 16.05s (470% slower)SHA1 - 2.35s (19% faster)SHA256 - 47.96s (690% slower (!))SHA512 - 61.47s (13…

Rust: fib

Rust is an interesting language. It is not a primitive one, like Go where we don't have ADTs, pattern matching and generics (but we do have Nils). And it's advertising as a safe and performant system language. Today is the very first day I'm looking at it. Let's "smoke" test it with Fibonacci :)

Debug: 3.44 seconds, release: 1.66 seconds. This is not very impressive, but pretty fast indeed.
C# - 1.26D (DMD) - 1.3F# - 1.38Nemerle - 1.45Rust - 1.66Haskell - 2.8Clojure - 9Erlang - 17Ruby - 60Python - 120 It's very interesting how it'll behave in concurrent Fibonacci test.

The compiler is quite slow: it takes 2-3 seconds to build this tiny program.

Parallel reduce: Hopac, Asyncs, Tasks and Scala's Futures

Tuomas Hietanen posted a parallel reduce function that uses TPL Tasks. I found it interesting to compare performance of this function with analogues implemented using F# Asyncs, Hopac Jobs and Scala Futures. The author uses noop long-running reduce function to show that it's really run in parallel. In this blog post we are benchmarking another aspect of the implementations: how much extra cost is introduced by a particular parallezation mechanism (library) itself.

We translate the original code almost as is to Tasks and Hopac:

And Scala's Futures:

The results (Core i5, 4 cores):

Sequential List.reduce: Real: 00:00:00.014, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0 Tasks: Real: 00:00:01.790, CPU: 00:00:05.678, GC gen0: 36, gen1: 10, gen2: 1 Hopac: Real: 00:00:00.514, CPU: 00:00:01.482, GC gen0: 27, gen1: 2, gen2: 1 Asyncs: Real: 00:00:37.872, CPU: 00:01:48.405, GC gen0: 90, gen1: 29, gen2: 4Scala Futures: 4.8 seconds
(Hopac - 3.4 times faster, Asyncs - 21.1 times slower…

Fibonacci: Hopac vs Async vs TPL Tasks on .NET and Mono

Hopac claims that its Jobs are much more lightweight that F# Asyncs. There are many benchmarks on Hopac github repository, but I wanted to make a simple and straightforward benchmark and what could be simpler that parallel Fibonacci algorithm? :) (actually there's a more comprehensive  benchmark in the Hopac repository itself, see Fibonacci.fs)

Sequential Fibonacci function is usually defined as

So write a parallel version in Hopac where each step is performed in a Job and all these Jobs are (potentially) run in Parallel by Hopac's scheduler

An equivalent parallel algorithm written using F# Asyncs

...and using TPL Tasks

All three functions create *a lot* of parallel jobs/asyncs/tasks. For example, for calculating fib (34) they create ~14 million of jobs (this is why Fibonacci was chose for this test). To make them work efficiently we will use the sequential version of fib for small N, then switch to parallel version

Now we can run both of the function with different "le…