ets versus Redis benchmarks for a simple key-value store

The project that I am currently working on has a huge data set of static lookup data, and we have been using Redis to store it since the beginning of the project. We figured Redis would be the fastest, as the whole data set is in memory. However, in our production use we have found Redis to be the bottleneck.

This is not really Redis' fault, as our data access pattern involves a huge number of lookups: more than 10K per request. Also, since Redis runs on a single core, it isn't able to use all the cores on our server. Add the network costs and the serialization costs to that, and things add up very quickly.

This led me to benchmark Redis against ets using our actual production data, and (un)surprisingly we found that ets beats Redis for simple key-value data. So, if you are using Redis as a key-value store and you are on Elixir or Erlang, please do yourself a favor and use ets.

I created a simple mix project which benchmarks ets and redis (https://github.com/minhajuddin/redis_vs_ets_showdown)

Go ahead and try it out by tweaking the count of records or the parallelism.

We found that the ets-to-Redis performance gap actually grows as the parallelism increases.

Check out the repository for the benchmark data: https://github.com/minhajuddin/redis_vs_ets_showdown

You can also check the reports at:

  1. https://minhajuddin.github.io/redis_vs_ets_showdown/reports/benchmark-1000.html
  2. https://minhajuddin.github.io/redis_vs_ets_showdown/reports/benchmark-1000000.html

Here is the gist of the benchmark:

Quick explanation of names

ets_get_1000: does an ets lookup 1000 times

redis_get_1000: does a redis lookup 1000 times using HGET

ets_get_multi: does a single ets lookup for all 1000 keys

redis_get_multi: does a single HMGET Redis lookup
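
To make these benchmarks concrete, here is a minimal sketch of what such a Benchee run could look like (the actual code lives in the repo linked above). It assumes the Redix client and a local Redis instance whose "cache" hash is already populated, and it seeds an ets table with the same 1000 keys:

# seed an ets table with 1000 records
:ets.new(:cache, [:named_table, :public, read_concurrency: true])
keys = Enum.map(1..1000, &"key-#{&1}")
Enum.each(keys, fn k -> :ets.insert(:cache, {k, "value-#{k}"}) end)

{:ok, redis} = Redix.start_link()

Benchee.run(%{
  "ets_get_1000" => fn ->
    Enum.each(keys, fn k -> :ets.lookup(:cache, k) end)
  end,
  "redis_get_1000" => fn ->
    Enum.each(keys, fn k -> Redix.command!(redis, ["HGET", "cache", k]) end)
  end,
  "redis_get_multi" => fn ->
    Redix.command!(redis, ["HMGET", "cache" | keys])
  end
}, time: 30, parallel: 4)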

Benchmark for 1_000_000 records

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019272 GB
Elixir 1.5.1
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 30.00 s
parallel: 4
inputs: none specified
Estimated total run time: 2.13 min
Name                 ips        average    deviation    median
ets_get_multi        3.31 K     0.30 ms    ±20.60%      0.28 ms
ets_get_1000         2.87 K     0.35 ms    ±75.38%      0.31 ms
redis_get_multi      0.34 K     2.95 ms    ±17.46%      3.01 ms
redis_get_1000       0.0122 K   82.15 ms   ±15.77%      77.68 ms

Comparison:
ets_get_multi        3.31 K
ets_get_1000         2.87 K     - 1.15x slower
redis_get_multi      0.34 K     - 9.76x slower
redis_get_1000       0.0122 K   - 271.91x slower

Benchmark for 1000 records

Name                 ips        average    deviation    median
ets_get_multi        4.06 K     0.25 ms    ±12.31%      0.24 ms
ets_get_1000         3.96 K     0.25 ms    ±18.72%      0.23 ms
redis_get_multi      0.34 K     2.90 ms    ±12.34%      2.99 ms
redis_get_1000       0.0115 K   87.27 ms   ±17.31%      81.36 ms

Comparison:
ets_get_multi        4.06 K
ets_get_1000         3.96 K     - 1.02x slower
redis_get_multi      0.34 K     - 11.78x slower
redis_get_1000       0.0115 K   - 354.04x slower

Performance variations of string substitution in Elixir

I had to do some string stripping in one of my apps which was a bit performance sensitive. I ended up benchmarking multiple approaches to see the speed differences. The results are not that surprising.

path = "/haha/index.html"
subdomain_rx = ~r(^\/[^\/]+)

Benchee.run(%{
  "pattern_match_bytes" => fn ->
    len = byte_size("/haha")
    <<_::bytes-size(len), rest::binary>> = path
    rest
  end,
  "pattern_match" => fn -> "/haha" <> rest = path; rest end,
  "slice" => fn -> String.slice(path, String.length("/haha")..-1) end,
  "replace_prefix" => fn -> String.replace_prefix(path, "/haha", "") end,
  "split" => fn -> String.splitter(path, "/") |> Enum.drop(1) |> Enum.join("/") end,
  "regex" => fn -> String.replace(path, subdomain_rx, "") end,
})

output of benchee

bench [master] $ mix run lib/bench.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking pattern_match...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking pattern_match_bytes...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking regex...
Benchmarking replace_prefix...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking slice...
Benchmarking split...
Name                  ips        average     deviation    median
pattern_match_bytes   24.05 M    0.0416 μs   ±1797.73%    0.0300 μs
pattern_match         22.37 M    0.0447 μs   ±1546.59%    0.0400 μs
replace_prefix        3.11 M     0.32 μs     ±204.05%     0.22 μs
slice                 1.25 M     0.80 μs     ±6484.21%    1.00 μs
split                 0.75 M     1.34 μs     ±3267.35%    1.00 μs
regex                 0.42 M     2.37 μs     ±1512.77%    2.00 μs

Comparison:
pattern_match_bytes   24.05 M
pattern_match         22.37 M    - 1.08x slower
replace_prefix        3.11 M     - 7.73x slower
slice                 1.25 M     - 19.30x slower
split                 0.75 M     - 32.18x slower
regex                 0.42 M     - 57.00x slower

So, the next time you want to strip a prefix, use pattern matching :)

Update

Based on the comments by @Matt Widmann and @Peter, I did a quick test of replacing the tail of the string using the following code:

path = "/haha/index.html"
ext_rx = ~r/\.[^\.]+$/

Benchee.run(%{
  "reverse_pattern_match_bytes" => fn ->
    len = byte_size(".html")
    <<_::bytes-size(len), rest::binary>> = String.reverse(path)
    rest |> String.reverse
  end,
  "reverse_pattern_match" => fn -> "lmth." <> rest = String.reverse(path); String.reverse(rest) end,
  "slice" => fn -> String.slice(path, 0..(String.length(".html") * -1)) end,
  "replace_suffix" => fn -> String.replace_suffix(path, ".html", "") end,
  "split" => fn -> String.splitter(path, ".") |> Enum.slice(0..-2) |> Enum.join(".") end,
  "regex" => fn -> String.replace(path, ext_rx, "") end,
})

The results for this run are included in the combined output further down.

This led me to look at the actual Elixir implementation of replace_prefix and replace_suffix, which is:

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/string.ex#L752

def replace_prefix(string, match, replacement)
    when is_binary(string) and is_binary(match) and is_binary(replacement) do
  prefix_size = byte_size(match)
  suffix_size = byte_size(string) - prefix_size

  case string do
    <<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when prefix == match ->
      replacement <> suffix
    _ ->
      string
  end
end

def replace_suffix(string, match, replacement)
    when is_binary(string) and is_binary(match) and is_binary(replacement) do
  suffix_size = byte_size(match)
  prefix_size = byte_size(string) - suffix_size

  case string do
    <<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when suffix == match ->
      prefix <> replacement
    _ ->
      string
  end
end

I tweaked the benchmark code a little to run each replace 1000 times, to get rid of the “too fast” warning.

defmodule Bench do
  def run(fun), do: fun.()
  def no_run(_fun), do: :ok
  def times(n \\ 1000, fun), do: fn -> Enum.each(1..n, fn _ -> fun.() end) end
end

# match beginning of string
Bench.run(fn ->
  path = "/haha/index.html"
  subdomain_rx = ~r(^\/[^\/]+)

  Benchee.run(%{
    "pattern_match_bytes" => Bench.times(fn ->
      len = byte_size("/haha")
      <<_::bytes-size(len), rest::binary>> = path
      rest
    end),
    "pattern_match" => Bench.times(fn -> "/haha" <> rest = path; rest end),
    "slice" => Bench.times(fn -> String.slice(path, String.length("/haha")..-1) end),
    "replace_prefix" => Bench.times(fn -> String.replace_prefix(path, "/haha", "") end),
    "split" => Bench.times(fn -> String.splitter(path, "/") |> Enum.drop(1) |> Enum.join("/") end),
    "regex" => Bench.times(fn -> String.replace(path, subdomain_rx, "") end),
  })
end)

# match end of string
Bench.run(fn ->
  path = "/haha/index.html"
  ext_rx = ~r/\.[^\.]+$/

  Benchee.run(%{
    "reverse_pattern_match_bytes" => Bench.times(fn ->
      len = byte_size(".html")
      <<_::bytes-size(len), rest::binary>> = String.reverse(path)
      rest |> String.reverse
    end),
    "reverse_pattern_match" => Bench.times(fn -> "lmth." <> rest = String.reverse(path); String.reverse(rest) end),
    "slice" => Bench.times(fn -> String.slice(path, 0..(String.length(".html") * -1)) end),
    "replace_suffix" => Bench.times(fn -> String.replace_suffix(path, ".html", "") end),
    "split" => Bench.times(fn -> String.splitter(path, ".") |> Enum.slice(0..-2) |> Enum.join(".") end),
    "regex" => Bench.times(fn -> String.replace(path, ext_rx, "") end),
  })
end)

elixir_benchmarks [master *] $ mix run lib/bench.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking pattern_match...
Benchmarking pattern_match_bytes...
Benchmarking regex...
Benchmarking replace_prefix...
Benchmarking slice...
Benchmarking split...
Name                  ips       average     deviation    median
pattern_match_bytes   15.17 K   0.0659 ms   ±18.05%      0.0610 ms
pattern_match         14.60 K   0.0685 ms   ±17.41%      0.0640 ms
replace_prefix        2.52 K    0.40 ms     ±21.46%      0.38 ms
slice                 0.83 K    1.20 ms     ±21.95%      1.11 ms
split                 0.58 K    1.72 ms     ±16.76%      1.63 ms
regex                 0.45 K    2.24 ms     ±7.42%       2.22 ms

Comparison:
pattern_match_bytes   15.17 K
pattern_match         14.60 K   - 1.04x slower
replace_prefix        2.52 K    - 6.01x slower
slice                 0.83 K    - 18.24x slower
split                 0.58 K    - 26.10x slower
regex                 0.45 K    - 33.98x slower
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking regex...
Benchmarking replace_suffix...
Benchmarking reverse_pattern_match...
Benchmarking reverse_pattern_match_bytes...
Benchmarking slice...
Benchmarking split...
Name                          ips       average   deviation    median
replace_suffix                2633.75   0.38 ms   ±21.15%      0.36 ms
split                         618.06    1.62 ms   ±13.56%      1.57 ms
regex                         389.25    2.57 ms   ±6.54%       2.54 ms
slice                         324.19    3.08 ms   ±19.06%      2.88 ms
reverse_pattern_match_bytes   275.45    3.63 ms   ±12.08%      3.48 ms
reverse_pattern_match         272.06    3.68 ms   ±11.99%      3.54 ms

Comparison:
replace_suffix                2633.75
split                         618.06    - 4.26x slower
regex                         389.25    - 6.77x slower
slice                         324.19    - 8.12x slower
reverse_pattern_match_bytes   275.45    - 9.56x slower
reverse_pattern_match         272.06    - 9.68x slower
elixir_benchmarks [master *] $

For removal from the end of the string, replace_suffix is the fastest, which makes sense. For removing a prefix, pattern_match_bytes seems to be the fastest, but it isn't truly a fair comparison: in my case, I know for sure that the prefix is present. Even so, the second best, pattern_match, is 6x faster than the current String.replace_prefix implementation.

It may be because I am using OTP 20? I'll run this on other versions of OTP to compare results, and if the results are consistent, I will create a PR on Elixir to change the default implementation.

Optimal order for adding lists in Elixir

Lists are the bread and butter of a functional language and Elixir is no different.

Elixir uses linked lists to represent lists, which means that if a list is n elements long, it will take n dereferences to get to its last element. This understanding is very important for writing efficient code in Elixir. Because of this representation, adding to the head of a list is nearly instantaneous.

Adding to the beginning of a list

Let us take the following list as an example:

el1 -> el2 -> el3 -> el4

It has 4 elements and el1 is the head of the list. To add a new element el0 to the beginning of the list, all you need to do is create a node to store el0 and set its next pointer to el1. This changes the representation to:

el0 -> el1 -> el2 -> el3 -> el4

Now, one thing to note: if a previous variable has a reference to el1, it still references the earlier 4-element list. So, we are not mutating/changing the existing list or references.
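
In Elixir syntax, prepending is just the cons operator, and it runs in constant time no matter how long the list is:

list = [:el1, :el2, :el3, :el4]
new_list = [:el0 | list]
# `list` is untouched and `new_list` shares all four of its nodes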

Adding to the end of a list

However, adding something to the end is not the same. Let us take the previous example:

el1 -> el2 -> el3 -> el4

Now, say this list is referenced by a binding foo, and we want to create a new list bar with a new element el5 at the end. We can't just traverse the list, create a new node with value el5, and set a reference from el4 to el5. If we did that, the reference foo would also get a new element at the end, and that is not how Elixir/Erlang work: the BEAM does not allow mutation of existing data. To work within this framework, we have to create a brand new list containing a copy of all the elements el1..el4 plus a new node el5. That is why appending to the tail of a linked list is slow in Elixir: we end up copying the whole list just to add one element.
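
In code, appending looks innocent but copies every existing node before attaching the new one:

foo = [:el1, :el2, :el3, :el4]
bar = foo ++ [:el5]
# O(n): the four nodes of `foo` are copied before the node for :el5 is attached;
# `foo` still points at the original, unchanged list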

Now, with this understanding, let us think of the most efficient way of combining two lists when the order of elements doesn't matter. For instance, when you send HTTP requests using httpoison, the order of the headers doesn't matter. So, when you have the following implementations available:

# ...
# A: First list is small most of the time
@default_headers [{"content-type", "application/json"}, {"authorization", "Bearer Foo"}, {"accept", "application/json"}]
def get(url, headers \\ []) do
  headers ++ @default_headers
end

# B: Second list is small most of the time
@default_headers [{"content-type", "application/json"}, {"authorization", "Bearer Foo"}, {"accept", "application/json"}]
def get(url, headers \\ []) do
  @default_headers ++ headers
end
# ...

Pick the one where the first list has fewer elements. In this example, that would be implementation A.

I did a quick benchmark just for kicks (Full code available at https://github.com/minhajuddin/bench_list_cat):

Benchmark

Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 14.00 s


Benchmarking big_list_first...
Benchmarking small_list_first...

Name                       ips        average  deviation         median
small_list_first        6.49 K       0.154 ms   ±371.63%      0.0560 ms
big_list_first       0.00313 K      319.87 ms    ±37.78%      326.10 ms

Comparison:
small_list_first        6.49 K
big_list_first       0.00313 K - 2077.49x slower

Code used for benchmarking

small_list = Enum.to_list(1..10_000)
big_list = Enum.to_list(1..10_000_000)

Benchee.run(%{
  "small_list_first" => fn -> small_list ++ big_list end,
  "big_list_first" => fn -> big_list ++ small_list end
})

Note that this is an outrageous benchmark; no one is adding lists containing 10 million elements this way ;). But it demonstrates my point.

How to pass a multi-line COPY SQL to psql

I have been working with a lot of ETL stuff lately and have to import/export data from our PostgreSQL database frequently.

While writing a script recently, I found that psql doesn't allow the \COPY directive to span multiple lines when the SQL is passed to the psql command. The only workaround seemed to be squishing the SQL into a single line. However, that makes it very difficult to read and modify. This is when bash came to my rescue :)

Here is a hacky way to use multi-line SQL with \COPY.

This just strips the newlines before sending it to psql. Have your cake and eat it too :)

# Using a file
psql -f <(tr -d '\n' < ~/s/test.sql)
# or
psql < <(tr -d '\n' < ~/s/test.sql)

# Passing the SQL using a HEREDOC
cat <<SQL | tr -d '\n' | \psql mydatabase
\COPY (
  SELECT
    provider_id,
    provider_name,
    ...
) TO './out.tsv' WITH (DELIMITER E'\t', NULL '')
SQL

How to debug/view phoenix channel network traffic in Google Chrome

When you open Google Chrome's Network tab, you don't see the traffic for websockets. I spent quite some time trying to see the messages between my browser and the Phoenix server; I was expecting to see a lot of rows in the Network tab, one for every message.

However, since websockets don't follow a request-response pattern, they are shown in a different place. To see the messages sent on the websocket, open the Network tab and then click on the websocket request. This should show you a pane with a Frames tab, which shows all the messages that are being sent back and forth on the websocket.

Here is a screenshot: Network Websocket Frames tab

3 things that are needed to make a successful product

In my 10 years in the software industry, I have created a number of products and worked on a lot of projects. Looking back at the products/projects that have been successful, one thing stands out: there are 3 critical pieces to a software product / startup.

0. A value proposition

This one is a dead giveaway, so I haven't even counted it. Without a value proposition you don't have anything: your product must provide value to your customers.

1. Domain knowledge

You need someone on your team with knowledge of the domain. Ideally, you came up with the product idea because of a good understanding of the pain points and of the things that can provide value. This is also fairly easy to understand.

2. Marketing Strength

You also need to have someone who can market your product. An awesome product without marketing is a dead product. You need to either build your marketing expertise or get someone who is good at it.

Marketing is one of the things that is often overlooked. People think that if the product is good, people will buy it. This is completely false: you need a lot of hustle to market your product.

3. Technical expertise

You obviously need someone who can build a usable product which provides value. But this is the last on the list.

Many people come to me with ideas for startups, and I always tell them about these 3 things. The next time you want to build a startup, think about these skills. Without any one of them you are dead in the water.

Getting started with Elm

I had a good time presenting a talk about “Getting Started with Elm” at the awesome nghyderabad. The audience was very interactive and the food was great. Shout out to Fission Labs for the awesome venue!

Here are a few useful links which should help you learn Elm:

  1. elm-lang home page: http://elm-lang.org/
  2. docs: http://elm-lang.org/docs
  3. examples: http://elm-lang.org/examples
  4. A handy online book: https://guide.elm-lang.org/
  5. Online editor: http://elm-lang.org/try
  6. Elm packages website: http://package.elm-lang.org/

Code for the app that we built: https://github.com/minhajuddin/getting-started-with-elm/blob/master/Counter.elm

How to fix Ecto duplicate name migrations error

If you run into the following error while running your Ecto migrations:

ReleaseTasks.migrate
** (Ecto.MigrationError) migrations can't be executed, migration name foo_bar is duplicated
   (ecto) lib/ecto/migrator.ex:259: Ecto.Migrator.ensure_no_duplication/1
   (ecto) lib/ecto/migrator.ex:235: Ecto.Migrator.migrate/4

You can fix it by running 1 migration at a time:

mix ecto.migrate --step 1

This happens when you are trying to run two migrations with the same name, regardless of their timestamps (e.g. two files named 20170101120000_foo_bar.exs and 20170315093000_foo_bar.exs). By restricting Ecto to running 1 migration at a time, you won't run into this issue.

Ideally you should not have 2 migrations with the same name :)

Lightweight xml utility to pluck elements

jq is an awesome utility for parsing and transforming JSON via the command line. I wanted something similar for XML. The following short Ruby script does a tiny, tiny (did I say tiny?) bit of what jq does, for XML. Hope you find it useful.

#!/usr/bin/env ruby
# Author: Khaja Minhajuddin

require 'nokogiri'
require 'parallel'
require 'etc'

if ARGV.count < 2
  puts <<-EOS
Usage:
xml_pluck xpath file1.xml file2.xml
e.g.
xml_pluck "//children/name/text()" <(echo '<?xml version="1.0"?><children><name>Zainab</name><name>Mujju</name></children>')
# prints
Zainab
Mujju
  EOS
  exit(-1)
end

xpath = ARGV.shift

Parallel.each(ARGV, in_processes: Etc.nprocessors) do |file|
  doc = Nokogiri::XML(File.read(file))
  puts doc.xpath(xpath)
end

Key transformation after decoding json in Elixir

In a previous blog post we saw how to do case-insensitive retrieval from maps. A better solution, if there are many key lookups, is to transform the input by lowercasing all the keys just after decoding. The solution from the blog post iterates over each {key, value} pair till it finds the desired key, whereas a proper map lookup doesn't iterate over the keys: it uses a hashing algorithm to get to the key's location in constant time, regardless of the size of the map.

Anyway, here is the solution to transform each key of the input JSON. Hope you find it useful :)

defmodule KeyTransformer do
  def lower_case_keys(input) do
    transform_keys(input, &String.downcase/1)
  end

  def transform_keys(input, tx_key_fun) when is_list(input) do
    Enum.map(input, fn el -> transform_keys(el, tx_key_fun) end)
  end

  def transform_keys(input, tx_key_fun) when is_map(input) do
    Enum.reduce(input, %{}, fn {k, v}, acc ->
      Map.put(acc, tx_key_fun.(k), transform_keys(v, tx_key_fun))
    end)
  end

  def transform_keys(value, _tx_key_fun), do: value
end

ExUnit.start

defmodule KeyTransformerTest do
  use ExUnit.Case
  import KeyTransformer

  test "simple map" do
    assert lower_case_keys(%{"NAME" => "Khaja"}) == %{"name" => "Khaja"}
    assert lower_case_keys(%{"NAME" => "Khaja", "Age" => 3}) == %{"name" => "Khaja", "age" => 3}
  end

  test "nested map" do
    assert lower_case_keys(%{"Mujju" => %{"NAME" => "Khaja"}}) == %{"mujju" => %{"name" => "Khaja"}}
  end

  test "deeply nested map" do
    assert lower_case_keys(%{"Children" => %{"Mujju" => %{"NAME" => "Khaja"}}}) ==
             %{"children" => %{"mujju" => %{"name" => "Khaja"}}}
  end

  test "list of maps" do
    assert lower_case_keys([%{"NAME" => "Zainu"}]) == [%{"name" => "Zainu"}]

    assert lower_case_keys([%{
             "NAME" => "Khaja Muzaffaruddin",
             "agE" => 2,
           }, %{}]) == [%{"age" => 2, "name" => "Khaja Muzaffaruddin"}, %{}]
  end

  test "nested list of maps" do
    assert lower_case_keys(%{
             "JUlian" => [%{"Movie" => "Madagascar"}]
           }) == %{"julian" => [%{"movie" => "Madagascar"}]}
  end

  test "deeply nested list of maps" do
    assert lower_case_keys(%{"MovieGenres" => [%{
             "JUlian" => [%{"Movie" => "Madagascar"}]
           }, %{"Ho" => 33}], "OK then" => "little story"}) == %{
             "moviegenres" => [%{"julian" => [%{"movie" => "Madagascar"}]}, %{"ho" => 33}],
             "ok then" => "little story"
           }
  end
end
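
With this in place, lowercasing all the keys right after decoding becomes a one-liner. A quick sketch, assuming Poison as the JSON decoder:

json = ~s({"NAME": "Khaja", "Children": [{"NaMe": "Mujju"}]})

json
|> Poison.decode!()
|> KeyTransformer.lower_case_keys()
# => %{"name" => "Khaja", "children" => [%{"name" => "Mujju"}]}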

How to learn vim properly

Vim is the editor of my choice, and I love it a lot. I try to find vim bindings everywhere I can. A few apps which have good vim bindings:

  1. Chrome with vimium
  2. The terminal with a proper ~/.inputrc. My ~/.inputrc is below:

    # ~/.inputrc
    #vim key bindings
    set editing-mode vi
    set keymap vi
    # do not bell on tab-completion
    set bell-style bell
    set expand-tilde off
    set input-meta off
    set convert-meta on
    set output-meta off
    set horizontal-scroll-mode off
    set history-preserve-point on
    set mark-directories on
    set mark-symlinked-directories on
    set match-hidden-files off
    # completion settings
    set page-completions off
    set completion-query-items 2000
    set completion-ignore-case off
    set show-all-if-ambiguous on
    set show-all-if-unmodified on
    set completion-prefix-display-length 10
    set print-completions-horizontally off
    C-n: history-search-forward
    C-p: history-search-backward
    #new stuff
    "\C-a": history-search-forward
  3. Once you set this up, many repls will respect these bindings; for instance, irb and pry respect these. As a matter of fact, any good terminal app which uses the readline library will respect this.

  4. Tmux is another piece of software that has vim bindings

Whenever I work with someone, people always seem to be impressed that vim can do so much so simply. This is really the power of vim: it was built for text editing, and it is the best tool for that job. However, learning it can be quite painful, and many people abandon it within a few days.

There is a very popular learning curve graph about vim

Editor learning curves

Source

The part about vim is partially true, in that once it clicks, everything falls into place.

Notepad is an editor which is very easy to use, but if you compare it to programming languages, it has the capability of a calculator. You put your cursor in a place and type stuff, and that is all. Vim lets you speak to it in an intelligent way. Anyway, I am rambling at this point.

The reason I am writing this blog post in the middle of the night is that many people ask me “How should I set up vim? I'd love to have it look/work like yours.” And many times I point them to my vimrc. However, if you are planning on learning vim, don't go there. Start with the following ~/.vimrc:

set nocompatible
" plugins
call plug#begin('~/.vim/plugged')
Plug 'tpope/vim-sensible'
Plug 'kien/ctrlp.vim'
Plug 'matchit.zip'
runtime macros/matchit.vim
call plug#end()
" Ctrlp.vim
let g:ctrlp_map = '<c-p>'
let g:ctrlp_cmd = 'CtrlP'
let g:ctrlp_working_path_mode = 'ra'
let g:ctrlp_custom_ignore = {
\ 'dir': '\v[\/]\.(git|hg|svn)$',
\ 'file': '\v\.(exe|so|dll)$',
\ 'link': 'some_bad_symbolic_links',
\ }

That is all, no more no less.

To finish the installation, you need to do 2 things:

  1. Run curl -fLo ~/.vim/autoload/plug.vim --create-dirs https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
  2. Run vim +PlugInstall from your terminal

A few simple tips on how to learn vim properly:

  1. Finish vimtutor on your terminal 3 to 4 times. Read everything 3 to 4 times and actually practice it.
  2. Learn about vim movements, commands and modes
  3. Open vim at the root of the project and have just one instance open; don't open more than one instance per project. This is very, very important; I can't stress this enough. To open another file from your project, hit Ctrl+P.
  4. Start with a simple vimrc, The one I pasted above is a good start.
  5. Learn about buffers / windows and tabs in vim and how to navigate them.
  6. Add 1 extension that you think might help every month. And put a few sticky notes with its shortcuts/mappings on your monitor.
  7. Use http://vimawesome.com/ to find useful plugins.

Most important of all: don't use any plugin other than sensible and CtrlP for the first month.

Once you learn to speak the language of vim, using other editors will make you feel dumb.

A simpler way to generate an incrementing version for Elixir apps

Mix has the notion of versions built into it. If you open up a mix file you’ll see a line like below:

# mix.exs
defmodule Webmonitor.Mixfile do
  use Mix.Project

  def project do
    [app: :webmonitor,
     version: "0.1.0",
     # ...

If you are using Git, there is a simple way to automatically generate a meaningful semantic version. All you need to do is:

  1. Tag a commit with a version tag, like below:
git tag --annotate v1.0 --message 'First production version, Yay!'
  2. Put a helper function which can use this info with git describe to generate a version
defp app_version do
  # get git version
  {description, 0} = System.cmd("git", ~w[describe]) # => returns something like: v1.0-231-g1c7ef8b

  _git_version =
    String.strip(description)
    |> String.split("-")
    |> Enum.take(2)
    |> Enum.join(".")
    |> String.replace_leading("v", "")
end
  3. Use the return value from this function as the version
# mix.exs
defmodule Webmonitor.Mixfile do
  use Mix.Project

  def project do
    [app: :webmonitor,
     version: app_version(),
     # ...

The way this works is simple. From the man page of git-describe:

NAME git-describe - Describe a commit using the most recent tag reachable from it

DESCRIPTION The command finds the most recent tag that is reachable from a commit. If the tag points to the commit, then only the tag is shown. Otherwise, it suffixes the tag name with the number of additional commits on top of the tagged object and the abbreviated object name of the most recent commit.

So, if you have a tag v1.0 like above, and you have 100 commits on top of it, git-describe will print something like v1.0-100-g1c7ef8b, where v1.0 is the latest git tag reachable from the current commit, 100 is the number of commits since then, and g1c7ef8b is the short commit hash of the current commit. We can easily transform this into 1.0.100 using the above helper function. Now you have a nice way of automatically managing versions: the patch version is bumped whenever a commit is made, and the major and minor versions can be changed by creating a new tag, e.g. v1.2.
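
To make the transformation concrete, here is the helper's pipeline applied step by step to a sample describe output:

"v1.0-100-g1c7ef8b"
|> String.strip()
|> String.split("-")                 # ["v1.0", "100", "g1c7ef8b"]
|> Enum.take(2)                      # ["v1.0", "100"]
|> Enum.join(".")                    # "v1.0.100"
|> String.replace_leading("v", "")   # "1.0.100"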

This is very useful when you are using distillery for building your releases.

Case insensitive key retrieval from maps in Elixir

I ran into an issue with inconsistent naming of keys in one of my providers' JSON. This is really bad data quality; the data being sent should have consistent key names, whether upper, lower, or capitalized, but consistent. Anyway, this provider was sending data with all kinds of mixed-case keys.

Here is some Elixir code that I wrote to get keys using a case-insensitive match. There is an issue on the Poison decoder project which should render this useless; however, till that is fixed, you can use the code below:

defmodule CaseInsensitiveGetIn do
  def ci_get_in(nil, _), do: nil
  def ci_get_in({_k, val}, []), do: val
  def ci_get_in({_k, val}, key), do: ci_get_in(val, key)

  def ci_get_in(map, [key | rest]) do
    current_level_map = Enum.find(map, &key_lookup(&1, key))
    ci_get_in(current_level_map, rest)
  end

  def key_lookup({k, _v}, key) when is_binary(k) do
    String.downcase(k) == String.downcase(key)
  end
end

ExUnit.start

defmodule CaseInsensitiveGetInTest do
  use ExUnit.Case
  import CaseInsensitiveGetIn

  test "gets an exact key" do
    assert ci_get_in(%{"name" => "Mujju"}, ~w(name)) == "Mujju"
  end

  test "gets capitalized key in map" do
    assert ci_get_in(%{"Name" => "Mujju"}, ~w(name)) == "Mujju"
  end

  test "gets capitalized input key in map" do
    assert ci_get_in(%{"Name" => "Mujju"}, ~w(Name)) == "Mujju"
  end

  test "gets mixed input key in map" do
    assert ci_get_in(%{"NaME" => "Mujju"}, ~w(nAme)) == "Mujju"
  end

  test "gets an exact deep key" do
    assert ci_get_in(%{"name" => "Mujju", "sister" => %{"name" => "Zainu"}}, ~w(sister name)) == "Zainu"
  end

  test "gets a mixed case deep map key" do
    assert ci_get_in(%{"name" => "Mujju", "sisTER" => %{"naME" => "Zainu"}}, ~w(sister name)) == "Zainu"
  end

  test "gets a mixed case deep key" do
    assert ci_get_in(%{"name" => "Mujju", "sisTER" => %{"naME" => "Zainu"}}, ~w(sIStER NAme)) == "Zainu"
  end

  test "gets a very deep key" do
    map = %{
      "aB" => %{
        "BC" => 7,
        "c" => %{"DD" => :foo, "Cassandra" => :awesome, "MOO" => %{"name" => "Mujju"}}
      }
    }

    assert ci_get_in(map, ~w(ab bc)) == 7
    assert ci_get_in(map, ~w(ab c dd)) == :foo
    assert ci_get_in(map, ~w(ab c moo name)) == "Mujju"
    assert ci_get_in(map, ~w(ab Bc)) == 7
    assert ci_get_in(map, ~w(ab C dD)) == :foo
    assert ci_get_in(map, ~w(ab C mOo nAMe)) == "Mujju"
  end
end

Script to analyze the structure of an xml document

While working with XML data, you often don't have the WSDL files and may end up manually working through documents to understand their structure. At my current project I ran into a few hundred XML files and had to analyze them to understand the data available. Here is a script I created which prints all the possible node paths in the input files:

#!/usr/bin/env ruby
# Author: Khaja Minhajuddin <minhajuddin.k@gmail.com>

require 'nokogiri'

class XmlAnalyze
  def initialize(filepaths)
    @filepaths = filepaths
    @node_paths = {}
  end

  def analyze
    @filepaths.each { |filepath| analyze_file(filepath) }
    @node_paths.keys.sort
  end

  private

  def analyze_file(filepath)
    @doc = File.open(filepath) { |f| Nokogiri::XML(f) }
    analyze_node(@doc.children.first)
  end

  def analyze_node(node)
    return if node.is_a? Nokogiri::XML::Text

    add_path node.path
    node.attributes.keys.each do |attr|
      add_path("#{node.path}:#{attr}")
    end
    node.children.each do |child|
      analyze_node(child)
    end
  end

  def add_path(path)
    path = path.gsub(/\[\d+\]/, '')
    @node_paths[path] = true
  end
end

if ARGV.empty?
  puts 'Usage: ./analyze_xml.rb file1.xml file2.xml ....'
  exit(-1)
end

puts XmlAnalyze.new(ARGV).analyze

For example, given the XML below:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <person>
    <name type="full">Khaja</name>
    <age>31</age>
  </person>
  <person>
    <name type="full">Khaja</name>
    <dob>Jan</dob>
  </person>
</root>

it outputs:

/root
/root/person
/root/person/age
/root/person/dob
/root/person/name
/root/person/name:type

Hope you find it useful!

Bash completion script for mix

Bash completion is very handy for CLI tools. You can set it up very easily for mix using the following script:

#!/bin/bash
# `sudo vim /etc/bash_completion.d/mix.sh` and put this inside of it
# mix bash completion script
complete_mix_command() {
  [ -f mix.exs ] || exit 0
  mix help --search "$2" | cut -f1 -d'#' | cut -f2 -d' '
  return $?
}
complete -C complete_mix_command -o default mix

How to show your blog content in your Rails application

I recently released LiveForm, a service which gives you form endpoints (I'd love to have you check it out :)). I wanted to show my blog's content on the home page, and it is pretty straightforward with the rich Ruby ecosystem.

  1. First you need a way to get the data from your blog. The LiveForm blog has an atom feed at http://blog.liveformhq.com/atom.xml. I initially used RestClient to get the data from the feed.
  2. Once we have the feed, we need to parse it to extract the right content. Some quick googling led me to the awesome feedjira gem (I am not gonna comment about the awesome name :)).
  3. feedjira actually has a simple method to parse the feed from a URL: Feedjira::Feed.fetch_and_parse(url).
  4. Once I got the entries, I just had to format them properly. However, there was an issue with summaries of blog posts having malformed HTML. This was due to hexo (the blog engine I use) naively slicing the blog post content at 200 characters. Nokogiri has a simple way of working around this; however, I went one step further and removed all HTML markup from the summary so that it doesn't mess with the web application's markup: Nokogiri::HTML(entry.summary).css("body").text
  5. Finally, I didn't want to fetch and parse my feed for every user that visited my website, so I used fragment caching to render the feed once a day.

Here is all the relevant code:

The class that fetches and parses the feed

class LiveformBlog
  URL = "http://blog.liveformhq.com/atom.xml"

  def entries
    Rails.logger.info "Fetching feed...................."
    feed = Feedjira::Feed.fetch_and_parse(URL)
    feed.entries.take(5).map { |x| parse_entry(x) }
  end

  private

  def parse_entry(entry)
    OpenStruct.new(
      title: entry.title,
      summary: fix_summary(entry),
      url: entry.url,
      published: entry.published,
    )
  end

  def fix_summary(entry)
    doc = Nokogiri::HTML(entry.summary)
    doc.css('body').text
  end
end

The view that caches and renders the feed

<%= cache Date.today.to_s do %>
  <div class='blog-posts'>
    <h2 class='section-heading'>From our Blog</h2>
    <% LiveformBlog.new.entries.each do |entry| %>
      <div class='blog-post'>
        <h4><a href='<%= entry.url %>'><%= entry.title %></a></h4>
        <p class='blog-post__published'><%= short_time entry.published %></p>
        <div><%= entry.summary %>...</div>
      </div>
    <% end %>
  </div>
<% end %>

Screenshot of the current page

Liveform blog

How to deploy a simple phoenix app on a single server using distillery

If you find issues or can improve this guide, please create a pull request at:

2. Setup the server

We'll be running our server under a user called slugex, so we first need to create that user.

## commands to be executed on our server
APP_USER=slugex

# create parent dir for our home
sudo mkdir -p /opt/www

# create the user
sudo useradd --home "/opt/www/$APP_USER" --create-home --shell /bin/bash $APP_USER

# create the postgresql role for our user
sudo -u postgres createuser --echo --no-createrole --no-superuser --createdb $APP_USER

3. Install the git-deploy rubygem on our local computer

We'll be using the git-deploy rubygem to do deploys. This allows deploys similar to Heroku: you just need to push to your production git repository to start a deployment.

## commands to be executed on our local computer
# install the gem
# you need ruby installed on your computer for this
gem install git-deploy

4. Setup distillery in our phoenix app (on local computer)

We’ll be using distillery to manage our releases.

Add the distillery dependency to our mix.exs

defp deps do
  [{:distillery, "~> 0.10"}]
end

Init the distillery config

# get dependencies
mix deps.get
# init distillery
mix release.init

Change rel/config.ex to look like below:

...
environment :prod do
  set include_erts: false
  set include_src: false
  # cookie info ...
end
...

5. Setup git deploy (local computer)

Let us set up the remote and the deploy hooks:

## commands to be executed on our local computer
# setup the git remote pointing to our prod server
git remote add prod slugex@slugex.com:/opt/www/slugex
# init
git deploy setup -r "prod"
# create the deploy files
git deploy init
# push to production
git push prod master

TODO: release this as a book

6. Setup postgresql access

## commands to be executed on the server as the slugex user
# create the database
createdb slugex_prod
# set the password for the slugex user
psql slugex_prod
> slugex_prod=> \password slugex
> Enter new password: enter the password
> Enter it again: repeat the password

7. Setup the prod.secret.exs

Copy the config/prod.secret.exs file from your local computer to /opt/www/slugex/config/prod.secret.exs

## on local computer from our phoenix app directory
scp config/prod.secret.exs slugex@slugex.com:config/

Create a new secret on your local computer using mix phoenix.gen.secret and paste it into the secret_key_base in the server's config/prod.secret.exs.

It should look something like below:

# on the server
# /opt/www/slugex/config/prod.secret.exs
use Mix.Config

config :simple, Simple.Endpoint,
  secret_key_base: "RgeM4Dt8kl3yyf47K1DXWr8mgANzOL9TNOOiCknZM8LLDeSdS1ia5Vc2HkmKhy68",
  http: [port: 4010],
  server: true, # <=== this is very important
  root: "/opt/www/slugex",
  url: [host: "slugex.com", port: 443],
  cache_static_manifest: "priv/static/manifest.json"

# Do not print debug messages in production
config :logger, level: :info

# Configure your database
config :simple, Simple.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "slugex",
  password: "another brick in the wall",
  database: "slugex_prod",
  pool_size: 20

8. Tweak the deploy scripts

9. One time setup on the server

## commands to be executed on server as slugex
MIX_ENV=prod mix do compile, ecto.create
MIX_ENV=prod ./deploy/after_push

Logger

Exception notifications

Setup systemd

10. One time setup on server (on server as slugex user)

## commands to be run on the server as the slugex user
cd /opt/www/slugex

# create the secrets config
echo 'use Mix.Config' > config/prod.secret.exs
# add your configuration to this file

# update hex
export MIX_ENV=prod
mix local.hex --force
mix deps.get
mix ecto.create

11. Nginx configuration

12. Letsencrypt setup and configuration

13. TODO: Configuration using conform

14. TODO: database backups to S3

15. TODO: uptime monitoring of websites using uptime monitor

16. TODO: email via SES

17. TODO: db seeds

18. TODO: nginx caching basics, static assets large expirations

19. TODO: remote console for debugging

sudo letsencrypt certonly --webroot -w /opt/www/webmonitor/public/ -d webmonitorhq.com --webroot -w /opt/www/webmonitor/public/ -d www.webmonitorhq.com

20. Check SSL certificate: https://www.sslshopper.com/ssl-checker.html

Common mistakes/errors

  1. SSH errors

Improvements

  1. Automate all of these using a hex package?
  2. Remove dependencies on git-deploy if possible
  3. Hot upgrades

How to extract bits from a binary in elixir

Erlang, and by extension Elixir, has powerful pattern matching constructs which allow you to easily extract bits from a binary. Here is an example which takes a binary and returns its bits:

defmodule Bits do
  # this is the public api which allows you to pass any binary representation
  def extract(str) when is_binary(str) do
    extract(str, [])
  end

  # this function does the heavy lifting by matching the input binary to
  # a single bit and sends the rest of the bits recursively back to itself
  defp extract(<<b::size(1), bits::bitstring>>, acc) when is_bitstring(bits) do
    extract(bits, [b | acc])
  end

  # this is the terminal condition when we don't have anything more to extract
  defp extract(<<>>, acc), do: acc |> Enum.reverse
end

IO.inspect Bits.extract("!!")    # => [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
IO.inspect Bits.extract(<<99>>)  # => [0, 1, 1, 0, 0, 0, 1, 1]

The code is pretty self-explanatory.

Elixir process timeout pitfall

If you have taken a look at Elixir, you may have come across code like the below:

defmodule HardWorker do
  def work(id) do
    Process.sleep(id * 900)
    {:done, id}
  end
end

defmodule Runner do
  @total_timeout 1000

  def run do
    {us, _} = :timer.tc(&work/0)
    IO.puts "ELAPSED_TIME: #{us / 1000}"
  end

  def work do
    tasks = Enum.map 1..10, fn id ->
      Task.async(HardWorker, :work, [id])
    end

    Enum.map(tasks, fn task ->
      Task.await(task, @total_timeout)
    end)
  end
end

Runner.run

Looks simple enough: we loop over and create 10 processes and then wait for them to finish. It also prints a message ELAPSED_TIME: _ at the end, where _ is the time taken to run all the processes.

Can you take a guess how long this runner will take in the worst case?

If you guessed 10 seconds, you are right! I didn't guess 10 seconds when I first saw this kind of code; I expected it to exit after 1 second. The key here is that Task.await is called on the 10 tasks one after another, so if the tasks finish at the end of 1s, 2s, …, 10s respectively, this code will run just fine.

This is a completely made up example but it should show you that running in parallel with timeouts is not just a Task.await away.

I have coded an example app with proper timeout handling and parallel processing at https://github.com/minhajuddin/parallel_elixir_workers. Check it out.

Addendum

I posted this on the elixirforum and got some feedback about it.

tasks = Enum.map 1..10, fn id ->
  Task.async(HardWorker, :work, [id])
end

# at this point all tasks are running in parallel
Enum.map(tasks, fn task ->
  Task.await(task, @total_timeout)
end)

Let us take another look at the relevant code. Now, let us say that this is spawning processes P1 to P10 in that order. Let’s say tasks T1 to T10 are created for these processes. Now all these tasks are running concurrently.

Now, in the second Enum.map, the first iteration waits on T1, so T1 has to finish within 1 second, otherwise this code will time out. However, while T1 is running, T2..T10 are also running. So, by the time the code waits a further 1 second on T2, T2 has effectively had 2 seconds to finish. Effectively, T1 is given a budget of 1 second, T2 a budget of 2 seconds, T3 a budget of 3 seconds, and so on.

This may be what you want. However, if you want all the tasks to finish executing within 1 second, you shouldn't use Task.await. You can use Task.yield_many, which takes a list of tasks and allows you to specify a timeout, after which it returns with the results of whatever tasks finished. The documentation for Task.yield_many has a very good example of how to use it.
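
Here is a sketch along the lines of that example: the whole batch gets a single 1-second budget, and any task that hasn't replied by then is shut down.

tasks = Enum.map(1..10, fn id ->
  Task.async(HardWorker, :work, [id])
end)

results =
  tasks
  |> Task.yield_many(1_000)
  |> Enum.map(fn {task, result} ->
    # result is {:ok, value}, {:exit, reason}, or nil on timeout;
    # kill the tasks that did not finish within the budget
    result || Task.shutdown(task, :brutal_kill)
  end)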

@benwilson512 has a good example of this:

..suppose you wrote the following code:

task = Task.async(fn -> Process.sleep(:infinity) end)
Process.sleep(5_000)
Task.await(task, 5_000)

How long before it times out? 10 seconds of course. But this is obvious and expected. This is exactly what you’re doing by making the Task.await calls consecutive. It’s just that instead of sleeping in the main process you’re waiting on a different task. Task.await is blocking, this is expected.

How to control pianobar with global hotkeys using Tmux

I love pianobar. However, until yesterday I hated pausing and moving to the next song in pianobar. I had a small terminal dedicated to pianobar, and every time I had to change the song or pause, I had to select that window and hit the right shortcut. I love hotkeys; they allow you to control your stuff without opening windows. I also happen to use tmux a lot. And it hit me yesterday: I could easily bind hotkeys to send the right key sequences to pianobar running in a tmux session. Here is how I did it.

I use xmonad, so I wired up Windows + Shift + p to tmux send-keys -t scratch:1.0 p &> /tmp/null.log. Now, whenever I hit that hotkey, it types the letter 'p' in the tmux session scratch, window 1, pane 0, where I have pianobar running.

I use xmonad, but you should be able to put these in a wrapper script and wire them up with any window manager or with Unity.

-- relevant configuration
, ((modMask .|. shiftMask, xK_p ), spawn "tmux send-keys -t scratch:1.0 p &> /tmp/null.log") -- %! Pause pianobar
, ((modMask .|. shiftMask, xK_v ), spawn "tmux send-keys -t scratch:1.0 n &> /tmp/null.log") -- %! next pianobar
, ((modMask, xK_c ), spawn "mpc toggle") -- %! Pause mpd
, ((modMask, xK_z ), spawn "mpc prev") -- %! previous in mpd
, ((modMask, xK_v ), spawn "mpc next") -- %! next in mpd