Performance variations of string substitution in Elixir

I had to do some string stripping in one of my apps which was a bit performance sensitive. I ended up benching multiple approaches to see the speed differences. The results are not that suprising.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
path = "/haha/index.html"
subdomain_rx = ~r(^\/[^\/]+)
Benchee.run(%{
"pattern_match_bytes" => fn ->
len = byte_size("/haha")
<<_::bytes-size(len), rest :: binary >> = path
rest
end,
"pattern_match" => fn -> "/haha" <> rest = path; rest end,
"slice" => fn -> String.slice(path, String.length("/haha")..-1) end,
"replace_prefix" => fn -> String.replace_prefix(path, "/haha", "") end,
"split" => fn -> String.splitter(path, "/") |> Enum.drop(1) |> Enum.join("/") end,
"regex" => fn -> String.replace(path, subdomain_rx, "") end,
})

output of benchee

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
bench [master] $ mix run lib/bench.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking pattern_match...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking pattern_match_bytes...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking regex...
Benchmarking replace_prefix...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
You may disable this warning by passing print: [fast_warning: false] as configuration options.
Benchmarking slice...
Benchmarking split...
Name ips average deviation median
pattern_match_bytes 24.05 M 0.0416 μs ±1797.73% 0.0300 μs
pattern_match 22.37 M 0.0447 μs ±1546.59% 0.0400 μs
replace_prefix 3.11 M 0.32 μs ±204.05% 0.22 μs
slice 1.25 M 0.80 μs ±6484.21% 1.00 μs
split 0.75 M 1.34 μs ±3267.35% 1.00 μs
regex 0.42 M 2.37 μs ±1512.77% 2.00 μs
Comparison:
pattern_match_bytes 24.05 M
pattern_match 22.37 M - 1.08x slower
replace_prefix 3.11 M - 7.73x slower
slice 1.25 M - 19.30x slower
split 0.75 M - 32.18x slower
regex 0.42 M - 57.00x slower

So, the next time you want to strip prefixing stuff, use pattern matching :)

Update

Based on the comments by @Matt Widmann and @Peter I did a quick test of replacing the tail of the string using the following code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
path = "/haha/index.html"
ext_rx = ~r/\.[^\.]+$/
Benchee.run(%{
"reverse_pattern_match_bytes" => fn ->
len = byte_size(".html")
<<_::bytes-size(len), rest :: binary >> = String.reverse(path)
rest |> String.reverse
end,
"reverse_pattern_match" => fn -> "lmth." <> rest = String.reverse(path); String.reverse(rest) end,
"slice" => fn -> String.slice(path, 0..(String.length(".html") * (-1))) end,
"replace_suffix" => fn -> String.replace_suffix(path, ".html", "") end,
"split" => fn -> String.splitter(path, ".") |> Enum.slice(0..-2) |> Enum.join(".") end,
"regex" => fn -> String.replace(path, ext_rx, "") end,
})

The results for this are:

1
2

This led be to look at the actual Elixir code for replace_prefix and replace_suffix which is:

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/string.ex#L752

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def replace_prefix(string, match, replacement)
when is_binary(string) and is_binary(match) and is_binary(replacement) do
prefix_size = byte_size(match)
suffix_size = byte_size(string) - prefix_size
case string do
<<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when prefix == match ->
replacement <> suffix
_ ->
string
end
end
def replace_suffix(string, match, replacement)
when is_binary(string) and is_binary(match) and is_binary(replacement) do
suffix_size = byte_size(match)
prefix_size = byte_size(string) - suffix_size
case string do
<<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when suffix == match ->
prefix <> replacement
_ ->
string
end
end

I tweaked the benchmark code a little to run each replace a 1000 times to remove the “too fast” warning.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
defmodule Bench do
def run(fun), do: fun.()
def no_run(_fun), do: :ok
def times(n \\ 1000, fun), do: fn -> Enum.each(1..n, fn _ -> fun.() end) end
end
# match beginning of string
Bench.run(fn ->
path = "/haha/index.html"
subdomain_rx = ~r(^\/[^\/]+)
Benchee.run(%{
"pattern_match_bytes" => Bench.times(fn ->
len = byte_size("/haha")
<<_::bytes-size(len), rest :: binary >> = path
rest
end),
"pattern_match" => Bench.times(fn -> "/haha" <> rest = path; rest end),
"slice" => Bench.times(fn -> String.slice(path, String.length("/haha")..-1) end),
"replace_prefix" => Bench.times(fn -> String.replace_prefix(path, "/haha", "") end),
"split" => Bench.times(fn -> String.splitter(path, "/") |> Enum.drop(1) |> Enum.join("/") end),
"regex" => Bench.times(fn -> String.replace(path, subdomain_rx, "") end),
})
end)
# match end of string string
Bench.run(fn ->
path = "/haha/index.html"
ext_rx = ~r/\.[^\.]+$/
Benchee.run(%{
"reverse_pattern_match_bytes" => Bench.times(fn ->
len = byte_size(".html")
<<_::bytes-size(len), rest :: binary >> = String.reverse(path)
rest |> String.reverse
end),
"reverse_pattern_match" => Bench.times(fn -> "lmth." <> rest = String.reverse(path); String.reverse(rest) end),
"slice" => Bench.times(fn -> String.slice(path, 0..(String.length(".html") * (-1))) end),
"replace_suffix" => Bench.times(fn -> String.replace_suffix(path, ".html", "") end),
"split" => Bench.times(fn -> String.splitter(path, ".") |> Enum.slice(0..-2) |> Enum.join(".") end),
"regex" => Bench.times(fn -> String.replace(path, ext_rx, "") end),
})
end)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
elixir_benchmarks [master *] $ mix run lib/bench.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking pattern_match...
Benchmarking pattern_match_bytes...
Benchmarking regex...
Benchmarking replace_prefix...
Benchmarking slice...
Benchmarking split...
Name ips average deviation median
pattern_match_bytes 15.17 K 0.0659 ms ±18.05% 0.0610 ms
pattern_match 14.60 K 0.0685 ms ±17.41% 0.0640 ms
replace_prefix 2.52 K 0.40 ms ±21.46% 0.38 ms
slice 0.83 K 1.20 ms ±21.95% 1.11 ms
split 0.58 K 1.72 ms ±16.76% 1.63 ms
regex 0.45 K 2.24 ms ±7.42% 2.22 ms
Comparison:
pattern_match_bytes 15.17 K
pattern_match 14.60 K - 1.04x slower
replace_prefix 2.52 K - 6.01x slower
slice 0.83 K - 18.24x slower
split 0.58 K - 26.10x slower
regex 0.45 K - 33.98x slower
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 12.019316 GB
Elixir 1.4.4
Erlang 20.0-rc2
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 42.00 s
Benchmarking regex...
Benchmarking replace_suffix...
Benchmarking reverse_pattern_match...
Benchmarking reverse_pattern_match_bytes...
Benchmarking slice...
Benchmarking split...
Name ips average deviation median
replace_suffix 2633.75 0.38 ms ±21.15% 0.36 ms
split 618.06 1.62 ms ±13.56% 1.57 ms
regex 389.25 2.57 ms ±6.54% 2.54 ms
slice 324.19 3.08 ms ±19.06% 2.88 ms
reverse_pattern_match_bytes 275.45 3.63 ms ±12.08% 3.48 ms
reverse_pattern_match 272.06 3.68 ms ±11.99% 3.54 ms
Comparison:
replace_suffix 2633.75
split 618.06 - 4.26x slower
regex 389.25 - 6.77x slower
slice 324.19 - 8.12x slower
reverse_pattern_match_bytes 275.45 - 9.56x slower
reverse_pattern_match 272.06 - 9.68x slower
elixir_benchmarks [master *] $

For reverse string removal from the end, replace_suffix is the fastest which makes sense. However, for removing the prefix, pattern_match_bytes seems to be the fastest. But, it isn’t really truly correct. Because in my instance, I know for sure that the prefix is present. So, the second best performance for which is pattern_match is 6x better than the current String.replace_prefix implementation.

It may be because I am using OTP 20? I’ll run this on other versions of OTP to compare results. And, if the results are cosistent, will create PR on elixir to change the default implementation.


I am currently working on Zammu which makes Automatic Deployment of static websites to Github Pages very easy. I would love to get your feedback on it, Use the invitation code KHAJA