Performance variations of string substitution in Elixir
I had to do some string stripping in one of my apps which was a bit performance sensitive.
I ended up benchmarking multiple approaches to see the speed differences. The results are not that surprising.
    bench [master] $ mix run lib/bench.exs
    Operating System: Linux
    CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
    Number of Available Cores: 8
    Available memory: 12.019316 GB
    Elixir 1.4.4
    Erlang 20.0-rc2
    Benchmark suite executing with the following configuration:
    warmup: 2.00 s
    time: 5.00 s
    parallel: 1
    inputs: none specified
    Estimated total run time: 42.00 s

    Benchmarking pattern_match...
    Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
    You may disable this warning by passing print: [fast_warning: false] as configuration options.
    Benchmarking pattern_match_bytes...
    Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
    You may disable this warning by passing print: [fast_warning: false] as configuration options.
    Benchmarking regex...
    Benchmarking replace_prefix...
    Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning
    You may disable this warning by passing print: [fast_warning: false] as configuration options.
    Benchmarking slice...
    Benchmarking split...

    Name                          ips        average    deviation         median
    pattern_match_bytes       24.05 M      0.0416 μs    ±1797.73%      0.0300 μs
    pattern_match             22.37 M      0.0447 μs    ±1546.59%      0.0400 μs
    replace_prefix             3.11 M        0.32 μs     ±204.05%        0.22 μs
    slice                      1.25 M        0.80 μs    ±6484.21%        1.00 μs
    split                      0.75 M        1.34 μs    ±3267.35%        1.00 μs
    regex                      0.42 M        2.37 μs    ±1512.77%        2.00 μs

    Comparison:
    pattern_match_bytes       24.05 M
    pattern_match             22.37 M - 1.08x slower
    replace_prefix             3.11 M - 7.73x slower
    slice                      1.25 M - 19.30x slower
    split                      0.75 M - 32.18x slower
    regex                      0.42 M - 57.00x slower
So, the next time you want to strip a prefix from a string, use pattern matching :)
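For anyone who has not used the technique, here is a minimal sketch of stripping a prefix with a binary pattern match. The benchmark's own code is not shown in this post, so the module name, the function name and the "user:" prefix are all made up for illustration.

```elixir
defmodule PrefixSketch do
  # Hypothetical example: strips a fixed prefix that is known at compile time.
  # The literal "user:" is an assumption, not taken from the benchmark.
  def strip("user:" <> rest), do: rest

  # Fallback clause: return the string unchanged when the prefix is absent.
  def strip(other) when is_binary(other), do: other
end

IO.inspect(PrefixSketch.strip("user:42"))   # "42"
IO.inspect(PrefixSketch.strip("admin:42"))  # "admin:42"
```

The fallback clause is optional. Without it, a string that lacks the prefix raises a FunctionClauseError instead of passing through.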
Update
Based on the comments by @Matt Widmann and @Peter I did a quick test of replacing the tail of the string using the following code:
    def replace_prefix(string, match, replacement)
        when is_binary(string) and is_binary(match) and is_binary(replacement) do
      prefix_size = byte_size(match)
      suffix_size = byte_size(string) - prefix_size

      case string do
        <<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when prefix == match ->
          replacement <> suffix
        _ ->
          string
      end
    end

    def replace_suffix(string, match, replacement)
        when is_binary(string) and is_binary(match) and is_binary(replacement) do
      suffix_size = byte_size(match)
      prefix_size = byte_size(string) - suffix_size

      case string do
        <<prefix::size(prefix_size)-binary, suffix::size(suffix_size)-binary>> when suffix == match ->
          prefix <> replacement
        _ ->
          string
      end
    end
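A short usage sketch shows what these two functions are expected to return. It calls the standard library's String.replace_prefix/3 and String.replace_suffix/3, on the assumption that the code above behaves the same way. The input strings are made up.

```elixir
# Prefix present: it is swapped for the replacement (here, the empty string).
IO.inspect(String.replace_prefix("hello world", "hello ", ""))   # "world"

# The match exists in the string but not at the start: nothing changes.
IO.inspect(String.replace_prefix("hello world", "world", ""))    # "hello world"

# Suffix present: only the tail is replaced.
IO.inspect(String.replace_suffix("hello world", " world", "!"))  # "hello!"

# Match longer than the string: the string is returned unchanged.
IO.inspect(String.replace_suffix("hi", "hello", ""))             # "hi"
```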
I tweaked the benchmark code a little to run each replace 1,000 times per measured call, which gets rid of the “too fast” warning.
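The tweak could look roughly like the sketch below. This is not the actual lib/bench.exs: the RepeatSketch helper, the input string and the two jobs are assumptions, and only Benchee.run and its warmup/time options are real Benchee configuration.

```elixir
defmodule RepeatSketch do
  # Hypothetical helper: calls `fun` `n` times and returns the last result,
  # so one measured call does enough work for the timer to resolve.
  def times(fun, n \\ 1_000) when is_function(fun, 0) and n > 0 do
    Enum.reduce(1..n, nil, fn _, _ -> fun.() end)
  end
end

# Made-up input; the real benchmark input is not shown in the post.
input = "prefix-payload"

jobs = %{
  "pattern_match" => fn ->
    RepeatSketch.times(fn ->
      "prefix-" <> rest = input
      rest
    end)
  end,
  "replace_prefix" => fn ->
    RepeatSketch.times(fn -> String.replace_prefix(input, "prefix-", "") end)
  end
}

# Benchee is only loaded inside a Mix project that depends on it,
# so the call is guarded to keep this file runnable on its own.
if Code.ensure_loaded?(Benchee) do
  Benchee.run(jobs, warmup: 2, time: 5)
end
```

With this wrapper each reported time covers 1,000 calls, which is why the second set of results is in milliseconds rather than microseconds.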
    elixir_benchmarks [master *] $ mix run lib/bench.exs
    Operating System: Linux
    CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
    Number of Available Cores: 8
    Available memory: 12.019316 GB
    Elixir 1.4.4
    Erlang 20.0-rc2
    Benchmark suite executing with the following configuration:
    warmup: 2.00 s
    time: 5.00 s
    parallel: 1
    inputs: none specified
    Estimated total run time: 42.00 s

    Name                          ips        average    deviation         median
    pattern_match_bytes       15.17 K      0.0659 ms      ±18.05%      0.0610 ms
    pattern_match             14.60 K      0.0685 ms      ±17.41%      0.0640 ms
    replace_prefix             2.52 K        0.40 ms      ±21.46%        0.38 ms
    slice                      0.83 K        1.20 ms      ±21.95%        1.11 ms
    split                      0.58 K        1.72 ms      ±16.76%        1.63 ms
    regex                      0.45 K        2.24 ms       ±7.42%        2.22 ms

    Comparison:
    pattern_match_bytes       15.17 K
    pattern_match             14.60 K - 1.04x slower
    replace_prefix             2.52 K - 6.01x slower
    slice                      0.83 K - 18.24x slower
    split                      0.58 K - 26.10x slower
    regex                      0.45 K - 33.98x slower

    Operating System: Linux
    CPU Information: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
    Number of Available Cores: 8
    Available memory: 12.019316 GB
    Elixir 1.4.4
    Erlang 20.0-rc2
    Benchmark suite executing with the following configuration:
    warmup: 2.00 s
    time: 5.00 s
    parallel: 1
    inputs: none specified
    Estimated total run time: 42.00 s

    Name                                  ips        average    deviation         median
    replace_suffix                    2633.75        0.38 ms      ±21.15%        0.36 ms
    split                              618.06        1.62 ms      ±13.56%        1.57 ms
    regex                              389.25        2.57 ms       ±6.54%        2.54 ms
    slice                              324.19        3.08 ms      ±19.06%        2.88 ms
    reverse_pattern_match_bytes        275.45        3.63 ms      ±12.08%        3.48 ms
    reverse_pattern_match              272.06        3.68 ms      ±11.99%        3.54 ms
For removing a string from the end, replace_suffix is the fastest, which makes sense.
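A plausible reason the reverse variants come last: a binary pattern can only leave its final segment unsized, so matching a suffix by pattern means reversing the string first and reversing the remainder afterwards. The sketch below is a guess at what such a reverse_pattern_match could look like. The module name and the ".html" suffix are hypothetical.

```elixir
defmodule ReverseSketch do
  # Hypothetical: strip a ".html" suffix by reversing the string, matching the
  # reversed suffix ("lmth.") as a prefix, then reversing the remainder back.
  def strip_html(string) when is_binary(string) do
    case String.reverse(string) do
      "lmth." <> rest -> String.reverse(rest)
      _ -> string
    end
  end
end

IO.inspect(ReverseSketch.strip_html("index.html"))  # "index"
IO.inspect(ReverseSketch.strip_html("index.txt"))   # "index.txt"
```

Each successful call walks the whole string twice through String.reverse/1, while replace_suffix only computes two sizes and does a single match.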
For removing the prefix, however, pattern_match_bytes seems to be the fastest. But it isn't truly correct in general: it only works in my case because I know for sure that the prefix is present.
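To make the difference concrete, here is a sketch of the two styles. It assumes pattern_match_bytes skips a fixed number of bytes without inspecting them, which is a guess since the benchmark code is not shown. The module, the function names and the "prefix-" string are hypothetical.

```elixir
defmodule BytesSketch do
  # Hypothetical: drops the first 7 bytes (the length of "prefix-")
  # without checking what those bytes actually are.
  def skip_bytes(<<_::binary-size(7), rest::binary>>), do: rest

  # Checks that the string really starts with "prefix-" before stripping it.
  def match_prefix("prefix-" <> rest), do: rest
  def match_prefix(other) when is_binary(other), do: other
end

IO.inspect(BytesSketch.skip_bytes("prefix-payload"))    # "payload"
IO.inspect(BytesSketch.skip_bytes("suffix-payload"))    # "payload" (wrong prefix, stripped anyway)
IO.inspect(BytesSketch.match_prefix("suffix-payload"))  # "suffix-payload"
```

Skipping bytes is safe only when the caller already guarantees the prefix, as in my case.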
So the second-fastest approach, pattern_match, is roughly 6x faster than the current String.replace_prefix implementation (14.60 K vs 2.52 K ips).
It may be because I am using OTP 20. I’ll run this on other versions of OTP to compare results and, if the results are consistent, will create a PR on Elixir to change the default implementation.
I am currently working on LiveForm, which makes setting up contact forms on your website a breeze.