Exponential vs. Linear Time in Ruby

Visualizing exponential vs. linear growth is as simple as plotting y = x vs. y = x^2. Recognizing it in your code is another thing.

Let’s say we are given an array. Our boss tells us we need to check that array for duplicates and give them back an array with only unique elements. “If I see one duplicate you’re fired!”

Wow they’re not messing around. To be safe we decide to start with an empty array and add each element from our starting array to it, but only after making sure it’s not in there already.

def random_array size
  Array.new(size) { rand(1..size) }
end

def check_for_dupilicates_in_array check_array, final_array
  check_array.each do |element|
    final_array.push element if !final_array.include? element 
  end
  final_array
end

random_array gives us our starting product to modify and check_for_dupilicates_in_array describes our first frantic attempt at doing our job. To start off we’ll be reviewing arrays of size 10,000.

check_for_dupilicates_in_array random_array(10000), Array.new

Then it happens. Like the great razor blade war of whatever year they came out with those 6 blade razors, people wanted bigger arrays. We stopped reviewing arrays of 10,000 and started reviewing arrays of 100,000. Our work output suffers, significantly more than anticipated.

We reach out for help and a friend informs us of a feature we had all along in ruby called benchmark that would allow us to see how long it takes for us to do something.

require "benchmark"

# our working code here

Benchmark.bm do |bm|
  bm.report("1E4: ") { check_for_dupilicated_in_array(random_array(1000), Array.new) }
  bm.report("1E5: ") { check_for_dupilicated_in_array(random_array(10000), Array.new) }
  bm.report("1E6: ") { check_for_dupilicated_in_array(random_array(100000), Array.new) }
end

       user     system      total        real
1E4:   0.010000   0.000000   0.010000 (  0.003280)
1E5:   0.250000   0.000000   0.250000 (  0.249644)
1E6:  24.320000   0.040000  24.360000 ( 24.413419)

The jump from 100,000 to 1,000,000 does a great job of showing the exponential growth in time that occurs as we look through an array and examine another nested array along the way.

Armed with the power to measure our performance, we seek improvements. Array * Array = problem, so we call our friend who knew about benchmark. "Make a hash." Ok. We make our new method and throw a benchmark in there right away.

## other code
def create_hash_from_array check_array, new_hash
  check_array.each do |element|
    new_hash[element.to_s] = "1"
  end
  new_hash
end

Benchmark.bm do |bm|
  bm.report("1E4: ") { create_hash_from_array(random_array(1000), Hash.new) }
  bm.report("1E5: ") { create_hash_from_array(random_array(10000), Hash.new) }
  bm.report("1E6: ") { create_hash_from_array(random_array(100000), Hash.new) }
end

       user     system      total        real
1E4:   0.000000   0.000000   0.000000 (  0.001319)
1E5:   0.010000   0.000000   0.010000 (  0.010179)
1E6:   0.150000   0.000000   0.150000 (  0.156402)

This time we’re just creating a hash with a key for each element of the array and setting the value to one. If we come across the same element twice, we will just be updating the key:value pair to one. Once per element == Linear Time. The time growth isn’t exactly linear but it’s clearly not exponential.

We just realized though, our boss needs an array. So we ask our friend Benchmark but this time start with our job description. They tell us about .uniq, which comes as part of Array in ruby.

require "benchmark"

Benchmark.bm do |bm|
  bm.report("1E4: ") { random_array(1000).uniq! }
  bm.report("1E5: ") { random_array(10000).uniq! }
  bm.report("1E6: ") { random_array(100000).uniq! }
end

       user     system      total        real
1E4:   0.000000   0.000000   0.000000 (  0.000923)
1E5:   0.010000   0.000000   0.010000 (  0.009700)
1E6:   0.090000   0.000000   0.090000 (  0.091487)

Here as we measure the performance of calling .uniq! on arrays growing by a factor of 10 we are not disappointed. Our frantic methods are dominated by the years of improvements and fundamentals built into the Ruby core. Not only is calling uniq the most efficient, but the increase in execution time most closely resembles linear growth.