Dedistributed LLC

Performance Benchmarks

Introduction

Below are performance benchmarks for Dedistributed PTR. We are providing them so you can get a sense of the performance you might reasonably expect with your own applications.

The results were obtained under somewhat ideal conditions. As always with benchmarks, your mileage may vary: a more elaborate "real" application will behave differently, and your own application environment is unique to you. Many factors affect performance.

Results in Brief

0.22 milliseconds network-induced latency 
330 nanoseconds per IP address

Round-trip latency, or the end-to-end time from call initiation to results available, is roughly the sum of two terms: the OS hardware/software stack and network latency (the time it takes packets to travel from your application to the Dedistributed PLS and back), plus the product of the number of IP addresses and the per-IP-address cache lookup cost. Expressed as a simple function:

End-to-end lookup time ~= network round-trip time + IP count * IP lookup latency

Using the results of the same-Availability-Zone test below, we can estimate the parameters of this function empirically. The results show a network round-trip time of about 0.22 milliseconds and a lookup latency of about 330 nanoseconds per IP address. Restated:

End-to-end lookup time (ms) ~= 0.22 + IP count * 0.000330
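As a quick sanity check, the model above can be expressed directly in code. This is a minimal sketch; the constants are the empirical values quoted above, not guarantees.

```java
public class LatencyModel {
    // Empirical constants from the same-AZ test below (milliseconds).
    static final double NETWORK_RTT_MS = 0.22;
    static final double PER_IP_MS = 0.000330; // 330 ns per IP address

    // Estimated end-to-end time, in milliseconds, for one lookup call.
    static double estimateMs(int ipCount) {
        return NETWORK_RTT_MS + ipCount * PER_IP_MS;
    }

    public static void main(String[] args) {
        System.out.printf("1 IP:     %.3f ms%n", estimateMs(1));    // dominated by RTT
        System.out.printf("2000 IPs: %.3f ms%n", estimateMs(2000)); // dominated by lookups
    }
}
```

Note how the RTT term dominates at small call sizes and the per-IP term dominates at large ones; this is visible in the results tables below.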

Do Your Own

Consider conducting your own benchmarks. The small application used to generate the results here ("Benchmarks.java") is included in the SDK under "examples". After compilation, you can invoke it as

$ java -cp "dedistributed-ptr-saas-sdk-1.0.jar:example/src/java" \
  com.dedistributed.sdk.ptr.Benchmarks   

As the results below make obvious, if your application is very latency-sensitive, be sure to take advantage of the ability to specify the Region and Availability Zone of the PLS(es) serving your application.

Setup

Here is what we used in our setup:

Results

The times below are in milliseconds; each reported duration is the sum of all end-to-end call times for 1000 lookups of the indicated call size. A sample for each size is collected 5 times, with a pause of several seconds in between. The IP addresses for which PTR records are obtained are randomly selected and pre-generated prior to lookup execution. See the source of Benchmarks.java.
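The pre-generation step can be sketched as follows. This is a standalone approximation for illustration, not the actual Benchmarks.java source; the class and method names are ours.

```java
import java.util.Random;

public class RandomIps {
    // Pre-generate random IPv4 dotted-quad strings before timing begins,
    // so that address generation cost is excluded from the measured lookups.
    static String[] pregenerate(int count, long seed) {
        Random rng = new Random(seed); // seeded for repeatable runs
        String[] ips = new String[count];
        for (int i = 0; i < count; i++) {
            ips[i] = String.format("%d.%d.%d.%d",
                    rng.nextInt(256), rng.nextInt(256),
                    rng.nextInt(256), rng.nextInt(256));
        }
        return ips;
    }

    public static void main(String[] args) {
        for (String ip : pregenerate(5, 42L)) {
            System.out.println(ip);
        }
    }
}
```

Seeding the generator makes successive benchmark runs comparable, since every run resolves the same address set.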

Below the Benchmarks execution results is the output of sample collection with the network utility ping, summarizing 100 closely-spaced pings for comparison.

Same AZ Test Results

Output of Benchmarks.java
Call size: 1
  elapsed time for 1000 calls: 220
  elapsed time for 1000 calls: 243
  elapsed time for 1000 calls: 212
  elapsed time for 1000 calls: 224
  elapsed time for 1000 calls: 225
Call size: 10
  elapsed time for 1000 calls: 251
  elapsed time for 1000 calls: 237
  elapsed time for 1000 calls: 239
  elapsed time for 1000 calls: 236
  elapsed time for 1000 calls: 259
Call size: 100
  elapsed time for 1000 calls: 308
  elapsed time for 1000 calls: 308
  elapsed time for 1000 calls: 296
  elapsed time for 1000 calls: 298
  elapsed time for 1000 calls: 312
Call size: 500
  elapsed time for 1000 calls: 485
  elapsed time for 1000 calls: 490
  elapsed time for 1000 calls: 468
  elapsed time for 1000 calls: 474
  elapsed time for 1000 calls: 455
Call size: 1000
  elapsed time for 1000 calls: 654
  elapsed time for 1000 calls: 656
  elapsed time for 1000 calls: 652
  elapsed time for 1000 calls: 639
  elapsed time for 1000 calls: 648
Call size: 2000
  elapsed time for 1000 calls: 990
  elapsed time for 1000 calls: 972
  elapsed time for 1000 calls: 985
  elapsed time for 1000 calls: 1007
  elapsed time for 1000 calls: 995
Call size: 4000
  elapsed time for 1000 calls: 1633
  elapsed time for 1000 calls: 1621
  elapsed time for 1000 calls: 1669
  elapsed time for 1000 calls: 1611
  elapsed time for 1000 calls: 1631
Call size: 8000
  elapsed time for 1000 calls: 2954
  elapsed time for 1000 calls: 2851
  elapsed time for 1000 calls: 2875
  elapsed time for 1000 calls: 2982
  elapsed time for 1000 calls: 2838
ping times
$ ping -i 0.2 -c 100 3.219.98.102
PING 3.219.98.102 (3.219.98.102) 56(84) bytes of data.
64 bytes from 3.219.98.102: icmp_seq=1 ttl=254 time=0.339 ms
64 bytes from 3.219.98.102: icmp_seq=2 ttl=254 time=0.463 ms
... snip of 96 lines ... 
64 bytes from 3.219.98.102: icmp_seq=99 ttl=254 time=0.294 ms
64 bytes from 3.219.98.102: icmp_seq=100 ttl=254 time=0.288 ms

--- 3.219.98.102 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20195ms
rtt min/avg/max/mdev = 0.263/0.329/0.752/0.086 ms
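As a check, the 0.22 ms and 330 ns figures quoted earlier can be recovered from the table above with a simple two-point fit, using the mean per-call times at call sizes 1 and 8000. This is a back-of-the-envelope sketch, not the estimation method used to produce the published numbers.

```java
public class FitParams {
    // Mean elapsed time per call (ms) at the smallest and largest call sizes,
    // from the same-AZ results above (each figure is a sum over 1000 calls).
    static double msAt1()    { return (220 + 243 + 212 + 224 + 225) / 5.0 / 1000.0; }
    static double msAt8000() { return (2954 + 2851 + 2875 + 2982 + 2838) / 5.0 / 1000.0; }

    // Two-point linear fit of the latency model: slope = per-IP cost,
    // intercept = network round-trip time.
    static double perIpNs() { return (msAt8000() - msAt1()) / (8000 - 1) * 1e6; }
    static double rttMs()   { return msAt1() - perIpNs() / 1e6; }

    public static void main(String[] args) {
        System.out.printf("per-IP cost: %.0f ns%n", perIpNs());  // ~334 ns
        System.out.printf("network RTT: %.3f ms%n", rttMs());    // ~0.224 ms
    }
}
```

Both values land close to the ping-measured average RTT of 0.329 ms and the quoted 330 ns per-IP figure, which suggests the simple additive model holds well in practice.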

Test Results for Different AZ, Same Region

To understand the performance impact of using Dedistributed PTR across Availability Zones but within the same AWS Region, we spun up another client on an EC2 instance of the same type used above but in a different AZ. The client was located in AZID use1-az1, rather than in use1-az6.

Output of Benchmarks.java
Call size: 1
  elapsed time for 1000 calls: 527 ms
  elapsed time for 1000 calls: 552 ms
  elapsed time for 1000 calls: 649 ms
  elapsed time for 1000 calls: 572 ms
  elapsed time for 1000 calls: 550 ms
Call size: 10
  elapsed time for 1000 calls: 562 ms
  elapsed time for 1000 calls: 562 ms
  elapsed time for 1000 calls: 585 ms
  elapsed time for 1000 calls: 556 ms
  elapsed time for 1000 calls: 550 ms
Call size: 100
  elapsed time for 1000 calls: 674 ms
  elapsed time for 1000 calls: 640 ms
  elapsed time for 1000 calls: 637 ms
  elapsed time for 1000 calls: 618 ms
  elapsed time for 1000 calls: 644 ms
Call size: 500
  elapsed time for 1000 calls: 848 ms
  elapsed time for 1000 calls: 806 ms
  elapsed time for 1000 calls: 747 ms
  elapsed time for 1000 calls: 740 ms
  elapsed time for 1000 calls: 745 ms
Call size: 1000
  elapsed time for 1000 calls: 897 ms
  elapsed time for 1000 calls: 907 ms
  elapsed time for 1000 calls: 900 ms
  elapsed time for 1000 calls: 916 ms
  elapsed time for 1000 calls: 900 ms
Call size: 2000
  elapsed time for 1000 calls: 1202 ms
  elapsed time for 1000 calls: 1240 ms
  elapsed time for 1000 calls: 1207 ms
  elapsed time for 1000 calls: 1282 ms
  elapsed time for 1000 calls: 1228 ms
Call size: 4000
  elapsed time for 1000 calls: 1844 ms
  elapsed time for 1000 calls: 1960 ms
  elapsed time for 1000 calls: 1930 ms
  elapsed time for 1000 calls: 1861 ms
  elapsed time for 1000 calls: 1878 ms
Call size: 8000
  elapsed time for 1000 calls: 3181 ms
  elapsed time for 1000 calls: 3352 ms
  elapsed time for 1000 calls: 3176 ms
  elapsed time for 1000 calls: 3178 ms
  elapsed time for 1000 calls: 3130 ms
ping times
$ ping -i 0.2 -s 64 -c 100 3.219.98.102
PING 3.219.98.102 (3.219.98.102) 64(92) bytes of data.
72 bytes from 3.219.98.102: icmp_seq=1 ttl=254 time=0.752 ms
72 bytes from 3.219.98.102: icmp_seq=2 ttl=254 time=0.614 ms
...  96 lines ...
72 bytes from 3.219.98.102: icmp_seq=99 ttl=254 time=0.566 ms
72 bytes from 3.219.98.102: icmp_seq=100 ttl=254 time=0.562 ms

--- 3.219.98.102 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20194ms
rtt min/avg/max/mdev = 0.547/0.602/0.933/0.066 ms

A Few Observations

At smaller call sizes, end-to-end call latency is dominated by the client-to-server round-trip time (RTT), corresponding closely to the network ping times.

As one might expect, overall throughput is highly sensitive to call size. With a call size of 2000 IP addresses, a rate slightly above 2 million lookups per second is achieved (and higher still with larger call sizes). With a call size of 100, one achieves only a fraction of this rate, roughly 325,000 per second.
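These throughput figures follow directly from the tables: throughput is call size times the number of calls, divided by elapsed seconds. A small worked example, using representative same-AZ samples from the results above:

```java
public class Throughput {
    // Aggregate IP lookups per second for a run of identical calls.
    static double lookupsPerSecond(int callSize, int calls, double elapsedMs) {
        return callSize * (double) calls / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        // 1000 calls of 2000 IPs completed in ~990 ms: just over 2M lookups/s.
        System.out.printf("call size 2000: %,.0f lookups/s%n",
                lookupsPerSecond(2000, 1000, 990));
        // 1000 calls of 100 IPs completed in ~308 ms: roughly 325K lookups/s.
        System.out.printf("call size 100:  %,.0f lookups/s%n",
                lookupsPerSecond(100, 1000, 308));
    }
}
```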

Alas, no matter how hard we might try, one cannot move data faster than the speed of light. Or, for that matter, faster than roughly two-thirds of the speed of light when it is traveling through glass fiber.

Conclusion

Latency is basically a function of the speed of light and the distance packets must travel between your application and the assigned Dedistributed PLS; there is no getting around the laws of physics. Account for this and ensure your application connects to a PLS located in the same AWS Availability Zone datacenter. If it does, you can expect a base latency of under 0.3 milliseconds and, depending upon call size, throughput above 2,000,000 lookups per second.
