Leveraging Cloud Technology as Trading Infrastructure

Trading systems have long exploited new technologies to give market participants advantages in understanding, strategy, and timing. The growing use of latency reduction for market advantage has made co-location with major venues critical, leading market participants to make ever larger capital expenditures on on-premises and co-lo infrastructure.

For much of IT, the major trend of the second decade of the century has been the move to major cloud service providers for flexibility, innovation, and capital efficiency. Regulatory, control, and latency requirements have made trading systems one of the last application areas to move to the cloud. OpStack believes that, with thoughtful architecture, trading systems can be built and operated remarkably effectively using a hybrid cloud approach.

In consultation with the lead architect at a financial services firm, OpStack developed this reference infrastructure to use public cloud services to substantially reduce the capital investment needed to enter a new market, without ceding any competitive advantage to participants that are fully co-located with the electronic trading venues. The architecture presumes cloud-neutral technologies (Docker, Kubernetes, et al.), with the possibility of replacing some of the less latency-sensitive elements with PaaS elements. For example, we envisioned the trade history databases using a cloud DBaaS (Database as a Service) offering rather than dedicated clusters built and maintained from scratch on cloud IaaS (Infrastructure as a Service).

OpStack uses discussion documents, like this one, as a means of creating conversations with partners that lead to business-specific solutions. We specified this candidate architecture in a fair amount of detail in order to illustrate its viability and cost-effectiveness, with the understanding that a final solution for any client and market will be at least adjusted if not redirected by that ongoing discussion.

Requirements and Best Practices

Requirement | Benefit
--- | ---
Low-latency access to Electronic Communication Networks (ECN) | Competitive trading posture in time-sensitive markets
Hardware infrastructure (servers, switches, and firewalls) that supports the exploitation of low-latency data access through co-location with ECNs | The ability to translate algorithms and strategies into profitable transactions executed at the speed of the market
Infrastructure does not dictate limits of application capabilities | Infrastructure needs to support what the business and market require and be able to adapt as those requirements evolve
Automated CI/CD pipeline fully integrated with infrastructure automation | Removing manual tasks for code deployment and infrastructure changes reduces the opportunity for error and lowers time-to-market for new capabilities
Consistent environments across both production and pre-production | Minimizes code promotion costs and risks, critical for rapidly changing applications
Ephemeral infrastructure with servers, storage, network, and (optionally) containers fully defined in software | The application isn’t tied to any hardware provider or cloud vendor and can be moved or scaled up or down as requirements change
Parallel independent environments, both production and non-production | Parallel new version development and production support, plus blue/green testing with production data
Patched and compliant with stateful enforcement | Provides infrastructure that is both dynamic and auditably compliant
High availability and disaster recovery should be inherent to the operational platform | Failure of a server, service, or site should not take the desk out of the market

Solution Elements

OpStack Baseline Operations Stack for Trading Systems

The tools below have been proven and tested in complex, regulated infrastructure environments. They are a combination of commercially supported open source and best-of-breed products. Substitutions to provide desired capabilities or to conform to existing corporate standards are always possible and often desirable. Capacity modeling is an important capability for trading systems; the available tools are currently all proprietary, but one should be part of the solution stack.

The accompanying Stack Builder is tailored to each solution, but it includes many of the technologies and tools used in recent OpStack engagements to meet the needs of our clients. Not all of the tools in any category are required to automate infrastructure for a single environment or application; OpStack will select the smallest set that fully meets the requirements of the application suite being hosted.

Co-Lo versus Cloud or Co-Lo with Cloud Deployment

The proposed approach allows development, test, and production environments to be deployed in the cloud or in the desired trading-advantaged co-lo (e.g., the Equinix NY4 data center). There are cost, complexity, and opportunity trade-offs in how close the trading engines are hosted to the ECNs (NYSE Bonds, Bloomberg, Instinet, …). The advantage or disadvantage of relative differences in latency will only be definitive once the trading strategies and algorithms are developed and the engine that executes them is deployed within the trading application. The number of firms that have chosen to co-locate with trading venues in order to pursue low-latency and high-frequency trading strategies effectively argues for opting to minimize latency.
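
To make the latency trade-off more concrete, the sketch below estimates one-way propagation delay over optical fibre. It assumes a group refractive index of about 1.468 for single-mode fibre, and the two path lengths (a 100 m in-building cross-connect and a 50 km path to a cloud region) are illustrative assumptions rather than measurements from this design. Real paths add switching, serialization, and queuing delay on top of these figures.

```python
# Rough one-way propagation delay over optical fibre.
# Assumptions (not from the source document): group index ~1.468,
# and the example distances below are purely illustrative.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBRE_GROUP_INDEX = 1.468  # typical single-mode fibre, assumed


def propagation_delay_us(distance_m: float) -> float:
    """Return one-way propagation delay in microseconds."""
    speed_in_fibre = SPEED_OF_LIGHT_M_PER_S / FIBRE_GROUP_INDEX
    return distance_m / speed_in_fibre * 1e6


# Hypothetical path lengths for comparison.
paths_m = {
    "in-building cross-connect (100 m)": 100,
    "metro path to cloud region (50 km)": 50_000,
}

for name, metres in paths_m.items():
    print(f"{name}: {propagation_delay_us(metres):.2f} us one way")
```

Under these assumptions, distance alone costs roughly 5 microseconds per kilometre, which is one reason proximity to the venue matters for latency-sensitive strategies.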

The OpStack approach to defining, deploying, and operating the trading infrastructure is equally applicable to any mix of cloud and co-lo deployments. The table below lays out some of those options.

Hosting Options

Option | Example | Pro | Con
--- | --- | --- | ---
Cloud Virtual | AWS EC2 Virtual | Low startup costs; Expandable in minutes; Can use AWS Direct Connect | Least control of consistent performance; Network is shared, firewalls are virtual and shared; Highest latency option (WAN, not fibre-LAN, latencies)
Cloud Physical | AWS EC2 Physical | Higher cost, all OpEx; Expandable in days; Can use AWS Direct Connect; Allows option for Real-Time Linux OS | Network is shared, firewalls are virtual and shared; High latency (WAN, not fibre-LAN, latencies)
NY4 Co-Lo Virtual | HPE Gen 10 Servers; Arista Switches; Palo Alto Firewalls | Sub-millisecond latency to ECNs; Fully independent network and firewalls; Ephemeral deployment of resources across; Ability to distribute load across servers; CapEx option | Capacity upgrades require hardware purchase and installs; Additional hypervisor latency; Requires standard Linux or Windows OS, real-time option irrelevant; Network and firewalls dedicated
NY4 Co-Lo Physical | HPE Gen 10 Servers; Arista Switches; Palo Alto Firewalls | Sub-millisecond latency to ECNs; Real-Time Linux option for fully predictable performance; Lowest latency option with COTS hardware and software; Fully independent resources | Capacity upgrades require hardware purchase and installs; Workloads partitionable at the server level (though virtual and physical can be mixed in cage)

The optimal solution is to mix and match the above options as the application is developed. The strength of the design is that the infrastructure is fungible with little or no change required to the trading applications.

Development and functional QA environments can be provisioned in cloud or on-prem virtual environments that are software-identical to production. Because the number of these environments varies with the development and testing cycle (and multiple parallel development streams are common among trading shops), they can be created, spun up, spun down, and released at any time with nominal effort. Algorithm testing against real-time data or capture/playback facilities would benefit from environments identical to production, whether cloud or co-lo, including high-performance hardware and production latencies.
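
One way to picture environments that are software-identical to production is a single environment definition in which only the name and hosting target vary. The sketch below is a minimal illustration of that idea. `EnvironmentSpec`, its fields, `derive_environment`, and all example values are hypothetical names chosen for this example and are not part of any OpStack tooling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of one deployable environment."""
    name: str
    hosting: str        # e.g. "cloud-virtual" or "colo-physical"
    os_image: str       # software stack: identical across environments
    orchestrator: str
    app_version: str

    def software_stack(self) -> tuple:
        """The parts that must match production exactly."""
        return (self.os_image, self.orchestrator, self.app_version)


# Production definition (all values are illustrative assumptions).
production = EnvironmentSpec(
    name="prod",
    hosting="colo-physical",
    os_image="base-linux-image",
    orchestrator="kubernetes",
    app_version="1.4.2",
)


def derive_environment(base: EnvironmentSpec, name: str, hosting: str) -> EnvironmentSpec:
    """Copy a spec, changing only its name and hosting target."""
    return replace(base, name=name, hosting=hosting)


# Define two parallel development streams hosted in the cloud.
dev_envs = [derive_environment(production, f"dev-{i}", "cloud-virtual") for i in (1, 2)]

for env in dev_envs:
    # Each derived environment keeps production's software stack.
    assert env.software_stack() == production.software_stack()
    print(env.name, env.hosting, env.software_stack())
```

Because the derived specs are copies of the production definition, adding or releasing a development stream is a matter of creating or discarding one more spec rather than hand-building an environment.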

A hybrid-cloud approach, using:

is OpStack’s recommendation as the optimal solution in 2020’s technical and trading environment that would be both market competitive and cost-effective.

Representative Co-Lo Configuration

This section provides a representative hardware and system software configuration for the Co-Lo Physical and Co-Lo Virtual options. The equipment proposed is state-of-the-practice, and the configurations are based on the assumptions laid out by our client prior to the final definition and creation of the platform. The pricing provided was quoted by the manufacturers and is not reflective of competitive procurement practices or negotiated discounts.

Design Approach

The infrastructure design rests on the following points:

  1. High-performance execution engines are housed in a single rack within the Equinix NY4 data center.
  2. Redundant UPS-protected power distribution is provided in rack as standard at Equinix.
  3. High-performance, high reliability, industry-standard components have been selected. No GPU co-processors have been specified, but they could easily be added to the strawman design if advantageous.
  4. The design assumes connectivity to both AWS (Direct Connect) infrastructure and dedicated fibre drops to ECNs co-located at NY4 or any other latency-advantaged data center.
  5. All hardware is redundant and configured so that the loss of one component of any type or of any one connection would not disrupt operations.
  6. AWS or another cloud provider will be used to house administrative systems, provide a storage target for backups, and host development as well as monitoring and control infrastructure. Only the low-latency components need premium hardware and a minimum-latency location.
  7. Our assumption is that the ECN connections will terminate at the Arista switches (for lowest latency) rather than at the Palo Alto firewalls. This is a design decision to revisit as the application topology is finalized.
  8. The network design provides for external connections at the standard speeds offered by those providers, with a 40 gigabit internal network for low-latency, DMA-capable communication between all elements of the trading systems.
  9. There is expansion space both in the rack and in the networking components (including ports on the servers) to support additional servers, specialized GPU compute hardware, or a SAN storage array.
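
The case for a 40 gigabit internal network in point 8 can be illustrated with serialization delay, the time needed to clock one frame onto the wire. The sketch below assumes a 1,500-byte frame, which is an illustrative choice, and it ignores propagation, switching, and queuing delay.

```python
# Serialization delay: time to place one frame on the wire.
# Assumption: a 1,500-byte frame; link speeds are nominal line rates.

FRAME_BYTES = 1_500


def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Return the time in microseconds to transmit frame_bytes at link_gbps."""
    bits = frame_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6


for gbps in (1, 10, 40):
    delay = serialization_delay_us(FRAME_BYTES, gbps)
    print(f"{gbps:>2} Gb/s: {delay:.2f} us per {FRAME_BYTES}-byte frame")
```

The delay scales inversely with link speed, so each hop on a 40 gigabit link spends a fortieth of the time a gigabit link would on the same frame.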

Hardware Components

Role | Selection | Notes
--- | --- | ---
Server Hardware | 5 x HPE DL360 Gen10 with 512 GB of memory and 2.5 TB of usable internal RAID-5 SSD storage | With the small server count in the initial request, we’ve specified high-quality, high-reliability server hardware with excellent out-of-band management capability
Network Switching | 4 x Arista 7010 and 2 x 7050 switches | Fully software-configurable 40 gigabit fiber optic core distribution switching and gigabit copper switching in front of the firewalls
Firewalls | 2 x Palo Alto PA-3220 | Fully software-configurable threat prevention firewalls with anti-malware and URL filtering
Total Hardware | |
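
The redundancy goal stated in the design approach (the loss of one component of any type should not disrupt operations) can be checked mechanically against a bill of materials. The sketch below uses the unit counts from the hardware table. The `min_needed` values are assumptions made for illustration, since the source does not state how many units of each type are required to keep trading.

```python
# Check that losing any single unit still leaves enough capacity.
# Counts come from the hardware table; min_needed values are assumed.

inventory = {
    "HPE DL360 Gen10 server": {"count": 5, "min_needed": 4},
    "Arista 7010 switch": {"count": 4, "min_needed": 2},
    "Arista 7050 switch": {"count": 2, "min_needed": 1},
    "Palo Alto PA-3220 firewall": {"count": 2, "min_needed": 1},
}


def single_failure_gaps(items: dict) -> list:
    """Return component names that cannot tolerate the loss of one unit."""
    return [
        name
        for name, spec in items.items()
        if spec["count"] - 1 < spec["min_needed"]
    ]


gaps = single_failure_gaps(inventory)
if gaps:
    print("Not single-failure tolerant:", ", ".join(gaps))
else:
    print("Every component type tolerates the loss of one unit.")
```

A check like this covers unit counts only. Whether cabling and connections are also free of single points of failure depends on the topology, which a count-based check does not model.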

System Software Components

In addition to the relevant operations configuration and management components selected from the Operations Stack, the licensing and support for a container-based low-latency trading application include the following two items. Both Real-Time Linux and Kubernetes are open source products, and list prices for support and commercial licensing vary widely. The choice of distribution for these products will have little to no impact on performance and little impact on the overall cost of environment setup.

Role | Selection | Notes
--- | --- | ---
Operating System | Real-Time Linux | We presume that the real-time kernel will be preferred in order to provide predictable performance on time-sensitive compute functions.
Container Orchestration | Kubernetes | Includes Docker containers and a full Kubernetes orchestration capability with management console.

Logical Connectivity Network Diagram

[Figure: Logical Connectivity diagram, February 18, 2019]

Rack Elevation Diagram

[Figure: Rack Elevation diagram]

OpStack Services

Project Phases - Months not Years

Phase | Objective | Key Milestones | Effort
--- | --- | --- | ---
Dev and Test Environment Scripted Builds in AWS | Enable developers to actively work on the pre-production product | Basic environments deployed in AWS on demand, tied into CI/CD tooling; Supporting infrastructure deployed (Satellite, GitLab, Ansible) | 480 hours (4 weeks, 3 named resources)
Physical Environment Build with Scripted OS, Network, and Security Configuration | Deploy and configure physical compute, network, and security infrastructure for production environment | All hardware installed and configured; Container platform deployed; Environment securely connected to required tools and services | 360 hours (3 weeks, 3 named resources)
Production Application Deployment Integration and Automation | Deploy and configure production application components, monitored | Integrate production environment into CI/CD environment; Deploy and configure Splunk integration; Integrate CI/CD deployment process into Change Management | 600 hours (5 weeks, 3 named resources)
Ongoing Support | Ensure operational excellence | Monitor infrastructure; Break/Fix; Code releases; Maintenance; Patching | SLA - 10 minutes to callback 8-6 ET
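
The effort figures for the three build phases follow from weeks multiplied by named resources and a working week. The sketch below reproduces them assuming a 40-hour week, which is inferred from the quoted totals rather than stated in the table. The phase names are shortened for readability.

```python
# Reproduce the phase effort figures from weeks x resources x hours/week.
# Assumption: 40-hour working week (inferred from the quoted totals).

HOURS_PER_WEEK = 40

phases = [
    # (phase, weeks, named resources, hours quoted in the table)
    ("Dev and Test Environment", 4, 3, 480),
    ("Physical Environment Build", 3, 3, 360),
    ("Production Application Deployment", 5, 3, 600),
]

total_hours = 0
total_weeks = 0
for name, weeks, resources, quoted in phases:
    hours = weeks * resources * HOURS_PER_WEEK
    # Confirm the computed figure matches the one quoted in the table.
    assert hours == quoted, f"{name}: computed {hours}, table says {quoted}"
    total_hours += hours
    total_weeks += weeks

print(f"Total effort: {total_hours} hours over {total_weeks} weeks")
```

If the three build phases run back to back, which is an assumption, the elapsed time is about 12 weeks, consistent with the "months not years" framing.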

Assumptions and Notes