OpenStack your private cloud

OpenStack is the most widely deployed open source cloud software in the world. It is a proven-at-scale set of software components that provide common services for cloud infrastructure.

OpenStack is a massive collaboration, created and used by, among others, Rackspace, NASA, CERN, Ubuntu, Red Hat, Oracle, HP, Walmart, PayPal, and Telos Digital. It contains all the necessary cloud tools for infrastructure, resilience, configuration, virtualisation, deployment and monitoring, and is entirely open source.

As a private cloud, it delivers this at lower cost, with greater flexibility, and with privacy and provable trust that cannot be obtained from proprietary vendors.

However, this combination of flexibility, customisability, and axe-proof reliability does come at the cost of considerable complexity, investment, and a steep learning curve. That's where we can help.

Benefits of OpenStack

OpenStack is a powerful, capable and proven platform. Here's why we especially recommend it.

  • Capability: OpenStack is mature, stable, feature-complete, and industrial-strength. It can do everything that the other major cloud platforms can do, at scales from small to massive.
  • Open source: control of your own destiny, flexibility, compatibility, no license fees, lower cost, no forced upgrades, no vendor lock-in, no mandatory obsolescence.
  • Flexibility: OpenStack is highly configurable, so it can easily fulfil bespoke requirements.
  • Ownership: long-term savings, especially as good hardware typically lasts > 5 years.
  • Privacy and Security: full control of access, and data-protection. No need to delegate trust. You own the system, and you set the policy. You control the physical security too.
  • Jurisdiction: After the Snowden revelations, and the Schrems judgements, and in the context of the GDPR, it's critical to know where your data resides, and how it moves. OpenStack means you know exactly where your data is processed and stored.
  • Know your virtual neighbours: who are the other VM tenants sharing your physical server? Can you trust them not to be malicious, and can you trust the virtualisation to be perfect? 
  • Dedicated disaster recovery: should something go wrong, your team will focus on your needs; whereas a vendor may have to spread their senior support engineers across multiple clients, all having the same crisis. 
  • Backups: OpenStack multi-site replication is easy. 
  • Energy saving and CO2 reduction: monitor consumption and minimise waste; select green electricity; choose hardware for power efficiency; minimise software-waste. 

Flexibility

The biggest advantage of OpenStack is how well it can be tailored to your precise needs. Here are some examples:

  • Very large VMs: need an 88 core VM with 4 TB of RAM? No problem. Want a really small one (1 core, 128 MB)? Easy.
  • Very fast CPUs: servers typically optimise for massively-multicore CPUs (e.g. 2× 48 cores, at 2.1 GHz). But some workloads don't parallelise. Perhaps you need to run a single-threaded C.I. test that takes 30 minutes; with an 8-core CPU that turbos to 4.4 GHz, your downtime for critical production hot fixes becomes 15 minutes shorter.
  • Tunable storage: one size doesn't fit all. Mixing SSD and HDD is straightforward; Ceph replication levels can be tuned according to importance: data is triplicated; VM images are duplicated; cached files are only kept once. 
  • Unusual data storage: For example, a client needed 0.5 PB of storage, at a comparatively low bandwidth, with a critical requirement to minimise cost. 
  • Hardware integration: non-standard equipment is easily added to your cluster, in your rack. For example, industrial process-control hardware, science and metrology interfaces, multiple communications interfaces, video-processing, FPGAs, GPUs, even lava lamps. 
  • Hyper-convergence: loading each server with both disk and CPU reduces cost (no dedicated SAN required) and improves storage performance and reliability. 
  • Mixed environments: run different operating systems or different hardware. Any O.S., even esoteric ones, can be quickly deployed from images.
  • Local A.I. inference: if you need to run your A.I. workloads locally, for stringent privacy protection, you can, even with unconventional GPU configurations.
  • Dev/Prod workload balancing: you control the CPU and memory allocation policy. When production-workloads are quieter, batch-processing and developer VMs can have generous amounts of CPU allocated. But at times of peak load, unexpected load, or hardware failure, production is protected, and development is slowed or paused. 
  • Free failover: multi-site resilience, (a 2nd cluster on a different site for disaster-recovery failover) can be obtained for almost no extra cost. 
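The tunable-storage point above can be made concrete. Here is a minimal sketch, using made-up pool sizes, of how per-pool replication factors (as Ceph allows) translate into raw-disk consumption:

```python
# Hypothetical pool sizes; replication factors follow the example above:
# critical data ×3, VM images ×2, cached files ×1.
pools = {
    "data":   {"logical_tb": 40, "replicas": 3},
    "images": {"logical_tb": 10, "replicas": 2},
    "cache":  {"logical_tb": 20, "replicas": 1},
}

def raw_usage_tb(pools):
    """Raw disk consumed = logical size × replication factor, summed over pools."""
    return sum(p["logical_tb"] * p["replicas"] for p in pools.values())

total = raw_usage_tb(pools)
naive = sum(p["logical_tb"] for p in pools.values()) * 3  # everything triplicated
print(total, naive)  # → 160 210
```

In this (illustrative) mix, tuning replication per pool saves 50 TB of raw disk versus triplicating everything, with no loss of safety for the data that matters.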

Cost Comparison

We know, from our own experience, and detailed benchmarking, that OpenStack is approximately 1/3 of the price of proprietary cloud systems, like-for-like. If this sounds surprising and “too good to be true”, here is the commercial evidence that proprietary cloud is overpriced, and the underlying technical principles that show why.

  • AWS contributed 2/3 of Amazon's total profits, on just 15.8% of their total income, in 2023. 
  • CERN have been using OpenStack for 14 years, using the savings to advance the state of scientific understanding; while AWS bills enough to contribute to the 2nd largest yacht in the world.
  • Others are repatriating their cloud services and making huge savings from cloud-exit: Hey and Basecamp will save $10 million over 5 years by leaving AWS to run on their own hardware.
  • PostgreSQL (open source database) is radically cheaper than CosmosDB (Microsoft Azure cloud NoSQL database) for some common workloads: we benchmarked it at ~23,000× less expensive!
  • Storage is particularly expensive on AWS: if you need very large amounts of disk, it can be far more expensive to rent than to buy. We benchmarked it at ~57× more costly.
  • Breaking-bulk is highly profitable. The owner of the physical server (typically 96 CPU cores) breaks it up into multiple virtual machines (typically 2-8 cores). Common workloads allow this capacity to be 'overbooked' (just like airlines do with passengers) by a factor that can approach 10×.
  • License costs: most users of OpenStack avoid proprietary software, making further savings in cost (and complexity) by using a fully open-source ecosystem of infrastructure and applications. 
  • Egress costs: it used to be cheap to put data into AWS/S3, and prohibitively expensive to get it out again. Thanks to the European Data Act (2024), this is no longer the case.
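The breaking-bulk economics above can be sketched numerically. All figures here are illustrative assumptions, not quotes:

```python
# Illustrative figures only: a 96-core physical host sold as small VMs,
# with vCPUs overbooked because typical VMs idle most of the time.
physical_cores = 96
overcommit = 8      # conservative vCPU:pCPU ratio (the text notes it can approach 10×)
vm_vcpus = 4        # a typical small VM

sellable_vcpus = physical_cores * overcommit
vms_per_host = sellable_vcpus // vm_vcpus
print(vms_per_host)  # → 192
```

One 96-core server becomes 192 four-vCPU VMs: each rented at even a modest price, the margin over the hardware cost is substantial, which is the profit the vendor keeps and the private-cloud owner reclaims.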

Proprietary Cloud

The various proprietary public cloud systems (AWS, Azure, etc.) do function perfectly well: that's why so many companies use them. While we consider OpenStack to be the better choice in almost all cases, we believe businesses should always tell the whole truth, including the counter-arguments against their own position. So, here's when choosing your own (or shared) OpenStack instance is not optimal:

  • Very small scale: when you only need a couple of small servers (a few CPU cores), e.g. to host one non-transactional website, or company wiki. Here, paying ~£30/month isn't the best price per server, but it's the cheapest use of your time.
    We recommend Gandi: reliable, inexpensive, straightforward, and incidentally, powered by OpenStack.
  • Bursty workloads: when you have massive demand that is really spiky, in a predictable way, such that you need to provision a huge amount of capacity for a short time, then de-provision it when not in use. For example, a service providing data or video-streaming during a major sporting event, or an investment-fund that wants to periodically test a new financial model (using 1000 CPU cores, for 3 hours, once a month).
    If you only need your resource 0.5% of the time, then it's worth paying a much higher price per CPU-hour, in return for only paying for the hours you use. 
  • Already Locked In: although the main cloud workloads (e.g. Linux/Windows, PostgreSQL/SQLServer, etc.) can run on anything, the cloud management tools are intentionally not portable. Similarly, cloud vendors often add their own wrappers to open-source tools to make them "easier to use", which really means stickier. IBM's Bluemix is a good example.
    If you haven't already made the mistake of needlessly tying your deployment processes to proprietary cloud management tools, please beware!
  • No Sysadmins: if you haven't got any skilled sysadmins of your own, you have to use a 3rd-party cloud. But, using AWS (etc.) correctly, choosing the right options, and avoiding risk of over-billing is still a highly-skilled, complex, and time-consuming operation. We can help.
  • Money no object, or “no one ever got fired for buying IBM”. AWS and Azure are 'reassuringly expensive', with advertising and corporate outreach to match. If you have infinite money and very limited time, go to AWS. If you want a better deal, come to us.
  • Legal accountability: everyone wants legal recourse for consequential loss from downtime… but in reality, every cloud service provider and data-center operator has terms that prevent this: most SLAs only entitle you to a proportional refund for the fractional downtime.
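The bursty-workload trade-off above comes down to utilisation. A rough sketch, with the hypothetical numbers from the investment-fund example:

```python
# Hypothetical workload: 1000 CPU cores, for 3 hours, once a month.
hours_needed = 3 * 12        # hours per year at full burst capacity
hours_in_year = 24 * 365     # 8760
utilisation = hours_needed / hours_in_year
print(f"{utilisation:.2%}")  # → 0.41%

# Even if on-demand cloud costs 10× more per CPU-hour than owned hardware,
# paying only for the hours used wins at this utilisation:
on_demand_premium = 10
break_even = 1 / on_demand_premium  # owning wins only above ~10% utilisation
assert utilisation < break_even
```

At roughly 0.4% utilisation, renting wins even at a large per-hour premium; as utilisation climbs towards steady load, the economics reverse sharply in favour of owning.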

Why Telos Digital?

So, if you want to exit the cloud, why should you come to us?

OpenStack is awesome, but it isn't straightforward: the learning curve is steep, and it takes real experience to climb. It also requires investment in a quorum of redundant hardware to operate reliably. We have the track record of capability, trust, and success.

We can deliver this quickly, with a payback-period of significantly less than a year. For example, we've saved two small businesses £200k and £250k respectively.

If you have complex requirements, we can help with design, migration, or porting. A “lift-and-shift” is a good opportunity for rationalisation, upgrades, and improvements.

Our team of expert sysadmins can help when you can't do everything yourself, or you have a small team which needs assistance, quorum, training, or extra coverage.

We can offer the applications as well as the infrastructure, varying from deploying standard applications, such as MediaWiki, RT or Moodle, to building bespoke applications, or connections between them.


OpenStack 4 Ways

Virtual Machines
VMs on Telos Digital's cluster.
We provision individual virtual machines on our cluster for you. You benefit from the lower-cost and greater flexibility of OpenStack.

This is the easiest way to get started.
  • Lowest cost, most granular option.
  • Resilience is obtained by pooling the overhead across multiple tenants.
  • You have root access on your own VMs, and a management console.
  • VMs available from 1-core to 32-core, from 256 MB to 256 GB of RAM.
  • Sysadmin support available on request.
  • Shared infrastructure is available: monitoring, deployment, configuration, imaging.
Business in a Box
A set of integrated business services.
We provide a suite of integrated business and development tools, customised and configured for you.

We manage everything for you: no sysadmins required (unless you want).
  • Suite includes: MediaWiki, WordPress, Moodle, NextCloud, Gitea, RequestTracker, RoundCube, JitSi, SuiteCRM.
  • Integrated tools: Keycloak provides a single source of truth for the user directory, permissions, and sign-in.
  • Professional look: your brand, your theme, your domain.
  • Custom requirements, bespoke systems, and specialist services available on request.
  • Backups, monitoring, and SLA included.
Managed Cluster
Fully managed, dedicated cluster.
We provision a dedicated cluster for you, and it can grow and adapt with your needs.

We manage the data center hosting, provide and maintain the physical machines.
  • Long term contract, with option to buy outright.
  • We run the system for you; you also have full root access.
  • No other tenants on the cluster, guaranteeing privacy and security.
  • Bespoke hardware is available on request, e.g. maximum clock-speed CPUs, or customised storage solutions.
  • Backups, monitoring, and SLA included.
Turnkey Setup
We build it; you control it.
We specify, obtain, configure, provision, and document an OpenStack cluster according to your requirements.

We will then hand over the rack of machines, removing our credentials.
  • You own the hardware: this gives you the lowest long-term cost (optimising OpEx vs. CapEx).
  • You have full control (we can, of course, still advise, train, and assist on request).
  • Telos Digital never have access to your data, and you can prove this.
  • Everything is Open Source, so no license-fees; no lock-in.
  • Custom hardware is available for bespoke requirements, including VMs up to 88-core CPUs and 4 TB RAM.

Bespoke requirements are very welcome. We can design contract terms to help you balance your OpEx vs. CapEx, and facilitate your growth.

Contact Telos Digital



Technical Notes

I.   Trusted Systems

A “Trusted System” is defined as “a system whose failure can break your security-policy.”
In other words, a system that we have no choice but to trust.

For example, the brake-cables in a car are a trusted system: if they fail, you will crash. They remain a "trusted system", even if they are corroded or tampered with!

So, trust, in this respect, isn't necessarily a good thing, and we should always seek to maximise reliability by minimising the number of trusted systems and failure modes. This isn't just about preventing downtime, but also about preventing data loss; defence-in-depth is another way to reduce what must be trusted.

OpenStack minimises trust in various ways:

  • Redundancy: duplicated networking at all layers: from the upstream fibre, via switches, ethernet-cards; duplicated power-sources and power-supplies. A faulty part can be replaced while the system remains online.
  • Error-correction: triplicated storage (Ceph, or RAID); ECC RAM. Errors can be detected and corrected.
  • Quorum: control nodes always exist in threes, so that if one fails, there is still a safety margin during the time it takes to replace the first failed unit. There is always sufficient hardware to keep services online if one device should fail.
  • Applications: ensure the applications and operating systems (host and guest) are always patched, audited, and pen-tested. This creates defence-in-depth: no one mistake should lead to compromise.
  • Privacy: encryption everywhere, including HTTPS and encrypted filesystems. 
  • Physical security: the data center itself must be secure against external threats, with strict, verified physical access control to prevent theft and tampering. Tier 3+ data-centers are usually suitable.
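The quorum point above is simple arithmetic: a cluster of control nodes needs a strict majority to keep operating, so the tolerable number of failures follows directly from the node count. A minimal sketch:

```python
def tolerable_failures(nodes: int) -> int:
    """A quorum needs a strict majority (nodes // 2 + 1); the rest can fail."""
    return nodes - (nodes // 2 + 1)

assert tolerable_failures(3) == 1  # threes: one node can fail safely
assert tolerable_failures(5) == 2
assert tolerable_failures(4) == 1  # an even count adds cost, not resilience
```

This is why control nodes come in threes rather than twos or fours: three is the smallest count that survives a single failure, and a fourth node buys no extra safety margin.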
Panopticons are dangerous

Certain networking "cybersecurity" products take the approach of “we'll protect you from external threats, but in order to do so, we need total visibility”. This is dangerous, because network-security and endpoint-security have to be effectively disabled in order to decrypt the traffic so that the vendor's proprietary (and un-auditable-by-you) monitoring agent can check it. As a result, multiple layers of the Swiss-cheese safety-model are reduced to a single slice.

This panopticon approach is superficially tempting, but it makes your entire network vulnerable, all at once, to a single-point of vulnerability! Such vulnerabilities can arise from a defect in the security-product itself, a compromise of the person who runs it (who now has total access), or a class-break.

Although the Ancient Romans understood this problem (“quis custodiet ipsos custodes?”), there remains a significant market for such products, which claim to make you safer but actually have significant disadvantages. For example: deep-packet-inspection firewalls, 3rd-party email-scanning and link-protection (Proofpoint), and network-panopticons (Darktrace). The worldwide Crowdstrike IT outage (July 2024, damage estimated ~$10bn) demonstrates the risk of this trade-off: frequent marginal benefits vs. occasional catastrophes. There are better ways to do cybersecurity.

Should you trust Telos Digital? We believe that we deserve your trust (and encourage you to audit; open-source systems are inherently more trustworthy too). However, if you choose the "Turnkey Setup" option, you never have to trust us with your data — not even in principle — and that's a good thing.


II.   Reliability: Axe-proofing

OpenStack is highly resilient, with a great track-record. We achieve this by combining: highly reliable infrastructure; an expert and dedicated team of sysadmins; automated monitoring; and a well-tested and rehearsed process for fast and simple recovery in the worst-case.

Here is how to make a typical application (e.g. database-driven system with a web or API front-end) really robust:

  1. Start with high-quality, well-tested hardware. We have used the HP DL380 series since 2009, and the reliability has been exceptional. Almost every component is redundant: paired PSUs, disks, and network interfaces, plus ECC (error-correcting) RAM.
  2. Rack these in a well-chosen data-center (Tier 3+), which can provide highly resilient power, cooling, and multiple data-routes. The D.C. should also provide good physical security, with audits and access-control, and remote access.
  3. Virtualise the machines on OpenStack. Internal, and upstream network and power are already redundant. But should any physical host fail, the VMs can automatically migrate to another host.
    We call this axe-proof: go into the data-center, select any one item (server, disk, cable...) and destroy it with an axe … and the application will just keep going!
    We actually test just by suddenly disconnecting the cables, but if you were willing to destroy a £25k server, or risk electrocution, you could use a literal axe.
  4. Distributed storage, with Ceph. RAID is good, but Ceph is better, sharing data across multiple machines. This is triply, not doubly, redundant, has better performance, and is even more resilient against hardware problems (such as a power-supply surge that could potentially destroy multiple RAID disks in a single physical machine at the same instant). 
  5. Other techniques include the use of HAProxy, and some database-caching. Sensible database design is important (many ORMs don't utilise the query-planner very well), while the use of containers (LXC/LXD) reduces the overhead of virtualisation, particularly for RAM.
  6. Duplicate the application across two clusters in separate data-centers, and constantly replicate/sync the data between them. A complete data-center failure, exceeding a few minutes' downtime, is a very rare event, so this may not be considered necessary.
  7. Offsite backups should be generated and synchronised daily, and then stored encrypted: ideally with one copy online (remotely accessible), and another copy offline (requiring physical interaction).
  8. Really sensitive data (e.g. master certificate keys) should be split, and stored in separate pieces, requiring an n-of-m set of sysadmins to be simultaneously physically present, with their hardware tokens and their memorised passwords.
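The n-of-m splitting in step 8 is classically done with Shamir's secret sharing. Here is a minimal sketch of the principle over a prime field; real deployments should use audited tooling or hardware tokens, and all parameters below are illustrative:

```python
# Sketch of n-of-m secret splitting (Shamir's scheme): the secret is the
# constant term of a random degree-(n-1) polynomial; any n points on the
# polynomial recover it, fewer reveal nothing.
import random

PRIME = 2**127 - 1  # a prime comfortably larger than the example secret

def split(secret, n, m):
    """Split `secret` into m shares; any n of them can reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(n - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, m + 1)]

def combine(shares):
    """Lagrange interpolation at x=0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = 123456789
shares = split(key, n=3, m=5)      # e.g. 5 sysadmins, any 3 suffice
assert combine(shares[:3]) == key  # quorum present: key recovered
assert combine(shares[1:4]) == key # any 3 shares work equally well
```

No sysadmin alone (nor any two together, in this 3-of-5 example) can reconstruct the key, yet the loss of up to two shares does not lose the secret: exactly the trade-off step 8 calls for.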