A provider of private social-networks had developed a platform to allow organisations to create customised digital communities.
When a new CTO joined a private-social-network company, it quickly became clear to him that the company would be unable to execute its strategy unless action was taken to address its AWS infrastructure. The platform, which had been developed to allow organisations to create customised digital communities, was struggling on multiple counts.
Recognising he needed specialist expertise, he called upon our resources to help.
Working with the CTO and the CEO, a three-stage plan of action to optimise the infrastructure and operations was agreed:
The team quickly ascertained that a significant cause of poor service reliability was operations-related. By creating greater separation of duties across the technology team, and introducing more effective change-control processes, many of the problems disappeared. In addition, event logging and monitoring using Splunk was set up, so that when issues did arise they could be diagnosed faster.
In parallel, the team analysed infrastructure cost effectiveness, reviewing workloads, platform-service usage and billing. As a result, services were rationalised and ‘right-sized’, leading to a 75% cost saving within 12 weeks of project commencement, worth $400k per year.
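The two figures quoted above imply the approximate running costs before and after optimisation. The following is a minimal sketch of that arithmetic, assuming the $400k is the annualised value of the 75% saving; the function name is illustrative and the derived figures are not stated in the case study itself.

```python
def implied_annual_costs(annual_saving: float, saving_fraction: float) -> tuple[float, float]:
    """Back out the before/after annual cost from a saving and its percentage."""
    # If the saving is a fraction of the original cost, then
    # original cost = saving / fraction.
    cost_before = annual_saving / saving_fraction
    cost_after = cost_before - annual_saving
    return cost_before, cost_after


# Assumption: $400k per year corresponds to the full 75% reduction.
before, after = implied_annual_costs(400_000, 0.75)
print(f"Implied cost before: ${before:,.0f} per year")
print(f"Implied cost after:  ${after:,.0f} per year")
```

Under that assumption, the infrastructure would have cost roughly $533k per year before the work and roughly $133k per year afterwards.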
Security was also strengthened through the implementation of a range of improved identity and access-management measures.
The company’s product line had evolved over time into three separate application stacks, which created significant inefficiencies and inflexibilities across engineering and commercial operations. A plan was therefore agreed to consolidate them into one multi-tenant platform based on a microservices architecture that could scale with the business’s planned growth.
An important part of the architectural review was selecting appropriate cloud applications and services to use. In such a rapidly evolving technology landscape, it was important to choose wisely, assessing not just the immediate capability of a given product, but also its longevity from a support and skill-pool perspective.
Areas the team considered in depth included container orchestration, IaC (infrastructure-as-code), platform monitoring and application intelligence. After evaluation of different options, it was decided to use microservices on AWS’s Kubernetes implementation (EKS), Terraform for IaC, and a combination of Splunk and Signal FX for infrastructure and security event management, and application observability and tracing.
Work then moved on to the infrastructure rebuild and the setting up of new devsecops practices.
Pent-up demand for the service meant that speed of execution was vital. Infrastructure refactoring proceeded hand-in-hand with the software engineering team, which was tasked with transforming its code into a more robust microservices-based stack.
Although the software team was relatively large and experienced, it was spread across one in-house and two outsourced teams based in the UK, India, Belarus, Switzerland and the USA. To maximise momentum across such a dispersed team, a cross-functional leadership ‘pod’ was established. This ensured that the entire team had a clear understanding of the vision, and that software and infrastructure development could be tightly coordinated, underpinned by a standardisation of working processes.
With so many moving parts (upwards of a thousand elements across microservices, security groups, IAM roles and AWS services), enforcing common terminology was a small but critical task: it avoided confusion and made unambiguous documentation and auditing of processes possible.
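One way to enforce common terminology across so many resources is an automated name check. The sketch below is illustrative only: the `<environment>-<service>-<type>` convention, the allowed values and the function name are all hypothetical, not the convention the team actually adopted.

```python
import re

# Hypothetical convention: <environment>-<service>-<resource type>, all lowercase.
ALLOWED_ENVS = {"dev", "staging", "prod"}
ALLOWED_TYPES = {"sg", "role", "queue", "bucket", "svc"}
NAME_PATTERN = re.compile(
    r"^(?P<env>[a-z]+)-(?P<service>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<type>[a-z]+)$"
)


def check_name(name: str) -> list[str]:
    """Return a list of problems with a resource name (empty if it conforms)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return [f"{name!r} does not follow <env>-<service>-<type>"]
    problems = []
    if match["env"] not in ALLOWED_ENVS:
        problems.append(f"unknown environment {match['env']!r}")
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown resource type {match['type']!r}")
    return problems


# Made-up resource names, for illustration only.
for candidate in ["prod-feed-sg", "prod-user-profile-role", "Prod_Feed_SG", "qa-feed-sg"]:
    issues = check_name(candidate)
    print(candidate, "OK" if not issues else issues)
```

A check like this could run in a CI/CD pipeline so that non-conforming names are caught before resources are created, which keeps documentation and audit trails consistent.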
Security played a key part in the team’s methodology: in cloud-component choices, platform design and operations. For example, a devsecops approach ensured CI/CD (continuous-integration, continuous-delivery) pipelines had appropriate security measures and auditability built in. Full rollout of Splunk, combined with Signal FX, supported not only infrastructure monitoring but also threat detection, prevention and incident response, with Signal FX providing the traceability and observability that’s essential in a complex microservices environment.
The company’s business model put its product — private social networks — at the heart of its go-to-market strategy; i.e. product experience driving user acquisition and usage, which in turn drives monetization through premium-feature adoption and ad revenues. A cornerstone of this “product-led-growth” strategy was effective customer-behaviour analytics that could provide decision-support to prioritise feature development and optimise the customer journey and advertising.
As Splunk was rolled out to support platform management, it quickly became clear to the commercial team how it could support their goals. Dashboards and reports were set up to show which user types were using a given network, and how and when they were using it.
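The kind of aggregation behind such a dashboard can be sketched in a few lines. This is a plain-Python stand-in rather than a Splunk search, and the event fields (`network`, `user_type`, `action`, `ts`) and sample records are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical usage events; field names and values are illustrative only.
EVENTS = [
    {"network": "alumni", "user_type": "member", "action": "post", "ts": "2021-03-01T09:15:00"},
    {"network": "alumni", "user_type": "member", "action": "post", "ts": "2021-03-01T09:40:00"},
    {"network": "alumni", "user_type": "admin", "action": "moderate", "ts": "2021-03-01T14:05:00"},
    {"network": "staff", "user_type": "member", "action": "comment", "ts": "2021-03-01T09:20:00"},
]


def usage_summary(events: list[dict], network: str) -> Counter:
    """Count events for one network by (user type, action, hour of day)."""
    counts: Counter = Counter()
    for event in events:
        if event["network"] != network:
            continue
        hour = datetime.fromisoformat(event["ts"]).hour
        counts[(event["user_type"], event["action"], hour)] += 1
    return counts


summary = usage_summary(EVENTS, "alumni")
for (user_type, action, hour), count in sorted(summary.items()):
    print(f"{user_type:>6} {action:<9} {hour:02d}:00  {count}")
```

Grouping by user type, action and hour answers the "who, how and when" questions in one pass; a real dashboard would apply the same grouping over the logged platform events.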
Enterprise, consumer
Technology security in line with ISO27001 and Cyber Essentials Plus
Splunk
Signal FX
Kubernetes
Terraform
Microservices
AWS EKS, EC2, RDS, S3, SQS, Cognito, Lambda
With platform reliability and performance stabilised, the infrastructure properly autoscaling, security-hardened and costing 75% less to run — and a clear line of sight into user behaviour — the business was primed for growth and fit for a bright future.