Most IT professionals are familiar with the typical cycle of IT projects, which usually run through ‘design’ and ‘build’ phases before moving into production in ‘run’.
At ProofID, our experience is that many organizations focus a lot of time and energy on the first two stages, ensuring that the system is architected appropriately and built to address all functional and non-functional requirements. Often, however, the challenges of the ‘run’ stage are not considered up front, leading to significant operational difficulties and unplanned costs as the service enters production. We have found this to be equally true for Ping Identity projects. In this series of blogs, we will explore the challenges faced by organizations running Ping Identity deployments in production, and outline some possible solutions to ensure organizations get the best out of their investment in the design and build stages of their deployment.
Ping Identity makes fantastic Identity and Access Management products, and one of their best characteristics is their stability in run – Ping servers failing or crashing is not a common problem. However, even with a stable platform, there are significant challenges involved with maintaining a complex enterprise platform such as PingFederate or PingAccess. Furthermore, the nature of these products means that any outage will lead to significant disruption, as users and customers will not be able to authenticate to key applications; in some industries, such outages can come with a significant price tag. So, what are the issues which can lead to problems in run?
Hard to find, niche skills
Identity and Access Management, whilst being a pervasive technology, requires a deep level of niche skills in order to manage and troubleshoot a platform effectively. Not only are vendor-specific skills required, but engineers also need a thorough understanding of relevant standards and protocols such as SAML, OpenID Connect and SCIM. Because IAM is an integration technology, faults that surface in the wider IAM ecosystem are often not caused by the IAM system itself; however, to isolate the problem, the issue must be tracked from source, which requires the ability to understand at the transaction level how authentication is processed. Such skills can be expensive and hard to find, and aren’t always factored into the total cost of ownership for an IAM platform.
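To give a flavour of what ‘transaction level’ troubleshooting can involve, the following is a minimal Python sketch that decodes a SAML response captured from an HTTP POST binding (where the message is base64-encoded XML) and pulls out the issuer, status code and destination. The function name and the sample values are illustrative assumptions; the sketch does not validate signatures and is intended only for inspecting trusted, captured traffic.

```python
import base64
import xml.etree.ElementTree as ET

# Standard SAML 2.0 XML namespaces.
NS = {
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
}


def summarize_saml_response(encoded):
    """Decode a base64 SAMLResponse (POST binding) and return key fields.

    Hypothetical helper for troubleshooting only: no signature or
    schema validation is performed.
    """
    root = ET.fromstring(base64.b64decode(encoded))
    issuer = root.find("saml:Issuer", NS)
    status = root.find("samlp:Status/samlp:StatusCode", NS)
    return {
        "issuer": issuer.text if issuer is not None else None,
        "status": status.get("Value") if status is not None else None,
        "destination": root.get("Destination"),
    }


if __name__ == "__main__":
    # Illustrative message; the URLs are placeholders.
    sample = (
        '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        'Destination="https://sp.example.com/acs">'
        "<saml:Issuer>https://idp.example.com</saml:Issuer>"
        "<samlp:Status><samlp:StatusCode "
        'Value="urn:oasis:names:tc:SAML:2.0:status:Responder"/></samlp:Status>'
        "</samlp:Response>"
    )
    print(summarize_saml_response(base64.b64encode(sample.encode()).decode()))
```

A status other than Success, or a destination that does not match the application’s expected endpoint, is often the first clue as to which side of the integration to investigate.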
Staging of configuration on the ‘route to live’
Most modern organizations maintain multiple replicated environments on the ‘route to live’. Typically, this will include one or more development environments, with configurations then being staged into pre-production and production environments. When a new connection is added into a Ping environment, for example to provide SSO to a new application, the configuration will typically be created and tested in the development environment, before being replicated and tested in pre-production and finally production.
The nature of the underlying protocols means that such configuration changes require many configuration steps – sometimes as many as fifty individual changes may be required to integrate a new application. If carried out manually, the configuration must be painstakingly documented prior to being staged through the environments – ProofID has experienced such documents stretching to over 70 pages for a single change. Not only is this highly inefficient, but there is a high chance of human error being introduced, either in production of the documentation or in its execution. Such errors will at the least lead to delays for troubleshooting, and at their worst could lead to an outage due to misconfiguration. Where outages interrupt business, that level of risk is not acceptable.
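One way to reduce reliance on lengthy manual documents is to compare configuration between environments programmatically. The sketch below is a generic Python illustration, not a description of any Ping Identity or ProofID tooling: it assumes each environment’s connection settings have already been exported as nested dictionaries (for example, loaded from JSON), and the field names shown are hypothetical.

```python
MISSING = "<missing>"  # marker for a setting absent from one environment


def flatten(config, prefix=""):
    """Flatten a nested dict into {'a.b.c': value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(source, target):
    """Return (path, source_value, target_value) for every differing setting."""
    src, tgt = flatten(source), flatten(target)
    differences = []
    for path in sorted(src.keys() | tgt.keys()):
        left, right = src.get(path, MISSING), tgt.get(path, MISSING)
        if left != right:
            differences.append((path, left, right))
    return differences


if __name__ == "__main__":
    # Hypothetical exports of the same connection from two environments.
    development = {"entityId": "app1", "sso": {"binding": "POST", "signAssertion": True}}
    pre_production = {"entityId": "app1", "sso": {"binding": "REDIRECT"}}
    for path, dev_value, pre_value in diff_configs(development, pre_production):
        print(f"{path}: development={dev_value!r} pre-production={pre_value!r}")
```

In practice, some differences between environments (hostnames, certificates) are expected, so a real comparison would need a way to exclude or map those values.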
Understanding how the IAM platform is performing is a key element in ensuring good service in run. Monitoring usage patterns and associated performance is essential to ensure fast response times and an optimal user experience. Additionally, identifying underlying error conditions which may not be immediately obvious can be priceless in terms of avoiding future issues and outages.
Out of the box, Ping Identity products provide a tremendous amount of performance and troubleshooting data; however, this data is spread across multiple log files which can be difficult to read, particularly in resilient and distributed clustered environments. Aggregating log files to a central database can help, but the data must still be analysed and interpreted, which can be challenging.
In many deployments, these difficulties mean that analysis of logs is something which is done ‘after the fact’ to understand the causes of an issue, rather than something which is done proactively to prevent occurrence of issues. To optimize the run experience, proactive analysis and monitoring of performance should be a core activity.
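As a simple illustration of proactive analysis, the Python sketch below counts ERROR entries per node per minute across log lines gathered from several cluster nodes, and flags any minute that reaches a threshold. The log line format (a timestamp followed by a level), the node names and the threshold are assumptions made for the example; they are not the actual format of any Ping Identity log file, and the pattern would need adapting to the real logs.

```python
import re
from collections import Counter

# Assumed line format: "YYYY-MM-DD HH:MM:SS,mmm LEVEL message..."
LINE_PATTERN = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\S*\s+(?P<level>[A-Z]+)\s"
)


def error_counts_per_minute(lines_by_node):
    """Count ERROR lines per (node, minute) across all nodes."""
    counts = Counter()
    for node, lines in lines_by_node.items():
        for line in lines:
            match = LINE_PATTERN.match(line)
            if match and match.group("level") == "ERROR":
                counts[(node, match.group("minute"))] += 1
    return counts


def find_spikes(counts, threshold):
    """Return the (node, minute) pairs whose error count reaches the threshold."""
    return sorted(key for key, count in counts.items() if count >= threshold)


if __name__ == "__main__":
    # Hypothetical log lines from two cluster nodes.
    logs = {
        "node1": [
            "2024-05-01 10:00:01,123 ERROR Connection to directory timed out",
            "2024-05-01 10:00:07,456 ERROR Connection to directory timed out",
            "2024-05-01 10:00:09,001 INFO Session created",
        ],
        "node2": ["2024-05-01 10:00:03,222 ERROR Connection to directory timed out"],
    }
    print(find_spikes(error_counts_per_minute(logs), threshold=2))
```

Run on a schedule against aggregated logs, a check of this kind can surface a rising error rate before it becomes a reported incident.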
Always on support
In modern, global enterprises, the IAM system becomes a central part of the organization’s fabric, processing authentication to all organizational applications and assets. Between workforce, customers and external users, authentication never stops, meaning that ‘always on’ support is required around the clock.
Even with platforms as stable as Ping Identity, issues and outages will occur, and as an integration platform, the IAM system will often surface issues first, even if the underlying cause lies elsewhere. For example – if Active Directory is unavailable, this may first become visible as users being unable to SSO into applications. This may turn into an incident in the middle of the night reported as ‘SSO is down’.
Having access to technical support whenever it is needed, with real understanding of the local deployment rather than just the technology, is a key requirement for enterprise IAM systems in run. However, sometimes this can be missed until it is needed.
In this blog, we have identified some of the common challenges facing enterprises as they manage Ping Identity technologies in ‘run’. From sourcing suitably skilled engineers to managing staging of configuration, there are many ways in which the quality of the service being offered can be compromised.
In the second blog in this series, we will focus on the ‘Route to Live’ with ProofID ConfigMigrator.