For the time being, most of the world’s artificial intelligence (AI) infrastructure is in the US. But Gcore hopes to change that – or at least move things in that direction. The company is working hard to advance European innovation in AI, at both the training and inferencing phases.
Support for training starts with the Gcore Generative AI Cluster, which was announced in October 2023. Powered by NVIDIA A100 and H100 Tensor Core GPUs, the new service is designed to accelerate model training for generative AI (GenAI) applications.
The company has aggressive plans to support inferencing on a very large scale beginning in 2024. This is where it expects some of its biggest growth. In the meantime, it’s rolling out new infrastructure.
Infrastructure and platform services around the globe
Gcore already has more than 150 points of presence internationally and 110 terabits per second of total network capacity.
“We provide the infrastructure and platform services – both cloud and AI edge – for our customers around the globe and help to provision their business and applications on a global scale,” says Seva Vayner, product director of edge, cloud and AI at Gcore.
“It begins with foundational infrastructure services, including bare metal compute and storage, virtual machines, load balancers and external storage. Now we’re seeing more and more customers using platform services. We provide managed Kubernetes with autoscaling and auto-healing.”
Although its services could be used by virtually any sector, certain types of applications – gaming among them – demand the high performance and reliability Gcore targets.
“We support hosting for various games and multiplayer server streaming, including cloud gaming, which is currently in high demand,” says Vayner. “Customers use our services across the full development life cycle – from staging and production to distributing games around the globe, along with analytics and replay. We are also working with telcos, especially with regard to content distribution and 5G services.”
The healthcare industry is also of growing importance to Gcore, especially telemedicine platforms and electronic health records. Gcore has fintech customers using its infrastructure services to run their payment platforms, and media companies running transcoding sessions in the cloud.
Most customers use applications that run on x86 architectures. But an increasing number of applications now require the Arm architecture, which provides higher performance for certain use cases.
“Several of our customers, in gaming and other industries, are asking us to provide the option of Arm instances,” says Vayner. “We have requests from customers who would like their infrastructure to run on virtual or bare metal instances built on Ampere’s Arm architecture.”
New horizons for AI, beginning in Europe
Gcore has already established AI clusters in Europe – in Luxembourg, Amsterdam, and Newport in Wales – and gained a foothold in the US with an AI cluster in Manassas, Virginia. The company has an aggressive roadmap to operate in more places and to offer more services. Later this year, Gcore plans to expand with an additional 128 servers powered by the latest NVIDIA H100 GPUs to further enhance its infrastructure.
But one of the big moves in the works is the launch of inference at the edge in the first quarter of 2024. The company will give customers the option to deploy their own pre-trained models at the edge for low latency, or to deploy ready-to-use open-source models such as Whisper or Llama, running on NVIDIA L40 GPUs around the globe. The L40 is designed specifically for inference.
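To make the idea of an edge inference node concrete, here is a minimal sketch of the pattern described above: a pre-trained model wrapped behind a small HTTP endpoint that an edge location could serve. The model here is a trivial placeholder function – in practice it would be an open-source model such as Whisper or Llama running on local GPU hardware – and all names and paths are illustrative, not Gcore’s actual API.

```python
# Sketch of an edge inference node: a pre-trained "model" exposed over HTTP.
# The model is a placeholder; a real node would load actual model weights.
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_model(text: str) -> str:
    """Placeholder for a real pre-trained model's forward pass."""
    return text.upper()  # stand-in "inference" result


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"output": run_model(payload["input"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep request logging quiet


def serve(port: int = 0) -> ThreadingHTTPServer:
    """Start the node in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would then POST JSON such as `{"input": "hello"}` to `/predict` on the nearest node and receive the model output back.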
“We will have inference nodes available in our edge environments in numerous countries,” says Vayner. “We expect to have up to 100 nodes around the globe in 2024, which will be connected through our smart routing and CDN [content delivery network]. Pre-trained models will be directly connected to the end user by transferring requests efficiently to the nearest inference node.
“Customers can easily deploy their pre-trained models and distribute them around the globe. Our service will automatically route the request to the nearest point of presence, based on the device and request type. We have more than 150 points of presence around the globe, which then send the requests through our network backbone to the nearest inference node.
“Let’s say we have a request from an end user in Osaka. The first connection to our CDN node will be in Osaka. Next, it will travel to the inference node in Tokyo with L40s or Ampere processors. Our inference node will process the request and send the answer back through our network backbone to the Osaka CDN node, which then passes it back to the end user. This arrangement ensures low latency. Ultimately, we will colocate the inference nodes with CDN nodes. This service will then be able to provide real-time interaction with the ML [machine learning] model.”
For training very high-density and high-load models, Gcore uses GPU clusters with InfiniBand. But for inference, Arm-based CPUs may be more in demand than GPUs, so the company offers Ampere processors to give customers a bigger choice.
“We would like to be an agnostic cloud provider where a customer can use x86 chips or Arm chips,” says Vayner. “We want to provide different types of processing units for different markets and different needs.”