As part of Tech Field Day's Cloud Field Day last week, I spent a whole day at the Google Cloud Moffett Place campus in Sunnyvale, CA, experiencing what I can only describe as drinking from an information firehose as our Googler hosts explained everything Google Cloud enables – file / block / object storage, application workloads, databases, virtual machines, and even bare metal computing.
Our gracious Googler host Bobby Allen brought together an impressive team of presenters and sessions that delved deeply into the underlying infrastructure needed to make the future demands of Generative AI a reality. One of his “Bobbyisms” helped me to keep focused during the in-depth sessions and discussions:
AI is not The Thing – AI Is the thing that makes The Thing Better.
In other words, GenAI is not really brand-new – after all, we’ve had machine learning for well over a decade, and LLMs have been around for some time as well. However, some things like prompt engineering, hallucinations, and the almost-human conversational capabilities of GenAI are quite recent developments.
So how does a major IT organization like Google build offerings for DevOps that combine neos and kainos (the Greek words for new in time and new in kind)?
Computing horsepower, in both bare metal and virtual flavors
For starters, Google Cloud offers a plethora of virtual and bare metal machines to provide the horsepower needed to run AI workloads along the DevOps spectrum, featuring Intel and AMD chipsets and various memory configurations for those demands.
Wait … What’s a TPU?
One interesting innovation that Google Cloud has brought to the GenAI party is the Tensor Processing Unit (TPU), a bundle of hardware components designed specifically to handle the rigors of AI computing demands.
Unlike GPUs, TPUs are custom-built accelerator chips, each paired with high-bandwidth memory, and tightly coupled together via extremely fast inter-chip interconnects (ICIs) within a three-dimensional “pod.” GenAI applications leverage a smaller 3D “cube” – a portion of a TPU pod – when they need to perform the complex mathematical processing at the heart of many GenAI algorithms. (You can find out more about the v5p TPU here.)
Corpus chunks take up a lot of space. Where do we put them?
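To picture what "tightly coupled in three dimensions" can mean, here is a minimal Python sketch that lists the direct neighbors of one chip in a 3D grid. It assumes a torus-style layout in which links wrap around at the edges; that wraparound, the 4x4x4 size, and the function name `torus_neighbors` are illustrative assumptions on my part, not details taken from the session.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(chip: Coord, shape: Coord) -> List[Coord]:
    """Return the six direct neighbors of `chip` in a 3D torus of size `shape`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(chip)
            # Wrap around at the edges so every chip has a link in each direction.
            coord[axis] = (coord[axis] + step) % shape[axis]
            neighbors.append(tuple(coord))
    return neighbors


# A hypothetical 4x4x4 "cube" of 64 chips (size chosen for illustration only).
cube = (4, 4, 4)
corner = torus_neighbors((0, 0, 0), cube)
```

In this toy model even a corner chip has six neighbors, because the grid wraps around in every dimension.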
The storage requirements for enterprise data needed to make GenAI practical are obviously demanding, especially for cloud-based databases where that data resides. The next generation of Google Cloud block storage – Hyperdisk – is designed for those demands and comes in three flavors depending on the workloads being facilitated.
AI Hypercomputer: The ties that bind
Powerful computing through GPUs and TPUs and optimized storage like Hyperdisk would be useless for GenAI workloads unless they’re able to communicate via ultra-fast, resilient networks. Google calls this architecture their AI Hypercomputer.
This is crucial for GenAI implementations because they often leverage Google Kubernetes Engine (GKE) to distribute workloads – whether it’s initial model training, testing, inference, fine-tuning, or even actual prompts and responses.
Google built a resilient network that can also intelligently offload workload demands throughout the architecture, so that those hungry GPUs are continually fed their required data inputs across multiple streams while ensuring they can recombine their results as transformation occurs on separate GKE nodes.
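The split, process, and recombine pattern described here can be sketched in a few lines of Python. This is only a single-machine analogy that uses threads in place of GKE nodes; the names `shard`, `transform`, and `run_sharded`, and the sum-of-squares workload, are hypothetical stand-ins rather than anything Google showed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def shard(items: List[int], num_shards: int) -> List[List[int]]:
    """Split items into num_shards shards, dealing them out round-robin."""
    return [items[i::num_shards] for i in range(num_shards)]


def transform(batch: List[int]) -> int:
    """Stand-in for the work one node does: sum of squares of its shard."""
    return sum(x * x for x in batch)


def run_sharded(items: List[int], num_workers: int = 4) -> int:
    """Process each shard on its own worker, then recombine the partial results."""
    shards = shard(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(transform, shards))
    # Recombination is a plain sum here because the stand-in workload is additive.
    return sum(partials)


total = run_sharded(list(range(10)))
```

The recombined result matches what a single worker would compute over the whole input, which is the property a real distributed job has to preserve as well.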
Not Just Verbal: Multi-Modal Inputs for AI
One of the most intriguing demonstrations at Google Cloud came from Neama Dadkhahnikoo, who showed off the latest Gemini multi-modal toolset. It can work with just about any type of input and output modality to leverage generative AI, and it delivers some pretty spectacular capabilities on this playing field.
An actual example: A chatbot was asked to scan a movie to find a specific event, and the result came back in just a few seconds.
Another demo showed a Googler walking around an office with Google Glasses on and asking the AI to perform inferences based on images in focus. One intriguing example: A whiteboard sketch showing a live cat, a dead cat, and an open box. When asked “What meme does this remind you of?” the chatbot almost immediately replied “Schrödinger’s Cat.”
What I found especially reassuring is that Google Cloud hasn’t ignored the concerns many of us old-school data engineers have about the underlying data we need to collect as part of our AI applications’ corpus and training data, including ownership of our data within their Cloud.
Building Chatbots With Google Cloud Run and LangChain
Tying all this infrastructure and these AI-empowering concepts together, Lisa Shen did a fantastic job demonstrating how Google Cloud Run enables rapid development and deployment of GenAI endpoints that end users can access to immediately leverage the application logic. (I’m actually in the midst of building an APEX-based chatbot within Oracle Cloud Infrastructure for a similar LangChain demo for an upcoming session at Oracle CloudWorld 2024, so this demo felt like familiar territory.)
Python is a favorite DevOps tool for interfacing with LangChain to build GenAI apps, of course, but sensitive in-house business data needs to be stored somewhere … so I was glad to see the shout-out for Cloud SQL as a repository.
With sufficient Google Cloud documentation as a pertinent corpus, this simple GenAI Cloud Run example endpoint was able to return respectable answers to several sample prompts during the demo … with coding time of just a few minutes.
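I don't have the demo code to reproduce, but the general shape of a corpus-backed chatbot can be sketched with nothing beyond the Python standard library. To be clear about the assumptions: this is not LangChain and not the code from the session. The keyword-overlap retrieval is a crude stand-in for a real vector search, the three `DOCS` entries are placeholder text, and `echo_llm` is a hypothetical stub where a real model call would go.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def answer(question: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve context, build the prompt, and hand it to the model callable."""
    return llm(build_prompt(question, retrieve(question, corpus)))


# Placeholder corpus; a real app would load documentation chunks from a database.
DOCS = [
    "Cloud Run runs containers and scales them with request traffic.",
    "Cloud SQL is a managed relational database service.",
    "Hyperdisk is block storage for Compute Engine.",
]


def echo_llm(prompt: str) -> str:
    """Hypothetical stub: returns the prompt instead of calling a real model."""
    return prompt


reply = answer("What is Cloud SQL?", DOCS, echo_llm)
```

Swapping `echo_llm` for a real model client and `retrieve` for a proper similarity search is where a framework like LangChain would take over.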
GenAI and Goodput
One thread that ran through almost every Google Cloud session about GenAI: the concept of goodput, roughly the share of a system's capacity that goes to useful work rather than to overhead, retries, or recovery from failures. With the complexity of the components that must work well together all the time to provide optimal results whenever a model is being built, trained, tested, and eventually used, it’s an interesting way of measuring just how well every aspect of a GenAI system is performing.
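One simple way to make goodput concrete is to treat it as useful time divided by total time. That definition is a simplification I'm assuming for illustration, and the 24-hour run and its loss categories below are made-up numbers, not figures from the sessions.

```python
def goodput(useful_seconds: float, total_seconds: float) -> float:
    """Fraction of elapsed time spent on useful work (assumed definition)."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return useful_seconds / total_seconds


# Hypothetical 24-hour training run with time lost to common interruptions.
total = 24 * 3600
lost = {
    "restart after failure": 1800,
    "checkpoint saves": 1200,
    "input pipeline stalls": 600,
}
useful = total - sum(lost.values())
ratio = goodput(useful, total)
```

Tracking each loss category separately shows where to look first when the ratio drops.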
Oh, One More Thing …
And a final perspective as a 20+ year Oracle DBA: Oracle has recently updated its licensing terms for database instances on GCP, so I’m intrigued to see how an adventurous IT shop might leverage Google Cloud’s infrastructure for their next GenAI DevOps efforts with an Oracle database or two under the covers.