
What this pattern does:

Serve a large language model (LLM) with GPUs in Google Kubernetes Engine (GKE). This design creates a GKE Standard cluster that uses multiple L4 GPUs and prepares the GKE infrastructure to serve either of the following models:
1. Falcon 40b
2. Llama 2 70b
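As a rough illustration of how a model server lands on the L4 GPU nodes, the sketch below builds a minimal Kubernetes Deployment manifest as a Python dict. It assumes the standard NVIDIA device-plugin resource name `nvidia.com/gpu` and the GKE node label `cloud.google.com/gke-accelerator: nvidia-l4`. The function name `llm_deployment`, the container name `model-server`, and the image reference are hypothetical placeholders, not part of this design.

```python
import json


def llm_deployment(name: str, image: str, gpu_count: int = 2) -> dict:
    """Build a minimal Deployment that schedules one model server on GKE L4 GPU nodes.

    `image` is a hypothetical placeholder for whatever LLM serving image you use.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # GKE labels GPU nodes with their accelerator type; pin the Pod to L4 nodes.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-l4"},
                    "containers": [
                        {
                            "name": "model-server",
                            "image": image,
                            # Request whole GPUs through the NVIDIA device plugin resource.
                            "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be saved and applied with `kubectl apply -f`.
    manifest = llm_deployment("llama-2-70b", "example.com/llm-server:latest")
    print(json.dumps(manifest, indent=2))
```

Keeping the GPU count as a parameter reflects that the number of GPUs depends on the model and its data format.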

Caveats and Considerations:

The number of GPUs required varies with the model's data format. In this design, each model uses two L4 GPUs.
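To see why the data format changes the GPU count, the back-of-the-envelope sketch below estimates the minimum number of L4 GPUs needed just to hold a model's weights. It assumes 24 GB of memory per L4 and a hypothetical 90% usable fraction; it ignores the KV cache, activations, and runtime overhead, so a real deployment may need more GPUs than it reports. The function name `min_gpus_for_weights` is illustrative.

```python
import math

L4_MEMORY_BYTES = 24e9  # NVIDIA L4: 24 GB of GPU memory


def min_gpus_for_weights(num_params: float, bytes_per_param: float,
                         usable_fraction: float = 0.9) -> int:
    """Lower bound on L4 GPUs needed to hold the weights alone.

    bytes_per_param encodes the data format, e.g. 2.0 for 16-bit, 0.5 for 4-bit.
    usable_fraction is an assumed headroom factor, not a measured value.
    """
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / (L4_MEMORY_BYTES * usable_fraction))


if __name__ == "__main__":
    for model, params in [("Falcon 40b", 40e9), ("Llama 2 70b", 70e9)]:
        for fmt, bpp in [("16-bit", 2.0), ("4-bit", 0.5)]:
            print(model, fmt, min_gpus_for_weights(params, bpp))
```

Under these assumptions, 16-bit Llama 2 70b weights alone would not fit on two L4s, while a 4-bit format would, which is the kind of trade-off the caveat above refers to.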

Compatibility:

Serve an LLM with multiple GPUs in GKE