Catalog Details
CATEGORY
deploymentCREATED BY
UPDATED AT
November 23, 2024VERSION
0.0.1
What this pattern does:
Serve a large language model (LLM) with GPUs in Google Kubernetes Engine (GKE) mode. Create a GKE Standard cluster that uses multiple L4 GPUs and prepares the GKE infrastructure to serve any of the following models: 1. Falcon 40b. 2. Llama 2 70b
Caveats and Consideration:
Depending on the data format of the model, the number of GPUs varies. In this design, each model uses two L4 GPUs.
Compatibility:
Recent Discussions with "meshery" Tag
- Nov 22 | Meshery CI Maintainer: Sangram Rath
- Dec 04 | Link Meshery Integrations and Github workflow or local code
- Nov 20 | Meshery Development Meeting | Nov 20th 2024
- Nov 10 | Error in "make server" and "make ui-server"
- Nov 11 | Difference in dev Environments on port 9081 and 3000
- Nov 10 | npm run lint:fix error
- Oct 30 | Getting Meshery locally using Docker Desktop for Meshery UI contribution
- Nov 07 | Meshery + GCP Connector
- Oct 24 | Getting error when using utils.SetupContextEnv() when writing tests for relationship command
- Nov 16 | Where's the Cortex Integration of Meshmap?