Launching Pienso Faun

Karthik Dinakar, Vlad Preoteasa

Rethinking smaller LLMs on dynamic, bare-metal infrastructure to deliver efficient, cost-effective, and fast AI workloads in enterprise settings.

It is well known in the deep learning community that most production-caliber workloads enterprises want don’t need gargantuan large language models. Algorithmic efficiencies, combined with the careful purposing of smaller LLMs, mean a garden of smaller models of various architectures can easily be more accurate, leaner, far less expensive, and faster on a variety of tasks, from summarization to nuanced classification. We’re glad there is increased awareness of how effective algorithmic efficiencies can be, especially after DeepSeek-R1 pierced public consciousness, but folks in the LLM engineering community have always known this.

A garden of smaller models can also run on relatively inexpensive compute, especially as inference becomes the primary point of consumption for enterprises leveraging generative AI. Folks may remember this from the hardware accelerator panel at our MIT summit last year, which included leaders from Graphcore, Groq, AMD, and Dell. A collection of smaller, more efficient Davids can resoundingly surpass a Goliath.

Today, in collaboration with our partners Dell, Liqid, and Nvidia, we are thrilled to launch Pienso Faun, which uses composable bare-metal infrastructure to support our vision: a garden of smaller models waiting to be sculpted by the non-technical enterprise user, carefully optimized and deployed for fast inference and robust quality. Compositionality is one of our guiding principles at Pienso.
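
To make the garden concrete, here is a minimal sketch of task-based routing across small, specialized models. Everything in it, from the model names and sizes to the routing function, is a hypothetical illustration of the pattern rather than Pienso’s actual catalog or API.

```python
# Hypothetical sketch: routing tasks to a "garden" of small, specialized models.
# Model names, sizes, and this interface are illustrative, not Pienso's API.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str      # illustrative model identifier
    task: str      # the task this model is purposed for
    vram_gb: int   # approximate memory footprint when loaded

GARDEN = [
    ModelSpec("summarizer-1b", "summarization", 4),
    ModelSpec("classifier-400m", "classification", 2),
    ModelSpec("extractor-3b", "extraction", 8),
]

def route(task: str) -> ModelSpec:
    """Pick the leanest model in the garden that handles the task."""
    candidates = [m for m in GARDEN if m.task == task]
    if not candidates:
        raise ValueError(f"no model in the garden handles task {task!r}")
    return min(candidates, key=lambda m: m.vram_gb)

print(route("classification").name)  # -> classifier-400m
```

The point is architectural: each task is served by the smallest adequate model, so accuracy and cost are tuned per task rather than per monolith.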

Imagine you’re building a LEGO house. You don’t start with a single, rigid structure that can never change. You use modular blocks that can be rearranged, expanded, or repurposed as your vision evolves. That is exactly how Pienso Faun is designed: composable infrastructure for developing and deploying AI models.

Traditional AI Infrastructure: A Fixed, Unchangeable LEGO Set

In traditional data centers, AI teams work with prebuilt, rigid infrastructure, like a LEGO set that’s glued together. If you need to modify your creation, you have to tear it down and rebuild it from scratch. Enterprises that require their own Gen AI regime on bare metal for compliance reasons almost always encounter three significant risks. First, over-provisioning: buying too many GPUs, CPUs, or storage units in advance, just in case they’re needed. Second, underutilization of hardware accelerators: expensive compute sitting idle when not in use. And third, slow scaling: adding more hardware takes weeks or months due to procurement and setup delays. This rigid approach limits how quickly LLMs can be consumed in production, forces slow catch-up in the fast-moving world of LLM innovation, and increases costs, just like trying to build a house with pre-glued LEGO pieces that can’t be adapted. Nobody wants sunk costs, and nobody wants to be told to tear down the entire house in order to update the kitchen.

A garden of smaller LLMs can work with a garden of hardware accelerators, and these accelerators need not be prohibitively expensive.

Composable Infrastructure: LEGO Blocks for LLMs

Now, imagine LEGO bricks that you can snap together however you want, reconfiguring your house on demand. Need a bigger skyscraper? Add more bricks. Need to repurpose parts of a bridge for a new building? No problem. This is how we’ve designed Pienso Faun. First, enterprises should be able to leverage dynamic gardens of models: instead of assigning fixed hardware accelerators to a static set of models, Pienso Faun can dynamically compose the exact mix of GPUs, CPUs, and memory needed, like selecting the right LEGO pieces for each project. Second, it should be easy to scale without costing an arm and a leg: if a new model in the garden requires more GPUs for fine-tuning, Pienso Faun can instantly add GPU bricks to the stack without shutting anything down. And third, resources should be used optimally: workloads with different GPU requirements can access compute in a manner that maximizes utilization. With this kind of LEGO-style modularity, illustrated in the sketch below, Pienso Faun can train, fine-tune, and deploy LLMs faster while reducing infrastructure costs.
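
Here is a minimal sketch of that composition loop, assuming a simplified shared pool. The Pool class below is a hypothetical stand-in for what Liqid Matrix does in hardware; it is not an actual Pienso or Liqid interface.

```python
# Hypothetical sketch of LEGO-style resource composition. The Pool API is
# illustrative; in Pienso Faun, Liqid Matrix performs this role in hardware.
class Pool:
    def __init__(self, gpus: int, cpus: int, mem_gb: int):
        self.free = {"gpu": gpus, "cpu": cpus, "mem_gb": mem_gb}

    def compose(self, gpu: int, cpu: int, mem_gb: int) -> dict:
        """Carve out exactly the resources a workload needs, if available."""
        want = {"gpu": gpu, "cpu": cpu, "mem_gb": mem_gb}
        if any(self.free[k] < v for k, v in want.items()):
            raise RuntimeError("insufficient free resources in the pool")
        for k, v in want.items():
            self.free[k] -= v
        return want

    def release(self, grant: dict) -> None:
        """Return a finished workload's bricks to the shared pool."""
        for k, v in grant.items():
            self.free[k] += v

pool = Pool(gpus=8, cpus=64, mem_gb=512)
finetune = pool.compose(gpu=4, cpu=16, mem_gb=128)  # burst for fine-tuning
serving = pool.compose(gpu=1, cpu=4, mem_gb=16)     # lightweight inference
pool.release(finetune)  # fine-tuning done: its GPU bricks are free again
```

Because grants are carved from one shared pool and returned when a job finishes, the same bricks can serve fine-tuning one hour and inference the next, which is precisely what fixed per-model hardware assignments cannot do.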

NVIDIA's GPU evolution lends itself to a LEGO-like modular approach, where each new architecture (Volta, Turing, Ampere, Ada Lovelace, Blackwell) represents a stronger, more efficient building block. With Composable Infrastructure, Pienso Faun can dynamically integrate and swap these GPU "bricks" to optimize workloads, ensuring seamless scalability and performance without tearing down the entire hardware stack.

Ok, let’s talk about the bricks. Pienso Faun is designed as an ultra-efficient, modular AI stack that brings together best-in-class compute, storage, and acceleration, all reconfigurable on demand. Faun has two Dell PowerEdge R760 servers, the foundation of its compute and storage infrastructure. These 2U, dual-socket powerhouses not only handle Pienso’s single-node Kubernetes deployment with precision but also serve as high-performance storage nodes, eliminating the need for separate storage appliances. To maximize data throughput, each server is equipped with a Liqid Element LQD4500 PCIe Add-in-Card (AIC), better known as the “Honey Badger.” These storage bricks are optimized for extreme bandwidth and low-latency data movement, ensuring that models get the data they need, exactly when they need it.


Now, onto acceleration. Pienso Faun is powered by a garden of GPUs spanning different architectures: for example, Nvidia Ada Lovelace GPUs, with a balanced mix of low-power L4 GPUs for lightweight inference tasks and high-memory L40 GPUs for more intensive workloads. These accelerator bricks are dynamically aggregated via a Liqid SmartStack 10, a modular 4U Liqid EX-4410 chassis paired with a 1U Liqid Director. This system, powered by Liqid Matrix software, allows Pienso Faun to instantly assign the optimal mix of GPUs to any given task, whether fine-tuning a small language model or running multi-modal generative inference.

Pienso Faun operates within a streamlined 9U footprint, a testament to how compact, powerful, and purpose-built AI hardware can be when designed for flexibility. But its real strength lies in its composability: no single component is locked into a rigid setup. Enterprises can adapt, expand, or swap hardware seamlessly as AI models and workloads evolve.
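
As a rough illustration of that assignment logic, the sketch below matches a workload to the cheapest GPU brick that fits it. The memory figures are Nvidia’s published specs (L4: 24 GB, L40: 48 GB), but the cost ratios and the selection function are illustrative assumptions, not Liqid Matrix’s actual placement policy.

```python
# Hypothetical sketch: choose the cheapest GPU "brick" that fits a workload.
# VRAM figures are public specs; relative costs are illustrative only.
GPUS = {
    "L4":  {"vram_gb": 24, "relative_cost": 1.0},  # low-power inference brick
    "L40": {"vram_gb": 48, "relative_cost": 2.5},  # high-memory workload brick
}

def pick_gpu(required_vram_gb: int) -> str:
    """Pick the lowest-cost GPU with enough memory for the workload."""
    fitting = {name: spec for name, spec in GPUS.items()
               if spec["vram_gb"] >= required_vram_gb}
    if not fitting:
        raise ValueError("workload exceeds the largest available GPU brick")
    return min(fitting, key=lambda name: fitting[name]["relative_cost"])

print(pick_gpu(10))  # -> L4: a small model serving request
print(pick_gpu(40))  # -> L40: a memory-hungry fine-tuning job
```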

A Future-Proof AI Playground

The dazzling pace at which new hardware accelerator architectures arrive, each supporting new algorithmic efficiencies, forces enterprises to plan for end-of-life constraints on their accelerators. But this shouldn’t require an entirely new bare-metal setup, which can be prohibitively expensive. In other words, LEGO sets evolve: today’s housescape might become tomorrow’s space station. Similarly, AI accelerators constantly improve. With composable infrastructure, Pienso Faun doesn’t have to replace entire servers when upgrading hardware. Instead, it can swap out older GPUs for newer AI accelerators seamlessly, like replacing old LEGO blocks with advanced ones. We also want to support multiple AI architectures (LLMs, vision models, and more) without rebuilding the entire bare-metal infrastructure, adapting to customer needs dynamically, whether they require small, fast models or large, complex AI applications.

By embracing composable infrastructure, Pienso Faun is designed to give enterprises that want their own bare-metal regime the agility, scalability, and efficiency of a LEGO Master Builder. Just as LEGO lets you build whatever you imagine, Pienso’s AI platform, powered by composable infrastructure, lets enterprises create, experiment, and deploy AI models without breaking the bank. You can now compose your own bare-metal regime for GenAI based on your actual needs and workloads.

Acknowledgments

NVIDIA AI Startups
Dell OEM Solutions
Liqid