At HPE Discover 2026, the NVIDIA RTX PRO 4500 Blackwell Server Edition is the GPU that is interesting precisely because it is not a flagship. It packs 32GB of GDDR7 and 10,496 Blackwell CUDA cores into a single-slot, passively cooled card that draws just 165 watts, and HPE showed it next to the ProLiant DL380 Gen12 server it is built to slot into.
I found it on a riser on the HPE stand, a plain dark slab of a card sitting in front of an open ProLiant Compute DL380 Gen12. Most of the GPUs people photograph at a show like this are 600-watt-plus parts with their own cooling towers. This one is the opposite. It is single-slot, it has no fan of its own, and it is the declared successor to the NVIDIA L4. I run GPUs for AI and ML work, and this is the card on the stand I would actually take home.
Key Takeaways
- The RTX PRO 4500 Blackwell Server Edition is a single-slot, passively cooled, 165W GPU with 32GB of GDDR7 ECC, positioned as the successor to the NVIDIA L4.
- It runs the GB203 Blackwell chip with 10,496 CUDA cores and supports up to two 16GB MIG partitions.
- HPE showed it alongside the ProLiant Compute DL380 Gen12, the kind of standard 2U server it is meant to drop into.
- The whole pitch is density: a card you can rack without redesigning power and cooling, not a 600W flagship.
Specs at a Glance
- Product: NVIDIA RTX PRO 4500 Blackwell Server Edition
- GPU: GB203 Blackwell, 10,496 CUDA cores, 82 RT cores
- Memory: 32 GB GDDR7 ECC, 256-bit, 800 GB/s
- Power: 165 W, single PCIe 16-pin connector
- Form factor: single-slot, full-height, full-length, 4.4 in by 10.5 in
- Cooling: passive, requires server airflow
- Interconnect: PCIe Gen5 x16
- Multi-Instance GPU: up to 2 partitions at 16 GB
- Media and precision: 3x NVENC, 3x NVDEC, FP4 support, 5th-gen Tensor Cores
- Display outputs: none (server card)
- Shown with: HPE ProLiant Compute DL380 Gen12
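To put the memory figures in perspective, here is a back-of-the-envelope sketch in Python. It derives the per-pin data rate implied by a 256-bit bus at 800 GB/s, and a rough ceiling on single-stream decode speed under a simplifying assumption: each generated token reads every weight byte once, with KV-cache traffic ignored. The function names and example model sizes are hypothetical, chosen for illustration; treat the output as a bound, not a benchmark.

```python
# Back-of-the-envelope arithmetic from the spec list.
# Assumption: bandwidth-bound decode at batch size 1 streams every weight byte
# once per generated token (KV-cache traffic ignored), so this is a ceiling.

BUS_WIDTH_BITS = 256   # from the spec list
BANDWIDTH_GB_S = 800   # from the spec list, decimal GB/s


def per_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Data rate per memory I/O pin implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def decode_ceiling_tokens_s(weights_gb: float,
                            bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on single-stream tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    rate = per_pin_rate_gbps(BANDWIDTH_GB_S, BUS_WIDTH_BITS)
    print(f"implied per-pin rate: {rate:.1f} Gbps")
    # Hypothetical example models: (label, weight footprint in GB)
    for name, gb in [("8B @ FP16", 16.0), ("8B @ FP8", 8.0), ("32B @ 4-bit", 16.0)]:
        print(f"{name:>12}: <= {decode_ceiling_tokens_s(gb):.0f} tokens/s")
```

Real throughput lands below these numbers, but the ratio explains why 800 GB/s matters as much as the 32GB capacity for an inference card.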
What HPE Put on the Stand
The card sat in front of an open ProLiant DL380 Gen12, which is the detail that matters.

A passive GPU brings no cooling of its own. The fan row visible in that server is what keeps this card alive, so the GPU and the host are really one thermal decision, not two.
One 16-Pin Connector
Stand it on end and the power story is refreshingly dull.

A single 16-pin PCIe connector feeds the card, because at 165 watts there is nothing exotic to feed. No bank of 8-pins, no 600-watt cable to find room for behind the card.
Single Slot, No Display
The bracket tells you who this card is for.

Full-height, single-slot, vented, with no display outputs at all. The workstation version of this card uses a dual-slot blower and keeps its display ports. This one drops both so it can pack tighter and lean on the chassis for air.
What I Could Not Confirm
The card was on a display riser, not seated in the DL380, so I could not confirm how many of these HPE qualifies per DL380 Gen12, or how they sit relative to the drive bays and the fan wall once installed. I also could not see the host airflow path with a card actually in the slot.
Final Words
The number that decides everything here is 32GB. That is comfortable for inference, computer vision, data science, and fine-tuning of small and mid-size models. A 70B-class model will not fit: its weights alone come to roughly 140GB at FP16 and still about 35GB at 4-bit, so it needs more aggressive quantization or splitting across cards. What you do get is MIG, so you can carve the card into two 16GB instances and run two models or two tenants at once. This is an inference and data-science card, and it is honest about that.
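The fit question is simple multiplication, sketched below in Python. It counts weights only, assumes 2 bytes per parameter for FP16, 1 for FP8, and 0.5 for 4-bit, and ignores KV cache, activations, and runtime overhead, so anything near the limit needs real headroom. The helper names are hypothetical, not part of any NVIDIA tool.

```python
# Rough weight-memory check against the full 32 GB card and one 16 GB MIG slice.
# Assumptions: weights only (no KV cache, activations, or runtime overhead),
# decimal GB, and the bytes-per-parameter values below.

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "4-bit": 0.5}
CARD_GB = 32.0
MIG_SLICE_GB = 16.0


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB: billions of parameters times bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, budget_gb: float) -> bool:
    """True if the weights alone fit in budget_gb (real deployments need headroom)."""
    return weights_gb(params_billion, precision) <= budget_gb


if __name__ == "__main__":
    for params in (8, 14, 32, 70):
        for prec in BYTES_PER_PARAM:
            gb = weights_gb(params, prec)
            on_card = "yes" if fits(params, prec, CARD_GB) else "no"
            on_slice = "yes" if fits(params, prec, MIG_SLICE_GB) else "no"
            print(f"{params:>3}B {prec:>5}: {gb:6.1f} GB  "
                  f"card={on_card:<3}  MIG slice={on_slice}")
```

The 70B row at 4-bit comes out at 35GB, just over the card, which is why that class of model means splitting or squeezing harder.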
I work with GPUs for AI and ML, so this is one of the few things at this show I have a real opinion about from use rather than from a spec sheet. The spec I keep coming back to is not the core count. It is the 165 watts and the single slot. That combination is what lets you put several of these in a server you already buy, instead of redesigning a rack around one large accelerator.
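The density argument is plain arithmetic. The Python sketch below counts how many cards of a given board power fit under a fixed GPU power allowance. The 700 W budget is a hypothetical number picked for illustration, not an HPE figure for the DL380 Gen12, and the sketch ignores slot count, airflow, and what HPE actually qualifies.

```python
# Hypothetical power-budget comparison: how many cards of each board power fit
# under a fixed GPU power allowance. The 700 W budget is an illustrative
# assumption, not an HPE specification; slots, airflow, and qualification
# also limit real configurations.

def cards_within_budget(budget_w: float, card_w: float) -> int:
    """Whole cards whose combined board power stays within budget_w."""
    return int(budget_w // card_w)


if __name__ == "__main__":
    budget = 700.0  # hypothetical GPU power allowance in watts
    for label, watts in [("RTX PRO 4500 Server Edition", 165.0),
                         ("600 W-class flagship", 600.0)]:
        n = cards_within_budget(budget, watts)
        print(f"{label}: {n} card(s), {n * watts:.0f} W of {budget:.0f} W")
```

Under that assumed allowance, four 165W cards draw 660W in total, where a single 600W-class part already uses almost all of it.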
That is the whole pitch. The card carries no cooling of its own, so the host’s fans do the work, which means deployability is a question about the server, not the GPU. The single 16-pin keeps the power cabling trivial. NVIDIA built a card that is deliberately boring to deploy, and for anyone who has to rack and power inference nodes, boring is the highest compliment.
So the question this card puts to you is not how fast one GPU can go. It is simpler: if your workload fits in 32GB, why pay the power and cooling bill for anything bigger?