Source: Zimu AI
Recently, $NVIDIA (NVDA.US)$ CEO Jensen Huang was interviewed by renowned tech host Dwarkesh Patel in a session lasting 1 hour and 45 minutes.
This YouTube host, known for his progressive questioning style and sharp topic selection, stayed true to form. He opened the interview by going straight at NVIDIA's vulnerabilities, asking Huang: 'If software becomes commoditized, will NVIDIA also face the same fate?'
Some netizens commented on the interview, saying it was rare to see Jensen Huang appear so 'agitated.'

In this interview, Jensen Huang provided in-depth answers on the nature of competition in the AI era, how NVIDIA secures its supply chain, why it does not venture into AI cloud services, and issues related to China and AI chips.
We have summarized the key highlights of this interview below.

NVIDIA's true competitive advantage is not its chips, but its supply chain.
Dwarkesh: There is currently a view in the market that AI will gradually commoditize software, causing valuations of many software companies to decline as a result.
From a potentially naive perspective: you outsource designs to foundries, such as handing over GDS files to $Taiwan Semiconductor (TSM.US)$, which then manufactures the chips. These are subsequently packaged with HBM produced by companies like SK Hynix, Micron, and Samsung, and finally sent to ODMs for assembly into complete systems.
So essentially, what NVIDIA does is software, just manufactured by others. If software becomes commoditized, will NVIDIA be commoditized as well?
Jensen Huang: Ultimately, there is one thing that must happen: turning electrons into Tokens. Not only turning them into Tokens, but also making these Tokens increasingly valuable. I believe this is very difficult to fully commoditize.
The process of converting electrons into Tokens is itself an extremely complex journey, akin to making one molecule more valuable than another. It involves a great deal of art, engineering, science, and invention. We are witnessing this unfold before our eyes, and the process is far from being fully understood or complete. Therefore, I do not think it will be commoditized — although we will make the process more efficient.
If you understand NVIDIA in this way, it is already close to my mental model for the company: the input is electrons, the output is Tokens, and NVIDIA is what happens in between. Our job is to achieve this transformation while doing as little as possible ourselves and pushing our capabilities to the extreme.
By 'as little as possible,' I mean we avoid doing anything unnecessary. We delegate those tasks to our partners, making them part of the broader ecosystem. If you look at NVIDIA today, we have an extensive cooperative ecosystem both upstream and downstream. Therefore, we aim to do as little as possible, but the parts we must handle are extremely challenging. I do not think these parts will be commoditized.
Dwarkesh: What about enterprise software companies? Many people believe they will be disrupted by AI.
Jensen Huang: Many software companies today are essentially 'toolmakers.' For example, Excel, PowerPoint, or companies like Cadence and Synopsys. Of course, some are process systems, but a significant portion are tools. However, the trend I see is quite the opposite: the number of Agents will grow exponentially, and so will the number of users utilizing these tools. In other words, the usage of tools will skyrocket.
Take Synopsys Design Compiler as an example. Its actual usage is very likely to surge in the future. Today, we are constrained by the number of engineers, but in the future, each engineer will be assisted by a multitude of Agents. These Agents will explore design spaces to a degree far beyond what is possible today, and they will rely on the very tools we have now.
Therefore, I believe the increased usage of tools will drive the growth of these software companies rather than overwhelm them. The reason this has not yet happened is that Agents are not yet adept at using tools. Two things will happen next: either these software companies will develop their own Agents, or Agents will become powerful enough to use these tools. Ultimately, both scenarios are likely to occur simultaneously.
Dwarkesh: In your recent earnings report, I noticed that NVIDIA has nearly $100 billion in purchase commitments, and some analyses suggest this figure could reach $250 billion. One interpretation is this: NVIDIA's moat lies in securing scarce resources critical for the next few years, such as wafers, memory, and packaging. Even if others have chip design capabilities, they may not be able to secure these resources. Is this your core advantage for the coming years?
Jensen Huang: This is something we can do that others find difficult. The reason is simple: we can make such large commitments upstream because we have the ability to purchase this capacity and sell it effectively.
Some of these commitments are explicit, such as procurement contracts we sign directly; but many are implicit, like when our supply chain partners make investments on their own. This is because I communicate with their CEOs, explaining to them how large the industry will grow, why it will grow, and how we arrived at these conclusions. In this process, I am essentially aligning the entire upstream ecosystem’s understanding.
So why are they willing to invest for us rather than for others? The reason is simple: they know that I have the ability to purchase their production capacity and sell it downstream. NVIDIA's downstream demand is enormous, and it is precisely this scale of downstream demand that motivates them to invest upstream.
If you look at the GTC conference, you’ll find many people are astonished by its scale. It is a 360-degree AI ecosystem where everyone in the AI world gathers in one place because they need each other. The upstream needs to see the downstream, and the downstream needs to see the upstream. They also need to witness the development of AI, as well as those AI-native companies and ventures.
I spend a great deal of time continuously informing and influencing our supply chain and ecosystem partners, helping them understand what the opportunity is, why it will happen, when it will happen, and how large the scale will be. Many people feel that my Keynote is somewhat like a lecture, or even a bit torturous, but this is intentional. Because I need the entire ecosystem to understand the future as I do.
No single bottleneck will last more than two to three years.
Dwarkesh: I would like to better understand a specific question: can the upstream supply chain really keep up? Your revenue has basically doubled over the past few years while the amount of computing power provided to the world has also grown significantly. The scale is already quite large now — for instance, you are one of the largest customers of Taiwan Semiconductor’s advanced processes (N3, N2), and some analyses suggest that AI could account for 60% of N3 capacity this year and possibly over 80% next year.
So here’s the question: if you are already the largest customer, how can you continue to 'double'? Will this growth be constrained by the upstream supply?
Jensen Huang: At any given point in time, it is normal for demand to exceed supply — in fact, this is a good state. You want an industry’s immediate demand to be greater than total supply. Of course, there may be bottlenecks at certain specific points, such as sometimes being held back by 'plumbers' (laugh).
Dwarkesh: Plumbers?
Jensen Huang: Yes, really. At some point, you might be restricted by a completely unexpected link. But this is not necessarily a bad thing, because when a bottleneck appears, the entire industry will quickly 'converge' on solving it.
For instance: A few years ago, everyone was discussing CoWoS, but now you rarely hear people talking about it. This is because over the past two years, the entire industry has made extreme investments and continuously expanded production capacity, and the issue has been largely resolved. Taiwan Semiconductor now recognizes that the supply of CoWoS must scale in tandem with the demand for logic chips and memory. Previously, CoWoS and HBM were considered 'specialized technologies,' but they are no longer; they have become a part of mainstream computing.
We are now more capable than ever of influencing a broader range of supply chains. What I'm saying now, I've actually been saying for five years. Some companies believed and invested back then, like Micron. I remember the meeting with their CEO at the time, where I clearly told them what would happen in the future and why. They truly invested deeply. From LPDDR to HBM, they made substantial investments, and the results have been excellent. Some companies caught up later, but now almost everyone is on the same page.
So my view is this: No bottleneck will last more than two to three years; each generation will bring new bottlenecks, but they will all be resolved. People are now starting to anticipate these bottlenecks years in advance and investing ahead of time. For example, silicon photonics, new packaging technologies, new testing equipment... Over the past few years, we’ve done a lot of this work, essentially 'rebuilding the supply chain' to prepare it for future scale.
Dwarkesh: It sounds like some bottlenecks are easier to scale than others. For example, CoWoS and packaging can be scaled, but other things, like manufacturing capacity itself, may be harder.
Jensen Huang: I already mentioned the hardest one earlier: plumbers and electricians. Those are the most difficult to scale.
This is also why I’m concerned about the many doomsayers who claim 'AI will end jobs.' If we discourage young people from becoming software engineers, we’ll face a shortage of them in the future. Ten years ago, some said, 'Don’t become a radiologist,' claiming it was the first profession AI would replace. Videos making those claims are still online today. And what happened? We now have a shortage of radiologists.
Dwarkesh: Returning to the earlier question about manufacturing capacity, how do you double wafer production every year? How do you double EUV machines annually?
Jensen Huang: These aren't problems; they can all be scaled within two to three years. The only key factor is the demand signal. Once there's a clear demand, if you can make one unit, you can make ten; if you can make ten, you can make a million. These things are not difficult to replicate.
Dwarkesh: So would you directly tell ASML, 'We need more EUV machines in the next three years'?
Jensen Huang: Some actions I take directly, while others are indirect. If I can convince Taiwan Semiconductor, then ASML will naturally follow. The key is identifying the real critical nodes. But none of these concern me — what truly worries me is energy. You cannot build a new industry without energy. Whether you want to rebuild manufacturing, build AI factories, produce electric vehicles, or develop robots, all require energy, and energy is a long-term issue. In comparison, chip capacity and packaging are just two-to-three-year challenges.
Dwarkesh: I have heard some completely opposite statements, so I am not sure who to believe (laugh).
Jensen Huang: You are now talking to an expert (laugh).
AI is only a part of computing, and computing encompasses far more than just AI.
Dwarkesh: I would like to return to the issue of competition, such as Google TPU. A significant portion of the strongest models in the world today are trained on TPUs. What does this mean for NVIDIA?
Jensen Huang: What we do is fundamentally different. NVIDIA focuses on accelerated computing, not just a 'tensor processing unit.' Accelerated computing is used across a wide range of fields, such as molecular dynamics, data processing, fluid mechanics... and, of course, AI.
AI is currently the hottest topic of discussion, but computing extends far beyond AI. We are reinventing the way computing works, transitioning from general-purpose computing to accelerated computing. Our scope is much broader than any TPU because we can accelerate all applications.
Dwarkesh: But realistically, the majority of your revenue today still comes from AI, not from drug discovery or quantum computing. Many people believe that the core of AI computation is matrix multiplication. TPUs are highly optimized for this, while GPUs are more versatile. So the question is, for this wave of AI demand, are TPUs more suitable?
Jensen Huang: Matrix multiplication is indeed important, but it is not everything. If you want to invent new attention mechanisms, decompose computations in different ways, or design entirely new model architectures, such as hybrid SSM models that combine diffusion and autoregressive models... what you need is a fully programmable system.
The core of AI progress lies in algorithms. Moore's Law improves performance by about 25% annually, but we are achieving 10x or even 100x improvements. These advancements come from new algorithms, new model architectures, and new ways of computing. Without programmability, you wouldn't even know where to begin with these innovations.
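Huang's programmability argument can be made concrete: on a fully programmable system, a "new attention mechanism" is often just a different mask or scoring rule over the same primitives. The NumPy sketch below is an editorial illustration, not NVIDIA code; real implementations run as fused GPU kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Standard scaled dot-product attention; the mask is the only
    # thing that changes between the variants below.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # suppress disallowed positions
    return softmax(scores) @ v

def sliding_window_mask(n, w):
    # Each query attends only to keys within distance w: a "new
    # attention mechanism" expressed as a different mask, nothing more.
    i = np.arange(n)
    return np.abs(i[:, None] - i[None, :]) <= w

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
full = attention(q, k, v)                              # dense attention
local = attention(q, k, v, sliding_window_mask(8, 2))  # windowed variant
print(full.shape, local.shape)  # (8, 16) (8, 16)
```

On fixed-function hardware each such variant would need new silicon; on a programmable system it is a few changed lines.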
Dwarkesh: Let’s discuss a more practical issue. Your clients, such as Amazon, Google, and Microsoft, have the capability to write their own kernels and even develop their own software stacks. So, is CUDA still important?
Jensen Huang: CUDA is a highly robust ecosystem. If you are developing a system, it is very reasonable to start with CUDA. We support all frameworks, and if you need to write custom kernels, that is also possible. We have even invested significant technology into Triton.
However, you must consider this: when the system encounters an issue, is it your code that is problematic, or is it the underlying system? Naturally, you would prefer the issue to be on your end. The value of CUDA lies in the fact that you can trust the underlying infrastructure.
Another key point is the install base. As a developer, what you want most is for your software to run on a large number of machines. We now have hundreds of millions of GPUs across all cloud platforms, in various models and scales. This means that once you develop your solution, it can run globally.
Dwarkesh: But if your primary customers are these mega-companies, they can optimize for their own systems and even support multiple hardware platforms. Does your advantage still hold?
Jensen Huang: We have a large number of engineers working directly with these AI companies. You must understand that a GPU is not the same as a CPU: a CPU is like a cruiser that anyone can drive, but a GPU is more like an F1 race car: you can drive it, but reaching its full potential requires specialized expertise. We use extensive AI to optimize kernels, and often, after we help our customers optimize, performance improves by two times, sometimes three times... even a 50% improvement is substantial. And for a company with massive computing clusters, improved performance directly translates to increased revenue.
Dwarkesh: If these customers can perform their own optimizations, will competition then shift to who offers lower prices and higher performance?
Jensen Huang: We have a large team of engineers within these AI labs helping them optimize. No one understands our architecture better than we do. A GPU is not as universally applicable as a CPU; it is more complex, and we can help them extract twice the performance from their systems. Moreover, our systems offer the best TCO (Total Cost of Ownership) across the entire industry. No company can demonstrate better TCO than us, whether in training or inference.
Dwarkesh: But there are still companies using other solutions. For example, Anthropic recently announced collaborations with Broadcom and Google, with much of the computation taking place on TPUs.
Jensen Huang: That is a very special case. Without Anthropic, TPU growth would be almost non-existent; they are an extreme case.
Dwarkesh: But OpenAI is also collaborating with AMD and even developing its own chips.
Jensen Huang: But the vast majority of their computing still happens on NVIDIA, and we will continue to collaborate with them. I don't mind if others try alternative solutions. If they don't try, how would they know how good we are? We must continuously prove ourselves.
If you look at the history of ASIC projects, you will see how many of them ended up discontinued or canceled. Building a system better than NVIDIA is not easy.
Dwarkesh: Their logic is that they don't need to be better than you, just not too far behind while offering lower costs.
Jensen Huang: The profit margin for ASICs is actually also quite high, around 65%, while NVIDIA's is 70%. The gap is not as large as you might think.
Dwarkesh: That brings us back to a question: Why didn't NVIDIA invest in these AI companies earlier?
Jensen Huang: We acted when we could. Earlier on, we did not have the capacity to make investments at that scale (billions of dollars); it was not our model at the time.
Moreover, I was not aware at the time that these companies had no other financing options—I assumed they could seek VC funding like ordinary companies. But later, I realized that what they were trying to achieve was beyond what VCs could invest in. That was my miscalculation.
Dwarkesh: Now you have a significant amount of cash. Why not build your own cloud services and become a company like AWS?
Jensen Huang: This is our company philosophy: Do only what must be done and avoid doing anything extra. If there is something we don't do, and the world won't have it as a result, then we must do it. But for the cloud, if we don't do it, many other companies will. So we don't.
On China's approach: Algorithms are the key
Dwarkesh: Many analyses suggest that China lags behind in advanced process technologies. For instance, much of their production is still at 7nm, and they lack EUV equipment. In terms of computing power, some estimates indicate that they possess only about one-tenth of the United States’ capacity, with a gap in HBM bandwidth that may approach an order of magnitude. Does this imply that the U.S. could achieve these capabilities first, deploy them earlier, patch vulnerabilities sooner, and thus gain a security advantage?
Jensen Huang: For that logic to hold, you would have to assume that they lack computing power, but that is not the reality. They already have substantial computing power. China is the world’s second-largest computing market, and if they choose to concentrate resources, they can certainly aggregate sufficient computing power.
Dwarkesh: But they still lag behind in areas like bandwidth and memory.
Jensen Huang: Then they will use more chips. AI, at its core, is a parallel computing problem. If you have enough energy, you can compensate for the gap by using more nodes. They have abundant energy, numerous data centers already built, and some even sitting idle. They can use more chips to piece the system together, and their semiconductor manufacturing capabilities are already quite strong. Therefore, the claim that 'they lack AI chips' is incorrect.
Dwarkesh: But there is indeed a gap in advanced chips, such as a bandwidth disparity that might approach an order of magnitude.
Jensen Huang: Then they will connect more nodes. Just as we use NVLink, they are already doing similar things. For example, Huawei has been connecting large numbers of computing nodes into a single system, linking vast computational resources through technologies like silicon photonics. So if you only look at individual chips, you will underestimate the entire system.
And don’t forget, algorithms are the key. They have a large number of excellent researchers, which is their biggest advantage. Much of the progress in AI comes from algorithms rather than hardware. If you are constrained in computing power, you are forced to develop better algorithms, such as DeepSeek — this is not an insignificant advancement. It represents a capability: the ability to create very powerful models despite limitations in computing power.
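The "use more chips to piece the system together" argument is back-of-envelope arithmetic. The sketch below uses made-up figures purely to show the shape of the trade-off: the gap moves from the individual chip to energy and interconnect, which is why Huang stresses that China's abundant energy matters.

```python
import math

# All figures below are illustrative assumptions, not real specs.
target_bandwidth_tb_s = 8.0  # assumed per-node bandwidth of a leading-edge system
per_chip_tb_s = 1.0          # assumed bandwidth of a trailing-edge chip

# Chips needed so aggregate bandwidth matches the single faster node.
chips_needed = math.ceil(target_bandwidth_tb_s / per_chip_tb_s)
print(chips_needed)  # 8

# The cost shifts from silicon to power and interconnect:
watts_per_chip = 600  # assumed
extra_power_kw = chips_needed * watts_per_chip / 1000
print(extra_power_kw)  # 4.8 kW for the aggregated system
```

If energy is cheap and plentiful, the aggregated system is viable; if energy is the binding constraint, the per-chip gap becomes decisive.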
Dwarkesh: What if a model like DeepSeek were first optimized and run on Huawei chips?
Jensen Huang: That would be a very bad outcome. If a powerful model performs better on a non-U.S. technology stack, it would be bad news for the U.S.
Even without AI, NVIDIA would still be a very large company.
Dwarkesh: Earlier, we were discussing bottlenecks like Taiwan Semiconductor and memory. In a future scenario where you have already captured most of the N3 capacity and will likely dominate much of the N2 capacity, is there a possibility that you might return to older nodes, such as 7nm, leveraging that 'idle capacity' to recreate architectures similar to Hopper or Ampere but incorporating today’s advancements in numerical computing and system design? Do you think this could happen before 2030?
Jensen Huang: There is no need for that. The progress of each generation of architecture does not only involve transistor scaling. It also includes extensive engineering optimizations, packaging, stacking, numerical computing (numerics), system architecture, and more. If you were to revert to older nodes, it would mean redoing an entire set of R&D, which is a cost almost no one can afford.
We can move forward, but I don’t think we can go backward. Of course, if we conduct a thought experiment: if one day the world truly says, 'We will never have more advanced capacity,' would I immediately switch to 7nm? Absolutely. Without hesitation.
Dwarkesh: Another question, which someone asked me, is why doesn’t NVIDIA pursue multiple completely different chip projects simultaneously? For instance, something like Cerebras’ ultra-large chips, systems akin to Tesla Dojo, or even an architecture that does not rely on CUDA. You have the resources and engineering capabilities—why not diversify your bets? After all, the direction of AI architectures remains uncertain.
Jensen Huang: We certainly could do that. We just haven’t seen a better alternative. We’ve simulated these ideas in our emulation systems, and their performance turned out to be inferior. Therefore, we won’t pursue them. What we are currently doing represents what we believe is the most correct architecture.
Of course, if the nature of workloads themselves changes—not referring to algorithmic changes but transformations in workload patterns—we may introduce new accelerators. For example, recently, we have embraced directions related to Grok and will integrate it into the CUDA ecosystem.
This is because the value of tokens has increased. A few years ago, tokens were virtually free or very inexpensive. But now, the situation is different. Different customers have varying demands for tokens. For instance, as a software engineer, I would be willing to pay for tokens with faster response times because they enhance my efficiency. This market has only emerged recently.
Therefore, we can now do something: segment the same model into different markets based on response times. This is also why we decided to expand the 'Pareto frontier' of inference by creating a form of inference that offers faster responses but lower throughput.
In the past, everyone believed that higher throughput was always better. However, a new market may emerge where token prices are high (high ASP), and even with lower throughput, overall profitability remains higher. If that happens, we will pursue it. But from an architectural perspective, if I had more resources, I would prefer to invest them in our existing architecture rather than dispersing them.
Dwarkesh: The concept of a high-priced token market is very intriguing.
Jensen Huang: Yes, the essence is market segmentation.
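The segmentation Huang describes reduces to simple revenue arithmetic: revenue per GPU-hour is throughput times price, so a low-latency operating point with lower throughput can still earn more if latency commands a higher price per token. All operating points and prices below are invented for illustration.

```python
# Illustrative only: how a lower-throughput, low-latency operating point
# can out-earn a high-throughput one. All numbers are made up.
operating_points = [
    # (name, tokens/sec per GPU, $ per million tokens)
    ("batch / high throughput", 20000, 0.50),
    ("interactive / low latency", 4000, 4.00),
]

revenue = {}
for name, tok_per_s, usd_per_mtok in operating_points:
    # revenue per GPU-hour = tokens per hour * price per token
    revenue[name] = tok_per_s * 3600 / 1e6 * usd_per_mtok
    print(f"{name}: ${revenue[name]:.2f} per GPU-hour")
```

With these assumed numbers the interactive point earns $57.60 per GPU-hour versus $36.00 for the batch point, despite one fifth the throughput: the "Pareto frontier" point Huang is making.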
Dwarkesh: One last question. Suppose the deep learning revolution had not happened, what would NVIDIA be doing today?
Jensen Huang: The same thing: accelerated computing. Our company’s fundamental premise has always been that Moore's Law would slow down and general-purpose computing would not work for everything. So we integrated GPUs with CPUs, allowing GPUs to accelerate specific computations.
Different algorithms, different kernels, can be offloaded to the GPU for execution, thereby accelerating applications by 100x or 200x. These applications exist in nearly every field: scientific computing, engineering, physics, data processing, graphics computing, image generation... Even without AI, NVIDIA would still be a very large company.
Because this is a more fundamental issue. The scaling of general-purpose computing is nearing its limits, and the solution lies in domain-specific acceleration. We started with graphics, but there are countless other domains like molecular dynamics, seismic data processing, energy exploration, image processing... These are all problems that general-purpose computing cannot efficiently solve. Our mission has always been to bring accelerated computing to the world, driving progress in areas where general-purpose computing falls short.
Of course, without AI, I would be very sad. But it is precisely because of our advances in computing that deep learning has become more accessible, enabling anyone to do amazing things with just one GPU card. That has never changed.
If you look at GTC, a significant portion of the content isn’t about AI. For instance, computational lithography, quantum chemistry, and data processing—these are all crucial tasks, just not as popular as AI.
There are many important things in the world that don't rely on AI, and tensor operations are not the only form of computation. We want to help everyone.
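The offload pattern Huang describes, keeping control flow on the CPU while handing the data-parallel inner loop to an optimized kernel, can be sketched in miniature, with NumPy standing in for a GPU kernel. The 100x and 200x figures in the text refer to real accelerators; this illustrates only the shape of the idea.

```python
import numpy as np

def saxpy_python(a, x, y):
    # The "general-purpose" path: an interpreted scalar loop.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_offloaded(a, x, y):
    # The "accelerated" path: the same math handed to an optimized,
    # data-parallel kernel (NumPy here stands in for a GPU kernel).
    return a * x + y

n = 100_000
x = np.arange(n, dtype=np.float64)
y = np.ones(n)
ref = saxpy_python(2.0, x, y)
fast = saxpy_offloaded(2.0, x, y)
print(np.allclose(ref, fast))  # True: identical results, very different speed
```

The application logic is unchanged; only the hot inner computation is moved to the accelerator, which is the essence of domain-specific acceleration.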
Editor: rice