China's new AI wave: open source, agents, innovation
A deep dive into China's latest wave of AI models with Tom Wang, head of Asia-Pacific ecosystem at Hugging Face
In this episode, I speak with Tom Wang, head of APAC ecosystem at Hugging Face. We talk about China’s latest wave of AI models, areas where Chinese AI labs are pushing forward, the rise of agents and OpenClaw, and what’s next for Chinese open source AI.
Links:
Tom Wang on Twitter / X
Tom Wang on Hugging Face
Transcript
Kyle Chan (00:00)
Welcome to the High Capacity Podcast. I’m your host, Kyle Chan, a fellow at Brookings. I’m thrilled to be joined today by my guest, Tom Wang, Head of Asia-Pacific Ecosystems at Hugging Face, which is the global platform for open-source AI. Welcome, Tom, and thanks for coming on the show.
Tom (00:19)
Thanks for the introduction. Hello, everyone. It’s my pleasure to be part of the podcast and share some of my thoughts with you.
Kyle Chan (00:28)
Great. I thought maybe we could start off by talking about what you do at Hugging Face. Maybe you can help describe what Hugging Face is, what your role is with the Asia-Pacific ecosystem, and how that gives you a special view into what’s happening in China’s open-source AI landscape.
Tom (00:52)
Yeah, for sure. For a very quick introduction to Hugging Face, you can think of it as the GitHub for AI. So instead of storing files and source code, we’re a hub for all the AI models, all the weights, and all the datasets. We also have a bunch of AI demos. We call them Hugging Face Spaces, so that you can try out all the SOTA models without actually having to download the models to your local hardware and run them on very expensive GPUs.
So basically, we are a platform for AI developers to get access to the latest open-source models. If you're unfamiliar with open-source models, think of DeepSeek. With closed models like Claude and ChatGPT, you do not have the weights, so you have to pay for the API. With open-source models, you have access to the weights. As long as you have the hardware, even just a Mac Studio, you can run the model on your local machine. Some small open-source models can even run on smartphones and gadgets like the ESP32. So it's a very diverse and very different ecosystem from what you see every day in the news.
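(As a concrete sketch of what having the weights means: with the transformers library, you can pull an open model from the Hub and run it entirely on your own machine. The model ID below is just an illustrative small open model.)

```python
# A minimal sketch of running an open-weights model locally with the
# Hugging Face transformers library. The model id is illustrative;
# any open model on the Hub works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # a small open model
tokenizer = AutoTokenizer.from_pretrained(model_id)     # downloads the weights once;
model = AutoModelForCausalLM.from_pretrained(model_id)  # after that it runs fully offline

inputs = tokenizer("What is an open-source model?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```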
Kyle Chan (02:18)
Yeah, that’s a great way to describe it, because you have some of these really, really huge models, reaching a trillion parameters, where you would need to deploy on some pretty impressive hardware, versus some models that are so small you can run them on your smartphone. So it’s a huge range, and it’s all there on one platform. The Chinese models, the American ones—it’s such a great resource.
So recently we’ve seen a wave of new Chinese AI models, and most of these are open source. This is more than a year now after DeepSeek R1. Those of you who follow Chinese AI will remember—or maybe even if you don’t, you’ll remember what happened to the U.S. stock market when R1 got released around this time last year. And while there’s been a lot of speculation about when DeepSeek is going to release their new model, we’ve seen a whole bunch of really fascinating models coming out of Chinese AI labs.
I was just wondering, to start off, what are some of the broader trends in this most recent wave? What’s interesting to you? Where are they innovating the most? I’m sure you’ve been reading through all the interesting technical reports and looking at their architectural innovations. If any of them stand out in particular, any that you want to highlight.
Tom (03:50)
Yeah, it’s a fascinating area. Every day we see a lot of new models coming up. Just yesterday Xiaomi released MiMo V2. Although it hasn’t been open-sourced, it will be open-sourced very soon. The whole thing accelerated after DeepSeek was released last year. Everyone was trying to join the game. But the game actually started two years earlier, even before DeepSeek R1 was released, back in the early 2020s.
We started to see a lot of Chinese models getting released as open source. I remember one of the first models that spoke good Chinese was GLM-6B or something like that, actually before Llama. And after Llama, we got a lot of Chinese Llama fine-tunes, et cetera.
There were a few very notable releases back then. For example, if you followed the news closely, you probably heard that the Yi model by Kai-Fu Lee was open-sourced under an Apache license. That was one of the first pure open-source, Apache-licensed releases. And then you have Qwen, which got a lot of researchers supporting it.
Qwen is now the model with the most derived models on Hugging Face, which means that researchers are not just using Qwen. They actually download Qwen, feed in some data, do some research, generate a new model, a derived model, and upload it back to Hugging Face. A lot of effort goes into creating derived models, so those numbers can't easily be faked. In that sense, Qwen is dominating the research market.
And then we got DeepSeek. DeepSeek was already quite famous in the field. They had the MLA architecture. They had a bunch of very good technical reports. And when people were wondering how OpenAI's o1 was able to think, DeepSeek R1 got released. It was actually before Claude's thinking model got released. So that had a huge, huge impact.
DeepSeek was also very different from Qwen. Junyang was posting a lot of things on Twitter, while DeepSeek was very quiet. The only information we could find back then was that their founder had given a Chinese podcast interview. Other than that, you had nothing. And then all of a sudden, out of nowhere, a great model gets released. So that attracted a lot of attention in Western media as well.
But in the research field, if you track all the histories on Hugging Face, you can find all their progress from the initial version to the second version to the third version, et cetera. And after DeepSeek attracted all that attention, a lot of other players in China saw that this could be a good way to do free marketing. DeepSeek wasn't paying anyone any money, and their app was the top one on the App Store. That was a very smart move, even though they did no marketing at all.
So we started to see Kimi. Kimi K2 was released in mid-last year.
Kyle Chan (07:12)
Yeah, yeah.
Tom (07:32)
And then MiniMax, and then Zhipu—they all came back to the open-source competition. And we are starting to see them releasing better and better models. Zhipu and MiniMax even got IPO’d, I think. A large portion of that awareness and marketing is coming from their open-source contributions and all the great feedback from the community.
Because back in 2023, I remember there was a lot of investment in AI, and most of it died. We called them hand-waving models, competing with each other on who had more data. For venture capitalists, it was very hard to evaluate whether a company was just making a demo, or maybe just rerouting traffic to ChatGPT, et cetera, because the VCs did not have the technical capability to check, and the systems were not transparent enough.
Kyle Chan (08:08)
Yeah.
Tom (08:27)
Because you are not going to show source code to the VCs. They don't understand it anyway. So open source has become a great way for people to prove that they have the muscle. Once a model is open source, it's put under the spotlight, and everyone is able to judge whether it is a good model or not, because everyone can try it. Everyone can deploy the model on their local computer and cut off the internet, just to make sure that the company isn't hiding anything or sending requests elsewhere. Everything comes from the weights themselves.
So venture capital is really happy. Over the past year, we have seen Kimi's valuation rise from something like 2-billion-something to 18 billion. Sorry, the market cap has risen six times or even more. So that's the real power of the open-source community. And that inspires more people to join the open-source battlefield, I would say. Although they are collaborating, it's actually a battlefield, an arena, right?
And more models are released on Hugging Face. That’s the thing you described: every day there are new models being released. And I think it’s very reasonable. It’s very exciting. In the future, we are going to keep seeing new things come up.
So at Hugging Face, I've been watching new model releases every day, helping teams coordinate their releases and use Hugging Face in the best way. We have all these features, and I teach them how to use them, and also how to use Hugging Face as a gateway to amplify their impact.
Kyle Chan (10:24)
Yeah, that’s interesting. I want to ask you about that, because I think we sometimes hear about a new model being released, and there’s a big announcement, maybe there’s a blog post, maybe there’s a technical report. But then what happens, right? Especially on the Hugging Face side, how do you get interest? How do you get developers interested? What do you need to do to get your model into people’s hands and actually deployed and actually used?
Tom (10:59)
Yeah, that’s a very good question. Actually, I get asked exactly the same question by a lot of people. I think the simplest way is just to create an organization on Hugging Face, which is free, and start uploading your models. But that’s the simple answer, which is not the reality.
If you just upload a model and forget about it, then I would say it’s not going to get a lot of traction, especially now, because there are so many open-source models. So you have to have a very good model card and explain what your model is, ideally with a link to a technical report for more technical people like researchers. And then you would do some evaluation and say in which areas your model is performing better than other models, or what your selling point is.
It's kind of a joke if you claim that, in your own evaluation, your model outperforms all the closed-source models. People won't believe that. But if you say that your model is very good at certain things, for example that it can spawn many agents and control them on a relatively small budget, people will believe that, and a lot of people will pay attention to the model.
If you happen to have a Hugging Face Space or demo link, people will come to the link and try it. I have a few canonical prompts that I use to test models. If I feel good about it, I will post it on Twitter, and a lot of people will retweet it. They will do their own experiments as well.
So I remember earlier this year, people started to post all the shiny websites that Kimi and Zhipu and MiniMax models were able to generate, because at the time all these models were optimizing for generating very beautiful websites with all the CSS and so on. So that was very good marketing because it’s something people can see. It’s not just saying, “I’m using 90% less” or “80% less.” It’s something people can see visually.
So that's good marketing. Another good form of marketing is a technical report where you discover something new, for example the recent Kimi Linear. Although it got a lot of backlash, I feel that the way they are exploring a new technical direction is very good, because in the world of open source we aren't just competing for the single best model. The field is so diverse, and we are actually creating that diversity. Everyone is allowed to try out something new, and you can build on top of other models. That's the whole spirit of open source.
You create a model, release it, other people see it and patch in some data and make an even better model. So the whole ecosystem is very, very friendly and very good. And that’s why there are so many new models coming up, because everyone is building on top of others.
So if you happen to be building on top of other models, in the metadata of the model card you can say, okay, my parent model is this model, and my grandparent model is that model. This helps people understand where your model comes from and how it relates to other models, which is actually very important: if you try to deploy a model with a new architecture using an inference framework, the framework needs source-code changes to support it. But if you happen to be a derivative of DeepSeek, DeepSeek models are already supported very well by the whole ecosystem. So if you are building on top of DeepSeek, it's very easy to get your model into many people's hands.
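(Declaring that lineage is just a metadata field in the model card. Here is a minimal sketch using the huggingface_hub library; the repo names are hypothetical.)

```python
# A minimal sketch of declaring a parent model in model card metadata
# with the huggingface_hub library. Repo names here are hypothetical.
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(
    license="apache-2.0",
    base_model="deepseek-ai/DeepSeek-V3",  # the parent this model derives from
)
# The metadata becomes the YAML front matter of the repo's README.md.
content = f"---\n{card_data.to_yaml()}\n---\n\n# my-deepseek-finetune\n\nA derivative of DeepSeek-V3."
ModelCard(content).push_to_hub("my-org/my-deepseek-finetune")  # hypothetical repo id
```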
I think that's one of the reasons why a recent Japanese model was built on top of DeepSeek: to get the benefits from the community. We are building on top of each other, right? Which is fine. Using someone else's architecture is nothing to be ashamed of. The value you add on top of the model, and open-sourcing the new derived model, is new value added to the ecosystem, and that's something we love to see.
And lastly, once your model is released, you can check the discussion tab on Hugging Face, where a lot of people will criticize the model. And that's not really criticism. It's not people saying your model is so bad they're not going to use it. They are actually potential users telling you where your model can be improved. Some of them will be happy to work with you and give you some data, some directions, some feedback, maybe even an environment, so that you can build on their feedback. So that's how the whole system works.
Kyle Chan (16:19)
Yeah, this is an amazing overview of how these ecosystems emerge. It’s not just one lab going off into a cave, doing some massive training run, and then we wait, and then they do their post-training RL or whatever, and then release the model, and then that’s it. They’re building off other models, they’re building off other architectures, they’re getting feedback from potential users and other developers and incorporating that and improving their models in response. So it’s a very organic, live process rather than just shipping something and delivering the product, right?
Tom (17:04)
Yeah, yeah. In some ways, it’s the same as building in public, because you are building, and you are getting feedback, and you are building, and you’re getting feedback, et cetera.
Kyle Chan (17:14)
Yeah. And I really like your analogy where you mentioned that you can trace the lineage of some of these models and see how they’ve built on previous ones. And one example that really comes to mind is, in general, how Zhipu with the GLM models has built a lot on the DeepSeek architecture, on really interesting sparse attention and some of the other architectural innovations from DeepSeek.
So in a way, what DeepSeek did was not just have an impressive model, but also contribute something back into the broader open-source AI community, and then that could get picked up. I don’t know if you want to say anything about that case, or if there are other interesting examples too where you see an idea or innovation get picked up and maybe diffuse more broadly.
Tom (18:09)
Yeah, you mentioned MLA and all the deep engineering work from DeepSeek, and how it has accelerated the whole field. I can come up with another example, which is linear attention. I remember in the earlier days, in 2020 to 2022, a lot of people were discussing whether transformers were the only answer. And now we kind of agree that it’s a good answer, but we still don’t know if it is the only answer.
A lot of these experiments were carried out by Chinese researchers. For example, one of the earlier attempts was RWKV, a model developed essentially by one person, Peng Bo. He developed the model, developed the architecture, and did a lot of experiments, thanks in part to very generous compute contributions from Stability back then. He was able to start exploring the field and write a technical paper. Actually, he wrote the paper with all the contributors.
And then we saw this trend start to take off. If I remember correctly, the first industrial-scale linear attention model, or rather hybrid attention model, a combination of transformer and linear attention, was MiniMax's open-source M1. And then Kimi had its linear model, et cetera.
It’s very interesting because you never know whether Gemini was actually built on linear attention. They don’t tell you. And all the architectural evolution is kind of hidden because they stopped publishing papers. But in the open-source world, you can still see that the world is not just transformers. The world is very diverse, and people are exploring different architectures. Some of them will fail. Some of them might succeed. But that’s how people understand something new and try out different things.
I remember when another version of the transformer—for example, BERT—was released, there was a lot of criticism as well. And people were exploring different directions. There are a lot of BERT-derived architectures, et cetera. And we are starting to see that actually happening in China, because that’s the only place where architectural evolution is being open-sourced.
Kyle Chan (20:33)
Yeah, that's so interesting. Do you think in general that some of the Chinese models are pushing harder on certain areas like efficiency, trying to build incredibly efficient models with really low compute or memory requirements that still deliver very high performance, and at a very low cost per token? Do you see that as a unique direction the Chinese AI industry is heading in, or do you think that's common across the board?
Tom (21:16)
It’s definitely common across the board. The U.S. has the best chips. They have the best engineers. Although they might think less about how to create less powerful but faster, cheaper models, they still definitely want to do that. They are definitely hiring a lot of engineers trying to optimize the models, because if they can save 5% of compute time or compute cost, they are effectively earning 5% more. So it’s definitely something people would do across the board.
It’s just that closed-source models won’t tell you what kind of optimization they are using. I believe Gemini is using a lot of crazy optimization. That’s how their model is able to run so fast, apart from the fact that they are running on TPUs, which may be faster than NVIDIA in some cases.
I believe there is a very deep engineering gap between them and everyone else. That's one of their competitive barriers. It's not just the model, but also all the engineering details. The Chinese labs are also doing a lot of engineering work, but one of their main goals is to run the model on inferior hardware. So this kind of architectural innovation, or budget saving, has a higher priority than it does for the U.S. labs. That's probably one of the reasons people were exploring linear attention: linear attention models use way less RAM and way less compute than transformer-based models.
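(Roughly speaking, for context length $n$ and hidden dimension $d$, the scaling difference is:

$$\text{softmax attention: } O(n^2)\ \text{compute},\ O(n \cdot d)\ \text{KV cache} \qquad \text{linear attention: } O(n \cdot d^2)\ \text{compute},\ O(d^2)\ \text{state}$$

so a linear-attention model's memory footprint stays roughly constant as the context grows, while a standard transformer's grows with every token.)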
There was a phrase—I forgot how to say it—but the meaning is that limitations and constraints drive innovation, not the other way around. How do you say it?
Kyle Chan (23:26)
Yeah, like necessity is the mother of invention. So constraints breed innovation.
Tom (23:34)
Yeah, yeah. I think that’s the phrase. So that’s very interesting to see, because at the end of the day, even in the U.S. we are constrained by the amount of compute because we do not have enough data centers, we do not have enough power, we do not have enough copper—there are all sorts of real-world limitations. As soon as token usage goes up, the amount of compute required also goes up. And we have seen that electricity prices are already very high, right? So we need to find some way to make it more efficient.
I would guess that a lot of optimization on model architecture and a lot of optimization on the engineering side will be done in the U.S. as well. So to answer your question, I think it’s global. It’s just that one side is transparent and the other side you can’t see.
Kyle Chan (24:40)
Yeah. Well, I wanted to pick up on a thread that you tossed out there, which is that some of the Chinese models are trying to work with Chinese chips, which are generally lower performance than, say, NVIDIA's most advanced chips. But now we see a trend where a number of Chinese AI labs are releasing their models with day-zero native inference support, not just for Huawei Ascend chips but for Cambricon and some of the other Chinese domestic chips. I was just wondering if that's something you've been tracking and have observed across this latest wave of Chinese AI models.
Tom (25:30)
So in a perfect, ideal situation, I guess a lot of the labs would not want to waste time developing models on inferior chips, but they have to. They have to. I remember back in the old days when Ascend chips were very hard to sell, until the U.S. gave them the best possible marketing by banning NVIDIA chips.
Kyle Chan (25:44)
Right, right. Yeah.
Tom (25:53)
But it takes time to build the whole software ecosystem, because you can have a very good chip, but it's hard to make sure your users can use it in the best way. All the chips are designed with different architectures and have different software ecosystems, and it takes time and human resources to adapt, to learn what the pitfalls are and how to avoid them, and to co-evolve all of that with the chip designer.
Now I think they already have a very good collaborative relationship. And that’s one of the reasons why I wrote the blog post about how a parallel ecosystem is being built, and how the next generation of engineers are being trained on a different architecture.
Let me put it another way: if we had a parallel universe where NVIDIA was banned but AMD chips were allowed, then AMD would grow much faster than it is now. The reason AMD chips are relatively hard to sell is that researchers do not have enough incentive to work with AMD, not that AMD is bad at chip design.
Kyle Chan (27:22)
Yeah, yeah. So if you’re focused on just developing the best model as fast as possible, you would rather not waste time and just use the best chips available, which might be NVIDIA’s and might be built on the CUDA platform that you’re used to developing on. But if you’re forced to, then you will end up trying to figure out alternative paths, basically.
I want to shift to agents because this is such a huge theme now. So it seems like agentic AI is something that has come up a lot in some of the latest Chinese AI model releases. And then now we have the whole OpenClaw craze in China as well as in the U.S. I was just wondering what you thought about the rise of these AI agents and whether you feel like there’s a real shift this year or if this is more of a continuation of what we’ve seen before.
Tom (28:33)
Yeah, I think it's actually a new wave. It's something very different. Large language models were like a brain in a vat. They just answer your questions, and only after you ask. So you have to ask them, they give you something, and they have no capability to affect the physical world. They are locked in. They can only tell you to do something; they can't do it for you.
OpenClaw is a completely new species. It's able to connect with the physical world via tool calls. It can even hire a human: there's a website called hireahuman.ai. The agent can actually get things done for people. It's also very autonomous. When we talk about agents, we say autonomous, but this is a completely new level of autonomy, because there's a heartbeat mechanism: the agent wakes up, for example, every 15 minutes or every half hour, reads its memory and its task list to know what it can do, and reads all the logs to continue its work.
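(A rough sketch of that heartbeat idea in code. The agent object and its helper methods here are entirely hypothetical, not OpenClaw's actual internals.)

```python
# A hypothetical sketch of a heartbeat loop: the agent wakes on a timer,
# reads its memory and task list, and acts without waiting for a prompt.
import time

HEARTBEAT_SECONDS = 15 * 60  # wake up every 15 minutes

def heartbeat_loop(agent):
    while True:
        memory = agent.read_memory()        # notes persisted from earlier runs
        tasks = agent.read_task_list()      # outstanding work items
        for task in tasks:
            if agent.should_act(task, memory):
                agent.run(task)             # tool calls, messages, and so on
        agent.append_log("heartbeat tick")  # leave a trail for the next wake-up
        time.sleep(HEARTBEAT_SECONDS)
```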
And that’s something you have never seen before, because now you do not need to wait for the agents to do something after you ask them to. They can make their own judgments and do something. So the whole user experience is very, very different.
Another thing is that it's actually an agent, not a product. Normally, if you have an AI app, you say, "Oh, I built this kind of product. Come to my website and I will do it for you." Manus is one such example: you have to go to Manus, buy credits, and do everything on their website. But OpenClaw is totally different. It's invisible. There is no OpenClaw product. It lives inside your messenger app, and you can talk to it as if it's a person.
So that's totally different. And it expands the whole user group, because you no longer need to go to a different website. You no longer need to know what a large language model is. You can just talk to it and it will do things for you. And you discover that the more permissions you give it, for example access to your Gmail or your Google Drive, which is scary, the more power it has. It will surprise you. It will connect all the dots together and give you a report you would never have thought of before.
So the more you give to an agent, the less you do as a human, and the more you get from the agent. This is also very, very different. Psychologically it’s very different, and it was a big shock for me.
My interest in OpenClaw started with open-source models, because OpenClaw is a token burner. It’s actually good news for the AI industry. Everyone was talking about the stock market crash and AI being a bubble, but then OpenClaw came. Just to say a random number, with a chatbot you might consume one billion tokens. With a passive agent, you can consume 10x that. Now with an active agent, you can consume 100x that. So it’s good news for the industry, and everyone was trying to buy into it.
But then people discovered the tokens are simply too expensive. Claude Opus is going to burn hundreds of dollars. And then we have open-source models. As long as you have a data center, you can deploy open-source models and sell tokens at close to the price of the electricity you consume. Open-source models do not carry a model premium; all that's left is the electricity cost. So the price per token is lower by maybe one or two orders of magnitude. And that actually fueled the whole OpenClaw trend. Without open-source models giving you affordable tokens, you would not be able to use OpenClaw as aggressively as people do now.
So that’s how I got into the OpenClaw world, and I started using my agent to do a bunch of things—posting on social media, et cetera, giving it a bunch of my documents, asking it to write documentation and write software. But then I discovered that the thing being underestimated by people in the whole OpenClaw movement is that there are many huge security risks related to OpenClaw.
For example, skills. You can ask OpenClaw to discover skills for you, but OpenClaw is not antivirus software or anything like that. It will not check whether a skill is legitimate. You may leak your personal information, or your machine may be turned into a Bitcoin miner, because you downloaded the wrong skill from somewhere on the internet.
And also, the model itself has some intrinsic flaws. For example, large language models do not always give you the same answer. A model is a probabilistic machine: it runs autoregressively, predicting a probability distribution over the next word and then sampling from it. So it's not deterministically picking the next word; it's predicting a distribution and drawing a sample. There is always a chance of sampling a very low-probability, wrong answer. If you need a machine to do something 100% right, do not use large language models.
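(A toy sketch of that sampling step: the model emits a distribution over candidate next tokens and we draw from it, so even an unlikely wrong token gets picked occasionally.)

```python
# A toy illustration of next-token sampling: convert scores into a
# probability distribution and draw one token from it at random.
import numpy as np

def sample_next_token(logits, temperature=1.0):
    """Turn raw logits into probabilities and sample one token index."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Toy scores for three candidate tokens. Even the unlikely third token
# gets picked once in a while; that residual chance is the point.
logits = [5.0, 2.0, 0.0]
samples = [sample_next_token(logits) for _ in range(10_000)]
print({i: samples.count(i) for i in range(3)})  # rough empirical frequencies
```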
And now we are plugging this uncertainty into a bunch of tools and skills that can affect the world. This is very, very dangerous. Although we are improving the quality of the models, we need to be aware that the machine can do something wrong. And you need something like Control-Z to undo an agent's actions, which you do not have with OpenClaw. OpenClaw will send API requests somewhere else, and you cannot undo that.
So there are a lot of very interesting questions about how we can make this OpenClaw structure safer and more privacy-preserving, and how to defend against bad skills. But at the end of the day, we are all limited by the current transformer architecture. There are a lot of problems like hallucination. If you have multiple agents, information gets lost at every layer. You pass it down, and eventually it may become a very different answer.
I remember playing a very interesting game as a child: ten or twenty students, the whole classroom, sit in a row, and the teacher tells the first student a word. The first student cannot say the exact word, but he can describe it, and the information gets passed along to the last student. The last student says the word, and most of the time it is not the word the teacher started with. That's exactly what happens in these multi-agent, long-horizon, multi-turn systems, whatever you call them.
So there are a lot of things we need to be aware of while we are enjoying the benefits from OpenClaw. That’s why I’m writing a book about the security issues around OpenClaw, how we can make it safer, and how it’s connected to human sovereignty and to open source, et cetera.
Kyle Chan (37:14)
Yeah. Basically, if you are the chief technology officer for a major bank, maybe don’t play around with OpenClaw and install it on your corporate servers and just let it go crazy with your financial data. There could be some pretty big security risks. But maybe this is a glimpse of what is to come. As we get future iterations, maybe they are more secure, maybe we’ll patch some of these areas, and then the capabilities themselves might improve. So we have a path forward here, perhaps on the agentic front, that is very different from what you were mentioning before: the older model where you interact with the chat, you prompt the chatbot, it gives you output, you prompt it again, it gives you output. Now you can have it almost be alive on your computer or in the cloud.
Tom (37:55)
For me—
Kyle Chan (38:11)
Yeah, and I think it’s very interesting to see, in particular, how quickly a lot of the Chinese AI labs jumped on the OpenClaw trend and quickly came in with support. What do you see behind that wave? Because basically every single Chinese tech company you could think of—from the big ones like Tencent and Baidu—all jumped in, as well as all the AI labs. What do you make of all that?
Tom (38:41)
Yeah. So what I learned in the past three years, when I was advocating for AI and talking to people who run small businesses and use AI to improve their performance, is that the real power is not coming from building something for the worker or the employee. The real power comes from building something for the boss. And OpenClaw is an ideal thing for the boss, because the boss, without knowing all the technical details, can actually use it just as if he’s talking to one of his employees. And he discovers that after plugging all his company data into OpenClaw, the system is able to perform better work than his best employees.
Although large language models are bottlenecked by context length, one million tokens of context is already much more than a human can handle. And the response is very fast. Most importantly, the agent is very good at small talk. It is very good at making you happy. And that is not something you usually get from normal employees.
So I would say part of the reason all the Chinese companies are exploring the OpenClaw stuff is that it's something for the manager. The manager, the director, the board member, they are all happy about it. They would have been happy with the technology if they understood technology, but not everyone does. So why not? Let's just do it and break things until we figure it out.
Kyle Chan (40:32)
Yeah, I like that way of framing it. Well, I want to ask you now about something else. We’ve been talking a lot about LLMs in particular, but there are other kinds of models out there that you see on Hugging Face: VLA models, video generation, image generation, a whole multimodal range of different but related classes of models.
I was just wondering how closely you track all of these different types of models and datasets, and whether you notice anything interesting in the Chinese AI space. One example I would highlight is that this is not an open-source model, but ByteDance’s Seedance has gotten a lot of attention as a powerful video generation model that is arguably better than the top models from the U.S. In other cases, Chinese AI models might be catching up very fast or be very good, but here is a potential case where they might be ahead. So I was just wondering if you had any comments on what’s going on in that space, and especially as it relates to embodied AI and robotics. Again, this is moving beyond just chatbots into the physical world.
Tom (41:58)
Yeah, these are all great questions. Let's focus on Seedance first, because if we try to cover every topic, we won't have enough time. But I do have a lot of thoughts about Seedance. I remember when Veo 2, the video generation model from Google, was released, there was a lot of hype too. But I felt that hype was not actually coming from the technology itself. The technology is great, but if the technology itself were the source of the hype, why did we see much less hype around Veo? The only explanation I found is that people are not hyped by what they can do, but by what they can modify and connect. Remix is the hype, not the generation itself.
So when I can use the face of some famous actor and do some very famous scenes in a different movie—for example, remix all the DC characters and have them act out a fascinating story—that will go viral on social media. But something I create myself, where people are not familiar with the story behind it, is less likely to spread.
So once the Seedance team realized there were huge IP concerns, they put a lot of restrictions on what kinds of prompts you can use and what kinds of videos you can generate. And all of a sudden, there was no hype. You stopped seeing it everywhere on Twitter. I would say it's actually a very interesting problem, not for the tech field but for how we think about IP in this new era, and how that might help us or limit us in certain ways.
Kyle Chan (44:00)
Yeah, that’s a really interesting point. Okay, so that was about Seedance. Maybe let’s go back to some of the other types of models, like VLA—vision-language-action—and interesting datasets around that. Is that something that you follow on Hugging Face?
Tom (44:32)
Yeah, we have a robotics project at Hugging Face, and we collaborate with a lot of people in China and the broader APAC region who work on models, datasets, and the embodied side itself. And I feel this whole area is moving very fast. From my understanding, all three of these areas are moving very fast.
For example, on the embodied side, we are seeing more and more autonomous robots working in different fields, on very exciting stuff. I remember a year ago, folding clothes was considered very hard, because cloth is soft and comes in different shapes and at different angles. But now we nearly have an industrial-grade cloth-folding machine. It's just a matter of cost, whether we want to deploy it in real scenarios or not. And the success rate for things like holding a glass or doing some simple cooking has risen a lot.
So I do believe that it's coming, but the limitation is more on the body itself. For example, if you buy a Unitree, it's very powerful and it can do a lot of things. But if you run it for 20 minutes, the motors start generating a lot of heat and you need to cool them down. If you don't, the motors break very easily. And it's very hard to repair them, or even just to replace the motors, et cetera.
The whole robotics industry is not as mature as the car industry, where we can just send a car to a dealership and ask for repair or whatever. For now, if you buy a Unitree and it’s broken, you have to buy another one. So there are a lot of limitations around the motors.
If you saw the Spring Festival in China, there were a lot of robots doing exciting shows. But after the scene, after the program is finished, you see a lot of people fanning them, trying to cool the motors down. So that’s the reality. You can have humanoid robots working fine on a task, but they are not able to work for very long, and there are a lot of limitations like price, durability, and repair, et cetera.
Kyle Chan (47:27)
Yeah, that makes sense. There’s still so many hardware constraints. So a lot of work is being done on improving the brains, the algorithms, the VLA models, for example, that help power them and do longer-term planning. But at the end of the day, they still face these physical constraints: heat dissipation, battery life, how good the sensors are, wear and tear. I don’t know how long a Unitree humanoid robot would really function on an assembly line doing the kind of mass production that you would at least still see human workers doing today. So that’s still a big question mark.
I was wondering now, kind of looking ahead, maybe for the rest of this year or even further out, what do you think some of the big trends will be in AI, and especially in Chinese AI? And do you think open source will continue to be a big strategy for these Chinese AI labs? Do you think there might be a switch to going closed, maybe if they feel pressure to monetize faster? I don’t know, this is just speculation.
Tom (48:53)
Yeah, on the monetization side, I do have a lot of concerns. I believe the best open-source strategy is one that is sustainable, meaning a lab can open-source things, get community feedback, and build a better model, but at the same time earn some money so it can pay the researchers' salaries, pay for electricity, and pay the compute bill.
So this is something the Chinese labs are starting to think about. In the past we have seen bad examples, for example Stability. They made very impressive models. They are actually one of the creators of diffusion models and text-to-image models. But they had a very bad monetization strategy once the pressure was on. So now they are not open-sourcing anymore; they are focused on making money. I wouldn't say that's bad, but if they had had more monetization support while they were building open-source models, they probably would have continued open-sourcing.
In the past year, all the Chinese labs have been rushing into the open-source race except ByteDance; almost all the other labs are doing open-source work. And this year, I would say the labs that can sustain the open-source pace will be the ones that actually figure out how to make money on the open-source battlefield. If they figure that out, it will be good news for the whole ecosystem, because then everyone knows it's a sustainable model and will have less hesitation adopting it. Otherwise, if you adopt a model today and a year later it no longer exists, it was a very bad decision to build on it and contribute to its ecosystem.
So I think monetization is going to be a challenge. Just thinking out loud about what the monetization strategy could be: one way is to build a first-party inference service on top of their models and serve it globally, or sell subscriptions. Maybe they can work with enterprises and jointly build a model for each enterprise using its proprietary data. Or they could issue a different kind of license that lets them earn revenue from the clouds running their models. Or maybe they can just raise more money from venture capital or from IPOs.
So there are a lot of possibilities, but that’s going to be a very interesting thing to watch this year and next year.
Kyle Chan (52:06)
Yeah, definitely. Well, kind of related to that, maybe one last question is about global expansion. I was just wondering what your thoughts were watching a lot of these Chinese AI labs really deliberately go for the international market and really try to drive international adoption. We now see, I think, MiniMax and maybe also Zhipu making a lot of money overseas, or perhaps even more than domestically within China. And many of the founders are now going on podcasts and speaking to the public, going on Reddit, and there is just a big presence online—Hugging Face, of course, but even on Twitter or X.
I was wondering what you thought about this effort to have global reach, and where you think that’s headed.
Tom (53:02)
Yeah. I can answer that in two ways. The first is the economic side. If you look at who has actually fueled the whole Chinese AI bubble, a lot of it is U.S. capital. For example, MiniMax is a Chinese company, true, but a lot of the investors are actually U.S. investors, I believe. Kimi is the same. So when they go global, you can think of them as U.S.-capital-backed companies going global. There's nothing special about it: if there's money to be made, they will go after it.
And on the actual research side, they have released great open-source models, and these open-source models benefit U.S. companies and U.S. researchers for sure. I remember one of the directors at a16z mentioned that 80% of U.S. companies that build models build on top of Chinese open-source models. So naturally, these labs will have great influence over other researchers, because they are the creators of what others are building on.
For example, Peter, the creator of OpenClaw, had a ton of fans in China, because those are the people who use OpenClaw and benefit from whatever Peter created. It's the same in the open-source world. If you release a great model, people would love to meet and talk to the person who released it. Without that person, they wouldn't have the model.
So I think on both levels, it’s going to be very natural. Nothing special about it. It’s just a natural flow.
Kyle Chan (55:07)
Yeah. I feel like a lot of us get to benefit from this global outreach because then we get a lot more visibility into what they’re trying to do, what kind of features they’re offering. And it’s just a much different world than one where they’re closed off and doing their own thing and then surfacing every so often to launch a product and then disappear.
There’s just so much happening. I could ask you a million other questions, including broader questions outside of China, because I know that you cover the broader Asia-Pacific region. Maybe we can save that for another time. I just want to let you have a last word about your book that you’re working on and what you’re trying to do with that project before we wrap up.
Tom (55:47)
Sure.
Oh yeah, it's going to be a very interesting book. Honestly, I'm writing it with AI. I decide the major story I want to tell, but I'm actually a very bad writer, so I let the AI handle the actual writing, and I do the post-editing.
So the story is very simple. We have talked a lot about the power of OpenClaw and the intrinsic risks from the model itself, and also how this changes our impression of what AI can do and makes us reflect on what the relationship between AI and humans is going to be.
So what I did was I read a lot of references—philosophers, past researchers, et cetera. I wrote an outline that was very technically focused, and I felt very happy with it. I thought, okay, I got it, it’s very scientifically grounded. So I sent this outline to my wife, and she said it’s not something she would be happy to read—it’s too technical.
My wife does a lot of market research because she works at a private fund. But even though she has enough knowledge about automotive models and robotics, she was still unwilling to read such a book, because I was writing a book for myself. I was writing for people who were already aware of all the risks.
And one of the best pieces of advice she gave me was that I should make it something people would love to read. So I'm turning it into fiction. Well, it's not exactly fiction, because a lot of it is based on real things, just under different names, and I try to mix the characters together and let different storylines mingle.
At the end of the day, I want people to read this book and become aware of all the risks we have with OpenClaw. I’m also putting in a lot of interesting story elements. For example, the character is able to earn a lot of money from the agent. He’s working on some very cool agent stuff and earns a lot of money. Then he starts to realize the risks of the models and starts thinking very deeply about how humans and AI relate to each other, what the future relationship between workers and capital might be, and all kinds of things.
So it’s going to be a very interesting book because I have never written such a big novel. It’s getting big.
Kyle Chan (58:47)
Yeah. Well, that’s very exciting. It really sounds like it’s right on the cusp between science fiction and science reality, where the stuff that’s happening with AI feels like it’s in that fuzzy boundary area now. Especially when you’re talking about agents having a heartbeat, for example—coming alive.
Tom (59:07)
Yeah, exactly. I'm also adding some of my personal interests. The book also talks about how history gets forgotten. For example, several generations ago there were Chinese workers building the railway system in the U.S., and they were largely forgotten.
And the open-source world has a lot of stories too. I've mentioned a bit of it, but there are a ton of stories happening in the open-source world. It's not just DeepSeek, right? DeepSeek is too big to be ignored. But if you ask ChatGPT today what has happened in the open-source world in the past three years, it will tell you Mistral and DeepSeek, and that's it. It's not going to tell you all the details about everyone who has contributed to open source. So part of my book also covers this history, because I want it to be remembered.
Kyle Chan (1:00:05)
That sounds awesome. I can’t wait to read it. Well, in the meantime, if people want to learn more about you and your work, or if they want to follow you, where should they go?
Tom (1:00:07)
Yeah. Just check out my Twitter.
Kyle Chan (1:00:19)
This is fantastic. Thank you so much, Tom, for an incredible conversation. If you liked this episode, please rate and subscribe on YouTube, Spotify, or Apple Podcasts. You can find episode transcripts and more information on the High Capacity newsletter at highcapacity.org. I’m your host, Kyle Chan.
Thanks for joining, and see you next time.