Podcast: Z.ai, inside one of China's top AI companies
Z.ai's newest flagship model, agentic AI, and Z.ai's future strategy
This episode, I talk with Zixuan Li at Z.ai, one of China’s top AI labs and the company behind the popular GLM models. Zixuan is Z.ai’s Director of Product for genAI strategy and global partnerships. We discuss Z.ai’s newest flagship AI model, the challenges of building agentic AI models, and what their longer-term AI strategy looks like.
Transcript
Kyle Chan (00:00)
Welcome to the High Capacity Podcast. I’m your host, Kyle Chan, a fellow at Brookings. I’m thrilled to be joined today by my guest, Zixuan Li, who is Director of Product for Generative AI Strategy and Global Partnerships and more at z.ai, also known as Zhipu in China, which is one of China’s top AI labs and the company behind the GLM foundation models. Welcome, Zixuan, and thanks for coming on the show.
Zixuan Li (00:27)
Thank you, Kyle.
Kyle Chan (00:29)
I have to start by asking you about the newest GLM model, the newest flagship GLM model, 5.1. What are its strengths? What can it do? What are its unique features? Anything that you can share with us at this point, because we’re recording before the official launch.
Zixuan Li (00:50)
First of all, GLM-5.1 will be a solid model on par with Opus 4.6. It’s very strong in coding and agentic tasks, but also in general conversation, Q&A, and deep research. I think it’s on par with frontier models. Because GLM-5 is so strong in coding, many people regard GLM as a coding model, but frankly speaking, it’s not. We optimize in all aspects. When you look at Artificial Analysis, GLM-5 still leads all open-source models in general intelligence, and GLM-5.1 will be a stronger model.
But what’s unique about this model is that we optimize for long-horizon tasks, so it can run deeper and run longer. And our understanding of long horizon is that it doesn’t just run longer, from one hour to 10 hours; it can actually keep improving its results. Compared with GLM-5, we ran several tests, like letting the model optimize a CUDA kernel. It can achieve 2x the result compared to GLM-5, which is amazing. On other criteria, like website creation, and on agentic tasks like Vending-Bench and self-evolving tasks, it gets about 2x the results of GLM-5.
Kyle Chan (02:33)
Mm-hmm.
Wow, so what explains that jump in performance from GLM-5? I’m sure there are many things in that mixture, but were there certain features, certain engineering methods, even architectural changes, that helped make that leap in the RL process?
Zixuan Li (03:04)
First, there are no architecture changes; it adopts the same architecture as GLM-5. What matters most is that we became focused on this scenario. Before GLM-5.1, for long-horizon tasks, we focused more on vibe coding and application creation instead of real optimization on specific tasks.
When we were about to launch GLM-5.1, we realized that a longer horizon is better for the next generation. It’s better for AGI. And we were also inspired by agents like OpenClaw. You need to sleep. So what happens during your sleep? You have eight hours. Why not let your agent perform tasks for you during those eight hours?
So based on these observations and scenarios, we created new datasets, new training data, and tried to define the tasks we’re going to solve. I think that’s our competitive advantage. Compared to DeepSeek, DeepSeek is quite strong in architecture and research. But what makes GLM unique is our observation of real-world problems.
Kyle Chan (04:32)
Mm-hmm.
So you have certain long-horizon tasks that the model can perform especially well. And when we’re talking about long horizon, how long are we talking about? Can I go to sleep and have it refactor my code base, re-optimize my hard drive, and answer all my emails by the time I wake up? What are some examples of interesting, complex long-horizon tasks that GLM-5.1 is really good at?
Zixuan Li (05:09)
I think it depends on the harness. We’re talking about harness engineering here. OpenClaw is a kind of harness, or you can say the Pi agent is the harness; it’s the Pi agent that keeps your email running all the time. But what makes a model unique is that it can understand the context and still follow the instructions after eight hours.
A simpler model can also perform some tasks to answer your email, but it cannot perform as well after four or five hours. GLM-5.1, I think, can always find a better way to solve problems. In the first 10 minutes it has a solution, but then it goes back to the original problem, analyzes its first solution, and comes up with a better one.
Step by step, it can finally arrive at a solution that’s way better than its first. So long horizon doesn’t just mean time. A task can run very long and still not be long-horizon. For example, letting the model count from one to one million might take a day, but that’s not long horizon.
Kyle Chan (06:15)
Mm-hmm.
Right.
Zixuan Li (06:36)
A long-horizon model actually means you need more time to get the job done better. If you have one hour, you have a solution within that hour. But if you’re given 10 hours, you can do it better. Most models can only get the job done within an hour and cannot improve afterward. But a better model, a model like Opus 4.6, can reach a better result after several hours, after several iterations. So long horizon doesn’t mean time; it actually means the depth of iteration.
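[Editor’s note: as a concrete illustration of the loop Zixuan describes, here is a minimal sketch of a long-horizon harness that revisits the original problem and keeps only improvements, rather than simply running longer. The `model.propose`, `model.refine`, and `evaluate` interfaces are hypothetical placeholders, not Z.ai’s actual harness.]

```python
import time

def long_horizon_solve(task, model, evaluate, budget_seconds=10 * 3600):
    """Iterative refinement: more time buys more improvement passes,
    not just a longer single run. `model` and `evaluate` are assumed
    interfaces, for illustration only."""
    best = model.propose(task)             # first-pass solution
    best_score = evaluate(task, best)      # task-specific quality metric
    deadline = time.time() + budget_seconds

    while time.time() < deadline:
        # Go back to the original problem with the current best attempt
        # in context and ask for a critique-driven revision.
        candidate = model.refine(task, best, best_score)
        score = evaluate(task, candidate)
        if score > best_score:             # keep only strict improvements
            best, best_score = candidate, score
    return best
```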
Kyle Chan (07:09)
Mm-hmm.
Right. Because I remember that for GLM 5, it was able to do very impressive work on basic office work, like data analysis or putting together a presentation that incorporates some charts from data extracted from reports. And it could sort of do it all in one go, basically. Is that the kind of long-horizon complex task that GLM 5.1 is improving on?
Zixuan Li (07:51)
Actually, we are improving on all office work, slides, Excel, things like that, but those won’t take you more than an hour. We’re doing something harder than that. We are improving all the office tasks, but for long horizon, you’ll see in our tech report and tech blog that there will be many use cases that would be super hard for people to finish by themselves. So a model can really surpass human beings on those tasks.
Kyle Chan (08:30)
Mm-hmm. Mm-hmm.
We’ll come back to the question of AGI and exactly how far along we are, like what types of humans the models can already beat. Maybe they can’t beat all humans at all coding tasks, but they can beat most humans at many tasks. Related to this, I want to ask how this differs from GLM-5V Turbo, which also came out fairly recently, or some of the other models in the GLM family. Some seem to be aimed more at image generation, obviously. Some are multimodal models aimed at agentic workflows. The Turbo model in particular seems like it was optimized especially for tools like OpenClaw. So I was wondering where you place 5.1 in this broader family and how we should see these different models.
Zixuan Li (09:28)
In terms of capabilities, GLM-5.1 is way better than 5 Turbo, because Turbo means you sacrifice some capability for speed. So GLM-5.1 will be slower than Turbo, but its general capability is better than the Turbo series. And 5.1 does not have vision capabilities, so if you want visual understanding, you still need to use 5V Turbo. We make a lot of trade-offs. As you can see, we trade off visual capabilities, speed, and other things across open and closed scenarios. And there’s also a trade-off between OpenClaw scenarios and coding scenarios.
We make decisions based on our observations and our evaluations, trying to see which one matters most to our users. And we don’t have a 1 million context window. The reason is that we found that’s maybe not the pain point. Maybe capabilities matter most. Maybe some tasks need to be performed within an 80k context window, or the context crashes soon after 100k.
Kyle Chan (11:04)
So that’s maybe not what you think is the key bottleneck. Is it also other factors like better tool calling, better integration with other kinds of systems? I don’t know if there’s agent-to-agent integration as well, or agent orchestration. Maybe that’s more important than just sheer context.
Zixuan Li (11:29)
Yes, exactly. And we also make trade-offs even within a single category. When you refer to tool calling, which tools are you thinking about? Claude Code tools, or OpenClaw tools, or your self-defined tools, or something else? Sometimes when you improve one, you’ll see some drop in other categories.
Kyle Chan (11:36)
Mm-hmm.
Do you think that one day this will all be combined into a single vision-capable multimodal model that will basically supplant all the others, like the entire GLM family will merge and converge onto a single model? Do you think that’s the eventual goal?
Zixuan Li (12:16)
Yes, that’s our goal. Frankly speaking, we know how to do it. But we need to launch GLM-5.1 first. I think a 1 million context window and multimodal capabilities are necessary for the future. So that’s our goal. And we might have different things, like Qwen 3.5 Omni. You can see voice and other modalities are already merging into the model. We’re observing all the feedback on whether these capabilities help people in their real-world scenarios.
We are not just a research lab. We do things that help people solve problems, so we define the problems that matter most. On my X account, I frequently run surveys like, what matters more to you? These feedback signals are very useful to our researchers; they get to hear real-world user feedback. We collected more than 10,000 responses from one survey, very quickly. It lets them see the drawbacks of GLM-5, for example. They might think, okay, GLM-5 is pretty good, nobody will choose “the capabilities are not good enough.” But frankly speaking, a lot of people still choose that the capabilities need to be improved. For many people it’s okay not to have vision, but you need to catch up with Opus 4.6.
Kyle Chan (14:16)
Mm-hmm. Yeah, ultimately that is probably the must-have, right? And then everything else is additional, layered on top of that.
Z.ai has been working on agentic AI models for a long time now, sort of before it was cool. Going forward, we talked about long horizon, but more generally, what are the challenges of developing these models in a way that’s geared toward agentic workflows? Are the challenges more on the engineering side, trying to wrangle enough compute to train and experiment with different models? Or is it getting the right kind of RL loop going, because how do you give the right feedback for such a complex outcome and then iterate on it? I don’t know if there are specific challenges to training and developing foundation models in the age of agents.
Zixuan Li (15:26)
Yes, there are a lot of barriers, a lot of difficulties, so I’ll mention some. First is the speed of compute. We don’t want the task to take longer; we want the result in a minute. The reason it seems so long is that inference speed is not that fast, so you need to wait for the agent to perform the task. Why not have a result in a minute instead of in eight hours?
So for long-horizon tasks, I think we need to improve the infrastructure for agentic inference: different architectures, not only GPU inference, but also how you organize the results, whether you run the tasks in parallel or in other forms to speed up the process. For me, I like Gemini because Gemini is super fast. You don’t need an answer that takes an hour; you want to see it instantly.
If a task takes 10 hours to finish, that’s not ideal for the general public. Maybe it’s ideal for researchers, or for super developers, not ordinary developers, because ordinary people don’t trust AI. They don’t want AI to perform tasks for them during their sleep. So to get more of the general public to accept this idea, we need to run these tasks super fast.
So that’s one issue. The second issue is context. When we see the agent doing tasks in its third or fourth round, once the context window is compressed, sometimes it loses all the information and cannot follow instructions. That’s pretty normal, not only for Chinese models but for all the frontier models, like Gemini. They declare a 1 million context window, but after several hundred thousand tokens they just cannot recall anything, or cannot recall key information. So that’s very important.
And there are also a lot of foundational issues, like hallucination. You cannot solve it completely; the model creates something that doesn’t belong to your work or doesn’t exist. And with long-horizon tasks, it grows exponentially: a hallucination from the first round gets passed all the way to the last one.
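[Editor’s note: a rough sketch of the context-compression problem Zixuan raises, and one common mitigation: summarize older turns while pinning the original instructions outside the compressible region so they survive every round. The `summarize` callback and the 4-characters-per-token estimate are illustrative assumptions, not how GLM or Gemini actually manage context.]

```python
def compact_context(system_instructions, turns, summarize, max_tokens=80_000):
    """Fit a long dialogue into a fixed window without losing the
    instructions that later rounds must still follow."""
    def n_tokens(text):
        return len(text) // 4  # crude estimate: ~4 characters per token

    # Never compress the instructions themselves; only the dialogue.
    budget = max_tokens - n_tokens(system_instructions)
    kept, used = [], 0
    for turn in reversed(turns):        # keep the most recent turns verbatim
        cost = n_tokens(turn)
        if used + cost > budget * 0.7:  # reserve ~30% for the summary
            break
        kept.append(turn)
        used += cost

    older = turns[: len(turns) - len(kept)]
    summary = summarize(older) if older else ""
    return [system_instructions, summary] + list(reversed(kept))
```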
Kyle Chan (18:34)
Yeah, a lot of labs talk about the same problems, and any user who uses these models on a regular basis will know that feeling of when the context window is starting to run out and it’s just not really responding appropriately anymore, kind of making up stuff.
Zixuan Li (18:57)
Yes, and we also lack training data because no one has done it so far. No one has performed a task over 10 hours and collected all the data and then gone back to label it.
Kyle Chan (19:01)
Mm-hmm.
Yeah, so what do you do without that kind of data, right?
Zixuan Li (19:19)
So we need more RL and more synthetic data. We generate synthetic data, but we also try to find the hallucinations inside it and correct all the instruction-following failures.
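[Editor’s note: a hedged sketch of that data-hygiene step. Zixuan describes both finding hallucinations and correcting instruction-following failures; the simpler variant below just filters out bad trajectories before training. The trajectory fields and the two checker callbacks are hypothetical, not Z.ai’s actual pipeline.]

```python
def filter_synthetic_trajectories(trajectories, verify_claims, follows_instructions):
    """Keep only synthetic long-horizon trajectories whose claims check
    out and whose final step still follows the original instructions."""
    clean = []
    for traj in trajectories:
        # Drop trajectories containing unverifiable (hallucinated) claims;
        # in long-horizon runs these compound from round to round.
        if not all(verify_claims(step) for step in traj.steps):
            continue
        # Drop trajectories that drift from the original instructions
        # by the final round.
        if not follows_instructions(traj.instructions, traj.steps[-1]):
            continue
        clean.append(traj)
    return clean
```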
Kyle Chan (19:37)
Z.ai has been especially at the forefront in terms of working with Chinese chipmakers and Chinese hardware to support model deployment on AI chips like Huawei Ascend or Moore Threads or Cambricon. And if I recall correctly, GLM Image, the image generation model, was trained end-to-end on Huawei Ascend chips. And then more recently, in the latest Z.ai earnings report, there was a discussion of co-design and hardware-software collaboration. So I was wondering, what is the strategy behind this? Why is this seemingly a big priority for Z.ai? And what’s it like to work with the Chinese chipmakers too?
Zixuan Li (20:28)
I think the reason is quite simple. We don’t have access to NVIDIA chips. All the Chinese companies, I think, face similar issues. And we don’t have Blackwell. So that may restrict the scaling of our capacity and our performance.
When you look at DeepSeek’s tech report, the reason they chose that number of parameters is the infrastructure: it’s the largest model they can train with their infrastructure. We face similar issues, so we try to strike the right balance.
Kyle Chan (21:15)
So what has it been like working with the Chinese AI chipmakers? How closely do you work together? What does that process entail? How different is it from working with, say, NVIDIA GPUs?
Zixuan Li (21:32)
First of all, we don’t work with NVIDIA GPU makers or enterprises. I think it’s fantastic working with Chinese chipmakers. The only limitation is their supply. With some makers, we just finished the collaboration, but they haven’t produced that many chips yet, so we are still waiting for larger supply in the upcoming months.
And we co-design the chips, but the chipmakers also have to figure things out for other large language model companies, because there’s not just one model company here. We try to get an advantage there, because there’s DeepSeek, which is also very strong in architecture work and has closer collaboration with these chipmakers. But I cannot share many details. It’s secret. Let’s see what happens next.
Kyle Chan (22:37)
Right. That’s great.
So I want to ask a question about open source. A lot of the GLM models have been open source up until recently, and there are a lot of questions about whether 5.1 will be open source. I was wondering about your thoughts on open-source strategy more generally: whether going forward there might be more of a hybrid approach, with some open-source models and some proprietary ones, maybe open source for distribution and proprietary for direct monetization. How do you see Z.ai, or the broader Chinese AI labs, approaching the open-source question?
Zixuan Li (23:30)
I think we are open to all these possibilities, whether commercialization or continuing to open source, whether to open source our flagship model or smaller ones. We are very open. That’s the first point.
And I have my own understanding of open source. Strictly speaking, it’s not open source; as many people point out, it’s open weight. My understanding of this open-weight concept has three layers.
The first layer is that through open source, you create your brand image. Compared to U.S. frontier models, not many U.S. citizens or media care about Chinese models when you are closed source. Seed is the best closed-source model in China, but nobody knows Seed. They only know Seedance, as part of Doubao, right? So you need to let U.S. inference providers run your model. You need to let the people with GPUs at home try your model. That’s what made Qwen famous, and what made Kimi and DeepSeek famous. So that’s the first point: I think it’s especially necessary for a Chinese model company to open-weight its models when it wants to build brand image and have more people know it.
The second layer is that through open source, you collaborate with the community, so you gather help from others. For example, Intellect-3 is based on GLM-4.5 Air, and we see a lot of people using GLM, but not the original GLM: they fine-tune it, they quantize it, they use a lot of techniques to make open source embeddable, like what Cursor just did with Kimi K2.5. It’s pretty influential.
We won’t make that happen by ourselves, because our domain knowledge and expertise can only get it to maybe 80%. But with the knowledge of the whole community, or their domain knowledge, they can improve it into a better model. They can truly maximize the potential of the foundation model. That’s why we call it a foundation model, because it’s a foundation.
Only by open-weighting all the models can people truly utilize them as a foundation. If we only provide the model through APIs, people can only use the API and only pay for the API. That restricts the capacity and the potential.
And the third layer is that we try to define the norm. That’s the highest target. I think we are close to it, but only DeepSeek and Llama have reached that level. DeepSeek defined what thinking tokens look like, what the thinking panel looks like, because people had only seen it in o1 and didn’t know the secret. It teaches people what’s behind the model and truly defines a pattern, a norm. I think only through open source, or open weight, can you fully make that happen. That’s our goal: we want some training pattern, some model architecture, or some model behavior to become a pattern for the world. Like long horizon. Maybe after three months, everyone is learning how to do long-horizon tasks.
Kyle Chan (27:42)
Mm-hmm. Yeah, but you get to set the trend, basically, and that’s very valuable.
Zixuan Li (27:53)
Yes. They can learn our data pattern, or they can deploy on their chips to see what happens to the model if they do this or if they do that.
Kyle Chan (28:10)
Yeah.
How much stickiness do you think there is for enterprise customers building on the GLM foundation models? Once they get used to building with your models, deploying them, customizing them for their own use, fine-tuning them on their own proprietary data, how much does that keep them coming back to GLM models for their next iteration rather than switching to another one? Versus if you only have the API service, then it’s like, okay, I’ll just plug and play another API.
Zixuan Li (28:52)
Very sticky, because you still see many people using Qwen 2.5. There are lots of models based on Qwen 2.5.
Before Claude Code, maybe we used workflows, predefined workflows, in tools like Dify. The workflow is very complicated. If they think it’s working, if it’s effective for their domain expertise, I think they won’t switch to another model unless they want to fully switch the pattern to something else, like an autonomous agent. If they use a workflow and Qwen 2.5 or a variant is enough, they’ll keep it. It’s the same for GLM-4.5 Air: when you look at ElevenLabs, on their platform they use GLM-4.5 Air and GPT-OSS as a foundation.
Kyle Chan (29:46)
Yeah, that’s really interesting. Does that feed back into the way you develop your models? As you think ahead to later generations, are you trying to retain features that existing customers really like and want to keep, so that when they upgrade to a more recent GLM, they feel confident their systems won’t just break and they can keep building on what they had before?
Zixuan Li (30:37)
Yes, exactly. As you can see, 4.5, 4.6, and 4.7 share the same architecture, to make that shift more convenient. And our training data shares the same style, so the model won’t shift to another area or aspect.
We’re trying to perform better and better, not to perform completely different capabilities. We’re going to strengthen all the aspects we already have. That’s our primary goal.
Kyle Chan (31:16)
Yeah, that makes sense. There are always going to be some trade-offs, because as you add more capabilities to the model there will be some novel components, and the question is how they fit into people’s workflows, or how you educate customers and convince them this is worth trying out.
Zixuan Li (31:41)
Yes. I have an example. ARC-AGI is quite popular lately, but we don’t train on similar problems, because that’s not what our customers are looking for. Our customers are using GLM in Claude Code and in their agentic workflows. They’re not using it to solve math problems or those kinds of problems. So we won’t make the trade-off of improving the model on ARC-AGI instead of improving it in Claude Code.
Kyle Chan (31:55)
Mm-hmm.
Right. In that sense, are there certain benchmarks that you care more about because they’re closer to the real-world use cases you have in mind, the ones your customers will use these models for? So maybe it’s not ARC-AGI, maybe it’s not some of the math exams, maybe it’s more of these BrowseComp or real-world search benchmarks, things like that.
Zixuan Li (32:51)
We have proprietary benchmarks, but we are about to open source some of them. We have CCBench, our Claude Code bench. As you can see in the X post, we use that instead of a well-known benchmark. On that benchmark, GLM-5 cannot compare to Opus 4.6; it only scores around three-fourths of Opus 4.6. But GLM-5.1 is pretty close.
So we try to define these problems especially for real-world scenarios, because when we look at other benchmarks, like WebBench or WebBench Pro, some of them use a very idealized environment, so the agentic environment does not capture the real user experience. We try to use user feedback to build these benchmarks.
And we also have a benchmark for OpenClaw, called Z-Claw Bench, which we have already open-sourced. About 70% of the questions are Chinese. I think that’s okay because we have so many Chinese customers; it’s fine to blend those Chinese queries into the bench. So we optimize on these benchmarks. I think it’s super great: it captures all the necessary questions you can ask inside OpenClaw. It’s on Hugging Face, and anyone interested can find it there and try translating it from Chinese to English or their own language.
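[Editor’s note: if the benchmark is on Hugging Face as described, loading it should be a one-liner with the `datasets` library. The repo id below is a placeholder guess, not a verified path; search the Hub for the actual Z-Claw Bench listing.]

```python
from datasets import load_dataset

# Placeholder repo id; search Hugging Face for the real Z-Claw Bench listing.
bench = load_dataset("zai-org/z-claw-bench")
print(bench["train"][0])  # inspect one query (roughly 70% are in Chinese)
```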
Kyle Chan (34:37)
Yeah, but I’m sure that more of the OpenClaw-type benchmarks will come out because there’s a lot of demand for trying to understand which models are best for that kind of use case.
Do you think that there will be efforts for either z.ai or other Chinese AI labs to create their own internal OpenClaw, rather than building on the OpenClaw platform, the way Claude now kind of has its own computer-use tool?
Zixuan Li (35:27)
I think it depends. Inside Z.ai, we have several teams building variants. Some are based on the same harness, some are based on their own harness, but they try to keep the claw and the name.
Kyle Chan (35:41)
Right. People have the image of the claw, right? So that brand is very powerful.
Zixuan Li (35:49)
Yes, we have like five to seven teams working on it, because you have different customers. Some customers have some knowledge of OpenClaw, but other people are not aware of it at all. You have to make it a more convenient product for those customers.
Kyle Chan (35:56)
Yeah, that makes sense. That’s what I see with what Anthropic is trying to do, where they have at least one version that’s more simplified and user-friendly, for people who are just used to downloading a piece of software and running it on their desktop.
Zixuan Li (36:31)
Yes, they also have remote control.
Kyle Chan (36:34)
Right, with the app on your phone. So you can literally just be in bed and run more agentic tasks.
Another question that I have is: z.ai had a really high-profile IPO earlier this year, and you’re now a public company with incredible valuations. I was wondering how different it feels now to be a public company. Do you think it affects the day-to-day work that you do? Is there some relief now that you made it to the IPO and got that financing round, or is it now more pressure because there’s a stock price that everyone’s probably aware of somewhere in the back of their minds?
Zixuan Li (37:25)
No changes, frankly speaking, no changes. That’s because we focus more on research and application, and we regard ourselves as just a startup. We are closer to zero than to one or a hundred.
The ultimate strategy of this company is chasing AGI. But we are nowhere near AGI right now, by our definition, because we need AI to manipulate real-world tools to help people work on real-world tasks. Making slides or doing your Excel, I think that’s superficial right now.
So for the company, we are still at the very beginning stage.
Kyle Chan (38:27)
Yeah, so what do you see as the next steps to getting to AGI? What are the key thresholds? Is embodied AI a necessary part of that, being able to interact with the physical world in some kind of embodied form? World models are kind of the big thing right now. What do you see as the stepping stones there?
Zixuan Li (38:56)
For large language models, we’re going to do the same thing: enhancing coding capabilities and agentic capabilities. But for other ideas like embodied AI, we’re trying something similar, because we have performed quite well on Vending-Bench. So we are about to build a real-world Vending-Bench: a physical vending machine that operates on its own. It will purchase all the items, do all the calculations, handle payment by itself, and interact with the customers, all through that physical vending machine. So we’ll try something new.
And that’s also the reason why we want to do visual large language models. I think the current use cases focus more on taking screenshots to replicate a website, or doing OCR. But in real-world use cases, you need eyes. You need to see everything.
Kyle Chan (40:15)
Right. So maybe GLM 5V for vision was one of the steps to try to get that capability, and then later on that’ll be folded into the mainstream GLM line, perhaps.
Zixuan Li (40:43)
Yes. I think Gemini is pretty close to this target.
Kyle Chan (40:49)
How closely do you follow what is happening with your competitors in China and in the U.S.? Is it something where you’re all sort of nervous as you wait for another model release from someone else, or they’re nervously waiting for the next GLM model to come out?
Zixuan Li (41:10)
I think the most nerve-racking thing for us is missing users’ expectations. When we don’t reach product-market fit when we thought we should, there’s a lot of anxiety. If we provide something that they don’t need, I think that’s the most disappointing thing for us.
Kyle Chan (41:37)
Yeah. And then last question: thinking about your global strategy, you’ve talked about this a little bit already, but how do you see approaching the overseas market versus approaching the Chinese domestic market? Is it a very different strategy or a very different set of customer expectations, or way of interacting with customers or building up the community? What’s similar and what’s different inside China versus outside?
Zixuan Li (42:13)
I think inside China, we’re in the reputation stage. So we try to build more reputation and provide better services rather than just pure capabilities, because everyone is aware of GLM and Z.ai. In the overseas market, we’re still in the awareness stage, because only a few people have heard of it.
So I’m very grateful, because today Gemma 4 mentioned GLM-5, which brought GLM-5 to many people’s attention. They ask: what’s this? Why did Google compare against this model? I’ve never heard of it.
After this awareness, we begin to create a brand image and show our capabilities and let more people use it. Then we can do some commercialization or similar stuff. But we also want to explore more things, like AI for science and other more frontier aspects.
Kyle Chan (43:30)
Mm-hmm. So it sounds like you’re going in a number of different directions, trying to build better, more powerful models, better features, try to address these different customer demands, and have fun.
Zixuan Li (43:52)
This is very tough for me because I have to talk to a lot of users. Some of them are AI researchers and top scientists, and some are Z.ai chat users, people who use AI only for chat. We need to care about their feelings and their feedback. Some people want to remove their history, so I need to figure that out for them.
Kyle Chan (43:57)
Yeah, I see. So then you’ve got to prioritize among those different requests. It sounds like a very busy time for you guys. And it’s interesting that on top of all the incredible engineering and R&D work, there’s still a lot of basic people-to-people communication and customer-feedback work. Even on the way to AGI, this is still a core part of the business.
Zixuan Li (45:01)
Yes, I think I spend more time with human beings than AI lately.
Kyle Chan (45:06)
For now, I guess.
Zixuan Li (45:11)
Because things change so fast. If you have a document and you teach an AI to learn from it, after two days it will be obsolete, outdated. All the documents say the frontier model is GLM-5, but then you have GLM-5.1 and you need to change everything, and you can’t. So you need to reply to emails one by one. Someone identifies a new issue for you, and you need to figure it out. About 50% of the issues are caused by our new services rather than by the existing ones.
Kyle Chan (45:49)
Right. So you are sort of creating new challenges for yourself as you are solving old ones.
Well, that’s all the questions I’ve got for now. I think it’ll be really exciting once 5.1 is out to be able to play around with the model and try it out. I’m sure everyone has their favorite personal benchmarks or tests that they like to experiment with. I’m sure you have your own toolkit that you use to test and check each of the different models when they come out.
So I just want to thank you for taking the time to chat with me, and especially given your busy schedule. I’ll definitely include links to z.ai, the new GLM 5.1 report when it comes out, and also to your X account, which is really useful. I’ve been following it for a long time, and it’s really useful to hear not just about what’s happening with z.ai, but more broadly what’s going on in the AI landscape. So I’ll definitely include all that in the show notes.
With that, thank you very much, Zixuan, for a fantastic conversation.
Zixuan Li (47:04)
Thank you.
Thank you. Please follow us, and I’ll try to engage with everyone in the audience if you reach out to me.
Kyle Chan (47:18)
Sounds good. I’ll send them your way.
All right, so to wrap up, if you like this episode, please rate and subscribe on YouTube, Spotify, or Apple Podcasts. You can find episode transcripts and more information on the High Capacity Newsletter at highcapacity.org. I’m your host, Kyle Chan. Thanks for joining, and see you next time.



