ByteDance Poaches Alibaba's AI Team; Sakana AI Creates a Self-Improving AI
December Week 1: Dec 3 - Dec 9
Hi friends 👋,
In this week’s edition of Coconut Capitalists, we’re diving into:
Alibaba’s #1 LLM Team Poached by ByteDance
Sakana AI’s Self-Improving Artificial Intelligence
Quick-Fire Startup News from Around Asia
Let’s get into it.
Alibaba’s #1 LLM Team Poached by ByteDance
The Scoop
This week, it was reported that Zhou Chang, the Lead AI Scientist behind Alibaba's Qwen Family of Language Models, which is widely regarded as China's Best-In-Class Model Family, has been poached by rival ByteDance.
And ByteDance didn't just acquire Zhou – they also brought over his entire 10-person team of direct reports from Alibaba's DAMO AI division. The group joined ByteDance to help develop the next generation of Doubao, the company's family of language models that competes with offerings like OpenAI's GPTs and Anthropic's Claudes.
But why would Zhou leave Alibaba at such a pivotal time in China's AI race? An AI race that is hot, hot, hot with competition from billion-dollar startups like Moonshot AI, Zhipu, and 01.AI. And let's not forget the Tech Giants, such as Tencent with Hunyuan and Baidu with Ernie, which have also built top 1% AI Research Teams.
Well, it looks like our dear friend Zhou Chang will soon be living the high life. ByteDance offered him a reported $15 million USD per year in cash compensation + a massive stock package to take over and lead ByteDance's Doubao team.
It's important to note that when Zhou departed Alibaba, he reportedly told executives he was leaving to start his own company, as he wanted to "finally chase his dream" (quite cute, right?). But within two months of his exit, Zhou was discovered to have not only joined ByteDance, but also started hiring away his former Qwen Research Staff from Alibaba. As a result, Alibaba has accused Zhou of violating both his non-compete & non-solicitation agreements with the company, and is pursuing arbitration in the coming months.
Poaching Often Goes Very Wrong
If we look at recent history, there are two great examples of engineering teams leaving market leaders (from a technology perspective) to join underdogs.
The first example is Tesla’s vehicle engineers leaving to join Apple's electric car division (maybe some of you remember the codename "Project Titan"). It’s worth highlighting that in just one week, Apple poached 40 of Tesla's top engineers, including Tesla's Vice President of Vehicle Engineering (their top guy!). Apple's electric car project ended up being in development for over 9 years & cost the company a reported $12 Billion USD before Tim Cook lost his patience and shut it down entirely.
The second example is when Uber poached 8 of Waymo's top self-driving researchers to kickstart a brand new division called Uber ATG (Advanced Technologies Group). Similar to the Apple example, this also flopped. Over the span of 5 years, the division burned through a reported $1.4 Billion USD. Oh, and get this - the Waymo team lead who joined Uber was later sentenced to prison for "borrowing" Waymo's source code. But that's a story for another day...
It's worth noting that I'm probably being overly pessimistic & simply highlighting examples where these situations did not work out (& I'm not sure why, maybe it's the Singapore rain making me sad). But anyways, there's actually a more closely related comparison to ByteDance hiring Alibaba's Top Researchers - and it involves OpenAI.
In 2021, a group of seven AI researchers left OpenAI in an act of rebellion over what they referred to as a "lack of concern for AI Safety" to co-found a competing AI firm called Anthropic. Fast forward to today - the company is now worth a reported $40 Billion USD and has annual revenue of over $1 Billion from customers like Jane Street, Notion, & Zoom. Even though the Anthropic example wasn't a "poach" per se, it's an incredible success story of a group of talented engineers leaving a company to start something new.
Why it Matters
It's difficult to explain just how world-class Zhou Chang & Alibaba's Qwen Team of AI Researchers have historically been. Before we dive in, it's worth noting that the Qwen team is reportedly over 200 people strong, so an 11-person team leaving is a brain-drain, but by no means is it catastrophic to Alibaba.
In terms of numbers, over 90,000 enterprises currently access the Qwen Models (like Qwen-Plus and Qwen-Turbo) through Alibaba's Cloud Platform. And that's not counting the tens of thousands of additional enterprises accessing open-source versions of the models through providers like AWS, GCP, and Azure. Alibaba loves to brag about how companies like Xiaomi, Sephora, & LVMH (yes, even the handbag company is excited about AI) are building sophisticated customer experiences on top of Qwen.
Moreover, in China, Alibaba Cloud owns a whopping 39% of the market, with Huawei Cloud & Tencent Cloud coming in at 19% and 16%, respectively. Those Chinese cloud customers need a model family to build upon, and the Alibaba Cloud product becomes much stickier when it offers best-in-class proprietary models that run only on its own cloud. It's the same strategy the team over at Microsoft Azure is taking with their exclusive OpenAI deal.
Sakana AI’s Self-Improving Artificial Intelligence
The Scoop
On December 3rd 2024, Japanese artificial intelligence startup Sakana AI released a blog post titled "Population-based Model Merging via Quality Diversity," highlighting what this world-class team of Ex-Google AI researchers has been pursuing. One of the many research directions they're tackling is developing a framework called CycleQD that can spin up large networks of AI Agents, capable of autonomous self-improvement to complete even the most complex of tasks.
The framework is still very new and has yet to be fully realized (right now it's much more focused on model merging and SVD-based mutations). But a future version of CycleQD could run much more like a traditional organization: instead of people, AI agents complete the tasks, with the CycleQD system acting as a manager of sorts, completely autonomous & with zero human intervention.
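To make "model merging and SVD-based mutations" a little more concrete, here is a minimal toy sketch of what those two operations look like in weight space. This is my own PyTorch illustration, not Sakana AI's code, and the helper names (merge_weights, svd_mutate) are invented for this example:

```python
import torch

def merge_weights(state_dict_a, state_dict_b, alpha=0.5):
    # Simplest possible "model merge": linearly interpolate matching parameters.
    return {name: alpha * state_dict_a[name] + (1 - alpha) * state_dict_b[name]
            for name in state_dict_a}

def svd_mutate(weight, noise_scale=0.01):
    # "SVD mutation": decompose a 2-D weight matrix, jitter its singular values
    # slightly, and reconstruct. Applied layer by layer, this perturbs the model
    # in a structured way rather than with raw random noise.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    S = S * (1 + noise_scale * torch.randn_like(S))
    return U @ torch.diag(S) @ Vh

# Toy usage on a single random "layer" standing in for two full LLM checkpoints
parent_a = {"layer.weight": torch.randn(16, 16)}
parent_b = {"layer.weight": torch.randn(16, 16)}
child = merge_weights(parent_a, parent_b, alpha=0.7)
child["layer.weight"] = svd_mutate(child["layer.weight"])
```

CycleQD's actual quality-diversity machinery is far more involved than this, but crossover-plus-mutation over model weights is the basic ingredient.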
Now, let's take a quick step back and provide a recap on Sakana AI for our friends who are less familiar with the company: Sakana AI is the brainchild of co-founder Llion Jones (one of the eight co-authors of "Attention Is All You Need" - arguably the world's most important AI research paper - which introduced the Transformer Architecture; OpenAI's GPT Series and many other modern language models are built upon it).
The co-founders include:
David Ha (CEO): Previously a Managing Director at Goldman Sachs Japan; also spent 6 years as an ML researcher at Google Brain.
Ren Ito (COO): Former member of the Japanese Prime Minister's cabinet, ex-CEO of Japanese unicorn startup Mercari, and former COO of Stability AI.
Additionally, the company is currently worth $1.5 billion USD, backed by Nvidia, Khosla Ventures, and Japan's three largest banks. They've got a multi-million dollar supercomputing grant from the Japanese government, boast one of the top AI teams in Japan, and have raised a total of $344 million in funding - all in less than two years since incorporation.
Breaking down CycleQD
CycleQD is not an AI model per se, but instead a framework (complex Python code) that could evolve into the way large swarms of AIs eventually collaborate and solve complex tasks. Multi-agent frameworks aren't new: there's LangChain, Auto-GPT, BabyAGI, and many more frameworks you can find with a quick GitHub search. But what's novel about this research direction is that each AI model gains the ability to recursively & rapidly self-improve.
Currently, for an AI model to get better, one of our fellow friendly humans would have to take a pre-trained model off the shelf (like Llama 3), manually collect fine-tuning data, fire up a GPU, write the Python architecture code, kick-start the training job while painfully debugging hardware failures, and then finally redeploy the improved model so the world can access it. This is work!!
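For a sense of just how hands-on that loop is today, here's a bare-bones sketch of a single manual fine-tuning job using Hugging Face's transformers library. This is a generic example of my own (the data file, hyperparameters, and output paths are placeholders), not anything from Sakana AI:

```python
# A bare-bones supervised fine-tuning run: a human picks the base model,
# prepares the data, launches training, and redeploys the result by hand.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"            # off-the-shelf pre-trained model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token      # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Fine-tuning data you collected by hand (placeholder filename)
data = load_dataset("json", data_files="my_finetune_data.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                                # babysit the GPUs here
trainer.save_model("llama3-finetuned")         # then redeploy it yourself
```

Every one of those steps still needs a human to kick it off, monitor it, and ship the result.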
Here's a speculative example of what this research direction, if successful, could turn into as an end-state (I must note, this is my own hypothetical - not one provided by Sakana AI). Imagine a system that can create a Hollywood-production-level movie without a single human in the loop. It's as compelling as anything you'd find in a cinema and is completely generated by a self-improving artificial intelligence. Sakana AI is far away from this outcome, but this highlights how the research could eventually be turned into a product.
As a very forward-looking example (just for fun), let's say Sakana AI is partnering with Netflix to develop “AI Generated Original Movies”.
The way the CycleQD system would work is:
First, the CycleQD framework would be given an initial group of 100 identical Llama 3 LLMs, each with 8 billion parameters as a starting point.
Second, the system is then told to accomplish a very specific task: "create a movie that earns a 95% thumbs-up approval rate from 100 randomly selected Netflix users". Luckily, today we have all the AI tools to accomplish this:
4K Quality Synthetic Video Generation (like Pika Labs or OpenAI Sora)
Voice Cloning Technology (like Eleven Labs or PlayHT)
AI Music Composition (like Udio or Suno)
CycleQD could eventually be the missing piece to seamlessly integrate and coordinate this entire process—much like how Apple's Final Cut Pro or Adobe Premiere Pro brings together various elements to assemble a finished movie.
Third, the system randomly selects one Llama 3 model as the leader. The “lead AI” now breaks down the movie creation process into 99 distinct steps for the 99 other Llama 3 models to complete. For example:
One model handles actor selection by searching the web in real-time via the Bing API for the top 10 actors of 2024
A second model takes the first model's output and creates synthetic AI Avatars for all 10 actors via API calls to HeyGen
Now, a third model generates an entire catalog of background music and soundtracks for the movie using an AI music generator like Suno or Udio
And so on...
Fourth, once the initial version of the movie is complete, CycleQD presents it to the 100 Netflix viewers. The results are disappointing: only 16 people give it a thumbs up. The system then takes this feedback, improves itself via fine-tuning jobs, model-merging processes, training additional models from scratch, firing underperforming models, and other techniques. The second movie attempt gets 21 likes, the third yields 34, and the cycle of self-improvement continues (a toy sketch of this loop follows below).
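Tying the four steps together, below is a toy, self-contained simulation of the loop just described. To be clear, this is purely my own hypothetical rendering of the idea, not Sakana AI's implementation: the ToyAgent class stands in for a real LLM, and "movie quality" is collapsed into a single number so the selection-and-improvement dynamic is easy to see.

```python
import random
random.seed(0)

POPULATION_SIZE = 100    # Step 1: a population of starting models
N_VIEWERS = 100
TARGET_APPROVAL = 0.95   # Step 2: the task's success criterion

class ToyAgent:
    """Stand-in for one Llama 3 8B instance; a single 'skill' number replaces
    real weights. (In the real proposal the models start identical; here skills
    are randomized so the evolutionary loop has variation to select on.)"""
    def __init__(self, skill):
        self.skill = skill

def make_movie(leader, workers):
    # Step 3: the leader farms out sub-tasks and assembles the result.
    # In this toy, the movie's quality is simply the team's average skill.
    team = [leader] + workers
    return sum(agent.skill for agent in team) / len(team)

def screen_with_viewers(quality):
    # Step 4: each simulated viewer gives a thumbs-up with probability ~quality.
    thumbs_up = sum(random.random() < quality for _ in range(N_VIEWERS))
    return thumbs_up / N_VIEWERS

def improve_population(population):
    # Toy self-improvement: keep the better half, then "merge" pairs of survivors
    # (average their skills plus a small mutation) to refill the population --
    # a cartoon of fine-tuning, model merging, and firing underperformers.
    survivors = sorted(population, key=lambda a: a.skill, reverse=True)[:POPULATION_SIZE // 2]
    children = [ToyAgent((a.skill + b.skill) / 2 + random.uniform(0.0, 0.05))
                for a, b in zip(survivors, reversed(survivors))]
    return survivors + children

population = [ToyAgent(random.uniform(0.05, 0.20)) for _ in range(POPULATION_SIZE)]
approval, attempt = 0.0, 0
while approval < TARGET_APPROVAL:
    attempt += 1
    leader = random.choice(population)                 # random leader, as in step three
    workers = [agent for agent in population if agent is not leader]
    approval = screen_with_viewers(make_movie(leader, workers))
    print(f"Attempt {attempt}: {round(approval * N_VIEWERS)} thumbs up")
    population = improve_population(population)
```

In the real research direction, the "skill" would be actual model weights, and the improvement step would involve the kind of merging and SVD mutation operations sketched earlier in this piece.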
Why it Matters
Just as humans figured out how to make "rocks" (phones) communicate across the world, what often seems like hyperbole today—such as "Hollywood-level synthetic movies"—may soon become the norm. It's worth noting that Sakana AI's perspective differs from the idea of a single "god model" (which competitors such as OpenAI seem to be betting on); instead, they envision a future powered by a myriad of small, specialized models collaborating to solve the most complex tasks.
🇻🇳 Vietnam News
Nvidia has signed an agreement with the Vietnamese government to set up an AI R&D center and has committed to a reported investment of $400 million USD in Vietnam over the next 5 years. The announcement came immediately on the heels of news that Nvidia would acquire VinBrain, an AI Healthcare startup owned by the Vietnamese conglomerate Vingroup.
If you're not familiar with Vingroup, it has a similar structure to players you'd find in South Korea like Samsung and Hyundai - very successful, family-controlled conglomerates with significant government connections. The difference is that while South Korea has several such chaebols, Vingroup stands alone as by far Vietnam's largest private conglomerate, with influence across nearly every sector. Its subsidiaries include Vinhomes (real estate), VinFast (automobiles), Vincom Retail (shopping malls), Vinmec (healthcare), and Vinschool (education). Acquiring VinBrain could be seen as a way for Nvidia to develop closer ties to Vingroup (and I don't say this in a negative light - VinBrain has a cracked AI team that is world-class).
🇰🇷 Korea News
Rebellions & Sapeon, two of South Korea's most well-funded ($225 million & $45 million, respectively) fabless semiconductor design startups, have officially merged. Rebellions was founded in 2020 by a top 1% technical team, led by Sunghyun Park, an Electrical Engineering & Computer Science PhD from the Massachusetts Institute of Technology (MIT, #1 STEM University in the World), who was previously the ASIC design lead for SpaceX's Starlink program. Sapeon was created in 2022 as a spin-out from SK Group.
Both startups have historically built chips solely for AI Inference workloads and will now operate under one roof with the name "Rebellions". The company plans to collaborate closely with SK Telecom & SK Hynix to expand into the global AI data center market.
Ably, South Korea's fastest-growing e-commerce fashion app, has raised $71 million from Alibaba and now has a $2.1 billion valuation - making it Korea's first new unicorn of 2024. It's worth noting that the company has 9 million monthly active users and is rapidly accelerating, with a user growth rate exceeding 30% per year.
🇸🇬 Singapore News
ByteDance now has the world's second most used AI chatbot, behind OpenAI's ChatGPT. ByteDance's Doubao has a reported 62 million weekly active users. It's worth noting that this pales in comparison to OpenAI's 300 million weekly active users, but it highlights a continuation of strong momentum from ByteDance's product portfolio, which includes Douyin, CapCut, Lark, Toutiao, and TikTok.
Speaking of TikTok, the success of Doubao is even more critical given the significant geopolitical headwinds ByteDance faces from the US. TikTok will soon face a "ban" in the US if the company does not find an American buyer for the app. TikTok is currently used by over 170 million Americans and on the open market could be worth $100 billion. Oracle is best positioned to purchase TikTok - they have a market cap of $525 billion, already run all of TikTok's US cloud infrastructure, and would not face the same regulatory scrutiny as Microsoft, Meta, or Google. If Oracle is not interested, look for Walmart to make a move - they have a market cap of $750 billion.
🇨🇳 China News
Moonshot AI's founder Yang Zhilin (a PhD graduate from Carnegie Mellon University, whom many refer to as the next "Colin Huang") recently responded to two public claims: first, that he created the initial intellectual property for Moonshot AI (now valued at $3.3 billion USD) while working at his previous company, Recurrent AI; and second, that he did not receive proper consent from Recurrent shareholders to start his new venture. In his response, Yang countered these claims, stating that when he left Recurrent AI, he had received signed consent from all board members. It's worth noting that the facts of this situation are difficult to verify, as the claims and counterclaims have turned into a bit of a "he said, she said" situation.
OnlyFans, a British-founded startup valued at over $18 billion and known as a creator subscription platform for adult content, is now fully accessible in China.
The company, which paid out $6.6 billion USD to creators in 2023, has been completely unblocked by China's internet firewall. It's worth noting that this $6.6 billion figure is more than the NBA paid all of its players, which totaled $4.9 billion. Arguably, OnlyFans is larger than basketball.
This instance highlights the lack of transparency around which businesses are allowed in China. Facebook, YouTube, and Netflix are banned, while OnlyFans, a Western company known for explicit content, gets the a-okay from party officials.
Tencent has released an open-source synthetic video generation model that competes directly with closed-source rivals such as Pika Labs & OpenAI's Sora. The model, known as HunyuanVideo, packs a whopping 13 billion parameters and is completely free for anyone to use. It's now officially the world's largest open-source video generation model, with Tencent showing off some seriously impressive demos featuring everything from hot air balloons floating through photorealistic skies to surfers navigating complex waves.
🇮🇳 India News
Enterpret, a customer feedback data platform (similar to Segment or Qualtrics but with AI at its core), has raised $20.8 million USD from firms like Peak XV & Kleiner Perkins. The company, which is located in Bengaluru, India with satellite offices in New York City & San Francisco, already has enterprise contracts with customers such as Canva, Vimeo, and Nextdoor.