Apple Unveils Ferret: A Pioneering Open Source Multimodal LLM

Apple's latest innovation, Ferret, is a multimodal large language model (LLM) that understands text and images, marking a significant shift towards open-source AI development.

  • Apple Launches Ferret: In October, Apple quietly released Ferret, a new multimodal large language model (LLM), stepping into open-source AI development. For a company known for closed systems, releasing an open-source model is a notable shift in strategy【source】【source】.

  • Understanding Text and Images: Ferret can process and understand not just text but also images. This capability places it alongside models from major players such as Google’s Gemini, Meta’s Llama 2, and OpenAI’s GPT series, offering a new tool for AI developers and researchers【source】.

Ferret's Capabilities and Usage

Ferret is designed to refer to and ground any element in an image at any granularity. It can take a region the user points to and reason about it (referring), and it can tie the objects it mentions in an answer back to their locations in the image (grounding). For instance, if a user highlights an animal in a photo and asks what it is, Ferret can determine the species and give a context-aware response that draws on other elements in the image【source】.
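
To make the referring and grounding idea concrete, here is a minimal Python sketch. It is not Ferret's actual API: the Box type, the build_referring_prompt and parse_grounded_answer helpers, the "[region: ...]" prompt format, and the "[x1, y1, x2, y2]" answer format are all assumptions made for illustration. The sketch only shows how a highlighted region could be attached to a question, and how boxes embedded in a text answer could be mapped back to phrases.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangular image region in pixel coordinates (hypothetical type)."""
    x1: int
    y1: int
    x2: int
    y2: int


def build_referring_prompt(question: str, region: Box) -> str:
    """Referring: attach a user-selected region to the question.

    The "[region: ...]" suffix is an assumed format, not Ferret's own.
    """
    return f"{question} [region: {region.x1}, {region.y1}, {region.x2}, {region.y2}]"


# A phrase (no brackets, periods, or commas) followed by "[x1, y1, x2, y2]".
_BOX_PATTERN = re.compile(r"([^\[\].,]+?)\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]")


def parse_grounded_answer(answer: str) -> list[tuple[str, Box]]:
    """Grounding: pull (phrase, box) pairs out of a text answer.

    The phrase is simply the text preceding each box since the last
    punctuation mark, which is a crude heuristic.
    """
    pairs = []
    for match in _BOX_PATTERN.finditer(answer):
        phrase = match.group(1).strip()
        box = Box(*(int(match.group(i)) for i in range(2, 6)))
        pairs.append((phrase, box))
    return pairs


if __name__ == "__main__":
    # The user highlights an animal and asks about it.
    print(build_referring_prompt("What animal is this?", Box(120, 80, 340, 300)))
    # A made-up answer in the assumed format, with a box after each object.
    for phrase, box in parse_grounded_answer(
        "a corgi [120, 80, 340, 300], a red ball [400, 250, 460, 310]"
    ):
        print(phrase, box)
```

Keeping the region and the answer boxes in plain text is a design choice of this sketch that keeps the example self-contained; a real system may represent regions quite differently.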

The model was developed jointly by Apple and researchers from Columbia University. Ferret's code was released on GitHub in October along with Ferret-Bench, with little publicity, and its significance became apparent to AI researchers only later【source】.

Implications and Future Prospects

Ferret was released under a non-commercial license, so it cannot be used commercially in its current form. However, it could still be integrated into future Apple products or services. The release also signals that Apple is becoming more open about sharing its AI research, in contrast to its traditionally secretive approach【source】.

While Apple is enhancing its AI server capabilities, it still faces challenges in scaling to match the likes of ChatGPT. The release of Ferret, trained on powerful Nvidia GPUs, demonstrates Apple's commitment to advancing AI technology and potentially collaborating with other firms to augment its capabilities【source】.

