
Model Description

 
Chat-bot models are different AI "brains." Newer brains generally perform better, but prices vary widely: GPT-4.0 outperforms GPT-3.5, for example, yet costs roughly 20 times as much. You can choose according to your needs.
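To see what a price ratio like that means per request, here is a small cost sketch. The per-million-token prices below are hypothetical placeholders chosen only for illustration, not current list prices:

```python
# Illustrative cost comparison between a cheap and a premium model.
# Prices are hypothetical placeholders, not current list prices.
def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_m, output_price_per_m):
    """Return the cost in dollars for one API request."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

cheap = request_cost(2_000, 500, 0.50, 1.50)      # a GPT-3.5-class model
premium = request_cost(2_000, 500, 10.00, 30.00)  # a GPT-4-class model
print(f"cheap: ${cheap:.4f}, premium: ${premium:.4f}, "
      f"ratio: {premium / cheap:.0f}x")
```

With these placeholder prices, the same 2,000-token prompt and 500-token reply costs about 20 times more on the premium model, which is why picking a model to match the task matters at scale.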
 

 

Basic introduction to models:

OpenAI

o1-preview

Released on September 12, 2024, the o1 series marks a significant advance in OpenAI's reasoning capabilities. Compared to the earlier GPT-4o, o1-preview excels in many areas, especially writing code and solving complex problems. Through a new optimization algorithm and specially built training datasets, the model achieves notable gains in accuracy and reasoning ability.

o1-mini

Launched simultaneously with o1-preview, it is the smaller version in the o1 series. It is 80% cheaper than o1-preview, and although there are usage limitations, it performs exceptionally well in generating and debugging complex code, making it particularly suitable for developers.

gpt-4o-2024-05-13

On May 13, 2024, OpenAI officially released its new large model GPT-4o, where "o" stands for Omni, highlighting its all-in-one design: it reasons over multiple modalities end to end in real time, without intermediate conversion steps, significantly reducing response latency.

gpt-4o

GPT-4o can take any combination of text, audio, image, and video as input, and output any combination of text, audio, and image. It can respond to audio in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversations.

gpt-4o-mini-2024-07-18

Launched on July 18, 2024, this model is a smaller branch of GPT-4o. It retains GPT-4o's core capabilities in a much smaller model, making it far more cost-effective.

gpt-4-plus

An upgraded version of GPT-4, enhancing some features based on GPT-4, including more precise text representation, advanced emotion analysis, and higher semantic understanding capabilities. It performs better in natural language understanding and generation.

gpt-4o-2024-08-06

Launched on August 6, 2024, it's an updated version of the multimodal model GPT-4o. It introduced a breakthrough feature in its API - structured output, ensuring that the model's generated output fully complies with the JSON schema provided by developers, significantly improving API reliability and application accuracy.
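As a sketch of what this structured-output feature looks like on the wire, the request below pairs a chat prompt with a developer-supplied JSON Schema. The field names follow OpenAI's documented Chat Completions request format, but treat the exact shape as an assumption and check the current API reference:

```python
import json

# Sketch of a structured-output request: the model is asked to return
# JSON conforming to a developer-supplied JSON Schema. Field names
# follow OpenAI's documented format; verify against the live API docs.
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "system", "content": "Extract the event details."},
        {"role": "user",
         "content": "Alice and Bob meet for standup on 2024-08-06."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "event", "strict": True,
                        "schema": event_schema},
    },
}

print(json.dumps(request_body, indent=2))
```

With `"strict": True`, the API is meant to guarantee that the reply parses as JSON matching `event_schema`, so application code no longer needs to handle malformed output.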

chatgpt-4o-latest

Released on August 15, 2024, it is the latest version of GPT-4o, with significant improvements in coding, instruction following, and hard prompts. It supports a 128K context window and up to 16K output tokens, and shows considerably stronger reasoning than GPT-4o.

gpt-4-turbo

Officially launched in April 2024, this model is more powerful, cheaper, and supports a 128K context window. It includes new multimodal features such as vision, image creation, and text-to-speech. Its knowledge cutoff is April 2023. Price-wise, input tokens are 3 times cheaper than GPT-4 and output tokens are 2 times cheaper.

gpt-4-turbo-preview

The GPT-4-Turbo preview model debuted on November 6, 2023, with a 0125 version update later. It not only improved performance but also fixed a bug affecting non-English UTF-8 generation, enhancing model stability and multilingual support.

gpt-3.5-turbo-0125

The latest GPT-3.5-turbo model. It follows requested response formats with higher accuracy and fixes a bug that caused text-encoding issues in non-English function calls. Alongside the gains in stability and accuracy, gpt-3.5-turbo-0125 is also priced lower than its predecessor.

gpt-4

Officially released by OpenAI on March 14, 2023, it is the fourth generation of the GPT series models. It features stronger language understanding and generation capabilities, multimodal processing abilities, larger model scale, improved safety and ethics. It can be applied in various fields such as dialogue systems, content generation, code writing, data analysis, education, and research.
 

Gemini

gemini-1.5-pro-002

Released by Google on September 25, 2024. Compared with other versions in the 1.5 series, the new 1.5 Pro improves overall quality, with significant gains in mathematics, long-context handling, and vision, and it better understands complex, nuanced instructions. The price has been cut by more than 50%, rate limits have roughly tripled, output speed has doubled, and latency has been reduced to about a third.

gemini-1.5-flash-002

Co-released with gemini-1.5-pro-002, although it's a lighter model, it still performs excellently in multimodal reasoning capabilities. It excels at summary creation, chat applications, providing image captions and video subtitles, as well as extracting data from long documents and tables, with greatly improved response speed.

gemini-1.5-pro

Gemini-1.5-pro was first released by Google on February 15, 2024, as an upgrade to Gemini-1.0-Pro. It supports ultra-long context, stably processing up to 1 million tokens (equivalent to 1 hour of video, 11 hours of audio, over 30,000 lines of code, or 700,000 words). It supports multimodal input, able to analyze, summarize, and process images, documents, videos, and audio.

gemini-1.5-pro-0801

This is an experimental version of gemini-1.5-pro, excelling in multilingual tasks and performing exceptionally well in technical areas such as mathematics, complex prompts, and coding. Another notable feature is its extended context window of up to 2 million tokens, which far surpasses many AI models available in the market.

gemini-1.5-flash-001

Introduced by Google on May 15, 2024, 1.5 Flash is the fastest Gemini model available via the API. While maintaining breakthrough long-text capabilities, it has been optimized for large-scale processing of high-volume, high-frequency tasks. It excels in summarization, chat applications, image and video captioning, and extracting data from long documents and tables.


ANTHROPIC

claude-3.5-sonnet-20241022

Anthropic's latest upgrade, surpassing previous versions across the board. Its coding ability has improved significantly, and with its new computer-use capability it can follow user instructions to move the cursor, click the right on-screen locations, and type through a virtual keyboard, mimicking human-computer interaction.

claude-3.5-sonnet-20240620

Claude-3.5-sonnet is a large language model (LLM) released by Anthropic on June 20, 2024. As an advanced member of the Claude 3.5 series, it demonstrates improved performance in understanding nuance, humor, and complex instructions. Its writing style is more natural and approachable, and it excels at explaining charts and graphics.

claude-3-opus

Opus is the most advanced model in the Claude 3 series, showing the best performance in highly complex tasks in the market. It can easily handle various open-ended prompts and unknown scenarios, completing tasks with exceptional fluency and human-like understanding.

claude-3-haiku

This is Anthropic's fastest and smallest model, capable of providing almost instant response times. It can answer simple questions and respond to requests extremely quickly. Haiku is priced at $0.25 per million tokens for input and $1.25 per million tokens for output, which is quite inexpensive.
 

China Model

Qwen-Max

Qwen-Max is an independently developed large language model by Alibaba Cloud. It is part of the Tongyi Qianwen series, designed to understand and analyze user input in natural language. It is suitable for handling complex, multi-step tasks and offers multiple model versions, including qwen-max-longcontext, which supports a context length of up to 30,000 characters, meeting the needs of tasks requiring long document processing or complex logic.

Qwen-VL-Max

Qwen-VL-Max is an upgraded version of Alibaba's open-source model Qwen-VL. It significantly improves image-related reasoning capabilities, as well as the ability to recognize, extract, and analyze details and text in images. It supports high-resolution images with over a million pixels and various aspect ratios. It excels in Chinese question-answering and Chinese text comprehension tasks.

Qwen-Math-Plus

The Qwen Math model is a language model specifically designed for mathematical problem-solving, dedicated to solving complex and challenging mathematical problems. Its technical principles include large-scale pre-training, specialized corpus, instruction fine-tuning, reward models, binary signals, and PPO optimization. The model has performed excellently in multiple mathematical benchmark tests, particularly surpassing several leading open and closed-source models in solving mathematical competition problems.

GLM-4

GLM-4 is a foundation large model released by ZHIPU AI on January 16, 2024. It supports a context window length of 128k, allowing it to process up to 300 pages of text in a single prompt. In terms of multimodal capabilities, both text-to-image generation and multimodal understanding have been enhanced.

GLM-4-Long

This model is specifically designed for processing ultra-long texts and memory-intensive tasks. It supports extra-long inputs with a maximum context length of 1M, approximately 1.5-2 million characters. It possesses long-text reasoning capabilities and can process million-character texts with controllable response times, making it a powerful tool for handling large-scale textual data.

GLM-4-Plus

Released on August 29, GLM-4-Plus base model has achieved significant improvements in multiple key indicators, especially in language understanding, instruction following, and long text processing capabilities. It has constructed massive high-quality data through various methods and utilized multiple technologies such as PPO to effectively enhance the model's performance in reasoning and instruction following, better reflecting human preferences.

GLM-4-Air

With performance comparable to GLM-4 but priced at only 1 yuan per million tokens, GLM-4-Air is highly suitable for large-scale applications. It is a very cost-effective version, with a 128k context window, fast speed, and affordable pricing.

GLM-4-Airx

GLM-4-Airx is a high-performance version of GLM-4-Air. It maintains the same effectiveness but achieves 2.6 times faster inference speed, with an 8k context.

CodeGeeX-4

CodeGeeX is a code generation large model under ZHIPU AI. The first generation model was released in September 2022, and CodeGeeX-4 was released on July 5, 2024. As the latest generation of the CodeGeeX series, it significantly improves code generation capabilities. A single model can support code completion and generation, code interpretation, internet search, tool invocation, repository-level long code Q&A, and generation functionalities.

GLM-4V

GLM-4V has visual comprehension capabilities, achieving deep integration of visual and language features. It supports various image understanding tasks such as visual question answering, image captioning, visual localization, and complex object detection.

GLM-4V-Plus

A new generation of image/video understanding model released by Zhipu AI, with excellent image understanding capabilities and time-aware video understanding abilities. It performs exceptionally well in image understanding and can comprehend and analyze complex video content. It also possesses strong time perception abilities, not only understanding webpage content and converting it to HTML code but also accurately describing actions and scene changes in videos.

Baichuan3-Turbo

Optimized for high-frequency enterprise scenarios, significantly improved performance, high cost-effectiveness. Compared to the Baichuan2 model, content creation improved by 20%, knowledge Q&A improved by 17%, and role-playing ability improved by 40%.

Baichuan4

The latest generation of foundation model released by Baichuan Intelligence on May 22, 2024. It has improved in general capabilities, mathematics, and code processing abilities. It demonstrates excellent performance in handling Chinese tasks such as knowledge encyclopedias, long texts, and creative generation. In the evaluation by SuperCLUE, a leading domestic large model evaluation institution, Baichuan4's model capability ranked first in China, placing it in the top tier in the industry.

Moonshot-v1-8k

Moonshot-v1 is a language model with hundreds of billions of parameters launched by Moonshot AI, with excellent semantic understanding, instruction following, and text generation capabilities. Moonshot-v1-8k is the variant with an 8K context window, suitable for generating short texts.

Yi-Lightning

On October 16, 01.AI officially released its new flagship model Yi-Lightning, with significantly faster inference: time to first token has been cut roughly in half, and peak generation speed is up nearly 40%. It performs strongly on sub-leaderboards for multi-turn dialogue, mathematics, coding, and more. It also has a significant price advantage, available on the Yi open platform at only 0.99 yuan per million tokens.

Yi-Large

A closed-source large model with hundreds of billions of parameters released by 01.AI on May 13, 2024. Its main features include super-strong text generation and reasoning capabilities, suitable for complex reasoning, prediction, and in-depth content creation scenarios.

Yi-Vision

An open-source multimodal language model released by 01.AI. Yi-Vision is developed based on the Yi language model and has high-performance image understanding and analysis capabilities, capable of serving chat and analysis scenarios based on images.

Step-1V-8k

Step-1V, launched by the domestic large model company Stepfun, is a multimodal large model with hundreds of billions of parameters. It has powerful image understanding capabilities and currently only supports text and image input, with only text generation output. Step-1V-8k refers to a context length of 8k.

Step-2-16k

On July 4, 2024, Stepfun released the Step-2 model, which is a giant deep learning model with trillions of parameters. It adopts the MoE structure. In terms of mathematics, logic, programming, knowledge, creation, and multi-round dialogues, Step-2's capabilities are perceived to be approaching GPT-4 comprehensively. Step-2-16k refers to a context length of 16k.

ERNIE-4.0-8k

ERNIE-4.0-8k is Baidu's self-developed flagship large-scale language model, achieving comprehensive upgrades in model capabilities compared to ERNIE3.5. It is widely applicable to complex task scenarios in various fields. It supports automatic connection to Baidu search plugins, ensuring up-to-date information in question answering, and supports 5K tokens input + 2K tokens output.

ERNIE-4.0-Turbo

Released on June 28, 2024, it is an upgraded version of the Ernie-4.0 model launched in October 2023. It is the latest flagship version of the ERNIE series large model, increasing the input token length from 2K to 128K, and improving AI-generated image resolution from 512×512 to 1024×1024, with significant improvements in both generation speed and effectiveness.

Deepseek-Chat

Deepseek-Chat is the dialogue model from the DeepSeek company, built on DeepSeek-V2, a MoE model in the 200-billion-parameter class. It features economical training and efficient inference: the open-source model supports a 128K context, while the chat website/API supports 32K. It offers strong capabilities at low cost and is compatible with the OpenAI API interface, giving users a smooth experience.
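Because the API is OpenAI-compatible, switching an existing integration mostly means changing the base URL, API key, and model name. A minimal standard-library sketch follows; the endpoint path and model name are taken from DeepSeek's public docs at the time of writing, so treat them as assumptions:

```python
import json
import urllib.request

# OpenAI-compatible request: the chat-completions body is unchanged,
# only the base URL, API key, and model name differ.
# (Endpoint and model name are assumptions; check DeepSeek's docs.)
BASE_URL = "https://api.deepseek.com"
API_KEY = "YOUR_DEEPSEEK_API_KEY"  # placeholder, not a real key

payload = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# urllib.request.urlopen(req) would send it; omitted here because it
# requires a valid API key and network access.
print(req.full_url)
```

The same compatibility means the official OpenAI SDK can also be pointed at this endpoint by overriding its base URL, so existing client code usually needs no structural changes.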

Deepseek-Coder

DeepSeek-Coder is an intelligent coding assistance tool that can understand programming problem descriptions and automatically generate relevant solutions or code snippets. Its training data volume is as high as 2T, covering 87% code and 13% natural language, including English and Chinese. Different model versions from 1B to 33B meet the needs of projects of various scales.

Doubao-pro-32k

Doubao General Model Pro is a large language model independently developed by ByteDance, suitable for handling complex tasks. It performs well in scenarios such as reference Q&A, summarization, creation, text classification, and role-playing. It supports inference and fine-tuning with a 32k context window.

Spark Max

The Iflytek Spark Cognitive Large Model is a large model released by iFLYTEK. Spark Max is a flagship large language model with hundreds of billions of parameters. Its core capabilities have been comprehensively upgraded, with stronger mathematics, Chinese, coding, and multi-modal abilities. It is suitable for business scenarios that require higher performance in mathematical calculations, logical reasoning, etc.

Spark Ultra

The most powerful version of the Spark large language model, surpassing GPT-4 Turbo in text generation, language understanding, knowledge Q&A, logical reasoning, and mathematics. It also optimizes the internet-search chain to provide more accurate answers.

Spark Lite

A lightweight large language model with higher response speed, suitable for low-compute inference and model fine-tuning customization scenarios, capable of meeting enterprise product rapid validation needs.

SenseChat-5

The latest large language model released by SenseTime Group on April 23, 2024, adopting a Mixture of Experts (MoE) architecture, with a parameter count of 600 billion, supporting a 200K context window.

SenseChat-Turbo

Also a model released by SenseTime, cheaper compared to SenseChat-5, with a price of 0.005 RMB per call.

abab6.5s

MiniMax Technology officially launched the abab6.5 series models on April 17, 2024. The abab6.5 series includes two models: abab6.5 and abab6.5s. abab6.5s is more efficient, supporting a context length of 200k tokens, capable of processing nearly 30,000 characters of text within 1 second.

Hunyuan-Lite

Tencent's Hunyuan is a trillion-parameter large model developed entirely in-house. Hunyuan-Lite has been upgraded to an MoE structure with a 256k context window, and it leads many open-source models across evaluation sets covering NLP, code, mathematics, and industry-specific tasks.

Hunyuan-Standard

With hundreds of billions of parameters, it adopts a more optimized routing strategy while alleviating the problems of load balancing and expert convergence. In terms of long texts, the needle-in-a-haystack indicator reaches 99.9%. hunyuan-standard-32K offers relatively higher cost-effectiveness, balancing performance and price while handling long text inputs; hunyuan-standard-256K further breaks through in length and effectiveness, greatly expanding the possible input length.

Hunyuan-Pro

With trillion-level parameters, it is currently the best-performing version among Hunyuan models, achieving absolute leading levels in various benchmarks. It excels in complex instructions and reasoning, possesses advanced mathematical capabilities, supports function calls, and has been optimized for applications in multilingual translation, finance, law, medicine, and other fields.

Hunyuan-Code

The latest code generation model from Hunyuan, trained on 200B high-quality code data to enhance the base model, iterated with high-quality SFT data training for half a year. It has increased the context window length to 8K and ranks at the top in automatic evaluation metrics for code generation in five major languages. In high-quality manual evaluations of comprehensive coding tasks across five languages and 10 aspects, its performance is in the first tier.

Hunyuan-Vision

The latest multimodal model from Hunyuan, supporting text generation from image and text input. It includes capabilities such as basic image recognition, image content creation, multi-round image dialogue, image-based knowledge Q&A, image analysis and reasoning, and image OCR.
 

Expert Model

farui-plus

A legal large language model product launched by Alibaba Cloud, based on the Tongyi Qianwen model and specially trained with legal industry data and knowledge. It features legal intelligent dialogue, legal document generation, legal knowledge retrieval, case analysis assistance, legal text reading and parsing, contract clause review, and other functions.

XuanYuan-70B

XuanYuan is China's first open-source hundred-billion-parameter Chinese dialogue large language model, and also the first open-source dialogue model of this scale optimized for the Chinese financial domain. XuanYuan 70B (Du Xiaoman Chinese Financial Dialogue Large Model) is a Chinese financial dialogue model developed by Du Xiaoman Financial. For long-text business scenarios in finance, it extends the context length to 8k and 16k, significantly improving financial comprehension while maintaining general Chinese and English language capabilities.

ChatLaw

A Chinese legal large language model jointly released by the Yuan Li research group at the School of Information Engineering, Peking University, and the PKU-Rabbitpro AIGC Joint Laboratory in July 2023. It is trained on various Chinese legal texts, actual cases, and judicial documents. The model can assist with AI-powered legal contract drafting, case introductions, clause explanations, and judicial consultations.

Qwen2-Math-72B

On August 8th, Alibaba open-sourced the Qwen2-Math series models, which focus on mathematical reasoning abilities. Evaluation results on Math benchmarks show that Qwen2-Math-72B surpasses state-of-the-art models, including GPT-4, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B.

grok-2

Released on August 13, 2024, Grok-2 is an advanced AI language model from xAI, offered in Grok-2 and Grok-2 mini versions, with chat, coding, and reasoning capabilities. (Its predecessor Grok-1, at 314 billion parameters, was the largest open-source model by parameter count when it was opened.) Grok-2 brings stronger ability to handle complex tasks and generate high-quality text.

pplx-8b-online

Introduced by Perplexity AI, it is an online model based on large language models (LLMs) that provides instant, accurate query responses using real-time internet data. Available through API, it enables immediate response to queries, marking the first public access to online LLMs via API.

pplx-70b-online

Built on the Llama2-70B base model, this online model's main feature is its ability to access real-time internet data, thus providing the latest information. It prioritizes access to high-quality and authoritative websites through Perplexity Labs' internal search infrastructure and uses advanced ranking mechanisms to present relevant and reliable information snippets in real-time.

 

Open Source Model

Llama3.2-90B

Officially launched on September 25, 2024, Meta's most advanced model excels in common sense, long text generation, multilingual translation, coding, mathematics, and advanced reasoning. It also introduces image reasoning capabilities, able to complete image understanding and visual reasoning tasks.

Llama3.2-11B

Highly suitable for content creation, conversational AI, language understanding, and enterprise applications requiring visual reasoning. This model performs excellently in text summarization, sentiment analysis, code generation, and instruction execution. It has added image reasoning capabilities, with use cases similar to the 90B version: image captioning, image-text retrieval, visual foundation, visual question answering and reasoning, as well as document visual question answering.

Qwen2.5-72B

Introduced on September 19, it supports context lengths of up to 128K and can generate up to 8K content. It supports over 29 languages and is pre-trained on 18T token data. Compared to Qwen2, Qwen2.5 has improved overall performance by more than 18%, possessing more knowledge and stronger programming and mathematical abilities. The 72B version is a performance powerhouse suitable for industrial and research-grade scenarios.

Qwen2.5-Coder-7B

Qwen2.5-Coder is the latest series of code-specific Qwen large language models, designed for code generation and programming-related tasks. It adopts a Transformer-based architecture and introduces optimization techniques such as Grouped-Query Attention (GQA) and Dual Chunk Attention (DCA) to improve inference performance and long-text handling.
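The core idea of GQA can be sketched in a few lines: query heads are divided into groups, and each group shares a single key/value head, which shrinks the KV cache during inference. The head counts below are illustrative placeholders, not Qwen2.5-Coder's actual configuration:

```python
# Sketch of the key/value head sharing behind Grouped-Query Attention
# (GQA). Head counts are illustrative placeholders only.
NUM_Q_HEADS = 28
NUM_KV_HEADS = 4
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # 7 query heads per KV head

def kv_head_for(query_head: int) -> int:
    """Map a query head index to the KV head its group shares."""
    return query_head // GROUP_SIZE

# Every query head in a group attends using the same cached K/V, so the
# cache stores NUM_KV_HEADS sets of keys/values instead of NUM_Q_HEADS.
cache_shrink_factor = NUM_Q_HEADS / NUM_KV_HEADS
print([kv_head_for(h) for h in range(NUM_Q_HEADS)])
print(f"KV cache is {cache_shrink_factor:.0f}x smaller than full MHA")
```

Shrinking the KV cache this way is what lets long contexts fit in GPU memory at inference time with little quality loss compared to full multi-head attention.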

Llama3.1-405B

Llama 3.1 is Meta's latest open-source large language model. The 405B version is suited to synthetic data generation, serving as an LLM judge, and distillation. It supports a 128K-token context length and 8 languages, and allows using model outputs to improve other LLMs.

Llama3.1-70B

The 70B version is suitable for large-scale AI-native applications. It has 70 billion parameters and supports text generation in 8 languages. It uses optimized Transformer architecture and is further enhanced through supervised fine-tuning and reinforcement learning with human feedback to align with human preferences for helpfulness and safety.

Llama3.1-8B

The 8B version is suitable for researchers and developers who need to develop natural language processing and dialogue systems in multilingual environments. It contains an 8B-sized version, supports 8 languages, and is optimized for multilingual dialogue use cases.

Llama3-70B

Llama3 was launched by Meta on April 18, 2024, in 8B and 70B versions. The 70B version has 70 billion parameters, and that added capacity translates into stronger performance across NLP tasks, including code generation and creative writing. It also demands more computational resources: a powerful hardware setup with ample memory and GPU capacity.

Llama3-8B

The Llama3-8B model strikes a balance between performance and resource requirements. It has 8 billion parameters, offering impressive language understanding and generation capabilities while remaining relatively lightweight, making it suitable for systems with moderate hardware configurations.

Mistral-Large-2

On July 26, 2024, French AI startup Mistral AI released its latest model, Mistral-Large-2, with 123 billion parameters. It excels particularly in code and mathematical reasoning, has a context window of 128k, supports dozens of natural languages and 80+ programming languages. Its pre-trained version achieved an impressive 84.0% accuracy on MMLU.

Mixtral-8x7B

Mixtral-8x7B is the first open-source MoE (Mixture of Experts) large model, featuring 8 experts with 7B parameters each and efficient sparse processing. The model consists of 8 feed-forward blocks (i.e., "experts") at each layer, with a routing network selecting two experts for processing each token and combining their outputs. This architecture allows the model to access more parameters while maintaining low computational costs.
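The routing described above can be sketched with toy scalar "experts." Real MoE layers route vectors through learned feed-forward blocks; the scalars here exist only to show the top-2 selection and weight blending:

```python
import math
import random

# Toy sketch of top-2 mixture-of-experts routing: a router scores all
# experts for a token, the two highest-scoring experts process it, and
# their outputs are blended by renormalized softmax weights.
random.seed(0)

NUM_EXPERTS = 8
# Each "expert" stands in for a feed-forward block (here: multiply).
experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, router_scores):
    weights = softmax(router_scores)
    # Pick the two experts with the highest router weight.
    top2 = sorted(range(NUM_EXPERTS),
                  key=lambda i: weights[i], reverse=True)[:2]
    norm = sum(weights[i] for i in top2)  # renormalize over the pair
    return sum(weights[i] / norm * experts[i](token) for i in top2)

scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print(moe_layer(1.0, scores))
```

Only two of the eight experts run for each token, which is how such a model keeps per-token compute low even though its total parameter count is roughly eight times that of a single expert.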

Gemma-7B

Gemma is an AI large model developed by Google. Gemma-7B has 7 billion parameters and is designed for efficient deployment and development. It's suitable for consumer-grade GPUs and TPUs, and its performance is comparable to the best models at the 7B parameter level, including Mistral-7B.

Gemma2-9B

Gemma2 is Google's latest open-source model released in June 2024. Compared to the first generation, Gemma2 offers higher inference performance, improved efficiency, and significant advancements in safety. Gemma2-9B's performance leads among its peers, surpassing Llama3-8B and other open-source models of similar scale.

Gemma2-27B

Gemma2-27B performs best in its class and can even challenge larger models. The 27B model runs inference efficiently at full precision on a single Google Cloud TPU host or NVIDIA H100 GPU, significantly reducing costs while maintaining high performance.

Command R+

Command R+ is the latest model introduced by Cohere, the most powerful in the Command R series with a total of 104 billion parameters. Its uniqueness lies in its strong generative capabilities and advanced retrieval functionality, allowing the model to retrieve relevant content from external knowledge sources based on given context information and integrate it into generated responses, effectively mitigating the model's "hallucination" problem.

Command R

Command-R has 35B model parameters, providing powerful language understanding and generation capabilities. The model supports a context window of up to 128K, far exceeding industry standards, enabling it to handle more complex texts and generate more coherent content.

Qwen2-72B

Released on June 7, 2024, it is one of the Qwen large model series developed by Alibaba Cloud. The 72B instruction-tuned version model also increased its supported context length to up to 128k tokens. It features large-scale high-quality training data, powerful performance, and a more comprehensive vocabulary coverage.

Qwen2-7B

Qwen2 is a new generation of multilingual pre-trained models launched by Alibaba's Tongyi. It includes 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. Among them, Qwen2-7B supports longer context lengths, up to 128K tokens. In addition to Chinese and English, it incorporates high-quality training data for 27 new languages.

Llama-3.1-nemotron

This is a series of large language models developed by NVIDIA, based on Llama-3.1-70B. It adopts a novel Neural Architecture Search (NAS) method to create a highly accurate and efficient model. Under high workloads, the model can run on just one NVIDIA H100 GPU, making it easier to use and more cost-effective.


 
 
 

Supported models and prices:

 

A small trick to distinguish between 4.0 and 3.5:

 
  • GPT-3.5:
 
  • GPT-4.0:
 
Last modified: 2024-10-23