Mastodon

@eiJil@mastodon.social

May 3, 2022

DeepSeek-V3 Technical Report

Liu, Feng, Xue, Wang, Wu, Lu, Zhao, Deng, Zhang, Ruan, Dai, Guo, Yang, Chen, Ji, Li, Lin, Dai, Luo, Hao, Chen, Li, Zhang, Bao, Xu, Wang, Zhang, Ding, Xin, Gao, Li, Qu, Cai, Liang, Guo, Ni, Li, Wang, Chen, Chen, Yuan, Qiu, Li, Song, Dong, Hu, Gao, Guan, Huang, Yu, Wang, Zhang, Xu, Xia, Zhao, Wang, Zhang, Li, Wang, Zhang, Zhang, Tang, Li, Tian, Huang, Wang, Zhang, Wang, Zhu, Chen, Du, Chen, Jin, Ge, Zhang, Pan, Wang, Xu, Zhang, Chen, Li, Lu, Zhou, Chen, Wu, Ye, Ye, Ma, Wang, Zhou, Yu, Zhou, Pan, Wang, Yun, Pei, Sun, Xiao, Zeng, Zhao, An, Liu, Liang, Gao, Yu, Zhang, Li, Jin, Wang, Bi, Liu, Wang, Shen, Chen, Zhang, Chen, Nie, Sun, Wang, Cheng, Liu, Xie, Liu, Yu, Song, Shan, Zhou, Yang, Li, Su, Lin, Li, Wang, Wei, Zhu, Zhang, Xu, Xu, Huang, Li, Zhao, Sun, Li, Wang, Yu, Zheng, Zhang, Shi, Xiong, He, Tang, Piao, Wang, Tan, Ma, Liu, Guo, Wu, Ou, Zhu, Wang, Gong, Zou, He, Zha, Xiong, Ma, Yan, Luo, You, Liu, Zhou, Wu, Ren, Ren, Sha, Fu, Xu, Huang, Zhang, Xie, Zhang, Hao, Gou, Ma, Yan, Shao, Xu, Wu, Zhang, Li, Gu, Zhu, Liu, Li, Xie, Song, Gao, Pan
arxiv.org/abs/2412.19437 arxiv.org/pdf/2412.19437 arxiv.org/html/2412.19437

arXiv:2412.19437v1 Announce Type: new
Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters and 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at github.com/deepseek-ai/DeepSee.
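The headline numbers follow from sparse expert routing: each token runs through only the few experts its gate selects, so 671B total parameters cost only ~37B activated parameters per token. A minimal, illustrative top-k gating sketch (softmax gating with hypothetical names; the paper's actual router uses sigmoid affinity scores plus a learned per-expert bias to achieve its auxiliary-loss-free balancing):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector through a top-k Mixture-of-Experts layer.

    Illustrative sketch only, not DeepSeek-V3's implementation:
    softmax gating here stands in for the paper's sigmoid affinities
    with a learned balancing bias. All names are hypothetical.
    """
    scores = x @ gate_w                 # affinity of this token to each expert
    topk = np.argsort(scores)[-k:]     # indices of the k best experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()           # normalized gate weights over top-k
    # Only the k selected experts actually run -- the source of the
    # "total params" vs. "activated params" gap in the abstract.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))
```

With k=8 routed experts out of 256 per layer (the paper's configuration), most parameters sit idle on any given token, which is what makes training and inference cost-effective at this scale.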

ollama-ebook-summary is an Ollama-based tool that automatically generates notes from long texts: it produces well-organized bullet-point notes while preserving the hierarchical structure of the original.
It is practical wherever large volumes of text need to be worked through: organizing academic research material, quickly digesting content, or preparing teaching and research material.

Rather than processing an entire document at once, it first splits the long text into small chunks, then questions and summarizes each chunk separately, keeping responses fine-grained so that no details are lost.
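The chunk-then-summarize loop can be sketched against Ollama's local REST API. This is not the project's own code: the paragraph-aligned splitter, the prompt wording, and the model name are all assumptions; only the `http://localhost:11434/api/generate` endpoint and its `model`/`prompt`/`stream` fields are standard Ollama.

```python
import json
import urllib.request

def chunk_text(text, max_chars=2000):
    """Split text into paragraph-aligned chunks of at most max_chars.

    Hypothetical splitter for illustration; the real tool chunks by
    the document's own section structure.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize_chunk(chunk, model="mistral"):
    """Ask a local Ollama server for bullet-point notes on one chunk."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": model,  # assumed model name; any pulled model works
            "prompt": f"Write bullet-point notes for this text:\n\n{chunk}",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Summarizing each small chunk in its own request is what keeps the notes fine-grained: the model never has to compress a whole book into one response.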

It automatically detects and extracts the table of contents of e-books (epub/pdf), preserves the chapter hierarchy of the original, generates notes with matching nesting levels, and bolds important content.
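For EPUB2 files, that table of contents lives in a `toc.ncx` file whose nested `navPoint` elements encode the chapter hierarchy. A sketch of recovering (title, depth) pairs from it with the standard library; this covers only the NCX case (the real tool also handles EPUB3 `nav.xhtml` and PDF outlines), and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# EPUB2 NCX namespace, fixed by the NCX specification.
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"

def parse_ncx(ncx_xml):
    """Return (chapter title, nesting depth) pairs from a toc.ncx document.

    Illustrative sketch only; nested navPoints become deeper entries,
    which is what lets generated notes mirror the book's hierarchy.
    """
    root = ET.fromstring(ncx_xml)
    entries = []

    def walk(nav_point, depth):
        label = nav_point.find(f"{NCX_NS}navLabel/{NCX_NS}text")
        if label is not None and label.text:
            entries.append((label.text, depth))
        for child in nav_point.findall(f"{NCX_NS}navPoint"):
            walk(child, depth + 1)

    for nav_point in root.find(f"{NCX_NS}navMap").findall(f"{NCX_NS}navPoint"):
        walk(nav_point, 0)
    return entries
```

An EPUB is a ZIP archive, so in practice the NCX text would be read with `zipfile` before parsing.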

Different summary styles can be specified, such as comprehensive bullet-point notes or a brief, concise summary; output formats include Markdown, CSV, and TXT.

Project repo: github.com/cognitivetech/ollama-ebook-summary
