Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd., a DeepSeek affiliate, today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on the websites being crawled. It assesses already-downloaded content to predict the quality of links that have not yet been fetched, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
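The general idea of scoring unfetched links by the quality of the pages that point to them can be illustrated with a small priority-queue crawler. This is only a sketch of that generic technique and not the method in the patent filing, whose details are not given here. The names `crawl_order` and `unique_word_ratio`, the in-memory `pages` dictionary standing in for the web, and the unique-word quality heuristic are all assumptions made for illustration.

```python
import heapq


def unique_word_ratio(text):
    """Toy quality score in [0, 1]: share of distinct words (an assumed heuristic)."""
    words = text.split()
    return len(set(words)) / len(words) if words else 0.0


def crawl_order(pages, seeds, quality=unique_word_ratio):
    """Return the order in which URLs would be fetched.

    pages: dict mapping url -> (text, list_of_links); a hypothetical
           stand-in for real HTTP downloads.
    seeds: starting URLs, given the top priority of 1.0.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)  # every URL is queued at most once, so nothing is re-downloaded
    order = []

    while frontier:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # unreachable in this toy "web"
        text, links = pages[url]
        order.append(url)

        # A link that has not been fetched inherits the score of the page it was found on.
        score = quality(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return order
```

In this sketch, a link found on a repetitive page waits in the queue behind links found on more varied pages, and the `seen` set keeps each URL from being fetched twice. A real crawler would also need per-site rate limiting to keep its traffic impact low, which is omitted here.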