Security has entered the era of large models: Yitu "seeks" the new decade


2025-07-12 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)12/24 Report--

Since 2016, computer-vision AI startups have sprung up all over China.

Their founders came together because of AI and stepped out of top academic ivory towers, hoping to use AI as the key to transforming the long-standing business models of traditional industries.

However, looking back on this journey, most of the high-profile entrants ended in disappointment. Only a few enterprises broke through, and the most outstanding of them became what people call the "Four Little Dragons of AI."

After the initial glory, doubts about AI companies' heavy financing, heavy R&D spending and heavy losses have been heating up.

In discussions of why the Four Little Dragons stalled, the most commonly cited reasons are that the halo of technology has faded, capital enthusiasm has dissipated, and policy risks have increased.

Having started on the same computer vision (CV) front and then gone their separate ways, they now face the same question: where will new vitality come from?

Market turnarounds often arrive at moments of technological change.

In 2023, widely regarded as the first year of China's "large model" era, the gears of fate began to turn again.

Riding the AIGC boom, the AI sector has reversed its decline and returned to the top of all kinds of trending lists.

In this newly reopened competition, the veteran AI players will certainly not be absent.

It is too early to say that "Security + AI" is out of date.

In 2016, when AlphaGo took away humanity's last glory in board games, investment and financing in the AI sector began to heat up.

In the first few years, startups were pampered by venture capital: investors scrambled to foot the bill, and companies could watch their valuations rise as long as they stayed immersed in laboratory research and development.

However, after several years of watching these enterprises mired in financing, R&D and losses while commercialization failed to scale, investors are no longer willing to pay for losses on the strength of a story.

Ideally, with technology as a tailwind, this would have been a win-win case of "everyone gathering firewood"; in reality, what everyone jointly pushed up was not only a surge but also a grand bubble.

What followed was market debate about AI business models. Whether a company can prove its ability to commercialize has become the new standard for judging an AI enterprise.

In just a few years the wind turned rapidly, because the imagination inspired by AI, a highly disruptive technology, had masked the difficulties any new technology encounters when it is converted into value.

Security was one of the earliest scenarios in which AI was deployed. Technologies such as face recognition and behavior analysis can improve the performance of surveillance systems. At the same time, the fragmented nature of the security industry makes standardization and generalization hard to achieve, so AI enterprises pursuing better algorithms and higher accuracy fell into the profit trap of heavy customization.

In particular, as construction of the Xueliang ("Sharp Eyes") project wound down, the security industry itself hit a development bottleneck. So some people ask: is the problem the security industry itself, and would it be easily solved by switching to another industry?

The fact is that AI has indeed run into difficulties with commercial deployment, but security was the starting point for AI applications, and it remains a good starting point for the era of large models.

In essence, intelligence addresses five things a machine must learn: cognition, vision, movement, consciousness and memory. The core technologies of artificial intelligence include language intelligence, visual intelligence and motion intelligence.

The biggest application scenario for visual intelligence is public security, that is, the security market in the narrow sense.

For a long time, AI vision has been deployed along the same path: starting with public security, moving on to government, and then to enterprises. Large models are no exception.

In the view of Xu Yan, vice president of Yitu Technology, every advance in intelligence has taken security as its starting point. "Among all government departments, the informatization of public security departments is very advanced. They have a rigid demand for using video data in their business, they have the deepest understanding of technology, and they are the most willing to use new technology to solve the problems they face."

The crux of the problem in the past was that AI's original technical route could never break through the cost bottleneck. The new AI boom represented by large models now offers an excellent answer to the problems of both value and cost.

The arrival of large models has also brought new vitality to AI enterprises that are in a period of transformation, rethinking their own value and their way forward.

AI companies that were founded early already have a customer base. Take Yitu: most of the public security industry consists of its existing customers, so once Yitu has a new technology, it can be deployed very quickly.

More importantly, the AI companies that suffered in commercialization during the last wave no longer talk only about technology leadership. They understand that the real answer to users' pain points is to combine leading technology with business scenarios at low cost.

In the era of large models, is the security industry ready?

Is the security industry ready for the arrival of the large model era?

Summed up in one sentence: the security market is hungry for the technology.

In this new round of the AI boom, the Four Little Dragons have all rushed to try multimodal large models: Shangtang (SenseTime) released the "daily new" large model, Yuncong (CloudWalk) released the "leisurely" large model, and Xianyang also released a lightweight LLM inference framework. Yitu's "Tianwen" multimodal model was officially released later than the others, but it has in fact already won active recognition from customers. It has been deployed in more than 30 projects, putting it further along in real-world deployment.

These veteran AI players, who survived the last round of fighting, are now naturally expanding from vision to multimodal models, drawing on their accumulated data and industry knowledge.

In the past two years, the security industry has entered a bottleneck period with weak growth. The industry's main participants are all eager for change and are looking for ways to break the deadlock in both breadth and depth.

In terms of breadth, the security market in the narrow sense has moved from a public security business dominated by fugitive tracking and watchlist deployment to a more comprehensive urban governance business. At the same time, the pan-security market has used visual intelligence to enter the larger ToB enterprise market, while increasing non-video investment and expanding product lines in search of more room for growth.

In terms of depth, enterprises still take video as the core and are investing further in intelligent technology: perceptual intelligence, cognitive intelligence and the large model capabilities beneath them.

Exploration in both directions is closely tied to video intelligence.

From the early "can see", to "can see clearly", to today's "can understand" enabled by large models, video intelligence has evolved through five stages, L1 to L5:

From the structured tagging of pictures to the semantic understanding of videos

From numerous discriminative task models to vision-based multimodal large models

From AI computing power on end-side cameras to AI computing power on centralized cloud-side servers

From the recognition of people and cars to the recognition of long-tail objects

From tag-based filtering interaction to semantic, multimodal human-computer interaction

At the L5 stage, an important consensus has largely been reached: Transformer-based large models unify the underlying framework for video and big data, opening a new era of situational understanding in video intelligence.

This means that, for the same instruction, the whole process is simplified from two separate steps to one, and prediction accuracy improves greatly because the intermediate step is omitted.

Take counting congestion at intersection A over the past ten days as an example. The previous practice was to first parse the camera feed into a large amount of structured data and then match the results against a database. Fusing vision with big data merges the unstructured video and the structured database into a unified solution built on a single model. In operation, all that is needed is a voice command: "Intersection A has been particularly congested over the past 10 days. Please pull up the cameras at this intersection."

In other words, in the era of large models, when data flows in from one end, the result flows directly from the other.

For the public security industry, this will mean a major transformation of police information systems.

First of all, large models will bring about a major change in how video intelligence is built.

Video intelligence is built in one of two modes. One is the picture stream: the front-end camera performs the intelligent processing itself and sends pictures to the back end for further analysis. The disadvantage is that a large amount of information is lost. The other is the video stream: the front-end camera captures the video and the back end analyzes it. Although all the semantics in the video are retained and behavior can be fully described, the value of those complete semantics was never fully tapped, because large models were not yet available.

Taking the video stream route is the more responsible choice for customers. "Picture streams can only support face recognition. As business demand grows, the picture stream construction mode becomes a dead end, and the only option is to replace the cameras. Video streams, by contrast, let customers reuse their existing front-end equipment: only the back-end algorithms need to be upgraded."

Considering that large models can only be deployed on the back end, video stream intelligence may become mainstream in the future.

Second, there are major changes in the IT infrastructure.

As mentioned earlier, in traditional information systems the front-end cameras produce a large amount of structured data. The structured data is stored in a database, and the database is stored on hard disks.

In the era of large models, all data processed by the model generates feature vectors that contain a complete understanding of the video's semantics. The feature vectors are stored in a vector database, and the vector database is held in video memory (GPU memory).

In moving from traditional databases to vector databases, infrastructure will also shift from building CPU-based database systems to building GPU-based vector search systems.
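
The lookup a vector search system performs can be sketched in a few lines. This is a minimal, CPU-only illustration in plain Python, not Yitu's implementation: the clip identifiers, the 3-dimensional vectors and the helper names `cosine_similarity` and `top_k` are all hypothetical, and a production system would run the same comparison on a GPU over far larger vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k (clip_id, score) pairs most similar to the query vector."""
    scored = [(clip_id, cosine_similarity(query, vec))
              for clip_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical feature vectors; real ones would come from the large model.
index = {
    "cam01_clip_a": [0.9, 0.1, 0.0],
    "cam02_clip_b": [0.1, 0.9, 0.1],
    "cam03_clip_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the user's request
results = top_k(query, index, k=2)
```

Unlike a structured query that filters rows by exact field values, this search ranks every stored vector by similarity to the query, which is the kind of workload that favors GPU-based parallel hardware.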

Based on this understanding, Yitu released its "Reality" server in 2022.

Xu Yan explained the birth of this server as follows: "In the vector era, hyper-converged servers are very powerful. One of the biggest features of Yitu's Reality server is that its memory serves as video memory. Even the Nvidia A100 has only 40 GB of video memory, whereas the Reality server's memory can reach 512 GB, or even 1 TB."
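
To see why the memory figures in the quote matter for a vector database, here is a rough capacity estimate. The vector size is an assumption made for illustration only (512 dimensions stored as 4-byte floats); the article does not state the dimensionality Yitu uses, and the sketch ignores index overhead and treats a gigabyte as 1024^3 bytes.

```python
BYTES_PER_GIB = 1024 ** 3

def vectors_that_fit(memory_gib, dim=512, bytes_per_value=4):
    """Number of raw feature vectors that fit in the given memory.

    dim and bytes_per_value are illustrative assumptions, not figures
    from the article.
    """
    bytes_per_vector = dim * bytes_per_value
    return (memory_gib * BYTES_PER_GIB) // bytes_per_vector

a100_capacity = vectors_that_fit(40)     # 40 GB of video memory, per the quote
server_capacity = vectors_that_fit(512)  # 512 GB of memory, per the quote
ratio = server_capacity / a100_capacity
```

Under these assumptions the capacity scales linearly with memory, so the 512 GB configuration holds 12.8 times as many vectors as a single 40 GB card, whatever the actual vector size is.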

In addition, in terms of the production process, pre-trained models greatly reduce the end-to-end time and cost of meeting a user requirement.

On the one hand, replacing the small models of the past with large model technology can greatly shorten the production cycle, reduce the cost of algorithm R&D, and let customers enjoy the technology dividend sooner.

For example, when a user raised a requirement such as detecting "riding an electric bike without a helmet", the AI vendor used to have to collect data for training, and it took at least one to two months to meet the requirement.

With a pre-trained model, you only need to enter the instruction "riding an electric bike without a helmet" to generate the algorithm. A week is enough to achieve what used to take a month or two, and the longer the model is used, the higher its accuracy.

On the other hand, the cost will be minimized through the end-to-end integration of algorithms and computing power.

Xu Yan gave Leifeng an example: "In the past, it took 16 cabinets to run video intelligence on 10,000 channels. Now, after the end-to-end integration of algorithms and computing power, a single cabinet can handle everything, and the overall cost has been reduced by 80%."
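
The figures in the quote can be restated as a short piece of arithmetic. The sketch below uses only the numbers Xu Yan cites; the variable names are illustrative.

```python
TOTAL_CHANNELS = 10_000   # video channels, per the quote
CABINETS_BEFORE = 16      # cabinets previously required
CABINETS_AFTER = 1        # cabinets after end-to-end integration
COST_REDUCTION_PCT = 80   # overall cost reduction, per the quote

channels_per_cabinet_before = TOTAL_CHANNELS / CABINETS_BEFORE
channels_per_cabinet_after = TOTAL_CHANNELS / CABINETS_AFTER
density_gain = channels_per_cabinet_after / channels_per_cabinet_before
remaining_cost_pct = 100 - COST_REDUCTION_PCT
```

In other words, each cabinet goes from serving 625 channels to serving 10,000, a 16-fold gain in density, while the same workload costs 20% of what it did before.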

Finally, at the operational level, great changes will take place in traditional command centers, mobile policing, and investigation and case-solving.

For example, the command center can access cameras directly by voice, with requests such as "help me access the video of all the parks" and "help me access the video of all the currently congested roads". Mobile policing will also change: officers on patrol will no longer need to type on their law enforcement devices but can issue instructions efficiently by voice, such as "Please check the identity of the person wearing a black hat in front."

"New security, true intelligence". The next decade of security belongs to intelligence.

"Yitu has always believed that the security market will move from surveillance to intelligence."

If there is one obvious difference between Yitu and other companies, it is that Yitu has always recognized the value of the security market and has stuck with it for ten years.

When it comes to Yitu's contribution to the security market, Xu Yan is full of pride:

"Since its establishment in 2012, Yitu has represented the most cutting-edge level in the industry at every stage of the development of security intelligence. One of the reasons Yitu firmly believes that security is a good market is that it has made a profit in the security market and continues to create value for customers."

This value is reflected in the fact that, to date, Yitu is the first company in the security industry to launch a multimodal large model that is practical and commercially available.

Although the word "first" is easy to write, it represents a heavy effort on Yitu's part.

In 2018, Yitu put forward the slogan "New Security, True Intelligence", and it has not changed since.

At that time, with the rapid development of visual intelligence, face recognition was entering large-scale deployment. In Yitu's slogan, "new security" referred to reforming public security business processes, while "true intelligence" emphasized the differentiated user value that real intelligence brings to the security industry.

With the arrival of the large model era, the business processes and intelligence of public security have leapt to a new stage, but the core of this slogan remains unchanged.

What is truly admirable is that Yitu did not become a "slogan expert" but put the slogan into action.

All along, customers have pursued a seemingly unreasonable combination, six characters in Chinese: high value, low cost.

It is not easy to meet this demand, especially in AI, where the technical and channel thresholds are very high.

But Yitu's belief in this motto, and its practice of it, reflect a solidity that is rarely seen in AI.

The business logic behind "Security is a good market" is precisely the pursuit of value maximization and cost minimization.

Yitu began laying out its chip business in 2017 and released its first cloud AI chip, "Quest", on May 9, 2019, for use in visual inference.

At the launch event, Yitu used four "Quest" chips to run real-time comparison on the live audience. During the 10-minute demonstration there was not a single false alarm.

Yitu decided to make chips because it saw the contradiction between the rapid progress of algorithm performance and the slow improvement of machine computing power. That contradiction forces AI companies either to sacrifice algorithm performance, cutting the foot to fit the shoe, or to go without advanced algorithms and burn resources for nothing.

With the intensification of scientific and technological competition between China and the United States, the blockade and sanctions imposed by the United States on Chinese chips continue to escalate, and cost-effective domestic chip products are particularly important at the moment. In that light, Yitu's choice looks forward-looking: the end-to-end integration of hardware and algorithms can give users the most cost-effective back-end intelligent products.

If, in the past, the lack of a unified model structure made it hard for chip companies and algorithm companies to align, so that to some extent the case for dedicated AI chips did not hold, then today that obstacle has disappeared.

Looking back, the significance of Yitu's 2019 decision to invest in Transformer-based large model technology is that it chose the right path for the company for the years that followed.

When an enterprise chooses its direction more accurately, R&D becomes more efficient, costs fall, customers get better value, and in the end everyone wins.

"Today, Yitu's multimodal large model has been deployed and applied in more than 30 projects across the country." According to Xu Yan's introduction to Leifeng net, Yitu's "Tianwen" multimodal large model has the following three major characteristics:

First, semantic video search, which supports searching all kinds of video with natural language.

Expressions such as "car blocking a fire lane", "road with standing water" and "unattended red suitcase" can be used to quickly find the corresponding video.

Second, zero-shot cold start.

For example, to create an algorithm that detects riding without a helmet, you only need to enter "riding without a helmet" to generate the algorithm. Once the algorithm is online, it can be trained online at minute-level granularity, and the more it is used, the higher its accuracy.
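
One way to picture a zero-shot cold start followed by online refinement is a detector whose initial "prototype" is simply the embedding of the text prompt, and which is then nudged by operator feedback. The sketch below is a toy under stated assumptions, not how Tianwen works: `embed_text` is a hypothetical bag-of-words stand-in for a real multimodal encoder, and the clip vectors, threshold and update rate are invented for illustration.

```python
import math

# Toy vocabulary standing in for a learned embedding space (assumption).
VOCAB = ["riding", "without", "helmet", "water", "suitcase"]

def embed_text(text):
    """Hypothetical text encoder: bag-of-words counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class PromptDetector:
    def __init__(self, prompt, threshold=0.5, rate=0.2):
        self.prototype = embed_text(prompt)  # zero-shot: no training data used
        self.threshold = threshold
        self.rate = rate

    def score(self, clip_vector):
        return cosine(self.prototype, clip_vector)

    def matches(self, clip_vector):
        return self.score(clip_vector) >= self.threshold

    def feedback(self, clip_vector, confirmed):
        """Online update: move toward confirmed hits, away from false alarms."""
        sign = 1.0 if confirmed else -1.0
        self.prototype = [p + sign * self.rate * c
                          for p, c in zip(self.prototype, clip_vector)]

detector = PromptDetector("riding without helmet")
violation_clip = [0.9, 1.0, 0.8, 0.1, 0.0]  # hypothetical clip embeddings
flooding_clip = [0.0, 0.0, 0.0, 1.0, 0.2]
score_before = detector.score(violation_clip)
detector.feedback(violation_clip, confirmed=True)
score_after = detector.score(violation_clip)
```

The detector works from the first prompt because text and video share one embedding space, and each confirmed hit pulls the prototype closer to real examples, which is one plausible reading of "the more it is used, the higher its accuracy."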

Third, a large number of built-in algorithms that distill Yitu's understanding of the industry.

By feeding targeted, scenario-specific data into model training, Yitu improves the model's performance on typical algorithms such as large passenger flow and key-area prevention and control.

Since the beginning of this year, Yitu's Tianwen model has received a lot of impressive feedback.

In one city where traffic accidents occurred frequently, the customer approached Yitu to detect common violations and uncivilized behavior at the city's intersections, and asked that the Tianwen large model be used to empower its old front-end cameras.

Yitu built a range of algorithms, covering running red lights, speeding, failing to yield to pedestrians, cycling without a helmet, illegal U-turns, crossing solid lines, and so on. On the first day the system was launched, accuracy was only 60% to 70%; by the end of a week, it had basically reached 100%. The cost was also 60% to 70% lower than the previous construction cost.

Xu Yan recalled: "At that time, this customer was particularly excited, because it took Yitu just a week to solve a major pain point of traffic governance in the city."

Beyond large-scale projects, the significance of multimodal large models lies in unlocking long-tail algorithms.

In the past, a large number of small and medium-sized customers had no rigid demand for AI because of their weak ability to pay. In the future, the transfer and generalization abilities of large models can lower the cost threshold, so these users will be able to use large models too.

Conclusion

My way ahead is long; I see no ending; yet high and low I'll search with my will unbending.

In 2019, Yitu released its first AI chip and took the name "Quest" ("seeking") from Qu Yuan's "Songs of Chu", expressing its enthusiasm for artificial intelligence technology and its exploration of how to deploy AI in industry.

At the same time, Yitu began researching the application of Transformer technology to vision, and again drew on the "Songs of Chu" for the name "Tianwen", giving it to a multimodal visual large model that was still in gestation at the time.

To date, Yitu has "Quest" and "Reality" in domestic computing power and "Tianwen" in large model algorithms. After ten years on the AI road, the image Yitu has built of "full-stack AI technology" and a "one-stop AI solution provider" has become increasingly clear and complete.

In the big model era, Yitu is still full of confidence in security intelligence, adhering to the operation philosophy of "value maximization, cost minimization". With the commercial launch and large-scale deployment of visual multimodal large model products, Yitu once again stands at the forefront of the new era of AI.

The big model era is bound to produce new business models and application scenarios, and in the second decade of Yitu, we expect it to once again lead the industry and once again turn the slogan of "new security, true intelligence" into reality.

