Deng Shasha: ChatGPT and AI-generated content: should scientific research adopt or resist it?

Time: 0905, 2023

On November 30, 2022, OpenAI released ChatGPT, a chatbot model. Unlike earlier chatbots, it seems to "know everything under the sun" and can also write emails, edit copy, and generate code. ChatGPT's popularity has changed how we live and work, sparking something like an "industrial revolution" among knowledge workers in particular. On February 27, 2023, China also called for universities, research institutions, and enterprises to jointly establish Digital China research bases.

ChatGPT's popularity has made people keenly aware that AI-generated content (AIGC) will shape a new paradigm in education and research. However, applying AIGC to scientific research also poses challenges. Taking ChatGPT as an example, extensive user experience has shown that the text it generates is not entirely reliable, so it cannot simply replace search engines. Moreover, ChatGPT's training was completed in 2021, so the model knows nothing about later developments. At the time of writing, ChatGPT also works offline: everything it outputs comes from its own internal knowledge and reasoning, and it cannot check its answers against online sources. Although ChatGPT is not perfect, it can still complement scientific research.

ChatGPT and AIGC Technology

Tasks in natural language processing (NLP) require attention to contextual, sequential information. A recurrent neural network (RNN) feeds its hidden state from the previous time step back into the network to build temporal dependencies across the sequence, and RNNs have been widely used in machine translation, speech recognition, and sentiment analysis. However, because each step must wait for the previous one, this recurrent structure cannot be parallelized, which limits its use on large-scale corpora. The self-attention mechanism instead embeds contextual information by directly computing the correlation between every pair of words, removing the RNN's dependence on the previous step and allowing parallel computation. Thanks to its better interpretability and efficient computation, attention has set a new direction for the field of NLP.
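To make the contrast between recurrence and self-attention concrete, here is a minimal pure-Python sketch (the language and all function names are our own choices; the article contains no code). It assumes toy 2-dimensional word vectors and deliberately omits the learned query/key/value projections of a real Transformer, so it illustrates the data dependencies only, not any production model.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output unit) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rnn_encode(xs: List[Vector], w_x: Matrix, w_h: Matrix) -> List[Vector]:
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1}), starting from h_0 = 0.

    Each step needs the hidden state from the previous step, so the loop
    cannot be parallelized across positions.
    """
    h = [0.0] * len(w_h)
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
        h = [math.tanh(v) for v in pre]
        outputs.append(h)
    return outputs


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = xs.

    Simplification (assumption): real Transformers first project xs with
    learned matrices W_Q, W_K, W_V; they are omitted here. Output i depends
    only on the inputs, never on output i-1, so all positions could be
    computed in parallel.
    """
    d = len(xs[0])
    outputs = []
    for q in xs:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        # Context vector: weighted average of all input vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d "word" vectors
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(rnn_encode(tokens, identity, identity))
    print(self_attention(tokens))
```

The difference lies in the loops: in `rnn_encode`, `h` must be updated before the next token can be processed, whereas in `self_attention` each iteration of the outer loop reads only the original inputs, so the iterations are independent.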

GPT and BERT, both released in 2018, became popular not only because of their self-attention-based architecture but also because of their pre-training on massive data. Natural language offers vast amounts of unlabeled text. Unsupervised training on such data yields a pre-trained model with strong generalization ability. The general-purpose word and sentence features produced by the pre-trained model then serve as input for specific downstream tasks, which saves substantial computational resources and improves generalization on those tasks.
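The self-supervised idea behind pre-training can be illustrated with a deliberately tiny stand-in: a bigram model that learns next-word statistics from unlabeled sentences. This is only an analogy under simplifying assumptions. GPT uses a neural Transformer trained to predict the next token, and BERT predicts masked tokens, but in both cases the training signal comes from the raw text itself, as it does here. The corpus and function names below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def pretrain_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count which word follows which in raw, unlabeled sentences.

    No human labels are needed: the "label" at each position is simply the
    next word in the text itself (self-supervision).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: Dict[str, Counter], word: str) -> Dict[str, float]:
    """Estimate P(next word | word) from the counts; empty if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return {}
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most probable next word, or None for an unseen word."""
    probs = next_word_probs(counts, word)
    if not probs:
        return None
    return max(probs, key=probs.get)


if __name__ == "__main__":
    corpus = [
        "the model reads text",
        "the model writes code",
        "the student reads papers",
    ]
    counts = pretrain_bigram(corpus)
    print(predict_next(counts, "the"))  # → model
```

A real pre-trained model replaces these counts with dense learned representations that can be reused as features for downstream tasks; the counting table here cannot generalize to unseen words, which is precisely what large neural pre-training improves on.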

ChatGPT, the latest research advance, adopts reinforcement learning from human feedback (RLHF), further demonstrating the enormous value of AIGC. AIGC is both a category of content, defined by who or what creates it, and a new mode of production in which content is generated automatically by artificial intelligence. By content form, the AIGC technology stack can be divided into AI-generated natural language content, AI-generated visual content, and AI-generated multimodal content.

As the most basic form of content, natural language describes the objective world and expresses the subjective one, and it has the widest range of applications. Using massive data for natural language understanding (NLU) to mine general knowledge is a key part of AIGC. Pre-trained models built on large-scale unlabeled corpora perform well on tasks such as sentiment analysis, speech recognition, reading comprehension, and text generation.

In the Internet era, visual content is the most objective record of the physical world, and humans perceive it more vividly than text. How AI perceives and understands massive visual data determines the realism and depth of AI-generated visual content. The Vision Transformer (ViT) architecture and generative models have both driven the development of AI-generated visual content.

In the metaverse, text and image data are intertwined and presented together. Research that models only a single modality leaves AI with an incomplete picture of how humans perceive and learn. Moreover, if AIGC could generate only single-modality content, its application scenarios would be severely limited. Multimodal large models are designed to process data and information from different modalities, sources, and tasks, finding correspondences between modalities so that one can be transformed into another, and ultimately generating integrated audio-visual multimodal content.

Envisioned applications of AIGC in scientific research

Unlike traditional content generation models, AIGC can overcome temporal and spatial limits on resources, letting every researcher directly experience, construct, and generate research materials and thereby easing the uneven allocation of research resources. AIGC also provides a good environment for team innovation, freeing researchers from being mere "research laborers" so that they can efficiently produce high-quality results. In addition, by supplementing research resources and guiding team collaboration, AIGC further helps break down disciplinary boundaries, and interdisciplinary collaboration has become the mainstream direction of development.

First, AIGC can supplement research resources. Its emergence has, to some extent, changed the current imbalance in the allocation of research resources. AIGC can create highly realistic sample data and virtual models, which greatly benefits research in biology, medicine, computer science, neuroscience, and other disciplines. Researchers can use AIGC to extract complete data samples from massive datasets instead of relying on search and manual filtering, avoiding the gaps and omissions that arise when knowledge is missing from existing databases or filtered out by subjective judgment. Constraints can also be placed on what AIGC generates so that it produces standardized data samples. By shifting from finding data with past technologies to creating data with current ones, AIGC will promote the construction of Web3.0, supplement research resources, and lower the barriers to research.

Second, AIGC can improve team collaboration efficiency and foster team innovation. Its strong inclusiveness can promote interdisciplinary complementarity and the integration of resources. In addition, AIGC will overturn traditional research management models and reshape how research is managed. Research will no longer center only on academic leaders; every member becomes a core part of the team, fully unleashing their enthusiasm and capacity for innovation.

Finally, AIGC can break down disciplinary boundaries. Because of its late start, China has long lagged behind other countries in professional software development. AIGC is moving from generating surface-level data to generating underlying technology, and it can also build models and run simulations quickly. In recent years, many industries have raised their expectations for computing fundamentals. AIGC has helped people understand technology and has further blurred the boundaries between industries. Industry and academia reinforce each other, disciplinary boundaries will be broken down further, and interdisciplinary collaborative research will become an inevitable trend.

Challenges faced by AIGC in scientific research

Although AIGC has rich potential applications in scientific research, AI-generated content differs from the primary and secondary data that research normally relies on: it is neither entirely grounded in existing facts nor entirely objective. Applying AIGC in scientific research will therefore raise many problems.

First, the foremost issue is whether the rigor of AIGC can be verified. ChatGPT does not always give the same answer to the same question, and having to extract common ground from such loosely similar answers makes it harder to argue for the rigor of AIGC. Just as there is still a significant gap between an optimal laboratory model and practical application, AIGC has a long way to go from everyday convenience to scientific use. Moreover, many journals explicitly prohibit listing ChatGPT as a co-author of a paper and also run plagiarism checks on the text it generates.
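One commonly cited reason a chatbot answers the same question differently is that language models typically generate text by sampling each next token from a probability distribution, often reshaped by a temperature parameter. The article does not describe ChatGPT's decoding settings, so the following is a generic sketch under that assumption, not ChatGPT's implementation; the tokens, scores, and function names are hypothetical.

```python
import math
import random
from typing import Dict, Optional


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens, higher flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one token at random according to the temperature-scaled probabilities."""
    if rng is None:
        rng = random.Random()  # unseeded: results differ from run to run
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores for the next word after "The results suggest that X will ..."
    logits = {"increase": 2.0, "decrease": 1.5, "remain stable": 1.0}
    print([sample_next_token(logits, 1.0) for _ in range(5)])   # varies between runs
    print([sample_next_token(logits, 0.05) for _ in range(5)])  # almost always "increase"
```

With a fixed random seed the output becomes reproducible, and as the temperature approaches zero the sampler approaches always picking the highest-scoring token. Neither setting, however, verifies that the content itself is correct, which is the rigor problem raised above.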

Second, the convenience of AI-generated content can erode researchers' ability to think independently. As AIGC interaction keeps developing, convenient, zero-cost access to knowledge can lead learners into technology dependence and AI addiction. AIGC's all-round capabilities, and the way it compensates for the shortcomings of reality, carry a strong risk of addiction. The stimulation AIGC gives the brain can easily fuel endless desire, which, if left unchecked, will keep growing. Governments, universities, society, and technology giants all have a responsibility to regulate the use of AI technology and keep cyberspace secure.

Third, privacy and ethical issues. As information exchange between the metaverse and Web3.0 grows more frequent, user-created data spreads rapidly on decentralized blockchains, and the digital traces it leaves are likely to contain sensitive private information. To keep up with the times and generate realistic content, AI will eventually draw its training data from the Internet. If such private information is indiscriminately collected and learned by AI, the generated content may infringe on user privacy and intellectual property rights. Moreover, false information generated by AIGC can be spread and published without any screening, which also creates a risk of fraud.

Conclusion

2023 is regarded as the first year of AIGC, and in the same year China called on universities, research institutions, and enterprises to jointly participate in building a Digital China. Technologists and researchers around the world are rushing to explore what AI-generated content can do, while also considering its prospects for scientific research.

For researchers, AI-empowered research offers a stronger competitive advantage than traditional research that depends on platform resources, and lowering the cost of obtaining research resources can further free their thinking. For research teams, it is important to combine AI-generated content with existing research management models, improving collaboration efficiency while keeping the team innovative. For disciplinary development, AIGC has broken down the boundaries between disciplines, expanded the scope of management science, and raised the bar for research work. We need to conduct management science research from a cross-era, interdisciplinary perspective.

It is too early to predict that AIGC will become the primary productive force of future scientific research; the technology still needs to stand the test of time. When researchers supplement their work with AIGC, they need to respect scientific rigor, stay alert to technological traps, avoid moral and legal risks, and wield AI, a double-edged sword, correctly to accelerate the construction of a Digital China.


HONGKOU CAMPUS
550 Dalian Road (W), Shanghai 200083, China
SONGJIANG CAMPUS
1550 Wenxiang Road, Shanghai 201620, China