20240605
- BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions [Dec 2023]
- BLIVA, an augmented version of InstructBLIP with Visual Assistant, incorporates the query embeddings from InstructBLIP and also directly projects encoded patch embeddings into the LLM, a technique inspired by LLaVA, and demonstrates significant capability in decoding real-world images, irrespective of text presence.
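A minimal sketch of the two-stream visual prefix described in the summary above: Q-Former query embeddings (InstructBLIP-style) and raw patch embeddings (LLaVA-style) are both projected into the LLM embedding space and concatenated. The module name, dimensions, and shapes below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TwoStreamVisualPrefix(nn.Module):
    """Hypothetical sketch of a BLIVA-like connector (dimensions are assumptions)."""
    def __init__(self, vision_dim=1024, qformer_dim=768, llm_dim=4096):
        super().__init__()
        self.query_proj = nn.Linear(qformer_dim, llm_dim)  # InstructBLIP-style learned queries
        self.patch_proj = nn.Linear(vision_dim, llm_dim)   # LLaVA-style direct patch projection

    def forward(self, query_embeds, patch_embeds):
        # query_embeds: (batch, num_queries, qformer_dim)
        # patch_embeds: (batch, num_patches, vision_dim)
        prefix = torch.cat(
            [self.query_proj(query_embeds), self.patch_proj(patch_embeds)], dim=1
        )
        return prefix  # prepended to the text token embeddings fed to the LLM

prefix = TwoStreamVisualPrefix()(torch.randn(1, 32, 768), torch.randn(1, 256, 1024))
print(prefix.shape)  # torch.Size([1, 288, 4096])
```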
- CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare [Dec 2023]
- This work introduces the MMQS dataset and showcases how visual cues from images enhance the generation of medically nuanced summaries, a multimodal approach that not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries.
- Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation [May 2024]
- A novel method, EDIT, is proposed that helps SLMs learn key reasoning steps rather than relying on mere simple fine-tuning, and it is found that EDIT benefits more from logical errors than from knowledge or mathematical calculation errors in dual CoTs.
- Improve Student’s Reasoning Generalizability through Cascading Decomposed CoTs Distillation [May 2024]
- By restructuring the training objectives -- removing the answer from outputs and concatenating the question with the rationale as input -- CasCoD's two-step learning process ensures that students focus on learning rationales without interference from the preset answers, thus improving reasoning generalizability.
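The restructured objectives that CasCoD describes can be illustrated with a small data-formatting sketch; the field names and string formatting below are assumptions for illustration, not the paper's exact setup.

```python
def build_cascod_examples(question: str, rationale: str, answer: str):
    """Split one CoT example into two cascaded learning steps: the answer is
    removed from the rationale target, and the question concatenated with the
    rationale becomes the input for the answer step."""
    return [
        {"input": question, "target": rationale},                  # step 1: learn the rationale only
        {"input": f"{question}\n{rationale}", "target": answer},   # step 2: answer given the rationale
    ]

print(build_cascod_examples(
    "A train travels 60 km in 1.5 hours. What is its speed?",
    "Speed = distance / time = 60 / 1.5 = 40 km/h.",
    "40 km/h",
))
```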
- RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting [Dec 2023]
- This work develops new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural languages.
- Distillation with Explanations from Large Language Models [May 2024]
- This work proposes a new mechanism, Distillation with Explanations from LLMs, that combines the ground truth labels and answers-explanations generated by LLMs, to simultaneously generate more accurate answers and the corresponding free-text explanations.
- AS-ES Learning: Towards Efficient CoT Learning in Small Models [Mar 2024]
- A new training paradigm AS-ES (Abstractive Segments - Extractive Segments) learning is proposed, which exploits the inherent information in CoT for iterative generation and explores the reason behind the inefficiency of small models in learning CoT.
- Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models [Apr 2024]
- A robust Post-Semantic-Thinking (PST) strategy generates answers before the rationale, which loosens the constraint that the generated rationale be close to the LLMs' gold standard in the hidden semantic space instead of the vocabulary space, thus making the small student model better comprehend the semantic reasoning logic in the rationale.
- Multimodal Question Answering for Unified Information Extraction [Oct 2023]
- A novel multimodal question answering (MQA) framework unifies three MIE tasks by reformulating them into a unified span extraction and multi-choice QA pipeline, and can serve as a general principle for utilizing LMMs to better solve MIE and potentially other downstream multimodal tasks.
- LLM as an Art Director (LaDi): Using LLM’s to improve Text-to-Media Generators [Nov 2023]
- The techniques that can be used to make Large Language Models (LLMs) act as Art Directors that enhance image and video generation are described, along with a unified system for this called LaDi.
- Large Language Model Agent for Fake News Detection [Apr 2024]
- This work introduces FactAgent, an agentic approach of utilizing large language models (LLMs) for fake news detection that enables LLMs to emulate human expert behavior in verifying news claims without any model training, following a structured workflow.
- Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction [Apr 2024]
- EDC includes a trained component that retrieves schema elements relevant to the input text, which improves the LLMs' extraction performance in a retrieval-augmented-generation-like manner; on three KGC benchmarks, EDC is shown to extract high-quality triplets without any parameter tuning and with significantly larger schemas compared to prior works.
20231122
- Heuristics-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction [Nov 2023]
- The Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method is introduced, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their adaptability and inspired by the analogical reasoning of humans.
- Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning [Nov 2023]
- This work investigates constructing and leveraging extracted semantic structures (graphs) for multi-hop question answering, especially the reasoning process, and generates more faithful reasoning chains and substantially improves the QA performance on two benchmark datasets.
- PPT made by XuSheng
- Unveiling the Siren's Song: Towards Reliable Fact-Conflicting Hallucination Detection [Oct 2023]
- This work presents FactCHD, a fact-conflicting hallucination detection benchmark meticulously designed for LLMs, and TRUTH-TRIANGULATOR, which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.
- Can We Edit Multimodal Large Language Models? [Oct 2023]
- A new benchmark is constructed, dubbed MMEdit, for editing multimodal LLMs and establishing a suite of innovative metrics for evaluation, to facilitate research in this area and provide the NLP community with insights.
- PPT made by LiuChengwei
- ChatHaruhi: Reviving Anime Character in Reality via Large Language Model [Aug 2023]
- Both automatic and human evaluations show the approach improves role-playing ability over baselines, and an algorithm that controls language models via an improved prompt and memories of the character extracted from scripts is proposed.
- Adapting Fake News Detection to the Era of Large Language Models [Nov 2023]
- A comprehensive evaluation of fake news detectors trained in various scenarios reveals an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
- PPT made by SunTiening
- Large Language Models Are Reasoning Teachers [Jul 2023]
- This paper uses very large models as reasoning teachers to enable complex reasoning in smaller models and reduce model size requirements by several orders of magnitude, and proposes Fine-tune-CoT, a method that generates reasoning samples from very large teacher models to fine-tune smaller models.
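A rough sketch of the data-curation loop implied by the summary of Fine-tune-CoT; `teacher_generate` is a hypothetical placeholder for a call to the large teacher model returning a (rationale, answer) pair, and answer checking is simplified to exact string matching.

```python
def build_finetune_cot_dataset(questions, gold_answers, teacher_generate, n_samples=8):
    """Sample several chain-of-thought rationales per question from a large teacher
    model and keep only those whose final answer matches the gold label; the kept
    (prompt, completion) pairs are then used to fine-tune a smaller student model."""
    dataset = []
    for question, gold in zip(questions, gold_answers):
        for _ in range(n_samples):
            rationale, answer = teacher_generate(question)
            if answer.strip() == gold.strip():
                dataset.append({
                    "prompt": question,
                    "completion": f"{rationale}\nTherefore, the answer is {answer}.",
                })
    return dataset
```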
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [Feb 2023]
- An LLM-Augmenter system augments a black-box LLM with a set of plug-and-play modules to significantly reduce ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses.
- PPT made by WangYongsheng
20231108
- CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks [Oct 2023]
- This work proposes Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach that breaks down NLU tasks into multiple reasoning steps where LLMs can learn to acquire and leverage essential concepts to solve tasks from different granularities.
- PPT made by XuSheng
- Learning From Mistakes Makes LLM Better Reasoner [Oct 2023]
- LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4 and improves the performance compared with fine-tuning on CoT data alone, surpassing the SOTA performance achieved by non-execution open-source models on these challenging tasks.
- PPT made by XuSheng
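A minimal sketch of what a LeMa-style mistake-correction training pair could look like, assuming the corrections come from GPT-4 as the summary states; the prompt wording and field names are illustrative assumptions rather than the paper's format.

```python
def build_correction_pair(question: str, wrong_solution: str, corrected_solution: str):
    """Turn a flawed solution plus a GPT-4 correction into one fine-tuning pair:
    the model learns to spot the mistake and produce the corrected reasoning."""
    prompt = (
        f"Question: {question}\n"
        f"Incorrect solution: {wrong_solution}\n"
        "Identify the mistake and give a corrected solution."
    )
    return {"prompt": prompt, "completion": corrected_solution}
```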
- Simple synthetic data reduces sycophancy in large language models [Aug 2023]
- A straightforward synthetic-data intervention is presented that takes public NLP tasks and encourages models to be robust to user opinions on these tasks and can significantly reduce sycophantic behavior on held-out prompts.
- PPT made by LiuChengwei
- Why Does ChatGPT Fall Short in Providing Truthful Answers? [Apr 2023]
- This work analyzes the failures of ChatGPT in complex open-domain question answering, identifies the abilities underlying those failures, and indicates that furnishing the model with fine-grained external knowledge, hints for knowledge recall, and guidance for reasoning can empower the model to answer questions more truthfully.
- PPT made by LiuChengwei
- Visual Instruction Tuning [Apr 2023]
- This paper presents LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding, and introduces GPT-4-generated visual instruction tuning data, with the model and code base made publicly available.
- PPT made by SunTiening
- Are aligned neural networks adversarially aligned? [Jun 2023]
- It is shown that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models, and it is conjectured that improved NLP attacks may demonstrate this same level of adversarial control over text-only models.
- PPT made by SunTiening
- STAR: Improving Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [May 2023]
- STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances given limited seed demonstrations, thereby boosting low-resource information extraction performance and surpassing the effectiveness of human-curated data.
- PPT made by WangYongsheng
- Empirical Study of Zero-Shot NER with ChatGPT [Oct 2023]
- This work focuses on exploring LLM performance on zero-shot information extraction, with a focus on the ChatGPT and named entity recognition (NER) task, and proposes syntactic augmentation to stimulate the model's intermediate thinking in two ways.
- PPT made by WangYongsheng
- TarGEN: Targeted Data Generation with Large Language Models [Oct 2023]
- TarGEN is presented, a multi-step prompting strategy for generating high-quality synthetic datasets with an LLM; TarGEN is augmented with a self-correction method that empowers LLMs to rectify inaccurately labeled instances during dataset creation, ensuring reliable labels.
- PPT made by ChenXinyu
- Making Large Language Models Better Data Creators [Oct 2023]
- This paper proposes a unified data creation pipeline that requires only a single formatting example, and which is applicable to a broad range of tasks, including traditionally problematic ones with semantically devoid label spaces, and demonstrates that instruction-following LLMs are highly cost-effective data creators.
- PPT made by ChenXinyu
- Kosmos-2: Grounding Multimodal Large Language Models to the World [Jun 2023]
- Kosmos-2, a Multimodal Large Language Model (MLLM), is introduced, enabling new capabilities of perceiving object descriptions and grounding text to the visual world, and shedding light on the big convergence of language, multimodal perception, action, and world modeling.
- PPT made by ChenLizhi
- COSA: Concatenated Sample Pretrained Vision-Language Foundation Model [Jun 2023]
- COSA, a COncatenated SAmple pretrained vision-language foundation model, jointly models visual contents and event-level temporal cues using only image-text corpora and achieves state-of-the-art results on various competitive benchmarks.
- PPT made by ChenLizhi
- ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations [Apr 2023]
- It is found that ChatGPT exhibits strong performance in detecting and reasoning about causal relations, while it may not be proficient in identifying the temporal order between two events.
- PPT made by LiuShannan
- Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study [May 2023]
- The results show that the generative paradigm allows ChatGPT to achieve performance comparable to state-of-the-art methods in the topic segmentation task, but reveals room for improvement in the more complex tasks of discourse relation recognition and discourse parsing.
- PPT made by LiuShannan
20230822
- InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction [Apr 2023]
- Experimental results demonstrate that the proposed InstructUIE method achieves comparable performance to BERT in supervised settings and significantly outperforms the state of the art and GPT-3.5 in zero-shot settings.
- TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks [May 2023]
- A general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks is proposed that will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies.
- Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning [ACL 2023]
- This paper systematically studies the role of task definitions in instruction learning and proposes two strategies to help models better leverage task instructions: providing only key information for tasks in a common structured format, and adding a meta-tuning stage to help the model better understand the definitions.
- PPT made by Xu
20230808
- Finetuned Language Models Are Zero-Shot Learners [Sep 2021]
- It is shown that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin.
- Self-Instruct: Aligning Language Models with Self-Generated Instructions [ACL 2023]
- Self-Instruct is introduced, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations: instructions, input, and output samples are generated from a language model, then invalid or overly similar ones are filtered out before using them to finetune the original model.
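The filtering step mentioned in the Self-Instruct summary can be roughly sketched as below; the paper itself uses ROUGE-L similarity against the task pool, and `SequenceMatcher` is only a stand-in for that here, with the threshold chosen arbitrarily.

```python
from difflib import SequenceMatcher

def filter_generated_instructions(candidates, pool, max_similarity=0.7):
    """Keep a newly generated instruction only if it is not too similar to any
    instruction already in the pool (or already kept in this batch)."""
    kept = []
    for cand in candidates:
        existing = pool + kept
        if all(SequenceMatcher(None, cand, prev).ratio() < max_similarity
               for prev in existing):
            kept.append(cand)
    return kept

pool = ["Write a short poem about the sea."]
print(filter_generated_instructions(
    ["Write a short poem about the ocean.", "Summarize the given news article."], pool))
# the near-duplicate poem instruction is dropped; the summarization one is kept
```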
- Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning [May 2023]
- A preliminary exploration into reducing the data used in LLM instruction tuning and identifies several observations regarding task specialization for LLM training, such as the optimization of performance for a specific task, the number of instruction types required for instruction tuning, and the amount of data required for task-specific models.
- Specializing Smaller Language Models towards Multi-Step Reasoning [Jan 2023]
- This work shows two important aspects of model abilities: there exists a very complex balance/tradeoff between language models' multi-dimensional abilities, and, by paying the price of decreased generic ability, the scaling curve of models smaller than 10B can clearly be lifted towards specialized multi-step math reasoning ability.
- LogiCoT: Logical Chain-of-Thought Instruction-Tuning Data Collection with GPT-4 [May 2023]
- LogiCoT is presented, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4 that serves as an instruction set for teaching models of logical reasoning and elicits general reasoning skills.
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [Aug 2023]
- This work proposes a zero-shot verification scheme to recognize individual errors within step-by-step reasoning of large language models and uses it to improve question-answering performance by performing weighted voting on different generated answers.
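The weighted-voting step in the SelfCheck summary can be illustrated with a small sketch; the confidence values are assumed to come from the zero-shot verification of each sampled reasoning chain, and the aggregation below is illustrative rather than the paper's exact scheme.

```python
from collections import defaultdict

def weighted_vote(answers, confidences):
    """Aggregate sampled answers, weighting each by the verification confidence
    of the reasoning chain that produced it, and return the top-scoring answer."""
    scores = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        scores[answer] += confidence
    return max(scores, key=scores.get)

print(weighted_vote(["42", "41", "42"], [0.9, 0.4, 0.7]))  # "42" wins with weight 1.6
```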
- Chinese Open Instruction Generalist: A Preliminary Release [Apr 2023]
- This work proposes the project as an attempt to create a Chinese instruction dataset by various methods adapted to the intrinsic characteristics of 4 sub-tasks, collecting around 200k Chinese instruction tuning samples.
- PPT made by Xu
20230725
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond [Apr 2023]
- A comprehensive and practical guide for practitioners and end-users working with Large Language Models in their downstream natural language processing (NLP) tasks, enabling the successful implementation of these models in a wide range of NLP tasks.
- A Survey of Large Language Models [Mar 2023]
- A review of the recent advances of large language models by introducing the background, key findings, and mainstream techniques, and focusing on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation.
- Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning [Mar 2023]
- This survey paper tries to summarize and provide insights into the current research on instruction learning, particularly by answering the following questions: What is task instruction, and what instruction types exist?
- PPT made by Xu
20230531
- Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors [May 2023]
- This paper evaluates ChatGPT's performance on 17 datasets with 14 IE sub-tasks under the zero-shot, few-shot and chain-of-thought scenarios, finds a huge performance gap between ChatGPT and SOTA results, and proposes a soft-matching strategy for evaluation.
- PPT made by Xu
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [May 2023]
- An exhaustive quantitative and qualitative evaluation of Large Language Models for Knowledge Graph (KG) construction and reasoning, which suggests that GPT-4 outperforms ChatGPT in the majority of tasks and even surpasses fine-tuned models in certain reasoning and question-answering datasets.
- PPT made by Xu
- SummIt: Iterative Text Summarization via ChatGPT [May 2023]
- This paper proposes SummIt, an iterative text summarization framework based on large language models like ChatGPT that enables the model to refine the generated summary iteratively through self-evaluation and feedback, closely resembling the iterative process humans undertake when drafting and revising summaries.
- PPT made by Chen
- ClusterLLM: Large Language Models as a Guide for Text Clustering [May 2023]
- ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT, consistently improves clustering quality, at an average cost of ~$0.6 per dataset.
- PPT made by Chen
20230517
- Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness [Apr 2023]
- ChatGPT's performance in the Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation, which also indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions.
- PPT made by Xu
- ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs [May 2023]
- A novel framework that leverages the power of ChatGPT for specific tasks, such as text classification, while improving its interpretability and providing a more transparent decision-making process compared with previous text classification methods is proposed.
- PPT made by Xu
- VicunaNER: Zero/Few-shot Named Entity Recognition using Vicuna [May 2023]
- VicunaNER is a two-phase framework in which each phase leverages multi-turn dialogues with Vicuna to recognize entities from texts; the second phase, named Re-Recognition, recognizes those entities not recognized in the first phase.
- PPT made by Chen
- Using ChatGPT for Entity Matching [May 2023]
- This paper investigates using ChatGPT for entity matching as a more robust, training data-efficient alternative to traditional Transformer models, and shows that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching an average zero-shot performance of 83% F1 on a challenging matching task.
- PPT made by Chen
- ChatGPT as a Text Simplification Tool to Remove Bias [May 2023]
- A possible technique for bias mitigation in the form of text simplification is explored, the idea being that simplifying text should standardise language to one way of speaking while keeping the same meaning.
- PPT made by Liu
- CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors [May 2023]
- This paper proposes to recast the structured output in the form of code instead of natural language and utilize generative LLMs of code (Code-LLMs) such as Codex to perform IE tasks, in particular, named entity recognition and relation extraction.
- PPT made by Liu
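The idea of recasting IE output as code, as in the CodeIE summary above, can be sketched as a prompt builder; the exact function signature, docstring, and entity format below are assumptions for illustration, not the paper's prompt verbatim.

```python
def ner_as_code_prompt(sentence: str) -> str:
    """Recast NER as completing a partially written Python function, so that a
    code LLM fills in the entity_list instead of producing free-form text."""
    return (
        "def named_entity_recognition(input_text):\n"
        '    """ extract named entities from the input_text """\n'
        f'    input_text = "{sentence}"\n'
        "    entity_list = []\n"
        "    # the code LLM continues from here, e.g.:\n"
        '    # entity_list.append({"text": "Steve Jobs", "type": "person"})\n'
    )

print(ner_as_code_prompt("Steve Jobs founded Apple in California."))
```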
- Is ChatGPT Equipped with Emotional Dialogue Capabilities? [Apr 2023]
- This study evaluates the performance of ChatGPT on emotional dialogue understanding and generation through a series of experiments on several downstream tasks.
- PPT made by Sun
- How would Stance Detection Techniques Evolve after the Launch of ChatGPT? [Apr 2023]
- For the stance detection tasks, experiments show that ChatGPT can achieve SOTA or similar performance for commonly used datasets including SemEval-2016 and P-Stance, and can provide explanation for its own prediction, which is beyond the capability of any existing model.
- PPT made by Sun
20230503
- Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study [Apr 2023]
- A preliminary evaluation of ChatGPT on understanding the opinions, sentiments, and emotions contained in text is conducted, comparing it with fine-tuned BERT and corresponding state-of-the-art (SOTA) models on each end task.
- PPT made by Liu
- Investigating Chain-of-thought with ChatGPT for Stance Detection on Social Media [Apr 2023]
- This paper examines CoT's effectiveness in stance detection tasks, demonstrating its superior accuracy and discussing associated challenges.
- PPT made by Liu
- ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT [Apr 2023]
- It is shown that, with appropriate task-specific prompts, using synthetic training data generated by prompting a large generative language model, ChatGPT, to augment data in low-resource scenarios outperforms the most popular existing approaches for such data augmentation.
- PPT made by Chen
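A rough sketch of the kind of task-specific augmentation prompt the ZeroShotDataAug summary refers to; the wording, function name, and default parameters are illustrative assumptions, not the paper's exact prompts.

```python
def augmentation_prompt(task_description: str, label: str, k: int = 5) -> str:
    """Build a task-specific prompt asking ChatGPT for k synthetic training
    examples of a given label, to augment a low-resource dataset."""
    return (
        f"{task_description}\n"
        f"Write {k} new, diverse example sentences that should be labeled "
        f"'{label}'. Return one example per line."
    )

print(augmentation_prompt("Task: classify movie reviews by sentiment.", "positive"))
```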
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [Mar 2023]
- A system is presented that enables the user to interact with ChatGPT by sending and receiving not only language but also images, and by providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models over multiple steps, opening the door to investigating the visual roles of ChatGPT with the help of Visual Foundation Models.
- PPT made by Chen
- Human-like Summarization Evaluation with ChatGPT [Apr 2023]
- ChatGPT was able to complete annotations relatively smoothly using Likert scale scoring, pairwise comparison, Pyramid, and binary factuality evaluation, and it outperformed commonly used automatic evaluation metrics on some datasets.
- PPT made by Wang
- Linguistic ambiguity analysis in ChatGPT [Feb 2023]
- An introduction to linguistic ambiguity, its varieties and their relevance in modern NLP, and an extensive empirical analysis are provided, as well as strategies to get the most out of this model.
- PPT made by Wang
20230419
- Zero-Shot Information Extraction via Chatting with ChatGPT [Feb 2023]
- This work transforms the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE), and extensively evaluates the framework on three IE tasks: entity-relation triple extraction, named entity recognition, and event extraction.
- PPT made by Liu
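A minimal sketch of the two-stage, multi-turn flow described in the ChatIE summary, shown for the triple-extraction task; `chat` is a hypothetical placeholder for any chat-model call that takes the running message history and returns a parsed list, and the prompt wording is an assumption.

```python
def chat_ie_triples(sentence, relation_types, chat):
    """Stage 1 asks which relation types occur in the sentence; stage 2 asks,
    for each detected type, for the concrete (subject, relation, object) triples."""
    history = [{
        "role": "user",
        "content": (f"Sentence: {sentence}\n"
                    f"Which of these relation types appear? {relation_types}"),
    }]
    detected_types = chat(history)  # e.g. ["founded"]
    triples = []
    for relation in detected_types:
        history.append({
            "role": "user",
            "content": f"List all (subject, {relation}, object) triples in the sentence.",
        })
        triples.extend(chat(history))
    return triples
```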
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [Feb 2023]
- Experimental results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.
- PPT made by Liu
- Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine [Jan 2023]
- A preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness finds that it performs competitively with commercial translation products on high-resource European languages but lags behind significantly on low-resource or distant languages.
- PPT made by Sun
- How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection [Jan 2023]
- This work collects tens of thousands of comparison responses from both human experts and ChatGPT, with questions spanning open-domain, financial, medical, legal, and psychological areas, builds three different detection systems, explores several key factors that influence their effectiveness, and evaluates them in different scenarios.
- PPT made by Sun
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research [Mar 2023]
- This work introduces WavCaps, the first large-scale weakly-labelled audio captioning dataset, and proposes a three-stage processing pipeline for filtering noisy data and generating high-quality captions, where ChatGPT, a large language model, is leveraged to filter and transform raw descriptions automatically.
- PPT made by Wang
- Exploring the Feasibility of ChatGPT for Event Extraction [Mar 2023]
- The usability testing experiments indicate that ChatGPT is not robust enough, and continuous refinement of the prompt does not lead to stable performance improvements, which can result in a poor user experience.
- PPT made by Wang
- Using Multiple RDF Knowledge Graphs for Enriching ChatGPT Responses [Apr 2023]
- A research prototype, called GPToLODS, is presented, which is able to enrich any ChatGPT response with more information from hundreds of RDF KGs, and identifies and annotates each entity of the response with statistics and hyperlinks to LODsyndesis KG.
- PPT made by Chen
- Zero-shot Temporal Relation Extraction with ChatGPT [Apr 2023]
- It is found that ChatGPT cannot keep consistency during temporal inference and that it fails at long-dependency temporal inference.
- PPT made by Chen
- Extractive Summarization via ChatGPT for Faithful Summary Generation [Apr 2023]
- It is found that applying an extract-then-generate pipeline with ChatGPT yields significant performance improvements over abstractive baselines in terms of summary faithfulness, and highlights potential directions for enhancing ChatGPT's capabilities for faithful text summarization tasks using two-stage approaches.
- PPT made by Xu
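The extract-then-generate pipeline from the summary above can be sketched as two LLM calls; `chat` is a hypothetical single-prompt helper, and the prompts are illustrative assumptions rather than the paper's wording.

```python
def extract_then_generate(document: str, chat) -> str:
    """First ask the model to copy salient sentences verbatim (extractive step),
    then ask it to write a summary grounded only in those sentences (generation
    step), the two-stage recipe aimed at improving faithfulness."""
    extracted = chat(
        "Copy the most important sentences verbatim from this document:\n" + document
    )
    summary = chat(
        "Write a concise summary using only the information in these sentences:\n" + extracted
    )
    return summary
```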
- Zero-shot Clinical Entity Recognition using ChatGPT [Mar 2023]
- This study investigated the potential of ChatGPT for the clinical named entity recognition task in a zero-shot setting with two different prompt strategies.
- PPT made by Xu