Updated on 2024.09.24
Table of Contents
- <a href=#peft>PEFT</a>
- <a href=#text-to-image-generation>Text-to-Image Generation</a>
- <a href=#vision-language-models>Vision-Language Models</a>
- <a href=#generative-weight-space-modeling>Generative Weight Space Modeling</a>
- <a href=#data-distillation>Data Distillation</a>
- <a href=#schrodinger-bridge>Schrodinger Bridge</a>
PEFT
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | null |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-17 | Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models | Divij Gupta et.al. | 2409.11302 | null |
2024-09-18 | Propulsion: Steering LLM with Tiny Fine-Tuning | Md Kowsher et.al. | 2409.10927 | link |
2024-09-16 | From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs | Navya Jain et.al. | 2409.10245 | null |
2024-09-14 | COMFORT: A Continual Fine-Tuning Framework for Foundation Models Targeted at Consumer Healthcare | Chia-Hao Li et.al. | 2409.09549 | null |
2024-09-14 | Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models | Alireza Salemi et.al. | 2409.09510 | link |
2024-09-13 | Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights | Dixi Yao et.al. | 2409.08482 | null |
2024-09-12 | Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation? | Kerem Cekmeceli et.al. | 2409.07960 | link |
2024-09-11 | Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region | Muhammad Akhtar Munir et.al. | 2409.07585 | link |
2024-09-10 | Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts | Assefa Seyoum Wahd et.al. | 2409.06821 | null |
2024-09-11 | Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | Yao Shu et.al. | 2409.06277 | link |
2024-09-09 | SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values | Chengwei Sun et.al. | 2409.05926 | null |
2024-09-10 | Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment | Zhixian Zhao et.al. | 2409.05015 | null |
2024-09-06 | Customizing Large Language Model Generation Style using Parameter-Efficient Finetuning | Xinyue Liu et.al. | 2409.04574 | null |
2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838 | null |
2024-09-04 | Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | Ruoyu Wang et.al. | 2409.02686 | null |
2024-09-04 | Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA | Shuangyi Chen et.al. | 2409.02346 | null |
2024-09-02 | Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning | Chongjie Si et.al. | 2409.01035 | link |
2024-08-28 | 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability | Baohao Liao et.al. | 2409.00119 | null |
2024-08-21 | SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Yang Cao et.al. | 2409.00055 | link |
2024-08-30 | MoRe Fine-Tuning with 10x Fewer Parameters | Wenxuan Tan et.al. | 2408.17383 | link |
2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
2024-08-28 | Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization | Léo Hemamou et.al. | 2408.15801 | null |
2024-08-27 | GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | Maxim Zhelnin et.al. | 2408.15300 | link |
2024-08-27 | Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training | Xingliang Lei et.al. | 2408.15011 | null |
2024-08-27 | CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task | Lingyun Huang et.al. | 2408.14961 | link |
2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | link |
2024-08-24 | Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Sagar Srinivas Sakhinana et.al. | 2408.13622 | null |
2024-08-21 | Positional Prompt Tuning for Efficient 3D Representation Learning | Shaochen Zhang et.al. | 2408.11567 | link |
2024-08-20 | Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning | Bei Ouyang et.al. | 2408.10746 | null |
2024-08-20 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | Bin Wang et.al. | 2408.10688 | link |
2024-08-19 | TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition | Tianwei Lin et.al. | 2408.09856 | link |
2024-08-16 | Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models | Vladimir Araujo et.al. | 2408.09053 | null |
2024-08-14 | KIND: Knowledge Integration and Diversion in Diffusion Models | Yucheng Xie et.al. | 2408.07337 | null |
2024-08-30 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-06 | SARA: Singular-Value Based Adaptive Low-Rank Adaption | Jihao Gu et.al. | 2408.03290 | null |
2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
2024-08-03 | TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks | Yang Yu et.al. | 2408.01835 | link |
2024-08-02 | MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts | Lin Ning et.al. | 2408.01505 | null |
2024-08-02 | Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs | Afia Anjum et.al. | 2408.01008 | null |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739 | null |
2024-07-28 | Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models | Jifeng Wang et.al. | 2407.19564 | link |
2024-07-24 | Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective | Jingren Liu et.al. | 2407.17120 | null |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731 | null |
2024-07-21 | Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization | Jiajun Hu et.al. | 2407.15085 | null |
2024-07-16 | InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification | Yujia Hu et.al. | 2407.12882 | link |
2024-07-18 | Turning Generative Models Degenerate: The Power of Data Poisoning Attacks | Shuli Jiang et.al. | 2407.12281 | null |
2024-07-16 | Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification | Naif Alkhunaizi et.al. | 2407.11573 | null |
2024-07-16 | An efficient framework based on large foundation model for cervical cytopathology whole slide image screening | Jialong Huang et.al. | 2407.11486 | link |
2024-07-10 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang et.al. | 2407.08044 | link |
2024-07-10 | ROSA: Random Subspace Adaptation for Efficient Fine-Tuning | Marawan Gamal Abdel Hameed et.al. | 2407.07802 | link |
2024-07-10 | Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction | Yumin Kim et.al. | 2407.07517 | null |
2024-07-09 | Reprogramming Distillation for Medical Foundation Models | Yuhang Zhou et.al. | 2407.06504 | null |
2024-07-07 | See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition | Chongjie Si et.al. | 2407.05417 | link |
2024-07-16 | LoRA-GA: Low-Rank Adaptation with Gradient Approximation | Shaowen Wang et.al. | 2407.05000 | link |
2024-07-05 | GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning | Aleksander Ficek et.al. | 2407.04528 | null |
2024-07-04 | Deep Content Understanding Toward Entity and Aspect Target Sentiment Analysis on Foundation Models | Vorakit Vorakitphan et.al. | 2407.04050 | link |
2024-07-04 | ASteISR: Adapting Single Image Super-resolution Pre-trained Model for Efficient Stereo Image Super-resolution | Yuanbo Zhou et.al. | 2407.03598 | null |
2024-07-03 | Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | Frederic Z. Zhang et.al. | 2407.02880 | link |
2024-07-03 | Exploring the Capabilities of LLMs for Code Change Related Tasks | Lishui Fan et.al. | 2407.02824 | link |
2024-07-02 | FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs | Haodong Chen et.al. | 2407.02157 | null |
2024-07-02 | CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications | Yupeng Cao et.al. | 2407.01953 | null |
2024-07-05 | Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Zihan Wang et.al. | 2407.01906 | link |
2024-07-01 | A Fingerprint for Large Language Models | Zhiguang Yang et.al. | 2407.01235 | null |
2024-07-02 | Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images | Wenqiang Zu et.al. | 2407.01003 | link |
2024-06-25 | Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning | Arijit Sehanobish et.al. | 2406.17740 | null |
2024-06-19 | Parameter Training Efficiency Aware Resource Allocation for AIGC in Space-Air-Ground Integrated Networks | Liangxin Qian et.al. | 2406.13602 | null |
2024-06-19 | Sparse High Rank Adapters | Kartikeya Bhardwaj et.al. | 2406.13175 | null |
2024-06-18 | Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates | Cristian Meo et.al. | 2406.13046 | null |
2024-06-18 | Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation | Branislav Pecher et.al. | 2406.12471 | null |
2024-06-17 | A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models | Jian Gu et.al. | 2406.11753 | null |
2024-06-16 | ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts | Samar Khanna et.al. | 2406.10973 | null |
2024-06-16 | ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Yurun Song et.al. | 2406.10785 | null |
2024-06-16 | RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning | Haoyu Wang et.al. | 2406.10777 | null |
2024-06-15 | Benchmarking Children’s ASR with Supervised and Self-supervised Speech Foundation Models | Ruchao Fan et.al. | 2406.10507 | link |
2024-06-15 | Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts | Zhaoxuan Tan et.al. | 2406.10471 | null |
2024-06-13 | Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models | Lukas Thede et.al. | 2406.09384 | null |
2024-06-12 | Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An Experimental Study and Quality Assessment Methods | Eugene Vyborov et.al. | 2406.08582 | null |
2024-06-12 | The Impact of Initialization on LoRA Finetuning Dynamics | Soufiane Hayou et.al. | 2406.08447 | null |
2024-06-20 | Low-Rank Quantization-Aware Training for LLMs | Yelysei Bondarenko et.al. | 2406.06385 | link |
2024-06-10 | A Parameter-efficient Language Extension Framework for Multilingual ASR | Wei Liu et.al. | 2406.06329 | null |
2024-06-09 | A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | Guochang Li et.al. | 2406.05639 | link |
2024-06-07 | Efficient Differentially Private Fine-Tuning of Diffusion Models | Jing Liu et.al. | 2406.05257 | null |
2024-06-07 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models | Yibo Yang et.al. | 2406.05223 | null |
2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | null |
2024-06-07 | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jitai Hao et.al. | 2406.04984 | link |
2024-06-06 | Time Sensitive Knowledge Editing through Efficient Finetuning | Xiou Ge et.al. | 2406.04496 | link |
2024-06-06 | VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation | Prashanth Vijayaraghavan et.al. | 2406.04379 | null |
2024-06-10 | Hypernetworks for Personalizing ASR to Atypical Speech | Max Müller-Eberstein et.al. | 2406.04240 | null |
2024-06-06 | Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning | Naibin Gu et.al. | 2406.03792 | link |
2024-06-05 | Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need | Martin Wistuba et.al. | 2406.03216 | null |
2024-06-06 | Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision | Minglei Li et.al. | 2406.03051 | null |
2024-05-31 | Mamba State-Space Models Can Be Strong Downstream Learners | John T. Halloran et.al. | 2406.00209 | null |
2024-05-30 | ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections | Massimo Bini et.al. | 2405.20271 | link |
2024-05-30 | SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors | Vijay Lingam et.al. | 2405.19597 | link |
2024-05-29 | MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection | Raman Dutt et.al. | 2405.19458 | null |
2024-05-29 | MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning | Junjie Wang et.al. | 2405.18897 | null |
2024-05-29 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | Zelin Peng et.al. | 2405.18840 | null |
2024-06-01 | Low-Rank Few-Shot Adaptation of Vision-Language Models | Maxime Zanella et.al. | 2405.18541 | null |
2024-05-28 | Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning | Renzhi Wang et.al. | 2405.18292 | null |
2024-05-28 | VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections | Roy Miles et.al. | 2405.17991 | null |
2024-05-28 | Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis | Mingyuan Liu et.al. | 2405.17877 | null |
2024-05-27 | LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Klaudia Bałazy et.al. | 2405.17604 | link |
2024-05-23 | EMR-Merging: Tuning-Free High-Performance Model Merging | Chenyu Huang et.al. | 2405.17461 | null |
2024-05-28 | DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution | Yulong Mao et.al. | 2405.17357 | link |
2024-05-27 | Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning | Runqian Wang et.al. | 2405.17258 | null |
2024-05-30 | Sparse Matrix in Large Language Model Fine-tuning | Haoze He et.al. | 2405.15525 | null |
2024-05-24 | Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation | Abhinav Jain et.al. | 2405.15282 | link |
2024-05-27 | VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks | Yang Li et.al. | 2405.15179 | link |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
2024-05-23 | Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference | Ting Liu et.al. | 2405.14700 | link |
2024-05-22 | Spectral Adapter: Fine-Tuning in Spectral Space | Fangzhao Zhang et.al. | 2405.13952 | null |
2024-05-24 | MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | Jingwei Xu et.al. | 2405.13053 | link |
2024-05-20 | FeTT: Continual Class Incremental Learning via Feature Transformation Tuning | Sunyuan Qiang et.al. | 2405.11822 | null |
2024-05-21 | HARIS: Human-Like Attention for Reference Image Segmentation | Mengxi Zhang et.al. | 2405.10707 | null |
2024-05-28 | DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation | Jie Xu et.al. | 2405.06368 | null |
2024-05-09 | Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection | Bhawesh Kumar et.al. | 2405.06093 | null |
2024-05-09 | Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | Shibo Jie et.al. | 2405.05615 | link |
2024-05-07 | Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Karim Galliamov et.al. | 2405.04126 | link |
2024-05-04 | Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning | Jing Xu et.al. | 2405.02596 | link |
2024-03-16 | Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R | Amirreza Esmaeili et.al. | 2405.01553 | null |
2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
2024-04-29 | LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | Justin Zhao et.al. | 2405.00732 | link |
2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
2024-05-01 | MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model | Rajat Sahay et.al. | 2405.00293 | null |
2024-04-30 | SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models | Samir Arora et.al. | 2405.00201 | null |
2024-05-23 | HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning | Chunlin Tian et.al. | 2404.19245 | link |
2024-05-25 | FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition | Yuxuan Yan et.al. | 2404.18848 | null |
2024-04-25 | Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models | Jiawei Chen et.al. | 2404.16385 | null |
2024-05-23 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Dengchun Li et.al. | 2404.15159 | link |
2024-04-22 | ColA: Collaborative Adaptation with Gradient Learning | Enmao Diao et.al. | 2404.13844 | link |
2024-04-23 | Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Charith Chandra Sai Balne et.al. | 2404.13506 | null |
2024-04-18 | SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up | Nakyeong Yang et.al. | 2404.11916 | null |
2024-04-16 | Shears: Unstructured Sparsity with Neural Low-rank Adapter Search | J. Pablo Muñoz et.al. | 2404.10934 | link |
2024-04-16 | Exact and Efficient Unlearning for Large Language Model-based Recommendation | Zhiyu Hu et.al. | 2404.10327 | null |
2024-04-15 | LoRA Dropout as a Sparsity Regularizer for Overfitting Control | Yang Lin et.al. | 2404.09610 | null |
2024-04-21 | Analyzing the Impact of Data Selection and Fine-Tuning on Economic and Political Biases in LLMs | Ahmed Agiza et.al. | 2404.08699 | link |
2024-04-08 | Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing | Chengyan Fu et.al. | 2404.05350 | null |
2024-04-08 | DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model | Chao Gao et.al. | 2404.05182 | null |
2024-04-12 | Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models | Zhiyuan Peng et.al. | 2404.04522 | null |
2024-04-05 | Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation | Tong Su et.al. | 2404.04212 | null |
2024-05-22 | ReFT: Representation Finetuning for Language Models | Zhengxuan Wu et.al. | 2404.03592 | link |
2024-06-11 | Personalized LLM Response Generation with Parameterized Memory Injection | Kai Zhang et.al. | 2404.03565 | null |
2024-06-20 | Eigenpruning: an Interpretability-Inspired PEFT Method | Tomás Vergara-Browne et.al. | 2404.03147 | link |
2024-05-28 | PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Fanxu Meng et.al. | 2404.02948 | link |
2024-04-03 | Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data | Parth Patwa et.al. | 2404.02422 | null |
2024-04-11 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | Junchen Fu et.al. | 2404.02059 | link |
2024-03-31 | Query-driven Relevant Paragraph Extraction from Legal Judgments | T. Y. S. S Santosh et.al. | 2404.00595 | null |
2024-03-30 | Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 | Aryo Pradipta Gema et.al. | 2404.00484 | link |
2024-04-03 | InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning | Yan-Shuo Liang et.al. | 2404.00228 | link |
2024-03-27 | Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Mateusz Klimaszewski et.al. | 2403.18804 | link |
2024-03-26 | The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov et.al. | 2403.17887 | null |
2024-04-15 | ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models | Zequan Liu et.al. | 2403.16187 | null |
2024-03-22 | KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation | Xindi Luo et.al. | 2403.14950 | link |
2024-03-22 | A Single Linear Layer Yields Task-Adapted Low-Rank Matrices | Hwichan Kim et.al. | 2403.14946 | null |
2024-03-21 | AutoRE: Document-Level Relation Extraction with Large Language Models | Xue Lilong et.al. | 2403.14888 | link |
2024-04-29 | Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | Zeyu Han et.al. | 2403.14608 | null |
2024-03-20 | Harnessing Large Language Models for Text-Rich Sequential Recommendation | Zhi Zheng et.al. | 2403.13325 | link |
2024-04-16 | AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models | Zeyu Liu et.al. | 2403.13269 | null |
2024-03-18 | Improving LoRA in Privacy-preserving Federated Learning | Youbang Sun et.al. | 2403.12313 | null |
2024-03-18 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Wangbo Zhao et.al. | 2403.11808 | link |
2024-03-18 | Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model | Haoyun Xu et.al. | 2403.11621 | null |
2024-03-19 | JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning | Anique Tahir et.al. | 2403.11366 | link |
2024-03-14 | Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks | Tingyu Qu et.al. | 2403.09377 | link |
2024-03-14 | PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | Yizhe Xiong et.al. | 2403.09192 | link |
2024-03-13 | Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning | Ming Dong et.al. | 2403.08484 | null |
Text-to-Image Generation
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-18 | Massively Multi-Person 3D Human Motion Forecasting with Scene Context | Felix B Mueller et.al. | 2409.12189 | null |
2024-09-18 | MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | null |
2024-09-18 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | EverestAI et.al. | 2409.12139 | null |
2024-09-18 | Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance | Jaehoon Joo et.al. | 2409.12099 | null |
2024-09-19 | Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval | Warren Jouanneau et.al. | 2409.12097 | null |
2024-09-18 | Design of Ligand-Binding Proteins with Atomic Flow Matching | Junqi Liu et.al. | 2409.12080 | null |
2024-09-18 | Denoising diffusion models for high-resolution microscopy image restoration | Pamela Osuna-Vargas et.al. | 2409.12078 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-18 | LEMON: Localized Editing with Mesh Optimization and Neural Shaders | Furkan Mert Algan et.al. | 2409.12024 | null |
2024-09-18 | Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization | Zhi Chen et.al. | 2409.12020 | null |
2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et.al. | 2409.12002 | null |
2024-09-18 | Tracking Any Point with Frame-Event Fusion Network at High Frame Rate | Jiaxiong Liu et.al. | 2409.11953 | null |
2024-09-18 | Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models | Lorenzo Mandelli et.al. | 2409.11920 | null |
2024-09-18 | AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots | Zhaxizhuoma et.al. | 2409.11905 | null |
2024-09-18 | Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation | Dimitrios Christodoulou et.al. | 2409.11904 | null |
2024-09-17 | Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion | Zhenwei Wang et.al. | 2409.11406 | null |
2024-09-17 | Teaching dark matter simulations to speak the halo language | Shivam Pandey et.al. | 2409.11401 | null |
2024-09-17 | Ultrasound Image Enhancement with the Variance of Diffusion Models | Yuxin Zhang et.al. | 2409.11380 | link |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367 | null |
2024-09-17 | Ping! Your Food is Ready: Comparing Different Notification Techniques in 3D AR Cooking Environment | Aditya Raikwar et.al. | 2409.11357 | null |
2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355 | link |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340 | link |
2024-09-17 | fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction | Jianxiong Gao et.al. | 2409.11315 | null |
2024-09-17 | SpMis: An Investigation of Synthetic Spoken Misinformation Detection | Peizhuo Liu et.al. | 2409.11308 | null |
2024-09-17 | Measurement of top-quark pair production in association with charm quarks in proton-proton collisions at $\sqrt{s}=13$ TeV with the ATLAS detector | ATLAS Collaboration et.al. | 2409.11305 | null |
2024-09-17 | NirvaWave: An Accurate and Efficient Near Field Wave Propagation Simulator for 6G and Beyond | Vahid Yazdnian et.al. | 2409.11293 | null |
2024-09-17 | DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models | Avirup Das et.al. | 2409.11292 | null |
2024-09-17 | Neural Networks for Vehicle Routing Problem | László Kovács et.al. | 2409.11290 | null |
2024-09-17 | Attacking Slicing Network via Side-channel Reinforcement Learning Attack | Wei Shao et.al. | 2409.11258 | null |
2024-09-17 | Learning Source Disentanglement in Neural Audio Codec | Xiaoyu Bie et.al. | 2409.11228 | null |
2024-09-16 | Pennsieve - A Collaborative Platform for Translational Neuroscience and Beyond | Zack Goldblum et.al. | 2409.10509 | null |
2024-09-16 | Torres funerarias chullpa en el valle del río Lauca: un primer análisis arqueoastronómico [Chullpa funerary towers in the Lauca River valley: a first archaeoastronomical analysis] | Alejandro Gangui et.al. | 2409.10497 | null |
2024-09-16 | Incorporating Classifier-Free Guidance in Diffusion Model-Based Recommendation | Noah Buchanan et.al. | 2409.10494 | null |
2024-09-16 | SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing | Qi Qian et.al. | 2409.10476 | null |
2024-09-16 | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Lehong Wu et.al. | 2409.10473 | null |
2024-09-16 | Signed Graph Autoencoder for Explainable and Polarization-Aware Network Embeddings | Nikolaos Nakis et.al. | 2409.10452 | null |
2024-09-16 | Mamba-ST: State Space Model for Efficient Style Transfer | Filippo Botti et.al. | 2409.10385 | link |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357 | null |
2024-09-16 | Taming Diffusion Models for Image Restoration: A Review | Ziwei Luo et.al. | 2409.10353 | null |
2024-09-16 | MEGS: Morphological Evaluation of Galactic Structure | Ufuk Çakır et.al. | 2409.10346 | link |
2024-09-16 | VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation | Aaron Mark Thomas et.al. | 2409.10339 | null |
2024-09-16 | Research and Design of a Financial Intelligent Risk Control Platform Based on Big Data Analysis and Deep Machine Learning | Shuochen Bi et.al. | 2409.10331 | null |
2024-09-16 | Fairness, not Emotion, Drives Socioeconomic Decision Making | Rudra Mukhopadhyay et.al. | 2409.10322 | null |
2024-09-16 | On Synthetic Texture Datasets: Challenges, Creation, and Curation | Blaine Hoak et.al. | 2409.10297 | null |
2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
2024-09-13 | Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | Qingwen Bu et.al. | 2409.09016 | link |
2024-09-13 | A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis | Yohan Poirier-Ginter et.al. | 2409.08947 | null |
2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
2024-09-13 | Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation | Guojun Liang et.al. | 2409.08917 | link |
2024-09-13 | Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling | Nebiyou Yismaw et.al. | 2409.08906 | null |
2024-09-13 | Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control | Carles Domingo-Enrich et.al. | 2409.08861 | null |
2024-09-13 | The Line-Based Dial-a-Ride Problem | Kendra Reiter et.al. | 2409.08860 | null |
2024-09-13 | InstantDrag: Improving Interactivity in Drag-based Image Editing | Joonghyuk Shin et.al. | 2409.08857 | null |
2024-09-13 | DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s) | Yun Su Jeong et.al. | 2409.08850 | null |
2024-09-13 | Development of a Compton Imager Setup | Anuraag Arya et.al. | 2409.08822 | null |
2024-09-13 | LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Huan Zhang et.al. | 2409.08795 | null |
2024-09-13 | What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs | Qianou Ma et.al. | 2409.08775 | null |
2024-09-13 | A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | Tiago Cunha et.al. | 2409.08752 | null |
2024-09-13 | Adaptive Sampling for Continuous Group Equivariant Neural Networks | Berfin Inal et.al. | 2409.08741 | null |
2024-09-13 | DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset | Jiawei Du et.al. | 2409.08731 | null |
2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278 | null |
2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273 | null |
2024-09-12 | Click2Mask: Local Editing with Dynamic Mask Generation | Omer Regev et.al. | 2409.08272 | null |
2024-09-12 | DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer | Runjia Li et.al. | 2409.08271 | null |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269 | null |
2024-09-12 | Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Yifu Chen et.al. | 2409.08260 | link |
2024-09-12 | Improving Virtual Try-On with Garment-focused Diffusion Models | Siqi Wan et.al. | 2409.08258 | null |
2024-09-12 | LoRID: Low-Rank Iterative Diffusion for Adversarial Purification | Geigh Zollicoffer et.al. | 2409.08255 | null |
2024-09-12 | Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding | Hongyu Li et.al. | 2409.08251 | null |
2024-09-12 | IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation | Yinwei Wu et.al. | 2409.08240 | null |
2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
2024-09-12 | LT3SD: Latent Trees for 3D Scene Diffusion | Quan Meng et.al. | 2409.08215 | null |
2024-09-12 | VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis | Hao Chen et.al. | 2409.08207 | null |
2024-09-12 | High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis | Takuto Onikubo et.al. | 2409.08167 | null |
2024-09-12 | MagicStyle: Portrait Stylization Based on Reference Image | Zhaoli Deng et.al. | 2409.08156 | null |
2024-09-11 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | Haibo Yang et.al. | 2409.07454 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process | Yang Luo et.al. | 2409.07451 | null |
2024-09-11 | Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging | Yunzhen Wang et.al. | 2409.07417 | null |
2024-09-11 | Extracting TCPIP Headers at High Speed for the Anonymized Network Traffic Graph Challenge | Zhaoyang Han et.al. | 2409.07374 | null |
2024-09-11 | Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination | Daniel Zhang-Li et.al. | 2409.07372 | null |
2024-09-11 | Event-based Mosaicing Bundle Adjustment | Shuang Guo et.al. | 2409.07365 | link |
2024-09-11 | Training-Free Guidance for Discrete Diffusion Models for Molecular Generation | Thomas J. Kerby et.al. | 2409.07359 | null |
2024-09-11 | Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | Eugenio Chisari et.al. | 2409.07343 | null |
2024-09-11 | Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models | Fengzhe Zhang et.al. | 2409.07323 | null |
2024-09-11 | Optimizing Neural Network Performance and Interpretability with Diophantine Equation Encoding | Ronald Katende et.al. | 2409.07310 | null |
2024-09-11 | Exploring User-level Gradient Inversion with a Diffusion Prior | Zhuohang Li et.al. | 2409.07291 | null |
2024-09-11 | CCFExp: Facial Image Synthesis with Cycle Cross-Fusion Diffusion Model for Facial Paralysis Individuals | Weixiang Gao et.al. | 2409.07271 | link |
2024-09-11 | Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models | Sanoojan Baliah et.al. | 2409.07269 | link |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | null |
2024-09-10 | Technical Report of Mobile Manipulator Robot for Industrial Environments | Erfan Amoozad Khalili et.al. | 2409.06693 | null |
2024-09-10 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Teng Hu et.al. | 2409.06633 | null |
2024-09-10 | MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification | Phu Pham et.al. | 2409.06620 | null |
2024-09-10 | A Primer on Variational Inference for Physics-Informed Deep Generative Modelling | Alex Glyn-Davies et.al. | 2409.06560 | null |
2024-09-10 | From LIMA to DeepLIMA: following a new path of interoperability | Victor Bocharov et.al. | 2409.06550 | null |
2024-09-10 | Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models | Xin Jing et.al. | 2409.06451 | null |
2024-09-10 | Prompt2Fashion: An automatically generated fashion dataset | Georgia Argyro et.al. | 2409.06442 | link |
2024-09-10 | Fast nonparametric inference of network backbones for graph sparsification | Alec Kirkley et.al. | 2409.06417 | link |
2024-09-10 | Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition | Junzheng Zhang et.al. | 2409.06371 | null |
2024-09-10 | What happens to diffusion model likelihood when your model is conditional? | Mattias Cross et.al. | 2409.06364 | null |
2024-09-10 | DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement | Jia-Wei Liao et.al. | 2409.06355 | null |
2024-09-10 | Improving Conditional Level Generation using Automated Validation in Match-3 Games | Monica Villanueva Aylagas et.al. | 2409.06349 | null |
2024-09-10 | Foragax: An Agent Based Modelling framework based on JAX | Siddharth Chaturvedi et.al. | 2409.06345 | link |
2024-09-10 | G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Jinzhi Zhang et.al. | 2409.06322 | null |
2024-09-10 | Learning Augmentation Policies from A Model Zoo for Time Series Forecasting | Haochen Yuan et.al. | 2409.06282 | null |
2024-09-09 | Fast Generation of Custom Floating-Point Spatial Filters on FPGAs | Nelson Campos et.al. | 2409.05837 | null |
2024-09-09 | Enhancing Preference-based Linear Bandits via Human Response Time | Shen Li et.al. | 2409.05798 | null |
2024-09-09 | Predicting Critical Heat Flux with Uncertainty Quantification and Domain Generalization Using Conditional Variational Autoencoders and Deep Neural Networks | Farah Alsafadi et.al. | 2409.05790 | null |
2024-09-09 | Vector Quantized Diffusion Model Based Speech Bandwidth Extension | Yuan Fang et.al. | 2409.05784 | null |
2024-09-09 | AS-Speech: Adaptive Style For Speech Synthesis | Zhipeng Li et.al. | 2409.05730 | null |
2024-09-09 | pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning | Jiahao Lai et.al. | 2409.05701 | null |
2024-09-09 | Citizen-Led Personalization of User Interfaces: Investigating How People Customize Interfaces for Themselves and Others | Sérgio Alves et.al. | 2409.05696 | null |
2024-09-09 | Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models | Aakash Sen Sharma et.al. | 2409.05668 | null |
2024-09-09 | Forward KL Regularized Preference Optimization for Aligning Diffusion Policies | Zhao Shan et.al. | 2409.05622 | null |
2024-09-09 | CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization | Nan Chen et.al. | 2409.05606 | null |
2024-09-09 | Latent 3D Brain MRI Counterfactual | Wei Peng et.al. | 2409.05585 | null |
2024-09-09 | Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation | Muraleekrishna Gopinathan et.al. | 2409.05583 | link |
2024-09-09 | Design and Implementation of TAO DAQ System | Shuihan Zhang et.al. | 2409.05522 | null |
2024-09-09 | A Taxonomy of Miscompressions: Preparing Image Forensics for Neural Compression | Nora Hofer et.al. | 2409.05490 | null |
2024-09-09 | DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | Wei Wu et.al. | 2409.05463 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429 | null |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424 | null |
2024-09-06 | Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation | Zhuoyan Luo et.al. | 2409.04410 | null |
2024-09-06 | Enhancing Skin Lesion Diagnosis with Ensemble Learning | Xiaoyi Liu et.al. | 2409.04381 | null |
2024-09-06 | How Fair is Your Diffusion Recommender Model? | Daniele Malitesta et.al. | 2409.04339 | null |
2024-09-06 | Random effects estimation in a fractional diffusion model based on continuous observations | Nesrine Chebli et.al. | 2409.04331 | null |
2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
2024-09-06 | An overview of domain-specific foundation model: key technologies, applications and challenges | Haolong Chen et.al. | 2409.04267 | null |
2024-09-06 | UniDet3D: Multi-dataset Indoor 3D Object Detection | Maksim Kolodiazhnyi et.al. | 2409.04234 | link |
2024-09-06 | Generative Modelling via Quantile Regression | Johannes Schmidt-Hieber et.al. | 2409.04231 | null |
2024-09-06 | Breaking the Brownian Barrier: Models and Manifestations of Molecular Diffusion in Complex Fluids | Harish Srinivasan et.al. | 2409.04199 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | null |
2024-09-06 | Subsampling of Correlated Graph Signals | Rishabh Ravi et.al. | 2409.04107 | null |
2024-09-06 | Estimation of service value parameters for a queue with unobserved balking | Daniel Podorojnyi et.al. | 2409.04090 | null |
2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
2024-09-05 | WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild | Yuntian Deng et.al. | 2409.03753 | null |
2024-09-05 | ArtiFade: Learning to Generate High-quality Subject from Blemished Images | Shuya Yang et.al. | 2409.03745 | null |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-05 | RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images | Benzhi Wang et.al. | 2409.03644 | null |
2024-09-05 | DiffEVC: Any-to-Any Emotion Voice Conversion with Expressive Guidance | Hsing-Hang Chou et.al. | 2409.03636 | null |
2024-09-05 | Generalizing Linear Graphs and Bond Graph Models with Hetero-functional Graphs for System-of-Systems Engineering Applications | Ehsanoddin Ghorbanichemazkati et.al. | 2409.03630 | null |
2024-09-05 | TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces | Bernardo Biesseck et.al. | 2409.03600 | link |
2024-09-05 | DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture | Qianlong Xiang et.al. | 2409.03550 | null |
2024-09-05 | Euclid preparation. Simulations and nonlinearities beyond $Λ$CDM. 2. Results from non-standard simulations | Euclid Collaboration et.al. | 2409.03523 | null |
2024-09-05 | Blended Latent Diffusion under Attention Control for Real-World Video Editing | Deyin Liu et.al. | 2409.03514 | null |
2024-09-05 | Physical Modelling of Piano Sound | Haifan Xie et.al. | 2409.03481 | null |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | Rx Strategist: Prescription Verification using LLM Agents System | Phuc Phan Van et.al. | 2409.03440 | null |
2024-09-05 | KiloBot: A Programming Language for Deploying Perception-Guided Industrial Manipulators at Scale | Wei Gao et.al. | 2409.03439 | null |
2024-09-04 | HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | Xinyu Liu et.al. | 2409.02919 | link |
2024-09-04 | Latent Watermarking of Audio Generative Models | Robin San Roman et.al. | 2409.02915 | null |
2024-09-04 | Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | Kaiwen Zheng et.al. | 2409.02908 | null |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Look Into the LITE in Deep Learning for Time Series Classification | Ali Ismail-Fawaz et.al. | 2409.02869 | link |
2024-09-04 | Building a Scalable, Effective, and Steerable Search and Ranking Platform | Marjan Celikik et.al. | 2409.02856 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851 | link |
2024-09-04 | Anomaly Detection in Offshore Open Radio Access Network Using Long Short-Term Memory Models on a Novel Artificial Intelligence-Driven Cloud-Native Data Platform | Abdelrahim Ahmad et.al. | 2409.02849 | null |
2024-09-04 | Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model | Tornike Karchkhadze et.al. | 2409.02845 | null |
2024-09-04 | SNNAX – Spiking Neural Networks in JAX | Jamie Lohoff et.al. | 2409.02842 | null |
2024-09-04 | Experimental Framework for Generating Reliable Ground Truth for Laryngeal Spatial Segmentation Tasks | Hamzeh Ghasemzadeh et.al. | 2409.02809 | null |
2024-09-04 | Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | Mohammad Reshadati et.al. | 2409.02711 | null |
2024-09-04 | Rethinking HTG Evaluation: Bridging Generation and Recognition | Konstantina Nikolaidou et.al. | 2409.02683 | link |
2024-09-04 | Introduction to Machine Learning | Laurent Younes et.al. | 2409.02668 | null |
2024-09-04 | Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus | Gokhan Dogru et.al. | 2409.02667 | null |
2024-08-30 | Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes | Li Zhang et.al. | 2408.17421 | link |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
2024-08-30 | Subspace Diffusion Posterior Sampling for Travel-Time Tomography | Xiang Cao et.al. | 2408.17333 | null |
2024-08-30 | Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations | Ahmed Hammam et.al. | 2408.17311 | null |
2024-08-30 | Leveraging Deep Generative Model For Computational Protein Design And Optimization | Boqiao Lai et.al. | 2408.17241 | null |
2024-08-30 | Towards Symbolic XAI – Explanation Through Human Understandable Logical Relationships Between Features | Thomas Schnake et.al. | 2408.17198 | null |
2024-09-02 | Leveraging Blockchain and ANFIS for Optimal Supply Chain Management | Amirfarhad Farhadi et.al. | 2408.17161 | null |
2024-08-30 | Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning | Xiaoye Qu et.al. | 2408.17150 | link |
2024-08-30 | Flow Matching for Optimal Reaction Coordinates of Biomolecular System | Mingyuan Zhang et.al. | 2408.17139 | link |
2024-08-30 | Temporal and Interactive Modeling for Efficient Human-Human Motion Generation | Yabiao Wang et.al. | 2408.17135 | null |
2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
2024-08-30 | FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition | Chen Hu et.al. | 2408.17090 | link |
2024-08-30 | Approximately Invertible Neural Network for Learned Image Compression | Yanbo Gao et.al. | 2408.17073 | null |
2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
2024-08-30 | Text-to-Image Generation Via Energy-Based CLIP | Roy Ganz et.al. | 2408.17046 | null |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767 | null |
2024-08-29 | CSGO: Content-Style Composition in Text-to-Image Generation | Peng Xing et.al. | 2408.16766 | null |
2024-08-29 | A Score-Based Density Formula, with Applications in Diffusion Generative Models | Gen Li et.al. | 2408.16765 | null |
2024-08-29 | UV-free Texture Generation with Denoising and Geodesic Heat Diffusions | Simone Foti et.al. | 2408.16762 | link |
2024-08-29 | One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Anisha Jain et.al. | 2408.16704 | null |
2024-08-29 | VMC: A Grammar for Visualizing Statistical Model Checks | Ziyang Guo et.al. | 2408.16702 | null |
2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2408.16700 | link |
2024-08-29 | Optimization Models for the Quadratic Traveling Salesperson Problem | Yuxiao Chen et.al. | 2408.16680 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-08-29 | RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model | Zhuan Shi et.al. | 2408.16634 | null |
2024-08-28 | TEDRA: Text-based Editing of Dynamic and Photoreal Actors | Basavaraj Sunagad et.al. | 2408.15995 | null |
2024-08-28 | Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation | Shengyuan Zhang et.al. | 2408.15991 | link |
2024-08-28 | Thoughtseeds: Evolutionary Priors, Nested Markov Blankets, and the Emergence of Embodied Cognition | Prakash Chandra Kavi et.al. | 2408.15982 | null |
2024-08-28 | Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems | Ibrahim K. Ozaslan et.al. | 2408.15969 | null |
2024-08-28 | MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets | Dominic Phillips et.al. | 2408.15905 | null |
2024-08-28 | Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones | Carlos Plou et.al. | 2408.15899 | null |
2024-08-28 | Airfoil Diffusion: Denoising Diffusion Model For Conditional Airfoil Generation | Reid Graves et.al. | 2408.15898 | link |
2024-08-28 | Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data | Ayodeji Ijishakin et.al. | 2408.15890 | null |
2024-08-29 | Recent Decade’s Power Outage Data Reveals the Increasing Vulnerability of U.S. Power Infrastructure | Bo Li et.al. | 2408.15882 | null |
2024-08-28 | GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model | Yongjie Fu et.al. | 2408.15868 | null |
2024-08-27 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | Zejia Weng et.al. | 2408.15241 | null |
2024-08-27 | Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation | Xiaojuan Wang et.al. | 2408.15239 | null |
2024-08-27 | Simulation of Stochastic Discrete Dislocation Dynamics in Ductile Vs Brittle Materials | Santosh Chhetri et.al. | 2408.15157 | null |
2024-08-27 | How transformers learn structured data: insights from hierarchical filtering | Jerome Garnier-Brun et.al. | 2408.15138 | null |
2024-08-27 | DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays | Yiran Sun et.al. | 2408.15118 | link |
2024-08-27 | Data-Driven Nonlinear Deformation Design of 3D-Printable Shells | Samuel Silverman et.al. | 2408.15097 | link |
2024-08-27 | Constrained Diffusion Models via Dual Training | Shervin Khalafi et.al. | 2408.15094 | null |
2024-08-27 | LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features | Weidong Guo et.al. | 2408.14977 | null |
2024-08-27 | MegActor-$Σ$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer | Shurong Yang et.al. | 2408.14975 | null |
2024-08-27 | Integrated Bundling and Pricing of Unique Items | Maxime Bouscary et.al. | 2408.14913 | null |
2024-08-26 | K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Zhikai Li et.al. | 2408.14468 | null |
2024-08-26 | Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs | Xiaoman Zhang et.al. | 2408.14397 | link |
2024-08-26 | Reprogramming Foundational Large Language Models (LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
2024-08-26 | GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy | Peiyan Li et.al. | 2408.14368 | link |
2024-08-27 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | Automated Machine Learning in Insurance | Panyi Dong et.al. | 2408.14331 | link |
2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
2024-08-26 | Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes | Chao Chen et.al. | 2408.14279 | null |
2024-08-26 | Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach | Vittoriano Muttillo et.al. | 2408.14259 | null |
2024-08-27 | Text3DAug – Prompted Instance Augmentation for LiDAR Perception | Laurenz Reichardt et.al. | 2408.14253 | link |
2024-08-23 | How Diffusion Models Learn to Factorize and Compose | Qiyao Liang et.al. | 2408.13256 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239 | null |
2024-08-23 | Social Welfare Maximization for Federated Learning with Network Effects | Xiang Li et.al. | 2408.13223 | null |
2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202 | null |
2024-08-23 | IFH: a Diffusion Framework for Flexible Design of Graph Generative Models | Samuel Cognolato et.al. | 2408.13194 | link |
2024-08-23 | Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention | Xiaoyi Liu et.al. | 2408.13180 | null |
2024-08-26 | Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation | Bonan Li et.al. | 2408.13149 | null |
2024-08-23 | Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning | Jihwan Oh et.al. | 2408.13092 | null |
2024-08-23 | General Intelligent Imaging and Uncertainty Quantification by Deterministic Diffusion Model | Weiru Fan et.al. | 2408.13061 | null |
2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | null |
2024-08-22 | FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing | Jue Wang et.al. | 2408.12429 | link |
2024-08-22 | Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification | Sudi Murindanyi et.al. | 2408.12426 | null |
2024-08-22 | 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment | Kaihui Cheng et.al. | 2408.12419 | null |
2024-08-22 | CODE: Confident Ordinary Differential Editing | Bastien van Delft et.al. | 2408.12418 | link |
2024-08-22 | Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures | Ce Liu et.al. | 2408.12413 | null |
2024-08-22 | A Stable Polygamy Approach to Spectrum Access with Channel Reuse | Dan Ben Ami et.al. | 2408.12402 | null |
2024-08-22 | Multi-Style Facial Sketch Synthesis through Masked Generative Modeling | Bowen Sun et.al. | 2408.12400 | null |
2024-08-21 | Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models | Chun-Yen Shih et.al. | 2408.11810 | null |
2024-08-21 | ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation | Shiqi Yang et.al. | 2408.11805 | null |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | Timeline and Boundary Guided Diffusion Network for Video Shadow Detection | Haipeng Zhou et.al. | 2408.11785 | link |
2024-08-21 | Sum of Squares Circuits | Lorenzo Loconte et.al. | 2408.11778 | null |
2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
2024-08-21 | D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models | M. Forlini et.al. | 2408.11761 | null |
2024-08-21 | JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet | Yujia Gu et.al. | 2408.11744 | null |
2024-08-21 | Enhancing Cross-Modal Medical Image Segmentation through Compositionality | Aniek Eijpe et.al. | 2408.11733 | link |
2024-08-21 | AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams | Tianyi Liu et.al. | 2408.11728 | null |
2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
2024-08-20 | Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Chunting Zhou et.al. | 2408.11039 | null |
2024-08-20 | Full Detector Simulation of a Projective Dual-Readout Segmented Crystal Electromagnetic Calorimeter with Precision Timing | Wonyong Chung et.al. | 2408.11027 | null |
2024-08-20 | MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | Haoning Wu et.al. | 2408.11001 | link |
2024-08-20 | GreediRIS: Scalable Influence Maximization using Distributed Streaming Maximum Cover | Reet Barik et.al. | 2408.10982 | null |
2024-08-21 | Assortment Optimization Under History-Dependent Effects | Taotao He et.al. | 2408.10967 | null |
2024-08-20 | Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling | Jaideep Pathak et.al. | 2408.10958 | null |
2024-08-20 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin et.al. | 2408.10943 | link |
2024-08-20 | A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection | Vladislav Li et.al. | 2408.10940 | null |
2024-08-20 | Large Point-to-Gaussian Model for Image-to-3D Generation | Longfei Lu et.al. | 2408.10935 | null |
2024-08-19 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model | Minghua Liu et.al. | 2408.10198 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195 | null |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | null |
2024-08-19 | Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | Manjil Karki et.al. | 2408.10128 | null |
2024-08-19 | Learning Precise Affordances from Egocentric Videos for Robotic Manipulation | Gen Li et.al. | 2408.10123 | null |
2024-08-19 | Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision | Zhijun Jia et.al. | 2408.10096 | null |
2024-08-19 | Stacked Intelligent Metasurfaces for Integrated Sensing and Communications | Haoxian Niu et.al. | 2408.10043 | null |
2024-08-19 | General Impedance Modeling for Modular Multilevel Converter with Grid-forming and Grid-following Control | Chu Sun et.al. | 2408.10017 | null |
2024-08-19 | Uniting contrastive and generative learning for event sequences models | Aleksandr Yugay et.al. | 2408.09995 | null |
2024-08-19 | Multi-layer diffusion model of photovoltaic installations | Tomasz Weron et.al. | 2408.09904 | null |
2024-08-16 | Automated High-throughput Organic Crystal Structure Prediction via Population-based Sampling | Qiang Zhu et.al. | 2408.08843 | link |
2024-08-16 | PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future | Guangyi Wang et.al. | 2408.08822 | null |
2024-08-16 | A Unified Automata-Theoretic Approach to LTLf Modulo Theories (Extended Version) | Marco Faella et.al. | 2408.08817 | null |
2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
2024-08-16 | Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion | Sanchayan Vivekananthan et.al. | 2408.08751 | null |
2024-08-16 | The Blessing of Strategic Customers in Personalized Pricing | Zhi Chen et.al. | 2408.08738 | null |
2024-08-16 | ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language | Yongkang Liu et.al. | 2408.08724 | null |
2024-08-16 | An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation | Peiming Guo et.al. | 2408.08650 | null |
2024-08-16 | Modeling the Neonatal Brain Development Using Implicit Neural Representations | Florentin Bieder et.al. | 2408.08647 | link |
2024-08-16 | Sampling effects on Lasso estimation of drift functions in high-dimensional diffusion processes | Chiara Amorino et.al. | 2408.08638 | null |
2024-08-15 | Understanding the Local Geometry of Generative Model Manifolds | Ahmed Imtiaz Humayun et.al. | 2408.08307 | null |
2024-08-15 | Accelerated Image-Aware Generative Diffusion Modeling | Tanmay Asthana et.al. | 2408.08306 | null |
2024-08-15 | Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks | Ni Ou et.al. | 2408.08276 | null |
2024-08-15 | mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis | Dae-young Kim et.al. | 2408.08261 | null |
2024-08-15 | Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding | Xiner Li et.al. | 2408.08252 | link |
2024-08-15 | Picosecond laser pulses for quantum dot-microcavity based single photon generation by cascaded electro-optic modulation of a narrow-linewidth laser | Mio Poortvliet et.al. | 2408.08213 | null |
2024-08-15 | Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion | Adi Haviv et.al. | 2408.08184 | null |
2024-08-15 | Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality | Sangita Das et.al. | 2408.08142 | link |
2024-08-15 | Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification | Levente Murgás et.al. | 2408.08126 | link |
2024-08-15 | When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Pingping Zhang et.al. | 2408.08093 | null |
2024-08-14 | Detecting Near-Duplicate Face Images | Sudipta Banerjee et.al. | 2408.07689 | link |
2024-08-14 | Composing Automatic Differentiation with Custom Derivatives of Higher-Order Functions | Sam Estep et.al. | 2408.07683 | null |
2024-08-14 | Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding | Bing Hu et.al. | 2408.07636 | null |
2024-08-14 | Anisotropic Diffusion Model of Communication in 2D Biofilm | Yanahan Paramalingam et.al. | 2408.07626 | null |
2024-08-14 | Neural Quantum States and Peaked Molecular Wave Functions: Curse or Blessing? | Aleksei Malyshev et.al. | 2408.07625 | null |
2024-08-14 | MatterGPT: A Generative Transformer for Multi-Property Inverse Design of Solid-State Materials | Yan Chen et.al. | 2408.07608 | null |
2024-08-14 | PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Sang-Hoon Lee et.al. | 2408.07547 | link |
2024-08-14 | New Curriculum, New Chance – Retrieval Augmented Generation for Lesson Planning in Ugandan Secondary Schools. Prototype Quality Evaluation | Simon Kloker et.al. | 2408.07542 | null |
2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | Erez Yosef et.al. | 2408.07541 | null |
2024-08-14 | Towards Real-time Video Compressive Sensing on Mobile Devices | Miao Cao et.al. | 2408.07530 | link |
2024-08-13 | Imagen 3 | Imagen-Team-Google et.al. | 2408.07009 | null |
2024-08-13 | Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models | Cheng Chen et.al. | 2408.06995 | null |
2024-08-13 | DCMSA: Multi-Head Self-Attention Mechanism Based on Deformable Convolution For Seismic Data Denoising | Wang Mingwei et.al. | 2408.06963 | null |
2024-08-13 | Neural Speech and Audio Coding | Minje Kim et.al. | 2408.06954 | null |
2024-08-13 | Diffusion Model for Slate Recommendation | Federico Tomasi et.al. | 2408.06883 | null |
2024-08-13 | Efficient Search for Customized Activation Functions with Gradient Descent | Lukas Strack et.al. | 2408.06820 | link |
2024-08-13 | Enhancing Diabetic Retinopathy Diagnosis: A Lightweight CNN Architecture for Efficient Exudate Detection in Retinal Fundus Images | Mujadded Al Rabbani Alif et.al. | 2408.06784 | null |
2024-08-13 | Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective | Ouxiang Li et.al. | 2408.06741 | link |
2024-08-13 | DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion | Yujia Wu et.al. | 2408.06740 | null |
2024-08-13 | Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a β-Variational Autoencoder | Gizem Mert et.al. | 2408.06720 | null |
2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
2024-08-12 | Open-Source Molecular Processing Pipeline for Generating Molecules | Shreyas V et.al. | 2408.06261 | null |
2024-08-12 | 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) | Jaydeep Rade et.al. | 2408.06244 | null |
2024-08-12 | Cislunar Constellation Design for Space Situational Awareness with Time-Expanded Facility Location Problem | Yuri Shimane et.al. | 2408.06238 | null |
2024-08-12 | Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance | Taewon Kang et.al. | 2408.06157 | null |
2024-08-12 | LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library | Tianhao Yu et.al. | 2408.06150 | null |
2024-08-12 | Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models | Ioannis Romanelis et.al. | 2408.06145 | link |
2024-08-12 | Med42-v2: A Suite of Clinical LLMs | Clément Christophe et.al. | 2408.06142 | null |
2024-08-12 | Five Pitfalls When Assessing Synthetic Medical Images with Reference Metrics | Melanie Dohmen et.al. | 2408.06075 | null |
2024-08-12 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Zhuoyi Yang et.al. | 2408.06072 | link |
2024-08-09 | Multi-Garment Customized Model Generation | Yichen Liu et.al. | 2408.05206 | null |
2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-09 | Cell Morphology-Guided Small Molecule Generation with GFlowNets | Stephen Zhewen Lu et.al. | 2408.05196 | link |
2024-08-09 | Lithography-free patterning of chalcogenide materials for integrated photonic devices | Zhen Hu et.al. | 2408.05099 | null |
2024-08-09 | Social contagion under hybrid interactions | Xincheng Shu et.al. | 2408.05050 | null |
2024-08-09 | Infrared Beam-shaping on Demand via Tailored Geometric Phase Metasurfaces employing the Plasmonic Phase-Change Material In3SbTe2 | Lukas Conrads et.al. | 2408.05044 | null |
2024-08-09 | Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-Like Space Target Detection | Zijian Zhu et.al. | 2408.05029 | null |
2024-08-09 | Retrieval-augmented code completion for local projects using large language models | Marko Hostnik et.al. | 2408.05026 | null |
2024-08-09 | DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow | Hangyu Li et.al. | 2408.05008 | null |
2024-08-09 | Pay Attention To Mean Fields For Point Cloud Generation | Benno Käch et.al. | 2408.04997 | link |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631 | null |
2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619 | null |
2024-08-08 | Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches | Yongzhi Xu et.al. | 2408.04567 | null |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-08 | On the Asymptotic Convergence of Subgraph Generated Models | Xinchen Xu et.al. | 2408.04541 | null |
2024-08-08 | AExGym: Benchmarks and Environments for Adaptive Experimentation | Jimmy Wang et.al. | 2408.04531 | null |
2024-08-08 | NFDI4Health workflow and service for synthetic data generation, assessment and risk management | Sobhan Moazemi et.al. | 2408.04478 | null |
2024-08-08 | Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations | Julen Urain et.al. | 2408.04380 | null |
2024-08-08 | Making sense of AI systems development | Mateusz Dolata et.al. | 2408.04311 | null |
2024-08-08 | AI-Driven Chatbot for Intrusion Detection in Edge Networks: Enhancing Cybersecurity with Ethical User Consent | Mugheez Asif et.al. | 2408.04281 | null |
2024-08-07 | Prospects for using drones to test formation-flying CubeSat concepts, and other astronomical applications | John D. Monnier et.al. | 2408.03911 | null |
2024-08-07 | Hate Speech Detection and Classification in Amharic Text with Deep Learning | Samuel Minale Gashe et.al. | 2408.03849 | null |
2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
2024-08-07 | A broken duet: multistable dynamics of dyadic interactions | Johan Medrano et.al. | 2408.03809 | null |
2024-08-07 | Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning | Martin Moder et.al. | 2408.03807 | link |
2024-08-07 | Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model | Guoqing Zhu et.al. | 2408.03748 | link |
2024-08-07 | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction | Benjamin Matthias Ruppik et.al. | 2408.03706 | null |
2024-08-07 | Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | Zilyu Ye et.al. | 2408.03695 | null |
2024-08-07 | Unsupervised Detection of Fetal Brain Anomalies using Denoising Diffusion Models | Markus Ditlev Sjøgren Olsen et.al. | 2408.03654 | null |
2024-08-07 | Goal-oriented Semantic Communication for the Metaverse Application | Zhe Wang et.al. | 2408.03646 | null |
2024-08-06 | MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation | Xiaofeng Mao et.al. | 2408.03312 | null |
2024-08-06 | IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts | Ciara Rowles et.al. | 2408.03209 | null |
2024-08-06 | Personalizing Federated Instrument Segmentation with Visual Trait Priors in Robotic Surgery | Jialang Xu et.al. | 2408.03208 | null |
2024-08-06 | An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion | Xingguang Yan et.al. | 2408.03178 | null |
2024-08-06 | Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models | Sho Ozaki et.al. | 2408.03156 | null |
2024-08-06 | Enhancing Twitter Bot Detection via Multimodal Invariant Representations | Jibing Gong et.al. | 2408.03096 | null |
2024-08-06 | Analysis of Argument Structure Constructions in a Deep Recurrent Language Model | Pegah Ramezani et.al. | 2408.03062 | null |
2024-08-06 | OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Qiang Sun et.al. | 2408.03047 | link |
2024-08-06 | Targeted Visual Prompting for Medical Visual Question Answering | Sergio Tascon-Morales et.al. | 2408.03043 | null |
2024-08-06 | Training-Free Condition Video Diffusion Models for single frame Spatial-Semantic Echocardiogram Synthesis | Van Phi Nguyen et.al. | 2408.03035 | link |
2024-08-05 | Command-line Obfuscation Detection using Small Language Models | Vojtech Outrata et.al. | 2408.02637 | null |
2024-08-05 | VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Zhiyu Tan et.al. | 2408.02629 | null |
2024-08-05 | YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition | Duc Manh Nguyen Dang et.al. | 2408.02623 | link |
2024-08-05 | LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba | Yunxiang Fu et.al. | 2408.02615 | link |
2024-08-05 | MetaParticles: Computationally engineered nanomaterials with tunable and responsive properties | Massimiliano Paesani et.al. | 2408.02564 | null |
2024-08-05 | Fairness and Bias Mitigation in Computer Vision: A Survey | Sepehr Dehdashtian et.al. | 2408.02464 | null |
2024-08-05 | TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | Daeun Song et.al. | 2408.02454 | null |
2024-08-05 | Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Zi Liang et.al. | 2408.02416 | link |
2024-08-05 | Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models | Tongtong Feng et.al. | 2408.02408 | null |
2024-08-05 | A Few-Shot Approach for Relation Extraction Domain Adaptation using Large Language Models | Vanni Zavarella et.al. | 2408.02377 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415 | null |
2024-08-02 | Autoencoders in Function Space | Justin Bunker et.al. | 2408.01362 | link |
2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | link |
2024-08-02 | TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling | Dong Huo et.al. | 2408.01291 | null |
2024-08-02 | A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness | Lutao Jiang et.al. | 2408.01269 | null |
2024-08-02 | Exchange control in a MOS double quantum dot made using a 300 mm wafer process | Jacob F. Chittock-Wood et.al. | 2408.01241 | null |
2024-08-02 | CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models | Kushal Kumar Jain et.al. | 2408.01233 | null |
2024-08-02 | Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion | Ke Li et.al. | 2408.01225 | link |
2024-08-02 | PSP-GEN: Stochastic inversion of the Process-Structure-Property chain in materials design through deep, generative probabilistic modeling | Yaohua Zang et.al. | 2408.01114 | null |
2024-08-02 | Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding | Danbinaerin Han et.al. | 2408.01096 | null |
2024-08-01 | Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation | Yixiao Wang et.al. | 2408.00766 | null |
2024-08-01 | Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention | Susung Hong et.al. | 2408.00760 | null |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models | Gilad Deutch et.al. | 2408.00735 | null |
2024-08-01 | A Natural Language Processing Framework for Hotel Recommendation Based on Users’ Text Reviews | Lavrentia Aravani et.al. | 2408.00716 | null |
2024-08-02 | Reinforcement Learning applied to Insurance Portfolio Pursuit | Edward James Young et.al. | 2408.00713 | link |
2024-08-01 | MotionFix: Text-Driven 3D Human Motion Editing | Nikos Athanasiou et.al. | 2408.00712 | null |
2024-08-01 | Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function | Matias Oscar Volman Stern et.al. | 2408.00707 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | link |
2024-08-01 | Privacy-preserving datasets by capturing feature distributions with Conditional VAEs | Francesco Di Salvo et.al. | 2408.00639 | link |
2024-07-31 | Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Yuxin Wen et.al. | 2407.21720 | link |
2024-07-31 | Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Zhenghao Zhang et.al. | 2407.21705 | null |
2024-07-31 | Generative Diffusion Model for Seismic Imaging Improvement of Sparsely Acquired Data and Uncertainty Quantification | Xingchen Shi et.al. | 2407.21683 | null |
2024-07-31 | Quality Control for Radiology Report Generation Models via Auxiliary Auditing Components | Hermione Warr et.al. | 2407.21638 | null |
2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
2024-07-31 | Long-term investment and energy procurement risk management under uncertainty for an electrolytic green hydrogen producer | Owen Palmer et.al. | 2407.21574 | null |
2024-07-31 | Conditioned Prompt-Optimization for Continual Deepfake Detection | Francesco Laiti et.al. | 2407.21554 | link |
2024-07-31 | CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment | Akira Kasuga et.al. | 2407.21553 | null |
2024-07-31 | Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation | Junxuan Yu et.al. | 2407.21490 | null |
2024-07-31 | Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends | Giuliano Martinelli et.al. | 2407.21489 | link |
2024-07-30 | Matting by Generation | Zhixiang Wang et.al. | 2407.21017 | null |
2024-07-30 | Add-SD: Rational Generation without Manual Reference | Lingfeng Yang et.al. | 2407.21016 | link |
2024-07-30 | Integrating Agent-Based and Compartmental Models for Infectious Disease Modeling: A Novel Hybrid Approach | Inan Bostanci et.al. | 2407.20993 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962 | link |
2024-07-30 | Mitigating calibration errors from mutual coupling with time-domain filtering of 21 cm cosmological radio observations | N. Charles et.al. | 2407.20923 | null |
2024-07-30 | Impact of Geographical Separation on Spectrum Sharing Markets | Kangle Mu et.al. | 2407.20909 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908 | link |
2024-07-30 | Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks | Yunfeng Diao et.al. | 2407.20836 | null |
2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
2024-07-30 | SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Zheng Liu et.al. | 2407.20756 | link |
2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
2024-07-29 | LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework | Zhenqi He et.al. | 2407.20172 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
2024-07-29 | DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models | Jing Yang et.al. | 2407.20141 | null |
2024-07-29 | Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning | Liyuan Mao et.al. | 2407.20109 | null |
2024-07-29 | On the significance of parameters and the projective level in the Choice and Collection axioms | Vladimir Kanovei et.al. | 2407.20098 | null |
2024-07-29 | Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations | Fangyijie Wang et.al. | 2407.20072 | link |
2024-07-29 | ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning | Delyan Boychev et.al. | 2407.20020 | link |
2024-07-29 | Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation” | Daniel Gallo Fernández et.al. | 2407.19996 | null |
2024-07-29 | HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets | Yili Jin et.al. | 2407.19988 | null |
2024-07-26 | Generative Adversarial Networks for Imputing Sparse Learning Performance | Liang Zhang et.al. | 2407.18875 | null |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
2024-07-26 | Scalable Group Choreography via Variational Phase Manifold Learning | Nhat Le et.al. | 2407.18839 | null |
2024-07-26 | Revision of calcium and scandium abundances in Am stars based on NLTE calculations and comparison with diffusion stellar evolution models | L. I. Mashonkina et.al. | 2407.18736 | null |
2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et.al. | 2407.18715 | null |
2024-07-26 | Q-gen: A Parameterized Quantum Circuit Generator | Yikai Mao et.al. | 2407.18697 | link |
2024-07-26 | Adversarial Robustification via Text-to-Image Diffusion Models | Daewon Choi et.al. | 2407.18658 | link |
2024-07-26 | Robust VAEs via Generating Process of Noise Augmented Data | Hiroo Irobe et.al. | 2407.18632 | null |
2024-07-26 | Denoising Lévy Probabilistic Models | Dario Shariatian et.al. | 2407.18609 | null |
2024-07-26 | How To Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-Angle Maximum Intensity Projections and Diffusion Models | Amirhosein Toosi et.al. | 2407.18555 | null |
2024-07-25 | RegionDrag: Fast Region-Based Image Editing with Diffusion Models | Jingyi Lu et.al. | 2407.18247 | null |
2024-07-25 | VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads | Orest Kupyn et.al. | 2407.18245 | null |
2024-07-25 | CodedVO: Coded Visual Odometry | Sachin Shah et.al. | 2407.18240 | null |
2024-07-25 | SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits | Yanyue Xie et.al. | 2407.18209 | null |
2024-07-25 | Test2VA: Reusing GUI Test Cases for Voice Assistant Features Development in Mobile Applications | Garrett Weaver et.al. | 2407.18155 | null |
2024-07-25 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | Roberto Di Via et.al. | 2407.18125 | null |
2024-07-25 | Keypoint Promptable Re-Identification | Vladimir Somers et.al. | 2407.18112 | link |
2024-07-25 | SSTD: Stripe-Like Space Target Detection using Single-Point Supervision | Zijian Zhu et.al. | 2407.18097 | null |
2024-07-25 | Cross-Observatory Coordination with tilepy: A Novel Tool for Observations of Multi-Messenger Transient Events | Monica Seglar-Arroyo et.al. | 2407.18076 | null |
2024-07-25 | AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild | Junho Park et.al. | 2407.18034 | link |
2024-07-24 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Yiming Xie et.al. | 2407.17470 | null |
2024-07-24 | BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social | Ujun Jeong et.al. | 2407.17451 | null |
2024-07-24 | ProvenanceWidgets: A Library of UI Control Elements to Track and Dynamically Overlay Analytic Provenance | Arpit Narechania et.al. | 2407.17431 | link |
2024-07-24 | CDDIP: Constrained Diffusion-Driven Deep Image Prior for Seismic Image Reconstruction | Paul Goyes-Peñafiel et.al. | 2407.17402 | link |
2024-07-24 | Cosmic ray susceptibility of the Terahertz Intensity Mapper detector arrays | Lun-Jun Liu et.al. | 2407.17381 | null |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349 | link |
2024-07-24 | Quantum nonlocal modulation cancellation with distributed clocks | Stephen D. Chapman et.al. | 2407.17330 | null |
2024-07-25 | Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population | Nikolaos Ntampakis et.al. | 2407.17324 | null |
2024-07-24 | Edge-Cloud Continuum Orchestration of Critical Services: A Smart-City Approach | Rodrigo Rosmaninho et.al. | 2407.17314 | null |
2024-07-23 | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Fabio Tosi et.al. | 2407.16698 | link |
2024-07-23 | From Imitation to Refinement – Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Canyu Zhao et.al. | 2407.16655 | null |
2024-07-23 | Unveiling and Mitigating Bias in Audio Visual Segmentation | Peiwen Sun et.al. | 2407.16638 | null |
2024-07-23 | Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses | Haojun Yu et.al. | 2407.16634 | null |
2024-07-23 | GenRec: A Flexible Data Generator for Recommendations | Erica Coppolillo et.al. | 2407.16594 | null |
2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560 | link |
2024-07-23 | DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models | Zhenyu Xie et.al. | 2407.16511 | null |
2024-07-23 | qMRI Diffusor: Quantitative T1 Mapping of the Brain using a Denoising Diffusion Probabilistic Model | Shishuai Wang et.al. | 2407.16477 | null |
2024-07-22 | Artist: Aesthetically Controllable Text-Driven Stylization without Training | Ruixiang Jiang et.al. | 2407.15842 | link |
2024-07-23 | A Large-scale Benchmark Dataset for Commuting Origin-destination Matrix Generation | Can Rong et.al. | 2407.15823 | link |
2024-07-22 | Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Vikash Sehwag et.al. | 2407.15811 | null |
2024-07-22 | Quantum Computing for Phonon Scattering Effects on Thermal Conductivity | Xiangjun Tan et.al. | 2407.15808 | null |
2024-07-22 | Enhancing Mass Customization Manufacturing: Multiobjective Metaheuristic Algorithms for flow shop Production in Smart Industry | Diego Rossit et.al. | 2407.15802 | null |
2024-07-22 | Diffusion Model Based Resource Allocation Strategy in Ultra-Reliable Wireless Networked Control Systems | Amirhassan Babazadeh Darabi et.al. | 2407.15784 | null |
2024-07-22 | A Hamilton-Jacobi approach to road-field reaction-diffusion models | Christopher Henderson et.al. | 2407.15760 | null |
2024-07-22 | Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond | Silvio Galesso et.al. | 2407.15739 | link |
2024-07-22 | DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design | Zhi Hao Luo et.al. | 2407.15723 | link |
2024-07-22 | Estimating Probability Densities with Transformer and Denoising Diffusion | Henry W. Leung et.al. | 2407.15703 | link |
2024-07-19 | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks | Sarah Jabbour et.al. | 2407.14509 | null |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505 | link |
2024-07-19 | M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models | Seunggeun Chi et.al. | 2407.14502 | null |
2024-07-19 | A Precision Cryogenic Positioning Stage for Detector Dithering and Flexure Compensation | Stephen A. Smee et.al. | 2407.14493 | null |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | Describe Data to get Science-Data-Ready Tooling: Awkward as a Target for Kaitai Struct YAML | Manasvi Goyal et.al. | 2407.14461 | null |
2024-07-19 | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model | Seonghui Min et.al. | 2407.14434 | null |
2024-07-19 | Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models | Hyun-Jic Oh et.al. | 2407.14426 | null |
2024-07-19 | GLAudio Listens to the Sound of the Graph | Aurelio Sulser et.al. | 2407.14387 | link |
2024-07-18 | LogoSticker: Inserting Logos into Diffusion Models for Customized Generation | Mingkang Zhu et.al. | 2407.13752 | null |
2024-07-18 | Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Masatoshi Uehara et.al. | 2407.13734 | link |
2024-07-18 | Shaded Route Planning Using Active Segmentation and Identification of Satellite Images | Longchao Da et.al. | 2407.13689 | null |
2024-07-18 | PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers | Songlin Li et.al. | 2407.13677 | null |
2024-07-18 | MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis | Ziming Zhong et.al. | 2407.13675 | link |
2024-07-18 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | Xiaoyu Zhu et.al. | 2407.13642 | null |
2024-07-18 | Training-free Composite Scene Generation for Layout-to-Image Synthesis | Jiaqi Liu et.al. | 2407.13609 | link |
2024-07-18 | EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models | Nan Lin et.al. | 2407.13538 | null |
2024-07-18 | VeriQR: A Robustness Verification Tool for Quantum Machine Learning Models | Yanling Lin et.al. | 2407.13533 | null |
2024-07-18 | All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models | Charumathi Badrinath et.al. | 2407.13449 | link |
2024-07-17 | SMooDi: Stylized Motion Diffusion Model | Lei Zhong et.al. | 2407.12783 | null |
2024-07-17 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Sherwin Bahmani et.al. | 2407.12781 | null |
2024-07-17 | Hallucination Index: An Image Quality Metric for Generative Reconstruction Models | Matthew Tivnan et.al. | 2407.12780 | null |
2024-07-17 | GroundUp: Rapid Sketch-Based 3D City Massing | Gizem Esra Unlu et.al. | 2407.12739 | null |
2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection | Amit Prasad et.al. | 2407.12724 | null |
2024-07-17 | Unlocking planetesimal magnetic field histories: a refined, versatile model for thermal evolution and dynamo generation | Hannah R. Sanderson et.al. | 2407.12721 | null |
2024-07-17 | SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow | Yuanzhi Zhu et.al. | 2407.12718 | link |
2024-07-17 | Teleoperation in Robot-assisted MIS with Adaptive RCM via Admittance Control | Ehsan Nasiri et.al. | 2407.12711 | null |
2024-07-16 | Efficient Training with Denoised Neural Weights | Yifan Gong et.al. | 2407.11966 | null |
2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | null |
2024-07-16 | Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design | Leo Klarner et.al. | 2407.11942 | link |
2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
2024-07-16 | Global Optimisation of Black-Box Functions with Generative Models in the Wasserstein Space | Tigran Ramazyan et.al. | 2407.11917 | link |
2024-07-16 | Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data | Tim Elsner et.al. | 2407.11913 | null |
2024-07-16 | Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development | Daoyuan Chen et.al. | 2407.11784 | link |
2024-07-16 | Diffusion-driven self-assembly of emerin nanodomains at the nuclear envelope | Carlos D. Alas et.al. | 2407.11758 | null |
2024-07-16 | Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen | Alessandro Palma et.al. | 2407.11734 | link |
2024-07-16 | Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation | Luwei Sun et.al. | 2407.11678 | null |
2024-07-15 | Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion | Yongyuan Liang et.al. | 2407.10973 | null |
2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
2024-07-15 | InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Nirat Saini et.al. | 2407.10958 | null |
2024-07-16 | DataDream: Few-shot Guided Dataset Generation | Jae Myung Kim et.al. | 2407.10910 | link |
2024-07-15 | Optical Diffusion Models for Image Generation | Ilker Oguz et.al. | 2407.10897 | null |
2024-07-15 | R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection | Zheyuan Zhou et.al. | 2407.10862 | null |
2024-07-15 | Physics-Inspired Generative Models in Medical Imaging: A Review | Dennis Hein et.al. | 2407.10856 | null |
2024-07-15 | Inferring dark energy properties from the scale factor parametrisation | Upala Mukhopadhayay et.al. | 2407.10845 | null |
2024-07-15 | MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration | Yulin Ren et.al. | 2407.10833 | null |
2024-07-15 | Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation | Tu Vu et.al. | 2407.10817 | null |
2024-07-12 | StyleSplat: 3D Object Style Transfer with Gaussian Splatting | Sahil Jain et.al. | 2407.09473 | null |
2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
2024-07-12 | The $μ\mathcal{G}$ Language for Programming Graph Neural Networks | Matteo Belenchia et.al. | 2407.09441 | null |
2024-07-12 | Graph Neural Network Causal Explanation via Neural Causal Models | Arman Behnam et.al. | 2407.09378 | link |
2024-07-12 | Computationally Efficient Estimation of Large Probit Models | Patrick Ding et.al. | 2407.09371 | null |
2024-07-12 | Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text | Lucio La Cava et.al. | 2407.09364 | null |
2024-07-15 | Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees | Alexia Jolicoeur-Martineau et.al. | 2407.09357 | link |
2024-07-12 | PID: Physics-Informed Diffusion Model for Infrared Image Generation | Fangyuan Mao et.al. | 2407.09299 | link |
2024-07-12 | Learning Distances from Data with Normalizing Flows and Score Matching | Peter Sorrenson et.al. | 2407.09297 | null |
2024-07-12 | Surgical Text-to-Image Generation | Chinedu Innocent Nwoye et.al. | 2407.09230 | null |
2024-07-11 | Video Diffusion Alignment via Reward Gradients | Mihir Prabhudesai et.al. | 2407.08737 | link |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
2024-07-11 | FAR-Trans: An Investment Dataset for Financial Asset Recommendation | Javier Sanz-Cruzado et.al. | 2407.08692 | null |
2024-07-11 | Scattering transforms on the sphere, application to large scale structure modelling | Louise Mousset et.al. | 2407.08687 | null |
2024-07-11 | CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs | Leah Chong et.al. | 2407.08675 | null |
2024-07-11 | Still-Moving: Customized Video Generation without Customized Video Data | Hila Chefer et.al. | 2407.08674 | null |
2024-07-11 | Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density | Shuangqi Li et.al. | 2407.08659 | null |
2024-07-11 | Adaptive Smooth Non-Stationary Bandits | Joe Suk et.al. | 2407.08654 | null |
2024-07-11 | Fine-Tuning Stable Diffusion XL for Stylistic Icon Generation: A Comparison of Caption Size | Youssef Sultan et.al. | 2407.08513 | null |
2024-07-11 | Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model | Yuxing Tian et.al. | 2407.08500 | null |
2024-07-10 | Generative Image as Action Models | Mohit Shridhar et.al. | 2407.07875 | link |
2024-07-10 | Dynamical Measure Transport and Neural PDE Solvers for Sampling | Jingtong Sun et.al. | 2407.07873 | null |
2024-07-10 | Controlling Space and Time with Diffusion Models | Daniel Watson et.al. | 2407.07860 | null |
2024-07-10 | Generic Numerical Analysis of Stochastic Reaction Diffusion Model with applications in excitable media | Yahya Alnashri et.al. | 2407.07834 | null |
2024-07-10 | Universal and non-universal signatures in the scaling functions of critical variables | Gianluca Teza et.al. | 2407.07782 | null |
2024-07-10 | Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control | Elahe Delavari et.al. | 2407.07684 | null |
2024-07-10 | VEnhancer: Generative Space-Time Enhancement for Video Generation | Jingwen He et.al. | 2407.07667 | null |
2024-07-10 | A Coding-Theoretic Analysis of Hyperspherical Prototypical Learning Geometry | Martin Lindström et.al. | 2407.07664 | link |
2024-07-10 | The heterogeneous impact of the EU-Canada agreement with causal machine learning | Lionel Fontagné et.al. | 2407.07652 | null |
2024-07-11 | MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Wanggui He et.al. | 2407.07614 | link |
2024-07-09 | ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Shaozhe Hao et.al. | 2407.07077 | link |
2024-07-09 | Latent Space Imaging | Matheus Souza et.al. | 2407.07052 | null |
2024-07-09 | Generative models of astrophysical fields with scattering transforms on the sphere | Louise Mousset et.al. | 2407.07007 | link |
2024-07-10 | PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Yiying Wang et.al. | 2407.06985 | link |
2024-07-09 | Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach | Taolin Zhang et.al. | 2407.06964 | null |
2024-07-09 | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models | Bowen Zhang et.al. | 2407.06938 | null |
2024-07-09 | HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Guian Fang et.al. | 2407.06937 | link |
2024-07-09 | Fine-grained large-scale content recommendations for MSX sellers | Manpreet Singh et.al. | 2407.06910 | null |
2024-07-09 | Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load | Vijay Babu Pamshetti et.al. | 2407.06857 | null |
2024-07-09 | A reaction-diffusion model for relapsing-remitting multiple sclerosis with a treatment term | Romina Travaglini et.al. | 2407.06802 | null |
2024-07-08 | Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images | Zhangyang Qi et.al. | 2407.06191 | null |
2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
2024-07-08 | JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation | Yu Zeng et.al. | 2407.06187 | null |
2024-07-08 | The Tug-of-War Between Deepfake Generation and Detection | Hannah Lee et.al. | 2407.06174 | null |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-08 | Structured Generations: Using Hierarchical Clusters to guide Diffusion Models | Jorge da Silva Goncalves et.al. | 2407.06124 | link |
2024-07-08 | PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models | Jinhua Zhang et.al. | 2407.06109 | link |
2024-07-08 | Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation | Xinyu Bai et.al. | 2407.06095 | null |
2024-07-08 | Assessing Cardiomegaly in Dogs Using a Simple CNN Model | Nikhil Deekonda et.al. | 2407.06092 | null |
2024-07-08 | Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis | Emaad Khwaja et.al. | 2407.06079 | null |
2024-07-05 | RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | Yuxuan Kuang et.al. | 2407.04689 | link |
2024-07-05 | Thermal and mechanical study of a parametrised cryostat model for optical characterisation of upcoming CMB experiments | Thomas J. L. J. Gascard et.al. | 2407.04613 | link |
2024-07-08 | PartCraft: Crafting Creative Objects by Parts | Kam Woh Ng et.al. | 2407.04604 | link |
2024-07-05 | Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates | Ryotaro Okabe et.al. | 2407.04557 | null |
2024-07-05 | Unified continuous-time q-learning for mean-field game and mean-field control problems | Xiaoli Wei et.al. | 2407.04521 | null |
2024-07-08 | Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport | Kotaro Ikeda et.al. | 2407.04495 | null |
2024-07-05 | PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation | Yinghua Yao et.al. | 2407.04493 | null |
2024-07-05 | Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model | Duy M. H. Nguyen et.al. | 2407.04489 | null |
2024-07-05 | Leveraging Graph Structures to Detect Hallucinations in Large Language Models | Noa Nonkes et.al. | 2407.04485 | link |
2024-07-05 | VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing | Shang Liu et.al. | 2407.04461 | null |
2024-07-03 | DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents | Yilun Xu et.al. | 2407.03300 | link |
2024-07-03 | Improved Noise Schedule for Diffusion Training | Tiankai Hang et.al. | 2407.03297 | null |
2024-07-03 | Anomaly-based Framework for Detecting Power Overloading Cyberattacks in Smart Grid AMI | Abdelaziz Amara Korba et.al. | 2407.03264 | null |
2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
2024-07-04 | Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis | Tong Zhou et.al. | 2407.03089 | null |
2024-07-03 | Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios | Patricia A. Apellániz et.al. | 2407.03080 | null |
2024-07-03 | Electromagnetic Property Sensing Based on Diffusion Model in ISAC System | Yuhua Jiang et.al. | 2407.03075 | null |
2024-07-03 | Semantic-Aware Power Allocation for Generative Semantic Communications with Foundation Models | Chunmei Xu et.al. | 2407.03050 | null |
2024-07-03 | SlerpFace: Face Template Protection via Spherical Linear Interpolation | Zhizhou Zhong et.al. | 2407.03043 | null |
2024-07-03 | An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis | Marawan Elbatel et.al. | 2407.03018 | link |
2024-07-02 | Magic Insert: Style-Aware Drag-and-Drop | Nataniel Ruiz et.al. | 2407.02489 | null |
2024-07-02 | Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models | Fei Shen et.al. | 2407.02482 | link |
2024-07-02 | A Pattern Language for Machine Learning Tasks | Benjamin Rodatz et.al. | 2407.02424 | null |
2024-07-02 | GCF: Graph Convolutional Networks for Facial Expression Recognition | Hozaifa Kassab et.al. | 2407.02361 | null |
2024-07-02 | MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space | Yihong Tang et.al. | 2407.02345 | null |
2024-07-02 | Choice-based time slot management in attended home delivery | Dorsa Abdolhamidi et.al. | 2407.02339 | null |
2024-07-02 | Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Log | Adrian Rebmann et.al. | 2407.02336 | link |
2024-07-02 | A tactical time slot management problem under mixed logit demand | Dorsa Abdolhamidi et.al. | 2407.02308 | null |
2024-07-02 | Renard: A Modular Pipeline for Extracting Character Networks from Narrative Texts | Arthur Amalvy et.al. | 2407.02284 | null |
2024-07-03 | Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis | Sufen Ren et.al. | 2407.02261 | null |
2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
2024-06-28 | The hybrid Josephson rhombus: A superconducting element with tailored current-phase relation | L. Banszerus et.al. | 2406.20082 | null |
2024-06-28 | HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model | Hieu T. Nguyen et.al. | 2406.20077 | null |
2024-06-28 | Modeling and LQR Control of Insect Sized Flapping Wing Robot | Daksh Dhingra et.al. | 2406.20061 | null |
2024-06-28 | Neural Differentiable Modeling with Diffusion-Based Super-resolution for Two-Dimensional Spatiotemporal Turbulence | Xiantao Fan et.al. | 2406.20047 | null |
2024-06-28 | Electrostatics-based particle sampling and approximate inference | Yongchao Huang et.al. | 2406.20044 | link |
2024-06-28 | HAITCH: A Framework for Distortion and Motion Correction in Fetal Multi-Shell Diffusion-Weighted MRI | Haykel Snoussi et.al. | 2406.20042 | null |
2024-06-28 | Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs | Sangwon Jeong et.al. | 2406.19987 | null |
2024-07-01 | Text2Robot: Evolutionary Robot Design from Text Descriptions | Ryan P. Ringel et.al. | 2406.19963 | null |
2024-06-28 | Kolmogorov-Smirnov GAN | Maciej Falkiewicz et.al. | 2406.19948 | link |
2024-06-27 | Looking 3D: Anomaly Detection with 2D-3D Alignment | Ankan Bhunia et.al. | 2406.19393 | link |
2024-06-27 | Taming Data and Transformers for Audio Generation | Moayed Haji-Ali et.al. | 2406.19388 | null |
2024-06-27 | Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space | Core Francisco Park et.al. | 2406.19370 | null |
2024-06-27 | Accelerating Multiphase Flow Simulations with Denoising Diffusion Model Driven Initializations | Jaehong Chung et.al. | 2406.19333 | null |
2024-06-27 | Subtractive Training for Music Stem Insertion using Latent Diffusion Models | Ivan Villa-Renteria et.al. | 2406.19328 | null |
2024-06-27 | Efficient World Models with Context-Aware Tokenization | Vincent Micheli et.al. | 2406.19320 | link |
2024-06-27 | PNeRV: A Polynomial Neural Representation for Videos | Sonam Gupta et.al. | 2406.19299 | null |
2024-06-27 | Compositional Image Decomposition with Diffusion Models | Jocelin Su et.al. | 2406.19298 | null |
2024-06-27 | BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring | Luca Benfenati et.al. | 2406.19189 | null |
2024-06-27 | On Pólya-Young urn models and growth processes | Markus Kuba et.al. | 2406.19110 | null |
2024-06-26 | MatchTime: Towards Automatic Soccer Game Commentary Generation | Jiayuan Rao et.al. | 2406.18530 | link |
2024-06-26 | MultiDiff: Consistent Novel View Synthesis from a Single Image | Norman Müller et.al. | 2406.18524 | null |
2024-06-26 | Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration | Kang Liao et.al. | 2406.18516 | link |
2024-06-26 | DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance | Younghyun Kim et.al. | 2406.18459 | link |
2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | link |
2024-06-26 | Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling | Abril Corona-Figueroa et.al. | 2406.18422 | link |
2024-06-26 | Towards diffusion models for large-scale sea-ice modelling | Tobias Sebastian Finn et.al. | 2406.18417 | null |
2024-06-27 | Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process | Tianyu Lin et.al. | 2406.18361 | link |
2024-06-26 | Molecular Diffusion Models with Virtual Receptors | Matan Halfon et.al. | 2406.18330 | null |
2024-06-27 | Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva et.al. | 2406.18245 | link |
2024-06-25 | DiffusionPDE: Generative PDE-Solving Under Partial Observation | Jiahe Huang et.al. | 2406.17763 | link |
2024-06-25 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jianzong Wu et.al. | 2406.17758 | null |
2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
2024-06-25 | Extensions of Panjer’s recursion for mixed compound distributions | Spyridon M. Tzaninis et.al. | 2406.17726 | null |
2024-06-25 | PANDA: A self-driving lab for studying electrodeposited polymer films | Harley Quinn et.al. | 2406.17725 | null |
2024-06-25 | Unified Auto-Encoding with Masked Diffusion | Philippe Hansen-Estruch et.al. | 2406.17688 | link |
2024-06-25 | LaTable: Towards Large Tabular Models | Boris van Breugel et.al. | 2406.17673 | null |
2024-06-26 | SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond | Marco Comunità et.al. | 2406.17672 | null |
2024-06-25 | Banishing LLM Hallucinations Requires Rethinking Generalization | Johnny Li et.al. | 2406.17642 | null |
2024-06-25 | The experience of humans’ and robots’ mutual (im)politeness in enacted service scenarios: An empirical study | Victor Kaptelinin et.al. | 2406.17641 | null |
2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Haonan Qiu et.al. | 2406.16863 | link |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862 | null |
2024-06-24 | DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Yuang Peng et.al. | 2406.16855 | link |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | Mounika Marreddy et.al. | 2406.16833 | null |
2024-06-24 | General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design | Yue Jian et.al. | 2406.16821 | null |
2024-06-24 | ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians | Yufei Liu et.al. | 2406.16815 | null |
2024-06-24 | Conformal time series decomposition with component-wise exchangeability | Derck W. E. Prinzhorn et.al. | 2406.16766 | link |
2024-06-24 | Inferring stochastic low-rank recurrent neural networks from neural data | Matthijs Pals et.al. | 2406.16749 | link |
2024-06-24 | Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image | Jinkun Hao et.al. | 2406.16710 | null |
2024-06-24 | Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling | Min-Seop Kwak et.al. | 2406.16695 | null |
2024-06-21 | Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild | Nadav Orzech et.al. | 2406.15331 | null |
2024-06-21 | Rethinking Remote Sensing Change Detection With A Mask View | Xiaowen Ma et.al. | 2406.15320 | link |
2024-06-21 | You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation | Hongyu Chen et.al. | 2406.15269 | null |
2024-06-21 | Evaluating Diversity in Automatic Poetry Generation | Yanran Chen et.al. | 2406.15267 | null |
2024-06-21 | Fingerprint Membership and Identity Inference Against Generative Adversarial Networks | Saverio Cavasin et.al. | 2406.15253 | null |
2024-06-21 | MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | null |
2024-06-21 | Unsupervised Bayesian Generation of Synthetic CT from CBCT Using Patient-Specific Score-Based Prior | Junbo Peng et.al. | 2406.15219 | null |
2024-06-21 | Sound and Fury, Signifying Nothing? Impact of Data Breach Disclosure Laws | Muhammad Zia Hydari et.al. | 2406.15215 | null |
2024-06-21 | Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors | Ali Naseh et.al. | 2406.15213 | null |
2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
2024-06-20 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Xincheng Shuai et.al. | 2406.14555 | link |
2024-06-21 | Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation | Eyal Michaeli et.al. | 2406.14551 | link |
2024-06-20 | Consistency Models Made Easy | Zhengyang Geng et.al. | 2406.14548 | link |
2024-06-20 | IRASim: Learning Interactive Real-Robot Action Simulators | Fangqi Zhu et.al. | 2406.14540 | null |
2024-06-20 | Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps | Nikita Starodubcev et.al. | 2406.14539 | null |
2024-06-20 | Fantastic Copyrighted Beasts and How (Not) to Generate Them | Luxi He et.al. | 2406.14526 | null |
2024-06-20 | Photoacoustic methane detection assisted by a gas-filled anti-resonant hollow-core fiber laser | Cuiling Zhang et.al. | 2406.14521 | null |
2024-06-20 | V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Rotem Shalev-Arkushin et.al. | 2406.14510 | null |
2024-06-20 | CodeRAG-Bench: Can Retrieval Augment Code Generation? | Zora Zhiruo Wang et.al. | 2406.14497 | link |
2024-06-20 | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Josef Dai et.al. | 2406.14477 | link |
2024-06-20 | CollaFuse: Collaborative Diffusion Models | Simeon Allmendinger et.al. | 2406.14429 | link |
2024-06-20 | Active Diffusion Subsampling | Oisin Nolan et.al. | 2406.14388 | link |
2024-06-20 | Multicoloured Hardcore Model: Fast Mixing and Queueing | Sam Olesker-Taylor et.al. | 2406.14376 | null |
2024-06-20 | FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability | Md Fahim Sikder et.al. | 2406.14281 | link |
2024-06-20 | In Tree Structure Should Sentence Be Generated | Yaguang Li et.al. | 2406.14189 | link |
2024-06-20 | CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation | Tingwei Liu et.al. | 2406.14186 | link |
2024-06-20 | Tractable Equilibrium Computation in Markov Games through Risk Aversion | Eric Mazumdar et.al. | 2406.14156 | null |
2024-06-20 | ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning | Zhongjie Duan et.al. | 2406.14130 | link |
2024-06-20 | Dye4AI: Assuring Data Boundary on Generative AI Services | Shu Wang et.al. | 2406.14114 | null |
2024-06-20 | HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models | Xinrui Zhou et.al. | 2406.14098 | null |
2024-06-20 | Bridging bulk and surface: An interacting particle system towards the field-road diffusion model | Matthieu Alfaro et.al. | 2406.14093 | null |
2024-06-20 | A Practical Diffusion Path for Sampling | Omar Chehab et.al. | 2406.14040 | null |
2024-06-20 | Leveraging eBPF and AI for Ransomware Nose Out | Arjun Sekar et.al. | 2406.14020 | null |
2024-06-20 | Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition | Yimin Zhao et.al. | 2406.14014 | link |
2024-06-20 | Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs | Mahammed Kamruzzaman et.al. | 2406.13993 | null |
2024-06-20 | The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging | Georgi Ganev et.al. | 2406.13985 | link |
2024-06-20 | Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning | Tingyi Lin et.al. | 2406.13977 | null |
2024-06-20 | Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models | Yuan Zhong et.al. | 2406.13942 | null |
2024-06-20 | EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations | Jie Ren et.al. | 2406.13933 | null |
2024-06-20 | Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | Hamdireza Rouzegar et.al. | 2406.13903 | null |
2024-06-19 | INFusion: Diffusion Regularized Implicit Neural Representations for 2D and 3D accelerated MRI reconstruction | Yamin Arefeen et.al. | 2406.13895 | null |
2024-06-19 | Open Generative Large Language Models for Galician | Pablo Gamallo et.al. | 2406.13893 | null |
2024-06-19 | StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | Davit Abrahamyan et.al. | 2406.13840 | link |
2024-06-19 | RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design | Rishabh Anand et.al. | 2406.13839 | link |
2024-06-19 | COAC: Cross-layer Optimization of Accelerator Configurability for Efficient CNN Processing | Steven Colleman et.al. | 2406.13752 | null |
2024-06-19 | GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Baiqi Li et.al. | 2406.13743 | link |
2024-06-19 | Tree-Sliced Wasserstein Distance on a System of Lines | Viet-Hoang Tran et.al. | 2406.13725 | null |
2024-06-19 | Hitchhiker’s guide on Energy-Based Models: a comprehensive review on the relation with other generative models, sampling and statistical physics | Davide Carbone et.al. | 2406.13661 | null |
2024-06-19 | Towards Minimal Targeted Updates of Language Models with Targeted Negative Training | Lily H. Zhang et.al. | 2406.13660 | link |
2024-06-19 | Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics | Weitong Zhang et.al. | 2406.13652 | null |
2024-06-19 | On AI-Inspired UI-Design | Jialiang Wei et.al. | 2406.13631 | null |
2024-06-19 | Can AI be enabled to dynamical downscaling? Training a Latent Diffusion Model to mimic km-scale COSMO-CLM downscaling of ERA5 over Italy | Elena Tomasi et.al. | 2406.13627 | link |
2024-06-19 | Enhance the Image: Super Resolution using Artificial Intelligence in MRI | Ziyu Li et.al. | 2406.13625 | null |
2024-06-19 | Generative Modeling by Minimizing the Wasserstein-2 Loss | Yu-Jui Huang et.al. | 2406.13619 | null |
2024-06-19 | Parameter Training Efficiency Aware Resource Allocation for AIGC in Space-Air-Ground Integrated Networks | Liangxin Qian et.al. | 2406.13602 | null |
2024-06-19 | ModSec-Learn: Boosting ModSecurity with Machine Learning | Christian Scano et.al. | 2406.13547 | link |
2024-06-19 | Towards Cyber Threat Intelligence for the IoT | Alfonso Iacovazzi et.al. | 2406.13543 | null |
2024-06-19 | Image Distillation for Safe Data Sharing in Histopathology | Zhe Li et.al. | 2406.13536 | link |
2024-06-19 | Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement | Chenda Li et.al. | 2406.13471 | null |
2024-06-19 | Unifying nonlinearly constrained nonconvex optimization | Charlie Vanaret et.al. | 2406.13454 | link |
2024-06-19 | Federating to Grow Transformers with Constrained Resources without Model Sharing | Shikun Shen et.al. | 2406.13450 | null |
2024-06-19 | Multi-messenger modeling of the Monogem pulsar halo | Youyou Li et.al. | 2406.13426 | null |
2024-06-19 | Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images | Haruo Fujiwara et.al. | 2406.13393 | null |
2024-06-19 | Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs | Hewen Wang et.al. | 2406.13369 | null |
2024-06-19 | Situational Instructions Database: Task Guidance in Dynamic Environments | Muhammad Saif Ullah Khan et.al. | 2406.13302 | link |
2024-06-19 | ARDuP: Active Region Video Diffusion for Universal Policies | Shuaiyi Huang et.al. | 2406.13301 | null |
2024-06-19 | AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models | Ken Chen et.al. | 2406.13272 | null |
2024-06-19 | Self-Supervised Diffusion Model for 3-D Seismic Data Reconstruction | Xinyang Wang et.al. | 2406.13252 | null |
2024-06-19 | Optimizing Inventory Management through Multiobjective Reverse Logistics with Environmental Impact | I. B. Wadhawan et.al. | 2406.13226 | null |
2024-06-19 | Neural Residual Diffusion Models for Deep Scalable Vision Generation | Zhiyuan Ma et.al. | 2406.13215 | null |
2024-06-19 | Surgical Triplet Recognition via Diffusion Model | Daochang Liu et.al. | 2406.13210 | null |
2024-06-19 | Diffusion Model-based FOD Restoration from High Distortion in dMRI | Shuo Huang et.al. | 2406.13209 | null |
2024-06-19 | Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach | Yicong Li et.al. | 2406.13201 | link |
2024-06-19 | Synthetic Context Generation for Question Generation | Naiming Liu et.al. | 2406.13188 | null |
2024-06-19 | Conditional score-based diffusion models for solving inverse problems in mechanics | Agnimitra Dasgupta et.al. | 2406.13154 | null |
2024-06-19 | von Mises Quasi-Processes for Bayesian Circular Regression | Yarden Cohen et.al. | 2406.13151 | null |
2024-06-19 | MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction | Jiaqi Cui et.al. | 2406.13150 | null |
2024-06-19 | GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement | Hao Wang et.al. | 2406.13136 | null |
2024-06-19 | Thruster-Assisted Incline Walking | Kaushik Venkatesh Krishnamurthy et.al. | 2406.13118 | null |
2024-06-18 | Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models | Paul Henderson et.al. | 2406.13099 | null |
2024-06-18 | RITA: A Real-time Interactive Talking Avatars Framework | Wuxinlin Cheng et.al. | 2406.13093 | null |
2024-06-18 | PIPPIN: Generating variable length full events from partons | Guillaume Quétant et.al. | 2406.13074 | link |
2024-06-18 | MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification | Harrison Gietz et.al. | 2406.13066 | link |
2024-06-18 | Traffic Prediction considering Multiple Levels of Spatial-temporal Information: A Multi-scale Graph Wavelet-based Approach | Zilin Bian et.al. | 2406.13038 | null |
2024-06-18 | Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities | Matthew T. C. Li et.al. | 2406.13036 | null |
2024-06-18 | Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models | Joshua Ward et.al. | 2406.13012 | null |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | Evaluating the design space of diffusion-based generative models | Yuqing Wang et.al. | 2406.12839 | null |
2024-06-18 | Neural Approximate Mirror Maps for Constrained Diffusion Models | Berthy T. Feng et.al. | 2406.12816 | null |
2024-06-19 | AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation | Xinyu Hou et.al. | 2406.12805 | link |
2024-06-18 | Extracting Training Data from Unconditional Diffusion Models | Yunhao Chen et.al. | 2406.12752 | null |
2024-06-18 | Useful stochastic bounds in time-varying queues with service and patience times having general joint distribution | Shreehari Anand Bodas et.al. | 2406.12745 | null |
2024-06-18 | SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation | Polina Karpikova et.al. | 2406.12700 | null |
2024-06-18 | Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation | Miseul Kim et.al. | 2406.12688 | null |
2024-06-18 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Yongtao Ge et.al. | 2406.12671 | link |
2024-06-18 | Research and Implementation of Data Enhancement Techniques for Graph Neural Networks | Jingzhao Gu et.al. | 2406.12640 | null |
2024-06-18 | News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | Andreea Iana et.al. | 2406.12634 | link |
2024-06-18 | Learning Diffusion at Lightspeed | Antonio Terpin et.al. | 2406.12616 | null |
2024-06-18 | Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images | Shivank Garg et.al. | 2406.12592 | link |
2024-06-18 | Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation | Chengkai Liu et.al. | 2406.12580 | link |
2024-06-18 | Training Diffusion Models with Federated Learning | Matthijs de Goede et.al. | 2406.12575 | null |
2024-06-18 | P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | Yuhao Dan et.al. | 2406.12548 | null |
2024-06-18 | Structured Detection for Simultaneous Super-Resolution and Optical Sectioning in Laser Scanning Microscopy | Alessandro Zunino et.al. | 2406.12542 | link |
2024-06-18 | Variational Distillation of Diffusion Policies into Mixture of Experts | Hongyi Zhou et.al. | 2406.12538 | null |
2024-06-18 | HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors | Panwang Pan et.al. | 2406.12459 | link |
2024-06-18 | Planning Using Schrödinger Bridge Diffusion Models | Adarsh Srivastava et.al. | 2406.12458 | link |
2024-06-18 | Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models | David Bergström et.al. | 2406.12423 | null |
2024-06-18 | ROVER: RTL Optimization via Verified E-Graph Rewriting | Samuel Coward et.al. | 2406.12421 | null |
2024-06-18 | TADM: Temporally-Aware Diffusion Model for Neurodegenerative Progression on Brain MRI | Mattia Litrico et.al. | 2406.12411 | null |
2024-06-18 | SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions | Yuexiong Ding et.al. | 2406.12395 | null |
Vision-Language Models
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191 | link |
2024-09-18 | All-in-one foundational models learning across quantum chemical levels | Yuxinxin Chen et.al. | 2409.12015 | link |
2024-09-18 | LMMCoDrive: Cooperative Driving with Large Multimodal Model | Haichao Liu et.al. | 2409.11981 | null |
2024-09-16 | MusicLIME: Explainable Multimodal Music Understanding | Theodoros Sotirou et.al. | 2409.10496 | link |
2024-09-19 | IRIS: Interactive Responsive Intelligent Segmentation for 3D Affordance Analysis | Meng Chu et.al. | 2409.10078 | null |
2024-09-16 | AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing | Huawei Ji et.al. | 2409.10016 | link |
2024-09-14 | Keypoints-Integrated Instruction-Following Data Generation for Enhanced Human Pose Understanding in Multimodal Models | Dewen Zhang et.al. | 2409.09306 | null |
2024-09-13 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Minh-Duc Vu et.al. | 2409.08885 | null |
2024-09-13 | A Multimodal Approach for Fluid Overload Prediction: Integrating Lung Ultrasound and Clinical Data | Tianqi Yang et.al. | 2409.08790 | null |
2024-09-13 | Dynamics of Collective Group Affect: Group-level Annotations and the Multimodal Modeling of Convergence and Divergence | Navin Raj Prabhu et.al. | 2409.08578 | null |
2024-09-13 | A Comprehensive Survey on Deep Multimodal Learning with Missing Modality | Renjie Wu et.al. | 2409.07825 | null |
2024-09-12 | Top-down Activity Representation Learning for Video Question Answering | Yanan Wang et.al. | 2409.07748 | null |
2024-09-11 | What to align in multimodal contrastive learning? | Benoit Dufumier et.al. | 2409.07402 | null |
2024-09-11 | MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis | Hanyu Jiang et.al. | 2409.07129 | null |
2024-09-11 | FSMDet: Vision-guided feature diffusion for fully sparse 3D detector | Tianran Liu et.al. | 2409.06945 | null |
2024-09-16 | Scaling Law Hypothesis for Multimodal Model | Qingyun Sun et.al. | 2409.06754 | null |
2024-09-10 | Multiclass Arrhythmia Classification using Smartwatch Photoplethysmography Signals Collected in Real-life Settings | Dong Han et.al. | 2409.06147 | null |
2024-09-11 | A Survey of Multimodal Composite Editing and Retrieval | Suyan Li et.al. | 2409.05405 | link |
2024-09-05 | Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis | Xianbing Zhao et.al. | 2409.04473 | null |
2024-09-06 | Generating Faithful and Salient Text from Multimodal Data | Tahsina Hashem et.al. | 2409.03961 | link |
2024-09-06 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | null |
2024-09-10 | MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Xiang Yue et.al. | 2409.02813 | null |
2024-09-04 | Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models | Chih-Yuan Li et.al. | 2409.02530 | null |
2024-09-03 | Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models | Bin Fu et.al. | 2409.01560 | null |
2024-09-03 | Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition | Yaozong Gan et.al. | 2409.01534 | null |
2024-09-02 | Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models | Jiao Chen et.al. | 2409.01207 | null |
2024-09-02 | Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information | Yi Chen et.al. | 2409.01179 | null |
2024-08-31 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Aref Farhadipour et.al. | 2409.00562 | null |
2024-08-30 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Baichuan Zhou et.al. | 2408.17267 | null |
2024-08-29 | Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning | Boyu Chen et.al. | 2408.16577 | null |
2024-08-29 | Toward Robust Early Detection of Alzheimer’s Disease via an Integrated Multimodal Learning Approach | Yifei Chen et.al. | 2408.16343 | link |
2024-08-28 | Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis | Sijie Mai et.al. | 2408.16029 | null |
2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et.al. | 2408.15803 | null |
2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et.al. | 2408.15065 | null |
2024-08-27 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Shuangchen Zhao et.al. | 2408.14950 | null |
2024-08-26 | MMR: Evaluating Reading Ability of Large Multimodal Models | Jian Chen et.al. | 2408.14594 | null |
2024-09-03 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models | Qihang Ge et.al. | 2408.14008 | null |
2024-08-27 | Quantum Multimodal Contrastive Learning Framework | Chi-Sheng Chen et.al. | 2408.13919 | null |
2024-08-25 | Tangram: A Challenging Benchmark for Geometric Element Recognizing | Jiamin Tang et.al. | 2408.13854 | null |
2024-08-25 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Jayakanth Kunhoth et.al. | 2408.13754 | null |
2024-08-24 | Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models | Sakhinana Sagar Srinivas et.al. | 2408.13621 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | Indoor scene recognition from images under visual corruptions | Willams de Lima Costa et.al. | 2408.13029 | null |
2024-08-23 | Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition | Cam-Van Thi Nguyen et.al. | 2408.12895 | null |
2024-08-23 | Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Qika Lin et.al. | 2408.12880 | link |
2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et.al. | 2408.12763 | null |
2024-08-22 | Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization | Luyao Cheng et.al. | 2408.12102 | null |
2024-08-22 | Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment | Jinghui Qin et.al. | 2408.12088 | null |
2024-08-21 | GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models | Jonathan Roberts et.al. | 2408.11817 | null |
2024-08-21 | D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models | M. Forlini et.al. | 2408.11761 | null |
2024-08-21 | UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao et.al. | 2408.11305 | link |
2024-08-21 | BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation | Haotian Peng et.al. | 2408.11281 | link |
2024-08-20 | Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays | Cynthia Zastudil et.al. | 2408.11137 | null |
2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et.al. | 2408.10500 | link |
2024-08-19 | Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting | Yun-Da Tsai et.al. | 2408.09798 | null |
2024-08-19 | Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Yunxin Li et.al. | 2408.09787 | link |
2024-08-18 | PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Dawei Dai et.al. | 2408.09530 | link |
2024-08-17 | Measuring Visual Sycophancy in Multimodal Models | Jaehyuk Lim et.al. | 2408.09111 | null |
2024-08-16 | AdaRank: Disagreement Based Module Rank Prediction for Low-rank Adaptation | Yihe Dong et.al. | 2408.09015 | link |
2024-08-16 | xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | Le Xue et.al. | 2408.08872 | null |
2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu et.al. | 2408.08575 | null |
2024-08-15 | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Jiajie Li et.al. | 2408.07981 | null |
2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | Muhammad Saad Saeed et.al. | 2408.07445 | null |
2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Xiaogen Zhou et.al. | 2408.07341 | link |
2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et.al. | 2408.07303 | null |
2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037 | null |
2024-08-13 | EditScribe: Non-Visual Image Editing with Natural Language Verification Loops | Ruei-Che Chang et.al. | 2408.06632 | null |
2024-08-13 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Sayna Ebrahimi et.al. | 2408.06610 | null |
2024-08-13 | Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning | Jieming Bian et.al. | 2408.06549 | null |
2024-08-12 | VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | Xiao Liu et.al. | 2408.06327 | link |
2024-08-11 | HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes | Xuanyu Su et.al. | 2408.05794 | null |
2024-08-08 | Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs | Aliki Anagnostopoulou et.al. | 2408.04331 | null |
2024-08-06 | LLaVA-OneVision: Easy Visual Task Transfer | Bo Li et.al. | 2408.03326 | link |
2024-08-06 | Multitask and Multimodal Neural Tuning for Large Models | Hao Sun et.al. | 2408.03001 | null |
2024-08-06 | Body of Her: A Preliminary Study on End-to-End Humanoid Agent | Tenglong Ao et.al. | 2408.02879 | null |
2024-08-04 | Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion | Shaoxu Cheng et.al. | 2408.02695 | null |
2024-08-02 | A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications | Valerio Guarrasi et.al. | 2408.02686 | null |
2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et.al. | 2408.02231 | null |
2024-08-04 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | Xiang He et.al. | 2408.01952 | link |
2024-08-02 | MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Benno Weck et.al. | 2408.01337 | link |
2024-08-05 | Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Jin Gao et.al. | 2408.01091 | link |
2024-08-02 | GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging | Saleh Sakib Ahmed et.al. | 2408.00984 | link |
2024-08-01 | MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Weihao Yu et.al. | 2408.00765 | link |
2024-08-01 | GalleryGPT: Analyzing Paintings with Large Multimodal Models | Yi Bin et.al. | 2408.00491 | link |
2024-08-01 | Everything We Hear: Towards Tackling Misinformation in Podcasts | Sachin Pathiyan Cherumanal et.al. | 2408.00292 | null |
2024-08-01 | OmniParser for Pure Vision Based GUI Agent | Yadong Lu et.al. | 2408.00203 | null |
2024-07-30 | Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection | Jinfa Huang et.al. | 2407.21004 | null |
2024-07-30 | HyperMM: Robust Multimodal Learning with Varying-sized Inputs | Hava Chaptoukaev et.al. | 2407.20768 | null |
2024-07-30 | Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Dhruv Verma et.al. | 2407.20642 | link |
2024-07-29 | Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Chao Liu et.al. | 2407.19981 | null |
2024-07-29 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | Wenjun Huang et.al. | 2407.19832 | null |
2024-08-02 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et.al. | 2407.19546 | link |
2024-07-28 | Detached and Interactive Multimodal Learning | Yunfeng Fan et.al. | 2407.19514 | link |
2024-07-27 | Data Processing Techniques for Modern Multimodal Models | Yinheng Li et.al. | 2407.19180 | null |
2024-07-26 | MangaUB: A Manga Understanding Benchmark for Large Multimodal Models | Hikaru Ikuta et.al. | 2407.19034 | null |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
2024-07-26 | ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema | Fei Wang et.al. | 2407.18716 | null |
2024-07-25 | Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis | Cristian-Alexandru Botocan et.al. | 2407.18251 | link |
2024-07-25 | $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs | Vlad Sobal et.al. | 2407.18134 | null |
2024-07-25 | Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis | Jatin Chaudhary et.al. | 2407.18060 | null |
2024-07-25 | What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models | Tessa Verhoef et.al. | 2407.17974 | null |
2024-07-25 | Shapley Value-based Contrastive Alignment for Multimodal Information Extraction | Wen Luo et.al. | 2407.17854 | null |
2024-07-25 | Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning | Vedanshu et.al. | 2407.17813 | null |
2024-07-25 | KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models | Eunice Yiu et.al. | 2407.17773 | link |
2024-07-24 | Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles | Zuoyin Tang et.al. | 2407.17211 | null |
2024-07-23 | Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities | Muhammad Irzam Liaqat et.al. | 2407.16243 | null |
2024-07-22 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Haoning Wu et.al. | 2407.15754 | link |
2024-07-22 | Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training | Ye Lin Tun et.al. | 2407.15426 | null |
2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et.al. | 2407.15295 | null |
2024-07-22 | Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer’s Disease classification | Lisa Anita De Santi et.al. | 2407.14277 | link |
2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
2024-07-17 | Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild | Nicolas Richet et.al. | 2407.12927 | link |
2024-07-16 | ChatBCG: Can AI Read Your Slide Deck? | Nikita Singh et.al. | 2407.12875 | null |
2024-07-17 | LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Kaichen Zhang et.al. | 2407.12772 | link |
2024-07-17 | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models | Donggeun Kim et.al. | 2407.12616 | null |
2024-07-17 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang et.al. | 2407.12580 | link |
2024-07-16 | FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models | Pengxiang Li et.al. | 2407.11522 | null |
2024-07-16 | COMET: “Cone of experience” enhanced large multimodal model for mathematical problem generation | Sannyuya Liu et.al. | 2407.11315 | null |
2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et.al. | 2407.11213 | null |
2024-07-15 | FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries | Yuqi Jiang et.al. | 2407.10810 | null |
2024-07-15 | Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs | W. J. Meijer et.al. | 2407.10743 | null |
2024-07-16 | Qwen2 Technical Report | An Yang et.al. | 2407.10671 | link |
2024-07-15 | How and where does CLIP process negation? | Vincent Quantmeyer et.al. | 2407.10488 | null |
2024-07-12 | Diagnosing and Re-learning for Balanced Multimodal Learning | Yake Wei et.al. | 2407.09705 | link |
2024-07-12 | Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX | Zhiyuan Chen et.al. | 2407.09274 | link |
2024-07-12 | DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | Chen Xin et.al. | 2407.09174 | link |
2024-07-11 | Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design | Jingyi Xie et.al. | 2407.08882 | null |
2024-07-10 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang et.al. | 2407.08044 | link |
2024-07-10 | LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Feng Li et.al. | 2407.07895 | link |
2024-07-11 | InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Chenguo Lin et.al. | 2407.07580 | null |
2024-07-10 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-07 | Multimodal Language Models for Domain-Specific Procedural Video Summarization | Nafisa Hussain et.al. | 2407.05419 | null |
2024-07-07 | Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition | Zirun Guo et.al. | 2407.05374 | link |
2024-07-06 | Enhance the Robustness of Text-Centric Multimodal Alignments | Ting-Yu Yen et.al. | 2407.05036 | null |
2024-07-06 | Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Tianling Liu et.al. | 2407.04916 | null |
2024-07-06 | MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension | Zekun Li et.al. | 2407.04903 | link |
2024-07-05 | VCoME: Verbal Video Composition with Multimodal Editing Effects | Weibo Gong et.al. | 2407.04697 | null |
2024-07-05 | Multimodal Classification via Modal-Aware Interactive Enhancement | Qing-Yuan Jiang et.al. | 2407.04587 | null |
2024-07-05 | Robust Multimodal Learning via Representation Decoupling | Shicai Wei et.al. | 2407.04458 | null |
2024-07-05 | Smart Vision-Language Reasoners | Denisa Roberts et.al. | 2407.04212 | link |
2024-07-04 | Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks | Amit Parekh et.al. | 2407.03967 | link |
2024-07-04 | ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities | Julie Mordacq et.al. | 2407.03836 | link |
2024-07-04 | M$\mathbf{5}$ – A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks | Florian Schneider et.al. | 2407.03791 | null |
2024-07-03 | HEMM: Holistic Evaluation of Multimodal Foundation Models | Paul Pu Liang et.al. | 2407.03418 | link |
2024-07-02 | Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties | Srivathsan Badrinarayanan et.al. | 2407.03380 | link |
2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
2024-07-02 | Synthetic Multimodal Question Generation | Ian Wu et.al. | 2407.02233 | null |
2024-07-02 | Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models | Anjishnu Mukherjee et.al. | 2407.02067 | link |
2024-07-01 | Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | Mehdi Arjmand et.al. | 2407.01824 | link |
2024-07-01 | We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Runqi Qiao et.al. | 2407.01284 | link |
2024-07-01 | Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models | Shaeke Salman et.al. | 2407.01157 | null |
2024-06-29 | AI-powered multimodal modeling of personalized hemodynamics in aortic stenosis | Caglar Ozturk et.al. | 2407.00535 | null |
2024-06-29 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Jinsheng Huang et.al. | 2407.00468 | link |
2024-06-29 | How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models | Jaeyoung Lee et.al. | 2407.00369 | null |
2024-06-28 | PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | Yuxuan Sun et.al. | 2407.00203 | null |
2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yuxuan Zhang et.al. | 2406.20076 | link |
2024-06-28 | InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Kirolos Ataallah et.al. | 2406.19875 | link |
2024-06-28 | MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis | Jun-Yan He et.al. | 2406.19859 | null |
2024-06-28 | MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment | Jihao Liu et.al. | 2406.19736 | link |
2024-06-28 | Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction | Akash Awasthi et.al. | 2406.19686 | null |
2024-06-28 | SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs | Xin Su et.al. | 2406.19593 | null |
2024-06-27 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Tao Zhang et.al. | 2406.19389 | null |
2024-06-28 | FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Shubhankar Singh et.al. | 2406.19237 | null |
2024-06-27 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Varun Nagaraj Rao et.al. | 2406.19150 | null |
2024-06-27 | DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | Jiaxin Zhang et.al. | 2406.19101 | null |
2024-06-27 | Fairness and Bias in Multimodal AI: A Survey | Tosin Adewumi et.al. | 2406.19097 | null |
2024-06-27 | MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation | Sanggeon Yun et.al. | 2406.18815 | null |
2024-06-26 | MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | William Berman et.al. | 2406.18790 | null |
2024-06-26 | S3: A Simple Strong Sample-effective Multimodal Dialog System | Elisei Rykov et.al. | 2406.18305 | link |
2024-06-26 | EHR-Based Mobile and Web Platform for Chronic Disease Risk Prediction Using Large Language Multimodal Models | Chun-Chieh Liao et.al. | 2406.18087 | null |
2024-06-26 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Uttaran Bhattacharya et.al. | 2406.18068 | null |
2024-06-25 | Human-centered In-building Embodied Delivery Benchmark | Zhuoqun Xu et.al. | 2406.17898 | link |
2024-06-25 | InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation | Jinbin Huang et.al. | 2406.17838 | null |
2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711 | null |
2024-06-25 | Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights | Hao Yang et.al. | 2406.17430 | null |
2024-06-24 | At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models | Dimitrios Tanoglidis et.al. | 2406.17057 | null |
2024-06-24 | Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Jierun Chen et.al. | 2406.16866 | link |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852 | link |
2024-06-24 | QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds | Ye Wang et.al. | 2406.16578 | null |
2024-06-21 | Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | Brandon Huang et.al. | 2406.15334 | null |
2024-06-21 | Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jiayu Wang et.al. | 2406.14852 | null |
2024-06-20 | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models | Giulia Polverini et.al. | 2406.14685 | null |
2024-06-20 | Revealing Vision-Language Integration in the Brain with Multimodal Networks | Vighnesh Subramaniam et.al. | 2406.14481 | link |
2024-06-25 | iWISDM: Assessing instruction following in multimodal models at scale | Xiaoxuan Lei et.al. | 2406.14343 | link |
2024-06-20 | Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models | Sherzod Hakimov et.al. | 2406.14035 | null |
2024-06-20 | Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning | Yupei Zhang et.al. | 2406.13979 | link |
2024-06-20 | PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents | Junjie Wang et.al. | 2406.13923 | null |
2024-06-19 | Through the Theory of Mind’s Eye: Reading Minds with Multimodal Video Large Language Models | Zhawnen Chen et.al. | 2406.13763 | null |
2024-06-19 | GUI Action Narrator: Where and When Did That Action Take Place? | Qinchen Wu et.al. | 2406.13719 | null |
2024-06-19 | Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor | Veedant Jain et.al. | 2406.13564 | null |
2024-06-19 | VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Haowen Hou et.al. | 2406.13362 | link |
2024-06-19 | Learnable In-Context Vector for Visual Question Answering | Yingzhe Peng et.al. | 2406.13185 | null |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Zhen Huang et.al. | 2406.12753 | link |
2024-06-18 | Disturbing Image Detection Using LMM-Elicited Emotion Embeddings | Maria Tzelepi et.al. | 2406.12668 | null |
2024-06-18 | Automatic benchmarking of large multimodal models via iterative experiment programming | Alessandro Conti et.al. | 2406.12321 | link |
2024-06-18 | Language and Multimodal Models in Sports: A Survey of Datasets and Applications | Haotian Xia et.al. | 2406.12252 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
2024-06-17 | LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Dantong Niu et.al. | 2406.11815 | null |
2024-06-17 | Multimodal Learning To Improve Segmentation With Intraoperative CBCT & Preoperative CT | Maximilian E. Tschuchnig et.al. | 2406.11650 | null |
2024-06-17 | Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Chao Wen et.al. | 2406.11334 | null |
2024-06-17 | VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Yunxin Li et.al. | 2406.11303 | null |
2024-06-17 | i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment | Daechul Ahn et.al. | 2406.11280 | link |
2024-06-17 | MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Anas Awadalla et.al. | 2406.11271 | link |
2024-06-17 | Generative Visual Instruction Tuning | Jefferson Hernandez et.al. | 2406.11262 | link |
2024-06-17 | Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective | Yang Chen et.al. | 2406.11249 | null |
2024-06-16 | Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies | Hung-Ting Su et.al. | 2406.10923 | null |
2024-06-15 | Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Lu Xu et.al. | 2406.10484 | null |
2024-06-12 | MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | Rithesh Murthy et.al. | 2406.10290 | null |
2024-06-14 | VideoGUI: A Benchmark for GUI Automation from Instructional Videos | Kevin Qinghong Lin et.al. | 2406.10227 | null |
2024-06-14 | ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation | Chufan Shi et.al. | 2406.09961 | link |
2024-06-14 | BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Imanol Miranda et.al. | 2406.09952 | link |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
2024-06-14 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Roman Bachmann et.al. | 2406.09406 | null |
2024-06-13 | Yo’LLaVA: Your Personalized Language and Vision Assistant | Thao Nguyen et.al. | 2406.09400 | null |
2024-06-13 | CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Chunyi Li et.al. | 2406.09356 | link |
2024-06-13 | Comparison Visual Instruction Tuning | Wei Lin et.al. | 2406.09240 | null |
2024-06-13 | Zoom and Shift are All You Need | Jiahao Qin et.al. | 2406.08866 | null |
2024-06-11 | Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes | Asim Waqas et.al. | 2406.08521 | null |
2024-06-14 | Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | Yi-Fan Zhang et.al. | 2406.08487 | link |
2024-06-13 | OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
2024-06-12 | A Concept-Based Explainability Framework for Large Multimodal Models | Jayneel Parekh et.al. | 2406.08074 | null |
2024-06-12 | LVBench: An Extreme Long Video Understanding Benchmark | Weihan Wang et.al. | 2406.08035 | link |
2024-06-11 | Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis | David Ortiz-Perez et.al. | 2406.07542 | link |
2024-06-11 | Understanding Visual Concepts Across Models | Brandon Trabucco et.al. | 2406.07506 | link |
2024-06-11 | Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology | Huahui Yi et.al. | 2406.07078 | link |
2024-06-14 | BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification | June-Woo Kim et.al. | 2406.06786 | link |
2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | link |
2024-06-10 | FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Yebin Lee et.al. | 2406.06004 | link |
2024-06-10 | CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | David Romero et.al. | 2406.05967 | null |
2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | link |
2024-06-09 | F-LMM: Grounding Frozen Large Multimodal Models | Size Wu et.al. | 2406.05821 | link |
2024-06-08 | Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities | Sai Munikoti et.al. | 2406.05496 | null |
2024-06-07 | Semantic Segmentation on VSPW Dataset through Masked Video Consistency | Chen Liang et.al. | 2406.04979 | null |
2024-06-07 | Predictive Dynamic Fusion | Bing Cao et.al. | 2406.04802 | link |
2024-06-07 | MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description | Cong Yang et.al. | 2406.04716 | link |
2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | null |
2024-06-06 | GenAI Arena: An Open Evaluation Platform for Generative Models | Dongfu Jiang et.al. | 2406.04485 | null |
2024-06-06 | MAIRA-2: Grounded Radiology Report Generation | Shruthi Bannur et.al. | 2406.04449 | link |
2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
2024-06-06 | BLSP-Emo: Towards Empathetic Large Speech-Language Models | Chen Wang et.al. | 2406.03872 | link |
2024-06-05 | Identification of Stone Deterioration Patterns with Large Multimodal Models | Daniele Corradetti et.al. | 2406.03207 | link |
2024-06-05 | Exploiting LMM-based knowledge for image classification tasks | Maria Tzelepi et.al. | 2406.03071 | null |
2024-06-02 | Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications | David Restrepo et.al. | 2406.02601 | null |
2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
2024-06-04 | Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization | Yunpeng Zhao et.al. | 2406.01987 | null |
2024-06-03 | Automatic Fused Multimodal Deep Learning for Plant Identification | Alfreds Lapkovskis et.al. | 2406.01455 | link |
2024-06-05 | Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data | Zhusi Zhong et.al. | 2406.01302 | null |
2024-06-03 | Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model | Kezhen Chen et.al. | 2406.00977 | link |
2024-06-02 | Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Zechu Li et.al. | 2406.00681 | null |
2024-06-04 | StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond | Pengyuan Lyu et.al. | 2405.21013 | null |
2024-05-31 | Don’t Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models | A. Bavaresco et.al. | 2405.20846 | link |
2024-06-17 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu et.al. | 2405.20797 | link |
2024-05-31 | Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Yang Chen et.al. | 2405.20606 | link |
2024-05-30 | Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | Qianqi Yan et.al. | 2405.20421 | link |
2024-05-30 | Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | Franz Louis Cesista et.al. | 2405.20245 | null |
2024-05-31 | Visual Attention Analysis in Online Learning | Miriam Navarro et.al. | 2405.20091 | null |
2024-05-30 | MM-Lego: Modular Biomedical Multimodal Models with Minimal Fine-Tuning | Konstantin Hemker et.al. | 2405.19950 | null |
2024-05-30 | Instruction-Guided Visual Masking | Jinliang Zheng et.al. | 2405.19783 | link |
2024-05-29 | Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining | Blake R. Duschatko et.al. | 2405.19386 | null |
2024-06-09 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
2024-05-29 | Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare | Hanwei Zhu et.al. | 2405.19298 | link |
2024-05-31 | Benchmarking and Improving Detail Image Caption | Hongyuan Dong et.al. | 2405.19092 | link |
2024-05-29 | Topological Perspectives on Optimal Multimodal Embedding Spaces | Abdul Aziz A. B et.al. | 2405.18867 | null |
2024-05-29 | Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches | A. Hammad et.al. | 2405.18834 | null |
2024-05-28 | The Evolution of Multimodal Model Architectures | Shakti N. Wadekar et.al. | 2405.17927 | null |
2024-05-28 | Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | Xin Xiao et.al. | 2405.17871 | link |
2024-05-28 | Full-Stack Allreduce on Multi-Rail Networks | Enda Yu et.al. | 2405.17870 | null |
2024-05-28 | MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | Yake Wei et.al. | 2405.17730 | link |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
2024-05-27 | XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser | Xianfu Cheng et.al. | 2405.17336 | null |
2024-05-28 | LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | Haoyu Zhao et.al. | 2405.17104 | null |
2024-05-27 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Zihua Zhao et.al. | 2405.16996 | link |
2024-05-27 | Multilingual Diversity Improves Vision-Language Representations | Thao Nguyen et.al. | 2405.16915 | null |
2024-05-26 | Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs | Mustafa Shukor et.al. | 2405.16700 | link |
2024-05-25 | How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect | Siddhartha K. Vemuri et.al. | 2405.16128 | null |
2024-05-24 | ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models | Chunjiang Ge et.al. | 2405.15738 | link |
2024-05-24 | Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models | Yongsheng Yu et.al. | 2405.15687 | null |
2024-05-24 | M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | Hongyu Wang et.al. | 2405.15638 | link |
2024-05-24 | DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception | Run Luo et.al. | 2405.15232 | link |
2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190 | link |
Generative Weight Space Modeling
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-18 | Monomial Matrix Group Equivariant Neural Functional Networks | Hoang V. Tran et.al. | 2409.11697 | null | ||
2024-09-17 | Existence of an extremal function of Sobolev critical embedding with an $α$-homogeneous weight | Petr Gurka et.al. | 2409.11193 | null | ||
2024-09-16 | Inferring stellar parameters and their uncertainties from high-resolution spectroscopy using invertible neural networks | Nils Candebat et.al. | 2409.10621 | null | ||
2024-09-13 | Non-unitary Wightman CFTs and non-unitary vertex algebras | Sebastiano Carpi et.al. | 2409.08454 | null | ||
2024-09-12 | Global well-posedness and scattering in weighted space for nonlinear Schrödinger equations below the Strauss exponent without gauge-invariance | Masaki Kawamoto et.al. | 2409.08432 | null | ||
2024-09-09 | Fast gradient-free optimization of excitations in variational quantum eigensolvers | Jonas Jäger et.al. | 2409.05939 | null | ||
2024-09-06 | SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields | Yuze Wang et.al. | 2409.04482 | null | ||
2024-09-04 | Federated Quantum-Train with Batched Parameter Generation | Chen-Yu Liu et.al. | 2409.02763 | null | ||
2024-09-16 | Regret Analysis for Randomized Gaussian Process Upper Confidence Bound | Shion Takeno et.al. | 2409.00979 | null | ||
2024-08-30 | Abstracted Gaussian Prototypes for One-Shot Concept Learning | Chelsea Zou et.al. | 2408.17251 | link | ||
2024-08-23 | Emergence of global receptive fields capturing multipartite quantum correlations | Oleg M. Sotnikov et.al. | 2408.13033 | null | ||
2024-08-22 | Action of $\mathfrak{osp}(1 \vert 2n)$ on polynomials tensor $\mathbb{C}^{0 \vert 2n}$ | Dwight Anderson Williams II et.al. | 2408.12324 | null |
2024-08-19 | Unimodal sequences and mixed false theta functions | Kevin Allen et.al. | 2408.09789 | null | ||
2024-08-16 | Onsager-Machlup functional for stochastic lattice dynamical systems driven by time-varying noise | Xinze Zhang et.al. | 2408.08465 | null | ||
2024-08-10 | Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks | Yoav Gelberg et.al. | 2408.05496 | null | ||
2024-08-09 | Quasilinear parabolic equations with superlinear nonlinearities in critical spaces | Bogdan-Vasile Matioc et.al. | 2408.05067 | null | ||
2024-08-08 | A framework for generalizing toric inequalities for holographic entanglement entropy | Ning Bao et.al. | 2408.04741 | null | ||
2024-08-07 | Counterfactuals and Uncertainty-Based Explainable Paradigm for the Automated Detection and Segmentation of Renal Cysts in Computed Tomography Images: A Multi-Center Study | Zohaib Salahuddin et.al. | 2408.03789 | null | ||
2024-08-05 | BOTS-LM: Training Large Language Models for Setswana | Nathan Brown et.al. | 2408.02239 | null | ||
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415 | null | ||
2024-08-01 | Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization | Róisín Luo et.al. | 2408.00923 | null | ||
2024-07-31 | Semantic Codebook Learning for Dynamic Recommendation Models | Zheqi Lv et.al. | 2408.00123 | null | ||
2024-07-29 | Tensor product weight modules over the affine-Virasoro algebra | Qiu-Fan Chen et.al. | 2407.19844 | null | ||
2024-07-24 | Generalized Hilbert operators acting on weighted spaces of holomorphic functions with sup-norms | María J. Beltrán-Meneu et.al. | 2407.17646 | null | ||
2024-07-24 | Generalized Ordinal Priority Approach for Multi-Attribute Decision-Making under Incomplete Preference Information | Renlong Wang et.al. | 2407.17099 | null | ||
2024-07-22 | WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation | Zirui Shao et.al. | 2407.15502 | link | ||
2024-07-18 | FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning | Tristan Cinquin et.al. | 2407.13711 | null | ||
2024-07-19 | Parameter Generation of Quantum Approximate Optimization Algorithm with Diffusion Model | Fanxu Meng et.al. | 2407.12242 | null | ||
2024-07-24 | Effect Heterogeneity with Earth Observation in Randomized Controlled Trials: Exploring the Role of Data, Model, and Evaluation Metric Choice | Connor T. Jerzak et.al. | 2407.11674 | link | ||
2024-07-15 | Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion | Yongyuan Liang et.al. | 2407.10973 | null | ||
2024-07-16 | The well-posedness of generalized nonlinear wave equations on the lattice graph | Bobo Hua et.al. | 2407.09815 | null | ||
2024-07-15 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization | Jinlong Li et.al. | 2407.08374 | null | ||
2024-07-09 | Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | Ruochen Jin et.al. | 2407.07089 | link | ||
2024-07-04 | Recovering Initial States in Semilinear Parabolic Problems from Time-Averages | Lina Sophie Schmitz et.al. | 2407.03829 | null | ||
2024-07-01 | A quantum deformation of the ${\mathcal N}=2$ superconformal algebra | H. Awata et.al. | 2407.00901 | null | ||
2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null | ||
2024-06-24 | Improving robustness to corruptions with multiplicative weight perturbations | Trung Trinh et.al. | 2406.16540 | null | ||
2024-06-21 | Determination of certain mod $p$ Galois representations using local constancy | Abhik Ganguli et.al. | 2406.15600 | null | ||
2024-06-21 | Elliptic analysis on collapsing gravitational instantons modelled using the Gibbons-Hawking ansatz | Willem Adriaan Salm et.al. | 2406.15008 | null | ||
2024-06-20 | MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization | Zhaozhe Hu et.al. | 2406.14259 | link | ||
2024-06-18 | From Instance Training to Instruction Learning: Task Adapters Generation from Instructions | Huanxuan Liao et.al. | 2406.12382 | link | ||
2024-06-17 | Kaniadakis entropy in extreme gravitational and cosmological environments: a review on the state-of-the-art and future prospects | Giuseppe Gaetano Luciano et.al. | 2406.11373 | null | ||
2024-06-16 | Analysis and approximation of elliptic problems with Uhlenbeck structure in convex polytopes | Tadele Mengesha et.al. | 2406.10762 | null | ||
2024-06-14 | Towards Scalable and Versatile Weight Space Learning | Konstantin Schürholt et.al. | 2406.09997 | link | ||
2024-06-13 | Interpreting the Weight Space of Customized Diffusion Models | Amil Dravid et.al. | 2406.09413 | link | ||
2024-06-12 | Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | Benjamin Biggs et.al. | 2406.08431 | null | ||
2024-06-24 | Cartan monopoles | Andrei Smilga et.al. | 2406.06042 | null | ||
2024-06-08 | Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models | Minho Park et.al. | 2406.05432 | null | ||
2024-06-06 | Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks | Tristan Cinquin et.al. | 2406.04317 | null | ||
2024-06-06 | A characterization of $(μ,ν)$-dichotomies via admissibility | Lucas Backes et.al. | 2406.04126 | null | ||
2024-06-05 | Reproducing Kernel Thesis of Hankel Operators on Weighted Hardy Spaces | Ana Čolović et.al. | 2406.03106 | null | ||
2024-05-21 | Backpropogation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration | Wei Ji et.al. | 2406.01601 | null | ||
2024-05-29 | Thermodynamics of the most generalized form of Holographic Dark Energy and some particular cases with Corrected Entropies | Sanghati Saha et.al. | 2405.20783 | null | ||
2024-06-20 | The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof | Derek Lim et.al. | 2405.20231 | null | ||
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356 | link | ||
2024-05-28 | $C^2M^3$: Cycle-Consistent Multi-Model Merging | Donato Crisostomi et.al. | 2405.17897 | link | ||
2024-05-27 | Smoothing effects and extinction in finite time for fractional fast diffusions on Riemannian manifolds | Elvise Berchio et.al. | 2405.17126 | null | ||
2024-05-31 | FedSheafHN: Personalized Federated Learning on Graph-structured Data | Wenfei Liang et.al. | 2405.16056 | null | ||
2024-05-27 | HyperInterval: Hypernetwork approach to training weight interval regions in continual learning | Patryk Krukowski et.al. | 2405.15444 | link | ||
2024-05-23 | Scalable Optimization in the Modular Norm | Tim Large et.al. | 2405.14813 | link | ||
2024-06-16 | A refined Weyl character formula for comodules on $\operatorname{GL}_{2,A}$ | Helge Øystein Maakestad et.al. | 2405.09210 | null | ||
2024-05-13 | Localizing Task Information for Improved Model Merging and Compression | Ke Wang et.al. | 2405.07813 | link | ||
2024-05-13 | $α$VIL: Learning to Leverage Auxiliary Tasks for Multitask Learning | Rafael Kourdis et.al. | 2405.07769 | null | ||
2024-05-12 | Approximation by a new sequence of operators involving Laguerre polynomials | Kapil Kumar et.al. | 2405.07228 | null | ||
2024-05-06 | Swarm intelligence for full Stokes dynamic imaging reconstruction of interferometric data | Alejandro Mus et.al. | 2405.03330 | null | ||
2024-05-04 | Large Deviation Principles of Invariant Measures of Stochastic Reaction-Diffusion Lattice Systems | Bixiang Wang et.al. | 2405.02720 | null | ||
2024-05-03 | The Immersed Inextensible Interface Problem in 2D Stokes Flow | Eduardo García-Juárez et.al. | 2405.02446 | null | ||
2024-05-02 | Customizing Text-to-Image Models with a Single Image Pair | Maxwell Jones et.al. | 2405.01536 | null | ||
2024-04-25 | Robust Fine-tuning for Pre-trained 3D Point Cloud Models | Zhibo Zhang et.al. | 2404.16422 | null | ||
2024-04-23 | The Geometry of the Set of Equivalent Linear Neural Networks | Jonathan Richard Shewchuk et.al. | 2404.14855 | null | ||
2024-04-24 | Nonexistence of solutions to parabolic problems with a potential on weighted graphs | Dario D. Monticelli et.al. | 2404.12058 | null | ||
2024-04-17 | On the relaxation to equilibrium of a quantum oscillator interacting with a radiation field | Pierre-A. Vuillermot et.al. | 2404.11329 | null | ||
2024-04-15 | Higher-curvature gravity in AdS$_3$, holographic $c$-theorems and black hole microstates | Mariano Chernicoff et.al. | 2404.10128 | null | ||
2024-04-16 | Asymptotic-preserving approximations for stochastic incompressible viscous fluids and SPDEs on graph | Jianbo Cui et.al. | 2404.09168 | null | ||
2024-04-09 | Perspective on Physical Interpretations of Rényi Entropy in Statistical Mechanics | Misaki Ozawa et.al. | 2404.06436 | null | ||
2024-04-09 | A gluing construction of singular solutions for a fully non-linear equation in conformal geometry | María Fernanda Espinal et.al. | 2404.05965 | null | ||
2024-04-05 | Dissipative Euler flows originating from circular vortex filaments | Francisco Gancedo et.al. | 2404.04250 | null | ||
2024-04-05 | Macdonald characters from a new formula for Macdonald polynomials | Houcine Ben Dali et.al. | 2404.03904 | null | ||
2024-04-04 | Fundamental inequalities for the iterated Fourier-cosine convolution with Gaussian weight and its application | Nguyen Thi Hong Phuong et.al. | 2404.03609 | null | ||
2024-03-29 | Embracing Unknown Step by Step: Towards Reliable Sparse Training in Real World | Bowen Lei et.al. | 2403.20047 | link | ||
2024-03-28 | Model Stock: All we need is just a few fine-tuned models | Dong-Hwan Jang et.al. | 2403.19522 | link | ||
2024-03-26 | A location Invariant Statistic-Based Consistent Estimation Method for Three-Parameter Generalized Exponential Distribution | Kiran Prajapat et.al. | 2403.17609 | null | ||
2024-06-03 | FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis | Santosh Sanjeev et.al. | 2403.13341 | link | ||
2024-06-18 | Learning Useful Representations of Recurrent Neural Network Weight Matrices | Vincent Herrmann et.al. | 2403.11998 | link | ||
2024-03-16 | Function-space Parameterization of Neural Networks for Sequential Learning | Aidan Scannell et.al. | 2403.10929 | link | ||
2024-03-14 | Imprints of Barrow-Tsallis Cosmology in Primordial Gravitational Waves | Petr Jizba et.al. | 2403.09797 | null | ||
2024-03-14 | Eigenvariety for partially classical Hilbert modular forms | Mladen Dimitrov et.al. | 2403.09784 | null | ||
2024-03-12 | The solenoidal Heisenberg Virasoro algebra and its simple weight modules | Boujemaa Agrebaoui et.al. | 2403.07381 | null | ||
2024-03-10 | FrameQuant: Flexible Low-Bit Quantization for Transformers | Harshavardhan Adepu et.al. | 2403.06082 | link | ||
2024-03-06 | The solenoidal Virasoro algebra and its simple weight modules | Boujemaa Agrebaoui et.al. | 2403.03753 | null | ||
2024-03-05 | Tensor Decomposition-based Time Varying Channel Estimation for mmWave MIMO-OFDM Systems | Ruizhe Wang et.al. | 2403.02942 | null | ||
2024-03-05 | Neural Redshift: Random Networks are not Random Functions | Damien Teney et.al. | 2403.02241 | null | ||
2024-03-04 | Tiny fluctuations of the averaging process around its degenerate steady state | Federico Sau et.al. | 2403.02032 | null | ||
2024-03-15 | Training-Free Pretrained Model Merging | Zhengqi Xu et.al. | 2403.01753 | link | ||
2024-04-22 | HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances | Supreeth Narasimhaswamy et.al. | 2403.01693 | null | ||
2024-03-13 | TOOLVERIFIER: Generalization to New Tools via Self-Verification | Dheeraj Mekala et.al. | 2402.14158 | link | ||
2024-02-21 | Computing Tangent Spaces to Eigenvarieties | James Rawson et.al. | 2402.13799 | null | ||
2024-05-28 | Neural Network Parameter Diffusion | Kai Wang et.al. | 2402.13144 | link | ||
2024-02-19 | Exponential attractors for a nonlocal delayed reaction-diffusion equation on an unbounded domain | Wenjie Hu et.al. | 2402.11856 | null | ||
2024-02-18 | Discrete Neural Algorithmic Reasoning | Gleb Rodionov et.al. | 2402.11628 | link | ||
2024-02-17 | Uncertainty Quantification of Graph Convolution Neural Network Models of Evolving Processes | Jeremiah Hauth et.al. | 2402.11179 | null | ||
2024-06-06 | Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning | Tuc Nguyen et.al. | 2402.10639 | null | ||
2024-02-14 | TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction | Xueqi Guo et.al. | 2402.09567 | null | ||
2024-02-14 | The cohomology of $p$-adic Deligne-Lusztig schemes of Coxeter type | Alexander B. Ivanov et.al. | 2402.09017 | null | ||
2024-02-09 | The Asymptotic Structure of Cosmological Integrals | Paolo Benincasa et.al. | 2402.06558 | null | ||
2024-02-07 | Universal Neural Functionals | Allan Zhou et.al. | 2402.05232 | link | ||
2024-02-06 | Maximal regularity and optimal control for a non-local Cahn-Hilliard tumour growth model | Matteo Fornoni et.al. | 2402.04204 | null | ||
2024-02-06 | Improved Generalization of Weight Space Networks via Augmentations | Aviv Shamsian et.al. | 2402.04081 | null | ||
2024-02-02 | Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion | Zexi Li et.al. | 2402.01342 | null | ||
2024-02-01 | Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse Maps | Rebecca Pattichis et.al. | 2402.00261 | null | ||
2024-01-26 | Do deep neural networks utilize the weight space efficiently? | Onur Can Koyun et.al. | 2401.16438 | null | ||
2024-01-22 | On strong growth conditions for weighted spaces of entire functions | Gerhard Schindl et.al. | 2401.14330 | null | ||
2024-01-24 | Task structure and nonlinearity jointly determine learned representational geometry | Matteo Alleman et.al. | 2401.13558 | null | ||
2024-01-25 | Sparse Domination of Singular Bilinear Forms on Non-Homogeneous spaces | Paco Villarroya et.al. | 2401.13130 | null | ||
2024-01-22 | WARM: On the Benefits of Weight Averaged Reward Models | Alexandre Ramé et.al. | 2401.12187 | null | ||
2024-01-17 | Cesàro operators associated with Borel measures acting on weighted spaces of holomorphic functions with sup-norm | Maria José Beltrán Meneu et.al. | 2401.09406 | null | ||
2024-01-15 | Singular fractal dimension at periodicity cascades in parameters spaces | Carlos E. P. Abreu et.al. | 2401.07648 | null | ||
2024-01-17 | Computing Fringe Presentations of Multigraded Persistence Modules | Fabian Lenzen et.al. | 2401.06008 | null | ||
2024-01-10 | Grimoire is All You Need for Enhancing Large Language Models | Ding Chen et.al. | 2401.03385 | link | ||
2024-03-26 | Artificial Intelligence for Operations Research: Revolutionizing the Operations Research Process | Zhenan Fan et.al. | 2401.03244 | null | ||
2023-12-31 | A Compact Representation for Bayesian Neural Networks By Removing Permutation Symmetry | Tim Z. Xiao et.al. | 2401.00611 | link | ||
2023-12-28 | Fractional non-homogeneous counting process | Nick Laskin et.al. | 2312.17389 | null | ||
2023-12-28 | Some unimodal sequences of Kronecker coefficients | Alimzhan Amanov et.al. | 2312.17054 | null | ||
2023-12-24 | The Vlasov-Maxwell-Boltzmann/Landau system with polynomial perturbation near Maxwellian | Chuqi Cao et.al. | 2312.15510 | null | ||
2023-12-22 | Emage: Non-Autoregressive Text-to-Image Generation | Zhangyin Feng et.al. | 2312.14988 | null | ||
2023-12-21 | Hypercyclic shifts on lattice graphs | Anton Baranov et.al. | 2312.13934 | null | ||
2023-12-21 | Scattering for 2d semi-relativistic Hartree equations with short range potential | Changhun Yang et.al. | 2312.13606 | null | ||
2023-12-21 | Entropic Inflation in Presence of Scalar Field | Sergei D. Odintsov et.al. | 2312.13587 | null | ||
2023-12-30 | Time is Encoded in the Weights of Finetuned Language Models | Kai Nylund et.al. | 2312.13401 | link | ||
2023-12-14 | Efficient momentum space approach to superconductivity in quasiperiodic systems | Mao Yoshii et.al. | 2312.09124 | null | ||
2023-12-13 | Best one-sided algebraic approximation by average modulus | Raheam A. Al-Saphory et.al. | 2312.08407 | null | ||
2023-12-19 | Well-Posedness of Quasilinear Parabolic Equations in Time-Weighted Spaces | Bogdan Matioc et.al. | 2312.07974 | null | ||
2023-12-12 | Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | Arnav Chavan et.al. | 2312.07046 | link | ||
2023-12-11 | Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks | MohammadReza Davari et.al. | 2312.06795 | null | ||
2023-12-08 | Stoichiometry preservation and generalization of Bilger mixture fraction for non-premixed combustion with differential molecular diffusion | Haifeng Wang et.al. | 2312.05204 | null | ||
2023-12-01 | New polyconvolution product for Fourier-cosine and Laplace integral operators and their applications | Trinh Tuan et.al. | 2312.00764 | null | ||
2023-11-30 | Modelling Einstein cluster using Einasto profile | Ritwik Acharyya et.al. | 2311.18622 | null | ||
2023-11-27 | Extraction of the microscopic properties of quasi-particles using deep neural networks | Olga Soloveva et.al. | 2311.15984 | null | ||
2024-01-24 | Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning | Thomas Baldwin-McDonald et.al. | 2311.14828 | null |
Data Distillation
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-18 | Applications of Knowledge Distillation in Remote Sensing: A Survey | Yassine Himeur et.al. | 2409.12111 | null |
2024-09-18 | Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction | Jin Jie Sean Yeo et.al. | 2409.11964 | null |
2024-09-18 | Distillation-free Scaling of Large SSMs for Images and Videos | Hamid Suleman et.al. | 2409.11867 | null |
2024-09-18 | EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis | Shaojie Li et.al. | 2409.11817 | null |
2024-09-18 | Efficient Low-Resolution Face Recognition via Bridge Distillation | Shiming Ge et.al. | 2409.11786 | null |
2024-09-18 | RUIE: Retrieval-based Unified Information Extraction using Large Language Model | Xincheng Liao et.al. | 2409.11673 | null |
2024-09-17 | Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model | Derek Jollie et.al. | 2409.11609 | link |
2024-09-17 | Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation | Rui Yu et.al. | 2409.11018 | null |
2024-09-17 | Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation | Gerard I. Gállego et.al. | 2409.11003 | null |
2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et.al. | 2409.10362 | null |
2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095 | null |
2024-09-14 | Effective Pre-Training of Audio Transformers for Sound Event Detection | Florian Schmid et.al. | 2409.09546 | link |
2024-09-14 | Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification | Wenhao Yang et.al. | 2409.09389 | null |
2024-09-14 | Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility | Xiaoyu Liu et.al. | 2409.09357 | null |
2024-09-13 | Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection | Dixi Yao et.al. | 2409.08858 | null |
2024-09-13 | AWF: Adaptive Weight Fusion for Enhanced Class Incremental Semantic Segmentation | Zechao Sun et.al. | 2409.08516 | null |
2024-09-12 | DiReDi: Distillation and Reverse Distillation for AIoT Applications | Chen Sun et.al. | 2409.08308 | null |
2024-09-12 | Ruri: Japanese General Text Embeddings | Hayato Tsukagoshi et.al. | 2409.07737 | null |
2024-09-12 | Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios | Xinlei Huang et.al. | 2409.07694 | null |
2024-09-11 | DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer’s Early Diagnosis | Ke Chen et.al. | 2409.07584 | null |
2024-09-11 | EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data | Grégoire Petit et.al. | 2409.07566 | null |
2024-09-11 | Enhancing CTC-Based Visual Speech Recognition | Hendrik Laux et.al. | 2409.07210 | null |
2024-09-11 | A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption | Marcus Rüb et.al. | 2409.07114 | null |
2024-09-16 | Privacy-Preserving Federated Learning with Consistency via Knowledge Distillation Using Conditional Generator | Kangyang Luo et.al. | 2409.06955 | null |
2024-09-10 | Applied Federated Model Personalisation in the Industrial Domain: A Comparative Study | Ilias Siniosoglou et.al. | 2409.06904 | null |
2024-09-10 | EasyST: A Simple Framework for Spatio-Temporal Prediction | Jiabin Tang et.al. | 2409.06748 | link |
2024-09-10 | Knowledge Distillation via Query Selection for Detection Transformer | Yi Liu et.al. | 2409.06443 | null |
2024-09-10 | Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition | Junzheng Zhang et.al. | 2409.06371 | null |
2024-09-09 | Joint Input and Output Coordination for Class-Incremental Learning | Shuai Wang et.al. | 2409.05620 | null |
2024-09-09 | LEROjD: Lidar Extended Radar-Only Object Detection | Patrick Palmer et.al. | 2409.05564 | link |
2024-09-09 | Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition | Shiming Ge et.al. | 2409.05384 | null |
2024-09-09 | FedBrain-Distill: Communication-Efficient Federated Brain Tumor Classification Using Ensemble Knowledge Distillation on Non-IID Data | Rasoul Jafari Gohari et.al. | 2409.05359 | link |
2024-09-07 | LoCa: Logit Calibration for Knowledge Distillation | Runming Yang et.al. | 2409.04778 | null |
2024-09-06 | SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields | Yuze Wang et.al. | 2409.04482 | null |
2024-09-05 | Experimentation in Content Moderation using RWKV | Umut Yildirim et.al. | 2409.03939 | null |
2024-09-05 | Data-Efficient Generation for Dataset Distillation | Zhe Li et.al. | 2409.03929 | null |
2024-09-05 | DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture | Qianlong Xiang et.al. | 2409.03550 | null |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | Efficient Image Compression Using Advanced State Space Models | Bouzid Arezki et.al. | 2409.02743 | null |
2024-09-04 | CLDA: Collaborative Learning for Enhanced Unsupervised Domain Adaptation | Minhee Cho et.al. | 2409.02699 | null |
2024-09-04 | Low-Resolution Object Recognition with Cross-Resolution Relational Contrastive Distillation | Kangkai Zhang et.al. | 2409.02555 | null |
2024-09-04 | A design of magnetic tunnel junctions for the deployment of neuromorphic hardware for edge computing | Davi Rodrigues et.al. | 2409.02528 | null |
2024-09-04 | Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation | Yilong Chen et.al. | 2409.02438 | null |
2024-09-03 | Low-Resolution Face Recognition via Adaptable Instance-Relation Distillation | Ruixin Shi et.al. | 2409.02049 | null |
2024-09-03 | Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique | Qiang Zheng et.al. | 2409.02020 | null |
2024-09-03 | Contemporary Model Compression on Large Language Models Inference | Dong Liu et.al. | 2409.01990 | null |
2024-09-05 | Adaptive Explicit Knowledge Transfer for Knowledge Distillation | Hyungkeun Park et.al. | 2409.01679 | null |
2024-09-03 | Improving Apple Object Detection with Occlusion-Enhanced Distillation | Liang Geng et.al. | 2409.01573 | null |
2024-09-02 | Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning | Vyacheslav Kungurtsev et.al. | 2409.01410 | null |
2024-09-02 | MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation | Zewen Chen et.al. | 2409.01212 | link |
2024-09-04 | Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning | Jinglin Liang et.al. | 2409.01128 | link |
2024-09-02 | Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment | Aditya Bansal et.al. | 2409.00880 | null |
2024-09-01 | LanguaShrink: Reducing Token Overhead with Psycholinguistics | Xuechen Liang et.al. | 2409.00855 | null |
2024-08-30 | How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition | Pedro C. Neto et.al. | 2408.17399 | link |
2024-08-30 | HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution | Masoomeh Aslahishahri et.al. | 2408.16959 | link |
2024-08-29 | VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition | Zaiwei Zhang et.al. | 2408.16930 | null |
2024-08-29 | Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | Hritik Bansal et.al. | 2408.16737 | null |
2024-08-29 | MST-KD: Multiple Specialized Teachers Knowledge Distillation for Fair Face Recognition | Eduarda Caldeira et.al. | 2408.16563 | link |
2024-08-29 | UDD: Dataset Distillation via Mining Underutilized Regions | Shiguang Wang et.al. | 2408.16268 | null |
2024-08-29 | Neural Spectral Decomposition for Dataset Distillation | Shaolei Yang et.al. | 2408.16236 | null |
2024-08-28 | EMP: Enhance Memory in Data Pruning | Jinying Xiao et.al. | 2408.16031 | null |
2024-08-28 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et.al. | 2408.15803 | null |
2024-08-28 | Online pre-training with long-form videos | Itsuki Kato et.al. | 2408.15651 | null |
2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
2024-08-27 | Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification | Yiqiang Cai et.al. | 2408.14862 | link |
2024-08-26 | Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems | Nikhil Khani et.al. | 2408.14678 | null |
2024-08-26 | TSAK: Two-Stage Semantic-Aware Knowledge Distillation for Efficient Wearable Modality and Model Optimization in Manufacturing Lines | Hymalai Bello et.al. | 2408.14146 | null |
Schrodinger Bridge
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-09-18 | Massively Multi-Person 3D Human Motion Forecasting with Scene Context | Felix B Mueller et.al. | 2409.12189 | null |
2024-09-18 | MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | null |
2024-09-18 | Cyclicity Analysis of the Ornstein-Uhlenbeck Process | Vivek Kaushik et.al. | 2409.12102 | null |
2024-09-18 | Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance | Jaehoon Joo et.al. | 2409.12099 | null |
2024-09-18 | Denoising diffusion models for high-resolution microscopy image restoration | Pamela Osuna-Vargas et.al. | 2409.12078 | null |
2024-09-18 | SFDA-rPPG: Source-Free Domain Adaptive Remote Physiological Measurement with Spatio-Temporal Consistency | Yiping Xie et.al. | 2409.12040 | null |
2024-09-18 | LEMON: Localized Editing with Mesh Optimization and Neural Shaders | Furkan Mert Algan et.al. | 2409.12024 | null |
2024-09-18 | Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models | Lorenzo Mandelli et.al. | 2409.11920 | null |
2024-09-18 | DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | Xin Qi et.al. | 2409.11835 | null |
2024-09-18 | RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets | Jikai Ye et.al. | 2409.11831 | null |
2024-09-18 | InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models | Yan Zheng et.al. | 2409.11734 | null |
2024-09-18 | GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation | Shuowen Liang et.al. | 2409.11689 | null |
2024-09-18 | Recurrent Interpolants for Probabilistic Time Series Prediction | Yu Chen et.al. | 2409.11684 | null |
2024-09-18 | SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation | Mingze Sun et.al. | 2409.11682 | null |
2024-09-18 | Electromagnetic Property Sensing and Channel Reconstruction Based on Diffusion Schrödinger Bridge in ISAC | Yuhua Jiang et.al. | 2409.11651 | null |
2024-09-17 | Ultrasound Image Enhancement with the Variance of Diffusion Models | Yuxin Zhang et.al. | 2409.11380 | link |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367 | null |
2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355 | link |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340 | link |
2024-09-17 | Parameter dependent rough SDEs with applications to rough PDEs | Fabio Bugini et.al. | 2409.11330 | null |
2024-09-17 | fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction | Jianxiong Gao et.al. | 2409.11315 | null |
2024-09-17 | DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models | Avirup Das et.al. | 2409.11292 | null |
2024-09-17 | Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models | Tianqi Chen et.al. | 2409.11219 | null |
2024-09-17 | High-Resolution Speech Restoration with Latent Diffusion Model | Tushar Dhyani et.al. | 2409.11145 | null |
2024-09-17 | In-situ measurements of light diffusion in an optically dense atomic ensemble | Antoine Glicenstein et.al. | 2409.11117 | null |
2024-09-17 | TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation | Yansong Wu et.al. | 2409.11047 | null |
2024-09-17 | Enhanced segmentation of femoral bone metastasis in CT scans of patients using synthetic data generation with 3D diffusion models | Emile Saillard et.al. | 2409.11011 | null |
2024-09-17 | Local discontinuous Galerkin method for nonlinear BSPDEs of Neumann boundary conditions with deep backward dynamic programming time-marching | Yixiang Dai et.al. | 2409.11004 | null |
2024-09-17 | Edge-based Denoising Image Compression | Ryugo Morita et.al. | 2409.10978 | null |
2024-09-17 | CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement | Xuanzhao Dong et.al. | 2409.10966 | null |
2024-09-16 | Incorporating Classifier-Free Guidance in Diffusion Model-Based Recommendation | Noah Buchanan et.al. | 2409.10494 | null |
2024-09-16 | SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing | Qi Qian et.al. | 2409.10476 | null |
2024-09-16 | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Lehong Wu et.al. | 2409.10473 | null |
2024-09-16 | Mamba-ST: State Space Model for Efficient Style Transfer | Filippo Botti et.al. | 2409.10385 | link |
2024-09-16 | Stochastic Control of UAVs: An Optimal Tradeoff between Performance, Flight Smoothness and Control Effort | George Rapakoulias et.al. | 2409.10369 | null |
2024-09-16 | Taming Diffusion Models for Image Restoration: A Review | Ziwei Luo et.al. | 2409.10353 | null |
2024-09-16 | Fairness, not Emotion, Drives Socioeconomic Decision Making | Rudra Mukhopadhyay et.al. | 2409.10322 | null |
2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
2024-09-16 | RealDiff: Real-world 3D Shape Completion using Self-Supervised Diffusion Models | Başak Melis Öcal et.al. | 2409.10180 | null |
2024-09-16 | PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion | Peng Li et.al. | 2409.10141 | null |
2024-09-16 | Approximating the signature of Brownian motion for high order SDE simulation | James Foster et.al. | 2409.10118 | null |
2024-09-16 | DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection | Kun Fang et.al. | 2409.10094 | null |
2024-09-16 | MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior | Weijing Tao et.al. | 2409.10090 | link |
2024-09-16 | Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models | Alexander Koch et.al. | 2409.10089 | null |
2024-09-16 | A Riemannian Approach to Ground Metric Learning for Optimal Transport | Pratik Jawanpuria et.al. | 2409.10085 | null |
2024-09-13 | Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | Qingwen Bu et.al. | 2409.09016 | link |
2024-09-13 | A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis | Yohan Poirier-Ginter et.al. | 2409.08947 | null |
2024-09-13 | Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation | Guojun Liang et.al. | 2409.08917 | link |
2024-09-13 | Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling | Nebiyou Yismaw et.al. | 2409.08906 | null |
2024-09-13 | Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control | Carles Domingo-Enrich et.al. | 2409.08861 | null |
2024-09-13 | InstantDrag: Improving Interactivity in Drag-based Image Editing | Joonghyuk Shin et.al. | 2409.08857 | null |
2024-09-13 | DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s) | Yun Su Jeong et.al. | 2409.08850 | null |
2024-09-13 | Measure-Theoretic Time-Delay Embedding | Jonah Botvinick-Greenhouse et.al. | 2409.08768 | link |
2024-09-13 | DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset | Jiawei Du et.al. | 2409.08731 | null |
2024-09-13 | Asymptotics for Random Quadratic Transportation Costs | Martin Huesmann et.al. | 2409.08612 | null |
2024-09-13 | Finite-time thermodynamic bounds and tradeoff relations for information processing | Takuya Kamijima et.al. | 2409.08606 | null |
2024-09-13 | STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment | Yong Ren et.al. | 2409.08601 | null |
2024-09-13 | LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling | Yubo Huang et.al. | 2409.08583 | null |
2024-09-13 | DiffFAS: Face Anti-Spoofing via Generative Diffusion Models | Xinxu Ge et.al. | 2409.08572 | link |
2024-09-13 | Think Twice Before You Act: Improving Inverse Problem Solving With MCMC | Yaxuan Zhu et.al. | 2409.08551 | null |
2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278 | null |
2024-09-12 | DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer | Runjia Li et.al. | 2409.08271 | null |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269 | null |
2024-09-12 | Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Yifu Chen et.al. | 2409.08260 | link |
2024-09-12 | Improving Virtual Try-On with Garment-focused Diffusion Models | Siqi Wan et.al. | 2409.08258 | null |
2024-09-12 | LoRID: Low-Rank Iterative Diffusion for Adversarial Purification | Geigh Zollicoffer et.al. | 2409.08255 | null |
2024-09-12 | Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding | Hongyu Li et.al. | 2409.08251 | null |
2024-09-12 | IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation | Yinwei Wu et.al. | 2409.08240 | null |
2024-09-12 | How can the tragedy of the commons be prevented?: Introducing Linear Quadratic Mixed Mean Field Games | Gokce Dayanikli et.al. | 2409.08235 | null |
2024-09-12 | LT3SD: Latent Trees for 3D Scene Diffusion | Quan Meng et.al. | 2409.08215 | null |
2024-09-12 | VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis | Hao Chen et.al. | 2409.08207 | null |
2024-09-12 | MagicStyle: Portrait Stylization Based on Reference Image | Zhaoli Deng et.al. | 2409.08156 | null |
2024-09-12 | EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance | Zicheng Duan et.al. | 2409.08091 | null |
2024-09-12 | Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation | Junsung Lee et.al. | 2409.08077 | null |
2024-09-12 | AI-accelerated discovery of high critical temperature superconductors | Xiao-Qi Han et.al. | 2409.08065 | null |
2024-09-11 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | Haibo Yang et.al. | 2409.07454 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process | Yang Luo et.al. | 2409.07451 | null |
2024-09-11 | Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging | Yunzhen Wang et.al. | 2409.07417 | null |
2024-09-11 | Training-Free Guidance for Discrete Diffusion Models for Molecular Generation | Thomas J. Kerby et.al. | 2409.07359 | null |
2024-09-11 | Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | Eugenio Chisari et.al. | 2409.07343 | null |
2024-09-11 | Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models | Fengzhe Zhang et.al. | 2409.07323 | null |
2024-09-11 | Exploring User-level Gradient Inversion with a Diffusion Prior | Zhuohang Li et.al. | 2409.07291 | null |
2024-09-11 | CCFExp: Facial Image Synthesis with Cycle Cross-Fusion Diffusion Model for Facial Paralysis Individuals | Weixiang Gao et.al. | 2409.07271 | link |
2024-09-11 | Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models | Sanoojan Baliah et.al. | 2409.07269 | link |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | null |
2024-09-12 | Alignment of Diffusion Models: Fundamentals, Challenges, and Future | Buhua Liu et.al. | 2409.07253 | link |
2024-09-11 | Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning | Yingling Lu et.al. | 2409.07238 | link |
2024-09-11 | Phy124: Fast Physics-Driven 4D Content Generation from a Single Image | Jiajing Lin et.al. | 2409.07179 | null |
2024-09-11 | Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models | Jiahang Cao et.al. | 2409.07163 | null |
2024-09-10 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Teng Hu et.al. | 2409.06633 | null |
2024-09-10 | One-Shot Imitation under Mismatched Execution | Kushal Kedia et.al. | 2409.06615 | null |
2024-09-10 | Modelling Global Trade with Optimal Transport | Thomas Gaskin et.al. | 2409.06554 | link |
2024-09-10 | Robust financial calibration: a Bayesian approach for neural SDEs | Christa Cuchiero et.al. | 2409.06551 | link |
2024-09-10 | Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models | Xin Jing et.al. | 2409.06451 | null |
2024-09-10 | Robust semi-parametric signal detection in particle physics with classifiers decorrelated via optimal transport | Purvasha Chakravarti et.al. | 2409.06399 | null |
2024-09-10 | Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition | Junzheng Zhang et.al. | 2409.06371 | null |
2024-09-10 | What happens to diffusion model likelihood when your model is conditional? | Mattias Cross et.al. | 2409.06364 | null |
2024-09-10 | DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement | Jia-Wei Liao et.al. | 2409.06355 | null |
2024-09-10 | Geometry of the Space of Partitioned Networks: A Unified Theoretical and Computational Framework | Stephen Y Zhang et.al. | 2409.06302 | link |
2024-09-10 | Multi-Source Music Generation with Latent Diffusion | Zhongweiyang Xu et.al. | 2409.06190 | link |
2024-09-10 | MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control | Yining Yao et.al. | 2409.06189 | null |
2024-09-10 | EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation | Nischal Khanal et.al. | 2409.06183 | link |
2024-09-09 | Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer | Michele Mancusi et.al. | 2409.06096 | null |
2024-09-09 | SVS-GAN: Leveraging GANs for Semantic Video Synthesis | Khaled M. Seyam et.al. | 2409.06074 | null |
2024-09-09 | Enhancing Preference-based Linear Bandits via Human Response Time | Shen Li et.al. | 2409.05798 | null |
2024-09-09 | Vector Quantized Diffusion Model Based Speech Bandwidth Extension | Yuan Fang et.al. | 2409.05784 | null |
2024-09-09 | AS-Speech: Adaptive Style For Speech Synthesis | Zhipeng Li et.al. | 2409.05730 | null |
2024-09-09 | Distributionally Robust Stochastic Data-Driven Predictive Control with Optimized Feedback Gain | Ruiqi Li et.al. | 2409.05727 | null |
2024-09-09 | Quantitative approximation of stochastic kinetic equations: from discrete to continuum | Zimo Hao et.al. | 2409.05706 | null |
2024-09-09 | pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning | Jiahao Lai et.al. | 2409.05701 | null |
2024-09-09 | Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models | Aakash Sen Sharma et.al. | 2409.05668 | null |
2024-09-09 | Forward KL Regularized Preference Optimization for Aligning Diffusion Policies | Zhao Shan et.al. | 2409.05622 | null |
2024-09-09 | CipherDM: Secure Three-Party Inference for Diffusion Model Sampling | Xin Zhao et.al. | 2409.05414 | null |
2024-09-09 | Sequential Posterior Sampling with Diffusion Models | Tristan S. W. Stevens et.al. | 2409.05399 | null |
2024-09-09 | TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors | Yichuan Mo et.al. | 2409.05294 | link |
2024-09-08 | The Stochastic Gause predator-prey model: noise-induced extinctions and invariance | Leon Alexander Valencia et.al. | 2409.05237 | null |
2024-09-08 | Nuclear transparencies with a two step process of the $A(e,e'\pi^+)$ reactions | Tae Keun Choi et.al. | 2409.05129 | null |
2024-09-08 | Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule | Siyi Wang et.al. | 2409.05116 | null |
2024-09-08 | A Survey on Diffusion Models for Recommender Systems | Jianghao Lin et.al. | 2409.05033 | link |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429 | null |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424 | null |
2024-09-06 | How Fair is Your Diffusion Recommender Model? | Daniele Malitesta et.al. | 2409.04339 | null |
2024-09-06 | Random effects estimation in a fractional diffusion model based on continuous observations | Nesrine Chebli et.al. | 2409.04331 | null |
2024-09-06 | Probabilistic Representation for Viscosity Solutions to Double-Obstacle Quasi-Variational Inequalities | Magnus Perninge et.al. | 2409.04207 | null |
2024-09-06 | Breaking the Brownian Barrier: Models and Manifestations of Molecular Diffusion in Complex Fluids | Harish Srinivasan et.al. | 2409.04199 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | null |
2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
2024-09-06 | A policy iteration algorithm for non-Markovian control problems | Dylan Possamaï et.al. | 2409.04037 | null |
2024-09-06 | One-Shot Diffusion Mimicker for Handwritten Text Generation | Gang Dai et.al. | 2409.04004 | link |
2024-09-06 | DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Jianbiao Mei et.al. | 2409.04003 | link |
2024-09-05 | Data-Efficient Generation for Dataset Distillation | Zhe Li et.al. | 2409.03929 | null |
2024-09-05 | Generating High Dimensional User-Specific Wireless Channels using Diffusion Models | Taekyun Lee et.al. | 2409.03924 | null |
2024-09-05 | Neural Entropy | Akhil Premkumar et.al. | 2409.03817 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
2024-09-05 | ArtiFade: Learning to Generate High-quality Subject from Blemished Images | Shuya Yang et.al. | 2409.03745 | null |
2024-09-05 | Quantum optimal transport with convex regularization | Emanuele Caputo et.al. | 2409.03698 | null |
2024-09-05 | RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images | Benzhi Wang et.al. | 2409.03644 | null |
2024-09-05 | DiffEVC: Any-to-Any Emotion Voice Conversion with Expressive Guidance | Hsing-Hang Chou et.al. | 2409.03636 | null |
2024-09-05 | TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces | Bernardo Biesseck et.al. | 2409.03600 | link |
2024-09-05 | DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture | Qianlong Xiang et.al. | 2409.03550 | null |
2024-09-05 | On the mean field limit of consensus based methods | Marvin Koß et.al. | 2409.03518 | null |
2024-09-05 | Blended Latent Diffusion under Attention Control for Real-World Video Editing | Deyin Liu et.al. | 2409.03514 | null |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | Recursive Quantization for $\mathcal{L}_2$ Stabilization of a Finite Capacity Stochastic Control Loop with Intermittent State Observations | Shrija Karmakar et.al. | 2409.03398 | null |
2024-09-05 | Enhancing User-Centric Privacy Protection: An Interactive Framework through Diffusion Models and Machine Unlearning | Huaxi Huang et.al. | 2409.03326 | null |
2024-09-05 | SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model | Weipeng Tan et.al. | 2409.03270 | null |
2024-09-05 | RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry | Zhaowei Wang et.al. | 2409.03198 | null |
2024-09-04 | Spatial Diffusion for Cell Layout Generation | Chen Li et.al. | 2409.03106 | link |
2024-09-04 | HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | Xinyu Liu et.al. | 2409.02919 | link |
2024-09-04 | Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | Kaiwen Zheng et.al. | 2409.02908 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851 | link |
2024-09-04 | Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model | Tornike Karchkhadze et.al. | 2409.02845 | null |
2024-09-04 | Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects | Kyungmin Jo et.al. | 2409.02653 | null |
2024-09-04 | MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos | Junyi Ma et.al. | 2409.02638 | null |
2024-09-04 | Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency | Jianwen Jiang et.al. | 2409.02634 | null |
2024-09-04 | Rate-Adaptive Generative Semantic Communication Using Conditional Diffusion Models | Pujing Yang et.al. | 2409.02597 | null |
2024-09-04 | Solving Video Inverse Problems Using Image Diffusion Models | Taesung Kwon et.al. | 2409.02574 | null |
2024-09-04 | StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models | Wen Li et.al. | 2409.02543 | link |
2024-09-04 | Sample what you can't compress | Vighnesh Birodkar et.al. | 2409.02529 | null |
2024-09-04 | Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal | Jifeng Hu et.al. | 2409.02512 | link |
2024-09-04 | Demographic parity in regression and classification within the unawareness framework | Vincent Divol et.al. | 2409.02471 | null |
2024-09-04 | Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis | Aishwarya Agarwal et.al. | 2409.02429 | null |
2024-09-04 | Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering | Peng Wang et.al. | 2409.02426 | link |
2024-08-30 | Subspace Diffusion Posterior Sampling for Travel-Time Tomography | Xiang Cao et.al. | 2408.17333 | null |
2024-08-30 | Likelihood estimation for stochastic differential equations with mixed effects | Fernando Baltazar-Larios et.al. | 2408.17257 | null |
2024-08-30 | The random periodic solutions for McKean-Vlasov stochastic differential equations | Jianhai Bao et.al. | 2408.17242 | null |
2024-08-30 | A methodological framework for Resilience as a Service (RaaS) in multimodal urban transportation networks | Sara Jaber et.al. | 2408.17233 | null |
2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
2024-08-30 | Text-to-Image Generation Via Energy-Based CLIP | Roy Ganz et.al. | 2408.17046 | null |
2024-08-30 | High-fidelity holographic beam shaping with optimal transport and phase diversity | Hunter Swan et.al. | 2408.17025 | null |
2024-08-30 | Contrastive Learning with Synthetic Positives | Dewen Zeng et.al. | 2408.16965 | link |
2024-09-02 | Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis | Theodoros Kouzelis et.al. | 2408.16845 | null |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767 | null |
2024-09-04 | CSGO: Content-Style Composition in Text-to-Image Generation | Peng Xing et.al. | 2408.16766 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-09-02 | RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model | Zhuan Shi et.al. | 2408.16634 | null |
2024-08-29 | A Score-based Generative Solver for PDE-constrained Inverse Problems with Complex Priors | Yankun Hong et.al. | 2408.16626 | null |