arXiv.org on Hacker News

86 | Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP | 27 | 4 May
130 | The Matrix: A Bayesian learning model for LLMs | 10 | 4 May
37 | CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions | 1 | 4 May
76 | StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | 0 | 3 May
298 | Better and Faster Large Language Models via Multi-Token Prediction | 126 | 1 May
19 | Iterative reasoning preference optimization | 4 | 1 May
76 | Building a Large Japanese Web Corpus for Large Language Models | 29 | 30 Apr
46 | Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Models | 4 | 30 Apr
33 | RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation | 3 | 30 Apr
181 | LoRA+: Efficient Low Rank Adaptation of Large Models | 46 | 28 Apr
15 | Step Differences in Instructional Video | 0 | 28 Apr
159 | Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | 32 | 27 Apr
73 | Relational Graph Convolutional Networks for Sentiment Analysis | 3 | 26 Apr
78 | One bad apple can spoil your IPv6 privacy (2022) | 59 | 26 Apr
48 | CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | 4 | 25 Apr
100 | Quaternion Knowledge Graph Embeddings (2019) | 41 | 25 Apr
135 | Removing Reflections from RAW Photos | 30 | 24 Apr
126 | Claude 3 beats Google Translate | 118 | 23 Apr
411 | Phi-3 Technical Report | 129 | 23 Apr
128 | FPGA Architecture for Deep Learning: Survey and Future Directions | 52 | 22 Apr
77 | Survey Study on AI Agent Architectures (2024) | 16 | 22 Apr
62 | Many-Shot In-Context Learning | 1 | 22 Apr
47 | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | 3 | 22 Apr
136 | Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding | 23 | 21 Apr
45 | Eight Transaction Papers by Jim Gray | 9 | 19 Apr
124 | Chinchilla Scaling: A replication attempt | 68 | 18 Apr
92 | Collapse of self-trained language models | 30 | 17 Apr
88 | The Ballmer Peak: An Empirical Search | 24 | 17 Apr
168 | Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | 28 | 16 Apr
124 | ResearchAgent: Iterative Research Idea Generation Using LLMs | 63 | 16 Apr
38 | We have no idea how models will behave in production until production | 3 | 15 Apr
29 | ChatGPT Can Predict the Future Telling Stories Set in the Future About the Past | 8 | 14 Apr
13 | Mechanics of Next Token Prediction with Self-Attention | 0 | 13 Apr
119 | Your LLM Is a Capable Regressor When Given In-Context Examples | 36 | 13 Apr
59 | Fine-Tuning Increases LLM Vulnerabilities and Risk | 33 | 11 Apr
14 | Autonomous LLM agents with human-out-of-loop | 8 | 11 Apr
39 | Leave No Context Behind: Efficient Infinite Context Transformers | 4 | 11 Apr
24 | Toward Inference-Optimal Mixture-of-Expert Large Language Models | 0 | 10 Apr
16 | A Survey on Red Teaming for Generative Models | 0 | 10 Apr
71 | Evaluating faithfulness and content selection of LLMs in book-length summaries | 7 | 9 Apr
21 | AI consciousness is inevitable: A theoretical computer science perspective | 54 | 9 Apr
104 | Social Skill Training with Large Language Models | 100 | 9 Apr
53 | Apple Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | 7 | 9 Apr
52 | Direct Nash Optimization: Teaching language models to self-improve | 11 | 8 Apr
71 | Nightfall: Can Kalgash Exist (2014) | 10 | 8 Apr
281 | Mixture-of-Depths: Dynamically allocating compute in transformers | 83 | 7 Apr
29 | Rendering string diagrams recursively [pdf] | 4 | 7 Apr
54 | Sophia: Scalable Stochastic 2nd-Order Optimizer for Language Model Pre-Training | 2 | 7 Apr
288 | More Agents Is All You Need: LLMs performance scales with the number of agents | 206 | 6 Apr
26 | Long-form factuality in large language models | 16 | 6 Apr