Daily Papers

by AK and the research community

Jun 24

LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real-world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, this approach suffers from the high computational costs of graph partitioning algorithms and inefficient communication across GPUs. To address these overheads, we propose the Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN), a storage-based approach to train GNN models that utilizes a novel communication layer enabling GPU software caches to function as a system-wide shared cache with low overheads. LSM-GNN incorporates a hybrid eviction policy that intelligently manages cache space by using both static and dynamic node information to significantly enhance cache performance. Furthermore, we introduce the Preemptive Victim-buffer Prefetcher (PVP), a mechanism for prefetching node feature data from a Victim Buffer located in CPU pinned memory to further reduce the pressure on the storage devices. Experimental results show that despite the lower compute capabilities and memory capacities, LSM-GNN in a single node with two GPUs offers superior performance over a two-node, four-GPU Dist-DGL baseline and provides up to 3.75x speedup on end-to-end epoch time while running large-scale GNN training.

6 authors · Jul 21, 2024

LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation

Conventional medical image segmentation methods have been found inadequate in helping physicians identify specific lesions for diagnosis and treatment. Given the utility of text as an instructional format, we introduce a novel task termed Medical Image Referring Segmentation (MIRS), which requires segmenting specified lesions in images based on the given language expressions. Due to the varying object scales in medical images, MIRS demands robust vision-language modeling and comprehensive multi-scale interaction for precise localization and segmentation under linguistic guidance. However, existing medical image segmentation methods fall short in meeting these demands, resulting in insufficient segmentation accuracy. In response, we propose an approach named Language-guided Scale-aware MedSegmentor (LSMS), incorporating two appealing designs: (1) a Scale-aware Vision-Language Attention module that leverages diverse convolutional kernels to acquire rich visual knowledge and interact closely with linguistic features, thereby enhancing lesion localization capability; (2) a Full-Scale Decoder that globally models multi-modal features across various scales, capturing complementary information between scales to accurately outline lesion boundaries. Addressing the lack of suitable datasets for MIRS, we constructed a vision-language medical dataset called Reference Hepatic Lesion Segmentation (RefHL-Seg). This dataset comprises 2,283 abdominal CT slices from 231 cases, with corresponding textual annotations and segmentation masks for various liver lesions in images. We validated the performance of LSMS for MIRS and conventional medical image segmentation tasks across various datasets. LSMS consistently outperforms competing methods on all datasets with lower computational costs. The code and datasets will be released.

7 authors · Aug 30, 2024

Characterize LSM-tree Compaction Performance via On-Device LLM Inference

Modern key-value storage engines built on Log-Structured Merge-trees (LSM-trees), such as RocksDB and LevelDB, rely heavily on the performance of their compaction operations, which are impacted by a complex set of interdependent configuration parameters. Manually tuning these parameters for optimal performance demands considerable expertise, while traditional auto-tuning approaches struggle with the enormous search space and low sample efficiency inherent to this domain. In recent years, Large Language Models (LLMs) have demonstrated strong capabilities in code generation and logical reasoning, offering new possibilities for system optimization. However, applying LLMs to real-time compaction tuning in such latency-sensitive environments is a double-edged sword. While large-scale LLMs can offer superior reasoning for strategy generation, their high inference latency and computational cost make them impractical for interactive, low-latency tuning. In contrast, small-scale LLMs achieve low latency but often at the expense of reasoning accuracy and tuning effectiveness. In this paper, we first evaluate this trade-off by analyzing the compaction-tuning performance and inference latency of LLMs at different scales in an LSM-tree-based tuning case. We then characterize the performance of LSM-tree on RocksDB v8.8.1, with a focus on adjusting the key compaction-related parameters under db_bench workloads. Our experimental results show a clear positive correlation between model capability and tuning effectiveness.

5 authors · Feb 12