Article, 2024

QoS-aware edge AI placement and scheduling with multiple implementations in FaaS-based edge computing

Future Generation Computer Systems, ISSN 1872-7115, 0167-739X, Volume 157, Pages 250-263, 10.1016/j.future.2024.03.035

Contributors

  • Hudson, Nathaniel C 0000-0001-7474-2689 (Corresponding author) [1] [2]
  • Khamfroush, Hana [3]
  • Baughman, Matt [1]
  • Lucani, Daniel Enrique 0000-0001-5325-8863 [4]
  • Chard, Kyle 0000-0002-7370-4805 [1] [2]
  • Foster, Ian 0000-0003-2129-5269 [1] [2]

Affiliations

  1. [1] University of Chicago [NORA names: United States; America, North; OECD]
  2. [2] Argonne National Laboratory [NORA names: United States; America, North; OECD]
  3. [3] University of Kentucky [NORA names: United States; America, North; OECD]
  4. [4] Aarhus University [NORA names: AU Aarhus University; University; Denmark; Europe, EU; Nordic; OECD]

Abstract

Resource constraints on the computing continuum require that we make smart decisions for serving AI-based services at the network edge. AI-based services typically have multiple implementations (e.g., image classification implementations include SqueezeNet, DenseNet, and others) with varying trade-offs (e.g., latency and accuracy). The question then is how AI-based services should be placed across Function-as-a-Service (FaaS)-based edge computing systems in order to maximize total Quality-of-Service (QoS). To address this question, we formulate a problem that jointly addresses (i) edge AI service placement and (ii) request scheduling. These decisions are made on two time scales (one for placement and one for scheduling). We first cast the problem as an integer linear program. We then decompose the problem into separate placement and scheduling subproblems and prove that both are NP-hard. We then propose a novel placement algorithm that places services while accounting for device-to-device communication that lets edge clouds offload requests to one another. Our results show that the proposed placement algorithm outperforms a state-of-the-art placement algorithm for AI-based services, as well as other baseline heuristics, with regard to maximizing total QoS. Additionally, we present a federated learning-based framework, FLIES, to predict future incoming service requests and their QoS requirements. Our results also show that FLIES outperforms a standard decentralized learning baseline for predicting incoming requests and achieves predictive performance comparable to centralized training.
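
As a rough illustration of the placement-then-scheduling structure described in the abstract, the sketch below solves a toy instance by exhaustive search. It is not the paper's integer linear program or its placement algorithm: it assumes a single edge node with a storage budget, ignores the two time scales and request offloading between edge clouds, and uses a made-up QoS score. All names and numbers (`IMPLS`, `REQUESTS`, `qos`, `schedule`, `best_placement`) are hypothetical.

```python
from itertools import combinations

# Hypothetical implementations: (service, name) -> (accuracy, latency_ms, storage_units)
IMPLS = {
    ("classify", "small"): (0.60, 20, 1),
    ("classify", "large"): (0.90, 80, 3),
    ("detect", "small"): (0.50, 30, 2),
}

# Hypothetical requests: (service, min_accuracy, max_latency_ms)
REQUESTS = [
    ("classify", 0.80, 100),
    ("classify", 0.50, 30),
    ("detect", 0.40, 50),
]


def qos(impl, request):
    """Toy QoS (an assumption): 0.5 for each threshold the implementation meets."""
    accuracy, latency, _ = IMPLS[impl]
    _, min_acc, max_lat = request
    return 0.5 * (accuracy >= min_acc) + 0.5 * (latency <= max_lat)


def schedule(placed, requests):
    """Scheduling step: serve each request with the best placed implementation."""
    total = 0.0
    for req in requests:
        candidates = [i for i in placed if i[0] == req[0]]
        # A request whose service has no placed implementation earns zero QoS.
        total += max((qos(i, req) for i in candidates), default=0.0)
    return total


def best_placement(capacity, requests):
    """Placement step: try every subset of implementations that fits in storage."""
    best = (0.0, ())
    impls = sorted(IMPLS)
    for k in range(len(impls) + 1):
        for placed in combinations(impls, k):
            if sum(IMPLS[i][2] for i in placed) <= capacity:
                score = schedule(placed, requests)
                if score > best[0]:
                    best = (score, placed)
    return best


if __name__ == "__main__":
    print(best_placement(3, REQUESTS))
    print(best_placement(6, REQUESTS))
```

With a budget of 3 storage units the search prefers the two small implementations over the single large classifier, because covering both services is worth more total QoS than satisfying one strict accuracy requirement. Exhaustive search is only feasible for instances this small; since the abstract states that both subproblems are NP-hard, larger instances call for heuristics.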

Keywords

AI-based services, FaaS, Function-as-a-Service, NP-hard, QoS, QoS requirements, algorithm, baseline, baseline heuristic, centralized training, cloud, communication, computer, computing continuum, constraints, continuum, decision, device-to-device communication, edge, edge cloud, edge computing, FLIES, FLIES algorithm, framework, heuristics, implementation, incoming, incoming requests, incoming service requests, integer, integer linear programming, learning baselines, learning-based framework, linear programming, multiple implementations, network, network edge, novel placement algorithm, offloading requests, performance, placement, placement algorithm, predictive performance, problem, program, proposed placement algorithm, quality-of-service, requests, requirements, resource constraints, resources, results, scheduling, scheduling subproblem, separate placement, service placement, service requests, services, state-of-the-art placement algorithms, subproblems, time-scales, trade-offs, training

Funders

  • Directorate for Computer & Information Science & Engineering
  • United States Department of Energy

Data Provider: Digital Science