Eliminating The Middleman: Peer-to-Peer Dataflow
Adam Barker
National e-Science Centre
University of Edinburgh
a.d.barker@ed.ac.uk
Jon B. Weissman
University of Minnesota,
Minneapolis, MN, USA.
jon@cs.umn.edu
Jano van Hemert
National e-Science Centre
University of Edinburgh
j.vanhemert@ed.ac.uk
ABSTRACT
Efficiently executing large-scale, data-intensive workflows such
as Montage must take into account the volume and pattern
of communication. When orchestrating data-centric workflows,
centralised servers common to standard workflow systems
can become a bottleneck to performance. However,
standards-based workflow systems that rely on centralisation,
e.g., Web service based frameworks, have many other
benefits such as a wide user base and sustained support.
This paper presents and evaluates a light-weight hybrid
architecture which maintains the robustness and simplicity
of centralised orchestration, but facilitates choreography by
allowing services to exchange data directly with one another.
Furthermore, our architecture is standards compliant, flexible
and non-disruptive; service definitions do
not have to be altered prior to enactment. Our architecture
could be realised within any existing workflow framework;
in this paper, we focus on a Web service based framework.
Taking inspiration from Montage, a number of common
workflow patterns (sequence, fan-in and fan-out), input to
output data size relationships and network configurations
are identified and evaluated. The performance analysis concludes
that a substantial reduction in communication overhead
results in a 2- to 4-fold performance benefit across all patterns.
An end-to-end pattern through the Montage workflow
results in an 8-fold performance benefit and demonstrates
how the advantage of using our hybrid architecture increases
as the complexity of a workflow grows.
Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed
Systems; C.4 [Performance of Systems]; D.2.11 [Software
Engineering]: Software Architectures
General Terms
Design, Performance.
Keywords
Decentralised orchestration, workflow optimisation.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
HPDC'08, June 23-27, 2008, Boston, Massachusetts, USA.
Copyright 2008 ACM 978-1-59593-997-5/08/06 ...$5.00.
1. INTRODUCTION
Efficiently executing large-scale, data-intensive workflows
common to scientific applications must take into account
the volume and pattern of communication. For example, in
Montage [7], an all-sky mosaic computation can require between
2 and 8 TB of data movement. Standard workflow tools
based on a centralised enactment engine, such as Taverna
[19] and OMII BPEL Designer [18], can easily become a performance
bottleneck for such applications: extra copies of
the data (intermediate data) are sent, consuming network
bandwidth and overwhelming the central engine. Instead, a solution
is desired that permits data output from one stage to
be forwarded directly to where it is needed at the next stage
in the workflow. It is certainly possible to develop an optimised
workflow system from scratch that implements this
kind of optimisation. In contrast, workflow systems based on
concrete industrial standards offer a different set of benefits:
they have a much larger and wider user base, which allows
the leverage of a greater availability of supported tools and
application components. This paper explores the extent to
which the benefits of each approach can be realised. Can
a standards-based workflow system achieve the performance
optimisations of custom systems and what are the tradeoffs?
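To make the cost of the extra intermediate copies concrete, the following is a minimal sketch of a sequence workflow. It is an illustrative model only and not part of the evaluation in this paper. It assumes that the centralised engine supplies the initial input and receives the final result in both cases, that control messages are negligible in size, and that every transfer is counted once per network hop. The function names and the `sizes` parameter are hypothetical.

```python
def centralised_bytes(sizes):
    """Data moved when every intermediate output passes through the engine.

    sizes[0] is the initial input sent by the engine; sizes[i] is the
    output of stage i. Assumes at least one stage (len(sizes) >= 2).
    """
    total = sizes[0]              # engine -> first service
    for out in sizes[1:-1]:
        total += 2 * out          # service -> engine, then engine -> next service
    total += sizes[-1]            # last service -> engine
    return total


def direct_bytes(sizes):
    """Data moved when each service forwards its output to the next stage."""
    return sum(sizes)             # every item crosses the network exactly once


if __name__ == "__main__":
    # Three-stage sequence where each stage emits as much data as it receives.
    sizes = [100, 100, 100, 100]
    print(centralised_bytes(sizes), direct_bytes(sizes))
```

Under these assumptions only the intermediate outputs are doubled, so the gap between the two totals widens as more stages are added. This toy count covers data volume only and says nothing about latency or engine load.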
1.1 Orchestration and Choreography
There are two common architectural approaches to implementing
...
...