The trend of multi-core processor development brings a paradigm shift in application development. Traditionally, increasing the clock frequency was one of the main ways for conventional processors to achieve higher performance. Application developers could improve the performance of their applications simply by waiting for faster processor platforms. Today, increasing the clock frequency has reached a point of diminishing returns, and even negative returns once power consumption is taken into account. Multi-core processors promise a power-efficient way to increase performance and have become prevalent in vendors’ solutions. However, the application and algorithm development process must change significantly in order to fully exploit the potential of multi-core processors. The aim of this special issue is to discuss the related challenges, issues, case studies, and solutions, with a particular focus on multimedia applications, architectures, and programming environments.

This special issue opens with three papers that address multi-core architectures as a whole, efficient on-chip interconnects between the cores, and the cache and memory subsystems needed to sustain the large amount of computation.

  • To implement video codecs efficiently, “An Architecture for Programmable Multi-core IP Accelerated Platform with an Advanced Application of H.264 Codec Implementation” by Qiu et al. presents an architecture that incorporates multiple programmable IP blocks to handle the computationally intensive components. To achieve high performance and flexibility, the authors introduce the concept of a virtual socket, so that the programmable IP blocks can run concurrently and independently of each other. The architecture is extensible and can thus be used to deploy multiple video codecs.

  • When multiple processors (or processing units) share the same chip, an efficient communication mechanism between them is essential. In addition to performance, power efficiency is critical as well. “Power Dissipation of the Network-on-Chip in Multi-Processor System-on-Chip Dedicated for Video Coding Applications” by Milojevic et al. studies the power dissipated by the Network-on-Chip (NoC) in such a multi-processor system-on-chip targeting video coding.

  • As chip multiprocessors (CMPs) gain computational capability, memory subsystem performance becomes critical. “Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors” by Molnos et al. presents a scheme that partitions the cache among multiple tasks running simultaneously on a CMP. Furthermore, the scheme dynamically adjusts the partition sizes when the application scenario changes. The scheme offers a significant performance advantage over a conventional shared cache.

Besides architecture-related challenges, there are many algorithm-related challenges. It is critical to understand the complexity of developing a new application or porting an existing application onto a multi-core processor. This special issue continues with three papers that address how to exploit multi-core architectures efficiently for multimedia applications, including design examples and a design methodology that can be generalized to other applications.

  • For future systems with hundreds of cores, we must consider whether applications contain enough parallelism. “Parallel Scalability of Video Decoders” by Meenderinck et al. analyzes the H.264 decoder, which has conventionally been considered hard to parallelize, and shows that, in theory, even mobile video would expose enough parallelism if decoded using a dynamic three-dimensional wavefront approach.

  • To increase parallelism, it is important to break data dependencies. To achieve the best compression quality, H.264/AVC incorporates many algorithms with heavy data dependencies. In “A Multi-core Architecture based Parallel Framework for H.264/AVC Deblocking Filters,” Wang et al. carefully review the deblocking filter algorithm and observe that the result of each deblocking filtering step affects only a limited region of pixels. Building on this observation, they propose a novel algorithm that exploits the resulting parallelism.

  • “Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors” by Li et al. demonstrates that properly parallelizing media mining workloads is key to effectively utilizing existing small-scale multi-core processors and future large-scale multi-core platforms, but that doing so requires extra effort. Performance analysis is an important part of parallelization: once the performance bottlenecks are identified, various optimization techniques can be applied. Although the authors focus on emerging media mining workloads, the methodology presented in the paper is also applicable to other applications.
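To make the wavefront idea from the Meenderinck et al. summary more concrete, the following minimal sketch computes the simpler two-dimensional (intra-frame) macroblock wavefront; it does not model the dynamic three-dimensional, inter-frame approach that paper proposes. It assumes, as is commonly done for H.264, that each macroblock depends only on its left, top-left, top, and top-right neighbours. The function name `wavefront_schedule` is hypothetical and used for illustration only.

```python
from collections import defaultdict


def wavefront_schedule(width_mbs, height_mbs):
    """Group macroblock (x, y) coordinates into wavefront steps.

    Assumption: macroblock (x, y) depends on its left (x-1, y), top-left
    (x-1, y-1), top (x, y-1) and top-right (x+1, y-1) neighbours. Under that
    assumption the earliest step at which it can be decoded is x + 2*y, and
    all macroblocks that share a step are mutually independent.
    """
    steps = defaultdict(list)
    for y in range(height_mbs):
        for x in range(width_mbs):
            steps[x + 2 * y].append((x, y))
    # Return the steps in execution order; each inner list may run in parallel.
    return [steps[t] for t in sorted(steps)]


if __name__ == "__main__":
    # A 1920x1088 frame holds 120 x 68 macroblocks of 16x16 pixels.
    schedule = wavefront_schedule(120, 68)
    print("steps:", len(schedule))                                  # steps: 254
    print("max parallelism:", max(len(step) for step in schedule))  # max parallelism: 60
```

Under these assumptions, at most 60 macroblocks of a 1080p frame can be decoded at once, and only during the middle steps; the first and last steps expose a single macroblock each. This bounded intra-frame parallelism is one reason to look beyond a single frame, as a three-dimensional wavefront does.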

Furthermore, programming for multi-core architectures differs from programming for other kinds of parallel architectures. Proper support from programming environments and tools is important for exploiting parallelism with good performance, scalability, and correctness.

  • The complexity of modern multiprocessor system-on-chip (MPSoC) platforms makes it increasingly important to employ structured design methods and associated tools. “Design and Tool Flow of Multimedia MPSoC Platforms” by De Sutter et al. addresses these considerations for the multimedia domain. The authors develop a design methodology based on identifying the core components of multimedia MPSoC platforms and applying a tool flow centered on the configuration and integration of these components. The approach is demonstrated concretely on two relevant applications from the targeted domain.

  • “Combining Coarse-Grained Software Pipelining with DVS for Scheduling Real-Time Periodic Dependent Tasks on Multi-Core Embedded Systems” by Liu et al. combines multi-core technology with dynamic voltage scaling (DVS) to schedule dependent groups of tasks so that energy consumption is minimized under a given performance constraint. The method first applies retiming to transform the problem of scheduling dependent groups of tasks into a problem of scheduling independent tasks, which increases scheduling flexibility. Algorithms are then developed for DVS-integrated scheduling on both single-processor and multi-core architectures.

  • “Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications” by Girodias et al. analyzes the impact of multi-core and simultaneous multithreading environments on memory optimization techniques. To get the best efficiency from multi-core or multi-threaded processors, some existing memory optimization techniques must be improved. The best technique for multi-threaded processors is not necessarily the best for multi-core processors, and the best technique for one application is not necessarily the best for another.

This special issue concludes with an application case study that illustrates a novel use of emerging multi-core processors.

  • Graphics processing units (GPUs) form a class of domain-specific processor architectures that is receiving attention in various embedded application domains beyond the graphics domain for which they were originally designed. “Real-time visual tracker by Stream processing” by Lozano et al. develops a novel GPU-based computer vision system for real-time tracking of objects in video sequences. The work presents new techniques for improving the performance of particle-filter-based tracking algorithms and provides a detailed case study of applying GPU-based acceleration to a practical computer vision system.

We hope that this special issue provides the community with timely and insightful information on the design and implementation of multimedia applications on multi-core platforms. We would like to thank the authors for their excellent contributions. We also thank the reviewers for their constructive comments.