Publications

LOCAL: Low-Complex Mapping Algorithm for Spatial DNN Accelerators
NorCAS2021 - 2021
Many-Core Computing: Hardware and Software - Chapter 6 Hardware and software performance in deep learning
IET - 2019
Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks
ARITH - 2019
Low Complexity Multiply Accumulate Units for Convolutional Neural Networks with Weight-Sharing
ACM TACO - 2018
Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming
CGO - 2018
Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks
IEEE CAL - 2017
Efficient Multibyte Floating Point Data Formats using Vectorization
ACM TOC - 2017
Parallel Multi Channel Convolution using General Matrix Multiplication
IEEE ASAP - 2017
ACACES 2017 - Chapter: Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks
HiPEAC ACACES - 2017
Practical algorithms for finding extremal sets
JEA - 2016
Parallel performance problems on shared-memory multicore systems: a taxonomy and observation
unknown - 2016
Vectorization of Multibyte Floating Point Data Formats
IEEE PACT - 2016
Heuristics on Reachability Trees for Bicriteria Scheduling of Stream Graphs on Heterogeneous Multiprocessor Architectures
unknown - 2015
Automatic Vectorization of Interleaved Data Revisited
ACM TACO - 2015
An evaluation of the suitability of the Movidius Myriad architecture for scientific computing
unknown - 2015
Semi-automatic composition of data layout transformations for loop vectorization
NPC - 2014
Design Considerations for Parallel Performance Tools
unknown - 2014
Orchestrating stream graphs using model checking
unknown - 2013
Minimal unroll factor for code generation of software pipelining
unknown - 2013
Fast Asymmetric Thread Synchronization
unknown - 2013
Compiler Support for Lightweight Context Switching
unknown - 2013
Real-time sensor signal capture from a harsh environment
DS-RT - 2012
Compiler techniques to improve dynamic branch prediction for indirect jump and call instructions
unknown - 2012
Optimizing interpreters by tuning opcode orderings on virtual machines for modern architectures
unknown - 2011
GSFAP adaptive filtering using log arithmetic for resource-constrained embedded systems
ACM Transactions on Embedded Computing Systems - 2010
Comparing Integer Data Structures for 32 and 64-bit Keys
ACM Journal of Experimental Algorithmics - 2010
Comparing integer data structures for 32- and 64-bit keys
unknown - 2010
An Output Sensitive Algorithm for Computing a Maximum Independent Set of a Circle Graph
Information Processing Letters - 2010
A Program Generator for Intel AES-NI Instructions
unknown - 2010
Using The Meeting Graph Framework to Minimise Kernel Loop Unrolling for Scheduled Loops
LCPC - 2009
Streamlining Offload Computing to High Performance Architectures
ICCS - 2009
Portable Just-in-time Specialization of Dynamically Typed Scripting Languages
LCPC - 2009
Efficiently Implementing Maximum Independent Set Algorithms On Circle Graphs
ACM Journal of Experimental Algorithmics - 2009
A practical solution for scripting language reimplementations and compilers
unknown - 2009
A Practical Solution for Scripting Language Compilers
SAC '09: ACM Symposium on Applied Computing (2009) - 2009
Virtual machine showdown: Stack versus registers
ACM TACO - 2008
Optimization strategies for a Java Virtual Machine interpreter on the Cell Broadband Engine
unknown - 2008
Comparing Integer Data Structures for 32 and 64 Bit Keys
Proceedings of WEA 2008 (7th International Workshop on Experimental Algorithms) - 2008
An experimental study of sorting and branch prediction
ACM Journal of Experimental Algorithmics - 2008
A stochastic bit-width estimation technique for compact and low-power custom processors
unknown - 2008
Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters
ACM Transactions on Programming Languages and Systems - 2007
Optimising code-copying JIT compilers for virtual stack machines
unknown - 2006
Multiple-valued logic buses for reducing bus energy in low-power systems
IEE Proceedings on Computers and Digital Techniques - 2006
Low-cost microarchitectural techniques for enhancing the prediction of return addresses on high-performance trace cache processors
Proceedings of the 21st Annual Symposium on Computer and Information Sciences (ISCIS 06) - 2006
High performance scientific computing using FPGAs with IEEE floating point and logarithmic arithmetic for Lattice QCD
Proceedings of the 16th International Conference on Field Programmable Logic and Applications (FPL 06) - 2006
FPGA implementation of adaptive filters based on GSFAP using log arithmetic
Proceedings of the 2006 IEEE Workshop on Signal Processing Systems Design and Implementation (SiPS 06) - 2006
Fast and flexible instruction selection with on-demand tree-parsing automata
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation (PLDI 06) - 2006
Analyzing effects of trace cache configurations on the prediction of indirect branches
Journal of Instruction-Level Parallelism - 2006
The case for virtual register machines
Sci. Comput. Program. - 2005
Multiple-valued logic buses for reducing bus size, transitions and power in deep submicron technologies
unknown - 2005
Multiple-Valued Caches for Power-Efficient Embedded Systems
35th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2005) - 2005
Implementation of an image segmentation algorithm using logarithmic arithmetic, on post-conference
unknown - 2005
FPGA implementation of Lattice QCD algorithm using log arithmetic
unknown - 2005
FPGA implementation of an Image Segmentation algorithm using logarithmic arithmetic
unknown - 2005
Estimating bus size for custom processors in embedded systems
Design Automation for Embedded Systems - 2005
A method-level comparison of the Java Grande and SPEC JVM98 benchmark suites
Concurrency and Computation Practice and Experience - 2005
Stochastic bit-width approximation using extreme value theory for customizable processors
CC - 2004
Retargeting JIT compilers using C-compiler generated executable code
PACT - 2004
Combining stack-caching with dynamic superinstructions
Proceedings of the ACM SIGPLAN 2004 Workshop on Interpreters, Virtual Machines and Emulators (IVME 04) - 2004
Automatic customization of embedded applications for enhanced performance and reduced power using optimizing compiler techniques
Proceedings of the 10th European Conference on Parallel Computing (Europar 04) - 2004
A language and tool for generating efficient virtual machine interpreters
Domain-Specific Program Generation - 2004
Towards superinstructions for Java interpreters
SCOPES - 2003
The Structure and Performance of Efficient Interpreters
JILP - 2003
Platform independent dynamic Java Virtual Machine analysis: the Java Grande forum benchmark suite
Concurrency and Computation Practice and Experience - 2003
An optimized Java interpreter for connected devices and embedded systems
Proceedings of the 18th ACM Symposium on Applied Computing (SAC 03) - 2003
Vmgen - a generator of efficient virtual machine interpreters
SPE - 2002
Primitive sequences in general purpose Forth programs
EuroForth - 2002
Generating an interpreter with Vmgen
unknown - 2002
Building an interpreter with Vmgen
Proceedings of the 10th International Conference on Compiler Construction (CC 2002) - 2002
Benchmarking the Java virtual architecture
Java Microarchitectures - 2002
The common case in Forth programs
EuroForth - 2001
The behaviour of efficient virtual machine interpreters on modern architectures
Europar - 2001
Implementation of an Efficient Java Interpreter
Proceedings of the 9th High Performance Computing and Networking Conference - 2001
Identification and Quantification of Hotspots in Grande Java Programs
Proceedings of the 9th High Performance Computing and Networking Conference - 2001
Comparing Tail Duplication with Compensation Code in single path global instruction scheduling
Proceedings of the 9th International Conference on Compiler Construction (CC 2001) - 2001
A fast Java interpreter
Java Optimization Strategies for Embedded Systems Workshop (JOSES) - 2001
Software Pipelining with Iteration Preselection
CC - 2000
Global software pipelining with Iteration Preselection
Proceedings of the 8th International Conference on Compiler Construction (CC 2000) - 2000
Comparing Code Duplication and Compensation Code
Code Optimisation: Trends, Challenges and Perspectives - 2000
Mapping Streaming Languages to General Purpose Processors through Vectorization
Proceedings of the 2009 International Workshops on Languages and Compilers for Parallel Computing - 2009
FPGA Implementation of a Lattice Quantum Chromodynamics Algorithm Using Logarithmic Arithmetic
12th Reconfigurable Architectures Workshop (RAW 2005) - 2005
Dynamic Interpretation for Dynamic Scripting Languages
Proceedings of the 2010 ACM International Symposium on Code Generation and Optimization - 2010
