High-level data structures are an important foundation for most applications. With the rise of multicore processors, general-purpose programming languages increasingly support data-parallel collection operations. However, these operations often incur high abstraction and scheduling overheads. We present a generic data-parallel collections design based on work-stealing for shared-memory architectures that overcomes abstraction penalties through call-site specialization of data-parallel operation instances.
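To illustrate what call-site specialization buys, the sketch below contrasts a generic fold, which on the JVM pays for element boxing and megamorphic closure dispatch, with the tight monomorphic loop that a specialized instance of `xs.fold(0)(_ + _)` over an `Array[Int]` could expand to. This is a minimal illustration under our own naming; `genericFold` and `specializedFold` are hypothetical and not the framework's API.

```scala
object Specialization {
  // Generic fold: each element is boxed and the combining function is
  // invoked through a (potentially megamorphic) Function2 call site.
  def genericFold[T, S](xs: Seq[T])(z: S)(op: (S, T) => S): S = {
    var acc = z
    for (x <- xs) acc = op(acc, x)
    acc
  }

  // What a call-site-specialized instance could expand to: a monomorphic
  // loop over the underlying array, with no boxing and no closure dispatch.
  def specializedFold(xs: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) { acc += xs(i); i += 1 }
    acc
  }
}
```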
Moreover, we introduce work-stealing iterators that allow finer-grained and more efficient work-stealing. By eliminating abstraction penalties and making work-stealing data-structure-aware, we achieve performance several dozen times better than that of existing JVM-based approaches.
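The following is a minimal sketch of the work-stealing iterator idea over a flat index range, under a simplified protocol we assume for illustration: the owner claims batches of indices with a CAS on a shared progress counter, and a thief atomically takes the untraversed remainder rather than whole tasks. The class and method names are ours; the actual iterators are data-structure-aware (e.g. for trees and hash tables), not limited to ranges.

```scala
import java.util.concurrent.atomic.AtomicInteger

final class StealingRangeIterator(start: Int, end: Int) {
  private val progress = new AtomicInteger(start)

  // Owner: claim the next batch of at most `step` indices, or None if
  // the range is exhausted (possibly because the rest was stolen).
  def nextBatch(step: Int): Option[Range] = {
    while (true) {
      val p = progress.get
      if (p >= end) return None
      val np = math.min(p + step, end)
      if (progress.compareAndSet(p, np)) return Some(p until np)
    }
    None // unreachable
  }

  // Thief: atomically claim everything not yet traversed and return it
  // as a fresh iterator, leaving this one exhausted.
  def steal(): Option[StealingRangeIterator] = {
    while (true) {
      val p = progress.get
      if (p >= end) return None
      if (progress.compareAndSet(p, end))
        return Some(new StealingRangeIterator(p, end))
    }
    None // unreachable
  }
}
```

Because owner and thief synchronize on the same counter, stealing is fine-grained: a thief can take exactly the untraversed suffix at any point during traversal, instead of waiting for the owner to split off a coarse subtask up front.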