CLIJ: GPU-accelerated image processing for everyone

Robert Haase, Loic A. Royer, Peter Steinbach, Deborah Schmidt, Alexandr Dibrov, Uwe Schmidt, Martin Weigert, Nicola Maghelli, Pavel Tomancak, Florian Jug, Eugene W. Myers

Preprint posted on August 22, 2019 https://www.biorxiv.org/content/10.1101/660704v2

CLIJ is a tool that overcomes an important bottleneck in the microscopy workflow: it brings accelerated processing speed to everyone, regardless of coding skills.

Selected by Mariana De Niz

Background

Graphics processing units (GPUs) are single-chip processors that allow image processing at unprecedented speed. Aside from being used for 2D data, GPUs are essential for decoding and rendering 3D animations, and can perform parallel operations on multiple sets of data.

Conversely, central processing units (CPUs) are often thought of as a computer's 'brain': their function is to interpret and control most of the commands from the computer. However, they face important bottlenecks when performing real-time processing or multiple tasks at once. GPUs provide significant acceleration when performing multiple operations simultaneously.

Moreover, modern microscopy platforms can generate very large amounts of multi-dimensional image data. An important bottleneck associated with this is the significantly reduced speed of image processing and analysis of such data. One way to speed up processing, which has recently been explored in the context of imaging, including restoration, segmentation and visualization, is to use the parallel processing capacity of GPUs, with specific code for each of the intended tasks (1-6).

The flexible workflows implemented in user-friendly tools such as ImageJ and Fiji were programmed at a time when GPUs were not widely used, and therefore rely on CPU processing. Because GPU-accelerated image processing has until now required programming skills, microscopists without such experience have not yet been able to benefit from the advantages that GPUs provide.

Key development

  • Haase and Royer et al. (7) developed CLIJ, a Fiji plugin enabling users to benefit from GPU-accelerated image processing.
  • A key feature of CLIJ is that it does not require any GPU programming skills, nor specialized hardware to be executed.
  • CLIJ complements core ImageJ operations with reprogrammed counterparts that take advantage of OpenCL (an open standard for cross-platform parallel programming) to execute on GPUs.
  • CLIJ offers a wide range of image processing functions for morphological filters, spatial transformations, thresholding, minima/maxima detection, 3D-to-2D projections, and methods of descriptive statistics for quantitative measurements, among others.
  • While the speed-up from GPUs varies with the dataset and the processing required, execution times for image processing on a range of systems, whether commercial laptops or professional workstations, and regardless of operating system, were faster than those of the counterpart ImageJ operations running on the CPU.
  • For data to be processed on a GPU, it must first be pushed to GPU memory and later pulled back to CPU memory.
  • CLIJ is compatible with all programming languages available in ImageJ.
  • CLIJ opens the possibility of real-time analysis for smart microscopy applications.
  • The authors provide a plugin template together with the full open source code of CLIJ and all data and scripts, in order to provide a baseline for other developers.
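
The push/pull pattern above can be illustrated with a minimal ImageJ macro. This is a sketch based on the pattern shown in the CLIJ documentation, not code from the preprint; the exact method names may differ between CLIJ versions.

```
// Minimal CLIJ sketch (ImageJ macro): push an image to GPU memory,
// run one operation there, and pull the result back to CPU memory.
// Method names follow the CLIJ documentation and may differ
// slightly between CLIJ versions.
run("CLIJ Macro Extensions", "cl_device=");   // initialise the first available GPU
Ext.CLIJ_clear();                             // free any previously allocated GPU memory

input = getTitle();                           // name of the currently open image
Ext.CLIJ_push(input);                         // copy the image to GPU memory

filtered = "filtered";
Ext.CLIJ_mean3DBox(input, filtered, 3, 3, 3); // mean filter with radius 3, run on the GPU

Ext.CLIJ_pull(filtered);                      // copy the result back and display it
Ext.CLIJ_clear();                             // release GPU memory when done
```

Note that the image stays in GPU memory between operations, so chaining several filters only pays the transfer cost once, at the push and the final pull.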

 

Figure 1. CLIJ facilitates accelerated image processing. (Image reproduced from https://clij.github.io)

Application

  • As a proof of principle, the authors used CLIJ to run a multi-step workflow on data generated with 3D light-sheet microscopy, in this case imaging a Drosophila embryo. The workflow included reduction of background signal using Gaussian filtering, projection of the data from 3D to 2D, and nuclei counting. The differences in count accuracy between CPU and GPU were not significant, and the hardware on which CLIJ ran likewise had little impact on count accuracy. Processing time was reduced by a factor of 15 or 33 when CLIJ was used on a laptop or a workstation, respectively.
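
A workflow of this shape can be sketched as an ImageJ macro. This is purely an illustration, not the authors' benchmark script; the CLIJ method names follow the pattern in the CLIJ documentation and may differ between versions, and the counting step shown here uses ImageJ's own Find Maxima on the CPU rather than the authors' spot-detection approach.

```
// Sketch of a blur -> project -> count workflow using CLIJ from an
// ImageJ macro. Illustration only; not the authors' benchmark script.
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

input = getTitle();
Ext.CLIJ_push(input);                                 // 3D stack to GPU memory

// 1) reduce background/noise with a Gaussian blur on the GPU
Ext.CLIJ_blur3D(input, "blurred", 2, 2, 2);

// 2) project the 3D stack down to 2D on the GPU
Ext.CLIJ_maximumZProjection("blurred", "projection");

// 3) pull the 2D projection back to CPU memory and count spots there
Ext.CLIJ_pull("projection");
Ext.CLIJ_clear();
selectWindow("projection");
run("Find Maxima...", "prominence=10 output=Count");  // count is written to the Results table
```
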

 

What I like about this paper

I am a strong advocate of open science, and of tools from different labs that are designed to make science available and accessible to everyone, whether as affordable hardware, free software, or tools that help eliminate barriers dividing scientists. CLIJ is a good example of the latter. On one hand, it enables users with little programming experience to benefit from accelerated image processing, making this step of science more efficient and thereby offering the possibility of increasing both the output and the complexity of the image processing done. On the other hand, Haase and Royer et al. (7) provide very detailed documentation aimed both at entry-level users and at users with more advanced programming skills who may wish to contribute to the tool in the future. As a microscopist with an interest in image analysis, I found the documentation easy to follow, and the accelerated image processing an enormous advantage for my work. Another big advantage of CLIJ is that it might encourage more scientists to engage with analysis methods and workflow complexity that they had previously avoided because the high processing time was a hindrance. Furthermore, it allows users who only have access to low-cost computers to make use of GPU acceleration.

I also like that the documentation is very complete, as is the authors' website (links to sections I found particularly useful below):

https://clij.github.io

https://clij.github.io/clij-docs/quickTour

https://clij.github.io/clij-docs/faq

https://github.com/clij/clij-docs/blob/master/clij_cheatsheet.pdf

Open questions*

(See bottom of page for the authors' answers)

  1. In your discussion, you mention that CLIJ is compatible with smart microscopy, doing real-time processing. Has it been used already, and in what type of operations?
  2. One thing you discuss in the FAQ section is that you put emphasis on mathematical correctness, consistency, simplicity of code, performance, and similarity of results obtained by CLIJ and ImageJ. You mention also that while algorithms on CPUs can use double precision, on GPUs this is usually single precision. In your opinion, what are the trade-offs to keep in mind when using CLIJ for processing?
  3. Is CLIJ compatible with processing super-resolution microscopy?
  4. You discuss also in the FAQ section, that multi-channel, time lapse images are not compatible with CLIJ, at present. Why is it so? As a user who currently does precisely this type of image, I know a big bottleneck to output is precisely the slow processing time of very large videos. Do you envisage that in the near future, CLIJ will also enable processing 4D and 5D datasets?
  5. In your blog discussions, and in your documentation, you carefully explain the “pull” and “push” code. You also discuss in your FAQ how to optimize use of CLIJ, including the type of datasets that would indeed benefit more from GPU-based processing than CPU. While at the moment this is a user-based decision, do you think that in the future, based on the dataset and workflow, a platform such as Fiji could automatically assign the dataset to GPU- or CPU-based processing, depending on what is most optimal? Namely, a hybrid platform with a user-friendly interface.
  6. You gave the example of the Drosophila embryo. Would machine learning approaches for image analysis benefit also from GPU-based processing? Is it an option CLIJ could provide?
  7. Two aspects that benefit from accelerated image processing are throughput and quality. In terms of quality, do you think there are more functions currently not existing, or sub-optimal in Fiji, which can be improved by the use of CLIJ and GPU-based processing?

References

  1. Preibisch S, et al. Efficient Bayesian-based multiview deconvolution. Nature Methods 11(6) (2014)
  2. Laine R, et al. NanoJ: a high-performance open-source super-resolution microscopy toolbox. Journal of Physics D: Applied Physics 52(16) (2019)
  3. Culley S, et al. Quantitative mapping and minimization of super-resolution optical imaging artefacts. Nature Methods 15(4) (2018)
  4. Weigert M, et al. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nature Methods 15(12) (2018)
  5. Falk T, et al. U-Net: deep learning for cell counting, detection and morphometry. Nature Methods 16(4) (2019)
  6. Schmid B, et al. 3Dscript: animating 3D/4D microscopy data using a natural-language-based syntax. Nature Methods 16(4) (2019)
  7. Haase R, Royer LA, et al. CLIJ: GPU-accelerated image processing for everyone. bioRxiv (2019)

Acknowledgement

I am very grateful to Robert Haase and Loic Royer for their time and input, for engaging in answering my questions and discussion, and providing very useful additional links to their work. I thank Mate Palfy for his helpful feedback on this highlight.

Tags: image analysis, microscopy

Posted on: 10th September 2019





Author's response

    Robert Haase (RH) and Loic Royer (LR) shared their answers.

    Open questions

    1. In your discussion, you mention that CLIJ is compatible with smart microscopy, doing real-time processing. Has it been used already, and in what type of operations?

    RH: We recommend CLIJ for processing images on the fly (also for samples other than Drosophila) because we do that routinely: background-subtracted maximum projections with scale bars are saved before the raw data. But smart microscopy goes much further. We showed earlier that one of our microscopes can observe several early-stage Drosophila embryos one after another, predict in advance when one will undergo gastrulation, and then image just that one with increased temporal resolution. That’s what we work on in Dresden: event-driven smart microscopy, actually running on the CPU back in the days. We wanted to be faster in processing and more flexible. We had bigger plans, but we were limited by slow processing and analysis algorithms. Thus, CLIJ started as a collection of operations for event-driven smart microscopy. At the same time I programmed auto-completion in Fiji’s script editor, as a side project. Finally, the idea came up that auto-completion has the potential to bring GPU acceleration to everyone, including people who don’t feel like learning to program OpenCL or CUDA. Thus, we decided to go on the detour that led to this preprint.

     

    2. One topic you discuss in the FAQ section is that you put emphasis on mathematical correctness, consistency, simplicity of code, performance, and similarity of results obtained by CLIJ and ImageJ. You mention also that while algorithms on CPUs can use double precision, on GPUs this is usually single precision. In your opinion, what are the trade-offs to keep in mind when using CLIJ for processing?

    RH: CLIJ users clearly aim for the highest speed. Even without GPUs, you can gain speed by sacrificing accuracy, precision, quality and resolution; microscopists know what I’m talking about. Users should make this decision actively by trying different solutions. My recommendation: if you see two ways of implementing a workflow in CLIJ, try both! Measure the time, compare the results with ImageJ and then make your decision. Let me illustrate with an example: if you run ImageJ’s Mean 3D filter on my laptop with radius 3 on a 16 MB image stack, it takes about 3 seconds. Running CLIJ’s Mean3DSphere filter, which is quite similar to ImageJ’s, takes about 120 milliseconds. Speedup factor 25 – amazing, no? If you run CLIJ’s Mean3DBox filter, which is a bit different but might fulfil the same purpose, it needs just 40 milliseconds. That’s a speedup factor of 75 – including data transfer to and from the GPU, by the way! GPU acceleration is an amazing playground, and investing some time in experimenting with workflows totally pays off, as it can save you hours in the future. I could spend my whole day doing these CPU–GPU comparisons. Unfortunately, I like watching little developing fly and beetle eggs in a light-sheet microscope even more.

     

    3. Is CLIJ compatible with processing super-resolution microscopy?

    RH: Sure. You can load any kind of image and process it. Unfortunately, CLIJ doesn’t have super-resolution-specific operations such as Gaussian fitting… yet. But I’m convinced it would be beneficial to do that on the GPU. You are not the first to ask for it. The more people ask for it and talk about it, the higher the chance that somebody will make it happen at some point. Maybe not me, as I’m not a super-res guy, but I’m happy to help anyone who wants to dive into doing this on the GPU.

     

    4. You discuss also in the FAQ section, that multi-channel, time lapse images are not compatible with CLIJ, at present. Why is it so? As a user who currently does precisely this type of image, I know a big bottleneck to output is precisely the slow processing time of very large videos. Do you envisage that in the near future, CLIJ will also enable processing 4D and 5D datasets?

    RH: I would love to make that happen; technically it’s absolutely feasible. But without going too much into detail, there are two major imponderables with n-D images that we need to solve first from a conceptual point of view. Firstly, operations over time and channels are tricky usage-wise. Just an example: Gaussian blurs in 2D and 3D are very similar and technically easy to implement. Blurring over time also makes sense in time-lapse data, for example if you want to stabilize jittering objects. A Gaussian blur over channels is not so easy to justify. Even worse, thresholding moving objects over time in a 4D fashion can actually cause severe faults in the results which are not easy to spot when you work with a 4D dataset. The vast majority of workflows processing 4D and 5D images are executed frame by frame and channel by channel anyway, and this can already be done with CLIJ. Secondly, memory in GPUs is often limited; GPUs with 8 GB of memory or more are actually expensive. Thus, from this point of view too, it makes sense to split long time lapses into frames and process them on the GPU individually. As there is a need to deal with this, I’m planning to introduce some simplification for loading n-D images into the GPU. But in there, they might still be treated as lists or arrays of 2D and 3D images.

     

    5. In your blog discussions, and in your documentation, you carefully explain the “pull” and “push” code. You also discuss in your FAQ how to optimize use of CLIJ, including the type of datasets that would indeed benefit more from GPU-based processing than CPU. While at the moment this is a user-based decision, do you think that in the future, based on the dataset and workflow, a platform such as Fiji could automatically assign the dataset to GPU- or CPU-based processing, depending on what is most optimal? Namely, a hybrid platform with a user-friendly interface.

    RH: It would be so great to have a Fiji that recognizes a powerful GPU in the computer and then runs workflows on it. However, we must leave this decision to the user because of the small differences between CPUs and GPUs; otherwise we might sacrifice the reproducibility of workflows. Counter-question: if a future Fiji generally ran on the GPU and produced reproducible results, would you use it even though there are small differences compared to “the old CPU Fiji”?

    MDN: It depends on the degree of differences, and the relevance of differences to what I am trying to study. For instance a lot of the work I do is motion tracking, registration, background correction and/or segmentation in very large datasets. For most of the workflow, accelerated speed would be an enormous advantage a) to the throughput, and b) to the amount of parameters I analyse.

    RH: Indeed, for many applications perfect precision is not necessary; we just wanted users to be aware. We should be good scientists and check that our workflows are doing the right thing – as always. If everyone is fine with small differences, then we can reach the next level of image processing speed in many applications. I’m looking forward to seeing amazing workflows running in real time that used to take ages.

     

    6. You gave the example of the Drosophila embryo. Would machine learning approaches for image analysis benefit also from GPU-based processing? Is it an option CLIJ could provide?

    LR: All major machine learning libraries have both CPU and GPU back-ends, and the GPU back-end is used 99% of the time because it is often orders of magnitude faster than CPU-based machine learning. So in the end, GPUs are definitely the way to go for fast image processing, machine learning or not.

     

    7. Two aspects that benefit from accelerated image processing are throughput and quality. In terms of quality, do you think there are more functions currently not existing, or sub-optimal in Fiji, which can be improved by the use of CLIJ and GPU-based processing?

    RH: We recently made CLIJ compatible with Matlab because we need some operations in CLIJ that don’t exist in Fiji but do in Matlab: mostly algorithms based on vector and matrix arithmetic. Programming these methods on the GPU appears simpler when we can compare them directly to Matlab. So yes, more functions are coming. The prototype of CLIJ2 currently has about twice the number of operations that CLIJ1 offers.

    Furthermore, quite a few people have asked whether it would be possible to run the Trainable WEKA Segmentation and the 3D Objects Counter on the GPU. I cannot promise that both will become part of CLIJ2, but I can tell you that both exist as functional prototypes.
