A class for performing 2d convolution using the FFTW library.

#include <m_convolve.h>

Classes
struct	Sizes
	utility aggregate stores all sizes – rarely used by caller

Public Types
typedef std::pair < boost::shared_ptr < FFTW_real_vector > , boost::shared_ptr < FFTW_complex_vector > >	Work_buffer

Public Member Functions
	Fftw_convolution_2d (int data_num_rows, int data_num_cols, int mask_max_rows, int mask_max_cols, int fft_alg_type=FFTW_MEASURE)
	Initialize the convolver by specifying data and mask dimensions.

void	set_mask (const Matrix &)
	set mask, which must fit with mask size maxima given to ctor

void	set_gaussian_mask (double sigma)
	set mask to a circular gaussian kernel of given sigma (pixels)

void	convolve (const Matrix &, Matrix &) const

void	reflect_and_convolve (const Matrix &, Matrix &) const
	convolve with mask, assuming input reflects at its boundaries

void	execute (const Matrix &i, Matrix &o) const
	deprecated synonym for convolve method

Work_buffer	allocate_work_buffer () const
	get a handle to a work buffer needed for threadsafe convolution.

void	convolve (const Matrix &, Matrix &, Work_buffer) const

void	reflect_and_convolve (const Matrix &, Matrix &, Work_buffer) const
	convolve with mask, assuming input reflects at its boundaries

const Sizes &	get_sizes () const
	read access to the sizes specified at ctor time

bool	is_mask_set () const
	read access of the flag indicating whether the mask has been set

Detailed Description

A class for performing 2d convolution using the FFTW library.

This should run faster than kjb_c::fourier_convolve_matrix() for applications that perform similar-sized convolutions many times, because this class re-uses the plans constructed by FFTW, the construction of which is usually the bottleneck for FFTW. The results from the convolve() method should be the same as those of fourier_convolve_matrix() except for numerical noise.

Two ways to handle boundary conditions

If you use the convolve() method, boundary conditions are handled by padding the data and mask with zeros before operating on them. This padding is removed before a result is returned. Note that kjb_c::convolve_matrix() behaves differently (it reflects the input matrix at its boundary). To emulate this behavior, use the reflect_and_convolve() method.
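
The difference between the two boundary treatments can be seen in a direct (non-FFT) sketch. This is not the library's implementation: the names Grid, Boundary, reflect_index and convolve_direct are hypothetical, and the mirroring convention (index -1 maps to 0) and the centered output alignment are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// How to treat indices that fall outside the data (hypothetical enum).
enum class Boundary { ZeroPad, Reflect };

// Mirror an out-of-range index back into [0, n).
// Assumed convention: -1 -> 0 and n -> n-1. Valid while the overhang is <= n.
int reflect_index(int i, int n)
{
    if (i < 0) return -i - 1;
    if (i >= n) return 2 * n - i - 1;
    return i;
}

// Direct same-size 2d convolution with an odd-sized mask centered on each
// output element. Outside the data, values are either zero or mirrored.
Grid convolve_direct(const Grid& data, const Grid& mask, Boundary mode)
{
    const int rows = static_cast<int>(data.size());
    const int cols = static_cast<int>(data[0].size());
    const int mrows = static_cast<int>(mask.size());
    const int mcols = static_cast<int>(mask[0].size());
    assert(mrows % 2 == 1 && mcols % 2 == 1);     // mask needs a center
    assert(mrows / 2 < rows && mcols / 2 < cols); // keeps mirroring in range

    Grid out(rows, std::vector<double>(cols, 0.0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            double sum = 0.0;
            for (int i = 0; i < mrows; ++i) {
                for (int j = 0; j < mcols; ++j) {
                    // Convolution flips the mask relative to the data.
                    int rr = r - (i - mrows / 2);
                    int cc = c - (j - mcols / 2);
                    if (mode == Boundary::Reflect) {
                        rr = reflect_index(rr, rows);
                        cc = reflect_index(cc, cols);
                    } else if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) {
                        continue; // zero padding contributes nothing
                    }
                    sum += mask[i][j] * data[rr][cc];
                }
            }
            out[r][c] = sum;
        }
    }
    return out;
}
```

With a constant input and an all-ones mask, the zero-padded result falls off toward the borders, while the reflected result stays constant, which is why the two methods disagree near the edges.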

Storage is allocated only once when the object is created. Because of this, all data used for convolutions must be of the same size. Masks may differ in size, but dimensions cannot exceed the maximums specified during construction. Setting maximum mask dimensions higher than needed won't affect correctness, but will reduce performance. Setting the maximum mask dimensions too low will raise an exception when a larger mask is used.

Convolution using just one thread

If your application does not need to be parallelized, you may simply disregard any methods that mention Work_buffer. In other words, use only the two-parameter convolve() and reflect_and_convolve() methods.

Convolution using multiple threads

A few methods are reentrant (i.e., thread-safe). If you wish to use multiple threads to convolve a common mask with multiple matrices (each convolution using ONE thread), this class can accommodate that. Here is the general strategy:

One thread instantiates the class as object o and (usually) sets the mask.
Start some worker threads, and give each a pointer p to object o.
Each worker thread needs its own Fftw_convolution_2d::Work_buffer b from allocate_work_buffer(), but this method is NOT reentrant! Use a mutex to force sequential calls, or have one starter thread perform all the allocation sequentially and distribute the buffers to the worker threads.
Worker threads may use (only) the three-parameter convolve() method or reflect_and_convolve() method. The third parameter is a Work_buffer, and obviously each worker thread should only use its own personal buffer for the appropriate class.
Trivial methods get_sizes() and is_mask_set() are also reentrant. No other methods are reentrant. Also, the class FFTW_vector<T> is not reentrant.

Additional advice for multithreaded programs:

A Work_buffer object is lightweight and can be copied by value, because it is just a couple of smart pointers.
Annoyingly, it is NOT thread-safe for the last copy of a Work_buffer to be destroyed! You have to serialize the destruction of the last copy (i.e., serialize the deallocation of the memory). See Destroying a Work_buffer below.
If you instantiate multiple Fftw_convolution_2d objects (presumably with different masks) you should not share Work_buffer objects between them unless all four sizes given to the ctor are identical.
Once you call allocate_work_buffer(), it is best not to call either of the thread-unsafe, two-parameter convolve methods afterwards, or you will incur a substantial time penalty. The reason is that when calling allocate_work_buffer(), the object takes this as a hint that you will probably never need the thread-unsafe calls for the rest of the lifetime of the object. So, it frees some of its internal resources required for the thread-unsafe calls.

Destroying a Work_buffer

Because the Work_buffer is a shared pointer to a block of memory that must be deallocated sequentially (using fftw_free, which is not reentrant), the last copy of any Work_buffer cannot be destroyed simultaneously with any other FFTW calls (except for its thread-safe functions). Thus, if a thread function is to call allocate_work_buffer() or initiate the destruction of the work buffer, it must serialize those steps. There are a number of ways to do so.

See Example multithreaded code for one straightforward way to serialize those actions.
Simpler alternative: leave it to the main thread to allocate the buffers, to launch the threads, and to destroy all the buffers after the join. Advantage: simplicity, and no locks. Disadvantage: none of the memory is released until all the convolution is over.
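
The simpler alternative can be sketched with standard C++ threads. This is only an outline under stated assumptions: Buffer, worker and run_all are hypothetical names, std::shared_ptr stands in for the library's smart pointers, and the doubling loop is a placeholder for the three-parameter convolve() call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Stand-in for a work buffer handle (hypothetical; not the library type).
using Buffer = std::shared_ptr<std::vector<double>>;

// Placeholder for the per-thread convolution: each worker touches only its
// own scratch buffer, its own input, and its own result slot.
void worker(Buffer scratch, const std::vector<double>* input, double* result)
{
    assert(scratch && scratch->size() >= input->size());
    double sum = 0.0;
    for (std::size_t i = 0; i < input->size(); ++i) {
        (*scratch)[i] = (*input)[i] * 2.0;
        sum += (*scratch)[i];
    }
    *result = sum;
}

std::vector<double> run_all(const std::vector<std::vector<double>>& inputs)
{
    const std::size_t n = inputs.size();

    // 1. The main thread allocates every buffer sequentially: no lock needed.
    std::vector<Buffer> buffers;
    for (std::size_t i = 0; i < n; ++i) {
        buffers.push_back(
            std::make_shared<std::vector<double>>(inputs[i].size()));
    }

    // 2. Launch the workers, handing each one a copy of its own handle.
    std::vector<double> results(n, 0.0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back(worker, buffers[i], &inputs[i], &results[i]);
    }

    // 3. Join. The workers' copies are released by the time join() returns,
    //    but they are never the last copies because 'buffers' still holds one.
    for (std::thread& t : threads) t.join();

    // 4. The main thread destroys the last copies, again sequentially.
    buffers.clear();
    return results;
}
```

The cost is the one noted above: every buffer stays allocated until all workers have finished.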

Example multithreaded code

The following code snippet shows the general outline of a thread function that performs convolution, returning NULL to indicate error or non-NULL to indicate success. It also accesses two global objects, c and mtx.

kjb_pthread_mutex mtx = KJB_PTHREAD_MUTEX_INITIALIZER;
Fftw_convolution_2d* c = NULL; // setup of c done elsewhere, e.g., in main()
// Here is the function that the worker threads all call:
void* thread_work(void* input) {
    NRN(c); NRN(input);
    Fftw_convolution_2d::Work_buffer wb; // ok: default ctor is thread-safe
    // Allocate work buffer, using 'mtx' to prevent races between threads.
    do { Mutex_lock l(&mtx); wb = c -> allocate_work_buffer(); } while(0);
    // Now, do convolution on input using c and wb.
    // . . .
    // Ready to exit.  Now, destroy any COPIES of wb asynchronously.
    // . . .
    if (!work_buffer_is_unique(wb)) {
        set_error("wb not unique in thread %d", get_kjb_pthread_number());
        return NULL; // cannot destroy work buffer if another copy exists
    }
    // Destroy the last copy synchronously.
    Mutex_lock l(&mtx);
    wb = Fftw_convolution_2d::Work_buffer(); // clobber its contents
    return input;
}

Author: Kyle Simek; Prasad Gabur; Kobus Barnard; Andrew Predoehl

Member Typedef Documentation

typedef std::pair< boost::shared_ptr<FFTW_real_vector>, boost::shared_ptr<FFTW_complex_vector> > kjb::Fftw_convolution_2d::Work_buffer

Constructor & Destructor Documentation

kjb::Fftw_convolution_2d::Fftw_convolution_2d	(	int	data_num_rows,
		int	data_num_cols,
		int	mask_max_rows,
		int	mask_max_cols,
		int	fft_alg_type = FFTW_MEASURE
	)

Initialize the convolver by specifying data and mask dimensions.

Parameters

data_num_rows	exact number of rows in each data matrix
data_num_cols	exact number of cols in each data matrix
mask_max_rows	maximum number of rows in convolution mask (kernel)
mask_max_cols	maximum number of cols in convolution mask (kernel)
fft_alg_type	The approach that FFTW should use to tune its FFT algorithm. Acceptable values, in order of increasing performance (and initialization time), are FFTW_ESTIMATE, FFTW_MEASURE, FFTW_PATIENT, FFTW_EXHAUSTIVE. The default is FFTW_MEASURE, which is a good balance of performance and startup time. FFTW_PATIENT gives a ~1.4x speedup at the expense of several extra seconds of startup time. In parallel mode, FFTW_PATIENT should show even greater speedup (unconfirmed claim).

Mask data is specified later by calling set_mask(). This ctor sets up the FFTW plans, and thus it may take a few seconds.

Todo: see if this has a bug when the given mask is smaller than the dimensions given here: it might cause the output to shift.

Member Function Documentation

Fftw_convolution_2d::Work_buffer kjb::Fftw_convolution_2d::allocate_work_buffer ( ) const

get a handle to a work buffer needed for threadsafe convolution.

Returns: a Work_buffer (i.e., smart pointers) to memory for convolution

Warning: not thread safe; do not mix work buffers among different objects of this class, unless the results of get_sizes() are identical.

Postcondition: data_ contains two null smart pointers.

If you call this function, you are probably going to use the reentrant convolution methods, and therefore you will probably not need the memory in the data_ field, if any. Thus we check whether data_ has memory, and if so we give it away in the expectation (no guarantees) that it would probably never otherwise be used.

To repeat: the object this returns is a handle, an opaque pointer – when originally written, a pair of smart pointers but we make no promises about implementation – and you should pass it around BY VALUE, via copying. It is fine to store them in an array or vector, if you like.
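
The by-value handle semantics can be illustrated with standard smart pointers. This is a sketch, not the library's type: Buffer_handle, make_handle and handle_is_unique are hypothetical names, and std::shared_ptr with std::vector<double> stands in for boost::shared_ptr and the FFTW vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a work buffer handle: a pair of shared pointers, so copying
// the handle copies two pointers, never the underlying memory.
using Buffer_handle = std::pair<std::shared_ptr<std::vector<double>>,
                                std::shared_ptr<std::vector<double>>>;

// Allocate the two blocks once and return a handle to them.
Buffer_handle make_handle(std::size_t n)
{
    assert(n > 0);
    return Buffer_handle(std::make_shared<std::vector<double>>(n),
                         std::make_shared<std::vector<double>>(n));
}

// True when no other copy of the handle shares the memory, i.e. destroying
// this copy would trigger the actual deallocation.
bool handle_is_unique(const Buffer_handle& h)
{
    return h.first.use_count() == 1 && h.second.use_count() == 1;
}
```

Storing copies in a std::vector raises the reference count without duplicating the blocks; only when the last copy goes away is the memory freed, which is the step that must be serialized.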

Warning: It is also not thread-safe to let the last copy of a Work_buffer object go out of scope. See Destroying a Work_buffer for discussion.

void kjb::Fftw_convolution_2d::convolve	(	const Matrix &	,
		Matrix &
	)		const

void kjb::Fftw_convolution_2d::convolve	(	const Matrix &	,
		Matrix &	,
		Work_buffer
	)		const

void kjb::Fftw_convolution_2d::execute	(	const Matrix &	i,
		Matrix &	o
	)		const

inline

deprecated synonym for convolve method

const Sizes& kjb::Fftw_convolution_2d::get_sizes ( ) const

inline

read access to the sizes specified at ctor time

bool kjb::Fftw_convolution_2d::is_mask_set ( ) const

inline

read access of the flag indicating whether the mask has been set

void kjb::Fftw_convolution_2d::reflect_and_convolve	(	const Matrix &	,
		Matrix &
	)		const

convolve with mask, assuming input reflects at its boundaries

void kjb::Fftw_convolution_2d::reflect_and_convolve	(	const Matrix &	,
		Matrix &	,
		Work_buffer
	)		const

convolve with mask, assuming input reflects at its boundaries

void kjb::Fftw_convolution_2d::set_gaussian_mask ( double sigma )

set mask to a circular gaussian kernel of given sigma (pixels)
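
For reference, a circular Gaussian kernel of a given sigma might be built as in the sketch below. The function name make_gaussian_mask, the radius of ceil(3 * sigma), and the normalization to unit sum are assumptions of this sketch; the page does not state how set_gaussian_mask() chooses the mask size or scaling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Build a circularly symmetric Gaussian mask whose entries sum to 1.
// Assumption: the mask extends ceil(3 * sigma) pixels from its center.
Grid make_gaussian_mask(double sigma)
{
    assert(sigma > 0.0);
    const int radius = static_cast<int>(std::ceil(3.0 * sigma));
    const int size = 2 * radius + 1;

    Grid mask(size, std::vector<double>(size, 0.0));
    double total = 0.0;
    for (int r = 0; r < size; ++r) {
        for (int c = 0; c < size; ++c) {
            const double dy = r - radius;
            const double dx = c - radius;
            // The weight depends only on distance from the center.
            const double w =
                std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
            mask[r][c] = w;
            total += w;
        }
    }

    // Normalize so that convolving a constant image leaves it unchanged.
    for (std::vector<double>& row : mask) {
        for (double& v : row) v /= total;
    }
    return mask;
}
```

Whatever size is chosen, the resulting mask must still fit within the mask_max_rows and mask_max_cols given to the ctor.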

void kjb::Fftw_convolution_2d::set_mask ( const Matrix & )

set mask, which must fit with mask size maxima given to ctor

The documentation for this class was generated from the following files:

/work/kobus/src/lib/m_cpp/m_convolve.h
/work/kobus/src/lib/m_cpp/m_convolve.cpp
