How to build GIMP 2.7.5 using MinGW on 32-bit Windows

Hi!

As part of my work to implement OpenCL in GEGL and GIMP, I have to compile them for Windows, and it’s no easy task. Here I’ll describe, step by step, how to build GEGL and GIMP from Git.

Maybe this post can help people trying to compile GIMP, and help developers improve the GIMP build process on Windows.

Install MinGW and MSYS

First, use mingw-get-inst to install the latest MinGW and MSYS. Choose to install the C/C++ compilers and the basic development environment, and install them in the default path.

After that, open the mingw shell and install wget, openssl and unzip, so we can download things from the command-line:

$ mingw-get.exe install msys-wget
$ mingw-get.exe install msys-openssl
$ mingw-get.exe install msys-unzip

Installing Perl

Perl is used by many build scripts. Download and install ActivePerl.

Now, let's create our build directory:

$ mkdir /opt

Copy the entire contents of the ActivePerl folder to this new folder [C:\MinGW\msys\1.0\opt].

GTK+

Let's create the folder where our GIMP will be:

$ mkdir /opt/gimp

Here the fun begins: we have to visit many places to find precompiled GTK+ binaries.

First, download the GTK+ all-in-one bundle from the GNOME ftp. It has many of the libraries we need.

$ cd /opt/gimp
$ wget http://ftp.gnome.org/pub/GNOME/binaries/win32/gtk+/2.24/gtk+-bundle_2.24.8-20111122_win32.zip
$ unzip gtk+-bundle_2.24.8-20111122_win32.zip

However, the bundled versions of ATK, GLib and GTK+ are too old for the latest GIMP, so we get them from the OpenSUSE repository:

$ cd /opt/gimp
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-atk-2.2.0-1.27.noarch.rpm
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-atk-devel-2.2.0-1.27.noarch.rpm
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-glib2-2.30.2-1.7.noarch.rpm
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-glib2-devel-2.30.2-1.7.noarch.rpm
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-gtk2-2.24.8-1.7.noarch.rpm
$ wget http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/mingw32-gtk2-devel-2.24.8-1.7.noarch.rpm

There is no easy way to extract them in MSYS, so go to the directory and extract them using 7-Zip. Replace files when asked. EDIT: Remember to move /bin, /usr, etc. from inside the packages (e.g. opt\gimp\usr\i686-w64-mingw32\sys-root\mingw) to your build dir!

JPEG, TIFF and PNG

Download these packages from the GNOME ftp and unzip them in /opt/gimp:

$ wget http://ftp.gnome.org/pub/GNOME/binaries/win32/dependencies/jpeg_8-1_win32.zip
$ wget http://ftp.gnome.org/pub/GNOME/binaries/win32/dependencies/jpeg-dev_8-1_win32.zip
$ wget http://ftp.gnome.org/pub/GNOME/binaries/win32/dependencies/libpng_1.4.3-1_win32.zip
$ wget http://ftp.gnome.org/pub/GNOME/binaries/win32/dependencies/libpng-dev_1.4.3-1_win32.zip

Environment variables

Before compiling anything, set these variables:

export PATH=".:/opt/perl/bin:/opt/bin:/bin:/mingw/bin:/opt/gimp/bin"
export PKG_CONFIG_PATH=/opt/gimp/lib/pkgconfig:/opt/lib/pkgconfig

Also, try to run gtk-demo to see if everything is ok:

$ gtk-demo.exe

A window should appear. If it complains about a missing DLL, go to the GNOME ftp or the OpenSUSE repository, get it and install it in /opt/gimp.

Intltool

Get the Intltool source and install it in /opt:

$ cd /opt
$ wget http://ftp.gnome.org/pub/GNOME/sources/intltool/0.40/intltool-0.40.6.tar.gz
$ tar -xzvf intltool-0.40.6.tar.gz
$ cd intltool-0.40.6
$ ./configure --prefix=/opt
$ make ; make install

Little CMS

Download lcms 1.19 source and move it to /opt/src.

$ tar -xzvf lcms-1.19.tar.gz
$ cd lcms-1.19
$ ./configure --prefix=/opt/gimp
$ make ; make install

Now, go to /opt/gimp/lib/pkgconfig. This folder holds all the pkg-config files needed to compile BABL, GEGL and GIMP. It's very boring, but you have to change the paths in all of them [prefix=] to /opt/gimp. You can use a script like:

find * -type f -name '*.pc' -exec sed -i "s#/devel/target/\(.*\)#/opt/gimp#g" {} \;

Adapt the pattern so that it catches every prefix format used by the pkg-config files in the folder.

BABL

Now you can get the latest packages directly from Git or from the nightly builds site.

$ cd /opt/src/babl-0.1.7
$ ./configure --prefix=/opt/gimp
$ make ; make install

GEGL

$ cd /opt/src/gegl-0.1.9
$ ./configure --prefix=/opt/gimp CPPFLAGS="-march=pentium -mtune=pentium" --disable-docs
$ make ; make install

GIMP

Finally, GIMP! In this tutorial I didn't enable Python, for simplicity. In http://git.gnome.org/browse/gimp/commit/?id=c15c3f4828527d9836de0ba168b4bfe00669cc21 I fixed some errors about undefined prototypes, so check whether your source includes that commit.

$ cd /opt/src/gimp-2.7.5
$ ./configure --prefix=/opt/gimp CPPFLAGS="-march=pentium -mtune=pentium" --disable-python
$ make
$ make install

You will probably run into some errors:

  • If you get an error about "Undefined GetUserDefaultUILanguage", change line 50 of app/language.c:
//switch (GetUserDefaultUILanguage())
switch (GetUserDefaultLangID())
  • I don't know why, but my libintl doesn't export some symbols like libintl_printf, so I had to put "#define libintl_printf printf" at the beginning of the following files:
    • app/core/gimptagcache.c
    • plug-ins/common/animation-play.c
    • plug-ins/common/curve-bend.c
    • plug-ins/common/file-xwd.c
    • plug-ins/common/jigsaw.c
    • plug-ins/common/newsprint.c
    • plug-ins/common/sample-colorize.c
    • plug-ins/file-sgi/sgi.c
  • If you get many errors with Little CMS, change lines 22-26 of modules/display-filter-lcms.c as follows. I don't really know why this is needed:
//#ifdef G_OS_WIN32
//#define STRICT
#include <windows.h>
//#define LCMS_WIN_TYPES_ALREADY_DEFINED
//#endif

Now go to /opt/gimp/bin; the gimp executable should be there. As we installed everything in /opt/gimp, just compress this folder if you want to create an installer or use it on another PC.


OpenCL on GEGL: Results up to now

Hello everyone! I’m glad to show you the results so far of my GSoC project on adding OpenCL support to GEGL, the Generic Graphics Library.

What I’ve done

GEGL has two basic data types:

  • GeglTile
  • GeglBuffer

A GeglBuffer can be seen as a layer in an image editing tool: buffers can be translated, cut, duplicated, etc. A final image is a composition of buffers. A buffer is composed of many GeglTiles, which are rectangular regions of pixels of the same size, so pixel data such as color is stored in tiles. This architecture is very flexible and allows, for example, tiles to be stored on disk, over a network, or compressed.

What I want in my project is to be able to process tiles using an OpenCL device, such as a GPU or even a multi-core CPU. In the solution I implemented, each tile has two states: the host memory data and a pointer to an OpenCL memory buffer. Each one has its own revision number, and these numbers are used for synchronization.

This synchronization is achieved through locks. For example, suppose gegl_buffer_get is called for a buffer whose tiles are being processed on the GPU. This function asks for the buffer data to be copied to a pointer, so each of the buffer’s tiles is going to be locked for reading. This locking process checks the revision numbers and moves data from the GPU to the CPU accordingly. The picture below illustrates this architecture:
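
As a complement, here is a minimal sketch in plain C of the revision-number logic described above. It is not the GEGL implementation: the Tile struct, the function names and the tile size are hypothetical, and a plain array stands in for the OpenCL memory buffer. It only illustrates the idea that a lock copies data when, and only when, the other side holds a newer revision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_PIXELS 16  /* tiny tile, just for illustration */

/* Hypothetical stand-in for a tile: one copy per memory space,
 * each tagged with the revision of the data it holds. */
typedef struct {
    float host_data[TILE_PIXELS];
    float device_data[TILE_PIXELS]; /* plain array standing in for an OpenCL buffer */
    unsigned host_rev;
    unsigned device_rev;
    int transfers;                  /* counts copies between the two spaces */
} Tile;

/* Lock for reading on the host: download only if the device copy is newer. */
static const float *tile_lock_host_read(Tile *t)
{
    assert(t != NULL);
    if (t->device_rev > t->host_rev) {
        memcpy(t->host_data, t->device_data, sizeof t->host_data);
        t->host_rev = t->device_rev;
        t->transfers++;
    }
    return t->host_data;
}

/* Lock for reading on the device: upload only if the host copy is newer. */
static const float *tile_lock_device_read(Tile *t)
{
    assert(t != NULL);
    if (t->host_rev > t->device_rev) {
        memcpy(t->device_data, t->host_data, sizeof t->device_data);
        t->device_rev = t->host_rev;
        t->transfers++;
    }
    return t->device_data;
}

/* Unlock after a device-side write: the device copy becomes the newest one. */
static void tile_unlock_device_write(Tile *t)
{
    assert(t != NULL);
    unsigned newest = t->host_rev > t->device_rev ? t->host_rev : t->device_rev;
    t->device_rev = newest + 1;
}
```

In this sketch, locking the same tile for reading twice in a row costs a single transfer, which is the behaviour the revision numbers are meant to guarantee.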

An Example of Use

I’ll show how to use GEGL buffer iterators to implement a Brightness-Contrast filter using OpenCL.

First, we define the OpenCL kernel that will be executed for each tile:

    const char* kernel_source[] =
    {
    "const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |        \n",
    "                          CLK_ADDRESS_NONE            |        \n",
    "                          CLK_FILTER_NEAREST;                  \n",
    "__kernel void kernel_bc(__read_only  image2d_t in,             \n",
    "                        __write_only image2d_t out,            \n",
    "                         float brightness,                     \n",
    "                         float contrast)                       \n",
    "{                                                              \n",
    "  int2 gid = (int2)(get_global_id(0), get_global_id(1));       \n",
    "  float4 in_v  = read_imagef(in, sampler, gid);                \n",
    "  float4 out_v;                                                \n",
    "  out_v.xyz = (in_v.xyz - 0.5f) * contrast + brightness + 0.5f;\n",
    "  out_v.w   =  in_v.w;                                         \n",
    "  write_imagef(out, gid, out_v);                               \n",
    "}                                                              \n",
    };

So, each tile is an OpenCL image2d_t, which can be read-only or write-only and must be read through a sampler.
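
To check the kernel's output on small inputs, a plain C reference of the same per-pixel formula can be handy. The sketch below is an illustration, not part of GEGL: the function name is hypothetical, and it assumes pixels are stored as four consecutive floats (R, G, B, A).

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the kernel above. Assumption: pixels are RGBA floats
 * laid out as 4 consecutive values. Color channels get the
 * brightness-contrast formula; alpha is copied unchanged. */
static void brightness_contrast_ref(const float *in, float *out,
                                    size_t n_pixels,
                                    float brightness, float contrast)
{
    assert(in != NULL && out != NULL);
    for (size_t p = 0; p < n_pixels; p++) {
        const float *src = in + 4 * p;
        float *dst = out + 4 * p;
        for (int c = 0; c < 3; c++)
            dst[c] = (src[c] - 0.5f) * contrast + brightness + 0.5f;
        dst[3] = src[3];
    }
}
```

With brightness 0 and contrast 1 the function is the identity, which makes a quick sanity check before comparing against the OpenCL result.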

Now let's see the iterator code:

    i = gegl_buffer_iterator_new (buffer_write, NULL, NULL, GEGL_BUFFER_CL_WRITE);
    index = gegl_buffer_iterator_add (i, buffer_read, NULL, NULL, GEGL_BUFFER_CL_READ);
    while (gegl_buffer_iterator_next (i))
      {
        GeglClTexture *in_tex  = i->cl_data[index];
        GeglClTexture *out_tex = i->cl_data[0];
        size_t global_worksize[2] = {i->roi[0].width, i->roi[0].height};

        CL_SAFE_CALL( errcode = gegl_clSetKernelArg(kernel, 0, sizeof(cl_mem),   (void*)&in_tex->data) );
        CL_SAFE_CALL( errcode = gegl_clSetKernelArg(kernel, 1, sizeof(cl_mem),   (void*)&out_tex->data) );
        CL_SAFE_CALL( errcode = gegl_clSetKernelArg(kernel, 2, sizeof(cl_float), (void*)&brightness) );
        CL_SAFE_CALL( errcode = gegl_clSetKernelArg(kernel, 3, sizeof(cl_float), (void*)&contrast) );

        CL_SAFE_CALL( errcode = gegl_clEnqueueNDRangeKernel(gegl_cl_get_command_queue(), kernel, 2,
                                                      NULL, global_worksize, NULL,
                                                      0, NULL, NULL) );
        CL_SAFE_CALL( errcode = gegl_clFinish(gegl_cl_get_command_queue()) );
      }

The key point here is the GEGL_BUFFER_CL_WRITE and GEGL_BUFFER_CL_READ flags passed to the iterator. They mean that writing and reading will be done through whatever OpenCL device [GPU or CPU] we're using. This code just executes the kernel defined above for each tile.

Before the iteration over buffer_read and buffer_write starts, all data from buffer_read is copied to the GPU [of course, only if the host copy is the most recent one]. At the end, the OpenCL revision numbers of buffer_write's tiles are bumped.

Suppose that, after all that, we do this:

gegl_buffer_get (buffer_write, 1.0, NULL, NULL, buf_write, GEGL_AUTO_ROWSTRIDE);

This means we want to copy buffer_write's data to a pointer in host memory, so the host and GPU versions of the data have to be synchronized first. This way, all buffer functions always return the most recent version of the data and, at the same time, memory transfers are made only when necessary.

Here is a flowchart of what is happening in this code:

Full code

Performance Results

I ran the Brightness-Contrast code on a 1-megapixel image, using an NVIDIA Tesla C2050 as the OpenCL device and an Intel Xeon E5506 for comparison [using just one core, but the code uses SSE2].

The time spent on memory transfers to the GPU was included in this benchmark [EDIT: this time includes transferring data back and forth between GPU and CPU].

  • CPU Elapsed time: 526 milliseconds
  • OpenCL Elapsed time: 483 milliseconds

Also, here is a chart from the NVIDIA profiler showing how the execution time was spent:

Almost 80% of the total execution time was spent on memory transfers to and from the GPU. This is a good result, because even with this overhead the OpenCL version was slightly faster than the CPU version. Keep in mind that the typical use case of GEGL is running many operations in sequence, so the ratio of processing to memory transfers tends to be higher. In fact, the case presented here is the worst case.
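
To make the "many operations in sequence" argument concrete, here is a back-of-the-envelope model in C. It is only an estimate under loud assumptions: "almost 80%" is rounded to exactly 0.8, the remaining 20% is treated as pure kernel time, transfers are paid once for the whole chain, and every operation is assumed to cost the same as Brightness-Contrast on both the CPU and the GPU. The function names are hypothetical.

```c
#include <assert.h>

/* Measured figures from the benchmark above (milliseconds). */
#define CPU_MS_PER_OP     526.0
#define OPENCL_MS_TOTAL   483.0
/* "Almost 80%" rounded to 0.8: an assumption, not a measurement. */
#define TRANSFER_FRACTION 0.8

/* Estimated OpenCL time for a chain of n_ops operations, assuming the
 * transfer cost is paid once and each operation costs as much as the
 * brightness-contrast kernel. */
static double opencl_chain_ms(int n_ops)
{
    assert(n_ops >= 1);
    double transfer_ms = TRANSFER_FRACTION * OPENCL_MS_TOTAL;
    double kernel_ms = OPENCL_MS_TOTAL - transfer_ms;
    return transfer_ms + n_ops * kernel_ms;
}

/* Estimated speedup over the single-core CPU version for the same chain. */
static double chain_speedup(int n_ops)
{
    return (n_ops * CPU_MS_PER_OP) / opencl_chain_ms(n_ops);
}
```

Under these assumptions a single operation gives a speedup of about 1.09 (526/483), while a chain of ten operations would give roughly 3.9. Real chains will differ, since operations vary in cost.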

Possible Improvements

There are a lot of things that can be done to speed up the current code:

Interleave execution of tiles with memory transfers of other tiles

GPU hardware (at least modern NVIDIA GPUs) has separate units for processing and memory transfers. We can use this to interleave the processing of some tiles with the copying of others.

Tiles sharing the same OpenCL memory buffer

There is a lot of overhead in allocating a GPU texture for each tile, which is typically small [128x64]. I think the best way to tackle this problem is to allocate a big chunk of memory and use offsets into this chunk when processing [it's impossible to have pointers to GPU memory]. The problem is that GEGL is supposed to hide this kind of detail from the user. Another idea is to serialize execution by having a pool of textures that can be reused by tiles. This would also be good because GPU memory is generally smaller than host memory, so a direct one-to-one mapping between CPU and GPU tiles cannot hold in general.
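
The texture pool idea can be sketched in a few lines of C. This is a hypothetical illustration, not code from the GEGL branch: slot indices stand in for preallocated GPU textures, and POOL_SIZE, TexturePool and the pool_* functions are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4  /* hypothetical: number of textures kept on the device */

/* Hypothetical pool: slot indices stand in for preallocated GPU textures. */
typedef struct {
    int free_slots[POOL_SIZE]; /* stack of currently unused slot indices */
    int n_free;
} TexturePool;

static void pool_init(TexturePool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < POOL_SIZE; i++)
        pool->free_slots[i] = POOL_SIZE - 1 - i; /* slot 0 is handed out first */
    pool->n_free = POOL_SIZE;
}

/* Returns a slot index, or -1 when every texture is in use
 * (the caller then has to wait or evict a tile). */
static int pool_acquire(TexturePool *pool)
{
    assert(pool != NULL);
    if (pool->n_free == 0)
        return -1;
    return pool->free_slots[--pool->n_free];
}

/* Gives a slot back so another tile can reuse the same texture. */
static void pool_release(TexturePool *pool, int slot)
{
    assert(pool != NULL && slot >= 0 && slot < POOL_SIZE);
    assert(pool->n_free < POOL_SIZE);
    pool->free_slots[pool->n_free++] = slot;
}
```

The point of the design is that allocation happens once, in pool_init, and tiles only borrow textures for as long as they are being processed. A real version would also have to decide what to do when the pool is exhausted.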

Multiple OpenCL Command Queues

Command queues can be executed concurrently on the same device. The Fermi architecture from NVIDIA, for example, can run 16 kernels at the same time. This can also be used to hide the memory transfer overhead.

Next Steps

I have yet to finish the implementation of an operator interface for OpenCL and write some OpenCL operators, in order to create a useful processing chain that runs only on the GPU.

As the time for a GSoC project is very limited, my mentor and I decided to leave optimizations out of the project, but I intend to work on them as soon as I can :)

Conclusion

The use of locking to synchronize CPU and GPU data was the most challenging part of the implementation, but after extensive testing I think it's working now, though it took me more time than I expected to make it run properly.

Moreover, the results so far show that using OpenCL to speed up GEGL is feasible and very interesting. Though there are still some challenges to be tackled, the tiled structure of GEGL allows a lot of optimizations.

The GEGL OpenCL branch is here.


The Limits of Understanding

 

 

About This Video

This statement is false. Think about it, and it makes your head hurt. If it’s true, it’s false. If it’s false, it’s true. In 1931, Austrian logician Kurt Gödel shocked the worlds of mathematics and philosophy by establishing that such statements are far more than a quirky turn of language: he showed that there are mathematical truths which simply can’t be proven. In the decades since, thinkers have taken the brilliant Gödel’s result in a variety of directions—linking it to limits of human comprehension and the quest to recreate human thinking on a computer. In this full program from the 2010 Festival, leading thinkers untangle Gödel’s discovery and examine the wider implications of his revolutionary finding.

http://worldsciencefestival.com/videos/the_limits_of_understanding


Coincidence

People are entirely too disbelieving of coincidence. They are far too ready to dismiss it and to build arcane structures of extremely rickety substance in order to avoid it. I, on the other hand, see coincidence everywhere as an inevitable consequence of the laws of probability, according to which having no unusual coincidence is far more unusual than any coincidence could possibly be.
Isaac Asimov – The Planet That Wasn't

 


Can a machine have a soul?

Alan, you really thought about everything. Who knows how the world would be if you had lived.

Thinking is a function of man’s immortal soul

In attempting to construct such machines we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates.

Alan Turing


Weekly random

 

 

 

Suleiman I

 

 


Brute Force Exact Euclidean Distance Transform in CUDA

Hi!

Following a discussion on Reddit about the distance transform on the GPU, I decided to post an implementation I made some time ago.

It’s the brute-force Euclidean distance transform. Basically, in a binary image, for each foreground pixel we compute the 2D Euclidean distance to the nearest background pixel. In the kernel below, the pixels that get a distance are the ones with value 0, measured to the nearest nonzero pixel; nonzero pixels get distance 0.

Here is the source of the kernel:

#define BLOCK_SIZE 256

__global__ void euclidian_distance_transform_kernel(
  const unsigned char* img, float* dist, int w, int h)
{
  const int i = blockIdx.x*blockDim.x + threadIdx.x;
  const int N = w*h;

  if (i >= N)
  {
    return;
  }

  int cx = i % w;
  int cy = i / w;

  float minv = INFINITY;

  if (img[i] > 0)
  {
    minv = 0.0f;
  }
  else
  {
    for (int j = 0; j < N; j++)
    {
        if (img[j] > 0)
        {
          int x = j % w;
          int y = j / w;
          float d = sqrtf( powf(float(x-cx), 2.0f) + powf(float(y-cy), 2.0f) );
          if (d < minv) minv = d;
        }
    }
  }

  dist[i] = minv;
}
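
For validating the kernel on small images, a CPU reference with the same logic can help. The sketch below is an assumption-laden illustration, not code from the repository: the function name is hypothetical, and it returns exact squared distances as integers (INT_MAX when the image has no nonzero pixel), so the caller takes the square root before comparing with the kernel's float output. It assumes the squared image diagonal fits in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Brute-force reference mirroring the kernel above, in squared distances:
 * nonzero pixels get 0, every other pixel gets the squared Euclidean
 * distance to the nearest nonzero pixel, or INT_MAX if there is none. */
static void edt_sq_brute_force_ref(const unsigned char *img, int *dist_sq,
                                   int w, int h)
{
    assert(img != NULL && dist_sq != NULL && w > 0 && h > 0);
    const int n = w * h;
    for (int i = 0; i < n; i++) {
        if (img[i] > 0) {
            dist_sq[i] = 0;
            continue;
        }
        const int cx = i % w;
        const int cy = i / w;
        int best = INT_MAX;
        /* Scan every pixel of the image, exactly like one GPU thread does. */
        for (int j = 0; j < n; j++) {
            if (img[j] > 0) {
                const int dx = j % w - cx;
                const int dy = j / w - cy;
                const int d = dx * dx + dy * dy;
                if (d < best)
                    best = d;
            }
        }
        dist_sq[i] = best;
    }
}
```

Working with squared integer distances keeps the reference exact and avoids one square root per candidate pixel; the kernel itself could use the same trick and call sqrtf only once per thread.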

Performance

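35.8 seconds for a 1-megapixel image on a Tesla C2050 GPU. Terrible result, though not a surprising one: each of the N threads may scan all N pixels, so the work grows as O(N²), up to about 10^12 distance evaluations for one megapixel.

I suppose good hardware can't save a bad algorithm after all ;)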

The code is at https://github.com/victormatheus/DT-GPU
