Package Details: openblas-lapack 0.3.29-1

Git Clone URL: https://aur.archlinux.org/openblas-lapack.git
Package Base: openblas-lapack
Description: Optimized BLAS library based on GotoBLAS2 1.13 BSD (providing blas, lapack, and cblas)
Upstream URL: http://www.openblas.net/
Licenses: BSD
Conflicts: blas, cblas, lapack, lapacke, openblas
Provides: blas, cblas, lapack, lapacke, openblas
Submitter: sftrytry
Maintainer: thrasibule
Last Packager: thrasibule
Votes: 92
Popularity: 0.000729
First Submitted: 2013-11-20 23:53 (UTC)
Last Updated: 2025-01-13 18:13 (UTC)

Required by (641)

Sources (1)

Latest Comments

MaartenBaert commented on 2017-10-08 20:05 (UTC)

I know, OMP_NUM_THREADS has the same effect. But I am getting slightly better performance when I compile without multithreading, probably because it reduces overhead.

adfjjv commented on 2017-10-08 09:27 (UTC)

@MaartenBaert If you're using OpenBLAS in a multi-threaded application, the simplest thing is to disable threading by setting the environment variable OPENBLAS_NUM_THREADS to 1. No need to recompile. See the FAQ: https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
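A minimal sketch of that suggestion, assuming a POSIX shell. The wrapper name `run_with_single_blas_thread` is made up for illustration; only the variable OPENBLAS_NUM_THREADS comes from the comment above. The variable is set for the child process only, so the rest of the session is unaffected.

```shell
# Hypothetical wrapper: run any command with OpenBLAS limited to one thread.
# `env` sets the variable only for the child process.
run_with_single_blas_thread() {
    env OPENBLAS_NUM_THREADS=1 "$@"
}

# Demonstration: show the value the child process sees.
run_with_single_blas_thread sh -c 'echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"'
```

In real use the `sh -c ...` part would be replaced by the application that links against OpenBLAS.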

MaartenBaert commented on 2017-10-07 03:15 (UTC) (edited on 2017-10-07 03:20 (UTC) by MaartenBaert)

Actually, it seems to be more complicated than this. I managed to fix the problem this way on another machine (running CentOS 6), but the same change doesn't work on this machine. CHOLMOD always launches 3 threads, even with OMP_NUM_THREADS=1. On my Arch machine, OpenBLAS launches 3 extra threads and the performance crashes. On the CentOS 6 machine, however, OpenBLAS launches 4 extra threads and the performance is only slightly degraded. Both machines have 4 physical cores. Looks like I still don't fully understand it.

So far I'm getting the best performance with multithreading completely disabled, i.e. USE_OPENMP=0 USE_THREAD=0. This still gives me multithreading in CHOLMOD, but it solves the problem and apparently also reduces overhead slightly.

EDIT: After recompiling yet another time with USE_OPENMP=1 (USE_THREAD not defined), the problem is solved on the Arch machine as well. The behaviour is now the same as on the CentOS 6 machine: 3 threads from CHOLMOD, 4 extra threads from OpenBLAS. Performance is still better with multithreading completely disabled, but at least it is usable now.
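For reference, the build configurations compared in this thread can be written down as sets of make variables. This is only a sketch: the helper name `openblas_thread_flags` and the mode labels are made up for illustration, and the command line is printed rather than executed, since no OpenBLAS source tree is assumed here.

```shell
# Hypothetical helper: map a threading mode to the make variables
# discussed in this thread. The mode names are illustrative only.
openblas_thread_flags() {
    case "$1" in
        pthreads) echo "USE_OPENMP=0 USE_THREAD=1" ;;  # non-OpenMP threading
        serial)   echo "USE_OPENMP=0 USE_THREAD=0" ;;  # multithreading completely disabled
        openmp)   echo "USE_OPENMP=1" ;;               # OpenMP threading, USE_THREAD not defined
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the command line instead of running it.
echo "make $(openblas_thread_flags openmp)"
```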

MaartenBaert commented on 2017-10-07 02:29 (UTC)

I found the problem. The issue here is USE_OPENMP=0 USE_THREAD=1. CHOLMOD uses OpenMP, and this is creating a conflict with the non-OpenMP based multithreading in this OpenBLAS package. My CPU has 4 physical cores, so OpenMP would normally create 3 worker threads to occupy the remaining cores. Instead I'm seeing 6 worker threads: 3 with 100% CPU usage (from OpenMP) and 3 with 25% CPU usage (from CHOLMOD). Breaking with a debugger attached at random times suggests that these workers are mostly just busy-waiting, wasting CPU time without doing real computation. The OpenMP threads use spinlocks, which is a terrible idea when you have more threads than physical cores.

I tried testing again with OMP_WAIT_POLICY=PASSIVE (which disables the spinlocks); this results in performance comparable to the OpenMP-enabled package. OMP_NUM_THREADS=1 (which eliminates 3 worker threads) has a similar effect.

Conclusion: if the parent application/library uses OpenMP, OpenBLAS should also be compiled with OpenMP support, or the performance will be terrible.
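Thread counts like the ones described here can be checked without a debugger. The sketch below assumes a Linux system with /proc mounted; the function name `count_threads` is made up for illustration.

```shell
# Hypothetical helper: count the threads of a process on Linux.
# Each thread has one entry under /proc/<pid>/task.
count_threads() {
    task_dir="/proc/$1/task"
    if [ ! -d "$task_dir" ]; then
        echo "no such process: $1" >&2
        return 1
    fi
    ls "$task_dir" | wc -l
}

# Demonstration on the current shell; replace $$ with the PID of the
# process under test.
count_threads $$
```

Where the procps version of ps is available, `ps -L -o tid,pcpu,comm -p PID` additionally lists the CPU usage of each thread.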

adfjjv commented on 2017-10-05 17:12 (UTC)

@MaartenBaert What is your hardware? Did you compile the package on a different machine? The PKGBUILD doesn't appear to create a hardware-agnostic package. I think it would need DYNAMIC_ARCH=1 at a minimum.

MaartenBaert commented on 2017-10-02 02:45 (UTC)

I was comparing the performance of various BLAS/LAPACK packages (through CHOLMOD from SuiteSparse), and was surprised to find that this package is ~40 times slower than the 'openblas' package without LAPACK:

    blas + lapack:       10015ms
    openblas + lapack:    3721ms
    openblas-lapack:    161417ms
    atlas-lapack:         4232ms

Does anyone know what could be causing this?
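The "~40 times" figure follows directly from the timings. A small sketch of the arithmetic, using awk for the division; the helper name `slowdown` is made up for illustration.

```shell
# Hypothetical helper: ratio of two timings, rounded to one decimal place.
# LC_ALL=C keeps the decimal point independent of the user's locale.
slowdown() {
    LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 161417 3721    # openblas-lapack vs. openblas + lapack
slowdown 161417 4232    # openblas-lapack vs. atlas-lapack
slowdown 161417 10015   # openblas-lapack vs. blas + lapack
```

The first ratio comes out at roughly 43, so the package is slower than every alternative measured, including the reference blas + lapack.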

eolianoe commented on 2017-07-14 14:42 (UTC)

@solnce: It builds fine in an up-to-date clean chroot. OpenMP is not enabled in this PKGBUILD, so I do not understand why there are references to OpenMP routines.

solnce commented on 2017-07-14 07:49 (UTC)

Building the most recent version fails for me.

    gcc -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I. -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -Wl,-O1,--sort-common,--as-needed,-z,relro -w -o linktest linktest.c ../libopenblas_nehalemp-r0.2.19.so -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1 -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/../../.. -lgfortran -lm -lquadmath -lm -lc && echo OK.
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `GOMP_parallel'
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_in_parallel'
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_set_num_threads'
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_num_threads'
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_max_threads'
    ../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_thread_num'
    collect2: error: ld returned 1 exit status
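When a link step fails like this, it can help to reduce the log to the list of missing symbols. The sketch below does that with sed. The function name `list_undefined_symbols` is made up for illustration, and the sample input is three of the error lines from the log above. The `GOMP_` and `omp_` prefixes are those of GCC's OpenMP runtime (libgomp), which suggests that some objects in the library were compiled with OpenMP enabled.

```shell
# Hypothetical helper: read a linker log on stdin and print each missing
# symbol once. It relies on GNU ld's "undefined reference to `name'" format.
list_undefined_symbols() {
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}

list_undefined_symbols <<'EOF'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `GOMP_parallel'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_in_parallel'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_set_num_threads'
EOF
```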

richli commented on 2017-07-13 19:00 (UTC)

Could you add another symlink?

    ln -sf libopenblas.so libcblas.so.${_lapackver:0:1}

Otherwise, it seems a recent update to python-numpy fails:

    $ python -c 'import numpy'
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
        from . import multiarray
    ImportError: libcblas.so.3: cannot open shared object file: No such file or directory
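To see what the requested symlink would look like, here is a sketch that creates it in a throwaway directory instead of a real package tree. Several things are assumptions for illustration: the value given to `_lapackver`, the empty stand-in file for the library, and the variable names `demo_dir`, `soname_major` and `link_target`. The PKGBUILD expression `${_lapackver:0:1}` is a bash substring expansion that takes the first character. The sketch uses the POSIX form `${_lapackver%%.*}` (everything before the first dot), which gives the same result for a single-digit major version.

```shell
demo_dir=$(mktemp -d)              # throwaway directory, not a real package tree
_lapackver=3.7.1                   # assumed value, for illustration only
soname_major=${_lapackver%%.*}     # "3": everything before the first dot

touch "$demo_dir/libopenblas.so"   # empty stand-in for the real library
( cd "$demo_dir" && ln -sf libopenblas.so "libcblas.so.$soname_major" )

# Show where the new link points, then clean up.
link_target=$(readlink "$demo_dir/libcblas.so.$soname_major")
echo "libcblas.so.$soname_major -> $link_target"
rm -rf "$demo_dir"
```

The resulting name, libcblas.so.3, is the file the numpy import error above reports as missing.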

eolianoe commented on 2017-07-10 17:26 (UTC)

@xyproto: I'm fine with the move to [community], but if I'm not mistaken, some optimisations depend on the type of CPU. Not as strongly as with atlas, but the move may decrease performance.