Installing art_modern¶
This page provides a detailed installation guide for art_modern.
NOTE Every “at least” version requirement in this guide is inclusive.
Operating System¶
The project assumes x86_64 platforms (a.k.a. Intel 64, AMD64, or x64, with chips manufactured by Intel, AMD, VIA, Zhaoxin, etc.). Support for other platforms (including 32-bit CPUs) is not guaranteed.
The project assumes modern GNU/Linux distributions that are still under official support. For example, Ubuntu 16.04 LTS (Xenial Xerus) reached its end-of-life in April 2021 and is therefore not covered. Other POSIX platforms, such as *BSD and proprietary UNIX, are theoretically supported but not tested. POSIX-on-Windows platforms such as Cygwin, MSYS2, MinGW, and MinGW-w64 are neither supported nor tested.
C/C++ Compilers¶
Checking C11 and C++17 Compatibility¶
This project requires a working C++ compiler that supports C++17 and a working C compiler that supports C11. You may test whether your compiler (GCC, for example) supports these standards using:
echo 'int main(){}' | gcc --std=c11 -x c - -o /dev/null
echo 'int main(){}' | g++ --std=c++17 -x c++ - -o /dev/null
NOTE This is a very rough test. The compiler may still fail to compile the project due to the lack of support of some specific features. In other words, if there is no error, the compiler MIGHT be supported. If there is an error, the compiler is definitely NOT supported.
NOTE The CMake build scripts inside this project contain a script that tests compiler compatibility, which is executed automatically when configuring the project.
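The rough test above can be repeated for several compilers in one go. The following is a minimal sketch and not part of the project: the helper name check_std is hypothetical, it assumes a mktemp that works without arguments (as on GNU/Linux), and it writes the output to a temporary file instead of /dev/null.

```shell
#!/bin/sh
# check_std COMPILER STANDARD LANGUAGE  (hypothetical helper, not shipped with the project)
# Returns 0 when COMPILER builds a trivial program with -std=STANDARD,
# 127 when COMPILER is not on PATH, and another non-zero status otherwise.
check_std() {
    command -v "$1" >/dev/null 2>&1 || return 127
    out=$(mktemp) || return 1
    echo 'int main(){return 0;}' | "$1" -std="$2" -x "$3" - -o "$out" 2>/dev/null
    status=$?
    rm -f "$out"
    return "$status"
}

# Try the GCC and Clang drivers; a compiler that is not installed is reported, not fatal.
for spec in "gcc c11 c" "g++ c++17 c++" "clang c11 c" "clang++ c++17 c++"; do
    set -- $spec    # deliberate word splitting: compiler, standard, language
    if check_std "$1" "$2" "$3"; then
        echo "$1 accepts -std=$2"
    else
        echo "$1 is missing or rejects -std=$2"
    fi
done
```

As with the one-line test, a success here only means the compiler MIGHT be supported.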
For a table of the minimum compiler versions that support these standards, see also:
GCC¶
GNU Compiler Collection (GCC) (homepage) is the most widely used compiler for GNU/Linux and provides the best compatibility and error tolerance. At least 7.4.0 is required.
NOTE GCC supports diverse programming languages. Please ensure that your GCC installation comes with C++ support. You need at least the g++ program (test with g++ --version) and a working GNU C++ Standard Library (libstdc++).
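Checking an inclusive "at least" requirement such as 7.4.0 can be scripted. The sketch below is only an illustration: the helper name version_at_least is hypothetical, and it assumes that sort -V (version sort, as provided by GNU CoreUtils) is available.

```shell
#!/bin/sh
# version_at_least ACTUAL MINIMUM  (hypothetical helper)
# Succeeds when ACTUAL >= MINIMUM, i.e. the comparison is inclusive.
# Assumption: `sort -V` is available, as in GNU CoreUtils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v g++ >/dev/null 2>&1; then
    # -dumpfullversion is understood by GCC 7 and later; -dumpversion is the fallback.
    gxx_version=$(g++ -dumpfullversion 2>/dev/null || g++ -dumpversion)
    if version_at_least "$gxx_version" 7.4.0; then
        echo "g++ $gxx_version is new enough"
    else
        echo "g++ $gxx_version is older than 7.4.0"
    fi
else
    echo "g++ not found"
fi
```

The same helper can be reused for the other minimum versions in this guide by changing the second argument.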
See also:
Clang¶
Clang is another popular compiler for GNU/Linux that uses the Low-Level Virtual Machine (LLVM) toolchain. It is also the default C++ compiler for FreeBSD and Apple Mac OS X. At least 5.0.1 is required.
NOTE Clang may need GCC to work properly because it relies on the compiler runtime libraries (e.g., libgcc/libgcc_s and libatomic). See this LLVM article for detailed instructions on selecting GNU- or LLVM-based variants of each toolchain component for Clang.
See also:
Intel OneAPI DPC++/C++ Compiler¶
The compiler (homepage) can build accelerated binaries on Intel CPUs. Compatibility with the latest version of this compiler will be tested.
NOTE Here we refer to the LLVM-based one (with programs named icx and icpx) instead of the old Intel C++ Compiler Classic (ICC, with programs named icc and icpc).
NOTE CMake version higher than 3.20.0 is required to support this compiler.
NOTE Distributing binaries built with this compiler may require you to comply with Intel’s license and/or may violate the GPL.
Other Compilers¶
Although not tested, the following compilers can also theoretically be of use:
Intel C++ Compiler Classic (ICC) MAY work with legacy systems and Boost libraries.
NVIDIA HPC compilers should work as all versions of this compiler support C++17.
NOTE CMake version higher than 3.20.0 is required to support this compiler.
AMD Optimizing C/C++ and Fortran Compilers (AOCC) should work as all versions of this compiler support C++17. Specifically,
Its first version, 3.2.0 (AMD Clang 13.0.0 (CLANG: AOCC_3.2.0-Build#128 2021_11_12)), was tested.
Its latest version, 5.0.0 (AMD Clang 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24)), was tested.
Arm C/C++/Fortran Compiler should work. Tested using Arm C/C++/Fortran Compiler version 24.10.1 (build number 4) (based on LLVM 19.1.0) on an ARM development board.
Essential Tools for Building¶
CMake¶
CMake is a cross-platform build system. It is required to build the project. At least 3.17 is required.
NOTE Lots of EOL distributions do not ship with a recent version of CMake. You may download CMake 3.17 in a binary form for x86_64 GNU/Linux or Mac OS X from CMake officially-built binaries.
Dependencies of CMake Build System¶
CMake requires a CMake Generator, which is used to perform the build. Under GNU/Linux and other POSIX systems (e.g., Mac OS X, FreeBSD), Ninja is preferred. GNU Make is also acceptable. BSD-flavored make will NOT work.
Building this project requires Python (at least 3.7), which is used for embedding the bundled Illumina profiles and for make help. Python is not required at runtime.
Linkers, Assemblers, Archivers, etc¶
You need either GNU BinUtils or the LLVM BinUtils replacements to perform assembling and linking. Under normal circumstances, they come together with your compiler if you install it using your package management system. The LLVM replacements may require additional CMake variables to be set.
C Library¶
This project was tested and works with the GNU C Library and the musl C Library. Other C libraries are not tested. However, C libraries that satisfy POSIX.1-2008 and C11 should work.
NOTE LLVM C Library is neither supported nor tested.
NOTE Most C libraries bundle a copy of the POSIX threads library (pthread(7), usually named libpthread.so, or libpthread.a if statically linked) and the math library (usually named libm.so or libm.a). Those libraries are required by this project.
NOTE Here, we assume that the FindThreads module of CMake will find pthread.
pkgconf or pkg-config¶
The project may take advantage of pkg-config or its modern replacement pkgconf (RECOMMENDED) to locate the dependencies. Installing one of them is HIGHLY recommended.
NOTE You may need to set the environment variable PKG_CONFIG_PATH to allow those tools to find the dependencies.
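As an illustration of the note on PKG_CONFIG_PATH, the sketch below prepends one extra search directory and asks the tool whether it can now locate zlib. The prefix $HOME/opt/zlib is a made-up example location, not something the project prescribes.

```shell
#!/bin/sh
# Hypothetical prefix where a dependency was installed without root privilege.
prefix="$HOME/opt/zlib"

# Prepend its pkgconfig directory, keeping any value that was already set.
PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export PKG_CONFIG_PATH

# Use pkgconf when present, otherwise pkg-config; both accept --exists.
for tool in pkgconf pkg-config; do
    if command -v "$tool" >/dev/null 2>&1; then
        if "$tool" --exists zlib; then
            echo "$tool can locate zlib"
        else
            echo "$tool cannot locate zlib"
        fi
        break
    fi
done
```

The variable has to be exported in the same shell session that later invokes cmake.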
Command-Line Utilities¶
For Apple Mac OS X, FreeBSD, Alpine Linux, and other GNU/Linux distributions with obsolete packages, please consider checking the version of the following tools:
GNU CoreUtils (at least 8.28) for the use of env -C (introduced in 8.28) and readlink -f (introduced in 8.0).
NOTE CMake may have its own requirements on the version of GNU CoreUtils.
GNU Bash (at least 4.2) for the possible use of advanced array operations.
NOTE CMake may have its own requirements on the version of GNU Bash.
GNU Make, as the BSD variant is known to be incompatible.
NOTE CMake may have its own requirements on the version of GNU Make if you use GNU Make as the CMake Generator.
The original system tools shipped with your operating system or with BusyBox may NOT work.
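Instead of comparing version numbers, the two CoreUtils features can be probed directly. This is a rough sketch with a hypothetical helper named probe; it only reports whether each command runs successfully on the current system.

```shell
#!/bin/sh
# probe LABEL COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND quietly and reports whether it succeeded.
probe() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$label: supported"
    else
        echo "$label: NOT supported"
    fi
}

probe 'env -C' env -C / true
probe 'readlink -f' readlink -f /

# Print the first version line of Bash and Make for a manual comparison.
# A BSD-flavored make does not understand --version and prints nothing here.
if command -v bash >/dev/null 2>&1; then bash --version | head -n 1; fi
if command -v make >/dev/null 2>&1; then make --version 2>/dev/null | head -n 1; fi
```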
Required External Libraries¶
Dependencies are libraries or tools that should be installed on your system before building the project. If you’re using a personal computer with root privilege, consider installing them using your system’s package manager, such as APT, YUM, DNF, pacman, or Conda. Otherwise, contact your system administrator to find out where they are installed, or build them from source.
Boost C++ Library¶
Boost is an umbrella project of diverse small modules that can be used independently. The Boost modules used in this project are:
Essential header-only modules, including:
Essential modules that would be linked to the final executable:
FileSystem. See this Design section for why this module is required instead of C++17 <filesystem>.
Program Options. For parsing command-line options.
Log. For logging support.
NOTE A Boost module may depend on other Boost modules in either header-only or compiled form. CMake should be able to find those dependencies automatically.
Boost modules are found through the find_package(Boost ...) command of CMake. This usually requires the presence of a BoostConfig.cmake (provided by Boost) or FindBoost.cmake (provided by CMake) file. See the BOOST_CONFIG_PROVIDED_BY_BOOST CMake variable mentioned below for details.
zlib¶
zlib compresses and decompresses the bundled ART error profiles. At least 1.2.0 is required. zlib is also required by the bundled HTSLib.
The project first tries to find zlib using pkgconf, which usually requires the presence of the zlib.pc file. If that fails, it falls back to libz.so/libz.a with optional version suffixes.
Optional External Libraries¶
The following dependencies are optional. You may choose to install them if you want to improve the performance, adaptability, or user-friendliness of the program.
Optional Boost Components¶
StackTrace: For a more developer-friendly stack trace. Can be absent for non-developers.
See also: configuring Boost::StackTrace for platform-specific configuration instructions for this library.
Test: For unit testing only. Can be absent for non-developers.
Timer: For displaying CPU time, wall-clock time, and average CPU utilization at the end of the program. Can be absent if you do not care about performance.
Random: For random number generation if you choose to use Boost random number generators (see CMake variable USE_RANDOM_GENERATOR below). Can be absent if you do not choose to use Boost random number generators.
If benchmarking (See CMake variable BUILD_ART_MODERN_BENCHMARKS) is required, you may also install:
Accelerated Random Number Generators¶
Users on Intel/AMD CPUs are highly recommended to use Intel OneAPI Math Kernel Library (OneMKL). To use this option, set CMake variable USE_RANDOM_GENERATOR to ONEMKL.
OneMKL can be found through CMake. This requires you to install the Intel OneAPI Base Toolkit, which provides MKLConfig.cmake. See official docs for 2025.2 for more details.
OneMKL can also be found through pkgconf. For example, the Debian packaging of Intel MKL, libmkl-dev, provides mkl-sdl-lp64.pc, while the Intel OneAPI MKL installation provides mkl-sdl.pc. To use MKL located through pkgconf instead of CMake, see CMake variable FIND_RANDOM_MKL_THROUGH_PKGCONF below.
NOTE OneMKL is proprietary software. Distributing binaries built with this library may require you to comply with Intel’s license and/or may violate the GPL.
PCG is used as an alternative to std::mt19937 for random number generation. The file <pcg_random.hpp> is required. It is usually located at /usr/include/pcg_random.hpp if you installed the Debian package libpcg-cpp-dev. To use this option, set CMake variable USE_RANDOM_GENERATOR to SYSTEM_PCG. Note that we also provide a bundled minified version of the PCG random number generators, selected by setting USE_RANDOM_GENERATOR to PCG.
Alternate malloc/free Implementations¶
Users may use either mi-malloc, jemalloc, or tcmalloc to slightly improve the performance of memory allocation and deallocation and check for potential memory leaks.
For jemalloc, the jemalloc.pc file is required.
For mi-malloc, the mimalloc-config.cmake file is required.
For tcmalloc, libtcmalloc.pc or libtcmalloc_minimal.pc is required. Under Debian GNU/Linux, this file is provided by the libgoogle-perftools-dev package.
See also: CMake variable USE_MALLOC below.
NOTE If you already compiled the binary without those libraries, you may still link them at run-time using LD_PRELOAD.
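The run-time alternative mentioned in the note can be wrapped in a small function. This is a sketch under assumptions: the helper name run_with_malloc is hypothetical, and the library path in the example comment is only one possible location and differs between systems.

```shell
#!/bin/sh
# run_with_malloc LIBRARY COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND with LIBRARY preloaded; refuses to run when LIBRARY is unreadable.
run_with_malloc() {
    lib=$1
    shift
    if [ ! -r "$lib" ]; then
        echo "cannot read $lib" >&2
        return 1
    fi
    # Keep libraries that were already listed in LD_PRELOAD.
    LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}

# Example call; the library path is an assumption, adjust it to your system:
#   run_with_malloc /usr/lib/x86_64-linux-gnu/libjemalloc.so.2 art_modern ...
```

Preloading affects only the command started through the function, so the rest of the shell session keeps the system allocator.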
Message Passing Interface (MPI) Library¶
The MPI standard required by this project is MPI 1.0, which was published in 1994. Thus, most MPI libraries should work. MPI libraries from the following vendors are supported for MPI-based parallelization:
The system should theoretically support:
The MPI installation with MPI C API should be locatable using CMake module FindMPI.cmake, which is shipped with CMake. We do NOT need the MPI C++ API, which is deprecated in later MPI standards.
See also: CMake variable WITH_MPI below.
NCBI NGS Libraries¶
Enables NCBI Sequence Read Archive (SRA) support in art_profile_builder. Requires libncbi-ngs.so/libncbi-ngs.a together with its development headers.
Available at https://github.com/ncbi/sra-tools.
Available since 1.3.3.
See also: CMake variable WITH_NCBI_NGS below.
NOTE If you would like to use NCBI NGS Library, you MUST use bundled HTSLib due to symbol conflicts.
Required Bundled/External Libraries¶
The following dependencies are bundled with the project. You do not need to install them manually. However, you may choose to use external ones if you have them installed in your system. Consult your system administrator if you do not know whether and where those libraries are installed.
NOTE Using bundled dependencies may introduce security vulnerabilities.
See also: Copying.md for the licenses and versions of those bundled dependencies.
HTSLib¶
HTSLib is used for reading large FASTA files and generating SAM/BAM files.
See also: CMake variable USE_HTSLIB mentioned below.
Bundled¶
This bundled HTSLib is a trimmed down version of the original HTSLib with the following changes:
Removed GNU Autotools & Makefile-based building system and use CMake instead.
Dropped support for libcurl.
Removed the plugin system.
All files unused in htscodecs are removed.
Support for external HTSCodecs and OpenSSL (for calculating MD5) is removed.
To build bundled HTSLib sources, you need to have:
REQUIRED zlib, which is also required by this project.
REQUIRED pthread. See above section for C libraries bundling pthread.
HIGHLY RECOMMENDED libdeflate: This library accelerates compressed BAM output.
OPTIONAL libbz2: For CRAM compression.
OPTIONAL liblzma: For CRAM compression.
NOTE The CRAM format is currently not supported as an output format for art_modern.
See official HTSLib documentation for more details.
External¶
HTSLib at least 1.17 is required due to the use of sam_flush (1.14) and faidx_seq_len64 (1.17).
The project first tries to find HTSLib using pkgconf, which usually requires the presence of the htslib.pc file. If that fails, it falls back to lib[val].so/lib[val].a with optional version suffixes, where [val] is the value of CMake variable USE_HTSLIB.
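Whether an installed HTSLib satisfies the 1.17 minimum can be checked by hand before configuring. The following sketch uses the --atleast-version option of pkgconf/pkg-config; the helper name have_pkg is hypothetical, and the project's own detection logic may differ.

```shell
#!/bin/sh
# have_pkg NAME MINIMUM  (hypothetical helper)
# Returns 0 when pkgconf or pkg-config finds NAME at version MINIMUM or newer,
# and 127 when neither tool is installed.
have_pkg() {
    for tool in pkgconf pkg-config; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" --atleast-version="$2" "$1"
            return $?
        fi
    done
    return 127
}

if have_pkg htslib 1.17; then
    echo "external HTSLib is new enough"
else
    echo "external HTSLib not found through pkgconf, or older than 1.17"
fi
```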
NOTE Remote reference files are not supported.
moodycamel::ConcurrentQueue<T>¶
moodycamel::ConcurrentQueue<T> is used as a multi-producer single-consumer output queue.
See also: CMake variable USE_CONCURRENT_QUEUE mentioned below.
Bundled¶
No additional dependency is required.
External¶
The library is header-only, so only the path to concurrentqueue.h is required. At least 1.0.4 is required.
{fmt}¶
{fmt} is used for high-speed formatting of FASTA and FASTQ output.
See also: CMake variable USE_LIBFMT mentioned below.
Bundled¶
No additional dependency is required.
External¶
If the USE_LIBFMT CMake variable is set to CMAKE, the project finds {fmt} through find_package of CMake. This usually requires fmt-config.cmake. Added in 1.3.4.
If the USE_LIBFMT CMake variable is set to any other value, the project first tries to find {fmt} using pkgconf, which usually requires the presence of the fmt.pc file. If that fails, it falls back to lib[val].so/lib[val].a with optional version suffixes, where [val] is the value of the USE_LIBFMT CMake variable. At least 7.1.3 is required.
Optional Bundled/External Dependencies¶
None.
Required Bundled Dependencies¶
The only required bundled dependency of this project is libceu. No additional dependency is required.
Optional Bundled Dependencies¶
BS::thread_pool¶
BS::thread_pool is used as an alternative to Boost.ASIO for older Boost versions where a thread pool is not available.
No additional dependency is required.
See also: CMake variable USE_THREAD_PARALLEL mentioned below.
CMake Variables¶
This project relies on diverse CMake variables that control the build behavior. If you want a specific build (e.g., with accelerated random number generation, with or without debugging information), you should set them accordingly. They should be set when invoking cmake. For example,
cmake -DBUILD_SHARED_LIBS=ON
sets BUILD_SHARED_LIBS to ON.
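Several variables are usually combined in one configure call. The sketch below only assembles and prints such a call so that it can be inspected before running it; the helper name configure_command and the build directory name build are assumptions, while -S, -B and -D are standard cmake options.

```shell
#!/bin/sh
# configure_command SOURCE_DIR BUILD_DIR [NAME=VALUE...]  (hypothetical helper)
# Prints a cmake configure command with one -D option per NAME=VALUE pair.
# Values containing spaces would need extra quoting, which this sketch omits.
configure_command() {
    src=$1
    build=$2
    shift 2
    cmd="cmake -S $src -B $build"
    for setting in "$@"; do
        cmd="$cmd -D$setting"
    done
    printf '%s\n' "$cmd"
}

# A release build that uses the bundled PCG generators.
configure_command . build CMAKE_BUILD_TYPE=Release USE_RANDOM_GENERATOR=PCG
# prints: cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DUSE_RANDOM_GENERATOR=PCG
# After configuring, the build itself would be started with: cmake --build build
```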
Following is a list of CMake variables used in this project:
CMAKE_BUILD_TYPE¶
Available since 1.0.0.
Build executables/libraries with different optimization and debugging levels.
Debug (DEFAULT): For developers with debugging needs.
NOTE Very, very slow with tons of extra checks.
Release: Optimized executables/libraries without debug symbols. Used for daily use.
RelWithDebInfo: Optimized executables/libraries with debug symbols. Used for profiling.
See also: Official CMake documentation.
Python3_EXECUTABLE¶
Available since 1.1.1.
Path to the Python interpreter. Required for adding bundled error profiles to the executable. Defaults to python3.
See also: Official CMake documentation.
CEU_CM_SHOULD_USE_NATIVE¶
Available since 1.0.0.
Whether to build the binaries using -mtune=native, if possible. This results in faster executables with impaired portability (i.e., they may not run on other machines).
OFF (DEFAULT): Will not build native executables/libraries.
ON: Will build native executables/libraries.
CEU_CM_SHOULD_ENABLE_TEST¶
Available since 1.0.0.
Whether tests should be enabled.
Unset (DEFAULT): Set to OFF.
OFF: Will disable tests.
ON: Will enable tests.
USE_HTSLIB¶
Available since 1.0.0.
Which HTSLib implementation to use.
Unset (DEFAULT): Will use bundled HTSLib. See Bundled HTSLib for requirements.
hts: Will use the HTSLib found in the system. See External HTSLib for requirements.
Any other value [val]: Will use the HTSLib of other names (lib[val].so/lib[val].a) found in the system. See External HTSLib for requirements.
USE_RANDOM_GENERATOR¶
Available since 1.0.0.
The random number generator used.
STL (DEFAULT): Use STL random generators.
PCG: Use PCG random generators. This uses a minimal version of PCG random generators. Available since 1.1.1.
SYSTEM_PCG: Use PCG random generators found in the system. See Accelerated Random Number Generators for requirements. Available since 1.3.0.
BOOST: Use Boost random generators. See Optional Boost Components for requirements.
ONEMKL: Use Intel OneAPI MKL random generators. See Accelerated Random Number Generators for requirements. See also CMake variable FIND_RANDOM_MKL_THROUGH_PKGCONF below.
On my system (13th Gen Intel(R) Core(TM) i7-13700H, Intel OneAPI BaseKit 2025.2), for filling 1024 random 32-bit unsigned integers 1024 times with 200 replicates, the performance (in microseconds) is:
MKL::VSL_BRNG_SFMT19937(32 bits): gmean: 286; mean/sd: 286/3 us
MKL::VSL_BRNG_MT19937(32 bits): gmean: 435; mean/sd: 446/132 us
PCG::pcg32_fast(32 bits): gmean: 848; mean/sd: 848/5 us
absl::InsecureBitGen(64 bits): gmean: 1,486; mean/sd: 1,486/33 us
boost::random::mt19937(32 bits): gmean: 1,756; mean/sd: 1,756/19 us
std::mt19937(32 bits): gmean: 1,852; mean/sd: 1,853/59 us
GSL::mt19937(32 bits): gmean: 2,420; mean/sd: 2,422/114 us
absl::BitGen(64 bits): gmean: 3,543; mean/sd: 3,548/202 us
On another OrangePi 3B ARM development board (Rockchip RK3566 quad-core 64-bit CPU, Arm C/C++/Fortran Compiler version 24.10.1 (build number 4) (based on LLVM 19.1.0)), the performance is:
PCG::pcg32_fast(32 bits): gmean: 4,452; mean/sd: 4,452/30 us
absl::InsecureBitGen(64 bits): gmean: 12,787; mean/sd: 12,787/66 us
boost::random::mt19937(32 bits): gmean: 15,489; mean/sd: 15,490/117 us
std::mt19937(32 bits): gmean: 17,848; mean/sd: 17,848/137 us
GSL::mt19937(32 bits): gmean: 20,802; mean/sd: 20,802/144 us
absl::BitGen(64 bits): gmean: 29,104; mean/sd: 29,104/112 us
NOTE The performance may vary on different platforms.
Benchmark code available at https://github.com/YU-Zhejian/art_modern_bench_rand.
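To put the timings above into perspective, the geometric means can be turned into speed-up factors relative to std::mt19937. The sketch below does this arithmetic with awk; the helper name speedup is hypothetical, and the numbers are the gmean values quoted above.

```shell
#!/bin/sh
# speedup BASELINE_US CANDIDATE_US  (hypothetical helper)
# Prints how many times faster the candidate is than the baseline.
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f\n", base / cand }'
}

# x86_64, baseline std::mt19937 (gmean 1,852 us)
speedup 1852 286     # MKL::VSL_BRNG_SFMT19937, prints 6.5
speedup 1852 848     # PCG::pcg32_fast, prints 2.2

# ARM board, baseline std::mt19937 (gmean 17,848 us)
speedup 17848 4452   # PCG::pcg32_fast, prints 4.0
```

These factors describe the random number generators alone on the two systems above, not the end-to-end run time of the program.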
USE_THREAD_PARALLEL¶
Available since 1.0.0.
The thread-level parallelism strategy.
ASIO (DEFAULT): Will use Boost.ASIO for thread-based parallelism.
NOTE This is only available in Boost at least 1.66.
BS: Will use BS::thread_pool. Available since 1.1.1. See BS::thread_pool for requirements.
NOP: Will not use thread-based parallelism. Useful for debugging.
BOOST_CONFIG_PROVIDED_BY_BOOST¶
Available since 1.1.3.
Configures the behavior of CMake policy CMP0167. There’s usually no need to change this. You only need to set this switch to OFF if you have Boost earlier than 1.69 with CMake at least 3.30.
ON (DEFAULT): Will set the policy to NEW.
OFF: Will set the policy to OLD.
USE_MALLOC¶
Available since 1.1.1.
Whether to use alternative high-performance malloc/free implementations like mi-malloc, jemalloc, or tcmalloc (see above). Using those implementations can improve the performance of the program but slightly increase memory consumption.
AUTO (DEFAULT): Will use jemalloc, then mi-malloc, then tcmalloc, and finally minimal tcmalloc, if possible.
JEMALLOC: Find and use jemalloc, and fail if not found.
MIMALLOC: Find and use mi-malloc, and fail if not found.
TCMALLOC: Find and use tcmalloc, and fail if not found.
TCMALLOC_MINIMAL: Find and use the minimal version of tcmalloc, and fail if not found.
NOP: Will not use alternative malloc/free implementations. I.e., use the system-provided malloc/free implementations.
See Alternate malloc/free Implementations for requirements.
USE_LIBFMT¶
Available since 1.1.7.
Whether to use bundled {fmt} library for formatting strings.
Unset (DEFAULT): Will use bundled {fmt}. See Bundled {fmt} for requirements.
CMAKE: Will use the {fmt} found in the system. See External {fmt} for requirements. Added in 1.3.4.
fmt: Will use the {fmt} found in the system. See External {fmt} for requirements.
Any other value [val]: Will use the {fmt} of other names (lib[val].so) found in the system. See External {fmt} for requirements.
USE_CONCURRENT_QUEUE¶
Available since 1.1.7.
Whether to use bundled moodycamel::ConcurrentQueue<T>. Specifically, where concurrentqueue.h is located. For example, if you use Debian GNU/Linux and installed libconcurrentqueue-dev, you may set this variable to /usr/include/concurrentqueue/moodycamel/ or /usr/include/concurrentqueue/.
Unset (DEFAULT): Will use bundled moodycamel::ConcurrentQueue<T>. See Bundled moodycamel::ConcurrentQueue<T> for requirements.
Any value [val]: Will search for moodycamel::ConcurrentQueue<T> in the include path [val]. See External moodycamel::ConcurrentQueue<T> for requirements.
REPRODUCIBLE_BUILDS¶
Available since 1.1.7.
Whether to enable reproducible builds. This complies with Debian policies.
Unset (DEFAULT): Will not enable reproducible builds.
ON: Will enable reproducible builds. All uses of the __DATE__ and __TIME__ macros will be replaced with fixed values.
See also:
BUILD_ART_MODERN_BENCHMARKS¶
Available since 1.1.7.
Whether to build the mini benchmark executables.
Unset (DEFAULT): Will not build benchmarks.
ON: Will build benchmarks.
WITH_MPI¶
Available since 1.2.0.
Whether to enable MPI-based parallelization.
Unset (DEFAULT)/OFF: Will not enable MPI-based parallelization.
ON: Will enable MPI-based parallelization.
See MPI Library for requirements.
FIND_RANDOM_MKL_THROUGH_PKGCONF¶
Available since 1.3.0.
Find Intel OneAPI MKL through pkgconf. Must be specified with CMake variable USE_RANDOM_GENERATOR set to ONEMKL.
Unset (DEFAULT): Will not find Intel OneAPI MKL through pkgconf.
Any value [val]: Will find Intel OneAPI MKL through pkgconf with the name [val]. This requires the presence of the [val].pc file.
WITH_NCBI_NGS¶
Available since 1.3.3.
Enables NCBI SRA parsing in art_profile_builder.
See also NCBI NGS Libraries for requirements.