author | Tom Stellard <thomas.stellard@amd.com> | 2014-12-31 14:56:56 +0000 |
---|---|---|
committer | Tom Stellard <thomas.stellard@amd.com> | 2014-12-31 14:56:56 +0000 |
commit | 6ad55e925a906382607f275be2c78da988d13d2a (patch) | |
tree | 4389f06ba859bb72d4c8e0f68e74d1de6736d0fe | |
parent | Merging r224333: (diff) | |
parent | [analyzer] Include a couple more comments on using xcrun to query the SDK. (diff) | |
Creating a 3.5 branch, which is compatible with LLVM 3.5
This branch will probably not be maintained. Its purpose is to mark
the last commit that is compatible with LLVM 3.5.
llvm-svn: 225040
361 files changed, 7878 insertions, 0 deletions
diff --git a/libclc/CREDITS.TXT b/libclc/CREDITS.TXT new file mode 100644 index 000000000000..b18d40bd7339 --- /dev/null +++ b/libclc/CREDITS.TXT @@ -0,0 +1,2 @@ +N: Peter Collingbourne +E: peter@pcc.me.uk diff --git a/libclc/LICENSE.TXT b/libclc/LICENSE.TXT new file mode 100644 index 000000000000..03a00447d6f8 --- /dev/null +++ b/libclc/LICENSE.TXT @@ -0,0 +1,64 @@ +============================================================================== +libclc License +============================================================================== + +The libclc library is dual licensed under both the University of Illinois +"BSD-Like" license and the MIT license. As a user of this code you may choose +to use it under either license. As a contributor, you agree to allow your code +to be used under both. + +Full text of the relevant licenses is included below. + +============================================================================== + +Copyright (c) 2011-2014 by the contributors listed in CREDITS.TXT + +All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal with +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimers. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimers in the + documentation and/or other materials provided with the distribution. + + * The names of the contributors may not be used to endorse or promote + products derived from this Software without specific prior written + permission. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS +FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE +SOFTWARE. + +============================================================================== + +Copyright (c) 2011-2014 by the contributors listed in CREDITS.TXT + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
diff --git a/libclc/README.TXT b/libclc/README.TXT new file mode 100644 index 000000000000..00ae6bfa40a1 --- /dev/null +++ b/libclc/README.TXT @@ -0,0 +1,52 @@ +libclc +------ + +libclc is an open source, BSD licensed implementation of the library +requirements of the OpenCL C programming language, as specified by the +OpenCL 1.1 Specification. The following sections of the specification +impose library requirements: + + * 6.1: Supported Data Types + * 6.2.3: Explicit Conversions + * 6.2.4.2: Reinterpreting Types Using as_type() and as_typen() + * 6.9: Preprocessor Directives and Macros + * 6.11: Built-in Functions + * 9.3: Double Precision Floating-Point + * 9.4: 64-bit Atomics + * 9.5: Writing to 3D image memory objects + * 9.6: Half Precision Floating-Point + +libclc is intended to be used with the Clang compiler's OpenCL frontend. + +libclc is designed to be portable and extensible. To this end, it provides +generic implementations of most library requirements, allowing the target +to override the generic implementation at the granularity of individual +functions. + +libclc currently only supports the PTX target, but support for more +targets is welcome. + +Compiling and installing with Make +---------------------------------- + +$ ./configure.py --with-llvm-config=/path/to/llvm-config && make +$ make install + +Note you can use the DESTDIR Makefile variable to do staged installs. + +$ make install DESTDIR=/path/for/staged/install + +Compiling and installing with Ninja +----------------------------------- + +$ ./configure.py -g ninja --with-llvm-config=/path/to/llvm-config && ninja +$ ninja install + +Note you can use the DESTDIR environment variable to do staged installs. 
+ +$ DESTDIR=/path/for/staged/install ninja install + +Website +------- + +http://www.pcc.me.uk/~peter/libclc/ diff --git a/libclc/build/metabuild.py b/libclc/build/metabuild.py new file mode 100644 index 000000000000..4ab5db58e06e --- /dev/null +++ b/libclc/build/metabuild.py @@ -0,0 +1,100 @@ +import ninja_syntax +import os + +# Simple meta-build system. + +class Make(object): + def __init__(self): + self.output = open(self.output_filename(), 'w') + self.rules = {} + self.rule_text = '' + self.all_targets = [] + self.default_targets = [] + self.clean_files = [] + self.distclean_files = [] + self.output.write("""all:: + +ifndef VERBOSE + Verb = @ +endif + +""") + + def output_filename(self): + return 'Makefile' + + def rule(self, name, command, description=None, depfile=None, + generator=False): + self.rules[name] = {'command': command, 'description': description, + 'depfile': depfile, 'generator': generator} + + def build(self, output, rule, inputs=[], implicit=[], order_only=[]): + inputs = self._as_list(inputs) + implicit = self._as_list(implicit) + order_only = self._as_list(order_only) + + output_dir = os.path.dirname(output) + if output_dir != '' and not os.path.isdir(output_dir): + os.makedirs(output_dir) + + dollar_in = ' '.join(inputs) + subst = lambda text: text.replace('$in', dollar_in).replace('$out', output) + + deps = ' '.join(inputs + implicit) + if order_only: + deps += ' | ' + deps += ' '.join(order_only) + self.output.write('%s: %s\n' % (output, deps)) + + r = self.rules[rule] + command = subst(r['command']) + if r['description']: + desc = subst(r['description']) + self.output.write('\t@echo %s\n\t$(Verb) %s\n' % (desc, command)) + else: + self.output.write('\t%s\n' % command) + if r['depfile']: + depfile = subst(r['depfile']) + self.output.write('-include '+depfile+'\n') + self.output.write('\n') + + self.all_targets.append(output) + if r['generator']: + self.distclean_files.append(output) + if r['depfile']: + 
self.distclean_files.append(depfile) + else: + self.clean_files.append(output) + if r['depfile']: + self.distclean_files.append(depfile) + + + def _as_list(self, input): + if isinstance(input, list): + return input + return [input] + + def default(self, paths): + self.default_targets += self._as_list(paths) + + def finish(self): + self.output.write('all:: %s\n\n' % ' '.join(self.default_targets or self.all_targets)) + self.output.write('clean: \n\trm -f %s\n\n' % ' '.join(self.clean_files)) + self.output.write('distclean: clean\n\trm -f %s\n' % ' '.join(self.distclean_files)) + +class Ninja(ninja_syntax.Writer): + def __init__(self): + ninja_syntax.Writer.__init__(self, open(self.output_filename(), 'w')) + + def output_filename(self): + return 'build.ninja' + + def finish(self): + pass + +def from_name(name): + if name == 'make': + return Make() + if name == 'ninja': + return Ninja() + raise LookupError, 'unknown generator: %s; supported generators are make and ninja' % name diff --git a/libclc/build/ninja_syntax.py b/libclc/build/ninja_syntax.py new file mode 100644 index 000000000000..7d9f592dfadf --- /dev/null +++ b/libclc/build/ninja_syntax.py @@ -0,0 +1,118 @@ +#!/usr/bin/python + +"""Python module for generating .ninja files. + +Note that this is emphatically not a required piece of Ninja; it's +just a helpful utility for build-file-generation systems that already +use Python. 
+""" + +import textwrap +import re + +class Writer(object): + def __init__(self, output, width=78): + self.output = output + self.width = width + + def newline(self): + self.output.write('\n') + + def comment(self, text): + for line in textwrap.wrap(text, self.width - 2): + self.output.write('# ' + line + '\n') + + def variable(self, key, value, indent=0): + if value is None: + return + if isinstance(value, list): + value = ' '.join(value) + self._line('%s = %s' % (key, value), indent) + + def rule(self, name, command, description=None, depfile=None, + generator=False): + self._line('rule %s' % name) + self.variable('command', escape(command), indent=1) + if description: + self.variable('description', description, indent=1) + if depfile: + self.variable('depfile', depfile, indent=1) + if generator: + self.variable('generator', '1', indent=1) + + def build(self, outputs, rule, inputs=None, implicit=None, order_only=None, + variables=None): + outputs = self._as_list(outputs) + all_inputs = self._as_list(inputs)[:] + + if implicit: + all_inputs.append('|') + all_inputs.extend(self._as_list(implicit)) + if order_only: + all_inputs.append('||') + all_inputs.extend(self._as_list(order_only)) + + self._line('build %s: %s %s' % (' '.join(outputs), + rule, + ' '.join(all_inputs))) + + if variables: + for key, val in variables: + self.variable(key, val, indent=1) + + return outputs + + def include(self, path): + self._line('include %s' % path) + + def subninja(self, path): + self._line('subninja %s' % path) + + def default(self, paths): + self._line('default %s' % ' '.join(self._as_list(paths))) + + def _line(self, text, indent=0): + """Write 'text' word-wrapped at self.width characters.""" + leading_space = ' ' * indent + while len(text) > self.width: + # The text is too wide; wrap if possible. + + # Find the rightmost space that would obey our width constraint. 
+ available_space = self.width - len(leading_space) - len(' $') + space = text.rfind(' ', 0, available_space) + if space < 0: + # No such space; just use the first space we can find. + space = text.find(' ', available_space) + if space < 0: + # Give up on breaking. + break + + self.output.write(leading_space + text[0:space] + ' $\n') + text = text[space+1:] + + # Subsequent lines are continuations, so indent them. + leading_space = ' ' * (indent+2) + + self.output.write(leading_space + text + '\n') + + def _as_list(self, input): + if input is None: + return [] + if isinstance(input, list): + return input + return [input] + + +def escape(string): + """Escape a string such that Makefile and shell variables are + correctly escaped for use in a Ninja file. + """ + assert '\n' not in string, 'Ninja syntax does not allow newlines' + # We only have one special metacharacter: '$'. + + # We should leave $in and $out untouched. + # Just look for makefile/shell style substitutions + return re.sub(r'(\$[{(][a-z_]+[})])', + r'$\1', + string, + flags=re.IGNORECASE) diff --git a/libclc/compile-test.sh b/libclc/compile-test.sh new file mode 100755 index 000000000000..47c7f385bb92 --- /dev/null +++ b/libclc/compile-test.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +clang -target nvptx--nvidiacl -Iptx-nvidiacl/include -Igeneric/include -Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.bc -include clc/clc.h -Dcl_clang_storage_class_specifiers -Dcl_khr_fp64 "$@" diff --git a/libclc/configure.py b/libclc/configure.py new file mode 100755 index 000000000000..7170f46cd7a5 --- /dev/null +++ b/libclc/configure.py @@ -0,0 +1,247 @@ +#!/usr/bin/python + +def c_compiler_rule(b, name, description, compiler, flags): + command = "%s -MMD -MF $out.d %s -c -o $out $in" % (compiler, flags) + b.rule(name, command, description + " $out", depfile="$out.d") + +version_major = 0; +version_minor = 0; +version_patch = 1; + +from optparse import OptionParser +import os +import string +from subprocess 
import * +import sys + +srcdir = os.path.dirname(sys.argv[0]) + +sys.path.insert(0, os.path.join(srcdir, 'build')) +import metabuild + +p = OptionParser() +p.add_option('--with-llvm-config', metavar='PATH', + help='use given llvm-config script') +p.add_option('--with-cxx-compiler', metavar='PATH', + help='use given C++ compiler') +p.add_option('--prefix', metavar='PATH', + help='install to given prefix') +p.add_option('--libexecdir', metavar='PATH', + help='install *.bc to given dir') +p.add_option('--includedir', metavar='PATH', + help='install include files to given dir') +p.add_option('--pkgconfigdir', metavar='PATH', + help='install clc.pc to given dir') +p.add_option('-g', metavar='GENERATOR', default='make', + help='use given generator (default: make)') +(options, args) = p.parse_args() + +llvm_config_exe = options.with_llvm_config or "llvm-config" + +prefix = options.prefix +if not prefix: + prefix = '/usr/local' + +libexecdir = options.libexecdir +if not libexecdir: + libexecdir = os.path.join(prefix, 'lib/clc') + +includedir = options.includedir +if not includedir: + includedir = os.path.join(prefix, 'include') + +pkgconfigdir = options.pkgconfigdir +if not pkgconfigdir: + pkgconfigdir = os.path.join(prefix, 'share/pkgconfig') + +def llvm_config(args): + try: + proc = Popen([llvm_config_exe] + args, stdout=PIPE) + return proc.communicate()[0].rstrip().replace('\n', ' ') + except OSError: + print "Error executing llvm-config." + print "Please ensure that llvm-config is in your $PATH, or use --with-llvm-config." 
+ sys.exit(1) + +llvm_version = string.split(string.replace(llvm_config(['--version']), 'svn', ''), '.') +llvm_system_libs = '' +if (int(llvm_version[0]) == 3 and int(llvm_version[1]) >= 5) or int(llvm_version[0]) > 3: + llvm_system_libs = llvm_config(['--system-libs']) +llvm_bindir = llvm_config(['--bindir']) +llvm_core_libs = llvm_config(['--libs', 'core', 'bitreader', 'bitwriter']) + ' ' + \ + llvm_system_libs + ' ' + \ + llvm_config(['--ldflags']) +llvm_cxxflags = llvm_config(['--cxxflags']) + ' -fno-exceptions -fno-rtti' +llvm_libdir = llvm_config(['--libdir']) + +llvm_clang = os.path.join(llvm_bindir, 'clang') +llvm_link = os.path.join(llvm_bindir, 'llvm-link') +llvm_opt = os.path.join(llvm_bindir, 'opt') + +cxx_compiler = options.with_cxx_compiler +if not cxx_compiler: + cxx_compiler = os.path.join(llvm_bindir, 'clang++') + +available_targets = { + 'r600--' : { 'devices' : + [{'gpu' : 'cedar', 'aliases' : ['palm', 'sumo', 'sumo2', 'redwood', 'juniper']}, + {'gpu' : 'cypress', 'aliases' : ['hemlock']}, + {'gpu' : 'barts', 'aliases' : ['turks', 'caicos']}, + {'gpu' : 'cayman', 'aliases' : ['aruba']}, + {'gpu' : 'tahiti', 'aliases' : ['pitcairn', 'verde', 'oland', 'hainan', 'bonaire', 'kabini', 'kaveri', 'hawaii','mullins']}]}, + 'nvptx--' : { 'devices' : [{'gpu' : '', 'aliases' : []}] }, + 'nvptx64--' : { 'devices' : [{'gpu' : '', 'aliases' : []}] }, + 'nvptx--nvidiacl' : { 'devices' : [{'gpu' : '', 'aliases' : []}] }, + 'nvptx64--nvidiacl' : { 'devices' : [{'gpu' : '', 'aliases' : []}] } +} + +default_targets = ['nvptx--nvidiacl', 'nvptx64--nvidiacl', 'r600--'] + +targets = args +if not targets: + targets = default_targets + +b = metabuild.from_name(options.g) + +b.rule("LLVM_AS", "%s -o $out $in" % os.path.join(llvm_bindir, "llvm-as"), + 'LLVM-AS $out') +b.rule("LLVM_LINK", command = llvm_link + " -o $out $in", + description = 'LLVM-LINK $out') +b.rule("OPT", command = llvm_opt + " -O3 -o $out $in", + description = 'OPT $out') + +c_compiler_rule(b, 
"LLVM_TOOL_CXX", 'CXX', cxx_compiler, llvm_cxxflags) +b.rule("LLVM_TOOL_LINK", cxx_compiler + " -o $out $in %s" % llvm_core_libs + " -Wl,-rpath %s" % llvm_libdir, 'LINK $out') + +prepare_builtins = os.path.join('utils', 'prepare-builtins') +b.build(os.path.join('utils', 'prepare-builtins.o'), "LLVM_TOOL_CXX", + os.path.join(srcdir, 'utils', 'prepare-builtins.cpp')) +b.build(prepare_builtins, "LLVM_TOOL_LINK", + os.path.join('utils', 'prepare-builtins.o')) + +b.rule("PREPARE_BUILTINS", "%s -o $out $in" % prepare_builtins, + 'PREPARE-BUILTINS $out') +b.rule("PYTHON_GEN", "python < $in > $out", "PYTHON_GEN $out") +b.build('generic/lib/convert.cl', "PYTHON_GEN", ['generic/lib/gen_convert.py']) + +manifest_deps = set([sys.argv[0], os.path.join(srcdir, 'build', 'metabuild.py'), + os.path.join(srcdir, 'build', 'ninja_syntax.py')]) + +install_files_bc = [] +install_deps = [] + +# Create libclc.pc +clc = open('libclc.pc', 'w') +clc.write('includedir=%(inc)s\nlibexecdir=%(lib)s\n\nName: libclc\nDescription: Library requirements of the OpenCL C programming language\nVersion: %(maj)s.%(min)s.%(pat)s\nCflags: -I${includedir}\nLibs: -L${libexecdir}' % +{'inc': includedir, 'lib': libexecdir, 'maj': version_major, 'min': version_minor, 'pat': version_patch}) +clc.close() + +for target in targets: + (t_arch, t_vendor, t_os) = target.split('-') + archs = [t_arch] + if t_arch == 'nvptx' or t_arch == 'nvptx64': + archs.append('ptx') + archs.append('generic') + + subdirs = [] + for arch in archs: + subdirs.append("%s-%s-%s" % (arch, t_vendor, t_os)) + subdirs.append("%s-%s" % (arch, t_os)) + subdirs.append(arch) + + incdirs = filter(os.path.isdir, + [os.path.join(srcdir, subdir, 'include') for subdir in subdirs]) + libdirs = filter(lambda d: os.path.isfile(os.path.join(d, 'SOURCES')), + [os.path.join(srcdir, subdir, 'lib') for subdir in subdirs]) + + clang_cl_includes = ' '.join(["-I%s" % incdir for incdir in incdirs]) + + for device in available_targets[target]['devices']: + # The 
rule for building a .bc file for the specified architecture using clang. + clang_bc_flags = "-target %s -I`dirname $in` %s " \ + "-fno-builtin " \ + "-Dcl_clang_storage_class_specifiers " \ + "-Dcl_khr_fp64 " \ + "-Dcles_khr_int64 " \ + "-D__CLC_INTERNAL " \ + "-emit-llvm" % (target, clang_cl_includes) + if device['gpu'] != '': + clang_bc_flags += ' -mcpu=' + device['gpu'] + clang_bc_rule = "CLANG_CL_BC_" + target + "_" + device['gpu'] + c_compiler_rule(b, clang_bc_rule, "LLVM-CC", llvm_clang, clang_bc_flags) + + objects = [] + sources_seen = set() + + if device['gpu'] == '': + full_target_name = target + obj_suffix = '' + else: + full_target_name = device['gpu'] + '-' + target + obj_suffix = '.' + device['gpu'] + + for libdir in libdirs: + subdir_list_file = os.path.join(libdir, 'SOURCES') + manifest_deps.add(subdir_list_file) + override_list_file = os.path.join(libdir, 'OVERRIDES') + + # Add target overrides + if os.path.exists(override_list_file): + for override in open(override_list_file).readlines(): + override = override.rstrip() + sources_seen.add(override) + + for src in open(subdir_list_file).readlines(): + src = src.rstrip() + if src not in sources_seen: + sources_seen.add(src) + obj = os.path.join(target, 'lib', src + obj_suffix + '.bc') + objects.append(obj) + src_file = os.path.join(libdir, src) + ext = os.path.splitext(src)[1] + if ext == '.ll': + b.build(obj, 'LLVM_AS', src_file) + else: + b.build(obj, clang_bc_rule, src_file) + + builtins_link_bc = os.path.join(target, 'lib', 'builtins.link' + obj_suffix + '.bc') + builtins_opt_bc = os.path.join(target, 'lib', 'builtins.opt' + obj_suffix + '.bc') + builtins_bc = os.path.join('built_libs', full_target_name + '.bc') + b.build(builtins_link_bc, "LLVM_LINK", objects) + b.build(builtins_opt_bc, "OPT", builtins_link_bc) + b.build(builtins_bc, "PREPARE_BUILTINS", builtins_opt_bc, prepare_builtins) + install_files_bc.append((builtins_bc, builtins_bc)) + install_deps.append(builtins_bc) + for alias in 
device['aliases']: + # Ninja cannot have multiple rules with same name so append suffix + ruleName = "CREATE_ALIAS_{0}_for_{1}".format(alias, device['gpu']) + b.rule(ruleName, "ln -fs %s $out" % os.path.basename(builtins_bc) + ,"CREATE-ALIAS $out") + + alias_file = os.path.join('built_libs', alias + '-' + target + '.bc') + b.build(alias_file, ruleName, builtins_bc) + install_files_bc.append((alias_file, alias_file)) + install_deps.append(alias_file) + b.default(builtins_bc) + + +install_cmd = ' && '.join(['mkdir -p ${DESTDIR}/%(dst)s && cp -r %(src)s ${DESTDIR}/%(dst)s' % + {'src': file, + 'dst': libexecdir} + for (file, dest) in install_files_bc]) +install_cmd = ' && '.join(['%(old)s && mkdir -p ${DESTDIR}/%(dst)s && cp -r %(srcdir)s/generic/include/clc ${DESTDIR}/%(dst)s' % + {'old': install_cmd, + 'dst': includedir, + 'srcdir': srcdir}]) +install_cmd = ' && '.join(['%(old)s && mkdir -p ${DESTDIR}/%(dst)s && cp -r libclc.pc ${DESTDIR}/%(dst)s' % + {'old': install_cmd, + 'dst': pkgconfigdir}]) + +b.rule('install', command = install_cmd, description = 'INSTALL') +b.build('install', 'install', install_deps) + +b.rule("configure", command = ' '.join(sys.argv), description = 'CONFIGURE', + generator = True) +b.build(b.output_filename(), 'configure', list(manifest_deps)) + +b.finish() diff --git a/libclc/generic/include/clc/as_type.h b/libclc/generic/include/clc/as_type.h new file mode 100644 index 000000000000..0bb9ee2e8313 --- /dev/null +++ b/libclc/generic/include/clc/as_type.h @@ -0,0 +1,68 @@ +#define as_char(x) __builtin_astype(x, char) +#define as_uchar(x) __builtin_astype(x, uchar) +#define as_short(x) __builtin_astype(x, short) +#define as_ushort(x) __builtin_astype(x, ushort) +#define as_int(x) __builtin_astype(x, int) +#define as_uint(x) __builtin_astype(x, uint) +#define as_long(x) __builtin_astype(x, long) +#define as_ulong(x) __builtin_astype(x, ulong) +#define as_float(x) __builtin_astype(x, float) + +#define as_char2(x) __builtin_astype(x, char2) 
+#define as_uchar2(x) __builtin_astype(x, uchar2) +#define as_short2(x) __builtin_astype(x, short2) +#define as_ushort2(x) __builtin_astype(x, ushort2) +#define as_int2(x) __builtin_astype(x, int2) +#define as_uint2(x) __builtin_astype(x, uint2) +#define as_long2(x) __builtin_astype(x, long2) +#define as_ulong2(x) __builtin_astype(x, ulong2) +#define as_float2(x) __builtin_astype(x, float2) + +#define as_char3(x) __builtin_astype(x, char3) +#define as_uchar3(x) __builtin_astype(x, uchar3) +#define as_short3(x) __builtin_astype(x, short3) +#define as_ushort3(x) __builtin_astype(x, ushort3) +#define as_int3(x) __builtin_astype(x, int3) +#define as_uint3(x) __builtin_astype(x, uint3) +#define as_long3(x) __builtin_astype(x, long3) +#define as_ulong3(x) __builtin_astype(x, ulong3) +#define as_float3(x) __builtin_astype(x, float3) + +#define as_char4(x) __builtin_astype(x, char4) +#define as_uchar4(x) __builtin_astype(x, uchar4) +#define as_short4(x) __builtin_astype(x, short4) +#define as_ushort4(x) __builtin_astype(x, ushort4) +#define as_int4(x) __builtin_astype(x, int4) +#define as_uint4(x) __builtin_astype(x, uint4) +#define as_long4(x) __builtin_astype(x, long4) +#define as_ulong4(x) __builtin_astype(x, ulong4) +#define as_float4(x) __builtin_astype(x, float4) + +#define as_char8(x) __builtin_astype(x, char8) +#define as_uchar8(x) __builtin_astype(x, uchar8) +#define as_short8(x) __builtin_astype(x, short8) +#define as_ushort8(x) __builtin_astype(x, ushort8) +#define as_int8(x) __builtin_astype(x, int8) +#define as_uint8(x) __builtin_astype(x, uint8) +#define as_long8(x) __builtin_astype(x, long8) +#define as_ulong8(x) __builtin_astype(x, ulong8) +#define as_float8(x) __builtin_astype(x, float8) + +#define as_char16(x) __builtin_astype(x, char16) +#define as_uchar16(x) __builtin_astype(x, uchar16) +#define as_short16(x) __builtin_astype(x, short16) +#define as_ushort16(x) __builtin_astype(x, ushort16) +#define as_int16(x) __builtin_astype(x, int16) +#define 
as_uint16(x) __builtin_astype(x, uint16) +#define as_long16(x) __builtin_astype(x, long16) +#define as_ulong16(x) __builtin_astype(x, ulong16) +#define as_float16(x) __builtin_astype(x, float16) + +#ifdef cl_khr_fp64 +#define as_double(x) __builtin_astype(x, double) +#define as_double2(x) __builtin_astype(x, double2) +#define as_double3(x) __builtin_astype(x, double3) +#define as_double4(x) __builtin_astype(x, double4) +#define as_double8(x) __builtin_astype(x, double8) +#define as_double16(x) __builtin_astype(x, double16) +#endif diff --git a/libclc/generic/include/clc/async/async_work_group_copy.h b/libclc/generic/include/clc/async/async_work_group_copy.h new file mode 100644 index 000000000000..39c637b0e265 --- /dev/null +++ b/libclc/generic/include/clc/async/async_work_group_copy.h @@ -0,0 +1,15 @@ +#define __CLC_DST_ADDR_SPACE local +#define __CLC_SRC_ADDR_SPACE global +#define __CLC_BODY <clc/async/async_work_group_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_DST_ADDR_SPACE +#undef __CLC_SRC_ADDR_SPACE +#undef __CLC_BODY + +#define __CLC_DST_ADDR_SPACE global +#define __CLC_SRC_ADDR_SPACE local +#define __CLC_BODY <clc/async/async_work_group_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_DST_ADDR_SPACE +#undef __CLC_SRC_ADDR_SPACE +#undef __CLC_BODY diff --git a/libclc/generic/include/clc/async/async_work_group_copy.inc b/libclc/generic/include/clc/async/async_work_group_copy.inc new file mode 100644 index 000000000000..d85df6c8fadd --- /dev/null +++ b/libclc/generic/include/clc/async/async_work_group_copy.inc @@ -0,0 +1,5 @@ +_CLC_OVERLOAD _CLC_DECL event_t async_work_group_copy( + __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst, + const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src, + size_t num_gentypes, + event_t event); diff --git a/libclc/generic/include/clc/async/async_work_group_strided_copy.h b/libclc/generic/include/clc/async/async_work_group_strided_copy.h new file mode 100644 index 000000000000..bfa6f31faca8 --- /dev/null +++ 
b/libclc/generic/include/clc/async/async_work_group_strided_copy.h @@ -0,0 +1,15 @@ +#define __CLC_DST_ADDR_SPACE local +#define __CLC_SRC_ADDR_SPACE global +#define __CLC_BODY <clc/async/async_work_group_strided_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_DST_ADDR_SPACE +#undef __CLC_SRC_ADDR_SPACE +#undef __CLC_BODY + +#define __CLC_DST_ADDR_SPACE global +#define __CLC_SRC_ADDR_SPACE local +#define __CLC_BODY <clc/async/async_work_group_strided_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_DST_ADDR_SPACE +#undef __CLC_SRC_ADDR_SPACE +#undef __CLC_BODY diff --git a/libclc/generic/include/clc/async/async_work_group_strided_copy.inc b/libclc/generic/include/clc/async/async_work_group_strided_copy.inc new file mode 100644 index 000000000000..bdbea3aa4a16 --- /dev/null +++ b/libclc/generic/include/clc/async/async_work_group_strided_copy.inc @@ -0,0 +1,6 @@ +_CLC_OVERLOAD _CLC_DECL event_t async_work_group_strided_copy( + __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst, + const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src, + size_t num_gentypes, + size_t stride, + event_t event); diff --git a/libclc/generic/include/clc/async/gentype.inc b/libclc/generic/include/clc/async/gentype.inc new file mode 100644 index 000000000000..6b79acdff171 --- /dev/null +++ b/libclc/generic/include/clc/async/gentype.inc @@ -0,0 +1,204 @@ + +#define __CLC_GENTYPE char +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE char2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE char4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE char8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE char16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uchar +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uchar2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uchar4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uchar8 +#include __CLC_BODY 
+#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uchar16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE short +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE short2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE short4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE short8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE short16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ushort +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ushort2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ushort4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ushort8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ushort16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE int +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE int2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE int4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE int8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE int16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uint +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uint2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uint4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uint8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE uint16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE float +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE float2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE float4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE float8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE float16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE long 
+#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE long2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE long4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE long8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE long16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ulong +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ulong2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ulong4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ulong8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE ulong16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#ifdef cl_khr_fp64 + +#define __CLC_GENTYPE double +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE double2 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE double4 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE double8 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#define __CLC_GENTYPE double16 +#include __CLC_BODY +#undef __CLC_GENTYPE + +#endif diff --git a/libclc/generic/include/clc/async/prefetch.h b/libclc/generic/include/clc/async/prefetch.h new file mode 100644 index 000000000000..f64bc2045de9 --- /dev/null +++ b/libclc/generic/include/clc/async/prefetch.h @@ -0,0 +1,3 @@ +#define __CLC_BODY <clc/async/prefetch.inc> +#include <clc/async/gentype.inc> +#undef __CLC_BODY diff --git a/libclc/generic/include/clc/async/prefetch.inc b/libclc/generic/include/clc/async/prefetch.inc new file mode 100644 index 000000000000..f817a66c249c --- /dev/null +++ b/libclc/generic/include/clc/async/prefetch.inc @@ -0,0 +1 @@ +_CLC_OVERLOAD _CLC_DECL void prefetch(const global __CLC_GENTYPE *p, size_t num_gentypes); diff --git a/libclc/generic/include/clc/async/wait_group_events.h b/libclc/generic/include/clc/async/wait_group_events.h new file mode 100644 index 000000000000..799efa0a791c --- /dev/null +++ 
b/libclc/generic/include/clc/async/wait_group_events.h @@ -0,0 +1 @@ +void wait_group_events(int num_events, event_t *event_list); diff --git a/libclc/generic/include/clc/atomic/atomic_add.h b/libclc/generic/include/clc/atomic/atomic_add.h new file mode 100644 index 000000000000..7dd4fd3c682e --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_add.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_add +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_and.h b/libclc/generic/include/clc/atomic/atomic_and.h new file mode 100644 index 000000000000..a198c46b7ee9 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_and.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_and +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_cmpxchg.h b/libclc/generic/include/clc/atomic/atomic_cmpxchg.h new file mode 100644 index 000000000000..2e4f1c21dcc2 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_cmpxchg.h @@ -0,0 +1,15 @@ +#define __CLC_FUNCTION atomic_cmpxchg + +#define __CLC_DECLARE_ATOMIC_3_ARG(ADDRSPACE, TYPE) \ + _CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE, TYPE); + +#define __CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(TYPE) \ + __CLC_DECLARE_ATOMIC_3_ARG(global, TYPE) \ + __CLC_DECLARE_ATOMIC_3_ARG(local, TYPE) + +__CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(int) +__CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(uint) + +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC_3_ARG +#undef __CLC_DECLARE_ATOMIC_ADDRESS_SPACE_3_ARG diff --git a/libclc/generic/include/clc/atomic/atomic_dec.h b/libclc/generic/include/clc/atomic/atomic_dec.h new file mode 100644 index 000000000000..15d05884aeb4 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_dec.h @@ -0,0 +1 @@ +#define 
atomic_dec(p) atomic_sub(p, 1) diff --git a/libclc/generic/include/clc/atomic/atomic_decl.inc b/libclc/generic/include/clc/atomic/atomic_decl.inc new file mode 100644 index 000000000000..49ccde2bae52 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_decl.inc @@ -0,0 +1,10 @@ + +#define __CLC_DECLARE_ATOMIC(ADDRSPACE, TYPE) \ + _CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE); + +#define __CLC_DECLARE_ATOMIC_ADDRSPACE(TYPE) \ + __CLC_DECLARE_ATOMIC(global, TYPE) \ + __CLC_DECLARE_ATOMIC(local, TYPE) + +__CLC_DECLARE_ATOMIC_ADDRSPACE(int) +__CLC_DECLARE_ATOMIC_ADDRSPACE(uint) diff --git a/libclc/generic/include/clc/atomic/atomic_inc.h b/libclc/generic/include/clc/atomic/atomic_inc.h new file mode 100644 index 000000000000..d8bc342aa5f6 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_inc.h @@ -0,0 +1 @@ +#define atomic_inc(p) atomic_add(p, 1) diff --git a/libclc/generic/include/clc/atomic/atomic_max.h b/libclc/generic/include/clc/atomic/atomic_max.h new file mode 100644 index 000000000000..ed09ec9caef2 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_max.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_max +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_min.h b/libclc/generic/include/clc/atomic/atomic_min.h new file mode 100644 index 000000000000..6a46af403d06 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_min.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_min +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_or.h b/libclc/generic/include/clc/atomic/atomic_or.h new file mode 100644 index 000000000000..2369d81a3a06 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_or.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION 
atomic_or +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_sub.h b/libclc/generic/include/clc/atomic/atomic_sub.h new file mode 100644 index 000000000000..993e995001fa --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_sub.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_sub +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_xchg.h b/libclc/generic/include/clc/atomic/atomic_xchg.h new file mode 100644 index 000000000000..ebe0d9af8098 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_xchg.h @@ -0,0 +1,6 @@ +#define __CLC_FUNCTION atomic_xchg +#include <clc/atomic/atomic_decl.inc> +__CLC_DECLARE_ATOMIC_ADDRSPACE(float); +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/atomic/atomic_xor.h b/libclc/generic/include/clc/atomic/atomic_xor.h new file mode 100644 index 000000000000..2cb74803ca92 --- /dev/null +++ b/libclc/generic/include/clc/atomic/atomic_xor.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION atomic_xor +#include <clc/atomic/atomic_decl.inc> +#undef __CLC_FUNCTION +#undef __CLC_DECLARE_ATOMIC +#undef __CLC_DECLARE_ATOMIC_ADDRSPACE diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h new file mode 100644 index 000000000000..9740b3ddab63 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_add(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_add(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h 
b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h new file mode 100644 index 000000000000..168f423396a6 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_cmpxchg(global int *p, int cmp, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_cmpxchg(global unsigned int *p, unsigned int cmp, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h new file mode 100644 index 000000000000..bbc872ce0527 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h new file mode 100644 index 000000000000..050747c79403 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_inc(global int *p); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_inc(global unsigned int *p); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h new file mode 100644 index 000000000000..c435c726798c --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_sub(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h new file mode 100644 index 
000000000000..6a18e9e8e1b1 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_xchg(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_xchg(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h new file mode 100644 index 000000000000..19df7d6ed6ea --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_and(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_and(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h new file mode 100644 index 000000000000..b46ce29c40c5 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_max(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_max(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h new file mode 100644 index 000000000000..0e458eb60eae --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_min(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_min(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h new file mode 100644 index 000000000000..91cde56a4d7b --- /dev/null +++ 
b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_or(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_or(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h new file mode 100644 index 000000000000..f787849cff00 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_xor(global int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_xor(global unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h new file mode 100644 index 000000000000..096d01107d89 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_add(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_add(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h new file mode 100644 index 000000000000..e10a84f2cb47 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_cmpxchg(local int *p, int cmp, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_cmpxchg(local unsigned int *p, unsigned int cmp, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h new file mode 100644 index 000000000000..e74d8fc12b92 --- /dev/null +++ 
b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_dec(local int *p); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(local unsigned int *p); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h new file mode 100644 index 000000000000..718f1f2b8041 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_inc(local int *p); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_inc(local unsigned int *p); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h new file mode 100644 index 000000000000..6363780e9dec --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_sub(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_sub(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h new file mode 100644 index 000000000000..c5a1f09b0849 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_xchg(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_xchg(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h new file mode 100644 index 000000000000..96d7b1a89b6e --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_and(local int *p, int val); +_CLC_OVERLOAD 
_CLC_DECL unsigned int atom_and(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h new file mode 100644 index 000000000000..7d6b17df2a55 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_max(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_max(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h new file mode 100644 index 000000000000..ddb6cf379284 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_min(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_min(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h new file mode 100644 index 000000000000..518c256dfbb8 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_or(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_or(local unsigned int *p, unsigned int val); diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h new file mode 100644 index 000000000000..e6c9f2f57521 --- /dev/null +++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h @@ -0,0 +1,2 @@ +_CLC_OVERLOAD _CLC_DECL int atom_xor(local int *p, int val); +_CLC_OVERLOAD _CLC_DECL unsigned int atom_xor(local unsigned int *p, unsigned int val); diff --git 
a/libclc/generic/include/clc/clc.h b/libclc/generic/include/clc/clc.h new file mode 100644 index 000000000000..bd92fdb12b5a --- /dev/null +++ b/libclc/generic/include/clc/clc.h @@ -0,0 +1,195 @@ +#ifndef cl_clang_storage_class_specifiers +#error Implementation requires cl_clang_storage_class_specifiers extension! +#endif + +#pragma OPENCL EXTENSION cl_clang_storage_class_specifiers : enable + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +/* Function Attributes */ +#include <clc/clcfunc.h> + +/* 6.1 Supported Data Types */ +#include <clc/clctypes.h> + +/* 6.2.3 Explicit Conversions */ +#include <clc/convert.h> + +/* 6.2.4.2 Reinterpreting Types Using as_type() and as_typen() */ +#include <clc/as_type.h> + +/* 6.9 Preprocessor Directives and Macros */ +#include <clc/clcversion.h> + +/* 6.11.1 Work-Item Functions */ +#include <clc/workitem/get_global_size.h> +#include <clc/workitem/get_global_id.h> +#include <clc/workitem/get_local_size.h> +#include <clc/workitem/get_local_id.h> +#include <clc/workitem/get_num_groups.h> +#include <clc/workitem/get_group_id.h> + +/* 6.11.2 Math Functions */ +#include <clc/math/acos.h> +#include <clc/math/asin.h> +#include <clc/math/atan.h> +#include <clc/math/atan2.h> +#include <clc/math/copysign.h> +#include <clc/math/cos.h> +#include <clc/math/ceil.h> +#include <clc/math/exp.h> +#include <clc/math/exp10.h> +#include <clc/math/exp2.h> +#include <clc/math/fabs.h> +#include <clc/math/floor.h> +#include <clc/math/fma.h> +#include <clc/math/fmax.h> +#include <clc/math/fmin.h> +#include <clc/math/fmod.h> +#include <clc/math/hypot.h> +#include <clc/math/log.h> +#include <clc/math/log1p.h> +#include <clc/math/log2.h> +#include <clc/math/mad.h> +#include <clc/math/mix.h> +#include <clc/math/nextafter.h> +#include <clc/math/pow.h> +#include <clc/math/pown.h> +#include <clc/math/rint.h> +#include <clc/math/round.h> +#include <clc/math/sin.h> +#include <clc/math/sincos.h> +#include <clc/math/sqrt.h> +#include 
<clc/math/tan.h> +#include <clc/math/trunc.h> +#include <clc/math/native_cos.h> +#include <clc/math/native_divide.h> +#include <clc/math/native_exp.h> +#include <clc/math/native_exp10.h> +#include <clc/math/native_exp2.h> +#include <clc/math/native_log.h> +#include <clc/math/native_log2.h> +#include <clc/math/native_powr.h> +#include <clc/math/native_sin.h> +#include <clc/math/native_sqrt.h> +#include <clc/math/rsqrt.h> + +/* 6.11.2.1 Floating-point macros */ +#include <clc/float/definitions.h> + +/* 6.11.3 Integer Functions */ +#include <clc/integer/abs.h> +#include <clc/integer/abs_diff.h> +#include <clc/integer/add_sat.h> +#include <clc/integer/clz.h> +#include <clc/integer/hadd.h> +#include <clc/integer/mad24.h> +#include <clc/integer/mad_hi.h> +#include <clc/integer/mad_sat.h> +#include <clc/integer/mul24.h> +#include <clc/integer/mul_hi.h> +#include <clc/integer/rhadd.h> +#include <clc/integer/rotate.h> +#include <clc/integer/sub_sat.h> +#include <clc/integer/upsample.h> + +/* 6.11.3 Integer Definitions */ +#include <clc/integer/definitions.h> + +/* 6.11.2 and 6.11.3 Shared Integer/Math Functions */ +#include <clc/shared/clamp.h> +#include <clc/shared/max.h> +#include <clc/shared/min.h> +#include <clc/shared/vload.h> +#include <clc/shared/vstore.h> + +/* 6.11.4 Common Functions */ +#include <clc/common/sign.h> + +/* 6.11.5 Geometric Functions */ +#include <clc/geometric/cross.h> +#include <clc/geometric/dot.h> +#include <clc/geometric/length.h> +#include <clc/geometric/normalize.h> + +/* 6.11.6 Relational Functions */ +#include <clc/relational/all.h> +#include <clc/relational/any.h> +#include <clc/relational/bitselect.h> +#include <clc/relational/isequal.h> +#include <clc/relational/isfinite.h> +#include <clc/relational/isgreater.h> +#include <clc/relational/isgreaterequal.h> +#include <clc/relational/isinf.h> +#include <clc/relational/isless.h> +#include <clc/relational/islessequal.h> +#include <clc/relational/islessgreater.h> +#include 
<clc/relational/isnan.h> +#include <clc/relational/isnormal.h> +#include <clc/relational/isnotequal.h> +#include <clc/relational/isordered.h> +#include <clc/relational/isunordered.h> +#include <clc/relational/select.h> +#include <clc/relational/signbit.h> + +/* 6.11.8 Synchronization Functions */ +#include <clc/synchronization/cl_mem_fence_flags.h> +#include <clc/synchronization/barrier.h> + +/* 6.11.10 Async Copy and Prefetch Functions */ +#include <clc/async/async_work_group_copy.h> +#include <clc/async/async_work_group_strided_copy.h> +#include <clc/async/prefetch.h> +#include <clc/async/wait_group_events.h> + +/* 6.11.11 Atomic Functions */ +#include <clc/atomic/atomic_add.h> +#include <clc/atomic/atomic_and.h> +#include <clc/atomic/atomic_cmpxchg.h> +#include <clc/atomic/atomic_dec.h> +#include <clc/atomic/atomic_inc.h> +#include <clc/atomic/atomic_max.h> +#include <clc/atomic/atomic_min.h> +#include <clc/atomic/atomic_or.h> +#include <clc/atomic/atomic_sub.h> +#include <clc/atomic/atomic_xchg.h> +#include <clc/atomic/atomic_xor.h> + +/* cl_khr_global_int32_base_atomics Extension Functions */ +#include <clc/cl_khr_global_int32_base_atomics/atom_add.h> +#include <clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h> +#include <clc/cl_khr_global_int32_base_atomics/atom_dec.h> +#include <clc/cl_khr_global_int32_base_atomics/atom_inc.h> +#include <clc/cl_khr_global_int32_base_atomics/atom_sub.h> +#include <clc/cl_khr_global_int32_base_atomics/atom_xchg.h> + +/* cl_khr_global_int32_extended_atomics Extension Functions */ +#include <clc/cl_khr_global_int32_extended_atomics/atom_and.h> +#include <clc/cl_khr_global_int32_extended_atomics/atom_max.h> +#include <clc/cl_khr_global_int32_extended_atomics/atom_min.h> +#include <clc/cl_khr_global_int32_extended_atomics/atom_or.h> +#include <clc/cl_khr_global_int32_extended_atomics/atom_xor.h> + +/* cl_khr_local_int32_base_atomics Extension Functions */ +#include <clc/cl_khr_local_int32_base_atomics/atom_add.h> +#include 
<clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h> +#include <clc/cl_khr_local_int32_base_atomics/atom_dec.h> +#include <clc/cl_khr_local_int32_base_atomics/atom_inc.h> +#include <clc/cl_khr_local_int32_base_atomics/atom_sub.h> +#include <clc/cl_khr_local_int32_base_atomics/atom_xchg.h> + +/* cl_khr_local_int32_extended_atomics Extension Functions */ +#include <clc/cl_khr_local_int32_extended_atomics/atom_and.h> +#include <clc/cl_khr_local_int32_extended_atomics/atom_max.h> +#include <clc/cl_khr_local_int32_extended_atomics/atom_min.h> +#include <clc/cl_khr_local_int32_extended_atomics/atom_or.h> +#include <clc/cl_khr_local_int32_extended_atomics/atom_xor.h> + +/* libclc internal defintions */ +#ifdef __CLC_INTERNAL +#include <math/clc_nextafter.h> +#endif + +#pragma OPENCL EXTENSION all : disable diff --git a/libclc/generic/include/clc/clcfunc.h b/libclc/generic/include/clc/clcfunc.h new file mode 100644 index 000000000000..5f166c5a4143 --- /dev/null +++ b/libclc/generic/include/clc/clcfunc.h @@ -0,0 +1,4 @@ +#define _CLC_OVERLOAD __attribute__((overloadable)) +#define _CLC_DECL +#define _CLC_DEF __attribute__((always_inline)) +#define _CLC_INLINE __attribute__((always_inline)) inline diff --git a/libclc/generic/include/clc/clctypes.h b/libclc/generic/include/clc/clctypes.h new file mode 100644 index 000000000000..2e3db60dbdfe --- /dev/null +++ b/libclc/generic/include/clc/clctypes.h @@ -0,0 +1,89 @@ +/* 6.1.1 Built-in Scalar Data Types */ + +typedef unsigned char uchar; +typedef unsigned short ushort; +typedef unsigned int uint; +typedef unsigned long ulong; + +typedef __SIZE_TYPE__ size_t; +typedef __PTRDIFF_TYPE__ ptrdiff_t; + +#define __stdint_join3(a,b,c) a ## b ## c + +#define __intn_t(n) __stdint_join3(__INT, n, _TYPE__) +#define __uintn_t(n) __stdint_join3(unsigned __INT, n, _TYPE__) + +typedef __intn_t(__INTPTR_WIDTH__) intptr_t; +typedef __uintn_t(__INTPTR_WIDTH__) uintptr_t; + +#undef __uintn_t +#undef __intn_t +#undef __stdint_join3 + +/* 6.1.2 
Built-in Vector Data Types */ + +typedef __attribute__((ext_vector_type(2))) char char2; +typedef __attribute__((ext_vector_type(3))) char char3; +typedef __attribute__((ext_vector_type(4))) char char4; +typedef __attribute__((ext_vector_type(8))) char char8; +typedef __attribute__((ext_vector_type(16))) char char16; + +typedef __attribute__((ext_vector_type(2))) uchar uchar2; +typedef __attribute__((ext_vector_type(3))) uchar uchar3; +typedef __attribute__((ext_vector_type(4))) uchar uchar4; +typedef __attribute__((ext_vector_type(8))) uchar uchar8; +typedef __attribute__((ext_vector_type(16))) uchar uchar16; + +typedef __attribute__((ext_vector_type(2))) short short2; +typedef __attribute__((ext_vector_type(3))) short short3; +typedef __attribute__((ext_vector_type(4))) short short4; +typedef __attribute__((ext_vector_type(8))) short short8; +typedef __attribute__((ext_vector_type(16))) short short16; + +typedef __attribute__((ext_vector_type(2))) ushort ushort2; +typedef __attribute__((ext_vector_type(3))) ushort ushort3; +typedef __attribute__((ext_vector_type(4))) ushort ushort4; +typedef __attribute__((ext_vector_type(8))) ushort ushort8; +typedef __attribute__((ext_vector_type(16))) ushort ushort16; + +typedef __attribute__((ext_vector_type(2))) int int2; +typedef __attribute__((ext_vector_type(3))) int int3; +typedef __attribute__((ext_vector_type(4))) int int4; +typedef __attribute__((ext_vector_type(8))) int int8; +typedef __attribute__((ext_vector_type(16))) int int16; + +typedef __attribute__((ext_vector_type(2))) uint uint2; +typedef __attribute__((ext_vector_type(3))) uint uint3; +typedef __attribute__((ext_vector_type(4))) uint uint4; +typedef __attribute__((ext_vector_type(8))) uint uint8; +typedef __attribute__((ext_vector_type(16))) uint uint16; + +typedef __attribute__((ext_vector_type(2))) long long2; +typedef __attribute__((ext_vector_type(3))) long long3; +typedef __attribute__((ext_vector_type(4))) long long4; +typedef 
__attribute__((ext_vector_type(8))) long long8; +typedef __attribute__((ext_vector_type(16))) long long16; + +typedef __attribute__((ext_vector_type(2))) ulong ulong2; +typedef __attribute__((ext_vector_type(3))) ulong ulong3; +typedef __attribute__((ext_vector_type(4))) ulong ulong4; +typedef __attribute__((ext_vector_type(8))) ulong ulong8; +typedef __attribute__((ext_vector_type(16))) ulong ulong16; + +typedef __attribute__((ext_vector_type(2))) float float2; +typedef __attribute__((ext_vector_type(3))) float float3; +typedef __attribute__((ext_vector_type(4))) float float4; +typedef __attribute__((ext_vector_type(8))) float float8; +typedef __attribute__((ext_vector_type(16))) float float16; + +/* 9.3 Double Precision Floating-Point */ + +#ifdef cl_khr_fp64 +typedef __attribute__((ext_vector_type(2))) double double2; +typedef __attribute__((ext_vector_type(3))) double double3; +typedef __attribute__((ext_vector_type(4))) double double4; +typedef __attribute__((ext_vector_type(8))) double double8; +typedef __attribute__((ext_vector_type(16))) double double16; +#endif + +#define NULL ((void *)0) diff --git a/libclc/generic/include/clc/clcversion.h b/libclc/generic/include/clc/clcversion.h new file mode 100644 index 000000000000..57c989e3b713 --- /dev/null +++ b/libclc/generic/include/clc/clcversion.h @@ -0,0 +1,8 @@ +#if __OPENCL_VERSION__ >= 110 +#define CLC_VERSION_1_0 100 +#define CLC_VERSION_1_1 110 +#endif + +#if __OPENCL_VERSION__ >= 120 +#define CLC_VERSION_1_2 120 +#endif diff --git a/libclc/generic/include/clc/common/sign.h b/libclc/generic/include/clc/common/sign.h new file mode 100644 index 000000000000..fa9aa096541f --- /dev/null +++ b/libclc/generic/include/clc/common/sign.h @@ -0,0 +1,5 @@ +#define __CLC_FUNCTION sign +#define __CLC_BODY <clc/math/unary_decl.inc> +#include <clc/math/gentype.inc> +#undef __CLC_FUNCTION +#undef __CLC_BODY diff --git a/libclc/generic/include/clc/convert.h b/libclc/generic/include/clc/convert.h new file mode 100644 
index 000000000000..f0ba796864d4 --- /dev/null +++ b/libclc/generic/include/clc/convert.h @@ -0,0 +1,60 @@ +#define _CLC_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \ + _CLC_OVERLOAD _CLC_DECL TO_TYPE convert_##TO_TYPE##SUFFIX(FROM_TYPE x); + +#define _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE##2, TO_TYPE##2, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE##3, TO_TYPE##3, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE##4, TO_TYPE##4, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE##8, TO_TYPE##8, SUFFIX) \ + _CLC_CONVERT_DECL(FROM_TYPE##16, TO_TYPE##16, SUFFIX) + +#define _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, char, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, uchar, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, int, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, uint, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, short, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, ushort, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, long, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, ulong, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, float, SUFFIX) + +#ifdef cl_khr_fp64 +#define _CLC_VECTOR_CONVERT_FROM(FROM_TYPE, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX) \ + _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, double, SUFFIX) +#else +#define _CLC_VECTOR_CONVERT_FROM(FROM_TYPE, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX) +#endif + +#define _CLC_VECTOR_CONVERT_TO1(SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(char, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(uchar, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(int, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(uint, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(short, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(ushort, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(long, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(ulong, SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(float, SUFFIX) + +#ifdef cl_khr_fp64 +#define _CLC_VECTOR_CONVERT_TO(SUFFIX) \ + 
_CLC_VECTOR_CONVERT_TO1(SUFFIX) \ + _CLC_VECTOR_CONVERT_FROM(double, SUFFIX) +#else +#define _CLC_VECTOR_CONVERT_TO(SUFFIX) \ + _CLC_VECTOR_CONVERT_TO1(SUFFIX) +#endif + +#define _CLC_VECTOR_CONVERT_TO_SUFFIX(ROUND) \ + _CLC_VECTOR_CONVERT_TO(_sat##ROUND) \ + _CLC_VECTOR_CONVERT_TO(ROUND) + +_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtn) +_CLC_VECTOR_CONVERT_TO_SUFFIX(_rte) +_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtz) +_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtp) +_CLC_VECTOR_CONVERT_TO_SUFFIX() diff --git a/libclc/generic/include/clc/float/definitions.h b/libclc/generic/include/clc/float/definitions.h new file mode 100644 index 000000000000..329b6238c3f4 --- /dev/null +++ b/libclc/generic/include/clc/float/definitions.h @@ -0,0 +1,74 @@ +#define MAXFLOAT 0x1.fffffep127f +#define HUGE_VALF __builtin_huge_valf() +#define INFINITY __builtin_inff() +#define NAN __builtin_nanf("") + +#define FLT_DIG 6 +#define FLT_MANT_DIG 24 +#define FLT_MAX_10_EXP +38 +#define FLT_MAX_EXP +128 +#define FLT_MIN_10_EXP -37 +#define FLT_MIN_EXP -125 +#define FLT_RADIX 2 +#define FLT_MAX MAXFLOAT +#define FLT_MIN 0x1.0p-126f +#define FLT_EPSILON 0x1.0p-23f + +#define M_E_F 0x1.5bf0a8p+1f +#define M_LOG2E_F 0x1.715476p+0f +#define M_LOG10E_F 0x1.bcb7b2p-2f +#define M_LN2_F 0x1.62e430p-1f +#define M_LN10_F 0x1.26bb1cp+1f +#define M_PI_F 0x1.921fb6p+1f +#define M_PI_2_F 0x1.921fb6p+0f +#define M_PI_4_F 0x1.921fb6p-1f +#define M_1_PI_F 0x1.45f306p-2f +#define M_2_PI_F 0x1.45f306p-1f +#define M_2_SQRTPI_F 0x1.20dd76p+0f +#define M_SQRT2_F 0x1.6a09e6p+0f +#define M_SQRT1_2_F 0x1.6a09e6p-1f + +#ifdef cl_khr_fp64 + +#define HUGE_VAL __builtin_huge_val() + +#define DBL_DIG 15 +#define DBL_MANT_DIG 53 +#define DBL_MAX_10_EXP +308 +#define DBL_MAX_EXP +1024 +#define DBL_MIN_10_EXP -307 +#define DBL_MIN_EXP -1021 +#define DBL_MAX 0x1.fffffffffffffp1023 +#define DBL_MIN 0x1.0p-1022 +#define DBL_EPSILON 0x1.0p-52 + +#define M_E 0x1.5bf0a8b145769p+1 +#define M_LOG2E 0x1.71547652b82fep+0 +#define M_LOG10E 0x1.bcb7b1526e50ep-2 
+#define M_LN2 0x1.62e42fefa39efp-1
+#define M_LN10 0x1.26bb1bbb55516p+1
+#define M_PI 0x1.921fb54442d18p+1
+#define M_PI_2 0x1.921fb54442d18p+0
+#define M_PI_4 0x1.921fb54442d18p-1
+#define M_1_PI 0x1.45f306dc9c883p-2
+#define M_2_PI 0x1.45f306dc9c883p-1
+#define M_2_SQRTPI 0x1.20dd750429b6dp+0
+#define M_SQRT2 0x1.6a09e667f3bcdp+0
+#define M_SQRT1_2 0x1.6a09e667f3bcdp-1
+
+#endif
+
+#ifdef cl_khr_fp16
+
+#if __OPENCL_VERSION__ >= 120
+
+#define HALF_DIG 3
+#define HALF_MANT_DIG 11
+#define HALF_MAX_10_EXP +4
+#define HALF_MAX_EXP +16
+#define HALF_MIN_10_EXP -4
+#define HALF_MIN_EXP -13
+
+#endif
+
+#endif
diff --git a/libclc/generic/include/clc/geometric/cross.h b/libclc/generic/include/clc/geometric/cross.h
new file mode 100644
index 000000000000..eee0cc81bb92
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/cross.h
@@ -0,0 +1,7 @@
+_CLC_OVERLOAD _CLC_DECL float3 cross(float3 p0, float3 p1);
+_CLC_OVERLOAD _CLC_DECL float4 cross(float4 p0, float4 p1);
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD _CLC_DECL double3 cross(double3 p0, double3 p1);
+_CLC_OVERLOAD _CLC_DECL double4 cross(double4 p0, double4 p1);
+#endif
diff --git a/libclc/generic/include/clc/geometric/distance.h b/libclc/generic/include/clc/geometric/distance.h
new file mode 100644
index 000000000000..3e91332d7838
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/distance.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/distance.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/dot.h b/libclc/generic/include/clc/geometric/dot.h
new file mode 100644
index 000000000000..7f65fed9760d
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/dot.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/dot.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/dot.inc b/libclc/generic/include/clc/geometric/dot.inc
new file mode 100644
index 000000000000..34245e2935a4
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/dot.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOAT dot(__CLC_FLOATN p0, __CLC_FLOATN p1);
diff --git a/libclc/generic/include/clc/geometric/floatn.inc b/libclc/generic/include/clc/geometric/floatn.inc
new file mode 100644
index 000000000000..fb7a9ae601cd
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/floatn.inc
@@ -0,0 +1,45 @@
+#define __CLC_FLOAT float
+
+#define __CLC_FLOATN float
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float2
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float3
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float4
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+
+#ifdef cl_khr_fp64
+
+#define __CLC_FLOAT double
+
+#define __CLC_FLOATN double
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double2
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double3
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double4
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/geometric/length.h b/libclc/generic/include/clc/geometric/length.h
new file mode 100644
index 000000000000..cb992b9bc72e
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/length.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/length.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/length.inc b/libclc/generic/include/clc/geometric/length.inc
new file mode 100644
index 000000000000..c2d95e876831
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/length.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOAT length(__CLC_FLOATN p0);
diff --git a/libclc/generic/include/clc/geometric/normalize.h b/libclc/generic/include/clc/geometric/normalize.h
new file mode 100644
index 000000000000..dccff9b4e041
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/normalize.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/normalize.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/normalize.inc b/libclc/generic/include/clc/geometric/normalize.inc
new file mode 100644
index 000000000000..6eb13150603e
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/normalize.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOATN normalize(__CLC_FLOATN p);
diff --git a/libclc/generic/include/clc/integer/abs.h b/libclc/generic/include/clc/integer/abs.h
new file mode 100644
index 000000000000..77a4cbeb4fe3
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/abs.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/abs.inc b/libclc/generic/include/clc/integer/abs.inc
new file mode 100644
index 000000000000..952bce7e29e3
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_U_GENTYPE abs(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/integer/abs_diff.h b/libclc/generic/include/clc/integer/abs_diff.h
new file mode 100644
index 000000000000..3f3b4b43c5d7
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs_diff.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/abs_diff.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/abs_diff.inc b/libclc/generic/include/clc/integer/abs_diff.inc
new file mode 100644
index 000000000000..e844d46e808b
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs_diff.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_U_GENTYPE abs_diff(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/add_sat.h b/libclc/generic/include/clc/integer/add_sat.h
new file mode 100644
index 000000000000..2e5e69851442
--- /dev/null
+++ b/libclc/generic/include/clc/integer/add_sat.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/add_sat.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/add_sat.inc b/libclc/generic/include/clc/integer/add_sat.inc
new file mode 100644
index 000000000000..913841a1dada
--- /dev/null
+++ b/libclc/generic/include/clc/integer/add_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE add_sat(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/clz.h b/libclc/generic/include/clc/integer/clz.h
new file mode 100644
index 000000000000..f7cdbf78ec06
--- /dev/null
+++ b/libclc/generic/include/clc/integer/clz.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/clz.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/clz.inc b/libclc/generic/include/clc/integer/clz.inc
new file mode 100644
index 000000000000..45826d10c9fa
--- /dev/null
+++ b/libclc/generic/include/clc/integer/clz.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clz(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/integer/definitions.h b/libclc/generic/include/clc/integer/definitions.h
new file mode 100644
index 000000000000..a407974a0d4e
--- /dev/null
+++ b/libclc/generic/include/clc/integer/definitions.h
@@ -0,0 +1,15 @@
+#define CHAR_BIT 8
+#define INT_MAX 2147483647
+#define INT_MIN -2147483648
+#define LONG_MAX 0x7fffffffffffffffL
+#define LONG_MIN -0x8000000000000000L
+#define SCHAR_MAX 127
+#define SCHAR_MIN -128
+#define CHAR_MAX 127
+#define CHAR_MIN -128
+#define SHRT_MAX 32767
+#define SHRT_MIN -32768
+#define UCHAR_MAX 255
+#define USHRT_MAX 65535
+#define UINT_MAX 0xffffffff
+#define ULONG_MAX 0xffffffffffffffffUL
diff --git a/libclc/generic/include/clc/integer/gentype.inc b/libclc/generic/include/clc/integer/gentype.inc
new file mode 100644
index 000000000000..6f4d6996d8f5
--- /dev/null
+++ b/libclc/generic/include/clc/integer/gentype.inc
@@ -0,0 +1,435 @@
+//These 2 defines only change when switching between data sizes or base types to
+//keep this file manageable.
+#define __CLC_GENSIZE 8
+#define __CLC_SCALAR_GENTYPE char
+
+#define __CLC_GENTYPE char
+#define __CLC_U_GENTYPE uchar
+#define __CLC_S_GENTYPE char
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char2
+#define __CLC_U_GENTYPE uchar2
+#define __CLC_S_GENTYPE char2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char3
+#define __CLC_U_GENTYPE uchar3
+#define __CLC_S_GENTYPE char3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char4
+#define __CLC_U_GENTYPE uchar4
+#define __CLC_S_GENTYPE char4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char8
+#define __CLC_U_GENTYPE uchar8
+#define __CLC_S_GENTYPE char8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char16
+#define __CLC_U_GENTYPE uchar16
+#define __CLC_S_GENTYPE char16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE uchar
+
+#define __CLC_GENTYPE uchar
+#define __CLC_U_GENTYPE uchar
+#define __CLC_S_GENTYPE char
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar2
+#define __CLC_U_GENTYPE uchar2
+#define __CLC_S_GENTYPE char2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar3
+#define __CLC_U_GENTYPE uchar3
+#define __CLC_S_GENTYPE char3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar4
+#define __CLC_U_GENTYPE uchar4
+#define __CLC_S_GENTYPE char4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar8
+#define __CLC_U_GENTYPE uchar8
+#define __CLC_S_GENTYPE char8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar16
+#define __CLC_U_GENTYPE uchar16
+#define __CLC_S_GENTYPE char16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 16
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE short
+
+#define __CLC_GENTYPE short
+#define __CLC_U_GENTYPE ushort
+#define __CLC_S_GENTYPE short
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short2
+#define __CLC_U_GENTYPE ushort2
+#define __CLC_S_GENTYPE short2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short3
+#define __CLC_U_GENTYPE ushort3
+#define __CLC_S_GENTYPE short3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short4
+#define __CLC_U_GENTYPE ushort4
+#define __CLC_S_GENTYPE short4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short8
+#define __CLC_U_GENTYPE ushort8
+#define __CLC_S_GENTYPE short8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short16
+#define __CLC_U_GENTYPE ushort16
+#define __CLC_S_GENTYPE short16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE ushort
+
+#define __CLC_GENTYPE ushort
+#define __CLC_U_GENTYPE ushort
+#define __CLC_S_GENTYPE short
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort2
+#define __CLC_U_GENTYPE ushort2
+#define __CLC_S_GENTYPE short2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort3
+#define __CLC_U_GENTYPE ushort3
+#define __CLC_S_GENTYPE short3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort4
+#define __CLC_U_GENTYPE ushort4
+#define __CLC_S_GENTYPE short4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort8
+#define __CLC_U_GENTYPE ushort8
+#define __CLC_S_GENTYPE short8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort16
+#define __CLC_U_GENTYPE ushort16
+#define __CLC_S_GENTYPE short16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 32
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE int
+
+#define __CLC_GENTYPE int
+#define __CLC_U_GENTYPE uint
+#define __CLC_S_GENTYPE int
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int2
+#define __CLC_U_GENTYPE uint2
+#define __CLC_S_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int3
+#define __CLC_U_GENTYPE uint3
+#define __CLC_S_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int4
+#define __CLC_U_GENTYPE uint4
+#define __CLC_S_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int8
+#define __CLC_U_GENTYPE uint8
+#define __CLC_S_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int16
+#define __CLC_U_GENTYPE uint16
+#define __CLC_S_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE uint
+
+#define __CLC_GENTYPE uint
+#define __CLC_U_GENTYPE uint
+#define __CLC_S_GENTYPE int
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint2
+#define __CLC_U_GENTYPE uint2
+#define __CLC_S_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint3
+#define __CLC_U_GENTYPE uint3
+#define __CLC_S_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint4
+#define __CLC_U_GENTYPE uint4
+#define __CLC_S_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint8
+#define __CLC_U_GENTYPE uint8
+#define __CLC_S_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint16
+#define __CLC_U_GENTYPE uint16
+#define __CLC_S_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 64
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE long
+
+#define __CLC_GENTYPE long
+#define __CLC_U_GENTYPE ulong
+#define __CLC_S_GENTYPE long
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long2
+#define __CLC_U_GENTYPE ulong2
+#define __CLC_S_GENTYPE long2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long3
+#define __CLC_U_GENTYPE ulong3
+#define __CLC_S_GENTYPE long3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long4
+#define __CLC_U_GENTYPE ulong4
+#define __CLC_S_GENTYPE long4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long8
+#define __CLC_U_GENTYPE ulong8
+#define __CLC_S_GENTYPE long8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long16
+#define __CLC_U_GENTYPE ulong16
+#define __CLC_S_GENTYPE long16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE ulong
+
+#define __CLC_GENTYPE ulong
+#define __CLC_U_GENTYPE ulong
+#define __CLC_S_GENTYPE long
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong2
+#define __CLC_U_GENTYPE ulong2
+#define __CLC_S_GENTYPE long2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong3
+#define __CLC_U_GENTYPE ulong3
+#define __CLC_S_GENTYPE long3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong4
+#define __CLC_U_GENTYPE ulong4
+#define __CLC_S_GENTYPE long4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong8
+#define __CLC_U_GENTYPE ulong8
+#define __CLC_S_GENTYPE long8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong16
+#define __CLC_U_GENTYPE ulong16
+#define __CLC_S_GENTYPE long16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#undef __CLC_SCALAR_GENTYPE
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/hadd.h b/libclc/generic/include/clc/integer/hadd.h
new file mode 100644
index 000000000000..37304e26cc2d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/hadd.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/hadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/hadd.inc b/libclc/generic/include/clc/integer/hadd.inc
new file mode 100644
index 000000000000..f698989cef20
--- /dev/null
+++ b/libclc/generic/include/clc/integer/hadd.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE hadd(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/integer-gentype.inc b/libclc/generic/include/clc/integer/integer-gentype.inc
new file mode 100644
index 000000000000..e4115cf45ebb
--- /dev/null
+++ b/libclc/generic/include/clc/integer/integer-gentype.inc
@@ -0,0 +1,47 @@
+#define __CLC_GENTYPE int
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
diff --git a/libclc/generic/include/clc/integer/mad24.h b/libclc/generic/include/clc/integer/mad24.h
new file mode 100644
index 000000000000..0c120faac2b1
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad24.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mad24.inc>
+#include <clc/integer/integer-gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mad24.inc b/libclc/generic/include/clc/integer/mad24.inc
new file mode 100644
index 000000000000..81fe0c2a8926
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad24.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad24(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
diff --git a/libclc/generic/include/clc/integer/mad_hi.h b/libclc/generic/include/clc/integer/mad_hi.h
new file mode 100644
index 000000000000..863ce92d9f2d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_hi.h
@@ -0,0 +1 @@
+#define mad_hi(a, b, c) (mul_hi((a),(b))+(c))
diff --git a/libclc/generic/include/clc/integer/mad_sat.h b/libclc/generic/include/clc/integer/mad_sat.h
new file mode 100644
index 000000000000..3e92372a27d0
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_sat.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mad_sat.inc>
+#include <clc/integer/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mad_sat.inc b/libclc/generic/include/clc/integer/mad_sat.inc
new file mode 100644
index 000000000000..5da2bdf8908d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad_sat(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
diff --git a/libclc/generic/include/clc/integer/mul24.h b/libclc/generic/include/clc/integer/mul24.h
new file mode 100644
index 000000000000..4f97098d70f0
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul24.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mul24.inc>
+#include <clc/integer/integer-gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mul24.inc b/libclc/generic/include/clc/integer/mul24.inc
new file mode 100644
index 000000000000..8cbf7c10ac44
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul24.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mul24(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/mul_hi.h b/libclc/generic/include/clc/integer/mul_hi.h
new file mode 100644
index 000000000000..27b95d83442f
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul_hi.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/mul_hi.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/mul_hi.inc b/libclc/generic/include/clc/integer/mul_hi.inc
new file mode 100644
index 000000000000..ce9e5c0b2c18
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul_hi.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mul_hi(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/rhadd.h b/libclc/generic/include/clc/integer/rhadd.h
new file mode 100644
index 000000000000..69b43faeebd2
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rhadd.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/rhadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/rhadd.inc b/libclc/generic/include/clc/integer/rhadd.inc
new file mode 100644
index 000000000000..88ccaf09fd5e
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rhadd.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE rhadd(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/rotate.h b/libclc/generic/include/clc/integer/rotate.h
new file mode 100644
index 000000000000..6320223e7cf2
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rotate.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/rotate.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/rotate.inc b/libclc/generic/include/clc/integer/rotate.inc
new file mode 100644
index 000000000000..c97711ecf882
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rotate.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE rotate(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/sub_sat.h b/libclc/generic/include/clc/integer/sub_sat.h
new file mode 100644
index 000000000000..f84152944817
--- /dev/null
+++ b/libclc/generic/include/clc/integer/sub_sat.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/sub_sat.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/sub_sat.inc b/libclc/generic/include/clc/integer/sub_sat.inc
new file mode 100644
index 000000000000..425df2e4b696
--- /dev/null
+++ b/libclc/generic/include/clc/integer/sub_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE sub_sat(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/upsample.h b/libclc/generic/include/clc/integer/upsample.h
new file mode 100644
index 000000000000..0b36b692a2c8
--- /dev/null
+++ b/libclc/generic/include/clc/integer/upsample.h
@@ -0,0 +1,25 @@
+#define __CLC_UPSAMPLE_DECL(BGENTYPE, GENTYPE, UGENTYPE) \
+ _CLC_OVERLOAD _CLC_DECL BGENTYPE upsample(GENTYPE hi, UGENTYPE lo);
+
+#define __CLC_UPSAMPLE_VEC(BGENTYPE, GENTYPE, UGENTYPE) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE, GENTYPE, UGENTYPE) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE##2, GENTYPE##2, UGENTYPE##2) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE##3, GENTYPE##3, UGENTYPE##3) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE##4, GENTYPE##4, UGENTYPE##4) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE##8, GENTYPE##8, UGENTYPE##8) \
+ __CLC_UPSAMPLE_DECL(BGENTYPE##16, GENTYPE##16, UGENTYPE##16) \
+
+#define __CLC_UPSAMPLE_TYPES() \
+ __CLC_UPSAMPLE_VEC(short, char, uchar) \
+ __CLC_UPSAMPLE_VEC(ushort, uchar, uchar) \
+ __CLC_UPSAMPLE_VEC(int, short, ushort) \
+ __CLC_UPSAMPLE_VEC(uint, ushort, ushort) \
+ __CLC_UPSAMPLE_VEC(long, int, uint) \
+ __CLC_UPSAMPLE_VEC(ulong, uint, uint) \
+
+__CLC_UPSAMPLE_TYPES()
+
+#undef __CLC_UPSAMPLE_TYPES
+#undef __CLC_UPSAMPLE_DECL
+#undef __CLC_UPSAMPLE_VEC
+
diff --git a/libclc/generic/include/clc/math/acos.h b/libclc/generic/include/clc/math/acos.h
new file mode 100644
index 000000000000..e753dee36aa5
--- /dev/null
+++ b/libclc/generic/include/clc/math/acos.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/acos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/acos.inc b/libclc/generic/include/clc/math/acos.inc
new file mode 100644
index 000000000000..4ca8c7538aef
--- /dev/null
+++ b/libclc/generic/include/clc/math/acos.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE acos(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/asin.h b/libclc/generic/include/clc/math/asin.h
new file mode 100644
index 000000000000..2a858721e952
--- /dev/null
+++ b/libclc/generic/include/clc/math/asin.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/asin.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/asin.inc b/libclc/generic/include/clc/math/asin.inc
new file mode 100644
index 000000000000..b4ad8ff1231d
--- /dev/null
+++ b/libclc/generic/include/clc/math/asin.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE asin(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/atan.h b/libclc/generic/include/clc/math/atan.h
new file mode 100644
index 000000000000..d9697194ee8a
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/atan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/atan.inc b/libclc/generic/include/clc/math/atan.inc
new file mode 100644
index 000000000000..d217c955593f
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE atan(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/atan2.h b/libclc/generic/include/clc/math/atan2.h
new file mode 100644
index 000000000000..9c082a082f0a
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan2.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/atan2.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/atan2.inc b/libclc/generic/include/clc/math/atan2.inc
new file mode 100644
index 000000000000..ce273da53346
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan2.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE atan2(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/binary_decl.inc b/libclc/generic/include/clc/math/binary_decl.inc
new file mode 100644
index 000000000000..70a711477704
--- /dev/null
+++ b/libclc/generic/include/clc/math/binary_decl.inc
@@ -0,0 +1,6 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, __CLC_GENTYPE b);
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, float b);
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, double b);
+#endif
diff --git a/libclc/generic/include/clc/math/binary_intrin.inc b/libclc/generic/include/clc/math/binary_intrin.inc
new file mode 100644
index 000000000000..cfbe74159ec2
--- /dev/null
+++ b/libclc/generic/include/clc/math/binary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float, float) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2, float2) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3, float3) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4, float4) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8, float8) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16, float16) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double, double) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2, double2) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3, double3) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4, double4) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8, double8) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16, double16) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/ceil.h b/libclc/generic/include/clc/math/ceil.h
new file mode 100644
index 000000000000..5b40abf97c20
--- /dev/null
+++ b/libclc/generic/include/clc/math/ceil.h
@@ -0,0 +1,6 @@
+#undef ceil
+#define ceil __clc_ceil
+
+#define __CLC_FUNCTION __clc_ceil
+#define __CLC_INTRINSIC "llvm.ceil"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/clc_nextafter.h b/libclc/generic/include/clc/math/clc_nextafter.h
new file mode 100644
index 000000000000..81c8f369c3bd
--- /dev/null
+++ b/libclc/generic/include/clc/math/clc_nextafter.h
@@ -0,0 +1,11 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+
+#define __CLC_FUNCTION nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+
+#define __CLC_FUNCTION __clc_nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/copysign.h b/libclc/generic/include/clc/math/copysign.h
new file mode 100644
index 000000000000..8f0742e451fd
--- /dev/null
+++ b/libclc/generic/include/clc/math/copysign.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/copysign.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/copysign.inc b/libclc/generic/include/clc/math/copysign.inc
new file mode 100644
index 000000000000..6091abcc1fc5
--- /dev/null
+++ b/libclc/generic/include/clc/math/copysign.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE copysign(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/cos.h b/libclc/generic/include/clc/math/cos.h
new file mode 100644
index 000000000000..3d4cf39a0f80
--- /dev/null
+++ b/libclc/generic/include/clc/math/cos.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/math/cos.inc>
+#include <clc/math/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/cos.inc b/libclc/generic/include/clc/math/cos.inc
new file mode 100644
index 000000000000..160e625c6912
--- /dev/null
+++ b/libclc/generic/include/clc/math/cos.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE cos(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/exp.h b/libclc/generic/include/clc/math/exp.h
new file mode 100644
index 000000000000..986652476295
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp.h
@@ -0,0 +1,9 @@
+#undef exp
+
+#define __CLC_BODY <clc/math/unary_decl.inc>
+#define __CLC_FUNCTION exp
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/math/exp10.h b/libclc/generic/include/clc/math/exp10.h
new file mode 100644
index 000000000000..a1d426a20ab0
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp10.h
@@ -0,0 +1,9 @@
+#undef exp10
+
+#define __CLC_BODY <clc/math/unary_decl.inc>
+#define __CLC_FUNCTION exp10
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/math/exp2.h b/libclc/generic/include/clc/math/exp2.h
new file mode 100644
index 000000000000..ec0dad268a7b
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp2.h
@@ -0,0 +1,6 @@
+#undef exp2
+#define exp2 __clc_exp2
+
+#define __CLC_FUNCTION __clc_exp2
+#define __CLC_INTRINSIC "llvm.exp2"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fabs.h b/libclc/generic/include/clc/math/fabs.h
new file mode 100644
index 000000000000..ee2f8932a94d
--- /dev/null
+++ b/libclc/generic/include/clc/math/fabs.h
@@ -0,0 +1,6 @@
+#undef fabs
+#define fabs __clc_fabs
+
+#define __CLC_FUNCTION __clc_fabs
+#define __CLC_INTRINSIC "llvm.fabs"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/floor.h b/libclc/generic/include/clc/math/floor.h
new file mode 100644
index 000000000000..2337d35caae6
--- /dev/null
+++ b/libclc/generic/include/clc/math/floor.h
@@ -0,0 +1,6 @@
+#undef floor
+#define floor __clc_floor
+
+#define __CLC_FUNCTION __clc_floor
+#define __CLC_INTRINSIC "llvm.floor"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fma.h b/libclc/generic/include/clc/math/fma.h
new file mode 100644
index 000000000000..02d39f681675
--- /dev/null
+++ b/libclc/generic/include/clc/math/fma.h
@@ -0,0 +1,6 @@
+#undef fma
+#define fma __clc_fma
+
+#define __CLC_FUNCTION __clc_fma
+#define __CLC_INTRINSIC "llvm.fma"
+#include <clc/math/ternary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fmax.h b/libclc/generic/include/clc/math/fmax.h
new file mode 100644
index 000000000000..d6956af85a5f
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmax.h
@@ -0,0 +1,11 @@
+#undef fmax
+#define fmax __clc_fmax
+
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION __clc_fmax
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
+
diff --git a/libclc/generic/include/clc/math/fmin.h b/libclc/generic/include/clc/math/fmin.h
new file mode 100644
index 000000000000..5588ba93a8b8
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmin.h
@@ -0,0 +1,11 @@
+#undef fmin
+#define fmin __clc_fmin
+
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION __clc_fmin
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
+
diff --git a/libclc/generic/include/clc/math/fmod.h b/libclc/generic/include/clc/math/fmod.h
new file mode 100644
index 000000000000..49068675b0ef
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmod.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/fmod.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/fmod.inc b/libclc/generic/include/clc/math/fmod.inc
new file mode 100644
index 000000000000..39d915365c25
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmod.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE fmod(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/gentype.inc b/libclc/generic/include/clc/math/gentype.inc
new file mode 100644
index 000000000000..9f79f6eb037f
--- /dev/null
+++ b/libclc/generic/include/clc/math/gentype.inc
@@ -0,0 +1,67 @@
+#define __CLC_SCALAR_GENTYPE float
+#define __CLC_FPSIZE 32
+
+#define __CLC_GENTYPE float
+#define __CLC_SCALAR
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_SCALAR
+
+#define __CLC_GENTYPE float2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#undef __CLC_FPSIZE
+#undef __CLC_SCALAR_GENTYPE
+
+#ifdef cl_khr_fp64
+#define __CLC_SCALAR_GENTYPE double
+#define __CLC_FPSIZE 64
+
+#define __CLC_SCALAR
+#define __CLC_GENTYPE double
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_SCALAR
+
+#define __CLC_GENTYPE double2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#undef __CLC_FPSIZE
+#undef __CLC_SCALAR_GENTYPE
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/hypot.h b/libclc/generic/include/clc/math/hypot.h
new file mode 100644
index 000000000000..c00eb4532461
--- /dev/null
+++ b/libclc/generic/include/clc/math/hypot.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/hypot.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/hypot.inc b/libclc/generic/include/clc/math/hypot.inc
new file mode 100644
index 000000000000..08b46058b0aa
--- /dev/null
+++ b/libclc/generic/include/clc/math/hypot.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE hypot(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/math/log.h b/libclc/generic/include/clc/math/log.h
new file mode 100644
index 000000000000..644f8575c1b3
--- /dev/null
+++ b/libclc/generic/include/clc/math/log.h
@@ -0,0 +1,4 @@
+#undef log
+
+// log(x) = log2(x) * (1/log2(e))
+#define log(val) (__clc_log2(val) * 0.693147181f)
diff --git a/libclc/generic/include/clc/math/log1p.h b/libclc/generic/include/clc/math/log1p.h
new file mode 100644
index 000000000000..4d716dd18d9c
--- /dev/null
+++ b/libclc/generic/include/clc/math/log1p.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/log1p.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/log1p.inc b/libclc/generic/include/clc/math/log1p.inc
new file mode 100644
index 000000000000..4cbfbf38fc11
--- /dev/null
+++ b/libclc/generic/include/clc/math/log1p.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE log1p(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/log2.h b/libclc/generic/include/clc/math/log2.h
new file mode 100644
index 000000000000..880124097ed0
--- /dev/null
+++ b/libclc/generic/include/clc/math/log2.h
@@ -0,0 +1,6 @@
+#undef log2
+#define log2 __clc_log2
+
+#define __CLC_FUNCTION __clc_log2
+#define __CLC_INTRINSIC "llvm.log2"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/mad.h b/libclc/generic/include/clc/math/mad.h
new file mode 100644
index 000000000000..c4e50840ced0
--- /dev/null
+++ b/libclc/generic/include/clc/math/mad.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/mad.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/mad.inc b/libclc/generic/include/clc/math/mad.inc
new file mode 100644
index 000000000000..61194b6ca4a7
--- /dev/null
+++ b/libclc/generic/include/clc/math/mad.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c);
diff --git a/libclc/generic/include/clc/math/mix.h b/libclc/generic/include/clc/math/mix.h
new file mode 100644
index 000000000000..c3c95c1f0c4b
--- /dev/null
+++ b/libclc/generic/include/clc/math/mix.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/mix.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/mix.inc b/libclc/generic/include/clc/math/mix.inc
new file mode 100644
index 000000000000..52cb10ad9027
--- /dev/null
+++ b/libclc/generic/include/clc/math/mix.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mix(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mix(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_SCALAR_GENTYPE c);
+#endif
diff --git a/libclc/generic/include/clc/math/native_cos.h b/libclc/generic/include/clc/math/native_cos.h
new file mode 100644
index 000000000000..c7212cc4b663
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_cos.h
@@ -0,0 +1 @@
+#define native_cos cos
diff --git a/libclc/generic/include/clc/math/native_divide.h b/libclc/generic/include/clc/math/native_divide.h
new file mode 100644
index 000000000000..5c52167fd3e7
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_divide.h
@@ -0,0 +1 @@
+#define native_divide(x, y) ((x) / (y))
diff --git a/libclc/generic/include/clc/math/native_exp.h b/libclc/generic/include/clc/math/native_exp.h
new file mode 100644
index 000000000000..e206de66926d
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp.h
@@ -0,0 +1 @@
+#define native_exp exp
diff --git a/libclc/generic/include/clc/math/native_exp10.h b/libclc/generic/include/clc/math/native_exp10.h
new file mode 100644
index 000000000000..1156f58c53a5
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp10.h
@@ -0,0 +1 @@
+#define native_exp10 exp10
diff --git a/libclc/generic/include/clc/math/native_exp2.h b/libclc/generic/include/clc/math/native_exp2.h
new file mode 100644
index 000000000000..b6759390ee43
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp2.h
@@ -0,0 +1 @@
+#define native_exp2 exp2
diff --git a/libclc/generic/include/clc/math/native_log.h b/libclc/generic/include/clc/math/native_log.h
new file mode 100644
index 000000000000..7805a39ed696
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_log.h
@@ -0,0 +1 @@
+#define native_log log
diff --git a/libclc/generic/include/clc/math/native_log2.h b/libclc/generic/include/clc/math/native_log2.h
new file mode 100644
index 000000000000..0c692eec27f4
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_log2.h
@@ -0,0 +1 @@
+#define native_log2 log2
diff --git a/libclc/generic/include/clc/math/native_powr.h b/libclc/generic/include/clc/math/native_powr.h
new file mode 100644
index 000000000000..e8a37d9cb066
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_powr.h
@@ -0,0 +1 @@
+#define native_powr pow
diff --git a/libclc/generic/include/clc/math/native_sin.h b/libclc/generic/include/clc/math/native_sin.h
new file mode 100644
index 000000000000..569a051ccc75
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_sin.h
@@ -0,0 +1 @@
+#define native_sin sin
diff --git a/libclc/generic/include/clc/math/native_sqrt.h b/libclc/generic/include/clc/math/native_sqrt.h
new file mode 100644
index 000000000000..a9525fccb7c1
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_sqrt.h
@@ -0,0 +1 @@
+#define native_sqrt sqrt
diff --git a/libclc/generic/include/clc/math/nextafter.h b/libclc/generic/include/clc/math/nextafter.h
new file mode 100644
index 000000000000..06e1b2a53c52
--- /dev/null
+++ b/libclc/generic/include/clc/math/nextafter.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/pow.h b/libclc/generic/include/clc/math/pow.h
new file mode 100644
index 000000000000..320d341a6830
--- /dev/null
+++ b/libclc/generic/include/clc/math/pow.h
@@ -0,0 +1,6 @@
+#undef pow
+#define pow __clc_pow
+
+#define __CLC_FUNCTION __clc_pow
+#define __CLC_INTRINSIC "llvm.pow"
+#include <clc/math/binary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/pown.h b/libclc/generic/include/clc/math/pown.h
new file mode 100644
index 000000000000..bdbf50c1de6f
--- /dev/null
+++ b/libclc/generic/include/clc/math/pown.h
@@ -0,0 +1,24 @@
+#define _CLC_POWN_INTRINSIC "llvm.powi"
+
+#define _CLC_POWN_DECL(GENTYPE, INTTYPE) \
+  _CLC_OVERLOAD _CLC_DECL GENTYPE pown(GENTYPE x, INTTYPE y);
+
+#define _CLC_VECTOR_POWN_DECL(GENTYPE, INTTYPE) \
+  _CLC_POWN_DECL(GENTYPE##2, INTTYPE##2) \
+  _CLC_POWN_DECL(GENTYPE##3, INTTYPE##3) \
+  _CLC_POWN_DECL(GENTYPE##4, INTTYPE##4) \
+  _CLC_POWN_DECL(GENTYPE##8, INTTYPE##8) \
+  _CLC_POWN_DECL(GENTYPE##16, INTTYPE##16)
+
+_CLC_OVERLOAD float pown(float x, int y) __asm(_CLC_POWN_INTRINSIC ".f32");
+
+_CLC_VECTOR_POWN_DECL(float, int)
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double pown(double x, int y) __asm(_CLC_POWN_INTRINSIC ".f64");
+_CLC_VECTOR_POWN_DECL(double, int)
+#endif
+
+#undef _CLC_POWN_INTRINSIC
+#undef _CLC_POWN_DECL
+#undef _CLC_VECTOR_POWN_DECL
diff --git a/libclc/generic/include/clc/math/rint.h b/libclc/generic/include/clc/math/rint.h
new file mode 100644
index 000000000000..d257634a6f95
--- /dev/null
+++ b/libclc/generic/include/clc/math/rint.h
@@ -0,0 +1,6 @@
+#undef rint
+#define rint __clc_rint
+
+#define __CLC_FUNCTION __clc_rint
+#define __CLC_INTRINSIC "llvm.rint"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/round.h b/libclc/generic/include/clc/math/round.h
new file mode 100644
index 000000000000..43e16aed028f
--- /dev/null
+++ b/libclc/generic/include/clc/math/round.h
@@ -0,0 +1,9 @@
+#undef round
+#define round __clc_round
+
+#define __CLC_FUNCTION __clc_round
+#define __CLC_INTRINSIC "llvm.round"
+#include <clc/math/unary_intrin.inc>
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/rsqrt.h b/libclc/generic/include/clc/math/rsqrt.h
new file mode 100644
index 000000000000..9d49ee652262
--- /dev/null
+++ b/libclc/generic/include/clc/math/rsqrt.h
@@ -0,0 +1 @@
+#define rsqrt(x) (1.f/sqrt(x))
diff --git a/libclc/generic/include/clc/math/sin.h b/libclc/generic/include/clc/math/sin.h
new file mode 100644
index 000000000000..6d4cf5a3142c
--- /dev/null
+++ b/libclc/generic/include/clc/math/sin.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/math/sin.inc>
+#include <clc/math/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/sin.inc b/libclc/generic/include/clc/math/sin.inc
new file mode 100644
index 000000000000..e722fa352731
--- /dev/null
+++ b/libclc/generic/include/clc/math/sin.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE sin(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/sincos.h b/libclc/generic/include/clc/math/sincos.h
new file mode 100644
index 000000000000..fbb9b55cd1f7
--- /dev/null
+++ b/libclc/generic/include/clc/math/sincos.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/sincos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/sincos.inc b/libclc/generic/include/clc/math/sincos.inc
new file mode 100644
index 000000000000..444ac82a5204
--- /dev/null
+++ b/libclc/generic/include/clc/math/sincos.inc
@@ -0,0 +1,8 @@
+#define __CLC_DECLARE_SINCOS(ADDRSPACE, TYPE) \
+  _CLC_OVERLOAD _CLC_DECL TYPE sincos (TYPE x, ADDRSPACE TYPE * cosval);
+
+__CLC_DECLARE_SINCOS(global, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(local, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(private, __CLC_GENTYPE)
+
+#undef __CLC_DECLARE_SINCOS
diff --git a/libclc/generic/include/clc/math/sqrt.h b/libclc/generic/include/clc/math/sqrt.h
new file mode 100644
index 000000000000..f69de847e629
--- /dev/null
+++ b/libclc/generic/include/clc/math/sqrt.h
@@ -0,0 +1,6 @@
+#undef sqrt
+#define sqrt __clc_sqrt
+
+#define __CLC_FUNCTION __clc_sqrt
+#define __CLC_INTRINSIC "llvm.sqrt"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/tan.h b/libclc/generic/include/clc/math/tan.h
new file mode 100644
index 000000000000..d2d52a9459d0
--- /dev/null
+++ b/libclc/generic/include/clc/math/tan.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/tan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/tan.inc b/libclc/generic/include/clc/math/tan.inc
new file mode 100644
index 000000000000..50c5b1d160c8
--- /dev/null
+++ b/libclc/generic/include/clc/math/tan.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE tan(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/ternary_intrin.inc b/libclc/generic/include/clc/math/ternary_intrin.inc
new file mode 100644
index 000000000000..9633696ed9c4
--- /dev/null
+++ b/libclc/generic/include/clc/math/ternary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float, float, float) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2, float2, float2) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3, float3, float3) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4, float4, float4) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8, float8, float8) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16, float16, float16) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double, double, double) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2, double2, double2) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3, double3, double3) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4, double4, double4) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8, double8, double8) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16, double16, double16) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/trunc.h b/libclc/generic/include/clc/math/trunc.h
new file mode 100644
index 000000000000..d34f66190433
--- /dev/null
+++ b/libclc/generic/include/clc/math/trunc.h
@@ -0,0 +1,9 @@
+#undef trunc
+#define trunc __clc_trunc
+
+#define __CLC_FUNCTION __clc_trunc
+#define __CLC_INTRINSIC "llvm.trunc"
+#include <clc/math/unary_intrin.inc>
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/unary_decl.inc b/libclc/generic/include/clc/math/unary_decl.inc
new file mode 100644
index 000000000000..9858d908da09
--- /dev/null
+++ b/libclc/generic/include/clc/math/unary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/unary_intrin.inc b/libclc/generic/include/clc/math/unary_intrin.inc
new file mode 100644
index 000000000000..8c62d8827fe7
--- /dev/null
+++ b/libclc/generic/include/clc/math/unary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float f) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2 f) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3 f) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4 f) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8 f) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16 f) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double d) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2 d) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3 d) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4 d) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8 d) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16 d) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/relational/all.h b/libclc/generic/include/clc/relational/all.h
new file mode 100644
index 000000000000..f8b0942444a2
--- /dev/null
+++ b/libclc/generic/include/clc/relational/all.h
@@ -0,0 +1,18 @@
+#define _CLC_ALL_DECL(TYPE) \
+  _CLC_OVERLOAD _CLC_DECL int all(TYPE v);
+
+#define _CLC_VECTOR_ALL_DECL(TYPE) \
+  _CLC_ALL_DECL(TYPE) \
+  _CLC_ALL_DECL(TYPE##2) \
+  _CLC_ALL_DECL(TYPE##3) \
+  _CLC_ALL_DECL(TYPE##4) \
+  _CLC_ALL_DECL(TYPE##8) \
+  _CLC_ALL_DECL(TYPE##16)
+
+_CLC_VECTOR_ALL_DECL(char)
+_CLC_VECTOR_ALL_DECL(short)
+_CLC_VECTOR_ALL_DECL(int)
+_CLC_VECTOR_ALL_DECL(long)
+
+#undef _CLC_ALL_DECL
+#undef _CLC_VECTOR_ALL_DECL
diff --git a/libclc/generic/include/clc/relational/any.h b/libclc/generic/include/clc/relational/any.h
new file mode 100644
index 000000000000..4687ed263793
--- /dev/null
+++ b/libclc/generic/include/clc/relational/any.h
@@ -0,0 +1,16 @@
+
+#define _CLC_ANY_DECL(TYPE) \
+  _CLC_OVERLOAD _CLC_DECL int any(TYPE v);
+
+#define _CLC_VECTOR_ANY_DECL(TYPE) \
+  _CLC_ANY_DECL(TYPE) \
+  _CLC_ANY_DECL(TYPE##2) \
+  _CLC_ANY_DECL(TYPE##3) \
+  _CLC_ANY_DECL(TYPE##4) \
+  _CLC_ANY_DECL(TYPE##8) \
+  _CLC_ANY_DECL(TYPE##16)
+
+_CLC_VECTOR_ANY_DECL(char)
+_CLC_VECTOR_ANY_DECL(short)
+_CLC_VECTOR_ANY_DECL(int)
+_CLC_VECTOR_ANY_DECL(long)
diff --git a/libclc/generic/include/clc/relational/binary_decl.inc b/libclc/generic/include/clc/relational/binary_decl.inc
new file mode 100644
index 000000000000..c9e4aee839a1
--- /dev/null
+++ b/libclc/generic/include/clc/relational/binary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_INTN __CLC_FUNCTION(__CLC_FLOATN a, __CLC_FLOATN b);
diff --git a/libclc/generic/include/clc/relational/bitselect.h b/libclc/generic/include/clc/relational/bitselect.h
new file mode 100644
index 000000000000..e91cbfded8b7
--- /dev/null
+++ b/libclc/generic/include/clc/relational/bitselect.h
@@ -0,0 +1 @@
+#define bitselect(x, y, z) ((x) ^ ((z) & ((y) ^ (x))))
diff --git a/libclc/generic/include/clc/relational/floatn.inc b/libclc/generic/include/clc/relational/floatn.inc
new file mode 100644
index 000000000000..8d7fd52cc7da
--- /dev/null
+++ b/libclc/generic/include/clc/relational/floatn.inc
@@ -0,0 +1,81 @@
+
+#define __CLC_FLOATN float
+#define __CLC_INTN int
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float2
+#define __CLC_INTN int2
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float3
+#define __CLC_INTN int3
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float4
+#define __CLC_INTN int4
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float8
+#define __CLC_INTN int8
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float16
+#define __CLC_INTN int16
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+#undef __CLC_INT
+
+#ifdef cl_khr_fp64
+
+#define __CLC_FLOATN double
+#define __CLC_INTN int
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double2
+#define __CLC_INTN long2
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double3
+#define __CLC_INTN long3
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double4
+#define __CLC_INTN long4
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double8
+#define __CLC_INTN long8
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double16
+#define __CLC_INTN long16
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/relational/isequal.h b/libclc/generic/include/clc/relational/isequal.h
new file mode 100644
index 000000000000..c28a98565ee3
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isequal.h
@@ -0,0 +1,20 @@
+#define _CLC_ISEQUAL_DECL(TYPE, RETTYPE) \
+  _CLC_OVERLOAD _CLC_DECL RETTYPE isequal(TYPE x, TYPE y);
+
+#define _CLC_VECTOR_ISEQUAL_DECL(TYPE, RETTYPE) \
+  _CLC_ISEQUAL_DECL(TYPE##2, RETTYPE##2) \
+  _CLC_ISEQUAL_DECL(TYPE##3, RETTYPE##3) \
+  _CLC_ISEQUAL_DECL(TYPE##4, RETTYPE##4) \
+  _CLC_ISEQUAL_DECL(TYPE##8, RETTYPE##8) \
+  _CLC_ISEQUAL_DECL(TYPE##16, RETTYPE##16)
+
+_CLC_ISEQUAL_DECL(float, int)
+_CLC_VECTOR_ISEQUAL_DECL(float, int)
+
+#ifdef cl_khr_fp64
+_CLC_ISEQUAL_DECL(double, int)
+_CLC_VECTOR_ISEQUAL_DECL(double, long)
+#endif
+
+#undef _CLC_ISEQUAL_DECL
+#undef _CLC_VECTOR_ISEQUAL_DEC
diff --git a/libclc/generic/include/clc/relational/isfinite.h b/libclc/generic/include/clc/relational/isfinite.h
new file mode 100644
index 000000000000..48e261a54ff7
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isfinite.h
@@ -0,0 +1,9 @@
+#undef isfinite
+
+#define __CLC_FUNCTION isfinite
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isgreater.h b/libclc/generic/include/clc/relational/isgreater.h
new file mode 100644
index 000000000000..d17ae0c00c82
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isgreater.h
@@ -0,0 +1,9 @@
+#undef isgreater
+
+#define __CLC_FUNCTION isgreater
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isgreaterequal.h b/libclc/generic/include/clc/relational/isgreaterequal.h
new file mode 100644
index 000000000000..835332858d29
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isgreaterequal.h
@@ -0,0 +1,9 @@
+#undef isgreaterequal
+
+#define __CLC_FUNCTION isgreaterequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isinf.h b/libclc/generic/include/clc/relational/isinf.h
new file mode 100644
index 000000000000..869f0c8a9ac4
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isinf.h
@@ -0,0 +1,21 @@
+
+#define _CLC_ISINF_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_OVERLOAD _CLC_DECL RET_TYPE isinf(ARG_TYPE);
+
+#define _CLC_VECTOR_ISINF_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_ISINF_DECL(RET_TYPE##2, ARG_TYPE##2) \
+  _CLC_ISINF_DECL(RET_TYPE##3, ARG_TYPE##3) \
+  _CLC_ISINF_DECL(RET_TYPE##4, ARG_TYPE##4) \
+  _CLC_ISINF_DECL(RET_TYPE##8, ARG_TYPE##8) \
+  _CLC_ISINF_DECL(RET_TYPE##16, ARG_TYPE##16)
+
+_CLC_ISINF_DECL(int, float)
+_CLC_VECTOR_ISINF_DECL(int, float)
+
+#ifdef cl_khr_fp64
+_CLC_ISINF_DECL(int, double)
+_CLC_VECTOR_ISINF_DECL(long, double)
+#endif
+
+#undef _CLC_ISINF_DECL
+#undef _CLC_VECTOR_ISINF_DECL
diff --git a/libclc/generic/include/clc/relational/isless.h b/libclc/generic/include/clc/relational/isless.h
new file mode 100644
index 000000000000..1debd87f386e
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isless.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION isless
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/islessequal.h b/libclc/generic/include/clc/relational/islessequal.h
new file mode 100644
index 000000000000..e6a99d7f21c8
--- /dev/null
+++ b/libclc/generic/include/clc/relational/islessequal.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION islessequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/islessgreater.h b/libclc/generic/include/clc/relational/islessgreater.h
new file mode 100644
index 000000000000..005ba1090789
--- /dev/null
+++ b/libclc/generic/include/clc/relational/islessgreater.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION islessgreater
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isnan.h b/libclc/generic/include/clc/relational/isnan.h
new file mode 100644
index 000000000000..93eb9dffb424
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnan.h
@@ -0,0 +1,21 @@
+
+#define _CLC_ISNAN_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_OVERLOAD _CLC_DECL RET_TYPE isnan(ARG_TYPE);
+
+#define _CLC_VECTOR_ISNAN_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_ISNAN_DECL(RET_TYPE##2, ARG_TYPE##2) \
+  _CLC_ISNAN_DECL(RET_TYPE##3, ARG_TYPE##3) \
+  _CLC_ISNAN_DECL(RET_TYPE##4, ARG_TYPE##4) \
+  _CLC_ISNAN_DECL(RET_TYPE##8, ARG_TYPE##8) \
+  _CLC_ISNAN_DECL(RET_TYPE##16, ARG_TYPE##16)
+
+_CLC_ISNAN_DECL(int, float)
+_CLC_VECTOR_ISNAN_DECL(int, float)
+
+#ifdef cl_khr_fp64
+_CLC_ISNAN_DECL(int, double)
+_CLC_VECTOR_ISNAN_DECL(long, double)
+#endif
+
+#undef _CLC_ISNAN_DECL
+#undef _CLC_VECTOR_ISNAN_DECL
diff --git a/libclc/generic/include/clc/relational/isnormal.h b/libclc/generic/include/clc/relational/isnormal.h
new file mode 100644
index 000000000000..f568c56f8e6e
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnormal.h
@@ -0,0 +1,9 @@
+#undef isnormal
+
+#define __CLC_FUNCTION isnormal
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isnotequal.h b/libclc/generic/include/clc/relational/isnotequal.h
new file mode 100644
index 000000000000..f2ceea211046
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnotequal.h
@@ -0,0 +1,9 @@
+#undef isnotequal
+
+#define __CLC_FUNCTION isnotequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isordered.h b/libclc/generic/include/clc/relational/isordered.h
new file mode 100644
index 000000000000..89e9620a4600
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isordered.h
@@ -0,0 +1,9 @@
+#undef isordered
+
+#define __CLC_FUNCTION isordered
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isunordered.h b/libclc/generic/include/clc/relational/isunordered.h
new file mode 100644
index 000000000000..a6b8e2557d23
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isunordered.h
@@ -0,0 +1,9 @@
+#undef isunordered
+
+#define __CLC_FUNCTION isunordered
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/select.h b/libclc/generic/include/clc/relational/select.h
new file mode 100644
index 000000000000..33a6909fb929
--- /dev/null
+++ b/libclc/generic/include/clc/relational/select.h
@@ -0,0 +1 @@
+#define select(a, b, c) ((c) ? (b) : (a))
diff --git a/libclc/generic/include/clc/relational/signbit.h b/libclc/generic/include/clc/relational/signbit.h
new file mode 100644
index 000000000000..41e5284bb34c
--- /dev/null
+++ b/libclc/generic/include/clc/relational/signbit.h
@@ -0,0 +1,9 @@
+#undef signbit
+
+#define __CLC_FUNCTION signbit
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/unary_decl.inc b/libclc/generic/include/clc/relational/unary_decl.inc
new file mode 100644
index 000000000000..ab9b776a46ec
--- /dev/null
+++ b/libclc/generic/include/clc/relational/unary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_INTN __CLC_FUNCTION(__CLC_FLOATN x);
diff --git a/libclc/generic/include/clc/shared/clamp.h b/libclc/generic/include/clc/shared/clamp.h
new file mode 100644
index 000000000000..a389b85d2666
--- /dev/null
+++ b/libclc/generic/include/clc/shared/clamp.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/shared/clamp.inc>
+#include <clc/integer/gentype.inc>
+
+#define __CLC_BODY <clc/shared/clamp.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/shared/clamp.inc b/libclc/generic/include/clc/shared/clamp.inc
new file mode 100644
index 000000000000..aaff9d0ff07f
--- /dev/null
+++ b/libclc/generic/include/clc/shared/clamp.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_SCALAR_GENTYPE 
y, __CLC_SCALAR_GENTYPE z); +#endif diff --git a/libclc/generic/include/clc/shared/max.h b/libclc/generic/include/clc/shared/max.h new file mode 100644 index 000000000000..ee20b9e64df7 --- /dev/null +++ b/libclc/generic/include/clc/shared/max.h @@ -0,0 +1,5 @@ +#define __CLC_BODY <clc/shared/max.inc> +#include <clc/integer/gentype.inc> + +#define __CLC_BODY <clc/shared/max.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/include/clc/shared/max.inc b/libclc/generic/include/clc/shared/max.inc new file mode 100644 index 000000000000..590107435e66 --- /dev/null +++ b/libclc/generic/include/clc/shared/max.inc @@ -0,0 +1,5 @@ +_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_GENTYPE b); + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b); +#endif diff --git a/libclc/generic/include/clc/shared/min.h b/libclc/generic/include/clc/shared/min.h new file mode 100644 index 000000000000..e11d9f9551ff --- /dev/null +++ b/libclc/generic/include/clc/shared/min.h @@ -0,0 +1,5 @@ +#define __CLC_BODY <clc/shared/min.inc> +#include <clc/integer/gentype.inc> + +#define __CLC_BODY <clc/shared/min.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/include/clc/shared/min.inc b/libclc/generic/include/clc/shared/min.inc new file mode 100644 index 000000000000..d8c1568a590c --- /dev/null +++ b/libclc/generic/include/clc/shared/min.inc @@ -0,0 +1,5 @@ +_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_GENTYPE b); + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b); +#endif diff --git a/libclc/generic/include/clc/shared/vload.h b/libclc/generic/include/clc/shared/vload.h new file mode 100644 index 000000000000..93d07501d4a1 --- /dev/null +++ b/libclc/generic/include/clc/shared/vload.h @@ -0,0 +1,37 @@ +#define _CLC_VLOAD_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \ + _CLC_OVERLOAD _CLC_DECL VEC_TYPE vload##WIDTH(size_t 
offset, const ADDR_SPACE PRIM_TYPE *x); + +#define _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, ADDR_SPACE) \ + _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \ + _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \ + _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \ + _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \ + _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE) + +#define _CLC_VECTOR_VLOAD_PRIM1(PRIM_TYPE) \ + _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __private) \ + _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __local) \ + _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __constant) \ + _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __global) \ + +#define _CLC_VECTOR_VLOAD_PRIM() \ + _CLC_VECTOR_VLOAD_PRIM1(char) \ + _CLC_VECTOR_VLOAD_PRIM1(uchar) \ + _CLC_VECTOR_VLOAD_PRIM1(short) \ + _CLC_VECTOR_VLOAD_PRIM1(ushort) \ + _CLC_VECTOR_VLOAD_PRIM1(int) \ + _CLC_VECTOR_VLOAD_PRIM1(uint) \ + _CLC_VECTOR_VLOAD_PRIM1(long) \ + _CLC_VECTOR_VLOAD_PRIM1(ulong) \ + _CLC_VECTOR_VLOAD_PRIM1(float) \ + +#ifdef cl_khr_fp64 +#define _CLC_VECTOR_VLOAD() \ + _CLC_VECTOR_VLOAD_PRIM1(double) \ + _CLC_VECTOR_VLOAD_PRIM() +#else +#define _CLC_VECTOR_VLOAD() \ + _CLC_VECTOR_VLOAD_PRIM() +#endif + +_CLC_VECTOR_VLOAD() diff --git a/libclc/generic/include/clc/shared/vstore.h b/libclc/generic/include/clc/shared/vstore.h new file mode 100644 index 000000000000..1f784f82fec0 --- /dev/null +++ b/libclc/generic/include/clc/shared/vstore.h @@ -0,0 +1,36 @@ +#define _CLC_VSTORE_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \ + _CLC_OVERLOAD _CLC_DECL void vstore##WIDTH(VEC_TYPE vec, size_t offset, ADDR_SPACE PRIM_TYPE *out); + +#define _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, ADDR_SPACE) \ + _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \ + _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \ + _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \ + _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \ + _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE) + +#define 
_CLC_VECTOR_VSTORE_PRIM1(PRIM_TYPE) \ + _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __private) \ + _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __local) \ + _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __global) \ + +#define _CLC_VECTOR_VSTORE_PRIM() \ + _CLC_VECTOR_VSTORE_PRIM1(char) \ + _CLC_VECTOR_VSTORE_PRIM1(uchar) \ + _CLC_VECTOR_VSTORE_PRIM1(short) \ + _CLC_VECTOR_VSTORE_PRIM1(ushort) \ + _CLC_VECTOR_VSTORE_PRIM1(int) \ + _CLC_VECTOR_VSTORE_PRIM1(uint) \ + _CLC_VECTOR_VSTORE_PRIM1(long) \ + _CLC_VECTOR_VSTORE_PRIM1(ulong) \ + _CLC_VECTOR_VSTORE_PRIM1(float) \ + +#ifdef cl_khr_fp64 +#define _CLC_VECTOR_VSTORE() \ + _CLC_VECTOR_VSTORE_PRIM1(double) \ + _CLC_VECTOR_VSTORE_PRIM() +#else +#define _CLC_VECTOR_VSTORE() \ + _CLC_VECTOR_VSTORE_PRIM() +#endif + +_CLC_VECTOR_VSTORE() diff --git a/libclc/generic/include/clc/synchronization/barrier.h b/libclc/generic/include/clc/synchronization/barrier.h new file mode 100644 index 000000000000..7167a3d3f093 --- /dev/null +++ b/libclc/generic/include/clc/synchronization/barrier.h @@ -0,0 +1 @@ +_CLC_DECL void barrier(cl_mem_fence_flags flags); diff --git a/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h b/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h new file mode 100644 index 000000000000..c57eb4249a41 --- /dev/null +++ b/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h @@ -0,0 +1,4 @@ +typedef uint cl_mem_fence_flags; + +#define CLK_LOCAL_MEM_FENCE 1 +#define CLK_GLOBAL_MEM_FENCE 2 diff --git a/libclc/generic/include/clc/workitem/get_global_id.h b/libclc/generic/include/clc/workitem/get_global_id.h new file mode 100644 index 000000000000..92759f146894 --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_global_id.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_global_id(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_global_size.h b/libclc/generic/include/clc/workitem/get_global_size.h new file mode 100644 index 000000000000..2f8370585397 --- /dev/null +++ 
b/libclc/generic/include/clc/workitem/get_global_size.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_global_size(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_group_id.h b/libclc/generic/include/clc/workitem/get_group_id.h new file mode 100644 index 000000000000..346c82c6c316 --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_group_id.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_group_id(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_local_id.h b/libclc/generic/include/clc/workitem/get_local_id.h new file mode 100644 index 000000000000..169aeed86786 --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_local_id.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_local_id(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_local_size.h b/libclc/generic/include/clc/workitem/get_local_size.h new file mode 100644 index 000000000000..040ec58a3d8b --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_local_size.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_local_size(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_num_groups.h b/libclc/generic/include/clc/workitem/get_num_groups.h new file mode 100644 index 000000000000..e555c7efc2d2 --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_num_groups.h @@ -0,0 +1 @@ +_CLC_DECL size_t get_num_groups(uint dim); diff --git a/libclc/generic/include/clc/workitem/get_work_dim.h b/libclc/generic/include/clc/workitem/get_work_dim.h new file mode 100644 index 000000000000..6d1982567063 --- /dev/null +++ b/libclc/generic/include/clc/workitem/get_work_dim.h @@ -0,0 +1 @@ +_CLC_DECL uint get_work_dim(); diff --git a/libclc/generic/include/math/clc_nextafter.h b/libclc/generic/include/math/clc_nextafter.h new file mode 100644 index 000000000000..2b674b707956 --- /dev/null +++ b/libclc/generic/include/math/clc_nextafter.h @@ -0,0 +1,7 @@ +#define __CLC_BODY <clc/math/binary_decl.inc> +#define __CLC_FUNCTION __clc_nextafter + +#include <clc/math/gentype.inc> + +#undef __CLC_BODY +#undef 
__CLC_FUNCTION diff --git a/libclc/generic/lib/SOURCES b/libclc/generic/lib/SOURCES new file mode 100644 index 000000000000..b76fec98f634 --- /dev/null +++ b/libclc/generic/lib/SOURCES @@ -0,0 +1,99 @@ +async/async_work_group_copy.cl +async/async_work_group_strided_copy.cl +async/prefetch.cl +async/wait_group_events.cl +atomic/atomic_xchg.cl +atomic/atomic_impl.ll +cl_khr_global_int32_base_atomics/atom_add.cl +cl_khr_global_int32_base_atomics/atom_cmpxchg.cl +cl_khr_global_int32_base_atomics/atom_dec.cl +cl_khr_global_int32_base_atomics/atom_inc.cl +cl_khr_global_int32_base_atomics/atom_sub.cl +cl_khr_global_int32_base_atomics/atom_xchg.cl +cl_khr_global_int32_extended_atomics/atom_and.cl +cl_khr_global_int32_extended_atomics/atom_max.cl +cl_khr_global_int32_extended_atomics/atom_min.cl +cl_khr_global_int32_extended_atomics/atom_or.cl +cl_khr_global_int32_extended_atomics/atom_xor.cl +cl_khr_local_int32_base_atomics/atom_add.cl +cl_khr_local_int32_base_atomics/atom_cmpxchg.cl +cl_khr_local_int32_base_atomics/atom_dec.cl +cl_khr_local_int32_base_atomics/atom_inc.cl +cl_khr_local_int32_base_atomics/atom_sub.cl +cl_khr_local_int32_base_atomics/atom_xchg.cl +cl_khr_local_int32_extended_atomics/atom_and.cl +cl_khr_local_int32_extended_atomics/atom_max.cl +cl_khr_local_int32_extended_atomics/atom_min.cl +cl_khr_local_int32_extended_atomics/atom_or.cl +cl_khr_local_int32_extended_atomics/atom_xor.cl +convert.cl +common/sign.cl +geometric/cross.cl +geometric/dot.cl +geometric/length.cl +geometric/normalize.cl +integer/abs.cl +integer/abs_diff.cl +integer/add_sat.cl +integer/add_sat_if.ll +integer/add_sat_impl.ll +integer/clz.cl +integer/clz_if.ll +integer/clz_impl.ll +integer/hadd.cl +integer/mad24.cl +integer/mad_sat.cl +integer/mul24.cl +integer/mul_hi.cl +integer/rhadd.cl +integer/rotate.cl +integer/sub_sat.cl +integer/sub_sat_if.ll +integer/sub_sat_impl.ll +integer/upsample.cl +math/acos.cl +math/asin.cl +math/atan.cl +math/atan2.cl +math/copysign.cl +math/cos.cl 
+math/exp.cl +math/exp10.cl +math/fmax.cl +math/fmin.cl +math/fmod.cl +math/hypot.cl +math/log1p.cl +math/mad.cl +math/mix.cl +math/tables.cl +math/clc_nextafter.cl +math/nextafter.cl +math/pown.cl +math/sin.cl +math/sincos.cl +math/sincos_helpers.cl +math/tan.cl +relational/all.cl +relational/any.cl +relational/isequal.cl +relational/isfinite.cl +relational/isgreater.cl +relational/isgreaterequal.cl +relational/isinf.cl +relational/isless.cl +relational/islessequal.cl +relational/islessgreater.cl +relational/isnan.cl +relational/isnormal.cl +relational/isnotequal.cl +relational/isordered.cl +relational/isunordered.cl +relational/signbit.cl +shared/clamp.cl +shared/max.cl +shared/min.cl +shared/vload.cl +shared/vstore.cl +workitem/get_global_id.cl +workitem/get_global_size.cl diff --git a/libclc/generic/lib/async/async_work_group_copy.cl b/libclc/generic/lib/async/async_work_group_copy.cl new file mode 100644 index 000000000000..fe20ecfd9bba --- /dev/null +++ b/libclc/generic/lib/async/async_work_group_copy.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <async_work_group_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_BODY diff --git a/libclc/generic/lib/async/async_work_group_copy.inc b/libclc/generic/lib/async/async_work_group_copy.inc new file mode 100644 index 000000000000..a143ddfb9f6c --- /dev/null +++ b/libclc/generic/lib/async/async_work_group_copy.inc @@ -0,0 +1,17 @@ +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy( + local __CLC_GENTYPE *dst, + const global __CLC_GENTYPE *src, + size_t num_gentypes, + event_t event) { + + return async_work_group_strided_copy(dst, src, num_gentypes, 1, event); +} + +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy( + global __CLC_GENTYPE *dst, + const local __CLC_GENTYPE *src, + size_t num_gentypes, + event_t event) { + + return async_work_group_strided_copy(dst, src, num_gentypes, 1, event); +} diff --git 
a/libclc/generic/lib/async/async_work_group_strided_copy.cl b/libclc/generic/lib/async/async_work_group_strided_copy.cl new file mode 100644 index 000000000000..61b88986fe47 --- /dev/null +++ b/libclc/generic/lib/async/async_work_group_strided_copy.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <async_work_group_strided_copy.inc> +#include <clc/async/gentype.inc> +#undef __CLC_BODY diff --git a/libclc/generic/lib/async/async_work_group_strided_copy.inc b/libclc/generic/lib/async/async_work_group_strided_copy.inc new file mode 100644 index 000000000000..d81a8b79430d --- /dev/null +++ b/libclc/generic/lib/async/async_work_group_strided_copy.inc @@ -0,0 +1,34 @@ + +#define STRIDED_COPY(dst, src, num_gentypes, dst_stride, src_stride) \ + size_t size = get_local_size(0) * get_local_size(1) * get_local_size(2); \ + size_t id = (get_local_size(1) * get_local_size(2) * get_local_id(0)) + \ + (get_local_size(2) * get_local_id(1)) + \ + get_local_id(2); \ + size_t i; \ + \ + for (i = id; i < num_gentypes; i += size) { \ + dst[i * dst_stride] = src[i * src_stride]; \ + } + + +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_strided_copy( + local __CLC_GENTYPE *dst, + const global __CLC_GENTYPE *src, + size_t num_gentypes, + size_t src_stride, + event_t event) { + + STRIDED_COPY(dst, src, num_gentypes, 1, src_stride); + return event; +} + +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_strided_copy( + global __CLC_GENTYPE *dst, + const local __CLC_GENTYPE *src, + size_t num_gentypes, + size_t dst_stride, + event_t event) { + + STRIDED_COPY(dst, src, num_gentypes, dst_stride, 1); + return event; +} diff --git a/libclc/generic/lib/async/prefetch.cl b/libclc/generic/lib/async/prefetch.cl new file mode 100644 index 000000000000..45af21b4d9ff --- /dev/null +++ b/libclc/generic/lib/async/prefetch.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION 
cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <prefetch.inc> +#include <clc/async/gentype.inc> +#undef __CLC_BODY diff --git a/libclc/generic/lib/async/prefetch.inc b/libclc/generic/lib/async/prefetch.inc new file mode 100644 index 000000000000..6747e4cf5819 --- /dev/null +++ b/libclc/generic/lib/async/prefetch.inc @@ -0,0 +1 @@ +_CLC_OVERLOAD _CLC_DEF void prefetch(const global __CLC_GENTYPE *p, size_t num_gentypes) { } diff --git a/libclc/generic/lib/async/wait_group_events.cl b/libclc/generic/lib/async/wait_group_events.cl new file mode 100644 index 000000000000..05c9d58db45e --- /dev/null +++ b/libclc/generic/lib/async/wait_group_events.cl @@ -0,0 +1,5 @@ +#include <clc/clc.h> + +_CLC_DEF void wait_group_events(int num_events, event_t *event_list) { + barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE); +} diff --git a/libclc/generic/lib/atomic/atomic_impl.ll b/libclc/generic/lib/atomic/atomic_impl.ll new file mode 100644 index 000000000000..019147f8c509 --- /dev/null +++ b/libclc/generic/lib/atomic/atomic_impl.ll @@ -0,0 +1,133 @@ +define i32 @__clc_atomic_add_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile add i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_add_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile add i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_and_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile and i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_and_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile and i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_cmpxchg_addr1(i32 addrspace(1)* nocapture %ptr, i32 %compare, i32 %value) nounwind alwaysinline 
{ +entry: + %0 = cmpxchg volatile i32 addrspace(1)* %ptr, i32 %compare, i32 %value seq_cst seq_cst + %1 = extractvalue { i32, i1 } %0, 0 + ret i32 %1 +} + +define i32 @__clc_atomic_cmpxchg_addr3(i32 addrspace(3)* nocapture %ptr, i32 %compare, i32 %value) nounwind alwaysinline { +entry: + %0 = cmpxchg volatile i32 addrspace(3)* %ptr, i32 %compare, i32 %value seq_cst seq_cst + %1 = extractvalue { i32, i1 } %0, 0 + ret i32 %1 +} + +define i32 @__clc_atomic_max_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile max i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_max_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile max i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_min_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile min i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_min_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile min i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_or_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile or i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_or_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile or i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_umax_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile umax i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_umax_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw 
volatile umax i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_umin_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile umin i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_umin_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile umin i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_sub_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile sub i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_sub_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile sub i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_xchg_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile xchg i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_xchg_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile xchg i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_xor_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile xor i32 addrspace(1)* %ptr, i32 %value seq_cst + ret i32 %0 +} + +define i32 @__clc_atomic_xor_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline { +entry: + %0 = atomicrmw volatile xor i32 addrspace(3)* %ptr, i32 %value seq_cst + ret i32 %0 +} diff --git a/libclc/generic/lib/atomic/atomic_xchg.cl b/libclc/generic/lib/atomic/atomic_xchg.cl new file mode 100644 index 000000000000..9aee5950141c --- /dev/null +++ b/libclc/generic/lib/atomic/atomic_xchg.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +_CLC_OVERLOAD _CLC_DEF 
float atomic_xchg(volatile global float *p, float val) { + return as_float(atomic_xchg((volatile global int *)p, as_int(val))); +} + +_CLC_OVERLOAD _CLC_DEF float atomic_xchg(volatile local float *p, float val) { + return as_float(atomic_xchg((volatile local int *)p, as_int(val))); +} diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl new file mode 100644 index 000000000000..9151b0ccf8d9 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_add(global TYPE *p, TYPE val) { \ + return atomic_add(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl new file mode 100644 index 000000000000..76477406c7f1 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_cmpxchg(global TYPE *p, TYPE cmp, TYPE val) { \ + return atomic_cmpxchg(p, cmp, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl new file mode 100644 index 000000000000..a74158d45fc8 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_dec(global TYPE *p) { \ + return atom_sub(p, 1); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl new file mode 100644 index 000000000000..1404b5aa4477 --- /dev/null +++ 
b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_inc(global TYPE *p) { \ + return atom_add(p, 1); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl new file mode 100644 index 000000000000..7faa3cc040f0 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_sub(global TYPE *p, TYPE val) { \ + return atomic_sub(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl new file mode 100644 index 000000000000..9c77db13f309 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_xchg(global TYPE *p, TYPE val) { \ + return atomic_xchg(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl new file mode 100644 index 000000000000..e58796961b98 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_and(global TYPE *p, TYPE val) { \ + return atomic_and(p, val); \ +} + +IMPL(int) +IMPL(unsigned int)
\ No newline at end of file diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl new file mode 100644 index 000000000000..09177ed8eef4 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_max(global TYPE *p, TYPE val) { \ + return atomic_max(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl new file mode 100644 index 000000000000..277c41ba90dc --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_min(global TYPE *p, TYPE val) { \ + return atomic_min(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl new file mode 100644 index 000000000000..a936a8ea7d31 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_or(global TYPE *p, TYPE val) { \ + return atomic_or(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl new file mode 100644 index 000000000000..1a8e35004cd5 --- /dev/null +++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_xor(global TYPE *p, TYPE val) { \ + return atomic_xor(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git 
a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl new file mode 100644 index 000000000000..a5dea1824a16 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_add(local TYPE *p, TYPE val) { \ + return atomic_add(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl new file mode 100644 index 000000000000..16e957964dbb --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_cmpxchg(local TYPE *p, TYPE cmp, TYPE val) { \ + return atomic_cmpxchg(p, cmp, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl new file mode 100644 index 000000000000..d22c333f5d56 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_dec(local TYPE *p) { \ + return atom_sub(p, 1); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl new file mode 100644 index 000000000000..4ba0d062997c --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_inc(local TYPE *p) { \ + return atom_add(p, 1); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl 
b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl new file mode 100644 index 000000000000..c96696ac2084 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_sub(local TYPE *p, TYPE val) { \ + return atomic_sub(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl new file mode 100644 index 000000000000..7d4bcca3fe7a --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_xchg(local TYPE *p, TYPE val) { \ + return atomic_xchg(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl new file mode 100644 index 000000000000..180103acc01e --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_and(local TYPE *p, TYPE val) { \ + return atomic_and(p, val); \ +} + +IMPL(int) +IMPL(unsigned int)
\ No newline at end of file diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl new file mode 100644 index 000000000000..b90301ba0f76 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_max(local TYPE *p, TYPE val) { \ + return atomic_max(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl new file mode 100644 index 000000000000..3acedd8350fc --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_min(local TYPE *p, TYPE val) { \ + return atomic_min(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl new file mode 100644 index 000000000000..338ff2c01088 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_or(local TYPE *p, TYPE val) { \ + return atomic_or(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl new file mode 100644 index 000000000000..51ae3c0e9194 --- /dev/null +++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl @@ -0,0 +1,9 @@ +#include <clc/clc.h> + +#define IMPL(TYPE) \ +_CLC_OVERLOAD _CLC_DEF TYPE atom_xor(local TYPE *p, TYPE val) { \ + return atomic_xor(p, val); \ +} + +IMPL(int) +IMPL(unsigned int) diff --git 
a/libclc/generic/lib/clcmacro.h b/libclc/generic/lib/clcmacro.h new file mode 100644 index 000000000000..ef102ea54e9f --- /dev/null +++ b/libclc/generic/lib/clcmacro.h @@ -0,0 +1,76 @@ +#define _CLC_UNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE) \ + DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x) { \ + return (RET_TYPE##2)(FUNCTION(x.x), FUNCTION(x.y)); \ + } \ +\ + DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x) { \ + return (RET_TYPE##3)(FUNCTION(x.x), FUNCTION(x.y), FUNCTION(x.z)); \ + } \ +\ + DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x) { \ + return (RET_TYPE##4)(FUNCTION(x.lo), FUNCTION(x.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x) { \ + return (RET_TYPE##8)(FUNCTION(x.lo), FUNCTION(x.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x) { \ + return (RET_TYPE##16)(FUNCTION(x.lo), FUNCTION(x.hi)); \ + } + +#define _CLC_BINARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \ + DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y) { \ + return (RET_TYPE##2)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y)); \ + } \ +\ + DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y) { \ + return (RET_TYPE##3)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y), \ + FUNCTION(x.z, y.z)); \ + } \ +\ + DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y) { \ + return (RET_TYPE##4)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y) { \ + return (RET_TYPE##8)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y) { \ + return (RET_TYPE##16)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \ + } + +#define _CLC_TERNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE, ARG3_TYPE) \ + DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y, ARG3_TYPE##2 z) { \ + return (RET_TYPE##2)(FUNCTION(x.x, y.x, z.x), FUNCTION(x.y, y.y, z.y)); \ + } \ +\ + DECLSPEC RET_TYPE##3 
FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y, ARG3_TYPE##3 z) { \ + return (RET_TYPE##3)(FUNCTION(x.x, y.x, z.x), FUNCTION(x.y, y.y, z.y), \ + FUNCTION(x.z, y.z, z.z)); \ + } \ +\ + DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y, ARG3_TYPE##4 z) { \ + return (RET_TYPE##4)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y, ARG3_TYPE##8 z) { \ + return (RET_TYPE##8)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \ + } \ +\ + DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y, ARG3_TYPE##16 z) { \ + return (RET_TYPE##16)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \ + } + +#define _CLC_DEFINE_BINARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \ + return BUILTIN(x, y); \ +} \ +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) + +#define _CLC_DEFINE_UNARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x) { \ + return BUILTIN(x); \ +} \ +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE) diff --git a/libclc/generic/lib/common/sign.cl b/libclc/generic/lib/common/sign.cl new file mode 100644 index 000000000000..25832e0b4f8b --- /dev/null +++ b/libclc/generic/lib/common/sign.cl @@ -0,0 +1,28 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +#define SIGN(TYPE, F) \ +_CLC_DEF _CLC_OVERLOAD TYPE sign(TYPE x) { \ + if (isnan(x)) { \ + return 0.0F; \ + } \ + if (x > 0.0F) { \ + return 1.0F; \ + } \ + if (x < 0.0F) { \ + return -1.0F; \ + } \ + return x; /* -0.0 or +0.0 */ \ +} + +SIGN(float, f) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, sign, float) + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +SIGN(double, ) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, sign, double) + +#endif diff --git 
a/libclc/generic/lib/gen_convert.py b/libclc/generic/lib/gen_convert.py new file mode 100644 index 000000000000..f91a89a3c321 --- /dev/null +++ b/libclc/generic/lib/gen_convert.py @@ -0,0 +1,388 @@ +#!/usr/bin/env python3 + +# OpenCL built-in library: type conversion functions +# +# Copyright (c) 2013 Victor Oliveira <victormatheus@gmail.com> +# Copyright (c) 2013 Jesse Towner <jessetowner@lavabit.com> +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+ +# This script generates the file convert_type.cl, which contains all of the +# OpenCL functions in the form: +# +# convert_<destTypen><_sat><_roundingMode>(<sourceTypen>) + +types = ['char', 'uchar', 'short', 'ushort', 'int', 'uint', 'long', 'ulong', 'float', 'double'] +int_types = ['char', 'uchar', 'short', 'ushort', 'int', 'uint', 'long', 'ulong'] +unsigned_types = ['uchar', 'ushort', 'uint', 'ulong'] +float_types = ['float', 'double'] +int64_types = ['long', 'ulong'] +float64_types = ['double'] +vector_sizes = ['', '2', '3', '4', '8', '16'] +half_sizes = [('2',''), ('4','2'), ('8','4'), ('16','8')] + +saturation = ['','_sat'] +rounding_modes = ['_rtz','_rte','_rtp','_rtn'] +float_prefix = {'float':'FLT_', 'double':'DBL_'} +float_suffix = {'float':'f', 'double':''} + +bool_type = {'char' : 'char', + 'uchar' : 'char', + 'short' : 'short', + 'ushort': 'short', + 'int' : 'int', + 'uint' : 'int', + 'long' : 'long', + 'ulong' : 'long', + 'float' : 'int', + 'double' : 'long'} + +unsigned_type = {'char' : 'uchar', + 'uchar' : 'uchar', + 'short' : 'ushort', + 'ushort': 'ushort', + 'int' : 'uint', + 'uint' : 'uint', + 'long' : 'ulong', + 'ulong' : 'ulong'} + +sizeof_type = {'char' : 1, 'uchar' : 1, + 'short' : 2, 'ushort' : 2, + 'int' : 4, 'uint' : 4, + 'long' : 8, 'ulong' : 8, + 'float' : 4, 'double' : 8} + +limit_max = {'char' : 'CHAR_MAX', + 'uchar' : 'UCHAR_MAX', + 'short' : 'SHRT_MAX', + 'ushort': 'USHRT_MAX', + 'int' : 'INT_MAX', + 'uint' : 'UINT_MAX', + 'long' : 'LONG_MAX', + 'ulong' : 'ULONG_MAX'} + +limit_min = {'char' : 'CHAR_MIN', + 'uchar' : '0', + 'short' : 'SHRT_MIN', + 'ushort': '0', + 'int' : 'INT_MIN', + 'uint' : '0', + 'long' : 'LONG_MIN', + 'ulong' : '0'} + +def conditional_guard(src, dst): + int64_count = 0 + float64_count = 0 + if src in int64_types: + int64_count = int64_count +1 + elif src in float64_types: + float64_count = float64_count + 1 + if dst in int64_types: + int64_count = int64_count +1 + elif dst in float64_types: + float64_count = 
float64_count + 1 + if float64_count > 0 and int64_count > 0: + print("#if defined(cl_khr_fp64) && defined(cles_khr_int64)") + return True + elif float64_count > 0: + print("#ifdef cl_khr_fp64") + return True + elif int64_count > 0: + print("#ifdef cles_khr_int64") + return True + return False + + +print("""/* !!!! AUTOGENERATED FILE generated by convert_type.py !!!!! + + DON'T CHANGE THIS FILE. MAKE YOUR CHANGES TO convert_type.py AND RUN: + $ ./generate-conversion-type-cl.sh + + OpenCL type conversion functions + + Copyright (c) 2013 Victor Oliveira <victormatheus@gmail.com> + Copyright (c) 2013 Jesse Towner <jessetowner@lavabit.com> + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. +*/ + +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +""") + +# +# Default Conversions +# +# All conversions are in accordance with the OpenCL specification, +# which cites the C99 conversion rules. 
+# +# Casting from floating point to integer results in conversions +# with truncation, so it should be suitable for the default convert +# functions. +# +# Conversions from integer to floating-point, and floating-point to +# floating-point through casting is done with the default rounding +# mode. While C99 allows dynamically changing the rounding mode +# during runtime, it is not a supported feature in OpenCL according +# to Section 7.1 - Rounding Modes in the OpenCL 1.2 specification. +# +# Therefore, we can assume for optimization purposes that the +# rounding mode is fixed to round-to-nearest-even. Platform target +# authors should ensure that the rounding-control registers remain +# in this state, and that this invariant holds. +# +# Also note, even though the OpenCL specification isn't entirely +# clear on this matter, we implement all rounding mode combinations +# even for integer-to-integer conversions. When such a conversion +# is used, the rounding mode is ignored. +# + +def generate_default_conversion(src, dst, mode): + close_conditional = conditional_guard(src, dst) + + # scalar conversions + print("""_CLC_DEF _CLC_OVERLOAD +{DST} convert_{DST}{M}({SRC} x) +{{ + return ({DST})x; +}} +""".format(SRC=src, DST=dst, M=mode)) + + # vector conversions, done through decomposition to components + for size, half_size in half_sizes: + print("""_CLC_DEF _CLC_OVERLOAD +{DST}{N} convert_{DST}{N}{M}({SRC}{N} x) +{{ + return ({DST}{N})(convert_{DST}{H}(x.lo), convert_{DST}{H}(x.hi)); +}} +""".format(SRC=src, DST=dst, N=size, H=half_size, M=mode)) + + # 3-component vector conversions + print("""_CLC_DEF _CLC_OVERLOAD +{DST}3 convert_{DST}3{M}({SRC}3 x) +{{ + return ({DST}3)(convert_{DST}2(x.s01), convert_{DST}(x.s2)); +}}""".format(SRC=src, DST=dst, M=mode)) + + if close_conditional: + print("#endif") + + +for src in types: + for dst in types: + generate_default_conversion(src, dst, '') + +for src in int_types: + for dst in int_types: + for mode in rounding_modes: + 
generate_default_conversion(src, dst, mode) + +# +# Saturated Conversions To Integers +# +# These functions are dependent on the unsaturated conversion functions +# generated above, and use clamp, max, min, and select to eliminate +# branching and vectorize the conversions. +# +# Again, as above, we allow all rounding modes for integer-to-integer +# conversions with saturation. +# + +def generate_saturated_conversion(src, dst, size): + # Header + close_conditional = conditional_guard(src, dst) + print("""_CLC_DEF _CLC_OVERLOAD +{DST}{N} convert_{DST}{N}_sat({SRC}{N} x) +{{""".format(DST=dst, SRC=src, N=size)) + + # FIXME: This is a work around for lack of select function with + # signed third argument when the first two arguments are unsigned types. + # We cast to the signed type for sign-extension, then do a bitcast to + # the unsigned type. + if dst in unsigned_types: + bool_prefix = "as_{DST}{N}(convert_{BOOL}{N}".format(DST=dst, BOOL=bool_type[dst], N=size); + bool_suffix = ")" + else: + bool_prefix = "convert_{BOOL}{N}".format(BOOL=bool_type[dst], N=size); + bool_suffix = "" + + # Body + if src == dst: + + # Conversion between same types + print(" return x;") + + elif src in float_types: + + # Conversion from float to int + print(""" {DST}{N} y = convert_{DST}{N}(x); + y = select(y, ({DST}{N}){DST_MIN}, {BP}(x < ({SRC}{N}){DST_MIN}){BS}); + y = select(y, ({DST}{N}){DST_MAX}, {BP}(x > ({SRC}{N}){DST_MAX}){BS}); + return y;""".format(SRC=src, DST=dst, N=size, + DST_MIN=limit_min[dst], DST_MAX=limit_max[dst], + BP=bool_prefix, BS=bool_suffix)) + + else: + + # Integer to integer conversion with sizeof(src) == sizeof(dst) + if sizeof_type[src] == sizeof_type[dst]: + if src in unsigned_types: + print(" x = min(x, ({SRC}){DST_MAX});".format(SRC=src, DST_MAX=limit_max[dst])) + else: + print(" x = max(x, ({SRC})0);".format(SRC=src)) + + # Integer to integer conversion where sizeof(src) > sizeof(dst) + elif sizeof_type[src] > sizeof_type[dst]: + if src in unsigned_types:
+ print(" x = min(x, ({SRC}){DST_MAX});".format(SRC=src, DST_MAX=limit_max[dst])) + else: + print(" x = clamp(x, ({SRC}){DST_MIN}, ({SRC}){DST_MAX});" + .format(SRC=src, DST_MIN=limit_min[dst], DST_MAX=limit_max[dst])) + + # Integer to integer conversion where sizeof(src) < sizeof(dst) + elif src not in unsigned_types and dst in unsigned_types: + print(" x = max(x, ({SRC})0);".format(SRC=src)) + + print(" return convert_{DST}{N}(x);".format(DST=dst, N=size)) + + # Footer + print("}") + if close_conditional: + print("#endif") + + +for src in types: + for dst in int_types: + for size in vector_sizes: + generate_saturated_conversion(src, dst, size) + + +def generate_saturated_conversion_with_rounding(src, dst, size, mode): + # Header + close_conditional = conditional_guard(src, dst) + + # Body + print("""_CLC_DEF _CLC_OVERLOAD +{DST}{N} convert_{DST}{N}_sat{M}({SRC}{N} x) +{{ + return convert_{DST}{N}_sat(x); +}} +""".format(DST=dst, SRC=src, N=size, M=mode)) + + # Footer + if close_conditional: + print("#endif") + + +for src in int_types: + for dst in int_types: + for size in vector_sizes: + for mode in rounding_modes: + generate_saturated_conversion_with_rounding(src, dst, size, mode) + +# +# Conversions To/From Floating-Point With Rounding +# +# Note that we assume as above that casts from floating-point to +# integer are done with truncation, and that the default rounding +# mode is fixed to round-to-nearest-even, as per C99 and OpenCL +# rounding rules. +# +# These functions rely on the use of abs, ceil, fabs, floor, +# nextafter, sign, rint and the above generated conversion functions. +# +# Only conversions to integers can have saturation. 
+# + +def generate_float_conversion(src, dst, size, mode, sat): + # Header + close_conditional = conditional_guard(src, dst) + print("""_CLC_DEF _CLC_OVERLOAD +{DST}{N} convert_{DST}{N}{S}{M}({SRC}{N} x) +{{""".format(SRC=src, DST=dst, N=size, M=mode, S=sat)) + + # Perform conversion + if dst in int_types: + if mode == '_rte': + print(" x = rint(x);"); + elif mode == '_rtp': + print(" x = ceil(x);"); + elif mode == '_rtn': + print(" x = floor(x);"); + print(" return convert_{DST}{N}{S}(x);".format(DST=dst, N=size, S=sat)) + elif mode == '_rte': + print(" return convert_{DST}{N}(x);".format(DST=dst, N=size)) + else: + print(" {DST}{N} r = convert_{DST}{N}(x);".format(DST=dst, N=size)) + print(" {SRC}{N} y = convert_{SRC}{N}(r);".format(SRC=src, N=size)) + if mode == '_rtz': + if src in int_types: + print(" {USRC}{N} abs_x = abs(x);".format(USRC=unsigned_type[src], N=size)) + print(" {USRC}{N} abs_y = abs(y);".format(USRC=unsigned_type[src], N=size)) + else: + print(" {SRC}{N} abs_x = fabs(x);".format(SRC=src, N=size)) + print(" {SRC}{N} abs_y = fabs(y);".format(SRC=src, N=size)) + print(" return select(r, nextafter(r, sign(r) * ({DST}{N})-INFINITY), convert_{BOOL}{N}(abs_y > abs_x));" + .format(DST=dst, N=size, BOOL=bool_type[dst])) + if mode == '_rtp': + print(" return select(r, nextafter(r, ({DST}{N})INFINITY), convert_{BOOL}{N}(y < x));" + .format(DST=dst, N=size, BOOL=bool_type[dst])) + if mode == '_rtn': + print(" return select(r, nextafter(r, ({DST}{N})-INFINITY), convert_{BOOL}{N}(y > x));" + .format(DST=dst, N=size, BOOL=bool_type[dst])) + + # Footer + print("}") + if close_conditional: + print("#endif") + + +for src in float_types: + for dst in int_types: + for size in vector_sizes: + for mode in rounding_modes: + for sat in saturation: + generate_float_conversion(src, dst, size, mode, sat) + + +for src in types: + for dst in float_types: + for size in vector_sizes: + for mode in rounding_modes: + generate_float_conversion(src, dst, size, mode, '') diff
--git a/libclc/generic/lib/geometric/cross.cl b/libclc/generic/lib/geometric/cross.cl new file mode 100644 index 000000000000..3b4ca6cafae9 --- /dev/null +++ b/libclc/generic/lib/geometric/cross.cl @@ -0,0 +1,25 @@ +#include <clc/clc.h> + +_CLC_OVERLOAD _CLC_DEF float3 cross(float3 p0, float3 p1) { + return (float3)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z, + p0.x*p1.y - p0.y*p1.x); +} + +_CLC_OVERLOAD _CLC_DEF float4 cross(float4 p0, float4 p1) { + return (float4)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z, + p0.x*p1.y - p0.y*p1.x, 0.f); +} + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_OVERLOAD _CLC_DEF double3 cross(double3 p0, double3 p1) { + return (double3)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z, + p0.x*p1.y - p0.y*p1.x); +} + +_CLC_OVERLOAD _CLC_DEF double4 cross(double4 p0, double4 p1) { + return (double4)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z, + p0.x*p1.y - p0.y*p1.x, 0.f); +} +#endif diff --git a/libclc/generic/lib/geometric/dot.cl b/libclc/generic/lib/geometric/dot.cl new file mode 100644 index 000000000000..0d6fe6c9a4e8 --- /dev/null +++ b/libclc/generic/lib/geometric/dot.cl @@ -0,0 +1,39 @@ +#include <clc/clc.h> + +_CLC_OVERLOAD _CLC_DEF float dot(float p0, float p1) { + return p0*p1; +} + +_CLC_OVERLOAD _CLC_DEF float dot(float2 p0, float2 p1) { + return p0.x*p1.x + p0.y*p1.y; +} + +_CLC_OVERLOAD _CLC_DEF float dot(float3 p0, float3 p1) { + return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z; +} + +_CLC_OVERLOAD _CLC_DEF float dot(float4 p0, float4 p1) { + return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z + p0.w*p1.w; +} + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_OVERLOAD _CLC_DEF double dot(double p0, double p1) { + return p0*p1; +} + +_CLC_OVERLOAD _CLC_DEF double dot(double2 p0, double2 p1) { + return p0.x*p1.x + p0.y*p1.y; +} + +_CLC_OVERLOAD _CLC_DEF double dot(double3 p0, double3 p1) { + return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z; +} + +_CLC_OVERLOAD _CLC_DEF double dot(double4 p0, 
double4 p1) { + return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z + p0.w*p1.w; +} + +#endif diff --git a/libclc/generic/lib/geometric/length.cl b/libclc/generic/lib/geometric/length.cl new file mode 100644 index 000000000000..ef087c75f9f1 --- /dev/null +++ b/libclc/generic/lib/geometric/length.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <length.inc> +#include <clc/geometric/floatn.inc> diff --git a/libclc/generic/lib/geometric/length.inc b/libclc/generic/lib/geometric/length.inc new file mode 100644 index 000000000000..5faaaffbd6a8 --- /dev/null +++ b/libclc/generic/lib/geometric/length.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_FLOAT length(__CLC_FLOATN p) { + return native_sqrt(dot(p, p)); +} diff --git a/libclc/generic/lib/geometric/normalize.cl b/libclc/generic/lib/geometric/normalize.cl new file mode 100644 index 000000000000..b06b2fe3a4c4 --- /dev/null +++ b/libclc/generic/lib/geometric/normalize.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <normalize.inc> +#include <clc/geometric/floatn.inc> diff --git a/libclc/generic/lib/geometric/normalize.inc b/libclc/generic/lib/geometric/normalize.inc new file mode 100644 index 000000000000..423ff79fc4e2 --- /dev/null +++ b/libclc/generic/lib/geometric/normalize.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_FLOATN normalize(__CLC_FLOATN p) { + return p/length(p); +} diff --git a/libclc/generic/lib/integer/abs.cl b/libclc/generic/lib/integer/abs.cl new file mode 100644 index 000000000000..faff8d05fefc --- /dev/null +++ b/libclc/generic/lib/integer/abs.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <abs.inc> +#include <clc/integer/gentype.inc> diff --git a/libclc/generic/lib/integer/abs.inc b/libclc/generic/lib/integer/abs.inc new file mode 100644 index 000000000000..cfe7bfecd294 --- /dev/null +++ 
b/libclc/generic/lib/integer/abs.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_U_GENTYPE abs(__CLC_GENTYPE x) { + return __builtin_astype((__CLC_GENTYPE)(x > (__CLC_GENTYPE)(0) ? x : -x), __CLC_U_GENTYPE); +} diff --git a/libclc/generic/lib/integer/abs_diff.cl b/libclc/generic/lib/integer/abs_diff.cl new file mode 100644 index 000000000000..3d751057819e --- /dev/null +++ b/libclc/generic/lib/integer/abs_diff.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <abs_diff.inc> +#include <clc/integer/gentype.inc> diff --git a/libclc/generic/lib/integer/abs_diff.inc b/libclc/generic/lib/integer/abs_diff.inc new file mode 100644 index 000000000000..f39c3ff4d3e8 --- /dev/null +++ b/libclc/generic/lib/integer/abs_diff.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_U_GENTYPE abs_diff(__CLC_GENTYPE x, __CLC_GENTYPE y) { + return __builtin_astype((__CLC_GENTYPE)(x > y ? x-y : y-x), __CLC_U_GENTYPE); +} diff --git a/libclc/generic/lib/integer/add_sat.cl b/libclc/generic/lib/integer/add_sat.cl new file mode 100644 index 000000000000..d4df66db3ede --- /dev/null +++ b/libclc/generic/lib/integer/add_sat.cl @@ -0,0 +1,53 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +// From add_sat.ll +_CLC_DECL char __clc_add_sat_s8(char, char); +_CLC_DECL uchar __clc_add_sat_u8(uchar, uchar); +_CLC_DECL short __clc_add_sat_s16(short, short); +_CLC_DECL ushort __clc_add_sat_u16(ushort, ushort); +_CLC_DECL int __clc_add_sat_s32(int, int); +_CLC_DECL uint __clc_add_sat_u32(uint, uint); +_CLC_DECL long __clc_add_sat_s64(long, long); +_CLC_DECL ulong __clc_add_sat_u64(ulong, ulong); + +_CLC_OVERLOAD _CLC_DEF char add_sat(char x, char y) { + return __clc_add_sat_s8(x, y); +} + +_CLC_OVERLOAD _CLC_DEF uchar add_sat(uchar x, uchar y) { + return __clc_add_sat_u8(x, y); +} + +_CLC_OVERLOAD _CLC_DEF short add_sat(short x, short y) { + return __clc_add_sat_s16(x, y); +} + +_CLC_OVERLOAD _CLC_DEF ushort add_sat(ushort x, ushort y) { + return __clc_add_sat_u16(x, y); +} + 
+_CLC_OVERLOAD _CLC_DEF int add_sat(int x, int y) { + return __clc_add_sat_s32(x, y); +} + +_CLC_OVERLOAD _CLC_DEF uint add_sat(uint x, uint y) { + return __clc_add_sat_u32(x, y); +} + +_CLC_OVERLOAD _CLC_DEF long add_sat(long x, long y) { + return __clc_add_sat_s64(x, y); +} + +_CLC_OVERLOAD _CLC_DEF ulong add_sat(ulong x, ulong y) { + return __clc_add_sat_u64(x, y); +} + +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, add_sat, char, char) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, add_sat, uchar, uchar) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, add_sat, short, short) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, add_sat, ushort, ushort) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, add_sat, int, int) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, add_sat, uint, uint) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, add_sat, long, long) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, add_sat, ulong, ulong) diff --git a/libclc/generic/lib/integer/add_sat_if.ll b/libclc/generic/lib/integer/add_sat_if.ll new file mode 100644 index 000000000000..bcbe4c0dd348 --- /dev/null +++ b/libclc/generic/lib/integer/add_sat_if.ll @@ -0,0 +1,55 @@ +declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) + +define i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) + ret i8 %call +} + +declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) + +define i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) + ret i8 %call +} + +declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) + +define i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) + ret i16 %call +} + +declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y) + +define i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 
@__clc_add_sat_impl_u16(i16 %x, i16 %y) + ret i16 %call +} + +declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) + +define i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) + ret i32 %call +} + +declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) + +define i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) + ret i32 %call +} + +declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) + +define i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) + ret i64 %call +} + +declare i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) + +define i64 @__clc_add_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) + ret i64 %call +} diff --git a/libclc/generic/lib/integer/add_sat_impl.ll b/libclc/generic/lib/integer/add_sat_impl.ll new file mode 100644 index 000000000000..c150ecb56b8b --- /dev/null +++ b/libclc/generic/lib/integer/add_sat_impl.ll @@ -0,0 +1,83 @@ +declare {i8, i1} @llvm.sadd.with.overflow.i8(i8, i8) +declare {i8, i1} @llvm.uadd.with.overflow.i8(i8, i8) + +define i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call {i8, i1} @llvm.sadd.with.overflow.i8(i8 %x, i8 %y) + %res = extractvalue {i8, i1} %call, 0 + %over = extractvalue {i8, i1} %call, 1 + %x.msb = ashr i8 %x, 7 + %x.limit = xor i8 %x.msb, 127 + %sat = select i1 %over, i8 %x.limit, i8 %res + ret i8 %sat +} + +define i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call {i8, i1} @llvm.uadd.with.overflow.i8(i8 %x, i8 %y) + %res = extractvalue {i8, i1} %call, 0 + %over = extractvalue {i8, i1} %call, 1 + %sat = select i1 %over, i8 -1, i8 %res + ret i8 %sat +} + +declare {i16, i1} @llvm.sadd.with.overflow.i16(i16, i16) +declare {i16, i1} 
@llvm.uadd.with.overflow.i16(i16, i16) + +define i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call {i16, i1} @llvm.sadd.with.overflow.i16(i16 %x, i16 %y) + %res = extractvalue {i16, i1} %call, 0 + %over = extractvalue {i16, i1} %call, 1 + %x.msb = ashr i16 %x, 15 + %x.limit = xor i16 %x.msb, 32767 + %sat = select i1 %over, i16 %x.limit, i16 %res + ret i16 %sat +} + +define i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call {i16, i1} @llvm.uadd.with.overflow.i16(i16 %x, i16 %y) + %res = extractvalue {i16, i1} %call, 0 + %over = extractvalue {i16, i1} %call, 1 + %sat = select i1 %over, i16 -1, i16 %res + ret i16 %sat +} + +declare {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32) +declare {i32, i1} @llvm.uadd.with.overflow.i32(i32, i32) + +define i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %res = extractvalue {i32, i1} %call, 0 + %over = extractvalue {i32, i1} %call, 1 + %x.msb = ashr i32 %x, 31 + %x.limit = xor i32 %x.msb, 2147483647 + %sat = select i1 %over, i32 %x.limit, i32 %res + ret i32 %sat +} + +define i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %x, i32 %y) + %res = extractvalue {i32, i1} %call, 0 + %over = extractvalue {i32, i1} %call, 1 + %sat = select i1 %over, i32 -1, i32 %res + ret i32 %sat +} + +declare {i64, i1} @llvm.sadd.with.overflow.i64(i64, i64) +declare {i64, i1} @llvm.uadd.with.overflow.i64(i64, i64) + +define i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call {i64, i1} @llvm.sadd.with.overflow.i64(i64 %x, i64 %y) + %res = extractvalue {i64, i1} %call, 0 + %over = extractvalue {i64, i1} %call, 1 + %x.msb = ashr i64 %x, 63 + %x.limit = xor i64 %x.msb, 9223372036854775807 + %sat = select i1 %over, i64 %x.limit, i64 %res + ret i64 %sat 
+} + +define i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call {i64, i1} @llvm.uadd.with.overflow.i64(i64 %x, i64 %y) + %res = extractvalue {i64, i1} %call, 0 + %over = extractvalue {i64, i1} %call, 1 + %sat = select i1 %over, i64 -1, i64 %res + ret i64 %sat +} diff --git a/libclc/generic/lib/integer/clz.cl b/libclc/generic/lib/integer/clz.cl new file mode 100644 index 000000000000..17e3fe014741 --- /dev/null +++ b/libclc/generic/lib/integer/clz.cl @@ -0,0 +1,53 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +// From clz.ll +_CLC_DECL char __clc_clz_s8(char); +_CLC_DECL uchar __clc_clz_u8(uchar); +_CLC_DECL short __clc_clz_s16(short); +_CLC_DECL ushort __clc_clz_u16(ushort); +_CLC_DECL int __clc_clz_s32(int); +_CLC_DECL uint __clc_clz_u32(uint); +_CLC_DECL long __clc_clz_s64(long); +_CLC_DECL ulong __clc_clz_u64(ulong); + +_CLC_OVERLOAD _CLC_DEF char clz(char x) { + return __clc_clz_s8(x); +} + +_CLC_OVERLOAD _CLC_DEF uchar clz(uchar x) { + return __clc_clz_u8(x); +} + +_CLC_OVERLOAD _CLC_DEF short clz(short x) { + return __clc_clz_s16(x); +} + +_CLC_OVERLOAD _CLC_DEF ushort clz(ushort x) { + return __clc_clz_u16(x); +} + +_CLC_OVERLOAD _CLC_DEF int clz(int x) { + return __clc_clz_s32(x); +} + +_CLC_OVERLOAD _CLC_DEF uint clz(uint x) { + return __clc_clz_u32(x); +} + +_CLC_OVERLOAD _CLC_DEF long clz(long x) { + return __clc_clz_s64(x); +} + +_CLC_OVERLOAD _CLC_DEF ulong clz(ulong x) { + return __clc_clz_u64(x); +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, clz, char) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, clz, uchar) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, clz, short) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, clz, ushort) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, clz, int) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, clz, uint) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, clz, long) +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, clz, ulong) 
diff --git a/libclc/generic/lib/integer/clz_if.ll b/libclc/generic/lib/integer/clz_if.ll new file mode 100644 index 000000000000..23dfc74a8a82 --- /dev/null +++ b/libclc/generic/lib/integer/clz_if.ll @@ -0,0 +1,55 @@ +declare i8 @__clc_clz_impl_s8(i8 %x) + +define i8 @__clc_clz_s8(i8 %x) nounwind readnone alwaysinline { + %call = call i8 @__clc_clz_impl_s8(i8 %x) + ret i8 %call +} + +declare i8 @__clc_clz_impl_u8(i8 %x) + +define i8 @__clc_clz_u8(i8 %x) nounwind readnone alwaysinline { + %call = call i8 @__clc_clz_impl_u8(i8 %x) + ret i8 %call +} + +declare i16 @__clc_clz_impl_s16(i16 %x) + +define i16 @__clc_clz_s16(i16 %x) nounwind readnone alwaysinline { + %call = call i16 @__clc_clz_impl_s16(i16 %x) + ret i16 %call +} + +declare i16 @__clc_clz_impl_u16(i16 %x) + +define i16 @__clc_clz_u16(i16 %x) nounwind readnone alwaysinline { + %call = call i16 @__clc_clz_impl_u16(i16 %x) + ret i16 %call +} + +declare i32 @__clc_clz_impl_s32(i32 %x) + +define i32 @__clc_clz_s32(i32 %x) nounwind readnone alwaysinline { + %call = call i32 @__clc_clz_impl_s32(i32 %x) + ret i32 %call +} + +declare i32 @__clc_clz_impl_u32(i32 %x) + +define i32 @__clc_clz_u32(i32 %x) nounwind readnone alwaysinline { + %call = call i32 @__clc_clz_impl_u32(i32 %x) + ret i32 %call +} + +declare i64 @__clc_clz_impl_s64(i64 %x) + +define i64 @__clc_clz_s64(i64 %x) nounwind readnone alwaysinline { + %call = call i64 @__clc_clz_impl_s64(i64 %x) + ret i64 %call +} + +declare i64 @__clc_clz_impl_u64(i64 %x) + +define i64 @__clc_clz_u64(i64 %x) nounwind readnone alwaysinline { + %call = call i64 @__clc_clz_impl_u64(i64 %x) + ret i64 %call +} diff --git a/libclc/generic/lib/integer/clz_impl.ll b/libclc/generic/lib/integer/clz_impl.ll new file mode 100644 index 000000000000..b5c3d98ae418 --- /dev/null +++ b/libclc/generic/lib/integer/clz_impl.ll @@ -0,0 +1,44 @@ +declare i8 @llvm.ctlz.i8(i8, i1) +declare i16 @llvm.ctlz.i16(i16, i1) +declare i32 @llvm.ctlz.i32(i32, i1) +declare i64 @llvm.ctlz.i64(i64, i1) + 
+define i8 @__clc_clz_impl_s8(i8 %x) nounwind readnone alwaysinline { + %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) + ret i8 %call +} + +define i8 @__clc_clz_impl_u8(i8 %x) nounwind readnone alwaysinline { + %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) + ret i8 %call +} + +define i16 @__clc_clz_impl_s16(i16 %x) nounwind readnone alwaysinline { + %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) + ret i16 %call +} + +define i16 @__clc_clz_impl_u16(i16 %x) nounwind readnone alwaysinline { + %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) + ret i16 %call +} + +define i32 @__clc_clz_impl_s32(i32 %x) nounwind readnone alwaysinline { + %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) + ret i32 %call +} + +define i32 @__clc_clz_impl_u32(i32 %x) nounwind readnone alwaysinline { + %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) + ret i32 %call +} + +define i64 @__clc_clz_impl_s64(i64 %x) nounwind readnone alwaysinline { + %call = call i64 @llvm.ctlz.i64(i64 %x, i1 0) + ret i64 %call +} + +define i64 @__clc_clz_impl_u64(i64 %x) nounwind readnone alwaysinline { + %call = call i64 @llvm.ctlz.i64(i64 %x, i1 0) + ret i64 %call +} diff --git a/libclc/generic/lib/integer/hadd.cl b/libclc/generic/lib/integer/hadd.cl new file mode 100644 index 000000000000..749026e5a8ad --- /dev/null +++ b/libclc/generic/lib/integer/hadd.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <hadd.inc> +#include <clc/integer/gentype.inc> diff --git a/libclc/generic/lib/integer/hadd.inc b/libclc/generic/lib/integer/hadd.inc new file mode 100644 index 000000000000..ea59d9bd7db5 --- /dev/null +++ b/libclc/generic/lib/integer/hadd.inc @@ -0,0 +1,6 @@ +//hadd = (x+y)>>1 +//This can be simplified to x>>1 + y>>1 + (1 if both x and y have the 1s bit set) +//This saves us having to do any checks for overflow in the addition sum +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE hadd(__CLC_GENTYPE x, __CLC_GENTYPE y) { + return (x>>(__CLC_GENTYPE)1)+(y>>(__CLC_GENTYPE)1)+(x&y&(__CLC_GENTYPE)1); +} diff --git 
a/libclc/generic/lib/integer/mad24.cl b/libclc/generic/lib/integer/mad24.cl new file mode 100644 index 000000000000..e29e99f28b56 --- /dev/null +++ b/libclc/generic/lib/integer/mad24.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <mad24.inc> +#include <clc/integer/integer-gentype.inc> diff --git a/libclc/generic/lib/integer/mad24.inc b/libclc/generic/lib/integer/mad24.inc new file mode 100644 index 000000000000..902b0aafe4c8 --- /dev/null +++ b/libclc/generic/lib/integer/mad24.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mad24(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z){ + return mul24(x, y) + z; +} diff --git a/libclc/generic/lib/integer/mad_sat.cl b/libclc/generic/lib/integer/mad_sat.cl new file mode 100644 index 000000000000..1708b29efffc --- /dev/null +++ b/libclc/generic/lib/integer/mad_sat.cl @@ -0,0 +1,72 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +_CLC_OVERLOAD _CLC_DEF char mad_sat(char x, char y, char z) { + return clamp((short)mad24((short)x, (short)y, (short)z), (short)CHAR_MIN, (short) CHAR_MAX); +} + +_CLC_OVERLOAD _CLC_DEF uchar mad_sat(uchar x, uchar y, uchar z) { + return clamp((ushort)mad24((ushort)x, (ushort)y, (ushort)z), (ushort)0, (ushort) UCHAR_MAX); +} + +_CLC_OVERLOAD _CLC_DEF short mad_sat(short x, short y, short z) { + return clamp((int)mad24((int)x, (int)y, (int)z), (int)SHRT_MIN, (int) SHRT_MAX); +} + +_CLC_OVERLOAD _CLC_DEF ushort mad_sat(ushort x, ushort y, ushort z) { + return clamp((uint)mad24((uint)x, (uint)y, (uint)z), (uint)0, (uint) USHRT_MAX); +} + +_CLC_OVERLOAD _CLC_DEF int mad_sat(int x, int y, int z) { + int mhi = mul_hi(x, y); + uint mlo = x * y; + long m = upsample(mhi, mlo); + m += z; + if (m > INT_MAX) + return INT_MAX; + if (m < INT_MIN) + return INT_MIN; + return m; +} + +_CLC_OVERLOAD _CLC_DEF uint mad_sat(uint x, uint y, uint z) { + if (mul_hi(x, y) != 0) + return UINT_MAX; + return add_sat(x * y, z); +} + +_CLC_OVERLOAD _CLC_DEF long mad_sat(long x, long y, long z) { + 
long hi = mul_hi(x, y); + ulong ulo = x * y; + long slo = x * y; + /* Big overflow of more than 2 bits, add can't fix this */ + if (((x < 0) == (y < 0)) && hi != 0) + return LONG_MAX; + /* Low overflow in mul and z not neg enough to correct it */ + if (hi == 0 && ulo >= LONG_MAX && (z > 0 || (ulo + z) > LONG_MAX)) + return LONG_MAX; + /* Big overflow of more than 2 bits, add can't fix this */ + if (((x < 0) != (y < 0)) && hi != -1) + return LONG_MIN; + /* Low overflow in mul and z not pos enough to correct it */ + if (hi == -1 && ulo <= ((ulong)LONG_MAX + 1UL) && (z < 0 || z < (LONG_MAX - ulo))) + return LONG_MIN; + /* We have checked all conditions, any overflow in addition returns + * the correct value */ + return ulo + z; +} + +_CLC_OVERLOAD _CLC_DEF ulong mad_sat(ulong x, ulong y, ulong z) { + if (mul_hi(x, y) != 0) + return ULONG_MAX; + return add_sat(x * y, z); +} + +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, mad_sat, char, char, char) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, mad_sat, uchar, uchar, uchar) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, mad_sat, short, short, short) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, mad_sat, ushort, ushort, ushort) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, mad_sat, int, int, int) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, mad_sat, uint, uint, uint) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, mad_sat, long, long, long) +_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, mad_sat, ulong, ulong, ulong) diff --git a/libclc/generic/lib/integer/mul24.cl b/libclc/generic/lib/integer/mul24.cl new file mode 100644 index 000000000000..8aedca64b859 --- /dev/null +++ b/libclc/generic/lib/integer/mul24.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <mul24.inc> +#include <clc/integer/integer-gentype.inc> diff --git a/libclc/generic/lib/integer/mul24.inc b/libclc/generic/lib/integer/mul24.inc new file mode 100644 index 
000000000000..95a2f1d6f31b --- /dev/null +++ b/libclc/generic/lib/integer/mul24.inc @@ -0,0 +1,11 @@ + +// We need to use shifts here in order to maintain the sign bit for signed +// integers. The compiler should optimize this to (x & 0x00FFFFFF) for +// unsigned integers. +#define CONVERT_TO_24BIT(x) (((x) << 8) >> 8) + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mul24(__CLC_GENTYPE x, __CLC_GENTYPE y){ + return CONVERT_TO_24BIT(x) * CONVERT_TO_24BIT(y); +} + +#undef CONVERT_TO_24BIT diff --git a/libclc/generic/lib/integer/mul_hi.cl b/libclc/generic/lib/integer/mul_hi.cl new file mode 100644 index 000000000000..174d893afb14 --- /dev/null +++ b/libclc/generic/lib/integer/mul_hi.cl @@ -0,0 +1,109 @@ +#include <clc/clc.h> + +//For all types EXCEPT long, which is implemented separately +#define __CLC_MUL_HI_IMPL(BGENTYPE, GENTYPE, GENSIZE) \ + _CLC_OVERLOAD _CLC_DEF GENTYPE mul_hi(GENTYPE x, GENTYPE y){ \ + return (GENTYPE)(((BGENTYPE)x * (BGENTYPE)y) >> GENSIZE); \ + } \ + +//FOIL-based long mul_hi +// +// Summary: Treat mul_hi(long x, long y) as: +// (a+b) * (c+d) where a and c are the high-order parts of x and y respectively +// and b and d are the low-order parts of x and y. +// Thinking back to algebra, we use FOIL to do the work. + +_CLC_OVERLOAD _CLC_DEF long mul_hi(long x, long y){ + long f, o, i; + ulong l; + + //Move the high/low halves of x/y into the lower 32-bits of variables so + //that we can multiply them without worrying about overflow. + long x_hi = x >> 32; + long x_lo = x & UINT_MAX; + long y_hi = y >> 32; + long y_lo = y & UINT_MAX; + + //Multiply all of the components according to FOIL method + f = x_hi * y_hi; + o = x_hi * y_lo; + i = x_lo * y_hi; + l = x_lo * y_lo; + + //Now add the components back together in the following steps: + //F: doesn't need to be modified + //O/I: Need to be added together. + //L: Shift right by 32-bits, then add into the sum of O and I + //Once O/I/L are summed up, then shift the sum by 32-bits and add to F. 
+ // + //We use hadd to give us a bit of extra precision for the intermediate sums + //but as a result, we shift by 31 bits instead of 32 + return (long)(f + (hadd(o, (i + (long)((ulong)l>>32))) >> 31)); +} + +_CLC_OVERLOAD _CLC_DEF ulong mul_hi(ulong x, ulong y){ + ulong f, o, i; + ulong l; + + //Move the high/low halves of x/y into the lower 32-bits of variables so + //that we can multiply them without worrying about overflow. + ulong x_hi = x >> 32; + ulong x_lo = x & UINT_MAX; + ulong y_hi = y >> 32; + ulong y_lo = y & UINT_MAX; + + //Multiply all of the components according to FOIL method + f = x_hi * y_hi; + o = x_hi * y_lo; + i = x_lo * y_hi; + l = x_lo * y_lo; + + //Now add the components back together, taking care to respect the fact that: + //F: doesn't need to be modified + //O/I: Need to be added together. + //L: Shift right by 32-bits, then add into the sum of O and I + //Once O/I/L are summed up, then shift the sum by 32-bits and add to F. + // + //We use hadd to give us a bit of extra precision for the intermediate sums + //but as a result, we shift by 31 bits instead of 32 + return (f + (hadd(o, (i + (l>>32))) >> 31)); +} + +#define __CLC_MUL_HI_VEC(GENTYPE) \ + _CLC_OVERLOAD _CLC_DEF GENTYPE##2 mul_hi(GENTYPE##2 x, GENTYPE##2 y){ \ + return (GENTYPE##2){mul_hi(x.s0, y.s0), mul_hi(x.s1, y.s1)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF GENTYPE##3 mul_hi(GENTYPE##3 x, GENTYPE##3 y){ \ + return (GENTYPE##3){mul_hi(x.s0, y.s0), mul_hi(x.s1, y.s1), mul_hi(x.s2, y.s2)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF GENTYPE##4 mul_hi(GENTYPE##4 x, GENTYPE##4 y){ \ + return (GENTYPE##4){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF GENTYPE##8 mul_hi(GENTYPE##8 x, GENTYPE##8 y){ \ + return (GENTYPE##8){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF GENTYPE##16 mul_hi(GENTYPE##16 x, GENTYPE##16 y){ \ + return (GENTYPE##16){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \ + } \ + +#define __CLC_MUL_HI_DEC_IMPL(BTYPE, TYPE, BITS) \ + 
__CLC_MUL_HI_IMPL(BTYPE, TYPE, BITS) \ + __CLC_MUL_HI_VEC(TYPE) + +#define __CLC_MUL_HI_TYPES() \ + __CLC_MUL_HI_DEC_IMPL(short, char, 8) \ + __CLC_MUL_HI_DEC_IMPL(ushort, uchar, 8) \ + __CLC_MUL_HI_DEC_IMPL(int, short, 16) \ + __CLC_MUL_HI_DEC_IMPL(uint, ushort, 16) \ + __CLC_MUL_HI_DEC_IMPL(long, int, 32) \ + __CLC_MUL_HI_DEC_IMPL(ulong, uint, 32) \ + __CLC_MUL_HI_VEC(long) \ + __CLC_MUL_HI_VEC(ulong) + +__CLC_MUL_HI_TYPES() + +#undef __CLC_MUL_HI_TYPES +#undef __CLC_MUL_HI_DEC_IMPL +#undef __CLC_MUL_HI_IMPL +#undef __CLC_MUL_HI_VEC +#undef __CLC_B32 diff --git a/libclc/generic/lib/integer/rhadd.cl b/libclc/generic/lib/integer/rhadd.cl new file mode 100644 index 000000000000..c985870f7c7a --- /dev/null +++ b/libclc/generic/lib/integer/rhadd.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <rhadd.inc> +#include <clc/integer/gentype.inc> diff --git a/libclc/generic/lib/integer/rhadd.inc b/libclc/generic/lib/integer/rhadd.inc new file mode 100644 index 000000000000..3d6076874808 --- /dev/null +++ b/libclc/generic/lib/integer/rhadd.inc @@ -0,0 +1,6 @@ +//rhadd = (x+y+1)>>1 +//This can be simplified to x>>1 + y>>1 + (1 if either x or y have the 1s bit set) +//This saves us having to do any checks for overflow in the addition sums +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE rhadd(__CLC_GENTYPE x, __CLC_GENTYPE y) { + return (x>>(__CLC_GENTYPE)1)+(y>>(__CLC_GENTYPE)1)+((x&(__CLC_GENTYPE)1)|(y&(__CLC_GENTYPE)1)); +} diff --git a/libclc/generic/lib/integer/rotate.cl b/libclc/generic/lib/integer/rotate.cl new file mode 100644 index 000000000000..27ce515c7293 --- /dev/null +++ b/libclc/generic/lib/integer/rotate.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> + +#define __CLC_BODY <rotate.inc> +#include <clc/integer/gentype.inc> diff --git a/libclc/generic/lib/integer/rotate.inc b/libclc/generic/lib/integer/rotate.inc new file mode 100644 index 000000000000..33bb0a85241d --- /dev/null +++ b/libclc/generic/lib/integer/rotate.inc @@ -0,0 +1,42 @@ +/** + * Not necessarily 
optimal... but it produces correct results (at least for int) + * If we're lucky, LLVM will recognize the pattern and produce rotate + * instructions: + * http://llvm.1065342.n5.nabble.com/rotate-td47679.html + * + * Eventually, someone should feel free to implement an llvm-specific version + */ + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE rotate(__CLC_GENTYPE x, __CLC_GENTYPE n){ + //Try to avoid extra work if someone's spinning the value through multiple + //full rotations + n = n % (__CLC_GENTYPE)__CLC_GENSIZE; + +#ifdef __CLC_SCALAR + if (n > 0){ + return (x << n) | (((__CLC_U_GENTYPE)x) >> (__CLC_GENSIZE - n)); + } else if (n == 0){ + return x; + } else { + return ( (((__CLC_U_GENTYPE)x) >> -n) | (x << (__CLC_GENSIZE + n)) ); + } +#else + //XXX: There's a lot of __builtin_astype calls to cast everything to + // unsigned ... This should be improved so that if __CLC_GENTYPE==__CLC_U_GENTYPE, no + // casts are required. + + __CLC_U_GENTYPE x_1 = __builtin_astype(x, __CLC_U_GENTYPE); + + //XXX: Is (__CLC_U_GENTYPE >> S__CLC_GENTYPE) | (__CLC_U_GENTYPE << S__CLC_GENTYPE) legal? + // If so, then combine the amt and shifts into a single set of statements + + __CLC_U_GENTYPE amt; + amt = (n < (__CLC_GENTYPE)0 ? __builtin_astype((__CLC_GENTYPE)0-n, __CLC_U_GENTYPE) : (__CLC_U_GENTYPE)0); + x_1 = (x_1 >> amt) | (x_1 << ((__CLC_U_GENTYPE)__CLC_GENSIZE - amt)); + + amt = (n < (__CLC_GENTYPE)0 ? 
(__CLC_U_GENTYPE)0 : __builtin_astype(n, __CLC_U_GENTYPE)); + x_1 = (x_1 << amt) | (x_1 >> ((__CLC_U_GENTYPE)__CLC_GENSIZE - amt)); + + return __builtin_astype(x_1, __CLC_GENTYPE); +#endif +} diff --git a/libclc/generic/lib/integer/sub_sat.cl b/libclc/generic/lib/integer/sub_sat.cl new file mode 100644 index 000000000000..6b42cc86a74c --- /dev/null +++ b/libclc/generic/lib/integer/sub_sat.cl @@ -0,0 +1,53 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +// From sub_sat.ll +_CLC_DECL char __clc_sub_sat_s8(char, char); +_CLC_DECL uchar __clc_sub_sat_u8(uchar, uchar); +_CLC_DECL short __clc_sub_sat_s16(short, short); +_CLC_DECL ushort __clc_sub_sat_u16(ushort, ushort); +_CLC_DECL int __clc_sub_sat_s32(int, int); +_CLC_DECL uint __clc_sub_sat_u32(uint, uint); +_CLC_DECL long __clc_sub_sat_s64(long, long); +_CLC_DECL ulong __clc_sub_sat_u64(ulong, ulong); + +_CLC_OVERLOAD _CLC_DEF char sub_sat(char x, char y) { + return __clc_sub_sat_s8(x, y); +} + +_CLC_OVERLOAD _CLC_DEF uchar sub_sat(uchar x, uchar y) { + return __clc_sub_sat_u8(x, y); +} + +_CLC_OVERLOAD _CLC_DEF short sub_sat(short x, short y) { + return __clc_sub_sat_s16(x, y); +} + +_CLC_OVERLOAD _CLC_DEF ushort sub_sat(ushort x, ushort y) { + return __clc_sub_sat_u16(x, y); +} + +_CLC_OVERLOAD _CLC_DEF int sub_sat(int x, int y) { + return __clc_sub_sat_s32(x, y); +} + +_CLC_OVERLOAD _CLC_DEF uint sub_sat(uint x, uint y) { + return __clc_sub_sat_u32(x, y); +} + +_CLC_OVERLOAD _CLC_DEF long sub_sat(long x, long y) { + return __clc_sub_sat_s64(x, y); +} + +_CLC_OVERLOAD _CLC_DEF ulong sub_sat(ulong x, ulong y) { + return __clc_sub_sat_u64(x, y); +} + +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, sub_sat, char, char) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, sub_sat, uchar, uchar) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, sub_sat, short, short) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, sub_sat, ushort, ushort) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, 
sub_sat, int, int) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, sub_sat, uint, uint) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, sub_sat, long, long) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, sub_sat, ulong, ulong) diff --git a/libclc/generic/lib/integer/sub_sat_if.ll b/libclc/generic/lib/integer/sub_sat_if.ll new file mode 100644 index 000000000000..7252574b5b8e --- /dev/null +++ b/libclc/generic/lib/integer/sub_sat_if.ll @@ -0,0 +1,55 @@ +declare i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) + +define i8 @__clc_sub_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) + ret i8 %call +} + +declare i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) + +define i8 @__clc_sub_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) + ret i8 %call +} + +declare i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) + +define i16 @__clc_sub_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) + ret i16 %call +} + +declare i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) + +define i16 @__clc_sub_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) + ret i16 %call +} + +declare i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) + +define i32 @__clc_sub_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) + ret i32 %call +} + +declare i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) + +define i32 @__clc_sub_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) + ret i32 %call +} + +declare i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) + +define i64 @__clc_sub_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) + ret i64 %call +} + +declare i64 @__clc_sub_sat_impl_u64(i64 %x, 
i64 %y) + +define i64 @__clc_sub_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y) + ret i64 %call +} diff --git a/libclc/generic/lib/integer/sub_sat_impl.ll b/libclc/generic/lib/integer/sub_sat_impl.ll new file mode 100644 index 000000000000..e82b632f43b4 --- /dev/null +++ b/libclc/generic/lib/integer/sub_sat_impl.ll @@ -0,0 +1,83 @@ +declare {i8, i1} @llvm.ssub.with.overflow.i8(i8, i8) +declare {i8, i1} @llvm.usub.with.overflow.i8(i8, i8) + +define i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call {i8, i1} @llvm.ssub.with.overflow.i8(i8 %x, i8 %y) + %res = extractvalue {i8, i1} %call, 0 + %over = extractvalue {i8, i1} %call, 1 + %x.msb = ashr i8 %x, 7 + %x.limit = xor i8 %x.msb, 127 + %sat = select i1 %over, i8 %x.limit, i8 %res + ret i8 %sat +} + +define i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call {i8, i1} @llvm.usub.with.overflow.i8(i8 %x, i8 %y) + %res = extractvalue {i8, i1} %call, 0 + %over = extractvalue {i8, i1} %call, 1 + %sat = select i1 %over, i8 0, i8 %res + ret i8 %sat +} + +declare {i16, i1} @llvm.ssub.with.overflow.i16(i16, i16) +declare {i16, i1} @llvm.usub.with.overflow.i16(i16, i16) + +define i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call {i16, i1} @llvm.ssub.with.overflow.i16(i16 %x, i16 %y) + %res = extractvalue {i16, i1} %call, 0 + %over = extractvalue {i16, i1} %call, 1 + %x.msb = ashr i16 %x, 15 + %x.limit = xor i16 %x.msb, 32767 + %sat = select i1 %over, i16 %x.limit, i16 %res + ret i16 %sat +} + +define i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call {i16, i1} @llvm.usub.with.overflow.i16(i16 %x, i16 %y) + %res = extractvalue {i16, i1} %call, 0 + %over = extractvalue {i16, i1} %call, 1 + %sat = select i1 %over, i16 0, i16 %res + ret i16 %sat +} + +declare {i32, i1} @llvm.ssub.with.overflow.i32(i32, i32) 
+declare {i32, i1} @llvm.usub.with.overflow.i32(i32, i32) + +define i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %x, i32 %y) + %res = extractvalue {i32, i1} %call, 0 + %over = extractvalue {i32, i1} %call, 1 + %x.msb = ashr i32 %x, 31 + %x.limit = xor i32 %x.msb, 2147483647 + %sat = select i1 %over, i32 %x.limit, i32 %res + ret i32 %sat +} + +define i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %x, i32 %y) + %res = extractvalue {i32, i1} %call, 0 + %over = extractvalue {i32, i1} %call, 1 + %sat = select i1 %over, i32 0, i32 %res + ret i32 %sat +} + +declare {i64, i1} @llvm.ssub.with.overflow.i64(i64, i64) +declare {i64, i1} @llvm.usub.with.overflow.i64(i64, i64) + +define i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call {i64, i1} @llvm.ssub.with.overflow.i64(i64 %x, i64 %y) + %res = extractvalue {i64, i1} %call, 0 + %over = extractvalue {i64, i1} %call, 1 + %x.msb = ashr i64 %x, 63 + %x.limit = xor i64 %x.msb, 9223372036854775807 + %sat = select i1 %over, i64 %x.limit, i64 %res + ret i64 %sat +} + +define i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call {i64, i1} @llvm.usub.with.overflow.i64(i64 %x, i64 %y) + %res = extractvalue {i64, i1} %call, 0 + %over = extractvalue {i64, i1} %call, 1 + %sat = select i1 %over, i64 0, i64 %res + ret i64 %sat +} diff --git a/libclc/generic/lib/integer/upsample.cl b/libclc/generic/lib/integer/upsample.cl new file mode 100644 index 000000000000..da77315f8f93 --- /dev/null +++ b/libclc/generic/lib/integer/upsample.cl @@ -0,0 +1,34 @@ +#include <clc/clc.h> + +#define __CLC_UPSAMPLE_IMPL(BGENTYPE, GENTYPE, UGENTYPE, GENSIZE) \ + _CLC_OVERLOAD _CLC_DEF BGENTYPE upsample(GENTYPE hi, UGENTYPE lo){ \ + return ((BGENTYPE)hi << GENSIZE) | lo; \ + } \ + _CLC_OVERLOAD _CLC_DEF 
BGENTYPE##2 upsample(GENTYPE##2 hi, UGENTYPE##2 lo){ \ + return (BGENTYPE##2){upsample(hi.s0, lo.s0), upsample(hi.s1, lo.s1)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF BGENTYPE##3 upsample(GENTYPE##3 hi, UGENTYPE##3 lo){ \ + return (BGENTYPE##3){upsample(hi.s0, lo.s0), upsample(hi.s1, lo.s1), upsample(hi.s2, lo.s2)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF BGENTYPE##4 upsample(GENTYPE##4 hi, UGENTYPE##4 lo){ \ + return (BGENTYPE##4){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF BGENTYPE##8 upsample(GENTYPE##8 hi, UGENTYPE##8 lo){ \ + return (BGENTYPE##8){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \ + } \ + _CLC_OVERLOAD _CLC_DEF BGENTYPE##16 upsample(GENTYPE##16 hi, UGENTYPE##16 lo){ \ + return (BGENTYPE##16){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \ + } \ + +#define __CLC_UPSAMPLE_TYPES() \ + __CLC_UPSAMPLE_IMPL(short, char, uchar, 8) \ + __CLC_UPSAMPLE_IMPL(ushort, uchar, uchar, 8) \ + __CLC_UPSAMPLE_IMPL(int, short, ushort, 16) \ + __CLC_UPSAMPLE_IMPL(uint, ushort, ushort, 16) \ + __CLC_UPSAMPLE_IMPL(long, int, uint, 32) \ + __CLC_UPSAMPLE_IMPL(ulong, uint, uint, 32) \ + +__CLC_UPSAMPLE_TYPES() + +#undef __CLC_UPSAMPLE_TYPES +#undef __CLC_UPSAMPLE_IMPL diff --git a/libclc/generic/lib/math/acos.cl b/libclc/generic/lib/math/acos.cl new file mode 100644 index 000000000000..3ce96554fef3 --- /dev/null +++ b/libclc/generic/lib/math/acos.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <acos.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/acos.inc b/libclc/generic/lib/math/acos.inc new file mode 100644 index 000000000000..8612415f37bd --- /dev/null +++ b/libclc/generic/lib/math/acos.inc @@ -0,0 +1,21 @@ +/* + * There are multiple formulas for calculating arccosine of x: + * 1) acos(x) = (1/2*pi) + i * ln(i*x + sqrt(1-x^2)) (notice the 'i'...) 
+ * 2) acos(x) = pi/2 + asin(-x) (asin isn't implemented yet) + * 3) acos(x) = pi/2 - asin(x) (ditto) + * 4) acos(x) = 2*atan2(sqrt(1-x), sqrt(1+x)) + * 5) acos(x) = pi/2 - atan2(x, ( sqrt(1-x^2) ) ) + * + * Options 1-3 are not currently usable, #5 generates more concise radeonsi + * bitcode and assembly than #4 (134 vs 132 instructions on radeonsi), but + * precision of #4 may be better. + */ + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE acos(__CLC_GENTYPE x) { + return ( + (__CLC_GENTYPE) 2.0 * atan2( + sqrt((__CLC_GENTYPE) 1.0 - x), + sqrt((__CLC_GENTYPE) 1.0 + x) + ) + ); +} diff --git a/libclc/generic/lib/math/asin.cl b/libclc/generic/lib/math/asin.cl new file mode 100644 index 000000000000..d56dbd780a7b --- /dev/null +++ b/libclc/generic/lib/math/asin.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <asin.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/asin.inc b/libclc/generic/lib/math/asin.inc new file mode 100644 index 000000000000..a109c367fc79 --- /dev/null +++ b/libclc/generic/lib/math/asin.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE asin(__CLC_GENTYPE x) { + return atan2(x, sqrt( (__CLC_GENTYPE)1.0 -(x*x) )); +}
\ No newline at end of file diff --git a/libclc/generic/lib/math/atan.cl b/libclc/generic/lib/math/atan.cl new file mode 100644 index 000000000000..fa3633cef748 --- /dev/null +++ b/libclc/generic/lib/math/atan.cl @@ -0,0 +1,183 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "math.h" +#include "../clcmacro.h" + +#include <clc/clc.h> + +_CLC_OVERLOAD _CLC_DEF float atan(float x) +{ + const float piby2 = 1.5707963267948966f; // 0x3ff921fb54442d18 + + uint ux = as_uint(x); + uint aux = ux & EXSIGNBIT_SP32; + uint sx = ux ^ aux; + + float spiby2 = as_float(sx | as_uint(piby2)); + + float v = as_float(aux); + + // Return for NaN + float ret = x; + + // 2^26 <= |x| <= Inf => atan(x) is close to piby2 + ret = aux <= PINFBITPATT_SP32 ? 
spiby2 : ret; + + // Reduce arguments 2^-19 <= |x| < 2^26 + + // 39/16 <= x < 2^26 + x = -MATH_RECIP(v); + float c = 1.57079632679489655800f; // atan(infinity) + + // 19/16 <= x < 39/16 + int l = aux < 0x401c0000; + float xx = MATH_DIVIDE(v - 1.5f, mad(v, 1.5f, 1.0f)); + x = l ? xx : x; + c = l ? 9.82793723247329054082e-01f : c; // atan(1.5) + + // 11/16 <= x < 19/16 + l = aux < 0x3f980000U; + xx = MATH_DIVIDE(v - 1.0f, 1.0f + v); + x = l ? xx : x; + c = l ? 7.85398163397448278999e-01f : c; // atan(1) + + // 7/16 <= x < 11/16 + l = aux < 0x3f300000; + xx = MATH_DIVIDE(mad(v, 2.0f, -1.0f), 2.0f + v); + x = l ? xx : x; + c = l ? 4.63647609000806093515e-01f : c; // atan(0.5) + + // 2^-19 <= x < 7/16 + l = aux < 0x3ee00000; + x = l ? v : x; + c = l ? 0.0f : c; + + // Core approximation: Remez(2,2) on [-7/16,7/16] + + float s = x * x; + float a = mad(s, + mad(s, 0.470677934286149214138357545549e-2f, 0.192324546402108583211697690500f), + 0.296528598819239217902158651186f); + + float b = mad(s, + mad(s, 0.299309699959659728404442796915f, 0.111072499995399550138837673349e1f), + 0.889585796862432286486651434570f); + + float q = x * s * MATH_DIVIDE(a, b); + + float z = c - (q - x); + float zs = as_float(sx | as_uint(z)); + + ret = aux < 0x4c800000 ? zs : ret; + + // |x| < 2^-19 + ret = aux < 0x36000000 ? as_float(ux) : ret; + return ret; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, atan, float); + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + + +_CLC_OVERLOAD _CLC_DEF double atan(double x) +{ + const double piby2 = 1.5707963267948966e+00; // 0x3ff921fb54442d18 + + double v = fabs(x); + + // 2^56 > v > 39/16 + double a = -1.0; + double b = v; + // (chi + clo) = arctan(infinity) + double chi = 1.57079632679489655800e+00; + double clo = 6.12323399573676480327e-17; + + double ta = v - 1.5; + double tb = 1.0 + 1.5 * v; + int l = v <= 0x1.38p+1; // 39/16 > v > 19/16 + a = l ? ta : a; + b = l ? tb : b; + // (chi + clo) = arctan(1.5) + chi = l ? 
9.82793723247329054082e-01 : chi; + clo = l ? 1.39033110312309953701e-17 : clo; + + ta = v - 1.0; + tb = 1.0 + v; + l = v <= 0x1.3p+0; // 19/16 > v > 11/16 + a = l ? ta : a; + b = l ? tb : b; + // (chi + clo) = arctan(1.) + chi = l ? 7.85398163397448278999e-01 : chi; + clo = l ? 3.06161699786838240164e-17 : clo; + + ta = 2.0 * v - 1.0; + tb = 2.0 + v; + l = v <= 0x1.6p-1; // 11/16 > v > 7/16 + a = l ? ta : a; + b = l ? tb : b; + // (chi + clo) = arctan(0.5) + chi = l ? 4.63647609000806093515e-01 : chi; + clo = l ? 2.26987774529616809294e-17 : clo; + + l = v <= 0x1.cp-2; // v < 7/16 + a = l ? v : a; + b = l ? 1.0 : b;; + chi = l ? 0.0 : chi; + clo = l ? 0.0 : clo; + + // Core approximation: Remez(4,4) on [-7/16,7/16] + double r = a / b; + double s = r * r; + double qn = fma(s, + fma(s, + fma(s, + fma(s, 0.142316903342317766e-3, + 0.304455919504853031e-1), + 0.220638780716667420e0), + 0.447677206805497472e0), + 0.268297920532545909e0); + + double qd = fma(s, + fma(s, + fma(s, + fma(s, 0.389525873944742195e-1, + 0.424602594203847109e0), + 0.141254259931958921e1), + 0.182596787737507063e1), + 0.804893761597637733e0); + + double q = r * s * qn / qd; + r = chi - ((q - clo) - r); + + double z = isnan(x) ? x : piby2; + z = v <= 0x1.0p+56 ? r : z; + z = v < 0x1.0p-26 ? v : z; + return x == v ? z : -z; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, atan, double); + +#endif // cl_khr_fp64 diff --git a/libclc/generic/lib/math/atan2.cl b/libclc/generic/lib/math/atan2.cl new file mode 100644 index 000000000000..9e5fb587d422 --- /dev/null +++ b/libclc/generic/lib/math/atan2.cl @@ -0,0 +1,81 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "math.h" +#include "../clcmacro.h" + +#include <clc/clc.h> + +_CLC_OVERLOAD _CLC_DEF float atan2(float y, float x) +{ + const float pi = 0x1.921fb6p+1f; + const float piby2 = 0x1.921fb6p+0f; + const float piby4 = 0x1.921fb6p-1f; + const float threepiby4 = 0x1.2d97c8p+1f; + + float ax = fabs(x); + float ay = fabs(y); + float v = min(ax, ay); + float u = max(ax, ay); + + // Scale since u could be large, as in "regular" divide + float s = u > 0x1.0p+96f ? 
0x1.0p-32f : 1.0f; + float vbyu = s * MATH_DIVIDE(v, s*u); + + float vbyu2 = vbyu * vbyu; + +#define USE_2_2_APPROXIMATION +#if defined USE_2_2_APPROXIMATION + float p = mad(vbyu2, mad(vbyu2, -0x1.7e1f78p-9f, -0x1.7d1b98p-3f), -0x1.5554d0p-2f) * vbyu2 * vbyu; + float q = mad(vbyu2, mad(vbyu2, 0x1.1a714cp-2f, 0x1.287c56p+0f), 1.0f); +#else + float p = mad(vbyu2, mad(vbyu2, -0x1.55cd22p-5f, -0x1.26cf76p-2f), -0x1.55554ep-2f) * vbyu2 * vbyu; + float q = mad(vbyu2, mad(vbyu2, mad(vbyu2, 0x1.9f1304p-5f, 0x1.2656fap-1f), 0x1.76b4b8p+0f), 1.0f); +#endif + + // Octant 0 result + float a = mad(p, MATH_RECIP(q), vbyu); + + // Fix up 3 other octants + float at = piby2 - a; + a = ay > ax ? at : a; + at = pi - a; + a = x < 0.0F ? at : a; + + // y == 0 => 0 for x >= 0, pi for x < 0 + at = as_int(x) < 0 ? pi : 0.0f; + a = y == 0.0f ? at : a; + + // if (!FINITE_ONLY()) { + // x and y are +- Inf + at = x > 0.0f ? piby4 : threepiby4; + a = ax == INFINITY & ay == INFINITY ? at : a; + + // x or y is NaN + a = isnan(x) | isnan(y) ? 
as_float(QNANBITPATT_SP32) : a; + // } + + // Fixup sign and return + return copysign(a, y); +} + +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, atan2, float, float); diff --git a/libclc/generic/lib/math/binary_impl.inc b/libclc/generic/lib/math/binary_impl.inc new file mode 100644 index 000000000000..c9bf97242672 --- /dev/null +++ b/libclc/generic/lib/math/binary_impl.inc @@ -0,0 +1,22 @@ + +#ifndef __CLC_SCALAR + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, __CLC_GENTYPE y) { + return FUNCTION_IMPL(x, y); +} + +#endif + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, float y) { + __CLC_GENTYPE vec_y = (__CLC_GENTYPE) (y); + return FUNCTION_IMPL(x, vec_y); +} + +#ifdef cl_khr_fp64 + +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, double y) { + __CLC_GENTYPE vec_y = (__CLC_GENTYPE) (y); + return FUNCTION_IMPL(x, vec_y); +} + +#endif diff --git a/libclc/generic/lib/math/clc_nextafter.cl b/libclc/generic/lib/math/clc_nextafter.cl new file mode 100644 index 000000000000..e53837d179fb --- /dev/null +++ b/libclc/generic/lib/math/clc_nextafter.cl @@ -0,0 +1,43 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +// This file provides OpenCL C implementations of nextafter for targets that +// don't support the clang builtin. 
+ +#define FLT_NAN 0.0f/0.0f + +#define NEXTAFTER(FLOAT_TYPE, UINT_TYPE, NAN, ZERO, NEXTAFTER_ZERO) \ +_CLC_OVERLOAD _CLC_DEF FLOAT_TYPE __clc_nextafter(FLOAT_TYPE x, FLOAT_TYPE y) { \ + union { \ + FLOAT_TYPE f; \ + UINT_TYPE i; \ + } next; \ + if (isnan(x) || isnan(y)) { \ + return NAN; \ + } \ + if (x == y) { \ + return y; \ + } \ + next.f = x; \ + if (x < y) { \ + next.i++; \ + } else { \ + if (next.f == ZERO) { \ + next.i = NEXTAFTER_ZERO; \ + } else { \ + next.i--; \ + } \ + } \ + return next.f; \ +} + +NEXTAFTER(float, uint, FLT_NAN, 0.0f, 0x80000001) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, __clc_nextafter, float, float) + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#define DBL_NAN 0.0/0.0 + +NEXTAFTER(double, ulong, DBL_NAN, 0.0, 0x8000000000000001) +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, __clc_nextafter, double, double) +#endif diff --git a/libclc/generic/lib/math/copysign.cl b/libclc/generic/lib/math/copysign.cl new file mode 100644 index 000000000000..4e0c51b09373 --- /dev/null +++ b/libclc/generic/lib/math/copysign.cl @@ -0,0 +1,12 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +_CLC_DEFINE_BINARY_BUILTIN(float, copysign, __builtin_copysignf, float, float) + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_DEFINE_BINARY_BUILTIN(double, copysign, __builtin_copysign, double, double) + +#endif diff --git a/libclc/generic/lib/math/cos.cl b/libclc/generic/lib/math/cos.cl new file mode 100644 index 000000000000..bbd96b42bc12 --- /dev/null +++ b/libclc/generic/lib/math/cos.cl @@ -0,0 +1,67 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include <clc/clc.h> + +#include "math.h" +#include "sincos_helpers.h" +#include "../clcmacro.h" + +_CLC_OVERLOAD _CLC_DEF float cos(float x) +{ + int ix = as_int(x); + int ax = ix & 0x7fffffff; + float dx = as_float(ax); + + float r0, r1; + int regn = argReductionS(&r0, &r1, dx); + + float ss = -sinf_piby4(r0, r1); + float cc = cosf_piby4(r0, r1); + + float c = (regn & 1) != 0 ? ss : cc; + c = as_float(as_int(c) ^ ((regn > 1) << 31)); + + c = ax >= PINFBITPATT_SP32 ? 
as_float(QNANBITPATT_SP32) : c; + + return c; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, cos, float); + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +#define __CLC_FUNCTION __clc_cos_intrinsic +#define __CLC_INTRINSIC "llvm.cos" +#include <clc/math/unary_intrin.inc> +#undef __CLC_FUNCTION +#undef __CLC_INTRINSIC + +_CLC_OVERLOAD _CLC_DEF double cos(double x) { + return __clc_cos_intrinsic(x); +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, cos, double); + +#endif diff --git a/libclc/generic/lib/math/exp.cl b/libclc/generic/lib/math/exp.cl new file mode 100644 index 000000000000..dbf4a930b01d --- /dev/null +++ b/libclc/generic/lib/math/exp.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <exp.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/exp.inc b/libclc/generic/lib/math/exp.inc new file mode 100644 index 000000000000..525fb59c9967 --- /dev/null +++ b/libclc/generic/lib/math/exp.inc @@ -0,0 +1,10 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE exp(__CLC_GENTYPE val) { + // exp(x) = exp2(x * log2(e)) +#if __CLC_FPSIZE == 32 + return exp2(val * M_LOG2E_F); +#elif __CLC_FPSIZE == 64 + return exp2(val * M_LOG2E); +#else +#error unknown _CLC_FPSIZE +#endif +} diff --git a/libclc/generic/lib/math/exp10.cl b/libclc/generic/lib/math/exp10.cl new file mode 100644 index 000000000000..c8039cb8dedc --- /dev/null +++ b/libclc/generic/lib/math/exp10.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <exp10.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/exp10.inc b/libclc/generic/lib/math/exp10.inc new file mode 100644 index 000000000000..a592c1948799 --- /dev/null +++ b/libclc/generic/lib/math/exp10.inc @@ -0,0 +1,10 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE exp10(__CLC_GENTYPE val) { + // exp10(x) = 
exp2(x * log2(10)) +#if __CLC_FPSIZE == 32 + return exp2(val * log2(10.0f)); +#elif __CLC_FPSIZE == 64 + return exp2(val * log2(10.0)); +#else +#error unknown _CLC_FPSIZE +#endif +} diff --git a/libclc/generic/lib/math/fmax.cl b/libclc/generic/lib/math/fmax.cl new file mode 100644 index 000000000000..58583d6767aa --- /dev/null +++ b/libclc/generic/lib/math/fmax.cl @@ -0,0 +1,11 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define FUNCTION __clc_fmax +#define FUNCTION_IMPL(x, y) ((x) < (y) ? (y) : (x)) + +#define __CLC_BODY <binary_impl.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/fmin.cl b/libclc/generic/lib/math/fmin.cl new file mode 100644 index 000000000000..a61ad4757289 --- /dev/null +++ b/libclc/generic/lib/math/fmin.cl @@ -0,0 +1,11 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define FUNCTION __clc_fmin +#define FUNCTION_IMPL(x, y) ((y) < (x) ? 
(y) : (x)) + +#define __CLC_BODY <binary_impl.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/fmod.cl b/libclc/generic/lib/math/fmod.cl new file mode 100644 index 000000000000..f9a4e3176137 --- /dev/null +++ b/libclc/generic/lib/math/fmod.cl @@ -0,0 +1,12 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +_CLC_DEFINE_BINARY_BUILTIN(float, fmod, __builtin_fmodf, float, float) + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_DEFINE_BINARY_BUILTIN(double, fmod, __builtin_fmod, double, double) + +#endif diff --git a/libclc/generic/lib/math/hypot.cl b/libclc/generic/lib/math/hypot.cl new file mode 100644 index 000000000000..eca042c91535 --- /dev/null +++ b/libclc/generic/lib/math/hypot.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <hypot.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/hypot.inc b/libclc/generic/lib/math/hypot.inc new file mode 100644 index 000000000000..036cee7e1f06 --- /dev/null +++ b/libclc/generic/lib/math/hypot.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE hypot(__CLC_GENTYPE x, __CLC_GENTYPE y) { + return sqrt(x*x + y*y); +} diff --git a/libclc/generic/lib/math/log1p.cl b/libclc/generic/lib/math/log1p.cl new file mode 100644 index 000000000000..be25c64bf6a4 --- /dev/null +++ b/libclc/generic/lib/math/log1p.cl @@ -0,0 +1,177 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#include <clc/clc.h> + +#include "math.h" +#include "tables.h" +#include "../clcmacro.h" + +_CLC_OVERLOAD _CLC_DEF float log1p(float x) +{ + float w = x; + uint ux = as_uint(x); + uint ax = ux & EXSIGNBIT_SP32; + + // |x| < 2^-4 + float u2 = MATH_DIVIDE(x, 2.0f + x); + float u = u2 + u2; + float v = u * u; + // 2/(5 * 2^5), 2/(3 * 2^3) + float zsmall = mad(-u2, x, mad(v, 0x1.99999ap-7f, 0x1.555556p-4f) * v * u) + x; + + // |x| >= 2^-4 + ux = as_uint(x + 1.0f); + + int m = (int)((ux >> EXPSHIFTBITS_SP32) & 0xff) - EXPBIAS_SP32; + float mf = (float)m; + uint indx = (ux & 0x007f0000) + ((ux & 0x00008000) << 1); + float F = as_float(indx | 0x3f000000); + + // x > 2^24 + float fg24 = F - as_float(0x3f000000 | (ux & MANTBITS_SP32)); + + // x <= 2^24 + uint xhi = ux & 0xffff8000; + float xh = as_float(xhi); + float xt = (1.0f - xh) + w; + uint xnm = ((~(xhi & 0x7f800000)) - 0x00800000) & 0x7f800000; + xt = xt * as_float(xnm) * 0.5f; + float fl24 = F - as_float(0x3f000000 | (xhi & MANTBITS_SP32)) - xt; + + float f = mf > 24.0f ? fg24 : fl24; + + indx = indx >> 16; + float r = f * USE_TABLE(log_inv_tbl, indx); + + // 1/3, 1/2 + float poly = mad(mad(r, 0x1.555556p-2f, 0x1.0p-1f), r*r, r); + + const float LOG2_HEAD = 0x1.62e000p-1f; // 0.693115234 + const float LOG2_TAIL = 0x1.0bfbe8p-15f; // 0.0000319461833 + + float2 tv = USE_TABLE(loge_tbl, indx); + float z1 = mad(mf, LOG2_HEAD, tv.s0); + float z2 = mad(mf, LOG2_TAIL, -poly) + tv.s1; + float z = z1 + z2; + + z = ax < 0x3d800000U ? zsmall : z; + + + + // Edge cases + z = ax >= PINFBITPATT_SP32 ? w : z; + z = w < -1.0f ? as_float(QNANBITPATT_SP32) : z; + z = w == -1.0f ? as_float(NINFBITPATT_SP32) : z; + //fix subnormals + z = ax < 0x33800000 ? x : z; + + return z; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, log1p, float); + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_OVERLOAD _CLC_DEF double log1p(double x) +{ + // Computes natural log(1+x). 
Algorithm based on: + // Ping-Tak Peter Tang + // "Table-driven implementation of the logarithm function in IEEE + // floating-point arithmetic" + // ACM Transactions on Mathematical Software (TOMS) + // Volume 16, Issue 4 (December 1990) + // Note that we use a lookup table of size 64 rather than 128, + // and compensate by having extra terms in the minimax polynomial + // for the kernel approximation. + + // Process Inside the threshold now + ulong ux = as_ulong(1.0 + x); + int xexp = ((as_int2(ux).hi >> 20) & 0x7ff) - EXPBIAS_DP64; + double f = as_double(ONEEXPBITS_DP64 | (ux & MANTBITS_DP64)); + + int j = as_int2(ux).hi >> 13; + j = ((0x80 | (j & 0x7e)) >> 1) + (j & 0x1); + double f1 = (double)j * 0x1.0p-6; + j -= 64; + + double f2temp = f - f1; + double m2 = as_double(convert_ulong(0x3ff - xexp) << EXPSHIFTBITS_DP64); + double f2l = fma(m2, x, m2 - f1); + double f2g = fma(m2, x, -f1) + m2; + double f2 = xexp <= MANTLENGTH_DP64-1 ? f2l : f2g; + f2 = (xexp <= -2) | (xexp >= MANTLENGTH_DP64+8) ? 
f2temp : f2; + + double2 tv = USE_TABLE(ln_tbl, j); + double z1 = tv.s0; + double q = tv.s1; + + double u = MATH_DIVIDE(f2, fma(0.5, f2, f1)); + double v = u * u; + + double poly = v * fma(v, + fma(v, 2.23219810758559851206e-03, 1.24999999978138668903e-02), + 8.33333333333333593622e-02); + + // log2_lead and log2_tail sum to an extra-precise version of log(2) + const double log2_lead = 6.93147122859954833984e-01; /* 0x3fe62e42e0000000 */ + const double log2_tail = 5.76999904754328540596e-08; /* 0x3e6efa39ef35793c */ + + double z2 = q + fma(u, poly, u); + double dxexp = (double)xexp; + double r1 = fma(dxexp, log2_lead, z1); + double r2 = fma(dxexp, log2_tail, z2); + double result1 = r1 + r2; + + // Process Outside the threshold now + double r = x; + u = r / (2.0 + r); + double correction = r * u; + u = u + u; + v = u * u; + r1 = r; + + poly = fma(v, + fma(v, + fma(v, 4.34887777707614552256e-04, 2.23213998791944806202e-03), + 1.25000000037717509602e-02), + 8.33333333333317923934e-02); + + r2 = fma(u*v, poly, -correction); + + // The values exp(-1/16)-1 and exp(1/16)-1 + const double log1p_thresh1 = -0x1.f0540438fd5c3p-5; + const double log1p_thresh2 = 0x1.082b577d34ed8p-4; + double result2 = r1 + r2; + result2 = x < log1p_thresh1 | x > log1p_thresh2 ? result1 : result2; + + result2 = isinf(x) ? x : result2; + result2 = x < -1.0 ? as_double(QNANBITPATT_DP64) : result2; + result2 = x == -1.0 ? 
as_double(NINFBITPATT_DP64) : result2; + return result2; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, log1p, double); + +#endif // cl_khr_fp64 diff --git a/libclc/generic/lib/math/mad.cl b/libclc/generic/lib/math/mad.cl new file mode 100644 index 000000000000..6c7b90d150d5 --- /dev/null +++ b/libclc/generic/lib/math/mad.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <mad.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/mad.inc b/libclc/generic/lib/math/mad.inc new file mode 100644 index 000000000000..d32c7839d1b9 --- /dev/null +++ b/libclc/generic/lib/math/mad.inc @@ -0,0 +1,3 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mad(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c) { + return a * b + c; +} diff --git a/libclc/generic/lib/math/math.h b/libclc/generic/lib/math/math.h new file mode 100644 index 000000000000..f46c7ea7a7d0 --- /dev/null +++ b/libclc/generic/lib/math/math.h @@ -0,0 +1,90 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#define SNAN 0x001 +#define QNAN 0x002 +#define NINF 0x004 +#define NNOR 0x008 +#define NSUB 0x010 +#define NZER 0x020 +#define PZER 0x040 +#define PSUB 0x080 +#define PNOR 0x100 +#define PINF 0x200 + +#define HAVE_HW_FMA32() (1) +#define HAVE_BITALIGN() (0) +#define HAVE_FAST_FMA32() (0) + +#define MATH_DIVIDE(X, Y) ((X) / (Y)) +#define MATH_RECIP(X) (1.0f / (X)) +#define MATH_SQRT(X) sqrt(X) + +#define SIGNBIT_SP32 0x80000000 +#define EXSIGNBIT_SP32 0x7fffffff +#define EXPBITS_SP32 0x7f800000 +#define MANTBITS_SP32 0x007fffff +#define ONEEXPBITS_SP32 0x3f800000 +#define TWOEXPBITS_SP32 0x40000000 +#define HALFEXPBITS_SP32 0x3f000000 +#define IMPBIT_SP32 0x00800000 +#define QNANBITPATT_SP32 0x7fc00000 +#define INDEFBITPATT_SP32 0xffc00000 +#define PINFBITPATT_SP32 0x7f800000 +#define NINFBITPATT_SP32 0xff800000 +#define EXPBIAS_SP32 127 +#define EXPSHIFTBITS_SP32 23 +#define BIASEDEMIN_SP32 1 +#define EMIN_SP32 -126 +#define BIASEDEMAX_SP32 254 +#define EMAX_SP32 127 +#define LAMBDA_SP32 1.0e30 +#define MANTLENGTH_SP32 24 +#define BASEDIGITS_SP32 7 + +#ifdef cl_khr_fp64 + +#define SIGNBIT_DP64 0x8000000000000000L +#define EXSIGNBIT_DP64 0x7fffffffffffffffL +#define EXPBITS_DP64 0x7ff0000000000000L +#define MANTBITS_DP64 0x000fffffffffffffL +#define ONEEXPBITS_DP64 0x3ff0000000000000L +#define TWOEXPBITS_DP64 0x4000000000000000L +#define HALFEXPBITS_DP64 0x3fe0000000000000L +#define IMPBIT_DP64 0x0010000000000000L +#define QNANBITPATT_DP64 0x7ff8000000000000L +#define INDEFBITPATT_DP64 0xfff8000000000000L +#define PINFBITPATT_DP64 0x7ff0000000000000L +#define NINFBITPATT_DP64 0xfff0000000000000L +#define EXPBIAS_DP64 1023 +#define EXPSHIFTBITS_DP64 52 +#define BIASEDEMIN_DP64 1 +#define 
EMIN_DP64 -1022 +#define BIASEDEMAX_DP64 2046 /* 0x7fe */ +#define EMAX_DP64 1023 /* 0x3ff */ +#define LAMBDA_DP64 1.0e300 +#define MANTLENGTH_DP64 53 +#define BASEDIGITS_DP64 15 + +#endif // cl_khr_fp64 + +#define ALIGNED(x) __attribute__((aligned(x))) diff --git a/libclc/generic/lib/math/mix.cl b/libclc/generic/lib/math/mix.cl new file mode 100644 index 000000000000..294f332e67f2 --- /dev/null +++ b/libclc/generic/lib/math/mix.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <mix.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/mix.inc b/libclc/generic/lib/math/mix.inc new file mode 100644 index 000000000000..1e8b936149bb --- /dev/null +++ b/libclc/generic/lib/math/mix.inc @@ -0,0 +1,9 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mix(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE a) { + return mad( y - x, a, x ); +} + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mix(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_SCALAR_GENTYPE a) { + return mix(x, y, (__CLC_GENTYPE)a); +} +#endif diff --git a/libclc/generic/lib/math/nextafter.cl b/libclc/generic/lib/math/nextafter.cl new file mode 100644 index 000000000000..cbe54cd4e266 --- /dev/null +++ b/libclc/generic/lib/math/nextafter.cl @@ -0,0 +1,12 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +_CLC_DEFINE_BINARY_BUILTIN(float, nextafter, __builtin_nextafterf, float, float) + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +_CLC_DEFINE_BINARY_BUILTIN(double, nextafter, __builtin_nextafter, double, double) + +#endif diff --git a/libclc/generic/lib/math/pown.cl b/libclc/generic/lib/math/pown.cl new file mode 100644 index 000000000000..f3b27d4ccab7 --- /dev/null +++ b/libclc/generic/lib/math/pown.cl @@ -0,0 +1,10 @@ +#include <clc/clc.h> +#include "../clcmacro.h" + +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, pown, float, int) + +#ifdef cl_khr_fp64 +#pragma OPENCL 
EXTENSION cl_khr_fp64 : enable + +_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, pown, double, int) +#endif diff --git a/libclc/generic/lib/math/sin.cl b/libclc/generic/lib/math/sin.cl new file mode 100644 index 000000000000..ffc4dd1aa037 --- /dev/null +++ b/libclc/generic/lib/math/sin.cl @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include <clc/clc.h> + +#include "math.h" +#include "sincos_helpers.h" +#include "../clcmacro.h" + +_CLC_OVERLOAD _CLC_DEF float sin(float x) +{ + int ix = as_int(x); + int ax = ix & 0x7fffffff; + float dx = as_float(ax); + + float r0, r1; + int regn = argReductionS(&r0, &r1, dx); + + float ss = sinf_piby4(r0, r1); + float cc = cosf_piby4(r0, r1); + + float s = (regn & 1) != 0 ? cc : ss; + s = as_float(as_int(s) ^ ((regn > 1) << 31) ^ (ix ^ ax)); + + s = ax >= PINFBITPATT_SP32 ? 
as_float(QNANBITPATT_SP32) : s; + + //Subnormals + s = x == 0.0f ? x : s; + + return s; +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, sin, float); + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +#define __CLC_FUNCTION __clc_sin_intrinsic +#define __CLC_INTRINSIC "llvm.sin" +#include <clc/math/unary_intrin.inc> +#undef __CLC_FUNCTION +#undef __CLC_INTRINSIC + +_CLC_OVERLOAD _CLC_DEF double sin(double x) { + return __clc_sin_intrinsic(x); +} + +_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, sin, double); + +#endif diff --git a/libclc/generic/lib/math/sincos.cl b/libclc/generic/lib/math/sincos.cl new file mode 100644 index 000000000000..eace5adcf16f --- /dev/null +++ b/libclc/generic/lib/math/sincos.cl @@ -0,0 +1,8 @@ +#include <clc/clc.h> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <sincos.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/math/sincos.inc b/libclc/generic/lib/math/sincos.inc new file mode 100644 index 000000000000..e97f0f9641c1 --- /dev/null +++ b/libclc/generic/lib/math/sincos.inc @@ -0,0 +1,11 @@ +#define __CLC_DECLARE_SINCOS(ADDRSPACE, TYPE) \ + _CLC_OVERLOAD _CLC_DEF TYPE sincos (TYPE x, ADDRSPACE TYPE * cosval) { \ + *cosval = cos(x); \ + return sin(x); \ + } + +__CLC_DECLARE_SINCOS(global, __CLC_GENTYPE) +__CLC_DECLARE_SINCOS(local, __CLC_GENTYPE) +__CLC_DECLARE_SINCOS(private, __CLC_GENTYPE) + +#undef __CLC_DECLARE_SINCOS diff --git a/libclc/generic/lib/math/sincos_helpers.cl b/libclc/generic/lib/math/sincos_helpers.cl new file mode 100644 index 000000000000..1a5f10c8e651 --- /dev/null +++ b/libclc/generic/lib/math/sincos_helpers.cl @@ -0,0 +1,308 @@ +/* + * Copyright (c) 2014 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include <clc/clc.h> + +#include "math.h" +#include "sincos_helpers.h" + +uint bitalign(uint hi, uint lo, uint shift) +{ + return (hi << (32 - shift)) | (lo >> shift); +} + +float sinf_piby4(float x, float y) +{ + // Taylor series for sin(x) is x - x^3/3! + x^5/5! - x^7/7! ... + // = x * (1 - x^2/3! + x^4/5! - x^6/7! ... + // = x * f(w) + // where w = x*x and f(w) = (1 - w/3! + w^2/5! - w^3/7! ... + // We use a minimax approximation of (f(w) - 1) / w + // because this produces an expansion in even powers of x. 
+ + const float c1 = -0.1666666666e0f; + const float c2 = 0.8333331876e-2f; + const float c3 = -0.198400874e-3f; + const float c4 = 0.272500015e-5f; + const float c5 = -2.5050759689e-08f; // 0xb2d72f34 + const float c6 = 1.5896910177e-10f; // 0x2f2ec9d3 + + float z = x * x; + float v = z * x; + float r = mad(z, mad(z, mad(z, mad(z, c6, c5), c4), c3), c2); + float ret = x - mad(v, -c1, mad(z, mad(y, 0.5f, -v*r), -y)); + + return ret; +} + +float cosf_piby4(float x, float y) +{ + // Taylor series for cos(x) is 1 - x^2/2! + x^4/4! - x^6/6! ... + // = f(w) + // where w = x*x and f(w) = (1 - w/2! + w^2/4! - w^3/6! ... + // We use a minimax approximation of (f(w) - 1 + w/2) / (w*w) + // because this produces an expansion in even powers of x. + + const float c1 = 0.416666666e-1f; + const float c2 = -0.138888876e-2f; + const float c3 = 0.248006008e-4f; + const float c4 = -0.2730101334e-6f; + const float c5 = 2.0875723372e-09f; // 0x310f74f6 + const float c6 = -1.1359647598e-11f; // 0xad47d74e + + float z = x * x; + float r = z * mad(z, mad(z, mad(z, mad(z, mad(z, c6, c5), c4), c3), c2), c1); + + // if |x| < 0.3 + float qx = 0.0f; + + int ix = as_int(x) & EXSIGNBIT_SP32; + + // 0.78125 > |x| >= 0.3 + float xby4 = as_float(ix - 0x01000000); + qx = (ix >= 0x3e99999a) & (ix <= 0x3f480000) ? xby4 : qx; + + // x > 0.78125 + qx = ix > 0x3f480000 ? 
0.28125f : qx;
+
+    float hz = mad(z, 0.5f, -qx);
+    float a = 1.0f - qx;
+    float ret = a - (hz - mad(z, r, -x*y));
+    return ret;
+}
+
+void fullMulS(float *hi, float *lo, float a, float b, float bh, float bt)
+{
+    if (HAVE_HW_FMA32()) {
+        float ph = a * b;
+        *hi = ph;
+        *lo = fma(a, b, -ph);
+    } else {
+        float ah = as_float(as_uint(a) & 0xfffff000U);
+        float at = a - ah;
+        float ph = a * b;
+        float pt = mad(at, bt, mad(at, bh, mad(ah, bt, mad(ah, bh, -ph))));
+        *hi = ph;
+        *lo = pt;
+    }
+}
+
+float removePi2S(float *hi, float *lo, float x)
+{
+    // 72 bits of pi/2
+    const float fpiby2_1 = (float) 0xC90FDA / 0x1.0p+23f;
+    const float fpiby2_1_h = (float) 0xC90 / 0x1.0p+11f;
+    const float fpiby2_1_t = (float) 0xFDA / 0x1.0p+23f;
+
+    const float fpiby2_2 = (float) 0xA22168 / 0x1.0p+47f;
+    const float fpiby2_2_h = (float) 0xA22 / 0x1.0p+35f;
+    const float fpiby2_2_t = (float) 0x168 / 0x1.0p+47f;
+
+    const float fpiby2_3 = (float) 0xC234C4 / 0x1.0p+71f;
+    const float fpiby2_3_h = (float) 0xC23 / 0x1.0p+59f;
+    const float fpiby2_3_t = (float) 0x4C4 / 0x1.0p+71f;
+
+    const float twobypi = 0x1.45f306p-1f;
+
+    float fnpi2 = trunc(mad(x, twobypi, 0.5f));
+
+    // subtract n * pi/2 from x
+    float rhead, rtail;
+    fullMulS(&rhead, &rtail, fnpi2, fpiby2_1, fpiby2_1_h, fpiby2_1_t);
+    float v = x - rhead;
+    float rem = v + (((x - v) - rhead) - rtail);
+
+    float rhead2, rtail2;
+    fullMulS(&rhead2, &rtail2, fnpi2, fpiby2_2, fpiby2_2_h, fpiby2_2_t);
+    v = rem - rhead2;
+    rem = v + (((rem - v) - rhead2) - rtail2);
+
+    float rhead3, rtail3;
+    fullMulS(&rhead3, &rtail3, fnpi2, fpiby2_3, fpiby2_3_h, fpiby2_3_t);
+    v = rem - rhead3;
+
+    *hi = v + ((rem - v) - rhead3);
+    *lo = -rtail3;
+    return fnpi2;
+}
+
+int argReductionSmallS(float *r, float *rr, float x)
+{
+    float fnpi2 = removePi2S(r, rr, x);
+    return (int)fnpi2 & 0x3;
+}
+
+#define FULL_MUL(A, B, HI, LO) \
+    LO = A * B; \
+    HI = mul_hi(A, B)
+
+#define FULL_MAD(A, B, C, HI, LO) \
+    LO = ((A) * (B) + (C)); \
+    HI = mul_hi(A, B); \
+    HI += LO < C
+
+int argReductionLargeS(float *r, float *rr, float x)
+{
+    int xe = (int)(as_uint(x) >> 23) - 127;
+    uint xm = 0x00800000U | (as_uint(x) & 0x7fffffU);
+
+    // 224 bits of 2/PI: . A2F9836E 4E441529 FC2757D1 F534DDC0 DB629599 3C439041 FE5163AB
+    const uint b6 = 0xA2F9836EU;
+    const uint b5 = 0x4E441529U;
+    const uint b4 = 0xFC2757D1U;
+    const uint b3 = 0xF534DDC0U;
+    const uint b2 = 0xDB629599U;
+    const uint b1 = 0x3C439041U;
+    const uint b0 = 0xFE5163ABU;
+
+    uint p0, p1, p2, p3, p4, p5, p6, p7, c0, c1;
+
+    FULL_MUL(xm, b0, c0, p0);
+    FULL_MAD(xm, b1, c0, c1, p1);
+    FULL_MAD(xm, b2, c1, c0, p2);
+    FULL_MAD(xm, b3, c0, c1, p3);
+    FULL_MAD(xm, b4, c1, c0, p4);
+    FULL_MAD(xm, b5, c0, c1, p5);
+    FULL_MAD(xm, b6, c1, p7, p6);
+
+    uint fbits = 224 + 23 - xe;
+
+    // shift amount to get 2 lsb of integer part at top 2 bits
+    // min: 25 (xe=18) max: 134 (xe=127)
+    uint shift = 256U - 2 - fbits;
+
+    // Shift by up to 134/32 = 4 words
+    int c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    p2 = c ? p1 : p2;
+    p1 = c ? p0 : p1;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    p2 = c ? p1 : p2;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    shift -= (-c) & 32;
+
+    // bitalign cannot handle a shift of 32
+    c = shift > 0;
+    shift = 32 - shift;
+    uint t7 = bitalign(p7, p6, shift);
+    uint t6 = bitalign(p6, p5, shift);
+    uint t5 = bitalign(p5, p4, shift);
+    p7 = c ? t7 : p7;
+    p6 = c ? t6 : p6;
+    p5 = c ? t5 : p5;
+
+    // Get 2 lsb of int part and msb of fraction
+    int i = p7 >> 29;
+
+    // Scoot up 2 more bits so only fraction remains
+    p7 = bitalign(p7, p6, 30);
+    p6 = bitalign(p6, p5, 30);
+    p5 = bitalign(p5, p4, 30);
+
+    // Subtract 1 if msb of fraction is 1, i.e. fraction >= 0.5
+    uint flip = i & 1 ? 0xffffffffU : 0U;
+    uint sign = i & 1 ? 0x80000000U : 0U;
+    p7 = p7 ^ flip;
+    p6 = p6 ^ flip;
+    p5 = p5 ^ flip;
+
+    // Find exponent and shift away leading zeroes and hidden bit
+    xe = clz(p7) + 1;
+    shift = 32 - xe;
+    p7 = bitalign(p7, p6, shift);
+    p6 = bitalign(p6, p5, shift);
+
+    // Most significant part of fraction
+    float q1 = as_float(sign | ((127 - xe) << 23) | (p7 >> 9));
+
+    // Shift out bits we captured on q1
+    p7 = bitalign(p7, p6, 32-23);
+
+    // Get 24 more bits of fraction in another float, there are not long strings of zeroes here
+    int xxe = clz(p7) + 1;
+    p7 = bitalign(p7, p6, 32-xxe);
+    float q0 = as_float(sign | ((127 - (xe + 23 + xxe)) << 23) | (p7 >> 9));
+
+    // At this point, the fraction q1 + q0 is correct to at least 48 bits
+    // Now we need to multiply the fraction by pi/2
+    // This loses us about 4 bits
+    // pi/2 = C90 FDA A22 168 C23 4C4
+
+    const float pio2h = (float)0xc90fda / 0x1.0p+23f;
+    const float pio2hh = (float)0xc90 / 0x1.0p+11f;
+    const float pio2ht = (float)0xfda / 0x1.0p+23f;
+    const float pio2t = (float)0xa22168 / 0x1.0p+47f;
+
+    float rh, rt;
+
+    if (HAVE_HW_FMA32()) {
+        rh = q1 * pio2h;
+        rt = fma(q0, pio2h, fma(q1, pio2t, fma(q1, pio2h, -rh)));
+    } else {
+        float q1h = as_float(as_uint(q1) & 0xfffff000);
+        float q1t = q1 - q1h;
+        rh = q1 * pio2h;
+        rt = mad(q1t, pio2ht, mad(q1t, pio2hh, mad(q1h, pio2ht, mad(q1h, pio2hh, -rh))));
+        rt = mad(q0, pio2h, mad(q1, pio2t, rt));
+    }
+
+    float t = rh + rt;
+    rt = rt - (t - rh);
+
+    *r = t;
+    *rr = rt;
+    return ((i >> 1) + (i & 1)) & 0x3;
+}
+
+int argReductionS(float *r, float *rr, float x)
+{
+    if (x < 0x1.0p+23f)
+        return argReductionSmallS(r, rr, x);
+    else
+        return argReductionLargeS(r, rr, x);
+}
+
diff --git a/libclc/generic/lib/math/sincos_helpers.h b/libclc/generic/lib/math/sincos_helpers.h
new file mode 100644
index 000000000000..f89c19f6874c
--- /dev/null
+++ b/libclc/generic/lib/math/sincos_helpers.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+float sinf_piby4(float x, float y);
+float cosf_piby4(float x, float y);
+int argReductionS(float *r, float *rr, float x);
diff --git a/libclc/generic/lib/math/tables.cl b/libclc/generic/lib/math/tables.cl
new file mode 100644
index 000000000000..b5345a2cff1b
--- /dev/null
+++ b/libclc/generic/lib/math/tables.cl
@@ -0,0 +1,366 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "tables.h"
+
+DECLARE_TABLE(float2, LOGE_TBL, 129) = {
+    (float2)(0x0.000000p+0f, 0x0.000000p+0f),
+    (float2)(0x1.fe0000p-8f, 0x1.535882p-23f),
+    (float2)(0x1.fc0000p-7f, 0x1.5161f8p-20f),
+    (float2)(0x1.7b8000p-6f, 0x1.1b07d4p-18f),
+    (float2)(0x1.f82000p-6f, 0x1.361cf0p-19f),
+    (float2)(0x1.39e000p-5f, 0x1.0f73fcp-18f),
+    (float2)(0x1.774000p-5f, 0x1.63d8cap-19f),
+    (float2)(0x1.b42000p-5f, 0x1.bae232p-18f),
+    (float2)(0x1.f0a000p-5f, 0x1.86008ap-20f),
+    (float2)(0x1.164000p-4f, 0x1.36eea2p-16f),
+    (float2)(0x1.340000p-4f, 0x1.d7961ap-16f),
+    (float2)(0x1.51a000p-4f, 0x1.073f06p-16f),
+    (float2)(0x1.6f0000p-4f, 0x1.a515cap-17f),
+    (float2)(0x1.8c2000p-4f, 0x1.45d630p-16f),
+    (float2)(0x1.a92000p-4f, 0x1.b4e92ap-18f),
+    (float2)(0x1.c5e000p-4f, 0x1.523d6ep-18f),
+    (float2)(0x1.e26000p-4f, 0x1.076e2ap-16f),
+    (float2)(0x1.fec000p-4f, 0x1.2263b6p-17f),
+    (float2)(0x1.0d6000p-3f, 0x1.7e7cd0p-15f),
+    (float2)(0x1.1b6000p-3f, 0x1.2ad52ep-15f),
+    (float2)(0x1.294000p-3f, 0x1.52f81ep-15f),
+    (float2)(0x1.370000p-3f, 0x1.fc201ep-15f),
+    (float2)(0x1.44c000p-3f, 0x1.2b6ccap-15f),
+    (float2)(0x1.526000p-3f, 0x1.cbc742p-16f),
+    (float2)(0x1.5fe000p-3f, 0x1.3070a6p-15f),
+    (float2)(0x1.6d6000p-3f, 0x1.fce33ap-20f),
+    (float2)(0x1.7aa000p-3f, 0x1.890210p-15f),
+    (float2)(0x1.87e000p-3f, 0x1.a06520p-15f),
+    (float2)(0x1.952000p-3f, 0x1.6a73d0p-17f),
+    (float2)(0x1.a22000p-3f, 0x1.bc1fe2p-15f),
+    (float2)(0x1.af2000p-3f, 0x1.c94e80p-15f),
+    (float2)(0x1.bc2000p-3f, 0x1.0ce85ap-16f),
+    (float2)(0x1.c8e000p-3f, 0x1.f7c79ap-15f),
+    (float2)(0x1.d5c000p-3f, 0x1.0b5a7cp-18f),
+    (float2)(0x1.e26000p-3f, 0x1.076e2ap-15f),
+    (float2)(0x1.ef0000p-3f, 0x1.5b97b8p-16f),
+    (float2)(0x1.fb8000p-3f, 0x1.186d5ep-15f),
+    (float2)(0x1.040000p-2f, 0x1.2ca5a6p-17f),
+    (float2)(0x1.0a2000p-2f, 0x1.24e272p-14f),
+    (float2)(0x1.104000p-2f, 0x1.8bf9aep-14f),
+    (float2)(0x1.166000p-2f, 0x1.5cabaap-14f),
+    (float2)(0x1.1c8000p-2f, 0x1.3182d2p-15f),
+    (float2)(0x1.228000p-2f, 0x1.41fbcep-14f),
+    (float2)(0x1.288000p-2f, 0x1.5a13dep-14f),
+    (float2)(0x1.2e8000p-2f, 0x1.c575c2p-15f),
+    (float2)(0x1.346000p-2f, 0x1.dd9a98p-14f),
+    (float2)(0x1.3a6000p-2f, 0x1.3155a4p-16f),
+    (float2)(0x1.404000p-2f, 0x1.843434p-17f),
+    (float2)(0x1.460000p-2f, 0x1.8bc21cp-14f),
+    (float2)(0x1.4be000p-2f, 0x1.7e55dcp-16f),
+    (float2)(0x1.51a000p-2f, 0x1.5b0e5ap-15f),
+    (float2)(0x1.576000p-2f, 0x1.dc5d14p-16f),
+    (float2)(0x1.5d0000p-2f, 0x1.bdbf58p-14f),
+    (float2)(0x1.62c000p-2f, 0x1.05e572p-15f),
+    (float2)(0x1.686000p-2f, 0x1.903d36p-15f),
+    (float2)(0x1.6e0000p-2f, 0x1.1d5456p-15f),
+    (float2)(0x1.738000p-2f, 0x1.d7f6bap-14f),
+    (float2)(0x1.792000p-2f, 0x1.4abfbap-15f),
+    (float2)(0x1.7ea000p-2f, 0x1.f07704p-15f),
+    (float2)(0x1.842000p-2f, 0x1.a3b43cp-15f),
+    (float2)(0x1.89a000p-2f, 0x1.9c360ap-17f),
+    (float2)(0x1.8f0000p-2f, 0x1.1e8736p-14f),
+    (float2)(0x1.946000p-2f, 0x1.941c20p-14f),
+    (float2)(0x1.99c000p-2f, 0x1.958116p-14f),
+    (float2)(0x1.9f2000p-2f, 0x1.23ecbep-14f),
+    (float2)(0x1.a48000p-2f, 0x1.024396p-16f),
+    (float2)(0x1.a9c000p-2f, 0x1.d93534p-15f),
+    (float2)(0x1.af0000p-2f, 0x1.293246p-14f),
+    (float2)(0x1.b44000p-2f, 0x1.eef798p-15f),
+    (float2)(0x1.b98000p-2f, 0x1.625a4cp-16f),
+    (float2)(0x1.bea000p-2f, 0x1.4d9da6p-14f),
+    (float2)(0x1.c3c000p-2f, 0x1.d7a7ccp-14f),
+    (float2)(0x1.c8e000p-2f, 0x1.f7c79ap-14f),
+    (float2)(0x1.ce0000p-2f, 0x1.af0b84p-14f),
+    (float2)(0x1.d32000p-2f, 0x1.fcfc00p-15f),
+    (float2)(0x1.d82000p-2f, 0x1.e7258ap-14f),
+    (float2)(0x1.dd4000p-2f, 0x1.a81306p-16f),
+    (float2)(0x1.e24000p-2f, 0x1.1034f8p-15f),
+    (float2)(0x1.e74000p-2f, 0x1.09875ap-16f),
+    (float2)(0x1.ec2000p-2f, 0x1.99d246p-14f),
+    (float2)(0x1.f12000p-2f, 0x1.1ebf5ep-15f),
+    (float2)(0x1.f60000p-2f, 0x1.23fa70p-14f),
+    (float2)(0x1.fae000p-2f, 0x1.588f78p-14f),
+    (float2)(0x1.ffc000p-2f, 0x1.2e0856p-14f),
+    (float2)(0x1.024000p-1f, 0x1.52a5a4p-13f),
+    (float2)(0x1.04a000p-1f, 0x1.df9da8p-13f),
+    (float2)(0x1.072000p-1f, 0x1.f2e0e6p-16f),
+    (float2)(0x1.098000p-1f, 0x1.bd3d5cp-15f),
+    (float2)(0x1.0be000p-1f, 0x1.cb9094p-15f),
+    (float2)(0x1.0e4000p-1f, 0x1.261746p-15f),
+    (float2)(0x1.108000p-1f, 0x1.f39e2cp-13f),
+    (float2)(0x1.12e000p-1f, 0x1.719592p-13f),
+    (float2)(0x1.154000p-1f, 0x1.87a5e8p-14f),
+    (float2)(0x1.178000p-1f, 0x1.eabbd8p-13f),
+    (float2)(0x1.19e000p-1f, 0x1.cd68cep-14f),
+    (float2)(0x1.1c2000p-1f, 0x1.b81f70p-13f),
+    (float2)(0x1.1e8000p-1f, 0x1.7d79c0p-15f),
+    (float2)(0x1.20c000p-1f, 0x1.b9a324p-14f),
+    (float2)(0x1.230000p-1f, 0x1.30d7bep-13f),
+    (float2)(0x1.254000p-1f, 0x1.5bce98p-13f),
+    (float2)(0x1.278000p-1f, 0x1.5e1288p-13f),
+    (float2)(0x1.29c000p-1f, 0x1.37fec2p-13f),
+    (float2)(0x1.2c0000p-1f, 0x1.d3da88p-14f),
+    (float2)(0x1.2e4000p-1f, 0x1.d0db90p-15f),
+    (float2)(0x1.306000p-1f, 0x1.d7334ep-13f),
+    (float2)(0x1.32a000p-1f, 0x1.133912p-13f),
+    (float2)(0x1.34e000p-1f, 0x1.44ece6p-16f),
+    (float2)(0x1.370000p-1f, 0x1.17b546p-13f),
+    (float2)(0x1.392000p-1f, 0x1.e0d356p-13f),
+    (float2)(0x1.3b6000p-1f, 0x1.0893fep-14f),
+    (float2)(0x1.3d8000p-1f, 0x1.026a70p-13f),
+    (float2)(0x1.3fa000p-1f, 0x1.5b84d0p-13f),
+    (float2)(0x1.41c000p-1f, 0x1.8fe846p-13f),
+    (float2)(0x1.43e000p-1f, 0x1.9fe2f8p-13f),
+    (float2)(0x1.460000p-1f, 0x1.8bc21cp-13f),
+    (float2)(0x1.482000p-1f, 0x1.53d1eap-13f),
+    (float2)(0x1.4a4000p-1f, 0x1.f0bb60p-14f),
+    (float2)(0x1.4c6000p-1f, 0x1.e6bf32p-15f),
+    (float2)(0x1.4e6000p-1f, 0x1.d811b6p-13f),
+    (float2)(0x1.508000p-1f, 0x1.13cc00p-13f),
+    (float2)(0x1.52a000p-1f, 0x1.6932dep-16f),
+    (float2)(0x1.54a000p-1f, 0x1.246798p-13f),
+    (float2)(0x1.56a000p-1f, 0x1.f9d5b2p-13f),
+    (float2)(0x1.58c000p-1f, 0x1.5b6b9ap-14f),
+    (float2)(0x1.5ac000p-1f, 0x1.404c34p-13f),
+    (float2)(0x1.5cc000p-1f, 0x1.b1dc6cp-13f),
+    (float2)(0x1.5ee000p-1f, 0x1.54920ap-20f),
+    (float2)(0x1.60e000p-1f, 0x1.97a23cp-16f),
+    (float2)(0x1.62e000p-1f, 0x1.0bfbe8p-15f),
+};
+
+DECLARE_TABLE(float, LOG_INV_TBL, 129) = {
+    0x1.000000p+1f,
+    0x1.fc07f0p+0f,
+    0x1.f81f82p+0f,
+    0x1.f4465ap+0f,
+    0x1.f07c20p+0f,
+    0x1.ecc07cp+0f,
+    0x1.e9131ap+0f,
+    0x1.e573acp+0f,
+    0x1.e1e1e2p+0f,
+    0x1.de5d6ep+0f,
+    0x1.dae608p+0f,
+    0x1.d77b66p+0f,
+    0x1.d41d42p+0f,
+    0x1.d0cb58p+0f,
+    0x1.cd8568p+0f,
+    0x1.ca4b30p+0f,
+    0x1.c71c72p+0f,
+    0x1.c3f8f0p+0f,
+    0x1.c0e070p+0f,
+    0x1.bdd2b8p+0f,
+    0x1.bacf92p+0f,
+    0x1.b7d6c4p+0f,
+    0x1.b4e81cp+0f,
+    0x1.b20364p+0f,
+    0x1.af286cp+0f,
+    0x1.ac5702p+0f,
+    0x1.a98ef6p+0f,
+    0x1.a6d01ap+0f,
+    0x1.a41a42p+0f,
+    0x1.a16d40p+0f,
+    0x1.9ec8eap+0f,
+    0x1.9c2d14p+0f,
+    0x1.99999ap+0f,
+    0x1.970e50p+0f,
+    0x1.948b10p+0f,
+    0x1.920fb4p+0f,
+    0x1.8f9c18p+0f,
+    0x1.8d3018p+0f,
+    0x1.8acb90p+0f,
+    0x1.886e60p+0f,
+    0x1.861862p+0f,
+    0x1.83c978p+0f,
+    0x1.818182p+0f,
+    0x1.7f4060p+0f,
+    0x1.7d05f4p+0f,
+    0x1.7ad220p+0f,
+    0x1.78a4c8p+0f,
+    0x1.767dcep+0f,
+    0x1.745d18p+0f,
+    0x1.724288p+0f,
+    0x1.702e06p+0f,
+    0x1.6e1f76p+0f,
+    0x1.6c16c2p+0f,
+    0x1.6a13cep+0f,
+    0x1.681682p+0f,
+    0x1.661ec6p+0f,
+    0x1.642c86p+0f,
+    0x1.623fa8p+0f,
+    0x1.605816p+0f,
+    0x1.5e75bcp+0f,
+    0x1.5c9882p+0f,
+    0x1.5ac056p+0f,
+    0x1.58ed24p+0f,
+    0x1.571ed4p+0f,
+    0x1.555556p+0f,
+    0x1.539094p+0f,
+    0x1.51d07ep+0f,
+    0x1.501502p+0f,
+    0x1.4e5e0ap+0f,
+    0x1.4cab88p+0f,
+    0x1.4afd6ap+0f,
+    0x1.49539ep+0f,
+    0x1.47ae14p+0f,
+    0x1.460cbcp+0f,
+    0x1.446f86p+0f,
+    0x1.42d662p+0f,
+    0x1.414142p+0f,
+    0x1.3fb014p+0f,
+    0x1.3e22ccp+0f,
+    0x1.3c995ap+0f,
+    0x1.3b13b2p+0f,
+    0x1.3991c2p+0f,
+    0x1.381382p+0f,
+    0x1.3698e0p+0f,
+    0x1.3521d0p+0f,
+    0x1.33ae46p+0f,
+    0x1.323e34p+0f,
+    0x1.30d190p+0f,
+    0x1.2f684cp+0f,
+    0x1.2e025cp+0f,
+    0x1.2c9fb4p+0f,
+    0x1.2b404ap+0f,
+    0x1.29e412p+0f,
+    0x1.288b02p+0f,
+    0x1.27350cp+0f,
+    0x1.25e228p+0f,
+    0x1.24924ap+0f,
+    0x1.234568p+0f,
+    0x1.21fb78p+0f,
+    0x1.20b470p+0f,
+    0x1.1f7048p+0f,
+    0x1.1e2ef4p+0f,
+    0x1.1cf06ap+0f,
+    0x1.1bb4a4p+0f,
+    0x1.1a7b96p+0f,
+    0x1.194538p+0f,
+    0x1.181182p+0f,
+    0x1.16e068p+0f,
+    0x1.15b1e6p+0f,
+    0x1.1485f0p+0f,
+    0x1.135c82p+0f,
+    0x1.12358ep+0f,
+    0x1.111112p+0f,
+    0x1.0fef02p+0f,
+    0x1.0ecf56p+0f,
+    0x1.0db20ap+0f,
+    0x1.0c9714p+0f,
+    0x1.0b7e6ep+0f,
+    0x1.0a6810p+0f,
+    0x1.0953f4p+0f,
+    0x1.084210p+0f,
+    0x1.073260p+0f,
+    0x1.0624dep+0f,
+    0x1.051980p+0f,
+    0x1.041042p+0f,
+    0x1.03091cp+0f,
+    0x1.020408p+0f,
+    0x1.010102p+0f,
+    0x1.000000p+0f,
+};
+
+TABLE_FUNCTION(float2, LOGE_TBL, loge_tbl);
+TABLE_FUNCTION(float, LOG_INV_TBL, log_inv_tbl);
+
+#ifdef cl_khr_fp64
+
+DECLARE_TABLE(double2, LN_TBL, 65) = {
+    (double2)(0x0.0000000000000p+0, 0x0.0000000000000p+0),
+    (double2)(0x1.fc0a800000000p-7, 0x1.61f807c79f3dbp-28),
+    (double2)(0x1.f829800000000p-6, 0x1.873c1980267c8p-25),
+    (double2)(0x1.7745800000000p-5, 0x1.ec65b9f88c69ep-26),
+    (double2)(0x1.f0a3000000000p-5, 0x1.8022c54cc2f99p-26),
+    (double2)(0x1.341d700000000p-4, 0x1.2c37a3a125330p-25),
+    (double2)(0x1.6f0d200000000p-4, 0x1.15cad69737c93p-25),
+    (double2)(0x1.a926d00000000p-4, 0x1.d256ab1b285e9p-27),
+    (double2)(0x1.e270700000000p-4, 0x1.b8abcb97a7aa2p-26),
+    (double2)(0x1.0d77e00000000p-3, 0x1.f34239659a5dcp-25),
+    (double2)(0x1.2955280000000p-3, 0x1.e07fd48d30177p-25),
+    (double2)(0x1.44d2b00000000p-3, 0x1.b32df4799f4f6p-25),
+    (double2)(0x1.5ff3000000000p-3, 0x1.c29e4f4f21cf8p-25),
+    (double2)(0x1.7ab8900000000p-3, 0x1.086c848df1b59p-30),
+    (double2)(0x1.9525a80000000p-3, 0x1.cf456b4764130p-27),
+    (double2)(0x1.af3c900000000p-3, 0x1.3a02ffcb63398p-25),
+    (double2)(0x1.c8ff780000000p-3, 0x1.1e6a6886b0976p-25),
+    (double2)(0x1.e270700000000p-3, 0x1.b8abcb97a7aa2p-25),
+    (double2)(0x1.fb91800000000p-3, 0x1.b578f8aa35552p-25),
+    (double2)(0x1.0a324c0000000p-2, 0x1.139c871afb9fcp-25),
+    (double2)(0x1.1675c80000000p-2, 0x1.5d5d30701ce64p-25),
+    (double2)(0x1.22941c0000000p-2, 0x1.de7bcb2d12142p-25),
+    (double2)(0x1.2e8e280000000p-2, 0x1.d708e984e1664p-25),
+    (double2)(0x1.3a64c40000000p-2, 0x1.56945e9c72f36p-26),
+    (double2)(0x1.4618bc0000000p-2, 0x1.0e2f613e85bdap-29),
+    (double2)(0x1.51aad80000000p-2, 0x1.cb7e0b42724f6p-28),
+    (double2)(0x1.5d1bd80000000p-2, 0x1.fac04e52846c7p-25),
+    (double2)(0x1.686c800000000p-2, 0x1.e9b14aec442bep-26),
+    (double2)(0x1.739d7c0000000p-2, 0x1.b5de8034e7126p-25),
+    (double2)(0x1.7eaf800000000p-2, 0x1.dc157e1b259d3p-25),
+    (double2)(0x1.89a3380000000p-2, 0x1.b05096ad69c62p-28),
+    (double2)(0x1.9479400000000p-2, 0x1.c2116faba4cddp-26),
+    (double2)(0x1.9f323c0000000p-2, 0x1.65fcc25f95b47p-25),
+    (double2)(0x1.a9cec80000000p-2, 0x1.a9a08498d4850p-26),
+    (double2)(0x1.b44f740000000p-2, 0x1.de647b1465f77p-25),
+    (double2)(0x1.beb4d80000000p-2, 0x1.da71b7bf7861dp-26),
+    (double2)(0x1.c8ff7c0000000p-2, 0x1.e6a6886b09760p-28),
+    (double2)(0x1.d32fe40000000p-2, 0x1.f0075eab0ef64p-25),
+    (double2)(0x1.dd46a00000000p-2, 0x1.3071282fb989bp-28),
+    (double2)(0x1.e744240000000p-2, 0x1.0eb43c3f1bed2p-25),
+    (double2)(0x1.f128f40000000p-2, 0x1.faf06ecb35c84p-26),
+    (double2)(0x1.faf5880000000p-2, 0x1.ef1e63db35f68p-27),
+    (double2)(0x1.02552a0000000p-1, 0x1.69743fb1a71a5p-27),
+    (double2)(0x1.0723e40000000p-1, 0x1.c1cdf404e5796p-25),
+    (double2)(0x1.0be72e0000000p-1, 0x1.094aa0ada625ep-27),
+    (double2)(0x1.109f380000000p-1, 0x1.e2d4c96fde3ecp-25),
+    (double2)(0x1.154c3c0000000p-1, 0x1.2f4d5e9a98f34p-25),
+    (double2)(0x1.19ee6a0000000p-1, 0x1.467c96ecc5cbep-25),
+    (double2)(0x1.1e85f40000000p-1, 0x1.e7040d03dec5ap-25),
+    (double2)(0x1.23130c0000000p-1, 0x1.7bebf4282de36p-25),
+    (double2)(0x1.2795e00000000p-1, 0x1.289b11aeb783fp-25),
+    (double2)(0x1.2c0e9e0000000p-1, 0x1.a891d1772f538p-26),
+    (double2)(0x1.307d720000000p-1, 0x1.34f10be1fb591p-25),
+    (double2)(0x1.34e2880000000p-1, 0x1.d9ce1d316eb93p-25),
+    (double2)(0x1.393e0c0000000p-1, 0x1.3562a19a9c442p-25),
+    (double2)(0x1.3d90260000000p-1, 0x1.4e2adf548084cp-26),
+    (double2)(0x1.41d8fe0000000p-1, 0x1.08ce55cc8c97ap-26),
+    (double2)(0x1.4618bc0000000p-1, 0x1.0e2f613e85bdap-28),
+    (double2)(0x1.4a4f840000000p-1, 0x1.db03ebb0227bfp-25),
+    (double2)(0x1.4e7d800000000p-1, 0x1.1b75bb09cb098p-25),
+    (double2)(0x1.52a2d20000000p-1, 0x1.96f16abb9df22p-27),
+    (double2)(0x1.56bf9c0000000p-1, 0x1.5b3f399411c62p-25),
+    (double2)(0x1.5ad4040000000p-1, 0x1.86b3e59f65355p-26),
+    (double2)(0x1.5ee02a0000000p-1, 0x1.2482ceae1ac12p-26),
+    (double2)(0x1.62e42e0000000p-1, 0x1.efa39ef35793cp-25),
+};
+
+TABLE_FUNCTION(double2, LN_TBL, ln_tbl);
+
+#endif // cl_khr_fp64
diff --git a/libclc/generic/lib/math/tables.h b/libclc/generic/lib/math/tables.h
new file mode 100644
index 000000000000..925544064a50
--- /dev/null
+++ b/libclc/generic/lib/math/tables.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define TABLE_SPACE __constant
+
+#define TABLE_MANGLE(NAME) __clc_##NAME
+
+#define DECLARE_TABLE(TYPE,NAME,LENGTH) \
+    TABLE_SPACE TYPE NAME [ LENGTH ]
+
+#define TABLE_FUNCTION(TYPE,TABLE,NAME) \
+    TYPE TABLE_MANGLE(NAME)(size_t idx) { \
+        return TABLE[idx]; \
+    }
+
+#define TABLE_FUNCTION_DECL(TYPE, NAME) \
+    TYPE TABLE_MANGLE(NAME)(size_t idx);
+
+#define USE_TABLE(NAME, IDX) \
+    TABLE_MANGLE(NAME)(IDX)
+
+TABLE_FUNCTION_DECL(float2, loge_tbl);
+TABLE_FUNCTION_DECL(float, log_inv_tbl);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+TABLE_FUNCTION_DECL(double2, ln_tbl);
+
+#endif // cl_khr_fp64
diff --git a/libclc/generic/lib/math/tan.cl b/libclc/generic/lib/math/tan.cl
new file mode 100644
index 000000000000..a447999ea8b9
--- /dev/null
+++ b/libclc/generic/lib/math/tan.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <tan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/tan.inc b/libclc/generic/lib/math/tan.inc
new file mode 100644
index 000000000000..8d9d9fe24786
--- /dev/null
+++ b/libclc/generic/lib/math/tan.inc
@@ -0,0 +1,8 @@
+/*
+ * Note: tan(x) = sin(x)/cos(x) also, but the final assembly ends up being
+ * twice as long for R600 (maybe for others as well).
+ */
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE tan(__CLC_GENTYPE x) {
+  __CLC_GENTYPE sinx = sin(x);
+  return sinx / sqrt( (__CLC_GENTYPE) 1.0 - (sinx*sinx) );
+}
diff --git a/libclc/generic/lib/relational/all.cl b/libclc/generic/lib/relational/all.cl
new file mode 100644
index 000000000000..607d7a9c68c4
--- /dev/null
+++ b/libclc/generic/lib/relational/all.cl
@@ -0,0 +1,29 @@
+#include <clc/clc.h>
+
+#define _CLC_ALL(v) (((v) >> ((sizeof(v) * 8) - 1)) & 0x1)
+#define _CLC_ALL2(v) (_CLC_ALL((v).s0) & _CLC_ALL((v).s1))
+#define _CLC_ALL3(v) (_CLC_ALL2((v)) & _CLC_ALL((v).s2))
+#define _CLC_ALL4(v) (_CLC_ALL3((v)) & _CLC_ALL((v).s3))
+#define _CLC_ALL8(v) (_CLC_ALL4((v)) & _CLC_ALL((v).s4) & _CLC_ALL((v).s5) \
+                                     & _CLC_ALL((v).s6) & _CLC_ALL((v).s7))
+#define _CLC_ALL16(v) (_CLC_ALL8((v)) & _CLC_ALL((v).s8) & _CLC_ALL((v).s9) \
+                                      & _CLC_ALL((v).sA) & _CLC_ALL((v).sB) \
+                                      & _CLC_ALL((v).sC) & _CLC_ALL((v).sD) \
+                                      & _CLC_ALL((v).sE) & _CLC_ALL((v).sf))
+
+
+#define ALL_ID(TYPE) \
+  _CLC_OVERLOAD _CLC_DEF int all(TYPE v)
+
+#define ALL_VECTORIZE(TYPE) \
+  ALL_ID(TYPE) { return _CLC_ALL(v); } \
+  ALL_ID(TYPE##2) { return _CLC_ALL2(v); } \
+  ALL_ID(TYPE##3) { return _CLC_ALL3(v); } \
+  ALL_ID(TYPE##4) { return _CLC_ALL4(v); } \
+  ALL_ID(TYPE##8) { return _CLC_ALL8(v); } \
+  ALL_ID(TYPE##16) { return _CLC_ALL16(v); }
+
+ALL_VECTORIZE(char)
+ALL_VECTORIZE(short)
+ALL_VECTORIZE(int)
+ALL_VECTORIZE(long)
diff --git a/libclc/generic/lib/relational/any.cl b/libclc/generic/lib/relational/any.cl
new file mode 100644
index 000000000000..4d372102021b
--- /dev/null
+++ b/libclc/generic/lib/relational/any.cl
@@ -0,0 +1,30 @@
+#include <clc/clc.h>
+
+#define _CLC_ANY(v) (((v) >> ((sizeof(v) * 8) - 1)) & 0x1)
+#define _CLC_ANY2(v) (_CLC_ANY((v).s0) | _CLC_ANY((v).s1))
+#define _CLC_ANY3(v) (_CLC_ANY2((v)) | _CLC_ANY((v).s2))
+#define _CLC_ANY4(v) (_CLC_ANY3((v)) | _CLC_ANY((v).s3))
+#define _CLC_ANY8(v) (_CLC_ANY4((v)) | _CLC_ANY((v).s4) | _CLC_ANY((v).s5) \
+                                     | _CLC_ANY((v).s6) | _CLC_ANY((v).s7))
+#define _CLC_ANY16(v) (_CLC_ANY8((v)) | _CLC_ANY((v).s8) | _CLC_ANY((v).s9) \
+                                      | _CLC_ANY((v).sA) | _CLC_ANY((v).sB) \
+                                      | _CLC_ANY((v).sC) | _CLC_ANY((v).sD) \
+                                      | _CLC_ANY((v).sE) | _CLC_ANY((v).sf))
+
+
+#define ANY_ID(TYPE) \
+  _CLC_OVERLOAD _CLC_DEF int any(TYPE v)
+
+#define ANY_VECTORIZE(TYPE) \
+  ANY_ID(TYPE) { return _CLC_ANY(v); } \
+  ANY_ID(TYPE##2) { return _CLC_ANY2(v); } \
+  ANY_ID(TYPE##3) { return _CLC_ANY3(v); } \
+  ANY_ID(TYPE##4) { return _CLC_ANY4(v); } \
+  ANY_ID(TYPE##8) { return _CLC_ANY8(v); } \
+  ANY_ID(TYPE##16) { return _CLC_ANY16(v); }
+
+ANY_VECTORIZE(char)
+ANY_VECTORIZE(short)
+ANY_VECTORIZE(int)
+ANY_VECTORIZE(long)
+
diff --git a/libclc/generic/lib/relational/isequal.cl b/libclc/generic/lib/relational/isequal.cl
new file mode 100644
index 000000000000..9d79ba6b3dbe
--- /dev/null
+++ b/libclc/generic/lib/relational/isequal.cl
@@ -0,0 +1,30 @@
+#include <clc/clc.h>
+
+#define _CLC_DEFINE_ISEQUAL(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return (x == y); \
+} \
+
+_CLC_DEFINE_ISEQUAL(int, isequal, float, float)
+_CLC_DEFINE_ISEQUAL(int2, isequal, float2, float2)
+_CLC_DEFINE_ISEQUAL(int3, isequal, float3, float3)
+_CLC_DEFINE_ISEQUAL(int4, isequal, float4, float4)
+_CLC_DEFINE_ISEQUAL(int8, isequal, float8, float8)
+_CLC_DEFINE_ISEQUAL(int16, isequal, float16, float16)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isequal(double) returns an int, but the vector versions
+// return long.
+_CLC_DEFINE_ISEQUAL(int, isequal, double, double)
+_CLC_DEFINE_ISEQUAL(long2, isequal, double2, double2)
+_CLC_DEFINE_ISEQUAL(long3, isequal, double3, double3)
+_CLC_DEFINE_ISEQUAL(long4, isequal, double4, double4)
+_CLC_DEFINE_ISEQUAL(long8, isequal, double8, double8)
+_CLC_DEFINE_ISEQUAL(long16, isequal, double16, double16)
+
+#endif
+
+#undef _CLC_DEFINE_ISEQUAL
\ No newline at end of file
diff --git a/libclc/generic/lib/relational/isfinite.cl b/libclc/generic/lib/relational/isfinite.cl
new file mode 100644
index 000000000000..d0658c01eacb
--- /dev/null
+++ b/libclc/generic/lib/relational/isfinite.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isfinite, __builtin_isfinite, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isfinite(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isfinite(double x) {
+  return __builtin_isfinite(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isfinite, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isgreater.cl b/libclc/generic/lib/relational/isgreater.cl
new file mode 100644
index 000000000000..79456e56d517
--- /dev/null
+++ b/libclc/generic/lib/relational/isgreater.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isgreater with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isgreater, __builtin_isgreater, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isgreater(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isgreater(double x, double y){
+  return __builtin_isgreater(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isgreater, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isgreaterequal.cl b/libclc/generic/lib/relational/isgreaterequal.cl
new file mode 100644
index 000000000000..2d5ebe5770c7
--- /dev/null
+++ b/libclc/generic/lib/relational/isgreaterequal.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isgreaterequal with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isgreaterequal, __builtin_isgreaterequal, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isgreaterequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isgreaterequal(double x, double y){
+  return __builtin_isgreaterequal(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isgreaterequal, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isinf.cl b/libclc/generic/lib/relational/isinf.cl
new file mode 100644
index 000000000000..1452d919cb86
--- /dev/null
+++ b/libclc/generic/lib/relational/isinf.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isinf, __builtin_isinf, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isinf(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isinf(double x) {
+  return __builtin_isinf(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isinf, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isless.cl b/libclc/generic/lib/relational/isless.cl
new file mode 100644
index 000000000000..56a3e1329b48
--- /dev/null
+++ b/libclc/generic/lib/relational/isless.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isless with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isless, __builtin_isless, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isless(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isless(double x, double y){
+  return __builtin_isless(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isless, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/islessequal.cl b/libclc/generic/lib/relational/islessequal.cl
new file mode 100644
index 000000000000..259c307da453
--- /dev/null
+++ b/libclc/generic/lib/relational/islessequal.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_islessequal with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, islessequal, __builtin_islessequal, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of islessequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int islessequal(double x, double y){
+  return __builtin_islessequal(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, islessequal, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/islessgreater.cl b/libclc/generic/lib/relational/islessgreater.cl
new file mode 100644
index 000000000000..fc029f35b73a
--- /dev/null
+++ b/libclc/generic/lib/relational/islessgreater.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_islessgreater with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, islessgreater, __builtin_islessgreater, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of islessgreater(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int islessgreater(double x, double y){
+  return __builtin_islessgreater(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, islessgreater, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnan.cl b/libclc/generic/lib/relational/isnan.cl
new file mode 100644
index 000000000000..f82dc5d59da5
--- /dev/null
+++ b/libclc/generic/lib/relational/isnan.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isnan, __builtin_isnan, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnan(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isnan(double x) {
+  return __builtin_isnan(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isnan, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnormal.cl b/libclc/generic/lib/relational/isnormal.cl
new file mode 100644
index 000000000000..2e6b42d00178
--- /dev/null
+++ b/libclc/generic/lib/relational/isnormal.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isnormal, __builtin_isnormal, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnormal(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isnormal(double x) {
+  return __builtin_isnormal(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isnormal, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnotequal.cl b/libclc/generic/lib/relational/isnotequal.cl
new file mode 100644
index 000000000000..787fd8d53c20
--- /dev/null
+++ b/libclc/generic/lib/relational/isnotequal.cl
@@ -0,0 +1,23 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+#define _CLC_DEFINE_ISNOTEQUAL(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return (x != y); \
+} \
+
+_CLC_DEFINE_ISNOTEQUAL(int, isnotequal, float, float)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(int, isnotequal, float, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnotequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEFINE_ISNOTEQUAL(int, isnotequal, double, double)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isnotequal, double, double)
+
+#endif
+
+#undef _CLC_DEFINE_ISNOTEQUAL
diff --git a/libclc/generic/lib/relational/isordered.cl b/libclc/generic/lib/relational/isordered.cl
new file mode 100644
index 000000000000..ebda2eb72ba2
--- /dev/null
+++ b/libclc/generic/lib/relational/isordered.cl
@@ -0,0 +1,23 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+#define _CLC_DEFINE_ISORDERED(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return isequal(x, x) && isequal(y, y); \
+} \
+
+_CLC_DEFINE_ISORDERED(int, isordered, float, float)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(int, isordered, float, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isordered(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEFINE_ISORDERED(int, isordered, double, double)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isordered, double, double)
+
+#endif
+
+#undef _CLC_DEFINE_ISORDERED
diff --git a/libclc/generic/lib/relational/isunordered.cl b/libclc/generic/lib/relational/isunordered.cl
new file mode 100644
index 000000000000..8bc5e3fa7f6d
--- /dev/null
+++ b/libclc/generic/lib/relational/isunordered.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isunordered with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isunordered, __builtin_isunordered, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isunordered(double, double) returns an int, but the vector versions
+// return long.
+ +_CLC_DEF _CLC_OVERLOAD int isunordered(double x, double y){ + return __builtin_isunordered(x, y); +} + +_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isunordered, double, double) + +#endif diff --git a/libclc/generic/lib/relational/relational.h b/libclc/generic/lib/relational/relational.h new file mode 100644 index 000000000000..e492750dacb3 --- /dev/null +++ b/libclc/generic/lib/relational/relational.h @@ -0,0 +1,117 @@ +/* + * Contains relational macros that have to return 1 for scalar and -1 for vector + * when the result is true. + */ + +#define _CLC_DEFINE_RELATIONAL_UNARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_NAME, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x){ \ + return BUILTIN_NAME(x); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC2(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \ + return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo), FUNCTION(x.hi)} != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC3(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \ + return (RET_TYPE)( (RET_TYPE){FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2)} != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC4(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2), FUNCTION(x.s3) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC8(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2), FUNCTION(x.s3), \ + FUNCTION(x.s4), FUNCTION(x.s5), FUNCTION(x.s6), FUNCTION(x.s7) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC16(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0), FUNCTION(x.s1), 
FUNCTION(x.s2), FUNCTION(x.s3), \ + FUNCTION(x.s4), FUNCTION(x.s5), FUNCTION(x.s6), FUNCTION(x.s7), \ + FUNCTION(x.s8), FUNCTION(x.s9), FUNCTION(x.sa), FUNCTION(x.sb), \ + FUNCTION(x.sc), FUNCTION(x.sd), FUNCTION(x.se), FUNCTION(x.sf) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(RET_TYPE, FUNCTION, ARG_TYPE) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC2(RET_TYPE##2, FUNCTION, ARG_TYPE##2) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC3(RET_TYPE##3, FUNCTION, ARG_TYPE##3) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC4(RET_TYPE##4, FUNCTION, ARG_TYPE##4) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC8(RET_TYPE##8, FUNCTION, ARG_TYPE##8) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC16(RET_TYPE##16, FUNCTION, ARG_TYPE##16) + +#define _CLC_DEFINE_RELATIONAL_UNARY(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG_TYPE) \ +_CLC_DEFINE_RELATIONAL_UNARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG_TYPE) \ +_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(RET_TYPE, FUNCTION, ARG_TYPE) \ + +#define _CLC_DEFINE_RELATIONAL_BINARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_NAME, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y){ \ + return BUILTIN_NAME(x, y); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)} != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC2(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)} != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC3(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( (RET_TYPE){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2)} != (RET_TYPE)0); \ +} + +#define 
_CLC_DEFINE_RELATIONAL_BINARY_VEC4(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC8(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3), \ + FUNCTION(x.s4, y.s4), FUNCTION(x.s5, y.s5), FUNCTION(x.s6, y.s6), FUNCTION(x.s7, y.s7) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC16(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \ + return (RET_TYPE)( \ + (RET_TYPE){ \ + FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3), \ + FUNCTION(x.s4, y.s4), FUNCTION(x.s5, y.s5), FUNCTION(x.s6, y.s6), FUNCTION(x.s7, y.s7), \ + FUNCTION(x.s8, y.s8), FUNCTION(x.s9, y.s9), FUNCTION(x.sa, y.sa), FUNCTION(x.sb, y.sb), \ + FUNCTION(x.sc, y.sc), FUNCTION(x.sd, y.sd), FUNCTION(x.se, y.se), FUNCTION(x.sf, y.sf) \ + } != (RET_TYPE)0); \ +} + +#define _CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC2(RET_TYPE##2, FUNCTION, ARG0_TYPE##2, ARG1_TYPE##2) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC3(RET_TYPE##3, FUNCTION, ARG0_TYPE##3, ARG1_TYPE##3) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC4(RET_TYPE##4, FUNCTION, ARG0_TYPE##4, ARG1_TYPE##4) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC8(RET_TYPE##8, FUNCTION, ARG0_TYPE##8, ARG1_TYPE##8) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC16(RET_TYPE##16, FUNCTION, ARG0_TYPE##16, ARG1_TYPE##16) + +#define _CLC_DEFINE_RELATIONAL_BINARY(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG0_TYPE, ARG1_TYPE) \ 
+_CLC_DEFINE_RELATIONAL_BINARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG0_TYPE, ARG1_TYPE) \ +_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) diff --git a/libclc/generic/lib/relational/signbit.cl b/libclc/generic/lib/relational/signbit.cl new file mode 100644 index 000000000000..ab37d2f1288c --- /dev/null +++ b/libclc/generic/lib/relational/signbit.cl @@ -0,0 +1,19 @@ +#include <clc/clc.h> +#include "relational.h" + +_CLC_DEFINE_RELATIONAL_UNARY(int, signbit, __builtin_signbitf, float) + +#ifdef cl_khr_fp64 + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +// The scalar version of signbit(double) returns an int, but the vector versions +// return long. + +_CLC_DEF _CLC_OVERLOAD int signbit(double x){ + return __builtin_signbit(x); +} + +_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, signbit, double) + +#endif diff --git a/libclc/generic/lib/shared/clamp.cl b/libclc/generic/lib/shared/clamp.cl new file mode 100644 index 000000000000..c79a358e00e0 --- /dev/null +++ b/libclc/generic/lib/shared/clamp.cl @@ -0,0 +1,11 @@ +#include <clc/clc.h> + +#define __CLC_BODY <clamp.inc> +#include <clc/integer/gentype.inc> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <clamp.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/shared/clamp.inc b/libclc/generic/lib/shared/clamp.inc new file mode 100644 index 000000000000..c918f9c499e7 --- /dev/null +++ b/libclc/generic/lib/shared/clamp.inc @@ -0,0 +1,9 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z) { + return (x > z ? z : (x < y ? y : x)); +} + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_SCALAR_GENTYPE y, __CLC_SCALAR_GENTYPE z) { + return (x > (__CLC_GENTYPE)z ? (__CLC_GENTYPE)z : (x < (__CLC_GENTYPE)y ? 
(__CLC_GENTYPE)y : x)); +} +#endif diff --git a/libclc/generic/lib/shared/max.cl b/libclc/generic/lib/shared/max.cl new file mode 100644 index 000000000000..1c4457c82144 --- /dev/null +++ b/libclc/generic/lib/shared/max.cl @@ -0,0 +1,11 @@ +#include <clc/clc.h> + +#define __CLC_BODY <max.inc> +#include <clc/integer/gentype.inc> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <max.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/shared/max.inc b/libclc/generic/lib/shared/max.inc new file mode 100644 index 000000000000..75a24c077d1a --- /dev/null +++ b/libclc/generic/lib/shared/max.inc @@ -0,0 +1,9 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_GENTYPE b) { + return (a > b ? a : b); +} + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b) { + return (a > (__CLC_GENTYPE)b ? a : (__CLC_GENTYPE)b); +} +#endif diff --git a/libclc/generic/lib/shared/min.cl b/libclc/generic/lib/shared/min.cl new file mode 100644 index 000000000000..433087a1069d --- /dev/null +++ b/libclc/generic/lib/shared/min.cl @@ -0,0 +1,11 @@ +#include <clc/clc.h> + +#define __CLC_BODY <min.inc> +#include <clc/integer/gentype.inc> + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable +#endif + +#define __CLC_BODY <min.inc> +#include <clc/math/gentype.inc> diff --git a/libclc/generic/lib/shared/min.inc b/libclc/generic/lib/shared/min.inc new file mode 100644 index 000000000000..fe42864df257 --- /dev/null +++ b/libclc/generic/lib/shared/min.inc @@ -0,0 +1,9 @@ +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_GENTYPE b) { + return (a < b ? a : b); +} + +#ifndef __CLC_SCALAR +_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b) { + return (a < (__CLC_GENTYPE)b ? 
a : (__CLC_GENTYPE)b); +} +#endif diff --git a/libclc/generic/lib/shared/vload.cl b/libclc/generic/lib/shared/vload.cl new file mode 100644 index 000000000000..88972005cfa2 --- /dev/null +++ b/libclc/generic/lib/shared/vload.cl @@ -0,0 +1,52 @@ +#include <clc/clc.h> + +#define VLOAD_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \ + typedef PRIM_TYPE##2 less_aligned_##ADDR_SPACE##PRIM_TYPE##2 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##2 vload2(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \ + return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&x[2*offset])); \ + } \ +\ + typedef PRIM_TYPE##3 less_aligned_##ADDR_SPACE##PRIM_TYPE##3 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##3 vload3(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \ + PRIM_TYPE##2 vec = *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&x[3*offset])); \ + return (PRIM_TYPE##3)(vec.s0, vec.s1, x[offset*3+2]); \ + } \ +\ + typedef PRIM_TYPE##4 less_aligned_##ADDR_SPACE##PRIM_TYPE##4 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##4 vload4(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \ + return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##4*) (&x[4*offset])); \ + } \ +\ + typedef PRIM_TYPE##8 less_aligned_##ADDR_SPACE##PRIM_TYPE##8 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##8 vload8(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \ + return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##8*) (&x[8*offset])); \ + } \ +\ + typedef PRIM_TYPE##16 less_aligned_##ADDR_SPACE##PRIM_TYPE##16 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##16 vload16(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \ + return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##16*) (&x[16*offset])); \ + } \ + +#define VLOAD_ADDR_SPACES(__CLC_SCALAR_GENTYPE) \ + 
VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __private) \ + VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __local) \ + VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __constant) \ + VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __global) \ + +#define VLOAD_TYPES() \ + VLOAD_ADDR_SPACES(char) \ + VLOAD_ADDR_SPACES(uchar) \ + VLOAD_ADDR_SPACES(short) \ + VLOAD_ADDR_SPACES(ushort) \ + VLOAD_ADDR_SPACES(int) \ + VLOAD_ADDR_SPACES(uint) \ + VLOAD_ADDR_SPACES(long) \ + VLOAD_ADDR_SPACES(ulong) \ + VLOAD_ADDR_SPACES(float) \ + +VLOAD_TYPES() + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + VLOAD_ADDR_SPACES(double) +#endif diff --git a/libclc/generic/lib/shared/vstore.cl b/libclc/generic/lib/shared/vstore.cl new file mode 100644 index 000000000000..4777b7ea76ad --- /dev/null +++ b/libclc/generic/lib/shared/vstore.cl @@ -0,0 +1,52 @@ +#include <clc/clc.h> + +#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable + +#define VSTORE_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \ + typedef PRIM_TYPE##2 less_aligned_##ADDR_SPACE##PRIM_TYPE##2 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF void vstore2(PRIM_TYPE##2 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \ + *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&mem[2*offset])) = vec; \ + } \ +\ + _CLC_OVERLOAD _CLC_DEF void vstore3(PRIM_TYPE##3 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \ + *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&mem[3*offset])) = (PRIM_TYPE##2)(vec.s0, vec.s1); \ + mem[3 * offset + 2] = vec.s2;\ + } \ +\ + typedef PRIM_TYPE##4 less_aligned_##ADDR_SPACE##PRIM_TYPE##4 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF void vstore4(PRIM_TYPE##4 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \ + *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##4*) (&mem[4*offset])) = vec; \ + } \ +\ + typedef PRIM_TYPE##8 less_aligned_##ADDR_SPACE##PRIM_TYPE##8 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF void 
vstore8(PRIM_TYPE##8 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \ + *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##8*) (&mem[8*offset])) = vec; \ + } \ +\ + typedef PRIM_TYPE##16 less_aligned_##ADDR_SPACE##PRIM_TYPE##16 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\ + _CLC_OVERLOAD _CLC_DEF void vstore16(PRIM_TYPE##16 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \ + *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##16*) (&mem[16*offset])) = vec; \ + } \ + +#define VSTORE_ADDR_SPACES(__CLC_SCALAR___CLC_GENTYPE) \ + VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __private) \ + VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __local) \ + VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __global) \ + +#define VSTORE_TYPES() \ + VSTORE_ADDR_SPACES(char) \ + VSTORE_ADDR_SPACES(uchar) \ + VSTORE_ADDR_SPACES(short) \ + VSTORE_ADDR_SPACES(ushort) \ + VSTORE_ADDR_SPACES(int) \ + VSTORE_ADDR_SPACES(uint) \ + VSTORE_ADDR_SPACES(long) \ + VSTORE_ADDR_SPACES(ulong) \ + VSTORE_ADDR_SPACES(float) \ + +VSTORE_TYPES() + +#ifdef cl_khr_fp64 +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + VSTORE_ADDR_SPACES(double) +#endif diff --git a/libclc/generic/lib/workitem/get_global_id.cl b/libclc/generic/lib/workitem/get_global_id.cl new file mode 100644 index 000000000000..fdd83d2953d4 --- /dev/null +++ b/libclc/generic/lib/workitem/get_global_id.cl @@ -0,0 +1,5 @@ +#include <clc/clc.h> + +_CLC_DEF size_t get_global_id(uint dim) { + return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); +} diff --git a/libclc/generic/lib/workitem/get_global_size.cl b/libclc/generic/lib/workitem/get_global_size.cl new file mode 100644 index 000000000000..5ae649e10d51 --- /dev/null +++ b/libclc/generic/lib/workitem/get_global_size.cl @@ -0,0 +1,5 @@ +#include <clc/clc.h> + +_CLC_DEF size_t get_global_size(uint dim) { + return get_num_groups(dim)*get_local_size(dim); +} diff --git a/libclc/ptx-nvidiacl/lib/SOURCES b/libclc/ptx-nvidiacl/lib/SOURCES new file mode 100644 index 
000000000000..7cdbd8507699
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/SOURCES
@@ -0,0 +1,5 @@
+synchronization/barrier.cl
+workitem/get_group_id.cl
+workitem/get_local_id.cl
+workitem/get_local_size.cl
+workitem/get_num_groups.cl
diff --git a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
new file mode 100644
index 000000000000..fb36c2612be4
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+_CLC_DEF void barrier(cl_mem_fence_flags flags) {
+  if (flags & CLK_LOCAL_MEM_FENCE) {
+    __builtin_ptx_bar_sync(0);
+  }
+}
+
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl b/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl
new file mode 100644
index 000000000000..2b35b4eaaa95
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_group_id(uint dim) {
+  switch (dim) {
+  case 0: return __builtin_ptx_read_ctaid_x();
+  case 1: return __builtin_ptx_read_ctaid_y();
+  case 2: return __builtin_ptx_read_ctaid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl b/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl
new file mode 100644
index 000000000000..f0cfdc005fe8
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_local_id(uint dim) {
+  switch (dim) {
+  case 0: return __builtin_ptx_read_tid_x();
+  case 1: return __builtin_ptx_read_tid_y();
+  case 2: return __builtin_ptx_read_tid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl b/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl
new file mode 100644
index 000000000000..c3f542595def
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_local_size(uint dim) {
+  switch (dim) {
+  case 0:
return __builtin_ptx_read_ntid_x();
+  case 1: return __builtin_ptx_read_ntid_y();
+  case 2: return __builtin_ptx_read_ntid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl b/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl
new file mode 100644
index 000000000000..90bdc2e41d2c
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_num_groups(uint dim) {
+  switch (dim) {
+  case 0: return __builtin_ptx_read_nctaid_x();
+  case 1: return __builtin_ptx_read_nctaid_y();
+  case 2: return __builtin_ptx_read_nctaid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx/lib/OVERRIDES b/libclc/ptx/lib/OVERRIDES
new file mode 100644
index 000000000000..475162c97cd2
--- /dev/null
+++ b/libclc/ptx/lib/OVERRIDES
@@ -0,0 +1,2 @@
+integer/add_sat_if.ll
+integer/sub_sat_if.ll
diff --git a/libclc/ptx/lib/SOURCES b/libclc/ptx/lib/SOURCES
new file mode 100644
index 000000000000..fb6e17fbc697
--- /dev/null
+++ b/libclc/ptx/lib/SOURCES
@@ -0,0 +1,2 @@
+integer/add_sat.ll
+integer/sub_sat.ll
\ No newline at end of file diff --git a/libclc/ptx/lib/integer/add_sat.ll b/libclc/ptx/lib/integer/add_sat.ll new file mode 100644 index 000000000000..f887962c8a49 --- /dev/null +++ b/libclc/ptx/lib/integer/add_sat.ll @@ -0,0 +1,55 @@ +declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) + +define ptx_device i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) + ret i8 %call +} + +declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) + +define ptx_device i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) + ret i8 %call +} + +declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) + +define ptx_device i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) + ret i16 %call +} + +declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y) + +define ptx_device i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y) + ret i16 %call +} + +declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) + +define ptx_device i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) + ret i32 %call +} + +declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) + +define ptx_device i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) + ret i32 %call +} + +declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) + +define ptx_device i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) + ret i64 %call +} + +declare i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) + +define ptx_device i64 @__clc_add_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) + 
ret i64 %call +} diff --git a/libclc/ptx/lib/integer/sub_sat.ll b/libclc/ptx/lib/integer/sub_sat.ll new file mode 100644 index 000000000000..1a66eb566b52 --- /dev/null +++ b/libclc/ptx/lib/integer/sub_sat.ll @@ -0,0 +1,55 @@ +declare i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) + +define ptx_device i8 @__clc_sub_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) + ret i8 %call +} + +declare i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) + +define ptx_device i8 @__clc_sub_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline { + %call = call i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) + ret i8 %call +} + +declare i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) + +define ptx_device i16 @__clc_sub_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) + ret i16 %call +} + +declare i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) + +define ptx_device i16 @__clc_sub_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline { + %call = call i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) + ret i16 %call +} + +declare i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) + +define ptx_device i32 @__clc_sub_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) + ret i32 %call +} + +declare i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) + +define ptx_device i32 @__clc_sub_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline { + %call = call i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) + ret i32 %call +} + +declare i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) + +define ptx_device i64 @__clc_sub_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) + ret i64 %call +} + +declare i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y) + +define ptx_device i64 @__clc_sub_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline { + %call = call i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y) + ret i64 
%call +} diff --git a/libclc/r600/lib/OVERRIDES b/libclc/r600/lib/OVERRIDES new file mode 100644 index 000000000000..3f941d890be7 --- /dev/null +++ b/libclc/r600/lib/OVERRIDES @@ -0,0 +1,2 @@ +workitem/get_group_id.cl +workitem/get_global_size.cl diff --git a/libclc/r600/lib/SOURCES b/libclc/r600/lib/SOURCES new file mode 100644 index 000000000000..ef23d83a5450 --- /dev/null +++ b/libclc/r600/lib/SOURCES @@ -0,0 +1,10 @@ +atomic/atomic.cl +math/nextafter.cl +workitem/get_num_groups.ll +workitem/get_group_id.ll +workitem/get_local_size.ll +workitem/get_local_id.ll +workitem/get_global_size.ll +workitem/get_work_dim.ll +synchronization/barrier.cl +synchronization/barrier_impl.ll diff --git a/libclc/r600/lib/atomic/atomic.cl b/libclc/r600/lib/atomic/atomic.cl new file mode 100644 index 000000000000..5bfe07b94bfd --- /dev/null +++ b/libclc/r600/lib/atomic/atomic.cl @@ -0,0 +1,65 @@ +#include <clc/clc.h> + +#define ATOMIC_FUNC_DEFINE(RET_SIGN, ARG_SIGN, TYPE, CL_FUNCTION, CLC_FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ +_CLC_OVERLOAD _CLC_DEF RET_SIGN TYPE CL_FUNCTION (volatile CL_ADDRSPACE RET_SIGN TYPE *p, RET_SIGN TYPE val) { \ + return (RET_SIGN TYPE)__clc_##CLC_FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE ARG_SIGN TYPE*)p, (ARG_SIGN TYPE)val); \ +} + +/* For atomic functions that don't need different bitcode dependending on argument signedness */ +#define ATOMIC_FUNC_SIGN(TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ + _CLC_DECL signed TYPE __clc_##FUNCTION##_addr##LLVM_ADDRSPACE(volatile CL_ADDRSPACE signed TYPE*, signed TYPE); \ + ATOMIC_FUNC_DEFINE(signed, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ + ATOMIC_FUNC_DEFINE(unsigned, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) + +#define ATOMIC_FUNC_ADDRSPACE(TYPE, FUNCTION) \ + ATOMIC_FUNC_SIGN(TYPE, FUNCTION, global, 1) \ + ATOMIC_FUNC_SIGN(TYPE, FUNCTION, local, 3) + +#define ATOMIC_FUNC(FUNCTION) \ + ATOMIC_FUNC_ADDRSPACE(int, FUNCTION) + +#define 
ATOMIC_FUNC_DEFINE_3_ARG(RET_SIGN, ARG_SIGN, TYPE, CL_FUNCTION, CLC_FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ +_CLC_OVERLOAD _CLC_DEF RET_SIGN TYPE CL_FUNCTION (volatile CL_ADDRSPACE RET_SIGN TYPE *p, RET_SIGN TYPE cmp, RET_SIGN TYPE val) { \ + return (RET_SIGN TYPE)__clc_##CLC_FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE ARG_SIGN TYPE*)p, (ARG_SIGN TYPE)cmp, (ARG_SIGN TYPE)val); \ +} + +/* For atomic functions that don't need different bitcode dependending on argument signedness */ +#define ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ + _CLC_DECL signed TYPE __clc_##FUNCTION##_addr##LLVM_ADDRSPACE(volatile CL_ADDRSPACE signed TYPE*, signed TYPE, signed TYPE); \ + ATOMIC_FUNC_DEFINE_3_ARG(signed, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \ + ATOMIC_FUNC_DEFINE_3_ARG(unsigned, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) + +#define ATOMIC_FUNC_ADDRSPACE_3_ARG(TYPE, FUNCTION) \ + ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, global, 1) \ + ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, local, 3) + +#define ATOMIC_FUNC_3_ARG(FUNCTION) \ + ATOMIC_FUNC_ADDRSPACE_3_ARG(int, FUNCTION) + +ATOMIC_FUNC(atomic_add) +ATOMIC_FUNC(atomic_and) +ATOMIC_FUNC(atomic_or) +ATOMIC_FUNC(atomic_sub) +ATOMIC_FUNC(atomic_xchg) +ATOMIC_FUNC(atomic_xor) +ATOMIC_FUNC_3_ARG(atomic_cmpxchg) + +_CLC_DECL signed int __clc_atomic_max_addr1(volatile global signed int*, signed int); +_CLC_DECL signed int __clc_atomic_max_addr3(volatile local signed int*, signed int); +_CLC_DECL uint __clc_atomic_umax_addr1(volatile global uint*, uint); +_CLC_DECL uint __clc_atomic_umax_addr3(volatile local uint*, uint); + +ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_max, atomic_max, global, 1) +ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_max, atomic_max, local, 3) +ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_max, atomic_umax, global, 1) +ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_max, atomic_umax, local, 3) + +_CLC_DECL signed int 
__clc_atomic_min_addr1(volatile global signed int*, signed int); +_CLC_DECL signed int __clc_atomic_min_addr3(volatile local signed int*, signed int); +_CLC_DECL uint __clc_atomic_umin_addr1(volatile global uint*, uint); +_CLC_DECL uint __clc_atomic_umin_addr3(volatile local uint*, uint); + +ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_min, atomic_min, global, 1) +ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_min, atomic_min, local, 3) +ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_min, atomic_umin, global, 1) +ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_min, atomic_umin, local, 3) diff --git a/libclc/r600/lib/math/nextafter.cl b/libclc/r600/lib/math/nextafter.cl new file mode 100644 index 000000000000..4611c81ae91e --- /dev/null +++ b/libclc/r600/lib/math/nextafter.cl @@ -0,0 +1,4 @@ +#include <clc/clc.h> +#include "../lib/clcmacro.h" + +_CLC_DEFINE_BINARY_BUILTIN(float, nextafter, __clc_nextafter, float, float) diff --git a/libclc/r600/lib/synchronization/barrier.cl b/libclc/r600/lib/synchronization/barrier.cl new file mode 100644 index 000000000000..6f2900b06eef --- /dev/null +++ b/libclc/r600/lib/synchronization/barrier.cl @@ -0,0 +1,10 @@ + +#include <clc/clc.h> + +_CLC_DEF int __clc_clk_local_mem_fence() { + return CLK_LOCAL_MEM_FENCE; +} + +_CLC_DEF int __clc_clk_global_mem_fence() { + return CLK_GLOBAL_MEM_FENCE; +} diff --git a/libclc/r600/lib/synchronization/barrier_impl.ll b/libclc/r600/lib/synchronization/barrier_impl.ll new file mode 100644 index 000000000000..3d8ee66bab6e --- /dev/null +++ b/libclc/r600/lib/synchronization/barrier_impl.ll @@ -0,0 +1,29 @@ +declare i32 @__clc_clk_local_mem_fence() nounwind alwaysinline +declare i32 @__clc_clk_global_mem_fence() nounwind alwaysinline +declare void @llvm.AMDGPU.barrier.local() nounwind noduplicate +declare void @llvm.AMDGPU.barrier.global() nounwind noduplicate + +define void @barrier(i32 %flags) nounwind noduplicate alwaysinline { +barrier_local_test: + %CLK_LOCAL_MEM_FENCE = call i32 
@__clc_clk_local_mem_fence() + %0 = and i32 %flags, %CLK_LOCAL_MEM_FENCE + %1 = icmp ne i32 %0, 0 + br i1 %1, label %barrier_local, label %barrier_global_test + +barrier_local: + call void @llvm.AMDGPU.barrier.local() noduplicate + br label %barrier_global_test + +barrier_global_test: + %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence() + %2 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE + %3 = icmp ne i32 %2, 0 + br i1 %3, label %barrier_global, label %done + +barrier_global: + call void @llvm.AMDGPU.barrier.global() noduplicate + br label %done + +done: + ret void +} diff --git a/libclc/r600/lib/workitem/get_global_size.ll b/libclc/r600/lib/workitem/get_global_size.ll new file mode 100644 index 000000000000..ac2d08d8ee19 --- /dev/null +++ b/libclc/r600/lib/workitem/get_global_size.ll @@ -0,0 +1,18 @@ +declare i32 @llvm.r600.read.global.size.x() nounwind readnone +declare i32 @llvm.r600.read.global.size.y() nounwind readnone +declare i32 @llvm.r600.read.global.size.z() nounwind readnone + +define i32 @get_global_size(i32 %dim) nounwind readnone alwaysinline { + switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim] +x_dim: + %x = call i32 @llvm.r600.read.global.size.x() nounwind readnone + ret i32 %x +y_dim: + %y = call i32 @llvm.r600.read.global.size.y() nounwind readnone + ret i32 %y +z_dim: + %z = call i32 @llvm.r600.read.global.size.z() nounwind readnone + ret i32 %z +default: + ret i32 0 +} diff --git a/libclc/r600/lib/workitem/get_group_id.ll b/libclc/r600/lib/workitem/get_group_id.ll new file mode 100644 index 000000000000..0dc86e5edfe1 --- /dev/null +++ b/libclc/r600/lib/workitem/get_group_id.ll @@ -0,0 +1,18 @@ +declare i32 @llvm.r600.read.tgid.x() nounwind readnone +declare i32 @llvm.r600.read.tgid.y() nounwind readnone +declare i32 @llvm.r600.read.tgid.z() nounwind readnone + +define i32 @get_group_id(i32 %dim) nounwind readnone alwaysinline { + switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label 
%y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.tgid.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.tgid.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.tgid.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_local_id.ll b/libclc/r600/lib/workitem/get_local_id.ll
new file mode 100644
index 000000000000..ac5522a7822b
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_local_id.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.tidig.x() nounwind readnone
+declare i32 @llvm.r600.read.tidig.y() nounwind readnone
+declare i32 @llvm.r600.read.tidig.z() nounwind readnone
+
+define i32 @get_local_id(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.tidig.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.tidig.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.tidig.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_local_size.ll b/libclc/r600/lib/workitem/get_local_size.ll
new file mode 100644
index 000000000000..0a98de683ae4
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_local_size.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.local.size.x() nounwind readnone
+declare i32 @llvm.r600.read.local.size.y() nounwind readnone
+declare i32 @llvm.r600.read.local.size.z() nounwind readnone
+
+define i32 @get_local_size(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.local.size.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.local.size.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.local.size.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_num_groups.ll b/libclc/r600/lib/workitem/get_num_groups.ll
new file mode 100644
index 000000000000..a708f422c27e
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_num_groups.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.ngroups.x() nounwind readnone
+declare i32 @llvm.r600.read.ngroups.y() nounwind readnone
+declare i32 @llvm.r600.read.ngroups.z() nounwind readnone
+
+define i32 @get_num_groups(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.ngroups.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.ngroups.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.ngroups.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_work_dim.ll b/libclc/r600/lib/workitem/get_work_dim.ll
new file mode 100644
index 000000000000..1220153fe2bd
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_work_dim.ll
@@ -0,0 +1,8 @@
+declare i32 @llvm.AMDGPU.read.workdim() nounwind readnone
+
+define i32 @get_work_dim() nounwind readnone alwaysinline {
+  %x = call i32 @llvm.AMDGPU.read.workdim() nounwind readnone , !range !0
+  ret i32 %x
+}
+
+!0 = metadata !{ i32 1, i32 4 }
diff --git a/libclc/test/add_sat.cl b/libclc/test/add_sat.cl
new file mode 100644
index 000000000000..45c8567b4403
--- /dev/null
+++ b/libclc/test/add_sat.cl
@@ -0,0 +1,3 @@
+__kernel void foo(__global char *a, __global char *b, __global char *c) {
+  *a = add_sat(*b, *c);
+}
diff --git a/libclc/test/as_type.cl b/libclc/test/as_type.cl
new file mode 100644
index 000000000000..e8fb1228d28d
--- /dev/null
+++ b/libclc/test/as_type.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int4 *x, float4 *y) {
+  *x = as_int4(*y);
+}
diff --git a/libclc/test/convert.cl b/libclc/test/convert.cl
new file mode 100644
index 000000000000..928fc326b6a1
--- /dev/null
+++ b/libclc/test/convert.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int4 *x, float4 *y) {
+  *x = convert_int4(*y);
+}
diff --git a/libclc/test/cos.cl b/libclc/test/cos.cl
new file mode 100644
index 000000000000..4230eb2a0e93
--- /dev/null
+++ b/libclc/test/cos.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float4 *f) {
+  *f = cos(*f);
+}
diff --git a/libclc/test/cross.cl b/libclc/test/cross.cl
new file mode 100644
index 000000000000..08955cbd9af5
--- /dev/null
+++ b/libclc/test/cross.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float4 *f) {
+  *f = cross(f[0], f[1]);
+}
diff --git a/libclc/test/fabs.cl b/libclc/test/fabs.cl
new file mode 100644
index 000000000000..91d42c466676
--- /dev/null
+++ b/libclc/test/fabs.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float *f) {
+  *f = fabs(*f);
+}
diff --git a/libclc/test/get_group_id.cl b/libclc/test/get_group_id.cl
new file mode 100644
index 000000000000..43725cda8027
--- /dev/null
+++ b/libclc/test/get_group_id.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int *i) {
+  i[get_group_id(0)] = 1;
+}
diff --git a/libclc/test/rsqrt.cl b/libclc/test/rsqrt.cl
new file mode 100644
index 000000000000..13ad216b79f4
--- /dev/null
+++ b/libclc/test/rsqrt.cl
@@ -0,0 +1,6 @@
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+__kernel void foo(float4 *x, double4 *y) {
+  x[1] = rsqrt(x[0]);
+  y[1] = rsqrt(y[0]);
+}
diff --git a/libclc/test/subsat.cl b/libclc/test/subsat.cl
new file mode 100644
index 000000000000..a83414b4dc85
--- /dev/null
+++ b/libclc/test/subsat.cl
@@ -0,0 +1,19 @@
+__kernel void test_subsat_char(char *a, char x, char y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_uchar(uchar *a, uchar x, uchar y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_long(long *a, long x, long y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_ulong(ulong *a, ulong x, ulong y) {
+  *a = sub_sat(x, y);
+  return;
+}
\ No newline at end of file
diff --git a/libclc/utils/prepare-builtins.cpp b/libclc/utils/prepare-builtins.cpp
new file mode 100644
index 000000000000..ee51edfee01f
--- /dev/null
+++ b/libclc/utils/prepare-builtins.cpp
@@ -0,0 +1,137 @@
+#include "llvm/Bitcode/ReaderWriter.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/ErrorOr.h"
+#include "llvm/Support/ToolOutputFile.h"
+#include "llvm/Config/llvm-config.h"
+
+#define LLVM_360_AND_NEWER \
+  (LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 6))
+
+#define LLVM_350_AND_NEWER \
+  (LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5))
+
+#if LLVM_350_AND_NEWER
+#include <system_error>
+
+#define ERROR_CODE std::error_code
+#define UNIQUE_PTR std::unique_ptr
+#else
+#include "llvm/ADT/OwningPtr.h"
+#include "llvm/Support/system_error.h"
+
+#define ERROR_CODE error_code
+#define UNIQUE_PTR OwningPtr
+#endif
+
+using namespace llvm;
+
+static cl::opt<std::string>
+InputFilename(cl::Positional, cl::desc("<input bitcode>"), cl::init("-"));
+
+static cl::opt<std::string>
+OutputFilename("o", cl::desc("Output filename"),
+               cl::value_desc("filename"));
+
+int main(int argc, char **argv) {
+  LLVMContext &Context = getGlobalContext();
+  llvm_shutdown_obj Y;  // Call llvm_shutdown() on exit.
+
+  cl::ParseCommandLineOptions(argc, argv, "libclc builtin preparation tool\n");
+
+  std::string ErrorMessage;
+  std::auto_ptr<Module> M;
+
+  {
+#if LLVM_350_AND_NEWER
+    ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
+      MemoryBuffer::getFile(InputFilename);
+    std::unique_ptr<MemoryBuffer> &BufferPtr = BufferOrErr.get();
+    if (std::error_code ec = BufferOrErr.getError())
+#else
+    UNIQUE_PTR<MemoryBuffer> BufferPtr;
+    if (ERROR_CODE ec = MemoryBuffer::getFileOrSTDIN(InputFilename, BufferPtr))
+#endif
+      ErrorMessage = ec.message();
+    else {
+#if LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR > 4)
+# if LLVM_360_AND_NEWER
+      ErrorOr<Module *> ModuleOrErr =
+          parseBitcodeFile(BufferPtr.get()->getMemBufferRef(), Context);
+# else
+      ErrorOr<Module *> ModuleOrErr = parseBitcodeFile(BufferPtr.get(), Context);
+# endif
+      if (ERROR_CODE ec = ModuleOrErr.getError())
+        ErrorMessage = ec.message();
+      M.reset(ModuleOrErr.get());
+#else
+      M.reset(ParseBitcodeFile(BufferPtr.get(), Context, &ErrorMessage));
+#endif
+    }
+  }
+
+  if (M.get() == 0) {
+    errs() << argv[0] << ": ";
+    if (ErrorMessage.size())
+      errs() << ErrorMessage << "\n";
+    else
+      errs() << "bitcode didn't read correctly.\n";
+    return 1;
+  }
+
+  // Set linkage of every external definition to linkonce_odr.
+  for (Module::iterator i = M->begin(), e = M->end(); i != e; ++i) {
+    if (!i->isDeclaration() && i->getLinkage() == GlobalValue::ExternalLinkage)
+      i->setLinkage(GlobalValue::LinkOnceODRLinkage);
+  }
+
+  for (Module::global_iterator i = M->global_begin(), e = M->global_end();
+       i != e; ++i) {
+    if (!i->isDeclaration() && i->getLinkage() == GlobalValue::ExternalLinkage)
+      i->setLinkage(GlobalValue::LinkOnceODRLinkage);
+  }
+
+  if (OutputFilename.empty()) {
+    errs() << "no output file\n";
+    return 1;
+  }
+
+#if LLVM_360_AND_NEWER
+  std::error_code EC;
+  UNIQUE_PTR<tool_output_file> Out
+  (new tool_output_file(OutputFilename, EC, sys::fs::F_None));
+  if (EC) {
+    errs() << EC.message() << '\n';
+    exit(1);
+  }
+#else
+  std::string ErrorInfo;
+  UNIQUE_PTR<tool_output_file> Out
+  (new tool_output_file(OutputFilename.c_str(), ErrorInfo,
+#if (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR == 4)
+                        sys::fs::F_Binary));
+#elif LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5)
+                        sys::fs::F_None));
+#else
+                        raw_fd_ostream::F_Binary));
+#endif
+  if (!ErrorInfo.empty()) {
+    errs() << ErrorInfo << '\n';
+    exit(1);
+  }
+#endif // LLVM_360_AND_NEWER
+
+  WriteBitcodeToFile(M.get(), Out->os());
+
+  // Declare success.
+  Out->keep();
+  return 0;
+}
+
diff --git a/libclc/www/index.html b/libclc/www/index.html
new file mode 100644
index 000000000000..bbd0dc8fcede
--- /dev/null
+++ b/libclc/www/index.html
@@ -0,0 +1,55 @@
+<html>
+<head>
+<title>libclc</title>
+</head>
+<body>
+<h1>libclc</h1>
+<p>
+libclc is an open source, BSD/MIT dual licensed
+implementation of the library requirements of the
+OpenCL C programming language, as specified by the <a
+href="http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf">OpenCL
+1.1 Specification</a>. The following sections of the specification
+impose library requirements:
+<ul>
+<li>6.1: Supported Data Types
+<li>6.2.3: Explicit Conversions
+<li>6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
+<li>6.9: Preprocessor Directives and Macros
+<li>6.11: Built-in Functions
+<li>9.3: Double Precision Floating-Point
+<li>9.4: 64-bit Atomics
+<li>9.5: Writing to 3D image memory objects
+<li>9.6: Half Precision Floating-Point
+</ul>
+</p>
+
+<p>
+libclc is intended to be used with the <a href="http://clang.llvm.org/">Clang</a>
+compiler's OpenCL frontend.
+</p>
+
+<p>
+libclc is designed to be portable and extensible. To this end,
+it provides generic implementations of most library requirements,
+allowing the target to override the generic implementation at the
+granularity of individual functions.
+</p>
+
+<p>
+libclc currently only supports the PTX target, but support for more
+targets is welcome.
+</p>
+
+<h2>Download</h2>
+
+<tt>svn checkout http://llvm.org/svn/llvm-project/libclc/trunk libclc</tt> (<a href="http://llvm.org/viewvc/llvm-project/libclc/trunk/">ViewVC</a>)
+<br>- or -<br>
+<tt>git clone http://llvm.org/git/libclc.git</tt>
+
+<h2>Mailing List</h2>
+
+libclc-dev@pcc.me.uk (<a href="http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev">subscribe/unsubscribe</a>, <a href="http://www.pcc.me.uk/pipermail/libclc-dev/">archives</a>)
+
+</body>
+</html>