//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file is a part of MemorySanitizer, a detector of uninitialized
/// reads.
///
/// The algorithm of the tool is similar to Memcheck
/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html).
/// We associate a few shadow bits with every byte of the application memory,
/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
/// bits on every memory read, propagate the shadow bits through some of the
/// arithmetic instructions (including MOV), store the shadow bits on every
/// memory write, report a bug on some other instructions (e.g. JMP) if the
/// associated shadow is poisoned.
///
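/// As a rough illustration (simplified pseudo-IR, not the exact code this
/// pass emits), an 8-byte store
///   store i64 %v, ptr %p
/// is instrumented roughly as
///   ;; %sp is the shadow address computed from %p (see the shadow mapping
///   ;; parameters below), %vs is the shadow of %v.
///   store i64 %vs, ptr %sp
///   store i64 %v, ptr %p
/// while a use of %v in, say, a branch condition is preceded by a check of
/// %vs that calls one of the __msan_warning* functions if it is non-zero.
///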
/// But there are differences too. The first and the major one:
/// compiler instrumentation instead of binary instrumentation. This
/// gives us much better register allocation, possible compiler
/// optimizations and a fast start-up. But this brings the major issue
/// as well: msan needs to see all program events, including system
/// calls and reads/writes in system libraries, so we either need to
/// compile *everything* with msan or use a binary translation
/// component (e.g. DynamoRIO) to instrument pre-built libraries.
/// Another difference from Memcheck is that we use 8 shadow bits per
/// byte of application memory and use a direct shadow mapping. This
/// greatly simplifies the instrumentation code and avoids races on
/// shadow updates (Memcheck is single-threaded so races are not a
/// concern there. Memcheck uses 2 shadow bits per byte with a slow
/// path storage that uses 8 bits per byte).
///
/// The default value of shadow is 0, which means "clean" (not poisoned).
///
/// Every module initializer should call __msan_init to ensure that the
/// shadow memory is ready. On error, __msan_warning is called. Since
/// parameters and return values may be passed via registers, we have a
/// specialized thread-local shadow for return values
/// (__msan_retval_tls) and parameters (__msan_param_tls).
///
/// Origin tracking.
///
/// MemorySanitizer can track origins (allocation points) of all uninitialized
/// values. This behavior is controlled with a flag (msan-track-origins) and is
/// disabled by default.
///
/// Origins are 4-byte values created and interpreted by the runtime library.
/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
/// of application memory. Propagation of origins is basically a bunch of
/// "select" instructions that pick the origin of a dirty argument, if an
/// instruction has one.
///
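/// As a sketch (illustrative names, not the exact IR), for %c = add %a, %b
/// the origin could be combined as
///   %c_origin = select (icmp ne %b_shadow, 0), %b_origin, %a_origin
/// i.e. the origin of an uninitialized operand is the one that propagates.
///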
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
///
/// Origins are meaningless for fully initialized values, so MemorySanitizer
/// avoids storing origin to memory when a fully initialized value is stored.
/// This way it avoids needlessly overwriting the origin of the 4-byte region
/// on a short (i.e. 1-byte) clean store, and it is also good for performance.
///
/// Atomic handling.
///
/// Ideally, every atomic store of an application value should update the
/// corresponding shadow location in an atomic way. Unfortunately, an atomic
/// store to two disjoint locations cannot be done without severe slowdown.
///
/// Therefore, we implement an approximation that may err on the safe side.
/// In this implementation, every atomically accessed location in the program
/// may only change from (partially) uninitialized to fully initialized, but
/// not the other way around. We load the shadow _after_ the application load,
/// and we store the shadow _before_ the app store. Also, we always store clean
/// shadow (if the application store is atomic). This way, if the store-load
/// pair constitutes a happens-before arc, shadow store and load are correctly
/// ordered such that the load will get either the value that was stored, or
/// some later value (which is always clean).
///
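/// Illustratively (simplified, not the exact emitted sequence), an atomic
/// store and a later atomic load of the same location are instrumented as
///   store <clean shadow> -> shadow(%p)    ; shadow store precedes the
///   store atomic %v, ptr %p               ; application store
///   ...
///   %v = load atomic ptr %p               ; shadow load follows the
///   %s = load shadow(%p)                  ; application load
/// where shadow(%p) stands for the computed shadow address of %p.
///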
/// This does not work very well with Compare-And-Swap (CAS) and
/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
/// must store the new shadow before the app operation, and load the shadow
/// after the app operation. Computers don't work this way. The current
/// implementation ignores the load aspect of CAS/RMW, always returning a clean
/// value. It implements the store part as a simple atomic store by storing a
/// clean shadow.
///
/// Instrumenting inline assembly.
///
/// For inline assembly code LLVM has little idea about which memory locations
/// become initialized depending on the arguments. It may be possible to figure
/// out which arguments are meant to point to inputs and outputs, but the
/// actual semantics may only be visible at runtime. In the Linux kernel it's
/// also possible that the arguments only indicate the offset for a base taken
/// from a segment register, so it's dangerous to treat any asm() arguments as
/// pointers. We take a conservative approach and generate calls to
///   __msan_instrument_asm_store(ptr, size),
/// which defer the memory unpoisoning to the runtime library.
/// The latter can perform more complex address checks to figure out whether
/// it's safe to touch the shadow memory.
/// Like with atomic operations, we call __msan_instrument_asm_store() before
/// the assembly call, so that changes to the shadow memory will be seen by
/// other threads together with main memory initialization.
///
/// KernelMemorySanitizer (KMSAN) implementation.
///
/// The major differences between KMSAN and MSan instrumentation are:
/// - KMSAN always tracks the origins and implies msan-keep-going=true;
/// - KMSAN allocates shadow and origin memory for each page separately, so
///   there are no explicit accesses to shadow and origin in the
///   instrumentation.
///   Shadow and origin values for a particular X-byte memory location
///   (X=1,2,4,8) are accessed through pointers obtained via the
///     __msan_metadata_ptr_for_load_X(ptr)
///     __msan_metadata_ptr_for_store_X(ptr)
///   functions. The corresponding functions check that the X-byte accesses
///   are possible and return the pointers to shadow and origin memory.
///   Arbitrary sized accesses are handled with:
///     __msan_metadata_ptr_for_load_n(ptr, size)
///     __msan_metadata_ptr_for_store_n(ptr, size);
///   Note that the sanitizer code has to deal with how shadow/origin pairs
///   returned by these functions are represented in different ABIs. In
///   the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
///   returned in r3 and r4, and in the SystemZ ABI they are written to memory
///   pointed to by a hidden parameter.
/// - TLS variables are stored in a single per-task struct. A call to a
///   function __msan_get_context_state() returning a pointer to that struct
///   is inserted into every instrumented function before the entry block;
/// - __msan_warning() takes a 32-bit origin parameter;
/// - local variables are poisoned with __msan_poison_alloca() upon function
///   entry and unpoisoned with __msan_unpoison_alloca() before leaving the
///   function;
/// - the pass doesn't declare any global variables or add global constructors
///   to the translation unit.
///
/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
/// calls, making sure we're on the safe side wrt. possible false positives.
///
/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
/// moment.
///
//
// FIXME: This sanitizer does not yet handle scalable vectors
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Alignment.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Instrumentation.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <string>
#include <tuple>

using namespace llvm;

#define DEBUG_TYPE "msan"

DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
              "Controls which checks to insert");

DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
              "Controls which instruction to instrument");

static const unsigned kOriginSize = 4;
static const Align kMinOriginAlignment = Align(4);
static const Align kShadowTLSAlignment = Align(8);

// These constants must be kept in sync with the ones in msan.h.
// TODO: increase size to match SVE/SVE2/SME/SME2 limits
static const unsigned kParamTLSSize = 800;
static const unsigned kRetvalTLSSize = 800;

// Access sizes are powers of two: 1, 2, 4, 8.
static const size_t kNumberOfAccessSizes = 4;

/// Track origins of uninitialized values.
///
/// Adds a section to MemorySanitizer report that points to the allocation
/// (stack or heap) the uninitialized bits came from originally.
static cl::opt<int> ClTrackOrigins(
    "msan-track-origins",
    cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
    cl::init(0));

static cl::opt<bool> ClKeepGoing("msan-keep-going",
                                 cl::desc("keep going after reporting a UMR"),
                                 cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClPoisonStack("msan-poison-stack",
                  cl::desc("poison uninitialized stack variables"), cl::Hidden,
                  cl::init(true));

static cl::opt<bool> ClPoisonStackWithCall(
    "msan-poison-stack-with-call",
    cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
    cl::init(false));

static cl::opt<int> ClPoisonStackPattern(
    "msan-poison-stack-pattern",
    cl::desc("poison uninitialized stack variables with the given pattern"),
    cl::Hidden, cl::init(0xff));

static cl::opt<bool>
    ClPrintStackNames("msan-print-stack-names",
                      cl::desc("Print name of local stack variable"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClPoisonUndef("msan-poison-undef",
                  cl::desc("Poison fully undef temporary values. "
                           "Partially undefined constant vectors "
                           "are unaffected by this flag (see "
                           "-msan-poison-undef-vectors)."),
                  cl::Hidden, cl::init(true));

static cl::opt<bool> ClPoisonUndefVectors(
    "msan-poison-undef-vectors",
    cl::desc("Precisely poison partially undefined constant vectors. "
             "If false (legacy behavior), the entire vector is "
             "considered fully initialized, which may lead to false "
             "negatives. Fully undefined constant vectors are "
             "unaffected by this flag (see -msan-poison-undef)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClPreciseDisjointOr(
    "msan-precise-disjoint-or",
    cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
             "disjointedness is ignored (i.e., 1|1 is initialized)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClHandleICmp("msan-handle-icmp",
                 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
                 cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClHandleICmpExact("msan-handle-icmp-exact",
                      cl::desc("exact handling of relational integer ICmp"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool> ClHandleLifetimeIntrinsics(
    "msan-handle-lifetime-intrinsics",
    cl::desc(
        "when possible, poison scoped variables at the beginning of the scope "
        "(slower, but more precise)"),
    cl::Hidden, cl::init(true));

// When compiling the Linux kernel, we sometimes see false positives related to
// MSan being unable to understand that inline assembly calls may initialize
// local variables.
// This flag makes the compiler conservatively unpoison every memory location
// passed into an assembly call. Note that this may cause false positives.
// Because it's impossible to figure out the array sizes, we can only unpoison
// the first sizeof(type) bytes for each type* pointer.
static cl::opt<bool> ClHandleAsmConservative(
    "msan-handle-asm-conservative",
    cl::desc("conservative handling of inline assembly"), cl::Hidden,
    cl::init(true));

// This flag controls whether we check the shadow of the address
// operand of load or store. Such bugs are very rare, since load from
// a garbage address typically results in SEGV, but still happen
// (e.g. only lower bits of the address are garbage, or the access happens
// early at program startup where malloc-ed memory is more likely to
// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
static cl::opt<bool> ClCheckAccessAddress(
    "msan-check-access-address",
    cl::desc("report accesses through a pointer which has poisoned shadow"),
    cl::Hidden, cl::init(true));

static cl::opt<bool> ClEagerChecks(
    "msan-eager-checks",
    cl::desc("check arguments and return values at function call boundaries"),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClDumpStrictInstructions(
    "msan-dump-strict-instructions",
    cl::desc("print out instructions with default strict semantics, i.e., "
             "check that all the inputs are fully initialized, and mark "
             "the output as fully initialized. These semantics are applied "
             "to instructions that could not be handled explicitly nor "
             "heuristically."),
    cl::Hidden, cl::init(false));

// Currently, all the heuristically handled instructions are specifically
// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
// to parallel 'msan-dump-strict-instructions', and to keep the door open to
// handling non-intrinsic instructions heuristically.
static cl::opt<bool> ClDumpHeuristicInstructions(
    "msan-dump-heuristic-instructions",
    cl::desc("Prints 'unknown' instructions that were handled heuristically. "
             "Use -msan-dump-strict-instructions to print instructions that "
             "could not be handled explicitly nor heuristically."),
    cl::Hidden, cl::init(false));

static cl::opt<int> ClInstrumentationWithCallThreshold(
    "msan-instrumentation-with-call-threshold",
    cl::desc(
        "If the function being instrumented requires more than "
        "this number of checks and origin stores, use callbacks instead of "
        "inline checks (-1 means never use callbacks)."),
    cl::Hidden, cl::init(3500));

static cl::opt<bool>
    ClEnableKmsan("msan-kernel",
                  cl::desc("Enable KernelMemorySanitizer instrumentation"),
                  cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClDisableChecks("msan-disable-checks",
                    cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
                    cl::init(false));

static cl::opt<bool>
    ClCheckConstantShadow("msan-check-constant-shadow",
                          cl::desc("Insert checks for constant shadow values"),
                          cl::Hidden, cl::init(true));

// This is off by default because of a bug in gold:
// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
static cl::opt<bool>
    ClWithComdat("msan-with-comdat",
                 cl::desc("Place MSan constructors in comdat sections"),
                 cl::Hidden, cl::init(false));

// These options allow specifying custom memory map parameters.
// See MemoryMapParams for details.
static cl::opt<uint64_t> ClAndMask("msan-and-mask",
                                   cl::desc("Define custom MSan AndMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
                                   cl::desc("Define custom MSan XorMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
                                      cl::desc("Define custom MSan ShadowBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
                                      cl::desc("Define custom MSan OriginBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<int>
    ClDisambiguateWarning("msan-disambiguate-warning-threshold",
                          cl::desc("Define threshold for number of checks per "
                                   "debug location to force origin update."),
                          cl::Hidden, cl::init(3));

const char kMsanModuleCtorName[] = "msan.module_ctor";
const char kMsanInitName[] = "__msan_init";

namespace {

// Memory map parameters used in application-to-shadow address calculation.
// Offset = (Addr & ~AndMask) ^ XorMask
// Shadow = ShadowBase + Offset
// Origin = OriginBase + Offset
struct MemoryMapParams {
  uint64_t AndMask;
  uint64_t XorMask;
  uint64_t ShadowBase;
  uint64_t OriginBase;
};

struct PlatformMemoryMapParams {
  const MemoryMapParams *bits32;
  const MemoryMapParams *bits64;
};

} // end anonymous namespace

// i386 Linux
static const MemoryMapParams Linux_I386_MemoryMapParams = {
    0x000080000000, // AndMask
    0,              // XorMask (not used)
    0,              // ShadowBase (not used)
    0x000040000000, // OriginBase
};

// x86_64 Linux
static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};
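// For example (illustrative arithmetic only), with the x86_64 Linux
// parameters above and the formulas from MemoryMapParams:
//   Shadow(Addr) = Addr ^ 0x500000000000
//   Origin(Addr) = (Addr ^ 0x500000000000) + 0x100000000000
// so Shadow(0x7fff80001000) = 0x2fff80001000 and
//    Origin(0x7fff80001000) = 0x3fff80001000.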

// mips32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// mips64 Linux
static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x008000000000, // XorMask
    0,              // ShadowBase (not used)
    0x002000000000, // OriginBase
};

// ppc32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// ppc64 Linux
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
    0xE00000000000, // AndMask
    0x100000000000, // XorMask
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// s390x Linux
static const MemoryMapParams Linux_S390X_MemoryMapParams = {
    0xC00000000000, // AndMask
    0,              // XorMask (not used)
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// arm32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 Linux
static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
    0,               // AndMask (not used)
    0x0B00000000000, // XorMask
    0,               // ShadowBase (not used)
    0x0200000000000, // OriginBase
};

// loongarch64 Linux
static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};

// riscv32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 FreeBSD
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
    0x1800000000000, // AndMask
    0x0400000000000, // XorMask
    0x0200000000000, // ShadowBase
    0x0700000000000, // OriginBase
};

// i386 FreeBSD
static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
    0x000180000000, // AndMask
    0x000040000000, // XorMask
    0x000020000000, // ShadowBase
    0x000700000000, // OriginBase
};

// x86_64 FreeBSD
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
    0xc00000000000, // AndMask
    0x200000000000, // XorMask
    0x100000000000, // ShadowBase
    0x380000000000, // OriginBase
};

// x86_64 NetBSD
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
    0,              // AndMask
    0x500000000000, // XorMask
    0,              // ShadowBase
    0x100000000000, // OriginBase
};

static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
    &Linux_I386_MemoryMapParams,
    &Linux_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
    nullptr,
    &Linux_MIPS64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
    nullptr,
    &Linux_PowerPC64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
    nullptr,
    &Linux_S390X_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
    nullptr,
    &Linux_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
    nullptr,
    &Linux_LoongArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
    nullptr,
    &FreeBSD_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
    &FreeBSD_I386_MemoryMapParams,
    &FreeBSD_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
    nullptr,
    &NetBSD_X86_64_MemoryMapParams,
};

enum OddOrEvenLanes { kBothLanes, kEvenLanes, kOddLanes };

namespace {

/// Instrument functions of a module to detect uninitialized reads.
///
/// Instantiating MemorySanitizer inserts the msan runtime library API function
/// declarations into the module if they don't exist already. Instantiating
/// ensures the __msan_init function is in the list of global constructors for
/// the module.
class MemorySanitizer {
public:
  MemorySanitizer(Module &M, MemorySanitizerOptions Options)
      : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
        Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
    initializeModule(M);
  }

  // MSan cannot be moved or copied because of MapParams.
  MemorySanitizer(MemorySanitizer &&) = delete;
  MemorySanitizer &operator=(MemorySanitizer &&) = delete;
  MemorySanitizer(const MemorySanitizer &) = delete;
  MemorySanitizer &operator=(const MemorySanitizer &) = delete;

  bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);

private:
  friend struct MemorySanitizerVisitor;
  friend struct VarArgHelperBase;
  friend struct VarArgAMD64Helper;
  friend struct VarArgAArch64Helper;
  friend struct VarArgPowerPC64Helper;
  friend struct VarArgPowerPC32Helper;
  friend struct VarArgSystemZHelper;
  friend struct VarArgI386Helper;
  friend struct VarArgGenericHelper;

  void initializeModule(Module &M);
  void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
  void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
  void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);

  template <typename... ArgsTy>
  FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args);

  /// True if we're compiling the Linux kernel.
  bool CompileKernel;
  /// Track origins (allocation points) of uninitialized values.
  int TrackOrigins;
  bool Recover;
  bool EagerChecks;

  Triple TargetTriple;
  LLVMContext *C;
  Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
  Type *OriginTy;
  PointerType *PtrTy; ///< Pointer type in the default address space.

  // XxxTLS variables represent the per-thread state in MSan and per-task state
  // in KMSAN.
  // For the userspace these point to thread-local globals. In the kernel land
  // they point to the members of a per-task struct obtained via a call to
  // __msan_get_context_state().

  /// Thread-local shadow storage for function parameters.
  Value *ParamTLS;

  /// Thread-local origin storage for function parameters.
  Value *ParamOriginTLS;

  /// Thread-local shadow storage for function return value.
  Value *RetvalTLS;

  /// Thread-local origin storage for function return value.
  Value *RetvalOriginTLS;

  /// Thread-local shadow storage for in-register va_arg function.
  Value *VAArgTLS;

  /// Thread-local origin storage for in-register va_arg function.
  Value *VAArgOriginTLS;

  /// Thread-local shadow storage for va_arg overflow area.
  Value *VAArgOverflowSizeTLS;

  /// Are the instrumentation callbacks set up?
  bool CallbacksInitialized = false;

  /// The run-time callback to print a warning.
  FunctionCallee WarningFn;

  // These arrays are indexed by log2(AccessSize).
  FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
  FunctionCallee MaybeWarningVarSizeFn;
  FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];

  /// Run-time helper that generates a new origin value for a stack
  /// allocation.
  FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
  // No description version
  FunctionCallee MsanSetAllocaOriginNoDescriptionFn;

  /// Run-time helper that poisons stack on function entry.
  FunctionCallee MsanPoisonStackFn;

  /// Run-time helper that records a store (or any event) of an
  /// uninitialized value and returns an updated origin id encoding this info.
  FunctionCallee MsanChainOriginFn;

  /// Run-time helper that paints an origin over a region.
  FunctionCallee MsanSetOriginFn;

  /// MSan runtime replacements for memmove, memcpy and memset.
  FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;

  /// KMSAN callback for task-local function argument shadow.
  StructType *MsanContextStateTy;
  FunctionCallee MsanGetContextStateFn;

  /// Functions for poisoning/unpoisoning local variables
  FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;

  /// Pair of shadow/origin pointers.
  Type *MsanMetadata;

  /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
  FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
  FunctionCallee MsanMetadataPtrForLoad_1_8[4];
  FunctionCallee MsanMetadataPtrForStore_1_8[4];
  FunctionCallee MsanInstrumentAsmStoreFn;

  /// Storage for return values of the MsanMetadataPtrXxx functions.
  Value *MsanMetadataAlloca;

  /// Helper to choose between different MsanMetadataPtrXxx().
  FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);

  /// Memory map parameters used in application-to-shadow calculation.
  const MemoryMapParams *MapParams;

  /// Custom memory map parameters used when -msan-shadow-base or
  /// -msan-origin-base is provided.
  MemoryMapParams CustomMapParams;

  MDNode *ColdCallWeights;

  /// Branch weights for origin store.
  MDNode *OriginStoreWeights;
};

void insertModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, kMsanModuleCtorName, kMsanInitName,
      /*InitArgTypes=*/{},
      /*InitArgs=*/{},
      // This callback is invoked when the functions are created the first
      // time. Hook them into the global ctors list in that case:
      [&](Function *Ctor, FunctionCallee) {
        if (!ClWithComdat) {
          appendToGlobalCtors(M, Ctor, 0);
          return;
        }
        Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
        Ctor->setComdat(MsanCtorComdat);
        appendToGlobalCtors(M, Ctor, 0, Ctor);
      });
}

template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
  return (Opt.getNumOccurrences() > 0) ? Opt : Default;
}

} // end anonymous namespace

MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
                                               bool EagerChecks)
    : Kernel(getOptOrDefault(ClEnableKmsan, K)),
      TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
      Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
      EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}

PreservedAnalyses MemorySanitizerPass::run(Module &M,
                                           ModuleAnalysisManager &AM) {
  // Return early if nosanitize_memory module flag is present for the module.
  if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
    return PreservedAnalyses::all();
  bool Modified = false;
  if (!Options.Kernel) {
    insertModuleCtor(M);
    Modified = true;
  }

  auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
  for (Function &F : M) {
    if (F.empty())
      continue;
    MemorySanitizer Msan(*F.getParent(), Options);
    Modified |=
        Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
  }

  if (!Modified)
    return PreservedAnalyses::all();

  PreservedAnalyses PA = PreservedAnalyses::none();
  // GlobalsAA is considered stateless and does not get invalidated unless
  // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
  // make changes that require GlobalsAA to be invalidated.
  PA.abandon<GlobalsAA>();
  return PA;
}

void MemorySanitizerPass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
  static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
      OS, MapClassName2PassName);
  OS << '<';
  if (Options.Recover)
    OS << "recover;";
  if (Options.Kernel)
    OS << "kernel;";
  if (Options.EagerChecks)
    OS << "eager-checks;";
  OS << "track-origins=" << Options.TrackOrigins;
  OS << '>';
}

/// Create a non-const global initialized with the given string.
///
/// Creates a writable global for Str so that we can pass it to the
/// run-time lib. Runtime uses first 4 bytes of the string to store the
/// frame ID, so the string needs to be mutable.
static GlobalVariable *createPrivateConstGlobalForString(Module &M,
                                                         StringRef Str) {
  Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
  return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
                            GlobalValue::PrivateLinkage, StrConst, "");
}

template <typename... ArgsTy>
FunctionCallee
MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args) {
  if (TargetTriple.getArch() == Triple::systemz) {
    // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
    return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
                                 std::forward<ArgsTy>(Args)...);
  }

  return M.getOrInsertFunction(Name, MsanMetadata,
                               std::forward<ArgsTy>(Args)...);
}

/// Create KMSAN API callbacks.
void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
  IRBuilder<> IRB(*C);

  // These will be initialized in insertKmsanPrologue().
  RetvalTLS = nullptr;
  RetvalOriginTLS = nullptr;
  ParamTLS = nullptr;
  ParamOriginTLS = nullptr;
  VAArgTLS = nullptr;
  VAArgOriginTLS = nullptr;
  VAArgOverflowSizeTLS = nullptr;

  WarningFn = M.getOrInsertFunction("__msan_warning",
                                    TLI.getAttrList(C, {0}, /*Signed=*/false),
                                    IRB.getVoidTy(), IRB.getInt32Ty());

  // Requests the per-task context state (kmsan_context_state*) from the
  // runtime library.
  MsanContextStateTy = StructType::get(
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
      IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
      OriginTy);
  MsanGetContextStateFn =
      M.getOrInsertFunction("__msan_get_context_state", PtrTy);

  MsanMetadata = StructType::get(PtrTy, PtrTy);

  for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
    std::string name_load =
        "__msan_metadata_ptr_for_load_" + std::to_string(size);
    std::string name_store =
        "__msan_metadata_ptr_for_store_" + std::to_string(size);
    MsanMetadataPtrForLoad_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
    MsanMetadataPtrForStore_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
  }

  MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
  MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);

  // Functions for poisoning and unpoisoning memory.
  MsanPoisonAllocaFn = M.getOrInsertFunction(
      "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
  MsanUnpoisonAllocaFn = M.getOrInsertFunction(
      "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
}

static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
  return M.getOrInsertGlobal(Name, Ty, [&] {
    return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
                              nullptr, Name, nullptr,
                              GlobalVariable::InitialExecTLSModel);
  });
}

/// Insert declarations for userspace-specific functions and globals.
void MemorySanitizer::createUserspaceApi(Module &M,
                                         const TargetLibraryInfo &TLI) {
  IRBuilder<> IRB(*C);

  // Create the callback.
  // FIXME: this function should have "Cold" calling conv,
  // which is not yet implemented.
  if (TrackOrigins) {
    StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
                                      : "__msan_warning_with_origin_noreturn";
    WarningFn = M.getOrInsertFunction(WarningFnName,
                                      TLI.getAttrList(C, {0}, /*Signed=*/false),
                                      IRB.getVoidTy(), IRB.getInt32Ty());
  } else {
    StringRef WarningFnName =
        Recover ? "__msan_warning" : "__msan_warning_noreturn";
    WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
  }

  // Create the global TLS variables.
  RetvalTLS =
      getOrInsertGlobal(M, "__msan_retval_tls",
                        ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));

  RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);

  ParamTLS =
      getOrInsertGlobal(M, "__msan_param_tls",
                        ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));

  ParamOriginTLS =
      getOrInsertGlobal(M, "__msan_param_origin_tls",
                        ArrayType::get(OriginTy, kParamTLSSize / 4));

  VAArgTLS =
      getOrInsertGlobal(M, "__msan_va_arg_tls",
                        ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));

  VAArgOriginTLS =
      getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
                        ArrayType::get(OriginTy, kParamTLSSize / 4));

  VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
                                           IRB.getIntPtrTy(M.getDataLayout()));

  for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
       AccessSizeIndex++) {
    unsigned AccessSize = 1 << AccessSizeIndex;
    std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
    MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
        FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
        IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
    MaybeWarningVarSizeFn = M.getOrInsertFunction(
        "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
        IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
    FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
    MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
        FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
        IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
        IRB.getInt32Ty());
  }

  MsanSetAllocaOriginWithDescriptionFn =
      M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
                            IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
  MsanSetAllocaOriginNoDescriptionFn =
      M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
                            IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
  MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
                                            IRB.getVoidTy(), PtrTy, IntptrTy);
}

/// Insert extern declaration of runtime-provided functions and globals.
void MemorySanitizer::initializeCallbacks(Module &M,
                                          const TargetLibraryInfo &TLI) {
  // Only do this once.
  if (CallbacksInitialized)
    return;

  IRBuilder<> IRB(*C);
  // Initialize callbacks that are common for kernel and userspace
  // instrumentation.
  MsanChainOriginFn = M.getOrInsertFunction(
      "__msan_chain_origin",
      TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
      IRB.getInt32Ty());
  MsanSetOriginFn = M.getOrInsertFunction(
      "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
      IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
  MemmoveFn =
      M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
  MemcpyFn =
      M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
  MemsetFn = M.getOrInsertFunction("__msan_memset",
                                   TLI.getAttrList(C, {1}, /*Signed=*/true),
                                   PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);

  MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
      "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);

  if (CompileKernel) {
    createKernelApi(M, TLI);
  } else {
    createUserspaceApi(M, TLI);
  }
  CallbacksInitialized = true;
}

FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
                                                             int size) {
  FunctionCallee *Fns =
      isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
  switch (size) {
  case 1:
    return Fns[0];
  case 2:
    return Fns[1];
  case 4:
    return Fns[2];
  case 8:
    return Fns[3];
  default:
    return nullptr;
  }
}

/// Module-level initialization.
///
/// Inserts a call to __msan_init into the module's constructor list.
void MemorySanitizer::initializeModule(Module &M) {
  auto &DL = M.getDataLayout();

  TargetTriple = M.getTargetTriple();

  bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
  bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
  // Check the overrides first
  if (ShadowPassed || OriginPassed) {
    CustomMapParams.AndMask = ClAndMask;
    CustomMapParams.XorMask = ClXorMask;
    CustomMapParams.ShadowBase = ClShadowBase;
    CustomMapParams.OriginBase = ClOriginBase;
    MapParams = &CustomMapParams;
  } else {
    switch (TargetTriple.getOS()) {
    case Triple::FreeBSD:
      switch (TargetTriple.getArch()) {
      case Triple::aarch64:
        MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
        break;
      case Triple::x86_64:
        MapParams = FreeBSD_X86_MemoryMapParams.bits64;
        break;
      case Triple::x86:
        MapParams = FreeBSD_X86_MemoryMapParams.bits32;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    case Triple::NetBSD:
      switch (TargetTriple.getArch()) {
      case Triple::x86_64:
        MapParams = NetBSD_X86_MemoryMapParams.bits64;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    case Triple::Linux:
      switch (TargetTriple.getArch()) {
      case Triple::x86_64:
        MapParams = Linux_X86_MemoryMapParams.bits64;
        break;
      case Triple::x86:
        MapParams = Linux_X86_MemoryMapParams.bits32;
        break;
      case Triple::mips64:
      case Triple::mips64el:
        MapParams = Linux_MIPS_MemoryMapParams.bits64;
        break;
      case Triple::ppc64:
      case Triple::ppc64le:
        MapParams = Linux_PowerPC_MemoryMapParams.bits64;
        break;
      case Triple::systemz:
        MapParams = Linux_S390_MemoryMapParams.bits64;
        break;
      case Triple::aarch64:
      case Triple::aarch64_be:
        MapParams = Linux_ARM_MemoryMapParams.bits64;
        break;
      case Triple::loongarch64:
        MapParams = Linux_LoongArch_MemoryMapParams.bits64;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    default:
      report_fatal_error("unsupported operating system");
    }
  }

  C = &(M.getContext());
  IRBuilder<> IRB(*C);
  IntptrTy = IRB.getIntPtrTy(DL);
  OriginTy = IRB.getInt32Ty();
  PtrTy = IRB.getPtrTy();

  ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
  OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();

  if (!CompileKernel) {
    if (TrackOrigins)
      M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
        return new GlobalVariable(
            M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
            IRB.getInt32(TrackOrigins), "__msan_track_origins");
      });

    if (Recover)
      M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
        return new GlobalVariable(M, IRB.getInt32Ty(), true,
                                  GlobalValue::WeakODRLinkage,
                                  IRB.getInt32(Recover), "__msan_keep_going");
      });
  }
}

namespace {

/// A helper class that handles instrumentation of VarArg
/// functions on a particular platform.
///
/// Implementations are expected to insert the instrumentation
/// necessary to propagate argument shadow through VarArg function
/// calls. Visit* methods are called during an InstVisitor pass over
/// the function, and should avoid creating new basic blocks. A new
/// instance of this class is created for each instrumented function.
struct VarArgHelper {
  virtual ~VarArgHelper() = default;

  /// Visit a CallBase.
  virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;

  /// Visit a va_start call.
  virtual void visitVAStartInst(VAStartInst &I) = 0;

  /// Visit a va_copy call.
  virtual void visitVACopyInst(VACopyInst &I) = 0;

  /// Finalize function instrumentation.
  ///
  /// This method is called after visiting all interesting (see above)
  /// instructions in a function.
  virtual void finalizeInstrumentation() = 0;
};

struct MemorySanitizerVisitor;

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor);

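// Maps a shadow type size in bits to an index into the MaybeWarningFn /
// MaybeStoreOriginFn arrays: sizes up to 8, 16, 32 and 64 bits map to indices
// 0..3 (access sizes 1, 2, 4 and 8 bytes). For example, 32 bits rounds up to
// (32 + 7) / 8 = 4 bytes and Log2_32_Ceil(4) == 2. Larger fixed sizes and all
// scalable sizes yield kNumberOfAccessSizes, i.e. the out-of-range marker that
// forces the generic handling.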
static unsigned TypeSizeToSizeIndex(TypeSize TS) {
  if (TS.isScalable())
    // Scalable types unconditionally take slowpaths.
    return kNumberOfAccessSizes;
  unsigned TypeSizeFixed = TS.getFixedValue();
  if (TypeSizeFixed <= 8)
    return 0;
  return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
}

namespace {

/// Helper class to attach debug information of the given instruction onto new
/// instructions inserted after.
class NextNodeIRBuilder : public IRBuilder<> {
public:
  explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
    SetCurrentDebugLocation(IP->getDebugLoc());
  }
};

/// This class does all the work for a given function. Store and Load
/// instructions store and load corresponding shadow and origin
/// values. Most instructions propagate shadow from arguments to their
/// return values. Certain instructions (most importantly, BranchInst)
/// test their argument shadow and print reports (with a runtime call) if it's
/// non-zero.
struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
  Function &F;
  MemorySanitizer &MS;
  SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
  ValueMap<Value *, Value *> ShadowMap, OriginMap;
  std::unique_ptr<VarArgHelper> VAHelper;
  const TargetLibraryInfo *TLI;
  Instruction *FnPrologueEnd;
  SmallVector<Instruction *, 16> Instructions;

  // The following flags disable parts of MSan instrumentation based on
  // exclusion list contents and command-line options.
  bool InsertChecks;
  bool PropagateShadow;
  bool PoisonStack;
  bool PoisonUndef;
  bool PoisonUndefVectors;

  struct ShadowOriginAndInsertPoint {
    Value *Shadow;
    Value *Origin;
    Instruction *OrigIns;

    ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
        : Shadow(S), Origin(O), OrigIns(I) {}
  };
  SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
  DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
  SmallSetVector<AllocaInst *, 16> AllocaSet;
  SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
  SmallVector<StoreInst *, 16> StoreList;
  int64_t SplittableBlocksCount = 0;

  MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
                         const TargetLibraryInfo &TLI)
      : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
    bool SanitizeFunction =
        F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
    InsertChecks = SanitizeFunction;
    PropagateShadow = SanitizeFunction;
    PoisonStack = SanitizeFunction && ClPoisonStack;
    PoisonUndef = SanitizeFunction && ClPoisonUndef;
    PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;

    // In the presence of unreachable blocks, we may see Phi nodes with
    // incoming nodes from such blocks. Since InstVisitor skips unreachable
    // blocks, such nodes will not have any shadow value associated with them.
    // It's easier to remove unreachable blocks than deal with missing shadow.
    removeUnreachableBlocks(F);

    MS.initializeCallbacks(*F.getParent(), TLI);
    FnPrologueEnd =
        IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
            .CreateIntrinsic(Intrinsic::donothing, {});

    if (MS.CompileKernel) {
      IRBuilder<> IRB(FnPrologueEnd);
      insertKmsanPrologue(IRB);
    }

    LLVM_DEBUG(if (!InsertChecks) dbgs()
               << "MemorySanitizer is not inserting checks into '"
               << F.getName() << "'\n");
  }

  bool instrumentWithCalls(Value *V) {
    // Constants likely will be eliminated by follow-up passes.
    if (isa<Constant>(V))
      return false;
    ++SplittableBlocksCount;
    return ClInstrumentationWithCallThreshold >= 0 &&
           SplittableBlocksCount > ClInstrumentationWithCallThreshold;
  }

  bool isInPrologue(Instruction &I) {
    return I.getParent() == FnPrologueEnd->getParent() &&
           (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
  }

  // Creates a new origin and records the stack trace. In general we can call
  // this function for any origin manipulation we like. However it will cost
  // runtime resources. So use this wisely only if it can provide additional
  // information helpful to a user.
  Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
    if (MS.TrackOrigins <= 1)
      return V;
    return IRB.CreateCall(MS.MsanChainOriginFn, V);
  }

  Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
    const DataLayout &DL = F.getDataLayout();
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    if (IntptrSize == kOriginSize)
      return Origin;
    assert(IntptrSize == kOriginSize * 2);
    Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
    return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
  }

  /// Fill memory range with the given origin value.
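  /// (For example, with 8-byte pointers and sufficient alignment, a 16-byte
  /// region gets the origin replicated into an intptr via originToIntptr and
  /// is covered by two 8-byte stores instead of four 4-byte ones.)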
1293 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1294 TypeSize TS, Align Alignment) {
1295 const DataLayout &DL = F.getDataLayout();
1296 const Align IntptrAlignment = DL.getABITypeAlign(Ty: MS.IntptrTy);
1297 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
1298 assert(IntptrAlignment >= kMinOriginAlignment);
1299 assert(IntptrSize >= kOriginSize);
1300
1301 // Note: The loop based formation works for fixed length vectors too,
1302 // however we prefer to unroll and specialize alignment below.
1303 if (TS.isScalable()) {
1304 Value *Size = IRB.CreateTypeSize(Ty: MS.IntptrTy, Size: TS);
1305 Value *RoundUp =
1306 IRB.CreateAdd(LHS: Size, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kOriginSize - 1));
1307 Value *End =
1308 IRB.CreateUDiv(LHS: RoundUp, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kOriginSize));
1309 auto [InsertPt, Index] =
1310 SplitBlockAndInsertSimpleForLoop(End, SplitBefore: IRB.GetInsertPoint());
1311 IRB.SetInsertPoint(InsertPt);
1312
1313 Value *GEP = IRB.CreateGEP(Ty: MS.OriginTy, Ptr: OriginPtr, IdxList: Index);
1314 IRB.CreateAlignedStore(Val: Origin, Ptr: GEP, Align: kMinOriginAlignment);
1315 return;
1316 }
1317
1318 unsigned Size = TS.getFixedValue();
1319
1320 unsigned Ofs = 0;
1321 Align CurrentAlignment = Alignment;
1322 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1323 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1324 Value *IntptrOriginPtr = IRB.CreatePointerCast(V: OriginPtr, DestTy: MS.PtrTy);
1325 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1326 Value *Ptr = i ? IRB.CreateConstGEP1_32(Ty: MS.IntptrTy, Ptr: IntptrOriginPtr, Idx0: i)
1327 : IntptrOriginPtr;
1328 IRB.CreateAlignedStore(Val: IntptrOrigin, Ptr, Align: CurrentAlignment);
1329 Ofs += IntptrSize / kOriginSize;
1330 CurrentAlignment = IntptrAlignment;
1331 }
1332 }
1333
1334 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1335 Value *GEP =
1336 i ? IRB.CreateConstGEP1_32(Ty: MS.OriginTy, Ptr: OriginPtr, Idx0: i) : OriginPtr;
1337 IRB.CreateAlignedStore(Val: Origin, Ptr: GEP, Align: CurrentAlignment);
1338 CurrentAlignment = kMinOriginAlignment;
1339 }
1340 }
1341
1342 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1343 Value *OriginPtr, Align Alignment) {
1344 const DataLayout &DL = F.getDataLayout();
1345 const Align OriginAlignment = std::max(a: kMinOriginAlignment, b: Alignment);
1346 TypeSize StoreSize = DL.getTypeStoreSize(Ty: Shadow->getType());
1347 // ZExt cannot convert between vector and scalar
1348 Value *ConvertedShadow = convertShadowToScalar(V: Shadow, IRB);
1349 if (auto *ConstantShadow = dyn_cast<Constant>(Val: ConvertedShadow)) {
1350 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1351 // Origin is not needed: value is initialized or const shadow is
1352 // ignored.
1353 return;
1354 }
1355 if (llvm::isKnownNonZero(V: ConvertedShadow, Q: DL)) {
1356 // Copy origin as the value is definitely uninitialized.
1357 paintOrigin(IRB, Origin: updateOrigin(V: Origin, IRB), OriginPtr, TS: StoreSize,
1358 Alignment: OriginAlignment);
1359 return;
1360 }
1361 // Fallback to runtime check, which still can be optimized out later.
1362 }
1363
1364 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(Ty: ConvertedShadow->getType());
1365 unsigned SizeIndex = TypeSizeToSizeIndex(TS: TypeSizeInBits);
1366 if (instrumentWithCalls(V: ConvertedShadow) &&
1367 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1368 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1369 Value *ConvertedShadow2 =
1370 IRB.CreateZExt(V: ConvertedShadow, DestTy: IRB.getIntNTy(N: 8 * (1 << SizeIndex)));
1371 CallBase *CB = IRB.CreateCall(Callee: Fn, Args: {ConvertedShadow2, Addr, Origin});
1372 CB->addParamAttr(ArgNo: 0, Kind: Attribute::ZExt);
1373 CB->addParamAttr(ArgNo: 2, Kind: Attribute::ZExt);
1374 } else {
1375 Value *Cmp = convertToBool(V: ConvertedShadow, IRB, name: "_mscmp");
1376 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1377 Cond: Cmp, SplitBefore: &*IRB.GetInsertPoint(), Unreachable: false, BranchWeights: MS.OriginStoreWeights);
1378 IRBuilder<> IRBNew(CheckTerm);
1379 paintOrigin(IRB&: IRBNew, Origin: updateOrigin(V: Origin, IRB&: IRBNew), OriginPtr, TS: StoreSize,
1380 Alignment: OriginAlignment);
1381 }
1382 }
1383
1384 void materializeStores() {
1385 for (StoreInst *SI : StoreList) {
1386 IRBuilder<> IRB(SI);
1387 Value *Val = SI->getValueOperand();
1388 Value *Addr = SI->getPointerOperand();
1389 Value *Shadow = SI->isAtomic() ? getCleanShadow(V: Val) : getShadow(V: Val);
1390 Value *ShadowPtr, *OriginPtr;
1391 Type *ShadowTy = Shadow->getType();
1392 const Align Alignment = SI->getAlign();
1393 const Align OriginAlignment = std::max(a: kMinOriginAlignment, b: Alignment);
1394 std::tie(args&: ShadowPtr, args&: OriginPtr) =
1395 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1396
1397 [[maybe_unused]] StoreInst *NewSI =
1398 IRB.CreateAlignedStore(Val: Shadow, Ptr: ShadowPtr, Align: Alignment);
1399 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1400
1401 if (SI->isAtomic())
1402 SI->setOrdering(addReleaseOrdering(a: SI->getOrdering()));
1403
1404 if (MS.TrackOrigins && !SI->isAtomic())
1405 storeOrigin(IRB, Addr, Shadow, Origin: getOrigin(V: Val), OriginPtr,
1406 Alignment: OriginAlignment);
1407 }
1408 }
1409
1410 // Returns true if Debug Location corresponds to multiple warnings.
1411 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1412 if (MS.TrackOrigins < 2)
1413 return false;
1414
1415 if (LazyWarningDebugLocationCount.empty())
1416 for (const auto &I : InstrumentationList)
1417 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1418
1419 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1420 }
1421
1422 /// Helper function to insert a warning at IRB's current insert point.
1423 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1424 if (!Origin)
1425 Origin = (Value *)IRB.getInt32(C: 0);
1426 assert(Origin->getType()->isIntegerTy());
1427
1428 if (shouldDisambiguateWarningLocation(DebugLoc: IRB.getCurrentDebugLocation())) {
1429 // Try to create an additional origin using the debug info of the last
1430 // origin-defining instruction; it may give the user extra information.
1431 if (Instruction *OI = dyn_cast_or_null<Instruction>(Val: Origin)) {
1432 assert(MS.TrackOrigins);
1433 auto NewDebugLoc = OI->getDebugLoc();
1434 // An origin update with a missing or identical debug location provides no
1435 // additional value.
1436 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1437 // Insert the update just before the check, so the runtime is called only
1438 // right before the report.
1439 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1440 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1441 Origin = updateOrigin(V: Origin, IRB&: IRBOrigin);
1442 }
1443 }
1444 }
1445
1446 if (MS.CompileKernel || MS.TrackOrigins)
1447 IRB.CreateCall(Callee: MS.WarningFn, Args: Origin)->setCannotMerge();
1448 else
1449 IRB.CreateCall(Callee: MS.WarningFn)->setCannotMerge();
1450 // FIXME: Insert UnreachableInst if !MS.Recover?
1451 // This may invalidate some of the following checks and needs to be done
1452 // at the very end.
1453 }
1454
1455 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1456 Value *Origin) {
1457 const DataLayout &DL = F.getDataLayout();
1458 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(Ty: ConvertedShadow->getType());
1459 unsigned SizeIndex = TypeSizeToSizeIndex(TS: TypeSizeInBits);
1460 if (instrumentWithCalls(V: ConvertedShadow) && !MS.CompileKernel) {
1461 // ZExt cannot convert between vector and scalar
1462 ConvertedShadow = convertShadowToScalar(V: ConvertedShadow, IRB);
1463 Value *ConvertedShadow2 =
1464 IRB.CreateZExt(V: ConvertedShadow, DestTy: IRB.getIntNTy(N: 8 * (1 << SizeIndex)));
1465
1466 if (SizeIndex < kNumberOfAccessSizes) {
1467 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1468 CallBase *CB = IRB.CreateCall(
1469 Callee: Fn,
1470 Args: {ConvertedShadow2,
1471 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(C: 0)});
1472 CB->addParamAttr(ArgNo: 0, Kind: Attribute::ZExt);
1473 CB->addParamAttr(ArgNo: 1, Kind: Attribute::ZExt);
1474 } else {
1475 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1476 Value *ShadowAlloca = IRB.CreateAlloca(Ty: ConvertedShadow2->getType(), AddrSpace: 0u);
1477 IRB.CreateStore(Val: ConvertedShadow2, Ptr: ShadowAlloca);
1478 unsigned ShadowSize = DL.getTypeAllocSize(Ty: ConvertedShadow2->getType());
1479 CallBase *CB = IRB.CreateCall(
1480 Callee: Fn,
1481 Args: {ShadowAlloca, ConstantInt::get(Ty: IRB.getInt64Ty(), V: ShadowSize),
1482 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(C: 0)});
1483 CB->addParamAttr(ArgNo: 1, Kind: Attribute::ZExt);
1484 CB->addParamAttr(ArgNo: 2, Kind: Attribute::ZExt);
1485 }
1486 } else {
1487 Value *Cmp = convertToBool(V: ConvertedShadow, IRB, name: "_mscmp");
1488 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1489 Cond: Cmp, SplitBefore: &*IRB.GetInsertPoint(),
1490 /* Unreachable */ !MS.Recover, BranchWeights: MS.ColdCallWeights);
1491
1492 IRB.SetInsertPoint(CheckTerm);
1493 insertWarningFn(IRB, Origin);
1494 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1495 }
1496 }
1497
1498 void materializeInstructionChecks(
1499 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1500 const DataLayout &DL = F.getDataLayout();
1501 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1502 // the correct origin.
1503 bool Combine = !MS.TrackOrigins;
1504 Instruction *Instruction = InstructionChecks.front().OrigIns;
1505 Value *Shadow = nullptr;
1506 for (const auto &ShadowData : InstructionChecks) {
1507 assert(ShadowData.OrigIns == Instruction);
1508 IRBuilder<> IRB(Instruction);
1509
1510 Value *ConvertedShadow = ShadowData.Shadow;
1511
1512 if (auto *ConstantShadow = dyn_cast<Constant>(Val: ConvertedShadow)) {
1513 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1514 // Skip, value is initialized or const shadow is ignored.
1515 continue;
1516 }
1517 if (llvm::isKnownNonZero(V: ConvertedShadow, Q: DL)) {
1518 // Report as the value is definitely uninitialized.
1519 insertWarningFn(IRB, Origin: ShadowData.Origin);
1520 if (!MS.Recover)
1521 return; // Always fail and stop here; no need to check the rest.
1522 // Skip the entire instruction.
1523 continue;
1524 }
1525 // Fall back to a runtime check, which can still be optimized out later.
1526 }
1527
1528 if (!Combine) {
1529 materializeOneCheck(IRB, ConvertedShadow, Origin: ShadowData.Origin);
1530 continue;
1531 }
1532
1533 if (!Shadow) {
1534 Shadow = ConvertedShadow;
1535 continue;
1536 }
1537
1538 Shadow = convertToBool(V: Shadow, IRB, name: "_mscmp");
1539 ConvertedShadow = convertToBool(V: ConvertedShadow, IRB, name: "_mscmp");
1540 Shadow = IRB.CreateOr(LHS: Shadow, RHS: ConvertedShadow, Name: "_msor");
1541 }
1542
1543 if (Shadow) {
1544 assert(Combine);
1545 IRBuilder<> IRB(Instruction);
1546 materializeOneCheck(IRB, ConvertedShadow: Shadow, Origin: nullptr);
1547 }
1548 }
1549
1550 static bool isAArch64SVCount(Type *Ty) {
1551 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Val: Ty))
1552 return TTy->getName() == "aarch64.svcount";
1553 return false;
1554 }
1555
1556 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1557 // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
1558 static bool isScalableNonVectorType(Type *Ty) {
1559 if (!isAArch64SVCount(Ty))
1560 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1561 << "\n");
1562
1563 return Ty->isScalableTy() && !isa<VectorType>(Val: Ty);
1564 }
1565
1566 void materializeChecks() {
1567#ifndef NDEBUG
1568 // For assert below.
1569 SmallPtrSet<Instruction *, 16> Done;
1570#endif
1571
1572 for (auto I = InstrumentationList.begin();
1573 I != InstrumentationList.end();) {
1574 auto OrigIns = I->OrigIns;
1575 // Checks are grouped by the original instruction. We materialize all checks
1576 // registered (via `insertCheckShadow`) for an instruction at once.
1577 assert(Done.insert(OrigIns).second);
1578 auto J = std::find_if(first: I + 1, last: InstrumentationList.end(),
1579 pred: [OrigIns](const ShadowOriginAndInsertPoint &R) {
1580 return OrigIns != R.OrigIns;
1581 });
1582 // Process all checks of the instruction at once.
1583 materializeInstructionChecks(InstructionChecks: ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1584 I = J;
1585 }
1586
1587 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1588 }
1589
1590 // Sets up the per-task KMSAN context state pointers at the start of the function.
1591 void insertKmsanPrologue(IRBuilder<> &IRB) {
1592 Value *ContextState = IRB.CreateCall(Callee: MS.MsanGetContextStateFn, Args: {});
1593 Constant *Zero = IRB.getInt32(C: 0);
1594 MS.ParamTLS = IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1595 IdxList: {Zero, IRB.getInt32(C: 0)}, Name: "param_shadow");
1596 MS.RetvalTLS = IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1597 IdxList: {Zero, IRB.getInt32(C: 1)}, Name: "retval_shadow");
1598 MS.VAArgTLS = IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1599 IdxList: {Zero, IRB.getInt32(C: 2)}, Name: "va_arg_shadow");
1600 MS.VAArgOriginTLS = IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1601 IdxList: {Zero, IRB.getInt32(C: 3)}, Name: "va_arg_origin");
1602 MS.VAArgOverflowSizeTLS =
1603 IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1604 IdxList: {Zero, IRB.getInt32(C: 4)}, Name: "va_arg_overflow_size");
1605 MS.ParamOriginTLS = IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1606 IdxList: {Zero, IRB.getInt32(C: 5)}, Name: "param_origin");
1607 MS.RetvalOriginTLS =
1608 IRB.CreateGEP(Ty: MS.MsanContextStateTy, Ptr: ContextState,
1609 IdxList: {Zero, IRB.getInt32(C: 6)}, Name: "retval_origin");
1610 if (MS.TargetTriple.getArch() == Triple::systemz)
1611 MS.MsanMetadataAlloca = IRB.CreateAlloca(Ty: MS.MsanMetadata, AddrSpace: 0u);
1612 }
1613
1614 /// Add MemorySanitizer instrumentation to a function.
1615 bool runOnFunction() {
1616 // Iterate all BBs in depth-first order and create shadow instructions
1617 // for all instructions (where applicable).
1618 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1619 for (BasicBlock *BB : depth_first(G: FnPrologueEnd->getParent()))
1620 visit(BB&: *BB);
1621
1622 // `visit` above only collects instructions. Process them after the CFG walk
1623 // so that instrumentation does not need to cope with CFG transformations.
1624 for (Instruction *I : Instructions)
1625 InstVisitor<MemorySanitizerVisitor>::visit(I&: *I);
1626
1627 // Finalize PHI nodes.
1628 for (PHINode *PN : ShadowPHINodes) {
1629 PHINode *PNS = cast<PHINode>(Val: getShadow(V: PN));
1630 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(Val: getOrigin(V: PN)) : nullptr;
1631 size_t NumValues = PN->getNumIncomingValues();
1632 for (size_t v = 0; v < NumValues; v++) {
1633 PNS->addIncoming(V: getShadow(I: PN, i: v), BB: PN->getIncomingBlock(i: v));
1634 if (PNO)
1635 PNO->addIncoming(V: getOrigin(I: PN, i: v), BB: PN->getIncomingBlock(i: v));
1636 }
1637 }
1638
1639 VAHelper->finalizeInstrumentation();
1640
1641 // Poison allocas at their llvm.lifetime.start, unless we have fallen back
1642 // to instrumenting only the allocas themselves.
1643 if (ClHandleLifetimeIntrinsics) {
1644 for (auto Item : LifetimeStartList) {
1645 instrumentAlloca(I&: *Item.second, InsPoint: Item.first);
1646 AllocaSet.remove(X: Item.second);
1647 }
1648 }
1649 // Poison the allocas for which we didn't instrument the corresponding
1650 // lifetime intrinsics.
1651 for (AllocaInst *AI : AllocaSet)
1652 instrumentAlloca(I&: *AI);
1653
1654 // Insert shadow value checks.
1655 materializeChecks();
1656
1657 // Delayed instrumentation of StoreInst.
1658 // This may not add new address checks.
1659 materializeStores();
1660
1661 return true;
1662 }
1663
1664 /// Compute the shadow type that corresponds to a given Value.
1665 Type *getShadowTy(Value *V) { return getShadowTy(OrigTy: V->getType()); }
1666
1667 /// Compute the shadow type that corresponds to a given Type.
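/// As a rough illustrative sketch (not exhaustive, and assuming 64-bit
/// pointers), the mapping looks like:
///   i32          ==> i32
///   float        ==> i32            (sized non-integer scalars become iN)
///   <4 x float>  ==> <4 x i32>
///   [8 x i16]    ==> [8 x i16]
///   {i64, float} ==> {i64, i32}
///   ptr          ==> i64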
1668 Type *getShadowTy(Type *OrigTy) {
1669 if (!OrigTy->isSized()) {
1670 return nullptr;
1671 }
1672 // For integer types, the shadow is the same as the original type.
1673 // This may return weird-sized types like i1.
1674 if (IntegerType *IT = dyn_cast<IntegerType>(Val: OrigTy))
1675 return IT;
1676 const DataLayout &DL = F.getDataLayout();
1677 if (VectorType *VT = dyn_cast<VectorType>(Val: OrigTy)) {
1678 uint32_t EltSize = DL.getTypeSizeInBits(Ty: VT->getElementType());
1679 return VectorType::get(ElementType: IntegerType::get(C&: *MS.C, NumBits: EltSize),
1680 EC: VT->getElementCount());
1681 }
1682 if (ArrayType *AT = dyn_cast<ArrayType>(Val: OrigTy)) {
1683 return ArrayType::get(ElementType: getShadowTy(OrigTy: AT->getElementType()),
1684 NumElements: AT->getNumElements());
1685 }
1686 if (StructType *ST = dyn_cast<StructType>(Val: OrigTy)) {
1687 SmallVector<Type *, 4> Elements;
1688 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1689 Elements.push_back(Elt: getShadowTy(OrigTy: ST->getElementType(N: i)));
1690 StructType *Res = StructType::get(Context&: *MS.C, Elements, isPacked: ST->isPacked());
1691 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1692 return Res;
1693 }
1694 if (isScalableNonVectorType(Ty: OrigTy)) {
1695 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1696 << "\n");
1697 return OrigTy;
1698 }
1699
1700 uint32_t TypeSize = DL.getTypeSizeInBits(Ty: OrigTy);
1701 return IntegerType::get(C&: *MS.C, NumBits: TypeSize);
1702 }
1703
1704 /// Extract combined shadow of struct elements as a bool
1705 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1706 IRBuilder<> &IRB) {
1707 Value *FalseVal = IRB.getIntN(/* width */ N: 1, /* value */ C: 0);
1708 Value *Aggregator = FalseVal;
1709
1710 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1711 // Combine by ORing together each element's bool shadow
1712 Value *ShadowItem = IRB.CreateExtractValue(Agg: Shadow, Idxs: Idx);
1713 Value *ShadowBool = convertToBool(V: ShadowItem, IRB);
1714
1715 if (Aggregator != FalseVal)
1716 Aggregator = IRB.CreateOr(LHS: Aggregator, RHS: ShadowBool);
1717 else
1718 Aggregator = ShadowBool;
1719 }
1720
1721 return Aggregator;
1722 }
1723
1724 // Extract combined shadow of array elements
1725 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1726 IRBuilder<> &IRB) {
1727 if (!Array->getNumElements())
1728 return IRB.getIntN(/* width */ N: 1, /* value */ C: 0);
1729
1730 Value *FirstItem = IRB.CreateExtractValue(Agg: Shadow, Idxs: 0);
1731 Value *Aggregator = convertShadowToScalar(V: FirstItem, IRB);
1732
1733 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1734 Value *ShadowItem = IRB.CreateExtractValue(Agg: Shadow, Idxs: Idx);
1735 Value *ShadowInner = convertShadowToScalar(V: ShadowItem, IRB);
1736 Aggregator = IRB.CreateOr(LHS: Aggregator, RHS: ShadowInner);
1737 }
1738 return Aggregator;
1739 }
1740
1741 /// Convert a shadow value to its flattened variant. The resulting
1742 /// shadow may not necessarily have the same bit width as the input
1743 /// value, but it will always be comparable to zero.
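///
/// For example (a sketch): a <4 x i32> shadow becomes an i128 via bitcast, an
/// {i64, i32} struct shadow collapses to an i1 (the OR of its elements'
/// "is poisoned" bits), and an integer shadow is returned unchanged.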
1744 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1745 if (StructType *Struct = dyn_cast<StructType>(Val: V->getType()))
1746 return collapseStructShadow(Struct, Shadow: V, IRB);
1747 if (ArrayType *Array = dyn_cast<ArrayType>(Val: V->getType()))
1748 return collapseArrayShadow(Array, Shadow: V, IRB);
1749 if (isa<VectorType>(Val: V->getType())) {
1750 if (isa<ScalableVectorType>(Val: V->getType()))
1751 return convertShadowToScalar(V: IRB.CreateOrReduce(Src: V), IRB);
1752 unsigned BitWidth =
1753 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1754 return IRB.CreateBitCast(V, DestTy: IntegerType::get(C&: *MS.C, NumBits: BitWidth));
1755 }
1756 return V;
1757 }
1758
1759 // Convert a scalar value to an i1 by comparing with 0
1760 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1761 Type *VTy = V->getType();
1762 if (!VTy->isIntegerTy())
1763 return convertToBool(V: convertShadowToScalar(V, IRB), IRB, name);
1764 if (VTy->getIntegerBitWidth() == 1)
1765 // Just converting a bool to a bool, so do nothing.
1766 return V;
1767 return IRB.CreateICmpNE(LHS: V, RHS: ConstantInt::get(Ty: VTy, V: 0), Name: name);
1768 }
1769
1770 Type *ptrToIntPtrType(Type *PtrTy) const {
1771 if (VectorType *VectTy = dyn_cast<VectorType>(Val: PtrTy)) {
1772 return VectorType::get(ElementType: ptrToIntPtrType(PtrTy: VectTy->getElementType()),
1773 EC: VectTy->getElementCount());
1774 }
1775 assert(PtrTy->isIntOrPtrTy());
1776 return MS.IntptrTy;
1777 }
1778
1779 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1780 if (VectorType *VectTy = dyn_cast<VectorType>(Val: IntPtrTy)) {
1781 return VectorType::get(
1782 ElementType: getPtrToShadowPtrType(IntPtrTy: VectTy->getElementType(), ShadowTy),
1783 EC: VectTy->getElementCount());
1784 }
1785 assert(IntPtrTy == MS.IntptrTy);
1786 return MS.PtrTy;
1787 }
1788
1789 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1790 if (VectorType *VectTy = dyn_cast<VectorType>(Val: IntPtrTy)) {
1791 return ConstantVector::getSplat(
1792 EC: VectTy->getElementCount(),
1793 Elt: constToIntPtr(IntPtrTy: VectTy->getElementType(), C));
1794 }
1795 assert(IntPtrTy == MS.IntptrTy);
1796 // TODO: Avoid implicit trunc?
1797 // See https://github.com/llvm/llvm-project/issues/112510.
1798 return ConstantInt::get(Ty: MS.IntptrTy, V: C, /*IsSigned=*/false,
1799 /*ImplicitTrunc=*/true);
1800 }
1801
1802 /// Returns the integer shadow offset that corresponds to a given
1803 /// application address, whereby:
1804 ///
1805 /// Offset = (Addr & ~AndMask) ^ XorMask
1806 /// Shadow = ShadowBase + Offset
1807 /// Origin = (OriginBase + Offset) & ~Alignment
1808 ///
1809 /// Note: for efficiency, many shadow mappings only require the XorMask and
1810 /// OriginBase; the AndMask and ShadowBase are often zero.
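///
/// As an illustrative sketch (the mapping parameters differ per platform; the
/// numbers below assume the common Linux/x86_64 userspace layout with
/// AndMask = ShadowBase = 0, XorMask = 0x500000000000 and
/// OriginBase = 0x100000000000):
///
///   Addr   = 0x7fff12345678
///   Offset = Addr ^ 0x500000000000             = 0x2fff12345678
///   Shadow = 0 + Offset                        = 0x2fff12345678
///   Origin = (0x100000000000 + Offset) & ~3ULL = 0x3fff12345678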
1811 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1812 Type *IntptrTy = ptrToIntPtrType(PtrTy: Addr->getType());
1813 Value *OffsetLong = IRB.CreatePointerCast(V: Addr, DestTy: IntptrTy);
1814
1815 if (uint64_t AndMask = MS.MapParams->AndMask)
1816 OffsetLong = IRB.CreateAnd(LHS: OffsetLong, RHS: constToIntPtr(IntPtrTy: IntptrTy, C: ~AndMask));
1817
1818 if (uint64_t XorMask = MS.MapParams->XorMask)
1819 OffsetLong = IRB.CreateXor(LHS: OffsetLong, RHS: constToIntPtr(IntPtrTy: IntptrTy, C: XorMask));
1820 return OffsetLong;
1821 }
1822
1823 /// Compute the shadow and origin addresses corresponding to a given
1824 /// application address.
1825 ///
1826 /// Shadow = ShadowBase + Offset
1827 /// Origin = (OriginBase + Offset) & ~3ULL
1828 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type
1829 /// of a single pointee.
1830 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
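///
/// As a sketch, for a plain (non-vector) pointer this emits roughly:
///   %masked = and i64 (ptrtoint ptr %addr to i64), ~AndMask
///   %offset = xor i64 %masked, XorMask
///   %shadow = inttoptr i64 (add i64 %offset, ShadowBase) to ptr
///   %origin = inttoptr i64 (and i64 (add i64 %offset, OriginBase), ~3) to ptr
/// where the and/xor/add instructions are omitted when the corresponding mask
/// or base is zero, and the align-down of the origin address is skipped when
/// the access is known to be at least 4-byte aligned.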
1831 std::pair<Value *, Value *>
1832 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1833 MaybeAlign Alignment) {
1834 VectorType *VectTy = dyn_cast<VectorType>(Val: Addr->getType());
1835 if (!VectTy) {
1836 assert(Addr->getType()->isPointerTy());
1837 } else {
1838 assert(VectTy->getElementType()->isPointerTy());
1839 }
1840 Type *IntptrTy = ptrToIntPtrType(PtrTy: Addr->getType());
1841 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1842 Value *ShadowLong = ShadowOffset;
1843 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1844 ShadowLong =
1845 IRB.CreateAdd(LHS: ShadowLong, RHS: constToIntPtr(IntPtrTy: IntptrTy, C: ShadowBase));
1846 }
1847 Value *ShadowPtr = IRB.CreateIntToPtr(
1848 V: ShadowLong, DestTy: getPtrToShadowPtrType(IntPtrTy: IntptrTy, ShadowTy));
1849
1850 Value *OriginPtr = nullptr;
1851 if (MS.TrackOrigins) {
1852 Value *OriginLong = ShadowOffset;
1853 uint64_t OriginBase = MS.MapParams->OriginBase;
1854 if (OriginBase != 0)
1855 OriginLong =
1856 IRB.CreateAdd(LHS: OriginLong, RHS: constToIntPtr(IntPtrTy: IntptrTy, C: OriginBase));
1857 if (!Alignment || *Alignment < kMinOriginAlignment) {
1858 uint64_t Mask = kMinOriginAlignment.value() - 1;
1859 OriginLong = IRB.CreateAnd(LHS: OriginLong, RHS: constToIntPtr(IntPtrTy: IntptrTy, C: ~Mask));
1860 }
1861 OriginPtr = IRB.CreateIntToPtr(
1862 V: OriginLong, DestTy: getPtrToShadowPtrType(IntPtrTy: IntptrTy, ShadowTy: MS.OriginTy));
1863 }
1864 return std::make_pair(x&: ShadowPtr, y&: OriginPtr);
1865 }
1866
1867 template <typename... ArgsTy>
1868 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1869 ArgsTy... Args) {
1870 if (MS.TargetTriple.getArch() == Triple::systemz) {
1871 IRB.CreateCall(Callee,
1872 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1873 return IRB.CreateLoad(Ty: MS.MsanMetadata, Ptr: MS.MsanMetadataAlloca);
1874 }
1875
1876 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1877 }
1878
1879 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1880 IRBuilder<> &IRB,
1881 Type *ShadowTy,
1882 bool isStore) {
1883 Value *ShadowOriginPtrs;
1884 const DataLayout &DL = F.getDataLayout();
1885 TypeSize Size = DL.getTypeStoreSize(Ty: ShadowTy);
1886
1887 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, size: Size);
1888 Value *AddrCast = IRB.CreatePointerCast(V: Addr, DestTy: MS.PtrTy);
1889 if (Getter) {
1890 ShadowOriginPtrs = createMetadataCall(IRB, Callee: Getter, Args: AddrCast);
1891 } else {
1892 Value *SizeVal = ConstantInt::get(Ty: MS.IntptrTy, V: Size);
1893 ShadowOriginPtrs = createMetadataCall(
1894 IRB,
1895 Callee: isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1896 Args: AddrCast, Args: SizeVal);
1897 }
1898 Value *ShadowPtr = IRB.CreateExtractValue(Agg: ShadowOriginPtrs, Idxs: 0);
1899 ShadowPtr = IRB.CreatePointerCast(V: ShadowPtr, DestTy: MS.PtrTy);
1900 Value *OriginPtr = IRB.CreateExtractValue(Agg: ShadowOriginPtrs, Idxs: 1);
1901
1902 return std::make_pair(x&: ShadowPtr, y&: OriginPtr);
1903 }
1904
1905 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type
1906 /// of a single pointee.
1907 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1908 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1909 IRBuilder<> &IRB,
1910 Type *ShadowTy,
1911 bool isStore) {
1912 VectorType *VectTy = dyn_cast<VectorType>(Val: Addr->getType());
1913 if (!VectTy) {
1914 assert(Addr->getType()->isPointerTy());
1915 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1916 }
1917
1918 // TODO: Support callbacks with vectors of addresses.
1919 unsigned NumElements = cast<FixedVectorType>(Val: VectTy)->getNumElements();
1920 Value *ShadowPtrs = ConstantInt::getNullValue(
1921 Ty: FixedVectorType::get(ElementType: IRB.getPtrTy(), NumElts: NumElements));
1922 Value *OriginPtrs = nullptr;
1923 if (MS.TrackOrigins)
1924 OriginPtrs = ConstantInt::getNullValue(
1925 Ty: FixedVectorType::get(ElementType: IRB.getPtrTy(), NumElts: NumElements));
1926 for (unsigned i = 0; i < NumElements; ++i) {
1927 Value *OneAddr =
1928 IRB.CreateExtractElement(Vec: Addr, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
1929 auto [ShadowPtr, OriginPtr] =
1930 getShadowOriginPtrKernelNoVec(Addr: OneAddr, IRB, ShadowTy, isStore);
1931
1932 ShadowPtrs = IRB.CreateInsertElement(
1933 Vec: ShadowPtrs, NewElt: ShadowPtr, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
1934 if (MS.TrackOrigins)
1935 OriginPtrs = IRB.CreateInsertElement(
1936 Vec: OriginPtrs, NewElt: OriginPtr, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
1937 }
1938 return {ShadowPtrs, OriginPtrs};
1939 }
1940
1941 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1942 Type *ShadowTy,
1943 MaybeAlign Alignment,
1944 bool isStore) {
1945 if (MS.CompileKernel)
1946 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1947 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1948 }
1949
1950 /// Compute the shadow address for a given function argument.
1951 ///
1952 /// Shadow = ParamTLS+ArgOffset.
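///
/// For illustration (a sketch; each argument's offset is rounded up to
/// kShadowTLSAlignment, i.e. 8 bytes): for f(i32 %a, i64 %b, i8 %c) the
/// shadows are read from ParamTLS+0, ParamTLS+8 and ParamTLS+16.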
1953 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1954 return IRB.CreatePtrAdd(Ptr: MS.ParamTLS,
1955 Offset: ConstantInt::get(Ty: MS.IntptrTy, V: ArgOffset), Name: "_msarg");
1956 }
1957
1958 /// Compute the origin address for a given function argument.
1959 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1960 if (!MS.TrackOrigins)
1961 return nullptr;
1962 return IRB.CreatePtrAdd(Ptr: MS.ParamOriginTLS,
1963 Offset: ConstantInt::get(Ty: MS.IntptrTy, V: ArgOffset),
1964 Name: "_msarg_o");
1965 }
1966
1967 /// Compute the shadow address for a retval.
1968 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1969 return IRB.CreatePointerCast(V: MS.RetvalTLS, DestTy: IRB.getPtrTy(AddrSpace: 0), Name: "_msret");
1970 }
1971
1972 /// Compute the origin address for a retval.
1973 Value *getOriginPtrForRetval() {
1974 // We keep a single origin for the entire retval. Might be too optimistic.
1975 return MS.RetvalOriginTLS;
1976 }
1977
1978 /// Set SV to be the shadow value for V.
1979 void setShadow(Value *V, Value *SV) {
1980 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1981 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1982 }
1983
1984 /// Set Origin to be the origin value for V.
1985 void setOrigin(Value *V, Value *Origin) {
1986 if (!MS.TrackOrigins)
1987 return;
1988 assert(!OriginMap.count(V) && "Values may only have one origin");
1989 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1990 OriginMap[V] = Origin;
1991 }
1992
1993 Constant *getCleanShadow(Type *OrigTy) {
1994 Type *ShadowTy = getShadowTy(OrigTy);
1995 if (!ShadowTy)
1996 return nullptr;
1997 return Constant::getNullValue(Ty: ShadowTy);
1998 }
1999
2000 /// Create a clean shadow value for a given value.
2001 ///
2002 /// Clean shadow (all zeroes) means all bits of the value are defined
2003 /// (initialized).
2004 Constant *getCleanShadow(Value *V) { return getCleanShadow(OrigTy: V->getType()); }
2005
2006 /// Create a dirty shadow of a given shadow type.
2007 Constant *getPoisonedShadow(Type *ShadowTy) {
2008 assert(ShadowTy);
2009 if (isa<IntegerType>(Val: ShadowTy) || isa<VectorType>(Val: ShadowTy))
2010 return Constant::getAllOnesValue(Ty: ShadowTy);
2011 if (ArrayType *AT = dyn_cast<ArrayType>(Val: ShadowTy)) {
2012 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2013 getPoisonedShadow(ShadowTy: AT->getElementType()));
2014 return ConstantArray::get(T: AT, V: Vals);
2015 }
2016 if (StructType *ST = dyn_cast<StructType>(Val: ShadowTy)) {
2017 SmallVector<Constant *, 4> Vals;
2018 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2019 Vals.push_back(Elt: getPoisonedShadow(ShadowTy: ST->getElementType(N: i)));
2020 return ConstantStruct::get(T: ST, V: Vals);
2021 }
2022 llvm_unreachable("Unexpected shadow type");
2023 }
2024
2025 /// Create a dirty shadow for a given value.
2026 Constant *getPoisonedShadow(Value *V) {
2027 Type *ShadowTy = getShadowTy(V);
2028 if (!ShadowTy)
2029 return nullptr;
2030 return getPoisonedShadow(ShadowTy);
2031 }
2032
2033 /// Create a clean (zero) origin.
2034 Value *getCleanOrigin() { return Constant::getNullValue(Ty: MS.OriginTy); }
2035
2036 /// Get the shadow value for a given Value.
2037 ///
2038 /// This function either returns the value set earlier with setShadow,
2039 /// or extracts it from ParamTLS (for function arguments).
2040 Value *getShadow(Value *V) {
2041 if (Instruction *I = dyn_cast<Instruction>(Val: V)) {
2042 if (!PropagateShadow || I->getMetadata(KindID: LLVMContext::MD_nosanitize))
2043 return getCleanShadow(V);
2044 // For instructions the shadow is already stored in the map.
2045 Value *Shadow = ShadowMap[V];
2046 if (!Shadow) {
2047 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2048 assert(Shadow && "No shadow for a value");
2049 }
2050 return Shadow;
2051 }
2052 // Handle fully undefined values
2053 // (partially undefined constant vectors are handled later)
2054 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(Val: V)) {
2055 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2056 : getCleanShadow(V);
2057 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2058 return AllOnes;
2059 }
2060 if (Argument *A = dyn_cast<Argument>(Val: V)) {
2061 // For arguments we compute the shadow on demand and store it in the map.
2062 Value *&ShadowPtr = ShadowMap[V];
2063 if (ShadowPtr)
2064 return ShadowPtr;
2065 Function *F = A->getParent();
2066 IRBuilder<> EntryIRB(FnPrologueEnd);
2067 unsigned ArgOffset = 0;
2068 const DataLayout &DL = F->getDataLayout();
2069 for (auto &FArg : F->args()) {
2070 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2071 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2072 ? "vscale not fully supported\n"
2073 : "Arg is not sized\n"));
2074 if (A == &FArg) {
2075 ShadowPtr = getCleanShadow(V);
2076 setOrigin(V: A, Origin: getCleanOrigin());
2077 break;
2078 }
2079 continue;
2080 }
2081
2082 unsigned Size = FArg.hasByValAttr()
2083 ? DL.getTypeAllocSize(Ty: FArg.getParamByValType())
2084 : DL.getTypeAllocSize(Ty: FArg.getType());
2085
2086 if (A == &FArg) {
2087 bool Overflow = ArgOffset + Size > kParamTLSSize;
2088 if (FArg.hasByValAttr()) {
2089 // The ByVal pointer itself has a clean shadow. We copy the actual
2090 // argument shadow to the underlying memory.
2091 // Figure out the maximal valid memcpy alignment.
2092 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2093 Alignment: FArg.getParamAlign(), Ty: FArg.getParamByValType());
2094 Value *CpShadowPtr, *CpOriginPtr;
2095 std::tie(args&: CpShadowPtr, args&: CpOriginPtr) =
2096 getShadowOriginPtr(Addr: V, IRB&: EntryIRB, ShadowTy: EntryIRB.getInt8Ty(), Alignment: ArgAlign,
2097 /*isStore*/ true);
2098 if (!PropagateShadow || Overflow) {
2099 // ParamTLS overflow.
2100 EntryIRB.CreateMemSet(
2101 Ptr: CpShadowPtr, Val: Constant::getNullValue(Ty: EntryIRB.getInt8Ty()),
2102 Size, Align: ArgAlign);
2103 } else {
2104 Value *Base = getShadowPtrForArgument(IRB&: EntryIRB, ArgOffset);
2105 const Align CopyAlign = std::min(a: ArgAlign, b: kShadowTLSAlignment);
2106 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2107 Dst: CpShadowPtr, DstAlign: CopyAlign, Src: Base, SrcAlign: CopyAlign, Size);
2108 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2109
2110 if (MS.TrackOrigins) {
2111 Value *OriginPtr = getOriginPtrForArgument(IRB&: EntryIRB, ArgOffset);
2112 // FIXME: OriginSize should be:
2113 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2114 unsigned OriginSize = alignTo(Size, A: kMinOriginAlignment);
2115 EntryIRB.CreateMemCpy(
2116 Dst: CpOriginPtr,
2117 /* by getShadowOriginPtr */ DstAlign: kMinOriginAlignment, Src: OriginPtr,
2118 /* by origin_tls[ArgOffset] */ SrcAlign: kMinOriginAlignment,
2119 Size: OriginSize);
2120 }
2121 }
2122 }
2123
2124 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2125 (MS.EagerChecks && FArg.hasAttribute(Kind: Attribute::NoUndef))) {
2126 ShadowPtr = getCleanShadow(V);
2127 setOrigin(V: A, Origin: getCleanOrigin());
2128 } else {
2129 // Shadow over TLS
2130 Value *Base = getShadowPtrForArgument(IRB&: EntryIRB, ArgOffset);
2131 ShadowPtr = EntryIRB.CreateAlignedLoad(Ty: getShadowTy(V: &FArg), Ptr: Base,
2132 Align: kShadowTLSAlignment);
2133 if (MS.TrackOrigins) {
2134 Value *OriginPtr = getOriginPtrForArgument(IRB&: EntryIRB, ArgOffset);
2135 setOrigin(V: A, Origin: EntryIRB.CreateLoad(Ty: MS.OriginTy, Ptr: OriginPtr));
2136 }
2137 }
2138 LLVM_DEBUG(dbgs()
2139 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2140 break;
2141 }
2142
2143 ArgOffset += alignTo(Size, A: kShadowTLSAlignment);
2144 }
2145 assert(ShadowPtr && "Could not find shadow for an argument");
2146 return ShadowPtr;
2147 }
2148
2149 // Check for partially-undefined constant vectors
2150 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2151 if (isa<FixedVectorType>(Val: V->getType()) && isa<Constant>(Val: V) &&
2152 cast<Constant>(Val: V)->containsUndefOrPoisonElement() && PropagateShadow &&
2153 PoisonUndefVectors) {
2154 unsigned NumElems = cast<FixedVectorType>(Val: V->getType())->getNumElements();
2155 SmallVector<Constant *, 32> ShadowVector(NumElems);
2156 for (unsigned i = 0; i != NumElems; ++i) {
2157 Constant *Elem = cast<Constant>(Val: V)->getAggregateElement(Elt: i);
2158 ShadowVector[i] = isa<UndefValue>(Val: Elem) ? getPoisonedShadow(V: Elem)
2159 : getCleanShadow(V: Elem);
2160 }
2161
2162 Value *ShadowConstant = ConstantVector::get(V: ShadowVector);
2163 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2164 << *ShadowConstant << "\n");
2165
2166 return ShadowConstant;
2167 }
2168
2169 // TODO: partially-undefined constant arrays, structures, and nested types
2170
2171 // For everything else the shadow is zero.
2172 return getCleanShadow(V);
2173 }
2174
2175 /// Get the shadow for i-th argument of the instruction I.
2176 Value *getShadow(Instruction *I, int i) {
2177 return getShadow(V: I->getOperand(i));
2178 }
2179
2180 /// Get the origin for a value.
2181 Value *getOrigin(Value *V) {
2182 if (!MS.TrackOrigins)
2183 return nullptr;
2184 if (!PropagateShadow || isa<Constant>(Val: V) || isa<InlineAsm>(Val: V))
2185 return getCleanOrigin();
2186 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2187 "Unexpected value type in getOrigin()");
2188 if (Instruction *I = dyn_cast<Instruction>(Val: V)) {
2189 if (I->getMetadata(KindID: LLVMContext::MD_nosanitize))
2190 return getCleanOrigin();
2191 }
2192 Value *Origin = OriginMap[V];
2193 assert(Origin && "Missing origin");
2194 return Origin;
2195 }
2196
2197 /// Get the origin for i-th argument of the instruction I.
2198 Value *getOrigin(Instruction *I, int i) {
2199 return getOrigin(V: I->getOperand(i));
2200 }
2201
2202 /// Remember the place where a shadow check should be inserted.
2203 ///
2204 /// This location will later be instrumented with a check that prints a
2205 /// UMR warning at runtime if the shadow value is not 0.
2206 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2207 assert(Shadow);
2208 if (!InsertChecks)
2209 return;
2210
2211 if (!DebugCounter::shouldExecute(Counter&: DebugInsertCheck)) {
2212 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2213 << *OrigIns << "\n");
2214 return;
2215 }
2216
2217 Type *ShadowTy = Shadow->getType();
2218 if (isScalableNonVectorType(Ty: ShadowTy)) {
2219 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2220 << " before " << *OrigIns << "\n");
2221 return;
2222 }
2223#ifndef NDEBUG
2224 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2225 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2226 "Can only insert checks for integer, vector, and aggregate shadow "
2227 "types");
2228#endif
2229 InstrumentationList.push_back(
2230 Elt: ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2231 }
2232
2233 /// Get shadow for value, and remember the place where a shadow check should
2234 /// be inserted.
2235 ///
2236 /// This location will later be instrumented with a check that prints a
2237 /// UMR warning at runtime if the value is not fully defined.
2238 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2239 assert(Val);
2240 Value *Shadow, *Origin;
2241 if (ClCheckConstantShadow) {
2242 Shadow = getShadow(V: Val);
2243 if (!Shadow)
2244 return;
2245 Origin = getOrigin(V: Val);
2246 } else {
2247 Shadow = dyn_cast_or_null<Instruction>(Val: getShadow(V: Val));
2248 if (!Shadow)
2249 return;
2250 Origin = dyn_cast_or_null<Instruction>(Val: getOrigin(V: Val));
2251 }
2252 insertCheckShadow(Shadow, Origin, OrigIns);
2253 }
2254
2255 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2256 switch (a) {
2257 case AtomicOrdering::NotAtomic:
2258 return AtomicOrdering::NotAtomic;
2259 case AtomicOrdering::Unordered:
2260 case AtomicOrdering::Monotonic:
2261 case AtomicOrdering::Release:
2262 return AtomicOrdering::Release;
2263 case AtomicOrdering::Acquire:
2264 case AtomicOrdering::AcquireRelease:
2265 return AtomicOrdering::AcquireRelease;
2266 case AtomicOrdering::SequentiallyConsistent:
2267 return AtomicOrdering::SequentiallyConsistent;
2268 }
2269 llvm_unreachable("Unknown ordering");
2270 }
2271
2272 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2273 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2274 uint32_t OrderingTable[NumOrderings] = {};
2275
2276 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2277 OrderingTable[(int)AtomicOrderingCABI::release] =
2278 (int)AtomicOrderingCABI::release;
2279 OrderingTable[(int)AtomicOrderingCABI::consume] =
2280 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2281 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2282 (int)AtomicOrderingCABI::acq_rel;
2283 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2284 (int)AtomicOrderingCABI::seq_cst;
2285
2286 return ConstantDataVector::get(Context&: IRB.getContext(), Elts: OrderingTable);
2287 }
2288
2289 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2290 switch (a) {
2291 case AtomicOrdering::NotAtomic:
2292 return AtomicOrdering::NotAtomic;
2293 case AtomicOrdering::Unordered:
2294 case AtomicOrdering::Monotonic:
2295 case AtomicOrdering::Acquire:
2296 return AtomicOrdering::Acquire;
2297 case AtomicOrdering::Release:
2298 case AtomicOrdering::AcquireRelease:
2299 return AtomicOrdering::AcquireRelease;
2300 case AtomicOrdering::SequentiallyConsistent:
2301 return AtomicOrdering::SequentiallyConsistent;
2302 }
2303 llvm_unreachable("Unknown ordering");
2304 }
2305
2306 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2307 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2308 uint32_t OrderingTable[NumOrderings] = {};
2309
2310 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2311 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2312 OrderingTable[(int)AtomicOrderingCABI::consume] =
2313 (int)AtomicOrderingCABI::acquire;
2314 OrderingTable[(int)AtomicOrderingCABI::release] =
2315 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2316 (int)AtomicOrderingCABI::acq_rel;
2317 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2318 (int)AtomicOrderingCABI::seq_cst;
2319
2320 return ConstantDataVector::get(Context&: IRB.getContext(), Elts: OrderingTable);
2321 }
2322
2323 // ------------------- Visitors.
2324 using InstVisitor<MemorySanitizerVisitor>::visit;
2325 void visit(Instruction &I) {
2326 if (I.getMetadata(KindID: LLVMContext::MD_nosanitize))
2327 return;
2328 // Don't want to visit if we're in the prologue
2329 if (isInPrologue(I))
2330 return;
2331 if (!DebugCounter::shouldExecute(Counter&: DebugInstrumentInstruction)) {
2332 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2333 // We still need to set the shadow and origin to clean values.
2334 setShadow(V: &I, SV: getCleanShadow(V: &I));
2335 setOrigin(V: &I, Origin: getCleanOrigin());
2336 return;
2337 }
2338
2339 Instructions.push_back(Elt: &I);
2340 }
2341
2342 /// Instrument LoadInst
2343 ///
2344 /// Loads the corresponding shadow and (optionally) origin.
2345 /// Optionally, checks that the load address is fully defined.
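///
/// As a sketch, for %x = load i32, ptr %p this emits roughly:
///   %sptr  = <shadow pointer for %p, via getShadowOriginPtr>
///   %_msld = load i32, ptr %sptr, align 4        ; shadow of %x
/// plus a load of the 32-bit origin from the corresponding origin pointer
/// when origin tracking is on, and a check of the shadow of %p itself when
/// ClCheckAccessAddress is set.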
2346 void visitLoadInst(LoadInst &I) {
2347 assert(I.getType()->isSized() && "Load type must have size");
2348 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2349 NextNodeIRBuilder IRB(&I);
2350 Type *ShadowTy = getShadowTy(V: &I);
2351 Value *Addr = I.getPointerOperand();
2352 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2353 const Align Alignment = I.getAlign();
2354 if (PropagateShadow) {
2355 std::tie(args&: ShadowPtr, args&: OriginPtr) =
2356 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2357 setShadow(V: &I,
2358 SV: IRB.CreateAlignedLoad(Ty: ShadowTy, Ptr: ShadowPtr, Align: Alignment, Name: "_msld"));
2359 } else {
2360 setShadow(V: &I, SV: getCleanShadow(V: &I));
2361 }
2362
2363 if (ClCheckAccessAddress)
2364 insertCheckShadowOf(Val: I.getPointerOperand(), OrigIns: &I);
2365
2366 if (I.isAtomic())
2367 I.setOrdering(addAcquireOrdering(a: I.getOrdering()));
2368
2369 if (MS.TrackOrigins) {
2370 if (PropagateShadow) {
2371 const Align OriginAlignment = std::max(a: kMinOriginAlignment, b: Alignment);
2372 setOrigin(
2373 V: &I, Origin: IRB.CreateAlignedLoad(Ty: MS.OriginTy, Ptr: OriginPtr, Align: OriginAlignment));
2374 } else {
2375 setOrigin(V: &I, Origin: getCleanOrigin());
2376 }
2377 }
2378 }
2379
2380 /// Instrument StoreInst
2381 ///
2382 /// Stores the corresponding shadow and (optionally) origin.
2383 /// Optionally, checks that the store address is fully defined.
2384 void visitStoreInst(StoreInst &I) {
2385 StoreList.push_back(Elt: &I);
2386 if (ClCheckAccessAddress)
2387 insertCheckShadowOf(Val: I.getPointerOperand(), OrigIns: &I);
2388 }
2389
2390 void handleCASOrRMW(Instruction &I) {
2391 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2392
2393 IRBuilder<> IRB(&I);
2394 Value *Addr = I.getOperand(i: 0);
2395 Value *Val = I.getOperand(i: 1);
2396 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, ShadowTy: getShadowTy(V: Val), Alignment: Align(1),
2397 /*isStore*/ true)
2398 .first;
2399
2400 if (ClCheckAccessAddress)
2401 insertCheckShadowOf(Val: Addr, OrigIns: &I);
2402
2403 // Only test the conditional argument of the cmpxchg instruction.
2404 // The other argument can potentially be uninitialized, but we cannot
2405 // detect this situation reliably without possible false positives.
2406 if (isa<AtomicCmpXchgInst>(Val: I))
2407 insertCheckShadowOf(Val, OrigIns: &I);
2408
2409 IRB.CreateStore(Val: getCleanShadow(V: Val), Ptr: ShadowPtr);
2410
2411 setShadow(V: &I, SV: getCleanShadow(V: &I));
2412 setOrigin(V: &I, Origin: getCleanOrigin());
2413 }
2414
2415 void visitAtomicRMWInst(AtomicRMWInst &I) {
2416 handleCASOrRMW(I);
2417 I.setOrdering(addReleaseOrdering(a: I.getOrdering()));
2418 }
2419
2420 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2421 handleCASOrRMW(I);
2422 I.setSuccessOrdering(addReleaseOrdering(a: I.getSuccessOrdering()));
2423 }
2424
2425 // Vector manipulation.
2426 void visitExtractElementInst(ExtractElementInst &I) {
2427 insertCheckShadowOf(Val: I.getOperand(i_nocapture: 1), OrigIns: &I);
2428 IRBuilder<> IRB(&I);
2429 setShadow(V: &I, SV: IRB.CreateExtractElement(Vec: getShadow(I: &I, i: 0), Idx: I.getOperand(i_nocapture: 1),
2430 Name: "_msprop"));
2431 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2432 }
2433
2434 void visitInsertElementInst(InsertElementInst &I) {
2435 insertCheckShadowOf(Val: I.getOperand(i_nocapture: 2), OrigIns: &I);
2436 IRBuilder<> IRB(&I);
2437 auto *Shadow0 = getShadow(I: &I, i: 0);
2438 auto *Shadow1 = getShadow(I: &I, i: 1);
2439 setShadow(V: &I, SV: IRB.CreateInsertElement(Vec: Shadow0, NewElt: Shadow1, Idx: I.getOperand(i_nocapture: 2),
2440 Name: "_msprop"));
2441 setOriginForNaryOp(I);
2442 }
2443
2444 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2445 IRBuilder<> IRB(&I);
2446 auto *Shadow0 = getShadow(I: &I, i: 0);
2447 auto *Shadow1 = getShadow(I: &I, i: 1);
2448 setShadow(V: &I, SV: IRB.CreateShuffleVector(V1: Shadow0, V2: Shadow1, Mask: I.getShuffleMask(),
2449 Name: "_msprop"));
2450 setOriginForNaryOp(I);
2451 }
2452
2453 // Casts.
2454 void visitSExtInst(SExtInst &I) {
2455 IRBuilder<> IRB(&I);
2456 setShadow(V: &I, SV: IRB.CreateSExt(V: getShadow(I: &I, i: 0), DestTy: I.getType(), Name: "_msprop"));
2457 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2458 }
2459
2460 void visitZExtInst(ZExtInst &I) {
2461 IRBuilder<> IRB(&I);
2462 setShadow(V: &I, SV: IRB.CreateZExt(V: getShadow(I: &I, i: 0), DestTy: I.getType(), Name: "_msprop"));
2463 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2464 }
2465
2466 void visitTruncInst(TruncInst &I) {
2467 IRBuilder<> IRB(&I);
2468 setShadow(V: &I, SV: IRB.CreateTrunc(V: getShadow(I: &I, i: 0), DestTy: I.getType(), Name: "_msprop"));
2469 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2470 }
2471
2472 void visitBitCastInst(BitCastInst &I) {
2473 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2474 // a musttail call and a ret, don't instrument. New instructions are not
2475 // allowed after a musttail call.
2476 if (auto *CI = dyn_cast<CallInst>(Val: I.getOperand(i_nocapture: 0)))
2477 if (CI->isMustTailCall())
2478 return;
2479 IRBuilder<> IRB(&I);
2480 setShadow(V: &I, SV: IRB.CreateBitCast(V: getShadow(I: &I, i: 0), DestTy: getShadowTy(V: &I)));
2481 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2482 }
2483
2484 void visitPtrToIntInst(PtrToIntInst &I) {
2485 IRBuilder<> IRB(&I);
2486 setShadow(V: &I, SV: IRB.CreateIntCast(V: getShadow(I: &I, i: 0), DestTy: getShadowTy(V: &I), isSigned: false,
2487 Name: "_msprop_ptrtoint"));
2488 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2489 }
2490
2491 void visitIntToPtrInst(IntToPtrInst &I) {
2492 IRBuilder<> IRB(&I);
2493 setShadow(V: &I, SV: IRB.CreateIntCast(V: getShadow(I: &I, i: 0), DestTy: getShadowTy(V: &I), isSigned: false,
2494 Name: "_msprop_inttoptr"));
2495 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2496 }
2497
2498 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2499 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2500 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2501 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2502 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2503 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2504
2505 /// Generic handler to compute shadow for bitwise AND.
2506 ///
2507 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2508 ///
2509 /// This code is precise: it implements the rule that "And" of an initialized
2510 /// zero bit always results in an initialized value:
2511 // 1&1 => 1; 0&1 => 0; p&1 => p;
2512 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2513 // 1&p => p; 0&p => 0; p&p => p;
2514 //
2515 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
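//
// A small worked example (a sketch, 4-bit values): let V1 = 0b1010 with
// S1 = 0b0100 (bit 2 of V1 is uninitialized) and V2 = 0b0110 with S2 = 0.
// Then S = (S1&S2) | (V1&S2) | (S1&V2) = 0 | 0 | 0b0100: only bit 2 of the
// result is uninitialized, since every other result bit is either ANDed with
// a known 0 or computed from two known bits.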
2516 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2517 Value *S2) {
2518 if (V1->getType() != S1->getType()) {
2519 V1 = IRB.CreateIntCast(V: V1, DestTy: S1->getType(), isSigned: false);
2520 V2 = IRB.CreateIntCast(V: V2, DestTy: S2->getType(), isSigned: false);
2521 }
2522
2523 Value *S1S2 = IRB.CreateAnd(LHS: S1, RHS: S2);
2524 Value *V1S2 = IRB.CreateAnd(LHS: V1, RHS: S2);
2525 Value *S1V2 = IRB.CreateAnd(LHS: S1, RHS: V2);
2526
2527 return IRB.CreateOr(Ops: {S1S2, V1S2, S1V2});
2528 }
2529
2530 /// Handler for bitwise AND operator.
2531 void visitAnd(BinaryOperator &I) {
2532 IRBuilder<> IRB(&I);
2533 Value *V1 = I.getOperand(i_nocapture: 0);
2534 Value *V2 = I.getOperand(i_nocapture: 1);
2535 Value *S1 = getShadow(I: &I, i: 0);
2536 Value *S2 = getShadow(I: &I, i: 1);
2537
2538 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2539
2540 setShadow(V: &I, SV: OutShadow);
2541 setOriginForNaryOp(I);
2542 }
2543
2544 void visitOr(BinaryOperator &I) {
2545 IRBuilder<> IRB(&I);
2546 // "Or" of 1 and a poisoned value results in unpoisoned value:
2547 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2548 // 1|0 => 1; 0|0 => 0; p|0 => p;
2549 // 1|p => 1; 0|p => p; p|p => p;
2550 //
2551 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2552 //
2553 // If the "disjoint OR" property is violated, the result is poison, and
2554 // hence the entire shadow is uninitialized:
2555 // S = S | SignExt(V1 & V2 != 0)
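// A small worked example (a sketch, 4-bit values): with V1 = 0b0001
// (S1 = 0, fully initialized) and V2 = 0b0011 (S2 = 0b0100), the rule above
// gives S = 0 | (~V1 & S2) | 0 = 0b0100. If the `or` is marked disjoint and
// ClPreciseDisjointOr is set, V1 & V2 = 0b0001 != 0 violates the disjointness
// assumption, so the sign-extended compare poisons the entire result.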
2556 Value *S1 = getShadow(I: &I, i: 0);
2557 Value *S2 = getShadow(I: &I, i: 1);
2558 Value *V1 = I.getOperand(i_nocapture: 0);
2559 Value *V2 = I.getOperand(i_nocapture: 1);
2560 if (V1->getType() != S1->getType()) {
2561 V1 = IRB.CreateIntCast(V: V1, DestTy: S1->getType(), isSigned: false);
2562 V2 = IRB.CreateIntCast(V: V2, DestTy: S2->getType(), isSigned: false);
2563 }
2564
2565 Value *NotV1 = IRB.CreateNot(V: V1);
2566 Value *NotV2 = IRB.CreateNot(V: V2);
2567
2568 Value *S1S2 = IRB.CreateAnd(LHS: S1, RHS: S2);
2569 Value *S2NotV1 = IRB.CreateAnd(LHS: NotV1, RHS: S2);
2570 Value *S1NotV2 = IRB.CreateAnd(LHS: S1, RHS: NotV2);
2571
2572 Value *S = IRB.CreateOr(Ops: {S1S2, S2NotV1, S1NotV2});
2573
2574 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(Val: &I)->isDisjoint()) {
2575 Value *V1V2 = IRB.CreateAnd(LHS: V1, RHS: V2);
2576 Value *DisjointOrShadow = IRB.CreateSExt(
2577 V: IRB.CreateICmpNE(LHS: V1V2, RHS: getCleanShadow(V: V1V2)), DestTy: V1V2->getType());
2578 S = IRB.CreateOr(LHS: S, RHS: DisjointOrShadow, Name: "_ms_disjoint");
2579 }
2580
2581 setShadow(V: &I, SV: S);
2582 setOriginForNaryOp(I);
2583 }
2584
2585 /// Default propagation of shadow and/or origin.
2586 ///
2587 /// This class implements the general case of shadow propagation, used in all
2588 /// cases where we don't know and/or don't care about what the operation
2589 /// actually does. It converts all input shadow values to a common type
2590 /// (extending or truncating as necessary), and bitwise OR's them.
2591 ///
2592 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2593 /// fully initialized), and less prone to false positives.
2594 ///
2595 /// This class also implements the general case of origin propagation. For an
2596 /// N-ary operation, the result origin is set to the origin of an argument that
2597 /// is not entirely initialized. If there is more than one such argument, the
2598 /// rightmost of them is picked. It does not matter which one is picked if all
2599 /// arguments are initialized.
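///
/// For example (a sketch), for %r = fadd <4 x float> %a, %b the combined
/// shadow is simply Sa | Sb (both are already <4 x i32>), and the origin of
/// %r is Ob when Sb is non-zero, otherwise Oa.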
2600 template <bool CombineShadow> class Combiner {
2601 Value *Shadow = nullptr;
2602 Value *Origin = nullptr;
2603 IRBuilder<> &IRB;
2604 MemorySanitizerVisitor *MSV;
2605
2606 public:
2607 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2608 : IRB(IRB), MSV(MSV) {}
2609
2610 /// Add a pair of shadow and origin values to the mix.
2611 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2612 if (CombineShadow) {
2613 assert(OpShadow);
2614 if (!Shadow)
2615 Shadow = OpShadow;
2616 else {
2617 OpShadow = MSV->CreateShadowCast(IRB, V: OpShadow, dstTy: Shadow->getType());
2618 Shadow = IRB.CreateOr(LHS: Shadow, RHS: OpShadow, Name: "_msprop");
2619 }
2620 }
2621
2622 if (MSV->MS.TrackOrigins) {
2623 assert(OpOrigin);
2624 if (!Origin) {
2625 Origin = OpOrigin;
2626 } else {
2627 Constant *ConstOrigin = dyn_cast<Constant>(Val: OpOrigin);
2628 // No point in adding something that might result in 0 origin value.
2629 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2630 Value *Cond = MSV->convertToBool(V: OpShadow, IRB);
2631 Origin = IRB.CreateSelect(C: Cond, True: OpOrigin, False: Origin);
2632 }
2633 }
2634 }
2635 return *this;
2636 }
2637
2638 /// Add an application value to the mix.
2639 Combiner &Add(Value *V) {
2640 Value *OpShadow = MSV->getShadow(V);
2641 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2642 return Add(OpShadow, OpOrigin);
2643 }
2644
2645 /// Set the current combined values as the given instruction's shadow
2646 /// and origin.
2647 void Done(Instruction *I) {
2648 if (CombineShadow) {
2649 assert(Shadow);
2650 Shadow = MSV->CreateShadowCast(IRB, V: Shadow, dstTy: MSV->getShadowTy(V: I));
2651 MSV->setShadow(V: I, SV: Shadow);
2652 }
2653 if (MSV->MS.TrackOrigins) {
2654 assert(Origin);
2655 MSV->setOrigin(V: I, Origin);
2656 }
2657 }
2658
2659 /// Store the current combined value at the specified origin
2660 /// location.
2661 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2662 if (MSV->MS.TrackOrigins) {
2663 assert(Origin);
2664 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, Alignment: kMinOriginAlignment);
2665 }
2666 }
2667 };
2668
2669 using ShadowAndOriginCombiner = Combiner<true>;
2670 using OriginCombiner = Combiner<false>;
2671
2672 /// Propagate origin for arbitrary operation.
2673 void setOriginForNaryOp(Instruction &I) {
2674 if (!MS.TrackOrigins)
2675 return;
2676 IRBuilder<> IRB(&I);
2677 OriginCombiner OC(this, IRB);
2678 for (Use &Op : I.operands())
2679 OC.Add(V: Op.get());
2680 OC.Done(I: &I);
2681 }
2682
2683 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2684 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2685 "Vector of pointers is not a valid shadow type");
2686 return Ty->isVectorTy() ? cast<FixedVectorType>(Val: Ty)->getNumElements() *
2687 Ty->getScalarSizeInBits()
2688 : Ty->getPrimitiveSizeInBits();
2689 }
2690
2691 /// Cast between two shadow types, extending or truncating as
2692 /// necessary.
2693 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2694 bool Signed = false) {
2695 Type *srcTy = V->getType();
2696 if (srcTy == dstTy)
2697 return V;
2698 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(Ty: srcTy);
2699 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(Ty: dstTy);
2700 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2701 return IRB.CreateICmpNE(LHS: V, RHS: getCleanShadow(V));
2702
2703 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2704 return IRB.CreateIntCast(V, DestTy: dstTy, isSigned: Signed);
2705 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2706 cast<VectorType>(Val: dstTy)->getElementCount() ==
2707 cast<VectorType>(Val: srcTy)->getElementCount())
2708 return IRB.CreateIntCast(V, DestTy: dstTy, isSigned: Signed);
2709 Value *V1 = IRB.CreateBitCast(V, DestTy: Type::getIntNTy(C&: *MS.C, N: srcSizeInBits));
2710 Value *V2 =
2711 IRB.CreateIntCast(V: V1, DestTy: Type::getIntNTy(C&: *MS.C, N: dstSizeInBits), isSigned: Signed);
2712 return IRB.CreateBitCast(V: V2, DestTy: dstTy);
2713 // TODO: handle struct types.
2714 }
2715
2716 /// Cast an application value to the type of its own shadow.
2717 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2718 Type *ShadowTy = getShadowTy(V);
2719 if (V->getType() == ShadowTy)
2720 return V;
2721 if (V->getType()->isPtrOrPtrVectorTy())
2722 return IRB.CreatePtrToInt(V, DestTy: ShadowTy);
2723 else
2724 return IRB.CreateBitCast(V, DestTy: ShadowTy);
2725 }
2726
2727 /// Propagate shadow for arbitrary operation.
2728 void handleShadowOr(Instruction &I) {
2729 IRBuilder<> IRB(&I);
2730 ShadowAndOriginCombiner SC(this, IRB);
2731 for (Use &Op : I.operands())
2732 SC.Add(V: Op.get());
2733 SC.Done(I: &I);
2734 }
2735
2736 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2737 // of elements.
2738 //
2739 // For example, suppose we have:
2740 // VectorA: <a0, a1, a2, a3, a4, a5>
2741 // VectorB: <b0, b1, b2, b3, b4, b5>
2742 // ReductionFactor: 3
2743 // Shards: 1
2744 // The output would be:
2745 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2746 //
2747 // If we have:
2748 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2749 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2750 // ReductionFactor: 2
2751 // Shards: 2
2752 // then A and B each have 2 "shards", resulting in the output being
2753 // interleaved:
2754 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2755 //
2756 // This is convenient for instrumenting horizontal add/sub.
2757 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2758 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2759 unsigned Shards, Value *VectorA, Value *VectorB) {
2760 assert(isa<FixedVectorType>(VectorA->getType()));
2761 unsigned NumElems =
2762 cast<FixedVectorType>(Val: VectorA->getType())->getNumElements();
2763
2764 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2765 if (VectorB) {
2766 assert(VectorA->getType() == VectorB->getType());
2767 TotalNumElems *= 2;
2768 }
2769
2770 assert(NumElems % (ReductionFactor * Shards) == 0);
2771
2772 Value *Or = nullptr;
2773
2774 IRBuilder<> IRB(&I);
2775 for (unsigned i = 0; i < ReductionFactor; i++) {
2776 SmallVector<int, 16> Mask;
2777
2778 for (unsigned j = 0; j < Shards; j++) {
2779 unsigned Offset = NumElems / Shards * j;
2780
2781 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2782 Mask.push_back(Elt: Offset + X + i);
2783
2784 if (VectorB) {
2785 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2786 Mask.push_back(Elt: NumElems + Offset + X + i);
2787 }
2788 }
2789
2790 Value *Masked;
2791 if (VectorB)
2792 Masked = IRB.CreateShuffleVector(V1: VectorA, V2: VectorB, Mask);
2793 else
2794 Masked = IRB.CreateShuffleVector(V: VectorA, Mask);
2795
2796 if (Or)
2797 Or = IRB.CreateOr(LHS: Or, RHS: Masked);
2798 else
2799 Or = Masked;
2800 }
2801
2802 return Or;
2803 }
2804
2805 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2806 /// fields.
2807 ///
2808 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2809 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2810 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2811 assert(I.arg_size() == 1 || I.arg_size() == 2);
2812
2813 assert(I.getType()->isVectorTy());
2814 assert(I.getArgOperand(0)->getType()->isVectorTy());
2815
2816 [[maybe_unused]] FixedVectorType *ParamType =
2817 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType());
2818 assert((I.arg_size() != 2) ||
2819 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2820 [[maybe_unused]] FixedVectorType *ReturnType =
2821 cast<FixedVectorType>(Val: I.getType());
2822 assert(ParamType->getNumElements() * I.arg_size() ==
2823 2 * ReturnType->getNumElements());
2824
2825 IRBuilder<> IRB(&I);
2826
2827 // Horizontal OR of shadow
2828 Value *FirstArgShadow = getShadow(I: &I, i: 0);
2829 Value *SecondArgShadow = nullptr;
2830 if (I.arg_size() == 2)
2831 SecondArgShadow = getShadow(I: &I, i: 1);
2832
2833 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2834 VectorA: FirstArgShadow, VectorB: SecondArgShadow);
2835
2836 OrShadow = CreateShadowCast(IRB, V: OrShadow, dstTy: getShadowTy(V: &I));
2837
2838 setShadow(V: &I, SV: OrShadow);
2839 setOriginForNaryOp(I);
2840 }
2841
2842 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2843 /// fields, with the parameters reinterpreted to have elements of a specified
2844 /// width. For example:
2845 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2846 /// conceptually operates on
2847 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2848 /// and can be handled with ReinterpretElemWidth == 16.
2849 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2850 int ReinterpretElemWidth) {
2851 assert(I.arg_size() == 1 || I.arg_size() == 2);
2852
2853 assert(I.getType()->isVectorTy());
2854 assert(I.getArgOperand(0)->getType()->isVectorTy());
2855
2856 FixedVectorType *ParamType =
2857 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType());
2858 assert((I.arg_size() != 2) ||
2859 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2860
2861 [[maybe_unused]] FixedVectorType *ReturnType =
2862 cast<FixedVectorType>(Val: I.getType());
2863 assert(ParamType->getNumElements() * I.arg_size() ==
2864 2 * ReturnType->getNumElements());
2865
2866 IRBuilder<> IRB(&I);
2867
2868 FixedVectorType *ReinterpretShadowTy = nullptr;
2869 assert(isAligned(Align(ReinterpretElemWidth),
2870 ParamType->getPrimitiveSizeInBits()));
2871 ReinterpretShadowTy = FixedVectorType::get(
2872 ElementType: IRB.getIntNTy(N: ReinterpretElemWidth),
2873 NumElts: ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2874
2875 // Horizontal OR of shadow
2876 Value *FirstArgShadow = getShadow(I: &I, i: 0);
2877 FirstArgShadow = IRB.CreateBitCast(V: FirstArgShadow, DestTy: ReinterpretShadowTy);
2878
2879 // If we had two parameters each with an odd number of elements, the total
2880 // number of elements would still be even; however, we have never seen this
2881 // in extant instruction sets, so we require that each parameter has an even
2882 // number of elements.
2883 assert(isAligned(
2884 Align(2),
2885 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2886
2887 Value *SecondArgShadow = nullptr;
2888 if (I.arg_size() == 2) {
2889 SecondArgShadow = getShadow(I: &I, i: 1);
2890 SecondArgShadow = IRB.CreateBitCast(V: SecondArgShadow, DestTy: ReinterpretShadowTy);
2891 }
2892
2893 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2894 VectorA: FirstArgShadow, VectorB: SecondArgShadow);
2895
2896 OrShadow = CreateShadowCast(IRB, V: OrShadow, dstTy: getShadowTy(V: &I));
2897
2898 setShadow(V: &I, SV: OrShadow);
2899 setOriginForNaryOp(I);
2900 }
2901
2902 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2903
2904 // Handle multiplication by a constant.
2905 //
2906 // Handle a special case of multiplication by a constant that has one or
2907 // more zeros in its lower bits. This makes the corresponding number of lower
2908 // bits of the result zero as well. We model it by shifting the other operand's
2909 // shadow left by the required number of bits. Effectively, we transform
2910 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2911 // We use multiplication by 2**B instead of a shift to cover the case of
2912 // multiplication by 0, which may occur in some elements of a vector operand.
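  // For example, if ConstArg is 24 (= 3 * 2**3), the three lowest bits of the
  // result are always zero, so the shadow of OtherArg is multiplied by 8,
  // which clears its three lowest bits as well.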
2913 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2914 Value *OtherArg) {
2915 Constant *ShadowMul;
2916 Type *Ty = ConstArg->getType();
2917 if (auto *VTy = dyn_cast<VectorType>(Val: Ty)) {
2918 unsigned NumElements = cast<FixedVectorType>(Val: VTy)->getNumElements();
2919 Type *EltTy = VTy->getElementType();
2920 SmallVector<Constant *, 16> Elements;
2921 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2922 if (ConstantInt *Elt =
2923 dyn_cast<ConstantInt>(Val: ConstArg->getAggregateElement(Elt: Idx))) {
2924 const APInt &V = Elt->getValue();
2925 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2926 Elements.push_back(Elt: ConstantInt::get(Ty: EltTy, V: V2));
2927 } else {
2928 Elements.push_back(Elt: ConstantInt::get(Ty: EltTy, V: 1));
2929 }
2930 }
2931 ShadowMul = ConstantVector::get(V: Elements);
2932 } else {
2933 if (ConstantInt *Elt = dyn_cast<ConstantInt>(Val: ConstArg)) {
2934 const APInt &V = Elt->getValue();
2935 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2936 ShadowMul = ConstantInt::get(Ty, V: V2);
2937 } else {
2938 ShadowMul = ConstantInt::get(Ty, V: 1);
2939 }
2940 }
2941
2942 IRBuilder<> IRB(&I);
2943 setShadow(V: &I,
2944 SV: IRB.CreateMul(LHS: getShadow(V: OtherArg), RHS: ShadowMul, Name: "msprop_mul_cst"));
2945 setOrigin(V: &I, Origin: getOrigin(V: OtherArg));
2946 }
2947
2948 void visitMul(BinaryOperator &I) {
2949 Constant *constOp0 = dyn_cast<Constant>(Val: I.getOperand(i_nocapture: 0));
2950 Constant *constOp1 = dyn_cast<Constant>(Val: I.getOperand(i_nocapture: 1));
2951 if (constOp0 && !constOp1)
2952 handleMulByConstant(I, ConstArg: constOp0, OtherArg: I.getOperand(i_nocapture: 1));
2953 else if (constOp1 && !constOp0)
2954 handleMulByConstant(I, ConstArg: constOp1, OtherArg: I.getOperand(i_nocapture: 0));
2955 else
2956 handleShadowOr(I);
2957 }
2958
2959 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2960 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2961 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2962 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2963 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2964 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2965
2966 void handleIntegerDiv(Instruction &I) {
2967 IRBuilder<> IRB(&I);
2968 // Strict on the second argument.
2969 insertCheckShadowOf(Val: I.getOperand(i: 1), OrigIns: &I);
2970 setShadow(V: &I, SV: getShadow(I: &I, i: 0));
2971 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
2972 }
2973
2974 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2975 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2976 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2977 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2978
2979 // Floating-point division is side-effect free, so we cannot require the
2980 // divisor to be fully initialized; instead, we propagate shadow. See PR37523.
2981 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2982 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2983
2984 /// Instrument == and != comparisons.
2985 ///
2986 /// Sometimes the comparison result is known even if some of the bits of the
2987 /// arguments are not.
2988 void handleEqualityComparison(ICmpInst &I) {
2989 IRBuilder<> IRB(&I);
2990 Value *A = I.getOperand(i_nocapture: 0);
2991 Value *B = I.getOperand(i_nocapture: 1);
2992 Value *Sa = getShadow(V: A);
2993 Value *Sb = getShadow(V: B);
2994
2995 // Get rid of pointers and vectors of pointers.
2996 // For ints (and vectors of ints), types of A and Sa match,
2997 // and this is a no-op.
2998 A = IRB.CreatePointerCast(V: A, DestTy: Sa->getType());
2999 B = IRB.CreatePointerCast(V: B, DestTy: Sb->getType());
3000
3001 // A == B <==> (C = A^B) == 0
3002 // A != B <==> (C = A^B) != 0
3003 // Sc = Sa | Sb
3004 Value *C = IRB.CreateXor(LHS: A, RHS: B);
3005 Value *Sc = IRB.CreateOr(LHS: Sa, RHS: Sb);
3006 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
3007 // Result is defined if one of the following is true
3008 // * there is a defined 1 bit in C
3009 // * C is fully defined
3010 // Si = !(C & ~Sc) && Sc
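    // For example, if A = 0b10?? (where ? marks an uninitialized bit) and
    // B = 0b0000, then C = A ^ B has a defined 1 in its top bit, so A != B
    // holds for any value of the unknown bits and Si is 0 (the comparison
    // result is fully defined).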
3011 Value *Zero = Constant::getNullValue(Ty: Sc->getType());
3012 Value *MinusOne = Constant::getAllOnesValue(Ty: Sc->getType());
3013 Value *LHS = IRB.CreateICmpNE(LHS: Sc, RHS: Zero);
3014 Value *RHS =
3015 IRB.CreateICmpEQ(LHS: IRB.CreateAnd(LHS: IRB.CreateXor(LHS: Sc, RHS: MinusOne), RHS: C), RHS: Zero);
3016 Value *Si = IRB.CreateAnd(LHS, RHS);
3017 Si->setName("_msprop_icmp");
3018 setShadow(V: &I, SV: Si);
3019 setOriginForNaryOp(I);
3020 }
3021
3022 /// Instrument relational comparisons.
3023 ///
3024 /// This function does exact shadow propagation for all relational
3025 /// comparisons of integers, pointers and vectors of those.
3026 /// FIXME: output seems suboptimal when one of the operands is a constant
3027 void handleRelationalComparisonExact(ICmpInst &I) {
3028 IRBuilder<> IRB(&I);
3029 Value *A = I.getOperand(i_nocapture: 0);
3030 Value *B = I.getOperand(i_nocapture: 1);
3031 Value *Sa = getShadow(V: A);
3032 Value *Sb = getShadow(V: B);
3033
3034 // Get rid of pointers and vectors of pointers.
3035 // For ints (and vectors of ints), types of A and Sa match,
3036 // and this is a no-op.
3037 A = IRB.CreatePointerCast(V: A, DestTy: Sa->getType());
3038 B = IRB.CreatePointerCast(V: B, DestTy: Sb->getType());
3039
3040 // Let [a0, a1] be the interval of possible values of A, taking into account
3041 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3042 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
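    // For example, if A = 0b10?? (so [a0, a1] = [8, 11]) and B = 4 is fully
    // defined, then "A > B" evaluates the same way at both endpoints, so the
    // comparison result is defined regardless of the unknown bits.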
3043 bool IsSigned = I.isSigned();
3044
3045 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3046 if (IsSigned) {
3047 // Sign-flip to map from the signed range to the unsigned range. The
3048 // relation of A vs B is preserved when checked with
3049 // `getUnsignedPredicate()`. The relationship between Amin, Amax, Bmin and
3050 // Bmax is also unaffected, as they are created by effectively adding to /
3051 // subtracting from A (or B) a value derived from the shadow, with no
3052 // overflow, either before or after the sign flip.
3053 APInt MinVal =
3054 APInt::getSignedMinValue(numBits: V->getType()->getScalarSizeInBits());
3055 V = IRB.CreateXor(LHS: V, RHS: ConstantInt::get(Ty: V->getType(), V: MinVal));
3056 }
3057 // Set undefined bits to 0 for the minimum and to 1 for the maximum.
3058 Value *Min = IRB.CreateAnd(LHS: V, RHS: IRB.CreateNot(V: S));
3059 Value *Max = IRB.CreateOr(LHS: V, RHS: S);
3060 return std::make_pair(x&: Min, y&: Max);
3061 };
3062
3063 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3064 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3065 Value *S1 = IRB.CreateICmp(P: I.getUnsignedPredicate(), LHS: Amin, RHS: Bmax);
3066 Value *S2 = IRB.CreateICmp(P: I.getUnsignedPredicate(), LHS: Amax, RHS: Bmin);
3067
3068 Value *Si = IRB.CreateXor(LHS: S1, RHS: S2);
3069 setShadow(V: &I, SV: Si);
3070 setOriginForNaryOp(I);
3071 }
3072
3073 /// Instrument signed relational comparisons.
3074 ///
3075 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3076 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3077 void handleSignedRelationalComparison(ICmpInst &I) {
3078 Constant *constOp;
3079 Value *op = nullptr;
3080 CmpInst::Predicate pre;
3081 if ((constOp = dyn_cast<Constant>(Val: I.getOperand(i_nocapture: 1)))) {
3082 op = I.getOperand(i_nocapture: 0);
3083 pre = I.getPredicate();
3084 } else if ((constOp = dyn_cast<Constant>(Val: I.getOperand(i_nocapture: 0)))) {
3085 op = I.getOperand(i_nocapture: 1);
3086 pre = I.getSwappedPredicate();
3087 } else {
3088 handleShadowOr(I);
3089 return;
3090 }
3091
3092 if ((constOp->isNullValue() &&
3093 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3094 (constOp->isAllOnesValue() &&
3095 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3096 IRBuilder<> IRB(&I);
3097 Value *Shadow = IRB.CreateICmpSLT(LHS: getShadow(V: op), RHS: getCleanShadow(V: op),
3098 Name: "_msprop_icmp_s");
3099 setShadow(V: &I, SV: Shadow);
3100 setOrigin(V: &I, Origin: getOrigin(V: op));
3101 } else {
3102 handleShadowOr(I);
3103 }
3104 }
3105
3106 void visitICmpInst(ICmpInst &I) {
3107 if (!ClHandleICmp) {
3108 handleShadowOr(I);
3109 return;
3110 }
3111 if (I.isEquality()) {
3112 handleEqualityComparison(I);
3113 return;
3114 }
3115
3116 assert(I.isRelational());
3117 if (ClHandleICmpExact) {
3118 handleRelationalComparisonExact(I);
3119 return;
3120 }
3121 if (I.isSigned()) {
3122 handleSignedRelationalComparison(I);
3123 return;
3124 }
3125
3126 assert(I.isUnsigned());
3127 if ((isa<Constant>(Val: I.getOperand(i_nocapture: 0)) || isa<Constant>(Val: I.getOperand(i_nocapture: 1)))) {
3128 handleRelationalComparisonExact(I);
3129 return;
3130 }
3131
3132 handleShadowOr(I);
3133 }
3134
3135 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3136
3137 void handleShift(BinaryOperator &I) {
3138 IRBuilder<> IRB(&I);
3139 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3140 // Otherwise perform the same shift on S1.
3141 Value *S1 = getShadow(I: &I, i: 0);
3142 Value *S2 = getShadow(I: &I, i: 1);
3143 Value *S2Conv =
3144 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: getCleanShadow(V: S2)), DestTy: S2->getType());
3145 Value *V2 = I.getOperand(i_nocapture: 1);
3146 Value *Shift = IRB.CreateBinOp(Opc: I.getOpcode(), LHS: S1, RHS: V2);
3147 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3148 setOriginForNaryOp(I);
3149 }
3150
3151 void visitShl(BinaryOperator &I) { handleShift(I); }
3152 void visitAShr(BinaryOperator &I) { handleShift(I); }
3153 void visitLShr(BinaryOperator &I) { handleShift(I); }
3154
3155 void handleFunnelShift(IntrinsicInst &I) {
3156 IRBuilder<> IRB(&I);
3157 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3158 // Otherwise perform the same shift on S0 and S1.
3159 Value *S0 = getShadow(I: &I, i: 0);
3160 Value *S1 = getShadow(I: &I, i: 1);
3161 Value *S2 = getShadow(I: &I, i: 2);
3162 Value *S2Conv =
3163 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: getCleanShadow(V: S2)), DestTy: S2->getType());
3164 Value *V2 = I.getOperand(i_nocapture: 2);
3165 Value *Shift = IRB.CreateIntrinsic(ID: I.getIntrinsicID(), Types: S2Conv->getType(),
3166 Args: {S0, S1, V2});
3167 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3168 setOriginForNaryOp(I);
3169 }
3170
3171 /// Instrument llvm.memmove
3172 ///
3173 /// At this point we don't know if llvm.memmove will be inlined or not.
3174 /// If we don't instrument it and it gets inlined,
3175 /// our interceptor will not kick in and we will lose the memmove.
3176 /// If we instrument the call here, but it does not get inlined,
3177 /// we will memmove the shadow twice, which is bad in the case
3178 /// of overlapping regions. So we simply lower the intrinsic to a call.
3179 ///
3180 /// Similar situation exists for memcpy and memset.
3181 void visitMemMoveInst(MemMoveInst &I) {
3182 getShadow(V: I.getArgOperand(i: 1)); // Ensure shadow initialized
3183 IRBuilder<> IRB(&I);
3184 IRB.CreateCall(Callee: MS.MemmoveFn,
3185 Args: {I.getArgOperand(i: 0), I.getArgOperand(i: 1),
3186 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3187 I.eraseFromParent();
3188 }
3189
3190 /// Instrument memcpy
3191 ///
3192 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3193 /// unfortunate as it may slow down small constant memcpys.
3194 /// FIXME: consider doing manual inline for small constant sizes and proper
3195 /// alignment.
3196 ///
3197 /// Note: This also handles memcpy.inline, which promises no calls to external
3198 /// functions as an optimization. However, with instrumentation enabled this
3199 /// is difficult to promise; additionally, we know that the MSan runtime
3200 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3201 /// instrumentation it's safe to turn memcpy.inline into a call to
3202 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3203 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3204 void visitMemCpyInst(MemCpyInst &I) {
3205 getShadow(V: I.getArgOperand(i: 1)); // Ensure shadow initialized
3206 IRBuilder<> IRB(&I);
3207 IRB.CreateCall(Callee: MS.MemcpyFn,
3208 Args: {I.getArgOperand(i: 0), I.getArgOperand(i: 1),
3209 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3210 I.eraseFromParent();
3211 }
3212
3213 // Same as memcpy.
3214 void visitMemSetInst(MemSetInst &I) {
3215 IRBuilder<> IRB(&I);
3216 IRB.CreateCall(
3217 Callee: MS.MemsetFn,
3218 Args: {I.getArgOperand(i: 0),
3219 IRB.CreateIntCast(V: I.getArgOperand(i: 1), DestTy: IRB.getInt32Ty(), isSigned: false),
3220 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3221 I.eraseFromParent();
3222 }
3223
3224 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3225
3226 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3227
3228 /// Handle vector store-like intrinsics.
3229 ///
3230 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3231 /// has 1 pointer argument and 1 vector argument, returns void.
3232 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3233 assert(I.arg_size() == 2);
3234
3235 IRBuilder<> IRB(&I);
3236 Value *Addr = I.getArgOperand(i: 0);
3237 Value *Shadow = getShadow(I: &I, i: 1);
3238 Value *ShadowPtr, *OriginPtr;
3239
3240 // We don't know the pointer alignment (could be unaligned SSE store!).
3241 // Have to assume the worst case.
3242 std::tie(args&: ShadowPtr, args&: OriginPtr) = getShadowOriginPtr(
3243 Addr, IRB, ShadowTy: Shadow->getType(), Alignment: Align(1), /*isStore*/ true);
3244 IRB.CreateAlignedStore(Val: Shadow, Ptr: ShadowPtr, Align: Align(1));
3245
3246 if (ClCheckAccessAddress)
3247 insertCheckShadowOf(Val: Addr, OrigIns: &I);
3248
3249 // FIXME: factor out common code from materializeStores
3250 if (MS.TrackOrigins)
3251 IRB.CreateStore(Val: getOrigin(I: &I, i: 1), Ptr: OriginPtr);
3252 return true;
3253 }
3254
3255 /// Handle vector load-like intrinsics.
3256 ///
3257 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3258 /// has 1 pointer argument, returns a vector.
3259 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3260 assert(I.arg_size() == 1);
3261
3262 IRBuilder<> IRB(&I);
3263 Value *Addr = I.getArgOperand(i: 0);
3264
3265 Type *ShadowTy = getShadowTy(V: &I);
3266 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3267 if (PropagateShadow) {
3268 // We don't know the pointer alignment (could be unaligned SSE load!).
3269 // Have to assume the worst case.
3270 const Align Alignment = Align(1);
3271 std::tie(args&: ShadowPtr, args&: OriginPtr) =
3272 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3273 setShadow(V: &I,
3274 SV: IRB.CreateAlignedLoad(Ty: ShadowTy, Ptr: ShadowPtr, Align: Alignment, Name: "_msld"));
3275 } else {
3276 setShadow(V: &I, SV: getCleanShadow(V: &I));
3277 }
3278
3279 if (ClCheckAccessAddress)
3280 insertCheckShadowOf(Val: Addr, OrigIns: &I);
3281
3282 if (MS.TrackOrigins) {
3283 if (PropagateShadow)
3284 setOrigin(V: &I, Origin: IRB.CreateLoad(Ty: MS.OriginTy, Ptr: OriginPtr));
3285 else
3286 setOrigin(V: &I, Origin: getCleanOrigin());
3287 }
3288 return true;
3289 }
3290
3291 /// Handle (SIMD arithmetic)-like intrinsics.
3292 ///
3293 /// Instrument intrinsics with any number of arguments of the same type [*],
3294 /// equal to the return type, plus a specified number of trailing flags of
3295 /// any type.
3296 ///
3297 /// [*] The type should be simple (no aggregates or pointers; vectors are
3298 /// fine).
3299 ///
3300 /// Caller guarantees that this intrinsic does not access memory.
3301 ///
3302 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3303 /// by this handler. See horizontalReduce().
3304 ///
3305 /// TODO: permutation intrinsics are also often incorrectly matched.
3306 [[maybe_unused]] bool
3307 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3308 unsigned int trailingFlags) {
3309 Type *RetTy = I.getType();
3310 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3311 return false;
3312
3313 unsigned NumArgOperands = I.arg_size();
3314 assert(NumArgOperands >= trailingFlags);
3315 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3316 Type *Ty = I.getArgOperand(i)->getType();
3317 if (Ty != RetTy)
3318 return false;
3319 }
3320
3321 IRBuilder<> IRB(&I);
3322 ShadowAndOriginCombiner SC(this, IRB);
3323 for (unsigned i = 0; i < NumArgOperands; ++i)
3324 SC.Add(V: I.getArgOperand(i));
3325 SC.Done(I: &I);
3326
3327 return true;
3328 }
3329
3330 /// Returns whether it was able to heuristically instrument the unknown
3331 /// intrinsic.
3332 ///
3333 /// The main purpose of this code is to do something reasonable with all
3334 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3335 /// We recognize several classes of intrinsics by their argument types and
3336 /// ModRef behavior, and apply special instrumentation when we are reasonably
3337 /// sure that we know what the intrinsic does.
3338 ///
3339 /// We special-case intrinsics where this approach fails. See llvm.bswap
3340 /// handling as an example of that.
3341 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3342 unsigned NumArgOperands = I.arg_size();
3343 if (NumArgOperands == 0)
3344 return false;
3345
3346 if (NumArgOperands == 2 && I.getArgOperand(i: 0)->getType()->isPointerTy() &&
3347 I.getArgOperand(i: 1)->getType()->isVectorTy() &&
3348 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3349 // This looks like a vector store.
3350 return handleVectorStoreIntrinsic(I);
3351 }
3352
3353 if (NumArgOperands == 1 && I.getArgOperand(i: 0)->getType()->isPointerTy() &&
3354 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3355 // This looks like a vector load.
3356 return handleVectorLoadIntrinsic(I);
3357 }
3358
3359 if (I.doesNotAccessMemory())
3360 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3361 return true;
3362
3363 // FIXME: detect and handle SSE maskstore/maskload?
3364 // Some cases are now handled in handleAVXMasked{Load,Store}.
3365 return false;
3366 }
3367
3368 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3369 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3370 if (ClDumpHeuristicInstructions)
3371 dumpInst(I);
3372
3373 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3374 << "\n");
3375 return true;
3376 } else
3377 return false;
3378 }
3379
3380 void handleInvariantGroup(IntrinsicInst &I) {
3381 setShadow(V: &I, SV: getShadow(I: &I, i: 0));
3382 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
3383 }
3384
3385 void handleLifetimeStart(IntrinsicInst &I) {
3386 if (!PoisonStack)
3387 return;
3388 AllocaInst *AI = dyn_cast<AllocaInst>(Val: I.getArgOperand(i: 0));
3389 if (AI)
3390 LifetimeStartList.push_back(Elt: std::make_pair(x: &I, y&: AI));
3391 }
3392
3393 void handleBswap(IntrinsicInst &I) {
3394 IRBuilder<> IRB(&I);
3395 Value *Op = I.getArgOperand(i: 0);
3396 Type *OpType = Op->getType();
3397 setShadow(V: &I, SV: IRB.CreateIntrinsic(ID: Intrinsic::bswap, Types: ArrayRef(&OpType, 1),
3398 Args: getShadow(V: Op)));
3399 setOrigin(V: &I, Origin: getOrigin(V: Op));
3400 }
3401
3402 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3403 // and the first 1. If the input is all zeros, the output is fully
3404 // initialized iff !is_zero_poison.
3405 //
3406 // e.g., for ctlz, with bits written most-significant first, if 0/1 are
3407 // initialized bits with concrete value 0/1, and ? is an uninitialized bit:
3408 // - 0001 0??? is fully initialized
3409 // - 000? ???? is fully uninitialized (*)
3410 // - ???? ???? is fully uninitialized
3411 // - 0000 0000 is fully uninitialized if is_zero_poison,
3412 // fully initialized otherwise
3413 //
3414 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3415 // only need to poison 4 bits.
3416 //
3417 // OutputShadow =
3418 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3419 // || (is_zero_poison && AllZeroSrc)
3420 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3421 IRBuilder<> IRB(&I);
3422 Value *Src = I.getArgOperand(i: 0);
3423 Value *SrcShadow = getShadow(V: Src);
3424
3425 Value *False = IRB.getInt1(V: false);
3426 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3427 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {Src, /*is_zero_poison=*/False});
3428 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3429 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {SrcShadow, /*is_zero_poison=*/False});
3430
3431 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3432 LHS: ConcreteZerosCount, RHS: ShadowZerosCount, Name: "_mscz_cmp_zeros");
3433
3434 Value *NotAllZeroShadow =
3435 IRB.CreateIsNotNull(Arg: SrcShadow, Name: "_mscz_shadow_not_null");
3436 Value *OutputShadow =
3437 IRB.CreateAnd(LHS: CompareConcreteZeros, RHS: NotAllZeroShadow, Name: "_mscz_main");
3438
3439 // If zero poison is requested, mix in with the shadow
3440 Constant *IsZeroPoison = cast<Constant>(Val: I.getOperand(i_nocapture: 1));
3441 if (!IsZeroPoison->isZeroValue()) {
3442 Value *BoolZeroPoison = IRB.CreateIsNull(Arg: Src, Name: "_mscz_bzp");
3443 OutputShadow = IRB.CreateOr(LHS: OutputShadow, RHS: BoolZeroPoison, Name: "_mscz_bs");
3444 }
3445
3446 OutputShadow = IRB.CreateSExt(V: OutputShadow, DestTy: getShadowTy(V: Src), Name: "_mscz_os");
3447
3448 setShadow(V: &I, SV: OutputShadow);
3449 setOriginForNaryOp(I);
3450 }
3451
3452 /// Handle Arm NEON vector convert intrinsics.
3453 ///
3454 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3455 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3456 ///
3457 /// For conversions to or from fixed-point, there is a trailing argument to
3458 /// indicate the fixed-point precision:
3459 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3460 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3461 ///
3462 /// For x86 SSE vector convert intrinsics, see
3463 /// handleSSEVectorConvertIntrinsic().
3464 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3465 if (FixedPoint)
3466 assert(I.arg_size() == 2);
3467 else
3468 assert(I.arg_size() == 1);
3469
3470 IRBuilder<> IRB(&I);
3471 Value *S0 = getShadow(I: &I, i: 0);
3472
3473 if (FixedPoint) {
3474 Value *Precision = I.getOperand(i_nocapture: 1);
3475 insertCheckShadowOf(Val: Precision, OrigIns: &I);
3476 }
3477
3478 /// For scalars:
3479 /// Since they are converting from floating-point to integer, the output is
3480 /// - fully uninitialized if *any* bit of the input is uninitialized
3481 /// - fully initialized if all bits of the input are initialized
3482 /// We apply the same principle on a per-field basis for vectors.
3483 Value *OutShadow = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S0, RHS: getCleanShadow(V: S0)),
3484 DestTy: getShadowTy(V: &I));
3485 setShadow(V: &I, SV: OutShadow);
3486 setOriginForNaryOp(I);
3487 }
3488
3489 /// Some instructions have additional zero-elements in the return type
3490 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3491 ///
3492 /// This function will return a vector type with the same number of elements
3493 /// as the input, but the same per-element width as the return value, e.g.,
3494 /// <8 x i8>.
3495 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3496 assert(isa<FixedVectorType>(getShadowTy(&I)));
3497 FixedVectorType *ShadowType = cast<FixedVectorType>(Val: getShadowTy(V: &I));
3498
3499 // TODO: generalize beyond 2x?
3500 if (ShadowType->getElementCount() ==
3501 cast<VectorType>(Val: Src->getType())->getElementCount() * 2)
3502 ShadowType = FixedVectorType::getHalfElementsVectorType(VTy: ShadowType);
3503
3504 assert(ShadowType->getElementCount() ==
3505 cast<VectorType>(Src->getType())->getElementCount());
3506
3507 return ShadowType;
3508 }
3509
3510 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3511 /// to match the length of the shadow for the instruction.
3512 /// If the scalar types of the vectors differ, the element type of the
3513 /// input vector is used.
3514 /// This is more type-safe than CreateShadowCast().
3515 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3516 IRBuilder<> IRB(&I);
3517 assert(isa<FixedVectorType>(Shadow->getType()));
3518 assert(isa<FixedVectorType>(I.getType()));
3519
3520 Value *FullShadow = getCleanShadow(V: &I);
3521 unsigned ShadowNumElems =
3522 cast<FixedVectorType>(Val: Shadow->getType())->getNumElements();
3523 unsigned FullShadowNumElems =
3524 cast<FixedVectorType>(Val: FullShadow->getType())->getNumElements();
3525
3526 assert((ShadowNumElems == FullShadowNumElems) ||
3527 (ShadowNumElems * 2 == FullShadowNumElems));
3528
3529 if (ShadowNumElems == FullShadowNumElems) {
3530 FullShadow = Shadow;
3531 } else {
3532 // TODO: generalize beyond 2x?
3533 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3534 std::iota(first: ShadowMask.begin(), last: ShadowMask.end(), value: 0);
3535
3536 // Append zeros
3537 FullShadow =
3538 IRB.CreateShuffleVector(V1: Shadow, V2: getCleanShadow(V: Shadow), Mask: ShadowMask);
3539 }
3540
3541 return FullShadow;
3542 }
3543
3544 /// Handle x86 SSE vector conversion.
3545 ///
3546 /// e.g., single-precision to half-precision conversion:
3547 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3548 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3549 ///
3550 /// floating-point to integer:
3551 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3552 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3553 ///
3554 /// Note: if the output has more elements, they are zero-initialized (and
3555 /// therefore the shadow will also be initialized).
3556 ///
3557 /// This differs from handleSSEVectorConvertIntrinsic() because it
3558 /// propagates uninitialized shadow (instead of checking the shadow).
3559 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3560 bool HasRoundingMode) {
3561 if (HasRoundingMode) {
3562 assert(I.arg_size() == 2);
3563 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(i: 1);
3564 assert(RoundingMode->getType()->isIntegerTy());
3565 } else {
3566 assert(I.arg_size() == 1);
3567 }
3568
3569 Value *Src = I.getArgOperand(i: 0);
3570 assert(Src->getType()->isVectorTy());
3571
3572 // The return type might have more elements than the input.
3573 // Temporarily shrink the return type's number of elements.
3574 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3575
3576 IRBuilder<> IRB(&I);
3577 Value *S0 = getShadow(I: &I, i: 0);
3578
3579 /// For scalars:
3580 /// Since they are converting to and/or from floating-point, the output is:
3581 /// - fully uninitialized if *any* bit of the input is uninitialized
3582 /// - fully initialized if all bits of the input are initialized
3583 /// We apply the same principle on a per-field basis for vectors.
3584 Value *Shadow =
3585 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S0, RHS: getCleanShadow(V: S0)), DestTy: ShadowType);
3586
3587 // The return type might have more elements than the input.
3588 // Extend the return type back to its original width if necessary.
3589 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3590
3591 setShadow(V: &I, SV: FullShadow);
3592 setOriginForNaryOp(I);
3593 }
3594
3595 // Instrument x86 SSE vector convert intrinsic.
3596 //
3597 // This function instruments intrinsics like cvtsi2ss:
3598 // %Out = int_xxx_cvtyyy(%ConvertOp)
3599 // or
3600 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3601 // The intrinsic converts \p NumUsedElements elements of \p ConvertOp to the
3602 // same number of \p Out elements, and (if it has 2 arguments) copies the rest
3603 // of the elements from \p CopyOp.
3604 // In most cases conversion involves floating-point value which may trigger a
3605 // hardware exception when not fully initialized. For this reason we require
3606 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3607 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3608 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3609 // return a fully initialized value.
3610 //
3611 // For Arm NEON vector convert intrinsics, see
3612 // handleNEONVectorConvertIntrinsic().
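  // For example, for <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float>, i32) with
  // NumUsedElements == 1: the i32 operand is checked (it must be fully
  // initialized), element 0 of the result shadow is cleared, and elements 1-3
  // take their shadow from the first operand.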
3613 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3614 bool HasRoundingMode = false) {
3615 IRBuilder<> IRB(&I);
3616 Value *CopyOp, *ConvertOp;
3617
3618 assert((!HasRoundingMode ||
3619 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3620 "Invalid rounding mode");
3621
3622 switch (I.arg_size() - HasRoundingMode) {
3623 case 2:
3624 CopyOp = I.getArgOperand(i: 0);
3625 ConvertOp = I.getArgOperand(i: 1);
3626 break;
3627 case 1:
3628 ConvertOp = I.getArgOperand(i: 0);
3629 CopyOp = nullptr;
3630 break;
3631 default:
3632 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3633 }
3634
3635 // The first *NumUsedElements* elements of ConvertOp are converted to the
3636 // same number of output elements. The rest of the output is copied from
3637 // CopyOp, or (if not available) filled with zeroes.
3638 // Combine shadow for elements of ConvertOp that are used in this operation,
3639 // and insert a check.
3640 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3641 // int->any conversion.
3642 Value *ConvertShadow = getShadow(V: ConvertOp);
3643 Value *AggShadow = nullptr;
3644 if (ConvertOp->getType()->isVectorTy()) {
3645 AggShadow = IRB.CreateExtractElement(
3646 Vec: ConvertShadow, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: 0));
3647 for (int i = 1; i < NumUsedElements; ++i) {
3648 Value *MoreShadow = IRB.CreateExtractElement(
3649 Vec: ConvertShadow, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
3650 AggShadow = IRB.CreateOr(LHS: AggShadow, RHS: MoreShadow);
3651 }
3652 } else {
3653 AggShadow = ConvertShadow;
3654 }
3655 assert(AggShadow->getType()->isIntegerTy());
3656 insertCheckShadow(Shadow: AggShadow, Origin: getOrigin(V: ConvertOp), OrigIns: &I);
3657
3658 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3659 // ConvertOp.
3660 if (CopyOp) {
3661 assert(CopyOp->getType() == I.getType());
3662 assert(CopyOp->getType()->isVectorTy());
3663 Value *ResultShadow = getShadow(V: CopyOp);
3664 Type *EltTy = cast<VectorType>(Val: ResultShadow->getType())->getElementType();
3665 for (int i = 0; i < NumUsedElements; ++i) {
3666 ResultShadow = IRB.CreateInsertElement(
3667 Vec: ResultShadow, NewElt: ConstantInt::getNullValue(Ty: EltTy),
3668 Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
3669 }
3670 setShadow(V: &I, SV: ResultShadow);
3671 setOrigin(V: &I, Origin: getOrigin(V: CopyOp));
3672 } else {
3673 setShadow(V: &I, SV: getCleanShadow(V: &I));
3674 setOrigin(V: &I, Origin: getCleanOrigin());
3675 }
3676 }
3677
3678 // Given a scalar or vector, extract the lower 64 bits (or fewer), and return
3679 // all zeroes if they are zero, and all ones otherwise.
3680 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3681 if (S->getType()->isVectorTy())
3682 S = CreateShadowCast(IRB, V: S, dstTy: IRB.getInt64Ty(), /* Signed */ true);
3683 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3684 Value *S2 = IRB.CreateICmpNE(LHS: S, RHS: getCleanShadow(V: S));
3685 return CreateShadowCast(IRB, V: S2, dstTy: T, /* Signed */ true);
3686 }
3687
3688 // Given a vector, extract its first element, and return all
3689 // zeroes if it is zero, and all ones otherwise.
3690 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3691 Value *S1 = IRB.CreateExtractElement(Vec: S, Idx: (uint64_t)0);
3692 Value *S2 = IRB.CreateICmpNE(LHS: S1, RHS: getCleanShadow(V: S1));
3693 return CreateShadowCast(IRB, V: S2, dstTy: T, /* Signed */ true);
3694 }
3695
3696 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3697 Type *T = S->getType();
3698 assert(T->isVectorTy());
3699 Value *S2 = IRB.CreateICmpNE(LHS: S, RHS: getCleanShadow(V: S));
3700 return IRB.CreateSExt(V: S2, DestTy: T);
3701 }
3702
3703 // Instrument vector shift intrinsic.
3704 //
3705 // This function instruments intrinsics like int_x86_avx2_psll_w.
3706 // Intrinsic shifts %In by %ShiftSize bits.
3707 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3708 // size, and the rest is ignored. Behavior is defined even if shift size is
3709 // greater than register (or field) width.
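  // For example, <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16>, <8 x i16>) takes
  // its shift amount from the low 64 bits of the 128-bit second operand.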
3710 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3711 assert(I.arg_size() == 2);
3712 IRBuilder<> IRB(&I);
3713 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3714 // Otherwise perform the same shift on S1.
3715 Value *S1 = getShadow(I: &I, i: 0);
3716 Value *S2 = getShadow(I: &I, i: 1);
3717 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S: S2)
3718 : Lower64ShadowExtend(IRB, S: S2, T: getShadowTy(V: &I));
3719 Value *V1 = I.getOperand(i_nocapture: 0);
3720 Value *V2 = I.getOperand(i_nocapture: 1);
3721 Value *Shift = IRB.CreateCall(FTy: I.getFunctionType(), Callee: I.getCalledOperand(),
3722 Args: {IRB.CreateBitCast(V: S1, DestTy: V1->getType()), V2});
3723 Shift = IRB.CreateBitCast(V: Shift, DestTy: getShadowTy(V: &I));
3724 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3725 setOriginForNaryOp(I);
3726 }
3727
3728 // Get an MMX-sized (64-bit) vector type, or, optionally, a vector type of
3729 // a different total size.
3730 Type *getMMXVectorTy(unsigned EltSizeInBits,
3731 unsigned X86_MMXSizeInBits = 64) {
3732 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3733 "Illegal MMX vector element size");
3734 return FixedVectorType::get(ElementType: IntegerType::get(C&: *MS.C, NumBits: EltSizeInBits),
3735 NumElts: X86_MMXSizeInBits / EltSizeInBits);
3736 }
3737
3738 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3739 // intrinsic.
3740 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3741 switch (id) {
3742 case Intrinsic::x86_sse2_packsswb_128:
3743 case Intrinsic::x86_sse2_packuswb_128:
3744 return Intrinsic::x86_sse2_packsswb_128;
3745
3746 case Intrinsic::x86_sse2_packssdw_128:
3747 case Intrinsic::x86_sse41_packusdw:
3748 return Intrinsic::x86_sse2_packssdw_128;
3749
3750 case Intrinsic::x86_avx2_packsswb:
3751 case Intrinsic::x86_avx2_packuswb:
3752 return Intrinsic::x86_avx2_packsswb;
3753
3754 case Intrinsic::x86_avx2_packssdw:
3755 case Intrinsic::x86_avx2_packusdw:
3756 return Intrinsic::x86_avx2_packssdw;
3757
3758 case Intrinsic::x86_mmx_packsswb:
3759 case Intrinsic::x86_mmx_packuswb:
3760 return Intrinsic::x86_mmx_packsswb;
3761
3762 case Intrinsic::x86_mmx_packssdw:
3763 return Intrinsic::x86_mmx_packssdw;
3764
3765 case Intrinsic::x86_avx512_packssdw_512:
3766 case Intrinsic::x86_avx512_packusdw_512:
3767 return Intrinsic::x86_avx512_packssdw_512;
3768
3769 case Intrinsic::x86_avx512_packsswb_512:
3770 case Intrinsic::x86_avx512_packuswb_512:
3771 return Intrinsic::x86_avx512_packsswb_512;
3772
3773 default:
3774 llvm_unreachable("unexpected intrinsic id");
3775 }
3776 }
3777
3778 // Instrument vector pack intrinsic.
3779 //
3780 // This function instruments intrinsics like x86_mmx_packsswb, which pack
3781 // elements of 2 input vectors into half as many bits with saturation.
3782 // Shadow is propagated with the signed variant of the same intrinsic applied
3783 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3784 // MMXEltSizeInBits is used only for x86mmx arguments.
3785 //
3786 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
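  // For example, for packsswb each 16-bit shadow element is first collapsed to
  // 0 or 0xFFFF; the signed pack keeps 0xFFFF (i.e., -1) as 0xFF, whereas the
  // unsigned variant would saturate it to 0 and drop the poison. Each output
  // byte therefore ends up fully poisoned iff its source element had any
  // poisoned bit.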
3787 void handleVectorPackIntrinsic(IntrinsicInst &I,
3788 unsigned MMXEltSizeInBits = 0) {
3789 assert(I.arg_size() == 2);
3790 IRBuilder<> IRB(&I);
3791 Value *S1 = getShadow(I: &I, i: 0);
3792 Value *S2 = getShadow(I: &I, i: 1);
3793 assert(S1->getType()->isVectorTy());
3794
3795 // SExt and ICmpNE below must apply to individual elements of input vectors.
3796 // In case of x86mmx arguments, cast them to appropriate vector types and
3797 // back.
3798 Type *T =
3799 MMXEltSizeInBits ? getMMXVectorTy(EltSizeInBits: MMXEltSizeInBits) : S1->getType();
3800 if (MMXEltSizeInBits) {
3801 S1 = IRB.CreateBitCast(V: S1, DestTy: T);
3802 S2 = IRB.CreateBitCast(V: S2, DestTy: T);
3803 }
3804 Value *S1_ext =
3805 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S1, RHS: Constant::getNullValue(Ty: T)), DestTy: T);
3806 Value *S2_ext =
3807 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: Constant::getNullValue(Ty: T)), DestTy: T);
3808 if (MMXEltSizeInBits) {
3809 S1_ext = IRB.CreateBitCast(V: S1_ext, DestTy: getMMXVectorTy(EltSizeInBits: 64));
3810 S2_ext = IRB.CreateBitCast(V: S2_ext, DestTy: getMMXVectorTy(EltSizeInBits: 64));
3811 }
3812
3813 Value *S = IRB.CreateIntrinsic(ID: getSignedPackIntrinsic(id: I.getIntrinsicID()),
3814 Args: {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3815 Name: "_msprop_vector_pack");
3816 if (MMXEltSizeInBits)
3817 S = IRB.CreateBitCast(V: S, DestTy: getShadowTy(V: &I));
3818 setShadow(V: &I, SV: S);
3819 setOriginForNaryOp(I);
3820 }
3821
3822 // Convert `Mask` into `<n x i1>`.
3823 Constant *createDppMask(unsigned Width, unsigned Mask) {
3824 SmallVector<Constant *, 4> R(Width);
3825 for (auto &M : R) {
3826 M = ConstantInt::getBool(Context&: F.getContext(), V: Mask & 1);
3827 Mask >>= 1;
3828 }
3829 return ConstantVector::get(V: R);
3830 }
3831
3832 // Calculate the output shadow as an array of booleans `<n x i1>`, assuming
3833 // that if any arg is poisoned, the entire dot product is poisoned.
3834 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3835 unsigned DstMask) {
3836 const unsigned Width =
3837 cast<FixedVectorType>(Val: S->getType())->getNumElements();
3838
3839 S = IRB.CreateSelect(C: createDppMask(Width, Mask: SrcMask), True: S,
3840 False: Constant::getNullValue(Ty: S->getType()));
3841 Value *SElem = IRB.CreateOrReduce(Src: S);
3842 Value *IsClean = IRB.CreateIsNull(Arg: SElem, Name: "_msdpp");
3843 Value *DstMaskV = createDppMask(Width, Mask: DstMask);
3844
3845 return IRB.CreateSelect(
3846 C: IsClean, True: Constant::getNullValue(Ty: DstMaskV->getType()), False: DstMaskV);
3847 }
3848
3849 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3850 //
3851 // The 2- and 4-element versions produce a single scalar dot product and then
3852 // put it into the elements of the output vector selected by the 4 lowest bits
3853 // of the mask. The top 4 bits of the mask control which elements of the input
3854 // to use for the dot product.
3855 //
3856 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3857 // for the output mask. According to the spec it just operates as the 4-element
3858 // version on the first 4 elements of the inputs and output, and then on the
3859 // last 4 elements of the inputs and output.
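  // For example, for the 4-element version with mask 0x31, the top nibble
  // (0x3) selects elements 0 and 1 of the inputs for the dot product, and the
  // bottom nibble (0x1) writes the result only to element 0 of the output.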
3860 void handleDppIntrinsic(IntrinsicInst &I) {
3861 IRBuilder<> IRB(&I);
3862
3863 Value *S0 = getShadow(I: &I, i: 0);
3864 Value *S1 = getShadow(I: &I, i: 1);
3865 Value *S = IRB.CreateOr(LHS: S0, RHS: S1);
3866
3867 const unsigned Width =
3868 cast<FixedVectorType>(Val: S->getType())->getNumElements();
3869 assert(Width == 2 || Width == 4 || Width == 8);
3870
3871 const unsigned Mask = cast<ConstantInt>(Val: I.getArgOperand(i: 2))->getZExtValue();
3872 const unsigned SrcMask = Mask >> 4;
3873 const unsigned DstMask = Mask & 0xf;
3874
3875 // Calculate shadow as `<n x i1>`.
3876 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3877 if (Width == 8) {
3878 // First 4 elements of the shadow are already calculated.
3879 // findDppPoisonedOutput() operates on 32-bit masks, so just shift the masks and repeat.
3880 SI1 = IRB.CreateOr(
3881 LHS: SI1, RHS: findDppPoisonedOutput(IRB, S, SrcMask: SrcMask << 4, DstMask: DstMask << 4));
3882 }
3883 // Extend to real size of shadow, poisoning either all or none bits of an
3884 // element.
3885 S = IRB.CreateSExt(V: SI1, DestTy: S->getType(), Name: "_msdpp");
3886
3887 setShadow(V: &I, SV: S);
3888 setOriginForNaryOp(I);
3889 }
3890
3891 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3892 C = CreateAppToShadowCast(IRB, V: C);
3893 FixedVectorType *FVT = cast<FixedVectorType>(Val: C->getType());
3894 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3895 C = IRB.CreateAShr(LHS: C, RHS: ElSize - 1);
3896 FVT = FixedVectorType::get(ElementType: IRB.getInt1Ty(), NumElts: FVT->getNumElements());
3897 return IRB.CreateTrunc(V: C, DestTy: FVT);
3898 }
3899
3900 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3901 void handleBlendvIntrinsic(IntrinsicInst &I) {
3902 Value *C = I.getOperand(i_nocapture: 2);
3903 Value *T = I.getOperand(i_nocapture: 1);
3904 Value *F = I.getOperand(i_nocapture: 0);
3905
3906 Value *Sc = getShadow(I: &I, i: 2);
3907 Value *Oc = MS.TrackOrigins ? getOrigin(V: C) : nullptr;
3908
3909 {
3910 IRBuilder<> IRB(&I);
3911 // Extract top bit from condition and its shadow.
3912 C = convertBlendvToSelectMask(IRB, C);
3913 Sc = convertBlendvToSelectMask(IRB, C: Sc);
3914
3915 setShadow(V: C, SV: Sc);
3916 setOrigin(V: C, Origin: Oc);
3917 }
3918
3919 handleSelectLikeInst(I, B: C, C: T, D: F);
3920 }
3921
3922 // Instrument sum-of-absolute-differences intrinsic.
3923 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3924 const unsigned SignificantBitsPerResultElement = 16;
3925 Type *ResTy = IsMMX ? IntegerType::get(C&: *MS.C, NumBits: 64) : I.getType();
3926 unsigned ZeroBitsPerResultElement =
3927 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3928
3929 IRBuilder<> IRB(&I);
3930 auto *Shadow0 = getShadow(I: &I, i: 0);
3931 auto *Shadow1 = getShadow(I: &I, i: 1);
3932 Value *S = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
3933 S = IRB.CreateBitCast(V: S, DestTy: ResTy);
3934 S = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S, RHS: Constant::getNullValue(Ty: ResTy)),
3935 DestTy: ResTy);
3936 S = IRB.CreateLShr(LHS: S, RHS: ZeroBitsPerResultElement);
3937 S = IRB.CreateBitCast(V: S, DestTy: getShadowTy(V: &I));
3938 setShadow(V: &I, SV: S);
3939 setOriginForNaryOp(I);
3940 }
3941
3942 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
3943 //
3944 // e.g., Two operands:
3945 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3946 //
3947 // Two operands which require an EltSizeInBits override:
3948 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3949 //
3950 // Three operands:
3951 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3952 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3953 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
3954 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
3955 // (these are equivalent to multiply-add on %a and %b, followed by
3956 // adding/"accumulating" %s. "Accumulation" stores the result in one
3957 // of the source registers, but this accumulate vs. add distinction
3958 // is lost when dealing with LLVM intrinsics.)
3959 //
3960 // ZeroPurifies means that multiplying a known zero by an uninitialized
3961 // value results in an initialized value. This is applicable to integer
3962 // multiplication, but not floating-point (counter-example: NaN).
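  // For example, for <4 x i32> @llvm.x86.sse2.pmadd.wd (ReductionFactor == 2,
  // ZeroPurifies == true), each output i32 is poisoned iff either of the two
  // 16-bit products feeding it is poisoned, and a product is poisoned iff at
  // least one of its operands is poisoned and neither operand is an
  // initialized zero.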
3963 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
3964 unsigned ReductionFactor,
3965 bool ZeroPurifies,
3966 unsigned EltSizeInBits,
3967 enum OddOrEvenLanes Lanes) {
3968 IRBuilder<> IRB(&I);
3969
3970 [[maybe_unused]] FixedVectorType *ReturnType =
3971 cast<FixedVectorType>(Val: I.getType());
3972 assert(isa<FixedVectorType>(ReturnType));
3973
3974 // Vectors A and B, and shadows
3975 Value *Va = nullptr;
3976 Value *Vb = nullptr;
3977 Value *Sa = nullptr;
3978 Value *Sb = nullptr;
3979
3980 assert(I.arg_size() == 2 || I.arg_size() == 3);
3981 if (I.arg_size() == 2) {
3982 assert(Lanes == kBothLanes);
3983
3984 Va = I.getOperand(i_nocapture: 0);
3985 Vb = I.getOperand(i_nocapture: 1);
3986
3987 Sa = getShadow(I: &I, i: 0);
3988 Sb = getShadow(I: &I, i: 1);
3989 } else if (I.arg_size() == 3) {
3990 // Operand 0 is the accumulator. We will deal with that below.
3991 Va = I.getOperand(i_nocapture: 1);
3992 Vb = I.getOperand(i_nocapture: 2);
3993
3994 Sa = getShadow(I: &I, i: 1);
3995 Sb = getShadow(I: &I, i: 2);
3996
3997 if (Lanes == kEvenLanes || Lanes == kOddLanes) {
3998 // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
3999 // to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
4000 // to < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
4001 //
4002 // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
4003 // zeroed, not duplicated. However, for shadow propagation, this
4004 // distinction is unimportant because Step 1 below will squeeze
4005 // each pair of elements (e.g., [S0, S0]) into a single bit, and
4006 // we only care if it is fully initialized.
4007
4008 FixedVectorType *InputShadowType = cast<FixedVectorType>(Val: Sa->getType());
4009 unsigned Width = InputShadowType->getNumElements();
4010
4011 Sa = IRB.CreateShuffleVector(
4012 V: Sa, Mask: getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4013 Sb = IRB.CreateShuffleVector(
4014 V: Sb, Mask: getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4015 }
4016 }
4017
4018 FixedVectorType *ParamType = cast<FixedVectorType>(Val: Va->getType());
4019 assert(ParamType == Vb->getType());
4020
4021 assert(ParamType->getPrimitiveSizeInBits() ==
4022 ReturnType->getPrimitiveSizeInBits());
4023
4024 if (I.arg_size() == 3) {
4025 [[maybe_unused]] auto *AccumulatorType =
4026 cast<FixedVectorType>(Val: I.getOperand(i_nocapture: 0)->getType());
4027 assert(AccumulatorType == ReturnType);
4028 }
4029
4030 FixedVectorType *ImplicitReturnType =
4031 cast<FixedVectorType>(Val: getShadowTy(OrigTy: ReturnType));
4032 // Step 1: instrument multiplication of corresponding vector elements
4033 if (EltSizeInBits) {
4034 ImplicitReturnType = cast<FixedVectorType>(
4035 Val: getMMXVectorTy(EltSizeInBits: EltSizeInBits * ReductionFactor,
4036 X86_MMXSizeInBits: ParamType->getPrimitiveSizeInBits()));
4037 ParamType = cast<FixedVectorType>(
4038 Val: getMMXVectorTy(EltSizeInBits, X86_MMXSizeInBits: ParamType->getPrimitiveSizeInBits()));
4039
4040 Va = IRB.CreateBitCast(V: Va, DestTy: ParamType);
4041 Vb = IRB.CreateBitCast(V: Vb, DestTy: ParamType);
4042
4043 Sa = IRB.CreateBitCast(V: Sa, DestTy: getShadowTy(OrigTy: ParamType));
4044 Sb = IRB.CreateBitCast(V: Sb, DestTy: getShadowTy(OrigTy: ParamType));
4045 } else {
4046 assert(ParamType->getNumElements() ==
4047 ReturnType->getNumElements() * ReductionFactor);
4048 }
4049
4050 // Each element of the vector is represented by a single bit (poisoned or
4051 // not) e.g., <8 x i1>.
4052 Value *SaNonZero = IRB.CreateIsNotNull(Arg: Sa);
4053 Value *SbNonZero = IRB.CreateIsNotNull(Arg: Sb);
4054 Value *And;
4055 if (ZeroPurifies) {
4056 // Multiplying an *initialized* zero by an uninitialized element results
4057 // in an initialized zero element.
4058 //
4059 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4060 // results in an unpoisoned value.
4061 Value *VaInt = Va;
4062 Value *VbInt = Vb;
4063 if (!Va->getType()->isIntegerTy()) {
4064 VaInt = CreateAppToShadowCast(IRB, V: Va);
4065 VbInt = CreateAppToShadowCast(IRB, V: Vb);
4066 }
4067
4068 // We check for non-zero on a per-element basis, not per-bit.
4069 Value *VaNonZero = IRB.CreateIsNotNull(Arg: VaInt);
4070 Value *VbNonZero = IRB.CreateIsNotNull(Arg: VbInt);
4071
4072 And = handleBitwiseAnd(IRB, V1: VaNonZero, V2: VbNonZero, S1: SaNonZero, S2: SbNonZero);
4073 } else {
4074 And = IRB.CreateOr(Ops: {SaNonZero, SbNonZero});
4075 }
4076
4077 // Extend <8 x i1> to <8 x i16>.
4078 // (The real pmadd intrinsic would have computed intermediate values of
4079 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4080 // consider each element to be either fully initialized or fully
4081 // uninitialized.)
4082 And = IRB.CreateSExt(V: And, DestTy: Sa->getType());
4083
4084 // Step 2: instrument horizontal add
4085 // We don't need bit-precise horizontalReduce because we only want to check
4086 // if each pair/quad of elements is fully zero.
4087 // Cast to <4 x i32>.
4088 Value *Horizontal = IRB.CreateBitCast(V: And, DestTy: ImplicitReturnType);
4089
4090 // Compute <4 x i1>, then extend back to <4 x i32>.
4091 Value *OutShadow = IRB.CreateSExt(
4092 V: IRB.CreateICmpNE(LHS: Horizontal,
4093 RHS: Constant::getNullValue(Ty: Horizontal->getType())),
4094 DestTy: ImplicitReturnType);
4095
4096 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4097 // AVX, it is already correct).
4098 if (EltSizeInBits)
4099 OutShadow = CreateShadowCast(IRB, V: OutShadow, dstTy: getShadowTy(V: &I));
4100
4101 // Step 3 (if applicable): instrument accumulator
4102 if (I.arg_size() == 3)
4103 OutShadow = IRB.CreateOr(LHS: OutShadow, RHS: getShadow(I: &I, i: 0));
4104
4105 setShadow(V: &I, SV: OutShadow);
4106 setOriginForNaryOp(I);
4107 }
4108
4109 // Instrument compare-packed intrinsic.
4110 //
4111 // x86 has the predicate as the third operand, which is ImmArg e.g.,
4112 // - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
4113 // - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
4114 //
4115 // while Arm has separate intrinsics for >= and > e.g.,
4116 // - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
4117 // (<2 x float> %A, <2 x float>)
4118 // - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
4119 // (<2 x float> %A, <2 x float>)
4120 void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
4121 bool PredicateAsOperand) {
4122 if (PredicateAsOperand) {
4123 assert(I.arg_size() == 3);
4124 assert(I.paramHasAttr(2, Attribute::ImmArg));
4125 } else
4126 assert(I.arg_size() == 2);
4127
4128 IRBuilder<> IRB(&I);
4129
4130 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4131 // all-ones shadow.
4132 Type *ResTy = getShadowTy(V: &I);
4133 auto *Shadow0 = getShadow(I: &I, i: 0);
4134 auto *Shadow1 = getShadow(I: &I, i: 1);
4135 Value *S0 = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
4136 Value *S = IRB.CreateSExt(
4137 V: IRB.CreateICmpNE(LHS: S0, RHS: Constant::getNullValue(Ty: ResTy)), DestTy: ResTy);
4138 setShadow(V: &I, SV: S);
4139 setOriginForNaryOp(I);
4140 }
4141
4142 // Instrument compare-scalar intrinsic.
4143 // This handles both cmp* intrinsics which return the result in the first
4144 // element of a vector, and comi* which return the result as i32.
4145 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4146 IRBuilder<> IRB(&I);
4147 auto *Shadow0 = getShadow(I: &I, i: 0);
4148 auto *Shadow1 = getShadow(I: &I, i: 1);
4149 Value *S0 = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
4150 Value *S = LowerElementShadowExtend(IRB, S: S0, T: getShadowTy(V: &I));
4151 setShadow(V: &I, SV: S);
4152 setOriginForNaryOp(I);
4153 }
4154
4155 // Instrument generic vector reduction intrinsics
4156 // by ORing together all their fields.
4157 //
4158 // If AllowShadowCast is true, the return type does not need to be the same
4159 // type as the vector elements,
4160 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4161 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4162 assert(I.arg_size() == 1);
4163
4164 IRBuilder<> IRB(&I);
4165 Value *S = IRB.CreateOrReduce(Src: getShadow(I: &I, i: 0));
4166 if (AllowShadowCast)
4167 S = CreateShadowCast(IRB, V: S, dstTy: getShadowTy(V: &I));
4168 else
4169 assert(S->getType() == getShadowTy(&I));
4170 setShadow(V: &I, SV: S);
4171 setOriginForNaryOp(I);
4172 }
4173
4174 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4175 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4176 // %a1)
4177 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4178 //
4179 // The type of the return value, initial starting value, and elements of the
4180 // vector must be identical.
4181 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4182 assert(I.arg_size() == 2);
4183
4184 IRBuilder<> IRB(&I);
4185 Value *Shadow0 = getShadow(I: &I, i: 0);
4186 Value *Shadow1 = IRB.CreateOrReduce(Src: getShadow(I: &I, i: 1));
4187 assert(Shadow0->getType() == Shadow1->getType());
4188 Value *S = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
4189 assert(S->getType() == getShadowTy(&I));
4190 setShadow(V: &I, SV: S);
4191 setOriginForNaryOp(I);
4192 }
4193
4194 // Instrument vector.reduce.or intrinsic.
4195 // Valid (non-poisoned) set bits in the operand pull down the
4196 // corresponding shadow bits.
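  // For example, for a <2 x i8> operand <0b00000001 (clean), fully poisoned>,
  // bit 0 of the result is clean (it is known to be 1), while bits 1-7 are
  // poisoned.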
4197 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4198 assert(I.arg_size() == 1);
4199
4200 IRBuilder<> IRB(&I);
4201 Value *OperandShadow = getShadow(I: &I, i: 0);
4202 Value *OperandUnsetBits = IRB.CreateNot(V: I.getOperand(i_nocapture: 0));
4203 Value *OperandUnsetOrPoison = IRB.CreateOr(LHS: OperandUnsetBits, RHS: OperandShadow);
4204 // Bit N is clean if any field's bit N is 1 and unpoisoned
4205 Value *OutShadowMask = IRB.CreateAndReduce(Src: OperandUnsetOrPoison);
4206 // Otherwise, it is clean if every field's bit N is unpoisoned
4207 Value *OrShadow = IRB.CreateOrReduce(Src: OperandShadow);
4208 Value *S = IRB.CreateAnd(LHS: OutShadowMask, RHS: OrShadow);
4209
4210 setShadow(V: &I, SV: S);
4211 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
4212 }
4213
4214 // Instrument vector.reduce.and intrinsic.
4215 // Valid (non-poisoned) unset bits in the operand pull down the
4216 // corresponding shadow bits.
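//
// Illustrative example for one bit position with two fields:
//   field0 bit = 0 (clean), field1 bit = ? (poisoned)
//     => the AND is definitely 0, so the output bit is clean
//   field0 bit = 1 (clean), field1 bit = ? (poisoned)
//     => the AND could be 0 or 1, so the output bit is poisoned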
4217 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4218 assert(I.arg_size() == 1);
4219
4220 IRBuilder<> IRB(&I);
4221 Value *OperandShadow = getShadow(I: &I, i: 0);
4222 Value *OperandSetOrPoison = IRB.CreateOr(LHS: I.getOperand(i_nocapture: 0), RHS: OperandShadow);
4223 // Bit N is clean if any field's bit N is 0 and unpoisoned
4224 Value *OutShadowMask = IRB.CreateAndReduce(Src: OperandSetOrPoison);
4225 // Otherwise, it is clean if every field's bit N is unpoisoned
4226 Value *OrShadow = IRB.CreateOrReduce(Src: OperandShadow);
4227 Value *S = IRB.CreateAnd(LHS: OutShadowMask, RHS: OrShadow);
4228
4229 setShadow(V: &I, SV: S);
4230 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
4231 }
4232
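// e.g., void @llvm.x86.sse.stmxcsr(ptr %p)
// Sketch: the 4 bytes written at %p are treated as initialized, so a clean
// i32 shadow is stored for them.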
4233 void handleStmxcsr(IntrinsicInst &I) {
4234 IRBuilder<> IRB(&I);
4235 Value *Addr = I.getArgOperand(i: 0);
4236 Type *Ty = IRB.getInt32Ty();
4237 Value *ShadowPtr =
4238 getShadowOriginPtr(Addr, IRB, ShadowTy: Ty, Alignment: Align(1), /*isStore*/ true).first;
4239
4240 IRB.CreateStore(Val: getCleanShadow(OrigTy: Ty), Ptr: ShadowPtr);
4241
4242 if (ClCheckAccessAddress)
4243 insertCheckShadowOf(Val: Addr, OrigIns: &I);
4244 }
4245
4246 void handleLdmxcsr(IntrinsicInst &I) {
4247 if (!InsertChecks)
4248 return;
4249
4250 IRBuilder<> IRB(&I);
4251 Value *Addr = I.getArgOperand(i: 0);
4252 Type *Ty = IRB.getInt32Ty();
4253 const Align Alignment = Align(1);
4254 Value *ShadowPtr, *OriginPtr;
4255 std::tie(args&: ShadowPtr, args&: OriginPtr) =
4256 getShadowOriginPtr(Addr, IRB, ShadowTy: Ty, Alignment, /*isStore*/ false);
4257
4258 if (ClCheckAccessAddress)
4259 insertCheckShadowOf(Val: Addr, OrigIns: &I);
4260
4261 Value *Shadow = IRB.CreateAlignedLoad(Ty, Ptr: ShadowPtr, Align: Alignment, Name: "_ldmxcsr");
4262 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(Ty: MS.OriginTy, Ptr: OriginPtr)
4263 : getCleanOrigin();
4264 insertCheckShadow(Shadow, Origin, OrigIns: &I);
4265 }
4266
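// Intrinsic::masked_expandload
//
// e.g., <8 x double> @llvm.masked.expandload.v8f64(ptr %ptr, <8 x i1> %mask,
//                                                  <8 x double> %passthru)
// Shadow propagation sketch (illustrative): the shadow is expand-loaded from
// %ptr's shadow memory with the same mask; masked-off lanes take the
// pass-through's shadow.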
4267 void handleMaskedExpandLoad(IntrinsicInst &I) {
4268 IRBuilder<> IRB(&I);
4269 Value *Ptr = I.getArgOperand(i: 0);
4270 MaybeAlign Align = I.getParamAlign(ArgNo: 0);
4271 Value *Mask = I.getArgOperand(i: 1);
4272 Value *PassThru = I.getArgOperand(i: 2);
4273
4274 if (ClCheckAccessAddress) {
4275 insertCheckShadowOf(Val: Ptr, OrigIns: &I);
4276 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4277 }
4278
4279 if (!PropagateShadow) {
4280 setShadow(V: &I, SV: getCleanShadow(V: &I));
4281 setOrigin(V: &I, Origin: getCleanOrigin());
4282 return;
4283 }
4284
4285 Type *ShadowTy = getShadowTy(V: &I);
4286 Type *ElementShadowTy = cast<VectorType>(Val: ShadowTy)->getElementType();
4287 auto [ShadowPtr, OriginPtr] =
4288 getShadowOriginPtr(Addr: Ptr, IRB, ShadowTy: ElementShadowTy, Alignment: Align, /*isStore*/ false);
4289
4290 Value *Shadow =
4291 IRB.CreateMaskedExpandLoad(Ty: ShadowTy, Ptr: ShadowPtr, Align, Mask,
4292 PassThru: getShadow(V: PassThru), Name: "_msmaskedexpload");
4293
4294 setShadow(V: &I, SV: Shadow);
4295
4296 // TODO: Store origins.
4297 setOrigin(V: &I, Origin: getCleanOrigin());
4298 }
4299
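// Intrinsic::masked_compressstore
//
// e.g., void @llvm.masked.compressstore.v8f64(<8 x double> %value, ptr %ptr,
//                                             <8 x i1> %mask)
// Shadow propagation sketch (illustrative): the value's shadow is
// compress-stored to %ptr's shadow memory with the same mask.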
4300 void handleMaskedCompressStore(IntrinsicInst &I) {
4301 IRBuilder<> IRB(&I);
4302 Value *Values = I.getArgOperand(i: 0);
4303 Value *Ptr = I.getArgOperand(i: 1);
4304 MaybeAlign Align = I.getParamAlign(ArgNo: 1);
4305 Value *Mask = I.getArgOperand(i: 2);
4306
4307 if (ClCheckAccessAddress) {
4308 insertCheckShadowOf(Val: Ptr, OrigIns: &I);
4309 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4310 }
4311
4312 Value *Shadow = getShadow(V: Values);
4313 Type *ElementShadowTy =
4314 getShadowTy(OrigTy: cast<VectorType>(Val: Values->getType())->getElementType());
4315 auto [ShadowPtr, OriginPtrs] =
4316 getShadowOriginPtr(Addr: Ptr, IRB, ShadowTy: ElementShadowTy, Alignment: Align, /*isStore*/ true);
4317
4318 IRB.CreateMaskedCompressStore(Val: Shadow, Ptr: ShadowPtr, Align, Mask);
4319
4320 // TODO: Store origins.
4321 }
4322
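// Intrinsic::masked_gather
//
// Shadow propagation sketch (illustrative):
//   ShadowDst[i] = Mask[i] ? Shadow(*Ptrs[i]) : Shadow(PassThru)[i]
// Under ClCheckAccessAddress, only the shadow of pointers in enabled lanes
// is checked.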
4323 void handleMaskedGather(IntrinsicInst &I) {
4324 IRBuilder<> IRB(&I);
4325 Value *Ptrs = I.getArgOperand(i: 0);
4326 const Align Alignment = I.getParamAlign(ArgNo: 0).valueOrOne();
4327 Value *Mask = I.getArgOperand(i: 1);
4328 Value *PassThru = I.getArgOperand(i: 2);
4329
4330 Type *PtrsShadowTy = getShadowTy(V: Ptrs);
4331 if (ClCheckAccessAddress) {
4332 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4333 Value *MaskedPtrShadow = IRB.CreateSelect(
4334 C: Mask, True: getShadow(V: Ptrs), False: Constant::getNullValue(Ty: (PtrsShadowTy)),
4335 Name: "_msmaskedptrs");
4336 insertCheckShadow(Shadow: MaskedPtrShadow, Origin: getOrigin(V: Ptrs), OrigIns: &I);
4337 }
4338
4339 if (!PropagateShadow) {
4340 setShadow(V: &I, SV: getCleanShadow(V: &I));
4341 setOrigin(V: &I, Origin: getCleanOrigin());
4342 return;
4343 }
4344
4345 Type *ShadowTy = getShadowTy(V: &I);
4346 Type *ElementShadowTy = cast<VectorType>(Val: ShadowTy)->getElementType();
4347 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4348 Addr: Ptrs, IRB, ShadowTy: ElementShadowTy, Alignment, /*isStore*/ false);
4349
4350 Value *Shadow =
4351 IRB.CreateMaskedGather(Ty: ShadowTy, Ptrs: ShadowPtrs, Alignment, Mask,
4352 PassThru: getShadow(V: PassThru), Name: "_msmaskedgather");
4353
4354 setShadow(V: &I, SV: Shadow);
4355
4356 // TODO: Store origins.
4357 setOrigin(V: &I, Origin: getCleanOrigin());
4358 }
4359
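// Intrinsic::masked_scatter
//
// Shadow propagation sketch (illustrative): for each enabled lane i,
//   Shadow(*Ptrs[i]) = Shadow(Values)[i]
// Under ClCheckAccessAddress, only the shadow of pointers in enabled lanes
// is checked.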
4360 void handleMaskedScatter(IntrinsicInst &I) {
4361 IRBuilder<> IRB(&I);
4362 Value *Values = I.getArgOperand(i: 0);
4363 Value *Ptrs = I.getArgOperand(i: 1);
4364 const Align Alignment = I.getParamAlign(ArgNo: 1).valueOrOne();
4365 Value *Mask = I.getArgOperand(i: 2);
4366
4367 Type *PtrsShadowTy = getShadowTy(V: Ptrs);
4368 if (ClCheckAccessAddress) {
4369 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4370 Value *MaskedPtrShadow = IRB.CreateSelect(
4371 C: Mask, True: getShadow(V: Ptrs), False: Constant::getNullValue(Ty: (PtrsShadowTy)),
4372 Name: "_msmaskedptrs");
4373 insertCheckShadow(Shadow: MaskedPtrShadow, Origin: getOrigin(V: Ptrs), OrigIns: &I);
4374 }
4375
4376 Value *Shadow = getShadow(V: Values);
4377 Type *ElementShadowTy =
4378 getShadowTy(OrigTy: cast<VectorType>(Val: Values->getType())->getElementType());
4379 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4380 Addr: Ptrs, IRB, ShadowTy: ElementShadowTy, Alignment, /*isStore*/ true);
4381
4382 IRB.CreateMaskedScatter(Val: Shadow, Ptrs: ShadowPtrs, Alignment, Mask);
4383
4384 // TODO: Store origin.
4385 }
4386
4387 // Intrinsic::masked_store
4388 //
4389 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4390 // stores are lowered to Intrinsic::masked_store.
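//
// Shadow propagation sketch (illustrative), using the operand layout assumed
// below (value, ptr, mask):
//   masked.store(Shadow(value), shadow_ptr(ptr), mask)
// i.e., the value's shadow is masked-stored to ptr's shadow memory.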
4391 void handleMaskedStore(IntrinsicInst &I) {
4392 IRBuilder<> IRB(&I);
4393 Value *V = I.getArgOperand(i: 0);
4394 Value *Ptr = I.getArgOperand(i: 1);
4395 const Align Alignment = I.getParamAlign(ArgNo: 1).valueOrOne();
4396 Value *Mask = I.getArgOperand(i: 2);
4397 Value *Shadow = getShadow(V);
4398
4399 if (ClCheckAccessAddress) {
4400 insertCheckShadowOf(Val: Ptr, OrigIns: &I);
4401 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4402 }
4403
4404 Value *ShadowPtr;
4405 Value *OriginPtr;
4406 std::tie(args&: ShadowPtr, args&: OriginPtr) = getShadowOriginPtr(
4407 Addr: Ptr, IRB, ShadowTy: Shadow->getType(), Alignment, /*isStore*/ true);
4408
4409 IRB.CreateMaskedStore(Val: Shadow, Ptr: ShadowPtr, Alignment, Mask);
4410
4411 if (!MS.TrackOrigins)
4412 return;
4413
4414 auto &DL = F.getDataLayout();
4415 paintOrigin(IRB, Origin: getOrigin(V), OriginPtr,
4416 TS: DL.getTypeStoreSize(Ty: Shadow->getType()),
4417 Alignment: std::max(a: Alignment, b: kMinOriginAlignment));
4418 }
4419
4420 // Intrinsic::masked_load
4421 //
4422 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4423 // loads are lowered to Intrinsic::masked_load.
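//
// Shadow propagation sketch (illustrative), using the operand layout assumed
// below (ptr, mask, passthru):
//   Shadow(dst) = masked.load(shadow_ptr(ptr), mask, Shadow(passthru))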
4424 void handleMaskedLoad(IntrinsicInst &I) {
4425 IRBuilder<> IRB(&I);
4426 Value *Ptr = I.getArgOperand(i: 0);
4427 const Align Alignment = I.getParamAlign(ArgNo: 0).valueOrOne();
4428 Value *Mask = I.getArgOperand(i: 1);
4429 Value *PassThru = I.getArgOperand(i: 2);
4430
4431 if (ClCheckAccessAddress) {
4432 insertCheckShadowOf(Val: Ptr, OrigIns: &I);
4433 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4434 }
4435
4436 if (!PropagateShadow) {
4437 setShadow(V: &I, SV: getCleanShadow(V: &I));
4438 setOrigin(V: &I, Origin: getCleanOrigin());
4439 return;
4440 }
4441
4442 Type *ShadowTy = getShadowTy(V: &I);
4443 Value *ShadowPtr, *OriginPtr;
4444 std::tie(args&: ShadowPtr, args&: OriginPtr) =
4445 getShadowOriginPtr(Addr: Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4446 setShadow(V: &I, SV: IRB.CreateMaskedLoad(Ty: ShadowTy, Ptr: ShadowPtr, Alignment, Mask,
4447 PassThru: getShadow(V: PassThru), Name: "_msmaskedld"));
4448
4449 if (!MS.TrackOrigins)
4450 return;
4451
4452 // Choose between PassThru's and the loaded value's origins.
4453 Value *MaskedPassThruShadow = IRB.CreateAnd(
4454 LHS: getShadow(V: PassThru), RHS: IRB.CreateSExt(V: IRB.CreateNeg(V: Mask), DestTy: ShadowTy));
4455
4456 Value *NotNull = convertToBool(V: MaskedPassThruShadow, IRB, name: "_mscmp");
4457
4458 Value *PtrOrigin = IRB.CreateLoad(Ty: MS.OriginTy, Ptr: OriginPtr);
4459 Value *Origin = IRB.CreateSelect(C: NotNull, True: getOrigin(V: PassThru), False: PtrOrigin);
4460
4461 setOrigin(V: &I, Origin);
4462 }
4463
4464 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4465 // dst mask src
4466 //
4467 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4468 // by handleMaskedStore.
4469 //
4470 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4471 // vector of integers, unlike the LLVM masked intrinsics, which require a
4472 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4473 // mentions that the x86 backend does not know how to efficiently convert
4474 // from a vector of booleans back into the AVX mask format; therefore, they
4475 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4476 // intrinsics.
4477 void handleAVXMaskedStore(IntrinsicInst &I) {
4478 assert(I.arg_size() == 3);
4479
4480 IRBuilder<> IRB(&I);
4481
4482 Value *Dst = I.getArgOperand(i: 0);
4483 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4484
4485 Value *Mask = I.getArgOperand(i: 1);
4486 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4487
4488 Value *Src = I.getArgOperand(i: 2);
4489 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4490
4491 const Align Alignment = Align(1);
4492
4493 Value *SrcShadow = getShadow(V: Src);
4494
4495 if (ClCheckAccessAddress) {
4496 insertCheckShadowOf(Val: Dst, OrigIns: &I);
4497 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4498 }
4499
4500 Value *DstShadowPtr;
4501 Value *DstOriginPtr;
4502 std::tie(args&: DstShadowPtr, args&: DstOriginPtr) = getShadowOriginPtr(
4503 Addr: Dst, IRB, ShadowTy: SrcShadow->getType(), Alignment, /*isStore*/ true);
4504
4505 SmallVector<Value *, 2> ShadowArgs;
4506 ShadowArgs.append(NumInputs: 1, Elt: DstShadowPtr);
4507 ShadowArgs.append(NumInputs: 1, Elt: Mask);
4508 // The intrinsic may require floating-point but shadows can be arbitrary
4509 // bit patterns, of which some would be interpreted as "invalid"
4510 // floating-point values (NaN etc.); we assume the intrinsic will happily
4511 // copy them.
4512 ShadowArgs.append(NumInputs: 1, Elt: IRB.CreateBitCast(V: SrcShadow, DestTy: Src->getType()));
4513
4514 CallInst *CI =
4515 IRB.CreateIntrinsic(RetTy: IRB.getVoidTy(), ID: I.getIntrinsicID(), Args: ShadowArgs);
4516 setShadow(V: &I, SV: CI);
4517
4518 if (!MS.TrackOrigins)
4519 return;
4520
4521 // Approximation only
4522 auto &DL = F.getDataLayout();
4523 paintOrigin(IRB, Origin: getOrigin(V: Src), OriginPtr: DstOriginPtr,
4524 TS: DL.getTypeStoreSize(Ty: SrcShadow->getType()),
4525 Alignment: std::max(a: Alignment, b: kMinOriginAlignment));
4526 }
4527
4528 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4529 // return src mask
4530 //
4531 // Masked-off values are replaced with 0, which conveniently also represents
4532 // initialized memory.
4533 //
4534 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4535 // by handleMaskedLoad.
4536 //
4537 // We do not combine this with handleMaskedLoad; see comment in
4538 // handleAVXMaskedStore for the rationale.
4539 //
4540 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4541 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4542 // parameter.
4543 void handleAVXMaskedLoad(IntrinsicInst &I) {
4544 assert(I.arg_size() == 2);
4545
4546 IRBuilder<> IRB(&I);
4547
4548 Value *Src = I.getArgOperand(i: 0);
4549 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4550
4551 Value *Mask = I.getArgOperand(i: 1);
4552 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4553
4554 const Align Alignment = Align(1);
4555
4556 if (ClCheckAccessAddress) {
4557 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4558 }
4559
4560 Type *SrcShadowTy = getShadowTy(V: Src);
4561 Value *SrcShadowPtr, *SrcOriginPtr;
4562 std::tie(args&: SrcShadowPtr, args&: SrcOriginPtr) =
4563 getShadowOriginPtr(Addr: Src, IRB, ShadowTy: SrcShadowTy, Alignment, /*isStore*/ false);
4564
4565 SmallVector<Value *, 2> ShadowArgs;
4566 ShadowArgs.append(NumInputs: 1, Elt: SrcShadowPtr);
4567 ShadowArgs.append(NumInputs: 1, Elt: Mask);
4568
4569 CallInst *CI =
4570 IRB.CreateIntrinsic(RetTy: I.getType(), ID: I.getIntrinsicID(), Args: ShadowArgs);
4571 // The AVX masked load intrinsics do not have integer variants. We use the
4572 // floating-point variants, which will happily copy the shadows even if
4573 // they are interpreted as "invalid" floating-point values (NaN etc.).
4574 setShadow(V: &I, SV: IRB.CreateBitCast(V: CI, DestTy: getShadowTy(V: &I)));
4575
4576 if (!MS.TrackOrigins)
4577 return;
4578
4579 // The "pass-through" value is always zero (initialized). Wherever that
4580 // results in initialized aligned 4-byte chunks, the origin value is
4581 // ignored anyway. It is therefore correct to simply copy the origin from src.
4582 Value *PtrSrcOrigin = IRB.CreateLoad(Ty: MS.OriginTy, Ptr: SrcOriginPtr);
4583 setOrigin(V: &I, Origin: PtrSrcOrigin);
4584 }
4585
4586 // Test whether the mask indices are initialized, only checking the bits that
4587 // are actually used.
4588 //
4589 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4590 // used/checked.
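//
// Roughly, for <32 x i16> indices this emits (sketch, not the exact IR):
//   %t = trunc <32 x i16> %idx_shadow to <32 x i5>
// followed by a check that %t is fully initialized.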
4591 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4592 assert(isFixedIntVector(Idx));
4593 auto IdxVectorSize =
4594 cast<FixedVectorType>(Val: Idx->getType())->getNumElements();
4595 assert(isPowerOf2_64(IdxVectorSize));
4596
4597 // A constant index has a clean shadow, so the check below would be a
4597 // no-op; skip it instead of relying on the optimizer to remove it.
4598 if (isa<Constant>(Val: Idx))
4599 return;
4600
4601 auto *IdxShadow = getShadow(V: Idx);
4602 Value *Truncated = IRB.CreateTrunc(
4603 V: IdxShadow,
4604 DestTy: FixedVectorType::get(ElementType: Type::getIntNTy(C&: *MS.C, N: Log2_64(Value: IdxVectorSize)),
4605 NumElts: IdxVectorSize));
4606 insertCheckShadow(Shadow: Truncated, Origin: getOrigin(V: Idx), OrigIns: I);
4607 }
4608
4609 // Instrument AVX permutation intrinsic.
4610 // We apply the same permutation (argument index 1) to the shadow.
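//
// e.g., <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float> %a,
//                                                   <8 x i32> %idx)
// Shadow propagation sketch (illustrative):
//   Shadow(dst) = vpermilvar(bitcast(Shadow(%a)), %idx)
// after checking the used bits of %idx's shadow.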
4611 void handleAVXVpermilvar(IntrinsicInst &I) {
4612 IRBuilder<> IRB(&I);
4613 Value *Shadow = getShadow(I: &I, i: 0);
4614 maskedCheckAVXIndexShadow(IRB, Idx: I.getArgOperand(i: 1), I: &I);
4615
4616 // Shadows are integer-ish types but some intrinsics require a
4617 // different (e.g., floating-point) type.
4618 Shadow = IRB.CreateBitCast(V: Shadow, DestTy: I.getArgOperand(i: 0)->getType());
4619 CallInst *CI = IRB.CreateIntrinsic(RetTy: I.getType(), ID: I.getIntrinsicID(),
4620 Args: {Shadow, I.getArgOperand(i: 1)});
4621
4622 setShadow(V: &I, SV: IRB.CreateBitCast(V: CI, DestTy: getShadowTy(V: &I)));
4623 setOriginForNaryOp(I);
4624 }
4625
4626 // Instrument AVX permutation intrinsic.
4627 // We apply the same permutation (argument index 1) to the shadows.
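//
// e.g., <16 x i32> @llvm.x86.avx512.vpermi2var.d.512(<16 x i32> %a,
//                                                    <16 x i32> %idx,
//                                                    <16 x i32> %b)
// Shadow propagation sketch (illustrative):
//   Shadow(dst) = vpermi2var(Shadow(%a), %idx, Shadow(%b))
// after checking the used bits of %idx's shadow.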
4628 void handleAVXVpermi2var(IntrinsicInst &I) {
4629 assert(I.arg_size() == 3);
4630 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4631 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4632 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4633 [[maybe_unused]] auto ArgVectorSize =
4634 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getNumElements();
4635 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4636 ->getNumElements() == ArgVectorSize);
4637 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4638 ->getNumElements() == ArgVectorSize);
4639 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4640 assert(I.getType() == I.getArgOperand(0)->getType());
4641 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4642 IRBuilder<> IRB(&I);
4643 Value *AShadow = getShadow(I: &I, i: 0);
4644 Value *Idx = I.getArgOperand(i: 1);
4645 Value *BShadow = getShadow(I: &I, i: 2);
4646
4647 maskedCheckAVXIndexShadow(IRB, Idx, I: &I);
4648
4649 // Shadows are integer-ish types but some intrinsics require a
4650 // different (e.g., floating-point) type.
4651 AShadow = IRB.CreateBitCast(V: AShadow, DestTy: I.getArgOperand(i: 0)->getType());
4652 BShadow = IRB.CreateBitCast(V: BShadow, DestTy: I.getArgOperand(i: 2)->getType());
4653 CallInst *CI = IRB.CreateIntrinsic(RetTy: I.getType(), ID: I.getIntrinsicID(),
4654 Args: {AShadow, Idx, BShadow});
4655 setShadow(V: &I, SV: IRB.CreateBitCast(V: CI, DestTy: getShadowTy(V: &I)));
4656 setOriginForNaryOp(I);
4657 }
4658
4659 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4660 return isa<FixedVectorType>(Val: T) && T->isIntOrIntVectorTy();
4661 }
4662
4663 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4664 return isa<FixedVectorType>(Val: T) && T->isFPOrFPVectorTy();
4665 }
4666
4667 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4668 return isFixedIntVectorTy(T: V->getType());
4669 }
4670
4671 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4672 return isFixedFPVectorTy(T: V->getType());
4673 }
4674
4675 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4676 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4677 // i32 rounding)
4678 //
4679 // Inconveniently, some similar intrinsics have a different operand order:
4680 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4681 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4682 // i16 mask)
4683 //
4684 // If the return type has more elements than A, the excess elements are
4685 // zeroed (and the corresponding shadow is initialized).
4686 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4687 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4688 // i8 mask)
4689 //
4690 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4691 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4692 // where all_or_nothing(x) is fully uninitialized if x has any
4693 // uninitialized bits
4694 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4695 IRBuilder<> IRB(&I);
4696
4697 assert(I.arg_size() == 4);
4698 Value *A = I.getOperand(i_nocapture: 0);
4699 Value *WriteThrough;
4700 Value *Mask;
4701 Value *RoundingMode;
4702 if (LastMask) {
4703 WriteThrough = I.getOperand(i_nocapture: 2);
4704 Mask = I.getOperand(i_nocapture: 3);
4705 RoundingMode = I.getOperand(i_nocapture: 1);
4706 } else {
4707 WriteThrough = I.getOperand(i_nocapture: 1);
4708 Mask = I.getOperand(i_nocapture: 2);
4709 RoundingMode = I.getOperand(i_nocapture: 3);
4710 }
4711
4712 assert(isFixedFPVector(A));
4713 assert(isFixedIntVector(WriteThrough));
4714
4715 unsigned ANumElements =
4716 cast<FixedVectorType>(Val: A->getType())->getNumElements();
4717 [[maybe_unused]] unsigned WriteThruNumElements =
4718 cast<FixedVectorType>(Val: WriteThrough->getType())->getNumElements();
4719 assert(ANumElements == WriteThruNumElements ||
4720 ANumElements * 2 == WriteThruNumElements);
4721
4722 assert(Mask->getType()->isIntegerTy());
4723 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4724 assert(ANumElements == MaskNumElements ||
4725 ANumElements * 2 == MaskNumElements);
4726
4727 assert(WriteThruNumElements == MaskNumElements);
4728
4729 // Some bits of the mask may be unused, though it's unusual to have partly
4730 // uninitialized bits.
4731 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4732
4733 assert(RoundingMode->getType()->isIntegerTy());
4734 // Only some bits of the rounding mode are used, though it's very
4735 // unusual to have uninitialized bits there (more commonly, it's a
4736 // constant).
4737 insertCheckShadowOf(Val: RoundingMode, OrigIns: &I);
4738
4739 assert(I.getType() == WriteThrough->getType());
4740
4741 Value *AShadow = getShadow(V: A);
4742 AShadow = maybeExtendVectorShadowWithZeros(Shadow: AShadow, I);
4743
4744 if (ANumElements * 2 == MaskNumElements) {
4745 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4746 // from the zeroed shadow instead of the writethrough's shadow.
4747 Mask =
4748 IRB.CreateTrunc(V: Mask, DestTy: IRB.getIntNTy(N: ANumElements), Name: "_ms_mask_trunc");
4749 Mask =
4750 IRB.CreateZExt(V: Mask, DestTy: IRB.getIntNTy(N: MaskNumElements), Name: "_ms_mask_zext");
4751 }
4752
4753 // Convert i16 mask to <16 x i1>
4754 Mask = IRB.CreateBitCast(
4755 V: Mask, DestTy: FixedVectorType::get(ElementType: IRB.getInt1Ty(), NumElts: MaskNumElements),
4756 Name: "_ms_mask_bitcast");
4757
4758 /// For floating-point to integer conversion, the output is:
4759 /// - fully uninitialized if *any* bit of the input is uninitialized
4760 /// - fully initialized if all bits of the input are initialized
4761 /// We apply the same principle on a per-element basis for vectors.
4762 ///
4763 /// We use the scalar width of the return type instead of A's.
4764 AShadow = IRB.CreateSExt(
4765 V: IRB.CreateICmpNE(LHS: AShadow, RHS: getCleanShadow(OrigTy: AShadow->getType())),
4766 DestTy: getShadowTy(V: &I), Name: "_ms_a_shadow");
4767
4768 Value *WriteThroughShadow = getShadow(V: WriteThrough);
4769 Value *Shadow = IRB.CreateSelect(C: Mask, True: AShadow, False: WriteThroughShadow,
4770 Name: "_ms_writethru_select");
4771
4772 setShadow(V: &I, SV: Shadow);
4773 setOriginForNaryOp(I);
4774 }
4775
4776 // Instrument BMI / BMI2 intrinsics.
4777 // All of these intrinsics are Z = I(X, Y)
4778 // where the types of all operands and the result match, and are either i32 or
4779 // i64. The following instrumentation happens to work for all of them:
4780 // Sz = I(Sx, Y) | (sext (Sy != 0))
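//
// e.g., for i32 @llvm.x86.bmi.pext.32(i32 %x, i32 %y) (illustrative):
//   Sz = pext(Sx, %y) | sext(Sy != 0)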
4781 void handleBmiIntrinsic(IntrinsicInst &I) {
4782 IRBuilder<> IRB(&I);
4783 Type *ShadowTy = getShadowTy(V: &I);
4784
4785 // If any bit of the mask operand is poisoned, then the whole thing is.
4786 Value *SMask = getShadow(I: &I, i: 1);
4787 SMask = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: SMask, RHS: getCleanShadow(OrigTy: ShadowTy)),
4788 DestTy: ShadowTy);
4789 // Apply the same intrinsic to the shadow of the first operand.
4790 Value *S = IRB.CreateCall(Callee: I.getCalledFunction(),
4791 Args: {getShadow(I: &I, i: 0), I.getOperand(i_nocapture: 1)});
4792 S = IRB.CreateOr(LHS: SMask, RHS: S);
4793 setShadow(V: &I, SV: S);
4794 setOriginForNaryOp(I);
4795 }
4796
4797 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4798 SmallVector<int, 8> Mask;
4799 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4800 Mask.append(NumInputs: 2, Elt: X);
4801 }
4802 return Mask;
4803 }
4804
4805 // Instrument pclmul intrinsics.
4806 // These intrinsics operate either on odd or on even elements of the input
4807 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4808 // Replace the unused elements with copies of the used ones, ex:
4809 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4810 // or
4811 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4812 // and then apply the usual shadow combining logic.
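//
// e.g., <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 %imm)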
4813 void handlePclmulIntrinsic(IntrinsicInst &I) {
4814 IRBuilder<> IRB(&I);
4815 unsigned Width =
4816 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getNumElements();
4817 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4818 "pclmul 3rd operand must be a constant");
4819 unsigned Imm = cast<ConstantInt>(Val: I.getArgOperand(i: 2))->getZExtValue();
4820 Value *Shuf0 = IRB.CreateShuffleVector(V: getShadow(I: &I, i: 0),
4821 Mask: getPclmulMask(Width, OddElements: Imm & 0x01));
4822 Value *Shuf1 = IRB.CreateShuffleVector(V: getShadow(I: &I, i: 1),
4823 Mask: getPclmulMask(Width, OddElements: Imm & 0x10));
4824 ShadowAndOriginCombiner SOC(this, IRB);
4825 SOC.Add(OpShadow: Shuf0, OpOrigin: getOrigin(I: &I, i: 0));
4826 SOC.Add(OpShadow: Shuf1, OpOrigin: getOrigin(I: &I, i: 1));
4827 SOC.Done(I: &I);
4828 }
4829
4830 // Instrument _mm_*_sd|ss intrinsics
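//
// e.g., for an intrinsic of the form
//   <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %a, <4 x float> %b,
//                                        i32 %imm)
// the low element of the result depends only on %b, so (sketch):
//   DstShadow[0] = BShadow[0]
//   DstShadow[1..] = AShadow[1..]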
4831 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4832 IRBuilder<> IRB(&I);
4833 unsigned Width =
4834 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getNumElements();
4835 Value *First = getShadow(I: &I, i: 0);
4836 Value *Second = getShadow(I: &I, i: 1);
4837 // First element of second operand, remaining elements of first operand
4838 SmallVector<int, 16> Mask;
4839 Mask.push_back(Elt: Width);
4840 for (unsigned i = 1; i < Width; i++)
4841 Mask.push_back(Elt: i);
4842 Value *Shadow = IRB.CreateShuffleVector(V1: First, V2: Second, Mask);
4843
4844 setShadow(V: &I, SV: Shadow);
4845 setOriginForNaryOp(I);
4846 }
4847
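// Instrument vtest/ptest-style intrinsics.
//
// e.g., i32 @llvm.x86.avx.vtestz.ps.256(<8 x float> %a, <8 x float> %b)
// Shadow propagation sketch (illustrative): the i32 result's shadow is
// nonzero iff any bit of %a's or %b's shadow is set.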
4848 void handleVtestIntrinsic(IntrinsicInst &I) {
4849 IRBuilder<> IRB(&I);
4850 Value *Shadow0 = getShadow(I: &I, i: 0);
4851 Value *Shadow1 = getShadow(I: &I, i: 1);
4852 Value *Or = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
4853 Value *NZ = IRB.CreateICmpNE(LHS: Or, RHS: Constant::getNullValue(Ty: Or->getType()));
4854 Value *Scalar = convertShadowToScalar(V: NZ, IRB);
4855 Value *Shadow = IRB.CreateZExt(V: Scalar, DestTy: getShadowTy(V: &I));
4856
4857 setShadow(V: &I, SV: Shadow);
4858 setOriginForNaryOp(I);
4859 }
4860
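// Instrument _mm_*_sd|ss binary intrinsics, e.g., an intrinsic of the form
//   <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %b)
// Shadow propagation sketch (illustrative):
//   DstShadow[0] = AShadow[0] | BShadow[0]
//   DstShadow[1..] = AShadow[1..]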
4861 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4862 IRBuilder<> IRB(&I);
4863 unsigned Width =
4864 cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getNumElements();
4865 Value *First = getShadow(I: &I, i: 0);
4866 Value *Second = getShadow(I: &I, i: 1);
4867 Value *OrShadow = IRB.CreateOr(LHS: First, RHS: Second);
4868 // First element of both OR'd together, remaining elements of first operand
4869 SmallVector<int, 16> Mask;
4870 Mask.push_back(Elt: Width);
4871 for (unsigned i = 1; i < Width; i++)
4872 Mask.push_back(Elt: i);
4873 Value *Shadow = IRB.CreateShuffleVector(V1: First, V2: OrShadow, Mask);
4874
4875 setShadow(V: &I, SV: Shadow);
4876 setOriginForNaryOp(I);
4877 }
4878
4879 // _mm_round_pd / _mm_round_ps.
4880 // Similar to maybeHandleSimpleNomemIntrinsic except
4881 // the second argument is guaranteed to be a constant integer.
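//
// e.g., <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %a, i32 %imm)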
4882 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4883 assert(I.getArgOperand(0)->getType() == I.getType());
4884 assert(I.arg_size() == 2);
4885 assert(isa<ConstantInt>(I.getArgOperand(1)));
4886
4887 IRBuilder<> IRB(&I);
4888 ShadowAndOriginCombiner SC(this, IRB);
4889 SC.Add(V: I.getArgOperand(i: 0));
4890 SC.Done(I: &I);
4891 }
4892
4893 // Instrument @llvm.abs intrinsic.
4894 //
4895 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4896 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
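//
// Shadow propagation sketch (per element, illustrative):
//   Shadow[i] = (is_int_min_poison && Src[i] == INT_MIN) ? poisoned
//                                                        : SrcShadow[i]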
4897 void handleAbsIntrinsic(IntrinsicInst &I) {
4898 assert(I.arg_size() == 2);
4899 Value *Src = I.getArgOperand(i: 0);
4900 Value *IsIntMinPoison = I.getArgOperand(i: 1);
4901
4902 assert(I.getType()->isIntOrIntVectorTy());
4903
4904 assert(Src->getType() == I.getType());
4905
4906 assert(IsIntMinPoison->getType()->isIntegerTy());
4907 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4908
4909 IRBuilder<> IRB(&I);
4910 Value *SrcShadow = getShadow(V: Src);
4911
4912 APInt MinVal =
4913 APInt::getSignedMinValue(numBits: Src->getType()->getScalarSizeInBits());
4914 Value *MinValVec = ConstantInt::get(Ty: Src->getType(), V: MinVal);
4915 Value *SrcIsMin = IRB.CreateICmp(P: CmpInst::ICMP_EQ, LHS: Src, RHS: MinValVec);
4916
4917 Value *PoisonedShadow = getPoisonedShadow(V: Src);
4918 Value *PoisonedIfIntMinShadow =
4919 IRB.CreateSelect(C: SrcIsMin, True: PoisonedShadow, False: SrcShadow);
4920 Value *Shadow =
4921 IRB.CreateSelect(C: IsIntMinPoison, True: PoisonedIfIntMinShadow, False: SrcShadow);
4922
4923 setShadow(V: &I, SV: Shadow);
4924 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
4925 }
4926
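// e.g., i1 @llvm.is.fpclass.f32(float %x, i32 immarg %test)
// Shadow propagation sketch (illustrative): the result is poisoned iff any
// bit of %x's shadow is set.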
4927 void handleIsFpClass(IntrinsicInst &I) {
4928 IRBuilder<> IRB(&I);
4929 Value *Shadow = getShadow(I: &I, i: 0);
4930 setShadow(V: &I, SV: IRB.CreateICmpNE(LHS: Shadow, RHS: getCleanShadow(V: Shadow)));
4931 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
4932 }
4933
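// e.g., {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
// Shadow propagation sketch (illustrative):
//   Shadow.0 = Sa | Sb
//   Shadow.1 = (Shadow.0 != 0)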
4934 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4935 IRBuilder<> IRB(&I);
4936 Value *Shadow0 = getShadow(I: &I, i: 0);
4937 Value *Shadow1 = getShadow(I: &I, i: 1);
4938 Value *ShadowElt0 = IRB.CreateOr(LHS: Shadow0, RHS: Shadow1);
4939 Value *ShadowElt1 =
4940 IRB.CreateICmpNE(LHS: ShadowElt0, RHS: getCleanShadow(V: ShadowElt0));
4941
4942 Value *Shadow = PoisonValue::get(T: getShadowTy(V: &I));
4943 Shadow = IRB.CreateInsertValue(Agg: Shadow, Val: ShadowElt0, Idxs: 0);
4944 Shadow = IRB.CreateInsertValue(Agg: Shadow, Val: ShadowElt1, Idxs: 1);
4945
4946 setShadow(V: &I, SV: Shadow);
4947 setOriginForNaryOp(I);
4948 }
4949
4950 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4951 assert(isa<FixedVectorType>(V->getType()));
4952 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4953 Value *Shadow = getShadow(V);
4954 return IRB.CreateExtractElement(Vec: Shadow,
4955 Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: 0));
4956 }
4957
4958 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4959 //
4960 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4961 // (<8 x i64>, <16 x i8>, i8)
4962 // A WriteThru Mask
4963 //
4964 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4965 // (<16 x i32>, <16 x i8>, i16)
4966 //
4967 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4968 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4969 //
4970 // If Dst has more elements than A, the excess elements are zeroed (and the
4971 // corresponding shadow is initialized).
4972 //
4973 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4974 // and is much faster than this handler.
4975 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4976 IRBuilder<> IRB(&I);
4977
4978 assert(I.arg_size() == 3);
4979 Value *A = I.getOperand(i_nocapture: 0);
4980 Value *WriteThrough = I.getOperand(i_nocapture: 1);
4981 Value *Mask = I.getOperand(i_nocapture: 2);
4982
4983 assert(isFixedIntVector(A));
4984 assert(isFixedIntVector(WriteThrough));
4985
4986 unsigned ANumElements =
4987 cast<FixedVectorType>(Val: A->getType())->getNumElements();
4988 unsigned OutputNumElements =
4989 cast<FixedVectorType>(Val: WriteThrough->getType())->getNumElements();
4990 assert(ANumElements == OutputNumElements ||
4991 ANumElements * 2 == OutputNumElements);
4992
4993 assert(Mask->getType()->isIntegerTy());
4994 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4995 insertCheckShadowOf(Val: Mask, OrigIns: &I);
4996
4997 assert(I.getType() == WriteThrough->getType());
4998
4999 // Widen the mask, if necessary, to have one bit per element of the output
5000 // vector.
5001 // We want the extra bits to have '1's, so that the CreateSelect will
5002 // select the values from AShadow instead of WriteThroughShadow ("maskless"
5003 // versions of the intrinsics are sometimes implemented using an all-1's
5004 // mask and an undefined value for WriteThroughShadow). We accomplish this
5005 // by using bitwise NOT before and after the ZExt.
5006 if (ANumElements != OutputNumElements) {
5007 Mask = IRB.CreateNot(V: Mask);
5008 Mask = IRB.CreateZExt(V: Mask, DestTy: Type::getIntNTy(C&: *MS.C, N: OutputNumElements),
5009 Name: "_ms_widen_mask");
5010 Mask = IRB.CreateNot(V: Mask);
5011 }
5012 Mask = IRB.CreateBitCast(
5013 V: Mask, DestTy: FixedVectorType::get(ElementType: IRB.getInt1Ty(), NumElts: OutputNumElements));
5014
5015 Value *AShadow = getShadow(V: A);
5016
5017 // The return type might have more elements than the input.
5018 // Temporarily shrink the return type's number of elements.
5019 VectorType *ShadowType = maybeShrinkVectorShadowType(Src: A, I);
5020
5021 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
5022 // This handler treats them all as truncation, which leads to some rare
5023 // false positives in the cases where the truncated bytes could
5024 // unambiguously saturate the value e.g., if A = ??????10 ????????
5025 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
5026 // fully defined, but the truncated byte is ????????.
5027 //
5028 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
5029 AShadow = IRB.CreateTrunc(V: AShadow, DestTy: ShadowType, Name: "_ms_trunc_shadow");
5030 AShadow = maybeExtendVectorShadowWithZeros(Shadow: AShadow, I);
5031
5032 Value *WriteThroughShadow = getShadow(V: WriteThrough);
5033
5034 Value *Shadow = IRB.CreateSelect(C: Mask, True: AShadow, False: WriteThroughShadow);
5035 setShadow(V: &I, SV: Shadow);
5036 setOriginForNaryOp(I);
5037 }
5038
5039 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
5040 // values and perform an operation whose shadow propagation should be handled
5041 // as all-or-nothing [*], with masking provided by a vector and a mask
5042 // supplied as an integer.
5043 //
5044 // [*] if all bits of a vector element are initialized, the output is fully
5045 // initialized; otherwise, the output is fully uninitialized
5046 //
5047 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5048 // (<16 x float>, <16 x float>, i16)
5049 // A WriteThru Mask
5050 //
5051 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5052 // (<2 x double>, <2 x double>, i8)
5053 //
5054 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5055 // (<8 x double>, i32, <8 x double>, i8, i32)
5056 // A Imm WriteThru Mask Rounding
5057 //
5058 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
5059 // be fully initialized.
5060 //
5061 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
5062 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
5063 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
5064 unsigned WriteThruIndex,
5065 unsigned MaskIndex) {
5066 IRBuilder<> IRB(&I);
5067
5068 unsigned NumArgs = I.arg_size();
5069 assert(AIndex < NumArgs);
5070 assert(WriteThruIndex < NumArgs);
5071 assert(MaskIndex < NumArgs);
5072 assert(AIndex != WriteThruIndex);
5073 assert(AIndex != MaskIndex);
5074 assert(WriteThruIndex != MaskIndex);
5075
5076 Value *A = I.getOperand(i_nocapture: AIndex);
5077 Value *WriteThru = I.getOperand(i_nocapture: WriteThruIndex);
5078 Value *Mask = I.getOperand(i_nocapture: MaskIndex);
5079
5080 assert(isFixedFPVector(A));
5081 assert(isFixedFPVector(WriteThru));
5082
5083 [[maybe_unused]] unsigned ANumElements =
5084 cast<FixedVectorType>(Val: A->getType())->getNumElements();
5085 unsigned OutputNumElements =
5086 cast<FixedVectorType>(Val: WriteThru->getType())->getNumElements();
5087 assert(ANumElements == OutputNumElements);
5088
5089 for (unsigned i = 0; i < NumArgs; ++i) {
5090 if (i != AIndex && i != WriteThruIndex) {
5091 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5092 // they be fully initialized.
5093 assert(I.getOperand(i)->getType()->isIntegerTy());
5094 insertCheckShadowOf(Val: I.getOperand(i_nocapture: i), OrigIns: &I);
5095 }
5096 }
5097
5098 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5099 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
5100 Mask = IRB.CreateTrunc(V: Mask, DestTy: Type::getIntNTy(C&: *MS.C, N: ANumElements));
5101 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5102
5103 assert(I.getType() == WriteThru->getType());
5104
5105 Mask = IRB.CreateBitCast(
5106 V: Mask, DestTy: FixedVectorType::get(ElementType: IRB.getInt1Ty(), NumElts: OutputNumElements));
5107
5108 Value *AShadow = getShadow(V: A);
5109
5110 // All-or-nothing shadow
5111 AShadow = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: AShadow, RHS: getCleanShadow(V: AShadow)),
5112 DestTy: AShadow->getType());
5113
5114 Value *WriteThruShadow = getShadow(V: WriteThru);
5115
5116 Value *Shadow = IRB.CreateSelect(C: Mask, True: AShadow, False: WriteThruShadow);
5117 setShadow(V: &I, SV: Shadow);
5118
5119 setOriginForNaryOp(I);
5120 }
5121
5122 // For sh.* compiler intrinsics:
5123 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5124 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5125 // A B WriteThru Mask RoundingMode
5126 //
5127 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5128 // DstShadow[1..7] = AShadow[1..7]
5129 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5130 IRBuilder<> IRB(&I);
5131
5132 assert(I.arg_size() == 5);
5133 Value *A = I.getOperand(i_nocapture: 0);
5134 Value *B = I.getOperand(i_nocapture: 1);
5135 Value *WriteThrough = I.getOperand(i_nocapture: 2);
5136 Value *Mask = I.getOperand(i_nocapture: 3);
5137 Value *RoundingMode = I.getOperand(i_nocapture: 4);
5138
5139 // Technically, we could probably just check whether the LSB is
5140 // initialized, but intuitively it feels like a partly uninitialized mask
5141 // is unintended, and we should warn the user immediately.
5142 insertCheckShadowOf(Val: Mask, OrigIns: &I);
5143 insertCheckShadowOf(Val: RoundingMode, OrigIns: &I);
5144
5145 assert(isa<FixedVectorType>(A->getType()));
5146 unsigned NumElements =
5147 cast<FixedVectorType>(Val: A->getType())->getNumElements();
5148 assert(NumElements == 8);
5149 assert(A->getType() == B->getType());
5150 assert(B->getType() == WriteThrough->getType());
5151 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5152 assert(RoundingMode->getType()->isIntegerTy());
5153
5154 Value *ALowerShadow = extractLowerShadow(IRB, V: A);
5155 Value *BLowerShadow = extractLowerShadow(IRB, V: B);
5156
5157 Value *ABLowerShadow = IRB.CreateOr(LHS: ALowerShadow, RHS: BLowerShadow);
5158
5159 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, V: WriteThrough);
5160
5161 Mask = IRB.CreateBitCast(
5162 V: Mask, DestTy: FixedVectorType::get(ElementType: IRB.getInt1Ty(), NumElts: NumElements));
5163 Value *MaskLower =
5164 IRB.CreateExtractElement(Vec: Mask, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: 0));
5165
5166 Value *AShadow = getShadow(V: A);
5167 Value *DstLowerShadow =
5168 IRB.CreateSelect(C: MaskLower, True: ABLowerShadow, False: WriteThroughLowerShadow);
5169 Value *DstShadow = IRB.CreateInsertElement(
5170 Vec: AShadow, NewElt: DstLowerShadow, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: 0),
5171 Name: "_msprop");
5172
5173 setShadow(V: &I, SV: DstShadow);
5174 setOriginForNaryOp(I);
5175 }
5176
5177 // Approximately handle AVX Galois Field Affine Transformation
5178 //
5179 // e.g.,
5180 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5181 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5182 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5183 // Out A x b
5184 // where A and x are packed matrices, b is a vector,
5185 // Out = A * x + b in GF(2)
5186 //
5187 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5188 // computation also includes a parity calculation.
5189 //
5190 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5191 // Out_Shadow = (V1_Shadow & V2_Shadow)
5192 // | (V1 & V2_Shadow)
5193 // | (V1_Shadow & V2 )
5194 //
5195 // We approximate the shadow of gf2p8affineqb using:
5196 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5197 // | gf2p8affineqb(x, A_shadow, 0)
5198 // | gf2p8affineqb(x_Shadow, A, 0)
5199 // | set1_epi8(b_Shadow)
5200 //
5201 // This approximation has false negatives: if an intermediate dot-product
5202 // contains an even number of 1's, the parity is 0.
5203 // It has no false positives.
5204 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5205 IRBuilder<> IRB(&I);
5206
5207 assert(I.arg_size() == 3);
5208 Value *A = I.getOperand(i_nocapture: 0);
5209 Value *X = I.getOperand(i_nocapture: 1);
5210 Value *B = I.getOperand(i_nocapture: 2);
5211
5212 assert(isFixedIntVector(A));
5213 assert(cast<VectorType>(A->getType())
5214 ->getElementType()
5215 ->getScalarSizeInBits() == 8);
5216
5217 assert(A->getType() == X->getType());
5218
5219 assert(B->getType()->isIntegerTy());
5220 assert(B->getType()->getScalarSizeInBits() == 8);
5221
5222 assert(I.getType() == A->getType());
5223
5224 Value *AShadow = getShadow(V: A);
5225 Value *XShadow = getShadow(V: X);
5226 Value *BZeroShadow = getCleanShadow(V: B);
5227
5228 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5229 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {XShadow, AShadow, BZeroShadow});
5230 CallInst *AShadowX = IRB.CreateIntrinsic(RetTy: I.getType(), ID: I.getIntrinsicID(),
5231 Args: {X, AShadow, BZeroShadow});
5232 CallInst *XShadowA = IRB.CreateIntrinsic(RetTy: I.getType(), ID: I.getIntrinsicID(),
5233 Args: {XShadow, A, BZeroShadow});
5234
5235 unsigned NumElements = cast<FixedVectorType>(Val: I.getType())->getNumElements();
5236 Value *BShadow = getShadow(V: B);
5237 Value *BBroadcastShadow = getCleanShadow(V: AShadow);
5238 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5239 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5240 // lower appropriately (e.g., VPBROADCASTB).
5241 // Besides, b is often a constant, in which case it is fully initialized.
5242 for (unsigned i = 0; i < NumElements; i++)
5243 BBroadcastShadow = IRB.CreateInsertElement(Vec: BBroadcastShadow, NewElt: BShadow, Idx: i);
5244
5245 setShadow(V: &I, SV: IRB.CreateOr(
5246 Ops: {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5247 setOriginForNaryOp(I);
5248 }
5249
5250 // Handle Arm NEON vector load intrinsics (vld*).
5251 //
5252 // The WithLane instructions (ld[234]lane) are similar to:
5253 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5254 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5255 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5256 // %A)
5257 //
5258 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5259 // to:
5260 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5261 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5262 unsigned int numArgs = I.arg_size();
5263
5264 // Return type is a struct of vectors of integers or floating-point
5265 assert(I.getType()->isStructTy());
5266 [[maybe_unused]] StructType *RetTy = cast<StructType>(Val: I.getType());
5267 assert(RetTy->getNumElements() > 0);
5268 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5269 RetTy->getElementType(0)->isFPOrFPVectorTy());
5270 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5271 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5272
5273 if (WithLane) {
5274 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5275 assert(4 <= numArgs && numArgs <= 6);
5276
5277 // Return type is a struct of the input vectors
5278 assert(RetTy->getNumElements() + 2 == numArgs);
5279 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5280 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5281 } else {
5282 assert(numArgs == 1);
5283 }
5284
5285 IRBuilder<> IRB(&I);
5286
5287 SmallVector<Value *, 6> ShadowArgs;
5288 if (WithLane) {
5289 for (unsigned int i = 0; i < numArgs - 2; i++)
5290 ShadowArgs.push_back(Elt: getShadow(V: I.getArgOperand(i)));
5291
5292 // Lane number, passed verbatim
5293 Value *LaneNumber = I.getArgOperand(i: numArgs - 2);
5294 ShadowArgs.push_back(Elt: LaneNumber);
5295
5296 // TODO: blend shadow of lane number into output shadow?
5297 insertCheckShadowOf(Val: LaneNumber, OrigIns: &I);
5298 }
5299
5300 Value *Src = I.getArgOperand(i: numArgs - 1);
5301 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5302
5303 Type *SrcShadowTy = getShadowTy(V: Src);
5304 auto [SrcShadowPtr, SrcOriginPtr] =
5305 getShadowOriginPtr(Addr: Src, IRB, ShadowTy: SrcShadowTy, Alignment: Align(1), /*isStore*/ false);
5306 ShadowArgs.push_back(Elt: SrcShadowPtr);
5307
5308 // The NEON vector load instructions handled by this function all have
5309 // integer variants. It is easier to use those rather than trying to cast
5310 // a struct of vectors of floats into a struct of vectors of integers.
5311 CallInst *CI =
5312 IRB.CreateIntrinsic(RetTy: getShadowTy(V: &I), ID: I.getIntrinsicID(), Args: ShadowArgs);
5313 setShadow(V: &I, SV: CI);
5314
5315 if (!MS.TrackOrigins)
5316 return;
5317
5318 Value *PtrSrcOrigin = IRB.CreateLoad(Ty: MS.OriginTy, Ptr: SrcOriginPtr);
5319 setOrigin(V: &I, Origin: PtrSrcOrigin);
5320 }
5321
5322 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5323 /// and vst{2,3,4}lane).
5324 ///
5325 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5326 /// last argument, with the initial arguments being the inputs (and lane
5327 /// number for vst{2,3,4}lane). They return void.
5328 ///
5329 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5330 /// abcdabcdabcdabcd... into *outP
5331 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5332 /// writes aaaa...bbbb...cccc...dddd... into *outP
5333 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5334 /// These instructions can all be instrumented with essentially the same
5335 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5336 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5337 IRBuilder<> IRB(&I);
5338
5339 // Don't use getNumOperands() because it includes the callee
5340 int numArgOperands = I.arg_size();
5341
5342 // The last arg operand is the output (pointer)
5343 assert(numArgOperands >= 1);
5344 Value *Addr = I.getArgOperand(i: numArgOperands - 1);
5345 assert(Addr->getType()->isPointerTy());
5346 int skipTrailingOperands = 1;
5347
5348 if (ClCheckAccessAddress)
5349 insertCheckShadowOf(Val: Addr, OrigIns: &I);
5350
5351 // Second-last operand is the lane number (for vst{2,3,4}lane)
5352 if (useLane) {
5353 skipTrailingOperands++;
5354 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5355 assert(isa<IntegerType>(
5356 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5357 }
5358
5359 SmallVector<Value *, 8> ShadowArgs;
5360 // All the initial operands are the inputs
5361 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5362 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5363 Value *Shadow = getShadow(I: &I, i);
5364 ShadowArgs.append(NumInputs: 1, Elt: Shadow);
5365 }
5366
5367 // MSan's getShadowTy assumes the LHS is the type we want the shadow for
5368 // e.g., for:
5369 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5370 // we know the type of the output (and its shadow) is <16 x i8>.
5371 //
5372 // Arm NEON VST is unusual because the last argument is the output address:
5373 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5374 // call void @llvm.aarch64.neon.st2.v16i8.p0
5375 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5376 // and we have no type information about P's operand. We must manually
5377 // compute the type (<16 x i8> x 2).
5378 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5379 ElementType: cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getElementType(),
5380 NumElts: cast<FixedVectorType>(Val: I.getArgOperand(i: 0)->getType())->getNumElements() *
5381 (numArgOperands - skipTrailingOperands));
5382 Type *OutputShadowTy = getShadowTy(OrigTy: OutputVectorTy);
5383
5384 if (useLane)
5385 ShadowArgs.append(NumInputs: 1,
5386 Elt: I.getArgOperand(i: numArgOperands - skipTrailingOperands));
5387
5388 Value *OutputShadowPtr, *OutputOriginPtr;
5389 // AArch64 NEON does not need alignment (unless OS requires it)
5390 std::tie(args&: OutputShadowPtr, args&: OutputOriginPtr) = getShadowOriginPtr(
5391 Addr, IRB, ShadowTy: OutputShadowTy, Alignment: Align(1), /*isStore*/ true);
5392 ShadowArgs.append(NumInputs: 1, Elt: OutputShadowPtr);
5393
5394 CallInst *CI =
5395 IRB.CreateIntrinsic(RetTy: IRB.getVoidTy(), ID: I.getIntrinsicID(), Args: ShadowArgs);
5396 setShadow(V: &I, SV: CI);
5397
5398 if (MS.TrackOrigins) {
5399 // TODO: if we modelled the vst* instruction more precisely, we could
5400 // more accurately track the origins (e.g., if both inputs are
5401 // uninitialized for vst2, we currently blame the second input, even
5402 // though part of the output depends only on the first input).
5403 //
5404 // This is particularly imprecise for vst{2,3,4}lane, since only one
5405 // lane of each input is actually copied to the output.
5406 OriginCombiner OC(this, IRB);
5407 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5408 OC.Add(V: I.getArgOperand(i));
5409
5410 const DataLayout &DL = F.getDataLayout();
5411 OC.DoneAndStoreOrigin(TS: DL.getTypeStoreSize(Ty: OutputVectorTy),
5412 OriginPtr: OutputOriginPtr);
5413 }
5414 }
5415
5416 // <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
5417 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5418 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5419 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5420 // <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
5421 //                                    (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5422 //
5423 // Note:
5424 // - <4 x *> is a 2x2 matrix
5425 // - <16 x *> is a 2x8 matrix (%X) and an 8x2 matrix (%Y), respectively
5426 //
5427 // The general shadow propagation approach is:
5428 // 1) get the shadows of the input matrices %X and %Y
5429 // 2) change the shadow values to 0x1 if the corresponding value is fully
5430 // initialized, and 0x0 otherwise
5431 // 3) perform a matrix multiplication on the shadows of %X and %Y. The output
5432 // will be a 2x2 matrix; for each element, a value of 0x8 means all the
5433 // corresponding inputs were clean.
5434 // 4) blend in the shadow of %R
5435 //
5436 // TODO: consider allowing multiplication of zero with an uninitialized value
5437 // to result in an initialized value.
5438 //
5439 // TODO: handle floating-point matrix multiply using ummla on the shadows:
5440 // case Intrinsic::aarch64_neon_bfmmla:
5441 // handleNEONMatrixMultiply(I, /*ARows=*/ 2, /*ACols=*/ 4,
5442 // /*BRows=*/ 4, /*BCols=*/ 2);
5443 //
5444 void handleNEONMatrixMultiply(IntrinsicInst &I, unsigned int ARows,
5445 unsigned int ACols, unsigned int BRows,
5446 unsigned int BCols) {
5447 IRBuilder<> IRB(&I);
5448
5449 assert(I.arg_size() == 3);
5450 Value *R = I.getArgOperand(i: 0);
5451 Value *A = I.getArgOperand(i: 1);
5452 Value *B = I.getArgOperand(i: 2);
5453
5454 assert(I.getType() == R->getType());
5455
5456 assert(isa<FixedVectorType>(R->getType()));
5457 assert(isa<FixedVectorType>(A->getType()));
5458 assert(isa<FixedVectorType>(B->getType()));
5459
5460 [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(Val: R->getType());
5461 [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(Val: A->getType());
5462 [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(Val: B->getType());
5463
5464 assert(ACols == BRows);
5465 assert(ATy->getNumElements() == ARows * ACols);
5466 assert(BTy->getNumElements() == BRows * BCols);
5467 assert(RTy->getNumElements() == ARows * BCols);
5468
5469 LLVM_DEBUG(dbgs() << "### R: " << *RTy->getElementType() << "\n");
5470 LLVM_DEBUG(dbgs() << "### A: " << *ATy->getElementType() << "\n");
5471 if (RTy->getElementType()->isIntegerTy()) {
5472 // Types are not identical e.g., <4 x i32> %R, <16 x i8> %A
5473 assert(ATy->getElementType()->isIntegerTy());
5474 } else {
5475 assert(RTy->getElementType()->isFloatingPointTy());
5476 assert(ATy->getElementType()->isFloatingPointTy());
5477 }
5478 assert(ATy->getElementType() == BTy->getElementType());
5479
5480 Value *ShadowR = getShadow(I: &I, i: 0);
5481 Value *ShadowA = getShadow(I: &I, i: 1);
5482 Value *ShadowB = getShadow(I: &I, i: 2);
5483
5484 // If the value is fully initialized, the shadow will be 000...001.
5485 // Otherwise, the shadow will be all zero.
5486 // (This is the opposite of how we typically handle shadows.)
5487 ShadowA = IRB.CreateZExt(V: IRB.CreateICmpEQ(LHS: ShadowA, RHS: getCleanShadow(V: A)),
5488 DestTy: ShadowA->getType());
5489 ShadowB = IRB.CreateZExt(V: IRB.CreateICmpEQ(LHS: ShadowB, RHS: getCleanShadow(V: B)),
5490 DestTy: ShadowB->getType());
5491
5492 Value *ShadowAB = IRB.CreateIntrinsic(
5493 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {getCleanShadow(V: R), ShadowA, ShadowB});
5494
5495 Value *FullyInit = ConstantVector::getSplat(
5496 EC: RTy->getElementCount(),
5497 Elt: ConstantInt::get(Ty: cast<VectorType>(Val: getShadowTy(V: R))->getElementType(),
5498 V: ACols));
5499
5500 ShadowAB = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: ShadowAB, RHS: FullyInit),
5501 DestTy: ShadowAB->getType());
5502
5503 ShadowR = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: ShadowR, RHS: getCleanShadow(V: R)),
5504 DestTy: ShadowR->getType());
5505
5506 setShadow(V: &I, SV: IRB.CreateOr(LHS: ShadowAB, RHS: ShadowR));
5507 setOriginForNaryOp(I);
5508 }
5509
5510 /// Handle intrinsics by applying the intrinsic to the shadows.
5511 ///
5512 /// The trailing arguments are passed verbatim to the intrinsic, though any
5513 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5514 /// intrinsic with one trailing verbatim argument:
5515 /// out = intrinsic(var1, var2, opType)
5516 /// we compute:
5517 /// shadow[out] =
5518 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5519 ///
5520 /// Typically, shadowIntrinsicID will be specified by the caller to be
5521 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5522 /// intrinsic of the same type.
5523 ///
5524 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5525 /// bit-patterns (for example, if the intrinsic accepts floats for
5526 /// var1, we require that it doesn't care if inputs are NaNs).
5527 ///
5528 /// For example, this can be applied to the Arm NEON vector table intrinsics
5529 /// (tbl{1,2,3,4}).
5530 ///
5531 /// The origin is approximated using setOriginForNaryOp.
5532 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5533 Intrinsic::ID shadowIntrinsicID,
5534 unsigned int trailingVerbatimArgs) {
5535 IRBuilder<> IRB(&I);
5536
5537 assert(trailingVerbatimArgs < I.arg_size());
5538
5539 SmallVector<Value *, 8> ShadowArgs;
5540 // Don't use getNumOperands() because it includes the callee
5541 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5542 Value *Shadow = getShadow(I: &I, i);
5543
5544 // Shadows are integer-ish types but some intrinsics require a
5545 // different (e.g., floating-point) type.
5546 ShadowArgs.push_back(
5547 Elt: IRB.CreateBitCast(V: Shadow, DestTy: I.getArgOperand(i)->getType()));
5548 }
5549
5550 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5551 i++) {
5552 Value *Arg = I.getArgOperand(i);
5553 ShadowArgs.push_back(Elt: Arg);
5554 }
5555
5556 CallInst *CI =
5557 IRB.CreateIntrinsic(RetTy: I.getType(), ID: shadowIntrinsicID, Args: ShadowArgs);
5558 Value *CombinedShadow = CI;
5559
5560 // Combine the computed shadow with the shadow of trailing args
5561 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5562 i++) {
5563 Value *Shadow =
5564 CreateShadowCast(IRB, V: getShadow(I: &I, i), dstTy: CombinedShadow->getType());
5565 CombinedShadow = IRB.CreateOr(LHS: Shadow, RHS: CombinedShadow, Name: "_msprop");
5566 }
5567
5568 setShadow(V: &I, SV: IRB.CreateBitCast(V: CombinedShadow, DestTy: getShadowTy(V: &I)));
5569
5570 setOriginForNaryOp(I);
5571 }
5572
5573 // Approximation only
5574 //
5575 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5576 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5577 assert(I.arg_size() == 2);
5578
5579 handleShadowOr(I);
5580 }
5581
5582 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5583 switch (I.getIntrinsicID()) {
5584 case Intrinsic::uadd_with_overflow:
5585 case Intrinsic::sadd_with_overflow:
5586 case Intrinsic::usub_with_overflow:
5587 case Intrinsic::ssub_with_overflow:
5588 case Intrinsic::umul_with_overflow:
5589 case Intrinsic::smul_with_overflow:
5590 handleArithmeticWithOverflow(I);
5591 break;
5592 case Intrinsic::abs:
5593 handleAbsIntrinsic(I);
5594 break;
5595 case Intrinsic::bitreverse:
5596 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
5597 /*trailingVerbatimArgs*/ 0);
5598 break;
5599 case Intrinsic::is_fpclass:
5600 handleIsFpClass(I);
5601 break;
5602 case Intrinsic::lifetime_start:
5603 handleLifetimeStart(I);
5604 break;
5605 case Intrinsic::launder_invariant_group:
5606 case Intrinsic::strip_invariant_group:
5607 handleInvariantGroup(I);
5608 break;
5609 case Intrinsic::bswap:
5610 handleBswap(I);
5611 break;
5612 case Intrinsic::ctlz:
5613 case Intrinsic::cttz:
5614 handleCountLeadingTrailingZeros(I);
5615 break;
5616 case Intrinsic::masked_compressstore:
5617 handleMaskedCompressStore(I);
5618 break;
5619 case Intrinsic::masked_expandload:
5620 handleMaskedExpandLoad(I);
5621 break;
5622 case Intrinsic::masked_gather:
5623 handleMaskedGather(I);
5624 break;
5625 case Intrinsic::masked_scatter:
5626 handleMaskedScatter(I);
5627 break;
5628 case Intrinsic::masked_store:
5629 handleMaskedStore(I);
5630 break;
5631 case Intrinsic::masked_load:
5632 handleMaskedLoad(I);
5633 break;
5634 case Intrinsic::vector_reduce_and:
5635 handleVectorReduceAndIntrinsic(I);
5636 break;
5637 case Intrinsic::vector_reduce_or:
5638 handleVectorReduceOrIntrinsic(I);
5639 break;
5640
5641 case Intrinsic::vector_reduce_add:
5642 case Intrinsic::vector_reduce_xor:
5643 case Intrinsic::vector_reduce_mul:
5644 // Signed/Unsigned Min/Max
5645 // TODO: handling similarly to AND/OR may be more precise.
5646 case Intrinsic::vector_reduce_smax:
5647 case Intrinsic::vector_reduce_smin:
5648 case Intrinsic::vector_reduce_umax:
5649 case Intrinsic::vector_reduce_umin:
5650 // TODO: this has no false positives, but arguably we should check that all
5651 // the bits are initialized.
5652 case Intrinsic::vector_reduce_fmax:
5653 case Intrinsic::vector_reduce_fmin:
5654 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5655 break;
5656
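// These reductions take an explicit start value as their first operand, e.g.,
//   float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %v)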
5657 case Intrinsic::vector_reduce_fadd:
5658 case Intrinsic::vector_reduce_fmul:
5659 handleVectorReduceWithStarterIntrinsic(I);
5660 break;
5661
5662 case Intrinsic::scmp:
5663 case Intrinsic::ucmp: {
5664 handleShadowOr(I);
5665 break;
5666 }
5667
5668 case Intrinsic::fshl:
5669 case Intrinsic::fshr:
5670 handleFunnelShift(I);
5671 break;
5672
5673 case Intrinsic::is_constant:
5674 // The result of llvm.is.constant() is always defined.
5675 setShadow(V: &I, SV: getCleanShadow(V: &I));
5676 setOrigin(V: &I, Origin: getCleanOrigin());
5677 break;
5678
5679 default:
5680 return false;
5681 }
5682
5683 return true;
5684 }
5685
5686 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5687 switch (I.getIntrinsicID()) {
5688 case Intrinsic::x86_sse_stmxcsr:
5689 handleStmxcsr(I);
5690 break;
5691 case Intrinsic::x86_sse_ldmxcsr:
5692 handleLdmxcsr(I);
5693 break;
5694
5695 // Convert Scalar Double Precision Floating-Point Value
5696 // to Unsigned Doubleword Integer
5697 // etc.
5698 case Intrinsic::x86_avx512_vcvtsd2usi64:
5699 case Intrinsic::x86_avx512_vcvtsd2usi32:
5700 case Intrinsic::x86_avx512_vcvtss2usi64:
5701 case Intrinsic::x86_avx512_vcvtss2usi32:
5702 case Intrinsic::x86_avx512_cvttss2usi64:
5703 case Intrinsic::x86_avx512_cvttss2usi:
5704 case Intrinsic::x86_avx512_cvttsd2usi64:
5705 case Intrinsic::x86_avx512_cvttsd2usi:
5706 case Intrinsic::x86_avx512_cvtusi2ss:
5707 case Intrinsic::x86_avx512_cvtusi642sd:
5708 case Intrinsic::x86_avx512_cvtusi642ss:
5709 handleSSEVectorConvertIntrinsic(I, NumUsedElements: 1, HasRoundingMode: true);
5710 break;
5711 case Intrinsic::x86_sse2_cvtsd2si64:
5712 case Intrinsic::x86_sse2_cvtsd2si:
5713 case Intrinsic::x86_sse2_cvtsd2ss:
5714 case Intrinsic::x86_sse2_cvttsd2si64:
5715 case Intrinsic::x86_sse2_cvttsd2si:
5716 case Intrinsic::x86_sse_cvtss2si64:
5717 case Intrinsic::x86_sse_cvtss2si:
5718 case Intrinsic::x86_sse_cvttss2si64:
5719 case Intrinsic::x86_sse_cvttss2si:
5720 handleSSEVectorConvertIntrinsic(I, NumUsedElements: 1);
5721 break;
5722 case Intrinsic::x86_sse_cvtps2pi:
5723 case Intrinsic::x86_sse_cvttps2pi:
5724 handleSSEVectorConvertIntrinsic(I, NumUsedElements: 2);
5725 break;
5726
5727 // TODO:
5728 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5729 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5730 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5731
5732 case Intrinsic::x86_vcvtps2ph_128:
5733 case Intrinsic::x86_vcvtps2ph_256: {
5734 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5735 break;
5736 }
5737
5738 // Convert Packed Single Precision Floating-Point Values
5739 // to Packed Signed Doubleword Integer Values
5740 //
5741 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5742 // (<16 x float>, <16 x i32>, i16, i32)
5743 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5744 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5745 break;
5746
5747 // Convert Packed Double Precision Floating-Point Values
5748 // to Packed Single Precision Floating-Point Values
5749 case Intrinsic::x86_sse2_cvtpd2ps:
5750 case Intrinsic::x86_sse2_cvtps2dq:
5751 case Intrinsic::x86_sse2_cvtpd2dq:
5752 case Intrinsic::x86_sse2_cvttps2dq:
5753 case Intrinsic::x86_sse2_cvttpd2dq:
5754 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5755 case Intrinsic::x86_avx_cvt_ps2dq_256:
5756 case Intrinsic::x86_avx_cvt_pd2dq_256:
5757 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5758 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5759 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5760 break;
5761 }
5762
5763 // Convert Single-Precision FP Value to 16-bit FP Value
5764 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5765 // (<16 x float>, i32, <16 x i16>, i16)
5766 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5767 // (<4 x float>, i32, <8 x i16>, i8)
5768 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5769 // (<8 x float>, i32, <8 x i16>, i8)
5770 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5771 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5772 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5773 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5774 break;
5775
5776 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5777 case Intrinsic::x86_avx512_psll_w_512:
5778 case Intrinsic::x86_avx512_psll_d_512:
5779 case Intrinsic::x86_avx512_psll_q_512:
5780 case Intrinsic::x86_avx512_pslli_w_512:
5781 case Intrinsic::x86_avx512_pslli_d_512:
5782 case Intrinsic::x86_avx512_pslli_q_512:
5783 case Intrinsic::x86_avx512_psrl_w_512:
5784 case Intrinsic::x86_avx512_psrl_d_512:
5785 case Intrinsic::x86_avx512_psrl_q_512:
5786 case Intrinsic::x86_avx512_psra_w_512:
5787 case Intrinsic::x86_avx512_psra_d_512:
5788 case Intrinsic::x86_avx512_psra_q_512:
5789 case Intrinsic::x86_avx512_psrli_w_512:
5790 case Intrinsic::x86_avx512_psrli_d_512:
5791 case Intrinsic::x86_avx512_psrli_q_512:
5792 case Intrinsic::x86_avx512_psrai_w_512:
5793 case Intrinsic::x86_avx512_psrai_d_512:
5794 case Intrinsic::x86_avx512_psrai_q_512:
5795 case Intrinsic::x86_avx512_psra_q_256:
5796 case Intrinsic::x86_avx512_psra_q_128:
5797 case Intrinsic::x86_avx512_psrai_q_256:
5798 case Intrinsic::x86_avx512_psrai_q_128:
5799 case Intrinsic::x86_avx2_psll_w:
5800 case Intrinsic::x86_avx2_psll_d:
5801 case Intrinsic::x86_avx2_psll_q:
5802 case Intrinsic::x86_avx2_pslli_w:
5803 case Intrinsic::x86_avx2_pslli_d:
5804 case Intrinsic::x86_avx2_pslli_q:
5805 case Intrinsic::x86_avx2_psrl_w:
5806 case Intrinsic::x86_avx2_psrl_d:
5807 case Intrinsic::x86_avx2_psrl_q:
5808 case Intrinsic::x86_avx2_psra_w:
5809 case Intrinsic::x86_avx2_psra_d:
5810 case Intrinsic::x86_avx2_psrli_w:
5811 case Intrinsic::x86_avx2_psrli_d:
5812 case Intrinsic::x86_avx2_psrli_q:
5813 case Intrinsic::x86_avx2_psrai_w:
5814 case Intrinsic::x86_avx2_psrai_d:
5815 case Intrinsic::x86_sse2_psll_w:
5816 case Intrinsic::x86_sse2_psll_d:
5817 case Intrinsic::x86_sse2_psll_q:
5818 case Intrinsic::x86_sse2_pslli_w:
5819 case Intrinsic::x86_sse2_pslli_d:
5820 case Intrinsic::x86_sse2_pslli_q:
5821 case Intrinsic::x86_sse2_psrl_w:
5822 case Intrinsic::x86_sse2_psrl_d:
5823 case Intrinsic::x86_sse2_psrl_q:
5824 case Intrinsic::x86_sse2_psra_w:
5825 case Intrinsic::x86_sse2_psra_d:
5826 case Intrinsic::x86_sse2_psrli_w:
5827 case Intrinsic::x86_sse2_psrli_d:
5828 case Intrinsic::x86_sse2_psrli_q:
5829 case Intrinsic::x86_sse2_psrai_w:
5830 case Intrinsic::x86_sse2_psrai_d:
5831 case Intrinsic::x86_mmx_psll_w:
5832 case Intrinsic::x86_mmx_psll_d:
5833 case Intrinsic::x86_mmx_psll_q:
5834 case Intrinsic::x86_mmx_pslli_w:
5835 case Intrinsic::x86_mmx_pslli_d:
5836 case Intrinsic::x86_mmx_pslli_q:
5837 case Intrinsic::x86_mmx_psrl_w:
5838 case Intrinsic::x86_mmx_psrl_d:
5839 case Intrinsic::x86_mmx_psrl_q:
5840 case Intrinsic::x86_mmx_psra_w:
5841 case Intrinsic::x86_mmx_psra_d:
5842 case Intrinsic::x86_mmx_psrli_w:
5843 case Intrinsic::x86_mmx_psrli_d:
5844 case Intrinsic::x86_mmx_psrli_q:
5845 case Intrinsic::x86_mmx_psrai_w:
5846 case Intrinsic::x86_mmx_psrai_d:
5847 handleVectorShiftIntrinsic(I, /* Variable */ false);
5848 break;
5849 case Intrinsic::x86_avx2_psllv_d:
5850 case Intrinsic::x86_avx2_psllv_d_256:
5851 case Intrinsic::x86_avx512_psllv_d_512:
5852 case Intrinsic::x86_avx2_psllv_q:
5853 case Intrinsic::x86_avx2_psllv_q_256:
5854 case Intrinsic::x86_avx512_psllv_q_512:
5855 case Intrinsic::x86_avx2_psrlv_d:
5856 case Intrinsic::x86_avx2_psrlv_d_256:
5857 case Intrinsic::x86_avx512_psrlv_d_512:
5858 case Intrinsic::x86_avx2_psrlv_q:
5859 case Intrinsic::x86_avx2_psrlv_q_256:
5860 case Intrinsic::x86_avx512_psrlv_q_512:
5861 case Intrinsic::x86_avx2_psrav_d:
5862 case Intrinsic::x86_avx2_psrav_d_256:
5863 case Intrinsic::x86_avx512_psrav_d_512:
5864 case Intrinsic::x86_avx512_psrav_q_128:
5865 case Intrinsic::x86_avx512_psrav_q_256:
5866 case Intrinsic::x86_avx512_psrav_q_512:
5867 handleVectorShiftIntrinsic(I, /* Variable */ true);
5868 break;
5869
5870 // Pack with Signed/Unsigned Saturation
5871 case Intrinsic::x86_sse2_packsswb_128:
5872 case Intrinsic::x86_sse2_packssdw_128:
5873 case Intrinsic::x86_sse2_packuswb_128:
5874 case Intrinsic::x86_sse41_packusdw:
5875 case Intrinsic::x86_avx2_packsswb:
5876 case Intrinsic::x86_avx2_packssdw:
5877 case Intrinsic::x86_avx2_packuswb:
5878 case Intrinsic::x86_avx2_packusdw:
5879 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5880 // (<32 x i16> %a, <32 x i16> %b)
5881 // <32 x i16> @llvm.x86.avx512.packssdw.512
5882 // (<16 x i32> %a, <16 x i32> %b)
5883 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5884 case Intrinsic::x86_avx512_packsswb_512:
5885 case Intrinsic::x86_avx512_packssdw_512:
5886 case Intrinsic::x86_avx512_packuswb_512:
5887 case Intrinsic::x86_avx512_packusdw_512:
5888 handleVectorPackIntrinsic(I);
5889 break;
5890
5891 case Intrinsic::x86_sse41_pblendvb:
5892 case Intrinsic::x86_sse41_blendvpd:
5893 case Intrinsic::x86_sse41_blendvps:
5894 case Intrinsic::x86_avx_blendv_pd_256:
5895 case Intrinsic::x86_avx_blendv_ps_256:
5896 case Intrinsic::x86_avx2_pblendvb:
5897 handleBlendvIntrinsic(I);
5898 break;
5899
5900 case Intrinsic::x86_avx_dp_ps_256:
5901 case Intrinsic::x86_sse41_dppd:
5902 case Intrinsic::x86_sse41_dpps:
5903 handleDppIntrinsic(I);
5904 break;
5905
5906 case Intrinsic::x86_mmx_packsswb:
5907 case Intrinsic::x86_mmx_packuswb:
5908 handleVectorPackIntrinsic(I, MMXEltSizeInBits: 16);
5909 break;
5910
5911 case Intrinsic::x86_mmx_packssdw:
5912 handleVectorPackIntrinsic(I, MMXEltSizeInBits: 32);
5913 break;
5914
5915 case Intrinsic::x86_mmx_psad_bw:
5916 handleVectorSadIntrinsic(I, IsMMX: true);
5917 break;
5918 case Intrinsic::x86_sse2_psad_bw:
5919 case Intrinsic::x86_avx2_psad_bw:
5920 handleVectorSadIntrinsic(I);
5921 break;
5922
5923 // Multiply and Add Packed Words
5924 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5925 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5926 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5927 //
5928 // Multiply and Add Packed Signed and Unsigned Bytes
5929 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5930 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5931 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5932 //
5933 // These intrinsics are auto-upgraded into non-masked forms:
5934 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5935 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5936 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5937 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5938 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5939 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5940 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5941 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5942 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5943 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5944 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5945 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
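// Shadow handling sketch (following the parameter names below): every output
// i32 reduces ReductionFactor=2 adjacent products, and ZeroPurifies means a
// product whose fully-initialized multiplicand is zero counts as initialized,
// e.g., a clean a[0] == 0 keeps a[0]*b[0] clean even if b[0] is poisoned.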
5946 case Intrinsic::x86_sse2_pmadd_wd:
5947 case Intrinsic::x86_avx2_pmadd_wd:
5948 case Intrinsic::x86_avx512_pmaddw_d_512:
5949 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5950 case Intrinsic::x86_avx2_pmadd_ub_sw:
5951 case Intrinsic::x86_avx512_pmaddubs_w_512:
5952 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5953 /*ZeroPurifies=*/true,
5954 /*EltSizeInBits=*/0,
5955 /*Lanes=*/kBothLanes);
5956 break;
5957
5958 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5959 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5960 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5961 /*ZeroPurifies=*/true,
5962 /*EltSizeInBits=*/8,
5963 /*Lanes=*/kBothLanes);
5964 break;
5965
5966 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5967 case Intrinsic::x86_mmx_pmadd_wd:
5968 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5969 /*ZeroPurifies=*/true,
5970 /*EltSizeInBits=*/16,
5971 /*Lanes=*/kBothLanes);
5972 break;
5973
5974 // BFloat16 multiply-add to single-precision
5975 // <4 x float> llvm.aarch64.neon.bfmlalt
5976 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
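// bfmlalt widens the odd-numbered ("top") bfloat elements of each source,
// bfmlalb the even-numbered ("bottom") ones; hence kOddLanes/kEvenLanes.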
5977 case Intrinsic::aarch64_neon_bfmlalt:
5978 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5979 /*ZeroPurifies=*/false,
5980 /*EltSizeInBits=*/0,
5981 /*Lanes=*/kOddLanes);
5982 break;
5983
5984 // <4 x float> llvm.aarch64.neon.bfmlalb
5985 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5986 case Intrinsic::aarch64_neon_bfmlalb:
5987 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5988 /*ZeroPurifies=*/false,
5989 /*EltSizeInBits=*/0,
5990 /*Lanes=*/kEvenLanes);
5991 break;
5992
5993 // AVX Vector Neural Network Instructions: bytes
5994 //
5995 // Multiply and Add Signed Bytes
5996 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5997 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5998 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5999 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6000 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
6001 // (<16 x i32>, <64 x i8>, <64 x i8>)
6002 //
6003 // Multiply and Add Signed Bytes With Saturation
6004 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
6005 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6006 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
6007 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6008 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
6009 // (<16 x i32>, <64 x i8>, <64 x i8>)
6010 //
6011 // Multiply and Add Signed and Unsigned Bytes
6012 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
6013 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6014 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
6015 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6016 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
6017 // (<16 x i32>, <64 x i8>, <64 x i8>)
6018 //
6019 // Multiply and Add Signed and Unsigned Bytes With Saturation
6020 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
6021 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6022 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
6023 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6024 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
6025 // (<16 x i32>, <64 x i8>, <64 x i8>)
6026 //
6027 // Multiply and Add Unsigned and Signed Bytes
6028 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
6029 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6030 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
6031 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6032 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
6033 // (<16 x i32>, <64 x i8>, <64 x i8>)
6034 //
6035 // Multiply and Add Unsigned and Signed Bytes With Saturation
6036 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
6037 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6038 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
6039 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6040 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
6041 // (<16 x i32>, <64 x i8>, <64 x i8>)
6042 //
6043 // Multiply and Add Unsigned Bytes
6044 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
6045 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6046 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
6047 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6048 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
6049 // (<16 x i32>, <64 x i8>, <64 x i8>)
6050 //
6051 // Multiply and Add Unsigned Bytes With Saturation
6052 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
6053 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6054 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
6055 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6056 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
6057 // (<16 x i32>, <64 x i8>, <64 x i8>)
6058 //
6059 // These intrinsics are auto-upgraded into non-masked forms:
6060 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
6061 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6062 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
6063 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6064 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
6065 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6066 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6067 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6068 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6069 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6070 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6071 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6072 //
6073 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6074 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6075 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6076 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6077 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6078 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6079 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6080 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6081 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6082 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6083 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6084 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6085 case Intrinsic::x86_avx512_vpdpbusd_128:
6086 case Intrinsic::x86_avx512_vpdpbusd_256:
6087 case Intrinsic::x86_avx512_vpdpbusd_512:
6088 case Intrinsic::x86_avx512_vpdpbusds_128:
6089 case Intrinsic::x86_avx512_vpdpbusds_256:
6090 case Intrinsic::x86_avx512_vpdpbusds_512:
6091 case Intrinsic::x86_avx2_vpdpbssd_128:
6092 case Intrinsic::x86_avx2_vpdpbssd_256:
6093 case Intrinsic::x86_avx10_vpdpbssd_512:
6094 case Intrinsic::x86_avx2_vpdpbssds_128:
6095 case Intrinsic::x86_avx2_vpdpbssds_256:
6096 case Intrinsic::x86_avx10_vpdpbssds_512:
6097 case Intrinsic::x86_avx2_vpdpbsud_128:
6098 case Intrinsic::x86_avx2_vpdpbsud_256:
6099 case Intrinsic::x86_avx10_vpdpbsud_512:
6100 case Intrinsic::x86_avx2_vpdpbsuds_128:
6101 case Intrinsic::x86_avx2_vpdpbsuds_256:
6102 case Intrinsic::x86_avx10_vpdpbsuds_512:
6103 case Intrinsic::x86_avx2_vpdpbuud_128:
6104 case Intrinsic::x86_avx2_vpdpbuud_256:
6105 case Intrinsic::x86_avx10_vpdpbuud_512:
6106 case Intrinsic::x86_avx2_vpdpbuuds_128:
6107 case Intrinsic::x86_avx2_vpdpbuuds_256:
6108 case Intrinsic::x86_avx10_vpdpbuuds_512:
6109 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6110 /*ZeroPurifies=*/true,
6111 /*EltSizeInBits=*/0,
6112 /*Lanes=*/kBothLanes);
6113 break;
6114
6115 // AVX Vector Neural Network Instructions: words
6116 //
6117 // Multiply and Add Signed Word Integers
6118 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6119 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6120 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6121 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6122 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6123 // (<16 x i32>, <32 x i16>, <32 x i16>)
6124 //
6125 // Multiply and Add Signed Word Integers With Saturation
6126 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6127 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6128 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6129 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6130 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6131 // (<16 x i32>, <32 x i16>, <32 x i16>)
6132 //
6133 // Multiply and Add Signed and Unsigned Word Integers
6134 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6135 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6136 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6137 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6138 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6139 // (<16 x i32>, <32 x i16>, <32 x i16>)
6140 //
6141 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6142 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6143 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6144 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6145 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6146 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6147 // (<16 x i32>, <32 x i16>, <32 x i16>)
6148 //
6149 // Multiply and Add Unsigned and Signed Word Integers
6150 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6151 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6152 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6153 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6154 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6155 // (<16 x i32>, <32 x i16>, <32 x i16>)
6156 //
6157 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6158 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6159 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6160 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6161 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6162 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6163 // (<16 x i32>, <32 x i16>, <32 x i16>)
6164 //
6165 // Multiply and Add Unsigned and Unsigned Word Integers
6166 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6167 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6168 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6169 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6170 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6171 // (<16 x i32>, <32 x i16>, <32 x i16>)
6172 //
6173 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6174 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6175 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6176 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6177 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6178 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6179 // (<16 x i32>, <32 x i16>, <32 x i16>)
6180 //
6181 // These intrinsics are auto-upgraded into non-masked forms:
6182 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6183 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6184 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6185 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6186 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6187 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6188 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6189 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6190 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6191 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6192 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6193 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6194 //
6195 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6196 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6197 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6198 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6199 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6200 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6201 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6202 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6203 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6204 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6205 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6206 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6207 case Intrinsic::x86_avx512_vpdpwssd_128:
6208 case Intrinsic::x86_avx512_vpdpwssd_256:
6209 case Intrinsic::x86_avx512_vpdpwssd_512:
6210 case Intrinsic::x86_avx512_vpdpwssds_128:
6211 case Intrinsic::x86_avx512_vpdpwssds_256:
6212 case Intrinsic::x86_avx512_vpdpwssds_512:
6213 case Intrinsic::x86_avx2_vpdpwsud_128:
6214 case Intrinsic::x86_avx2_vpdpwsud_256:
6215 case Intrinsic::x86_avx10_vpdpwsud_512:
6216 case Intrinsic::x86_avx2_vpdpwsuds_128:
6217 case Intrinsic::x86_avx2_vpdpwsuds_256:
6218 case Intrinsic::x86_avx10_vpdpwsuds_512:
6219 case Intrinsic::x86_avx2_vpdpwusd_128:
6220 case Intrinsic::x86_avx2_vpdpwusd_256:
6221 case Intrinsic::x86_avx10_vpdpwusd_512:
6222 case Intrinsic::x86_avx2_vpdpwusds_128:
6223 case Intrinsic::x86_avx2_vpdpwusds_256:
6224 case Intrinsic::x86_avx10_vpdpwusds_512:
6225 case Intrinsic::x86_avx2_vpdpwuud_128:
6226 case Intrinsic::x86_avx2_vpdpwuud_256:
6227 case Intrinsic::x86_avx10_vpdpwuud_512:
6228 case Intrinsic::x86_avx2_vpdpwuuds_128:
6229 case Intrinsic::x86_avx2_vpdpwuuds_256:
6230 case Intrinsic::x86_avx10_vpdpwuuds_512:
6231 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6232 /*ZeroPurifies=*/true,
6233 /*EltSizeInBits=*/0,
6234 /*Lanes=*/kBothLanes);
6235 break;
6236
6237 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6238 // Precision
6239 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6240 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6241 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6242 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6243 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6244 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6245 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6246 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6247 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6248 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6249 /*ZeroPurifies=*/false,
6250 /*EltSizeInBits=*/0,
6251 /*Lanes=*/kBothLanes);
6252 break;
6253
6254 case Intrinsic::x86_sse_cmp_ss:
6255 case Intrinsic::x86_sse2_cmp_sd:
6256 case Intrinsic::x86_sse_comieq_ss:
6257 case Intrinsic::x86_sse_comilt_ss:
6258 case Intrinsic::x86_sse_comile_ss:
6259 case Intrinsic::x86_sse_comigt_ss:
6260 case Intrinsic::x86_sse_comige_ss:
6261 case Intrinsic::x86_sse_comineq_ss:
6262 case Intrinsic::x86_sse_ucomieq_ss:
6263 case Intrinsic::x86_sse_ucomilt_ss:
6264 case Intrinsic::x86_sse_ucomile_ss:
6265 case Intrinsic::x86_sse_ucomigt_ss:
6266 case Intrinsic::x86_sse_ucomige_ss:
6267 case Intrinsic::x86_sse_ucomineq_ss:
6268 case Intrinsic::x86_sse2_comieq_sd:
6269 case Intrinsic::x86_sse2_comilt_sd:
6270 case Intrinsic::x86_sse2_comile_sd:
6271 case Intrinsic::x86_sse2_comigt_sd:
6272 case Intrinsic::x86_sse2_comige_sd:
6273 case Intrinsic::x86_sse2_comineq_sd:
6274 case Intrinsic::x86_sse2_ucomieq_sd:
6275 case Intrinsic::x86_sse2_ucomilt_sd:
6276 case Intrinsic::x86_sse2_ucomile_sd:
6277 case Intrinsic::x86_sse2_ucomigt_sd:
6278 case Intrinsic::x86_sse2_ucomige_sd:
6279 case Intrinsic::x86_sse2_ucomineq_sd:
6280 handleVectorCompareScalarIntrinsic(I);
6281 break;
6282
6283 case Intrinsic::x86_avx_cmp_pd_256:
6284 case Intrinsic::x86_avx_cmp_ps_256:
6285 case Intrinsic::x86_sse2_cmp_pd:
6286 case Intrinsic::x86_sse_cmp_ps:
6287 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6288 break;
6289
6290 case Intrinsic::x86_bmi_bextr_32:
6291 case Intrinsic::x86_bmi_bextr_64:
6292 case Intrinsic::x86_bmi_bzhi_32:
6293 case Intrinsic::x86_bmi_bzhi_64:
6294 case Intrinsic::x86_bmi_pdep_32:
6295 case Intrinsic::x86_bmi_pdep_64:
6296 case Intrinsic::x86_bmi_pext_32:
6297 case Intrinsic::x86_bmi_pext_64:
6298 handleBmiIntrinsic(I);
6299 break;
6300
6301 case Intrinsic::x86_pclmulqdq:
6302 case Intrinsic::x86_pclmulqdq_256:
6303 case Intrinsic::x86_pclmulqdq_512:
6304 handlePclmulIntrinsic(I);
6305 break;
6306
6307 case Intrinsic::x86_avx_round_pd_256:
6308 case Intrinsic::x86_avx_round_ps_256:
6309 case Intrinsic::x86_sse41_round_pd:
6310 case Intrinsic::x86_sse41_round_ps:
6311 handleRoundPdPsIntrinsic(I);
6312 break;
6313
6314 case Intrinsic::x86_sse41_round_sd:
6315 case Intrinsic::x86_sse41_round_ss:
6316 handleUnarySdSsIntrinsic(I);
6317 break;
6318
6319 case Intrinsic::x86_sse2_max_sd:
6320 case Intrinsic::x86_sse_max_ss:
6321 case Intrinsic::x86_sse2_min_sd:
6322 case Intrinsic::x86_sse_min_ss:
6323 handleBinarySdSsIntrinsic(I);
6324 break;
6325
6326 case Intrinsic::x86_avx_vtestc_pd:
6327 case Intrinsic::x86_avx_vtestc_pd_256:
6328 case Intrinsic::x86_avx_vtestc_ps:
6329 case Intrinsic::x86_avx_vtestc_ps_256:
6330 case Intrinsic::x86_avx_vtestnzc_pd:
6331 case Intrinsic::x86_avx_vtestnzc_pd_256:
6332 case Intrinsic::x86_avx_vtestnzc_ps:
6333 case Intrinsic::x86_avx_vtestnzc_ps_256:
6334 case Intrinsic::x86_avx_vtestz_pd:
6335 case Intrinsic::x86_avx_vtestz_pd_256:
6336 case Intrinsic::x86_avx_vtestz_ps:
6337 case Intrinsic::x86_avx_vtestz_ps_256:
6338 case Intrinsic::x86_avx_ptestc_256:
6339 case Intrinsic::x86_avx_ptestnzc_256:
6340 case Intrinsic::x86_avx_ptestz_256:
6341 case Intrinsic::x86_sse41_ptestc:
6342 case Intrinsic::x86_sse41_ptestnzc:
6343 case Intrinsic::x86_sse41_ptestz:
6344 handleVtestIntrinsic(I);
6345 break;
6346
6347 // Packed Horizontal Add/Subtract
6348 case Intrinsic::x86_ssse3_phadd_w:
6349 case Intrinsic::x86_ssse3_phadd_w_128:
6350 case Intrinsic::x86_ssse3_phsub_w:
6351 case Intrinsic::x86_ssse3_phsub_w_128:
6352 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6353 /*ReinterpretElemWidth=*/16);
6354 break;
6355
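// The 256-bit AVX2 forms operate as two independent 128-bit halves, hence
// Shards=2 for them.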
6356 case Intrinsic::x86_avx2_phadd_w:
6357 case Intrinsic::x86_avx2_phsub_w:
6358 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6359 /*ReinterpretElemWidth=*/16);
6360 break;
6361
6362 // Packed Horizontal Add/Subtract
6363 case Intrinsic::x86_ssse3_phadd_d:
6364 case Intrinsic::x86_ssse3_phadd_d_128:
6365 case Intrinsic::x86_ssse3_phsub_d:
6366 case Intrinsic::x86_ssse3_phsub_d_128:
6367 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6368 /*ReinterpretElemWidth=*/32);
6369 break;
6370
6371 case Intrinsic::x86_avx2_phadd_d:
6372 case Intrinsic::x86_avx2_phsub_d:
6373 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6374 /*ReinterpretElemWidth=*/32);
6375 break;
6376
6377 // Packed Horizontal Add/Subtract and Saturate
6378 case Intrinsic::x86_ssse3_phadd_sw:
6379 case Intrinsic::x86_ssse3_phadd_sw_128:
6380 case Intrinsic::x86_ssse3_phsub_sw:
6381 case Intrinsic::x86_ssse3_phsub_sw_128:
6382 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6383 /*ReinterpretElemWidth=*/16);
6384 break;
6385
6386 case Intrinsic::x86_avx2_phadd_sw:
6387 case Intrinsic::x86_avx2_phsub_sw:
6388 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6389 /*ReinterpretElemWidth=*/16);
6390 break;
6391
6392 // Packed Single/Double Precision Floating-Point Horizontal Add
6393 case Intrinsic::x86_sse3_hadd_ps:
6394 case Intrinsic::x86_sse3_hadd_pd:
6395 case Intrinsic::x86_sse3_hsub_ps:
6396 case Intrinsic::x86_sse3_hsub_pd:
6397 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6398 break;
6399
6400 case Intrinsic::x86_avx_hadd_pd_256:
6401 case Intrinsic::x86_avx_hadd_ps_256:
6402 case Intrinsic::x86_avx_hsub_pd_256:
6403 case Intrinsic::x86_avx_hsub_ps_256:
6404 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6405 break;
6406
6407 case Intrinsic::x86_avx_maskstore_ps:
6408 case Intrinsic::x86_avx_maskstore_pd:
6409 case Intrinsic::x86_avx_maskstore_ps_256:
6410 case Intrinsic::x86_avx_maskstore_pd_256:
6411 case Intrinsic::x86_avx2_maskstore_d:
6412 case Intrinsic::x86_avx2_maskstore_q:
6413 case Intrinsic::x86_avx2_maskstore_d_256:
6414 case Intrinsic::x86_avx2_maskstore_q_256: {
6415 handleAVXMaskedStore(I);
6416 break;
6417 }
6418
6419 case Intrinsic::x86_avx_maskload_ps:
6420 case Intrinsic::x86_avx_maskload_pd:
6421 case Intrinsic::x86_avx_maskload_ps_256:
6422 case Intrinsic::x86_avx_maskload_pd_256:
6423 case Intrinsic::x86_avx2_maskload_d:
6424 case Intrinsic::x86_avx2_maskload_q:
6425 case Intrinsic::x86_avx2_maskload_d_256:
6426 case Intrinsic::x86_avx2_maskload_q_256: {
6427 handleAVXMaskedLoad(I);
6428 break;
6429 }
6430
6431 // Packed floating-point arithmetic, minimum and maximum
6432 case Intrinsic::x86_avx512fp16_add_ph_512:
6433 case Intrinsic::x86_avx512fp16_sub_ph_512:
6434 case Intrinsic::x86_avx512fp16_mul_ph_512:
6435 case Intrinsic::x86_avx512fp16_div_ph_512:
6436 case Intrinsic::x86_avx512fp16_max_ph_512:
6437 case Intrinsic::x86_avx512fp16_min_ph_512:
6438 case Intrinsic::x86_avx512_min_ps_512:
6439 case Intrinsic::x86_avx512_min_pd_512:
6440 case Intrinsic::x86_avx512_max_ps_512:
6441 case Intrinsic::x86_avx512_max_pd_512: {
6442 // These AVX512 variants contain the rounding mode as a trailing flag.
6443 // Earlier variants do not have a trailing flag and are already handled
6444 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6445 // maybeHandleUnknownIntrinsic.
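// e.g., <32 x half> @llvm.x86.avx512fp16.add.ph.512
//           (<32 x half>, <32 x half>, i32)
// where the trailing i32 encodes the rounding mode.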
6446 [[maybe_unused]] bool Success =
6447 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6448 assert(Success);
6449 break;
6450 }
6451
6452 case Intrinsic::x86_avx_vpermilvar_pd:
6453 case Intrinsic::x86_avx_vpermilvar_pd_256:
6454 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6455 case Intrinsic::x86_avx_vpermilvar_ps:
6456 case Intrinsic::x86_avx_vpermilvar_ps_256:
6457 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6458 handleAVXVpermilvar(I);
6459 break;
6460 }
6461
6462 case Intrinsic::x86_avx512_vpermi2var_d_128:
6463 case Intrinsic::x86_avx512_vpermi2var_d_256:
6464 case Intrinsic::x86_avx512_vpermi2var_d_512:
6465 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6466 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6467 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6468 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6469 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6470 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6471 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6472 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6473 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6474 case Intrinsic::x86_avx512_vpermi2var_q_128:
6475 case Intrinsic::x86_avx512_vpermi2var_q_256:
6476 case Intrinsic::x86_avx512_vpermi2var_q_512:
6477 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6478 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6479 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6480 handleAVXVpermi2var(I);
6481 break;
6482
6483 // Packed Shuffle
6484 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6485 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6486 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6487 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6488 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6489 //
6490 // The following intrinsics are auto-upgraded:
6491 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6492 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6493 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6494 case Intrinsic::x86_avx2_pshuf_b:
6495 case Intrinsic::x86_sse_pshuf_w:
6496 case Intrinsic::x86_ssse3_pshuf_b_128:
6497 case Intrinsic::x86_ssse3_pshuf_b:
6498 case Intrinsic::x86_avx512_pshuf_b_512:
6499 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6500 /*trailingVerbatimArgs=*/1);
6501 break;
6502
6503 // AVX512 PMOV: Packed MOV, with truncation
6504 // Precisely handled by applying the same intrinsic to the shadow
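// e.g., <16 x i16> @llvm.x86.avx512.mask.pmov.dw.512
//           (<16 x i32> %src, <16 x i16> %writethru, i16 %mask)
// where the mask is the trailing verbatim argument.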
6505 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6506 case Intrinsic::x86_avx512_mask_pmov_db_512:
6507 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6508 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6509 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6510 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6511 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6512 /*trailingVerbatimArgs=*/1);
6513 break;
6514 }
6515
6516 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6517 // Approximately handled using the corresponding truncation intrinsic
6518 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6519 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6520 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6521 handleIntrinsicByApplyingToShadow(I,
6522 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_dw_512,
6523 /* trailingVerbatimArgs=*/1);
6524 break;
6525 }
6526
6527 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6528 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6529 handleIntrinsicByApplyingToShadow(I,
6530 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_db_512,
6531 /* trailingVerbatimArgs=*/1);
6532 break;
6533 }
6534
6535 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6536 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6537 handleIntrinsicByApplyingToShadow(I,
6538 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_qb_512,
6539 /* trailingVerbatimArgs=*/1);
6540 break;
6541 }
6542
6543 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6544 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6545 handleIntrinsicByApplyingToShadow(I,
6546 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_qw_512,
6547 /* trailingVerbatimArgs=*/1);
6548 break;
6549 }
6550
6551 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6552 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6553 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6554 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6555 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6556 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6557 // slow-path handler.
6558 handleAVX512VectorDownConvert(I);
6559 break;
6560 }
6561
6562 // AVX512/AVX10 Reciprocal Square Root
6563 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6564 // (<16 x float>, <16 x float>, i16)
6565 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6566 // (<8 x float>, <8 x float>, i8)
6567 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6568 // (<4 x float>, <4 x float>, i8)
6569 //
6570 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6571 // (<8 x double>, <8 x double>, i8)
6572 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6573 // (<4 x double>, <4 x double>, i8)
6574 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6575 // (<2 x double>, <2 x double>, i8)
6576 //
6577 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6578 // (<32 x bfloat>, <32 x bfloat>, i32)
6579 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6580 // (<16 x bfloat>, <16 x bfloat>, i16)
6581 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6582 // (<8 x bfloat>, <8 x bfloat>, i8)
6583 //
6584 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6585 // (<32 x half>, <32 x half>, i32)
6586 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6587 // (<16 x half>, <16 x half>, i16)
6588 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6589 // (<8 x half>, <8 x half>, i8)
6590 //
6591 // TODO: 3-operand variants are not handled:
6592 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6593 // (<2 x double>, <2 x double>, <2 x double>, i8)
6594 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6595 // (<4 x float>, <4 x float>, <4 x float>, i8)
6596 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6597 // (<8 x half>, <8 x half>, <8 x half>, i8)
6598 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6599 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6600 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6601 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6602 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6603 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6604 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6605 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6606 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6607 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6608 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6609 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6610 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6611 /*MaskIndex=*/2);
6612 break;
6613
6614 // AVX512/AVX10 Reciprocal
6615 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6616 // (<16 x float>, <16 x float>, i16)
6617 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6618 // (<8 x float>, <8 x float>, i8)
6619 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6620 // (<4 x float>, <4 x float>, i8)
6621 //
6622 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6623 // (<8 x double>, <8 x double>, i8)
6624 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6625 // (<4 x double>, <4 x double>, i8)
6626 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6627 // (<2 x double>, <2 x double>, i8)
6628 //
6629 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6630 // (<32 x bfloat>, <32 x bfloat>, i32)
6631 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6632 // (<16 x bfloat>, <16 x bfloat>, i16)
6633 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6634 // (<8 x bfloat>, <8 x bfloat>, i8)
6635 //
6636 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6637 // (<32 x half>, <32 x half>, i32)
6638 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6639 // (<16 x half>, <16 x half>, i16)
6640 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6641 // (<8 x half>, <8 x half>, i8)
6642 //
6643 // TODO: 3-operand variants are not handled:
6644 // <2 x double> @llvm.x86.avx512.rcp14.sd
6645 // (<2 x double>, <2 x double>, <2 x double>, i8)
6646 // <4 x float> @llvm.x86.avx512.rcp14.ss
6647 // (<4 x float>, <4 x float>, <4 x float>, i8)
6648 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6649 // (<8 x half>, <8 x half>, <8 x half>, i8)
6650 case Intrinsic::x86_avx512_rcp14_ps_512:
6651 case Intrinsic::x86_avx512_rcp14_ps_256:
6652 case Intrinsic::x86_avx512_rcp14_ps_128:
6653 case Intrinsic::x86_avx512_rcp14_pd_512:
6654 case Intrinsic::x86_avx512_rcp14_pd_256:
6655 case Intrinsic::x86_avx512_rcp14_pd_128:
6656 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6657 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6658 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6659 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6660 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6661 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6662 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6663 /*MaskIndex=*/2);
6664 break;
6665
6666 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6667 // (<32 x half>, i32, <32 x half>, i32, i32)
6668 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6669 // (<16 x half>, i32, <16 x half>, i32, i16)
6670 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6671 // (<8 x half>, i32, <8 x half>, i32, i8)
6672 //
6673 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6674 // (<16 x float>, i32, <16 x float>, i16, i32)
6675 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6676 // (<8 x float>, i32, <8 x float>, i8)
6677 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6678 // (<4 x float>, i32, <4 x float>, i8)
6679 //
6680 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6681 // (<8 x double>, i32, <8 x double>, i8, i32)
6682 // Operand order: A, Imm, WriteThru, Mask, Rounding
6683 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6684 // (<4 x double>, i32, <4 x double>, i8)
6685 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6686 // (<2 x double>, i32, <2 x double>, i8)
6687 // Operand order: A, Imm, WriteThru, Mask
6688 //
6689 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6690 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6691 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6692 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6693 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6694 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6695 //
6696 // Not supported: three vectors
6697 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6698 // (<8 x half>, <8 x half>, <8 x half>, i8, i32, i32)
6699 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6700 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6701 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6702 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6703 // i32)
6704 // Operand order: A, B, WriteThru, Mask, Imm, Rounding
6706 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6707 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6708 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6709 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6710 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6711 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6712 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6713 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6714 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6715 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6716 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6717 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6718 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6719 /*MaskIndex=*/3);
6720 break;
6721
6722 // AVX512 FP16 Arithmetic
6723 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6724 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6725 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6726 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6727 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6728 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6729 visitGenericScalarHalfwordInst(I);
6730 break;
6731 }
6732
6733 // AVX Galois Field New Instructions
6734 case Intrinsic::x86_vgf2p8affineqb_128:
6735 case Intrinsic::x86_vgf2p8affineqb_256:
6736 case Intrinsic::x86_vgf2p8affineqb_512:
6737 handleAVXGF2P8Affine(I);
6738 break;
6739
6740 default:
6741 return false;
6742 }
6743
6744 return true;
6745 }
6746
6747 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6748 switch (I.getIntrinsicID()) {
6749 // Two operands e.g.,
6750 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
6751 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
6752 case Intrinsic::aarch64_neon_rshrn:
6753 case Intrinsic::aarch64_neon_sqrshl:
6754 case Intrinsic::aarch64_neon_sqrshrn:
6755 case Intrinsic::aarch64_neon_sqrshrun:
6756 case Intrinsic::aarch64_neon_sqshl:
6757 case Intrinsic::aarch64_neon_sqshlu:
6758 case Intrinsic::aarch64_neon_sqshrn:
6759 case Intrinsic::aarch64_neon_sqshrun:
6760 case Intrinsic::aarch64_neon_srshl:
6761 case Intrinsic::aarch64_neon_sshl:
6762 case Intrinsic::aarch64_neon_uqrshl:
6763 case Intrinsic::aarch64_neon_uqrshrn:
6764 case Intrinsic::aarch64_neon_uqshl:
6765 case Intrinsic::aarch64_neon_uqshrn:
6766 case Intrinsic::aarch64_neon_urshl:
6767 case Intrinsic::aarch64_neon_ushl:
6768 handleVectorShiftIntrinsic(I, /* Variable */ false);
6769 break;
6770
6771 // Vector Shift Left/Right and Insert
6772 //
6773 // Three operands e.g.,
6774 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
6775 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
6776 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
6777 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
6778 //
6779 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
6780 // (instead of zero-extending/sign-extending).
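// Because %n is a trailing verbatim argument, the shadow is computed roughly
// as the same shift-and-insert applied to the operand shadows:
//   shadow[out] = vsli/vsri(shadow[%a], shadow[%b], %n)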
6781 case Intrinsic::aarch64_neon_vsli:
6782 case Intrinsic::aarch64_neon_vsri:
6783 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6784 /*trailingVerbatimArgs=*/1);
6785 break;
6786
6787 // TODO: handling max/min similarly to AND/OR may be more precise
6788 // Floating-Point Maximum/Minimum Pairwise
6789 case Intrinsic::aarch64_neon_fmaxp:
6790 case Intrinsic::aarch64_neon_fminp:
6791 // Floating-Point Maximum/Minimum Number Pairwise
6792 case Intrinsic::aarch64_neon_fmaxnmp:
6793 case Intrinsic::aarch64_neon_fminnmp:
6794 // Signed/Unsigned Maximum/Minimum Pairwise
6795 case Intrinsic::aarch64_neon_smaxp:
6796 case Intrinsic::aarch64_neon_sminp:
6797 case Intrinsic::aarch64_neon_umaxp:
6798 case Intrinsic::aarch64_neon_uminp:
6799 // Add Pairwise
6800 case Intrinsic::aarch64_neon_addp:
6801 // Floating-point Add Pairwise
6802 case Intrinsic::aarch64_neon_faddp:
6803 // Add Long Pairwise
6804 case Intrinsic::aarch64_neon_saddlp:
6805 case Intrinsic::aarch64_neon_uaddlp: {
6806 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6807 break;
6808 }
6809
6810 // Floating-point Convert to integer, rounding to nearest with ties to Away
6811 case Intrinsic::aarch64_neon_fcvtas:
6812 case Intrinsic::aarch64_neon_fcvtau:
6813 // Floating-point convert to integer, rounding toward minus infinity
6814 case Intrinsic::aarch64_neon_fcvtms:
6815 case Intrinsic::aarch64_neon_fcvtmu:
6816 // Floating-point convert to integer, rounding to nearest with ties to even
6817 case Intrinsic::aarch64_neon_fcvtns:
6818 case Intrinsic::aarch64_neon_fcvtnu:
6819 // Floating-point convert to integer, rounding toward plus infinity
6820 case Intrinsic::aarch64_neon_fcvtps:
6821 case Intrinsic::aarch64_neon_fcvtpu:
6822 // Floating-point Convert to integer, rounding toward Zero
6823 case Intrinsic::aarch64_neon_fcvtzs:
6824 case Intrinsic::aarch64_neon_fcvtzu:
6825 // Floating-point convert to lower precision narrow, rounding to odd
6826 case Intrinsic::aarch64_neon_fcvtxn:
6827 // Vector Conversions Between Half-Precision and Single-Precision
6828 case Intrinsic::aarch64_neon_vcvthf2fp:
6829 case Intrinsic::aarch64_neon_vcvtfp2hf:
6830 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
6831 break;
6832
6833 // Vector Conversions Between Fixed-Point and Floating-Point
6834 case Intrinsic::aarch64_neon_vcvtfxs2fp:
6835 case Intrinsic::aarch64_neon_vcvtfp2fxs:
6836 case Intrinsic::aarch64_neon_vcvtfxu2fp:
6837 case Intrinsic::aarch64_neon_vcvtfp2fxu:
6838 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
6839 break;
6840
6841 // TODO: bfloat conversions
6842 // - bfloat @llvm.aarch64.neon.bfcvt(float)
6843 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
6844 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
6845
6846 // Add reduction to scalar
6847 case Intrinsic::aarch64_neon_faddv:
6848 case Intrinsic::aarch64_neon_saddv:
6849 case Intrinsic::aarch64_neon_uaddv:
6850 // Signed/Unsigned min/max (Vector)
6851 // TODO: handling similarly to AND/OR may be more precise.
6852 case Intrinsic::aarch64_neon_smaxv:
6853 case Intrinsic::aarch64_neon_sminv:
6854 case Intrinsic::aarch64_neon_umaxv:
6855 case Intrinsic::aarch64_neon_uminv:
6856 // Floating-point min/max (vector)
6857 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6858 // but our shadow propagation is the same.
6859 case Intrinsic::aarch64_neon_fmaxv:
6860 case Intrinsic::aarch64_neon_fminv:
6861 case Intrinsic::aarch64_neon_fmaxnmv:
6862 case Intrinsic::aarch64_neon_fminnmv:
6863 // Sum long across vector
6864 case Intrinsic::aarch64_neon_saddlv:
6865 case Intrinsic::aarch64_neon_uaddlv:
6866 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6867 break;
6868
6869 case Intrinsic::aarch64_neon_ld1x2:
6870 case Intrinsic::aarch64_neon_ld1x3:
6871 case Intrinsic::aarch64_neon_ld1x4:
6872 case Intrinsic::aarch64_neon_ld2:
6873 case Intrinsic::aarch64_neon_ld3:
6874 case Intrinsic::aarch64_neon_ld4:
6875 case Intrinsic::aarch64_neon_ld2r:
6876 case Intrinsic::aarch64_neon_ld3r:
6877 case Intrinsic::aarch64_neon_ld4r: {
6878 handleNEONVectorLoad(I, /*WithLane=*/false);
6879 break;
6880 }
6881
6882 case Intrinsic::aarch64_neon_ld2lane:
6883 case Intrinsic::aarch64_neon_ld3lane:
6884 case Intrinsic::aarch64_neon_ld4lane: {
6885 handleNEONVectorLoad(I, /*WithLane=*/true);
6886 break;
6887 }
6888
6889 // Saturating extract narrow
6890 case Intrinsic::aarch64_neon_sqxtn:
6891 case Intrinsic::aarch64_neon_sqxtun:
6892 case Intrinsic::aarch64_neon_uqxtn:
6893 // These only have one argument, but we (ab)use handleShadowOr because it
6894 // does work on single argument intrinsics and will typecast the shadow
6895 // (and update the origin).
6896 handleShadowOr(I);
6897 break;
6898
6899 case Intrinsic::aarch64_neon_st1x2:
6900 case Intrinsic::aarch64_neon_st1x3:
6901 case Intrinsic::aarch64_neon_st1x4:
6902 case Intrinsic::aarch64_neon_st2:
6903 case Intrinsic::aarch64_neon_st3:
6904 case Intrinsic::aarch64_neon_st4: {
6905 handleNEONVectorStoreIntrinsic(I, useLane: false);
6906 break;
6907 }
6908
6909 case Intrinsic::aarch64_neon_st2lane:
6910 case Intrinsic::aarch64_neon_st3lane:
6911 case Intrinsic::aarch64_neon_st4lane: {
6912 handleNEONVectorStoreIntrinsic(I, useLane: true);
6913 break;
6914 }
6915
6916 // Arm NEON vector table intrinsics have the source/table register(s) as
6917 // arguments, followed by the index register. They return the output.
6918 //
6919 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6920 // original value unchanged in the destination register.'
6921 // Conveniently, zero denotes a clean shadow, which means out-of-range
6922 // indices for TBL will initialize the user data with zero and also clean
6923 // the shadow. (For TBX, neither the user data nor the shadow will be
6924 // updated, which is also correct.)
6925 case Intrinsic::aarch64_neon_tbl1:
6926 case Intrinsic::aarch64_neon_tbl2:
6927 case Intrinsic::aarch64_neon_tbl3:
6928 case Intrinsic::aarch64_neon_tbl4:
6929 case Intrinsic::aarch64_neon_tbx1:
6930 case Intrinsic::aarch64_neon_tbx2:
6931 case Intrinsic::aarch64_neon_tbx3:
6932 case Intrinsic::aarch64_neon_tbx4: {
6933 // The last trailing argument (index register) should be handled verbatim
6934 handleIntrinsicByApplyingToShadow(
6935 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6936 /*trailingVerbatimArgs*/ 1);
6937 break;
6938 }
6939
6940 case Intrinsic::aarch64_neon_fmulx:
6941 case Intrinsic::aarch64_neon_pmul:
6942 case Intrinsic::aarch64_neon_pmull:
6943 case Intrinsic::aarch64_neon_smull:
6944 case Intrinsic::aarch64_neon_pmull64:
6945 case Intrinsic::aarch64_neon_umull: {
6946 handleNEONVectorMultiplyIntrinsic(I);
6947 break;
6948 }
6949
6950 case Intrinsic::aarch64_neon_smmla:
6951 case Intrinsic::aarch64_neon_ummla:
6952 case Intrinsic::aarch64_neon_usmmla:
6953 handleNEONMatrixMultiply(I, /*ARows=*/2, /*ACols=*/8, /*BRows=*/8,
6954 /*BCols=*/2);
6955 break;
6956
6957 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
6958 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
6959 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
6960 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
6961 case Intrinsic::aarch64_neon_sdot:
6962 case Intrinsic::aarch64_neon_udot:
6963 case Intrinsic::aarch64_neon_usdot:
6964 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6965 /*ZeroPurifies=*/true,
6966 /*EltSizeInBits=*/0,
6967 /*Lanes=*/kBothLanes);
6968 break;
6969
6970 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
6971 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
6972 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
6973 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
6974 case Intrinsic::aarch64_neon_bfdot:
6975 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6976 /*ZeroPurifies=*/false,
6977 /*EltSizeInBits=*/0,
6978 /*Lanes=*/kBothLanes);
6979 break;
6980
6981 default:
6982 return false;
6983 }
6984
6985 return true;
6986 }
6987
6988 void visitIntrinsicInst(IntrinsicInst &I) {
6989 if (maybeHandleCrossPlatformIntrinsic(I))
6990 return;
6991
6992 if (maybeHandleX86SIMDIntrinsic(I))
6993 return;
6994
6995 if (maybeHandleArmSIMDIntrinsic(I))
6996 return;
6997
6998 if (maybeHandleUnknownIntrinsic(I))
6999 return;
7000
7001 visitInstruction(I);
7002 }
7003
7004 void visitLibAtomicLoad(CallBase &CB) {
7005 // Since we use getNextNode here, we can't have CB terminate the BB.
7006 assert(isa<CallInst>(CB));
7007
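// The generic libatomic entry point is assumed here to have the signature
//   void __atomic_load(size_t size, void *src, void *dst, int ordering)
// i.e., arg 1 is the source object and arg 2 the destination buffer.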
7008 IRBuilder<> IRB(&CB);
7009 Value *Size = CB.getArgOperand(i: 0);
7010 Value *SrcPtr = CB.getArgOperand(i: 1);
7011 Value *DstPtr = CB.getArgOperand(i: 2);
7012 Value *Ordering = CB.getArgOperand(i: 3);
7013 // Convert the call to have at least Acquire ordering to make sure
7014 // the shadow operations aren't reordered before it.
7015 Value *NewOrdering =
7016 IRB.CreateExtractElement(Vec: makeAddAcquireOrderingTable(IRB), Idx: Ordering);
7017 CB.setArgOperand(i: 3, v: NewOrdering);
7018
7019 NextNodeIRBuilder NextIRB(&CB);
7020 Value *SrcShadowPtr, *SrcOriginPtr;
7021 std::tie(args&: SrcShadowPtr, args&: SrcOriginPtr) =
7022 getShadowOriginPtr(Addr: SrcPtr, IRB&: NextIRB, ShadowTy: NextIRB.getInt8Ty(), Alignment: Align(1),
7023 /*isStore*/ false);
7024 Value *DstShadowPtr =
7025 getShadowOriginPtr(Addr: DstPtr, IRB&: NextIRB, ShadowTy: NextIRB.getInt8Ty(), Alignment: Align(1),
7026 /*isStore*/ true)
7027 .first;
7028
7029 NextIRB.CreateMemCpy(Dst: DstShadowPtr, DstAlign: Align(1), Src: SrcShadowPtr, SrcAlign: Align(1), Size);
7030 if (MS.TrackOrigins) {
7031 Value *SrcOrigin = NextIRB.CreateAlignedLoad(Ty: MS.OriginTy, Ptr: SrcOriginPtr,
7032 Align: kMinOriginAlignment);
7033 Value *NewOrigin = updateOrigin(V: SrcOrigin, IRB&: NextIRB);
7034 NextIRB.CreateCall(Callee: MS.MsanSetOriginFn, Args: {DstPtr, Size, NewOrigin});
7035 }
7036 }
7037
7038 void visitLibAtomicStore(CallBase &CB) {
7039 IRBuilder<> IRB(&CB);
7040 Value *Size = CB.getArgOperand(i: 0);
7041 Value *DstPtr = CB.getArgOperand(i: 2);
7042 Value *Ordering = CB.getArgOperand(i: 3);
7043 // Convert the call to have at least Release ordering to make sure
7044 // the shadow operations aren't reordered after it.
7045 Value *NewOrdering =
7046 IRB.CreateExtractElement(Vec: makeAddReleaseOrderingTable(IRB), Idx: Ordering);
7047 CB.setArgOperand(i: 3, v: NewOrdering);
7048
7049 Value *DstShadowPtr =
7050 getShadowOriginPtr(Addr: DstPtr, IRB, ShadowTy: IRB.getInt8Ty(), Alignment: Align(1),
7051 /*isStore*/ true)
7052 .first;
7053
7054 // Atomic store always paints clean shadow/origin. See file header.
7055 IRB.CreateMemSet(Ptr: DstShadowPtr, Val: getCleanShadow(OrigTy: IRB.getInt8Ty()), Size,
7056 Align: Align(1));
7057 }
7058
7059 void visitCallBase(CallBase &CB) {
7060 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
7061 if (CB.isInlineAsm()) {
7062 // For inline asm (either a call to asm function, or callbr instruction),
7063 // do the usual thing: check argument shadow and mark all outputs as
7064 // clean. Note that any side effects of the inline asm that are not
7065 // immediately visible in its constraints are not handled.
7066 if (ClHandleAsmConservative)
7067 visitAsmInstruction(I&: CB);
7068 else
7069 visitInstruction(I&: CB);
7070 return;
7071 }
7072 LibFunc LF;
7073 if (TLI->getLibFunc(CB, F&: LF)) {
7074 // libatomic.a functions need to have special handling because there isn't
7075 // a good way to intercept them or compile the library with
7076 // instrumentation.
7077 switch (LF) {
7078 case LibFunc_atomic_load:
7079 if (!isa<CallInst>(Val: CB)) {
          llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
                          " Ignoring!\n";
7082 break;
7083 }
7084 visitLibAtomicLoad(CB);
7085 return;
7086 case LibFunc_atomic_store:
7087 visitLibAtomicStore(CB);
7088 return;
7089 default:
7090 break;
7091 }
7092 }
7093
7094 if (auto *Call = dyn_cast<CallInst>(Val: &CB)) {
7095 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7096
7097 // We are going to insert code that relies on the fact that the callee
7098 // will become a non-readonly function after it is instrumented by us. To
7099 // prevent this code from being optimized out, mark that function
7100 // non-readonly in advance.
7101 // TODO: We can likely do better than dropping memory() completely here.
7102 AttributeMask B;
7103 B.addAttribute(Val: Attribute::Memory).addAttribute(Val: Attribute::Speculatable);
7104
7105 Call->removeFnAttrs(AttrsToRemove: B);
7106 if (Function *Func = Call->getCalledFunction()) {
7107 Func->removeFnAttrs(Attrs: B);
7108 }
7109
7110 maybeMarkSanitizerLibraryCallNoBuiltin(CI: Call, TLI);
7111 }
7112 IRBuilder<> IRB(&CB);
7113 bool MayCheckCall = MS.EagerChecks;
7114 if (Function *Func = CB.getCalledFunction()) {
      // __sanitizer_unaligned_{load,store} functions may be called by users
      // and always expect shadows in the TLS. So don't check them.
7117 MayCheckCall &= !Func->getName().starts_with(Prefix: "__sanitizer_unaligned_");
7118 }
7119
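    // Argument shadows are laid out in the parameter TLS array at
    // kShadowTLSAlignment-aligned offsets, in argument order. For example
    // (illustrative, with the usual 8-byte kShadowTLSAlignment), for a call
    // f(i32 %a, double %b, ptr %c) the shadows of %a, %b and %c occupy
    // offsets 0, 8 and 16 respectively (each slot padded to the alignment).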
7120 unsigned ArgOffset = 0;
7121 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7122 for (const auto &[i, A] : llvm::enumerate(First: CB.args())) {
7123 if (!A->getType()->isSized()) {
7124 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7125 continue;
7126 }
7127
7128 if (A->getType()->isScalableTy()) {
7129 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7130 // Handle as noundef, but don't reserve tls slots.
7131 insertCheckShadowOf(Val: A, OrigIns: &CB);
7132 continue;
7133 }
7134
7135 unsigned Size = 0;
7136 const DataLayout &DL = F.getDataLayout();
7137
7138 bool ByVal = CB.paramHasAttr(ArgNo: i, Kind: Attribute::ByVal);
7139 bool NoUndef = CB.paramHasAttr(ArgNo: i, Kind: Attribute::NoUndef);
7140 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7141
7142 if (EagerCheck) {
7143 insertCheckShadowOf(Val: A, OrigIns: &CB);
7144 Size = DL.getTypeAllocSize(Ty: A->getType());
7145 } else {
7146 [[maybe_unused]] Value *Store = nullptr;
7147 // Compute the Shadow for arg even if it is ByVal, because
7148 // in that case getShadow() will copy the actual arg shadow to
7149 // __msan_param_tls.
7150 Value *ArgShadow = getShadow(V: A);
7151 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7152 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7153 << " Shadow: " << *ArgShadow << "\n");
7154 if (ByVal) {
          // ByVal requires some special handling, as it's too big for a
          // single load.
7157 assert(A->getType()->isPointerTy() &&
7158 "ByVal argument is not a pointer!");
7159 Size = DL.getTypeAllocSize(Ty: CB.getParamByValType(ArgNo: i));
7160 if (ArgOffset + Size > kParamTLSSize)
7161 break;
7162 const MaybeAlign ParamAlignment(CB.getParamAlign(ArgNo: i));
7163 MaybeAlign Alignment = std::nullopt;
7164 if (ParamAlignment)
7165 Alignment = std::min(a: *ParamAlignment, b: kShadowTLSAlignment);
7166 Value *AShadowPtr, *AOriginPtr;
7167 std::tie(args&: AShadowPtr, args&: AOriginPtr) =
7168 getShadowOriginPtr(Addr: A, IRB, ShadowTy: IRB.getInt8Ty(), Alignment,
7169 /*isStore*/ false);
7170 if (!PropagateShadow) {
7171 Store = IRB.CreateMemSet(Ptr: ArgShadowBase,
7172 Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
7173 Size, Align: Alignment);
7174 } else {
7175 Store = IRB.CreateMemCpy(Dst: ArgShadowBase, DstAlign: Alignment, Src: AShadowPtr,
7176 SrcAlign: Alignment, Size);
7177 if (MS.TrackOrigins) {
7178 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7179 // FIXME: OriginSize should be:
7180 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7181 unsigned OriginSize = alignTo(Size, A: kMinOriginAlignment);
7182 IRB.CreateMemCpy(
7183 Dst: ArgOriginBase,
7184 /* by origin_tls[ArgOffset] */ DstAlign: kMinOriginAlignment,
7185 Src: AOriginPtr,
7186 /* by getShadowOriginPtr */ SrcAlign: kMinOriginAlignment, Size: OriginSize);
7187 }
7188 }
7189 } else {
7190 // Any other parameters mean we need bit-grained tracking of uninit
7191 // data
7192 Size = DL.getTypeAllocSize(Ty: A->getType());
7193 if (ArgOffset + Size > kParamTLSSize)
7194 break;
7195 Store = IRB.CreateAlignedStore(Val: ArgShadow, Ptr: ArgShadowBase,
7196 Align: kShadowTLSAlignment);
7197 Constant *Cst = dyn_cast<Constant>(Val: ArgShadow);
7198 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7199 IRB.CreateStore(Val: getOrigin(V: A),
7200 Ptr: getOriginPtrForArgument(IRB, ArgOffset));
7201 }
7202 }
7203 assert(Store != nullptr);
7204 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7205 }
7206 assert(Size != 0);
7207 ArgOffset += alignTo(Size, A: kShadowTLSAlignment);
7208 }
7209 LLVM_DEBUG(dbgs() << " done with call args\n");
7210
7211 FunctionType *FT = CB.getFunctionType();
7212 if (FT->isVarArg()) {
7213 VAHelper->visitCallBase(CB, IRB);
7214 }
7215
7216 // Now, get the shadow for the RetVal.
7217 if (!CB.getType()->isSized())
7218 return;
7219 // Don't emit the epilogue for musttail call returns.
7220 if (isa<CallInst>(Val: CB) && cast<CallInst>(Val&: CB).isMustTailCall())
7221 return;
7222
7223 if (MayCheckCall && CB.hasRetAttr(Kind: Attribute::NoUndef)) {
7224 setShadow(V: &CB, SV: getCleanShadow(V: &CB));
7225 setOrigin(V: &CB, Origin: getCleanOrigin());
7226 return;
7227 }
7228
7229 IRBuilder<> IRBBefore(&CB);
7230 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7231 Value *Base = getShadowPtrForRetval(IRB&: IRBBefore);
7232 IRBBefore.CreateAlignedStore(Val: getCleanShadow(V: &CB), Ptr: Base,
7233 Align: kShadowTLSAlignment);
7234 BasicBlock::iterator NextInsn;
7235 if (isa<CallInst>(Val: CB)) {
7236 NextInsn = ++CB.getIterator();
7237 assert(NextInsn != CB.getParent()->end());
7238 } else {
7239 BasicBlock *NormalDest = cast<InvokeInst>(Val&: CB).getNormalDest();
7240 if (!NormalDest->getSinglePredecessor()) {
7241 // FIXME: this case is tricky, so we are just conservative here.
7242 // Perhaps we need to split the edge between this BB and NormalDest,
7243 // but a naive attempt to use SplitEdge leads to a crash.
7244 setShadow(V: &CB, SV: getCleanShadow(V: &CB));
7245 setOrigin(V: &CB, Origin: getCleanOrigin());
7246 return;
7247 }
7248 // FIXME: NextInsn is likely in a basic block that has not been visited
7249 // yet. Anything inserted there will be instrumented by MSan later!
7250 NextInsn = NormalDest->getFirstInsertionPt();
7251 assert(NextInsn != NormalDest->end() &&
7252 "Could not find insertion point for retval shadow load");
7253 }
7254 IRBuilder<> IRBAfter(&*NextInsn);
7255 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7256 Ty: getShadowTy(V: &CB), Ptr: getShadowPtrForRetval(IRB&: IRBAfter), Align: kShadowTLSAlignment,
7257 Name: "_msret");
7258 setShadow(V: &CB, SV: RetvalShadow);
7259 if (MS.TrackOrigins)
7260 setOrigin(V: &CB, Origin: IRBAfter.CreateLoad(Ty: MS.OriginTy, Ptr: getOriginPtrForRetval()));
7261 }
7262
7263 bool isAMustTailRetVal(Value *RetVal) {
7264 if (auto *I = dyn_cast<BitCastInst>(Val: RetVal)) {
7265 RetVal = I->getOperand(i_nocapture: 0);
7266 }
7267 if (auto *I = dyn_cast<CallInst>(Val: RetVal)) {
7268 return I->isMustTailCall();
7269 }
7270 return false;
7271 }
7272
7273 void visitReturnInst(ReturnInst &I) {
7274 IRBuilder<> IRB(&I);
7275 Value *RetVal = I.getReturnValue();
7276 if (!RetVal)
7277 return;
7278 // Don't emit the epilogue for musttail call returns.
7279 if (isAMustTailRetVal(RetVal))
7280 return;
7281 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7282 bool HasNoUndef = F.hasRetAttribute(Kind: Attribute::NoUndef);
7283 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7284 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7285 // must always return fully initialized values. For now, we hardcode "main".
7286 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7287
7288 Value *Shadow = getShadow(V: RetVal);
7289 bool StoreOrigin = true;
7290 if (EagerCheck) {
7291 insertCheckShadowOf(Val: RetVal, OrigIns: &I);
7292 Shadow = getCleanShadow(V: RetVal);
7293 StoreOrigin = false;
7294 }
7295
7296 // The caller may still expect information passed over TLS if we pass our
7297 // check
7298 if (StoreShadow) {
7299 IRB.CreateAlignedStore(Val: Shadow, Ptr: ShadowPtr, Align: kShadowTLSAlignment);
7300 if (MS.TrackOrigins && StoreOrigin)
7301 IRB.CreateStore(Val: getOrigin(V: RetVal), Ptr: getOriginPtrForRetval());
7302 }
7303 }
7304
7305 void visitPHINode(PHINode &I) {
7306 IRBuilder<> IRB(&I);
7307 if (!PropagateShadow) {
7308 setShadow(V: &I, SV: getCleanShadow(V: &I));
7309 setOrigin(V: &I, Origin: getCleanOrigin());
7310 return;
7311 }
7312
7313 ShadowPHINodes.push_back(Elt: &I);
7314 setShadow(V: &I, SV: IRB.CreatePHI(Ty: getShadowTy(V: &I), NumReservedValues: I.getNumIncomingValues(),
7315 Name: "_msphi_s"));
7316 if (MS.TrackOrigins)
7317 setOrigin(
7318 V: &I, Origin: IRB.CreatePHI(Ty: MS.OriginTy, NumReservedValues: I.getNumIncomingValues(), Name: "_msphi_o"));
7319 }
7320
7321 Value *getLocalVarIdptr(AllocaInst &I) {
7322 ConstantInt *IntConst =
7323 ConstantInt::get(Ty: Type::getInt32Ty(C&: (*F.getParent()).getContext()), V: 0);
7324 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7325 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7326 IntConst);
7327 }
7328
7329 Value *getLocalVarDescription(AllocaInst &I) {
7330 return createPrivateConstGlobalForString(M&: *F.getParent(), Str: I.getName());
7331 }
7332
7333 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7334 if (PoisonStack && ClPoisonStackWithCall) {
7335 IRB.CreateCall(Callee: MS.MsanPoisonStackFn, Args: {&I, Len});
7336 } else {
7337 Value *ShadowBase, *OriginBase;
7338 std::tie(args&: ShadowBase, args&: OriginBase) = getShadowOriginPtr(
7339 Addr: &I, IRB, ShadowTy: IRB.getInt8Ty(), Alignment: Align(1), /*isStore*/ true);
7340
7341 Value *PoisonValue = IRB.getInt8(C: PoisonStack ? ClPoisonStackPattern : 0);
7342 IRB.CreateMemSet(Ptr: ShadowBase, Val: PoisonValue, Size: Len, Align: I.getAlign());
7343 }
7344
7345 if (PoisonStack && MS.TrackOrigins) {
7346 Value *Idptr = getLocalVarIdptr(I);
7347 if (ClPrintStackNames) {
7348 Value *Descr = getLocalVarDescription(I);
7349 IRB.CreateCall(Callee: MS.MsanSetAllocaOriginWithDescriptionFn,
7350 Args: {&I, Len, Idptr, Descr});
7351 } else {
7352 IRB.CreateCall(Callee: MS.MsanSetAllocaOriginNoDescriptionFn, Args: {&I, Len, Idptr});
7353 }
7354 }
7355 }
7356
7357 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7358 Value *Descr = getLocalVarDescription(I);
7359 if (PoisonStack) {
7360 IRB.CreateCall(Callee: MS.MsanPoisonAllocaFn, Args: {&I, Len, Descr});
7361 } else {
7362 IRB.CreateCall(Callee: MS.MsanUnpoisonAllocaFn, Args: {&I, Len});
7363 }
7364 }
7365
7366 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7367 if (!InsPoint)
7368 InsPoint = &I;
7369 NextNodeIRBuilder IRB(InsPoint);
7370 Value *Len = IRB.CreateAllocationSize(DestTy: MS.IntptrTy, AI: &I);
7371
7372 if (MS.CompileKernel)
7373 poisonAllocaKmsan(I, IRB, Len);
7374 else
7375 poisonAllocaUserspace(I, IRB, Len);
7376 }
7377
7378 void visitAllocaInst(AllocaInst &I) {
7379 setShadow(V: &I, SV: getCleanShadow(V: &I));
7380 setOrigin(V: &I, Origin: getCleanOrigin());
7381 // We'll get to this alloca later unless it's poisoned at the corresponding
7382 // llvm.lifetime.start.
7383 AllocaSet.insert(X: &I);
7384 }
7385
7386 void visitSelectInst(SelectInst &I) {
7387 // a = select b, c, d
7388 Value *B = I.getCondition();
7389 Value *C = I.getTrueValue();
7390 Value *D = I.getFalseValue();
7391
7392 handleSelectLikeInst(I, B, C, D);
7393 }
7394
7395 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7396 IRBuilder<> IRB(&I);
7397
7398 Value *Sb = getShadow(V: B);
7399 Value *Sc = getShadow(V: C);
7400 Value *Sd = getShadow(V: D);
7401
7402 Value *Ob = MS.TrackOrigins ? getOrigin(V: B) : nullptr;
7403 Value *Oc = MS.TrackOrigins ? getOrigin(V: C) : nullptr;
7404 Value *Od = MS.TrackOrigins ? getOrigin(V: D) : nullptr;
7405
7406 // Result shadow if condition shadow is 0.
7407 Value *Sa0 = IRB.CreateSelect(C: B, True: Sc, False: Sd);
7408 Value *Sa1;
7409 if (I.getType()->isAggregateType()) {
7410 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7411 // an extra "select". This results in much more compact IR.
7412 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7413 Sa1 = getPoisonedShadow(ShadowTy: getShadowTy(OrigTy: I.getType()));
7414 } else if (isScalableNonVectorType(Ty: I.getType())) {
7415 // This is intended to handle target("aarch64.svcount"), which can't be
7416 // handled in the else branch because of incompatibility with CreateXor
7417 // ("The supported LLVM operations on this type are limited to load,
7418 // store, phi, select and alloca instructions").
7419
7420 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7421 // branch as needed instead.
7422 Sa1 = getCleanShadow(OrigTy: getShadowTy(OrigTy: I.getType()));
7423 } else {
7424 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7425 // If Sb (condition is poisoned), look for bits in c and d that are equal
7426 // and both unpoisoned.
7427 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
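      // Worked example (illustrative, scalar i4): if c = 0b1010 and
      // d = 0b1110 are both fully initialized (Sc = Sd = 0) but the condition
      // is poisoned (Sb = 1), then (c^d)|Sc|Sd = 0b0100: only the bit where
      // c and d disagree depends on the unknown condition and is reported as
      // poisoned.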
7428
7429 // Cast arguments to shadow-compatible type.
7430 C = CreateAppToShadowCast(IRB, V: C);
7431 D = CreateAppToShadowCast(IRB, V: D);
7432
7433 // Result shadow if condition shadow is 1.
7434 Sa1 = IRB.CreateOr(Ops: {IRB.CreateXor(LHS: C, RHS: D), Sc, Sd});
7435 }
7436 Value *Sa = IRB.CreateSelect(C: Sb, True: Sa1, False: Sa0, Name: "_msprop_select");
7437 setShadow(V: &I, SV: Sa);
7438 if (MS.TrackOrigins) {
7439 // Origins are always i32, so any vector conditions must be flattened.
7440 // FIXME: consider tracking vector origins for app vectors?
7441 if (B->getType()->isVectorTy()) {
7442 B = convertToBool(V: B, IRB);
7443 Sb = convertToBool(V: Sb, IRB);
7444 }
7445 // a = select b, c, d
7446 // Oa = Sb ? Ob : (b ? Oc : Od)
7447 setOrigin(V: &I, Origin: IRB.CreateSelect(C: Sb, True: Ob, False: IRB.CreateSelect(C: B, True: Oc, False: Od)));
7448 }
7449 }
7450
7451 void visitLandingPadInst(LandingPadInst &I) {
7452 // Do nothing.
7453 // See https://github.com/google/sanitizers/issues/504
7454 setShadow(V: &I, SV: getCleanShadow(V: &I));
7455 setOrigin(V: &I, Origin: getCleanOrigin());
7456 }
7457
7458 void visitCatchSwitchInst(CatchSwitchInst &I) {
7459 setShadow(V: &I, SV: getCleanShadow(V: &I));
7460 setOrigin(V: &I, Origin: getCleanOrigin());
7461 }
7462
7463 void visitFuncletPadInst(FuncletPadInst &I) {
7464 setShadow(V: &I, SV: getCleanShadow(V: &I));
7465 setOrigin(V: &I, Origin: getCleanOrigin());
7466 }
7467
7468 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7469
7470 void visitExtractValueInst(ExtractValueInst &I) {
7471 IRBuilder<> IRB(&I);
7472 Value *Agg = I.getAggregateOperand();
7473 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7474 Value *AggShadow = getShadow(V: Agg);
7475 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7476 Value *ResShadow = IRB.CreateExtractValue(Agg: AggShadow, Idxs: I.getIndices());
7477 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7478 setShadow(V: &I, SV: ResShadow);
7479 setOriginForNaryOp(I);
7480 }
7481
7482 void visitInsertValueInst(InsertValueInst &I) {
7483 IRBuilder<> IRB(&I);
7484 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7485 Value *AggShadow = getShadow(V: I.getAggregateOperand());
7486 Value *InsShadow = getShadow(V: I.getInsertedValueOperand());
7487 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7488 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7489 Value *Res = IRB.CreateInsertValue(Agg: AggShadow, Val: InsShadow, Idxs: I.getIndices());
7490 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7491 setShadow(V: &I, SV: Res);
7492 setOriginForNaryOp(I);
7493 }
7494
7495 void dumpInst(Instruction &I) {
7496 if (CallInst *CI = dyn_cast<CallInst>(Val: &I)) {
7497 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7498 } else {
7499 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7500 }
7501 errs() << "QQQ " << I << "\n";
7502 }
7503
7504 void visitResumeInst(ResumeInst &I) {
7505 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7506 // Nothing to do here.
7507 }
7508
7509 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7510 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7511 // Nothing to do here.
7512 }
7513
7514 void visitCatchReturnInst(CatchReturnInst &CRI) {
7515 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7516 // Nothing to do here.
7517 }
7518
7519 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7520 IRBuilder<> &IRB, const DataLayout &DL,
7521 bool isOutput) {
7522 // For each assembly argument, we check its value for being initialized.
7523 // If the argument is a pointer, we assume it points to a single element
    // of the corresponding type (or to an 8-byte word, if the type is unsized).
7525 // Each such pointer is instrumented with a call to the runtime library.
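    // For example (illustrative): an "=m" output constrained with
    // elementtype(i32) has a store size of 4 <= 32 and is unpoisoned with a
    // single clean-shadow store, while elementtype([64 x i8]) exceeds the
    // threshold and is unpoisoned with a 64-byte memset of the shadow
    // instead (userspace path below).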
7526 Type *OpType = Operand->getType();
7527 // Check the operand value itself.
7528 insertCheckShadowOf(Val: Operand, OrigIns: &I);
7529 if (!OpType->isPointerTy() || !isOutput) {
7530 assert(!isOutput);
7531 return;
7532 }
7533 if (!ElemTy->isSized())
7534 return;
7535 auto Size = DL.getTypeStoreSize(Ty: ElemTy);
7536 Value *SizeVal = IRB.CreateTypeSize(Ty: MS.IntptrTy, Size);
7537 if (MS.CompileKernel) {
7538 IRB.CreateCall(Callee: MS.MsanInstrumentAsmStoreFn, Args: {Operand, SizeVal});
7539 } else {
7540 // ElemTy, derived from elementtype(), does not encode the alignment of
7541 // the pointer. Conservatively assume that the shadow memory is unaligned.
7542 // When Size is large, avoid StoreInst as it would expand to many
7543 // instructions.
7544 auto [ShadowPtr, _] =
7545 getShadowOriginPtrUserspace(Addr: Operand, IRB, ShadowTy: IRB.getInt8Ty(), Alignment: Align(1));
7546 if (Size <= 32)
7547 IRB.CreateAlignedStore(Val: getCleanShadow(OrigTy: ElemTy), Ptr: ShadowPtr, Align: Align(1));
7548 else
7549 IRB.CreateMemSet(Ptr: ShadowPtr, Val: ConstantInt::getNullValue(Ty: IRB.getInt8Ty()),
7550 Size: SizeVal, Align: Align(1));
7551 }
7552 }
7553
7554 /// Get the number of output arguments returned by pointers.
7555 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7556 int NumRetOutputs = 0;
7557 int NumOutputs = 0;
7558 Type *RetTy = cast<Value>(Val: CB)->getType();
7559 if (!RetTy->isVoidTy()) {
7560 // Register outputs are returned via the CallInst return value.
7561 auto *ST = dyn_cast<StructType>(Val: RetTy);
7562 if (ST)
7563 NumRetOutputs = ST->getNumElements();
7564 else
7565 NumRetOutputs = 1;
7566 }
7567 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7568 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7569 switch (Info.Type) {
7570 case InlineAsm::isOutput:
7571 NumOutputs++;
7572 break;
7573 default:
7574 break;
7575 }
7576 }
7577 return NumOutputs - NumRetOutputs;
7578 }
7579
7580 void visitAsmInstruction(Instruction &I) {
7581 // Conservative inline assembly handling: check for poisoned shadow of
7582 // asm() arguments, then unpoison the result and all the memory locations
7583 // pointed to by those arguments.
7584 // An inline asm() statement in C++ contains lists of input and output
7585 // arguments used by the assembly code. These are mapped to operands of the
7586 // CallInst as follows:
    //  - nR register outputs ("=r") are returned by value in a single structure
7588 // (SSA value of the CallInst);
7589 // - nO other outputs ("=m" and others) are returned by pointer as first
7590 // nO operands of the CallInst;
7591 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7592 // remaining nI operands.
7593 // The total number of asm() arguments in the source is nR+nO+nI, and the
7594 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7595 // function to be called).
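    // For example (illustrative):
    //   asm("..." : "=r"(a), "=m"(b) : "r"(c), "m"(d));
    // has nR = 1, nO = 1, nI = 2; the CallInst returns the "=r" value and has
    // nO + nI + 1 = 4 operands: the pointer for b, the value of c, the
    // pointer for d, and the inline asm callee itself.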
7596 const DataLayout &DL = F.getDataLayout();
7597 CallBase *CB = cast<CallBase>(Val: &I);
7598 IRBuilder<> IRB(&I);
7599 InlineAsm *IA = cast<InlineAsm>(Val: CB->getCalledOperand());
7600 int OutputArgs = getNumOutputArgs(IA, CB);
7601 // The last operand of a CallInst is the function itself.
7602 int NumOperands = CB->getNumOperands() - 1;
7603
    // Check input arguments. Do this before unpoisoning output arguments, so
    // that we don't overwrite uninitialized values before checking them.
7606 for (int i = OutputArgs; i < NumOperands; i++) {
7607 Value *Operand = CB->getOperand(i_nocapture: i);
7608 instrumentAsmArgument(Operand, ElemTy: CB->getParamElementType(ArgNo: i), I, IRB, DL,
7609 /*isOutput*/ false);
7610 }
7611 // Unpoison output arguments. This must happen before the actual InlineAsm
7612 // call, so that the shadow for memory published in the asm() statement
7613 // remains valid.
7614 for (int i = 0; i < OutputArgs; i++) {
7615 Value *Operand = CB->getOperand(i_nocapture: i);
7616 instrumentAsmArgument(Operand, ElemTy: CB->getParamElementType(ArgNo: i), I, IRB, DL,
7617 /*isOutput*/ true);
7618 }
7619
7620 setShadow(V: &I, SV: getCleanShadow(V: &I));
7621 setOrigin(V: &I, Origin: getCleanOrigin());
7622 }
7623
7624 void visitFreezeInst(FreezeInst &I) {
7625 // Freeze always returns a fully defined value.
7626 setShadow(V: &I, SV: getCleanShadow(V: &I));
7627 setOrigin(V: &I, Origin: getCleanOrigin());
7628 }
7629
7630 void visitInstruction(Instruction &I) {
7631 // Everything else: stop propagating and check for poisoned shadow.
7632 if (ClDumpStrictInstructions)
7633 dumpInst(I);
7634 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7635 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7636 Value *Operand = I.getOperand(i);
7637 if (Operand->getType()->isSized())
7638 insertCheckShadowOf(Val: Operand, OrigIns: &I);
7639 }
7640 setShadow(V: &I, SV: getCleanShadow(V: &I));
7641 setOrigin(V: &I, Origin: getCleanOrigin());
7642 }
7643};
7644
7645struct VarArgHelperBase : public VarArgHelper {
7646 Function &F;
7647 MemorySanitizer &MS;
7648 MemorySanitizerVisitor &MSV;
7649 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7650 const unsigned VAListTagSize;
7651
7652 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7653 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7654 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7655
7656 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7657 Value *Base = IRB.CreatePointerCast(V: MS.VAArgTLS, DestTy: MS.IntptrTy);
7658 return IRB.CreateAdd(LHS: Base, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: ArgOffset));
7659 }
7660
7661 /// Compute the shadow address for a given va_arg.
7662 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7663 return IRB.CreatePtrAdd(
7664 Ptr: MS.VAArgTLS, Offset: ConstantInt::get(Ty: MS.IntptrTy, V: ArgOffset), Name: "_msarg_va_s");
7665 }
7666
7667 /// Compute the shadow address for a given va_arg.
7668 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7669 unsigned ArgSize) {
7670 // Make sure we don't overflow __msan_va_arg_tls.
7671 if (ArgOffset + ArgSize > kParamTLSSize)
7672 return nullptr;
7673 return getShadowPtrForVAArgument(IRB, ArgOffset);
7674 }
7675
7676 /// Compute the origin address for a given va_arg.
7677 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7678 // getOriginPtrForVAArgument() is always called after
7679 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7680 // overflow.
7681 return IRB.CreatePtrAdd(Ptr: MS.VAArgOriginTLS,
7682 Offset: ConstantInt::get(Ty: MS.IntptrTy, V: ArgOffset),
7683 Name: "_msarg_va_o");
7684 }
7685
7686 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7687 unsigned BaseOffset) {
    // The tail of __msan_va_arg_tls is not large enough to fit the full
    // value shadow, but it will be copied to the backup anyway. Make it
    // clean.
7691 if (BaseOffset >= kParamTLSSize)
7692 return;
7693 Value *TailSize =
7694 ConstantInt::getSigned(Ty: IRB.getInt32Ty(), V: kParamTLSSize - BaseOffset);
7695 IRB.CreateMemSet(Ptr: ShadowBase, Val: ConstantInt::getNullValue(Ty: IRB.getInt8Ty()),
7696 Size: TailSize, Align: Align(8));
7697 }
7698
7699 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7700 IRBuilder<> IRB(&I);
7701 Value *VAListTag = I.getArgOperand(i: 0);
7702 const Align Alignment = Align(8);
7703 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7704 Addr: VAListTag, IRB, ShadowTy: IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7705 // Unpoison the whole __va_list_tag.
7706 IRB.CreateMemSet(Ptr: ShadowPtr, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
7707 Size: VAListTagSize, Align: Alignment, isVolatile: false);
7708 }
7709
7710 void visitVAStartInst(VAStartInst &I) override {
7711 if (F.getCallingConv() == CallingConv::Win64)
7712 return;
7713 VAStartInstrumentationList.push_back(Elt: &I);
7714 unpoisonVAListTagForInst(I);
7715 }
7716
7717 void visitVACopyInst(VACopyInst &I) override {
7718 if (F.getCallingConv() == CallingConv::Win64)
7719 return;
7720 unpoisonVAListTagForInst(I);
7721 }
7722};
7723
7724/// AMD64-specific implementation of VarArgHelper.
7725struct VarArgAMD64Helper : public VarArgHelperBase {
7726 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7727 // See a comment in visitCallBase for more details.
7728 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7729 static const unsigned AMD64FpEndOffsetSSE = 176;
7730 // If SSE is disabled, fp_offset in va_list is zero.
7731 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7732
7733 unsigned AMD64FpEndOffset;
7734 AllocaInst *VAArgTLSCopy = nullptr;
7735 AllocaInst *VAArgTLSOriginCopy = nullptr;
7736 Value *VAArgOverflowSize = nullptr;
7737
7738 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7739
7740 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7741 MemorySanitizerVisitor &MSV)
7742 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7743 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7744 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7745 if (Attr.isStringAttribute() &&
7746 (Attr.getKindAsString() == "target-features")) {
7747 if (Attr.getValueAsString().contains(Other: "-sse"))
7748 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7749 break;
7750 }
7751 }
7752 }
7753
7754 ArgKind classifyArgument(Value *arg) {
7755 // A very rough approximation of X86_64 argument classification rules.
7756 Type *T = arg->getType();
7757 if (T->isX86_FP80Ty())
7758 return AK_Memory;
7759 if (T->isFPOrFPVectorTy())
7760 return AK_FloatingPoint;
7761 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7762 return AK_GeneralPurpose;
7763 if (T->isPointerTy())
7764 return AK_GeneralPurpose;
7765 return AK_Memory;
7766 }
7767
7768 // For VarArg functions, store the argument shadow in an ABI-specific format
7769 // that corresponds to va_list layout.
7770 // We do this because Clang lowers va_arg in the frontend, and this pass
7771 // only sees the low level code that deals with va_list internals.
7772 // A much easier alternative (provided that Clang emits va_arg instructions)
7773 // would have been to associate each live instance of va_list with a copy of
7774 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7775 // order.
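  // Illustrative example: for a call printf(fmt, i, d) with an int i and a
  // double d (and SSE enabled, so AMD64FpEndOffset == 176), fmt is a named
  // GP argument and only advances GpOffset to 8; the shadow of i is stored
  // at va_arg TLS offset 8, the shadow of d at offset 48 (the start of the
  // FP region), nothing spills into the overflow area, and the stored
  // overflow size is 0.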
7776 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7777 unsigned GpOffset = 0;
7778 unsigned FpOffset = AMD64GpEndOffset;
7779 unsigned OverflowOffset = AMD64FpEndOffset;
7780 const DataLayout &DL = F.getDataLayout();
7781
7782 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
7783 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7784 bool IsByVal = CB.paramHasAttr(ArgNo, Kind: Attribute::ByVal);
7785 if (IsByVal) {
7786 // ByVal arguments always go to the overflow area.
7787 // Fixed arguments passed through the overflow area will be stepped
7788 // over by va_start, so don't count them towards the offset.
7789 if (IsFixed)
7790 continue;
7791 assert(A->getType()->isPointerTy());
7792 Type *RealTy = CB.getParamByValType(ArgNo);
7793 uint64_t ArgSize = DL.getTypeAllocSize(Ty: RealTy);
7794 uint64_t AlignedSize = alignTo(Value: ArgSize, Align: 8);
7795 unsigned BaseOffset = OverflowOffset;
7796 Value *ShadowBase = getShadowPtrForVAArgument(IRB, ArgOffset: OverflowOffset);
7797 Value *OriginBase = nullptr;
7798 if (MS.TrackOrigins)
7799 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: OverflowOffset);
7800 OverflowOffset += AlignedSize;
7801
7802 if (OverflowOffset > kParamTLSSize) {
7803 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7804 continue; // We have no space to copy shadow there.
7805 }
7806
7807 Value *ShadowPtr, *OriginPtr;
7808 std::tie(args&: ShadowPtr, args&: OriginPtr) =
7809 MSV.getShadowOriginPtr(Addr: A, IRB, ShadowTy: IRB.getInt8Ty(), Alignment: kShadowTLSAlignment,
7810 /*isStore*/ false);
7811 IRB.CreateMemCpy(Dst: ShadowBase, DstAlign: kShadowTLSAlignment, Src: ShadowPtr,
7812 SrcAlign: kShadowTLSAlignment, Size: ArgSize);
7813 if (MS.TrackOrigins)
7814 IRB.CreateMemCpy(Dst: OriginBase, DstAlign: kShadowTLSAlignment, Src: OriginPtr,
7815 SrcAlign: kShadowTLSAlignment, Size: ArgSize);
7816 } else {
7817 ArgKind AK = classifyArgument(arg: A);
7818 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7819 AK = AK_Memory;
7820 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7821 AK = AK_Memory;
7822 Value *ShadowBase, *OriginBase = nullptr;
7823 switch (AK) {
7824 case AK_GeneralPurpose:
7825 ShadowBase = getShadowPtrForVAArgument(IRB, ArgOffset: GpOffset);
7826 if (MS.TrackOrigins)
7827 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: GpOffset);
7828 GpOffset += 8;
7829 assert(GpOffset <= kParamTLSSize);
7830 break;
7831 case AK_FloatingPoint:
7832 ShadowBase = getShadowPtrForVAArgument(IRB, ArgOffset: FpOffset);
7833 if (MS.TrackOrigins)
7834 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: FpOffset);
7835 FpOffset += 16;
7836 assert(FpOffset <= kParamTLSSize);
7837 break;
7838 case AK_Memory:
7839 if (IsFixed)
7840 continue;
7841 uint64_t ArgSize = DL.getTypeAllocSize(Ty: A->getType());
7842 uint64_t AlignedSize = alignTo(Value: ArgSize, Align: 8);
7843 unsigned BaseOffset = OverflowOffset;
7844 ShadowBase = getShadowPtrForVAArgument(IRB, ArgOffset: OverflowOffset);
7845 if (MS.TrackOrigins) {
7846 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: OverflowOffset);
7847 }
7848 OverflowOffset += AlignedSize;
7849 if (OverflowOffset > kParamTLSSize) {
7850 // We have no space to copy shadow there.
7851 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7852 continue;
7853 }
7854 }
7855 // Take fixed arguments into account for GpOffset and FpOffset,
7856 // but don't actually store shadows for them.
7857 // TODO(glider): don't call get*PtrForVAArgument() for them.
7858 if (IsFixed)
7859 continue;
7860 Value *Shadow = MSV.getShadow(V: A);
7861 IRB.CreateAlignedStore(Val: Shadow, Ptr: ShadowBase, Align: kShadowTLSAlignment);
7862 if (MS.TrackOrigins) {
7863 Value *Origin = MSV.getOrigin(V: A);
7864 TypeSize StoreSize = DL.getTypeStoreSize(Ty: Shadow->getType());
7865 MSV.paintOrigin(IRB, Origin, OriginPtr: OriginBase, TS: StoreSize,
7866 Alignment: std::max(a: kShadowTLSAlignment, b: kMinOriginAlignment));
7867 }
7868 }
7869 }
7870 Constant *OverflowSize =
7871 ConstantInt::get(Ty: IRB.getInt64Ty(), V: OverflowOffset - AMD64FpEndOffset);
7872 IRB.CreateStore(Val: OverflowSize, Ptr: MS.VAArgOverflowSizeTLS);
7873 }
7874
7875 void finalizeInstrumentation() override {
7876 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7877 "finalizeInstrumentation called twice");
7878 if (!VAStartInstrumentationList.empty()) {
7879 // If there is a va_start in this function, make a backup copy of
7880 // va_arg_tls somewhere in the function entry block.
7881 IRBuilder<> IRB(MSV.FnPrologueEnd);
7882 VAArgOverflowSize =
7883 IRB.CreateLoad(Ty: IRB.getInt64Ty(), Ptr: MS.VAArgOverflowSizeTLS);
7884 Value *CopySize = IRB.CreateAdd(
7885 LHS: ConstantInt::get(Ty: MS.IntptrTy, V: AMD64FpEndOffset), RHS: VAArgOverflowSize);
7886 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
7887 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7888 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
7889 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
7890
7891 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7892 ID: Intrinsic::umin, LHS: CopySize,
7893 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
7894 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
7895 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
7896 if (MS.TrackOrigins) {
7897 VAArgTLSOriginCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
7898 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7899 IRB.CreateMemCpy(Dst: VAArgTLSOriginCopy, DstAlign: kShadowTLSAlignment,
7900 Src: MS.VAArgOriginTLS, SrcAlign: kShadowTLSAlignment, Size: SrcSize);
7901 }
7902 }
7903
7904 // Instrument va_start.
7905 // Copy va_list shadow from the backup copy of the TLS contents.
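    // The offsets 16 and 8 used below are the reg_save_area and
    // overflow_arg_area fields of the AMD64 __va_list_tag (gp_offset at 0,
    // fp_offset at 4, overflow_arg_area at 8, reg_save_area at 16).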
7906 for (CallInst *OrigInst : VAStartInstrumentationList) {
7907 NextNodeIRBuilder IRB(OrigInst);
7908 Value *VAListTag = OrigInst->getArgOperand(i: 0);
7909
7910 Value *RegSaveAreaPtrPtr =
7911 IRB.CreatePtrAdd(Ptr: VAListTag, Offset: ConstantInt::get(Ty: MS.IntptrTy, V: 16));
7912 Value *RegSaveAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: RegSaveAreaPtrPtr);
7913 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7914 const Align Alignment = Align(16);
7915 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
7916 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
7917 Alignment, /*isStore*/ true);
7918 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy, SrcAlign: Alignment,
7919 Size: AMD64FpEndOffset);
7920 if (MS.TrackOrigins)
7921 IRB.CreateMemCpy(Dst: RegSaveAreaOriginPtr, DstAlign: Alignment, Src: VAArgTLSOriginCopy,
7922 SrcAlign: Alignment, Size: AMD64FpEndOffset);
7923 Value *OverflowArgAreaPtrPtr =
7924 IRB.CreatePtrAdd(Ptr: VAListTag, Offset: ConstantInt::get(Ty: MS.IntptrTy, V: 8));
7925 Value *OverflowArgAreaPtr =
7926 IRB.CreateLoad(Ty: MS.PtrTy, Ptr: OverflowArgAreaPtrPtr);
7927 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7928 std::tie(args&: OverflowArgAreaShadowPtr, args&: OverflowArgAreaOriginPtr) =
7929 MSV.getShadowOriginPtr(Addr: OverflowArgAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
7930 Alignment, /*isStore*/ true);
7931 Value *SrcPtr = IRB.CreateConstGEP1_32(Ty: IRB.getInt8Ty(), Ptr: VAArgTLSCopy,
7932 Idx0: AMD64FpEndOffset);
7933 IRB.CreateMemCpy(Dst: OverflowArgAreaShadowPtr, DstAlign: Alignment, Src: SrcPtr, SrcAlign: Alignment,
7934 Size: VAArgOverflowSize);
7935 if (MS.TrackOrigins) {
7936 SrcPtr = IRB.CreateConstGEP1_32(Ty: IRB.getInt8Ty(), Ptr: VAArgTLSOriginCopy,
7937 Idx0: AMD64FpEndOffset);
7938 IRB.CreateMemCpy(Dst: OverflowArgAreaOriginPtr, DstAlign: Alignment, Src: SrcPtr, SrcAlign: Alignment,
7939 Size: VAArgOverflowSize);
7940 }
7941 }
7942 }
7943};
7944
7945/// AArch64-specific implementation of VarArgHelper.
7946struct VarArgAArch64Helper : public VarArgHelperBase {
7947 static const unsigned kAArch64GrArgSize = 64;
7948 static const unsigned kAArch64VrArgSize = 128;
7949
7950 static const unsigned AArch64GrBegOffset = 0;
7951 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7952 // Make VR space aligned to 16 bytes.
7953 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7954 static const unsigned AArch64VrEndOffset =
7955 AArch64VrBegOffset + kAArch64VrArgSize;
7956 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7957
7958 AllocaInst *VAArgTLSCopy = nullptr;
7959 Value *VAArgOverflowSize = nullptr;
7960
7961 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7962
7963 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7964 MemorySanitizerVisitor &MSV)
7965 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7966
7967 // A very rough approximation of aarch64 argument classification rules.
7968 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7969 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7970 return {AK_GeneralPurpose, 1};
7971 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7972 return {AK_FloatingPoint, 1};
7973
7974 if (T->isArrayTy()) {
7975 auto R = classifyArgument(T: T->getArrayElementType());
7976 R.second *= T->getScalarType()->getArrayNumElements();
7977 return R;
7978 }
7979
7980 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(Val: T)) {
7981 auto R = classifyArgument(T: FV->getScalarType());
7982 R.second *= FV->getNumElements();
7983 return R;
7984 }
7985
7986 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7987 return {AK_Memory, 0};
7988 }
7989
  // The instrumentation stores the argument shadow in a non ABI-specific
  // format because it does not know which arguments are named (since Clang,
  // as in the x86_64 case, lowers va_arg in the frontend and this pass only
  // sees the low-level code that deals with va_list internals).
  // The first eight GR registers are saved in the first 64 bytes of the
  // va_arg TLS array, followed by the first eight FP/SIMD registers, and
  // then the remaining arguments.
  // Using constant offsets within the va_arg TLS array allows a fast copy
  // in finalizeInstrumentation().
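  // Illustrative layout of the va_arg TLS array used below: bytes [0, 64)
  // hold the shadow of GP register arguments (x0-x7, 8 bytes each), bytes
  // [64, 192) hold the shadow of FP/SIMD register arguments (v0-v7, 16 bytes
  // each), and the shadow of stack-passed arguments starts at offset 192
  // (AArch64VAEndOffset).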
7999 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8000 unsigned GrOffset = AArch64GrBegOffset;
8001 unsigned VrOffset = AArch64VrBegOffset;
8002 unsigned OverflowOffset = AArch64VAEndOffset;
8003
8004 const DataLayout &DL = F.getDataLayout();
8005 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8006 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8007 auto [AK, RegNum] = classifyArgument(T: A->getType());
8008 if (AK == AK_GeneralPurpose &&
8009 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
8010 AK = AK_Memory;
8011 if (AK == AK_FloatingPoint &&
8012 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
8013 AK = AK_Memory;
8014 Value *Base;
8015 switch (AK) {
8016 case AK_GeneralPurpose:
8017 Base = getShadowPtrForVAArgument(IRB, ArgOffset: GrOffset);
8018 GrOffset += 8 * RegNum;
8019 break;
8020 case AK_FloatingPoint:
8021 Base = getShadowPtrForVAArgument(IRB, ArgOffset: VrOffset);
8022 VrOffset += 16 * RegNum;
8023 break;
8024 case AK_Memory:
8025 // Don't count fixed arguments in the overflow area - va_start will
8026 // skip right over them.
8027 if (IsFixed)
8028 continue;
8029 uint64_t ArgSize = DL.getTypeAllocSize(Ty: A->getType());
8030 uint64_t AlignedSize = alignTo(Value: ArgSize, Align: 8);
8031 unsigned BaseOffset = OverflowOffset;
8032 Base = getShadowPtrForVAArgument(IRB, ArgOffset: BaseOffset);
8033 OverflowOffset += AlignedSize;
8034 if (OverflowOffset > kParamTLSSize) {
8035 // We have no space to copy shadow there.
8036 CleanUnusedTLS(IRB, ShadowBase: Base, BaseOffset);
8037 continue;
8038 }
8039 break;
8040 }
8041 // Count Gp/Vr fixed arguments to their respective offsets, but don't
8042 // bother to actually store a shadow.
8043 if (IsFixed)
8044 continue;
8045 IRB.CreateAlignedStore(Val: MSV.getShadow(V: A), Ptr: Base, Align: kShadowTLSAlignment);
8046 }
8047 Constant *OverflowSize =
8048 ConstantInt::get(Ty: IRB.getInt64Ty(), V: OverflowOffset - AArch64VAEndOffset);
8049 IRB.CreateStore(Val: OverflowSize, Ptr: MS.VAArgOverflowSizeTLS);
8050 }
8051
8052 // Retrieve a va_list field of 'void*' size.
8053 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8054 Value *SaveAreaPtrPtr =
8055 IRB.CreatePtrAdd(Ptr: VAListTag, Offset: ConstantInt::get(Ty: MS.IntptrTy, V: offset));
8056 return IRB.CreateLoad(Ty: Type::getInt64Ty(C&: *MS.C), Ptr: SaveAreaPtrPtr);
8057 }
8058
8059 // Retrieve a va_list field of 'int' size.
8060 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8061 Value *SaveAreaPtr =
8062 IRB.CreatePtrAdd(Ptr: VAListTag, Offset: ConstantInt::get(Ty: MS.IntptrTy, V: offset));
8063 Value *SaveArea32 = IRB.CreateLoad(Ty: IRB.getInt32Ty(), Ptr: SaveAreaPtr);
8064 return IRB.CreateSExt(V: SaveArea32, DestTy: MS.IntptrTy);
8065 }
8066
8067 void finalizeInstrumentation() override {
8068 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8069 "finalizeInstrumentation called twice");
8070 if (!VAStartInstrumentationList.empty()) {
8071 // If there is a va_start in this function, make a backup copy of
8072 // va_arg_tls somewhere in the function entry block.
8073 IRBuilder<> IRB(MSV.FnPrologueEnd);
8074 VAArgOverflowSize =
8075 IRB.CreateLoad(Ty: IRB.getInt64Ty(), Ptr: MS.VAArgOverflowSizeTLS);
8076 Value *CopySize = IRB.CreateAdd(
8077 LHS: ConstantInt::get(Ty: MS.IntptrTy, V: AArch64VAEndOffset), RHS: VAArgOverflowSize);
8078 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8079 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8080 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8081 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8082
8083 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8084 ID: Intrinsic::umin, LHS: CopySize,
8085 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
8086 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8087 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8088 }
8089
8090 Value *GrArgSize = ConstantInt::get(Ty: MS.IntptrTy, V: kAArch64GrArgSize);
8091 Value *VrArgSize = ConstantInt::get(Ty: MS.IntptrTy, V: kAArch64VrArgSize);
8092
8093 // Instrument va_start, copy va_list shadow from the backup copy of
8094 // the TLS contents.
8095 for (CallInst *OrigInst : VAStartInstrumentationList) {
8096 NextNodeIRBuilder IRB(OrigInst);
8097
8098 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8099
      // The variadic ABI for AArch64 creates two areas to save the incoming
      // argument registers (one for the 64-bit general-purpose registers
      // x0-x7 and another for the 128-bit FP/SIMD registers v0-v7).
      // We then need to propagate the shadow arguments onto both regions,
      // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
      // The shadow of the remaining arguments is saved for 'va::stack'.
      // One caveat is that only the non-named arguments need to be
      // propagated, whereas the call-site instrumentation saved 'all' of
      // the arguments. So, to copy the shadow values from the va_arg TLS
      // array, we need to adjust the offsets for both the GR and VR regions
      // based on the __{gr,vr}_offs values (since they are set according to
      // the incoming named arguments).
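      // The offsets used by getVAField{64,32} below follow the AAPCS64
      // va_list layout: __stack at 0, __gr_top at 8, __vr_top at 16,
      // __gr_offs at 24 and __vr_offs at 28.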
8112 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8113
8114 // Read the stack pointer from the va_list.
8115 Value *StackSaveAreaPtr =
8116 IRB.CreateIntToPtr(V: getVAField64(IRB, VAListTag, offset: 0), DestTy: RegSaveAreaPtrTy);
8117
8118 // Read both the __gr_top and __gr_off and add them up.
8119 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, offset: 8);
8120 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, offset: 24);
8121
8122 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8123 V: IRB.CreateAdd(LHS: GrTopSaveAreaPtr, RHS: GrOffSaveArea), DestTy: RegSaveAreaPtrTy);
8124
8125 // Read both the __vr_top and __vr_off and add them up.
8126 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, offset: 16);
8127 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, offset: 28);
8128
8129 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8130 V: IRB.CreateAdd(LHS: VrTopSaveAreaPtr, RHS: VrOffSaveArea), DestTy: RegSaveAreaPtrTy);
8131
      // The instrumentation does not know how many named arguments are being
      // used and, at the call site, all of the arguments were saved. Since
      // __gr_offs is defined as '0 - ((8 - named_gr) * 8)', the idea is to
      // propagate only the variadic arguments by skipping the bytes of
      // shadow that correspond to named arguments.
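      // Worked example: with 3 named GP arguments, __gr_offs = -(8 - 3) * 8 =
      // -40, so GrRegSaveAreaShadowPtrOff = 64 + (-40) = 24; the first
      // 3 * 8 = 24 bytes of shadow (the named arguments) are skipped and
      // GrCopySize = 64 - 24 = 40 bytes of variadic GP shadow are copied.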
8136 Value *GrRegSaveAreaShadowPtrOff =
8137 IRB.CreateAdd(LHS: GrArgSize, RHS: GrOffSaveArea);
8138
8139 Value *GrRegSaveAreaShadowPtr =
8140 MSV.getShadowOriginPtr(Addr: GrRegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8141 Alignment: Align(8), /*isStore*/ true)
8142 .first;
8143
8144 Value *GrSrcPtr =
8145 IRB.CreateInBoundsPtrAdd(Ptr: VAArgTLSCopy, Offset: GrRegSaveAreaShadowPtrOff);
8146 Value *GrCopySize = IRB.CreateSub(LHS: GrArgSize, RHS: GrRegSaveAreaShadowPtrOff);
8147
8148 IRB.CreateMemCpy(Dst: GrRegSaveAreaShadowPtr, DstAlign: Align(8), Src: GrSrcPtr, SrcAlign: Align(8),
8149 Size: GrCopySize);
8150
8151 // Again, but for FP/SIMD values.
8152 Value *VrRegSaveAreaShadowPtrOff =
8153 IRB.CreateAdd(LHS: VrArgSize, RHS: VrOffSaveArea);
8154
8155 Value *VrRegSaveAreaShadowPtr =
8156 MSV.getShadowOriginPtr(Addr: VrRegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8157 Alignment: Align(8), /*isStore*/ true)
8158 .first;
8159
8160 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8161 Ptr: IRB.CreateInBoundsPtrAdd(Ptr: VAArgTLSCopy,
8162 Offset: IRB.getInt32(C: AArch64VrBegOffset)),
8163 Offset: VrRegSaveAreaShadowPtrOff);
8164 Value *VrCopySize = IRB.CreateSub(LHS: VrArgSize, RHS: VrRegSaveAreaShadowPtrOff);
8165
8166 IRB.CreateMemCpy(Dst: VrRegSaveAreaShadowPtr, DstAlign: Align(8), Src: VrSrcPtr, SrcAlign: Align(8),
8167 Size: VrCopySize);
8168
8169 // And finally for remaining arguments.
8170 Value *StackSaveAreaShadowPtr =
8171 MSV.getShadowOriginPtr(Addr: StackSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8172 Alignment: Align(16), /*isStore*/ true)
8173 .first;
8174
8175 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8176 Ptr: VAArgTLSCopy, Offset: IRB.getInt32(C: AArch64VAEndOffset));
8177
8178 IRB.CreateMemCpy(Dst: StackSaveAreaShadowPtr, DstAlign: Align(16), Src: StackSrcPtr,
8179 SrcAlign: Align(16), Size: VAArgOverflowSize);
8180 }
8181 }
8182};
8183
8184/// PowerPC64-specific implementation of VarArgHelper.
8185struct VarArgPowerPC64Helper : public VarArgHelperBase {
8186 AllocaInst *VAArgTLSCopy = nullptr;
8187 Value *VAArgSize = nullptr;
8188
8189 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8190 MemorySanitizerVisitor &MSV)
8191 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8192
8193 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    // For PowerPC, we need to deal with the alignment of stack arguments -
    // they are mostly aligned to 8 bytes, but vectors and i128 arrays are
    // aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes. For
    // that reason, we compute the current offset from the stack pointer
    // (which is always properly aligned) and the offset of the first vararg,
    // then subtract them.
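    // Worked example (illustrative, assuming VAArgBase is still the ELFv2
    // value of 32): a variadic <4 x i32> argument reached at VAArgOffset 56
    // is first aligned up to 64 (vectors are naturally aligned, here to 16
    // bytes), its 16 bytes of shadow are stored at TLS offset 64 - 32 = 32,
    // and VAArgOffset advances to 80.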
8200 unsigned VAArgBase;
8201 Triple TargetTriple(F.getParent()->getTargetTriple());
8202 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8203 // and 32 bytes for ABIv2. This is usually determined by target
8204 // endianness, but in theory could be overridden by function attribute.
8205 if (TargetTriple.isPPC64ELFv2ABI())
8206 VAArgBase = 32;
8207 else
8208 VAArgBase = 48;
8209 unsigned VAArgOffset = VAArgBase;
8210 const DataLayout &DL = F.getDataLayout();
8211 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8212 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8213 bool IsByVal = CB.paramHasAttr(ArgNo, Kind: Attribute::ByVal);
8214 if (IsByVal) {
8215 assert(A->getType()->isPointerTy());
8216 Type *RealTy = CB.getParamByValType(ArgNo);
8217 uint64_t ArgSize = DL.getTypeAllocSize(Ty: RealTy);
8218 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(u: Align(8));
8219 if (ArgAlign < 8)
8220 ArgAlign = Align(8);
8221 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8222 if (!IsFixed) {
8223 Value *Base =
8224 getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset - VAArgBase, ArgSize);
8225 if (Base) {
8226 Value *AShadowPtr, *AOriginPtr;
8227 std::tie(args&: AShadowPtr, args&: AOriginPtr) =
8228 MSV.getShadowOriginPtr(Addr: A, IRB, ShadowTy: IRB.getInt8Ty(),
8229 Alignment: kShadowTLSAlignment, /*isStore*/ false);
8230
8231 IRB.CreateMemCpy(Dst: Base, DstAlign: kShadowTLSAlignment, Src: AShadowPtr,
8232 SrcAlign: kShadowTLSAlignment, Size: ArgSize);
8233 }
8234 }
8235 VAArgOffset += alignTo(Size: ArgSize, A: Align(8));
8236 } else {
8237 Value *Base;
8238 uint64_t ArgSize = DL.getTypeAllocSize(Ty: A->getType());
8239 Align ArgAlign = Align(8);
8240 if (A->getType()->isArrayTy()) {
8241 // Arrays are aligned to element size, except for long double
8242 // arrays, which are aligned to 8 bytes.
8243 Type *ElementTy = A->getType()->getArrayElementType();
8244 if (!ElementTy->isPPC_FP128Ty())
8245 ArgAlign = Align(DL.getTypeAllocSize(Ty: ElementTy));
8246 } else if (A->getType()->isVectorTy()) {
8247 // Vectors are naturally aligned.
8248 ArgAlign = Align(ArgSize);
8249 }
8250 if (ArgAlign < 8)
8251 ArgAlign = Align(8);
8252 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8253 if (DL.isBigEndian()) {
          // Adjust the shadow for arguments with size < 8 to match the
          // placement of bits on a big-endian system.
8256 if (ArgSize < 8)
8257 VAArgOffset += (8 - ArgSize);
8258 }
8259 if (!IsFixed) {
8260 Base =
8261 getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset - VAArgBase, ArgSize);
8262 if (Base)
8263 IRB.CreateAlignedStore(Val: MSV.getShadow(V: A), Ptr: Base, Align: kShadowTLSAlignment);
8264 }
8265 VAArgOffset += ArgSize;
8266 VAArgOffset = alignTo(Size: VAArgOffset, A: Align(8));
8267 }
8268 if (IsFixed)
8269 VAArgBase = VAArgOffset;
8270 }
8271
8272 Constant *TotalVAArgSize =
8273 ConstantInt::get(Ty: MS.IntptrTy, V: VAArgOffset - VAArgBase);
    // VAArgOverflowSizeTLS is reused as VAArgSizeTLS here to avoid creating
    // a new class member; it holds the total size of all varargs.
8276 IRB.CreateStore(Val: TotalVAArgSize, Ptr: MS.VAArgOverflowSizeTLS);
8277 }
8278
8279 void finalizeInstrumentation() override {
8280 assert(!VAArgSize && !VAArgTLSCopy &&
8281 "finalizeInstrumentation called twice");
8282 IRBuilder<> IRB(MSV.FnPrologueEnd);
8283 VAArgSize = IRB.CreateLoad(Ty: IRB.getInt64Ty(), Ptr: MS.VAArgOverflowSizeTLS);
8284 Value *CopySize = VAArgSize;
8285
8286 if (!VAStartInstrumentationList.empty()) {
8287 // If there is a va_start in this function, make a backup copy of
8288 // va_arg_tls somewhere in the function entry block.
8289
8290 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8291 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8292 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8293 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8294
8295 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8296 ID: Intrinsic::umin, LHS: CopySize,
8297 RHS: ConstantInt::get(Ty: IRB.getInt64Ty(), V: kParamTLSSize));
8298 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8299 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8300 }
8301
8302 // Instrument va_start.
8303 // Copy va_list shadow from the backup copy of the TLS contents.
8304 for (CallInst *OrigInst : VAStartInstrumentationList) {
8305 NextNodeIRBuilder IRB(OrigInst);
8306 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8307 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy);
8308
8309 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(V: RegSaveAreaPtrPtr, DestTy: MS.PtrTy);
8310
8311 Value *RegSaveAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: RegSaveAreaPtrPtr);
8312 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8313 const DataLayout &DL = F.getDataLayout();
8314 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8315 const Align Alignment = Align(IntptrSize);
8316 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
8317 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8318 Alignment, /*isStore*/ true);
8319 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy, SrcAlign: Alignment,
8320 Size: CopySize);
8321 }
8322 }
8323};
8324
8325/// PowerPC32-specific implementation of VarArgHelper.
8326struct VarArgPowerPC32Helper : public VarArgHelperBase {
8327 AllocaInst *VAArgTLSCopy = nullptr;
8328 Value *VAArgSize = nullptr;
8329
8330 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8331 MemorySanitizerVisitor &MSV)
8332 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8333
8334 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8335 unsigned VAArgBase;
    // The parameter save area starts 8 bytes from the frame pointer on PPC32.
8337 VAArgBase = 8;
8338 unsigned VAArgOffset = VAArgBase;
8339 const DataLayout &DL = F.getDataLayout();
8340 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8341 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8342 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8343 bool IsByVal = CB.paramHasAttr(ArgNo, Kind: Attribute::ByVal);
8344 if (IsByVal) {
8345 assert(A->getType()->isPointerTy());
8346 Type *RealTy = CB.getParamByValType(ArgNo);
8347 uint64_t ArgSize = DL.getTypeAllocSize(Ty: RealTy);
8348 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(u: Align(IntptrSize));
8349 if (ArgAlign < IntptrSize)
8350 ArgAlign = Align(IntptrSize);
8351 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8352 if (!IsFixed) {
8353 Value *Base =
8354 getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset - VAArgBase, ArgSize);
8355 if (Base) {
8356 Value *AShadowPtr, *AOriginPtr;
8357 std::tie(args&: AShadowPtr, args&: AOriginPtr) =
8358 MSV.getShadowOriginPtr(Addr: A, IRB, ShadowTy: IRB.getInt8Ty(),
8359 Alignment: kShadowTLSAlignment, /*isStore*/ false);
8360
8361 IRB.CreateMemCpy(Dst: Base, DstAlign: kShadowTLSAlignment, Src: AShadowPtr,
8362 SrcAlign: kShadowTLSAlignment, Size: ArgSize);
8363 }
8364 }
8365 VAArgOffset += alignTo(Size: ArgSize, A: Align(IntptrSize));
8366 } else {
8367 Value *Base;
8368 Type *ArgTy = A->getType();
8369
        // On PPC32, floating-point variable arguments are stored in a
        // separate area: fp_save_area = reg_save_area + 4*8. We do not copy
        // shadow for them, as they will be found when checking the call
        // arguments.
8373 if (!ArgTy->isFloatingPointTy()) {
8374 uint64_t ArgSize = DL.getTypeAllocSize(Ty: ArgTy);
8375 Align ArgAlign = Align(IntptrSize);
8376 if (ArgTy->isArrayTy()) {
8377 // Arrays are aligned to element size, except for long double
8378 // arrays, which are aligned to 8 bytes.
8379 Type *ElementTy = ArgTy->getArrayElementType();
8380 if (!ElementTy->isPPC_FP128Ty())
8381 ArgAlign = Align(DL.getTypeAllocSize(Ty: ElementTy));
8382 } else if (ArgTy->isVectorTy()) {
8383 // Vectors are naturally aligned.
8384 ArgAlign = Align(ArgSize);
8385 }
8386 if (ArgAlign < IntptrSize)
8387 ArgAlign = Align(IntptrSize);
8388 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8389 if (DL.isBigEndian()) {
8390           // Adjust the shadow for arguments with size < IntptrSize to match
8391           // the placement of bits on a big-endian system.
8392 if (ArgSize < IntptrSize)
8393 VAArgOffset += (IntptrSize - ArgSize);
8394 }
8395 if (!IsFixed) {
8396 Base = getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset - VAArgBase,
8397 ArgSize);
8398 if (Base)
8399 IRB.CreateAlignedStore(Val: MSV.getShadow(V: A), Ptr: Base,
8400 Align: kShadowTLSAlignment);
8401 }
8402 VAArgOffset += ArgSize;
8403 VAArgOffset = alignTo(Size: VAArgOffset, A: Align(IntptrSize));
8404 }
8405 }
8406 }
8407
8408 Constant *TotalVAArgSize =
8409 ConstantInt::get(Ty: MS.IntptrTy, V: VAArgOffset - VAArgBase);
8410     // Reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
8411     // class member; here it holds the total size of all varargs.
8412 IRB.CreateStore(Val: TotalVAArgSize, Ptr: MS.VAArgOverflowSizeTLS);
8413 }
8414
8415 void finalizeInstrumentation() override {
8416 assert(!VAArgSize && !VAArgTLSCopy &&
8417 "finalizeInstrumentation called twice");
8418 IRBuilder<> IRB(MSV.FnPrologueEnd);
8419 VAArgSize = IRB.CreateLoad(Ty: MS.IntptrTy, Ptr: MS.VAArgOverflowSizeTLS);
8420 Value *CopySize = VAArgSize;
8421
8422 if (!VAStartInstrumentationList.empty()) {
8423 // If there is a va_start in this function, make a backup copy of
8424 // va_arg_tls somewhere in the function entry block.
8425
8426 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8427 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8428 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8429 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8430
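      // The per-call shadow TLS buffer is only kParamTLSSize bytes, so clamp
      // the copy size; shadow for varargs beyond that limit was never written
      // by visitCallBase() and the corresponding bytes of the backup stay zero
      // from the memset above.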
8431 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8432 ID: Intrinsic::umin, LHS: CopySize,
8433 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
8434 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8435 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8436 }
8437
8438 // Instrument va_start.
8439 // Copy va_list shadow from the backup copy of the TLS contents.
8440 for (CallInst *OrigInst : VAStartInstrumentationList) {
8441 NextNodeIRBuilder IRB(OrigInst);
8442 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8443 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy);
8444 Value *RegSaveAreaSize = CopySize;
8445
8446       // On PPC32 va_list_tag is a struct; its reg_save_area pointer is at offset 8.
8447 RegSaveAreaPtrPtr =
8448 IRB.CreateAdd(LHS: RegSaveAreaPtrPtr, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: 8));
8449
8450       // On PPC32 the reg_save_area holds at most 32 bytes of GPR vararg data.
8451 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8452 ID: Intrinsic::umin, LHS: CopySize, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: 32));
8453
8454 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(V: RegSaveAreaPtrPtr, DestTy: MS.PtrTy);
8455 Value *RegSaveAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: RegSaveAreaPtrPtr);
8456
8457 const DataLayout &DL = F.getDataLayout();
8458 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8459 const Align Alignment = Align(IntptrSize);
8460
8461 { // Copy reg save area
8462 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8463 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
8464 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8465 Alignment, /*isStore*/ true);
8466 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy,
8467 SrcAlign: Alignment, Size: RegSaveAreaSize);
8468
8469 RegSaveAreaShadowPtr =
8470 IRB.CreatePtrToInt(V: RegSaveAreaShadowPtr, DestTy: MS.IntptrTy);
8471 Value *FPSaveArea = IRB.CreateAdd(LHS: RegSaveAreaShadowPtr,
8472 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: 32));
8473 FPSaveArea = IRB.CreateIntToPtr(V: FPSaveArea, DestTy: MS.PtrTy);
8474       // Fill the FP shadow with zeroes: uninitialized FP varargs are already
8475       // caught when the call arguments are checked.
8476 IRB.CreateMemSet(Ptr: FPSaveArea, Val: ConstantInt::getNullValue(Ty: IRB.getInt8Ty()),
8477 Size: ConstantInt::get(Ty: MS.IntptrTy, V: 32), Align: Alignment);
8478 }
8479
8480 { // Copy overflow area
8481         // RegSaveAreaSize is min(CopySize, 32), so this subtraction cannot underflow.
8482 Value *OverflowAreaSize = IRB.CreateSub(LHS: CopySize, RHS: RegSaveAreaSize);
8483
8484 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy);
8485 OverflowAreaPtrPtr =
8486 IRB.CreateAdd(LHS: OverflowAreaPtrPtr, RHS: ConstantInt::get(Ty: MS.IntptrTy, V: 4));
8487 OverflowAreaPtrPtr = IRB.CreateIntToPtr(V: OverflowAreaPtrPtr, DestTy: MS.PtrTy);
8488
8489 Value *OverflowAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: OverflowAreaPtrPtr);
8490
8491 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8492 std::tie(args&: OverflowAreaShadowPtr, args&: OverflowAreaOriginPtr) =
8493 MSV.getShadowOriginPtr(Addr: OverflowAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8494 Alignment, /*isStore*/ true);
8495
8496 Value *OverflowVAArgTLSCopyPtr =
8497 IRB.CreatePtrToInt(V: VAArgTLSCopy, DestTy: MS.IntptrTy);
8498 OverflowVAArgTLSCopyPtr =
8499 IRB.CreateAdd(LHS: OverflowVAArgTLSCopyPtr, RHS: RegSaveAreaSize);
8500
8501 OverflowVAArgTLSCopyPtr =
8502 IRB.CreateIntToPtr(V: OverflowVAArgTLSCopyPtr, DestTy: MS.PtrTy);
8503 IRB.CreateMemCpy(Dst: OverflowAreaShadowPtr, DstAlign: Alignment,
8504 Src: OverflowVAArgTLSCopyPtr, SrcAlign: Alignment, Size: OverflowAreaSize);
8505 }
8506 }
8507 }
8508};
8509
8510/// SystemZ-specific implementation of VarArgHelper.
8511struct VarArgSystemZHelper : public VarArgHelperBase {
8512 static const unsigned SystemZGpOffset = 16;
8513 static const unsigned SystemZGpEndOffset = 56;
8514 static const unsigned SystemZFpOffset = 128;
8515 static const unsigned SystemZFpEndOffset = 160;
8516 static const unsigned SystemZMaxVrArgs = 8;
8517 static const unsigned SystemZRegSaveAreaSize = 160;
8518 static const unsigned SystemZOverflowOffset = 160;
8519 static const unsigned SystemZVAListTagSize = 32;
8520 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8521 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
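  // Background on the constants above (s390x ELF ABI, stated here only for
  // orientation): GPR argument save slots (r2-r6) occupy bytes 16..56 of the
  // 160-byte register save area, FPR slots (f0, f2, f4, f6) occupy bytes
  // 128..160, and the overflow argument area starts right after it. Within
  // the 32-byte va_list tag, the overflow_arg_area pointer sits at offset 16
  // and the reg_save_area pointer at offset 24.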
8522
8523 bool IsSoftFloatABI;
8524 AllocaInst *VAArgTLSCopy = nullptr;
8525 AllocaInst *VAArgTLSOriginCopy = nullptr;
8526 Value *VAArgOverflowSize = nullptr;
8527
8528 enum class ArgKind {
8529 GeneralPurpose,
8530 FloatingPoint,
8531 Vector,
8532 Memory,
8533 Indirect,
8534 };
8535
8536 enum class ShadowExtension { None, Zero, Sign };
8537
8538 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8539 MemorySanitizerVisitor &MSV)
8540 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8541 IsSoftFloatABI(F.getFnAttribute(Kind: "use-soft-float").getValueAsBool()) {}
8542
8543 ArgKind classifyArgument(Type *T) {
8544 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8545 // only a few possibilities of what it can be. In particular, enums, single
8546 // element structs and large types have already been taken care of.
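    // In practice the checks below reduce to: i128 / fp128 -> Indirect (later
    // rewritten to a GeneralPurpose pointer by visitCallBase()), other floats
    // -> FloatingPoint (GeneralPurpose under soft-float), integers and
    // pointers -> GeneralPurpose, vectors -> Vector, everything else -> Memory.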
8547
8548 // Some i128 and fp128 arguments are converted to pointers only in the
8549 // back end.
8550 if (T->isIntegerTy(Bitwidth: 128) || T->isFP128Ty())
8551 return ArgKind::Indirect;
8552 if (T->isFloatingPointTy())
8553 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8554 if (T->isIntegerTy() || T->isPointerTy())
8555 return ArgKind::GeneralPurpose;
8556 if (T->isVectorTy())
8557 return ArgKind::Vector;
8558 return ArgKind::Memory;
8559 }
8560
8561 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8562 // ABI says: "One of the simple integer types no more than 64 bits wide.
8563 // ... If such an argument is shorter than 64 bits, replace it by a full
8564 // 64-bit integer representing the same number, using sign or zero
8565 // extension". Shadow for an integer argument has the same type as the
8566 // argument itself, so it can be sign or zero extended as well.
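    // For example, a fully uninitialized i32 vararg passed with signext has
    // the i32 shadow 0xffffffff; CreateShadowCast() in visitCallBase()
    // sign-extends it to the i64 shadow 0xffffffffffffffff, mirroring how the
    // value itself is widened to a full 64-bit slot.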
8567 bool ZExt = CB.paramHasAttr(ArgNo, Kind: Attribute::ZExt);
8568 bool SExt = CB.paramHasAttr(ArgNo, Kind: Attribute::SExt);
8569 if (ZExt) {
8570 assert(!SExt);
8571 return ShadowExtension::Zero;
8572 }
8573 if (SExt) {
8574 assert(!ZExt);
8575 return ShadowExtension::Sign;
8576 }
8577 return ShadowExtension::None;
8578 }
8579
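  // Shadow for varargs is written into the va_arg TLS area at the same offsets
  // the arguments occupy in the target frame: register arguments at
  // GpOffset/FpOffset within the register save area image, stack arguments at
  // SystemZOverflowOffset plus their offset in the overflow area. This lets
  // copyRegSaveArea() and copyOverflowArea() restore both regions with plain
  // memcpys from a single backup of the TLS contents.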
8580 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8581 unsigned GpOffset = SystemZGpOffset;
8582 unsigned FpOffset = SystemZFpOffset;
8583 unsigned VrIndex = 0;
8584 unsigned OverflowOffset = SystemZOverflowOffset;
8585 const DataLayout &DL = F.getDataLayout();
8586 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8587 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8588 // SystemZABIInfo does not produce ByVal parameters.
8589 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8590 Type *T = A->getType();
8591 ArgKind AK = classifyArgument(T);
8592 if (AK == ArgKind::Indirect) {
8593 T = MS.PtrTy;
8594 AK = ArgKind::GeneralPurpose;
8595 }
8596 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8597 AK = ArgKind::Memory;
8598 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8599 AK = ArgKind::Memory;
8600 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8601 AK = ArgKind::Memory;
8602 Value *ShadowBase = nullptr;
8603 Value *OriginBase = nullptr;
8604 ShadowExtension SE = ShadowExtension::None;
8605 switch (AK) {
8606 case ArgKind::GeneralPurpose: {
8607 // Always keep track of GpOffset, but store shadow only for varargs.
8608 uint64_t ArgSize = 8;
8609 if (GpOffset + ArgSize <= kParamTLSSize) {
8610 if (!IsFixed) {
8611 SE = getShadowExtension(CB, ArgNo);
8612 uint64_t GapSize = 0;
8613 if (SE == ShadowExtension::None) {
8614 uint64_t ArgAllocSize = DL.getTypeAllocSize(Ty: T);
8615 assert(ArgAllocSize <= ArgSize);
8616 GapSize = ArgSize - ArgAllocSize;
8617 }
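          // E.g. an i32 vararg with neither zeroext nor signext gets
          // GapSize == 4, so its 4-byte shadow lands in the higher-addressed
          // half of the 8-byte slot, matching where the right-aligned value
          // itself sits on big-endian SystemZ.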
8618 ShadowBase = getShadowAddrForVAArgument(IRB, ArgOffset: GpOffset + GapSize);
8619 if (MS.TrackOrigins)
8620 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: GpOffset + GapSize);
8621 }
8622 GpOffset += ArgSize;
8623 } else {
8624 GpOffset = kParamTLSSize;
8625 }
8626 break;
8627 }
8628 case ArgKind::FloatingPoint: {
8629 // Always keep track of FpOffset, but store shadow only for varargs.
8630 uint64_t ArgSize = 8;
8631 if (FpOffset + ArgSize <= kParamTLSSize) {
8632 if (!IsFixed) {
8633 // PoP says: "A short floating-point datum requires only the
8634 // left-most 32 bit positions of a floating-point register".
8635 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8636 // don't extend shadow and don't mind the gap.
8637 ShadowBase = getShadowAddrForVAArgument(IRB, ArgOffset: FpOffset);
8638 if (MS.TrackOrigins)
8639 OriginBase = getOriginPtrForVAArgument(IRB, ArgOffset: FpOffset);
8640 }
8641 FpOffset += ArgSize;
8642 } else {
8643 FpOffset = kParamTLSSize;
8644 }
8645 break;
8646 }
8647 case ArgKind::Vector: {
8648 // Keep track of VrIndex. No need to store shadow, since vector varargs
8649 // go through AK_Memory.
8650 assert(IsFixed);
8651 VrIndex++;
8652 break;
8653 }
8654 case ArgKind::Memory: {
8655 // Keep track of OverflowOffset and store shadow only for varargs.
8656 // Ignore fixed args, since we need to copy only the vararg portion of
8657 // the overflow area shadow.
8658 if (!IsFixed) {
8659 uint64_t ArgAllocSize = DL.getTypeAllocSize(Ty: T);
8660 uint64_t ArgSize = alignTo(Value: ArgAllocSize, Align: 8);
8661 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8662 SE = getShadowExtension(CB, ArgNo);
8663 uint64_t GapSize =
8664 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8665 ShadowBase =
8666 getShadowAddrForVAArgument(IRB, ArgOffset: OverflowOffset + GapSize);
8667 if (MS.TrackOrigins)
8668 OriginBase =
8669 getOriginPtrForVAArgument(IRB, ArgOffset: OverflowOffset + GapSize);
8670 OverflowOffset += ArgSize;
8671 } else {
8672 OverflowOffset = kParamTLSSize;
8673 }
8674 }
8675 break;
8676 }
8677 case ArgKind::Indirect:
8678 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8679 }
8680 if (ShadowBase == nullptr)
8681 continue;
8682 Value *Shadow = MSV.getShadow(V: A);
8683 if (SE != ShadowExtension::None)
8684 Shadow = MSV.CreateShadowCast(IRB, V: Shadow, dstTy: IRB.getInt64Ty(),
8685 /*Signed*/ SE == ShadowExtension::Sign);
8686 ShadowBase = IRB.CreateIntToPtr(V: ShadowBase, DestTy: MS.PtrTy, Name: "_msarg_va_s");
8687 IRB.CreateStore(Val: Shadow, Ptr: ShadowBase);
8688 if (MS.TrackOrigins) {
8689 Value *Origin = MSV.getOrigin(V: A);
8690 TypeSize StoreSize = DL.getTypeStoreSize(Ty: Shadow->getType());
8691 MSV.paintOrigin(IRB, Origin, OriginPtr: OriginBase, TS: StoreSize,
8692 Alignment: kMinOriginAlignment);
8693 }
8694 }
8695 Constant *OverflowSize = ConstantInt::get(
8696 Ty: IRB.getInt64Ty(), V: OverflowOffset - SystemZOverflowOffset);
8697 IRB.CreateStore(Val: OverflowSize, Ptr: MS.VAArgOverflowSizeTLS);
8698 }
8699
8700 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8701 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8702 V: IRB.CreateAdd(
8703 LHS: IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy),
8704 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: SystemZRegSaveAreaPtrOffset)),
8705 DestTy: MS.PtrTy);
8706 Value *RegSaveAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: RegSaveAreaPtrPtr);
8707 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8708 const Align Alignment = Align(8);
8709 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
8710 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(), Alignment,
8711 /*isStore*/ true);
8712 // TODO(iii): copy only fragments filled by visitCallBase()
8713 // TODO(iii): support packed-stack && !use-soft-float
8714 // For use-soft-float functions, it is enough to copy just the GPRs.
8715 unsigned RegSaveAreaSize =
8716 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8717 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy, SrcAlign: Alignment,
8718 Size: RegSaveAreaSize);
8719 if (MS.TrackOrigins)
8720 IRB.CreateMemCpy(Dst: RegSaveAreaOriginPtr, DstAlign: Alignment, Src: VAArgTLSOriginCopy,
8721 SrcAlign: Alignment, Size: RegSaveAreaSize);
8722 }
8723
8724 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8725 // don't know the real overflow size and can't clear shadow beyond kParamTLSSize.
8726 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8727 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8728 V: IRB.CreateAdd(
8729 LHS: IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy),
8730 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: SystemZOverflowArgAreaPtrOffset)),
8731 DestTy: MS.PtrTy);
8732 Value *OverflowArgAreaPtr = IRB.CreateLoad(Ty: MS.PtrTy, Ptr: OverflowArgAreaPtrPtr);
8733 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8734 const Align Alignment = Align(8);
8735 std::tie(args&: OverflowArgAreaShadowPtr, args&: OverflowArgAreaOriginPtr) =
8736 MSV.getShadowOriginPtr(Addr: OverflowArgAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8737 Alignment, /*isStore*/ true);
8738 Value *SrcPtr = IRB.CreateConstGEP1_32(Ty: IRB.getInt8Ty(), Ptr: VAArgTLSCopy,
8739 Idx0: SystemZOverflowOffset);
8740 IRB.CreateMemCpy(Dst: OverflowArgAreaShadowPtr, DstAlign: Alignment, Src: SrcPtr, SrcAlign: Alignment,
8741 Size: VAArgOverflowSize);
8742 if (MS.TrackOrigins) {
8743 SrcPtr = IRB.CreateConstGEP1_32(Ty: IRB.getInt8Ty(), Ptr: VAArgTLSOriginCopy,
8744 Idx0: SystemZOverflowOffset);
8745 IRB.CreateMemCpy(Dst: OverflowArgAreaOriginPtr, DstAlign: Alignment, Src: SrcPtr, SrcAlign: Alignment,
8746 Size: VAArgOverflowSize);
8747 }
8748 }
8749
8750 void finalizeInstrumentation() override {
8751 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8752 "finalizeInstrumentation called twice");
8753 if (!VAStartInstrumentationList.empty()) {
8754 // If there is a va_start in this function, make a backup copy of
8755 // va_arg_tls somewhere in the function entry block.
8756 IRBuilder<> IRB(MSV.FnPrologueEnd);
8757 VAArgOverflowSize =
8758 IRB.CreateLoad(Ty: IRB.getInt64Ty(), Ptr: MS.VAArgOverflowSizeTLS);
8759 Value *CopySize =
8760 IRB.CreateAdd(LHS: ConstantInt::get(Ty: MS.IntptrTy, V: SystemZOverflowOffset),
8761 RHS: VAArgOverflowSize);
8762 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8763 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8764 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8765 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8766
8767 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8768 ID: Intrinsic::umin, LHS: CopySize,
8769 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
8770 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8771 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8772 if (MS.TrackOrigins) {
8773 VAArgTLSOriginCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8774 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8775 IRB.CreateMemCpy(Dst: VAArgTLSOriginCopy, DstAlign: kShadowTLSAlignment,
8776 Src: MS.VAArgOriginTLS, SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8777 }
8778 }
8779
8780 // Instrument va_start.
8781 // Copy va_list shadow from the backup copy of the TLS contents.
8782 for (CallInst *OrigInst : VAStartInstrumentationList) {
8783 NextNodeIRBuilder IRB(OrigInst);
8784 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8785 copyRegSaveArea(IRB, VAListTag);
8786 copyOverflowArea(IRB, VAListTag);
8787 }
8788 }
8789};
8790
8791/// i386-specific implementation of VarArgHelper.
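///
/// On i386 all variadic arguments are passed on the stack and va_list is a
/// single 4-byte pointer into that argument area (hence VAListTagSize == 4),
/// so finalizeInstrumentation() can restore the whole shadow with one memcpy.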
8792struct VarArgI386Helper : public VarArgHelperBase {
8793 AllocaInst *VAArgTLSCopy = nullptr;
8794 Value *VAArgSize = nullptr;
8795
8796 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8797 MemorySanitizerVisitor &MSV)
8798 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8799
8800 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8801 const DataLayout &DL = F.getDataLayout();
8802 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8803 unsigned VAArgOffset = 0;
8804 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8805 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8806 bool IsByVal = CB.paramHasAttr(ArgNo, Kind: Attribute::ByVal);
8807 if (IsByVal) {
8808 assert(A->getType()->isPointerTy());
8809 Type *RealTy = CB.getParamByValType(ArgNo);
8810 uint64_t ArgSize = DL.getTypeAllocSize(Ty: RealTy);
8811 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(u: Align(IntptrSize));
8812 if (ArgAlign < IntptrSize)
8813 ArgAlign = Align(IntptrSize);
8814 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8815 if (!IsFixed) {
8816 Value *Base = getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset, ArgSize);
8817 if (Base) {
8818 Value *AShadowPtr, *AOriginPtr;
8819 std::tie(args&: AShadowPtr, args&: AOriginPtr) =
8820 MSV.getShadowOriginPtr(Addr: A, IRB, ShadowTy: IRB.getInt8Ty(),
8821 Alignment: kShadowTLSAlignment, /*isStore*/ false);
8822
8823 IRB.CreateMemCpy(Dst: Base, DstAlign: kShadowTLSAlignment, Src: AShadowPtr,
8824 SrcAlign: kShadowTLSAlignment, Size: ArgSize);
8825 }
8826 VAArgOffset += alignTo(Size: ArgSize, A: Align(IntptrSize));
8827 }
8828 } else {
8829 Value *Base;
8830 uint64_t ArgSize = DL.getTypeAllocSize(Ty: A->getType());
8831 Align ArgAlign = Align(IntptrSize);
8832 VAArgOffset = alignTo(Size: VAArgOffset, A: ArgAlign);
8833 if (DL.isBigEndian()) {
8834         // Adjust the shadow for arguments with size < IntptrSize to match
8835         // the placement of bits on a big-endian system.
8836 if (ArgSize < IntptrSize)
8837 VAArgOffset += (IntptrSize - ArgSize);
8838 }
8839 if (!IsFixed) {
8840 Base = getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset, ArgSize);
8841 if (Base)
8842 IRB.CreateAlignedStore(Val: MSV.getShadow(V: A), Ptr: Base, Align: kShadowTLSAlignment);
8843 VAArgOffset += ArgSize;
8844 VAArgOffset = alignTo(Size: VAArgOffset, A: Align(IntptrSize));
8845 }
8846 }
8847 }
8848
8849 Constant *TotalVAArgSize = ConstantInt::get(Ty: MS.IntptrTy, V: VAArgOffset);
8850     // Reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
8851     // class member; here it holds the total size of all varargs.
8852 IRB.CreateStore(Val: TotalVAArgSize, Ptr: MS.VAArgOverflowSizeTLS);
8853 }
8854
8855 void finalizeInstrumentation() override {
8856 assert(!VAArgSize && !VAArgTLSCopy &&
8857 "finalizeInstrumentation called twice");
8858 IRBuilder<> IRB(MSV.FnPrologueEnd);
8859 VAArgSize = IRB.CreateLoad(Ty: MS.IntptrTy, Ptr: MS.VAArgOverflowSizeTLS);
8860 Value *CopySize = VAArgSize;
8861
8862 if (!VAStartInstrumentationList.empty()) {
8863 // If there is a va_start in this function, make a backup copy of
8864 // va_arg_tls somewhere in the function entry block.
8865 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8866 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8867 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8868 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8869
8870 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8871 ID: Intrinsic::umin, LHS: CopySize,
8872 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
8873 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8874 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8875 }
8876
8877 // Instrument va_start.
8878 // Copy va_list shadow from the backup copy of the TLS contents.
8879 for (CallInst *OrigInst : VAStartInstrumentationList) {
8880 NextNodeIRBuilder IRB(OrigInst);
8881 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8882 Type *RegSaveAreaPtrTy = PointerType::getUnqual(C&: *MS.C);
8883 Value *RegSaveAreaPtrPtr =
8884 IRB.CreateIntToPtr(V: IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy),
8885 DestTy: PointerType::get(C&: *MS.C, AddressSpace: 0));
8886 Value *RegSaveAreaPtr =
8887 IRB.CreateLoad(Ty: RegSaveAreaPtrTy, Ptr: RegSaveAreaPtrPtr);
8888 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8889 const DataLayout &DL = F.getDataLayout();
8890 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8891 const Align Alignment = Align(IntptrSize);
8892 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
8893 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8894 Alignment, /*isStore*/ true);
8895 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy, SrcAlign: Alignment,
8896 Size: CopySize);
8897 }
8898 }
8899};
8900
8901/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8902/// LoongArch64.
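///
/// On these targets va_list is (or, on ARM32, wraps) a single pointer into a
/// contiguous argument save area, so the helper records a flat shadow layout
/// in visitCallBase() and copies it back with one memcpy per va_start call.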
8903struct VarArgGenericHelper : public VarArgHelperBase {
8904 AllocaInst *VAArgTLSCopy = nullptr;
8905 Value *VAArgSize = nullptr;
8906
8907 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8908 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8909 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8910
8911 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8912 unsigned VAArgOffset = 0;
8913 const DataLayout &DL = F.getDataLayout();
8914 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8915 for (const auto &[ArgNo, A] : llvm::enumerate(First: CB.args())) {
8916 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8917 if (IsFixed)
8918 continue;
8919 uint64_t ArgSize = DL.getTypeAllocSize(Ty: A->getType());
8920 if (DL.isBigEndian()) {
8921       // Adjust the shadow for arguments with size < IntptrSize to match the
8922       // placement of bits on a big-endian system.
8923 if (ArgSize < IntptrSize)
8924 VAArgOffset += (IntptrSize - ArgSize);
8925 }
8926 Value *Base = getShadowPtrForVAArgument(IRB, ArgOffset: VAArgOffset, ArgSize);
8927 VAArgOffset += ArgSize;
8928 VAArgOffset = alignTo(Value: VAArgOffset, Align: IntptrSize);
8929 if (!Base)
8930 continue;
8931 IRB.CreateAlignedStore(Val: MSV.getShadow(V: A), Ptr: Base, Align: kShadowTLSAlignment);
8932 }
8933
8934 Constant *TotalVAArgSize = ConstantInt::get(Ty: MS.IntptrTy, V: VAArgOffset);
8935     // Reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
8936     // class member; here it holds the total size of all varargs.
8937 IRB.CreateStore(Val: TotalVAArgSize, Ptr: MS.VAArgOverflowSizeTLS);
8938 }
8939
8940 void finalizeInstrumentation() override {
8941 assert(!VAArgSize && !VAArgTLSCopy &&
8942 "finalizeInstrumentation called twice");
8943 IRBuilder<> IRB(MSV.FnPrologueEnd);
8944 VAArgSize = IRB.CreateLoad(Ty: MS.IntptrTy, Ptr: MS.VAArgOverflowSizeTLS);
8945 Value *CopySize = VAArgSize;
8946
8947 if (!VAStartInstrumentationList.empty()) {
8948 // If there is a va_start in this function, make a backup copy of
8949 // va_arg_tls somewhere in the function entry block.
8950 VAArgTLSCopy = IRB.CreateAlloca(Ty: Type::getInt8Ty(C&: *MS.C), ArraySize: CopySize);
8951 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8952 IRB.CreateMemSet(Ptr: VAArgTLSCopy, Val: Constant::getNullValue(Ty: IRB.getInt8Ty()),
8953 Size: CopySize, Align: kShadowTLSAlignment, isVolatile: false);
8954
8955 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8956 ID: Intrinsic::umin, LHS: CopySize,
8957 RHS: ConstantInt::get(Ty: MS.IntptrTy, V: kParamTLSSize));
8958 IRB.CreateMemCpy(Dst: VAArgTLSCopy, DstAlign: kShadowTLSAlignment, Src: MS.VAArgTLS,
8959 SrcAlign: kShadowTLSAlignment, Size: SrcSize);
8960 }
8961
8962 // Instrument va_start.
8963 // Copy va_list shadow from the backup copy of the TLS contents.
8964 for (CallInst *OrigInst : VAStartInstrumentationList) {
8965 NextNodeIRBuilder IRB(OrigInst);
8966 Value *VAListTag = OrigInst->getArgOperand(i: 0);
8967 Type *RegSaveAreaPtrTy = PointerType::getUnqual(C&: *MS.C);
8968 Value *RegSaveAreaPtrPtr =
8969 IRB.CreateIntToPtr(V: IRB.CreatePtrToInt(V: VAListTag, DestTy: MS.IntptrTy),
8970 DestTy: PointerType::get(C&: *MS.C, AddressSpace: 0));
8971 Value *RegSaveAreaPtr =
8972 IRB.CreateLoad(Ty: RegSaveAreaPtrTy, Ptr: RegSaveAreaPtrPtr);
8973 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8974 const DataLayout &DL = F.getDataLayout();
8975 unsigned IntptrSize = DL.getTypeStoreSize(Ty: MS.IntptrTy);
8976 const Align Alignment = Align(IntptrSize);
8977 std::tie(args&: RegSaveAreaShadowPtr, args&: RegSaveAreaOriginPtr) =
8978 MSV.getShadowOriginPtr(Addr: RegSaveAreaPtr, IRB, ShadowTy: IRB.getInt8Ty(),
8979 Alignment, /*isStore*/ true);
8980 IRB.CreateMemCpy(Dst: RegSaveAreaShadowPtr, DstAlign: Alignment, Src: VAArgTLSCopy, SrcAlign: Alignment,
8981 Size: CopySize);
8982 }
8983 }
8984};
8985
8986// ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
8987// regarding VAArgs.
8988using VarArgARM32Helper = VarArgGenericHelper;
8989using VarArgRISCVHelper = VarArgGenericHelper;
8990using VarArgMIPSHelper = VarArgGenericHelper;
8991using VarArgLoongArch64Helper = VarArgGenericHelper;
8992
8993/// A no-op implementation of VarArgHelper.
8994struct VarArgNoOpHelper : public VarArgHelper {
8995 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8996 MemorySanitizerVisitor &MSV) {}
8997
8998 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8999
9000 void visitVAStartInst(VAStartInst &I) override {}
9001
9002 void visitVACopyInst(VACopyInst &I) override {}
9003
9004 void finalizeInstrumentation() override {}
9005};
9006
9007} // end anonymous namespace
9008
9009static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
9010 MemorySanitizerVisitor &Visitor) {
9011   // VarArg handling is implemented for the targets handled below; on any other
9012   // target the no-op helper is used, so false positives are possible for varargs.
9013 Triple TargetTriple(Func.getParent()->getTargetTriple());
9014
9015 if (TargetTriple.getArch() == Triple::x86)
9016 return new VarArgI386Helper(Func, Msan, Visitor);
9017
9018 if (TargetTriple.getArch() == Triple::x86_64)
9019 return new VarArgAMD64Helper(Func, Msan, Visitor);
9020
9021 if (TargetTriple.isARM())
9022 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9023
9024 if (TargetTriple.isAArch64())
9025 return new VarArgAArch64Helper(Func, Msan, Visitor);
9026
9027 if (TargetTriple.isSystemZ())
9028 return new VarArgSystemZHelper(Func, Msan, Visitor);
9029
9030 // On PowerPC32 VAListTag is a struct
9031 // {char, char, i16 padding, char *, char *}
9032 if (TargetTriple.isPPC32())
9033 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
9034
9035 if (TargetTriple.isPPC64())
9036 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
9037
9038 if (TargetTriple.isRISCV32())
9039 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9040
9041 if (TargetTriple.isRISCV64())
9042 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9043
9044 if (TargetTriple.isMIPS32())
9045 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9046
9047 if (TargetTriple.isMIPS64())
9048 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9049
9050 if (TargetTriple.isLoongArch64())
9051 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
9052 /*VAListTagSize=*/8);
9053
9054 return new VarArgNoOpHelper(Func, Msan, Visitor);
9055}
9056
9057bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
9058 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
9059 return false;
9060
9061 if (F.hasFnAttribute(Kind: Attribute::DisableSanitizerInstrumentation))
9062 return false;
9063
9064 MemorySanitizerVisitor Visitor(F, *this, TLI);
9065
9066 // Clear out memory attributes.
9067 AttributeMask B;
9068 B.addAttribute(Val: Attribute::Memory).addAttribute(Val: Attribute::Speculatable);
9069 F.removeFnAttrs(Attrs: B);
9070
9071 return Visitor.runOnFunction();
9072}
9073