//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file is a part of MemorySanitizer, a detector of uninitialized
/// reads.
///
/// The algorithm of the tool is similar to Memcheck
/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
/// We associate a few shadow bits with every byte of the application memory,
/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
/// bits on every memory read, propagate the shadow bits through some of the
/// arithmetic instructions (including MOV), store the shadow bits on every
/// memory write, and report a bug on some other instructions (e.g. JMP) if
/// the associated shadow is poisoned.
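///
/// For example (a simplified sketch, not the exact IR the pass emits), an
/// 8-byte application store
///   store i64 %v, ptr %p
/// is conceptually instrumented as
///   %sp = ... map %p to its shadow address ...
///   store i64 %shadow_of_v, ptr %sp   ; propagate the shadow of %v
///   store i64 %v, ptr %p              ; original store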
///
/// But there are differences too. The first and the major one:
/// compiler instrumentation instead of binary instrumentation. This
/// gives us much better register allocation, possible compiler
/// optimizations and a fast start-up. But this brings the major issue
/// as well: msan needs to see all program events, including system
/// calls and reads/writes in system libraries, so we either need to
/// compile *everything* with msan or use a binary translation
/// component (e.g. DynamoRIO) to instrument pre-built libraries.
/// Another difference from Memcheck is that we use 8 shadow bits per
/// byte of application memory and use a direct shadow mapping. This
/// greatly simplifies the instrumentation code and avoids races on
/// shadow updates (Memcheck is single-threaded so races are not a
/// concern there. Memcheck uses 2 shadow bits per byte with a slow
/// path storage that uses 8 bits per byte).
///
/// The default value of shadow is 0, which means "clean" (not poisoned).
///
/// Every module initializer should call __msan_init to ensure that the
/// shadow memory is ready. On error, __msan_warning is called. Since
/// parameters and return values may be passed via registers, we have a
/// specialized thread-local shadow for return values
/// (__msan_retval_tls) and parameters (__msan_param_tls).
///
/// Origin tracking.
///
/// MemorySanitizer can track origins (allocation points) of all uninitialized
/// values. This behavior is controlled with a flag (msan-track-origins) and is
/// disabled by default.
///
/// Origins are 4-byte values created and interpreted by the runtime library.
/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
/// of application memory. Propagation of origins is basically a bunch of
/// "select" instructions that pick the origin of a dirty argument, if an
/// instruction has one.
///
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
///
/// Origins are meaningless for fully initialized values, so MemorySanitizer
/// avoids storing origin to memory when a fully initialized value is stored.
/// This way it avoids needless overwriting origin of the 4-byte region on
/// a short (i.e. 1 byte) clean store, and it is also good for performance.
///
/// Atomic handling.
///
/// Ideally, every atomic store of application value should update the
/// corresponding shadow location in an atomic way. Unfortunately, atomic store
/// of two disjoint locations can not be done without severe slowdown.
///
/// Therefore, we implement an approximation that may err on the safe side.
/// In this implementation, every atomically accessed location in the program
/// may only change from (partially) uninitialized to fully initialized, but
/// not the other way around. We load the shadow _after_ the application load,
/// and we store the shadow _before_ the app store. Also, we always store clean
/// shadow (if the application store is atomic). This way, if the store-load
/// pair constitutes a happens-before arc, shadow store and load are correctly
/// ordered such that the load will get either the value that was stored, or
/// some later value (which is always clean).
///
/// This does not work very well with Compare-And-Swap (CAS) and
/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
/// must store the new shadow before the app operation, and load the shadow
/// after the app operation. Computers don't work this way. Current
/// implementation ignores the load aspect of CAS/RMW, always returning a clean
/// value. It implements the store part as a simple atomic store by storing a
/// clean shadow.
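///
/// As a sketch (simplified, not the exact IR), an atomic application store
///   store atomic i32 %v, ptr %p seq_cst
/// is preceded by a store of clean shadow
///   store i32 0, ptr %shadow_of_p    ; always-clean shadow, stored first
/// and an atomic application load
///   %v = load atomic i32, ptr %p seq_cst
/// is followed by the shadow load
///   %s = load i32, ptr %shadow_of_p  ; shadow loaded after the app load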
///
/// Instrumenting inline assembly.
///
/// For inline assembly code LLVM has little idea about which memory locations
/// become initialized depending on the arguments. It can be possible to figure
/// out which arguments are meant to point to inputs and outputs, but the
/// actual semantics can be only visible at runtime. In the Linux kernel it's
/// also possible that the arguments only indicate the offset for a base taken
/// from a segment register, so it's dangerous to treat any asm() arguments as
/// pointers. We take a conservative approach generating calls to
///   __msan_instrument_asm_store(ptr, size),
/// which defer the memory unpoisoning to the runtime library.
/// The latter can perform more complex address checks to figure out whether
/// it's safe to touch the shadow memory.
/// Like with atomic operations, we call __msan_instrument_asm_store() before
/// the assembly call, so that changes to the shadow memory will be seen by
/// other threads together with main memory initialization.
///
/// KernelMemorySanitizer (KMSAN) implementation.
///
/// The major differences between KMSAN and MSan instrumentation are:
/// - KMSAN always tracks the origins and implies msan-keep-going=true;
/// - KMSAN allocates shadow and origin memory for each page separately, so
///   there are no explicit accesses to shadow and origin in the
///   instrumentation.
///   Shadow and origin values for a particular X-byte memory location
///   (X=1,2,4,8) are accessed through pointers obtained via the
///     __msan_metadata_ptr_for_load_X(ptr)
///     __msan_metadata_ptr_for_store_X(ptr)
///   functions. The corresponding functions check that the X-byte accesses
///   are possible and return the pointers to shadow and origin memory.
///   Arbitrary sized accesses are handled with:
///     __msan_metadata_ptr_for_load_n(ptr, size)
///     __msan_metadata_ptr_for_store_n(ptr, size);
///   Note that the sanitizer code has to deal with how shadow/origin pairs
///   returned by these functions are represented in different ABIs. In
///   the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
///   returned in r3 and r4, and in the SystemZ ABI they are written to memory
///   pointed to by a hidden parameter.
/// - TLS variables are stored in a single per-task struct. A call to a
///   function __msan_get_context_state() returning a pointer to that struct
///   is inserted into every instrumented function before the entry block;
/// - __msan_warning() takes a 32-bit origin parameter;
/// - local variables are poisoned with __msan_poison_alloca() upon function
///   entry and unpoisoned with __msan_unpoison_alloca() before leaving the
///   function;
/// - the pass doesn't declare any global variables or add global constructors
///   to the translation unit.
///
/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
/// calls, making sure we're on the safe side wrt. possible false positives.
///
/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
/// moment.
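///
/// As an illustration (simplified; the representation of the returned pair
/// is target-dependent, as described above), a 4-byte KMSAN load of %p is
/// preceded by
///   %pair = call { ptr, ptr } @__msan_metadata_ptr_for_load_4(ptr %p)
/// where the first element of %pair points to the shadow and the second to
/// the origin of the 4 bytes at %p.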
///
//
// FIXME: This sanitizer does not yet handle scalable vectors
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Alignment.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Instrumentation.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <string>
#include <tuple>

using namespace llvm;

#define DEBUG_TYPE "msan"

DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
              "Controls which checks to insert");

DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
              "Controls which instruction to instrument");

static const unsigned kOriginSize = 4;
static const Align kMinOriginAlignment = Align(4);
static const Align kShadowTLSAlignment = Align(8);

// These constants must be kept in sync with the ones in msan.h.
// TODO: increase size to match SVE/SVE2/SME/SME2 limits
static const unsigned kParamTLSSize = 800;
static const unsigned kRetvalTLSSize = 800;

// Access sizes are powers of two: 1, 2, 4, 8.
static const size_t kNumberOfAccessSizes = 4;

/// Track origins of uninitialized values.
///
/// Adds a section to MemorySanitizer report that points to the allocation
/// (stack or heap) the uninitialized bits came from originally.
static cl::opt<int> ClTrackOrigins(
    "msan-track-origins",
    cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
    cl::init(0));

static cl::opt<bool> ClKeepGoing("msan-keep-going",
                                 cl::desc("keep going after reporting a UMR"),
                                 cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClPoisonStack("msan-poison-stack",
                  cl::desc("poison uninitialized stack variables"), cl::Hidden,
                  cl::init(true));

static cl::opt<bool> ClPoisonStackWithCall(
    "msan-poison-stack-with-call",
    cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
    cl::init(false));

static cl::opt<int> ClPoisonStackPattern(
    "msan-poison-stack-pattern",
    cl::desc("poison uninitialized stack variables with the given pattern"),
    cl::Hidden, cl::init(0xff));

static cl::opt<bool>
    ClPrintStackNames("msan-print-stack-names",
                      cl::desc("Print name of local stack variable"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClPoisonUndef("msan-poison-undef",
                  cl::desc("Poison fully undef temporary values. "
                           "Partially undefined constant vectors "
                           "are unaffected by this flag (see "
                           "-msan-poison-undef-vectors)."),
                  cl::Hidden, cl::init(true));

static cl::opt<bool> ClPoisonUndefVectors(
    "msan-poison-undef-vectors",
    cl::desc("Precisely poison partially undefined constant vectors. "
             "If false (legacy behavior), the entire vector is "
             "considered fully initialized, which may lead to false "
             "negatives. Fully undefined constant vectors are "
             "unaffected by this flag (see -msan-poison-undef)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClPreciseDisjointOr(
    "msan-precise-disjoint-or",
    cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
             "disjointedness is ignored (i.e., 1|1 is initialized)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClHandleICmp("msan-handle-icmp",
                 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
                 cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClHandleICmpExact("msan-handle-icmp-exact",
                      cl::desc("exact handling of relational integer ICmp"),
                      cl::Hidden, cl::init(true));

static cl::opt<int> ClSwitchPrecision(
    "msan-switch-precision",
    cl::desc("Controls the number of cases considered by MSan for LLVM switch "
             "instructions. 0 means no UUMs detected. Higher values lead to "
             "fewer false negatives but may impact compiler and/or "
             "application performance. N.B. LLVM switch instructions do not "
             "correspond exactly to C++ switch statements."),
    cl::Hidden, cl::init(99));

static cl::opt<bool> ClHandleLifetimeIntrinsics(
    "msan-handle-lifetime-intrinsics",
    cl::desc(
        "when possible, poison scoped variables at the beginning of the scope "
        "(slower, but more precise)"),
    cl::Hidden, cl::init(true));

// When compiling the Linux kernel, we sometimes see false positives related to
// MSan being unable to understand that inline assembly calls may initialize
// local variables.
// This flag makes the compiler conservatively unpoison every memory location
// passed into an assembly call. Note that this may cause false positives.
// Because it's impossible to figure out the array sizes, we can only unpoison
// the first sizeof(type) bytes for each type* pointer.
static cl::opt<bool> ClHandleAsmConservative(
    "msan-handle-asm-conservative",
    cl::desc("conservative handling of inline assembly"), cl::Hidden,
    cl::init(true));

// This flag controls whether we check the shadow of the address
// operand of load or store. Such bugs are very rare, since load from
// a garbage address typically results in SEGV, but they still happen
// (e.g. only lower bits of address are garbage, or the access happens
// early at program startup where malloc-ed memory is more likely to
// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
static cl::opt<bool> ClCheckAccessAddress(
    "msan-check-access-address",
    cl::desc("report accesses through a pointer which has poisoned shadow"),
    cl::Hidden, cl::init(true));

static cl::opt<bool> ClEagerChecks(
    "msan-eager-checks",
    cl::desc("check arguments and return values at function call boundaries"),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClDumpStrictInstructions(
    "msan-dump-strict-instructions",
    cl::desc("print out instructions with default strict semantics, i.e., "
             "check that all the inputs are fully initialized, and mark "
             "the output as fully initialized. These semantics are applied "
             "to instructions that could not be handled explicitly nor "
             "heuristically."),
    cl::Hidden, cl::init(false));

// Currently, all the heuristically handled instructions are specifically
// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
// to parallel 'msan-dump-strict-instructions', and to keep the door open to
// handling non-intrinsic instructions heuristically.
static cl::opt<bool> ClDumpHeuristicInstructions(
    "msan-dump-heuristic-instructions",
    cl::desc("Prints 'unknown' instructions that were handled heuristically. "
             "Use -msan-dump-strict-instructions to print instructions that "
             "could not be handled explicitly nor heuristically."),
    cl::Hidden, cl::init(false));

static cl::opt<int> ClInstrumentationWithCallThreshold(
    "msan-instrumentation-with-call-threshold",
    cl::desc(
        "If the function being instrumented requires more than "
        "this number of checks and origin stores, use callbacks instead of "
        "inline checks (-1 means never use callbacks)."),
    cl::Hidden, cl::init(3500));

static cl::opt<bool>
    ClEnableKmsan("msan-kernel",
                  cl::desc("Enable KernelMemorySanitizer instrumentation"),
                  cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClDisableChecks("msan-disable-checks",
                    cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
                    cl::init(false));

static cl::opt<bool>
    ClCheckConstantShadow("msan-check-constant-shadow",
                          cl::desc("Insert checks for constant shadow values"),
                          cl::Hidden, cl::init(true));

// This is off by default because of a bug in gold:
// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
static cl::opt<bool>
    ClWithComdat("msan-with-comdat",
                 cl::desc("Place MSan constructors in comdat sections"),
                 cl::Hidden, cl::init(false));

// These options allow specifying custom memory map parameters.
// See MemoryMapParams for details.
static cl::opt<uint64_t> ClAndMask("msan-and-mask",
                                   cl::desc("Define custom MSan AndMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
                                   cl::desc("Define custom MSan XorMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
                                      cl::desc("Define custom MSan ShadowBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
                                      cl::desc("Define custom MSan OriginBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<int>
    ClDisambiguateWarning("msan-disambiguate-warning-threshold",
                          cl::desc("Define threshold for number of checks per "
                                   "debug location to force origin update."),
                          cl::Hidden, cl::init(3));

const char kMsanModuleCtorName[] = "msan.module_ctor";
const char kMsanInitName[] = "__msan_init";

namespace {

// Memory map parameters used in application-to-shadow address calculation.
// Offset = (Addr & ~AndMask) ^ XorMask
// Shadow = ShadowBase + Offset
// Origin = OriginBase + Offset
struct MemoryMapParams {
  uint64_t AndMask;
  uint64_t XorMask;
  uint64_t ShadowBase;
  uint64_t OriginBase;
};

struct PlatformMemoryMapParams {
  const MemoryMapParams *bits32;
  const MemoryMapParams *bits64;
};

} // end anonymous namespace

// i386 Linux
static const MemoryMapParams Linux_I386_MemoryMapParams = {
    0x000080000000, // AndMask
    0,              // XorMask (not used)
    0,              // ShadowBase (not used)
    0x000040000000, // OriginBase
};

// x86_64 Linux
static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};
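
// Worked example (illustrative; the address below is hypothetical): with
// AndMask == 0 and ShadowBase == 0, the mapping above degenerates to
// Shadow = Addr ^ XorMask. For an application address Addr = 0x700000001000:
//   Offset = 0x700000001000 ^ 0x500000000000 = 0x200000001000
//   Shadow = 0 + Offset                      = 0x200000001000
//   Origin = 0x100000000000 + Offset         = 0x300000001000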

// mips32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// mips64 Linux
static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x008000000000, // XorMask
    0,              // ShadowBase (not used)
    0x002000000000, // OriginBase
};

// ppc32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// ppc64 Linux
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
    0xE00000000000, // AndMask
    0x100000000000, // XorMask
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// s390x Linux
static const MemoryMapParams Linux_S390X_MemoryMapParams = {
    0xC00000000000, // AndMask
    0,              // XorMask (not used)
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// arm32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 Linux
static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
    0,               // AndMask (not used)
    0x0B00000000000, // XorMask
    0,               // ShadowBase (not used)
    0x0200000000000, // OriginBase
};

// loongarch64 Linux
static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};

// riscv32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 FreeBSD
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
    0x1800000000000, // AndMask
    0x0400000000000, // XorMask
    0x0200000000000, // ShadowBase
    0x0700000000000, // OriginBase
};

// i386 FreeBSD
static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
    0x000180000000, // AndMask
    0x000040000000, // XorMask
    0x000020000000, // ShadowBase
    0x000700000000, // OriginBase
};

// x86_64 FreeBSD
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
    0xc00000000000, // AndMask
    0x200000000000, // XorMask
    0x100000000000, // ShadowBase
    0x380000000000, // OriginBase
};

// x86_64 NetBSD
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
    0,              // AndMask
    0x500000000000, // XorMask
    0,              // ShadowBase
    0x100000000000, // OriginBase
};

static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
    &Linux_I386_MemoryMapParams,
    &Linux_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
    nullptr,
    &Linux_MIPS64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
    nullptr,
    &Linux_PowerPC64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
    nullptr,
    &Linux_S390X_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
    nullptr,
    &Linux_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
    nullptr,
    &Linux_LoongArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
    nullptr,
    &FreeBSD_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
    &FreeBSD_I386_MemoryMapParams,
    &FreeBSD_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
    nullptr,
    &NetBSD_X86_64_MemoryMapParams,
};

enum OddOrEvenLanes { kBothLanes, kEvenLanes, kOddLanes };

namespace {

/// Instrument functions of a module to detect uninitialized reads.
///
/// Instantiating MemorySanitizer inserts the msan runtime library API function
/// declarations into the module if they don't exist already. Instantiating
/// ensures the __msan_init function is in the list of global constructors for
/// the module.
class MemorySanitizer {
public:
  MemorySanitizer(Module &M, MemorySanitizerOptions Options)
      : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
        Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
    initializeModule(M);
  }

  // MSan cannot be moved or copied because of MapParams.
  MemorySanitizer(MemorySanitizer &&) = delete;
  MemorySanitizer &operator=(MemorySanitizer &&) = delete;
  MemorySanitizer(const MemorySanitizer &) = delete;
  MemorySanitizer &operator=(const MemorySanitizer &) = delete;

  bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);

private:
  friend struct MemorySanitizerVisitor;
  friend struct VarArgHelperBase;
  friend struct VarArgAMD64Helper;
  friend struct VarArgAArch64Helper;
  friend struct VarArgPowerPC64Helper;
  friend struct VarArgPowerPC32Helper;
  friend struct VarArgSystemZHelper;
  friend struct VarArgI386Helper;
  friend struct VarArgGenericHelper;

  void initializeModule(Module &M);
  void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
  void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
  void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);

  template <typename... ArgsTy>
  FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args);

  /// True if we're compiling the Linux kernel.
  bool CompileKernel;
  /// Track origins (allocation points) of uninitialized values.
  int TrackOrigins;
  bool Recover;
  bool EagerChecks;

  Triple TargetTriple;
  LLVMContext *C;
  Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
  Type *OriginTy;
  PointerType *PtrTy; ///< Pointer type in the default address space.

  // XxxTLS variables represent the per-thread state in MSan and per-task state
  // in KMSAN.
  // For the userspace these point to thread-local globals. In the kernel land
  // they point to the members of a per-task struct obtained via a call to
  // __msan_get_context_state().

  /// Thread-local shadow storage for function parameters.
  Value *ParamTLS;

  /// Thread-local origin storage for function parameters.
  Value *ParamOriginTLS;

  /// Thread-local shadow storage for function return value.
  Value *RetvalTLS;

  /// Thread-local origin storage for function return value.
  Value *RetvalOriginTLS;

  /// Thread-local shadow storage for in-register va_arg function.
  Value *VAArgTLS;

  /// Thread-local origin storage for in-register va_arg function.
  Value *VAArgOriginTLS;
674
675 /// Thread-local shadow storage for va_arg overflow area.
676 Value *VAArgOverflowSizeTLS;
677
678 /// Are the instrumentation callbacks set up?
679 bool CallbacksInitialized = false;
680
681 /// The run-time callback to print a warning.
682 FunctionCallee WarningFn;
683
684 // These arrays are indexed by log2(AccessSize).
685 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
686 FunctionCallee MaybeWarningVarSizeFn;
687 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
688
689 /// Run-time helper that generates a new origin value for a stack
690 /// allocation.
691 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
692 // No description version
693 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
694
695 /// Run-time helper that poisons stack on function entry.
696 FunctionCallee MsanPoisonStackFn;
697
698 /// Run-time helper that records a store (or any event) of an
699 /// uninitialized value and returns an updated origin id encoding this info.
700 FunctionCallee MsanChainOriginFn;
701
702 /// Run-time helper that paints an origin over a region.
703 FunctionCallee MsanSetOriginFn;
704
705 /// MSan runtime replacements for memmove, memcpy and memset.
706 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
707
708 /// KMSAN callback for task-local function argument shadow.
709 StructType *MsanContextStateTy;
710 FunctionCallee MsanGetContextStateFn;
711
712 /// Functions for poisoning/unpoisoning local variables
713 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
714
715 /// Pair of shadow/origin pointers.
716 Type *MsanMetadata;
717
718 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
719 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
720 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
721 FunctionCallee MsanMetadataPtrForStore_1_8[4];
722 FunctionCallee MsanInstrumentAsmStoreFn;
723
724 /// Storage for return values of the MsanMetadataPtrXxx functions.
725 Value *MsanMetadataAlloca;
726
727 /// Helper to choose between different MsanMetadataPtrXxx().
  FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);

  /// Memory map parameters used in application-to-shadow calculation.
  const MemoryMapParams *MapParams;

  /// Custom memory map parameters used when -msan-shadow-base or
  /// -msan-origin-base is provided.
  MemoryMapParams CustomMapParams;

  /// Branch weights for cold calls.
  MDNode *ColdCallWeights;

  /// Branch weights for origin store.
  MDNode *OriginStoreWeights;
};

void insertModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, kMsanModuleCtorName, kMsanInitName,
      /*InitArgTypes=*/{},
      /*InitArgs=*/{},
      // This callback is invoked when the functions are created for the first
      // time. Hook them into the global ctors list in that case:
      [&](Function *Ctor, FunctionCallee) {
        if (!ClWithComdat) {
          appendToGlobalCtors(M, Ctor, 0);
          return;
        }
        Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
        Ctor->setComdat(MsanCtorComdat);
        appendToGlobalCtors(M, Ctor, 0, Ctor);
      });
}

template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
  return (Opt.getNumOccurrences() > 0) ? Opt : Default;
}

} // end anonymous namespace

MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
                                               bool EagerChecks)
    : Kernel(getOptOrDefault(ClEnableKmsan, K)),
      TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
      Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
      EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}

PreservedAnalyses MemorySanitizerPass::run(Module &M,
                                           ModuleAnalysisManager &AM) {
  // Return early if the nosanitize_memory module flag is present.
  if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
    return PreservedAnalyses::all();
  bool Modified = false;
  if (!Options.Kernel) {
    insertModuleCtor(M);
    Modified = true;
  }

  auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
  for (Function &F : M) {
    if (F.empty())
      continue;
    MemorySanitizer Msan(*F.getParent(), Options);
    Modified |=
        Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
  }

  if (!Modified)
    return PreservedAnalyses::all();

  PreservedAnalyses PA = PreservedAnalyses::none();
  // GlobalsAA is considered stateless and does not get invalidated unless
  // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
  // make changes that require GlobalsAA to be invalidated.
  PA.abandon<GlobalsAA>();
  return PA;
}

void MemorySanitizerPass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
  static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
      OS, MapClassName2PassName);
  OS << '<';
  if (Options.Recover)
    OS << "recover;";
  if (Options.Kernel)
    OS << "kernel;";
  if (Options.EagerChecks)
    OS << "eager-checks;";
  OS << "track-origins=" << Options.TrackOrigins;
  OS << '>';
}
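// Illustrative pipeline strings produced by printPipeline() above (the pass
// name itself comes from the PassInfoMixin base; the option combinations are
// examples, not defaults for every configuration):
//   msan<track-origins=0>
//   msan<recover;track-origins=1>
//   msan<kernel;track-origins=2>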

/// Create a private constant global initialized with the given string.
///
/// The resulting global is used to pass stack frame descriptions to the
/// run-time lib.
static GlobalVariable *createPrivateConstGlobalForString(Module &M,
                                                         StringRef Str) {
  Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
  return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
                            GlobalValue::PrivateLinkage, StrConst, "");
}

template <typename... ArgsTy>
FunctionCallee
MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args) {
  if (TargetTriple.getArch() == Triple::systemz) {
    // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
    return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
                                 std::forward<ArgsTy>(Args)...);
  }

  return M.getOrInsertFunction(Name, MsanMetadata,
                               std::forward<ArgsTy>(Args)...);
}

/// Create KMSAN API callbacks.
void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
  IRBuilder<> IRB(*C);

  // These will be initialized in insertKmsanPrologue().
  RetvalTLS = nullptr;
  RetvalOriginTLS = nullptr;
  ParamTLS = nullptr;
  ParamOriginTLS = nullptr;
  VAArgTLS = nullptr;
  VAArgOriginTLS = nullptr;
  VAArgOverflowSizeTLS = nullptr;

  WarningFn = M.getOrInsertFunction("__msan_warning",
                                    TLI.getAttrList(C, {0}, /*Signed=*/false),
                                    IRB.getVoidTy(), IRB.getInt32Ty());

  // Requests the per-task context state (kmsan_context_state*) from the
  // runtime library.
  MsanContextStateTy = StructType::get(
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
      IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
      OriginTy);
  MsanGetContextStateFn =
      M.getOrInsertFunction("__msan_get_context_state", PtrTy);

  MsanMetadata = StructType::get(PtrTy, PtrTy);

  for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
    std::string name_load =
        "__msan_metadata_ptr_for_load_" + std::to_string(size);
    std::string name_store =
        "__msan_metadata_ptr_for_store_" + std::to_string(size);
    MsanMetadataPtrForLoad_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
    MsanMetadataPtrForStore_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
  }

  MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
  MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);

  // Functions for poisoning and unpoisoning memory.
  MsanPoisonAllocaFn = M.getOrInsertFunction(
      "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
  MsanUnpoisonAllocaFn = M.getOrInsertFunction(
      "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
}

static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
  return M.getOrInsertGlobal(Name, Ty, [&] {
    return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
                              nullptr, Name, nullptr,
                              GlobalVariable::InitialExecTLSModel);
  });
}

/// Insert declarations for userspace-specific functions and globals.
void MemorySanitizer::createUserspaceApi(Module &M,
                                         const TargetLibraryInfo &TLI) {
  IRBuilder<> IRB(*C);

  // Create the callback.
  // FIXME: this function should have "Cold" calling conv,
  // which is not yet implemented.
  if (TrackOrigins) {
    StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
                                      : "__msan_warning_with_origin_noreturn";
    WarningFn = M.getOrInsertFunction(WarningFnName,
                                      TLI.getAttrList(C, {0}, /*Signed=*/false),
                                      IRB.getVoidTy(), IRB.getInt32Ty());
  } else {
    StringRef WarningFnName =
        Recover ? "__msan_warning" : "__msan_warning_noreturn";
    WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
  }

  // Create the global TLS variables.
  RetvalTLS =
      getOrInsertGlobal(M, "__msan_retval_tls",
                        ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));

  RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);

  ParamTLS =
      getOrInsertGlobal(M, "__msan_param_tls",
                        ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));

  ParamOriginTLS =
      getOrInsertGlobal(M, "__msan_param_origin_tls",
                        ArrayType::get(OriginTy, kParamTLSSize / 4));

  VAArgTLS =
      getOrInsertGlobal(M, "__msan_va_arg_tls",
                        ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));

  VAArgOriginTLS =
      getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
                        ArrayType::get(OriginTy, kParamTLSSize / 4));

  VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
                                           IRB.getIntPtrTy(M.getDataLayout()));

  for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
       AccessSizeIndex++) {
    unsigned AccessSize = 1 << AccessSizeIndex;
    std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
    MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
        FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
        IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
    MaybeWarningVarSizeFn = M.getOrInsertFunction(
        "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
        IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
    FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
    MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
        FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
        IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
        IRB.getInt32Ty());
  }

  MsanSetAllocaOriginWithDescriptionFn =
      M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
                            IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
  MsanSetAllocaOriginNoDescriptionFn =
      M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
                            IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
  MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
                                            IRB.getVoidTy(), PtrTy, IntptrTy);
}

/// Insert extern declaration of runtime-provided functions and globals.
void MemorySanitizer::initializeCallbacks(Module &M,
                                          const TargetLibraryInfo &TLI) {
  // Only do this once.
  if (CallbacksInitialized)
    return;

  IRBuilder<> IRB(*C);
  // Initialize callbacks that are common for kernel and userspace
  // instrumentation.
  MsanChainOriginFn = M.getOrInsertFunction(
      "__msan_chain_origin",
      TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
      IRB.getInt32Ty());
  MsanSetOriginFn = M.getOrInsertFunction(
      "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
      IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
  MemmoveFn =
      M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
  MemcpyFn =
      M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
  MemsetFn = M.getOrInsertFunction("__msan_memset",
                                   TLI.getAttrList(C, {1}, /*Signed=*/true),
                                   PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);

  MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
      "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);

  if (CompileKernel) {
    createKernelApi(M, TLI);
  } else {
    createUserspaceApi(M, TLI);
  }
  CallbacksInitialized = true;
}

FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
                                                             int size) {
  FunctionCallee *Fns =
      isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
  switch (size) {
  case 1:
    return Fns[0];
  case 2:
    return Fns[1];
  case 4:
    return Fns[2];
  case 8:
    return Fns[3];
  default:
    return nullptr;
  }
}

/// Module-level initialization.
///
/// Inserts a call to __msan_init to the module's constructor list.
void MemorySanitizer::initializeModule(Module &M) {
  auto &DL = M.getDataLayout();

  TargetTriple = M.getTargetTriple();

  bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
  bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
  // Check the command-line overrides first.
  if (ShadowPassed || OriginPassed) {
    CustomMapParams.AndMask = ClAndMask;
    CustomMapParams.XorMask = ClXorMask;
    CustomMapParams.ShadowBase = ClShadowBase;
    CustomMapParams.OriginBase = ClOriginBase;
    MapParams = &CustomMapParams;
  } else {
    switch (TargetTriple.getOS()) {
    case Triple::FreeBSD:
      switch (TargetTriple.getArch()) {
      case Triple::aarch64:
        MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
        break;
      case Triple::x86_64:
        MapParams = FreeBSD_X86_MemoryMapParams.bits64;
        break;
      case Triple::x86:
        MapParams = FreeBSD_X86_MemoryMapParams.bits32;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    case Triple::NetBSD:
      switch (TargetTriple.getArch()) {
      case Triple::x86_64:
        MapParams = NetBSD_X86_MemoryMapParams.bits64;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    case Triple::Linux:
      switch (TargetTriple.getArch()) {
      case Triple::x86_64:
        MapParams = Linux_X86_MemoryMapParams.bits64;
        break;
      case Triple::x86:
        MapParams = Linux_X86_MemoryMapParams.bits32;
        break;
      case Triple::mips64:
      case Triple::mips64el:
        MapParams = Linux_MIPS_MemoryMapParams.bits64;
        break;
      case Triple::ppc64:
      case Triple::ppc64le:
        MapParams = Linux_PowerPC_MemoryMapParams.bits64;
        break;
      case Triple::systemz:
        MapParams = Linux_S390_MemoryMapParams.bits64;
        break;
      case Triple::aarch64:
      case Triple::aarch64_be:
        MapParams = Linux_ARM_MemoryMapParams.bits64;
        break;
      case Triple::loongarch64:
        MapParams = Linux_LoongArch_MemoryMapParams.bits64;
        break;
      default:
        report_fatal_error("unsupported architecture");
      }
      break;
    default:
      report_fatal_error("unsupported operating system");
    }
  }
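  // The selected MapParams are consumed later when shadow and origin addresses
  // are computed. As a rough sketch (not the exact IR emitted elsewhere in
  // this file):
  //   OffsetLong = (AppAddr & ~AndMask) ^ XorMask
  //   ShadowAddr = OffsetLong + ShadowBase
  //   OriginAddr = OffsetLong + OriginBase   (aligned down to 4 bytes)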

  C = &(M.getContext());
  IRBuilder<> IRB(*C);
  IntptrTy = IRB.getIntPtrTy(DL);
  OriginTy = IRB.getInt32Ty();
  PtrTy = IRB.getPtrTy();

  ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
  OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();

  if (!CompileKernel) {
    if (TrackOrigins)
      M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
        return new GlobalVariable(
            M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
            IRB.getInt32(TrackOrigins), "__msan_track_origins");
      });

    if (Recover)
      M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
        return new GlobalVariable(M, IRB.getInt32Ty(), true,
                                  GlobalValue::WeakODRLinkage,
                                  IRB.getInt32(Recover), "__msan_keep_going");
      });
  }
}

namespace {

/// A helper class that handles instrumentation of VarArg
/// functions on a particular platform.
///
/// Implementations are expected to insert the instrumentation
/// necessary to propagate argument shadow through VarArg function
/// calls. Visit* methods are called during an InstVisitor pass over
/// the function, and should avoid creating new basic blocks. A new
/// instance of this class is created for each instrumented function.
struct VarArgHelper {
  virtual ~VarArgHelper() = default;

  /// Visit a CallBase.
  virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;

  /// Visit a va_start call.
  virtual void visitVAStartInst(VAStartInst &I) = 0;

  /// Visit a va_copy call.
  virtual void visitVACopyInst(VACopyInst &I) = 0;

  /// Finalize function instrumentation.
  ///
  /// This method is called after visiting all interesting (see above)
  /// instructions in a function.
  virtual void finalizeInstrumentation() = 0;
};

struct MemorySanitizerVisitor;

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor);

static unsigned TypeSizeToSizeIndex(TypeSize TS) {
  if (TS.isScalable())
    // Scalable types unconditionally take slowpaths.
    return kNumberOfAccessSizes;
  unsigned TypeSizeFixed = TS.getFixedValue();
  if (TypeSizeFixed <= 8)
    return 0;
  return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
}
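// Illustrative values (TS is a size in bits; the index selects the matching
// power-of-two access-size callback, e.g. __msan_maybe_warning_{1,2,4,8}):
//   i8 -> 0, i16 -> 1, i32 -> 2, i64 -> 3,
//   i128 or wider, or scalable -> kNumberOfAccessSizes (slow path).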

namespace {

/// Helper class to attach the debug information of a given instruction onto
/// new instructions inserted after it.
class NextNodeIRBuilder : public IRBuilder<> {
public:
  explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
    SetCurrentDebugLocation(IP->getDebugLoc());
  }
};

/// This class does all the work for a given function. Store and Load
/// instructions store and load corresponding shadow and origin
/// values. Most instructions propagate shadow from arguments to their
/// return values. Certain instructions (most importantly, BranchInst)
/// test their argument shadow and print reports (with a runtime call) if it's
/// non-zero.
struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
  Function &F;
  MemorySanitizer &MS;
  SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
  ValueMap<Value *, Value *> ShadowMap, OriginMap;
  std::unique_ptr<VarArgHelper> VAHelper;
  const TargetLibraryInfo *TLI;
  Instruction *FnPrologueEnd;
  SmallVector<Instruction *, 16> Instructions;

  // The following flags disable parts of MSan instrumentation based on
  // exclusion list contents and command-line options.
  bool InsertChecks;
  bool PropagateShadow;
  bool PoisonStack;
  bool PoisonUndef;
  bool PoisonUndefVectors;

  struct ShadowOriginAndInsertPoint {
    Value *Shadow;
    Value *Origin;
    Instruction *OrigIns;

    ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
        : Shadow(S), Origin(O), OrigIns(I) {}
  };
  SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
  DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
  SmallSetVector<AllocaInst *, 16> AllocaSet;
  SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
  SmallVector<StoreInst *, 16> StoreList;
  int64_t SplittableBlocksCount = 0;

  MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
                         const TargetLibraryInfo &TLI)
      : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
    bool SanitizeFunction =
        F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
    InsertChecks = SanitizeFunction;
    PropagateShadow = SanitizeFunction;
    PoisonStack = SanitizeFunction && ClPoisonStack;
    PoisonUndef = SanitizeFunction && ClPoisonUndef;
    PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;

    // In the presence of unreachable blocks, we may see Phi nodes with
    // incoming nodes from such blocks. Since InstVisitor skips unreachable
    // blocks, such nodes will not have any shadow value associated with them.
    // It's easier to remove unreachable blocks than deal with missing shadow.
    removeUnreachableBlocks(F);

    MS.initializeCallbacks(*F.getParent(), TLI);
    FnPrologueEnd =
        IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
            .CreateIntrinsic(Intrinsic::donothing, {});

    if (MS.CompileKernel) {
      IRBuilder<> IRB(FnPrologueEnd);
      insertKmsanPrologue(IRB);
    }

    LLVM_DEBUG(if (!InsertChecks) dbgs()
               << "MemorySanitizer is not inserting checks into '"
               << F.getName() << "'\n");
  }

  bool instrumentWithCalls(Value *V) {
    // Constants will likely be eliminated by follow-up passes.
    if (isa<Constant>(V))
      return false;
    ++SplittableBlocksCount;
    return ClInstrumentationWithCallThreshold >= 0 &&
           SplittableBlocksCount > ClInstrumentationWithCallThreshold;
  }

  bool isInPrologue(Instruction &I) {
    return I.getParent() == FnPrologueEnd->getParent() &&
           (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
  }

  // Creates a new origin and records the stack trace. In general we can call
  // this function for any origin manipulation we like, but it costs runtime
  // resources, so use it wisely: only when it can provide additional
  // information helpful to the user.
  Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
    if (MS.TrackOrigins <= 1)
      return V;
    return IRB.CreateCall(MS.MsanChainOriginFn, V);
  }

  Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
    const DataLayout &DL = F.getDataLayout();
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    if (IntptrSize == kOriginSize)
      return Origin;
    assert(IntptrSize == kOriginSize * 2);
    Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
    return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
  }
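  // For example, on a 64-bit target an origin id 0xAABBCCDD becomes
  // 0xAABBCCDDAABBCCDD, so a single intptr-sized store can paint two adjacent
  // 4-byte origin slots at once.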

  /// Fill the memory range with the given origin value.
  void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
                   TypeSize TS, Align Alignment) {
    const DataLayout &DL = F.getDataLayout();
    const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    assert(IntptrAlignment >= kMinOriginAlignment);
    assert(IntptrSize >= kOriginSize);

    // Note: The loop-based formulation works for fixed-length vectors too;
    // however, we prefer to unroll and specialize alignment below.
    if (TS.isScalable()) {
      Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
      Value *RoundUp =
          IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
      Value *End =
          IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
      auto [InsertPt, Index] =
          SplitBlockAndInsertSimpleForLoop(End, IRB.GetInsertPoint());
      IRB.SetInsertPoint(InsertPt);

      Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
      IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
      return;
    }

    unsigned Size = TS.getFixedValue();

    unsigned Ofs = 0;
    Align CurrentAlignment = Alignment;
    if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
      Value *IntptrOrigin = originToIntptr(IRB, Origin);
      Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
      for (unsigned i = 0; i < Size / IntptrSize; ++i) {
        Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
                       : IntptrOriginPtr;
        IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
        Ofs += IntptrSize / kOriginSize;
        CurrentAlignment = IntptrAlignment;
      }
    }

    for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
      Value *GEP =
          i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
      IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
      CurrentAlignment = kMinOriginAlignment;
    }
  }
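  // Worked example (assuming a 64-bit target, IntptrSize == 8,
  // kOriginSize == 4): painting a 12-byte, 8-aligned range first emits one
  // 8-byte store of the doubled origin (covering slots 0-1, Ofs becomes 2),
  // then the trailing loop emits one 4-byte store for slot 2.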

  void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
                   Value *OriginPtr, Align Alignment) {
    const DataLayout &DL = F.getDataLayout();
    const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
    TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
    // ZExt cannot convert between vector and scalar.
    Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
    if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
      if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
        // Origin is not needed: value is initialized or const shadow is
        // ignored.
        return;
      }
      if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
        // Copy origin as the value is definitely uninitialized.
        paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
                    OriginAlignment);
        return;
      }
      // Fall back to a runtime check, which can still be optimized out later.
    }

    TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
    unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
    if (instrumentWithCalls(ConvertedShadow) &&
        SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
      FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
      Value *ConvertedShadow2 =
          IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
      CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
      CB->addParamAttr(0, Attribute::ZExt);
      CB->addParamAttr(2, Attribute::ZExt);
    } else {
      Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
      Instruction *CheckTerm = SplitBlockAndInsertIfThen(
          Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
      IRBuilder<> IRBNew(CheckTerm);
      paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
                  OriginAlignment);
    }
  }

  void materializeStores() {
    for (StoreInst *SI : StoreList) {
      IRBuilder<> IRB(SI);
      Value *Val = SI->getValueOperand();
      Value *Addr = SI->getPointerOperand();
      Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
      Value *ShadowPtr, *OriginPtr;
      Type *ShadowTy = Shadow->getType();
      const Align Alignment = SI->getAlign();
      const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
      std::tie(ShadowPtr, OriginPtr) =
          getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);

      [[maybe_unused]] StoreInst *NewSI =
          IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
      LLVM_DEBUG(dbgs() << "  STORE: " << *NewSI << "\n");

      if (SI->isAtomic())
        SI->setOrdering(addReleaseOrdering(SI->getOrdering()));

      if (MS.TrackOrigins && !SI->isAtomic())
        storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
                    OriginAlignment);
    }
  }

  // Returns true if the debug location corresponds to multiple warnings.
  bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
    if (MS.TrackOrigins < 2)
      return false;

    if (LazyWarningDebugLocationCount.empty())
      for (const auto &I : InstrumentationList)
        ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];

    return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
  }

  /// Helper function to insert a warning at IRB's current insert point.
  void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
    if (!Origin)
      Origin = (Value *)IRB.getInt32(0);
    assert(Origin->getType()->isIntegerTy());

    if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
      // Try to create an additional origin with the debug info of the last
      // origin instruction. It may provide additional information to the user.
      if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
        assert(MS.TrackOrigins);
        auto NewDebugLoc = OI->getDebugLoc();
        // An origin update with a missing or identical debug location provides
        // no additional value.
        if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
          // Insert the update just before the check, so we call the runtime
          // only just before the report.
          IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
          IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
          Origin = updateOrigin(Origin, IRBOrigin);
        }
      }
    }

    if (MS.CompileKernel || MS.TrackOrigins)
      IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
    else
      IRB.CreateCall(MS.WarningFn)->setCannotMerge();
    // FIXME: Insert UnreachableInst if !MS.Recover?
    // This may invalidate some of the following checks and needs to be done
    // at the very end.
  }

  void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
                           Value *Origin) {
    const DataLayout &DL = F.getDataLayout();
    TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
    unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
    if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
      // ZExt cannot convert between vector and scalar.
      ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
      Value *ConvertedShadow2 =
          IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));

      if (SizeIndex < kNumberOfAccessSizes) {
        FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
        CallBase *CB = IRB.CreateCall(
            Fn,
            {ConvertedShadow2,
             MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
        CB->addParamAttr(0, Attribute::ZExt);
        CB->addParamAttr(1, Attribute::ZExt);
      } else {
        FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
        Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
        IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
        unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
        CallBase *CB = IRB.CreateCall(
            Fn,
            {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
             MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
        CB->addParamAttr(1, Attribute::ZExt);
        CB->addParamAttr(2, Attribute::ZExt);
      }
    } else {
      Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
      Instruction *CheckTerm = SplitBlockAndInsertIfThen(
          Cmp, &*IRB.GetInsertPoint(),
          /* Unreachable */ !MS.Recover, MS.ColdCallWeights);

      IRB.SetInsertPoint(CheckTerm);
      insertWarningFn(IRB, Origin);
      LLVM_DEBUG(dbgs() << "  CHECK: " << *Cmp << "\n");
    }
  }

  void materializeInstructionChecks(
      ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
    const DataLayout &DL = F.getDataLayout();
    // Disable combining in some cases. TrackOrigins checks each shadow to pick
    // the correct origin.
    bool Combine = !MS.TrackOrigins;
    Instruction *Instruction = InstructionChecks.front().OrigIns;
    Value *Shadow = nullptr;
    for (const auto &ShadowData : InstructionChecks) {
      assert(ShadowData.OrigIns == Instruction);
      IRBuilder<> IRB(Instruction);

      Value *ConvertedShadow = ShadowData.Shadow;

      if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
        if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
          // Skip, value is initialized or const shadow is ignored.
          continue;
        }
        if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
          // Report, as the value is definitely uninitialized.
          insertWarningFn(IRB, ShadowData.Origin);
          if (!MS.Recover)
            return; // Always fail and stop here; no need to check the rest.
          // The warning was already emitted; skip to the next check.
          continue;
        }
        // Fall back to a runtime check, which can still be optimized out later.
      }

      if (!Combine) {
        materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
        continue;
      }

      if (!Shadow) {
        Shadow = ConvertedShadow;
        continue;
      }

      Shadow = convertToBool(Shadow, IRB, "_mscmp");
      ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
      Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
    }

    if (Shadow) {
      assert(Combine);
      IRBuilder<> IRB(Instruction);
      materializeOneCheck(IRB, Shadow, nullptr);
    }
  }

  static bool isAArch64SVCount(Type *Ty) {
    if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
      return TTy->getName() == "aarch64.svcount";
    return false;
  }

  // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
  // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
  static bool isScalableNonVectorType(Type *Ty) {
    if (!isAArch64SVCount(Ty))
      LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
                        << "\n");

    return Ty->isScalableTy() && !isa<VectorType>(Ty);
  }
1574
  void materializeChecks() {
#ifndef NDEBUG
    // For assert below.
    SmallPtrSet<Instruction *, 16> Done;
#endif

    for (auto I = InstrumentationList.begin();
         I != InstrumentationList.end();) {
      auto OrigIns = I->OrigIns;
      // Checks are grouped by the original instruction. We call all
      // `insertShadowCheck` for an instruction at once.
      assert(Done.insert(OrigIns).second);
      auto J = std::find_if(I + 1, InstrumentationList.end(),
                            [OrigIns](const ShadowOriginAndInsertPoint &R) {
                              return OrigIns != R.OrigIns;
                            });
      // Process all checks of the instruction at once.
      materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
      I = J;
    }

    LLVM_DEBUG(dbgs() << "DONE:\n" << F);
  }

  // Inserts the KMSAN prologue: caches pointers to the fields of the per-task
  // context state in MS.
  void insertKmsanPrologue(IRBuilder<> &IRB) {
    Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
    Constant *Zero = IRB.getInt32(0);
    MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                                {Zero, IRB.getInt32(0)}, "param_shadow");
    MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                                 {Zero, IRB.getInt32(1)}, "retval_shadow");
    MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                                {Zero, IRB.getInt32(2)}, "va_arg_shadow");
    MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                                      {Zero, IRB.getInt32(3)}, "va_arg_origin");
    MS.VAArgOverflowSizeTLS =
        IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                      {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
    MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                                      {Zero, IRB.getInt32(5)}, "param_origin");
    MS.RetvalOriginTLS =
        IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
                      {Zero, IRB.getInt32(6)}, "retval_origin");
    if (MS.TargetTriple.getArch() == Triple::systemz)
      MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
  }

  /// Add MemorySanitizer instrumentation to a function.
  bool runOnFunction() {
    // Iterate all BBs in depth-first order and create shadow instructions
    // for all instructions (where applicable).
    // For PHI nodes we create dummy shadow PHIs which will be finalized later.
    for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
      visit(*BB);

    // `visit` above only collects instructions. Process them after iterating
    // the CFG to avoid a dependency on CFG transformations.
    for (Instruction *I : Instructions)
      InstVisitor<MemorySanitizerVisitor>::visit(*I);

    // Finalize PHI nodes.
    for (PHINode *PN : ShadowPHINodes) {
      PHINode *PNS = cast<PHINode>(getShadow(PN));
      PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
      size_t NumValues = PN->getNumIncomingValues();
      for (size_t v = 0; v < NumValues; v++) {
        PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
        if (PNO)
          PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
      }
    }

    VAHelper->finalizeInstrumentation();

    // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
    // instrumenting only allocas.
    if (ClHandleLifetimeIntrinsics) {
      for (auto Item : LifetimeStartList) {
        instrumentAlloca(*Item.second, Item.first);
        AllocaSet.remove(Item.second);
      }
    }
    // Poison the allocas for which we didn't instrument the corresponding
    // lifetime intrinsics.
    for (AllocaInst *AI : AllocaSet)
      instrumentAlloca(*AI);

    // Insert shadow value checks.
    materializeChecks();

    // Delayed instrumentation of StoreInst.
    // This may not add new address checks.
    materializeStores();

    return true;
  }

  /// Compute the shadow type that corresponds to a given Value.
  Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }

  /// Compute the shadow type that corresponds to a given Type.
  Type *getShadowTy(Type *OrigTy) {
    if (!OrigTy->isSized()) {
      return nullptr;
    }
    // For integer type, shadow is the same as the original type.
    // This may return weird-sized types like i1.
    if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
      return IT;
    const DataLayout &DL = F.getDataLayout();
    if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
      uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
      return VectorType::get(IntegerType::get(*MS.C, EltSize),
                             VT->getElementCount());
    }
    if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
      return ArrayType::get(getShadowTy(AT->getElementType()),
                            AT->getNumElements());
    }
    if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
      SmallVector<Type *, 4> Elements;
      for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
        Elements.push_back(getShadowTy(ST->getElementType(i)));
      StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
      LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
      return Res;
    }
    if (isScalableNonVectorType(OrigTy)) {
      LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
                        << "\n");
      return OrigTy;
    }

    uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
    return IntegerType::get(*MS.C, TypeSize);
  }

  /// Extract combined shadow of struct elements as a bool.
  Value *collapseStructShadow(StructType *Struct, Value *Shadow,
                              IRBuilder<> &IRB) {
    Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
    Value *Aggregator = FalseVal;

    for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
      // Combine by ORing together each element's bool shadow.
      Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
      Value *ShadowBool = convertToBool(ShadowItem, IRB);

      if (Aggregator != FalseVal)
        Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
      else
        Aggregator = ShadowBool;
    }

    return Aggregator;
  }

  // Extract combined shadow of array elements.
  Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
                             IRBuilder<> &IRB) {
    if (!Array->getNumElements())
      return IRB.getIntN(/* width */ 1, /* value */ 0);

    Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
    Value *Aggregator = convertShadowToScalar(FirstItem, IRB);

    for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
      Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
      Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
      Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
    }
    return Aggregator;
  }

  /// Convert a shadow value to its flattened variant. The resulting
  /// shadow may not necessarily have the same bit width as the input
  /// value, but it will always be comparable to zero.
  Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
    if (StructType *Struct = dyn_cast<StructType>(V->getType()))
      return collapseStructShadow(Struct, V, IRB);
    if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
      return collapseArrayShadow(Array, V, IRB);
    if (isa<VectorType>(V->getType())) {
      if (isa<ScalableVectorType>(V->getType()))
        return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
      unsigned BitWidth =
          V->getType()->getPrimitiveSizeInBits().getFixedValue();
      return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
    }
    return V;
  }

  // Convert a scalar value to an i1 by comparing with 0.
  Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
    Type *VTy = V->getType();
    if (!VTy->isIntegerTy())
      return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
    if (VTy->getIntegerBitWidth() == 1)
      // Just converting a bool to a bool, so do nothing.
      return V;
    return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
  }

  Type *ptrToIntPtrType(Type *PtrTy) const {
    if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
      return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
                             VectTy->getElementCount());
    }
    assert(PtrTy->isIntOrPtrTy());
    return MS.IntptrTy;
  }

  Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
    if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
      return VectorType::get(
          getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
          VectTy->getElementCount());
    }
    assert(IntPtrTy == MS.IntptrTy);
    return MS.PtrTy;
  }

  Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
    if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
      return ConstantVector::getSplat(
          VectTy->getElementCount(),
          constToIntPtr(VectTy->getElementType(), C));
    }
    assert(IntPtrTy == MS.IntptrTy);
    // TODO: Avoid implicit trunc?
    // See https://github.com/llvm/llvm-project/issues/112510.
    return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
                            /*ImplicitTrunc=*/true);
  }

  /// Returns the integer shadow offset that corresponds to a given
  /// application address, whereby:
  ///
  ///   Offset = (Addr & ~AndMask) ^ XorMask
  ///   Shadow = ShadowBase + Offset
  ///   Origin = (OriginBase + Offset) & ~Alignment
  ///
  /// Note: for efficiency, many shadow mappings only require the XorMask
  /// and OriginBase; the AndMask and ShadowBase are often zero.
  Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
    Type *IntptrTy = ptrToIntPtrType(Addr->getType());
    Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);

    if (uint64_t AndMask = MS.MapParams->AndMask)
      OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));

    if (uint64_t XorMask = MS.MapParams->XorMask)
      OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
    return OffsetLong;
  }
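
  // As a worked example of the mapping above (assuming the default Linux
  // x86_64 parameters, where AndMask == 0, ShadowBase == 0 and
  // XorMask == 0x500000000000):
  //
  //   Addr   = 0x700000001000
  //   Offset = Addr ^ 0x500000000000 = 0x200000001000
  //   Shadow = ShadowBase + Offset   = 0x200000001000
  //
  // i.e. for such mappings the shadow address is a single XOR away.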

  /// Compute the shadow and origin addresses corresponding to a given
  /// application address.
  ///
  ///   Shadow = ShadowBase + Offset
  ///   Origin = (OriginBase + Offset) & ~3ULL
  ///
  /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
  /// type of a single pointee.
  /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
  std::pair<Value *, Value *>
  getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
                              MaybeAlign Alignment) {
    VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
    if (!VectTy) {
      assert(Addr->getType()->isPointerTy());
    } else {
      assert(VectTy->getElementType()->isPointerTy());
    }
    Type *IntptrTy = ptrToIntPtrType(Addr->getType());
    Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
    Value *ShadowLong = ShadowOffset;
    if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
      ShadowLong =
          IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
    }
    Value *ShadowPtr = IRB.CreateIntToPtr(
        ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));

    Value *OriginPtr = nullptr;
    if (MS.TrackOrigins) {
      Value *OriginLong = ShadowOffset;
      uint64_t OriginBase = MS.MapParams->OriginBase;
      if (OriginBase != 0)
        OriginLong =
            IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
      if (!Alignment || *Alignment < kMinOriginAlignment) {
        uint64_t Mask = kMinOriginAlignment.value() - 1;
        OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
      }
      OriginPtr = IRB.CreateIntToPtr(
          OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
    }
    return std::make_pair(ShadowPtr, OriginPtr);
  }

  template <typename... ArgsTy>
  Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
                            ArgsTy... Args) {
    if (MS.TargetTriple.getArch() == Triple::systemz) {
      IRB.CreateCall(Callee,
                     {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
      return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
    }

    return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
  }

  std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
                                                            IRBuilder<> &IRB,
                                                            Type *ShadowTy,
                                                            bool isStore) {
    Value *ShadowOriginPtrs;
    const DataLayout &DL = F.getDataLayout();
    TypeSize Size = DL.getTypeStoreSize(ShadowTy);

    FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
    Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
    if (Getter) {
      ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
    } else {
      Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
      ShadowOriginPtrs = createMetadataCall(
          IRB,
          isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
          AddrCast, SizeVal);
    }
    Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
    ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
    Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);

    return std::make_pair(ShadowPtr, OriginPtr);
  }

  /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
  /// type of a single pointee.
  /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
  std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
                                                       IRBuilder<> &IRB,
                                                       Type *ShadowTy,
                                                       bool isStore) {
    VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
    if (!VectTy) {
      assert(Addr->getType()->isPointerTy());
      return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
    }

    // TODO: Support callbacks with vectors of addresses.
    unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
    Value *ShadowPtrs = ConstantInt::getNullValue(
        FixedVectorType::get(IRB.getPtrTy(), NumElements));
    Value *OriginPtrs = nullptr;
    if (MS.TrackOrigins)
      OriginPtrs = ConstantInt::getNullValue(
          FixedVectorType::get(IRB.getPtrTy(), NumElements));
    for (unsigned i = 0; i < NumElements; ++i) {
      Value *OneAddr =
          IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
      auto [ShadowPtr, OriginPtr] =
          getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);

      ShadowPtrs = IRB.CreateInsertElement(
          ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
      if (MS.TrackOrigins)
        OriginPtrs = IRB.CreateInsertElement(
            OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
    }
    return {ShadowPtrs, OriginPtrs};
  }

  std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
                                                 Type *ShadowTy,
                                                 MaybeAlign Alignment,
                                                 bool isStore) {
    if (MS.CompileKernel)
      return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
    return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
  }

  /// Compute the shadow address for a given function argument.
  ///
  /// Shadow = ParamTLS + ArgOffset.
  Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
    return IRB.CreatePtrAdd(MS.ParamTLS,
                            ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
  }

  /// Compute the origin address for a given function argument.
  Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
    if (!MS.TrackOrigins)
      return nullptr;
    return IRB.CreatePtrAdd(MS.ParamOriginTLS,
                            ConstantInt::get(MS.IntptrTy, ArgOffset),
                            "_msarg_o");
  }

  /// Compute the shadow address for a retval.
  Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
    return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
  }

  /// Compute the origin address for a retval.
  Value *getOriginPtrForRetval() {
    // We keep a single origin for the entire retval. Might be too optimistic.
    return MS.RetvalOriginTLS;
  }

  /// Set SV to be the shadow value for V.
  void setShadow(Value *V, Value *SV) {
    assert(!ShadowMap.count(V) && "Values may only have one shadow");
    ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
  }

  /// Set Origin to be the origin value for V.
  void setOrigin(Value *V, Value *Origin) {
    if (!MS.TrackOrigins)
      return;
    assert(!OriginMap.count(V) && "Values may only have one origin");
    LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
    OriginMap[V] = Origin;
  }

  Constant *getCleanShadow(Type *OrigTy) {
    Type *ShadowTy = getShadowTy(OrigTy);
    if (!ShadowTy)
      return nullptr;
    return Constant::getNullValue(ShadowTy);
  }

  /// Create a clean shadow value for a given value.
  ///
  /// Clean shadow (all zeroes) means all bits of the value are defined
  /// (initialized).
  Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }

  /// Create a dirty shadow of a given shadow type.
  Constant *getPoisonedShadow(Type *ShadowTy) {
    assert(ShadowTy);
    if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
      return Constant::getAllOnesValue(ShadowTy);
    if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
      SmallVector<Constant *, 4> Vals(AT->getNumElements(),
                                      getPoisonedShadow(AT->getElementType()));
      return ConstantArray::get(AT, Vals);
    }
    if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
      SmallVector<Constant *, 4> Vals;
      for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
        Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
      return ConstantStruct::get(ST, Vals);
    }
    llvm_unreachable("Unexpected shadow type");
  }

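  // For example, for a shadow type { i32, [2 x i8] } the recursion above
  // yields the all-ones aggregate constant
  //   { i32 -1, [2 x i8] [i8 -1, i8 -1] }
  // i.e. every bit of the corresponding value is marked uninitialized.
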
  /// Create a dirty shadow for a given value.
  Constant *getPoisonedShadow(Value *V) {
    Type *ShadowTy = getShadowTy(V);
    if (!ShadowTy)
      return nullptr;
    return getPoisonedShadow(ShadowTy);
  }

  /// Create a clean (zero) origin.
  Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }

  /// Get the shadow value for a given Value.
  ///
  /// This function either returns the value set earlier with setShadow,
  /// or extracts it from ParamTLS (for function arguments).
  Value *getShadow(Value *V) {
    if (Instruction *I = dyn_cast<Instruction>(V)) {
      if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
        return getCleanShadow(V);
      // For instructions the shadow is already stored in the map.
      Value *Shadow = ShadowMap[V];
      if (!Shadow) {
        LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
        assert(Shadow && "No shadow for a value");
      }
      return Shadow;
    }
    // Handle fully undefined values
    // (partially undefined constant vectors are handled later).
    if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
      Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
                                                        : getCleanShadow(V);
      LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
      return AllOnes;
    }
    if (Argument *A = dyn_cast<Argument>(V)) {
      // For arguments we compute the shadow on demand and store it in the map.
      Value *&ShadowPtr = ShadowMap[V];
      if (ShadowPtr)
        return ShadowPtr;
      Function *F = A->getParent();
      IRBuilder<> EntryIRB(FnPrologueEnd);
      unsigned ArgOffset = 0;
      const DataLayout &DL = F->getDataLayout();
      for (auto &FArg : F->args()) {
        if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
          LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
                                    ? "vscale not fully supported\n"
                                    : "Arg is not sized\n"));
          if (A == &FArg) {
            ShadowPtr = getCleanShadow(V);
            setOrigin(A, getCleanOrigin());
            break;
          }
          continue;
        }

        unsigned Size = FArg.hasByValAttr()
                            ? DL.getTypeAllocSize(FArg.getParamByValType())
                            : DL.getTypeAllocSize(FArg.getType());

        if (A == &FArg) {
          bool Overflow = ArgOffset + Size > kParamTLSSize;
          if (FArg.hasByValAttr()) {
            // The ByVal pointer itself has a clean shadow. We copy the actual
            // argument shadow to the underlying memory.
            // Figure out the maximal valid memcpy alignment.
            const Align ArgAlign = DL.getValueOrABITypeAlignment(
                FArg.getParamAlign(), FArg.getParamByValType());
            Value *CpShadowPtr, *CpOriginPtr;
            std::tie(CpShadowPtr, CpOriginPtr) =
                getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
                                   /*isStore*/ true);
            if (!PropagateShadow || Overflow) {
              // ParamTLS overflow.
              EntryIRB.CreateMemSet(
                  CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
                  Size, ArgAlign);
            } else {
              Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
              const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
              [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
                  CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
              LLVM_DEBUG(dbgs() << "  ByValCpy: " << *Cpy << "\n");

              if (MS.TrackOrigins) {
                Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
                // FIXME: OriginSize should be:
                // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
                unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
                EntryIRB.CreateMemCpy(
                    CpOriginPtr,
                    /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
                    /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
                    OriginSize);
              }
            }
          }

          if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
              (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
            ShadowPtr = getCleanShadow(V);
            setOrigin(A, getCleanOrigin());
          } else {
            // Shadow over TLS.
            Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
            ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
                                                   kShadowTLSAlignment);
            if (MS.TrackOrigins) {
              Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
              setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
            }
          }
          LLVM_DEBUG(dbgs()
                     << "  ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
          break;
        }

        ArgOffset += alignTo(Size, kShadowTLSAlignment);
      }
      assert(ShadowPtr && "Could not find shadow for an argument");
      return ShadowPtr;
    }

    // Check for partially-undefined constant vectors.
    // TODO: scalable vectors (this is hard because we do not have an IRBuilder)
    if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
        cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
        PoisonUndefVectors) {
      unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
      SmallVector<Constant *, 32> ShadowVector(NumElems);
      for (unsigned i = 0; i != NumElems; ++i) {
        Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
        ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
                                                : getCleanShadow(Elem);
      }

      Value *ShadowConstant = ConstantVector::get(ShadowVector);
      LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
                        << *ShadowConstant << "\n");

      return ShadowConstant;
    }

    // TODO: partially-undefined constant arrays, structures, and nested types.

    // For everything else the shadow is zero.
    return getCleanShadow(V);
  }

  /// Get the shadow for the i-th argument of the instruction I.
  Value *getShadow(Instruction *I, int i) {
    return getShadow(I->getOperand(i));
  }

  /// Get the origin for a value.
  Value *getOrigin(Value *V) {
    if (!MS.TrackOrigins)
      return nullptr;
    if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
      return getCleanOrigin();
    assert((isa<Instruction>(V) || isa<Argument>(V)) &&
           "Unexpected value type in getOrigin()");
    if (Instruction *I = dyn_cast<Instruction>(V)) {
      if (I->getMetadata(LLVMContext::MD_nosanitize))
        return getCleanOrigin();
    }
    Value *Origin = OriginMap[V];
    assert(Origin && "Missing origin");
    return Origin;
  }

  /// Get the origin for the i-th argument of the instruction I.
  Value *getOrigin(Instruction *I, int i) {
    return getOrigin(I->getOperand(i));
  }

  /// Remember the place where a shadow check should be inserted.
  ///
  /// This location will later be instrumented with a check that will print a
  /// UMR warning at runtime if the shadow value is not 0.
  void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
    assert(Shadow);
    if (!InsertChecks)
      return;

    if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
      LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
                        << *OrigIns << "\n");
      return;
    }

    Type *ShadowTy = Shadow->getType();
    if (isScalableNonVectorType(ShadowTy)) {
      LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
                        << " before " << *OrigIns << "\n");
      return;
    }
#ifndef NDEBUG
    assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
            isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
           "Can only insert checks for integer, vector, and aggregate shadow "
           "types");
#endif
    InstrumentationList.push_back(
        ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
  }

  /// Get the shadow for a value, and remember the place where a shadow check
  /// should be inserted.
  ///
  /// This location will later be instrumented with a check that will print a
  /// UMR warning at runtime if the value is not fully defined.
  void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
    assert(Val);
    Value *Shadow, *Origin;
    if (ClCheckConstantShadow) {
      Shadow = getShadow(Val);
      if (!Shadow)
        return;
      Origin = getOrigin(Val);
    } else {
      Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
      if (!Shadow)
        return;
      Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
    }
    insertCheckShadow(Shadow, Origin, OrigIns);
  }

  AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
    switch (a) {
    case AtomicOrdering::NotAtomic:
      return AtomicOrdering::NotAtomic;
    case AtomicOrdering::Unordered:
    case AtomicOrdering::Monotonic:
    case AtomicOrdering::Release:
      return AtomicOrdering::Release;
    case AtomicOrdering::Acquire:
    case AtomicOrdering::AcquireRelease:
      return AtomicOrdering::AcquireRelease;
    case AtomicOrdering::SequentiallyConsistent:
      return AtomicOrdering::SequentiallyConsistent;
    }
    llvm_unreachable("Unknown ordering");
  }

  Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
    constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
    uint32_t OrderingTable[NumOrderings] = {};

    OrderingTable[(int)AtomicOrderingCABI::relaxed] =
        OrderingTable[(int)AtomicOrderingCABI::release] =
            (int)AtomicOrderingCABI::release;
    OrderingTable[(int)AtomicOrderingCABI::consume] =
        OrderingTable[(int)AtomicOrderingCABI::acquire] =
            OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
                (int)AtomicOrderingCABI::acq_rel;
    OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
        (int)AtomicOrderingCABI::seq_cst;

    return ConstantDataVector::get(IRB.getContext(), OrderingTable);
  }

  AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
    switch (a) {
    case AtomicOrdering::NotAtomic:
      return AtomicOrdering::NotAtomic;
    case AtomicOrdering::Unordered:
    case AtomicOrdering::Monotonic:
    case AtomicOrdering::Acquire:
      return AtomicOrdering::Acquire;
    case AtomicOrdering::Release:
    case AtomicOrdering::AcquireRelease:
      return AtomicOrdering::AcquireRelease;
    case AtomicOrdering::SequentiallyConsistent:
      return AtomicOrdering::SequentiallyConsistent;
    }
    llvm_unreachable("Unknown ordering");
  }

  Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
    constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
    uint32_t OrderingTable[NumOrderings] = {};

    OrderingTable[(int)AtomicOrderingCABI::relaxed] =
        OrderingTable[(int)AtomicOrderingCABI::acquire] =
            OrderingTable[(int)AtomicOrderingCABI::consume] =
                (int)AtomicOrderingCABI::acquire;
    OrderingTable[(int)AtomicOrderingCABI::release] =
        OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
            (int)AtomicOrderingCABI::acq_rel;
    OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
        (int)AtomicOrderingCABI::seq_cst;

    return ConstantDataVector::get(IRB.getContext(), OrderingTable);
  }

  // ------------------- Visitors.
  using InstVisitor<MemorySanitizerVisitor>::visit;
  void visit(Instruction &I) {
    if (I.getMetadata(LLVMContext::MD_nosanitize))
      return;
    // Don't want to visit if we're in the prologue.
    if (isInPrologue(I))
      return;
    if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
      LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
      // We still need to set the shadow and origin to clean values.
      setShadow(&I, getCleanShadow(&I));
      setOrigin(&I, getCleanOrigin());
      return;
    }

    Instructions.push_back(&I);
  }

  /// Instrument LoadInst
  ///
  /// Loads the corresponding shadow and (optionally) origin.
  /// Optionally, checks that the load address is fully defined.
  void visitLoadInst(LoadInst &I) {
    assert(I.getType()->isSized() && "Load type must have size");
    assert(!I.getMetadata(LLVMContext::MD_nosanitize));
    NextNodeIRBuilder IRB(&I);
    Type *ShadowTy = getShadowTy(&I);
    Value *Addr = I.getPointerOperand();
    Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
    const Align Alignment = I.getAlign();
    if (PropagateShadow) {
      std::tie(ShadowPtr, OriginPtr) =
          getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
      setShadow(&I,
                IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
    } else {
      setShadow(&I, getCleanShadow(&I));
    }

    if (ClCheckAccessAddress)
      insertCheckShadowOf(I.getPointerOperand(), &I);

    if (I.isAtomic())
      I.setOrdering(addAcquireOrdering(I.getOrdering()));

    if (MS.TrackOrigins) {
      if (PropagateShadow) {
        const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
        setOrigin(
            &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
      } else {
        setOrigin(&I, getCleanOrigin());
      }
    }
  }

  /// Instrument StoreInst
  ///
  /// Stores the corresponding shadow and (optionally) origin.
  /// Optionally, checks that the store address is fully defined.
  void visitStoreInst(StoreInst &I) {
    StoreList.push_back(&I);
    if (ClCheckAccessAddress)
      insertCheckShadowOf(I.getPointerOperand(), &I);
  }

  void handleCASOrRMW(Instruction &I) {
    assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));

    IRBuilder<> IRB(&I);
    Value *Addr = I.getOperand(0);
    Value *Val = I.getOperand(1);
    Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
                                          /*isStore*/ true)
                           .first;

    if (ClCheckAccessAddress)
      insertCheckShadowOf(Addr, &I);

    // Only test the conditional argument of cmpxchg instruction.
    // The other argument can potentially be uninitialized, but we cannot
    // detect this situation reliably without possible false positives.
    if (isa<AtomicCmpXchgInst>(I))
      insertCheckShadowOf(Val, &I);

    IRB.CreateStore(getCleanShadow(Val), ShadowPtr);

    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitAtomicRMWInst(AtomicRMWInst &I) {
    handleCASOrRMW(I);
    I.setOrdering(addReleaseOrdering(I.getOrdering()));
  }

  void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
    handleCASOrRMW(I);
    I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
  }

  /// Generic handler to compute shadow for == and != comparisons.
  ///
  /// This function is used by handleEqualityComparison and visitSwitchInst.
  ///
  /// Sometimes the comparison result is known even if some of the bits of the
  /// arguments are not.
  Value *propagateEqualityComparison(IRBuilder<> &IRB, Value *A, Value *B,
                                     Value *Sa, Value *Sb) {
    assert(getShadowTy(A) == Sa->getType());
    assert(getShadowTy(B) == Sb->getType());

    // Get rid of pointers and vectors of pointers.
    // For ints (and vectors of ints), types of A and Sa match,
    // and this is a no-op.
    A = IRB.CreatePointerCast(A, Sa->getType());
    B = IRB.CreatePointerCast(B, Sb->getType());

    // A == B <==> (C = A^B) == 0
    // A != B <==> (C = A^B) != 0
    // Sc = Sa | Sb
    Value *C = IRB.CreateXor(A, B);
    Value *Sc = IRB.CreateOr(Sa, Sb);
    // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
    // Result is defined if one of the following is true
    //   * there is a defined 1 bit in C
    //   * C is fully defined
    // Si = !(C & ~Sc) && Sc
    Value *Zero = Constant::getNullValue(Sc->getType());
    Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
    Value *LHS = IRB.CreateICmpNE(Sc, Zero);
    Value *RHS =
        IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
    Value *Si = IRB.CreateAnd(LHS, RHS);
    Si->setName("_msprop_icmp");

    return Si;
  }

  // Instrument:
  //   switch i32 %Val, label %else [ i32 0, label %A
  //                                  i32 1, label %B
  //                                  i32 2, label %C ]
  //
  // Typically, the switch input value (%Val) is fully initialized.
  //
  // Sometimes the compiler may convert (icmp + br) into a switch statement.
  // MSan allows icmp eq/ne with partly initialized inputs to still result in a
  // fully initialized output, if there exists a bit that is initialized in
  // both inputs with a differing value. For compatibility, we support this in
  // the switch instrumentation as well. Note that this edge case only applies
  // if the switch input value does not match *any* of the cases (matching any
  // of the cases requires an exact, fully initialized match).
  //
  // ShadowCases =   0
  //               | propagateEqualityComparison(Val, 0)
  //               | propagateEqualityComparison(Val, 1)
  //               | propagateEqualityComparison(Val, 2)
  void visitSwitchInst(SwitchInst &SI) {
    IRBuilder<> IRB(&SI);

    Value *Val = SI.getCondition();
    Value *ShadowVal = getShadow(Val);
    // TODO: add fast path - if the condition is fully initialized, we know
    //       there is no UUM, without needing to consider the case values below.

    // Some code (e.g., AMDGPUGenMCCodeEmitter.inc) has tens of thousands of
    // cases. This results in an extremely long chained expression for MSan's
    // switch instrumentation, which can cause the JumpThreadingPass to have a
    // stack overflow or excessive runtime. We limit the number of cases
    // considered, with the tradeoff of niche false negatives.
    // TODO: figure out a better solution.
    int casesToConsider = ClSwitchPrecision;

    Value *ShadowCases = nullptr;
    for (auto Case : SI.cases()) {
      if (casesToConsider <= 0)
        break;

      Value *Comparator = Case.getCaseValue();
      // TODO: some simplification is possible when comparing multiple cases
      //       simultaneously.
      Value *ComparisonShadow = propagateEqualityComparison(
          IRB, Val, Comparator, ShadowVal, getShadow(Comparator));

      if (ShadowCases)
        ShadowCases = IRB.CreateOr(ShadowCases, ComparisonShadow);
      else
        ShadowCases = ComparisonShadow;

      casesToConsider--;
    }

    if (ShadowCases)
      insertCheckShadow(ShadowCases, getOrigin(Val), &SI);
  }

  // Vector manipulation.
  void visitExtractElementInst(ExtractElementInst &I) {
    insertCheckShadowOf(I.getOperand(1), &I);
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
                                           "_msprop"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitInsertElementInst(InsertElementInst &I) {
    insertCheckShadowOf(I.getOperand(2), &I);
    IRBuilder<> IRB(&I);
    auto *Shadow0 = getShadow(&I, 0);
    auto *Shadow1 = getShadow(&I, 1);
    setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
                                          "_msprop"));
    setOriginForNaryOp(I);
  }

  void visitShuffleVectorInst(ShuffleVectorInst &I) {
    IRBuilder<> IRB(&I);
    auto *Shadow0 = getShadow(&I, 0);
    auto *Shadow1 = getShadow(&I, 1);
    setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
                                          "_msprop"));
    setOriginForNaryOp(I);
  }

  // Casts.
  void visitSExtInst(SExtInst &I) {
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitZExtInst(ZExtInst &I) {
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitTruncInst(TruncInst &I) {
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitBitCastInst(BitCastInst &I) {
    // Special case: if this is the bitcast (there is exactly 1 allowed) between
    // a musttail call and a ret, don't instrument. New instructions are not
    // allowed after a musttail call.
    if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
      if (CI->isMustTailCall())
        return;
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitPtrToIntInst(PtrToIntInst &I) {
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
                                    "_msprop_ptrtoint"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitIntToPtrInst(IntToPtrInst &I) {
    IRBuilder<> IRB(&I);
    setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
                                    "_msprop_inttoptr"));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
  void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
  void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
  void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
  void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
  void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }

  /// Generic handler to compute shadow for bitwise AND.
  ///
  /// This is used by 'visitAnd' but also as a primitive for other handlers.
  ///
  /// This code is precise: it implements the rule that "And" of an initialized
  /// zero bit always results in an initialized value:
  //   1&1 => 1;  0&1 => 0;  p&1 => p;
  //   1&0 => 0;  0&0 => 0;  p&0 => 0;
  //   1&p => p;  0&p => 0;  p&p => p;
  //
  //   S = (S1 & S2) | (V1 & S2) | (S1 & V2)
  Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
                          Value *S2) {
    // Cast the operands to the shadow type first, so that the bitwise ops
    // below operate on matching integer types.
    if (V1->getType() != S1->getType()) {
      V1 = IRB.CreateIntCast(V1, S1->getType(), false);
      V2 = IRB.CreateIntCast(V2, S2->getType(), false);
    }

    Value *S1S2 = IRB.CreateAnd(S1, S2);
    Value *V1S2 = IRB.CreateAnd(V1, S2);
    Value *S1V2 = IRB.CreateAnd(S1, V2);

    return IRB.CreateOr({S1S2, V1S2, S1V2});
  }

  /// Handler for bitwise AND operator.
  void visitAnd(BinaryOperator &I) {
    IRBuilder<> IRB(&I);
    Value *V1 = I.getOperand(0);
    Value *V2 = I.getOperand(1);
    Value *S1 = getShadow(&I, 0);
    Value *S2 = getShadow(&I, 1);

    Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);

    setShadow(&I, OutShadow);
    setOriginForNaryOp(I);
  }

  void visitOr(BinaryOperator &I) {
    IRBuilder<> IRB(&I);
    // "Or" of 1 and a poisoned value results in unpoisoned value:
    //   1|1 => 1;  0|1 => 1;  p|1 => 1;
    //   1|0 => 1;  0|0 => 0;  p|0 => p;
    //   1|p => 1;  0|p => p;  p|p => p;
    //
    //   S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
    //
    // If the "disjoint OR" property is violated, the result is poison, and
    // hence the entire shadow is uninitialized:
    //   S = S | SignExt(V1 & V2 != 0)
    Value *S1 = getShadow(&I, 0);
    Value *S2 = getShadow(&I, 1);
    Value *V1 = I.getOperand(0);
    Value *V2 = I.getOperand(1);
    if (V1->getType() != S1->getType()) {
      V1 = IRB.CreateIntCast(V1, S1->getType(), false);
      V2 = IRB.CreateIntCast(V2, S2->getType(), false);
    }

    Value *NotV1 = IRB.CreateNot(V1);
    Value *NotV2 = IRB.CreateNot(V2);

    Value *S1S2 = IRB.CreateAnd(S1, S2);
    Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
    Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);

    Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});

    if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
      Value *V1V2 = IRB.CreateAnd(V1, V2);
      Value *DisjointOrShadow = IRB.CreateSExt(
          IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
      S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
    }

    setShadow(&I, S);
    setOriginForNaryOp(I);
  }

  /// Default propagation of shadow and/or origin.
  ///
  /// This class implements the general case of shadow propagation, used in all
  /// cases where we don't know and/or don't care about what the operation
  /// actually does. It converts all input shadow values to a common type
  /// (extending or truncating as necessary), and bitwise OR's them.
  ///
  /// This is much cheaper than inserting checks (i.e. requiring inputs to be
  /// fully initialized), and less prone to false positives.
  ///
  /// This class also implements the general case of origin propagation. For a
  /// Nary operation, result origin is set to the origin of an argument that is
  /// not entirely initialized. If there is more than one such argument, the
  /// rightmost of them is picked. It does not matter which one is picked if all
  /// arguments are initialized.
  template <bool CombineShadow> class Combiner {
    Value *Shadow = nullptr;
    Value *Origin = nullptr;
    IRBuilder<> &IRB;
    MemorySanitizerVisitor *MSV;

  public:
    Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
        : IRB(IRB), MSV(MSV) {}

    /// Add a pair of shadow and origin values to the mix.
    Combiner &Add(Value *OpShadow, Value *OpOrigin) {
      if (CombineShadow) {
        assert(OpShadow);
        if (!Shadow)
          Shadow = OpShadow;
        else {
          OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
          Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
        }
      }

      if (MSV->MS.TrackOrigins) {
        assert(OpOrigin);
        if (!Origin) {
          Origin = OpOrigin;
        } else {
          Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
          // No point in adding something that might result in 0 origin value.
          if (!ConstOrigin || !ConstOrigin->isNullValue()) {
            Value *Cond = MSV->convertToBool(OpShadow, IRB);
            Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
          }
        }
      }
      return *this;
    }

    /// Add an application value to the mix.
    Combiner &Add(Value *V) {
      Value *OpShadow = MSV->getShadow(V);
      Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
      return Add(OpShadow, OpOrigin);
    }

    /// Set the current combined values as the given instruction's shadow
    /// and origin.
    void Done(Instruction *I) {
      if (CombineShadow) {
        assert(Shadow);
        Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
        MSV->setShadow(I, Shadow);
      }
      if (MSV->MS.TrackOrigins) {
        assert(Origin);
        MSV->setOrigin(I, Origin);
      }
    }

    /// Store the current combined value at the specified origin
    /// location.
    void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
      if (MSV->MS.TrackOrigins) {
        assert(Origin);
        MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
      }
    }
  };

  using ShadowAndOriginCombiner = Combiner<true>;
  using OriginCombiner = Combiner<false>;

  /// Propagate origin for arbitrary operation.
  void setOriginForNaryOp(Instruction &I) {
    if (!MS.TrackOrigins)
      return;
    IRBuilder<> IRB(&I);
    OriginCombiner OC(this, IRB);
    for (Use &Op : I.operands())
      OC.Add(Op.get());
    OC.Done(&I);
  }

  size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
    assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
           "Vector of pointers is not a valid shadow type");
    return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
                                  Ty->getScalarSizeInBits()
                            : Ty->getPrimitiveSizeInBits();
  }

  /// Cast between two shadow types, extending or truncating as
  /// necessary.
  Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
                          bool Signed = false) {
    Type *srcTy = V->getType();
    if (srcTy == dstTy)
      return V;
    size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
    size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
    if (srcSizeInBits > 1 && dstSizeInBits == 1)
      return IRB.CreateICmpNE(V, getCleanShadow(V));

    if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
      return IRB.CreateIntCast(V, dstTy, Signed);
    if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
        cast<VectorType>(dstTy)->getElementCount() ==
            cast<VectorType>(srcTy)->getElementCount())
      return IRB.CreateIntCast(V, dstTy, Signed);
    Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
    Value *V2 =
        IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
    return IRB.CreateBitCast(V2, dstTy);
    // TODO: handle struct types.
  }

  /// Cast an application value to the type of its own shadow.
  Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
    Type *ShadowTy = getShadowTy(V);
    if (V->getType() == ShadowTy)
      return V;
    if (V->getType()->isPtrOrPtrVectorTy())
      return IRB.CreatePtrToInt(V, ShadowTy);
    else
      return IRB.CreateBitCast(V, ShadowTy);
  }

  /// Propagate shadow for arbitrary operation.
  void handleShadowOr(Instruction &I) {
    IRBuilder<> IRB(&I);
    ShadowAndOriginCombiner SC(this, IRB);
    for (Use &Op : I.operands())
      SC.Add(Op.get());
    SC.Done(&I);
  }

  // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
  // of elements.
  //
  // For example, suppose we have:
  //   VectorA:         <a0, a1, a2, a3, a4, a5>
  //   VectorB:         <b0, b1, b2, b3, b4, b5>
  //   ReductionFactor: 3
  //   Shards:          1
  // The output would be:
  //   <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
  //
  // If we have:
  //   VectorA:         <a0, a1, a2, a3, a4, a5, a6, a7>
  //   VectorB:         <b0, b1, b2, b3, b4, b5, b6, b7>
  //   ReductionFactor: 2
  //   Shards:          2
  // then a and b each have 2 "shards", resulting in the output being
  // interleaved:
  //   <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
  //
  // This is convenient for instrumenting horizontal add/sub.
  // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
  Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
                          unsigned Shards, Value *VectorA, Value *VectorB) {
    assert(isa<FixedVectorType>(VectorA->getType()));
    unsigned NumElems =
        cast<FixedVectorType>(VectorA->getType())->getNumElements();

    [[maybe_unused]] unsigned TotalNumElems = NumElems;
    if (VectorB) {
      assert(VectorA->getType() == VectorB->getType());
      TotalNumElems *= 2;
    }

    assert(NumElems % (ReductionFactor * Shards) == 0);

    Value *Or = nullptr;

    IRBuilder<> IRB(&I);
    for (unsigned i = 0; i < ReductionFactor; i++) {
      SmallVector<int, 16> Mask;

      for (unsigned j = 0; j < Shards; j++) {
        unsigned Offset = NumElems / Shards * j;

        for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
          Mask.push_back(Offset + X + i);

        if (VectorB) {
          for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
            Mask.push_back(NumElems + Offset + X + i);
        }
      }

      Value *Masked;
      if (VectorB)
        Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
      else
        Masked = IRB.CreateShuffleVector(VectorA, Mask);

      if (Or)
        Or = IRB.CreateOr(Or, Masked);
      else
        Or = Masked;
    }

    return Or;
  }

  /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
  /// fields.
  ///
  /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
  ///       <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
  void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
    assert(I.arg_size() == 1 || I.arg_size() == 2);

    assert(I.getType()->isVectorTy());
    assert(I.getArgOperand(0)->getType()->isVectorTy());

    [[maybe_unused]] FixedVectorType *ParamType =
        cast<FixedVectorType>(I.getArgOperand(0)->getType());
    assert((I.arg_size() != 2) ||
           (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
    [[maybe_unused]] FixedVectorType *ReturnType =
        cast<FixedVectorType>(I.getType());
    assert(ParamType->getNumElements() * I.arg_size() ==
           2 * ReturnType->getNumElements());

    IRBuilder<> IRB(&I);

    // Horizontal OR of shadow
    Value *FirstArgShadow = getShadow(&I, 0);
    Value *SecondArgShadow = nullptr;
    if (I.arg_size() == 2)
      SecondArgShadow = getShadow(&I, 1);

    Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
                                       FirstArgShadow, SecondArgShadow);

    OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));

    setShadow(&I, OrShadow);
    setOriginForNaryOp(I);
  }

  /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
  /// fields, with the parameters reinterpreted to have elements of a specified
  /// width. For example:
  ///     @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
  /// conceptually operates on
  ///     (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
  /// and can be handled with ReinterpretElemWidth == 16.
  void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
                                       int ReinterpretElemWidth) {
    assert(I.arg_size() == 1 || I.arg_size() == 2);

    assert(I.getType()->isVectorTy());
    assert(I.getArgOperand(0)->getType()->isVectorTy());

    FixedVectorType *ParamType =
        cast<FixedVectorType>(I.getArgOperand(0)->getType());
    assert((I.arg_size() != 2) ||
           (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));

    [[maybe_unused]] FixedVectorType *ReturnType =
        cast<FixedVectorType>(I.getType());
    assert(ParamType->getNumElements() * I.arg_size() ==
           2 * ReturnType->getNumElements());

    IRBuilder<> IRB(&I);

    FixedVectorType *ReinterpretShadowTy = nullptr;
    assert(isAligned(Align(ReinterpretElemWidth),
                     ParamType->getPrimitiveSizeInBits()));
    ReinterpretShadowTy = FixedVectorType::get(
        IRB.getIntNTy(ReinterpretElemWidth),
        ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);

    // Horizontal OR of shadow
    Value *FirstArgShadow = getShadow(&I, 0);
    FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);

    // If we had two parameters each with an odd number of elements, the total
    // number of elements is even, but we have never seen this in extant
    // instruction sets, so we enforce that each parameter must have an even
    // number of elements.
    assert(isAligned(
        Align(2),
        cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));

    Value *SecondArgShadow = nullptr;
    if (I.arg_size() == 2) {
      SecondArgShadow = getShadow(&I, 1);
      SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
    }

    Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
                                       FirstArgShadow, SecondArgShadow);

    OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));

    setShadow(&I, OrShadow);
    setOriginForNaryOp(I);
  }

  void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }

  // Handle multiplication by constant.
  //
  // Handle a special case of multiplication by constant that may have one or
  // more zeros in the lower bits. This makes the corresponding number of lower
  // bits of the result zero as well. We model it by shifting the other operand
  // shadow left by the required number of bits. Effectively, we transform
  // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
  // We use multiplication by 2**N instead of shift to cover the case of
  // multiplication by 0, which may occur in some elements of a vector operand.
  void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
                           Value *OtherArg) {
    Constant *ShadowMul;
    Type *Ty = ConstArg->getType();
    if (auto *VTy = dyn_cast<VectorType>(Ty)) {
      unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
      Type *EltTy = VTy->getElementType();
      SmallVector<Constant *, 16> Elements;
      for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
        if (ConstantInt *Elt =
                dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
          const APInt &V = Elt->getValue();
          APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
          Elements.push_back(ConstantInt::get(EltTy, V2));
        } else {
          Elements.push_back(ConstantInt::get(EltTy, 1));
        }
      }
      ShadowMul = ConstantVector::get(Elements);
    } else {
      if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
        const APInt &V = Elt->getValue();
        APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
        ShadowMul = ConstantInt::get(Ty, V2);
      } else {
        ShadowMul = ConstantInt::get(Ty, 1);
      }
    }

    IRBuilder<> IRB(&I);
    setShadow(&I,
              IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
    setOrigin(&I, getOrigin(OtherArg));
  }

  void visitMul(BinaryOperator &I) {
    Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
    Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
    if (constOp0 && !constOp1)
      handleMulByConstant(I, constOp0, I.getOperand(1));
    else if (constOp1 && !constOp0)
      handleMulByConstant(I, constOp1, I.getOperand(0));
    else
      handleShadowOr(I);
  }

  void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
  void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
  void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
  void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
  void visitSub(BinaryOperator &I) { handleShadowOr(I); }
  void visitXor(BinaryOperator &I) { handleShadowOr(I); }

  void handleIntegerDiv(Instruction &I) {
    IRBuilder<> IRB(&I);
    // Strict on the second argument.
    insertCheckShadowOf(I.getOperand(1), &I);
    setShadow(&I, getShadow(&I, 0));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
  void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
  void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
  void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }

  // Floating point division is side-effect free. We cannot require that the
  // divisor is fully initialized and must propagate shadow. See PR37523.
  void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
  void visitFRem(BinaryOperator &I) { handleShadowOr(I); }

  /// Instrument == and != comparisons.
  ///
  /// Sometimes the comparison result is known even if some of the bits of the
  /// arguments are not.
  void handleEqualityComparison(ICmpInst &I) {
    IRBuilder<> IRB(&I);
    Value *A = I.getOperand(0);
    Value *B = I.getOperand(1);
    Value *Sa = getShadow(A);
    Value *Sb = getShadow(B);

    Value *Si = propagateEqualityComparison(IRB, A, B, Sa, Sb);

    setShadow(&I, Si);
    setOriginForNaryOp(I);
  }

  /// Instrument relational comparisons.
  ///
  /// This function does exact shadow propagation for all relational
  /// comparisons of integers, pointers and vectors of those.
  /// FIXME: output seems suboptimal when one of the operands is a constant
  void handleRelationalComparisonExact(ICmpInst &I) {
    IRBuilder<> IRB(&I);
    Value *A = I.getOperand(0);
    Value *B = I.getOperand(1);
    Value *Sa = getShadow(A);
    Value *Sb = getShadow(B);

    // Get rid of pointers and vectors of pointers.
    // For ints (and vectors of ints), types of A and Sa match,
    // and this is a no-op.
    A = IRB.CreatePointerCast(A, Sa->getType());
    B = IRB.CreatePointerCast(B, Sb->getType());

    // Let [a0, a1] be the interval of possible values of A, taking into account
    // its undefined bits. Let [b0, b1] be the interval of possible values of B.
    // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
    bool IsSigned = I.isSigned();

    auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
      if (IsSigned) {
        // Sign-flip to map from signed range to unsigned range. Relation A vs B
        // should be preserved, if checked with `getUnsignedPredicate()`.
        // Relationship between Amin, Amax, Bmin, Bmax also will not be
        // affected, as they are created by effectively adding/subtracting from
        // A (or B) a value, derived from shadow, with no overflow, either
        // before or after sign flip.
        APInt MinVal =
            APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
        V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
      }
      // Minimize undefined bits.
      Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
      Value *Max = IRB.CreateOr(V, S);
      return std::make_pair(Min, Max);
    };

    auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
    auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
    Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
    Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);

    Value *Si = IRB.CreateXor(S1, S2);
    setShadow(&I, Si);
    setOriginForNaryOp(I);
  }

  /// Instrument signed relational comparisons.
  ///
  /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
  /// bit of the shadow. Everything else is delegated to handleShadowOr().
  void handleSignedRelationalComparison(ICmpInst &I) {
    Constant *constOp;
    Value *op = nullptr;
    CmpInst::Predicate pre;
    if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
      op = I.getOperand(0);
      pre = I.getPredicate();
    } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
      op = I.getOperand(1);
      pre = I.getSwappedPredicate();
    } else {
      handleShadowOr(I);
      return;
    }

    if ((constOp->isNullValue() &&
         (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
        (constOp->isAllOnesValue() &&
         (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
      IRBuilder<> IRB(&I);
      Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
                                        "_msprop_icmp_s");
      setShadow(&I, Shadow);
      setOrigin(&I, getOrigin(op));
    } else {
      handleShadowOr(I);
    }
  }

  void visitICmpInst(ICmpInst &I) {
    if (!ClHandleICmp) {
      handleShadowOr(I);
      return;
    }
    if (I.isEquality()) {
      handleEqualityComparison(I);
      return;
    }

    assert(I.isRelational());
    if (ClHandleICmpExact) {
      handleRelationalComparisonExact(I);
      return;
    }
    if (I.isSigned()) {
      handleSignedRelationalComparison(I);
      return;
    }

    assert(I.isUnsigned());
    if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
      handleRelationalComparisonExact(I);
      return;
    }

    handleShadowOr(I);
  }

3219 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3220
3221 void handleShift(BinaryOperator &I) {
3222 IRBuilder<> IRB(&I);
3223 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3224 // Otherwise perform the same shift on S1.
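    // For example, for "shl i8 %x, 2" with Shadow(%x) = 0b00000011 and a
    // fully-initialized shift amount, the result shadow is 0b00001100: the
    // uninitialized bits move together with the data bits. If any bit of the
    // shift amount is poisoned, S2Conv below is all-ones and the whole result
    // is poisoned.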
3225 Value *S1 = getShadow(I: &I, i: 0);
3226 Value *S2 = getShadow(I: &I, i: 1);
3227 Value *S2Conv =
3228 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: getCleanShadow(V: S2)), DestTy: S2->getType());
3229 Value *V2 = I.getOperand(i_nocapture: 1);
3230 Value *Shift = IRB.CreateBinOp(Opc: I.getOpcode(), LHS: S1, RHS: V2);
3231 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3232 setOriginForNaryOp(I);
3233 }
3234
3235 void visitShl(BinaryOperator &I) { handleShift(I); }
3236 void visitAShr(BinaryOperator &I) { handleShift(I); }
3237 void visitLShr(BinaryOperator &I) { handleShift(I); }
3238
3239 void handleFunnelShift(IntrinsicInst &I) {
3240 IRBuilder<> IRB(&I);
3241 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3242 // Otherwise perform the same shift on S0 and S1.
3243 Value *S0 = getShadow(I: &I, i: 0);
3244 Value *S1 = getShadow(I: &I, i: 1);
3245 Value *S2 = getShadow(I: &I, i: 2);
3246 Value *S2Conv =
3247 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: getCleanShadow(V: S2)), DestTy: S2->getType());
3248 Value *V2 = I.getOperand(i_nocapture: 2);
3249 Value *Shift = IRB.CreateIntrinsic(ID: I.getIntrinsicID(), Types: S2Conv->getType(),
3250 Args: {S0, S1, V2});
3251 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3252 setOriginForNaryOp(I);
3253 }
3254
3255 /// Instrument llvm.memmove
3256 ///
3257 /// At this point we don't know if llvm.memmove will be inlined or not.
3258 /// If we don't instrument it and it gets inlined,
3259 /// our interceptor will not kick in and we will lose the memmove.
  /// If we instrument the call here, but it does not get inlined,
  /// we will memmove the shadow twice, which is bad in the case
  /// of overlapping regions. So, we simply lower the intrinsic to a call.
3263 ///
  /// A similar situation exists for memcpy and memset.
3265 void visitMemMoveInst(MemMoveInst &I) {
3266 getShadow(V: I.getArgOperand(i: 1)); // Ensure shadow initialized
3267 IRBuilder<> IRB(&I);
3268 IRB.CreateCall(Callee: MS.MemmoveFn,
3269 Args: {I.getArgOperand(i: 0), I.getArgOperand(i: 1),
3270 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3271 I.eraseFromParent();
3272 }
3273
3274 /// Instrument memcpy
3275 ///
  /// Similar to memmove: avoid copying shadow twice. This is somewhat
  /// unfortunate as it may slow down small constant memcpys.
3278 /// FIXME: consider doing manual inline for small constant sizes and proper
3279 /// alignment.
3280 ///
3281 /// Note: This also handles memcpy.inline, which promises no calls to external
3282 /// functions as an optimization. However, with instrumentation enabled this
3283 /// is difficult to promise; additionally, we know that the MSan runtime
3284 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3285 /// instrumentation it's safe to turn memcpy.inline into a call to
3286 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3287 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3288 void visitMemCpyInst(MemCpyInst &I) {
3289 getShadow(V: I.getArgOperand(i: 1)); // Ensure shadow initialized
3290 IRBuilder<> IRB(&I);
3291 IRB.CreateCall(Callee: MS.MemcpyFn,
3292 Args: {I.getArgOperand(i: 0), I.getArgOperand(i: 1),
3293 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3294 I.eraseFromParent();
3295 }
3296
3297 // Same as memcpy.
3298 void visitMemSetInst(MemSetInst &I) {
3299 IRBuilder<> IRB(&I);
3300 IRB.CreateCall(
3301 Callee: MS.MemsetFn,
3302 Args: {I.getArgOperand(i: 0),
3303 IRB.CreateIntCast(V: I.getArgOperand(i: 1), DestTy: IRB.getInt32Ty(), isSigned: false),
3304 IRB.CreateIntCast(V: I.getArgOperand(i: 2), DestTy: MS.IntptrTy, isSigned: false)});
3305 I.eraseFromParent();
3306 }
3307
3308 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3309
3310 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3311
3312 /// Handle vector store-like intrinsics.
3313 ///
3314 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3315 /// has 1 pointer argument and 1 vector argument, returns void.
3316 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3317 assert(I.arg_size() == 2);
3318
3319 IRBuilder<> IRB(&I);
3320 Value *Addr = I.getArgOperand(i: 0);
3321 Value *Shadow = getShadow(I: &I, i: 1);
3322 Value *ShadowPtr, *OriginPtr;
3323
    // We don't know the pointer alignment (could be unaligned SSE store!).
    // We have to assume the worst case.
3326 std::tie(args&: ShadowPtr, args&: OriginPtr) = getShadowOriginPtr(
3327 Addr, IRB, ShadowTy: Shadow->getType(), Alignment: Align(1), /*isStore*/ true);
3328 IRB.CreateAlignedStore(Val: Shadow, Ptr: ShadowPtr, Align: Align(1));
3329
3330 if (ClCheckAccessAddress)
3331 insertCheckShadowOf(Val: Addr, OrigIns: &I);
3332
3333 // FIXME: factor out common code from materializeStores
3334 if (MS.TrackOrigins)
3335 IRB.CreateStore(Val: getOrigin(I: &I, i: 1), Ptr: OriginPtr);
3336 return true;
3337 }
3338
3339 /// Handle vector load-like intrinsics.
3340 ///
3341 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3342 /// has 1 pointer argument, returns a vector.
3343 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3344 assert(I.arg_size() == 1);
3345
3346 IRBuilder<> IRB(&I);
3347 Value *Addr = I.getArgOperand(i: 0);
3348
3349 Type *ShadowTy = getShadowTy(V: &I);
3350 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3351 if (PropagateShadow) {
      // We don't know the pointer alignment (could be unaligned SSE load!).
      // We have to assume the worst case.
3354 const Align Alignment = Align(1);
3355 std::tie(args&: ShadowPtr, args&: OriginPtr) =
3356 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3357 setShadow(V: &I,
3358 SV: IRB.CreateAlignedLoad(Ty: ShadowTy, Ptr: ShadowPtr, Align: Alignment, Name: "_msld"));
3359 } else {
3360 setShadow(V: &I, SV: getCleanShadow(V: &I));
3361 }
3362
3363 if (ClCheckAccessAddress)
3364 insertCheckShadowOf(Val: Addr, OrigIns: &I);
3365
3366 if (MS.TrackOrigins) {
3367 if (PropagateShadow)
3368 setOrigin(V: &I, Origin: IRB.CreateLoad(Ty: MS.OriginTy, Ptr: OriginPtr));
3369 else
3370 setOrigin(V: &I, Origin: getCleanOrigin());
3371 }
3372 return true;
3373 }
3374
3375 /// Handle (SIMD arithmetic)-like intrinsics.
3376 ///
3377 /// Instrument intrinsics with any number of arguments of the same type [*],
3378 /// equal to the return type, plus a specified number of trailing flags of
3379 /// any type.
3380 ///
3381 /// [*] The type should be simple (no aggregates or pointers; vectors are
3382 /// fine).
3383 ///
3384 /// Caller guarantees that this intrinsic does not access memory.
3385 ///
  /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
  /// by this handler. See horizontalReduce().
3388 ///
3389 /// TODO: permutation intrinsics are also often incorrectly matched.
3390 [[maybe_unused]] bool
3391 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3392 unsigned int trailingFlags) {
3393 Type *RetTy = I.getType();
3394 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3395 return false;
3396
3397 unsigned NumArgOperands = I.arg_size();
3398 assert(NumArgOperands >= trailingFlags);
3399 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3400 Type *Ty = I.getArgOperand(i)->getType();
3401 if (Ty != RetTy)
3402 return false;
3403 }
3404
3405 IRBuilder<> IRB(&I);
3406 ShadowAndOriginCombiner SC(this, IRB);
3407 for (unsigned i = 0; i < NumArgOperands; ++i)
3408 SC.Add(V: I.getArgOperand(i));
3409 SC.Done(I: &I);
3410
3411 return true;
3412 }
3413
3414 /// Returns whether it was able to heuristically instrument unknown
3415 /// intrinsics.
3416 ///
3417 /// The main purpose of this code is to do something reasonable with all
3418 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3419 /// We recognize several classes of intrinsics by their argument types and
3420 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3421 /// sure that we know what the intrinsic does.
3422 ///
3423 /// We special-case intrinsics where this approach fails. See llvm.bswap
3424 /// handling as an example of that.
3425 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3426 unsigned NumArgOperands = I.arg_size();
3427 if (NumArgOperands == 0)
3428 return false;
3429
3430 if (NumArgOperands == 2 && I.getArgOperand(i: 0)->getType()->isPointerTy() &&
3431 I.getArgOperand(i: 1)->getType()->isVectorTy() &&
3432 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3433 // This looks like a vector store.
3434 return handleVectorStoreIntrinsic(I);
3435 }
3436
3437 if (NumArgOperands == 1 && I.getArgOperand(i: 0)->getType()->isPointerTy() &&
3438 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3439 // This looks like a vector load.
3440 return handleVectorLoadIntrinsic(I);
3441 }
3442
3443 if (I.doesNotAccessMemory())
3444 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3445 return true;
3446
3447 // FIXME: detect and handle SSE maskstore/maskload?
3448 // Some cases are now handled in handleAVXMasked{Load,Store}.
3449 return false;
3450 }
3451
3452 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3453 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3454 if (ClDumpHeuristicInstructions)
3455 dumpInst(I);
3456
3457 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3458 << "\n");
3459 return true;
3460 } else
3461 return false;
3462 }
3463
3464 void handleInvariantGroup(IntrinsicInst &I) {
3465 setShadow(V: &I, SV: getShadow(I: &I, i: 0));
3466 setOrigin(V: &I, Origin: getOrigin(I: &I, i: 0));
3467 }
3468
3469 void handleLifetimeStart(IntrinsicInst &I) {
3470 if (!PoisonStack)
3471 return;
3472 AllocaInst *AI = dyn_cast<AllocaInst>(Val: I.getArgOperand(i: 0));
3473 if (AI)
3474 LifetimeStartList.push_back(Elt: std::make_pair(x: &I, y&: AI));
3475 }
3476
3477 void handleBswap(IntrinsicInst &I) {
3478 IRBuilder<> IRB(&I);
3479 Value *Op = I.getArgOperand(i: 0);
3480 Type *OpType = Op->getType();
3481 setShadow(V: &I, SV: IRB.CreateIntrinsic(ID: Intrinsic::bswap, Types: ArrayRef(&OpType, 1),
3482 Args: getShadow(V: Op)));
3483 setOrigin(V: &I, Origin: getOrigin(V: Op));
3484 }
3485
3486 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3487 // and a 1. If the input is all zero, it is fully initialized iff
3488 // !is_zero_poison.
3489 //
3490 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3491 // concrete value 0/1, and ? is an uninitialized bit:
3492 // - 0001 0??? is fully initialized
3493 // - 000? ???? is fully uninitialized (*)
3494 // - ???? ???? is fully uninitialized
3495 // - 0000 0000 is fully uninitialized if is_zero_poison,
3496 // fully initialized otherwise
3497 //
3498 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3499 // only need to poison 4 bits.
3500 //
3501 // OutputShadow =
3502 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3503 // || (is_zero_poison && AllZeroSrc)
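  //
  // For example (ctlz, 8 bits): Src = 0b00010101 with SrcShadow = 0b00000111
  // gives ConcreteZerosCount = ctlz(Src) = 3 and ShadowZerosCount =
  // ctlz(SrcShadow) = 5; since 3 < 5, the leading-zero count does not depend
  // on the poisoned bits, and OutputShadow is 0 (fully initialized).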
3504 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3505 IRBuilder<> IRB(&I);
3506 Value *Src = I.getArgOperand(i: 0);
3507 Value *SrcShadow = getShadow(V: Src);
3508
3509 Value *False = IRB.getInt1(V: false);
3510 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3511 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {Src, /*is_zero_poison=*/False});
3512 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3513 RetTy: I.getType(), ID: I.getIntrinsicID(), Args: {SrcShadow, /*is_zero_poison=*/False});
3514
3515 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3516 LHS: ConcreteZerosCount, RHS: ShadowZerosCount, Name: "_mscz_cmp_zeros");
3517
3518 Value *NotAllZeroShadow =
3519 IRB.CreateIsNotNull(Arg: SrcShadow, Name: "_mscz_shadow_not_null");
3520 Value *OutputShadow =
3521 IRB.CreateAnd(LHS: CompareConcreteZeros, RHS: NotAllZeroShadow, Name: "_mscz_main");
3522
3523 // If zero poison is requested, mix in with the shadow
3524 Constant *IsZeroPoison = cast<Constant>(Val: I.getOperand(i_nocapture: 1));
3525 if (!IsZeroPoison->isNullValue()) {
3526 Value *BoolZeroPoison = IRB.CreateIsNull(Arg: Src, Name: "_mscz_bzp");
3527 OutputShadow = IRB.CreateOr(LHS: OutputShadow, RHS: BoolZeroPoison, Name: "_mscz_bs");
3528 }
3529
3530 OutputShadow = IRB.CreateSExt(V: OutputShadow, DestTy: getShadowTy(V: Src), Name: "_mscz_os");
3531
3532 setShadow(V: &I, SV: OutputShadow);
3533 setOriginForNaryOp(I);
3534 }
3535
3536 /// Handle Arm NEON vector convert intrinsics.
3537 ///
3538 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3539 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3540 ///
3541 /// For conversions to or from fixed-point, there is a trailing argument to
3542 /// indicate the fixed-point precision:
3543 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3544 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3545 ///
3546 /// For x86 SSE vector convert intrinsics, see
3547 /// handleSSEVectorConvertIntrinsic().
3548 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3549 if (FixedPoint)
3550 assert(I.arg_size() == 2);
3551 else
3552 assert(I.arg_size() == 1);
3553
3554 IRBuilder<> IRB(&I);
3555 Value *S0 = getShadow(I: &I, i: 0);
3556
3557 if (FixedPoint) {
3558 Value *Precision = I.getOperand(i_nocapture: 1);
3559 insertCheckShadowOf(Val: Precision, OrigIns: &I);
3560 }
3561
3562 /// For scalars:
3563 /// Since they are converting from floating-point to integer, the output is
3564 /// - fully uninitialized if *any* bit of the input is uninitialized
    /// - fully initialized if all bits of the input are initialized
3566 /// We apply the same principle on a per-field basis for vectors.
3567 Value *OutShadow = IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S0, RHS: getCleanShadow(V: S0)),
3568 DestTy: getShadowTy(V: &I));
3569 setShadow(V: &I, SV: OutShadow);
3570 setOriginForNaryOp(I);
3571 }
3572
3573 /// Some instructions have additional zero-elements in the return type
3574 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3575 ///
  /// This function returns a vector type with the same number of elements
  /// as the input, but the same per-element width as the return value,
  /// e.g., <8 x i8>.
3579 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3580 assert(isa<FixedVectorType>(getShadowTy(&I)));
3581 FixedVectorType *ShadowType = cast<FixedVectorType>(Val: getShadowTy(V: &I));
3582
3583 // TODO: generalize beyond 2x?
3584 if (ShadowType->getElementCount() ==
3585 cast<VectorType>(Val: Src->getType())->getElementCount() * 2)
3586 ShadowType = FixedVectorType::getHalfElementsVectorType(VTy: ShadowType);
3587
3588 assert(ShadowType->getElementCount() ==
3589 cast<VectorType>(Src->getType())->getElementCount());
3590
3591 return ShadowType;
3592 }
3593
3594 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3595 /// to match the length of the shadow for the instruction.
3596 /// If scalar types of the vectors are different, it will use the type of the
3597 /// input vector.
3598 /// This is more type-safe than CreateShadowCast().
3599 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3600 IRBuilder<> IRB(&I);
3601 assert(isa<FixedVectorType>(Shadow->getType()));
3602 assert(isa<FixedVectorType>(I.getType()));
3603
3604 Value *FullShadow = getCleanShadow(V: &I);
3605 unsigned ShadowNumElems =
3606 cast<FixedVectorType>(Val: Shadow->getType())->getNumElements();
3607 unsigned FullShadowNumElems =
3608 cast<FixedVectorType>(Val: FullShadow->getType())->getNumElements();
3609
3610 assert((ShadowNumElems == FullShadowNumElems) ||
3611 (ShadowNumElems * 2 == FullShadowNumElems));
3612
3613 if (ShadowNumElems == FullShadowNumElems) {
3614 FullShadow = Shadow;
3615 } else {
3616 // TODO: generalize beyond 2x?
3617 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3618 std::iota(first: ShadowMask.begin(), last: ShadowMask.end(), value: 0);
3619
3620 // Append zeros
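      // For example, widening a <4 x i8> shadow to <8 x i8> shuffles
      // {Shadow, zeroinitializer} with mask <0,1,2,3,4,5,6,7>: indices 0..3
      // select the original shadow lanes and indices 4..7 select zeros from
      // the clean shadow, matching the zero-initialized extra elements of the
      // result.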
3621 FullShadow =
3622 IRB.CreateShuffleVector(V1: Shadow, V2: getCleanShadow(V: Shadow), Mask: ShadowMask);
3623 }
3624
3625 return FullShadow;
3626 }
3627
3628 /// Handle x86 SSE vector conversion.
3629 ///
3630 /// e.g., single-precision to half-precision conversion:
3631 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3632 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3633 ///
3634 /// floating-point to integer:
3635 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3636 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3637 ///
3638 /// Note: if the output has more elements, they are zero-initialized (and
3639 /// therefore the shadow will also be initialized).
3640 ///
3641 /// This differs from handleSSEVectorConvertIntrinsic() because it
3642 /// propagates uninitialized shadow (instead of checking the shadow).
3643 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3644 bool HasRoundingMode) {
3645 if (HasRoundingMode) {
3646 assert(I.arg_size() == 2);
3647 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(i: 1);
3648 assert(RoundingMode->getType()->isIntegerTy());
3649 } else {
3650 assert(I.arg_size() == 1);
3651 }
3652
3653 Value *Src = I.getArgOperand(i: 0);
3654 assert(Src->getType()->isVectorTy());
3655
3656 // The return type might have more elements than the input.
3657 // Temporarily shrink the return type's number of elements.
3658 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3659
3660 IRBuilder<> IRB(&I);
3661 Value *S0 = getShadow(I: &I, i: 0);
3662
3663 /// For scalars:
3664 /// Since they are converting to and/or from floating-point, the output is:
3665 /// - fully uninitialized if *any* bit of the input is uninitialized
    /// - fully initialized if all bits of the input are initialized
3667 /// We apply the same principle on a per-field basis for vectors.
3668 Value *Shadow =
3669 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S0, RHS: getCleanShadow(V: S0)), DestTy: ShadowType);
3670
3671 // The return type might have more elements than the input.
3672 // Extend the return type back to its original width if necessary.
3673 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3674
3675 setShadow(V: &I, SV: FullShadow);
3676 setOriginForNaryOp(I);
3677 }
3678
3679 // Instrument x86 SSE vector convert intrinsic.
3680 //
3681 // This function instruments intrinsics like cvtsi2ss:
3682 // %Out = int_xxx_cvtyyy(%ConvertOp)
3683 // or
3684 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
  // The intrinsic converts \p NumUsedElements elements of \p ConvertOp to the
  // same number of \p Out elements, and (if it has 2 arguments) copies the
  // rest of the elements from \p CopyOp.
3688 // In most cases conversion involves floating-point value which may trigger a
3689 // hardware exception when not fully initialized. For this reason we require
3690 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3691 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3692 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3693 // return a fully initialized value.
3694 //
3695 // For Arm NEON vector convert intrinsics, see
3696 // handleNEONVectorConvertIntrinsic().
3697 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3698 bool HasRoundingMode = false) {
3699 IRBuilder<> IRB(&I);
3700 Value *CopyOp, *ConvertOp;
3701
3702 assert((!HasRoundingMode ||
3703 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3704 "Invalid rounding mode");
3705
3706 switch (I.arg_size() - HasRoundingMode) {
3707 case 2:
3708 CopyOp = I.getArgOperand(i: 0);
3709 ConvertOp = I.getArgOperand(i: 1);
3710 break;
3711 case 1:
3712 ConvertOp = I.getArgOperand(i: 0);
3713 CopyOp = nullptr;
3714 break;
3715 default:
3716 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3717 }
3718
3719 // The first *NumUsedElements* elements of ConvertOp are converted to the
3720 // same number of output elements. The rest of the output is copied from
3721 // CopyOp, or (if not available) filled with zeroes.
3722 // Combine shadow for elements of ConvertOp that are used in this operation,
3723 // and insert a check.
3724 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3725 // int->any conversion.
3726 Value *ConvertShadow = getShadow(V: ConvertOp);
3727 Value *AggShadow = nullptr;
3728 if (ConvertOp->getType()->isVectorTy()) {
3729 AggShadow = IRB.CreateExtractElement(
3730 Vec: ConvertShadow, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: 0));
3731 for (int i = 1; i < NumUsedElements; ++i) {
3732 Value *MoreShadow = IRB.CreateExtractElement(
3733 Vec: ConvertShadow, Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
3734 AggShadow = IRB.CreateOr(LHS: AggShadow, RHS: MoreShadow);
3735 }
3736 } else {
3737 AggShadow = ConvertShadow;
3738 }
3739 assert(AggShadow->getType()->isIntegerTy());
3740 insertCheckShadow(Shadow: AggShadow, Origin: getOrigin(V: ConvertOp), OrigIns: &I);
3741
3742 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3743 // ConvertOp.
3744 if (CopyOp) {
3745 assert(CopyOp->getType() == I.getType());
3746 assert(CopyOp->getType()->isVectorTy());
3747 Value *ResultShadow = getShadow(V: CopyOp);
3748 Type *EltTy = cast<VectorType>(Val: ResultShadow->getType())->getElementType();
3749 for (int i = 0; i < NumUsedElements; ++i) {
3750 ResultShadow = IRB.CreateInsertElement(
3751 Vec: ResultShadow, NewElt: ConstantInt::getNullValue(Ty: EltTy),
3752 Idx: ConstantInt::get(Ty: IRB.getInt32Ty(), V: i));
3753 }
3754 setShadow(V: &I, SV: ResultShadow);
3755 setOrigin(V: &I, Origin: getOrigin(V: CopyOp));
3756 } else {
3757 setShadow(V: &I, SV: getCleanShadow(V: &I));
3758 setOrigin(V: &I, Origin: getCleanOrigin());
3759 }
3760 }
3761
  // Given a scalar or vector, extract the lower 64 bits (or fewer), and
  // return all zeroes if it is zero, and all ones otherwise.
3764 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3765 if (S->getType()->isVectorTy())
3766 S = CreateShadowCast(IRB, V: S, dstTy: IRB.getInt64Ty(), /* Signed */ true);
3767 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3768 Value *S2 = IRB.CreateICmpNE(LHS: S, RHS: getCleanShadow(V: S));
3769 return CreateShadowCast(IRB, V: S2, dstTy: T, /* Signed */ true);
3770 }
3771
3772 // Given a vector, extract its first element, and return all
3773 // zeroes if it is zero, and all ones otherwise.
3774 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3775 Value *S1 = IRB.CreateExtractElement(Vec: S, Idx: (uint64_t)0);
3776 Value *S2 = IRB.CreateICmpNE(LHS: S1, RHS: getCleanShadow(V: S1));
3777 return CreateShadowCast(IRB, V: S2, dstTy: T, /* Signed */ true);
3778 }
3779
3780 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3781 Type *T = S->getType();
3782 assert(T->isVectorTy());
3783 Value *S2 = IRB.CreateICmpNE(LHS: S, RHS: getCleanShadow(V: S));
3784 return IRB.CreateSExt(V: S2, DestTy: T);
3785 }
3786
3787 // Instrument vector shift intrinsic.
3788 //
3789 // This function instruments intrinsics like int_x86_avx2_psll_w.
3790 // Intrinsic shifts %In by %ShiftSize bits.
3791 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3792 // size, and the rest is ignored. Behavior is defined even if shift size is
3793 // greater than register (or field) width.
3794 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3795 assert(I.arg_size() == 2);
3796 IRBuilder<> IRB(&I);
3797 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3798 // Otherwise perform the same shift on S1.
3799 Value *S1 = getShadow(I: &I, i: 0);
3800 Value *S2 = getShadow(I: &I, i: 1);
3801 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S: S2)
3802 : Lower64ShadowExtend(IRB, S: S2, T: getShadowTy(V: &I));
3803 Value *V1 = I.getOperand(i_nocapture: 0);
3804 Value *V2 = I.getOperand(i_nocapture: 1);
3805 Value *Shift = IRB.CreateCall(FTy: I.getFunctionType(), Callee: I.getCalledOperand(),
3806 Args: {IRB.CreateBitCast(V: S1, DestTy: V1->getType()), V2});
3807 Shift = IRB.CreateBitCast(V: Shift, DestTy: getShadowTy(V: &I));
3808 setShadow(V: &I, SV: IRB.CreateOr(LHS: Shift, RHS: S2Conv));
3809 setOriginForNaryOp(I);
3810 }
3811
3812 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3813 // vectors.
3814 Type *getMMXVectorTy(unsigned EltSizeInBits,
3815 unsigned X86_MMXSizeInBits = 64) {
3816 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3817 "Illegal MMX vector element size");
3818 return FixedVectorType::get(ElementType: IntegerType::get(C&: *MS.C, NumBits: EltSizeInBits),
3819 NumElts: X86_MMXSizeInBits / EltSizeInBits);
3820 }
3821
3822 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3823 // intrinsic.
3824 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3825 switch (id) {
3826 case Intrinsic::x86_sse2_packsswb_128:
3827 case Intrinsic::x86_sse2_packuswb_128:
3828 return Intrinsic::x86_sse2_packsswb_128;
3829
3830 case Intrinsic::x86_sse2_packssdw_128:
3831 case Intrinsic::x86_sse41_packusdw:
3832 return Intrinsic::x86_sse2_packssdw_128;
3833
3834 case Intrinsic::x86_avx2_packsswb:
3835 case Intrinsic::x86_avx2_packuswb:
3836 return Intrinsic::x86_avx2_packsswb;
3837
3838 case Intrinsic::x86_avx2_packssdw:
3839 case Intrinsic::x86_avx2_packusdw:
3840 return Intrinsic::x86_avx2_packssdw;
3841
3842 case Intrinsic::x86_mmx_packsswb:
3843 case Intrinsic::x86_mmx_packuswb:
3844 return Intrinsic::x86_mmx_packsswb;
3845
3846 case Intrinsic::x86_mmx_packssdw:
3847 return Intrinsic::x86_mmx_packssdw;
3848
3849 case Intrinsic::x86_avx512_packssdw_512:
3850 case Intrinsic::x86_avx512_packusdw_512:
3851 return Intrinsic::x86_avx512_packssdw_512;
3852
3853 case Intrinsic::x86_avx512_packsswb_512:
3854 case Intrinsic::x86_avx512_packuswb_512:
3855 return Intrinsic::x86_avx512_packsswb_512;
3856
3857 default:
3858 llvm_unreachable("unexpected intrinsic id");
3859 }
3860 }
3861
3862 // Instrument vector pack intrinsic.
3863 //
3864 // This function instruments intrinsics like x86_mmx_packsswb, that
3865 // packs elements of 2 input vectors into half as many bits with saturation.
3866 // Shadow is propagated with the signed variant of the same intrinsic applied
3867 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
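  // Since each sext'ed shadow element is 0 or -1, and signed saturation maps
  // 0 -> 0 and -1 -> -1, an output element is fully poisoned iff any bit of
  // the corresponding input element is poisoned.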
3868 // MMXEltSizeInBits is used only for x86mmx arguments.
3869 //
3870 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3871 void handleVectorPackIntrinsic(IntrinsicInst &I,
3872 unsigned MMXEltSizeInBits = 0) {
3873 assert(I.arg_size() == 2);
3874 IRBuilder<> IRB(&I);
3875 Value *S1 = getShadow(I: &I, i: 0);
3876 Value *S2 = getShadow(I: &I, i: 1);
3877 assert(S1->getType()->isVectorTy());
3878
3879 // SExt and ICmpNE below must apply to individual elements of input vectors.
3880 // In case of x86mmx arguments, cast them to appropriate vector types and
3881 // back.
3882 Type *T =
3883 MMXEltSizeInBits ? getMMXVectorTy(EltSizeInBits: MMXEltSizeInBits) : S1->getType();
3884 if (MMXEltSizeInBits) {
3885 S1 = IRB.CreateBitCast(V: S1, DestTy: T);
3886 S2 = IRB.CreateBitCast(V: S2, DestTy: T);
3887 }
3888 Value *S1_ext =
3889 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S1, RHS: Constant::getNullValue(Ty: T)), DestTy: T);
3890 Value *S2_ext =
3891 IRB.CreateSExt(V: IRB.CreateICmpNE(LHS: S2, RHS: Constant::getNullValue(Ty: T)), DestTy: T);
3892 if (MMXEltSizeInBits) {
3893 S1_ext = IRB.CreateBitCast(V: S1_ext, DestTy: getMMXVectorTy(EltSizeInBits: 64));
3894 S2_ext = IRB.CreateBitCast(V: S2_ext, DestTy: getMMXVectorTy(EltSizeInBits: 64));
3895 }
3896
3897 Value *S = IRB.CreateIntrinsic(ID: getSignedPackIntrinsic(id: I.getIntrinsicID()),
3898 Args: {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3899 Name: "_msprop_vector_pack");
3900 if (MMXEltSizeInBits)
3901 S = IRB.CreateBitCast(V: S, DestTy: getShadowTy(V: &I));
3902 setShadow(V: &I, SV: S);
3903 setOriginForNaryOp(I);
3904 }
3905
3906 // Convert `Mask` into `<n x i1>`.
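  // For example, Width = 4 and Mask = 0b0101 yield <i1 1, i1 0, i1 1, i1 0>:
  // bit i of `Mask` becomes element i of the result.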
3907 Constant *createDppMask(unsigned Width, unsigned Mask) {
3908 SmallVector<Constant *, 4> R(Width);
3909 for (auto &M : R) {
3910 M = ConstantInt::getBool(Context&: F.getContext(), V: Mask & 1);
3911 Mask >>= 1;
3912 }
3913 return ConstantVector::get(V: R);
3914 }
3915
  // Calculate the output shadow as an array of booleans `<n x i1>`, assuming
  // that if any arg is poisoned, the entire dot product is poisoned.
3918 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3919 unsigned DstMask) {
3920 const unsigned Width =
3921 cast<FixedVectorType>(Val: S->getType())->getNumElements();
3922
3923 S = IRB.CreateSelect(C: createDppMask(Width, Mask: SrcMask), True: S,
3924 False: Constant::getNullValue(Ty: S->getType()));
3925 Value *SElem = IRB.CreateOrReduce(Src: S);
3926 Value *IsClean = IRB.CreateIsNull(Arg: SElem, Name: "_msdpp");
3927 Value *DstMaskV = createDppMask(Width, Mask: DstMask);
3928
3929 return IRB.CreateSelect(
3930 C: IsClean, True: Constant::getNullValue(Ty: DstMaskV->getType()), False: DstMaskV);
3931 }
3932
3933 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3934 //
  // The 2- and 4-element versions produce a single dot-product scalar and put
  // it into the elements of the output vector selected by the 4 lowest bits
  // of the mask. The top 4 bits of the mask control which elements of the
  // input to use for the dot product.
3939 //
  // The 8-element version's mask still has only 4 bits for the input and 4
  // bits for the output. According to the spec, it simply operates as the
  // 4-element version on the first 4 elements of the inputs and output, and
  // then on the last 4 elements.
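  //
  // For example, a mask of 0x51 on the 4-element version (SrcMask = 0b0101,
  // DstMask = 0b0001) uses input elements 0 and 2 for the dot product and
  // writes the result only to output element 0; the instrumentation below
  // poisons exactly the selected output elements if any selected input
  // element is poisoned.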
void handleDppIntrinsic(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);

  Value *S0 = getShadow(&I, 0);
  Value *S1 = getShadow(&I, 1);
  Value *S = IRB.CreateOr(S0, S1);

  const unsigned Width =
      cast<FixedVectorType>(S->getType())->getNumElements();
  assert(Width == 2 || Width == 4 || Width == 8);

  const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
  const unsigned SrcMask = Mask >> 4;
  const unsigned DstMask = Mask & 0xf;

  // Calculate shadow as `<n x i1>`.
  Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
  if (Width == 8) {
    // The first 4 elements of the shadow are already calculated.
    // `findDppPoisonedOutput` operates on 32-bit masks, so we can just shift
    // the masks and repeat.
    SI1 = IRB.CreateOr(
        SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
  }
  // Extend to the real size of the shadow, poisoning either all or none of
  // the bits of an element.
  S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");

  setShadow(&I, S);
  setOriginForNaryOp(I);
}
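// A hypothetical walk-through of the mask handling above: for a 4-element
// dpps call with Mask == 0x71, SrcMask == 0x7 selects input elements 0..2
// for the dot product and DstMask == 0x1 writes the scalar result to
// element 0. If any selected element of either input shadow is poisoned,
// findDppPoisonedOutput returns <i1 1, i1 0, i1 0, i1 0>, which the final
// sign-extension widens into a fully poisoned output element 0.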
3974
Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
  C = CreateAppToShadowCast(IRB, C);
  FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
  unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
  C = IRB.CreateAShr(C, ElSize - 1);
  FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
  return IRB.CreateTrunc(C, FVT);
}

// `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
void handleBlendvIntrinsic(IntrinsicInst &I) {
  Value *C = I.getOperand(2);
  Value *T = I.getOperand(1);
  Value *F = I.getOperand(0);

  Value *Sc = getShadow(&I, 2);
  Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;

  {
    IRBuilder<> IRB(&I);
    // Extract top bit from condition and its shadow.
    C = convertBlendvToSelectMask(IRB, C);
    Sc = convertBlendvToSelectMask(IRB, Sc);

    setShadow(C, Sc);
    setOrigin(C, Oc);
  }

  handleSelectLikeInst(I, C, T, F);
}

// Instrument sum-of-absolute-differences intrinsic.
void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
  const unsigned SignificantBitsPerResultElement = 16;
  Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
  unsigned ZeroBitsPerResultElement =
      ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;

  IRBuilder<> IRB(&I);
  auto *Shadow0 = getShadow(&I, 0);
  auto *Shadow1 = getShadow(&I, 1);
  Value *S = IRB.CreateOr(Shadow0, Shadow1);
  S = IRB.CreateBitCast(S, ResTy);
  S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
                     ResTy);
  S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
  S = IRB.CreateBitCast(S, getShadowTy(&I));
  setShadow(&I, S);
  setOriginForNaryOp(I);
}
4025
// Instrument dot-product / multiply-add(-accumulate)? intrinsics.
//
// e.g., Two operands:
//       <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
//
//       Two operands which require an EltSizeInBits override:
//       <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
//
//       Three operands:
//       <4 x i32> @llvm.x86.avx512.vpdpbusd.128
//           (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
//       <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
//           (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
//       (these are equivalent to multiply-add on %a and %b, followed by
//       adding/"accumulating" %s. "Accumulation" stores the result in one
//       of the source registers, but this accumulate vs. add distinction
//       is lost when dealing with LLVM intrinsics.)
//
// ZeroPurifies means that multiplying a known-zero with an uninitialized
// value results in an initialized value. This is applicable for integer
// multiplication, but not floating-point (counter-example: NaN).
void handleVectorDotProductIntrinsic(IntrinsicInst &I,
                                     unsigned ReductionFactor,
                                     bool ZeroPurifies,
                                     unsigned EltSizeInBits,
                                     enum OddOrEvenLanes Lanes) {
  IRBuilder<> IRB(&I);

  [[maybe_unused]] FixedVectorType *ReturnType =
      cast<FixedVectorType>(I.getType());
  assert(isa<FixedVectorType>(ReturnType));

  // Vectors A and B, and their shadows
  Value *Va = nullptr;
  Value *Vb = nullptr;
  Value *Sa = nullptr;
  Value *Sb = nullptr;

  assert(I.arg_size() == 2 || I.arg_size() == 3);
  if (I.arg_size() == 2) {
    assert(Lanes == kBothLanes);

    Va = I.getOperand(0);
    Vb = I.getOperand(1);

    Sa = getShadow(&I, 0);
    Sb = getShadow(&I, 1);
  } else if (I.arg_size() == 3) {
    // Operand 0 is the accumulator. We will deal with that below.
    Va = I.getOperand(1);
    Vb = I.getOperand(2);

    Sa = getShadow(&I, 1);
    Sb = getShadow(&I, 2);

    if (Lanes == kEvenLanes || Lanes == kOddLanes) {
      // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
      //      to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
      //      or < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
      //
      // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
      // zeroed, not duplicated. However, for shadow propagation, this
      // distinction is unimportant because Step 1 below will squeeze
      // each pair of elements (e.g., [S0, S0]) into a single bit, and
      // we only care if it is fully initialized.

      FixedVectorType *InputShadowType = cast<FixedVectorType>(Sa->getType());
      unsigned Width = InputShadowType->getNumElements();

      Sa = IRB.CreateShuffleVector(
          Sa, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
      Sb = IRB.CreateShuffleVector(
          Sb, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
    }
  }

  FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
  assert(ParamType == Vb->getType());

  assert(ParamType->getPrimitiveSizeInBits() ==
         ReturnType->getPrimitiveSizeInBits());

  if (I.arg_size() == 3) {
    [[maybe_unused]] auto *AccumulatorType =
        cast<FixedVectorType>(I.getOperand(0)->getType());
    assert(AccumulatorType == ReturnType);
  }

  FixedVectorType *ImplicitReturnType =
      cast<FixedVectorType>(getShadowTy(ReturnType));
  // Step 1: instrument multiplication of corresponding vector elements
  if (EltSizeInBits) {
    ImplicitReturnType = cast<FixedVectorType>(
        getMMXVectorTy(EltSizeInBits * ReductionFactor,
                       ParamType->getPrimitiveSizeInBits()));
    ParamType = cast<FixedVectorType>(
        getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));

    Va = IRB.CreateBitCast(Va, ParamType);
    Vb = IRB.CreateBitCast(Vb, ParamType);

    Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
    Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
  } else {
    assert(ParamType->getNumElements() ==
           ReturnType->getNumElements() * ReductionFactor);
  }
4133
  // Each element of the vector is represented by a single bit (poisoned or
  // not) e.g., <8 x i1>.
  Value *SaNonZero = IRB.CreateIsNotNull(Sa);
  Value *SbNonZero = IRB.CreateIsNotNull(Sb);
  Value *And;
  if (ZeroPurifies) {
    // Multiplying an *initialized* zero by an uninitialized element results
    // in an initialized zero element.
    //
    // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
    // results in an unpoisoned value.
    Value *VaInt = Va;
    Value *VbInt = Vb;
    if (!Va->getType()->isIntegerTy()) {
      VaInt = CreateAppToShadowCast(IRB, Va);
      VbInt = CreateAppToShadowCast(IRB, Vb);
    }

    // We check for non-zero on a per-element basis, not per-bit.
    Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
    Value *VbNonZero = IRB.CreateIsNotNull(VbInt);

    And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
  } else {
    And = IRB.CreateOr({SaNonZero, SbNonZero});
  }

  // Extend <8 x i1> to <8 x i16>.
  // (The real pmadd intrinsic would have computed intermediate values of
  // <8 x i32>, but that is irrelevant for our shadow purposes because we
  // consider each element to be either fully initialized or fully
  // uninitialized.)
  And = IRB.CreateSExt(And, Sa->getType());

  // Step 2: instrument horizontal add
  // We don't need bit-precise horizontalReduce because we only want to check
  // if each pair/quad of elements is fully zero.
  // Cast to <4 x i32>.
  Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);

  // Compute <4 x i1>, then extend back to <4 x i32>.
  Value *OutShadow = IRB.CreateSExt(
      IRB.CreateICmpNE(Horizontal,
                       Constant::getNullValue(Horizontal->getType())),
      ImplicitReturnType);

  // Cast it back to the required fake return type (if MMX: <1 x i64>; for
  // AVX, it is already correct).
  if (EltSizeInBits)
    OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));

  // Step 3 (if applicable): instrument accumulator
  if (I.arg_size() == 3)
    OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));

  setShadow(&I, OutShadow);
  setOriginForNaryOp(I);
}
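// A hypothetical trace of the two-operand integer case above, e.g.
// @llvm.x86.sse2.pmadd.wd with ReductionFactor == 2 and ZeroPurifies set:
// suppose only %b[3] is uninitialized while %a[3] is an initialized zero.
// Step 1 treats the pairwise multiply like a bitwise AND, so the product
// a[3]*b[3] is purified and the <8 x i1> "And" vector is all false; Step 2's
// horizontal check then yields a fully clean <4 x i32> shadow. If %a[3] were
// instead nonzero or uninitialized, output element 1 (the sum over lanes 2
// and 3) would become fully poisoned.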
4192
// Instrument compare-packed intrinsic.
//
// x86 has the predicate as the third operand, which is ImmArg e.g.,
// - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
// - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
//
// while Arm has separate intrinsics for >= and > e.g.,
// - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
//       (<2 x float> %A, <2 x float>)
// - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
//       (<2 x float> %A, <2 x float>)
//
// Bonus: this also handles scalar cases e.g.,
// - i32 @llvm.aarch64.neon.facgt.i32.f32(float %A, float %B)
void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
                                        bool PredicateAsOperand) {
  if (PredicateAsOperand) {
    assert(I.arg_size() == 3);
    assert(I.paramHasAttr(2, Attribute::ImmArg));
  } else
    assert(I.arg_size() == 2);

  IRBuilder<> IRB(&I);

  // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
  // all-ones shadow.
  Type *ResTy = getShadowTy(&I);
  auto *Shadow0 = getShadow(&I, 0);
  auto *Shadow1 = getShadow(&I, 1);
  Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
  Value *S = IRB.CreateSExt(
      IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
  setShadow(&I, S);
  setOriginForNaryOp(I);
}
4228
// Instrument compare-scalar intrinsic.
// This handles both cmp* intrinsics which return the result in the first
// element of a vector, and comi* which return the result as i32.
void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  auto *Shadow0 = getShadow(&I, 0);
  auto *Shadow1 = getShadow(&I, 1);
  Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
  Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
  setShadow(&I, S);
  setOriginForNaryOp(I);
}

// Instrument generic vector reduction intrinsics
// by ORing together all their fields.
//
// If AllowShadowCast is true, the return type does not need to be the same
// type as the fields
// e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
  assert(I.arg_size() == 1);

  IRBuilder<> IRB(&I);
  Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
  if (AllowShadowCast)
    S = CreateShadowCast(IRB, S, getShadowTy(&I));
  else
    assert(S->getType() == getShadowTy(&I));
  setShadow(&I, S);
  setOriginForNaryOp(I);
}
4260
// Similar to handleVectorReduceIntrinsic but with an initial starting value.
// e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0,
//                                                     <2 x float> %a1)
//       shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
//
// The type of the return value, initial starting value, and elements of the
// vector must be identical.
void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
  assert(I.arg_size() == 2);

  IRBuilder<> IRB(&I);
  Value *Shadow0 = getShadow(&I, 0);
  Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
  assert(Shadow0->getType() == Shadow1->getType());
  Value *S = IRB.CreateOr(Shadow0, Shadow1);
  assert(S->getType() == getShadowTy(&I));
  setShadow(&I, S);
  setOriginForNaryOp(I);
}

// Instrument vector.reduce.or intrinsic.
// Valid (non-poisoned) set bits in the operand pull down the
// corresponding shadow bits.
void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
  assert(I.arg_size() == 1);

  IRBuilder<> IRB(&I);
  Value *OperandShadow = getShadow(&I, 0);
  Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
  Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
  // Bit N is clean if any field's bit N is 1 and unpoisoned
  Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
  // Otherwise, it is clean if every field's bit N is unpoisoned
  Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
  Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);

  setShadow(&I, S);
  setOrigin(&I, getOrigin(&I, 0));
}
4300
// Instrument vector.reduce.and intrinsic.
// Valid (non-poisoned) unset bits in the operand pull down the
// corresponding shadow bits.
void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
  assert(I.arg_size() == 1);

  IRBuilder<> IRB(&I);
  Value *OperandShadow = getShadow(&I, 0);
  Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
  // Bit N is clean if any field's bit N is 0 and unpoisoned
  Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
  // Otherwise, it is clean if every field's bit N is unpoisoned
  Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
  Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);

  setShadow(&I, S);
  setOrigin(&I, getOrigin(&I, 0));
}
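// A hypothetical bit-level example for vector.reduce.and: take fields
// f0 = 0b1100 (fully initialized) and f1 = 0b1010 with shadow 0b1000 (bit 3
// poisoned). OperandSetOrPoison is {0b1100, 0b1010}, so the AndReduce mask
// is 0b1000; OrShadow is 0b1000; the final shadow is 0b1000. Bit 3 is
// poisoned because every field's bit 3 is set-or-poisoned and one is
// actually poisoned, while bit 1 stays clean: f0's definitely-zero bit
// forces the reduced AND to 0 regardless of f1.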
4319
void handleStmxcsr(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Addr = I.getArgOperand(0);
  Type *Ty = IRB.getInt32Ty();
  Value *ShadowPtr =
      getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;

  IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);

  if (ClCheckAccessAddress)
    insertCheckShadowOf(Addr, &I);
}
4332
void handleLdmxcsr(IntrinsicInst &I) {
  if (!InsertChecks)
    return;

  IRBuilder<> IRB(&I);
  Value *Addr = I.getArgOperand(0);
  Type *Ty = IRB.getInt32Ty();
  const Align Alignment = Align(1);
  Value *ShadowPtr, *OriginPtr;
  std::tie(ShadowPtr, OriginPtr) =
      getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);

  if (ClCheckAccessAddress)
    insertCheckShadowOf(Addr, &I);

  Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
  Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
                                  : getCleanOrigin();
  insertCheckShadow(Shadow, Origin, &I);
}
4353
void handleMaskedExpandLoad(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Ptr = I.getArgOperand(0);
  MaybeAlign Align = I.getParamAlign(0);
  Value *Mask = I.getArgOperand(1);
  Value *PassThru = I.getArgOperand(2);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Ptr, &I);
    insertCheckShadowOf(Mask, &I);
  }

  if (!PropagateShadow) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
    return;
  }

  Type *ShadowTy = getShadowTy(&I);
  Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
  auto [ShadowPtr, OriginPtr] =
      getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);

  Value *Shadow =
      IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
                                 getShadow(PassThru), "_msmaskedexpload");

  setShadow(&I, Shadow);

  // TODO: Store origins.
  setOrigin(&I, getCleanOrigin());
}

void handleMaskedCompressStore(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Values = I.getArgOperand(0);
  Value *Ptr = I.getArgOperand(1);
  MaybeAlign Align = I.getParamAlign(1);
  Value *Mask = I.getArgOperand(2);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Ptr, &I);
    insertCheckShadowOf(Mask, &I);
  }

  Value *Shadow = getShadow(Values);
  Type *ElementShadowTy =
      getShadowTy(cast<VectorType>(Values->getType())->getElementType());
  auto [ShadowPtr, OriginPtrs] =
      getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);

  IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);

  // TODO: Store origins.
}
4409
void handleMaskedGather(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Ptrs = I.getArgOperand(0);
  const Align Alignment = I.getParamAlign(0).valueOrOne();
  Value *Mask = I.getArgOperand(1);
  Value *PassThru = I.getArgOperand(2);

  Type *PtrsShadowTy = getShadowTy(Ptrs);
  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Mask, &I);
    Value *MaskedPtrShadow = IRB.CreateSelect(
        Mask, getShadow(Ptrs), Constant::getNullValue(PtrsShadowTy),
        "_msmaskedptrs");
    insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
  }

  if (!PropagateShadow) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
    return;
  }

  Type *ShadowTy = getShadowTy(&I);
  Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
  auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
      Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);

  Value *Shadow =
      IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
                             getShadow(PassThru), "_msmaskedgather");

  setShadow(&I, Shadow);

  // TODO: Store origins.
  setOrigin(&I, getCleanOrigin());
}

void handleMaskedScatter(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Values = I.getArgOperand(0);
  Value *Ptrs = I.getArgOperand(1);
  const Align Alignment = I.getParamAlign(1).valueOrOne();
  Value *Mask = I.getArgOperand(2);

  Type *PtrsShadowTy = getShadowTy(Ptrs);
  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Mask, &I);
    Value *MaskedPtrShadow = IRB.CreateSelect(
        Mask, getShadow(Ptrs), Constant::getNullValue(PtrsShadowTy),
        "_msmaskedptrs");
    insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
  }

  Value *Shadow = getShadow(Values);
  Type *ElementShadowTy =
      getShadowTy(cast<VectorType>(Values->getType())->getElementType());
  auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
      Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);

  IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);

  // TODO: Store origin.
}
4473
// Intrinsic::masked_store
//
// Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
// stores are lowered to Intrinsic::masked_store.
void handleMaskedStore(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *V = I.getArgOperand(0);
  Value *Ptr = I.getArgOperand(1);
  const Align Alignment = I.getParamAlign(1).valueOrOne();
  Value *Mask = I.getArgOperand(2);
  Value *Shadow = getShadow(V);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Ptr, &I);
    insertCheckShadowOf(Mask, &I);
  }

  Value *ShadowPtr;
  Value *OriginPtr;
  std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
      Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);

  IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);

  if (!MS.TrackOrigins)
    return;

  auto &DL = F.getDataLayout();
  paintOrigin(IRB, getOrigin(V), OriginPtr,
              DL.getTypeStoreSize(Shadow->getType()),
              std::max(Alignment, kMinOriginAlignment));
}
4506
// Intrinsic::masked_load
//
// Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
// loads are lowered to Intrinsic::masked_load.
void handleMaskedLoad(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Ptr = I.getArgOperand(0);
  const Align Alignment = I.getParamAlign(0).valueOrOne();
  Value *Mask = I.getArgOperand(1);
  Value *PassThru = I.getArgOperand(2);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Ptr, &I);
    insertCheckShadowOf(Mask, &I);
  }

  if (!PropagateShadow) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
    return;
  }

  Type *ShadowTy = getShadowTy(&I);
  Value *ShadowPtr, *OriginPtr;
  std::tie(ShadowPtr, OriginPtr) =
      getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
  setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
                                     getShadow(PassThru), "_msmaskedld"));

  if (!MS.TrackOrigins)
    return;

  // Choose between PassThru's and the loaded value's origins.
  Value *MaskedPassThruShadow = IRB.CreateAnd(
      getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));

  Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");

  Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
  Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);

  setOrigin(&I, Origin);
}
4550
// e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
//                                           dst  mask       src
//
// AVX512 masked stores are lowered to Intrinsic::masked_store and are
// handled by handleMaskedStore.
//
// This function handles AVX and AVX2 masked stores; these use the MSBs of a
// vector of integers, unlike the LLVM masked intrinsics, which require a
// vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
// mentions that the x86 backend does not know how to efficiently convert
// from a vector of booleans back into the AVX mask format; therefore, they
// (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
// intrinsics.
void handleAVXMaskedStore(IntrinsicInst &I) {
  assert(I.arg_size() == 3);

  IRBuilder<> IRB(&I);

  Value *Dst = I.getArgOperand(0);
  assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");

  Value *Mask = I.getArgOperand(1);
  assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");

  Value *Src = I.getArgOperand(2);
  assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");

  const Align Alignment = Align(1);

  Value *SrcShadow = getShadow(Src);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Dst, &I);
    insertCheckShadowOf(Mask, &I);
  }

  Value *DstShadowPtr;
  Value *DstOriginPtr;
  std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
      Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);

  SmallVector<Value *, 2> ShadowArgs;
  ShadowArgs.append(1, DstShadowPtr);
  ShadowArgs.append(1, Mask);
  // The intrinsic may require floating-point but shadows can be arbitrary
  // bit patterns, of which some would be interpreted as "invalid"
  // floating-point values (NaN etc.); we assume the intrinsic will happily
  // copy them.
  ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));

  CallInst *CI =
      IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
  setShadow(&I, CI);

  if (!MS.TrackOrigins)
    return;

  // Approximation only
  auto &DL = F.getDataLayout();
  paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
              DL.getTypeStoreSize(SrcShadow->getType()),
              std::max(Alignment, kMinOriginAlignment));
}
4614
// e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
//       return                                    src  mask
//
// Masked-off values are replaced with 0, which conveniently also represents
// initialized memory.
//
// AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
// by handleMaskedLoad.
//
// We do not combine this with handleMaskedLoad; see the comment in
// handleAVXMaskedStore for the rationale.
//
// This is subtly different from handleIntrinsicByApplyingToShadow(I, 1)
// because we need to apply getShadowOriginPtr, not getShadow, to the first
// parameter.
void handleAVXMaskedLoad(IntrinsicInst &I) {
  assert(I.arg_size() == 2);

  IRBuilder<> IRB(&I);

  Value *Src = I.getArgOperand(0);
  assert(Src->getType()->isPointerTy() && "Source is not a pointer!");

  Value *Mask = I.getArgOperand(1);
  assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");

  const Align Alignment = Align(1);

  if (ClCheckAccessAddress) {
    insertCheckShadowOf(Mask, &I);
  }

  Type *SrcShadowTy = getShadowTy(Src);
  Value *SrcShadowPtr, *SrcOriginPtr;
  std::tie(SrcShadowPtr, SrcOriginPtr) =
      getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);

  SmallVector<Value *, 2> ShadowArgs;
  ShadowArgs.append(1, SrcShadowPtr);
  ShadowArgs.append(1, Mask);

  CallInst *CI =
      IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
  // The AVX masked load intrinsics do not have integer variants. We use the
  // floating-point variants, which will happily copy the shadows even if
  // they are interpreted as "invalid" floating-point values (NaN etc.).
  setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));

  if (!MS.TrackOrigins)
    return;

  // The "pass-through" value is always zero (initialized). To the extent
  // that this results in initialized aligned 4-byte chunks, the origin value
  // is ignored. It is therefore correct to simply copy the origin from src.
  Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
  setOrigin(&I, PtrSrcOrigin);
}
4672
// Test whether the mask indices are initialized, only checking the bits that
// are actually used.
//
// e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
// used/checked.
void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
  assert(isFixedIntVector(Idx));
  auto IdxVectorSize =
      cast<FixedVectorType>(Idx->getType())->getNumElements();
  assert(isPowerOf2_64(IdxVectorSize));

  // Compiler isn't smart enough, let's help it
  if (isa<Constant>(Idx))
    return;

  auto *IdxShadow = getShadow(Idx);
  Value *Truncated = IRB.CreateTrunc(
      IdxShadow,
      FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
                           IdxVectorSize));
  insertCheckShadow(Truncated, getOrigin(Idx), I);
}
4695
// Instrument AVX permutation intrinsic.
// We apply the same permutation (argument index 1) to the shadow.
void handleAVXVpermilvar(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  Value *Shadow = getShadow(&I, 0);
  maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);

  // Shadows are integer-ish types but some intrinsics require a
  // different (e.g., floating-point) type.
  Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
  CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
                                     {Shadow, I.getArgOperand(1)});

  setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
  setOriginForNaryOp(I);
}
4712
// Instrument AVX permutation intrinsic.
// We apply the same permutation (argument index 1) to the shadows.
void handleAVXVpermi2var(IntrinsicInst &I) {
  assert(I.arg_size() == 3);
  assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
  assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
  assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
  [[maybe_unused]] auto ArgVectorSize =
      cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
  assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
             ->getNumElements() == ArgVectorSize);
  assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
             ->getNumElements() == ArgVectorSize);
  assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
  assert(I.getType() == I.getArgOperand(0)->getType());
  assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
  IRBuilder<> IRB(&I);
  Value *AShadow = getShadow(&I, 0);
  Value *Idx = I.getArgOperand(1);
  Value *BShadow = getShadow(&I, 2);

  maskedCheckAVXIndexShadow(IRB, Idx, &I);

  // Shadows are integer-ish types but some intrinsics require a
  // different (e.g., floating-point) type.
  AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
  BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
  CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
                                     {AShadow, Idx, BShadow});
  setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
  setOriginForNaryOp(I);
}
4745
  [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
    return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
  }

  [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
    return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
  }

  [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
    return isFixedIntVectorTy(V->getType());
  }

  [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
    return isFixedFPVectorTy(V->getType());
  }
4761
  // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
  //           (<16 x float> a, <16 x i32> writethru, i16 mask,
  //            i32 rounding)
  //
  // Inconveniently, some similar intrinsics have a different operand order:
  //       <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
  //           (<16 x float> a, i32 rounding, <16 x i16> writethru,
  //            i16 mask)
  //
  // If the return type has more elements than A, the excess elements are
  // zeroed (and the corresponding shadow is initialized).
  //       <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
  //           (<4 x float> a, i32 rounding, <8 x i16> writethru,
  //            i8 mask)
  //
  // dst[i]        = mask[i] ? convert(a[i]) : writethru[i]
  // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
  //   where all_or_nothing(x) is fully uninitialized if x has any
  //   uninitialized bits
  void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
    IRBuilder<> IRB(&I);

    assert(I.arg_size() == 4);
    Value *A = I.getOperand(0);
    Value *WriteThrough;
    Value *Mask;
    Value *RoundingMode;
    if (LastMask) {
      WriteThrough = I.getOperand(2);
      Mask = I.getOperand(3);
      RoundingMode = I.getOperand(1);
    } else {
      WriteThrough = I.getOperand(1);
      Mask = I.getOperand(2);
      RoundingMode = I.getOperand(3);
    }

    assert(isFixedFPVector(A));
    assert(isFixedIntVector(WriteThrough));

    unsigned ANumElements =
        cast<FixedVectorType>(A->getType())->getNumElements();
    [[maybe_unused]] unsigned WriteThruNumElements =
        cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
    assert(ANumElements == WriteThruNumElements ||
           ANumElements * 2 == WriteThruNumElements);

    assert(Mask->getType()->isIntegerTy());
    unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
    assert(ANumElements == MaskNumElements ||
           ANumElements * 2 == MaskNumElements);

    assert(WriteThruNumElements == MaskNumElements);

    // Some bits of the mask may be unused, though it's unusual to have partly
    // uninitialized bits.
    insertCheckShadowOf(Mask, &I);

    assert(RoundingMode->getType()->isIntegerTy());
    // Only some bits of the rounding mode are used, though it's very
    // unusual to have uninitialized bits there (more commonly, it's a
    // constant).
    insertCheckShadowOf(RoundingMode, &I);

    assert(I.getType() == WriteThrough->getType());

    Value *AShadow = getShadow(A);
    AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);

    if (ANumElements * 2 == MaskNumElements) {
      // Ensure that the irrelevant bits of the mask are zero, hence selecting
      // from the zeroed shadow instead of the writethrough's shadow.
      Mask =
          IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
      Mask =
          IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
    }

    // Convert i16 mask to <16 x i1>
    Mask = IRB.CreateBitCast(
        Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
        "_ms_mask_bitcast");

    /// For floating-point to integer conversion, the output is:
    /// - fully uninitialized if *any* bit of the input is uninitialized
    /// - fully initialized if all bits of the input are initialized
    /// We apply the same principle on a per-element basis for vectors.
    ///
    /// We use the scalar width of the return type instead of A's.
    AShadow = IRB.CreateSExt(
        IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
        getShadowTy(&I), "_ms_a_shadow");

    Value *WriteThroughShadow = getShadow(WriteThrough);
    Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
                                     "_ms_writethru_select");

    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
4862
  // Instrument BMI / BMI2 intrinsics.
  // All of these intrinsics are Z = I(X, Y)
  // where the types of all operands and the result match, and are either i32 or
  // i64. The following instrumentation happens to work for all of them:
  //   Sz = I(Sx, Y) | (sext (Sy != 0))
  void handleBmiIntrinsic(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    Type *ShadowTy = getShadowTy(&I);

    // If any bit of the mask operand is poisoned, then the whole thing is.
    Value *SMask = getShadow(&I, 1);
    SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
                           ShadowTy);
    // Apply the same intrinsic to the shadow of the first operand.
    Value *S = IRB.CreateCall(I.getCalledFunction(),
                              {getShadow(&I, 0), I.getOperand(1)});
    S = IRB.CreateOr(SMask, S);
    setShadow(&I, S);
    setOriginForNaryOp(I);
  }
4883
  static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
    SmallVector<int, 8> Mask;
    for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
      Mask.append(2, X);
    }
    return Mask;
  }
4891
  // Instrument pclmul intrinsics.
  // These intrinsics operate either on odd or on even elements of the input
  // vectors, depending on the constant in the 3rd argument, ignoring the rest.
  // Replace the unused elements with copies of the used ones, ex:
  //   (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
  // or
  //   (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
  // and then apply the usual shadow combining logic.
  void handlePclmulIntrinsic(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    unsigned Width =
        cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
    assert(isa<ConstantInt>(I.getArgOperand(2)) &&
           "pclmul 3rd operand must be a constant");
    unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
    Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
                                           getPclmulMask(Width, Imm & 0x01));
    Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
                                           getPclmulMask(Width, Imm & 0x10));
    ShadowAndOriginCombiner SOC(this, IRB);
    SOC.Add(Shuf0, getOrigin(&I, 0));
    SOC.Add(Shuf1, getOrigin(&I, 1));
    SOC.Done(&I);
  }
4916
  // Instrument _mm_*_sd|ss intrinsics
  void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    unsigned Width =
        cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
    Value *First = getShadow(&I, 0);
    Value *Second = getShadow(&I, 1);
    // First element of second operand, remaining elements of first operand
    SmallVector<int, 16> Mask;
    Mask.push_back(Width);
    for (unsigned i = 1; i < Width; i++)
      Mask.push_back(i);
    Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);

    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
4934
  void handleVtestIntrinsic(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    Value *Shadow0 = getShadow(&I, 0);
    Value *Shadow1 = getShadow(&I, 1);
    Value *Or = IRB.CreateOr(Shadow0, Shadow1);
    Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
    Value *Scalar = convertShadowToScalar(NZ, IRB);
    Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));

    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
4947
  void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    unsigned Width =
        cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
    Value *First = getShadow(&I, 0);
    Value *Second = getShadow(&I, 1);
    Value *OrShadow = IRB.CreateOr(First, Second);
    // First element of both OR'd together, remaining elements of first operand
    SmallVector<int, 16> Mask;
    Mask.push_back(Width);
    for (unsigned i = 1; i < Width; i++)
      Mask.push_back(i);
    Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);

    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
4965
  // _mm_round_pd / _mm_round_ps.
  // Similar to maybeHandleSimpleNomemIntrinsic except
  // the second argument is guaranteed to be a constant integer.
  void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
    assert(I.getArgOperand(0)->getType() == I.getType());
    assert(I.arg_size() == 2);
    assert(isa<ConstantInt>(I.getArgOperand(1)));

    IRBuilder<> IRB(&I);
    ShadowAndOriginCombiner SC(this, IRB);
    SC.Add(I.getArgOperand(0));
    SC.Done(&I);
  }
4979
  // Instrument @llvm.abs intrinsic.
  //
  // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
  //       <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
  void handleAbsIntrinsic(IntrinsicInst &I) {
    assert(I.arg_size() == 2);
    Value *Src = I.getArgOperand(0);
    Value *IsIntMinPoison = I.getArgOperand(1);

    assert(I.getType()->isIntOrIntVectorTy());

    assert(Src->getType() == I.getType());

    assert(IsIntMinPoison->getType()->isIntegerTy());
    assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);

    IRBuilder<> IRB(&I);
    Value *SrcShadow = getShadow(Src);

    APInt MinVal =
        APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
    Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
    Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);

    Value *PoisonedShadow = getPoisonedShadow(Src);
    Value *PoisonedIfIntMinShadow =
        IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
    Value *Shadow =
        IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);

    setShadow(&I, Shadow);
    setOrigin(&I, getOrigin(&I, 0));
  }
5013
  void handleIsFpClass(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    Value *Shadow = getShadow(&I, 0);
    setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
    setOrigin(&I, getOrigin(&I, 0));
  }

  void handleArithmeticWithOverflow(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    Value *Shadow0 = getShadow(&I, 0);
    Value *Shadow1 = getShadow(&I, 1);
    Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
    Value *ShadowElt1 =
        IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));

    Value *Shadow = PoisonValue::get(getShadowTy(&I));
    Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
    Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);

    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
5036
  Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
    assert(isa<FixedVectorType>(V->getType()));
    assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
    Value *Shadow = getShadow(V);
    return IRB.CreateExtractElement(Shadow,
                                    ConstantInt::get(IRB.getInt32Ty(), 0));
  }
5044
  // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
  //
  // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
  //           (<8 x i64>, <16 x i8>, i8)
  //            A          WriteThru  Mask
  //
  //       call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
  //           (<16 x i32>, <16 x i8>, i16)
  //
  // Dst[i]        = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
  // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
  //
  // If Dst has more elements than A, the excess elements are zeroed (and the
  // corresponding shadow is initialized).
  //
  // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
  // and is much faster than this handler.
  void handleAVX512VectorDownConvert(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);

    assert(I.arg_size() == 3);
    Value *A = I.getOperand(0);
    Value *WriteThrough = I.getOperand(1);
    Value *Mask = I.getOperand(2);

    assert(isFixedIntVector(A));
    assert(isFixedIntVector(WriteThrough));

    unsigned ANumElements =
        cast<FixedVectorType>(A->getType())->getNumElements();
    unsigned OutputNumElements =
        cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
    assert(ANumElements == OutputNumElements ||
           ANumElements * 2 == OutputNumElements);

    assert(Mask->getType()->isIntegerTy());
    assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
    insertCheckShadowOf(Mask, &I);

    assert(I.getType() == WriteThrough->getType());

    // Widen the mask, if necessary, to have one bit per element of the output
    // vector.
    // We want the extra bits to have '1's, so that the CreateSelect will
    // select the values from AShadow instead of WriteThroughShadow ("maskless"
    // versions of the intrinsics are sometimes implemented using an all-1's
    // mask and an undefined value for WriteThroughShadow). We accomplish this
    // by using bitwise NOT before and after the ZExt.
    if (ANumElements != OutputNumElements) {
      Mask = IRB.CreateNot(Mask);
      Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
                            "_ms_widen_mask");
      Mask = IRB.CreateNot(Mask);
    }
    Mask = IRB.CreateBitCast(
        Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));

    Value *AShadow = getShadow(A);

    // The return type might have more elements than the input.
    // Temporarily shrink the return type's number of elements.
    VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);

    // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
    // This handler treats them all as truncation, which leads to some rare
    // false positives in the cases where the truncated bytes could
    // unambiguously saturate the value e.g., if A = ??????10 ????????
    // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
    // fully defined, but the truncated byte is ????????.
    //
    // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
    AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
    AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);

    Value *WriteThroughShadow = getShadow(WriteThrough);

    Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
    setShadow(&I, Shadow);
    setOriginForNaryOp(I);
  }
5125
  // Handle llvm.x86.avx512.* instructions that take vector(s) of floating-point
  // values and perform an operation whose shadow propagation should be handled
  // as all-or-nothing [*], with masking provided by a vector and a mask
  // supplied as an integer.
  //
  // [*] if all bits of a vector element are initialized, the output is fully
  //     initialized; otherwise, the output is fully uninitialized
  //
  // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
  //           (<16 x float>, <16 x float>, i16)
  //            A             WriteThru     Mask
  //
  //       <2 x double> @llvm.x86.avx512.rcp14.pd.128
  //           (<2 x double>, <2 x double>, i8)
  //            A             WriteThru     Mask
  //
  //       <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
  //           (<8 x double>, i32, <8 x double>, i8,   i32)
  //            A             Imm  WriteThru     Mask  Rounding
  //
  //       <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
  //           (<16 x float>, <16 x float>, <16 x float>, i16,  i32)
  //            WriteThru     A             B             Mask  Rnd
  //
  // All operands other than A, B, ..., and WriteThru (e.g., Mask, Imm,
  // Rounding) must be fully initialized.
  //
  // Dst[i]        = Mask[i] ? some_op(A[i], B[i], ...)
  //                         : WriteThru[i]
  // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i] | B_shadow[i] | ...)
  //                         : WriteThru_shadow[i]
  void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I,
                                         SmallVector<unsigned, 4> DataIndices,
                                         unsigned WriteThruIndex,
                                         unsigned MaskIndex) {
    IRBuilder<> IRB(&I);

    unsigned NumArgs = I.arg_size();

    assert(WriteThruIndex < NumArgs);
    assert(MaskIndex < NumArgs);
    assert(WriteThruIndex != MaskIndex);
    Value *WriteThru = I.getOperand(WriteThruIndex);

    unsigned OutputNumElements =
        cast<FixedVectorType>(WriteThru->getType())->getNumElements();

    assert(DataIndices.size() > 0);

    bool isData[16] = {false};
    assert(NumArgs <= 16);
    for (unsigned i : DataIndices) {
      assert(i < NumArgs);
      assert(i != WriteThruIndex);
      assert(i != MaskIndex);

      isData[i] = true;

      Value *A = I.getOperand(i);
      assert(isFixedFPVector(A));
      [[maybe_unused]] unsigned ANumElements =
          cast<FixedVectorType>(A->getType())->getNumElements();
      assert(ANumElements == OutputNumElements);
    }

    Value *Mask = I.getOperand(MaskIndex);

    assert(isFixedFPVector(WriteThru));

    for (unsigned i = 0; i < NumArgs; ++i) {
      if (!isData[i] && i != WriteThruIndex) {
        // Imm, Mask, Rounding etc. are "control" data, hence we require that
        // they be fully initialized.
        assert(I.getOperand(i)->getType()->isIntegerTy());
        insertCheckShadowOf(I.getOperand(i), &I);
      }
    }

    // The mask has 1 bit per element of A, but a minimum of 8 bits.
    if (Mask->getType()->getScalarSizeInBits() == 8 && OutputNumElements < 8)
      Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, OutputNumElements));
    assert(Mask->getType()->getScalarSizeInBits() == OutputNumElements);

    assert(I.getType() == WriteThru->getType());

    Mask = IRB.CreateBitCast(
        Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));

    Value *DataShadow = nullptr;
    for (unsigned i : DataIndices) {
      Value *A = I.getOperand(i);
      if (DataShadow)
        DataShadow = IRB.CreateOr(DataShadow, getShadow(A));
      else
        DataShadow = getShadow(A);
    }

    // All-or-nothing shadow
    DataShadow = IRB.CreateSExt(
        IRB.CreateICmpNE(DataShadow, getCleanShadow(DataShadow)),
        DataShadow->getType());

    Value *WriteThruShadow = getShadow(WriteThru);

    Value *Shadow = IRB.CreateSelect(Mask, DataShadow, WriteThruShadow);
    setShadow(&I, Shadow);

    setOriginForNaryOp(I);
  }
5235
  // For sh.* compiler intrinsics:
  //   llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
  //       (<8 x half>, <8 x half>, <8 x half>, i8,   i32)
  //        A           B           WriteThru   Mask  RoundingMode
  //
  // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
  // DstShadow[1..7] = AShadow[1..7]
  void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);

    assert(I.arg_size() == 5);
    Value *A = I.getOperand(0);
    Value *B = I.getOperand(1);
    Value *WriteThrough = I.getOperand(2);
    Value *Mask = I.getOperand(3);
    Value *RoundingMode = I.getOperand(4);

    // Technically, we could probably just check whether the LSB is
    // initialized, but intuitively it feels like a partly uninitialized mask
    // is unintended, and we should warn the user immediately.
    insertCheckShadowOf(Mask, &I);
    insertCheckShadowOf(RoundingMode, &I);

    assert(isa<FixedVectorType>(A->getType()));
    unsigned NumElements =
        cast<FixedVectorType>(A->getType())->getNumElements();
    assert(NumElements == 8);
    assert(A->getType() == B->getType());
    assert(B->getType() == WriteThrough->getType());
    assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
    assert(RoundingMode->getType()->isIntegerTy());

    Value *ALowerShadow = extractLowerShadow(IRB, A);
    Value *BLowerShadow = extractLowerShadow(IRB, B);

    Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);

    Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);

    Mask = IRB.CreateBitCast(
        Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
    Value *MaskLower =
        IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));

    Value *AShadow = getShadow(A);
    Value *DstLowerShadow =
        IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
    Value *DstShadow = IRB.CreateInsertElement(
        AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
        "_msprop");

    setShadow(&I, DstShadow);
    setOriginForNaryOp(I);
  }
5290
  // Approximately handle AVX Galois Field Affine Transformation
  //
  // e.g.,
  //   <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
  //   <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
  //   <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
  //   Out                                    A          x          b
  // where A and x are packed matrices, b is a vector,
  //   Out = A * x + b in GF(2)
  //
  // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
  // computation also includes a parity calculation.
  //
  // For the bitwise AND of bits V1 and V2, the exact shadow is:
  //   Out_Shadow = (V1_Shadow & V2_Shadow)
  //              | (V1        & V2_Shadow)
  //              | (V1_Shadow & V2       )
  //
  // We approximate the shadow of gf2p8affineqb using:
  //   Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
  //              | gf2p8affineqb(x,        A_shadow, 0)
  //              | gf2p8affineqb(x_Shadow, A,        0)
  //              | set1_epi8(b_Shadow)
  //
  // This approximation has false negatives: if an intermediate dot-product
  // contains an even number of 1's, the parity is 0.
  // It has no false positives.
  void handleAVXGF2P8Affine(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);

    assert(I.arg_size() == 3);
    Value *A = I.getOperand(0);
    Value *X = I.getOperand(1);
    Value *B = I.getOperand(2);

    assert(isFixedIntVector(A));
    assert(cast<VectorType>(A->getType())
               ->getElementType()
               ->getScalarSizeInBits() == 8);

    assert(A->getType() == X->getType());

    assert(B->getType()->isIntegerTy());
    assert(B->getType()->getScalarSizeInBits() == 8);

    assert(I.getType() == A->getType());

    Value *AShadow = getShadow(A);
    Value *XShadow = getShadow(X);
    Value *BZeroShadow = getCleanShadow(B);

    CallInst *AShadowXShadow = IRB.CreateIntrinsic(
        I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
    CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
                                             {X, AShadow, BZeroShadow});
    CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
                                             {XShadow, A, BZeroShadow});

    unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
    Value *BShadow = getShadow(B);
    Value *BBroadcastShadow = getCleanShadow(AShadow);
    // There is no LLVM IR intrinsic for _mm512_set1_epi8.
    // This loop generates a lot of LLVM IR, which we expect that CodeGen will
    // lower appropriately (e.g., VPBROADCASTB).
    // Besides, b is often a constant, in which case it is fully initialized.
    for (unsigned i = 0; i < NumElements; i++)
      BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);

    setShadow(&I, IRB.CreateOr(
                      {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
    setOriginForNaryOp(I);
  }
5363
  // Handle Arm NEON vector load intrinsics (vld*).
  //
  // The WithLane instructions (ld[234]lane) are similar to:
  //   call {<4 x i32>, <4 x i32>, <4 x i32>}
  //       @llvm.aarch64.neon.ld3lane.v4i32.p0
  //       (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr %A)
  //
  // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
  // to:
  //   call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
  void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
    unsigned int numArgs = I.arg_size();

    // Return type is a struct of vectors of integers or floating-point
    assert(I.getType()->isStructTy());
    [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
    assert(RetTy->getNumElements() > 0);
    assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
           RetTy->getElementType(0)->isFPOrFPVectorTy());
    for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
      assert(RetTy->getElementType(i) == RetTy->getElementType(0));

    if (WithLane) {
      // 2, 3 or 4 vectors, plus lane number, plus input pointer
      assert(4 <= numArgs && numArgs <= 6);

      // Return type is a struct of the input vectors
      assert(RetTy->getNumElements() + 2 == numArgs);
      for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
        assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
    } else {
      assert(numArgs == 1);
    }

    IRBuilder<> IRB(&I);

    SmallVector<Value *, 6> ShadowArgs;
    if (WithLane) {
      for (unsigned int i = 0; i < numArgs - 2; i++)
        ShadowArgs.push_back(getShadow(I.getArgOperand(i)));

      // Lane number, passed verbatim
      Value *LaneNumber = I.getArgOperand(numArgs - 2);
      ShadowArgs.push_back(LaneNumber);

      // TODO: blend shadow of lane number into output shadow?
      insertCheckShadowOf(LaneNumber, &I);
    }

    Value *Src = I.getArgOperand(numArgs - 1);
    assert(Src->getType()->isPointerTy() && "Source is not a pointer!");

    Type *SrcShadowTy = getShadowTy(Src);
    auto [SrcShadowPtr, SrcOriginPtr] =
        getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
    ShadowArgs.push_back(SrcShadowPtr);

    // The NEON vector load instructions handled by this function all have
    // integer variants. It is easier to use those rather than trying to cast
    // a struct of vectors of floats into a struct of vectors of integers.
    CallInst *CI =
        IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
    setShadow(&I, CI);

    if (!MS.TrackOrigins)
      return;

    Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
    setOrigin(&I, PtrSrcOrigin);
  }
5435
  /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
  /// and vst{2,3,4}lane).
  ///
  /// Arm NEON vector store intrinsics have the output address (pointer) as the
  /// last argument, with the initial arguments being the inputs (and lane
  /// number for vst{2,3,4}lane). They return void.
  ///
  /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
  ///   abcdabcdabcdabcd... into *outP
  /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
  ///   writes aaaa...bbbb...cccc...dddd... into *outP
  /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
  /// These instructions can all be instrumented with essentially the same
  /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
  void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
    IRBuilder<> IRB(&I);

    // Don't use getNumOperands() because it includes the callee
    int numArgOperands = I.arg_size();

    // The last arg operand is the output (pointer)
    assert(numArgOperands >= 1);
    Value *Addr = I.getArgOperand(numArgOperands - 1);
    assert(Addr->getType()->isPointerTy());
    int skipTrailingOperands = 1;

    if (ClCheckAccessAddress)
      insertCheckShadowOf(Addr, &I);

    // Second-last operand is the lane number (for vst{2,3,4}lane)
    if (useLane) {
      skipTrailingOperands++;
      assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
      assert(isa<IntegerType>(
          I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
    }

    SmallVector<Value *, 8> ShadowArgs;
    // All the initial operands are the inputs
    for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
      assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
      Value *Shadow = getShadow(&I, i);
      ShadowArgs.append(1, Shadow);
    }

    // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
    // e.g., for:
    //   [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
    // we know the type of the output (and its shadow) is <16 x i8>.
    //
    // Arm NEON VST is unusual because the last argument is the output address:
    //   define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
    //     call void @llvm.aarch64.neon.st2.v16i8.p0
    //         (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
    // and we have no type information about P's operand. We must manually
    // compute the type (<16 x i8> x 2).
    FixedVectorType *OutputVectorTy = FixedVectorType::get(
        cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
        cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
            (numArgOperands - skipTrailingOperands));
    Type *OutputShadowTy = getShadowTy(OutputVectorTy);

    if (useLane)
      ShadowArgs.append(1,
                        I.getArgOperand(numArgOperands - skipTrailingOperands));

    Value *OutputShadowPtr, *OutputOriginPtr;
    // AArch64 NEON does not need alignment (unless OS requires it)
    std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
        Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
    ShadowArgs.append(1, OutputShadowPtr);

    CallInst *CI =
        IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
    setShadow(&I, CI);

    if (MS.TrackOrigins) {
      // TODO: if we modelled the vst* instruction more precisely, we could
      // more accurately track the origins (e.g., if both inputs are
      // uninitialized for vst2, we currently blame the second input, even
      // though part of the output depends only on the first input).
      //
      // This is particularly imprecise for vst{2,3,4}lane, since only one
      // lane of each input is actually copied to the output.
      OriginCombiner OC(this, IRB);
      for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
        OC.Add(I.getArgOperand(i));

      const DataLayout &DL = F.getDataLayout();
      OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
                            OutputOriginPtr);
    }
  }
5529
  // Integer matrix multiplication:
  // - <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
  //       (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
  // - <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
  //       (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
  // - <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
  //       (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
  //
  // Note:
  // - <4 x i32> is a 2x2 matrix
  // - <16 x i8> %X and %Y are 2x8 and 8x2 matrices respectively
  //
  //              2x8 %X                          8x2 %Y
  // [ X01 X02 X03 X04 X05 X06 X07 X08 ]       [ Y01 Y09 ]
  // [ X09 X10 X11 X12 X13 X14 X15 X16 ]   x   [ Y02 Y10 ]
  //                                            [ Y03 Y11 ]
  //                                            [ Y04 Y12 ]
  //                                            [ Y05 Y13 ]
  //                                            [ Y06 Y14 ]
  //                                            [ Y07 Y15 ]
  //                                            [ Y08 Y16 ]
  //
  // The general shadow propagation approach is:
  // 1) get the shadows of the input matrices %X and %Y
  // 2) change the shadow values to 0x1 if the corresponding value is fully
  //    initialized, and 0x0 otherwise
  // 3) perform a matrix multiplication on the shadows of %X and %Y. The
  //    output will be a 2x2 matrix; for each element, a value of 0x8 means
  //    all the corresponding inputs were clean.
  // 4) blend in the shadow of %R
  //
  // TODO: consider allowing multiplication of zero with an uninitialized
  // value to result in an initialized value.
  //
  // Floating-point matrix multiplication:
  // - <4 x float> @llvm.aarch64.neon.bfmmla
  //       (<4 x float> %R, <8 x bfloat> %X, <8 x bfloat> %Y)
  //   %X and %Y are 2x4 and 4x2 matrices respectively
  //
  // Although there are half as many elements of %X and %Y compared to the
  // integer case, each element is twice the bit-width. Thus, we can reuse
  // the shadow propagation logic if we cast the shadows to the same type as
  // the integer case, and apply ummla to the shadows:
  //
  //             2x4 %X                            4x2 %Y
  // [ A01:A02 A03:A04 A05:A06 A07:A08 ]     [ B01:B02 B09:B10 ]
  // [ A09:A10 A11:A12 A13:A14 A15:A16 ]  x  [ B03:B04 B11:B12 ]
  //                                          [ B05:B06 B13:B14 ]
  //                                          [ B07:B08 B15:B16 ]
  //
  // For example, consider multiplying the first row of %X with the first
  // column of %Y. We want to know if
  //   A01:A02*B01:B02 + A03:A04*B03:B04 + A05:A06*B05:B06 + A07:A08*B07:B08
  // is fully initialized, which will be true if and only if
  // (A01, A02, ..., A08) and (B01, B02, ..., B08) are each fully
  // initialized. This latter condition is equivalent to what is tested by
  // the instrumentation for the integer form.
  void handleNEONMatrixMultiply(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);

    assert(I.arg_size() == 3);
    Value *R = I.getArgOperand(0);
    Value *A = I.getArgOperand(1);
    Value *B = I.getArgOperand(2);

    assert(I.getType() == R->getType());

    assert(isa<FixedVectorType>(R->getType()));
    assert(isa<FixedVectorType>(A->getType()));
    assert(isa<FixedVectorType>(B->getType()));

    [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
    [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
    [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(B->getType());

    Value *ShadowR = getShadow(&I, 0);
    Value *ShadowA = getShadow(&I, 1);
    Value *ShadowB = getShadow(&I, 2);

    // We will use ummla to compute the shadow. These are the types it expects.
    // These are also the types of the corresponding shadows.
    FixedVectorType *ExpectedRTy =
        FixedVectorType::get(IntegerType::get(*MS.C, 32), 4);
    FixedVectorType *ExpectedATy =
        FixedVectorType::get(IntegerType::get(*MS.C, 8), 16);
    FixedVectorType *ExpectedBTy =
        FixedVectorType::get(IntegerType::get(*MS.C, 8), 16);

    if (RTy->getElementType()->isIntegerTy()) {
      // Types of R and A/B are not identical e.g., <4 x i32> %R, <16 x i8> %A
      assert(ATy->getElementType()->isIntegerTy());

      assert(RTy == ExpectedRTy);
      assert(ATy == ExpectedATy);
      assert(BTy == ExpectedBTy);
    } else {
      assert(ATy->getElementType()->isFloatingPointTy());
      assert(BTy->getElementType()->isFloatingPointTy());

      // Technically, what we care about is that:
      //   getShadowTy(RTy)->canLosslesslyBitCastTo(ExpectedRTy)) etc.
      // but that is equivalent.
      assert(RTy->canLosslesslyBitCastTo(ExpectedRTy));
      assert(ATy->canLosslesslyBitCastTo(ExpectedATy));
      assert(BTy->canLosslesslyBitCastTo(ExpectedBTy));

      ShadowA = IRB.CreateBitCast(ShadowA, getShadowTy(ExpectedATy));
      ShadowB = IRB.CreateBitCast(ShadowB, getShadowTy(ExpectedBTy));
    }
    assert(ATy->getElementType() == BTy->getElementType());

    // From this point on, use Expected{R,A,B}Ty.

    // If the value is fully initialized, the shadow will be 000...001.
    // Otherwise, the shadow will be all zero.
    // (This is the opposite of how we typically handle shadows.)
    ShadowA =
        IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ExpectedATy)),
                       getShadowTy(ExpectedATy));
    ShadowB =
        IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(ExpectedBTy)),
                       getShadowTy(ExpectedBTy));

    Value *ShadowAB =
        IRB.CreateIntrinsic(ExpectedRTy, Intrinsic::aarch64_neon_ummla,
                            {getCleanShadow(ExpectedRTy), ShadowA, ShadowB});

    // ummla multiplies a 2x8 matrix with an 8x2 matrix. If all entries of the
    // input matrices are equal to 0x1, all entries of the output matrix will
    // be 0x8.
    Value *FullyInit = ConstantVector::getSplat(
        ExpectedRTy->getElementCount(),
        ConstantInt::get(ExpectedRTy->getElementType(), 0x8));

    ShadowAB = IRB.CreateSExt(IRB.CreateICmpNE(ShadowAB, FullyInit),
                              ShadowAB->getType());

    ShadowR = IRB.CreateSExt(
        IRB.CreateICmpNE(ShadowR, getCleanShadow(ExpectedRTy)), ExpectedRTy);

    setShadow(&I, IRB.CreateOr(ShadowAB, ShadowR));
    setOriginForNaryOp(I);
  }

  /// Handle intrinsics by applying the intrinsic to the shadows.
  ///
  /// The trailing arguments are passed verbatim to the intrinsic, though any
  /// uninitialized trailing arguments can also taint the shadow e.g., for an
  /// intrinsic with one trailing verbatim argument:
  ///   out = intrinsic(var1, var2, opType)
  /// we compute:
  ///   shadow[out] =
  ///       intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
  ///
  /// Typically, shadowIntrinsicID will be specified by the caller to be
  /// I.getIntrinsicID(), but the caller can choose to replace it with another
  /// intrinsic of the same type.
  ///
  /// CAUTION: this assumes that the intrinsic will handle arbitrary
  ///          bit-patterns (for example, if the intrinsic accepts floats for
  ///          var1, we require that it doesn't care if inputs are NaNs).
  ///
  /// For example, this can be applied to the Arm NEON vector table intrinsics
  /// (tbl{1,2,3,4}).
  ///
  /// The origin is approximated using setOriginForNaryOp.
  void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
                                         Intrinsic::ID shadowIntrinsicID,
                                         unsigned int trailingVerbatimArgs) {
    IRBuilder<> IRB(&I);

    assert(trailingVerbatimArgs < I.arg_size());

    SmallVector<Value *, 8> ShadowArgs;
    // Don't use getNumOperands() because it includes the callee
    for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
      Value *Shadow = getShadow(&I, i);

      // Shadows are integer-ish types but some intrinsics require a
      // different (e.g., floating-point) type.
      ShadowArgs.push_back(
          IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
    }

    for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
         i++) {
      Value *Arg = I.getArgOperand(i);
      ShadowArgs.push_back(Arg);
    }

    CallInst *CI =
        IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
    Value *CombinedShadow = CI;

    // Combine the computed shadow with the shadow of trailing args
    for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
         i++) {
      Value *Shadow =
          CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
      CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
    }

    setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));

    setOriginForNaryOp(I);
  }

  // Approximation only
  //
  // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
  void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
    assert(I.arg_size() == 2);

    handleShadowOr(I);
  }

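The handleShadowOr approximation above can be pictured with a minimal model (names are ours, not MSan's): the operand shadows are OR'ed bitwise, and the combined shadow is then cast to the result's shadow type. The model below shows only the OR step, which is why the handler is labelled an approximation: it ignores which result bits actually depend on which operand bits.

```cpp
#include <cassert>
#include <cstdint>

// Conservative shadow combination in the style of handleShadowOr: a result
// bit is considered uninitialized if the corresponding bit of either
// operand's shadow is set. (MSan additionally casts this to the shadow type
// of the result, e.g. <16 x i8> for pmull64; that step is omitted here.)
uint64_t shadowOr(uint64_t ShadowA, uint64_t ShadowB) {
  return ShadowA | ShadowB;
}
```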
  bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
    switch (I.getIntrinsicID()) {
    case Intrinsic::uadd_with_overflow:
    case Intrinsic::sadd_with_overflow:
    case Intrinsic::usub_with_overflow:
    case Intrinsic::ssub_with_overflow:
    case Intrinsic::umul_with_overflow:
    case Intrinsic::smul_with_overflow:
      handleArithmeticWithOverflow(I);
      break;
    case Intrinsic::abs:
      handleAbsIntrinsic(I);
      break;
    case Intrinsic::bitreverse:
      handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
                                        /*trailingVerbatimArgs=*/0);
      break;
    case Intrinsic::is_fpclass:
      handleIsFpClass(I);
      break;
    case Intrinsic::lifetime_start:
      handleLifetimeStart(I);
      break;
    case Intrinsic::launder_invariant_group:
    case Intrinsic::strip_invariant_group:
      handleInvariantGroup(I);
      break;
    case Intrinsic::bswap:
      handleBswap(I);
      break;
    case Intrinsic::ctlz:
    case Intrinsic::cttz:
      handleCountLeadingTrailingZeros(I);
      break;
    case Intrinsic::masked_compressstore:
      handleMaskedCompressStore(I);
      break;
    case Intrinsic::masked_expandload:
      handleMaskedExpandLoad(I);
      break;
    case Intrinsic::masked_gather:
      handleMaskedGather(I);
      break;
    case Intrinsic::masked_scatter:
      handleMaskedScatter(I);
      break;
    case Intrinsic::masked_store:
      handleMaskedStore(I);
      break;
    case Intrinsic::masked_load:
      handleMaskedLoad(I);
      break;
    case Intrinsic::vector_reduce_and:
      handleVectorReduceAndIntrinsic(I);
      break;
    case Intrinsic::vector_reduce_or:
      handleVectorReduceOrIntrinsic(I);
      break;

    case Intrinsic::vector_reduce_add:
    case Intrinsic::vector_reduce_xor:
    case Intrinsic::vector_reduce_mul:
    // Signed/Unsigned Min/Max
    // TODO: handling similarly to AND/OR may be more precise.
    case Intrinsic::vector_reduce_smax:
    case Intrinsic::vector_reduce_smin:
    case Intrinsic::vector_reduce_umax:
    case Intrinsic::vector_reduce_umin:
    // TODO: this has no false positives, but arguably we should check that
    // all the bits are initialized.
    case Intrinsic::vector_reduce_fmax:
    case Intrinsic::vector_reduce_fmin:
      handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
      break;

    case Intrinsic::vector_reduce_fadd:
    case Intrinsic::vector_reduce_fmul:
      handleVectorReduceWithStarterIntrinsic(I);
      break;

    case Intrinsic::scmp:
    case Intrinsic::ucmp: {
      handleShadowOr(I);
      break;
    }

    case Intrinsic::fshl:
    case Intrinsic::fshr:
      handleFunnelShift(I);
      break;

    case Intrinsic::is_constant:
      // The result of llvm.is.constant() is always defined.
      setShadow(&I, getCleanShadow(&I));
      setOrigin(&I, getCleanOrigin());
      break;

    default:
      return false;
    }

    return true;
  }

  bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
    switch (I.getIntrinsicID()) {
    case Intrinsic::x86_sse_stmxcsr:
      handleStmxcsr(I);
      break;
    case Intrinsic::x86_sse_ldmxcsr:
      handleLdmxcsr(I);
      break;

    // Convert Scalar Double Precision Floating-Point Value
    // to Unsigned Doubleword Integer
    // etc.
    case Intrinsic::x86_avx512_vcvtsd2usi64:
    case Intrinsic::x86_avx512_vcvtsd2usi32:
    case Intrinsic::x86_avx512_vcvtss2usi64:
    case Intrinsic::x86_avx512_vcvtss2usi32:
    case Intrinsic::x86_avx512_cvttss2usi64:
    case Intrinsic::x86_avx512_cvttss2usi:
    case Intrinsic::x86_avx512_cvttsd2usi64:
    case Intrinsic::x86_avx512_cvttsd2usi:
    case Intrinsic::x86_avx512_cvtusi2ss:
    case Intrinsic::x86_avx512_cvtusi642sd:
    case Intrinsic::x86_avx512_cvtusi642ss:
      handleSSEVectorConvertIntrinsic(I, /*NumUsedElements=*/1,
                                      /*HasRoundingMode=*/true);
      break;
    case Intrinsic::x86_sse2_cvtsd2si64:
    case Intrinsic::x86_sse2_cvtsd2si:
    case Intrinsic::x86_sse2_cvtsd2ss:
    case Intrinsic::x86_sse2_cvttsd2si64:
    case Intrinsic::x86_sse2_cvttsd2si:
    case Intrinsic::x86_sse_cvtss2si64:
    case Intrinsic::x86_sse_cvtss2si:
    case Intrinsic::x86_sse_cvttss2si64:
    case Intrinsic::x86_sse_cvttss2si:
      handleSSEVectorConvertIntrinsic(I, /*NumUsedElements=*/1);
      break;
    case Intrinsic::x86_sse_cvtps2pi:
    case Intrinsic::x86_sse_cvttps2pi:
      handleSSEVectorConvertIntrinsic(I, /*NumUsedElements=*/2);
      break;

    // TODO:
    // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
    // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
    // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)

    case Intrinsic::x86_vcvtps2ph_128:
    case Intrinsic::x86_vcvtps2ph_256: {
      handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
      break;
    }

    // Convert Packed Single Precision Floating-Point Values
    // to Packed Signed Doubleword Integer Values
    //
    // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
    //                (<16 x float>, <16 x i32>, i16, i32)
    case Intrinsic::x86_avx512_mask_cvtps2dq_512:
      handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
      break;

    // Convert Packed Double Precision Floating-Point Values
    // to Packed Single Precision Floating-Point Values
    case Intrinsic::x86_sse2_cvtpd2ps:
    case Intrinsic::x86_sse2_cvtps2dq:
    case Intrinsic::x86_sse2_cvtpd2dq:
    case Intrinsic::x86_sse2_cvttps2dq:
    case Intrinsic::x86_sse2_cvttpd2dq:
    case Intrinsic::x86_avx_cvt_pd2_ps_256:
    case Intrinsic::x86_avx_cvt_ps2dq_256:
    case Intrinsic::x86_avx_cvt_pd2dq_256:
    case Intrinsic::x86_avx_cvtt_ps2dq_256:
    case Intrinsic::x86_avx_cvtt_pd2dq_256: {
      handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
      break;
    }

    // Convert Single-Precision FP Value to 16-bit FP Value
    // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
    //                (<16 x float>, i32, <16 x i16>, i16)
    // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
    //                (<4 x float>, i32, <8 x i16>, i8)
    // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
    //                (<8 x float>, i32, <8 x i16>, i8)
    case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
    case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
    case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
      handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
      break;

    // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
    case Intrinsic::x86_avx512_psll_w_512:
    case Intrinsic::x86_avx512_psll_d_512:
    case Intrinsic::x86_avx512_psll_q_512:
    case Intrinsic::x86_avx512_pslli_w_512:
    case Intrinsic::x86_avx512_pslli_d_512:
    case Intrinsic::x86_avx512_pslli_q_512:
    case Intrinsic::x86_avx512_psrl_w_512:
    case Intrinsic::x86_avx512_psrl_d_512:
    case Intrinsic::x86_avx512_psrl_q_512:
    case Intrinsic::x86_avx512_psra_w_512:
    case Intrinsic::x86_avx512_psra_d_512:
    case Intrinsic::x86_avx512_psra_q_512:
    case Intrinsic::x86_avx512_psrli_w_512:
    case Intrinsic::x86_avx512_psrli_d_512:
    case Intrinsic::x86_avx512_psrli_q_512:
    case Intrinsic::x86_avx512_psrai_w_512:
    case Intrinsic::x86_avx512_psrai_d_512:
    case Intrinsic::x86_avx512_psrai_q_512:
    case Intrinsic::x86_avx512_psra_q_256:
    case Intrinsic::x86_avx512_psra_q_128:
    case Intrinsic::x86_avx512_psrai_q_256:
    case Intrinsic::x86_avx512_psrai_q_128:
    case Intrinsic::x86_avx2_psll_w:
    case Intrinsic::x86_avx2_psll_d:
    case Intrinsic::x86_avx2_psll_q:
    case Intrinsic::x86_avx2_pslli_w:
    case Intrinsic::x86_avx2_pslli_d:
    case Intrinsic::x86_avx2_pslli_q:
    case Intrinsic::x86_avx2_psrl_w:
    case Intrinsic::x86_avx2_psrl_d:
    case Intrinsic::x86_avx2_psrl_q:
    case Intrinsic::x86_avx2_psra_w:
    case Intrinsic::x86_avx2_psra_d:
    case Intrinsic::x86_avx2_psrli_w:
    case Intrinsic::x86_avx2_psrli_d:
    case Intrinsic::x86_avx2_psrli_q:
    case Intrinsic::x86_avx2_psrai_w:
    case Intrinsic::x86_avx2_psrai_d:
    case Intrinsic::x86_sse2_psll_w:
    case Intrinsic::x86_sse2_psll_d:
    case Intrinsic::x86_sse2_psll_q:
    case Intrinsic::x86_sse2_pslli_w:
    case Intrinsic::x86_sse2_pslli_d:
    case Intrinsic::x86_sse2_pslli_q:
    case Intrinsic::x86_sse2_psrl_w:
    case Intrinsic::x86_sse2_psrl_d:
    case Intrinsic::x86_sse2_psrl_q:
    case Intrinsic::x86_sse2_psra_w:
    case Intrinsic::x86_sse2_psra_d:
    case Intrinsic::x86_sse2_psrli_w:
    case Intrinsic::x86_sse2_psrli_d:
    case Intrinsic::x86_sse2_psrli_q:
    case Intrinsic::x86_sse2_psrai_w:
    case Intrinsic::x86_sse2_psrai_d:
    case Intrinsic::x86_mmx_psll_w:
    case Intrinsic::x86_mmx_psll_d:
    case Intrinsic::x86_mmx_psll_q:
    case Intrinsic::x86_mmx_pslli_w:
    case Intrinsic::x86_mmx_pslli_d:
    case Intrinsic::x86_mmx_pslli_q:
    case Intrinsic::x86_mmx_psrl_w:
    case Intrinsic::x86_mmx_psrl_d:
    case Intrinsic::x86_mmx_psrl_q:
    case Intrinsic::x86_mmx_psra_w:
    case Intrinsic::x86_mmx_psra_d:
    case Intrinsic::x86_mmx_psrli_w:
    case Intrinsic::x86_mmx_psrli_d:
    case Intrinsic::x86_mmx_psrli_q:
    case Intrinsic::x86_mmx_psrai_w:
    case Intrinsic::x86_mmx_psrai_d:
      handleVectorShiftIntrinsic(I, /*Variable=*/false);
      break;
    case Intrinsic::x86_avx2_psllv_d:
    case Intrinsic::x86_avx2_psllv_d_256:
    case Intrinsic::x86_avx512_psllv_d_512:
    case Intrinsic::x86_avx2_psllv_q:
    case Intrinsic::x86_avx2_psllv_q_256:
    case Intrinsic::x86_avx512_psllv_q_512:
    case Intrinsic::x86_avx2_psrlv_d:
    case Intrinsic::x86_avx2_psrlv_d_256:
    case Intrinsic::x86_avx512_psrlv_d_512:
    case Intrinsic::x86_avx2_psrlv_q:
    case Intrinsic::x86_avx2_psrlv_q_256:
    case Intrinsic::x86_avx512_psrlv_q_512:
    case Intrinsic::x86_avx2_psrav_d:
    case Intrinsic::x86_avx2_psrav_d_256:
    case Intrinsic::x86_avx512_psrav_d_512:
    case Intrinsic::x86_avx512_psrav_q_128:
    case Intrinsic::x86_avx512_psrav_q_256:
    case Intrinsic::x86_avx512_psrav_q_512:
      handleVectorShiftIntrinsic(I, /*Variable=*/true);
      break;

    // Pack with Signed/Unsigned Saturation
    case Intrinsic::x86_sse2_packsswb_128:
    case Intrinsic::x86_sse2_packssdw_128:
    case Intrinsic::x86_sse2_packuswb_128:
    case Intrinsic::x86_sse41_packusdw:
    case Intrinsic::x86_avx2_packsswb:
    case Intrinsic::x86_avx2_packssdw:
    case Intrinsic::x86_avx2_packuswb:
    case Intrinsic::x86_avx2_packusdw:
    // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
    //                     (<32 x i16> %a, <32 x i16> %b)
    //       <32 x i16> @llvm.x86.avx512.packssdw.512
    //                     (<16 x i32> %a, <16 x i32> %b)
    // Note: AVX512 masked variants are auto-upgraded by LLVM.
    case Intrinsic::x86_avx512_packsswb_512:
    case Intrinsic::x86_avx512_packssdw_512:
    case Intrinsic::x86_avx512_packuswb_512:
    case Intrinsic::x86_avx512_packusdw_512:
      handleVectorPackIntrinsic(I);
      break;

    case Intrinsic::x86_sse41_pblendvb:
    case Intrinsic::x86_sse41_blendvpd:
    case Intrinsic::x86_sse41_blendvps:
    case Intrinsic::x86_avx_blendv_pd_256:
    case Intrinsic::x86_avx_blendv_ps_256:
    case Intrinsic::x86_avx2_pblendvb:
      handleBlendvIntrinsic(I);
      break;

    case Intrinsic::x86_avx_dp_ps_256:
    case Intrinsic::x86_sse41_dppd:
    case Intrinsic::x86_sse41_dpps:
      handleDppIntrinsic(I);
      break;

    case Intrinsic::x86_mmx_packsswb:
    case Intrinsic::x86_mmx_packuswb:
      handleVectorPackIntrinsic(I, /*MMXEltSizeInBits=*/16);
      break;

    case Intrinsic::x86_mmx_packssdw:
      handleVectorPackIntrinsic(I, /*MMXEltSizeInBits=*/32);
      break;

    case Intrinsic::x86_mmx_psad_bw:
      handleVectorSadIntrinsic(I, /*IsMMX=*/true);
      break;
    case Intrinsic::x86_sse2_psad_bw:
    case Intrinsic::x86_avx2_psad_bw:
      handleVectorSadIntrinsic(I);
      break;

    // Multiply and Add Packed Words
    // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
    //
    // Multiply and Add Packed Signed and Unsigned Bytes
    // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
    // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
    // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
    //
    // These intrinsics are auto-upgraded into non-masked forms:
    // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
    //                (<8 x i16>, <8 x i16>, <4 x i32>, i8)
    // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
    //                (<16 x i16>, <16 x i16>, <8 x i32>, i8)
    // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
    //                (<32 x i16>, <32 x i16>, <16 x i32>, i16)
    // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
    //                (<16 x i8>, <16 x i8>, <8 x i16>, i8)
    // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
    //                (<32 x i8>, <32 x i8>, <16 x i16>, i16)
    // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
    //                (<64 x i8>, <64 x i8>, <32 x i16>, i32)
    case Intrinsic::x86_sse2_pmadd_wd:
    case Intrinsic::x86_avx2_pmadd_wd:
    case Intrinsic::x86_avx512_pmaddw_d_512:
    case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
    case Intrinsic::x86_avx2_pmadd_ub_sw:
    case Intrinsic::x86_avx512_pmaddubs_w_512:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
                                      /*ZeroPurifies=*/true,
                                      /*EltSizeInBits=*/0,
                                      /*Lanes=*/kBothLanes);
      break;

    // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
    case Intrinsic::x86_ssse3_pmadd_ub_sw:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
                                      /*ZeroPurifies=*/true,
                                      /*EltSizeInBits=*/8,
                                      /*Lanes=*/kBothLanes);
      break;

    // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
    case Intrinsic::x86_mmx_pmadd_wd:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
                                      /*ZeroPurifies=*/true,
                                      /*EltSizeInBits=*/16,
                                      /*Lanes=*/kBothLanes);
      break;

    // BFloat16 multiply-add to single-precision
    // <4 x float> llvm.aarch64.neon.bfmlalt
    //                 (<4 x float>, <8 x bfloat>, <8 x bfloat>)
    case Intrinsic::aarch64_neon_bfmlalt:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
                                      /*ZeroPurifies=*/false,
                                      /*EltSizeInBits=*/0,
                                      /*Lanes=*/kOddLanes);
      break;

    // <4 x float> llvm.aarch64.neon.bfmlalb
    //                 (<4 x float>, <8 x bfloat>, <8 x bfloat>)
    case Intrinsic::aarch64_neon_bfmlalb:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
                                      /*ZeroPurifies=*/false,
                                      /*EltSizeInBits=*/0,
                                      /*Lanes=*/kEvenLanes);
      break;

    // AVX Vector Neural Network Instructions: bytes
    //
    // Multiply and Add Signed Bytes
    // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Signed Bytes With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Signed and Unsigned Bytes
    // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Signed and Unsigned Bytes With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Unsigned and Signed Bytes
    // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Unsigned and Signed Bytes With Saturation
    // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Unsigned Bytes
    // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // Multiply and Add Unsigned Bytes With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
    //                (< 4 x i32>, <16 x i8>, <16 x i8>)
    // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
    //                (< 8 x i32>, <32 x i8>, <32 x i8>)
    // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>)
    //
    // These intrinsics are auto-upgraded into non-masked forms:
    // < 4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
    //                (<4 x i32>, <16 x i8>, <16 x i8>, i8)
    // < 4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
    //                (<4 x i32>, <16 x i8>, <16 x i8>, i8)
    // < 8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
    //                (<8 x i32>, <32 x i8>, <32 x i8>, i8)
    // < 8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
    //                (<8 x i32>, <32 x i8>, <32 x i8>, i8)
    // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>, i16)
    // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>, i16)
    //
    // < 4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
    //                (<4 x i32>, <16 x i8>, <16 x i8>, i8)
    // < 4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
    //                (<4 x i32>, <16 x i8>, <16 x i8>, i8)
    // < 8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
    //                (<8 x i32>, <32 x i8>, <32 x i8>, i8)
    // < 8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
    //                (<8 x i32>, <32 x i8>, <32 x i8>, i8)
    // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>, i16)
    // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
    //                (<16 x i32>, <64 x i8>, <64 x i8>, i16)
    case Intrinsic::x86_avx512_vpdpbusd_128:
    case Intrinsic::x86_avx512_vpdpbusd_256:
    case Intrinsic::x86_avx512_vpdpbusd_512:
    case Intrinsic::x86_avx512_vpdpbusds_128:
    case Intrinsic::x86_avx512_vpdpbusds_256:
    case Intrinsic::x86_avx512_vpdpbusds_512:
    case Intrinsic::x86_avx2_vpdpbssd_128:
    case Intrinsic::x86_avx2_vpdpbssd_256:
    case Intrinsic::x86_avx10_vpdpbssd_512:
    case Intrinsic::x86_avx2_vpdpbssds_128:
    case Intrinsic::x86_avx2_vpdpbssds_256:
    case Intrinsic::x86_avx10_vpdpbssds_512:
    case Intrinsic::x86_avx2_vpdpbsud_128:
    case Intrinsic::x86_avx2_vpdpbsud_256:
    case Intrinsic::x86_avx10_vpdpbsud_512:
    case Intrinsic::x86_avx2_vpdpbsuds_128:
    case Intrinsic::x86_avx2_vpdpbsuds_256:
    case Intrinsic::x86_avx10_vpdpbsuds_512:
    case Intrinsic::x86_avx2_vpdpbuud_128:
    case Intrinsic::x86_avx2_vpdpbuud_256:
    case Intrinsic::x86_avx10_vpdpbuud_512:
    case Intrinsic::x86_avx2_vpdpbuuds_128:
    case Intrinsic::x86_avx2_vpdpbuuds_256:
    case Intrinsic::x86_avx10_vpdpbuuds_512:
      handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
                                      /*ZeroPurifies=*/true,
                                      /*EltSizeInBits=*/0,
                                      /*Lanes=*/kBothLanes);
      break;

    // AVX Vector Neural Network Instructions: words
    //
    // Multiply and Add Signed Word Integers
    // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Signed Word Integers With Saturation
    // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Signed and Unsigned Word Integers
    // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Signed and Unsigned Word Integers With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Unsigned and Signed Word Integers
    // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Unsigned and Signed Word Integers With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Unsigned and Unsigned Word Integers
    // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
    // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
    //                (< 4 x i32>, < 8 x i16>, < 8 x i16>)
    // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
    //                (< 8 x i32>, <16 x i16>, <16 x i16>)
    // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>)
    //
    // These intrinsics are auto-upgraded into non-masked forms:
    // < 4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
    //                (<4 x i32>, <8 x i16>, <8 x i16>, i8)
    // < 4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
    //                (<4 x i32>, <8 x i16>, <8 x i16>, i8)
    // < 8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
    //                (<8 x i32>, <16 x i16>, <16 x i16>, i8)
    // < 8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
    //                (<8 x i32>, <16 x i16>, <16 x i16>, i8)
    // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>, i16)
    // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
    //                (<16 x i32>, <32 x i16>, <32 x i16>, i16)
    //
    // < 4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
    //                (<4 x i32>, <8 x i16>, <8 x i16>, i8)
    // < 4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
    //                (<4 x i32>, <8 x i16>, <8 x i16>, i8)
    // < 8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
    //                (<8 x i32>, <16 x i16>, <16 x i16>, i8)
    // < 8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
    //                (<8 x i32>, <16 x i16>, <16 x i16>, i8)
    // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6368 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6369 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6370 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6371 case Intrinsic::x86_avx512_vpdpwssd_128:
6372 case Intrinsic::x86_avx512_vpdpwssd_256:
6373 case Intrinsic::x86_avx512_vpdpwssd_512:
6374 case Intrinsic::x86_avx512_vpdpwssds_128:
6375 case Intrinsic::x86_avx512_vpdpwssds_256:
6376 case Intrinsic::x86_avx512_vpdpwssds_512:
6377 case Intrinsic::x86_avx2_vpdpwsud_128:
6378 case Intrinsic::x86_avx2_vpdpwsud_256:
6379 case Intrinsic::x86_avx10_vpdpwsud_512:
6380 case Intrinsic::x86_avx2_vpdpwsuds_128:
6381 case Intrinsic::x86_avx2_vpdpwsuds_256:
6382 case Intrinsic::x86_avx10_vpdpwsuds_512:
6383 case Intrinsic::x86_avx2_vpdpwusd_128:
6384 case Intrinsic::x86_avx2_vpdpwusd_256:
6385 case Intrinsic::x86_avx10_vpdpwusd_512:
6386 case Intrinsic::x86_avx2_vpdpwusds_128:
6387 case Intrinsic::x86_avx2_vpdpwusds_256:
6388 case Intrinsic::x86_avx10_vpdpwusds_512:
6389 case Intrinsic::x86_avx2_vpdpwuud_128:
6390 case Intrinsic::x86_avx2_vpdpwuud_256:
6391 case Intrinsic::x86_avx10_vpdpwuud_512:
6392 case Intrinsic::x86_avx2_vpdpwuuds_128:
6393 case Intrinsic::x86_avx2_vpdpwuuds_256:
6394 case Intrinsic::x86_avx10_vpdpwuuds_512:
6395 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6396 /*ZeroPurifies=*/true,
6397 /*EltSizeInBits=*/0,
6398 /*Lanes=*/kBothLanes);
6399 break;
6400
6401 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6402 // Precision
6403 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6404 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6405 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6406 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6407 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6408 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6409 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6410 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6411 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6412 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6413 /*ZeroPurifies=*/false,
6414 /*EltSizeInBits=*/0,
6415 /*Lanes=*/kBothLanes);
6416 break;
6417
6418 case Intrinsic::x86_sse_cmp_ss:
6419 case Intrinsic::x86_sse2_cmp_sd:
6420 case Intrinsic::x86_sse_comieq_ss:
6421 case Intrinsic::x86_sse_comilt_ss:
6422 case Intrinsic::x86_sse_comile_ss:
6423 case Intrinsic::x86_sse_comigt_ss:
6424 case Intrinsic::x86_sse_comige_ss:
6425 case Intrinsic::x86_sse_comineq_ss:
6426 case Intrinsic::x86_sse_ucomieq_ss:
6427 case Intrinsic::x86_sse_ucomilt_ss:
6428 case Intrinsic::x86_sse_ucomile_ss:
6429 case Intrinsic::x86_sse_ucomigt_ss:
6430 case Intrinsic::x86_sse_ucomige_ss:
6431 case Intrinsic::x86_sse_ucomineq_ss:
6432 case Intrinsic::x86_sse2_comieq_sd:
6433 case Intrinsic::x86_sse2_comilt_sd:
6434 case Intrinsic::x86_sse2_comile_sd:
6435 case Intrinsic::x86_sse2_comigt_sd:
6436 case Intrinsic::x86_sse2_comige_sd:
6437 case Intrinsic::x86_sse2_comineq_sd:
6438 case Intrinsic::x86_sse2_ucomieq_sd:
6439 case Intrinsic::x86_sse2_ucomilt_sd:
6440 case Intrinsic::x86_sse2_ucomile_sd:
6441 case Intrinsic::x86_sse2_ucomigt_sd:
6442 case Intrinsic::x86_sse2_ucomige_sd:
6443 case Intrinsic::x86_sse2_ucomineq_sd:
6444 handleVectorCompareScalarIntrinsic(I);
6445 break;
6446
6447 case Intrinsic::x86_avx_cmp_pd_256:
6448 case Intrinsic::x86_avx_cmp_ps_256:
6449 case Intrinsic::x86_sse2_cmp_pd:
6450 case Intrinsic::x86_sse_cmp_ps:
6451 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6452 break;
6453
6454 case Intrinsic::x86_bmi_bextr_32:
6455 case Intrinsic::x86_bmi_bextr_64:
6456 case Intrinsic::x86_bmi_bzhi_32:
6457 case Intrinsic::x86_bmi_bzhi_64:
6458 case Intrinsic::x86_bmi_pdep_32:
6459 case Intrinsic::x86_bmi_pdep_64:
6460 case Intrinsic::x86_bmi_pext_32:
6461 case Intrinsic::x86_bmi_pext_64:
6462 handleBmiIntrinsic(I);
6463 break;
6464
6465 case Intrinsic::x86_pclmulqdq:
6466 case Intrinsic::x86_pclmulqdq_256:
6467 case Intrinsic::x86_pclmulqdq_512:
6468 handlePclmulIntrinsic(I);
6469 break;
6470
6471 case Intrinsic::x86_avx_round_pd_256:
6472 case Intrinsic::x86_avx_round_ps_256:
6473 case Intrinsic::x86_sse41_round_pd:
6474 case Intrinsic::x86_sse41_round_ps:
6475 handleRoundPdPsIntrinsic(I);
6476 break;
6477
6478 case Intrinsic::x86_sse41_round_sd:
6479 case Intrinsic::x86_sse41_round_ss:
6480 handleUnarySdSsIntrinsic(I);
6481 break;
6482
6483 case Intrinsic::x86_sse2_max_sd:
6484 case Intrinsic::x86_sse_max_ss:
6485 case Intrinsic::x86_sse2_min_sd:
6486 case Intrinsic::x86_sse_min_ss:
6487 handleBinarySdSsIntrinsic(I);
6488 break;
6489
6490 case Intrinsic::x86_avx_vtestc_pd:
6491 case Intrinsic::x86_avx_vtestc_pd_256:
6492 case Intrinsic::x86_avx_vtestc_ps:
6493 case Intrinsic::x86_avx_vtestc_ps_256:
6494 case Intrinsic::x86_avx_vtestnzc_pd:
6495 case Intrinsic::x86_avx_vtestnzc_pd_256:
6496 case Intrinsic::x86_avx_vtestnzc_ps:
6497 case Intrinsic::x86_avx_vtestnzc_ps_256:
6498 case Intrinsic::x86_avx_vtestz_pd:
6499 case Intrinsic::x86_avx_vtestz_pd_256:
6500 case Intrinsic::x86_avx_vtestz_ps:
6501 case Intrinsic::x86_avx_vtestz_ps_256:
6502 case Intrinsic::x86_avx_ptestc_256:
6503 case Intrinsic::x86_avx_ptestnzc_256:
6504 case Intrinsic::x86_avx_ptestz_256:
6505 case Intrinsic::x86_sse41_ptestc:
6506 case Intrinsic::x86_sse41_ptestnzc:
6507 case Intrinsic::x86_sse41_ptestz:
6508 handleVtestIntrinsic(I);
6509 break;
6510
6511 // Packed Horizontal Add/Subtract
6512 case Intrinsic::x86_ssse3_phadd_w:
6513 case Intrinsic::x86_ssse3_phadd_w_128:
6514 case Intrinsic::x86_ssse3_phsub_w:
6515 case Intrinsic::x86_ssse3_phsub_w_128:
6516 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6517 /*ReinterpretElemWidth=*/16);
6518 break;
6519
6520 case Intrinsic::x86_avx2_phadd_w:
6521 case Intrinsic::x86_avx2_phsub_w:
6522 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6523 /*ReinterpretElemWidth=*/16);
6524 break;
6525
6526 // Packed Horizontal Add/Subtract
6527 case Intrinsic::x86_ssse3_phadd_d:
6528 case Intrinsic::x86_ssse3_phadd_d_128:
6529 case Intrinsic::x86_ssse3_phsub_d:
6530 case Intrinsic::x86_ssse3_phsub_d_128:
6531 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6532 /*ReinterpretElemWidth=*/32);
6533 break;
6534
6535 case Intrinsic::x86_avx2_phadd_d:
6536 case Intrinsic::x86_avx2_phsub_d:
6537 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6538 /*ReinterpretElemWidth=*/32);
6539 break;
6540
6541 // Packed Horizontal Add/Subtract and Saturate
6542 case Intrinsic::x86_ssse3_phadd_sw:
6543 case Intrinsic::x86_ssse3_phadd_sw_128:
6544 case Intrinsic::x86_ssse3_phsub_sw:
6545 case Intrinsic::x86_ssse3_phsub_sw_128:
6546 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6547 /*ReinterpretElemWidth=*/16);
6548 break;
6549
6550 case Intrinsic::x86_avx2_phadd_sw:
6551 case Intrinsic::x86_avx2_phsub_sw:
6552 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6553 /*ReinterpretElemWidth=*/16);
6554 break;
6555
6556 // Packed Single/Double Precision Floating-Point Horizontal Add
6557 case Intrinsic::x86_sse3_hadd_ps:
6558 case Intrinsic::x86_sse3_hadd_pd:
6559 case Intrinsic::x86_sse3_hsub_ps:
6560 case Intrinsic::x86_sse3_hsub_pd:
6561 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6562 break;
6563
6564 case Intrinsic::x86_avx_hadd_pd_256:
6565 case Intrinsic::x86_avx_hadd_ps_256:
6566 case Intrinsic::x86_avx_hsub_pd_256:
6567 case Intrinsic::x86_avx_hsub_ps_256:
6568 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6569 break;
6570
6571 case Intrinsic::x86_avx_maskstore_ps:
6572 case Intrinsic::x86_avx_maskstore_pd:
6573 case Intrinsic::x86_avx_maskstore_ps_256:
6574 case Intrinsic::x86_avx_maskstore_pd_256:
6575 case Intrinsic::x86_avx2_maskstore_d:
6576 case Intrinsic::x86_avx2_maskstore_q:
6577 case Intrinsic::x86_avx2_maskstore_d_256:
6578 case Intrinsic::x86_avx2_maskstore_q_256: {
6579 handleAVXMaskedStore(I);
6580 break;
6581 }
6582
6583 case Intrinsic::x86_avx_maskload_ps:
6584 case Intrinsic::x86_avx_maskload_pd:
6585 case Intrinsic::x86_avx_maskload_ps_256:
6586 case Intrinsic::x86_avx_maskload_pd_256:
6587 case Intrinsic::x86_avx2_maskload_d:
6588 case Intrinsic::x86_avx2_maskload_q:
6589 case Intrinsic::x86_avx2_maskload_d_256:
6590 case Intrinsic::x86_avx2_maskload_q_256: {
6591 handleAVXMaskedLoad(I);
6592 break;
6593 }
6594
6595 // Packed Floating-Point Arithmetic, Minimum and Maximum
6596 case Intrinsic::x86_avx512fp16_add_ph_512:
6597 case Intrinsic::x86_avx512fp16_sub_ph_512:
6598 case Intrinsic::x86_avx512fp16_mul_ph_512:
6599 case Intrinsic::x86_avx512fp16_div_ph_512:
6600 case Intrinsic::x86_avx512fp16_max_ph_512:
6601 case Intrinsic::x86_avx512fp16_min_ph_512:
6602 case Intrinsic::x86_avx512_min_ps_512:
6603 case Intrinsic::x86_avx512_min_pd_512:
6604 case Intrinsic::x86_avx512_max_ps_512:
6605 case Intrinsic::x86_avx512_max_pd_512: {
6606 // These AVX512 variants contain the rounding mode as a trailing flag.
6607 // Earlier variants do not have a trailing flag and are already handled
6608 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6609 // maybeHandleUnknownIntrinsic.
6610 [[maybe_unused]] bool Success =
6611 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6612 assert(Success);
6613 break;
6614 }
6615
6616 case Intrinsic::x86_avx_vpermilvar_pd:
6617 case Intrinsic::x86_avx_vpermilvar_pd_256:
6618 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6619 case Intrinsic::x86_avx_vpermilvar_ps:
6620 case Intrinsic::x86_avx_vpermilvar_ps_256:
6621 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6622 handleAVXVpermilvar(I);
6623 break;
6624 }
6625
6626 case Intrinsic::x86_avx512_vpermi2var_d_128:
6627 case Intrinsic::x86_avx512_vpermi2var_d_256:
6628 case Intrinsic::x86_avx512_vpermi2var_d_512:
6629 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6630 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6631 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6632 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6633 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6634 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6635 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6636 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6637 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6638 case Intrinsic::x86_avx512_vpermi2var_q_128:
6639 case Intrinsic::x86_avx512_vpermi2var_q_256:
6640 case Intrinsic::x86_avx512_vpermi2var_q_512:
6641 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6642 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6643 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6644 handleAVXVpermi2var(I);
6645 break;
6646
6647 // Packed Shuffle
6648 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6649 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6650 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6651 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6652 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6653 //
6654 // The following intrinsics are auto-upgraded:
6655 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6656 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6657 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6658 case Intrinsic::x86_avx2_pshuf_b:
6659 case Intrinsic::x86_sse_pshuf_w:
6660 case Intrinsic::x86_ssse3_pshuf_b_128:
6661 case Intrinsic::x86_ssse3_pshuf_b:
6662 case Intrinsic::x86_avx512_pshuf_b_512:
6663 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6664 /*trailingVerbatimArgs=*/1);
6665 break;
6666
6667 // AVX512 PMOV: Packed MOV, with truncation
6668 // Precisely handled by applying the same intrinsic to the shadow
6669 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6670 case Intrinsic::x86_avx512_mask_pmov_db_512:
6671 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6672 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6673 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6674 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6675 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6676 /*trailingVerbatimArgs=*/1);
6677 break;
6678 }
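// Truncating PMOV can be handled exactly by truncating the shadow in the
// same way; a scalar sketch (`pmovDwModel` is a hypothetical name,
// illustrative only):

```cpp
#include <cstdint>
#include <vector>

// Model of one pmov.dw lane: each 32-bit element is truncated to 16 bits.
// Applying the identical truncation to the shadow is precise, because the
// discarded shadow bits describe exactly the discarded data bits.
std::vector<uint16_t> pmovDwModel(const std::vector<uint32_t> &A) {
  std::vector<uint16_t> Out;
  for (uint32_t V : A)
    Out.push_back(uint16_t(V)); // keep only the low 16 bits
  return Out;
}
```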
6679
6680 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6681 // Approximately handled using the corresponding truncation intrinsic
6682 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6683 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6684 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6685 handleIntrinsicByApplyingToShadow(I,
6686 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_dw_512,
6687 /*trailingVerbatimArgs=*/1);
6688 break;
6689 }
6690
6691 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6692 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6693 handleIntrinsicByApplyingToShadow(I,
6694 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_db_512,
6695 /*trailingVerbatimArgs=*/1);
6696 break;
6697 }
6698
6699 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6700 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6701 handleIntrinsicByApplyingToShadow(I,
6702 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_qb_512,
6703 /*trailingVerbatimArgs=*/1);
6704 break;
6705 }
6706
6707 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6708 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6709 handleIntrinsicByApplyingToShadow(I,
6710 shadowIntrinsicID: Intrinsic::x86_avx512_mask_pmov_qw_512,
6711 /*trailingVerbatimArgs=*/1);
6712 break;
6713 }
6714
6715 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6716 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6717 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6718 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6719 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6720 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6721 // slow-path handler.
6722 handleAVX512VectorDownConvert(I);
6723 break;
6724 }
6725
6726 // AVX512/AVX10 Reciprocal Square Root
6727 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6728 // (<16 x float>, <16 x float>, i16)
6729 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6730 // (<8 x float>, <8 x float>, i8)
6731 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6732 // (<4 x float>, <4 x float>, i8)
6733 //
6734 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6735 // (<8 x double>, <8 x double>, i8)
6736 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6737 // (<4 x double>, <4 x double>, i8)
6738 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6739 // (<2 x double>, <2 x double>, i8)
6740 //
6741 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6742 // (<32 x bfloat>, <32 x bfloat>, i32)
6743 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6744 // (<16 x bfloat>, <16 x bfloat>, i16)
6745 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6746 // (<8 x bfloat>, <8 x bfloat>, i8)
6747 //
6748 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6749 // (<32 x half>, <32 x half>, i32)
6750 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6751 // (<16 x half>, <16 x half>, i16)
6752 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6753 // (<8 x half>, <8 x half>, i8)
6754 //
6755 // TODO: 3-operand variants are not handled:
6756 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6757 // (<2 x double>, <2 x double>, <2 x double>, i8)
6758 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6759 // (<4 x float>, <4 x float>, <4 x float>, i8)
6760 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6761 // (<8 x half>, <8 x half>, <8 x half>, i8)
6762 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6763 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6764 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6765 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6766 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6767 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6768 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6769 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6770 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6771 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6772 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6773 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6774 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6775 /*WriteThruIndex=*/1,
6776 /*MaskIndex=*/2);
6777 break;
6778
6779 // AVX512/AVX10 Reciprocal
6780 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6781 // (<16 x float>, <16 x float>, i16)
6782 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6783 // (<8 x float>, <8 x float>, i8)
6784 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6785 // (<4 x float>, <4 x float>, i8)
6786 //
6787 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6788 // (<8 x double>, <8 x double>, i8)
6789 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6790 // (<4 x double>, <4 x double>, i8)
6791 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6792 // (<2 x double>, <2 x double>, i8)
6793 //
6794 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6795 // (<32 x bfloat>, <32 x bfloat>, i32)
6796 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6797 // (<16 x bfloat>, <16 x bfloat>, i16)
6798 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6799 // (<8 x bfloat>, <8 x bfloat>, i8)
6800 //
6801 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6802 // (<32 x half>, <32 x half>, i32)
6803 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6804 // (<16 x half>, <16 x half>, i16)
6805 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6806 // (<8 x half>, <8 x half>, i8)
6807 //
6808 // TODO: 3-operand variants are not handled:
6809 // <2 x double> @llvm.x86.avx512.rcp14.sd
6810 // (<2 x double>, <2 x double>, <2 x double>, i8)
6811 // <4 x float> @llvm.x86.avx512.rcp14.ss
6812 // (<4 x float>, <4 x float>, <4 x float>, i8)
6813 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6814 // (<8 x half>, <8 x half>, <8 x half>, i8)
6815 case Intrinsic::x86_avx512_rcp14_ps_512:
6816 case Intrinsic::x86_avx512_rcp14_ps_256:
6817 case Intrinsic::x86_avx512_rcp14_ps_128:
6818 case Intrinsic::x86_avx512_rcp14_pd_512:
6819 case Intrinsic::x86_avx512_rcp14_pd_256:
6820 case Intrinsic::x86_avx512_rcp14_pd_128:
6821 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6822 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6823 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6824 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6825 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6826 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6827 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6828 /*WriteThruIndex=*/1,
6829 /*MaskIndex=*/2);
6830 break;
6831
6832 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6833 // (<32 x half>, i32, <32 x half>, i32, i32)
6834 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6835 // (<16 x half>, i32, <16 x half>, i32, i16)
6836 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6837 // (<8 x half>, i32, <8 x half>, i32, i8)
6838 //
6839 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6840 // (<16 x float>, i32, <16 x float>, i16, i32)
6841 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6842 // (<8 x float>, i32, <8 x float>, i8)
6843 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6844 // (<4 x float>, i32, <4 x float>, i8)
6845 //
6846 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6847 // (<8 x double>, i32, <8 x double>, i8, i32)
6848 // A Imm WriteThru Mask Rounding
6849 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6850 // (<4 x double>, i32, <4 x double>, i8)
6851 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6852 // (<2 x double>, i32, <2 x double>, i8)
6853 // A Imm WriteThru Mask
6854 //
6855 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6856 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6857 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6858 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6859 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6860 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6861 //
6862 // Not supported: three vectors
6863 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6864 // (<8 x half>, <8 x half>, <8 x half>, i8, i32, i32)
6865 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6866 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6867 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6868 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6869 // i32)
6870 // A B WriteThru Mask Imm
6871 // Rounding
6872 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6873 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6874 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6875 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6876 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6877 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6878 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6879 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6880 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6881 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6882 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6883 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6884 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6885 /*WriteThruIndex=*/2,
6886 /*MaskIndex=*/3);
6887 break;
6888
6889 // AVX512 FP16 Arithmetic
6890 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6891 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6892 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6893 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6894 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6895 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6896 visitGenericScalarHalfwordInst(I);
6897 break;
6898 }
6899
6900 // AVX Galois Field New Instructions
6901 case Intrinsic::x86_vgf2p8affineqb_128:
6902 case Intrinsic::x86_vgf2p8affineqb_256:
6903 case Intrinsic::x86_vgf2p8affineqb_512:
6904 handleAVXGF2P8Affine(I);
6905 break;
6906
6907 default:
6908 return false;
6909 }
6910
6911 return true;
6912 }
6913
6914 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6915 switch (I.getIntrinsicID()) {
6916 // Two operands e.g.,
6917 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
6918 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
6919 case Intrinsic::aarch64_neon_rshrn:
6920 case Intrinsic::aarch64_neon_sqrshl:
6921 case Intrinsic::aarch64_neon_sqrshrn:
6922 case Intrinsic::aarch64_neon_sqrshrun:
6923 case Intrinsic::aarch64_neon_sqshl:
6924 case Intrinsic::aarch64_neon_sqshlu:
6925 case Intrinsic::aarch64_neon_sqshrn:
6926 case Intrinsic::aarch64_neon_sqshrun:
6927 case Intrinsic::aarch64_neon_srshl:
6928 case Intrinsic::aarch64_neon_sshl:
6929 case Intrinsic::aarch64_neon_uqrshl:
6930 case Intrinsic::aarch64_neon_uqrshrn:
6931 case Intrinsic::aarch64_neon_uqshl:
6932 case Intrinsic::aarch64_neon_uqshrn:
6933 case Intrinsic::aarch64_neon_urshl:
6934 case Intrinsic::aarch64_neon_ushl:
6935 handleVectorShiftIntrinsic(I, /*Variable=*/false);
6936 break;
6937
6938 // Vector Shift Left/Right and Insert
6939 //
6940 // Three operands e.g.,
6941 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
6942 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
6943 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
6944 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
6945 //
6946 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
6947 // (instead of zero-extending/sign-extending).
6948 case Intrinsic::aarch64_neon_vsli:
6949 case Intrinsic::aarch64_neon_vsri:
6950 handleIntrinsicByApplyingToShadow(I, shadowIntrinsicID: I.getIntrinsicID(),
6951 /*trailingVerbatimArgs=*/1);
6952 break;
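// A scalar sketch of one SLI lane, to make the "missing bits come from %a"
// behavior concrete (`sliLane` is a hypothetical helper, not this pass's
// code):

```cpp
#include <cstdint>

// Model of one SLI (shift left and insert) lane: %b is shifted left by %n
// and the vacated low bits are taken from %a instead of being zeroed, so
// shadow bits from both inputs flow into the result bit-for-bit -- which is
// why applying the same intrinsic to the shadows is exact.
uint16_t sliLane(uint16_t A, uint16_t B, unsigned N) {
  uint16_t Mask = uint16_t((1u << N) - 1); // low N bits preserved from A
  return uint16_t((B << N) | (A & Mask));
}
```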
6953
6954 // TODO: handling max/min similarly to AND/OR may be more precise
6955 // Floating-Point Maximum/Minimum Pairwise
6956 case Intrinsic::aarch64_neon_fmaxp:
6957 case Intrinsic::aarch64_neon_fminp:
6958 // Floating-Point Maximum/Minimum Number Pairwise
6959 case Intrinsic::aarch64_neon_fmaxnmp:
6960 case Intrinsic::aarch64_neon_fminnmp:
6961 // Signed/Unsigned Maximum/Minimum Pairwise
6962 case Intrinsic::aarch64_neon_smaxp:
6963 case Intrinsic::aarch64_neon_sminp:
6964 case Intrinsic::aarch64_neon_umaxp:
6965 case Intrinsic::aarch64_neon_uminp:
6966 // Add Pairwise
6967 case Intrinsic::aarch64_neon_addp:
6968 // Floating-point Add Pairwise
6969 case Intrinsic::aarch64_neon_faddp:
6970 // Add Long Pairwise
6971 case Intrinsic::aarch64_neon_saddlp:
6972 case Intrinsic::aarch64_neon_uaddlp: {
6973 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6974 break;
6975 }
6976
6977 // Floating-point Convert to integer, rounding to nearest with ties to Away
6978 case Intrinsic::aarch64_neon_fcvtas:
6979 case Intrinsic::aarch64_neon_fcvtau:
6980 // Floating-point convert to integer, rounding toward minus infinity
6981 case Intrinsic::aarch64_neon_fcvtms:
6982 case Intrinsic::aarch64_neon_fcvtmu:
6983 // Floating-point convert to integer, rounding to nearest with ties to even
6984 case Intrinsic::aarch64_neon_fcvtns:
6985 case Intrinsic::aarch64_neon_fcvtnu:
6986 // Floating-point convert to integer, rounding toward plus infinity
6987 case Intrinsic::aarch64_neon_fcvtps:
6988 case Intrinsic::aarch64_neon_fcvtpu:
6989 // Floating-point Convert to integer, rounding toward Zero
6990 case Intrinsic::aarch64_neon_fcvtzs:
6991 case Intrinsic::aarch64_neon_fcvtzu:
6992 // Floating-point convert to lower precision narrow, rounding to odd
6993 case Intrinsic::aarch64_neon_fcvtxn:
6994 // Vector Conversions Between Half-Precision and Single-Precision
6995 case Intrinsic::aarch64_neon_vcvthf2fp:
6996 case Intrinsic::aarch64_neon_vcvtfp2hf:
6997 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
6998 break;
6999
7000 // Vector Conversions Between Fixed-Point and Floating-Point
7001 case Intrinsic::aarch64_neon_vcvtfxs2fp:
7002 case Intrinsic::aarch64_neon_vcvtfp2fxs:
7003 case Intrinsic::aarch64_neon_vcvtfxu2fp:
7004 case Intrinsic::aarch64_neon_vcvtfp2fxu:
7005 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
7006 break;
7007
7008 // TODO: bfloat conversions
7009 // - bfloat @llvm.aarch64.neon.bfcvt(float)
7010 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
7011 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
7012
7013 // Add reduction to scalar
7014 case Intrinsic::aarch64_neon_faddv:
7015 case Intrinsic::aarch64_neon_saddv:
7016 case Intrinsic::aarch64_neon_uaddv:
7017 // Signed/Unsigned min/max (Vector)
7018 // TODO: handling similarly to AND/OR may be more precise.
7019 case Intrinsic::aarch64_neon_smaxv:
7020 case Intrinsic::aarch64_neon_sminv:
7021 case Intrinsic::aarch64_neon_umaxv:
7022 case Intrinsic::aarch64_neon_uminv:
7023 // Floating-point min/max (vector)
7024 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
7025 // but our shadow propagation is the same.
7026 case Intrinsic::aarch64_neon_fmaxv:
7027 case Intrinsic::aarch64_neon_fminv:
7028 case Intrinsic::aarch64_neon_fmaxnmv:
7029 case Intrinsic::aarch64_neon_fminnmv:
7030 // Sum long across vector
7031 case Intrinsic::aarch64_neon_saddlv:
7032 case Intrinsic::aarch64_neon_uaddlv:
7033 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
7034 break;
7035
7036 case Intrinsic::aarch64_neon_ld1x2:
7037 case Intrinsic::aarch64_neon_ld1x3:
7038 case Intrinsic::aarch64_neon_ld1x4:
7039 case Intrinsic::aarch64_neon_ld2:
7040 case Intrinsic::aarch64_neon_ld3:
7041 case Intrinsic::aarch64_neon_ld4:
7042 case Intrinsic::aarch64_neon_ld2r:
7043 case Intrinsic::aarch64_neon_ld3r:
7044 case Intrinsic::aarch64_neon_ld4r: {
7045 handleNEONVectorLoad(I, /*WithLane=*/false);
7046 break;
7047 }
7048
7049 case Intrinsic::aarch64_neon_ld2lane:
7050 case Intrinsic::aarch64_neon_ld3lane:
7051 case Intrinsic::aarch64_neon_ld4lane: {
7052 handleNEONVectorLoad(I, /*WithLane=*/true);
7053 break;
7054 }
7055
7056 // Saturating extract narrow
7057 case Intrinsic::aarch64_neon_sqxtn:
7058 case Intrinsic::aarch64_neon_sqxtun:
7059 case Intrinsic::aarch64_neon_uqxtn:
7060 // These only have one argument, but we (ab)use handleShadowOr because it
7061 // works on single-argument intrinsics and will typecast the shadow
7062 // (and update the origin).
7063 handleShadowOr(I);
7064 break;
7065
7066 case Intrinsic::aarch64_neon_st1x2:
7067 case Intrinsic::aarch64_neon_st1x3:
7068 case Intrinsic::aarch64_neon_st1x4:
7069 case Intrinsic::aarch64_neon_st2:
7070 case Intrinsic::aarch64_neon_st3:
7071 case Intrinsic::aarch64_neon_st4: {
7072 handleNEONVectorStoreIntrinsic(I, useLane: false);
7073 break;
7074 }
7075
7076 case Intrinsic::aarch64_neon_st2lane:
7077 case Intrinsic::aarch64_neon_st3lane:
7078 case Intrinsic::aarch64_neon_st4lane: {
7079 handleNEONVectorStoreIntrinsic(I, useLane: true);
7080 break;
7081 }
7082
7083 // Arm NEON vector table intrinsics have the source/table register(s) as
7084 // arguments, followed by the index register. They return the output.
7085 //
7086 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
7087 // original value unchanged in the destination register.'
7088 // Conveniently, zero denotes a clean shadow, which means out-of-range
7089 // indices for TBL will initialize the user data with zero and also clean
7090 // the shadow. (For TBX, neither the user data nor the shadow will be
7091 // updated, which is also correct.)
7092 case Intrinsic::aarch64_neon_tbl1:
7093 case Intrinsic::aarch64_neon_tbl2:
7094 case Intrinsic::aarch64_neon_tbl3:
7095 case Intrinsic::aarch64_neon_tbl4:
7096 case Intrinsic::aarch64_neon_tbx1:
7097 case Intrinsic::aarch64_neon_tbx2:
7098 case Intrinsic::aarch64_neon_tbx3:
7099 case Intrinsic::aarch64_neon_tbx4: {
7100 // The last trailing argument (index register) should be handled verbatim
7101 handleIntrinsicByApplyingToShadow(
7102 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
7103 /*trailingVerbatimArgs=*/1);
7104 break;
7105 }
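// A scalar sketch of the TBL/TBX lookup semantics described above
// (`tblLane`/`tbxLane` are hypothetical helpers, illustrative only):

```cpp
#include <cstdint>
#include <vector>

// TBL: an out-of-range index produces zero, which doubles as a clean
// shadow, so instrumenting the shadow with the same lookup stays correct.
uint8_t tblLane(const std::vector<uint8_t> &Table, uint8_t Idx) {
  return Idx < Table.size() ? Table[Idx] : 0;
}

// TBX: an out-of-range index keeps the destination element, so the
// destination's shadow is likewise kept unchanged.
uint8_t tbxLane(const std::vector<uint8_t> &Table, uint8_t Dst, uint8_t Idx) {
  return Idx < Table.size() ? Table[Idx] : Dst;
}
```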
7106
7107 case Intrinsic::aarch64_neon_fmulx:
7108 case Intrinsic::aarch64_neon_pmul:
7109 case Intrinsic::aarch64_neon_pmull:
7110 case Intrinsic::aarch64_neon_smull:
7111 case Intrinsic::aarch64_neon_pmull64:
7112 case Intrinsic::aarch64_neon_umull: {
7113 handleNEONVectorMultiplyIntrinsic(I);
7114 break;
7115 }
7116
7117 case Intrinsic::aarch64_neon_smmla:
7118 case Intrinsic::aarch64_neon_ummla:
7119 case Intrinsic::aarch64_neon_usmmla:
7120 case Intrinsic::aarch64_neon_bfmmla:
7121 handleNEONMatrixMultiply(I);
7122 break;
7123
7124 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
7125 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
7126 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
7127 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
7128 case Intrinsic::aarch64_neon_sdot:
7129 case Intrinsic::aarch64_neon_udot:
7130 case Intrinsic::aarch64_neon_usdot:
7131 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
7132 /*ZeroPurifies=*/true,
7133 /*EltSizeInBits=*/0,
7134 /*Lanes=*/kBothLanes);
7135 break;
7136
7137 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
7138 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
7139 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
7140 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
7141 case Intrinsic::aarch64_neon_bfdot:
7142 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
7143 /*ZeroPurifies=*/false,
7144 /*EltSizeInBits=*/0,
7145 /*Lanes=*/kBothLanes);
7146 break;
7147
7148 // Floating-Point Absolute Compare Greater Than/Equal
7149 case Intrinsic::aarch64_neon_facge:
7150 case Intrinsic::aarch64_neon_facgt:
7151 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/false);
7152 break;
7153
7154 default:
7155 return false;
7156 }
7157
7158 return true;
7159 }
7160
  void visitIntrinsicInst(IntrinsicInst &I) {
    if (maybeHandleCrossPlatformIntrinsic(I))
      return;

    if (maybeHandleX86SIMDIntrinsic(I))
      return;

    if (maybeHandleArmSIMDIntrinsic(I))
      return;

    if (maybeHandleUnknownIntrinsic(I))
      return;

    visitInstruction(I);
  }

  void visitLibAtomicLoad(CallBase &CB) {
    // Since we use getNextNode here, we can't have CB terminate the BB.
    assert(isa<CallInst>(CB));

    IRBuilder<> IRB(&CB);
    Value *Size = CB.getArgOperand(0);
    Value *SrcPtr = CB.getArgOperand(1);
    Value *DstPtr = CB.getArgOperand(2);
    Value *Ordering = CB.getArgOperand(3);
    // Convert the call to have at least Acquire ordering to make sure
    // the shadow operations aren't reordered before it.
    Value *NewOrdering =
        IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
    CB.setArgOperand(3, NewOrdering);

    NextNodeIRBuilder NextIRB(&CB);
    Value *SrcShadowPtr, *SrcOriginPtr;
    std::tie(SrcShadowPtr, SrcOriginPtr) =
        getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
                           /*isStore*/ false);
    Value *DstShadowPtr =
        getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
                           /*isStore*/ true)
            .first;

    NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
    if (MS.TrackOrigins) {
      Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
                                                   kMinOriginAlignment);
      Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
      NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
    }
  }

  void visitLibAtomicStore(CallBase &CB) {
    IRBuilder<> IRB(&CB);
    Value *Size = CB.getArgOperand(0);
    Value *DstPtr = CB.getArgOperand(2);
    Value *Ordering = CB.getArgOperand(3);
    // Convert the call to have at least Release ordering to make sure
    // the shadow operations aren't reordered after it.
    Value *NewOrdering =
        IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
    CB.setArgOperand(3, NewOrdering);

    Value *DstShadowPtr =
        getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
                           /*isStore*/ true)
            .first;

    // Atomic store always paints clean shadow/origin. See file header.
    IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
                     Align(1));
  }

  void visitCallBase(CallBase &CB) {
    assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
    if (CB.isInlineAsm()) {
      // For inline asm (either a call to an asm function, or a callbr
      // instruction), do the usual thing: check argument shadow and mark all
      // outputs as clean. Note that any side effects of the inline asm that
      // are not immediately visible in its constraints are not handled.
      if (ClHandleAsmConservative)
        visitAsmInstruction(CB);
      else
        visitInstruction(CB);
      return;
    }
    LibFunc LF;
    if (TLI->getLibFunc(CB, LF)) {
      // libatomic.a functions need special handling because there isn't a
      // good way to intercept them or compile the library with
      // instrumentation.
      switch (LF) {
      case LibFunc_atomic_load:
        if (!isa<CallInst>(CB)) {
          llvm::errs() << "MSAN -- cannot instrument invoke of libatomic "
                          "load. Ignoring!\n";
          break;
        }
        visitLibAtomicLoad(CB);
        return;
      case LibFunc_atomic_store:
        visitLibAtomicStore(CB);
        return;
      default:
        break;
      }
    }

    if (auto *Call = dyn_cast<CallInst>(&CB)) {
      assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");

      // We are going to insert code that relies on the fact that the callee
      // will become a non-readonly function after it is instrumented by us. To
      // prevent this code from being optimized out, mark that function
      // non-readonly in advance.
      // TODO: We can likely do better than dropping memory() completely here.
      AttributeMask B;
      B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);

      Call->removeFnAttrs(B);
      if (Function *Func = Call->getCalledFunction()) {
        Func->removeFnAttrs(B);
      }

      maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
    }
    IRBuilder<> IRB(&CB);
    bool MayCheckCall = MS.EagerChecks;
    if (Function *Func = CB.getCalledFunction()) {
      // __sanitizer_unaligned_{load,store} functions may be called by users
      // and always expect shadows in the TLS. So don't check them.
      MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
    }

    unsigned ArgOffset = 0;
    LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
    for (const auto &[i, A] : llvm::enumerate(CB.args())) {
      if (!A->getType()->isSized()) {
        LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
        continue;
      }

      if (A->getType()->isScalableTy()) {
        LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
        // Handle as noundef, but don't reserve TLS slots.
        insertCheckShadowOf(A, &CB);
        continue;
      }

      unsigned Size = 0;
      const DataLayout &DL = F.getDataLayout();

      bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
      bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
      bool EagerCheck = MayCheckCall && !ByVal && NoUndef;

      if (EagerCheck) {
        insertCheckShadowOf(A, &CB);
        Size = DL.getTypeAllocSize(A->getType());
      } else {
        [[maybe_unused]] Value *Store = nullptr;
        // Compute the shadow for the arg even if it is ByVal, because in that
        // case getShadow() will copy the actual arg shadow to
        // __msan_param_tls.
        Value *ArgShadow = getShadow(A);
        Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
        LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
                          << " Shadow: " << *ArgShadow << "\n");
        if (ByVal) {
          // ByVal requires some special handling as it's too big for a single
          // load.
          assert(A->getType()->isPointerTy() &&
                 "ByVal argument is not a pointer!");
          Size = DL.getTypeAllocSize(CB.getParamByValType(i));
          if (ArgOffset + Size > kParamTLSSize)
            break;
          const MaybeAlign ParamAlignment(CB.getParamAlign(i));
          MaybeAlign Alignment = std::nullopt;
          if (ParamAlignment)
            Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
          Value *AShadowPtr, *AOriginPtr;
          std::tie(AShadowPtr, AOriginPtr) =
              getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
                                 /*isStore*/ false);
          if (!PropagateShadow) {
            Store = IRB.CreateMemSet(ArgShadowBase,
                                     Constant::getNullValue(IRB.getInt8Ty()),
                                     Size, Alignment);
          } else {
            Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
                                     Alignment, Size);
            if (MS.TrackOrigins) {
              Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
              // FIXME: OriginSize should be:
              // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
              unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
              IRB.CreateMemCpy(
                  ArgOriginBase,
                  /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
                  AOriginPtr,
                  /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
            }
          }
        } else {
          // Any other parameters mean we need bit-grained tracking of uninit
          // data.
          Size = DL.getTypeAllocSize(A->getType());
          if (ArgOffset + Size > kParamTLSSize)
            break;
          Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
                                         kShadowTLSAlignment);
          Constant *Cst = dyn_cast<Constant>(ArgShadow);
          if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
            IRB.CreateStore(getOrigin(A),
                            getOriginPtrForArgument(IRB, ArgOffset));
          }
        }
        assert(Store != nullptr);
        LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
      }
      assert(Size != 0);
      ArgOffset += alignTo(Size, kShadowTLSAlignment);
    }
    LLVM_DEBUG(dbgs() << " done with call args\n");

    FunctionType *FT = CB.getFunctionType();
    if (FT->isVarArg()) {
      VAHelper->visitCallBase(CB, IRB);
    }

    // Now, get the shadow for the RetVal.
    if (!CB.getType()->isSized())
      return;
    // Don't emit the epilogue for musttail call returns.
    if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
      return;

    if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
      setShadow(&CB, getCleanShadow(&CB));
      setOrigin(&CB, getCleanOrigin());
      return;
    }

    IRBuilder<> IRBBefore(&CB);
    // Until we have full dynamic coverage, make sure the retval shadow is 0.
    Value *Base = getShadowPtrForRetval(IRBBefore);
    IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
                                 kShadowTLSAlignment);
    BasicBlock::iterator NextInsn;
    if (isa<CallInst>(CB)) {
      NextInsn = ++CB.getIterator();
      assert(NextInsn != CB.getParent()->end());
    } else {
      BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
      if (!NormalDest->getSinglePredecessor()) {
        // FIXME: this case is tricky, so we are just conservative here.
        // Perhaps we need to split the edge between this BB and NormalDest,
        // but a naive attempt to use SplitEdge leads to a crash.
        setShadow(&CB, getCleanShadow(&CB));
        setOrigin(&CB, getCleanOrigin());
        return;
      }
      // FIXME: NextInsn is likely in a basic block that has not been visited
      // yet. Anything inserted there will be instrumented by MSan later!
      NextInsn = NormalDest->getFirstInsertionPt();
      assert(NextInsn != NormalDest->end() &&
             "Could not find insertion point for retval shadow load");
    }
    IRBuilder<> IRBAfter(&*NextInsn);
    Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
        getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
        "_msret");
    setShadow(&CB, RetvalShadow);
    if (MS.TrackOrigins)
      setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
  }

  bool isAMustTailRetVal(Value *RetVal) {
    if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
      RetVal = I->getOperand(0);
    }
    if (auto *I = dyn_cast<CallInst>(RetVal)) {
      return I->isMustTailCall();
    }
    return false;
  }

  void visitReturnInst(ReturnInst &I) {
    IRBuilder<> IRB(&I);
    Value *RetVal = I.getReturnValue();
    if (!RetVal)
      return;
    // Don't emit the epilogue for musttail call returns.
    if (isAMustTailRetVal(RetVal))
      return;
    Value *ShadowPtr = getShadowPtrForRetval(IRB);
    bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
    bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
    // FIXME: Consider using SpecialCaseList to specify a list of functions
    // that must always return fully initialized values. For now, we hardcode
    // "main".
    bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");

    Value *Shadow = getShadow(RetVal);
    bool StoreOrigin = true;
    if (EagerCheck) {
      insertCheckShadowOf(RetVal, &I);
      Shadow = getCleanShadow(RetVal);
      StoreOrigin = false;
    }

    // The caller may still expect information passed over TLS if we pass our
    // check.
    if (StoreShadow) {
      IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
      if (MS.TrackOrigins && StoreOrigin)
        IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
    }
  }

  void visitPHINode(PHINode &I) {
    IRBuilder<> IRB(&I);
    if (!PropagateShadow) {
      setShadow(&I, getCleanShadow(&I));
      setOrigin(&I, getCleanOrigin());
      return;
    }

    ShadowPHINodes.push_back(&I);
    setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
                                "_msphi_s"));
    if (MS.TrackOrigins)
      setOrigin(
          &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
  }

  Value *getLocalVarIdptr(AllocaInst &I) {
    ConstantInt *IntConst =
        ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
    return new GlobalVariable(*F.getParent(), IntConst->getType(),
                              /*isConstant=*/false, GlobalValue::PrivateLinkage,
                              IntConst);
  }

  Value *getLocalVarDescription(AllocaInst &I) {
    return createPrivateConstGlobalForString(*F.getParent(), I.getName());
  }

  void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
    if (PoisonStack && ClPoisonStackWithCall) {
      IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
    } else {
      Value *ShadowBase, *OriginBase;
      std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
          &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);

      Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
      IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
    }

    if (PoisonStack && MS.TrackOrigins) {
      Value *Idptr = getLocalVarIdptr(I);
      if (ClPrintStackNames) {
        Value *Descr = getLocalVarDescription(I);
        IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
                       {&I, Len, Idptr, Descr});
      } else {
        IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
      }
    }
  }

  void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
    Value *Descr = getLocalVarDescription(I);
    if (PoisonStack) {
      IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
    } else {
      IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
    }
  }

  void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
    if (!InsPoint)
      InsPoint = &I;
    NextNodeIRBuilder IRB(InsPoint);
    Value *Len = IRB.CreateAllocationSize(MS.IntptrTy, &I);

    if (MS.CompileKernel)
      poisonAllocaKmsan(I, IRB, Len);
    else
      poisonAllocaUserspace(I, IRB, Len);
  }

  void visitAllocaInst(AllocaInst &I) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
    // We'll get to this alloca later unless it's poisoned at the corresponding
    // llvm.lifetime.start.
    AllocaSet.insert(&I);
  }

  void visitSelectInst(SelectInst &I) {
    // a = select b, c, d
    Value *B = I.getCondition();
    Value *C = I.getTrueValue();
    Value *D = I.getFalseValue();

    handleSelectLikeInst(I, B, C, D);
  }

  void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
    IRBuilder<> IRB(&I);

    Value *Sb = getShadow(B);
    Value *Sc = getShadow(C);
    Value *Sd = getShadow(D);

    Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
    Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
    Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;

    // Result shadow if condition shadow is 0.
    Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
    Value *Sa1;
    if (I.getType()->isAggregateType()) {
      // To avoid "sign extending" i1 to an arbitrary aggregate type, we just
      // do an extra "select". This results in much more compact IR.
      // Sa = select Sb, poisoned, (select b, Sc, Sd)
      Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
    } else if (isScalableNonVectorType(I.getType())) {
      // This is intended to handle target("aarch64.svcount"), which can't be
      // handled in the else branch because of incompatibility with CreateXor
      // ("The supported LLVM operations on this type are limited to load,
      // store, phi, select and alloca instructions").

      // TODO: this currently underapproximates. Use Arm SVE EOR in the else
      // branch as needed instead.
      Sa1 = getCleanShadow(getShadowTy(I.getType()));
    } else {
      // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
      // If Sb (the condition is poisoned), look for bits in c and d that are
      // equal and both unpoisoned.
      // If !Sb (the condition is unpoisoned), simply pick one of Sc and Sd.

      // Cast arguments to shadow-compatible type.
      C = CreateAppToShadowCast(IRB, C);
      D = CreateAppToShadowCast(IRB, D);

      // Result shadow if condition shadow is 1.
      Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
    }
    Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
    setShadow(&I, Sa);
    if (MS.TrackOrigins) {
      // Origins are always i32, so any vector conditions must be flattened.
      // FIXME: consider tracking vector origins for app vectors?
      if (B->getType()->isVectorTy()) {
        B = convertToBool(B, IRB);
        Sb = convertToBool(Sb, IRB);
      }
      // a = select b, c, d
      // Oa = Sb ? Ob : (b ? Oc : Od)
      setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
    }
  }

  void visitLandingPadInst(LandingPadInst &I) {
    // Do nothing.
    // See https://github.com/google/sanitizers/issues/504
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitCatchSwitchInst(CatchSwitchInst &I) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitFuncletPadInst(FuncletPadInst &I) {
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }

  void visitExtractValueInst(ExtractValueInst &I) {
    IRBuilder<> IRB(&I);
    Value *Agg = I.getAggregateOperand();
    LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
    Value *AggShadow = getShadow(Agg);
    LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
    Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
    LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
    setShadow(&I, ResShadow);
    setOriginForNaryOp(I);
  }

  void visitInsertValueInst(InsertValueInst &I) {
    IRBuilder<> IRB(&I);
    LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
    Value *AggShadow = getShadow(I.getAggregateOperand());
    Value *InsShadow = getShadow(I.getInsertedValueOperand());
    LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
    LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
    Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
    LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
    setShadow(&I, Res);
    setOriginForNaryOp(I);
  }

  void dumpInst(Instruction &I) {
    // Instruction name only.
    // For intrinsics, the full/overloaded name is used.
    //
    // e.g., "call llvm.aarch64.neon.uqsub.v16i8"
    if (CallInst *CI = dyn_cast<CallInst>(&I)) {
      errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
    } else {
      errs() << "ZZZ " << I.getOpcodeName() << "\n";
    }

    // Instruction prototype (including return type and parameter types).
    // For intrinsics, we use the base/non-overloaded name.
    //
    // e.g., "call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)"
    unsigned NumOperands = I.getNumOperands();
    if (CallInst *CI = dyn_cast<CallInst>(&I)) {
      errs() << "YYY call " << *I.getType() << " @";

      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI))
        errs() << Intrinsic::getBaseName(II->getIntrinsicID());
      else
        errs() << CI->getCalledFunction()->getName();

      errs() << "(";

      // The last operand of a CallInst is the function itself.
      NumOperands--;
    } else
      errs() << "YYY " << *I.getType() << " " << I.getOpcodeName() << "(";

    for (size_t i = 0; i < NumOperands; i++) {
      if (i > 0)
        errs() << ", ";

      errs() << *(I.getOperand(i)->getType());
    }

    errs() << ")\n";

    // Full instruction, including types and operand values.
    // For intrinsics, the full/overloaded name is used.
    //
    // e.g., "%vqsubq_v.i15 = call noundef <16 x i8>
    //        @llvm.aarch64.neon.uqsub.v16i8(<16 x i8> %vext21.i,
    //        <16 x i8> splat (i8 1)), !dbg !66"
    errs() << "QQQ " << I << "\n";
  }

  void visitResumeInst(ResumeInst &I) {
    LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
    // Nothing to do here.
  }

  void visitCleanupReturnInst(CleanupReturnInst &CRI) {
    LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
    // Nothing to do here.
  }

  void visitCatchReturnInst(CatchReturnInst &CRI) {
    LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
    // Nothing to do here.
  }

  void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
                             IRBuilder<> &IRB, const DataLayout &DL,
                             bool isOutput) {
    // For each assembly argument, we check its value for being initialized.
    // If the argument is a pointer, we assume it points to a single element
    // of the corresponding type (or to an 8-byte word, if the type is
    // unsized). Each such pointer is instrumented with a call to the runtime
    // library.
    Type *OpType = Operand->getType();
    // Check the operand value itself.
    insertCheckShadowOf(Operand, &I);
    if (!OpType->isPointerTy() || !isOutput) {
      assert(!isOutput);
      return;
    }
    if (!ElemTy->isSized())
      return;
    auto Size = DL.getTypeStoreSize(ElemTy);
    Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
    if (MS.CompileKernel) {
      IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
    } else {
      // ElemTy, derived from elementtype(), does not encode the alignment of
      // the pointer. Conservatively assume that the shadow memory is
      // unaligned. When Size is large, avoid StoreInst as it would expand to
      // many instructions.
      auto [ShadowPtr, _] =
          getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
      if (Size <= 32)
        IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
      else
        IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
                         SizeVal, Align(1));
    }
  }

  /// Get the number of output arguments returned by pointers.
  int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
    int NumRetOutputs = 0;
    int NumOutputs = 0;
    Type *RetTy = cast<Value>(CB)->getType();
    if (!RetTy->isVoidTy()) {
      // Register outputs are returned via the CallInst return value.
      auto *ST = dyn_cast<StructType>(RetTy);
      if (ST)
        NumRetOutputs = ST->getNumElements();
      else
        NumRetOutputs = 1;
    }
    InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
    for (const InlineAsm::ConstraintInfo &Info : Constraints) {
      switch (Info.Type) {
      case InlineAsm::isOutput:
        NumOutputs++;
        break;
      default:
        break;
      }
    }
    return NumOutputs - NumRetOutputs;
  }

  void visitAsmInstruction(Instruction &I) {
    // Conservative inline assembly handling: check for poisoned shadow of
    // asm() arguments, then unpoison the result and all the memory locations
    // pointed to by those arguments.
    // An inline asm() statement in C++ contains lists of input and output
    // arguments used by the assembly code. These are mapped to operands of the
    // CallInst as follows:
    //  - nR register outputs ("=r") are returned by value in a single
    //    structure (SSA value of the CallInst);
    //  - nO other outputs ("=m" and others) are returned by pointer as first
    //    nO operands of the CallInst;
    //  - nI inputs ("r", "m" and others) are passed to CallInst as the
    //    remaining nI operands.
    // The total number of asm() arguments in the source is nR+nO+nI, and the
    // corresponding CallInst has nO+nI+1 operands (the last operand is the
    // function to be called).
    const DataLayout &DL = F.getDataLayout();
    CallBase *CB = cast<CallBase>(&I);
    IRBuilder<> IRB(&I);
    InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
    int OutputArgs = getNumOutputArgs(IA, CB);
    // The last operand of a CallInst is the function itself.
    int NumOperands = CB->getNumOperands() - 1;

    // Check input arguments. We do so before unpoisoning output arguments, so
    // that we won't overwrite uninit values before checking them.
    for (int i = OutputArgs; i < NumOperands; i++) {
      Value *Operand = CB->getOperand(i);
      instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
                            /*isOutput*/ false);
    }
    // Unpoison output arguments. This must happen before the actual InlineAsm
    // call, so that the shadow for memory published in the asm() statement
    // remains valid.
    for (int i = 0; i < OutputArgs; i++) {
      Value *Operand = CB->getOperand(i);
      instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
                            /*isOutput*/ true);
    }

    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitFreezeInst(FreezeInst &I) {
    // Freeze always returns a fully defined value.
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }

  void visitInstruction(Instruction &I) {
    // Everything else: stop propagating and check for poisoned shadow.
    if (ClDumpStrictInstructions)
      dumpInst(I);
    LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
    for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
      Value *Operand = I.getOperand(i);
      if (Operand->getType()->isSized())
        insertCheckShadowOf(Operand, &I);
    }
    setShadow(&I, getCleanShadow(&I));
    setOrigin(&I, getCleanOrigin());
  }
};

struct VarArgHelperBase : public VarArgHelper {
  Function &F;
  MemorySanitizer &MS;
  MemorySanitizerVisitor &MSV;
  SmallVector<CallInst *, 16> VAStartInstrumentationList;
  const unsigned VAListTagSize;

  VarArgHelperBase(Function &F, MemorySanitizer &MS,
                   MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
      : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}

  Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
    Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
    return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
  }

  /// Compute the shadow address for a given va_arg.
  Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
    return IRB.CreatePtrAdd(
        MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
  }

  /// Compute the shadow address for a given va_arg.
  Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
                                   unsigned ArgSize) {
    // Make sure we don't overflow __msan_va_arg_tls.
    if (ArgOffset + ArgSize > kParamTLSSize)
      return nullptr;
    return getShadowPtrForVAArgument(IRB, ArgOffset);
  }

  /// Compute the origin address for a given va_arg.
  Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
    // getOriginPtrForVAArgument() is always called after
    // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
    // overflow.
    return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
                            ConstantInt::get(MS.IntptrTy, ArgOffset),
                            "_msarg_va_o");
  }

  void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
                      unsigned BaseOffset) {
    // The tail of __msan_va_arg_tls is not large enough to fit a full
    // value shadow, but it will be copied to the backup anyway. Make it
    // clean.
    if (BaseOffset >= kParamTLSSize)
      return;
    Value *TailSize =
        ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
    IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
                     TailSize, Align(8));
  }

  void unpoisonVAListTagForInst(IntrinsicInst &I) {
    IRBuilder<> IRB(&I);
    Value *VAListTag = I.getArgOperand(0);
    const Align Alignment = Align(8);
    auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
        VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
    // Unpoison the whole __va_list_tag.
    IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
                     VAListTagSize, Alignment, false);
  }

  void visitVAStartInst(VAStartInst &I) override {
    if (F.getCallingConv() == CallingConv::Win64)
      return;
    VAStartInstrumentationList.push_back(&I);
    unpoisonVAListTagForInst(I);
  }

  void visitVACopyInst(VACopyInst &I) override {
    if (F.getCallingConv() == CallingConv::Win64)
      return;
    unpoisonVAListTagForInst(I);
  }
};

/// AMD64-specific implementation of VarArgHelper.
struct VarArgAMD64Helper : public VarArgHelperBase {
  // An unfortunate workaround for asymmetric lowering of va_arg stuff.
  // See a comment in visitCallBase for more details.
  static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
  static const unsigned AMD64FpEndOffsetSSE = 176;
  // If SSE is disabled, fp_offset in va_list is zero.
  static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;

  unsigned AMD64FpEndOffset;
  AllocaInst *VAArgTLSCopy = nullptr;
  AllocaInst *VAArgTLSOriginCopy = nullptr;
  Value *VAArgOverflowSize = nullptr;

  enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };

  VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
                    MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
    AMD64FpEndOffset = AMD64FpEndOffsetSSE;
    for (const auto &Attr : F.getAttributes().getFnAttrs()) {
      if (Attr.isStringAttribute() &&
          (Attr.getKindAsString() == "target-features")) {
        if (Attr.getValueAsString().contains("-sse"))
          AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
        break;
      }
    }
  }

  ArgKind classifyArgument(Value *arg) {
    // A very rough approximation of X86_64 argument classification rules.
    Type *T = arg->getType();
    if (T->isX86_FP80Ty())
      return AK_Memory;
    if (T->isFPOrFPVectorTy())
      return AK_FloatingPoint;
    if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
      return AK_GeneralPurpose;
    if (T->isPointerTy())
      return AK_GeneralPurpose;
    return AK_Memory;
  }

  // For VarArg functions, store the argument shadow in an ABI-specific format
  // that corresponds to the va_list layout.
  // We do this because Clang lowers va_arg in the frontend, and this pass
  // only sees the low-level code that deals with va_list internals.
  // A much easier alternative (provided that Clang emits va_arg instructions)
  // would have been to associate each live instance of va_list with a copy of
  // MSanParamTLS, and extract shadow on va_arg() call in the argument list
  // order.
  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    unsigned GpOffset = 0;
    unsigned FpOffset = AMD64GpEndOffset;
    unsigned OverflowOffset = AMD64FpEndOffset;
    const DataLayout &DL = F.getDataLayout();

    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
      if (IsByVal) {
        // ByVal arguments always go to the overflow area.
        // Fixed arguments passed through the overflow area will be stepped
        // over by va_start, so don't count them towards the offset.
        if (IsFixed)
          continue;
        assert(A->getType()->isPointerTy());
        Type *RealTy = CB.getParamByValType(ArgNo);
        uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
        uint64_t AlignedSize = alignTo(ArgSize, 8);
        unsigned BaseOffset = OverflowOffset;
        Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
        Value *OriginBase = nullptr;
        if (MS.TrackOrigins)
          OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
        OverflowOffset += AlignedSize;

        if (OverflowOffset > kParamTLSSize) {
          CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
          continue; // We have no space to copy shadow there.
        }

        Value *ShadowPtr, *OriginPtr;
        std::tie(ShadowPtr, OriginPtr) =
            MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
                                   /*isStore*/ false);
        IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
                         kShadowTLSAlignment, ArgSize);
        if (MS.TrackOrigins)
          IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
                           kShadowTLSAlignment, ArgSize);
      } else {
        ArgKind AK = classifyArgument(A);
        if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
          AK = AK_Memory;
        if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
          AK = AK_Memory;
        Value *ShadowBase, *OriginBase = nullptr;
        switch (AK) {
        case AK_GeneralPurpose:
          ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
          if (MS.TrackOrigins)
            OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
          GpOffset += 8;
          assert(GpOffset <= kParamTLSSize);
          break;
        case AK_FloatingPoint:
          ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
          if (MS.TrackOrigins)
            OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
          FpOffset += 16;
          assert(FpOffset <= kParamTLSSize);
          break;
        case AK_Memory:
          if (IsFixed)
            continue;
          uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
          uint64_t AlignedSize = alignTo(ArgSize, 8);
          unsigned BaseOffset = OverflowOffset;
          ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
          if (MS.TrackOrigins)
            OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
          OverflowOffset += AlignedSize;
          if (OverflowOffset > kParamTLSSize) {
            // We have no space to copy shadow there.
            CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
            continue;
          }
        }
        // Take fixed arguments into account for GpOffset and FpOffset,
        // but don't actually store shadows for them.
        // TODO(glider): don't call get*PtrForVAArgument() for them.
        if (IsFixed)
          continue;
        Value *Shadow = MSV.getShadow(A);
        IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
        if (MS.TrackOrigins) {
          Value *Origin = MSV.getOrigin(A);
          TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
          MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
                          std::max(kShadowTLSAlignment, kMinOriginAlignment));
        }
      }
    }
    Constant *OverflowSize =
        ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
    IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgOverflowSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.
      IRBuilder<> IRB(MSV.FnPrologueEnd);
      VAArgOverflowSize =
          IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
      Value *CopySize = IRB.CreateAdd(
          ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
      if (MS.TrackOrigins) {
        VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
        VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
        IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
                         MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
      }
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);

      Value *RegSaveAreaPtrPtr =
          IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
      Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
      Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
      const Align Alignment = Align(16);
      std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
          MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Alignment, /*isStore*/ true);
      IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
                       AMD64FpEndOffset);
      if (MS.TrackOrigins)
        IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
                         Alignment, AMD64FpEndOffset);
      Value *OverflowArgAreaPtrPtr =
          IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
      Value *OverflowArgAreaPtr =
          IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
      Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
      std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
          MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
                                 Alignment, /*isStore*/ true);
      Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
                                             AMD64FpEndOffset);
      IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
                       VAArgOverflowSize);
      if (MS.TrackOrigins) {
        SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
                                        AMD64FpEndOffset);
        IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
                         VAArgOverflowSize);
      }
    }
  }
};

/// AArch64-specific implementation of VarArgHelper.
struct VarArgAArch64Helper : public VarArgHelperBase {
  static const unsigned kAArch64GrArgSize = 64;
  static const unsigned kAArch64VrArgSize = 128;

  static const unsigned AArch64GrBegOffset = 0;
  static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
  // Make VR space aligned to 16 bytes.
  static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
  static const unsigned AArch64VrEndOffset =
      AArch64VrBegOffset + kAArch64VrArgSize;
  static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;

  AllocaInst *VAArgTLSCopy = nullptr;
  Value *VAArgOverflowSize = nullptr;

  enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };

  VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
                      MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}

  // A very rough approximation of aarch64 argument classification rules.
  std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
    if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
      return {AK_GeneralPurpose, 1};
    if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
      return {AK_FloatingPoint, 1};

    if (T->isArrayTy()) {
      auto R = classifyArgument(T->getArrayElementType());
      R.second *= T->getArrayNumElements();
      return R;
    }

    if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
      auto R = classifyArgument(FV->getScalarType());
      R.second *= FV->getNumElements();
      return R;
    }

    LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
    return {AK_Memory, 0};
  }

  // The instrumentation stores the argument shadow in a non ABI-specific
  // format because it does not know which arguments are named (since Clang,
  // as in the x86_64 case, lowers va_arg in the frontend and this pass only
  // sees the low-level code that deals with va_list internals).
  // The first eight GR registers are saved in the first 64 bytes of the
  // va_arg TLS array, followed by the first eight FP/SIMD registers, and
  // then the remaining arguments.
  // Using a constant offset within the va_arg TLS array allows a fast copy
  // in finalizeInstrumentation().
  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    unsigned GrOffset = AArch64GrBegOffset;
    unsigned VrOffset = AArch64VrBegOffset;
    unsigned OverflowOffset = AArch64VAEndOffset;

    const DataLayout &DL = F.getDataLayout();
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      auto [AK, RegNum] = classifyArgument(A->getType());
      if (AK == AK_GeneralPurpose &&
          (GrOffset + RegNum * 8) > AArch64GrEndOffset)
        AK = AK_Memory;
      if (AK == AK_FloatingPoint &&
          (VrOffset + RegNum * 16) > AArch64VrEndOffset)
        AK = AK_Memory;
      Value *Base;
      switch (AK) {
      case AK_GeneralPurpose:
        Base = getShadowPtrForVAArgument(IRB, GrOffset);
        GrOffset += 8 * RegNum;
        break;
      case AK_FloatingPoint:
        Base = getShadowPtrForVAArgument(IRB, VrOffset);
        VrOffset += 16 * RegNum;
        break;
      case AK_Memory:
        // Don't count fixed arguments in the overflow area - va_start will
        // skip right over them.
        if (IsFixed)
          continue;
        uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
        uint64_t AlignedSize = alignTo(ArgSize, 8);
        unsigned BaseOffset = OverflowOffset;
        Base = getShadowPtrForVAArgument(IRB, BaseOffset);
        OverflowOffset += AlignedSize;
        if (OverflowOffset > kParamTLSSize) {
          // We have no space to copy shadow there.
          CleanUnusedTLS(IRB, Base, BaseOffset);
          continue;
        }
        break;
      }
      // Count Gp/Vr fixed arguments to their respective offsets, but don't
      // bother to actually store a shadow.
      if (IsFixed)
        continue;
      IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
    }
    Constant *OverflowSize =
        ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
    IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
  }

  // Retrieve a va_list field of 'void*' size.
  Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
    Value *SaveAreaPtrPtr =
        IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
    return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
  }

  // Retrieve a va_list field of 'int' size.
  Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
    Value *SaveAreaPtr =
        IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
    Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
    return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgOverflowSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.
      IRBuilder<> IRB(MSV.FnPrologueEnd);
      VAArgOverflowSize =
          IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
      Value *CopySize = IRB.CreateAdd(
          ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
    }

    Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
    Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);

    // Instrument va_start, copy va_list shadow from the backup copy of
    // the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);

      Value *VAListTag = OrigInst->getArgOperand(0);

      // The variadic ABI for AArch64 creates two areas to save the incoming
      // argument registers (one for the 64-bit general registers x0-x7 and
      // another for the 128-bit FP/SIMD registers v0-v7).
      // We then need to propagate the shadow arguments to both regions
      // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
      // The remaining arguments are saved in the shadow for 'va::stack'.
      // One caveat is that only the non-named arguments need to be
      // propagated, but at the call site instrumentation *all* the arguments
      // are saved. So to copy the shadow values from the va_arg TLS array
      // we need to adjust the offset for both the GR and VR fields based on
      // the __{gr,vr}_offs value (since they are stored based on the
      // incoming named arguments).
      Type *RegSaveAreaPtrTy = IRB.getPtrTy();

      // Read the stack pointer from the va_list.
      Value *StackSaveAreaPtr =
          IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);

      // Read both __gr_top and __gr_off and add them up.
      Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
      Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);

      Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
          IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);

      // Read both __vr_top and __vr_off and add them up.
      Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
      Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);

      Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
          IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);

      // The instrumentation does not know how many named arguments are in
      // use, and at the call site all the arguments were saved. Since
      // __gr_off is defined as '0 - ((8 - named_gr) * 8)', the idea is to
      // propagate only the variadic arguments by skipping the bytes of
      // shadow that correspond to named arguments.
      Value *GrRegSaveAreaShadowPtrOff =
          IRB.CreateAdd(GrArgSize, GrOffSaveArea);

      Value *GrRegSaveAreaShadowPtr =
          MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Align(8), /*isStore*/ true)
              .first;

      Value *GrSrcPtr =
          IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
      Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);

      IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
                       GrCopySize);

      // Again, but for FP/SIMD values.
      Value *VrRegSaveAreaShadowPtrOff =
          IRB.CreateAdd(VrArgSize, VrOffSaveArea);

      Value *VrRegSaveAreaShadowPtr =
          MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Align(8), /*isStore*/ true)
              .first;

      Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
          IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
                                   IRB.getInt32(AArch64VrBegOffset)),
          VrRegSaveAreaShadowPtrOff);
      Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);

      IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
                       VrCopySize);

      // And finally for the remaining arguments.
      Value *StackSaveAreaShadowPtr =
          MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Align(16), /*isStore*/ true)
              .first;

      Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
          VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));

      IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
                       Align(16), VAArgOverflowSize);
    }
  }
};

/// PowerPC64-specific implementation of VarArgHelper.
struct VarArgPowerPC64Helper : public VarArgHelperBase {
  AllocaInst *VAArgTLSCopy = nullptr;
  Value *VAArgSize = nullptr;

  VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
                        MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    // For PowerPC, we need to deal with the alignment of stack arguments -
    // they are mostly aligned to 8 bytes, but vectors and i128 arrays
    // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
    // For that reason, we compute the current offset from the stack pointer
    // (which is always properly aligned) and the offset of the first vararg,
    // then subtract them.
    unsigned VAArgBase;
    Triple TargetTriple(F.getParent()->getTargetTriple());
    // The parameter save area starts at 48 bytes from the frame pointer for
    // ABIv1, and 32 bytes for ABIv2. This is usually determined by target
    // endianness, but in theory could be overridden by a function attribute.
    if (TargetTriple.isPPC64ELFv2ABI())
      VAArgBase = 32;
    else
      VAArgBase = 48;
    unsigned VAArgOffset = VAArgBase;
    const DataLayout &DL = F.getDataLayout();
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
      if (IsByVal) {
        assert(A->getType()->isPointerTy());
        Type *RealTy = CB.getParamByValType(ArgNo);
        uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
        Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
        if (ArgAlign < 8)
          ArgAlign = Align(8);
        VAArgOffset = alignTo(VAArgOffset, ArgAlign);
        if (!IsFixed) {
          Value *Base =
              getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
          if (Base) {
            Value *AShadowPtr, *AOriginPtr;
            std::tie(AShadowPtr, AOriginPtr) =
                MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
                                       kShadowTLSAlignment, /*isStore*/ false);

            IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
                             kShadowTLSAlignment, ArgSize);
          }
        }
        VAArgOffset += alignTo(ArgSize, Align(8));
      } else {
        Value *Base;
        uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
        Align ArgAlign = Align(8);
        if (A->getType()->isArrayTy()) {
          // Arrays are aligned to the element size, except for long double
          // arrays, which are aligned to 8 bytes.
          Type *ElementTy = A->getType()->getArrayElementType();
          if (!ElementTy->isPPC_FP128Ty())
            ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
        } else if (A->getType()->isVectorTy()) {
          // Vectors are naturally aligned.
          ArgAlign = Align(ArgSize);
        }
        if (ArgAlign < 8)
          ArgAlign = Align(8);
        VAArgOffset = alignTo(VAArgOffset, ArgAlign);
        if (DL.isBigEndian()) {
          // Adjust the shadow for arguments with size < 8 to match the
          // placement of bits on a big endian system.
          if (ArgSize < 8)
            VAArgOffset += (8 - ArgSize);
        }
        if (!IsFixed) {
          Base =
              getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
          if (Base)
            IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
        }
        VAArgOffset += ArgSize;
        VAArgOffset = alignTo(VAArgOffset, Align(8));
      }
      if (IsFixed)
        VAArgBase = VAArgOffset;
    }

    Constant *TotalVAArgSize =
        ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
    // We use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating another
    // class member; here it holds the total size of all varargs.
    IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    IRBuilder<> IRB(MSV.FnPrologueEnd);
    VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
    Value *CopySize = VAArgSize;

    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.

      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);
      Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);

      RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);

      Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
      Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
      const DataLayout &DL = F.getDataLayout();
      unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
      const Align Alignment = Align(IntptrSize);
      std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
          MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Alignment, /*isStore*/ true);
      IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
                       CopySize);
    }
  }
};

/// PowerPC32-specific implementation of VarArgHelper.
struct VarArgPowerPC32Helper : public VarArgHelperBase {
  AllocaInst *VAArgTLSCopy = nullptr;
  Value *VAArgSize = nullptr;

  VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
                        MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    unsigned VAArgBase;
    // The parameter save area is 8 bytes from the frame pointer on PPC32.
    VAArgBase = 8;
    unsigned VAArgOffset = VAArgBase;
    const DataLayout &DL = F.getDataLayout();
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
      if (IsByVal) {
        assert(A->getType()->isPointerTy());
        Type *RealTy = CB.getParamByValType(ArgNo);
        uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
        Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
        if (ArgAlign < IntptrSize)
          ArgAlign = Align(IntptrSize);
        VAArgOffset = alignTo(VAArgOffset, ArgAlign);
        if (!IsFixed) {
          Value *Base =
              getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
          if (Base) {
            Value *AShadowPtr, *AOriginPtr;
            std::tie(AShadowPtr, AOriginPtr) =
                MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
                                       kShadowTLSAlignment, /*isStore*/ false);

            IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
                             kShadowTLSAlignment, ArgSize);
          }
        }
        VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
      } else {
        Value *Base;
        Type *ArgTy = A->getType();

        // On PPC32, floating point variable arguments are stored in a
        // separate area: fp_save_area = reg_save_area + 4*8. We do not copy
        // shadow for them, as they will be found when checking call
        // arguments.
        if (!ArgTy->isFloatingPointTy()) {
          uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
          Align ArgAlign = Align(IntptrSize);
          if (ArgTy->isArrayTy()) {
            // Arrays are aligned to the element size, except for long double
            // arrays, which are aligned to 8 bytes.
            Type *ElementTy = ArgTy->getArrayElementType();
            if (!ElementTy->isPPC_FP128Ty())
              ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
          } else if (ArgTy->isVectorTy()) {
            // Vectors are naturally aligned.
            ArgAlign = Align(ArgSize);
          }
          if (ArgAlign < IntptrSize)
            ArgAlign = Align(IntptrSize);
          VAArgOffset = alignTo(VAArgOffset, ArgAlign);
          if (DL.isBigEndian()) {
            // Adjust the shadow for arguments with size < IntptrSize to
            // match the placement of bits on a big endian system.
            if (ArgSize < IntptrSize)
              VAArgOffset += (IntptrSize - ArgSize);
          }
          if (!IsFixed) {
            Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
                                             ArgSize);
            if (Base)
              IRB.CreateAlignedStore(MSV.getShadow(A), Base,
                                     kShadowTLSAlignment);
          }
          VAArgOffset += ArgSize;
          VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
        }
      }
    }

    Constant *TotalVAArgSize =
        ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
    // We use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating another
    // class member; here it holds the total size of all varargs.
    IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    IRBuilder<> IRB(MSV.FnPrologueEnd);
    VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
    Value *CopySize = VAArgSize;

    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.

      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, /*isVolatile=*/false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);
      Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
      Value *RegSaveAreaSize = CopySize;

      // On PPC32, va_list_tag is a struct; the reg_save_area pointer lives
      // at offset 8.
      RegSaveAreaPtrPtr =
          IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));

      // On PPC32, reg_save_area can hold only 32 bytes of data.
      RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));

      RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
      Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);

      const DataLayout &DL = F.getDataLayout();
      unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
      const Align Alignment = Align(IntptrSize);

      { // Copy reg save area.
        Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
        std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
            MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                   Alignment, /*isStore*/ true);
        IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
                         Alignment, RegSaveAreaSize);

        RegSaveAreaShadowPtr =
            IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
        Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
                                          ConstantInt::get(MS.IntptrTy, 32));
        FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
        // Fill the FP shadow with zeroes: uninitialized FP arguments should
        // already have been reported by the call base check.
        IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
                         ConstantInt::get(MS.IntptrTy, 32), Alignment);
      }

      { // Copy overflow area.
        // RegSaveAreaSize is min(CopySize, 32), so this cannot underflow.
        Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);

        Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
        OverflowAreaPtrPtr =
            IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
        OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);

        Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);

        Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
        std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
            MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
                                   Alignment, /*isStore*/ true);

        Value *OverflowVAArgTLSCopyPtr =
            IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
        OverflowVAArgTLSCopyPtr =
            IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);

        OverflowVAArgTLSCopyPtr =
            IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
        IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
                         OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
      }
    }
  }
};

/// SystemZ-specific implementation of VarArgHelper.
struct VarArgSystemZHelper : public VarArgHelperBase {
  static const unsigned SystemZGpOffset = 16;
  static const unsigned SystemZGpEndOffset = 56;
  static const unsigned SystemZFpOffset = 128;
  static const unsigned SystemZFpEndOffset = 160;
  static const unsigned SystemZMaxVrArgs = 8;
  static const unsigned SystemZRegSaveAreaSize = 160;
  static const unsigned SystemZOverflowOffset = 160;
  static const unsigned SystemZVAListTagSize = 32;
  static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
  static const unsigned SystemZRegSaveAreaPtrOffset = 24;

  bool IsSoftFloatABI;
  AllocaInst *VAArgTLSCopy = nullptr;
  AllocaInst *VAArgTLSOriginCopy = nullptr;
  Value *VAArgOverflowSize = nullptr;

  enum class ArgKind {
    GeneralPurpose,
    FloatingPoint,
    Vector,
    Memory,
    Indirect,
  };

  enum class ShadowExtension { None, Zero, Sign };

  VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
                      MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
        IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}

  ArgKind classifyArgument(Type *T) {
    // T is a SystemZABIInfo::classifyArgumentType() output, and there are
    // only a few possibilities of what it can be. In particular, enums,
    // single-element structs, and large types have already been taken care of.

    // Some i128 and fp128 arguments are converted to pointers only in the
    // back end.
    if (T->isIntegerTy(128) || T->isFP128Ty())
      return ArgKind::Indirect;
    if (T->isFloatingPointTy())
      return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
    if (T->isIntegerTy() || T->isPointerTy())
      return ArgKind::GeneralPurpose;
    if (T->isVectorTy())
      return ArgKind::Vector;
    return ArgKind::Memory;
  }

  ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
    // ABI says: "One of the simple integer types no more than 64 bits wide.
    // ... If such an argument is shorter than 64 bits, replace it by a full
    // 64-bit integer representing the same number, using sign or zero
    // extension". Shadow for an integer argument has the same type as the
    // argument itself, so it can be sign- or zero-extended as well.
    bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
    bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
    if (ZExt) {
      assert(!SExt);
      return ShadowExtension::Zero;
    }
    if (SExt) {
      assert(!ZExt);
      return ShadowExtension::Sign;
    }
    return ShadowExtension::None;
  }

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    unsigned GpOffset = SystemZGpOffset;
    unsigned FpOffset = SystemZFpOffset;
    unsigned VrIndex = 0;
    unsigned OverflowOffset = SystemZOverflowOffset;
    const DataLayout &DL = F.getDataLayout();
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      // SystemZABIInfo does not produce ByVal parameters.
      assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
      Type *T = A->getType();
      ArgKind AK = classifyArgument(T);
      if (AK == ArgKind::Indirect) {
        T = MS.PtrTy;
        AK = ArgKind::GeneralPurpose;
      }
      if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
        AK = ArgKind::Memory;
      if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
        AK = ArgKind::Memory;
      if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
        AK = ArgKind::Memory;
      Value *ShadowBase = nullptr;
      Value *OriginBase = nullptr;
      ShadowExtension SE = ShadowExtension::None;
      switch (AK) {
      case ArgKind::GeneralPurpose: {
        // Always keep track of GpOffset, but store shadow only for varargs.
        uint64_t ArgSize = 8;
        if (GpOffset + ArgSize <= kParamTLSSize) {
          if (!IsFixed) {
            SE = getShadowExtension(CB, ArgNo);
            uint64_t GapSize = 0;
            if (SE == ShadowExtension::None) {
              uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
              assert(ArgAllocSize <= ArgSize);
              GapSize = ArgSize - ArgAllocSize;
            }
            ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
            if (MS.TrackOrigins)
              OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
          }
          GpOffset += ArgSize;
        } else {
          GpOffset = kParamTLSSize;
        }
        break;
      }
      case ArgKind::FloatingPoint: {
        // Always keep track of FpOffset, but store shadow only for varargs.
        uint64_t ArgSize = 8;
        if (FpOffset + ArgSize <= kParamTLSSize) {
          if (!IsFixed) {
            // PoP says: "A short floating-point datum requires only the
            // left-most 32 bit positions of a floating-point register".
            // Therefore, in contrast to ArgKind::GeneralPurpose and
            // ArgKind::Memory, don't extend the shadow and don't mind the gap.
            ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
            if (MS.TrackOrigins)
              OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
          }
          FpOffset += ArgSize;
        } else {
          FpOffset = kParamTLSSize;
        }
        break;
      }
      case ArgKind::Vector: {
        // Keep track of VrIndex. No need to store shadow, since vector
        // varargs go through ArgKind::Memory.
        assert(IsFixed);
        VrIndex++;
        break;
      }
      case ArgKind::Memory: {
        // Keep track of OverflowOffset and store shadow only for varargs.
        // Ignore fixed args, since we need to copy only the vararg portion of
        // the overflow area shadow.
        if (!IsFixed) {
          uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
          uint64_t ArgSize = alignTo(ArgAllocSize, 8);
          if (OverflowOffset + ArgSize <= kParamTLSSize) {
            SE = getShadowExtension(CB, ArgNo);
            uint64_t GapSize =
                SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
            ShadowBase =
                getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
            if (MS.TrackOrigins)
              OriginBase =
                  getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
            OverflowOffset += ArgSize;
          } else {
            OverflowOffset = kParamTLSSize;
          }
        }
        break;
      }
      case ArgKind::Indirect:
        llvm_unreachable("Indirect must be converted to GeneralPurpose");
      }
      if (ShadowBase == nullptr)
        continue;
      Value *Shadow = MSV.getShadow(A);
      if (SE != ShadowExtension::None)
        Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
                                      /*Signed*/ SE == ShadowExtension::Sign);
      ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
      IRB.CreateStore(Shadow, ShadowBase);
      if (MS.TrackOrigins) {
        Value *Origin = MSV.getOrigin(A);
        TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
        MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
                        kMinOriginAlignment);
      }
    }
    Constant *OverflowSize = ConstantInt::get(
        IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
    IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
  }

  void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
    Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
        IRB.CreateAdd(
            IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
            ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
        MS.PtrTy);
    Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
    Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
    const Align Alignment = Align(8);
    std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
        MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
                               /*isStore*/ true);
    // TODO(iii): copy only fragments filled by visitCallBase()
    // TODO(iii): support packed-stack && !use-soft-float
    // For use-soft-float functions, it is enough to copy just the GPRs.
    unsigned RegSaveAreaSize =
        IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
    IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
                     RegSaveAreaSize);
    if (MS.TrackOrigins)
      IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
                       Alignment, RegSaveAreaSize);
  }

  // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
  // don't know the real overflow size and can't clear shadow beyond
  // kParamTLSSize.
  void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
    Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
        IRB.CreateAdd(
            IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
            ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
        MS.PtrTy);
    Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
    Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
    const Align Alignment = Align(8);
    std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
        MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
                               Alignment, /*isStore*/ true);
    Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
                                           SystemZOverflowOffset);
    IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
                     VAArgOverflowSize);
    if (MS.TrackOrigins) {
      SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
                                      SystemZOverflowOffset);
      IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
                       VAArgOverflowSize);
    }
  }

  void finalizeInstrumentation() override {
    assert(!VAArgOverflowSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.
      IRBuilder<> IRB(MSV.FnPrologueEnd);
      VAArgOverflowSize =
          IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
      Value *CopySize =
          IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
                        VAArgOverflowSize);
      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, /*isVolatile=*/false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
      if (MS.TrackOrigins) {
        VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
        VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
        IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
                         MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
      }
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);
      copyRegSaveArea(IRB, VAListTag);
      copyOverflowArea(IRB, VAListTag);
    }
  }
};

/// i386-specific implementation of VarArgHelper.
struct VarArgI386Helper : public VarArgHelperBase {
  AllocaInst *VAArgTLSCopy = nullptr;
  Value *VAArgSize = nullptr;

  VarArgI386Helper(Function &F, MemorySanitizer &MS,
                   MemorySanitizerVisitor &MSV)
      : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    const DataLayout &DL = F.getDataLayout();
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    unsigned VAArgOffset = 0;
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
      if (IsByVal) {
        assert(A->getType()->isPointerTy());
        Type *RealTy = CB.getParamByValType(ArgNo);
        uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
        Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
        if (ArgAlign < IntptrSize)
          ArgAlign = Align(IntptrSize);
        VAArgOffset = alignTo(VAArgOffset, ArgAlign);
        if (!IsFixed) {
          Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
          if (Base) {
            Value *AShadowPtr, *AOriginPtr;
            std::tie(AShadowPtr, AOriginPtr) =
                MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
                                       kShadowTLSAlignment, /*isStore*/ false);

            IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
                             kShadowTLSAlignment, ArgSize);
          }
          VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
        }
      } else {
        Value *Base;
        uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
        Align ArgAlign = Align(IntptrSize);
        VAArgOffset = alignTo(VAArgOffset, ArgAlign);
        if (DL.isBigEndian()) {
          // Adjust the shadow for arguments with size < IntptrSize to match
          // the placement of bits on big-endian systems.
          if (ArgSize < IntptrSize)
            VAArgOffset += (IntptrSize - ArgSize);
        }
        if (!IsFixed) {
          Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
          if (Base)
            IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
          VAArgOffset += ArgSize;
          VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
        }
      }
    }

    Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
    // Reuse VAArgOverflowSizeTLS to hold the total size of all varargs,
    // avoiding the creation of a new class member (VAArgSizeTLS).
    IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    IRBuilder<> IRB(MSV.FnPrologueEnd);
    VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
    Value *CopySize = VAArgSize;

    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.
      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, /*isVolatile=*/false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);
      Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
      Value *RegSaveAreaPtrPtr =
          IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
                             PointerType::get(*MS.C, 0));
      Value *RegSaveAreaPtr =
          IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
      Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
      const DataLayout &DL = F.getDataLayout();
      unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
      const Align Alignment = Align(IntptrSize);
      std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
          MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Alignment, /*isStore*/ true);
      IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
                       CopySize);
    }
  }
};

/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV, and
/// LoongArch64.
struct VarArgGenericHelper : public VarArgHelperBase {
  AllocaInst *VAArgTLSCopy = nullptr;
  Value *VAArgSize = nullptr;

  VarArgGenericHelper(Function &F, MemorySanitizer &MS,
                      MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
      : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
    unsigned VAArgOffset = 0;
    const DataLayout &DL = F.getDataLayout();
    unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
    for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
      bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
      if (IsFixed)
        continue;
      uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
      if (DL.isBigEndian()) {
        // Adjust the shadow for arguments with size < IntptrSize to match the
        // placement of bits on big-endian systems.
        if (ArgSize < IntptrSize)
          VAArgOffset += (IntptrSize - ArgSize);
      }
      Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
      VAArgOffset += ArgSize;
      VAArgOffset = alignTo(VAArgOffset, IntptrSize);
      if (!Base)
        continue;
      IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
    }

    Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
    // Reuse VAArgOverflowSizeTLS to hold the total size of all varargs,
    // avoiding the creation of a new class member (VAArgSizeTLS).
    IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
  }

  void finalizeInstrumentation() override {
    assert(!VAArgSize && !VAArgTLSCopy &&
           "finalizeInstrumentation called twice");
    IRBuilder<> IRB(MSV.FnPrologueEnd);
    VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
    Value *CopySize = VAArgSize;

    if (!VAStartInstrumentationList.empty()) {
      // If there is a va_start in this function, make a backup copy of
      // va_arg_tls somewhere in the function entry block.
      VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
      VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
      IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
                       CopySize, kShadowTLSAlignment, /*isVolatile=*/false);

      Value *SrcSize = IRB.CreateBinaryIntrinsic(
          Intrinsic::umin, CopySize,
          ConstantInt::get(MS.IntptrTy, kParamTLSSize));
      IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
                       kShadowTLSAlignment, SrcSize);
    }

    // Instrument va_start.
    // Copy va_list shadow from the backup copy of the TLS contents.
    for (CallInst *OrigInst : VAStartInstrumentationList) {
      NextNodeIRBuilder IRB(OrigInst);
      Value *VAListTag = OrigInst->getArgOperand(0);
      Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
      Value *RegSaveAreaPtrPtr =
          IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
                             PointerType::get(*MS.C, 0));
      Value *RegSaveAreaPtr =
          IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
      Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
      const DataLayout &DL = F.getDataLayout();
      unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
      const Align Alignment = Align(IntptrSize);
      std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
          MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
                                 Alignment, /*isStore*/ true);
      IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
                       CopySize);
    }
  }
};

// ARM32, LoongArch64, MIPS, and RISCV share the same calling conventions
// regarding varargs.
using VarArgARM32Helper = VarArgGenericHelper;
using VarArgRISCVHelper = VarArgGenericHelper;
using VarArgMIPSHelper = VarArgGenericHelper;
using VarArgLoongArch64Helper = VarArgGenericHelper;

/// A no-op implementation of VarArgHelper.
struct VarArgNoOpHelper : public VarArgHelper {
  VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
                   MemorySanitizerVisitor &MSV) {}

  void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}

  void visitVAStartInst(VAStartInst &I) override {}

  void visitVACopyInst(VACopyInst &I) override {}

  void finalizeInstrumentation() override {}
};

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor) {
  // VarArg handling is implemented only for the targets below; on any other
  // platform the no-op helper is used, so false positives are possible.
  Triple TargetTriple(Func.getParent()->getTargetTriple());

  if (TargetTriple.getArch() == Triple::x86)
    return new VarArgI386Helper(Func, Msan, Visitor);

  if (TargetTriple.getArch() == Triple::x86_64)
    return new VarArgAMD64Helper(Func, Msan, Visitor);

  if (TargetTriple.isARM())
    return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);

  if (TargetTriple.isAArch64())
    return new VarArgAArch64Helper(Func, Msan, Visitor);

  if (TargetTriple.isSystemZ())
    return new VarArgSystemZHelper(Func, Msan, Visitor);

  // On PowerPC32 VAListTag is a struct
  // {char, char, i16 padding, char *, char *}.
  if (TargetTriple.isPPC32())
    return new VarArgPowerPC32Helper(Func, Msan, Visitor);

  if (TargetTriple.isPPC64())
    return new VarArgPowerPC64Helper(Func, Msan, Visitor);

  if (TargetTriple.isRISCV32())
    return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);

  if (TargetTriple.isRISCV64())
    return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);

  if (TargetTriple.isMIPS32())
    return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);

  if (TargetTriple.isMIPS64())
    return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);

  if (TargetTriple.isLoongArch64())
    return new VarArgLoongArch64Helper(Func, Msan, Visitor,
                                       /*VAListTagSize=*/8);

  return new VarArgNoOpHelper(Func, Msan, Visitor);
}

bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
  if (!CompileKernel && F.getName() == kMsanModuleCtorName)
    return false;

  if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
    return false;

  MemorySanitizerVisitor Visitor(F, *this, TLI);

  // Clear out memory attributes.
  AttributeMask B;
  B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
  F.removeFnAttrs(B);

  return Visitor.runOnFunction();
}