An Introduction to the Intel Gen Driver Flow

0x1 Overall Flow

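GPU hardware can be roughly divided into three parts. First are the memory-access modules, such as the MMU and the caches. Second are the fixed-function modules, such as the rasterizer and the clipper. Last is the programmable unit, the shader core. The shader core gives the GPU a CPU-like ability to handle complex problems and leaves users a lot of room to write OpenGL shaders and OpenCL kernels. These shaders and kernels are ultimately compiled into GPU-specific instructions that run on the shader core. In Intel Gen GPUs, the shader core is called the EU (Execution Unit).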

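The path from the GPU driver to the GPU hardware completing a draw can be summarized as follows.

a. On the CPU side, the GPU user space driver generates command lists from the state set by the application.

b. On the CPU side, the GPU user space driver uses a compiler to turn the shader source code into GPU shader core instructions. These instructions are also stored in memory, and the driver records their address in one of the command structures.

c. The GPU user space driver submits what it has generated to the GPU kernel space driver, which starts the GPU hardware reading the command list.

d. The command executor in the GPU starts consuming commands and, based on their contents, drives the different hardware modules to work together. When the shader core needs to run, it fetches the GPU instructions from the corresponding memory buffer and performs the shader operations they describe. When vertex or texture data is needed, the buffer address is looked up in the corresponding command and the data is read from there.

The flow described above is summarized in the figure below.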
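Steps a to d can be sketched as a toy model in C. Everything in it is hypothetical: the opcodes, `struct toy_cmd`, and `toy_executor_run` are invented for illustration and have nothing to do with the real Gen command encoding. The point is only that the command list carries addresses of buffers (shader code, vertex data) rather than the data itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; a real GPU defines many more, encoded differently. */
enum toy_opcode { TOY_SET_SHADER, TOY_SET_VERTEX_BUFFER, TOY_DRAW, TOY_END };

/* One entry of the command list built by the user space driver (steps a, b). */
struct toy_cmd {
    enum toy_opcode op;
    const void *address; /* buffer holding shader code or vertex data */
    uint32_t count;      /* vertex count, used by TOY_DRAW only */
};

/* What the toy "hardware" has latched while consuming the list (step d). */
struct toy_gpu_state {
    const void *shader_code;
    const void *vertex_data;
    uint32_t vertices_drawn;
};

/* Toy command executor: walks the list until TOY_END and updates the state. */
void toy_executor_run(const struct toy_cmd *list, struct toy_gpu_state *gpu)
{
    assert(list != NULL && gpu != NULL);
    for (const struct toy_cmd *c = list; c->op != TOY_END; c++) {
        switch (c->op) {
        case TOY_SET_SHADER:        gpu->shader_code = c->address;   break;
        case TOY_SET_VERTEX_BUFFER: gpu->vertex_data = c->address;   break;
        case TOY_DRAW:              gpu->vertices_drawn += c->count; break;
        case TOY_END:               break;
        }
    }
}
```

In this model, step c corresponds to handing `list` to `toy_executor_run`; a real kernel driver would instead pass the buffer to the hardware and wait for completion.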

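The rest of this article uses a simple triangle-drawing test program to show how the GPU driver works. Here, "GPU driver" mainly refers to Mesa, the open-source user space driver implementation.

The flow of the test program is shown in the figure below.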

0x2 Shader Compilation

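This section describes how the Mesa driver compiles shaders for Intel Gen.

The vertex shader is compiled as follows.

The GLSL source code is shown below.
This shader simply copies the externally supplied position attribute, vPosition, into gl_Position.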

attribute vec4 vPosition;
void main()
{
    gl_Position = vPosition;
}

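The first stage of compilation performs lexical and syntactic analysis and produces an abstract syntax tree (AST).
The AST is then lowered to NIR, Mesa's internal intermediate representation, and compiler optimizations are run on the NIR. Each optimization run is called a pass.
After several passes, the final NIR looks like this.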

NIR (final form) for vertex shader:
shader: MESA_SHADER_VERTEX
name: GLSL3
inputs: 1
outputs: 1
uniforms: 0
shared: 0
decl_var shader_in INTERP_MODE_NONE highp vec4 vPosition (VERT_ATTRIB_GENERIC0.xyzw, 16, 0)
decl_var shader_out INTERP_MODE_NONE highp vec4 gl_Position (VARYING_SLOT_POS.xyzw, 0, 0)
decl_function main (0 params)
impl main {
block block_0:
/* preds: */
vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec4 32 ssa_1 = intrinsic load_input (ssa_0) (0, 0, 160, 144) /* base=0 */ /* component=0 */ /* dest_type=float32 */ /* location=16 slots=1 */
intrinsic store_output (ssa_1, ssa_0) (0, 15, 0, 160, 128) /* base=0 */ /* wrmask=xyzw */ /* component=0 */ /* src_type=float32 */ /* location=0 slots=1 */ /* gl_Position */
/* succs: block_1 */
block block_1:
}
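
The pass-based optimization described before the listing can be illustrated with a toy IR. This is only a sketch under stated assumptions: the three-opcode IR, `toy_opt_constant_folding`, `toy_opt_dead_code`, and `toy_optimize` are all invented here and are not NIR APIs. It shows the common driver-loop shape in which every pass reports whether it changed anything and the passes are rerun until a whole round makes no progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-opcode IR; real NIR is far richer. */
enum toy_op { TOY_CONST, TOY_ADD, TOY_DEAD };

struct toy_instr {
    enum toy_op op;
    int src0, src1; /* indices of earlier instructions, used by TOY_ADD */
    int value;      /* payload of TOY_CONST */
};

/* Pass 1: replace an ADD of two constants with the constant result. */
bool toy_opt_constant_folding(struct toy_instr *ir, int n)
{
    bool progress = false;
    for (int i = 0; i < n; i++) {
        if (ir[i].op != TOY_ADD)
            continue;
        const struct toy_instr *a = &ir[ir[i].src0];
        const struct toy_instr *b = &ir[ir[i].src1];
        if (a->op == TOY_CONST && b->op == TOY_CONST) {
            ir[i].op = TOY_CONST;
            ir[i].value = a->value + b->value;
            progress = true;
        }
    }
    return progress;
}

/* Pass 2: mark instructions dead when no live instruction uses them.
 * The last instruction is treated as the result and is always kept. */
bool toy_opt_dead_code(struct toy_instr *ir, int n)
{
    bool used[64] = { false };
    bool progress = false;
    assert(n > 0 && n <= 64);
    used[n - 1] = true;
    /* Sources always precede their users, so one backward sweep suffices. */
    for (int i = n - 1; i >= 0; i--) {
        if (used[i] && ir[i].op == TOY_ADD) {
            used[ir[i].src0] = true;
            used[ir[i].src1] = true;
        }
    }
    for (int i = 0; i < n; i++) {
        if (!used[i] && ir[i].op != TOY_DEAD) {
            ir[i].op = TOY_DEAD;
            progress = true;
        }
    }
    return progress;
}

/* Driver: rerun every pass until a full round makes no progress.
 * Returns the number of rounds that were executed. */
int toy_optimize(struct toy_instr *ir, int n)
{
    int rounds = 0;
    bool progress;
    do {
        progress = false;
        progress |= toy_opt_constant_folding(ir, n);
        progress |= toy_opt_dead_code(ir, n);
        rounds++;
    } while (progress);
    return rounds;
}
```

Folding turns the ADD instructions into constants, which in turn leaves their operands unused, so the dead-code pass can only do its work after folding has run. That interaction is why passes are usually iterated rather than run once.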

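Next the compiler backend runs and generates the actual GPU shader core instructions. The shader core here is the EU mentioned earlier, so the output is EU code.
The backend also relies on classic compiler techniques, such as graph-coloring register allocation.
The final EU code is shown below.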

Native code for unnamed vertex shader GLSL3 (sha1 15358268e6b3974dafb7512e3bdaa7c4c81a9394)
SIMD8 shader: 6 instructions. 0 loops. 20 cycles. 0:0 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 96 to 64 bytes (33%)
START B0 (20 cycles)
mov(8) g122<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
mov(8) g123<1>F g2<8,8,1>F { align1 1Q compacted };
mov(8) g124<1>F g3<8,8,1>F { align1 1Q compacted };
mov(8) g125<1>F g4<8,8,1>F { align1 1Q compacted };
mov(8) g126<1>F g5<8,8,1>F { align1 1Q compacted };
send(8) null<1>F g122<8,8,1>F 0x8a080017
urb MsgDesc: 1 SIMD8 write mlen 5 rlen 0 { align1 1Q EOT };
END B0
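
The hexadecimal word after the send instruction above (0x8a080017) is the message descriptor, and the disassembler prints its decoded form (mlen 5, rlen 0) on the next line. The sketch below decodes those two fields. The bit positions are an assumption that the article does not state: message length in bits 28:25 and response length in bits 24:20. They do reproduce the values printed in the listing. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts bits [high:low] (inclusive) from a 32-bit word. */
static uint32_t toy_bits(uint32_t word, unsigned high, unsigned low)
{
    assert(high >= low && high < 32);
    uint32_t width = high - low + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (word >> low) & mask;
}

/* Assumed layout: message length (mlen) lives in bits 28:25. */
unsigned toy_send_desc_mlen(uint32_t desc)
{
    return toy_bits(desc, 28, 25);
}

/* Assumed layout: response length (rlen) lives in bits 24:20. */
unsigned toy_send_desc_rlen(uint32_t desc)
{
    return toy_bits(desc, 24, 20);
}
```

With mlen 5, the message payload spans five registers starting at g122, which matches the five mov instructions that fill g122 through g126 before the send.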

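The fragment shader is compiled as follows.
The flow is similar to that of the vertex shader, so the details are omitted. Only the output of each compilation stage is listed below.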

GLSL source code

precision mediump float;
void main()
{
    gl_FragColor = vec4 ( 1.0, 0.0, 0.0, 1.0 );
}

NIR code

NIR (final form) for fragment shader:
shader: MESA_SHADER_FRAGMENT
name: GLSL3
inputs: 0
outputs: 1
uniforms: 0
shared: 0
decl_var shader_out INTERP_MODE_NONE mediump vec4 gl_FragColor (FRAG_RESULT_COLOR.xyzw, 4, 0)
decl_function main (0 params)
impl main {
block block_0:
/* preds: */
vec4 32 ssa_0 = load_const (0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */)
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
intrinsic store_output (ssa_0, ssa_1) (4, 15, 0, 160, 8388738) /* base=4 */ /* wrmask=xyzw */ /* component=0 */ /* src_type=float32 */ /* location=2 slots=1 mediump */ /* gl_FragColor */
/* succs: block_1 */
block block_1:
}

EU code

Native code for unnamed fragment shader GLSL3 (sha1 f7554fd836ad71e30768cdc6b984e294b8f2fae5)
SIMD8 shader: 5 instructions. 0 loops. 18 cycles. 0:0 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 80 to 64 bytes (20%)
START B0 (18 cycles)
mov(8) g123<1>F 0x3f800000F /* 1F */ { align1 1Q };
mov(8) g124<1>F 0x0VF /* [0F, 0F, 0F, 0F]VF */ { align1 1Q compacted };
mov(8) g125<1>F 0x0VF /* [0F, 0F, 0F, 0F]VF */ { align1 1Q compacted };
mov(8) g126<1>F 0x3f800000F /* 1F */ { align1 1Q };
sendc(8) null<1>UW g123<0,1,0>UD 0x88031400
render MsgDesc: RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };
END B0

0x3 Command Generation

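The Command Stream Unit is the module inside a Gen GPU that manages the 3D pipeline and the media unit. By programming different command stream commands, we can control the 3D pipeline at a fine grain.
The Command Stream Unit also allocates and manages the URB (Unified Return Buffer). The URB can be thought of as a buffer used to pass parameters between pipeline stages (such as VS, rasterizer, clipper, and PS).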

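Programming the command stream essentially means translating graphics API state into command stream commands. Different API calls require different commands. A command stream dump mechanism can save the generated commands, and a visualization tool can then be used to analyze any problems that occur. Mesa provides a viewer tool for inspecting each generated command.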
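
As a rough idea of what a dump tool has to do first, the sketch below splits a dumped dword stream into individual commands. It assumes that every command in the stream stores its length minus 2 in bits 0-7 of its first dword, as the PIPE_CONTROL definition later in this article does (DWord Length field, bias 2). Real streams also contain commands with other length encodings, so this is a simplification, and `toy_count_commands` is a hypothetical helper rather than part of any Mesa tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the length field holds (total dwords - 2). */
#define TOY_LENGTH_BIAS 2u

/* Walks `n` dumped dwords command by command.
 * Returns the number of commands, or -1 if a command overruns the buffer. */
int toy_count_commands(const uint32_t *dw, size_t n)
{
    int count = 0;
    size_t i = 0;
    assert(dw != NULL);
    while (i < n) {
        /* Bits 0-7 of the first dword: DWord Length. */
        size_t len = (size_t)(dw[i] & 0xffu) + TOY_LENGTH_BIAS;
        if (len > n - i)
            return -1; /* truncated dump */
        i += len;
        count++;
    }
    return count;
}
```

A real decoder would, at each step, also look up the opcode bits of the first dword to find the command's name and field layout before printing it.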

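Broadcom GPUs have a similar concept called the Control List, and the hardware module that executes it is called the Control List Executor. The Broadcom GPU driver likewise programs control lists to drive the GPU.

The following shows which command stream commands a simple OpenGL ES application has to program as it runs.

After eglCreateContext is called, the driver performs its initialization and programs the following commands.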

GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL
GEN9_PIPELINE_SELECT
state GEN9_L3CNTLREG
GEN9_MI_LOAD_REGISTER_IMM
GEN9_PIPE_CONTROL
GEN9_STATE_BASE_ADDRESS
GEN9_PIPE_CONTROL
state GEN9_CS_DEBUG_MODE2
GEN9_MI_LOAD_REGISTER_IMM
state GEN9_CACHE_MODE_1
GEN9_MI_LOAD_REGISTER_IMM
GEN9_3DSTATE_DRAWING_RECTANGLE
GEN9_3DSTATE_SAMPLE_PATTERN
GEN9_3DSTATE_AA_LINE_PARAMETERS
GEN9_3DSTATE_WM_CHROMAKEY
GEN9_3DSTATE_WM_HZ_OP
GEN9_3DSTATE_POLY_STIPPLE_OFFSET
GEN9_3DSTATE_PUSH_CONSTANT_ALLOC_VS
GEN9_3DSTATE_PUSH_CONSTANT_ALLOC_VS
GEN9_3DSTATE_PUSH_CONSTANT_ALLOC_VS
GEN9_3DSTATE_PUSH_CONSTANT_ALLOC_VS
GEN9_3DSTATE_PUSH_CONSTANT_ALLOC_VS
GEN9_3DSTATE_CC_STATE_POINTERS
GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL
GEN9_PIPELINE_SELECT
state GEN9_L3CNTLREG
GEN9_MI_LOAD_REGISTER_IMM
GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL
GEN9_STATE_BASE_ADDRESS
GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL

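After eglMakeCurrent is called, GEM buffers are allocated for the frame buffer.

glShaderSource/glCompileShader create the shader object and hand the GLSL source code to the GPU driver.

After glLinkProgram is called, the GLSL is compiled: the GLSL source code is turned into a GLSL AST and then converted to NIR.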

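After glClear is called, the Mesa driver performs the clear through its blit engine, BLORP (the name that appears in the shader dump below). BLORP does its work with a shader of its own, so EU code has to be generated for it: a clear fragment shader is produced in NIR form and then compiled to EU code. The following commands are programmed at this point.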

state GEN9_GT_MODE
GEN9_PIPE_CONTROL
GEN9_MI_LOAD_REGISTER_IMM

GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL

GEN9_PIPE_CONTROL
GEN9_STATE_BASE_ADDRESS
GEN9_PIPE_CONTROL

For reference, the NIR and EU code of the fragment shader generated for BLORP are shown below.

NIR (final form) for fragment shader:
shader: MESA_SHADER_FRAGMENT
name: BLORP-clear
inputs: 0
outputs: 0
uniforms: 0
shared: 0
decl_var shader_in INTERP_MODE_FLAT vec4 clear_color (VARYING_SLOT_VAR0.xyzw, 32, 0)
decl_var shader_out INTERP_MODE_NONE vec4 gl_FragColor (FRAG_RESULT_COLOR.xyzw, 4, 0)
decl_function main (0 params)
impl main {
block block_0:
/* preds: */
vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec4 32 ssa_1 = intrinsic load_input (ssa_0) (32, 0, 160, 160) /* base=32 */ /* component=0 */ /* dest_type=float32 */ /* location=32 slots=1 */ /* clear_color */
intrinsic store_output (ssa_1, ssa_0) (4, 15, 0, 160, 130) /* base=4 */ /* wrmask=xyzw */ /* component=0 */ /* src_type=float32 */ /* location=2 slots=1 */ /* gl_FragColor */
/* succs: block_1 */
block block_1:
}
Native code for unnamed fragment shader BLORP-clear (sha1 867c2b9794cc6654dbcf091d8a21323f01e0d409)
SIMD16 shader: 2 instructions. 0 loops. 12 cycles. 0:0 spills:fills, 1 sends, scheduled with mode (null). Promoted 0 constants. Compacted 32 to 32 bytes (0%)
START B0 (12 cycles)
mov(4) g114<1>F g2.3<8,2,4>F { align1 WE_all 1N };
sendc(16) null<1>UW g114<0,1,0>F 0x82031100
render MsgDesc: RT write SIMD16/RepData LastRT Surface = 0 mlen 1 rlen 0 { align1 1H EOT };
END B0

The vertex data is then loaded with the following code.
glVertexAttribPointer ( 0, 3, GL_FLOAT, GL_FALSE, 0, vVertices );
glEnableVertexAttribArray ( 0 );

The draw is issued with the following call.
glDrawArrays ( GL_TRIANGLES, 0, 3 );
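Only at this point is the NIR of each shader converted to EU code. Note that the earlier compile and link steps (glCompileShader and glLinkProgram) only produced NIR and did not generate any EU code. In other words, the compiler backend does not start working until now.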
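
The deferred backend compilation observed here can be pictured as a simple lazy cache. The sketch is hypothetical throughout: `struct toy_shader`, `toy_link`, `toy_get_eu_code`, and `toy_draw` are invented names, and the "EU code" is just a placeholder string. It only shows the control flow: linking stores the NIR, and the backend runs on the first draw that needs the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-shader record kept by the driver. */
struct toy_shader {
    const char *nir;     /* produced at compile/link time */
    const char *eu_code; /* NULL until the backend has run */
    int backend_runs;    /* how many times the backend was invoked */
};

/* Link time: keep the NIR, do not touch the backend yet. */
void toy_link(struct toy_shader *s, const char *nir)
{
    assert(s != NULL);
    s->nir = nir;
    s->eu_code = NULL;
    s->backend_runs = 0;
}

/* Runs the (placeholder) backend on first use and caches the result. */
const char *toy_get_eu_code(struct toy_shader *s)
{
    assert(s != NULL && s->nir != NULL);
    if (s->eu_code == NULL) {
        s->eu_code = "placeholder EU code";
        s->backend_runs++;
    }
    return s->eu_code;
}

/* Draw time: fragment shader first, then vertex shader, as in the trace. */
void toy_draw(struct toy_shader *vs, struct toy_shader *fs)
{
    (void)toy_get_eu_code(fs);
    (void)toy_get_eu_code(vs);
}
```

Because the result is cached, a second draw with the same shaders does not invoke the backend again.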

The fragment shader's EU code is generated first and is then referenced from the commands below.
One field of GEN9_3DSTATE_PS holds the address at which the PS EU code is stored.
GEN9_3DSTATE_PS
GEN9_3DSTATE_PS_EXTRA

The vertex shader's EU code is generated next and is then referenced from the commands below.
One field of GEN9_3DSTATE_VS holds the address at which the VS EU code is stored.
GEN9_3DSTATE_STREAMOUT
GEN9_3DSTATE_SO_DECL_LIST
GEN9_3DSTATE_VS

The following commands are then programmed. Among them, GEN9_3DPRIMITIVE specifies the number of vertices that take part in the draw.
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
state GEN9_BLEND_STATE_ENTRY
GEN9_3DSTATE_PS_BLEND
state GEN9_BLEND_STATE
GEN9_3DSTATE_SF
GEN9_3DSTATE_RASTER
GEN9_3DSTATE_CLIP
GEN9_3DSTATE_WM
GEN9_3DSTATE_LINE_STIPPLE
GEN9_3DSTATE_VERTEX_ELEMENTS
state GEN9_VERTEX_ELEMENT_STATE
GEN9_3DSTATE_VF_INSTANCING
state GEN9_VERTEX_ELEMENT_STATE
GEN9_3DSTATE_VF_INSTANCING
state GEN9_VERTEX_BUFFER_STATE
GEN9_PIPE_CONTROL
state GEN9_CS_CHICKEN1
GEN9_MI_LOAD_REGISTER_IMM
state GEN9_CC_VIEWPORT
GEN9_3DSTATE_VIEWPORT_STATE_POINTERS_CC
state GEN9_SF_CLIP_VIEWPORT
GEN9_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP
GEN9_3DSTATE_URB_VS
GEN9_3DSTATE_URB_VS
GEN9_3DSTATE_URB_VS
GEN9_3DSTATE_URB_VS
state GEN9_BLEND_STATE
GEN9_3DSTATE_BLEND_STATE_POINTERS
state GEN9_COLOR_CALC_STATE
GEN9_3DSTATE_CC_STATE_POINTERS
GEN9_3DSTATE_CONSTANT_VS
GEN9_3DSTATE_CONSTANT_VS
GEN9_3DSTATE_BINDING_TABLE_POINTERS_VS
GEN9_3DSTATE_BINDING_TABLE_POINTERS_VS
GEN9_3DSTATE_BINDING_TABLE_POINTERS_VS
GEN9_3DSTATE_BINDING_TABLE_POINTERS_VS
GEN9_3DSTATE_BINDING_TABLE_POINTERS_VS
GEN9_3DSTATE_SAMPLER_STATE_POINTERS_VS
GEN9_3DSTATE_SAMPLER_STATE_POINTERS_VS
GEN9_3DSTATE_MULTISAMPLE
GEN9_3DSTATE_SAMPLE_MASK
GEN9_3DSTATE_HS
GEN9_3DSTATE_TE
GEN9_3DSTATE_DS
GEN9_3DSTATE_GS
GEN9_3DSTATE_PS
GEN9_3DSTATE_PS_EXTRA
GEN9_3DSTATE_STREAMOUT
GEN9_3DSTATE_CLIP
GEN9_3DSTATE_SF
GEN9_3DSTATE_WM
GEN9_3DSTATE_SBE
GEN9_3DSTATE_SBE_SWIZ
GEN9_3DSTATE_PS_BLEND
GEN9_3DSTATE_WM_DEPTH_STENCIL
GEN9_3DSTATE_SCISSOR_STATE_POINTERS
GEN9_3DSTATE_CLEAR_PARAMS
GEN9_3DSTATE_POLY_STIPPLE_PATTERN
GEN9_3DSTATE_VF_TOPOLOGY
GENX(3DSTATE_VERTEX_BUFFERS)
GEN9_3DSTATE_VF_SGVS
GEN9_3DSTATE_VF
GEN9_3DSTATE_VF_STATISTICS
GEN9_3DPRIMITIVE

Finally, eglSwapBuffers is called. All the commands generated so far are submitted to the GPU kernel driver, which then starts the GPU hardware to complete the draw.
GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL
GEN9_PIPE_CONTROL

0x4 Implementing Command Programming

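This section uses the Vulkan driver as an example of how commands for Gen GPUs are programmed in Mesa. The OpenGL driver's implementation is very similar.

The layout of each Command Streamer structure is stored in XML files. At build time, Python code converts the XML files into the corresponding header files. The process is shown in the figure below.

As the figure shows, all of the Gen driver implementations depend on these generated headers. It is also worth noting that the functions that program the commands are implemented separately in each driver, which leaves a fair amount of redundant code and is a possible direction for code cleanup.

The XML content is derived from the following two documents (using Gen11 as an example).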

Intel® Iris® Plus Graphics and UHD Graphics Open Source Programmer's Reference Manual, For the 2019 10th Generation Intel Core™ Processors based on the "Ice Lake" Platform, Volume 2a - Command Reference: Instructions (Command Opcodes)

Intel® Iris® Plus Graphics and UHD Graphics Open Source Programmer's Reference Manual, For the 2019 10th Generation Intel Core™ Processors based on the "Ice Lake" Platform, Volume 8: Command Stream Programming

An excerpt of the XML is shown below. It contains the definition of PIPE_CONTROL.

<instruction name="PIPE_CONTROL" bias="2" length="6" engine="render">
<field name="DWord Length" start="0" end="7" type="uint" default="4"/>
<field name="3D Command Sub Opcode" start="16" end="23" type="uint" default="0"/>
<field name="3D Command Opcode" start="24" end="26" type="uint" default="2"/>
<field name="Command SubType" start="27" end="28" type="uint" default="3"/>
<field name="Command Type" start="29" end="31" type="uint" default="3"/>
<field name="Depth Cache Flush Enable" start="32" end="32" type="bool"/>
<field name="Stall At Pixel Scoreboard" start="33" end="33" type="bool"/>
<field name="State Cache Invalidation Enable" start="34" end="34" type="bool"/>
<field name="Constant Cache Invalidation Enable" start="35" end="35" type="bool"/>
<field name="VF Cache Invalidation Enable" start="36" end="36" type="bool"/>
<field name="DC Flush Enable" start="37" end="37" type="bool"/>
<field name="Pipe Control Flush Enable" start="39" end="39" type="bool"/>
<field name="Notify Enable" start="40" end="40" type="bool"/>
<field name="Indirect State Pointers Disable" start="41" end="41" type="bool"/>
<field name="Texture Cache Invalidation Enable" start="42" end="42" type="bool"/>
<field name="Instruction Cache Invalidate Enable" start="43" end="43" type="bool"/>
<field name="Render Target Cache Flush Enable" start="44" end="44" type="bool"/>
<field name="Depth Stall Enable" start="45" end="45" type="bool"/>
<field name="Post Sync Operation" start="46" end="47" type="uint">
<value name="No Write" value="0"/>
<value name="Write Immediate Data" value="1"/>
<value name="Write PS Depth Count" value="2"/>
<value name="Write Timestamp" value="3"/>
</field>
<field name="Generic Media State Clear" start="48" end="48" type="bool"/>
<field name="TLB Invalidate" start="50" end="50" type="bool"/>
<field name="Global Snapshot Count Reset" start="51" end="51" type="bool"/>
<field name="Command Streamer Stall Enable" start="52" end="52" type="bool"/>
<field name="Store Data Index" start="53" end="53" type="uint"/>
<field name="LRI Post Sync Operation" start="55" end="55" type="uint">
<value name="No LRI Operation" value="0"/>
<value name="MMIO Write Immediate Data" value="1"/>
</field>
<field name="Destination Address Type" start="56" end="56" type="uint" prefix="DAT">
<value name="PPGTT" value="0"/>
<value name="GGTT" value="1"/>
</field>
<field name="Flush LLC" start="58" end="58" type="bool"/>
<field name="Address" start="66" end="111" type="address"/>
<field name="Immediate Data" start="128" end="191" type="uint"/>
</instruction>
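
To see how the first five fields of the XML above combine into the first dword of the command, the sketch below shifts each default value to its start bit and ORs the results together. `toy_field`, `toy_pipe_control_dw0`, and `toy_pipe_control_total_dwords` are hypothetical helpers written for this illustration. The bit ranges, the defaults, and the bias come from the XML.

```c
#include <assert.h>
#include <stdint.h>

/* Places `value` into bits [start, end] of a 32-bit dword. */
static uint32_t toy_field(uint32_t value, unsigned start, unsigned end)
{
    assert(start <= end && end < 32);
    unsigned width = end - start + 1;
    /* The value must fit into the field. */
    assert(width == 32 || value < (UINT32_C(1) << width));
    return value << start;
}

/* DWord 0 of PIPE_CONTROL, built from the defaults in the XML. */
uint32_t toy_pipe_control_dw0(void)
{
    return toy_field(4, 0, 7)   |  /* DWord Length          */
           toy_field(0, 16, 23) |  /* 3D Command Sub Opcode */
           toy_field(2, 24, 26) |  /* 3D Command Opcode     */
           toy_field(3, 27, 28) |  /* Command SubType       */
           toy_field(3, 29, 31);   /* Command Type          */
}

/* Total size in dwords: DWord Length default (4) plus the bias (2). */
unsigned toy_pipe_control_total_dwords(void)
{
    return 4 + 2;
}
```

The result of `toy_pipe_control_dw0()` is 0x7a000004, and the total of 6 dwords matches the `length="6"` attribute of the instruction element.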

After conversion, the generated header looks like this.

#define GEN9_PIPE_CONTROL_length 6
#define GEN9_PIPE_CONTROL_length_bias 2
// Header of the command: default values of the fixed fields
#define GEN9_PIPE_CONTROL_header \
.DWordLength = 4, \
._3DCommandSubOpcode = 0, \
._3DCommandOpcode = 2, \
.CommandSubType = 3, \
.CommandType = 3
// Structure describing all fields of the command
struct GEN9_PIPE_CONTROL {
uint32_t DWordLength;
uint32_t _3DCommandSubOpcode;
uint32_t _3DCommandOpcode;
uint32_t CommandSubType;
uint32_t CommandType;
bool DepthCacheFlushEnable;
bool StallAtPixelScoreboard;
bool StateCacheInvalidationEnable;
bool ConstantCacheInvalidationEnable;
bool VFCacheInvalidationEnable;
bool DCFlushEnable;
bool PipeControlFlushEnable;
bool NotifyEnable;
bool IndirectStatePointersDisable;
bool TextureCacheInvalidationEnable;
bool InstructionCacheInvalidateEnable;
bool RenderTargetCacheFlushEnable;
bool DepthStallEnable;
uint32_t PostSyncOperation;
#define NoWrite 0
#define WriteImmediateData 1
#define WritePSDepthCount 2
#define WriteTimestamp 3
bool GenericMediaStateClear;
bool TLBInvalidate;
bool GlobalSnapshotCountReset;
bool CommandStreamerStallEnable;
uint32_t StoreDataIndex;
uint32_t LRIPostSyncOperation;
#define NoLRIOperation 0
#define MMIOWriteImmediateData 1
uint32_t DestinationAddressType;
#define DAT_PPGTT 0
#define DAT_GGTT 1
bool FlushLLC;
__gen_address_type Address;
uint64_t ImmediateData;
};

The macro used to emit a command is defined as follows.
The same macro is used to emit every command.

#define __anv_cmd_length(cmd) cmd ## _length
#define __anv_cmd_header(cmd) cmd ## _header
#define __anv_cmd_pack(cmd) cmd ## _pack
#define anv_batch_emit(batch, cmd, name) \
   for (struct cmd name = { __anv_cmd_header(cmd) }, \
        *_dst = anv_batch_emit_dwords(batch, __anv_cmd_length(cmd)); \
        __builtin_expect(_dst != NULL, 1); \
        ({ __anv_cmd_pack(cmd)(batch, _dst, &name); \
           printf("%s\n", #cmd); /* prints the name of each emitted command */ \
           VG(VALGRIND_CHECK_MEM_IS_DEFINED(_dst, __anv_cmd_length(cmd) * 4)); \
           _dst = NULL; \
         }))

The macro above works by calling each command's pack function. The pack functions are also generated automatically from the genxml files described earlier.
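
The for-loop trick behind the macro can be reduced to a self-contained sketch. The names here (`toy_batch`, `toy_batch_reserve`, `TOY_FLUSH`, `toy_batch_emit`) are hypothetical, and the sketch uses the comma operator where the real macro uses a GNU statement expression, so that it stays within standard C. The loop declares a command struct initialized from the header, reserves space in the batch, runs the user's block exactly once, and then packs the struct into the reserved dwords.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size batch buffer. */
struct toy_batch {
    uint32_t dwords[64];
    size_t used;
};

/* Reserves `n` dwords at the end of the batch, or returns NULL when full. */
void *toy_batch_reserve(struct toy_batch *batch, size_t n)
{
    assert(batch != NULL);
    if (n > 64 - batch->used)
        return NULL;
    void *dst = &batch->dwords[batch->used];
    batch->used += n;
    return dst;
}

/* Hypothetical two-dword command, shaped like the generated header. */
#define TOY_FLUSH_length 2
#define TOY_FLUSH_header .Opcode = 0x7a
struct TOY_FLUSH {
    uint32_t Opcode;
    bool FlushEnable;
};

void TOY_FLUSH_pack(void *dst, const struct TOY_FLUSH *values)
{
    uint32_t *dw = dst;
    dw[0] = values->Opcode << 24;
    dw[1] = values->FlushEnable ? 1u : 0u;
}

/* The block following the macro runs once; packing happens afterwards,
 * in the loop's increment expression. */
#define toy_batch_emit(batch, cmd, name)                        \
    for (struct cmd name = { cmd##_header },                    \
         *_dst = toy_batch_reserve(batch, cmd##_length);        \
         _dst != NULL;                                          \
         cmd##_pack(_dst, &name), _dst = NULL)

/* Usage: set only the fields that differ from the header defaults. */
void toy_emit_flush(struct toy_batch *batch)
{
    toy_batch_emit(batch, TOY_FLUSH, flush) {
        flush.FlushEnable = true;
    }
}
```

One consequence of this design is that the caller never sees raw dwords: fields are set by name on a local struct, and the generated pack function is the only place that knows the bit layout.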

For example, the following function packs the fields of the PIPE_CONTROL command structure.

static inline __attribute__((always_inline)) void
GEN9_PIPE_CONTROL_pack(__attribute__((unused)) __gen_user_data *data,
__attribute__((unused)) void * restrict dst,
__attribute__((unused)) const struct GEN9_PIPE_CONTROL * restrict values)
{
uint32_t * restrict dw = (uint32_t * restrict) dst;
dw[0] =
__gen_uint(values->DWordLength, 0, 7) |
__gen_uint(values->_3DCommandSubOpcode, 16, 23) |
__gen_uint(values->_3DCommandOpcode, 24, 26) |
__gen_uint(values->CommandSubType, 27, 28) |
__gen_uint(values->CommandType, 29, 31);
dw[1] =
__gen_uint(values->DepthCacheFlushEnable, 0, 0) |
__gen_uint(values->StallAtPixelScoreboard, 1, 1) |
__gen_uint(values->StateCacheInvalidationEnable, 2, 2) |
__gen_uint(values->ConstantCacheInvalidationEnable, 3, 3) |
__gen_uint(values->VFCacheInvalidationEnable, 4, 4) |
__gen_uint(values->DCFlushEnable, 5, 5) |
__gen_uint(values->PipeControlFlushEnable, 7, 7) |
__gen_uint(values->NotifyEnable, 8, 8) |
__gen_uint(values->IndirectStatePointersDisable, 9, 9) |
__gen_uint(values->TextureCacheInvalidationEnable, 10, 10) |
__gen_uint(values->InstructionCacheInvalidateEnable, 11, 11) |
__gen_uint(values->RenderTargetCacheFlushEnable, 12, 12) |
__gen_uint(values->DepthStallEnable, 13, 13) |
__gen_uint(values->PostSyncOperation, 14, 15) |
__gen_uint(values->GenericMediaStateClear, 16, 16) |
__gen_uint(values->TLBInvalidate, 18, 18) |
__gen_uint(values->GlobalSnapshotCountReset, 19, 19) |
__gen_uint(values->CommandStreamerStallEnable, 20, 20) |
__gen_uint(values->StoreDataIndex, 21, 21) |
__gen_uint(values->LRIPostSyncOperation, 23, 23) |
__gen_uint(values->DestinationAddressType, 24, 24) |
__gen_uint(values->FlushLLC, 26, 26);
const uint64_t v2_address =
__gen_combine_address(data, &dw[2], values->Address, 0);
dw[2] = v2_address;
dw[3] = v2_address >> 32;
const uint64_t v4 =
__gen_uint(values->ImmediateData, 0, 63);
dw[4] = v4;
dw[5] = v4 >> 32;
}

In the Vulkan driver, code like the following sets the fields of a specific command.
The function that is ultimately called is the GEN9_PIPE_CONTROL_pack() shown above.

anv_batch_emit(&cmd_buffer->batch, GENX(PIPE_CONTROL), pc) {
   pc.DCFlushEnable = true;
   pc.RenderTargetCacheFlushEnable = true;
   pc.CommandStreamerStallEnable = true;
}