E.6 UltraSPARC and VIS Instruction Set Extensions
This section describes extensions that require SPARC-V9. The extensions support enhanced graphics functionality and improved memory access efficiency.
Note - SPARC-V9 instruction set extensions used in executables may not be portable to other SPARC-V9 systems.
E.6.1 Graphics Data Formats
The overhead of converting to and from floating-point arithmetic is high, so the graphics instructions are optimized for short-integer arithmetic. Image components are 8 or 16 bits. Intermediate results are 16 or 32 bits.
E.6.2 Eight-bit Format
A 32-bit word contains a pixel of four unsigned 8-bit integers. The integers represent image intensity values (α, G, B, R). Both band-interleaved images (which store the color components of a point together) and band-sequential images (which store all values of one color component together) are supported.
E.6.3 Fixed Data Formats
A 64-bit word contains four 16-bit signed fixed-point values. This is the fixed 16-bit data format.
A 64-bit word contains two 32-bit signed fixed-point values. This is the fixed 32-bit data format.
Fixed data values provide an intermediate format with enough precision and dynamic range for filtering and simple image computations on pixel values. Pixel multiplication converts pixel data to fixed data. Pack instructions convert fixed data back to pixel data (clipping and truncating to an 8-bit unsigned value). The FPACKFIX instruction supports conversion from 32-bit fixed to 16-bit fixed. Rounding is done by adding one to the rounding bit position. Use floating-point data for complex calculations that need more precision or dynamic range.
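As a rough illustration of the formats described above, the following C sketch models a fixed 16-bit word and the clip-and-truncate step in software. It is not the hardware definition: the lane ordering, the function names, and the `frac_bits` parameter are assumptions made for this example.

```c
#include <stdint.h>

/* Pack four signed 16-bit values into one 64-bit word.
   Assumption: v[0] goes into the most significant 16 bits. */
uint64_t pack_fixed16(const int16_t v[4])
{
    uint64_t word = 0;
    for (int i = 0; i < 4; i++)
        word = (word << 16) | (uint16_t)v[i];
    return word;
}

/* Read lane i (0 = most significant) back as a signed value. */
int16_t fixed16_lane(uint64_t word, int i)
{
    uint16_t raw = (uint16_t)(word >> (16 * (3 - i)));
    /* Sign-extend by hand so the result does not depend on the compiler. */
    return (raw & 0x8000u) ? (int16_t)((int32_t)raw - 0x10000) : (int16_t)raw;
}

/* Clip and truncate one fixed value to an unsigned 8-bit pixel.
   frac_bits is a hypothetical parameter: the number of fraction bits. */
uint8_t fixed_to_pixel(int16_t fixed, int frac_bits)
{
    if (fixed < 0)
        return 0;                                 /* clip below zero */
    int32_t whole = fixed >> frac_bits;           /* truncate the fraction */
    return (whole > 255) ? 255 : (uint8_t)whole;  /* clip above 255 */
}
```

Negative inputs clip to 0 and large inputs clip to 255, which is the behavior the pack description implies for out-of-range values.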
E.6.4 SHUTDOWN Instruction
All outstanding transactions are completed before the SHUTDOWN instruction completes.
Table E-13
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
SHUTDOWN | shutdown | | shutdown to enter power down mode |
E.6.5 Graphics Status Register (GSR)
Use the RDASR and WRASR instructions with ASR 0x13 to access the Graphics Status Register.
Table E-14
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
RDASR | rdasr | %gsr, regrd | read GSR |
WRASR | wrasr | regrs1, reg_or_imm, %gsr | write GSR |
E.6.6 Graphics Instructions
Unless otherwise specified, floating-point registers contain all instruction operands. There are 32 double-precision registers. Single-precision floating-point registers contain the pixel values, and double-precision floating-point registers contain the fixed values.
The graphics instruction set is mapped to the opcode space reserved for the Implementation-Dependent Instruction 1 (IMPDEP1) instructions.
Partitioned add/subtract instructions perform two 32-bit or four 16-bit partitioned adds or subtracts between the corresponding fixed-point values of the source operands.
Table E-15
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
FPADD16 | fpadd16 | fregrs1, fregrs2, fregrd | four 16-bit add |
FPADD16S | fpadd16s | fregrs1, fregrs2, fregrd | two 16-bit add |
FPADD32 | fpadd32 | fregrs1, fregrs2, fregrd | two 32-bit add |
FPADD32S | fpadd32s | fregrs1, fregrs2, fregrd | one 32-bit add |
FPSUB16 | fpsub16 | fregrs1, fregrs2, fregrd | four 16-bit subtract |
FPSUB16S | fpsub16s | fregrs1, fregrs2, fregrd | two 16-bit subtract |
FPSUB32 | fpsub32 | fregrs1, fregrs2, fregrd | two 32-bit subtract |
FPSUB32S | fpsub32s | fregrs1, fregrs2, fregrd | one 32-bit subtract |
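The idea of a partitioned operation can be sketched in portable C. The functions below are a software model only, not the instructions themselves. They assume that each lane wraps on overflow and that no carry or borrow crosses a lane boundary; the names `fpadd16_model` and `fpsub32_model` are invented for this example.

```c
#include <stdint.h>

/* Four independent 16-bit adds inside one 64-bit word.
   Assumption: each lane wraps modulo 2^16 and carries never
   propagate into the neighboring lane. */
uint64_t fpadd16_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int shift = 16 * lane;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t sum = (uint16_t)(x + y);   /* wraps within the lane */
        result |= (uint64_t)sum << shift;
    }
    return result;
}

/* Two independent 32-bit subtracts inside one 64-bit word. */
uint64_t fpsub32_model(uint64_t a, uint64_t b)
{
    uint32_t hi = (uint32_t)(a >> 32) - (uint32_t)(b >> 32);
    uint32_t lo = (uint32_t)a - (uint32_t)b;   /* borrow stays in the lane */
    return ((uint64_t)hi << 32) | lo;
}
```

A plain 64-bit add would let a carry out of one lane corrupt the next one; keeping the lanes separate is what makes the operation partitioned.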
Pack instructions convert to a lower-precision fixed format or to a pixel format.
Table E-16
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
FPACK16 | fpack16 | fregrs2, fregrd | four 16-bit packs |
FPACK32 | fpack32 | fregrs1, fregrs2, fregrd | two 32-bit packs |
FPACKFIX | fpackfix | fregrs2, fregrd | two 32-bit to 16-bit packs |
FEXPAND | fexpand | fregrs2, fregrd | four 16-bit expands |
FPMERGE | fpmerge | fregrs1, fregrs2, fregrd | two 32-bit merges |
Partitioned multiply instructions have the following variations.
Table E-17
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
FMUL8x16 | fmul8x16 | fregrs1, fregrs2, fregrd | 8x16-bit partition |
FMUL8x16AU | fmul8x16au | fregrs1, fregrs2, fregrd | 8x16-bit upper partition |
FMUL8x16AL | fmul8x16al | fregrs1, fregrs2, fregrd | 8x16-bit lower partition |
FMUL8SUx16 | fmul8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition |
FMUL8ULx16 | fmul8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition |
FMULD8SUx16 | fmuld8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition |
FMULD8ULx16 | fmuld8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition |
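To show what a single 8x16 partition might compute, here is a hedged C model of one lane. It assumes that the unsigned 8-bit pixel is multiplied by a signed 16-bit coefficient, that the product is rounded, and that the upper 16 bits of the 24-bit result are kept. This section does not state that rule, so treat it as an assumption; the function name `mul8x16_lane` is also hypothetical.

```c
#include <stdint.h>

/* One lane of an 8x16 partitioned multiply (software model).
   Assumed rule: result = floor((pixel * coeff + 128) / 256). */
int16_t mul8x16_lane(uint8_t pixel, int16_t coeff)
{
    int32_t product = (int32_t)pixel * (int32_t)coeff + 128; /* add rounding bit */
    /* Floor division by 256 without relying on signed right shift. */
    int32_t scaled = (product >= 0) ? product / 256
                                    : -((-product + 255) / 256);
    return (int16_t)scaled;
}
```

Under this model a coefficient of 256 acts as 1.0, so multiplying a pixel by it returns the pixel value as a 16-bit fixed value.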
Alignment instructions have the following variations.
Table E-18
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
ALIGNADDRESS | alignaddr | regrs1, regrs2, regrd | find misaligned data access address |
ALIGNADDRESS_LITTLE | alignaddrl | regrs1, regrs2, regrd | same as above, but little-endian |
FALIGNDATA | faligndata | fregrs1, fregrs2, fregrd | do misaligned data, data alignment |
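The way an address-alignment step and a data-alignment step work together can be sketched in C. This is a software model under stated assumptions, not the instruction definition: it assumes the aligned address is the sum of the two source registers rounded down to an 8-byte boundary, that the low three bits of the sum are kept as a byte offset, and that data alignment extracts 8 bytes from the 16-byte concatenation of two doublewords viewed big-endian. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t aligned;   /* address rounded down to an 8-byte boundary */
    unsigned offset;    /* byte offset within the doubleword, 0..7 */
} align_result;

/* Model of the address step: split base + index into an aligned
   address and a byte offset. */
align_result alignaddr_model(uint64_t base, uint64_t index)
{
    uint64_t sum = base + index;
    align_result r = { sum & ~(uint64_t)7, (unsigned)(sum & 7) };
    return r;
}

/* Model of the data step: return the 8 bytes that start `offset`
   bytes into the 16-byte value hi:lo (big-endian view). */
uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned offset)
{
    offset &= 7;
    if (offset == 0)
        return hi;      /* also avoids an undefined shift by 64 */
    return (hi << (8 * offset)) | (lo >> (64 - 8 * offset));
}
```

A loop over misaligned data would typically compute the offset once, load aligned doublewords, and then combine each neighboring pair with the data step.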
Logical operate instructions perform one of sixteen 64-bit logical operations between rs1 and rs2 (in the standard 64-bit version).
Table E-19
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
FZERO | fzero | fregrd | zero fill |
FZEROS | fzeros | fregrd | zero fill, single precision |
FONE | fone | fregrd | one fill |
FONES | fones | fregrd | one fill, single precision |
FSRC1 | fsrc1 | fregrs1, fregrd | copy src1 |
FSRC1S | fsrc1s | fregrs1, fregrd | copy src1, single precision |
FSRC2 | fsrc2 | fregrs2, fregrd | copy src2 |
FSRC2S | fsrc2s | fregrs2, fregrd | copy src2, single precision |
FNOT1 | fnot1 | fregrs1, fregrd | negate src1, 1's complement |
FNOT1S | fnot1s | fregrs1, fregrd | same as above, single precision |
FNOT2 | fnot2 | fregrs2, fregrd | negate src2, 1's complement |
FNOT2S | fnot2s | fregrs2, fregrd | same as above, single precision |
FOR | for | fregrs1, fregrs2, fregrd | logical OR |
FORS | fors | fregrs1, fregrs2, fregrd | logical OR, single precision |
FNOR | fnor | fregrs1, fregrs2, fregrd | logical NOR |
FNORS | fnors | fregrs1, fregrs2, fregrd | logical NOR, single precision |
FAND | fand | fregrs1, fregrs2, fregrd | logical AND |
FANDS | fands | fregrs1, fregrs2, fregrd | logical AND, single precision |
FNAND | fnand | fregrs1, fregrs2, fregrd | logical NAND |
FNANDS | fnands | fregrs1, fregrs2, fregrd | logical NAND, single precision |
FXOR | fxor | fregrs1, fregrs2, fregrd | logical XOR |
FXORS | fxors | fregrs1, fregrs2, fregrd | logical XOR, single precision |
FXNOR | fxnor | fregrs1, fregrs2, fregrd | logical XNOR |
FXNORS | fxnors | fregrs1, fregrs2, fregrd | logical XNOR, single precision |
FORNOT1 | fornot1 | fregrs1, fregrs2, fregrd | negated src1 OR src2 |
FORNOT1S | fornot1s | fregrs1, fregrs2, fregrd | same as above, single precision |
FORNOT2 | fornot2 | fregrs1, fregrs2, fregrd | src1 OR negated src2 |
FORNOT2S | fornot2s | fregrs1, fregrs2, fregrd | same as above, single precision |
FANDNOT1 | fandnot1 | fregrs1, fregrs2, fregrd | negated src1 AND src2 |
FANDNOT1S | fandnot1s | fregrs1, fregrs2, fregrd | same as above, single precision |
FANDNOT2 | fandnot2 | fregrs1, fregrs2, fregrd | src1 AND negated src2 |
FANDNOT2S | fandnot2s | fregrs1, fregrs2, fregrd | same as above, single precision |
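The less familiar entries in the table differ only in which operand is complemented. The C expressions below restate a few of the table descriptions on 64-bit and 32-bit integers; the function names are invented for this sketch and the code models the bitwise result only.

```c
#include <stdint.h>

/* 64-bit versions: names mirror the mnemonics, with _model appended. */
uint64_t fandnot1_model(uint64_t src1, uint64_t src2) { return ~src1 & src2; } /* negated src1 AND src2 */
uint64_t fandnot2_model(uint64_t src1, uint64_t src2) { return src1 & ~src2; } /* src1 AND negated src2 */
uint64_t fornot1_model(uint64_t src1, uint64_t src2)  { return ~src1 | src2; } /* negated src1 OR src2 */
uint64_t fornot2_model(uint64_t src1, uint64_t src2)  { return src1 | ~src2; } /* src1 OR negated src2 */
uint64_t fnor_model(uint64_t src1, uint64_t src2)     { return ~(src1 | src2); }
uint64_t fnand_model(uint64_t src1, uint64_t src2)    { return ~(src1 & src2); }
uint64_t fxnor_model(uint64_t src1, uint64_t src2)    { return ~(src1 ^ src2); }

/* Single-precision versions operate on 32 bits instead of 64. */
uint32_t fandnot1s_model(uint32_t src1, uint32_t src2) { return ~src1 & src2; }
uint32_t fxnors_model(uint32_t src1, uint32_t src2)    { return ~(src1 ^ src2); }
```

For example, `fandnot1_model(mask, value)` clears in `value` every bit that is set in `mask`.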
Pixel compare instructions compare the fixed-point values in rs1 and rs2 (two 32-bit or four 16-bit values).
Table E-20
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
FCMPGT16 | fcmpgt16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1>src2 |
FCMPGT32 | fcmpgt32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1>src2 |
FCMPLE16 | fcmple16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1<=src2 |
FCMPLE32 | fcmple32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1<=src2 |
FCMPNE16 | fcmpne16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1!=src2 |
FCMPNE32 | fcmpne32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1!=src2 |
FCMPEQ16 | fcmpeq16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1=src2 |
FCMPEQ32 | fcmpeq32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1=src2 |
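A partitioned compare produces one result bit per lane in an integer register. The following C sketch models the 16-bit variants. It assumes signed comparison and assumes that bit 0 of the result corresponds to the least significant lane; neither detail is stated in this section, and the function names are hypothetical.

```c
#include <stdint.h>

/* Extract a 16-bit lane (0 = least significant) as a signed value. */
static int32_t lane16(uint64_t word, int lane)
{
    uint16_t raw = (uint16_t)(word >> (16 * lane));
    return (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* Bit i of the result is set when lane i of src1 > lane i of src2. */
unsigned fcmpgt16_model(uint64_t src1, uint64_t src2)
{
    unsigned mask = 0;
    for (int lane = 0; lane < 4; lane++)
        if (lane16(src1, lane) > lane16(src2, lane))
            mask |= 1u << lane;
    return mask;
}

/* Bit i of the result is set when lane i of src1 != lane i of src2. */
unsigned fcmpne16_model(uint64_t src1, uint64_t src2)
{
    unsigned mask = 0;
    for (int lane = 0; lane < 4; lane++)
        if (lane16(src1, lane) != lane16(src2, lane))
            mask |= 1u << lane;
    return mask;
}

/* "Less than or equal" is the complement of "greater than" per lane. */
unsigned fcmple16_model(uint64_t src1, uint64_t src2)
{
    return ~fcmpgt16_model(src1, src2) & 0xFu;
}
```

A mask of this shape is the kind of value a partial store takes as its register mask, so a compare followed by a conditional store can update only the lanes that passed the test.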
Edge handling instructions handle the boundary conditions for parallel pixel scan line loops.
Table E-21
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
EDGE8 | edge8 | regrs1, regrs2, regrd | 8 8-bit edge boundary processing |
EDGE8L | edge8l | regrs1, regrs2, regrd | same as above, little-endian |
EDGE16 | edge16 | regrs1, regrs2, regrd | 4 16-bit edge boundary processing |
EDGE16L | edge16l | regrs1, regrs2, regrd | same as above, little-endian |
EDGE32 | edge32 | regrs1, regrs2, regrd | 2 32-bit edge boundary processing |
EDGE32L | edge32l | regrs1, regrs2, regrd | same as above, little-endian |
Pixel component distance instructions are used for motion estimation in video compression algorithms.
Table E-22
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
PDIST | pdist | fregrs1, fregrs2, fregrd | distance between 8 8-bit components |
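The distance measure used in motion estimation is commonly a sum of absolute differences. The C model below assumes that is what the instruction computes over the eight 8-bit components, and that the sum is added to a running total held in the destination. Both points are assumptions for this sketch, as is the name `pdist_model`.

```c
#include <stdint.h>

/* Sum of absolute differences over eight unsigned 8-bit components,
   added to an accumulator (assumed to model the destination register). */
uint64_t pdist_model(uint64_t src1, uint64_t src2, uint64_t acc)
{
    for (int i = 0; i < 8; i++) {
        int a = (int)((src1 >> (8 * i)) & 0xFF);
        int b = (int)((src2 >> (8 * i)) & 0xFF);
        acc += (uint64_t)(a > b ? a - b : b - a);
    }
    return acc;
}
```

Calling the model once per doubleword of a candidate block and passing the previous return value as `acc` yields the total distance for that block.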
The three-dimensional array addressing instructions convert three-dimensional fixed-point addresses (in rs1) to a blocked-byte address. The result is stored in rd.
Table E-23
SPARC | Mnemonic | Argument List | Description |
---|---|---|---|
ARRAY8 | array8 | regrs1, regrs2, regrd | convert 8-bit 3-D address to blocked byte address |
ARRAY16 | array16 | regrs1, regrs2, regrd | same as above, but 16-bit |
ARRAY32 | array32 | regrs1, regrs2, regrd | same as above, but 32-bit |
E.6.7 Memory Access Instructions
These memory access instructions are part of the SPARC-V9 instruction set extensions.
Table E-24
SPARC | imm_asi | Argument List | Description |
---|---|---|---|
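STDFA | ASI_PST8_P | stda fregrd, [fregrs1] regmask, imm_asi | eight 8-bit conditional stores to primary address space |
STDFA | ASI_PST8_S | stda fregrd, [fregrs1] regmask, imm_asi | eight 8-bit conditional stores to secondary address space |
STDFA | ASI_PST8_PL | stda fregrd, [fregrs1] regmask, imm_asi | eight 8-bit conditional stores to primary address space, little endian |
STDFA | ASI_PST8_SL | stda fregrd, [fregrs1] regmask, imm_asi | eight 8-bit conditional stores to secondary address space, little endian |
STDFA | ASI_PST16_P | stda fregrd, [fregrs1] regmask, imm_asi | four 16-bit conditional stores to primary address space |
STDFA | ASI_PST16_S | stda fregrd, [fregrs1] regmask, imm_asi | four 16-bit conditional stores to secondary address space |
STDFA | ASI_PST16_PL | stda fregrd, [fregrs1] regmask, imm_asi | four 16-bit conditional stores to primary address space, little endian |
STDFA | ASI_PST16_SL | stda fregrd, [fregrs1] regmask, imm_asi | four 16-bit conditional stores to secondary address space, little endian |
STDFA | ASI_PST32_P | stda fregrd, [fregrs1] regmask, imm_asi | two 32-bit conditional stores to primary address space |
STDFA | ASI_PST32_S | stda fregrd, [fregrs1] regmask, imm_asi | two 32-bit conditional stores to secondary address space |
STDFA | ASI_PST32_PL | stda fregrd, [fregrs1] regmask, imm_asi | two 32-bit conditional stores to primary address space, little endian |
STDFA | ASI_PST32_SL | stda fregrd, [fregrs1] regmask, imm_asi | two 32-bit conditional stores to secondary address space, little endian |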
Note - To select a partial store instruction, use one of the partial store ASIs with the STDA instruction.
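A conditional (partial) store writes only the lanes selected by the register mask. The C sketch below models the eight 8-bit case on a byte array. It assumes that the most significant mask bit of the low eight bits selects the most significant source byte, which lands at the lowest address in a big-endian view; this ordering and the name `partial_store8_model` are assumptions for the example.

```c
#include <stdint.h>

/* Store only the bytes of src whose mask bit is set; other bytes of
   dst are left unchanged. dst[0] is the lowest address.
   Assumption: mask bit 7 pairs with dst[0], mask bit 0 with dst[7]. */
void partial_store8_model(uint8_t dst[8], uint64_t src, unsigned mask)
{
    for (int i = 0; i < 8; i++) {
        int bit = 7 - i;                       /* mask bit for this byte */
        if (mask & (1u << bit))
            dst[i] = (uint8_t)(src >> (8 * bit));
    }
}
```

With a mask of 0x81, only the first and last bytes of the destination doubleword are overwritten.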
Table E-25
SPARC | imm_asi | Argument List | Description |
---|---|---|---|
LDDFA STDFA | ASI_FL8_P | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 8-bit load/store from/to primary address space |
LDDFA STDFA | ASI_FL8_S | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | 8-bit load/store from/to secondary address space |
LDDFA STDFA | ASI_FL8_PL | | 8-bit load/store from/to primary address space, little endian |
LDDFA STDFA | ASI_FL8_SL | | 8-bit load/store from/to secondary address space, little endian |
LDDFA STDFA | ASI_FL16_P | | 16-bit load/store from/to primary address space |
LDDFA STDFA | ASI_FL16_S | | 16-bit load/store from/to secondary address space |
LDDFA STDFA | ASI_FL16_PL | | 16-bit load/store from/to primary address space, little endian |
LDDFA STDFA | ASI_FL16_SL | | 16-bit load/store from/to secondary address space, little endian |
Note - To select a short floating-point load and store instruction, use one of the short ASIs with the LDDA and STDA instructions.
Table E-26
SPARC | imm_asi | Argument List | Description |
---|---|---|---|
LDDA | ASI_NUCLEUS_QUAD_LDD | ldda [reg_addr] imm_asi, regrd | 128-bit atomic load |
LDDA | ASI_NUCLEUS_QUAD_LDD_L | ldda [reg_plus_imm] %asi, regrd | 128-bit atomic load, little endian |
LDDFA STDFA | ASI_BLK_AIUP | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 64-byte block load/store from/to primary address space, user privilege |
LDDFA STDFA | ASI_BLK_AIUS | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | 64-byte block load/store from/to secondary address space, user privilege |
LDDFA STDFA | ASI_BLK_AIUPL | | 64-byte block load/store from/to primary address space, user privilege, little endian |
LDDFA STDFA | ASI_BLK_AIUSL | | 64-byte block load/store from/to secondary address space, user privilege, little endian |
LDDFA STDFA | ASI_BLK_P | | 64-byte block load/store from/to primary address space |
LDDFA STDFA | ASI_BLK_S | | 64-byte block load/store from/to secondary address space |
LDDFA STDFA | ASI_BLK_PL | | 64-byte block load/store from/to primary address space, little endian |
LDDFA STDFA | ASI_BLK_SL | | 64-byte block load/store from/to secondary address space, little endian |
STDFA | ASI_BLK_COMMIT_P | | 64-byte block commit store to primary address space |
STDFA | ASI_BLK_COMMIT_S | | 64-byte block commit store to secondary address space |
Note - To select a block load and store instruction, use one of the block transfer ASIs with the LDDA and STDA instructions.