From 95f5c5eb5f02f8a43cbda77572de83cebf87b2fd Mon Sep 17 00:00:00 2001 From: RichardG867 Date: Fri, 31 Jan 2025 20:27:27 -0300 Subject: [PATCH] Partial architectural overview review --- _posts/2025-01-22-riva128-part-1.md | 197 ++++++++++++++-------------- 1 file changed, 98 insertions(+), 99 deletions(-) diff --git a/_posts/2025-01-22-riva128-part-1.md b/_posts/2025-01-22-riva128-part-1.md index d9f9a33..0b4e2ae 100644 --- a/_posts/2025-01-22-riva128-part-1.md +++ b/_posts/2025-01-22-riva128-part-1.md @@ -12,7 +12,7 @@ image: "/assets/images/riva128/p1/hero.png" --- -## Note: Documents Wanted +## Note: Documents wanted If you are in possession of any of: * NVIDIA RIVA 128 Programmers' Reference Manual @@ -42,19 +42,19 @@ This is the first part in a series of blog posts that aims to demystify, once an NVIDIA was conceived in 1992 by three LSI Logic and Sun Microsystems engineers: Jensen Huang (now one of the world's richest men, still the CEO and, apparently, mobbed by fans in his country of birth Taiwan), Curtis Priem (whose boss almost convinced him to work on Java instead of founding the company) and Chris Malachowsky (a veteran of graphics chip development). They saw a business opportunity in the PC graphics and audio market, which was dominated by low-end, high-volume players such as S3 Graphics, Tseng Labs, Cirrus Logic and Matrox[^matrox]. The company was formally founded on April 5, 1993, after all three left their jobs at LSI Logic and Sun between December 1992 and March 1993. -[^matrox]: Only Matrox is both still around and still in the graphics space, after exiting the consumer market in 2003 and ceasing to design graphics cards entirely in 2014 before recently returning with Intel Arc-based designs. Cirrus Logic is still around as an audio chip designer, stemming from their acquisition of Crystal in the 1990s. +[^matrox]: Only Matrox is both still around and still in the graphics space, after exiting the consumer market in 2003, ceasing to design graphics cards entirely in 2014, and recently coming back with Intel Arc-based designs. Cirrus Logic is still around as an audio chip designer, stemming from their acquisition of Crystal in the 1990s. -After the requisite $3 million of venture capital funding was acquired (a little nepotism owing to their reputation helped), work immediately began on a first generation graphics chip; this was one of the first of a rush of dozens of companies attempting to develop graphics cards - both established players in the 2D graphics market such as Number Nine and S3, and new companies, almost all of which no longer exist - many of which failed to even release a single graphics card. The name was initially GXNV for "GX next version", after a graphics card Malachowsky led the development of at Sun, but Huang requested him to rename the card to NV1 in order to not get sued. This also inspired the name of the company - NVIDIA, after other names such as "Primal Graphics" and "Huaprimal" were considered and rejected, and their originally chosen name of "Invision" turned out to have been trademarked by a toilet paper company. 
+After the requisite $3 million of venture capital funding was acquired (a little nepotism owing to their reputation helped), work immediately began on a first generation graphics chip; this was one of the first of a rush of dozens of companies attempting to develop graphics cards - both established players in the 2D graphics market such as Number Nine and S3, and new companies, almost all of which no longer exist - many of which failed to even release a single graphics card. The name was initially GXNV for "GX next version", after a graphics chip Malachowsky led the development of at Sun, but Huang asked him to rename the chip to NV1 in order to not get sued. This also inspired the name of the company - NVIDIA, after other names such as "Primal Graphics" and "Huaprimal" were considered and rejected, and their originally chosen name of "Invision" turned out to have been trademarked by a toilet paper company. In a perhaps ironic twist of fate, toilet paper turned out to be an apt metaphor for the sales, if not quality, of their first product, which Jensen Huang appears to be embarrassed to discuss when asked, having been quoted as saying "You don't build NV1 because you're great". The product was released in 1995 after a two-year development cycle, which included the creation in 1994 of what NVIDIA dubbed a hardware simulator, called the NV0; this actually appears to have been simply a set of Windows 3.x drivers intended to emulate their architecture.

### The NV1

-The **NV1** was a combination graphics, audio, DRM (yes, really) and game port card implementing what NVIDIA dubbed the "NV Unified Media Architecture" (UMA); the chip was manufactured by SGS-Thomson Microelectronics (now STMicroelectronics) on the 350 nanometer node, who also white-labelled NVIDIA's design (which allegedly[^contract] featured a DAC block designed by SGS-Thomson) as the STG-2000, a version without audio functionality that is also called the "NV1-V32" (for 32-bit VRAM) in internal documentation, as opposed to NVIDIA's design being the NV1-D64. The card was designed to implement a reasonable level of 3D graphics functionality, as well as audio, public-key encryption for DRM purposes (ultimately never used as it would have required the cooperation of software companies) and Sega Saturn game ports, all within a single megabyte of RAM, as memory costs were around $50 a megabyte when initial design began in 1993.

+The **NV1** was a combination graphics, audio, DRM (yes, really) and game port card implementing what NVIDIA dubbed the "NV Unified Media Architecture" (UMA); the chip was manufactured by SGS-Thomson Microelectronics (now STMicroelectronics) on the 350 nanometer node, who also white-labelled NVIDIA's design (which allegedly[^contract] featured a DAC block designed by SGS-Thomson) as the STG-2000, a variant without audio functionality, also called the "NV1-V32" (for 32-bit VRAM) in internal documentation as opposed to NVIDIA's NV1-D64. The chip was designed to implement a reasonable level of 3D graphics functionality, as well as audio, public-key encryption for DRM purposes (ultimately never used as it would have required the cooperation of software companies) and Sega Saturn game ports, all within a single megabyte of RAM, as memory costs were around $50 a megabyte when initial design began in 1993.

-[^contract]: Source: [Strategic Collaboration Agreement between NVIDIA and SGS-Thomson](http://web.archive.org/web/20240722140726/https://contracts.onecle.com/nvidia/sgs.collab.1993.11.10.shtml), originally covering NV1 but revised to include NV3, apparently part of a filing with the US Securities and Exchange Commission.
+[^contract]: Source: [Strategic Collaboration Agreement between NVIDIA and SGS-Thomson](http://web.archive.org/web/20240722140726/https://contracts.onecle.com/nvidia/sgs.collab.1993.11.10.shtml), originally covering NV1 but later revised to include NV3, apparently part of a filing with the US Securities and Exchange Commission.

-In order to achieve this, many techniques had to be used that ultimately compromised the quality of the 3D rendering of the card, such as using forward texture mapping, where a texel (output pixel) of a texture is directly mapped to a point on the screen, instead of the more traditional inverse texture mapping, which iterates through pixels and maps texels from those. While this has memory space advantages (as you can cache the texture in the very limited amount of VRAM NVIDIA had to work with very easily), it has many more disadvantages; firstly, this approach does not support UV mapping (a special coordinate system used to map textures to three-dimensional objects) and other aspects of what would be considered to be today basic graphical functionality.
+In order to achieve this, many techniques had to be used that ultimately compromised the chip's 3D rendering quality, such as forward texture mapping, where a texel (texture pixel) is directly mapped to a point on the screen, instead of the more traditional inverse texture mapping, which iterates through screen pixels and maps texels onto those. While this has memory space advantages (as you can very easily cache the texture in the very limited amount of VRAM NVIDIA had to work with), it has many more disadvantages; firstly, this approach does not support UV mapping (a special coordinate system used to map textures to three-dimensional objects) and other aspects of what would today be considered basic graphical functionality.

Additionally, the fundamental implementation of 3D rendering used quad patching instead of traditional triangle-based approaches; this has very advantageous implications for things like curved surfaces, and may have been a very effective design for the CAD/CAM customers purchasing more high end 3D products, however, it turned out to not be particularly useful at all for the actually intended target market of gaming. There was also a total lack of Sound Blaster compatibility (very much a requirement for half-decent audio in games back then) in the audio engine, and VGA compatibility was very slow and partially emulated, which led to slow performance in the games people *actually played*, unless your favourite game was a crappier, slower version of *Descent*, *Virtua Cop* or *Daytona USA* for some reason. Another body blow to NVIDIA was received when Microsoft released Direct3D in 1996 with DirectX 2.0, which not only used triangles, but also became the standard 3D API and deprecated all of the numerous non-OpenGL proprietary APIs of the time, including S3's S3D and MeTaL[^metal], ATI's 3DCIF, and NVIDIA's own NVLIB.

@@ -62,7 +62,7 @@ Additionally, the fundamental implementation of 3D rendering used quad patching

The upshot of all of this was what can be understood as nothing less than the total failure of NVIDIA to sell or convince anyone to develop for NV1 in any way, despite its innovative silicon design. While Diamond Multimedia purchased 250,000 chips to place into their "Edge 3D" series of boards, barely any of them sold, and those that did sell were often returned, leading to the chips themselves being returned to NVIDIA and hundreds of thousands of chips sitting simply unused in warehouses. Barely any NV1-capable software was released, with the few pieces of software that do exist coming via a partnership with Sega (more on that later), while most others were forced to run under software emulators for Direct3D (or other APIs) written by Priem, which were made possible by the software architecture NVIDIA chose for their drivers, but were slower and worse-looking than software rendering, buggy, and extremely unappealing.

-NVIDIA lost $6.4 million in 1995 on a revenue of $1.1 million, and $3 million on a revenue of $3.9 million in 1996. Most of the capital that allowed NVIDIA to continue operating were from the milestone payments from SGS-Thomson for developing the card, their NV2 contract with Sega (again, more on that later), and their venture capital funding; not the very few NV1 sales. The card reviewed poorly, had very little software and ultimately no sales, and despite various desperate efforts to revive it, including releasing the SDK for free (including the proprietary NVLIB API used to develop games against the NV1) and straight up begging their customers on their website to spam developers with requests to develop NV1-compatible versions of games, the card was effectively dead within a year.
+NVIDIA lost $6.4 million in 1995 on a revenue of $1.1 million, and $3 million on a revenue of $3.9 million in 1996. Most of the capital that allowed NVIDIA to continue operating came from the milestone payments from SGS-Thomson for developing the chip, their NV2 contract with Sega (again, more on that later), and their venture capital funding, but not from the very few NV1 sales. The NV1 was poorly reviewed, had very little software and ultimately no sales; despite various desperate efforts to revive it, including releasing the SDK for free (including the proprietary NVLIB API used to develop games for the chip) and straight up begging their customers on their website to spam developers with requests to add NV1 support to games, the chip was effectively dead within a year.

### The NV2

@@ -84,7 +84,7 @@ The reasons for 3dfx being able to design such an effective GPU when all others

[^edge]: Where a triangle is converted into "spans" of horizontal lines, and the positions of nearby vertexes are used to determine the span's start and end positions.

-[^span]: To simplify a complex topic, in a GPU of this era, span interpolation generally involves z-buffering (also known as depth buffering), sorting polygons back to front, and color buffering, storing the color of each pixel sent to the screen in a buffer which allows for blending and alpha transparency.
+[^span]: To simplify a complex topic, in a GPU of this era, span interpolation generally involves Z-buffering (also known as depth buffering), which keeps a per-pixel depth value so that nearer polygons correctly cover farther ones without having to sort them back to front, and color buffering, storing the color of each pixel sent to the screen in a buffer which allows for blending and alpha transparency.
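
+To make the footnoted terms concrete, here is a toy sketch in C of a single span being drawn with an interpolated Z-buffer test; this is purely illustrative of the general technique, and none of it reflects actual NV3 (or Voodoo) internals:

+```
+#include <stdint.h>
+
+/* Draw one horizontal span from x0 to x1 on scanline y, linearly
+   interpolating depth across the span. The color buffer is only written
+   where the incoming depth is nearer than what the Z-buffer holds. */
+static void draw_span(uint16_t *zbuf, uint16_t *cbuf, int pitch, int y,
+                      int x0, int x1, float z0, float z1, uint16_t color)
+{
+    float dz = (x1 > x0) ? (z1 - z0) / (float)(x1 - x0) : 0.0f;
+    float z = z0;
+    for (int x = x0; x < x1; x++, z += dz) {
+        int i = y * pitch + x;
+        if ((uint16_t)z < zbuf[i]) { /* nearer than the stored depth */
+            zbuf[i] = (uint16_t)z;
+            cbuf[i] = color;         /* 16-bit color, typical for this era */
+        }
+    }
+}
+```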
Effectively, NVIDIA had to design a graphics architecture that could at the very least get close to 3dfx's performance, on a shoestring budget and with very little resources, as 60% of their staff (including the entire sales and marketing teams) had been laid off to preserve money. They could not do a complete redesign of the NV1 from scratch if they felt the need to, as it would take two years (time they simply didn't have) and any design that came out of this effort would be immediately obsoleted by competitors, such as 3dfx's Voodoo series, and ATI's Rage which was initially rather pointless but rapidly advancing in performance and driver stability. The chip would also have to work reasonably well on the first tapeout, as there was no capital to produce more revisions of the chip. The fact NVIDIA were able to achieve a successful design in the form of the NV3 under such conditions was a testament to the intelligence, skill and luck of their designers; we will explore how they managed to achieve this later on this write-up. @@ -92,9 +92,9 @@ Effectively, NVIDIA had to design a graphics architecture that could at the very It was with these financial, competitive and time constraints in mind that design on the NV3 began in 1996. This chip would eventually be commercialised as the RIVA 128, standing for "Real-time Interactive Video and Animation accelerator" followed by a nod to its 128-bit internal bus width which was very large at the time. NVIDIA retained SGS-Thomson (soon to be STMicroelectronics) as their manufacturing partner, in exchange for SGS-Thomson cancelling their competing STG-3001 GPU. In a similar vein to the NV1, NVIDIA was to sell the chip as "NV3" and SGS-Thomson was to white-label it as STG-3000, once again separated by audio functionality; however, NVIDIA convinced SGS-Thomson to cancel their own part and stick to manufacturing the NV3 instead, which would prove to be a terrible decision when NVIDIA dropped them in favor of TSMC for manufacturing of the RIVA 128 ZX due to both yield issues and pressure from venture capital funders. STMicro went on to manufacture PowerVR chips for a few more years, before dropping out of the market entirely by 2001. -After the NV2 disaster, the company made several calls on the NV3's design that turned out to be very good decisions. First, they acquiesced to Sega's advice (which they might have already done to save the Mutara V08/NV2 but it was too late) and moved to an inverse texture mapping triangle-based model, although some remnants of the original quad patching design remain. The unused DRM functionality was also remove, which may have been assisted by the replacement of Curtis Priem with David Kirk[^dkirk] as chief designer, as Priem insisted on including the DRM functionality with the NV1, citing piracy issues with the game he had written as a demo of the Malachowsky-designed GX GPU back when he worked at Sun. +After the NV2 disaster, the company made several calls on the NV3's design that turned out to be very good decisions. First, they acquiesced to Sega's advice (which they might have already done to save the Mutara V08/NV2 but it was too late) and moved to an inverse texture mapping triangle-based model, although some remnants of the original quad patching design remain. 
The unused DRM functionality was also removed, which may have been assisted by David Kirk[^dkirk] taking over from Curtis Priem as chief designer, as Priem insisted on including the DRM functionality with the NV1, citing piracy issues with the game he had written as a demo of the Malachowsky-designed GX GPU back when he worked at Sun.

-[^dkirk]: The rather egg-shaped David Kirk is perhaps notable as a "Special Thanks" credit on *Gex* and the producer of the truly unparalleled *3D Baseball* on the Sega Saturn during his time at Crystal Dynamics.
+[^dkirk]: David Kirk is perhaps notable as a "Special Thanks" credit on *Gex* and the producer of the truly unparalleled *3D Baseball* on the Sega Saturn during his time at Crystal Dynamics.

Another decision that turned out to pay very large dividends was deciding to forgo a native API entirely and build the card around accelerating the most popular graphical APIs, which led to an initial focus on Direct3D, although OpenGL drivers were first publicly released in alpha form in December 1997 and fully in early 1998. DirectX 3.0 was the initial target, and after 4.0 was [cancelled due to lack of developer interest in its new functionality](https://devblogs.microsoft.com/oldnewthing/20040122-00/?p=40963), 5.0 came out late during development of the chip, which turned out to be mostly compliant, with the exception of some blending modes such as additive blending which Jensen Huang later claimed was due to Microsoft not giving them the specification in time. This compliance was made much easier by the design of their driver, which allowed (and still allows) graphical APIs to be plugged in as "clients" to the Resource Manager kernel; as I mentioned earlier, this will be explained in full detail later.

@@ -104,13 +104,15 @@ The initial revision of the architecture appears to have been completed in Janua

### RIVA 128

-Luckily for NVIDIA, the NV3 chip worked well enough to be sold to their board partners (almost certainly thanks to that hardware simulation package), and the company survived. Most accounts indicate they were only three or four weeks away from bankruptcy; when 3dfx saw the RIVA 128 at its reveal at the CGDC 1997 conference, one founder responded with "you guys are still around?", considering 3dfx almost *bought* NVIDIA effectively for the purpose of killing the company as a theoretical competitor, but NVIDIA refused as they assumed they would be bankrupt within months anyway. However, this revision A of the chip was not the one NVIDIA actually commercialised; SGS-Thomson dropped the plans for the STG-3000 at some point, which led NVIDIA, now flush with cash[^revenue], to create a new revision of the chip to remove the sound functionality (although some parts remained), fix some errata and make other minor adjustments to the silicon. The chip was respun, with the revision B silicon being completed in October 1997 and presumably available a month or two later; it is most likely that some revision A cards were sold at retail, but based on the dates, these would have to be very early units[^stb], with the earliest NVIDIA RIVA 128 drivers that I have discovered (labelled as "Version 0.75" and also doubling as the only NV1 drivers for Windows NT) being dated August 1997, and reviews starting to drop on websites such as AnandTech in the first half of September 1997. There are no known drivers available for the audio functionality in the revision A RIVA 128, so anyone wishing to use it would have to write custom drivers.
+Luckily for NVIDIA, the NV3 chip worked well enough to be sold to their board partners (almost certainly thanks to that hardware simulation package), and the company survived. Most accounts indicate they were only three or four weeks away from bankruptcy; when 3dfx saw the RIVA 128 at its reveal at the CGDC 1997 conference, one founder responded with "you guys are still around?" - understandably, considering 3dfx had almost *bought* NVIDIA, effectively for the purpose of killing off a theoretical competitor, only for NVIDIA to refuse as they assumed they would be bankrupt within months anyway. However, this revision A of the chip was not the one NVIDIA actually commercialised; SGS-Thomson dropped their plans for the STG-3000 at some point, which led NVIDIA, now flush with cash[^revenue], to create a new revision of the chip to remove the sound functionality (although some parts remained), fix some errata and make other minor adjustments to the silicon.

[^revenue]: NVIDIA's revenue in the first nine months of 1997 was only $5.5 million, but skyrocketed up to $23.5 million in the last three months, which correspond to the first three months of the RIVA 128's availability, owing to the numerous sales of chips to add-in board partners.

+The chip was respun, with the revision B silicon completed in October 1997 and presumably available a month or two later. It is most likely that some revision A cards were sold at retail, but based on the dates, these would have to be very early units[^stb]: the earliest NVIDIA RIVA 128 drivers that I have discovered (labelled as "Version 0.75" and also doubling as the only NV1 drivers for Windows NT) are dated August 1997, and reviews started to drop on websites such as AnandTech in the first half of September 1997. There are no known drivers available for the audio functionality in the revision A RIVA 128, so anyone wishing to use it would have to write custom drivers.

[^stb]: While there are mentions of quality problems with early cards in a lawsuit involving STB Systems, the RIVA 128's first OEM partner, it is not clear if the problems were on STB or NVIDIA's end.

-The chip was generally well-reviewed at its launch and considered as the fastest graphics chip released in 1997, beating the Voodoo1 in raw speed but not output video quality, most likely due to NVIDIA's financial situation leading to rushed development of the chip with shortcuts taken in the design process in order to ship on time. Examples of this lower quality include the lack of support for some of Direct3D 5.0's blending modes, and the use of per-polygon mipmapping[^mipmap] instead of the more accurate per-pixel approach, causing seams between different mipmapping layers; the dithering and bilinear texture filtering quality were often criticised as well, and some games exhibited seams between polygons. Furthermore, the drivers were generally very rough at launch, especially if the graphics card was an upgrade and previous drivers were not; while NVIDIA were able to fix many driver issues by the 3.xx versions released in 1998 and 1999, and even wrote a fairly decent OpenGL ICD, the standards for graphical quality had risen over time and what was considered "decent" in 1997 was considered to be "bad" and even "awful" by 1999.
+The RIVA 128 was generally well-reviewed at its launch and considered the fastest graphics chip released in 1997, beating the Voodoo1 in raw speed but not output video quality, most likely because NVIDIA's financial situation forced rushed development of the chip, with shortcuts taken in the design process in order to ship on time. Examples of this lower quality include the lack of support for some of Direct3D 5.0's blending modes, and the use of per-polygon mipmapping[^mipmap] instead of the more accurate per-pixel approach, causing seams between different mipmapping layers; the dithering and bilinear texture filtering quality were often criticised as well, and some games exhibited seams between polygons. Furthermore, the drivers were generally very rough at launch, especially if the graphics card was an upgrade and the previous card's drivers were not removed; while NVIDIA were able to fix many driver issues by the 3.xx versions released in 1998 and 1999, going as far as writing a fairly decent OpenGL ICD, the standards for graphical quality had risen over time and what was considered "decent" in 1997 was considered to be "bad" and even "awful" by 1999.

[^mipmap]: Mipmapping is a graphical technique involving scaling down textures as you move away from an object in order to prevent shimmering.

@@ -120,122 +122,119 @@ After all of this history and exposition, we are finally ready to actually explo

---

-## Architectural Overview
+## Architectural overview

-The NV3 is the third-generation of the NV architecture designed by NVIDIA in 1997, commercialised as the RIVA 128 (or RIVA 128 ZX). It implements a "partially" (by modern standards; by the standards of 1997 it was one of the more fully featured and complete accelerators available) hardware-accelerated, fixed-function 2D and 3D render path, primarily aimed at desktop software and video games. It can use the legacy PCI 2.1 bus, or the then-brand new AGP 1X bus, with the RIVA 128 ZX improving this further in order to use AGP 2X. The primary goal of the architecture was to be cheap to manufacture, be completed quickly (due to the very bad financial condition of NVIDIA at that time), and to beat the 3dfx Voodoo1 in raw pixel pushing performance. It generally achieved these goals, with some caveats, with a cost of $15 per chip in bulk, a design period of somewhere around nine months (excluding Revision B), and mostly better than Voodoo performance (although the Glide API did help 3dfx out); the NVIDIA architecture is much more efficient at drawing small triangles, but this rapidly drops off to a slightly-better-than-the-Voodoo raw performance (which probably ends up being less efficient overall due to the RIVA's higher clockspeed) when drawing larger triangles. While the focus of study has been the Revision B card, efforts have been made to understand both the A and C revisions. To change revision, the NV_PFB_BOOT_0 register in MMIO space (at offset `0x100000`) must return the following values:
+NV3 is the third-generation NV architecture designed by NVIDIA in 1997, commercialised as the RIVA 128 family. It implements a fixed-function 2D and 3D render path primarily aimed at desktop software and video games, with hardware acceleration best described as partial by modern standards, but one of the more complete, fully-featured solutions for 1997. It can be attached through the legacy PCI 2.1 bus or AGP 1X (2X on the RIVA 128 ZX), a higher-speed superset of PCI designed specifically for graphics, which was brand new at the time but ultimately proved successful.

-| Revision | NV_PFB_BOOT_0 value |
-| -------- | ------------------- |
-| A | `0x30100` |
-| B | `0x30110` |
-| C | `0x30120` |

+The primary goals of this architecture were low manufacturing cost, short development time (due to NVIDIA's dire financial condition at the time), and beating the 3dfx Voodoo1 in raw pixel pushing performance. It generally achieved these goals with caveats: a bulk cost of $15 per chip, a design period of around 9 months (excluding Revision B), and performance generally better than that of the Voodoo (in spite of 3dfx's more tightly integrated Glide API), although NVIDIA's performance advantage shrinks when drawing larger triangles.

-This changes the "GPU Revision" part of the GPU ID in the framebuffer boot configuration register.
+While the focus of study has been the Revision B card, efforts have been made to understand the A and C revisions as well. Each revision has different values for the GPU ID in the framebuffer boot configuration register in MMIO space (at offset `0x100000`) and the PCI configuration space Revision ID register:

-Furthermore, the PCI configuration space Revision ID register must return the following values:
+| Revision | NV_PFB_BOOT_0 value | PCI revision ID |
+| -------- | ------------------- | --------------- |
+| A | `0x30100` | `0x00` |
+| B | `0x30110` | `0x10` |
+| C | `0x30120` | `0x20` |

-| Revision | Revision ID |
-| -------- | ----------- |
-| A | `0x0` |
-| B | `0x10` |
-| C | `0x20` |

+There is a common misconception that the PCI ID is different on RIVA 128 ZX chips; this is partially true, but misleading. The standard NV3 architecture uses a PCI vendor ID of `0x12D2` (identified as "NVidia / SGS Thomson (Joint Venture)" by [The PCI ID Repository](https://pci-ids.ucw.cz/)) instead of NVIDIA's own `0x10DE`, with a device ID of `0x0018`, or `0x0019` on a RIVA 128 ZX with ACPI enabled. However, the presence of a `0x0019` device ID is not sufficient for a RIVA 128 ZX to be detected as such; the revision must be C, or `0x20`, regardless of device ID, as confirmed through VBIOS and driver reverse engineering. Since the device ID can be either value, the best way to check is to use the revision ID encoded into the board at manufacturing time, through the NV_PFB_BOOT_0 register or PCI configuration space.

-There is a common misconception that the PCI ID is different on RIVA 128 ZX chips. This is partially true, but misleading. The standard NV3 architecture uses a PCI vendor ID of `0x12D2` (labelled as "SGS/Thomson-NVIDIA joint venture" - not the later NVIDIA vendor ID!) and `0x0018` for the device ID. If ACPI is enabled on a RIVA 128 ZX, the ID changes to `0x0019`. However, the presence of a 0x0019 device ID is not sufficient: the revision must be C, or 0x20, for a RIVA 128 ZX to be detected and the specific Device ID does not matter. This has been verified by reading both reverse engineered VBIOS and driver code. The device ID can be either value, the best way to check is to use the revision ID encoded into the board at manufacturing time (either using the NV_PFB_BOOT_0 register, or the PCI configuration space registers).
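
+As an illustration of that check, a minimal sketch in C, assuming BAR0 has already been mapped into the address space by some platform-specific means, and that the register reads back exactly the tabulated values; only the offset and values come from the tables above, all names here are mine:

+```
+#include <stdint.h>
+
+#define NV_PFB_BOOT_0 0x100000 /* framebuffer boot configuration register, in BAR0 */
+
+/* mmio points to the base of BAR0; mapping it is platform-specific. */
+static char nv3_revision(volatile uint8_t *mmio)
+{
+    uint32_t boot0 = *(volatile uint32_t *)(mmio + NV_PFB_BOOT_0);
+    switch (boot0) {
+        case 0x30100: return 'A';
+        case 0x30110: return 'B';
+        case 0x30120: return 'C'; /* revision C = RIVA 128 ZX, regardless of device ID */
+        default:      return '?'; /* not an NV3, or an unknown respin */
+    }
+}
+```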

+The NV3 architecture incorporates accelerated triangle setup (of which the Voodoo only implements around two thirds), the aforementioned span and edge interpolation, texture mapping, blending, and final presentation. It does not accelerate the initial polygon transformation or lighting rendering phases. It is capable of rendering in 2D at a resolution of up to 1280x1024 (at least 1600x1200 on the ZX, though the exact maximum is unclear) and 32-bit colour. 3D rendering is only possible in 16-bit colour, and at 960x720 or lower on a 4 MB card due to a lack of VRAM. EDID is supported for monitor identification via an entirely software-programmed I2C bus.

-The NV3 architecture incorporates accelerated triangle setup, which the Voodoo Graphics only implements around two thirds of, the aforementioned span and edge interpolation, texture mapping, blending, and final presentation. It does not accelerate the initial polygon transformation or lighting rendering phases. It is capable of rendering in 2D at a resolution of up to 1280x1024 (at least 1600x1200 in ZX, not sure what?) and 32-bit colour. 3D rendering is only possible in 16-bit colour, and at 960x720 or lower in a 4MB card due to a lack of VRAM. While 2MB and even 1MB cards were planned, they were seemingly never released. The level of pain of using them can only be imagined; there were also low-end cards released that only used a 64-bit bus - handled using a manufacture-time configuration mechanism, sometimes exposed via DIP switches, known as the straps, which will be explained in Part 2. The RIVA 128 ZX, to compete with the i740, had, among other changes that will be described later, an increased amount of VRAM (8 Megabytes) that also allowed it to perform 3D at higher resolutions of up to 1280x1024. The design of the RIVA is very complex compared to other contemporaneous video cards; I am not sure why such a complex design was used, but it was inherited from the NV1 - the only real reason I can think of is that the overengineered design is intended to be future-proof and easy to enhance without requiring complete rewiring of the silicon, as many other companies had to do. EDID is supported for monitor identification via an entirely software-programmed I2C bus. The GPU is split into a large number (around a dozen) of subsystems (or really "functional blocks" since they are implemented as hardware), each one of which starts with the letter "P" for some reason; some examples of subsystems are `PGRAPH`, `PTIMER`, `PFIFO`, `PRAMDAC` and `PBUS` - presumably, a subsystem has a 1:1 mapping with a functional block on the GPU die, since the registers are named after the subsystem that they are a part of. There are several hundred different registers across the entire graphics card, so things are necessarily simplified for brevity, at least in Part 1. To be honest, the architecture of this graphics card is too complicated to show in a diagram without simplifying things so much as to be effectively pointless or complicating it to the point of not being useful (I tried!), so a diagram has not been provided.
+While 2 MB and even 1 MB cards were planned, they were seemingly never released. The level of pain of using them can only be imagined; there were also low-end cards released that only used a 64-bit bus, handled using a manufacture-time configuration mechanism (sometimes exposed via DIP switches) known as straps, which will be explained in Part 2. To compete with the i740, the RIVA 128 ZX had, among other changes that will be described later, an increased amount of VRAM (8 MB) also allowing it to render 3D at higher resolutions of up to 1280x1024.

-### Fundamental Concept: The Scene Graph
-In order to begin to understand the NVIDIA NV3 architecture, you have to understand the fundamental concept of a scene graph. Although the architecture does not strictly implement a scene graph, the concept is still good to understand how graphical objects are represented by the GPU. 
A scene graph is a description of a form of tree where the nodes of the tree are graphical objects. The properties of a parent object cascade down to its children; this is how almost all modern game engines represent 3D space (Unity, Unreal, Godot...); a very easy way to understand how a scene graph works is, although with the caveat that characteristics of parent nodes do not automatically cascade down to a child (although they can), is - I am not joking - install Roblox Studio, place some objects into the scene, and save the file as an "RBXLX" file (it has to be RBXLX, as by default since 2013 the engine exports a binary format, although the structure is similar). Then, open it in a text editor of your choice. You will see an XML representation of the scene you have created represented by a scene graph.

+The design of the RIVA is very complex compared to other contemporaneous video cards. I am not sure why such a complex design was used, but it was inherited from the NV1; the only real reason I can think of is that the overengineered design is intended to be future-proof and easy to enhance without requiring complete rewiring of the silicon, as many other companies had to do. The GPU is split into around a dozen subsystems (functional hardware blocks), each with names starting in `P` for some reason; some examples are `PGRAPH`, `PTIMER`, `PFIFO`, `PRAMDAC` and `PBUS`. Presumably, a subsystem has a 1:1 mapping with a functional block on the GPU die, since the registers are named after the subsystem that they are a part of.

-The concept of the scene graph is almost certainly how the functional block of all NVIDIA GPUs that actually implements the 2D and 3D drawing engine that makes the GPU, well, a GPU, received its name: `PGRAPH`. This part has survived all the way from very first NV1 all the way to the Blackwell architecture, powering NVIDIA's latest AI-focused GPUs and the brand new RTX 5000 series of consumer-focused GPUs (NVIDIA has not had a ground-up redesign since they started development of their initial NV1 architecture in 1993, although the Ship of Theseus argument applies here).

+There are several hundred different registers across the entire graphics card, so things are necessarily simplified for brevity, at least in Part 1. To be honest, the architecture of this graphics card is too complicated to show in a diagram without simplifying things so much as to be effectively pointless or complicating it to the point of not being useful (I tried!), so a diagram has not been provided.

+### Fundamental concept: the scene graph

+In order to begin to understand the NVIDIA NV3 architecture, you have to understand the fundamental concept of a scene graph. Although the architecture does not strictly implement a scene graph, knowing the concept helps understand how graphical objects are represented by the GPU. A scene graph is a description of a form of tree where the nodes of the tree are graphical objects, and the properties of a parent object cascade down to its children.

+This is how almost all modern game engines (Unity, Unreal, Godot...) represent 3D space. A very easy way to understand how a scene graph works is to - I am not joking - install Roblox Studio, place some objects into the scene, save it as an `RBXLX` file (not the default), and open it in a text editor of your choice. You will see an XML representation of the scene you have created as a scene graph; the only caveat is that on Roblox, the cascading of characteristics from parent nodes to children is optional.
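
+As a concrete (and entirely illustrative - none of this is NVIDIA code) sketch, a scene graph node in C might look like this, with a parent's position cascading down to its children during traversal:

+```
+#include <stddef.h>
+
+/* A toy scene graph node: each node's position is relative to its parent. */
+struct node {
+    float x, y, z;          /* position relative to the parent */
+    struct node **children; /* child nodes inherit this node's transform */
+    size_t num_children;
+};
+
+/* Walk the tree, accumulating parent positions into absolute positions;
+   this cascading is the defining property of a scene graph. */
+static void traverse(const struct node *n, float px, float py, float pz)
+{
+    float ax = px + n->x, ay = py + n->y, az = pz + n->z;
+    /* (ax, ay, az) is this object's absolute position; render it here. */
+    for (size_t i = 0; i < n->num_children; i++)
+        traverse(n->children[i], ax, ay, az);
+}
+```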

+The scene graph is almost certainly the namesake for the functional block actually implementing the 2D and 3D drawing engine that makes the GPU, well, a GPU: `PGRAPH`. This part has survived from the very first NV1 all the way to the current Blackwell (RTX 5000) architecture; NVIDIA have never done a ground-up redesign since initial development of the NV1 architecture began in 1993, although the Ship of Theseus argument applies here.

### Clocks

-The RIVA 128 is not dependent on the host clock of the machine that it is inserted into. It has (depending on boot-time configuration) a 13.5 or 14.3 Megahertz clock crystal that is split by the hardware into the memory clock (MCLK) and the video clock (VCLK). Note that these names are misleading; the memory clock also handles the actual rendering and timing on the card, with the VCLK seemingly just handling the actual pushing out of frames. The actual clocks are controlled by registers in `PRAMDAC` set by the Video BIOS (which does not otherwise play a serious role in this particular iteration of the NVIDIA architecture - it only does a very basic POST sequence, initialises the card and sets its clockspeed), and can later be overridden by the drivers. After the card is initialised, it effectively never needs the VBIOS again, although there are mechanisms to read from it after initialisation) and were controlled by the OEM manufacturer using three clock parameters (`m`, `n` and `p`), which the card uses to generate the final memory and pixel clock speed using the following algorithm:
-`(frequency * nv3->pramdac.pixel_clock_n) / (nv3->pramdac.pixel_clock_m << nv3->pramdac.pixel_clock_p);`

+The RIVA 128 is not dependent on the host machine's clock. It has a 13.5 or 14.3 MHz (depending on boot-time configuration) clock crystal, split by the hardware into a memory clock (MCLK) and video clock (VCLK). Note that these names are misleading; the MCLK also handles the chip's actual rendering and timing, with the VCLK seemingly just handling the actual pushing out of frames.

-The RAMDAC in the card, which handles final conversion of the digital image generated by the GPU into an analog video signal and clock generation (via three phase-locked loops), has its own clock (ACLK) that ran at around 200 Mhz in the RIVA 128 (revision A/B) and 260 Mhz in the revision C (RIVA 128 ZX) cards. It was not configurable by OEM manufacturers, unlike the other cards.
+The actual clocks are controlled by registers in `PRAMDAC` set by the video BIOS, which can later be overridden by drivers. In this iteration of the NVIDIA architecture, the VBIOS only performs a very basic POST sequence, initialises the card and sets its clock speed; once the chip is initialised, the VBIOS is effectively never needed again, although there are mechanisms to read from it after initialisation. Clocks were controlled by card manufacturers through parameters `m`/`n`/`p`, from which the chip derives the final memory and pixel clock speed with the formula `(frequency * n) / (m << p)`. Generally, most manufacturers set the memory clock at around 100 MHz, and the pixel clock at around 40 MHz.

-Generally, most manufacturers set the memory clock at around 100 megahertz and the pixel clock at around 40 Megahertz.
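
+As a worked example of that formula (the helper function and the example values below are mine, not NVIDIA's; only the formula and the crystal frequencies come from the hardware), a 13.5 MHz crystal with `n` = 200, `m` = 27 and `p` = 0 yields a 100 MHz clock:

+```
+#include <stdint.h>
+
+/* (crystal * n) / (m << p); a 64-bit intermediate is needed, since
+   crystal * n overflows 32 bits for realistic values. */
+static uint32_t nv3_clock_hz(uint32_t crystal_hz, uint32_t n, uint32_t m, uint32_t p)
+{
+    return (uint32_t)(((uint64_t)crystal_hz * n) / ((uint64_t)m << p));
+}
+
+/* nv3_clock_hz(13500000, 200, 27, 0) == 100000000, i.e. 100 MHz */
+```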

+The chip's RAMDAC handles final conversion of the digital image generated by the GPU into an analog video signal, and clock generation via three phase-locked loops. It has its own clock (ACLK) running at around 200 MHz on RIVA 128 (revision A/B) and 260 MHz on RIVA 128 ZX (revision C) chips, which, unlike the other clocks, was not configurable by manufacturers.

-### Memory Mapping
-Before we can discuss any part of how the RIVA 128 works, we must explain the memory architecture, since this is a fundamental requirement to even access the graphics card's registers in the first place. NVIDIA picked a fairly strange memory mapping architecture, at least for cards of that time. The exact setup of the memory mapping changed numerous times as NVIDIA's architecture evolved, so only the NV3-based GPUs will be analysed.
+### Memory mapping

+Before we can discuss any part of how the RIVA 128 works, the memory architecture must be explained, since this is a fundamental requirement to even access the graphics card's registers in the first place. NVIDIA picked a fairly strange memory mapping architecture, at least for cards of that time. The exact setup of the memory mapping changed numerous times as NVIDIA's architecture evolved, so only NV3-based GPUs will be analysed.

-The RIVA 128 was designed during the era of full-sized, old school PCI, but also needed to be compatible with the then-brand new (the RIVA 128 released the very same month as the first AGP 1X, 1.0-based motherboards) Accelerated Graphics Port (AGP), which was a modified high-transfer speed variant of PCI specifically intended for graphics; all GPUs released between 1997 and 2004, and some low-end GPUs released up to 2008 (the Radeon HD 5000 series of 2009 was intended to have AGP support, and this even made it into the drivers, but the SKUs never launched) used the AGP (or AGP Pro) bus in its various forms. Note that motherboard support did not last that long - a HP computer using an ASUS motherboard I disassembled recently, manufactured in early 2006 and with a motherboard from 2005, exclusively had PCIe. The memory mapping is split into three primary components, all of which are exposed via memory-mapped I/O (there is no facility for any other I/O form, except the Weitek core's registers for VESA compatibility); specifically, they are exposed using the Base Address Registers (BAR) in PCI/AGP configuration space. The RIVA 128 only uses two of these - BAR0 and BAR1; both of these only have their highest byte wired up to anything at all within the GPU and therefore they must be mapped at a 16 megabyte boundary, with BAR0 holding the main GPU registers and BAR1 holding the `DFB` and `RAMIN` areas (which really refer to overlapping areas of memory); these will be delineated later.
+The memory mapping is split into three primary components, all exposed via memory-mapped I/O through Base Address Registers (BAR) in PCI configuration space; there is no port I/O support outside of the Weitek core's registers for SVGA compatibility. The RIVA 128 uses two BARs, both 16 MB in size: BAR0 holding the main GPU registers, and BAR1 holding the `DFB` and `RAMIN` areas (which really refer to overlapping areas of memory).

#### MMIO

-This is the primary area of memory mapping, and is set up as Base Address Register 0 in the PCI configuration registers. This is how you speak to the GPU - sixteen megabytes (!) of MMIO, mapped at whatever the operating system decides to map it at during initialisation (my test system under Windows 2000 tends to map it at `0xdc000000`, but this is HIGHLY variable and dependent on the system configuration). 
This MMIO area has numerous functional subsystems of the GPU mapped into it and a table of where, and a brief description of what, these subsystems actually are is mapped below (note some parts overlap, and what each graphics object does will be introduced later): -| Range | Name | Purpose | -| ------------------- | ----------- | ------------------------------------------------------------------- | -| `0x0-0xfff` | PMC | Controls the GPU functional units and interrupt state | -| `0x1000-0x1fff` | PBUS | Controls the 128-bit internal bus | -| `0x1800-0x18ff` | PCI mirror | Mirror of PCI configuration registers | -| `0x2000-0x3fff` | PFIFO | FIFO buffer for graphics command submission from DMA | -| `0x4000-0x4fff` | PRM | Real-Mode Device Support (e.g. MPU-401) | -| `0x6000-0x6FFF` | PRAM | Controls RAMIN area configuration | -| `0x7000-0x7FFF` | PRMIO | Real Mode Access registers (see below) | -| `0x9000-0x9FFF` | PTIMER | Custom programmable interval timer | -| `0xA0000-0xAFFFF` | VGA RAM | Emulated VGA VRAM | -| `0xC0000-0xCFFFF` | PRMVIO | Real Mode Video - VGA emulation registers (Weitek) | -| `0x100000-0x100FFF` | PFB | Framebuffer interface - config, debug, initialisation | -| `0x101000-0x101FFF` | PEXTDEV | External Device Interface | -| `0x101000` | PSTRAPS | Device Configuration Bits (Set at factory) | -| `0x110000-0x110FFF` | PROM | VBIOS mirror | -| `0x120000-0x120FFF` | PALT | External memory access mirror (?, possible NV1 remnant) | -| `0x200000-0x200FFF` | PME | Mediaport: External MPEG decoder interface | -| `0x400000-0x401FFF` | PGRAPH | 2D/3D graphics engine: Core | -| `0x410000-0x411FFF` | UBETA | 2D/3D graphics engine: Beta factor object | -| `0x420000-0x421FFF` | UROP | 2D/3D graphics engine: Render operation object | -| `0x430000-0x431FFF` | UCHROMA | 2D/3D graphics engine: Chroma key object | -| `0x440000-0x441FFF` | UPLANE | 2D/3D graphics engine: Plane mask object | -| `0x450000-0x451FFF` | UCLIP | 2D/3D graphics engine: Clip object | -| `0x460000-0x461FFF` | UPATT | 2D/3D graphics engine: Blit pattern object (e.g. 
for BitBLT) | -| `0x470000-0x471FFF` | URECT | 2D/3D graphics engine: Rectangle object | -| `0x480000-0x481FFF` | UPOINT | 2D/3D graphics engine: Point object | -| `0x490000-0x491FFF` | ULINE | 2D/3D graphics engine: Line object | -| `0x4A0000-0x4A1FFF` | ULIN | 2D/3D graphics engine: Lin (line without starting or ending pixels) | -| `0x4B0000-0x4B1FFF` | UTRI | 2D/3D graphics engine: Triangle object (possible NV1 leftover) | -| `0x4C0000-0x4C1FFF` | UW95TXT | 2D/3D graphics engine: Windows 95 GDI text acceleration object | -| `0x4D0000-0x4D1FFF` | UMEMFMT | 2D/3D graphics engine: Memory to memory format object | -| `0x4E0000-0x4E1FFF` | USCALED | 2D/3D graphics engine: Scaled image from memory object | -| `0x500000-0x501FFF` | UBLIT | 2D/3D graphics engine: Blit object | -| `0x510000-0x511FFF` | UIMAGE | 2D/3D graphics engine: Image object | -| `0x520000-0x521FFF` | UBITMAP | 2D/3D graphics engine: Bitmap object | -| `0x540000-0x541FFF` | UTOMEM | 2D/3D graphics engine: Transfer to memory object | -| `0x550000-0x551FFF` | USTRTCH | 2D/3D graphics engine: Stretched image from CPU object | -| `0x570000-0x571FFF` | UD3D0Z | 2D/3D graphics engine: Direct3D 5.0 triangle w/zeta buffer* object | -| `0x580000-0x581FFF` | UPOINTZ | 2D/3D graphics engine: Point w/zeta buffer* | -| `0x5C0000-0x5C1FFF` | UINMEM | 2D/3D graphics engine: Image in memory object | -| `0x601000-0x601FFF` | PRMCIO | VGA CRTC registers | -| `0x601000-0x601FFF` | PRMCIO | VGA CRTC registers | -| `0x680000-0x6802FF` | PVIDEO | Video overlay engine | -| `0x680300-0x680FFF` | PRAMDAC | Video signal generation, cursor, CLUT, clock generation | -| `0x681200-0x681FFF` | USER_DAC | Optional for external DAC? | -| `0x800000-0xFFFFFF` | USER | Graphics object submission area (for PFIFO, via DMA) | +This is the primary memory mapping area, set up as Base Address Register 0 in the PCI configuration registers. This is how you speak to the GPU: 16 MB (!) of MMIO, mapped at a memory location defined by the system BIOS. Since the video BIOS has no access to PCI services, it instead uses I/O ports `0x3D0`-`0x3D3` in the Weitek SVGA core, mapped to a mechanism called RMA (Real Mode Access); a 32-bit address is formed by writing to all four RMA registers, then the next read/write to the VGA I/O region is redirected to the MMIO area, allowing the VBIOS to access it from real mode and initialise the GPU. -_Note_: There is a wrinkle to this setup here. The VBIOS has to be able to communicate with the main GPU in real mode when PCI is not available. This is achieved by mapping I/O ports `0x3d0`-`0x3d3` in the Weitek core to the registers for a mechanism called RMA - Real Mode Access - that effectively serve as a mechanism for forming a 32-bit address; when a 32-bit address is formed by writing to all four RMA registers, (internally implemented using a mode register) the next SVGA x86 I/O port read/write will become a read/write from the main GPU PCI BAR0 MMIO space. This allows the VBIOS to POST the GPU during its initialisation process. +This MMIO area has numerous functional subsystems of the GPU mapped into it, with some overlap. The actual function of each graphics object will be described later. -*A zeta buffer is Nvidia parlance for a combined Z-buffer (a buffer that is a part of the framebuffer, allowing for orting polygons based on their distance from the camera) and stencil buffer (a buffer allowing for part of an image to be discarded). In this case, a 16-bit z-buffer and 8-bit stencil buffer are interleaved. 
(Later Nvidia GPUs have a "super zeta buffer"!)

+This is the primary memory mapping area, set up as Base Address Register 0 in the PCI configuration registers. This is how you speak to the GPU: 16 MB (!) of MMIO, mapped at a memory location defined by the system BIOS. Since the video BIOS has no access to PCI services, it instead uses I/O ports `0x3D0`-`0x3D3` in the Weitek SVGA core, mapped to a mechanism called RMA (Real Mode Access); a 32-bit address is formed by writing to all four RMA registers, then the next read/write to the VGA I/O region is redirected to the MMIO area, allowing the VBIOS to access it from real mode and initialise the GPU.
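
+A rough sketch of the address-forming step, for illustration only: the exact RMA handshake is not publicly documented, so the byte ordering and the mechanics of the subsequent redirected access are assumptions on my part:

+```
+#include <stdint.h>
+#include <sys/io.h>  /* outb(); Linux/x86, requires iopl(3) privileges */
+
+/* Form a 32-bit MMIO address by writing to the four RMA registers at
+   I/O ports 0x3D0-0x3D3 (one byte per port is an assumption). The next
+   read/write to the VGA I/O region is then redirected to BAR0 MMIO. */
+static void rma_set_address(uint32_t mmio_addr)
+{
+    for (uint16_t i = 0; i < 4; i++)
+        outb((mmio_addr >> (8 * i)) & 0xFF, (uint16_t)(0x3D0 + i));
+}
+```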

+This MMIO area has numerous functional subsystems of the GPU mapped into it, with some overlap. The actual function of each graphics object will be described later.

| Range | Name | Purpose |
| ------------------- | ----------- | ------------------------------------------------------------------- |
| `0x0-0xFFF` | PMC | Controls the GPU functional units and interrupt state |
| `0x1000-0x1FFF` | PBUS | Controls the 128-bit internal bus |
| `0x1800-0x18FF` | PCI mirror | Mirror of PCI configuration registers |
| `0x2000-0x3FFF` | PFIFO | FIFO buffer for graphics command submission from DMA |
| `0x4000-0x4FFF` | PRM | Real mode device support (e.g. MPU-401) |
| `0x6000-0x6FFF` | PRAM | Controls RAMIN area configuration |
| `0x7000-0x7FFF` | PRMIO | Real Mode Access registers |
| `0x9000-0x9FFF` | PTIMER | Custom programmable interval timer |
| `0xA0000-0xAFFFF` | VGA RAM | Emulated VGA video memory |
| `0xC0000-0xCFFFF` | PRMVIO | Real Mode Video: VGA emulation registers (Weitek) |
| `0x100000-0x100FFF` | PFB | Framebuffer interface (config, debug, initialisation) |
| `0x101000-0x101FFF` | PEXTDEV | External Device interface |
| `0x101000` | PSTRAPS | Device configuration bits (set at factory) |
| `0x110000-0x110FFF` | PROM | Video BIOS mirror |
| `0x120000-0x120FFF` | PALT | External memory access mirror (unknown, possible NV1 remnant) |
| `0x200000-0x200FFF` | PME | Mediaport: External MPEG decoder interface |
| `0x400000-0x401FFF` | PGRAPH | 2D/3D graphics engine: Core |
| `0x410000-0x411FFF` | UBETA | 2D/3D graphics engine: Beta factor object |
| `0x420000-0x421FFF` | UROP | 2D/3D graphics engine: Render operation object |
| `0x430000-0x431FFF` | UCHROMA | 2D/3D graphics engine: Chroma key object |
| `0x440000-0x441FFF` | UPLANE | 2D/3D graphics engine: Plane mask object |
| `0x450000-0x451FFF` | UCLIP | 2D/3D graphics engine: Clip object |
| `0x460000-0x461FFF` | UPATT | 2D/3D graphics engine: Blit pattern object (e.g. for BitBLT) |
| `0x470000-0x471FFF` | URECT | 2D/3D graphics engine: Rectangle object |
| `0x480000-0x481FFF` | UPOINT | 2D/3D graphics engine: Point object |
| `0x490000-0x491FFF` | ULINE | 2D/3D graphics engine: Line object |
| `0x4A0000-0x4A1FFF` | ULIN | 2D/3D graphics engine: Lin (line without starting or ending pixels) |
| `0x4B0000-0x4B1FFF` | UTRI | 2D/3D graphics engine: Triangle object (possible NV1 leftover) |
| `0x4C0000-0x4C1FFF` | UW95TXT | 2D/3D graphics engine: Windows 95 GDI text acceleration object |
| `0x4D0000-0x4D1FFF` | UMEMFMT | 2D/3D graphics engine: Memory to memory format object |
| `0x4E0000-0x4E1FFF` | USCALED | 2D/3D graphics engine: Scaled image from memory object |
| `0x500000-0x501FFF` | UBLIT | 2D/3D graphics engine: Blit object |
| `0x510000-0x511FFF` | UIMAGE | 2D/3D graphics engine: Image object |
| `0x520000-0x521FFF` | UBITMAP | 2D/3D graphics engine: Bitmap object |
| `0x540000-0x541FFF` | UTOMEM | 2D/3D graphics engine: Transfer to memory object |
| `0x550000-0x551FFF` | USTRTCH | 2D/3D graphics engine: Stretched image from CPU object |
| `0x570000-0x571FFF` | UD3D0Z | 2D/3D graphics engine: Direct3D 5.0 triangle w/zeta buffer[^zetabuf] object |
| `0x580000-0x581FFF` | UPOINTZ | 2D/3D graphics engine: Point w/zeta buffer[^zetabuf] object |
| `0x5C0000-0x5C1FFF` | UINMEM | 2D/3D graphics engine: Image in memory object |
| `0x601000-0x601FFF` | PRMCIO | VGA CRTC registers |
| `0x680000-0x6802FF` | PVIDEO | Video overlay engine |
| `0x680300-0x680FFF` | PRAMDAC | Video signal generation, cursor, CLUT, clock generation |
| `0x681200-0x681FFF` | USER_DAC | Optional for external DAC? |
| `0x800000-0xFFFFFF` | USER | Graphics object submission area (for PFIFO, via DMA) |

[^zetabuf]: A zeta buffer is NVIDIA parlance for a combined Z-buffer (a buffer within the framebuffer for sorting polygons based on their distance from the camera) and stencil buffer (a buffer for discarding parts of an image). In this case, a 16-bit Z-buffer and 8-bit stencil buffer are interleaved. This evolved to a "super zeta buffer" on later NVIDIA GPUs.

#### DFB

-DFB means "Dumb Framebuffer" (that's what NVIDIA chose to call it) and is simply a linear framebuffer. It is mapped into PCI BAR1 and has a size of 0x400000 by default (depending on the VRAM size?). In the NV3, it is mapped into BAR1 (on later GPUs it was moved to BAR0 starting at `0x1000000`). It is presumably meant for manipulating the GPU without using its DMA facilities.
+
+`DFB` is the aptly-named "Dumb Framebuffer", a linear framebuffer set up as Base Address Register 1 on the NV3 (moved to BAR0 at `0x1000000` on later GPUs). The default size of 4 MB may change depending on VRAM size. This area is presumably meant for manipulating the GPU without using its DMA facilities.

#### RAMIN

-Also in PCI BAR1 is the `RAMIN` region. While this area is somewhat complicated, it is the most important area to understand in order to understand how the GPU actually operates. RAMIN is the area of the GPU's VRAM where graphics objects and the structures containing references to them are stored. It is effectively addressed as the last megabyte of VRAM (regardless of the size of VRAM), but addressed in reverse, and aligned to a 16-byte boundary. If this is difficult to understand, you can convert an address in PRAMIN to a real VRAM address using the following formula (where reversal_unit_size is equal to 16):
-`real VRAM address = VRAM_size - (ramin_address - (ramin_address % reversal_unit_size)) - reversal_unit_size + (ramin_address % reversal_unit_size) `
+
+`RAMIN` is also located in BAR1. It's a somewhat complicated area, but also the most important one to understand when it comes to actual operation of the GPU, as it's the part of video RAM where graphics objects and structures containing references to them are stored.

-or in the form of bitwise math - code is from my in progress RIVA 128 emulatino (I am only 95% sure that this is right, but it seems to produce the same results as above):
-```
- addr ^= (nv3->nvbase.svga.vram_max - 0x10);
- addr >>= 2; // what
-```
-I'm not entirely sure why they did this, but I assume it was for providing a more convenient interface to the user and for general efficiency reasons.
+This area is effectively the last megabyte of VRAM (regardless of VRAM size), but organised as 16-byte blocks which are then stored from the top down. A `RAMIN` address can be converted to a real VRAM address with the formula `ramin_address ^ (vram_size - 16)`. I'm not entirely sure why they did this, but I assume it was for providing a more convenient interface to the user and for general efficiency reasons.
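
+In C, that conversion is a one-liner; the worked example in the comment below assumes a 4 MB card, and the names are mine:

+```
+#include <stdint.h>
+
+/* RAMIN occupies the last megabyte of VRAM as 16-byte blocks stored from
+   the top down, which reduces to a single XOR against (vram_size - 16). */
+static uint32_t ramin_to_vram(uint32_t ramin_addr, uint32_t vram_size)
+{
+    return ramin_addr ^ (vram_size - 16);
+}
+
+/* With 4 MB of VRAM: offset 0x00 maps to 0x3FFFF0 (the topmost 16-byte
+   block) and offset 0x10 maps to 0x3FFFE0, the block below it. */
+```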

#### Interrupts
+
-Any graphics card worth its salt needs an interrupt system. So a REALLY good one must have two completely different systems for notifying other parts of the GPU about events, right? There is a traditional interrupt system, with both software and hardware support (indicated by bit 31 of the interrupt status register) controlled by a register in `PMC` that turns on and off interrupts for different components of the GPU. Each component of the GPU also allows individual interrupts to be turned on or off, and has its own interrupt status register. Each component (including the removed-in-revision-B `PAUDIO` for some reason) is represented by a bit in the `PMC` interrupt status register. If the interrupt status register of a component, ANDED with the interrupt status register, is 1, an interrupt is declared to be pending (with some minor exceptions that will be explained in later parts) and a PCI/AGP IRQ is sent. The interrupt registers are set up such that, when they are viewed in hexadecimal, an enabled interrupt appears as a 1 and a disabled interrupt as a 0. Interrupts can be turned off GPU-wide (or for one of just hardware or software) via the `PMC_INTR_EN` register (at `0x0140`) This allows an interrupt to be implemented as:
+Any graphics card worth its salt needs an interrupt system. So a REALLY good one must have two completely different systems for notifying other parts of the GPU about events, right? There is a traditional interrupt system, with both software and hardware support (indicated by bit 31 of the interrupt status register) controlled by a register in `PMC` that turns interrupts on and off for different components of the GPU. Each component of the GPU also allows individual interrupts to be turned on or off, and has its own interrupt status register. Each component (including the removed-in-revision-B `PAUDIO` for some reason) is represented by a bit in the `PMC` interrupt status register. If the interrupt status register of a component, ANDed with that component's interrupt enable register, is nonzero, an interrupt is declared to be pending (with some minor exceptions that will be explained in later parts) and a PCI/AGP IRQ is sent. The interrupt registers are set up such that, when they are viewed in hexadecimal, an enabled interrupt appears as a 1 and a disabled interrupt as a 0. Interrupts can be turned off GPU-wide (or for just one of hardware or software) via the `PMC_INTR_EN` register (at `0x0140`). This allows an interrupt check to be implemented roughly as in the sketch below, which is a reconstruction from the behaviour just described, with the per-subsystem register offsets left as parameters:
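
+```
+#include <stdint.h>
+
+#define PMC_INTR_EN 0x0140 /* GPU-wide hardware/software interrupt enable */
+
+/* A subsystem's interrupt is pending if its status bits, ANDed with its
+   enable bits, are nonzero, provided PMC has interrupts enabled GPU-wide.
+   The per-subsystem register offsets differ, so they are parameters here. */
+static int nv3_intr_pending(volatile uint8_t *mmio,
+                            uint32_t status_reg, uint32_t enable_reg)
+{
+    if (*(volatile uint32_t *)(mmio + PMC_INTR_EN) == 0)
+        return 0; /* all interrupts disabled GPU-wide */
+    uint32_t status = *(volatile uint32_t *)(mmio + status_reg);
+    uint32_t enable = *(volatile uint32_t *)(mmio + enable_reg);
+    return (status & enable) != 0;
+}
+```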