<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://polymonster.co.uk/feed.xml" rel="self" type="application/atom+xml" /><link href="https://polymonster.co.uk/" rel="alternate" type="text/html" /><updated>2026-03-22T11:58:31+00:00</updated><id>https://polymonster.co.uk/feed.xml</id><title type="html">Alex Dixon</title><entry><title type="html">‘AI coding tools are powerful but we mustn’t let our own skills atrophy’</title><link href="https://polymonster.co.uk/blog/ai-atrophy" rel="alternate" type="text/html" title="‘AI coding tools are powerful but we mustn’t let our own skills atrophy’" /><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/ai-atrophy</id><content type="html" xml:base="https://polymonster.co.uk/blog/ai-atrophy"><![CDATA[<p>It started with the “don’t get left behind” brigade; somehow they managed to convince me that I was now missing out on developing critical new skills. People who hadn’t put in the hours to learn to code in the first place were now at the top of the field, leaving behind the rest of us who have put in multiple decades and tens of thousands of hours of effort to learn our craft. They want their PRs merged upstream to get credit on GitHub for code they didn’t write or even understand, flooding the pull request system to breaking point. I tried to ignore it, but the chatter was incessant and I had to take a look for myself.</p>

<p>I didn’t want the AI coding tools to be good. After all, I have dedicated a large part of my life to coding. It is much more than a job; it’s a hobby, a love, part of who I am. But coding is dead, it’s going to be over in 6 months. I’ve had to hear this daily for the last 3 years, and it is exhausting and inescapable; everywhere I look I see it. “The models will get better” they say, “this is the worst it will ever be”… and so it goes on and on and on. Waking up every day to articles about how your job is going to be replaced is not good for your mental health. When will we ever hear the end of the constant hype train?</p>

<p>I don’t want the AI hyperscaler tech bros to succeed. With the enshittification of technology steadily underway, I don’t trust a single one of them. Now that they have plagiarised the entire sum of human knowledge, they want to kick down the ladder, sue the competition, close it all off: we were here first. These advancements should be exciting, but knowing capitalism will certainly try its best to eventually squeeze every last penny out of it makes it hard for me to appreciate the moment we are living in.</p>

<p>That being said, these tools are impressive. Sometimes they amaze me, sometimes they frustrate me, and I dislike the landscape surrounding it all at the same time. The dichotomy is hard to explain. I didn’t want to get hooked on using them, but after just one session where my friend first showed me Claude Code, I was dreaming about it; the thought of using AI infiltrated my mind. I was somehow addicted already.</p>

<p>I work with ML researchers, and have dipped my toe in there myself a little. A while ago a colleague explained LLMs to me in a way that stuck: they’re very good at interpolation. Words are represented as vectors in a high-dimensional space, and the model learns patterns between them. Based on this insight I started with tasks that I thought would be simple and likely to succeed: adding new record store scrapers to my <a href="https://github.com/polymonster/diig">music app</a>; filling out some missing parts of a <a href="https://github.com/polymonster/hotline">graphics engine</a> backend; and clearing some long-overdue technical debt. Even though I expected these tasks to be trivial, I was still impressed with how Claude worked, how it asked questions to clarify details, and how it seemed to understand exactly what I wanted.</p>

<p>I was sucked in even further. I paid for a subscription despite saying I hated big tech and would not pay them a penny… What a hypocrite.</p>

<p>It’s hard to measure productivity, but I can say that AI has given me a motivational boost to get back into projects. Sometimes just knowing tech debt exists in a project is enough to slow me down. Asking AI to clean that up while I do the interesting stuff feels like a weight lifted off my shoulders. Coming back to a code base after a while, things can feel unfamiliar, and AI is great at easing you back in, explaining the current state of something that was a work in progress. I was really amazed at the issues Claude found when I asked it to review some of my code bases. It found subtle bugs just from looking at the code; I fixed them and together we added tests to catch those cases. The code review was so useful for my Rust <a href="https://github.com/polymonster/maths_rs">maths library</a> that I decided to publish it as version <a href="https://crates.io/crates/maths-rs">1.0.0</a>. These aspects of AI coding augment my skills, make light work of the boring stuff and also help me make sure every detail is covered meticulously.</p>

<p>The ease of development is powerful, but it can also be a double-edged sword. Since it is so easy to ask for a fix or a new feature, you can quickly end up with a lot of features and a lot of code. More code is not better; more code is bad: it is more to maintain and it increases the complexity of any future work. Being able to pile on features without thought dilutes our ability to discern the most impactful, meaningful changes. Good software engineers tend to naturally optimize in this regard, because it means less work. Less work is good, and laziness can make for smarter decisions.</p>

<p>The veneer can easily peel away when working on complex and abstract problems. I started to get into a rut with difficult tasks. Claude was struggling; I didn’t like the code it generated, it was taking a lot of time just to read it all, and I became detached from what I was trying to achieve. I started to realise that using an LLM to code completely changes our relationship to the code. This changing relationship has become prominent in a series I have been livestreaming on YouTube entitled <a href="https://www.youtube.com/watch?v=qLcyWXWqqNU">Sloppy Gamedev</a>, where the aim is to try to make a game using AI. I originally attempted to purely vibe code it, but during the first session I realised Claude wasn’t going to be able to make a game on vibes alone. The first attempt was pretty terrible: lots of hardcoded values, so it was neither extendable nor reusable, and lots of bugs.</p>

<p>So you could say this is a skill issue: I need better prompts or better context, and if you had all of that information ahead of time, maybe a purely vibe-coded game would be possible. Claude needs a lot of detail, and I also need to figure out and understand what those details should be. With things like gamedev a lot of the problems require iteration, and this is where I think things start to break down. What I need is a more collaborative relationship between my code, the LLM’s code, and our shared understanding of the architecture we need. I found this difficult at first because I did not like the LLM-generated code; I did not want to edit it myself and felt alienated.</p>

<p>Sometimes the code it generates is just very anti-human. One example: I noticed inline dot products and magnitude calculations, with the code repeated verbatim each time, expanding the scalar maths rather than using the maths library functions. Things that have always been implicit now need to be explicit, and currently, for me at least, it’s very difficult to forecast everything ahead of time. We can tweak things in plan mode, but after a long time spent refining a single plan I am itching to accept it, to see the parts that work, and then iterate again. This is how I naturally work: write a small burst of code, test it, run it, tweak it, and continue. Claude generates a lot of code quickly, and accepting code that partially works just to see it in action leads to immediate tech debt.</p>

<p>Working on a crowd simulation, agents were getting stuck on corners. I asked Claude multiple times to fix it; it piled on more code, each time claiming to have fixed it. I found myself trapped in the prompt loop death spiral. I had to force myself to sit down at the computer with no help allowed and just figure it out myself. I spent a good few hours just drawing some debug geometry and fiddling around with the problem. This is where the realisation really hit me about what I was missing. The process itself is a crucial part of development; it’s not about the lines of code in the end, it’s about the intuition you build to get there. In this session not only did I improve the agents getting stuck on corners, I also gained key insights into how to further improve it and how to parameterize and control the improvements. The code I wrote to do this was actually not great, it was messy and it was thrown away straight after, but that’s OK and that is part of the process. Somehow I had lost this ability when prompting Claude; it had changed me: frozen, unable to understand the code, only able to prompt again and again. It took restraint not to just reach out and ask an LLM to do the thing for me, but in pushing past that barrier I was rewarded.</p>

<p>Since then I have been better able to guide Claude with a newfound understanding of the problem. The important detail here is building intuition and knowledge, and this worries me, since I feel an element of skill atrophy when it’s so easy to just ask for help. I have already put in a lot of time learning the hard way; what chance do the newcomers have when people say there is no point in learning to code anymore? How are we able to steer the LLMs if we don’t understand the problems we need to solve? Claude’s attempt at gamedev was quite poor on its own; with guidance and collaboration it was much better, so for that reason I think learning to code, and learning to understand LLM-generated code, will always be an important skill.</p>

<p>If you liked this, check out my <a href="https://www.youtube.com/@polymonster">YouTube</a> where I’m messing around with AI, and my mostly handwritten <a href="https://github.com/polymonster">repos</a> :)</p>]]></content><author><name></name></author><summary type="html"><![CDATA[‘A candid first-person account of trying AI coding tools, weighing their genuine productivity gains against concerns about skill atrophy, trust, and the relentless AI hype cycle.’]]></summary></entry><entry><title type="html">Porting diig from iOS to Android in less than 2 weeks</title><link href="https://polymonster.co.uk/blog/porting-diig-to-android" rel="alternate" type="text/html" title="Porting diig from iOS to Android in less than 2 weeks" /><published>2026-01-28T00:00:00+00:00</published><updated>2026-01-28T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/porting-diig-to-android</id><content type="html" xml:base="https://polymonster.co.uk/blog/porting-diig-to-android"><![CDATA[<p>I recently decided to port my music app ‘diig’ to Android since I had some requests from friends and other potential Android users. The app was originally designed and built for iOS using all of my own code and no external frameworks. The whole thing took around 2 weeks to port, not full days of work, just a few hours here and there: 2 weeks to port an entire app, with 99% feature parity with iOS.</p>

<p>I began work while on the train travelling back to visit my parents and decided to ‘raw dog’ some relaxing ‘casual coding’ on the way. This part of the process was about as relaxing as Super Hans’ infamous ‘relaxing bit of crack’ line in Peep Show. The code base for <a href="https://github.com/polymonster/diig">diig</a> is already multi-platform since the backend uses my game engine <a href="https://github.com/polymonster/pmtech">pmtech</a>. The engine already supported a number of platforms, but not Android; however, since Android is built on Linux, I already had a lot of functionality I could reuse from my Linux backend. I have a well-tested OpenGL/WebGL/GLES rendering backend and, for the time being, FMOD for cross-platform audio. I knew that code-wise there were only a few gaps that needed filling in.</p>

<h2 id="premake">Premake</h2>

<p>Premake made getting started very easy. I think it is a very underrated and overlooked tool; I don’t know how CMake became the de facto gold standard for project generators. Premake makes project configuration easy with Lua scripts, which I find more flexible than CMake.</p>

<p>The Lua config setup allows you to specify project-level, compiler, and linker settings. With variables you can generically handle multiple platforms and configurations easily. My existing configs already had code paths for Win32, macOS, iOS and Linux with multiple rendering backends like Direct3D11, OpenGL and Metal. So a good portion of setting up Android was just plumbing through another combination of platform and config settings.</p>
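<p>As a rough sketch of what this looks like (a simplified premake5-style config, not my actual scripts; the define and link names here are made up for illustration), settings can be scoped per platform with filters:</p>

```lua
-- simplified premake5-style sketch; names are hypothetical
workspace "engine"
    configurations { "Debug", "Release" }
    platforms { "Win32", "macOS", "iOS", "Linux", "Android" }

project "app"
    kind "WindowedApp"
    language "C++"

    -- settings scoped to a single platform
    filter "platforms:Android"
        defines { "PLATFORM_ANDROID", "RENDERER_OPENGL" }
        links { "GLESv3", "log", "android" }

    filter "platforms:Win32"
        defines { "PLATFORM_WIN32", "RENDERER_D3D11" }
        links { "d3d11" }

    -- reset the filter so later settings apply everywhere
    filter {}
```

<p>Adding a new platform is then mostly a matter of adding another <code class="language-plaintext highlighter-rouge">filter</code> block and threading the new combination through any shared variables.</p>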

<p>Premake outputs CMake and Gradle build files. Any useful setting that is required in the Gradle files needs to be passed through from premake, so I had to add some new settings and propagate information set via premake into the Gradle or CMake files. This is a necessary step; it makes the setup a little more painful, but it serves you much better over time, because it ensures your projects can be generated the same way on other machines.</p>

<p>The Lua configs for multi-platform configuration were pretty easy to extend to add Android, although the code does feel a little spaghetti-like since platforms and features have been tacked on over a long period of time now. I decided to just add another bunch of stuff onto the Jenga tower and not get sidetracked refactoring. I would like to rewrite the scripts and could do a neater job, but it doesn’t really add much value to what I am trying to achieve here.</p>

<h2 id="android-sdk">Android SDK</h2>

<p>The real pain of the whole process was the Android platform itself; it’s just quite fiddly to get into a good state. The build system uses Gradle, CMake and ninja. There is the SDK, the NDK (Native Development Kit) for C/C++, and the JDK for Java. All of these dependencies have their own versions, and breaking changes happen frequently.</p>

<p>I thought I would first quickly plumb through a platform path for Android in my premake configs and get straight to fixing up compile errors. This was wishful thinking! I had to spend the best part of a 3-hour train journey fighting with all the various parts of the Android build system to massage the dependencies into a place where they worked together, as well as modifying and removing deprecated functionality, since I encountered breaking changes from a necessary SDK update.</p>

<p>I was doing this on a train with mobile hotspot internet on my phone and ended up with a 50GB+ install of various dependencies. This is one problem that will not go away. It works as of now, but as time goes on this process repeats as versions of the various dependencies update. It’s not just a case of updating an SDK and then fixing the deprecated parts: you have multiple components that you have to individually manage and ensure are compatible with each other. You suffer for a day to fix it all up once every 6 months, or whenever you come back to the project after a while away.</p>

<h2 id="android-studio">Android Studio</h2>

<p>Android Studio itself is not the nicest IDE or debugger, but it is handy to have a graphical debugger and not just debug from the command line, so I suffer through the issues. I do find the interface very noisy: there are a lot of pop-ups and dialogs, squiggles, indents, inline hints, and long error messages. As you work you can see the IntelliSense update and elements subtly shift and move in the UI. It infuriates me that the shortcut keys for debug stepping and continue are different to Visual Studio and VS Code, and that all of the shortcuts feel different and non-standard; this adds another layer of friction. I could spend time configuring it, but when you want to get stuck into some work in a short time period you don’t want to waste it configuring keyboard shortcuts.</p>

<p>Error messages in Android Studio also seem way more verbose and annoying than in other IDEs. For example, when you get a C++ compiler error, the end of the message has tons of unnecessary verbal spew about Java exceptions, and you have to scroll back through the log to find the real errors. This makes things especially stressful when you get confusing build errors you are unfamiliar with. Logcat is also particularly stressful: it outputs so much information that you have to sift through to find your own errors buried under a mountain of irrelevant info, and if you filter it you worry you are missing some critical extra detail.</p>

<h2 id="entry-point--program-structure">Entry Point / Program Structure</h2>

<p>For the entry point and core interaction with the SDK, Android uses Java or Kotlin code. Any C++ code needs to be compiled into a shared library. You can use an NDK-only approach, but I am familiar with the Java setup having used it in the past, and it does make some things easier, since the NDK and the C versions of the API are badly documented and there are more Java examples.</p>

<p>Android requires an Activity which represents the application flow. You implement methods such as <code class="language-plaintext highlighter-rouge">onCreate</code> or <code class="language-plaintext highlighter-rouge">onPause</code> and <code class="language-plaintext highlighter-rouge">onResume</code>. These are invoked by the OS when you start your app. I also implement a wrapper of the <code class="language-plaintext highlighter-rouge">SurfaceView</code> that handles the creation of an EGL context and OpenGL surface for rendering.</p>

<p>The engine consists of two C++ static libraries which are linked into the <code class="language-plaintext highlighter-rouge">diig.so</code> shared library. Android is a bit more awkward than other platforms because ordinarily you would build an executable that links the two C++ libs; in this case the “executable” is Java and we load the C++ code dynamically at launch.</p>

<h2 id="compiler-errors">Compiler Errors</h2>

<p>The next step on the journey to porting is to work through any compiler errors. This part of the process is actually where I start to feel more comfortable: mostly this is inside C++ files, and most of the errors are expected or in my own code, which gives me agency to fix them.</p>

<p>I have had to do a lot of porting for work so this part comes quite naturally. First the project will need tweaking a bit, making sure include paths are set and all the right files have been added to compilation. Then I tend to <code class="language-plaintext highlighter-rouge">ifdef</code> out problematic code that I don’t currently need and look at it later, to focus on a small subset of the code base. I try to use <code class="language-plaintext highlighter-rouge">ifdefs</code> for platform-specific functionality sparingly and instead split things into a file per platform. Some <code class="language-plaintext highlighter-rouge">ifdefs</code> go into shared code like OpenGL or the shared POSIX implementation. But once the project was set up, I didn’t have a great deal of legitimate compiler issues because of how much existing code was reused.</p>

<h2 id="linker-errors">Linker Errors</h2>

<p>Linker errors will occur for missing symbols that do not yet have an implementation for the Android platform. Most of these were to be expected and are an easy fix: I just make a function stub for each missing symbol, that is, an empty function that returns a default value where applicable.</p>

<p>There were a few tricky linker issues to solve involving the audio system. FMOD has its own native libs and they need to be copied into a subdirectory of the Android studio project called <code class="language-plaintext highlighter-rouge">jniLibs</code>. To do this I added a copy step in premake, which copies the files during premake project generation. FMOD also requires some calls to <code class="language-plaintext highlighter-rouge">loadLibs</code> to load the C++ code and a call to <code class="language-plaintext highlighter-rouge">FMOD_Android_JNI_Init</code>. This took me a little time to figure out since I had cryptic error messages, but persistence always prevails and I got there in the end.</p>

<h2 id="development">Development</h2>

<p>Getting to this point took a good few days, but this was the fun part, the place I wanted to be. With the code compiling and running, it was time to slowly, one function at a time, implement the missing functionality behind the stub functions. I used these small <a href="https://github.com/polymonster/pmtech/tree/master/examples/code">examples</a> as unit tests to isolate functionality.</p>

<p>The first step was to get the <code class="language-plaintext highlighter-rouge">empty_project</code> sample working, which just logs something to the console; this required implementing the logging macro, since printf does not display in logcat. After that it was a straightforward process: render the <code class="language-plaintext highlighter-rouge">basic_triangle</code> to make sure OpenGL was working OK, move on to the <code class="language-plaintext highlighter-rouge">imgui_example</code> to make sure I could use the UI, then <code class="language-plaintext highlighter-rouge">play_sound</code> to test FMOD, which is important since this is a music app, and finally <code class="language-plaintext highlighter-rouge">input_example</code> to hook in the input and touch events. Once these samples were working I had all of the core functionality for diig and the app should “just work”.</p>

<p>In total I ended up adding 871 lines of C++ code for the <code class="language-plaintext highlighter-rouge">os</code> module, 151 lines of C++ for Android filesystem-related code, and 487 lines of Java code for the core activity. This was the bulk of the entire backend. Modifications were required in a few places for platform-specific quirks in FMOD and OpenGL. There were also 100 or so lines of Lua code for premake.</p>

<h2 id="jni">JNI</h2>

<p>To interoperate between Java and C++ code, the Java Native Interface (JNI) is used. I use JNI to pass information from the Java side, such as touch and keyboard (OSK) events, through to the C++ code that the rest of the app calls. I also have to interop in both directions. Calling C from Java is quite simple: you just need to use <code class="language-plaintext highlighter-rouge">public static native</code>. Going the other direction is a little bit more work:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">os_clear_clipboard_string</span><span class="p">()</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">env</span> <span class="o">=</span> <span class="n">get_jni_env</span><span class="p">();</span>
    <span class="k">if</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">jmethodID</span> <span class="n">method</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">GetMethodID</span><span class="p">(</span><span class="n">s_android_context</span><span class="p">.</span><span class="n">m_activity_class</span><span class="p">,</span> <span class="s">"clearClipboardString"</span><span class="p">,</span> <span class="s">"()V"</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">CallVoidMethod</span><span class="p">(</span><span class="n">s_android_context</span><span class="p">.</span><span class="n">m_activity_object</span><span class="p">,</span> <span class="n">method</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When calling Java from C++ you have to look the method up by name, but also provide a signature string for the types. Then there is a host of functions you can call, such as <code class="language-plaintext highlighter-rouge">CallVoidMethod</code>, <code class="language-plaintext highlighter-rouge">CallBooleanMethod</code> and so on for each return type. It’s pretty simple, but also easy to make a mistake and get the signature wrong or use the wrong typed call. It doesn’t take much effort, but the “plumbing” adds up, so I try to keep the number of these wrapper functions to a minimum.</p>

<h2 id="a-loose-end">A Loose End</h2>

<p>There is one loose end still to clean up, even after writing this post. It annoyed me when I encountered it, but it is just the way things are: when the app backgrounds on Android, upon return the whole app boots again from the start. I had to do this because on background and return the EGL context (OpenGL) is lost, and that means all GPU resources need to be recreated. iOS does not have this behaviour; the OS magically sorts it out for you. I was being lazy and just didn’t get round to thinking about a strategy for it yet. I have had to do this kind of thing before for my job, and I really cannot be bothered with this sort of menial work imposing itself on a project which I wanted to be light and fun; it all too soon starts feeling like a job. I suppose, depending on how far I want to take it, I will have to get round to fixing this, but for the time being I chose to ignore it.</p>

<h2 id="google-play-store">Google Play Store</h2>

<p>The final boss was automation and delivery to the Google Play store. First I had to pay £20 to actually set up an account. The one-off fee is certainly better than Apple’s yearly £80 developer fee, but my bank blocked the transaction and I had to jump through some additional hoops to make sure I didn’t accidentally pay it twice. Then you have to do identity verification, where I had to send my driving license, passport photo, a bank statement, and the blood of my first unborn child.</p>

<p>I went about setting up a GitHub Action to automate publishing. This <a href="https://github.com/r0adkll/upload-google-play">Action</a> is helpful for handling Google Play. I hooked it up, ran it, and the upload failed. I persisted until I discovered my account was not verified yet, so I had to wait a few days for that to happen. After verification I tried again: still failing to upload. It turns out you need to first push a build manually from the Google Play console; do that and run a build… still fails. At this point the error was about the JSON key not being valid for upload.</p>

<p>This was the most frustrating part of the entire process: the Google documentation was out of date. The way you enable auth or generate a token for Google Play upload had changed and the documentation had not. The error message was not very clear, so I tried many times adding permissions to various accounts. I tried using Copilot and it gaslit me time and time again. I persisted until I finally found this <a href="https://help.radio.co/en/articles/6232140-how-to-get-your-google-play-json-key">article</a> that described the new steps necessary to generate a JSON key. And success: cloud-based build and release with the push of a tag!</p>

<p>The automation to Google Play has been rock solid and reliable since, more so than the equivalent for iOS. At some point Apple started enforcing that build machines be registered and linked to your developer account to be able to push to TestFlight. This means you can’t use a cloud GitHub Actions runner; instead I need to use my own machine as a self-hosted runner. This adds extra admin and unnecessary friction to the whole process, whereas Android just looks after itself.</p>

<h2 id="onwards">Onwards</h2>

<p>Porting is a game of persistence. It can be a slog at times, but if you keep persisting, fixing the errors one by one, starting small and building outward, you always get there in the end.</p>

<p>It can be a bit of a rollercoaster: the dopamine hits hard when some existing code “just works” or things go smoothly because of well-planned abstractions set in place years ago, and there is a feeling of relief when you manage to work around some obtuse error from dependencies you have never heard of… just to be hit by crushing anxiety at the new error that appears in its place.</p>

<p>After the initial friction, the setup has been incredibly fun to use, and all of the extra effort required to set up and configure something to be not just multi-platform but seamlessly multi-platform means I can dip in and out and do little bits of work flexibly. I can work on my PC and target Android, with a dual monitor and more desk space, or on macOS and target Android or iOS on my laptop, on the go or more casually.</p>

<p>The diig app is available in closed beta for Android or iOS. If you would like to try it out, please contact me for an invite.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[How I ported the diig iOS music app to Android in under 2 weeks using the pmtech game engine's Linux backend, Premake, and OpenGL ES, achieving 99% feature parity.]]></summary></entry><entry><title type="html">Borrow checker says “No”! An error that scares me every single time!</title><link href="https://polymonster.co.uk/blog/borow-checker-says-no" rel="alternate" type="text/html" title="Borrow checker says “No”! An error that scares me every single time!" /><published>2025-10-31T00:00:00+00:00</published><updated>2025-10-31T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/borow-checker-says-no</id><content type="html" xml:base="https://polymonster.co.uk/blog/borow-checker-says-no"><![CDATA[<p>It’s Halloween, and I have just been caught out by a spooky borrow checker error. It feels like the single most time-consuming issue to fix and it always seems to catch me unaware. The issue in particular is “cannot borrow x immutably as it is already borrowed mutably”; it manifests in different ways under different circumstances, but I find myself hitting it often when refactoring. It happened again recently, so I did some investigating and thought I would discuss it in more detail.</p>

<p>The issue last hit me when I was refactoring some code in my graphics engine <a href="https://github.com/polymonster/hotline">hotline</a>. I have been creating some content on YouTube and, after a bit of a slog to fix the issue, I recorded a video going through the scenario of how it occurred and some patterns I have adopted in the past to get around it. You can check out the video if you are that way inclined; the rest of this post will mostly echo what is in the video, but it might be a bit easier to follow the code snippets and descriptions in text.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/5X4sftCRac0?si=CY8AfYDXrs_8WlZm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>I have a generic graphics API, which consists of traits, called <a href="https://github.com/polymonster/hotline/blob/master/src/gfx.rs">gfx</a>. This is there to allow different platform backends to implement the trait; currently I have a fully implemented Direct3D12 backend and I recently began a macOS port using Metal.</p>

<p>The gfx backend wraps underlying graphics API primitives; in this case we are mostly concerned with <code class="language-plaintext highlighter-rouge">CmdBuf</code>, which is a command buffer. Command buffers are used to submit commands to the GPU; they do things like <code class="language-plaintext highlighter-rouge">draw_indexed_instanced</code> or <code class="language-plaintext highlighter-rouge">set_render_pipeline</code>, amongst other things. For the purposes of this blog post, what the command buffer does is not really that important, just that it does <code class="language-plaintext highlighter-rouge">do_something</code>, which at the starting point, when the code was working, is a trait method that takes an immutable self and another immutable parameter, i.e. <code class="language-plaintext highlighter-rouge">fn do_something(&amp;self, param: &amp;Param)</code>.</p>

<p>In the rest of the code base I have a higher level rendering system called <code class="language-plaintext highlighter-rouge">pmfx</code>. This is graphics engine code that is not platform specific but implements shared functionality. So where <code class="language-plaintext highlighter-rouge">gfx</code> is a low level abstraction layer, <code class="language-plaintext highlighter-rouge">pmfx</code> implements the concept of a <code class="language-plaintext highlighter-rouge">View</code>: a view of a scene that we can render from. A <code class="language-plaintext highlighter-rouge">View</code> has a camera that looks at the scene and is passed to a render function, which can build a command buffer to render the scene from that camera’s perspective. The engine is designed to be multithreaded and render functions are dispatched through <code class="language-plaintext highlighter-rouge">bevy_ecs</code> systems, so a view gets passed into a render system wrapped in an <code class="language-plaintext highlighter-rouge">Arc&lt;Mutex&lt;View&gt;&gt;</code>.</p>

<p>I made a small cutdown example of this code to be able to demonstrate the problem I encounter, so let’s start with the initial working version:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nb">Arc</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="n">Mutex</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">Cmd</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">View</span> <span class="p">{</span>
    <span class="n">cmd</span><span class="p">:</span> <span class="n">Cmd</span><span class="p">,</span>
    <span class="n">param</span><span class="p">:</span> <span class="n">Param</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">Param</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Cmd</span>
<span class="p">{</span>
    <span class="k">fn</span> <span class="nf">do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">param</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Param</span><span class="p">)</span> <span class="p">{</span>
        <span class="nd">unimplemented!</span><span class="p">(</span><span class="s">""</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">get_view</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Arc</span><span class="o">&lt;</span><span class="n">Mutex</span><span class="o">&lt;</span><span class="n">View</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="nd">unimplemented!</span><span class="p">();</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="n">view</span><span class="py">.cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I tried to simplify it as much as possible, so these snippets should compile if you copy and paste them. They won’t run, thanks to the <code class="language-plaintext highlighter-rouge">unimplemented!</code> macro (which I absolutely love using, it is so handy!), but we only care about the borrow checker anyway.</p>

<p>All we really need to think about is that a <code class="language-plaintext highlighter-rouge">Cmd</code> can <code class="language-plaintext highlighter-rouge">do_something</code> and gets passed a <code class="language-plaintext highlighter-rouge">Param</code>, which is also contained within the view. Coming from a C/C++ background, my personal preference landed on procedural C-style code with context passing, so I tend to group things together into a single struct. That made sense to me here: I wanted to group everything inside <code class="language-plaintext highlighter-rouge">View</code>, and we fetch the view from elsewhere in the engine.</p>

<p>So the code in the snippet compiles fine and I was working with this setup for some time. When I began work on macOS it turned out that the <code class="language-plaintext highlighter-rouge">do_something</code> method needed to mutate some internal state in the command buffer, in order to make the Metal graphics API behave similarly to Direct3D12. This is common graphics API plumbing.</p>

<p>The specific example in this case was that in Direct3D we call a function <code class="language-plaintext highlighter-rouge">bind_index_buffer</code> to bind an index buffer before we make a call to <code class="language-plaintext highlighter-rouge">draw_indexed</code>, but in Metal there is no equivalent to bind an index buffer. Instead you pass a pointer to your index buffer when calling the equivalent draw indexed. So to fix this, when we call <code class="language-plaintext highlighter-rouge">bind_index_buffer</code> we can store some extra state in the command buffer so we can pass it in the later call to <code class="language-plaintext highlighter-rouge">draw_indexed</code>.</p>
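The idea of stashing state at bind time and consuming it at draw time can be sketched roughly like this; note that <code>IndexBufferState</code> and its handle fields are invented for illustration and this is not hotline’s actual API:

```rust
// Hypothetical sketch: the command buffer records index-buffer state when
// bind is called, so a Metal-style backend can supply it at draw time.

#[derive(Clone, Copy, PartialEq, Debug)]
struct IndexBufferState {
    buffer_id: u64, // stand-in for a handle/pointer to the index buffer
    offset: u64,
}

#[derive(Default)]
struct CmdBuf {
    bound_index_buffer: Option<IndexBufferState>,
}

impl CmdBuf {
    // bind only records state; nothing is encoded yet. This is the write
    // that forces the method to take &mut self.
    fn bind_index_buffer(&mut self, buffer_id: u64, offset: u64) {
        self.bound_index_buffer = Some(IndexBufferState { buffer_id, offset });
    }

    // draw consumes the stashed state; a real Metal backend would pass it
    // along with the draw call here. We return it so the flow is observable.
    fn draw_indexed(&mut self, index_count: u32) -> (u64, u64, u32) {
        let ib = self.bound_index_buffer.expect("no index buffer bound");
        (ib.buffer_id, ib.offset, index_count)
    }
}
```

The Direct3D12 backend can ignore the stashed state entirely and forward both calls straight to the API.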

<p>In hindsight, any method on the command buffer trait that sets state or writes into the command buffer should take a <code class="language-plaintext highlighter-rouge">&amp;mut self</code>, because it is mutating the command buffer after all. I originally didn’t do this because I am calling through to methods on <code class="language-plaintext highlighter-rouge">ID3D12CommandList</code>, which is unsafe code and does not require any mutable references.</p>

<p>In our simplified example, in order to store state, <code class="language-plaintext highlighter-rouge">do_something</code> now needs to change to take a mutable self: <code class="language-plaintext highlighter-rouge">do_something(&amp;mut self, param: &amp;Param)</code>. It should be noted that <code class="language-plaintext highlighter-rouge">view</code> itself was already <code class="language-plaintext highlighter-rouge">mut</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="n">Cmd</span>
<span class="p">{</span>
    <span class="k">fn</span> <span class="nf">do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">param</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Param</span><span class="p">)</span> <span class="p">{</span>
        <span class="nd">unimplemented!</span><span class="p">(</span><span class="s">""</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="n">view</span><span class="py">.cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The borrow checker now kicks in… my heart sinks. In the real code base it was not just a single call site that needed modifying; I had hundreds of places where this error was happening. I made the decision there and then to make any methods that write to the command buffer take a mutable self and make the mutability explicit.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0502]: cannot borrow `view` as immutable because it is also borrowed as mutable
  --&gt; src/main.rs:30:28
   |
30 |     view.cmd.do_something(&amp;view.param);
   |     ----     ------------  ^^^^ immutable borrow occurs here
   |     |        |
   |     |        mutable borrow later used by call
   |     mutable borrow occurs here

For more information about this error, try `rustc --explain E0502`.
error: could not compile due to 1 previous error
</code></pre></div></div>
<p>This is not the first time I have encountered this problem and I doubt it will be the last. There are a number of ways to resolve it and they aren’t too complicated. The frustrating thing is that it always seems to occur when you are in the middle of something else, not just when you decide to refactor, so you end up with a mountain of errors to solve before you can get back to the original task. I suppose you could call it a symptom of bad design or lack of experience, but when writing code things inevitably change and bend with new requirements; Rust throws these unexpected issues up for me more often than C does, and often the required refactor takes more effort as well. But that is the cost you pay: hopefully more upfront effort to get past the borrow checker means fewer nasty debugging sessions later. So let’s look at some patterns to fix the issue!</p>

<h3 id="take">Take</h3>

<p>The one I actually went for in this case was using <code class="language-plaintext highlighter-rouge">std::mem::take</code>. We take the <code class="language-plaintext highlighter-rouge">CmdBuf</code> out of the view so we no longer need to borrow the view to use <code class="language-plaintext highlighter-rouge">cmd</code>, and then when finished we return the cmd to the view. It is important to note that <code class="language-plaintext highlighter-rouge">CmdBuf</code> needs to derive <code class="language-plaintext highlighter-rouge">Default</code> in order for this to work: when we take the <code class="language-plaintext highlighter-rouge">cmd</code>, <code class="language-plaintext highlighter-rouge">view.cmd</code> will become <code class="language-plaintext highlighter-rouge">CmdBuf::default()</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Default)]</span>
<span class="k">struct</span> <span class="n">Cmd</span><span class="p">;</span>

<span class="c1">// ..</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// take cmd out of view</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">cmd</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">take</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">view</span><span class="py">.cmd</span><span class="p">);</span>

    <span class="c1">// the immutable and mutable references are now split</span>
    <span class="n">cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>

    <span class="c1">// return the cmd into view</span>
    <span class="n">view</span><span class="py">.cmd</span> <span class="o">=</span> <span class="n">cmd</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This approach was the simplest I could think of at the time because any existing code using <code class="language-plaintext highlighter-rouge">view.cmd</code> doesn’t need updating; everything stays the same and we just separate the references. In this case it was easy to derive <code class="language-plaintext highlighter-rouge">Default</code> for <code class="language-plaintext highlighter-rouge">CmdBuf</code>. You do need to remember to set the <code class="language-plaintext highlighter-rouge">cmd</code> back on <code class="language-plaintext highlighter-rouge">view</code>, which is a pitfall that could cause unexpected behaviour if you forgot.</p>
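One way to make the restore harder to forget (my own suggestion, not something from the original code base; the `with_cmd` helper and field types are invented) is a small wrapper that takes the field, runs a closure, and always puts the value back:

```rust
// A helper that temporarily moves `cmd` out of the struct, calls a closure
// with the split borrows, and restores it afterwards. Note: if the closure
// panics, the restore is skipped and `cmd` is left as the default.

#[derive(Default, PartialEq, Debug)]
struct Cmd(u32);

#[derive(Default)]
struct View {
    cmd: Cmd,
    param: u32,
}

impl View {
    fn with_cmd<R>(&mut self, f: impl FnOnce(&mut Cmd, &Self) -> R) -> R {
        // take cmd out, leaving Cmd::default() behind
        let mut cmd = std::mem::take(&mut self.cmd);
        // the mutable borrow of cmd and the shared borrow of self are now split
        let result = f(&mut cmd, self);
        // restore cmd so call sites can't forget to
        self.cmd = cmd;
        result
    }
}
```

Call sites then become `view.with_cmd(|cmd, view| cmd.do_something(&view.param))`, and the take/restore pair lives in one place.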

<h3 id="edit-update">EDIT: Update</h3>

<p>I posted this article to Reddit and people kindly pointed out that the borrow can’t be split into individual fields because I was borrowing through a <code class="language-plaintext highlighter-rouge">MutexGuard</code> of <code class="language-plaintext highlighter-rouge">view</code>, so access to the fields was going through the <code class="language-plaintext highlighter-rouge">DerefMut</code> trait. This single line resolves my problem with no need for any other changes.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// now we have a mutable reference to view and not a MutexGuard</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
</code></pre></div></div>

<p>I can make excuses, but ultimately I should have checked in more detail what <code class="language-plaintext highlighter-rouge">view</code> actually was a reference to. In my defence, this code was inside an attribute macro and rust-analyzer wasn’t giving me any type hints, which in Rust I find very useful, if not necessary. Additionally the <code class="language-plaintext highlighter-rouge">DerefMut</code> trait abstracts this behaviour, so to me it just looked like a reference to a view. I do feel foolish about this, but hopefully the sentiment of this article still rings true: a bad decision in code of the past popped up at an inopportune moment and clouded my judgement on possible solutions. The other ideas in this post have still been useful in other scenarios, but an important step is to always double check that what you are working with is what you think you are working with, and not rush into any further bad decisions.</p>
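Putting the fix together with the cut-down types from earlier gives a self-contained, runnable sketch (the `calls` counter and `locked_increment` wrapper are added here purely to make the effect observable):

```rust
// Once we hold a plain `&mut View` rather than a MutexGuard, the compiler
// can split the borrow into disjoint fields: `cmd` mutably, `param` shared.
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Cmd {
    calls: u32,
}

#[derive(Default)]
struct Param;

#[derive(Default)]
struct View {
    cmd: Cmd,
    param: Param,
}

impl Cmd {
    fn do_something(&mut self, _param: &Param) {
        self.calls += 1;
    }
}

fn locked_increment() -> u32 {
    let v = Arc::new(Mutex::new(View::default()));

    // reborrow through the guard; the temporary MutexGuard's lifetime is
    // extended to the end of the enclosing block by the let binding
    let view = &mut *v.lock().unwrap();

    // disjoint field borrows are now allowed, so this compiles
    view.cmd.do_something(&view.param);
    view.cmd.calls
}
```

The key point is that field-splitting works on real references but not through `Deref`/`DerefMut` calls, which the compiler treats as opaque.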

<h3 id="clone">Clone</h3>

<p>If you can’t easily derive <code class="language-plaintext highlighter-rouge">Default</code> on a struct, there are some other options. If the struct is clonable, or you can easily derive <code class="language-plaintext highlighter-rouge">Clone</code>, you can clone to achieve a similar effect.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Clone)]</span>
<span class="k">struct</span> <span class="n">Cmd</span><span class="p">;</span>

<span class="c1">// ..</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// clone cmd</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">cmd</span> <span class="o">=</span> <span class="n">view</span><span class="py">.cmd</span><span class="nf">.clone</span><span class="p">();</span>

    <span class="c1">// the immutable and mutable references are now split</span>
    <span class="n">cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Cloning might be a heavier operation than take depending on the circumstances, but this method has the same benefit as the take version: unaffected code using <code class="language-plaintext highlighter-rouge">cmd</code> elsewhere doesn’t need to be changed. One caveat: in the snippet above nothing writes the clone back, so if <code class="language-plaintext highlighter-rouge">do_something</code> mutates state you need later, remember to assign the clone back to <code class="language-plaintext highlighter-rouge">view.cmd</code> when finished.</p>

<h3 id="refcell">RefCell</h3>

<p>Another approach would be to use <code class="language-plaintext highlighter-rouge">RefCell</code>; this allows for interior mutability, and again we do not need to worry about <code class="language-plaintext highlighter-rouge">Default</code> or <code class="language-plaintext highlighter-rouge">Clone</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cell</span><span class="p">::</span><span class="n">RefCell</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">Cmd</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">View</span> <span class="p">{</span>
    <span class="n">cmd</span><span class="p">:</span> <span class="n">RefCell</span><span class="o">&lt;</span><span class="n">Cmd</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">param</span><span class="p">:</span> <span class="n">Param</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// borrow ref cell</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">cmd</span> <span class="o">=</span> <span class="n">view</span><span class="py">.cmd</span><span class="nf">.borrow_mut</span><span class="p">();</span>

    <span class="c1">// the immutable and mutable references are now split</span>
    <span class="n">cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>
<span class="p">}</span>

</code></pre></div></div>

<h3 id="option-takeswap">Option (Take/Swap)</h3>


<p>There are more options; quite literally, <code class="language-plaintext highlighter-rouge">Option</code> can help here. If we make <code class="language-plaintext highlighter-rouge">cmd</code> an <code class="language-plaintext highlighter-rouge">Option&lt;CmdBuf&gt;</code> then <code class="language-plaintext highlighter-rouge">None</code> serves as the default and we can use the <code class="language-plaintext highlighter-rouge">std::mem::take</code> approach. Alternatively we can use <code class="language-plaintext highlighter-rouge">std::mem::swap</code> and swap with <code class="language-plaintext highlighter-rouge">None</code>; swapping works much like take, except we supply the replacement value explicitly.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Cmd</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">View</span> <span class="p">{</span>
    <span class="n">cmd</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">Cmd</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">param</span><span class="p">:</span> <span class="n">Param</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// take cmd out of view, leaving None behind</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">cmd</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">take</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">view</span><span class="py">.cmd</span><span class="p">);</span>

    <span class="c1">// the immutable and mutable references are now split</span>
    <span class="n">cmd</span><span class="nf">.as_mut</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>

    <span class="c1">// return the cmd to view</span>
    <span class="n">view</span><span class="py">.cmd</span> <span class="o">=</span> <span class="n">cmd</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">Option</code> approach also requires more effort, as we now need to unwrap the option, and update any code that ever used <code class="language-plaintext highlighter-rouge">view.cmd</code> to do the same. Not ideal, but it gets around the need for <code class="language-plaintext highlighter-rouge">Default</code> or <code class="language-plaintext highlighter-rouge">Clone</code>, and if your type is already optional then this will fit easily.</p>
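The swap flavour can be sketched as follows; the `demo` wrapper and `Cmd` payload are illustrative, but the effect is identical to `take` on an `Option`, just with the replacement value spelled out:

```rust
// Swapping an Option out with an explicit None, mutating the owned value,
// then swapping it back in when finished.

#[derive(Debug, PartialEq)]
struct Cmd(u32);

fn demo() -> Option<Cmd> {
    let mut slot = Some(Cmd(1));

    // swap the Option out, leaving None in its place
    let mut cmd = None;
    std::mem::swap(&mut cmd, &mut slot);

    // slot is now None; cmd owns the value and can be mutated freely
    assert!(slot.is_none());
    if let Some(c) = &mut cmd {
        c.0 += 1;
    }

    // swap it back when done
    std::mem::swap(&mut cmd, &mut slot);
    slot
}
```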

<h3 id="interior-mutability">Interior Mutability</h3>

<p>There is one final approach that could save a lot of time, and that would be to not change the <code class="language-plaintext highlighter-rouge">do_something</code> function at all in the first place. That is to keep it as <code class="language-plaintext highlighter-rouge">do_something(&amp;self, param: &amp;Param)</code>. So how do we mutate the interior state without requiring the self to be mutable?</p>

<p>This can be done with <code class="language-plaintext highlighter-rouge">RefCell</code> in single-threaded code or <code class="language-plaintext highlighter-rouge">RwLock</code> in multithreaded code. Since we already looked at <code class="language-plaintext highlighter-rouge">RefCell</code>, I will do an example of <code class="language-plaintext highlighter-rouge">RwLock</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::{</span><span class="nb">Arc</span><span class="p">,</span> <span class="n">RwLock</span><span class="p">};</span>

<span class="k">struct</span> <span class="n">Cmd</span> <span class="p">{</span>
    <span class="n">interior</span><span class="p">:</span> <span class="nb">Arc</span><span class="o">&lt;</span><span class="n">RwLock</span><span class="o">&lt;</span><span class="nb">u32</span><span class="o">&gt;&gt;</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Cmd</span>
<span class="p">{</span>
    <span class="k">fn</span> <span class="nf">do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">param</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Param</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// we now mutate the interior, locking and writing in a thread-safe way</span>
        <span class="k">let</span> <span class="n">interior</span> <span class="o">=</span> <span class="k">self</span><span class="py">.interior</span><span class="nf">.try_write</span><span class="p">()</span><span class="nf">.and_then</span><span class="p">(|</span><span class="k">mut</span> <span class="n">interior</span><span class="p">|</span> <span class="p">{</span>
            <span class="o">*</span><span class="n">interior</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="nf">Ok</span><span class="p">(())</span>
        <span class="p">});</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="nf">get_view</span><span class="p">();</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// code at the call site can stay the same as the original</span>
    <span class="n">view</span><span class="py">.cmd</span><span class="nf">.do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.param</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I decided to make the mutability explicit in the trait based on how command buffers are used in the engine; in other places I have taken other approaches, favouring interior mutability. In this case a view can be dispatched in parallel with other views, but the engine is designed with one thread per view, so no work happens on a single view from multiple threads at the same time. Command buffers are submitted in a queue, in order, and dispatched on the GPU.</p>

<p>Here it made sense to me to avoid the locking overhead of interior mutability every time we call a method on a <code class="language-plaintext highlighter-rouge">CmdBuf</code>, and it works with the engine’s design. We lock a view at the start of a render thread, fill it with commands and then hand it back to the graphics engine for submission to the GPU. The usage is explicit; we just needed to appease the borrow checker!</p>

<p>I hope you enjoyed this article. Please check out my <a href="https://www.youtube.com/@polymonster">YouTube channel</a> for more videos, or more articles on my blog; let me know what you think, and if you have any other strategies or approaches I would love to hear about them. I would also like to hear about compiler and borrow checker errors you find particularly time consuming or frustrating to deal with.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[A deep dive into Rust’s “cannot borrow as immutable because it is also borrowed as mutable” error, with patterns and strategies to resolve it during refactoring.]]></summary></entry><entry><title type="html">diig - A music discovery app for record diggers</title><link href="https://polymonster.co.uk/blog/diig" rel="alternate" type="text/html" title="diig - A music discovery app for record diggers" /><published>2025-10-12T00:00:00+00:00</published><updated>2025-10-12T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/diig</id><content type="html" xml:base="https://polymonster.co.uk/blog/diig"><![CDATA[<p>diig is a music discovery app and the beginnings of a music platform that I started working on a few years ago. In that time I have been using the app myself, and so have a few friends, but I haven’t really announced much about it, so I thought I would get some words down about the project. The name diig comes from the term crate digger, which is given to record collectors who dig through vast quantities of records to find hidden gems.</p>

<p>The idea came from a frustration with my user experience of online record stores. Online record shops have audio snippets of the records they sell so you can browse and listen before you buy; obviously if you want to buy music it is usually best to know what it sounds like first (although I have been known to buy blind, especially if a record has a really cool sleeve or artwork!). The problem with these websites is that their music players are not perfect, consistency across different stores is variable, the desktop versions usually perform much better than the mobile ones, and the general UX of listening to snippets while browsing just never felt how I wanted it to.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/orp7-q3D72I?si=REl_WjaL8ga7iE59" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>I am mainly interested in buying physical vinyl records. The main goal of digging is to go through a lot of music as quickly as possible to find obscure things that your friends might not know. Buying in physical shops is nice, especially in a store that lets you listen before you buy; the responsiveness of dropping a needle into a groove and skipping through tracks feels really good. But in the modern world, with so much music being released all the time, online shopping is still a crucial part of record collecting for me, and I buy some things online that I won’t find anywhere else. The two can also complement one another: it’s nice to have an idea of what you like in advance of visiting a physical store.</p>

<p>For the first part of the project my focus has been on a mobile app, and I wanted to optimise that experience. So what constitutes feeling nice, and what am I trying to optimise for here? I found online record store players to be quite laggy, with a good deal of latency between pressing play on a track and actually hearing the audio. We are talking fine margins, but if you want to listen through something like 400 snippets of audio in 10 minutes, the latency adds up. I also found the UI just not great for mobile; having to click instead of swipe doesn’t help the experience and makes it feel clunky.</p>

<p>One place that I found really nice in terms of UX for browsing music snippets was Instagram. Record labels would put up new releases and you could swipe right to go through the individual tracks. The problem is that the Instagram algorithm pollutes everything and you can’t curate a purely music-only feed. So with this idea in mind, I was sure I could implement something similar and provide myself a niche app tailored perfectly to my use case.</p>

<p>Having worked on game engines, games and low level high performance systems I knew I had the skills to make the app. I also had the added boost and insights from the previous company I worked at where we made the live action branching narrative game Erica, which required low latency video and audio playback. Here I was dealing with streaming, buffering, and decoding audio and high definition video… I thought to myself just having to play some compressed mp3 snippets is going to be easy!</p>

<p>It was in fact easy; the iOS app itself was up and running in a couple of weekends. I also “rawdogged” pretty much all of this code, no LLM and no Copilot. I had a head start because I used my game engine <a href="https://github.com/polymonster/pmtech">pmtech</a> to do all the graphics, OS and low-level stuff. The app is mostly C-style C++ with some Objective-C for iOS, and I built the UI in ImGui. To get data into the app I am scraping info from my favourite record shops. The <a href="https://github.com/polymonster/diig/tree/main/scrape">scrapers</a> are written in Python and use simple ad hoc parsing code which extracts information about releases, mp3 and image links into a simple unified schema. The releases are uploaded to a Firebase database where they can be fetched from the app, and you can log in as a user to store your own likes and sync them across different devices. Scrapers run nightly on a <a href="https://github.com/polymonster/diig/actions">GitHub action</a>, and once a release has been populated on the initial scrape, its availability is the only thing that needs updating: whether it is available for preorder, in stock, or out of stock. The scrapers also track position information so that you can view things in a chart format, as the record stores usually have a chart for each genre or category. I have an automated GitHub action which can push updates of the app to <a href="https://github.com/polymonster/diig/actions/workflows/release_testflight.yml">iOS via TestFlight</a>, but the app doesn’t need updating often and the data it pulls is all stored and updated in the cloud.</p>
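<p>To make that concrete, here is a minimal sketch of the kind of ad hoc scraper parsing described above. This is not the actual diig code; the HTML attributes and schema field names are purely illustrative.</p>

```python
import re

def scrape_releases(html: str) -> list[dict]:
    """Extract releases from a store page into a unified schema.

    Each store gets its own small, ad hoc pattern like this one; the
    attribute names below are made up for illustration.
    """
    pattern = re.compile(
        r'<div class="release".*?data-title="(?P<title>[^"]+)"'
        r'.*?data-mp3="(?P<mp3>[^"]+)"'
        r'.*?data-image="(?P<image>[^"]+)"',
        re.DOTALL,
    )
    releases = []
    for m in pattern.finditer(html):
        releases.append({
            "title": m.group("title"),
            "mp3": m.group("mp3"),        # snippet audio link
            "artwork": m.group("image"),  # sleeve image link
            "availability": "in_stock",   # refreshed on later scrapes
        })
    return releases

page = ('<div class="release" data-title="Test EP" '
        'data-mp3="https://example.com/t.mp3" '
        'data-image="https://example.com/t.jpg"></div>')
print(scrape_releases(page))
```

<p>Once every store’s scraper emits this one schema, the upload and the app only ever have to deal with a single format, however messy the individual store pages are.</p>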

<p>And that is about it: the main app and scraper ecosystem has been up and running for a few years, and I have been using diig to help me browse for and discover new music. I recently added a new scraper to the project and put some videos on <a href="https://www.youtube.com/playlist?list=PLReR5EQ5ED7Oca7bp3Gv9S3vb4lYcDKZc">YouTube</a> covering that process in more detail. I plan on continuing work on this project and now have some ideas for more components of the diig platform: I would like to add Android support and also a web front end that can provide a different user experience. All of this is currently in closed beta; if you’re interested in trying it out then please contact me. If you want to contribute, the code is available on <a href="https://github.com/polymonster/diig">GitHub</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[An introduction to diig, a music discovery iOS app built from scratch for vinyl record collectors to quickly browse and preview audio from online record stores.]]></summary></entry><entry><title type="html">Maintaining CI is a pain in the…</title><link href="https://polymonster.co.uk/blog/ci-pain" rel="alternate" type="text/html" title="Maintaining CI is a pain in the…" /><published>2025-03-04T00:00:00+00:00</published><updated>2025-03-04T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/ci-pain</id><content type="html" xml:base="https://polymonster.co.uk/blog/ci-pain"><![CDATA[<p>An ongoing source of frustration is maintaining continuous integration in open source hobby projects. It’s really useful to have continuous builds, automated tests and package delivery, but it comes with maintenance. Time passes, and eventually I want to tag a build in git and let all my lovely automated CI publish a package, or return to a project I haven’t touched for a while and run its tests - and more often than not, the build fails for an unexpected reason.</p>

<p>The problem is that even if very little changes in the source code, the CI often fails for reasons out of your control. It takes a while to get back into the headspace of how the build is configured and start debugging a problem. It’s really annoying when you just want to spend time working on something new and fun and instead find yourself sweating through what was supposed to be a relaxing Saturday morning, trying to fix tests and areas of the code you didn’t intend to look at. You end up with the “fix CI” commit history of death as you push changes and wait to see the results on a cloud hosted runner.</p>

<p>There are various reasons why this happens. I’ve just gone through a frustrating ordeal updating my iOS distribution certificates, which expired recently and prevented me from publishing a new build of my iOS app <a href="https://github.com/polymonster/diig">diig</a>. The app beta expired so I stopped being able to use it; this happens every 60 days, and since I haven’t had to make any changes to the app itself for a while, every 60 days I have to push a new build. I haven’t released the app to the AppStore to make it publicly available because it’s something I’m just using personally. The 60 day limit in itself is annoying, but having to do the yearly certificate and provisioning profile update is even more so. I always forget all of the things you need, so for my future self here is the rough rundown:</p>

<p>First you need a development certificate and a distribution certificate; you can create new certificates on the Apple Developer website in the Certificates, Identifiers &amp; Profiles section. You also need a certificate signing request, which can be created through Keychain Access &gt; Certificate Assistant &gt; Request a Certificate from a Certificate Authority.</p>

<p>The certificates (.cer files) can be downloaded and then imported into the keychain and then exported as a .p12 file with a password. The password here is stored in GitHub Actions as a secret. The .p12 files can be encoded as base64: <code class="language-plaintext highlighter-rouge">base64 -i dist.p12</code>. The output in the console is copied into another secret. Here I have something along the lines of DEV_P12 and DIST_P12.</p>
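<p>The reason the pasted secrets have to be byte-perfect is that base64 is an exact, reversible encoding of the certificate bytes. A quick illustration of the round trip (in Python rather than the shell commands above, purely for demonstration; the bytes are a stand-in for a real .p12):</p>

```python
import base64
import binascii

# Stand-in for the real .p12 bytes exported from Keychain Access
p12_bytes = bytes(range(256))

# Equivalent of `base64 -i dist.p12`; this string goes into the secret
secret = base64.b64encode(p12_bytes).decode("ascii")

# What CI reconstructs when decoding the secret back into a .p12 file
assert base64.b64decode(secret) == p12_bytes

# A single stray character in the pasted secret is enough to break it
try:
    base64.b64decode(secret + "!", validate=True)
except binascii.Error:
    print("corrupted secret rejected")
```

<p>Which is why it pays to paste the encoded output straight from the terminal with no edits.</p>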

<p>Next, a provisioning profile is required for both development and distribution; these can also be generated from the Certificates, Identifiers &amp; Profiles section. I created an iOS development profile and selected the development certificate, and did the same for the iOS distribution profile.</p>

<p>The profiles are added to the repository (they should probably also be secret, but this was how the build was already set up) and copied into the <code class="language-plaintext highlighter-rouge">~/Library/MobileDevice/Provisioning Profiles</code> folder on the build agent.</p>

<p>Finally, everything should build, because the actions <a href="https://github.com/polymonster/diig/blob/main/.github/workflows/release_testflight.yml">yml</a> file does the file copying, the base64 decoding and all of that jazz. But I was wrong; the build was still failing. The error was that Xcode did not have a valid provisioning profile. OK, then maybe something was up with the certs or the profiles. I revoked them and generated them again, being extra careful to make sure the right cert was named the right thing and the pasted secrets didn’t have any extraneous characters or mistakes. Try building again: same error. Maybe just redo the certs and profiles again, just to be sure? Still the same problem!</p>

<p>At this point, tagging builds (I burned 5 tags), pushing and waiting for the dreaded CI failure was getting annoying, so I decided to see what I could do locally on my machine to reproduce the issue more rapidly. The catch is that my keychain has working provisioning profiles managed by Xcode, so I am able to build locally; that was why I didn’t try this sooner. I needed to reproduce the conditions of an external machine with no user account connected to Xcode.</p>

<p>I realised I was able to look in the <code class="language-plaintext highlighter-rouge">~/Library/MobileDevice/Provisioning Profiles</code> folder and see the older stale profiles (from the last time I set this up). Ahh, I can delete those ones and see if I can reproduce the issue using the archive command line:</p>

<p><code class="language-plaintext highlighter-rouge">xcodebuild archive -workspace build/ios/diig_ios.xcworkspace -configuration Release -scheme diig -archivePath build/ios/diig_ios OTHER_CODE_SIGN_FLAGS="--keychain $KEYCHAIN_PATH" PROVISIONING_PROFILE="digiosdev" CODE_SIGN_STYLE="Manual" -verbose</code></p>

<p>Error: Xcode requires a valid provisioning profile.</p>

<p>But the profile digiosdev is clearly there in the folder, so why does Xcode complain there is no provisioning profile? Copilot was able to help me here: it suggested using <code class="language-plaintext highlighter-rouge">PROVISIONING_PROFILE_SPECIFIER</code> instead of <code class="language-plaintext highlighter-rouge">PROVISIONING_PROFILE</code>.</p>

<p>Problem solved. This took me a few hours on a Saturday morning before leaving to meet friends, and then a further few hours the next day to fiddle around and get the build working again. I did all the certificate and provisioning profile stuff correctly the first time, and it’s annoying that, for some reason, since updating the profile <code class="language-plaintext highlighter-rouge">PROVISIONING_PROFILE_SPECIFIER</code> was necessary for it to be picked up. Maybe it was due to an Xcode update? Apple has a tendency to change things a lot, deprecate APIs, and make changes to signing and distribution; it’s painful to keep up at times.</p>

<p>But herein lies the crux of it all: even if you don’t change a thing yourself, the world around you can change, and that can cause build systems to suddenly fail.</p>

<p>This has happened to me countless times. Python environment setup has changed multiple times on different platforms and for different projects over the years. Something shifts, my <code class="language-plaintext highlighter-rouge">pip</code> setup fails, and I hack around to find the working invocation. Could it be <code class="language-plaintext highlighter-rouge">pip3</code> or <code class="language-plaintext highlighter-rouge">python3 -m pip</code> or <code class="language-plaintext highlighter-rouge">py -3 -m pip</code>, or maybe using brew to install Python instead? I don’t know; just hack until it works again.</p>
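<p>My “hack until it works” loop is essentially the following, sketched here as a script. The candidate list and the helper name are my own invention, not a standard recipe:</p>

```python
import shutil
import subprocess
import sys

# Candidate pip invocations, roughly in the order I end up trying them
CANDIDATES = [
    ["pip3"],
    [sys.executable, "-m", "pip"],
    ["py", "-3", "-m", "pip"],
    ["pip"],
]

def find_pip():
    """Return the first invocation that answers --version, else None."""
    for cmd in CANDIDATES:
        if shutil.which(cmd[0]) is None:
            continue  # executable not on PATH at all
        result = subprocess.run(
            cmd + ["--version"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return cmd
    return None

print("working pip invocation:", find_pip())
```

<p>Automating the probing at least turns the guessing game into one script, though it doesn’t stop the underlying platform from shifting again next year.</p>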

<p>Android builds on Linux have been another immense pain in my <a href="https://github.com/polymonster/premake-android-studio">android-studio</a> project. Android also has a tendency to change a lot, and there is a lot that goes into it: SDK, NDK, Gradle, Kotlin, Java, CMake, Ninja and even more build systems in there. All of these changing over time cause headaches, especially if you haven’t touched the thing for a year and somebody comes along with a small PR and all of a sudden the CI is broken. At one time I had to forcibly downgrade the Java version on the actions runner because it caused a known crash in the Android Studio licensing agreement; this fixed it for a while, but then the Java version I needed became unavailable to GitHub actions and I had to upgrade and find other fixes… thanks also to PR contributors for helping to maintain the CI on that project.</p>

<p>Another frustrating session of CI fixing came in my Rust graphics engine <a href="https://github.com/polymonster/hotline">hotline</a>. A Rust compiler update in conjunction with a particular <a href="https://docs.rs/bevy_ecs/latest/bevy_ecs/">bevy_ecs</a> version started to cause a hard to diagnose crash in my tests. It only happened in the tests: I couldn’t reproduce it in a standalone build, and I also couldn’t reproduce it in a single test. It was only when all tests ran (single threaded) that eventually one would crash. I spent weeks on this, only half an hour or so after work, but chipping away at it, trying to debug it and make sense of what was going on. I had particular difficulty because I had no symbols or callstack; I rolled back my code to a known working version where the tests had passed and been published to crates.io, and it was still crashing. In the end the fix was to update bevy_ecs, which sounds straightforward, but it took me a while to attribute the crash to bevy_ecs, and updating required me to fix API-breaking changes in my code; it was not simply a case of a version bump. It was frustrating to spend a few weeks fixing these tests for a reason unrelated to what I wanted them for: helping me implement new features without breaking existing functionality.</p>

<p>Another perplexing issue with a Rust project was when it began to fail compilation even though no changes to the code were made. The reason was that an external dependency had released an updated version. This particular crate had been <a href="https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html">patched</a>, and patching only applies to a specific version of a crate; since the version changed, the patch was not applied and the unpatched version did not compile. This is where I learned about <a href="https://doc.rust-lang.org/cargo/reference/resolver.html">explicit versioning</a> in cargo, and how even with a full version specifier cargo may try to change or update a dependency version to make the best fit within the cargo tree. In this case the solution was to commit the Cargo.lock file and use <code class="language-plaintext highlighter-rouge">cargo build --frozen</code> to make the CI more stable. An easy fix, but unexpected symptoms always cause alarm at first.</p>

<p>Some conclusions I can draw from these scenarios: I could run the CI periodically so issues get caught sooner and not just when I am making changes, but that would cause the same kind of frustration and might even be worse for a hobby project, where being alerted that CI is broken means knowing I have to fix it at some point, detracting from other projects. Using custom docker images would help to lock down the versions of software running the builds; I don’t know much about that though, so it’s something to look into. Cargo.lock proved a good solution to enforce stable versioning in Rust projects. So at least there are some measures that improve reliability, but they don’t help with issues like <code class="language-plaintext highlighter-rouge">PROVISIONING_PROFILE_SPECIFIER</code>, which threw me off totally; I was so close the first time to getting it right and this completely screwed me. macOS is constantly updating, forcing you to update Xcode and to face and fix these problems head on.</p>

<p>Maintaining CI is a pain in the proverbial. In a production environment, for a job and with a team, it’s a necessity and you generally have better coverage; for a solo project it’s great to have but a burden to maintain. Even if nothing changes on your side, the world changes around you; sometimes you just gotta suck it up and fix it.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The ongoing frustrations of maintaining CI pipelines for open source hobby projects: expired iOS certificates, dependency drift, and the "fix CI" commit spiral.]]></summary></entry><entry><title type="html">A Haiku About Debugging and the Perception of Productivity</title><link href="https://polymonster.co.uk/blog/debugging-haiku" rel="alternate" type="text/html" title="A Haiku About Debugging and the Perception of Productivity" /><published>2025-02-07T00:00:00+00:00</published><updated>2025-02-07T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/debugging-haiku</id><content type="html" xml:base="https://polymonster.co.uk/blog/debugging-haiku"><![CDATA[<p>Often a small fix<br />
Requires thorough debugging<br />
Its trace left unseen<br /></p>

<p>I wrote an almost-haiku in a Slack message to a coworker as we approached the end of the sprint. It came after spending the better part of two days debugging a difficult problem, which ultimately led to the addition of just two characters to a C++ source file to fix the issue. Along the way, I actually wrote a significant amount of ephemeral code across a data pipeline executable, a graphics runtime executable, and shader code.</p>

<p>Earlier in my career, I struggled with deleting this kind of transient code. I often felt it might be worth keeping, just in case a similar issue arose in the future. I didn’t want my efforts to go to waste. But over time, I’ve learned to let it go. Debugging isn’t just about the code that remains; it’s also about the effort invested in understanding the problem. It includes all the temporary print statements, UI widgets, debug primitives, and countless other tools hastily written and discarded.</p>

<p>Then there’s the time spent in the debugger: stepping through execution, analysing hex values, copying and pasting into notepads, diffing outputs, crafting heroic watch expressions, or tracing obscure memory aliasing issues. All of this work requires experience and patience.</p>

<p>I was fortunate to learn from engineers who were magicians at hardcore debugging, and they passed their wisdom on to me. In turn, I passed it down to the next generation. But still, a part of me feels the need to justify the work I’ve done, because some things are difficult to measure.</p>

<p>When you finally fix a one-liner, it seems obvious in hindsight. It’s frustrating to think you didn’t find the solution sooner. Sometimes you were close, only for another clue to send you down the wrong tangent. Other times, you feel like a badass, pulling off some low-level trickery, only to realise it had nothing to do with the actual problem, and now it just feels like time wasted, showing off to yourself.</p>

<p>Among respected colleagues and peers, we can talk about these endeavours. We learn, laugh, and congratulate one another. That’s never been an issue because we understand what it’s like. But often, there are people outside our world who don’t.</p>

<p>They want time logged. They want burn-down charts. They want to know why something took so long. They ask, “Why did you say this was a small t-shirt size?” (or whatever nonsense they think helps quantify complexity).</p>

<p>This isn’t meant as a dig at those people — they’re often just asking a question, not accusing anyone of wasting time. But for me, it triggers something. It makes me overthink, constantly trying to justify the effort. I suspect this comes from a deeply ingrained attitude toward work, conditioned by society:</p>

<p>That work means sitting at a desk from 9 to 5. That you must be physically present to be productive. That you must return to the office, not because it’s better, but because we don’t like the idea of you doing your washing at home during the workday. That, deep down, we are still mill workers.</p>

<p>Ironically, I don’t even have a return-to-office mandate. I’m not required to go in at all. But I still feel it. It’s been subliminally drilled into me since childhood: showing up is the job. It doesn’t account for the times I’ve solved a bug in my sleep. Or that time on the tube, when I mentally fixed an atomic race condition in a shader, then got to my desk and wrote a few lines of code to solve it, before slowly shifting into the headspace of the next challenge.</p>

<p>This haiku is my reminder - maybe even the start of reconditioning myself. To anyone else reading this: the unseen work, while ephemeral, is often the most important.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Reflections on the invisible work of debugging: how a two-day investigation producing just two changed characters is still deeply valuable, and why ephemeral code matters.]]></summary></entry><entry><title type="html">printf debugging is OK</title><link href="https://polymonster.co.uk/blog/printf-debugging-is-ok" rel="alternate" type="text/html" title="printf debugging is OK" /><published>2024-05-06T00:00:00+00:00</published><updated>2024-05-06T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/printf-debugging-is-ok</id><content type="html" xml:base="https://polymonster.co.uk/blog/printf-debugging-is-ok"><![CDATA[<p>I stopped going on Twitter a while ago because it has the tendency to evoke rage, as it is designed to do. But every now and then I check back in - it can be useful sometimes for keeping up with graphics research, gamedev news and some people do post nice things, like sharing projects they are working on, so there is something to pull me back from time to time.</p>

<p>After checking the other day I saw a debate going around about not using an IDE or debugger, just using ‘notepad’ to write code. I looked in the comments: people arguing about who was right, all the usual toxic vibes, and it reminded me of earlier occasions when people discussed the same topic.</p>

<p>It feels like the same old debate has been going on for a long time now; it’s packaged differently each time, but I don’t really know why people get so wound up about it. The main arguments are “if you need to use a debugger you’re an idiot and you don’t understand the code you are writing” (not an actual quote, but there were similar takes along those lines). Then there is “if you can’t use a debugger you’re an idiot”. The hating on the ‘printf’ crew is omnipresent.</p>

<p>At the risk of poking a hornet’s nest, I just wanted to share some thoughts and ideas on this subject in a balanced way, because I don’t think there needs to be an ultimate solution here. We need to debug code and there are tools out there to help us, some are more useful than others in certain situations, but at the end of the day do whatever you need to do to fix those bugs.</p>

<h2 id="debuggers">Debuggers</h2>

<p>I use a debugger regularly; I will launch most work in C++ from Visual Studio or Xcode and preferably run in a debug build. I know for some people this is a terrible UX because of the performance of debug builds, so a prerequisite here is fast debug builds. This is hard to retrofit, but having a usable debug build is worth the effort. Once running I can use the debugger to break and step if I need to, and if I encounter a crash then there is a nice call stack I can look through in more detail.</p>

<p>I have noticed that it is extremely common for graduate and junior software engineers to have little to no debugging knowledge or experience. It doesn’t seem to be taught at university, and I have also been told stories of teachers imposing their usage of VIM and esoteric debugging strategies upon students. For the record I am not a VIM user (another topic that ends up in polarising debates); I find using a mouse and 2-finger typing works for me.</p>

<p>The moment when you show someone how to use a hardware breakpoint or a watchpoint and find a bug immediately is like seeing the lightbulb appear on top of their head, a whole world of possibility opening in front of them - or the dismay at the hours wasted trying to catch some dodgy logic through layers and layers of object-oriented spaghetti.</p>

<p>Some argue for using only ‘notepad’ and no debugger because they can dry run their code on paper and they “don’t write bugs”, but I find it difficult to understand how they work within a larger team project or codebase. Many of the bugs and issues I have had to fix were not in code I wrote myself; they were in legacy systems, colleagues’ code, or in open source code that had been lifted into a project (and some hard as nails bugs to track, too!). If you believe in the impending AI coding apocalypse then human engineers may merely be around to debug and fix issues with AI generated code. So yeah, being able to write perfect code yourself is one thing, but using a debugger to debug existing code in a large complex project shouldn’t be a thing of shame, and we might need all the tools we can get to help.</p>

<p>Along with debuggers we get all sorts of other tools, which should also be used as and when we need them. Address sanitizer can catch memory issues easily: where in a bygone era we would have a 1-in-1000 crash somewhere reading outside of an array’s bounds, we can enable ASan and catch it every time, without the undefined behaviour lottery. The same goes for undefined behaviour sanitizer; now we can catch UB when it’s benign and not only when a noticeable side effect occurs.</p>

<p>I don’t know if these notepad-only coders are taking all of those tools off the table as well, but when you have something like ASan that can catch an issue for you I just don’t really know why you wouldn’t use it. I have seen a lot of comments that seem to suggest the debugger slows them down, but in this case I certainly think the debugger speeds you up.</p>

<p>So if you’re reading this and you don’t know about these tools, I would say take a look; they can be useful and might save you a lot of time. There are tons of things you can do and it’s hard to cover it all here. I learned a lot from working with other people, side by side, debugging difficult problems. I think there should be more resources to teach these skills instead of them being handed-down information.</p>

<h2 id="printf-debugging-is-ok">Printf debugging is OK</h2>

<p>For the ‘printf’ haters I would also say that, whilst I use a debugger most of the time, sometimes I revert to ‘printf’ debugging. In some situations there is no other choice: in the past I have had to debug release builds where we were unable to reproduce the bug in debug. Even pulling in debug modules for the engine (for on screen debug info) changed the executable such that we couldn’t reproduce the issue. The last resort was to put a few print statements in using the raw ‘printf’, removing them and adding more as we narrowed down the issue, until eventually we extracted enough information to fix the problem.</p>

<p>I have also needed ‘printf’ when debugging certain kinds of behaviours in an application. In the case of something like touch event tracking for mobile devices, if you try to debug an issue with breakpoints you interrupt the hardware, making it difficult to reproduce issues in the way they appear naturally. Here, printing the state of touch down events and touch up events and seeing the logical flow can identify a problem. There are many more scenarios that benefit from this type of debugging. Just throw the prints in and make sure to remove them after, so no one knows you were ever there, like a ninja.</p>

<h2 id="custom-tools">Custom tools</h2>

<p>Custom UI based debugging tools can go one step further than printf debugging, providing some similar traits but allowing more flexibility and controllability. I assume the notepad wielders who don’t use a regular debugger must have some such custom tools to help them track down issues. I am a big fan of embedded debugging and profiling tools within an application - stuff like performance counters that I can just pop up in a UI, or tweakable values to help refine behaviours or visual appearance. I find that since the explosion of ImGui, the level of integrated ad-hoc debugging tools and info has increased exponentially.</p>

<p>But with these kinds of custom tools, I personally wouldn’t try to reinvent the wheel; I aim to make stuff that complements the existing tools I can pull off the shelf. For example, I like to have a quick, at-a-glance profiler for all my key performance hotspots that I can check whenever I notice something, but for more in-depth profiling I would use a dedicated CPU or GPU profiler to dig deeper.</p>
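<p>The core of such an at-a-glance counter is tiny. Here is a sketch of the idea in Python, purely for brevity (in an engine it would be C++ feeding an ImGui window each frame; all names here are illustrative):</p>

```python
import time
from contextlib import contextmanager

# Rolling table of named timings, the kind of thing you would draw
# in a debug UI window every frame
timings = {}

@contextmanager
def scope(name):
    """Time a block of work and record the result in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000.0

with scope("update"):
    total = sum(i * i for i in range(100_000))  # stand-in workload

with scope("render"):
    time.sleep(0.005)  # stand-in for real rendering work

for name, ms in timings.items():
    print(f"{name:10s} {ms:9.3f} ms")
```

<p>The value is in having the numbers always visible at a glance; when one of them looks wrong, that’s the cue to reach for a proper profiler.</p>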

<h2 id="just-doing-what-needs-to-be-done">Just doing what needs to be done</h2>

<p>At the end of the day, finding bugs is just something that we need to get done, whatever helps you find and fix the issue doesn’t bother me as long as we get the job done. On a closing note, I noticed some code in a pull request left in by accident by another person:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span><span class="p">(</span><span class="n">some_condition</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I found this interesting. I do the same thing, except I usually name my variable ‘a’. The idea is to insert some code where a breakpoint can be put on the ‘int x’ line, so that it acts like a conditional breakpoint when some_condition is true. You could use a conditional breakpoint within the debugger, but they can be slow and, for me, historically unreliable; this little snippet gives you your own conditional breakpoint that works without fail.</p>

<p>Just make sure to remove the code before the PR next time!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[A balanced take on the printf vs debugger debate: why there is no single right answer, and how to choose the right debugging approach for the situation at hand.]]></summary></entry><entry><title type="html">Building a new graphics engine in Rust - Part 4</title><link href="https://polymonster.co.uk/blog/building-new-engine-4" rel="alternate" type="text/html" title="Building a new graphics engine in Rust - Part 4" /><published>2023-04-29T00:00:00+00:00</published><updated>2023-04-29T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/building-new_engine-4</id><content type="html" xml:base="https://polymonster.co.uk/blog/building-new-engine-4"><![CDATA[<p>Work has been continuing smoothly on my Rust graphics engine project <a href="https://github.com/polymonster/hotline">hotline</a> over the last month or so. I was slowly winding down from my current day job and have a little time off before starting a new role, so that has given me more time to dedicate to this project. I have been focusing on implementing different graphics demos and rendering techniques, which has thrown up a few missing pieces in the <code class="language-plaintext highlighter-rouge">gfx</code> backend and I am keen to get the API as complete as possible, because I am unsure of how much time I will have to work on it when I start my new role or even the validity of working on code in the public domain.</p>

<h2 id="tests">Tests</h2>

<p>I started out the project building unit tests of graphics functionality and had those hooked up to run locally or on a self-hosted GitHub Actions runner. As the project progressed I encountered some issues with the tests crashing or being unable to run when launched within the plugin environment. I have a nice system for switching between <code class="language-plaintext highlighter-rouge">demos</code> (in their current form they serve more as unit tests or examples), so I had been able to quickly, yet manually, run through the different examples to check things were in good shape after making changes or refactoring. But still, automation is better and I was missing the support and comfort it can bring. I needed to resolve a crash inside my <code class="language-plaintext highlighter-rouge">imgui</code> backend where font glyph ranges passed to <code class="language-plaintext highlighter-rouge">cimgui</code> were actually pointing to dropped memory - this issue never seemed to crop up in debug or release builds, only in the tests, so it went undiagnosed for a while. The fix was fairly straightforward: just ensuring the memory remained in scope for when it was used.</p>

<p>Another issue with running the tests is that only one application can lock / use a dynamic library at a time, otherwise <code class="language-plaintext highlighter-rouge">libloading</code> will panic. Rust tests are launched and run concurrently across multiple threads, so for the time being I have to run with <code class="language-plaintext highlighter-rouge">-- --test-threads=1</code>. This also helps with graphics related code, because spawning 36+ Direct3D12 devices simultaneously is not possible and causes some of the tests to panic early on - failure to create a device is just a hard panic and there is no error handling. I suppose in this case I could wait and retry, with some system there to at least allow some multithreading, but for the time being I am happy with the setup.</p>
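<p>As a middle ground between fully serial runs and free-for-all threading, tests that need exclusive device access could share a process-wide lock; this is just a sketch of that idea, not something hotline currently does:</p>

```rust
use std::sync::{Mutex, MutexGuard};

// process-wide lock shared by any test that needs exclusive device access
static DEVICE_LOCK: Mutex<()> = Mutex::new(());

// take the lock, recovering from poisoning so one panicking test
// doesn't wedge every test that runs after it
fn serial_guard() -> MutexGuard<'static, ()> {
    DEVICE_LOCK.lock().unwrap_or_else(|e| e.into_inner())
}

fn main() {
    // each test would hold the guard for its duration,
    // serialising device creation while other tests still run in parallel
    let _guard = serial_guard();
    // ... create device and run the test body here ...
}
```

<p>This keeps non-graphics tests free to run in parallel while device-owning tests queue up behind the lock.</p>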

<p>I also added support for each of the tests to take a grab of the backbuffer and write it to disk. I am not doing any kind of image comparison to automate pass or failure (currently, a test succeeds as long as it doesn’t panic or crash), but having the images is a nice way to glance over and manually verify that everything looks correct.</p>

<p><img src="https://raw.githubusercontent.com/polymonster/polymonster.github.io/master/images/hotline/example_thumbnails.png" width="100%" /></p>
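<p>If I ever want to automate pass/fail from these grabs, a naive per-byte comparison against a reference image would be a starting point; this is a hypothetical sketch, not anything wired into the test runner:</p>

```rust
/// Fraction of bytes that differ beyond a tolerance; 0.0 means identical
/// within tolerance. Both images are raw byte buffers of equal size.
fn image_diff_ratio(a: &[u8], b: &[u8], tolerance: u8) -> f64 {
    assert_eq!(a.len(), b.len(), "images must match in size");
    let differing = a.iter()
        .zip(b.iter())
        .filter(|(x, y)| x.abs_diff(**y) > tolerance)
        .count();
    differing as f64 / a.len() as f64
}

fn main() {
    let reference = vec![128u8; 1024];
    let mut capture = reference.clone();
    capture[0] = 255; // a single channel regressed
    // pass the test only if fewer than 1% of bytes differ
    assert!(image_diff_ratio(&reference, &capture, 4) < 0.01);
}
```

<p>A small tolerance absorbs harmless rasterisation differences between driver versions, which is the usual pain point with screenshot-based testing.</p>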

<p>Having these tests in place makes changing or refactoring the lower-level APIs easier and allows me to move quickly and confidently, which is exactly what I needed. They also run on the CI every time a commit is pushed, which helps to catch regressions where the shader data and the code become out of sync.</p>

<h2 id="hotline-data">Hotline Data</h2>

<p>With the tests in place I have made a few refactors and additions to the <code class="language-plaintext highlighter-rouge">gfx</code> API backend, using the tests to aid the process. In my previous post I mentioned that I created a separate data repository to keep the main repository size down, because crates.io has a 10 MB limit. I originally created the <a href="https://github.com/polymonster/hotline-data">hotline-data</a> repository and used <code class="language-plaintext highlighter-rouge">cargo</code> to clone and update it on a build, so that the examples would work whether coming from crates.io or GitHub. I have since decided against this and opted for the examples working when using the repository directly from GitHub; if you decide to use the library from crates.io then you configure data yourself for your own project. This subtle change enabled me to use <code class="language-plaintext highlighter-rouge">hotline-data</code> as a submodule, which in turn makes it easier to keep the data and the main repository in sync.</p>

<p>In the process of adding new graphics features I have had to make additions and changes to <a href="https://github.com/polymonster/pmfx-shader">pmfx-shader</a>. It comes bundled as a binary with the <code class="language-plaintext highlighter-rouge">hotline-data</code> repository, but while developing I switch to a development version, which is actually written in Python. Because things are moving quickly I have been frequently encountering new issues and switching to development mode. Now the submodules and the tests help to catch cases where I push to the repository with <code class="language-plaintext highlighter-rouge">pmfx-shader</code> in dev mode, so I can quickly fix it and keep the repository in a state where it is buildable for new users at all times.</p>

<h2 id="dropping-gpu-resources">Dropping GPU Resources</h2>

<p>I have previously mentioned challenges involving memory lifetime management between Rust and in-flight GPU resources, but I recently decided to bite the bullet and start handling these issues in the <code class="language-plaintext highlighter-rouge">Drop</code> trait for <code class="language-plaintext highlighter-rouge">gfx::Texture</code> and <code class="language-plaintext highlighter-rouge">gfx::Buffer</code>. I originally wanted to steer clear of this because it creates a dependency from a resource type to a <code class="language-plaintext highlighter-rouge">gfx::Heap</code> or a <code class="language-plaintext highlighter-rouge">gfx::Device</code>, which in turn throws in multithreading considerations. I wanted to keep the low-level backend as simple and as dumb as possible; however, from a user-facing point of view it’s just too easy to run into serious problems such as a GPU hang or device removal (due to dropping an in-flight resource), or leaking views in heaps. Dropping a resource is very easy in Rust: you can simply allow a <code class="language-plaintext highlighter-rouge">gfx::Texture</code> or <code class="language-plaintext highlighter-rouge">gfx::Buffer</code> to go out of scope, or assign a new one to a mutable variable.</p>

<p>The problem of dropping GPU resources started to rear its ugly head and force my hand as I set up some more complicated examples that load many textures. When switching between demos, textures were dropped as the <code class="language-plaintext highlighter-rouge">bevy_ecs</code> world was reset to default, but the associated shader resource views were not de-allocated from the shader heap. I also had issues with stretchy buffer types that resize like a vector, used for pushing debug draw lines, or for light data and draw data in the bindless setup. When resizing, the previous smaller buffer would still be in-flight on the GPU, and just dropping it in place would lead to undefined behaviour. This is where I really re-evaluated my thinking: from a user perspective it’s a lot to keep track of and easy to fall into the trap.</p>
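<p>The resize problem can be handled by retiring the old allocation rather than dropping it in place. This sketch uses a plain <code class="language-plaintext highlighter-rouge">Vec&lt;u8&gt;</code> as a stand-in for the GPU allocation (hypothetical types, not the hotline API), but the shape is the same: the smaller buffer is kept alive until the GPU can no longer reference it:</p>

```rust
// stand-in "stretchy" buffer: grows like a vector, but keeps old
// allocations alive instead of dropping them while possibly in-flight
struct GrowBuffer {
    data: Vec<u8>,         // stand-in for the current GPU allocation
    capacity: usize,       // current allocation size in bytes
    retired: Vec<Vec<u8>>, // old buffers, freed later once the GPU is done
}

impl GrowBuffer {
    fn push(&mut self, bytes: &[u8]) {
        if self.data.len() + bytes.len() > self.capacity {
            // allocate a larger buffer and copy the existing contents over
            let new_cap = (self.capacity * 2).max(self.data.len() + bytes.len());
            let mut bigger = Vec::with_capacity(new_cap);
            bigger.extend_from_slice(&self.data);
            // retire the old buffer instead of dropping it in place
            let old = std::mem::replace(&mut self.data, bigger);
            self.retired.push(old);
            self.capacity = new_cap;
        }
        self.data.extend_from_slice(bytes);
    }
}

fn main() {
    let mut buf = GrowBuffer { data: Vec::new(), capacity: 4, retired: Vec::new() };
    buf.push(&[1, 2, 3]);
    buf.push(&[4, 5, 6]); // forces a resize; the old buffer is retired
    assert_eq!(buf.data, vec![1, 2, 3, 4, 5, 6]);
    assert_eq!(buf.retired.len(), 1);
}
```

<p>In the real engine the <code class="language-plaintext highlighter-rouge">retired</code> list would be swept once the frames that could reference the old buffer have completed, which is exactly what the drop list described next generalises.</p>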

<p>In order to handle this I decided to just go for the full blown <code class="language-plaintext highlighter-rouge">Arc&lt;Mutex&gt;</code> wrapped around a <code class="language-plaintext highlighter-rouge">DropList</code> inside a <code class="language-plaintext highlighter-rouge">gfx::Heap</code>. All resources upon creation are assigned an <code class="language-plaintext highlighter-rouge">Arc&lt;Mutex&lt;DropList&gt;&gt;</code> for their respective heap, and inside the <code class="language-plaintext highlighter-rouge">Drop</code> trait their resource views are added to the <code class="language-plaintext highlighter-rouge">DropList</code>. In future I would like to consider a lockless approach, but as I have done for the rest of the project I am focusing on stability first and the <code class="language-plaintext highlighter-rouge">Mutex</code> approach has worked well so far. In order to take ownership and add to the <code class="language-plaintext highlighter-rouge">DropList</code>, the members inside a resource have now become <code class="language-plaintext highlighter-rouge">Option</code>s so it’s possible to trivially <code class="language-plaintext highlighter-rouge">std::mem::swap</code> them, which is not great for the code elsewhere, but it was a necessary change. I did try to just hack it and make a <code class="language-plaintext highlighter-rouge">null</code> version of an <code class="language-plaintext highlighter-rouge">ID3D12Resource</code>, but this ends up causing a crash inside <code class="language-plaintext highlighter-rouge">windows-rs</code> where a <code class="language-plaintext highlighter-rouge">v-table</code> is expected, so the optional approach felt necessary. It adds some clutter in the backend, which was admittedly rushed, but that’s the price you pay for stability I suppose.
When a resource is dropped the resource itself, any subresources (used for MSAA resolves), and any resource views are passed into the <code class="language-plaintext highlighter-rouge">DropList</code> owned by a heap:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Structure to track resources and resource view allocations in `Drop` traits</span>
<span class="k">struct</span> <span class="n">DropResource</span> <span class="p">{</span>
    <span class="n">resources</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">ID3D12Resource</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">frame</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">heap_allocs</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">DropList</span> <span class="p">{</span>
    <span class="n">list</span><span class="p">:</span> <span class="n">Mutex</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">DropResource</span><span class="o">&gt;&gt;</span>
<span class="p">}</span>

<span class="cd">/// Thread safe ref counted drop-list that can be safely used in drop traits,</span>
<span class="cd">/// tracks the frame a resource was dropped on so it can be waited on</span>
<span class="k">type</span> <span class="n">DropListRef</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nb">Arc</span><span class="o">&lt;</span><span class="n">DropList</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">DropList</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nb">Arc</span><span class="o">&lt;</span><span class="n">DropList</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nn">Arc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">DropList</span> <span class="p">{</span>
            <span class="n">list</span><span class="p">:</span> <span class="nn">Mutex</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">())</span>
        <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="cd">/// Drop trait for a texture resource</span>
<span class="k">impl</span> <span class="nb">Drop</span> <span class="k">for</span> <span class="n">Texture</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// compile-time const allows this feature to be omitted</span>
        <span class="k">if</span> <span class="n">MANAGE_DROPS</span> <span class="p">{</span>
            <span class="c1">// only grab resources if we have a drop list, this allows the swap chain rtv</span>
            <span class="c1">// to manage itself</span>
            <span class="k">let</span> <span class="k">mut</span> <span class="n">res_vec</span> <span class="o">=</span> <span class="k">if</span> <span class="k">self</span><span class="py">.drop_list</span><span class="nf">.is_some</span><span class="p">()</span> <span class="p">{</span>
                <span class="c1">// swap out the resources for None</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">res</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
                <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">swap</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">res</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.resource</span><span class="p">);</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">res_vec</span> <span class="o">=</span> <span class="nd">vec!</span><span class="p">[</span>
                    <span class="n">res</span><span class="nf">.unwrap</span><span class="p">()</span>
                <span class="p">];</span>
                <span class="k">if</span> <span class="k">self</span><span class="py">.resolved_resource</span><span class="nf">.is_some</span><span class="p">()</span> <span class="p">{</span>
                    <span class="k">let</span> <span class="k">mut</span> <span class="n">res</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
                    <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">swap</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">res</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.resolved_resource</span><span class="p">);</span>
                    <span class="n">res_vec</span><span class="nf">.push</span><span class="p">(</span><span class="n">res</span><span class="nf">.unwrap</span><span class="p">());</span>
                <span class="p">}</span>
                <span class="n">res_vec</span>
            <span class="p">}</span>
            <span class="k">else</span> <span class="p">{</span>
                <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span>
            <span class="p">};</span>
            <span class="c1">// texture resource views</span>
            <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">drop_list</span><span class="p">)</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.drop_list</span> <span class="p">{</span>
                <span class="c1">// add resources to the drop list</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">drop_list</span> <span class="o">=</span> <span class="n">drop_list</span><span class="py">.list</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">drop_res</span> <span class="o">=</span> <span class="n">DropResource</span> <span class="p">{</span>
                    <span class="n">resources</span><span class="p">:</span> <span class="n">res_vec</span><span class="nf">.to_vec</span><span class="p">(),</span>
                    <span class="n">frame</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
                    <span class="n">heap_allocs</span><span class="p">:</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span>
                <span class="p">};</span>
                <span class="n">res_vec</span><span class="nf">.clear</span><span class="p">();</span>

                <span class="c1">// add resource views to the drop list</span>
                <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">srv_index</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.srv_index</span> <span class="p">{</span>
                    <span class="n">drop_res</span><span class="py">.heap_allocs</span><span class="nf">.push</span><span class="p">(</span><span class="n">srv_index</span><span class="p">);</span>
                <span class="p">}</span>
                <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">uav_index</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.uav_index</span> <span class="p">{</span>
                    <span class="n">drop_res</span><span class="py">.heap_allocs</span><span class="nf">.push</span><span class="p">(</span><span class="n">uav_index</span><span class="p">);</span>
                <span class="p">}</span>
                <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">resolved_srv</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.resolved_srv_index</span> <span class="p">{</span>
                    <span class="n">drop_res</span><span class="py">.heap_allocs</span><span class="nf">.push</span><span class="p">(</span><span class="n">resolved_srv</span><span class="p">);</span>
                <span class="p">}</span>
                <span class="n">drop_list</span><span class="nf">.push</span><span class="p">(</span><span class="n">drop_res</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Resources are now `Option`s, which adds extra baggage in the code elsewhere.</span>

<span class="c1">// for example, obtaining a resource description:</span>
<span class="k">let</span> <span class="n">desc</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="n">target</span><span class="py">.resource</span><span class="nf">.as_ref</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.GetDesc</span><span class="p">()</span> <span class="p">};</span> <span class="c1">// as ref, unwrap</span>

<span class="c1">// or building a transition barrier:</span>
<span class="k">let</span> <span class="n">barrier</span> <span class="o">=</span> <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">tex</span><span class="p">)</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">barrier</span><span class="py">.texture</span> <span class="p">{</span>
    <span class="nf">transition_barrier</span><span class="p">(</span>
        <span class="n">tex</span><span class="py">.resource</span><span class="nf">.as_ref</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">(),</span>
        <span class="c1">// ..</span>
    <span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We need to manually sweep and clean things up. This step is out of line with Rust’s memory model, but we need to synchronise the delete with the swap chain to ensure that any remaining references are complete on the GPU. So at the end of each frame there is a little housekeeping to do, where we check the current frame number against the frame in which the resource was dropped. To avoid a dependency on the <code class="language-plaintext highlighter-rouge">SwapChain</code> during the <code class="language-plaintext highlighter-rouge">Drop</code> itself, the frame index is initialised to zero and set the first time we call <code class="language-plaintext highlighter-rouge">cleanup</code>. The cleanup code finally drops internal Direct3D12 resources when it is safe to do so, and then adds the associated resource views in heaps onto a <code class="language-plaintext highlighter-rouge">FreeList</code> so the handles can be recycled when a new allocation is made.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">cleanup_dropped_resources</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">swap_chain</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">SwapChain</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// lock the drop and free lists so this is thread safe</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">drop_list</span> <span class="o">=</span> <span class="k">self</span><span class="py">.drop_list.list</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">free_list</span> <span class="o">=</span> <span class="k">self</span><span class="py">.free_list.list</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">complete_indices</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">res_index</span><span class="p">,</span> <span class="n">drop_res</span><span class="p">)</span> <span class="k">in</span> <span class="n">drop_list</span><span class="nf">.iter_mut</span><span class="p">()</span><span class="nf">.enumerate</span><span class="p">()</span> <span class="p">{</span>
        <span class="c1">// initialise the frame, and then wait</span>
        <span class="k">if</span> <span class="n">drop_res</span><span class="py">.frame</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="n">drop_res</span><span class="py">.frame</span> <span class="o">=</span> <span class="n">swap_chain</span><span class="py">.frame_index</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">else</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">swap_chain</span><span class="py">.frame_index</span> <span class="o">-</span> <span class="n">drop_res</span><span class="py">.frame</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">diff</span> <span class="o">&gt;</span> <span class="n">swap_chain</span><span class="py">.num_bb</span> <span class="k">as</span> <span class="nb">usize</span> <span class="p">{</span>
                <span class="c1">// waited long enough we can add the resource views to the free list</span>
                <span class="k">for</span> <span class="n">alloc</span> <span class="k">in</span> <span class="o">&amp;</span><span class="n">drop_res</span><span class="py">.heap_allocs</span> <span class="p">{</span>
                    <span class="n">free_list</span><span class="nf">.push</span><span class="p">(</span><span class="o">*</span><span class="n">alloc</span><span class="p">);</span>
                <span class="p">}</span>
                <span class="n">drop_res</span><span class="py">.resources</span><span class="nf">.clear</span><span class="p">();</span>
                <span class="n">drop_res</span><span class="py">.heap_allocs</span><span class="nf">.clear</span><span class="p">();</span>
                <span class="n">complete_indices</span><span class="nf">.push</span><span class="p">(</span><span class="n">res_index</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// remove complete items in reverse</span>
    <span class="n">complete_indices</span><span class="nf">.reverse</span><span class="p">();</span>
    <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">complete_indices</span> <span class="p">{</span>
        <span class="n">drop_list</span><span class="nf">.remove</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// call cleanup at the end of each frame; this could be deferred or run at different times</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">run</span><span class="p">(</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="k">super</span><span class="p">::</span><span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>

    <span class="c1">// ..</span>

    <span class="c1">// cleanup heaps</span>
    <span class="k">self</span><span class="py">.pmfx.shader_heap</span><span class="nf">.cleanup_dropped_resources</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="py">.swap_chain</span><span class="p">);</span>
    <span class="k">self</span><span class="py">.device</span><span class="nf">.cleanup_dropped_resources</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="py">.swap_chain</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I would note that at this point the <code class="language-plaintext highlighter-rouge">gfx::Texture</code> struct has become a chunky 148 bytes (not just from the extra requirements for <code class="language-plaintext highlighter-rouge">Drop</code>, but also subresource management, render targets, depth stencils and so on). It’s not something I am super keen on, but since we pass around only <code class="language-plaintext highlighter-rouge">usize</code> shader resource handles to reference textures in a shader, the texture struct itself can be a heavyweight resource and won’t likely be required during ECS iteration and the like; it’s more that you create it once and keep hold of it so the memory remains in scope while it is used on the GPU.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Texture</span> <span class="p">{</span>
    <span class="n">resource</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">ID3D12Resource</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">resolved_resource</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">ID3D12Resource</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">resolved_format</span><span class="p">:</span> <span class="n">DXGI_FORMAT</span><span class="p">,</span>
    <span class="n">rtv</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">TextureTarget</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">dsv</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">TextureTarget</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">srv_index</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">resolved_srv_index</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">uav_index</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">subresource_uav_index</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">shared_handle</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">HANDLE</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="c1">// drop list for srv, uav and resolved srv</span>
    <span class="n">drop_list</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">DropListRef</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="c1">// the id of the shader heap for (uav, srv etc)</span>
    <span class="n">shader_heap_id</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">u16</span><span class="o">&gt;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I did consider a mechanism to pass a reference to a command buffer, so that dropping a reference to a texture wouldn’t actually drop the internal resource, but in a bindless rendering setup this becomes a lot harder to track. You are just indexing into descriptor arrays on the GPU and don’t have to physically bind anything as you would in a bindful rendering architecture, so I am happy with the <code class="language-plaintext highlighter-rouge">Drop</code> trait handling for the time being. I can still foresee a lot of potential pitfalls, but it’s a step in the right direction.</p>
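<p>To illustrate why the heavyweight struct is tolerable, here is a sketch (with hypothetical stand-in types, not the actual hotline API) of the bindless idea: only small descriptor indices travel per draw, while the texture structs stay owned elsewhere to keep the GPU memory alive:</p>

```rust
// hypothetical stand-in for the heavyweight texture struct; the srv_index
// is the only part that needs to reach the GPU per draw
struct MockTexture {
    srv_index: usize, // index into the shader-visible descriptor heap
}

// per-draw data the shader indexes with; just a couple of heap indices
#[repr(C)]
struct DrawData {
    albedo_srv: u32,
    normal_srv: u32,
}

fn draw_data(albedo: &MockTexture, normal: &MockTexture) -> DrawData {
    DrawData {
        albedo_srv: albedo.srv_index as u32,
        normal_srv: normal.srv_index as u32,
    }
}

fn main() {
    // the textures live somewhere long-lived; the draw data is tiny
    let albedo = MockTexture { srv_index: 7 };
    let normal = MockTexture { srv_index: 8 };
    let dd = draw_data(&albedo, &normal);
    assert_eq!(dd.albedo_srv, 7);
    // only 8 bytes of indices per draw, not the ~148-byte texture struct
    assert_eq!(std::mem::size_of::<DrawData>(), 8);
}
```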

<h2 id="resource-heaps">Resource Heaps</h2>

<p>Initially I tried to abstract away the <code class="language-plaintext highlighter-rouge">Heap</code> concept for speed of development - keeping a <code class="language-plaintext highlighter-rouge">Heap</code> as part of a <code class="language-plaintext highlighter-rouge">Device</code> makes it possible to call <code class="language-plaintext highlighter-rouge">device.create_texture()</code> in a Direct3D11 kind of way. I still like this approach for quick demos and noodling around, but as more complexity emerged in the higher level <code class="language-plaintext highlighter-rouge">pmfx</code> library and entity component system it became clear there would be a benefit in being able to create and manage your own heaps. This allows heaps to be dynamically resized and resources to be re-allocated or moved; an entire heap could be thrown away between levels in a game or when switching between projects. I decided to allow both methods by having the following set of functions.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Create buffer and create texture will add resource views into an internally managed heap owned by the device</span>
<span class="k">fn</span> <span class="n">create_buffer</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Sized</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
    <span class="n">info</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">BufferInfo</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="p">[</span><span class="n">T</span><span class="p">]</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Buffer</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">fn</span> <span class="n">create_texture</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Sized</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
    <span class="n">info</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">TextureInfo</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="p">[</span><span class="n">T</span><span class="p">]</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Texture</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span><span class="p">;</span>

<span class="c1">// pass a heap to create buffer resource views on a user managed heap</span>
<span class="k">fn</span> <span class="n">create_buffer_with_heap</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Sized</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
    <span class="n">info</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">BufferInfo</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="p">[</span><span class="n">T</span><span class="p">]</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">heap</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">Heap</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Buffer</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span><span class="p">;</span>

<span class="c1">// textures might require multiple heaps; you can provide your own or use the device managed heaps with None</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">TextureHeapInfo</span><span class="o">&lt;</span><span class="nv">'stack</span><span class="p">,</span> <span class="n">D</span><span class="p">:</span> <span class="n">Device</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// Heap to allocate shader resource views and un-ordered access views</span>
    <span class="k">pub</span> <span class="n">shader</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="nv">'stack</span> <span class="k">mut</span> <span class="nn">D</span><span class="p">::</span><span class="n">Heap</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Heap to allocate render target views</span>
    <span class="k">pub</span> <span class="n">render_target</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="nv">'stack</span> <span class="k">mut</span> <span class="nn">D</span><span class="p">::</span><span class="n">Heap</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Heap to allocate depth stencil views</span>
    <span class="k">pub</span> <span class="n">depth_stencil</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="nv">'stack</span> <span class="k">mut</span> <span class="nn">D</span><span class="p">::</span><span class="n">Heap</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// create texture with user managed heaps</span>
<span class="k">fn</span> <span class="n">create_texture_with_heaps</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Sized</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
    <span class="n">info</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">TextureInfo</span><span class="p">,</span>
    <span class="n">heaps</span><span class="p">:</span> <span class="n">TextureHeapInfo</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="p">[</span><span class="n">T</span><span class="p">]</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Texture</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span><span class="p">;</span>
</code></pre></div></div>

<p>This allows for maximum flexibility, providing low level control where you need it or simpler ergonomics when you don’t.</p>

<h3 id="imgui-image-rendering-with-multiple-heaps">ImGui Image Rendering With Multiple Heaps</h3>

<p>Allowing user specified heaps threw up problems with imgui image rendering. Because a <code class="language-plaintext highlighter-rouge">Texture</code> may now reside in any one of several heaps, a simple call to <code class="language-plaintext highlighter-rouge">imgui.image(texture)</code> did not provide enough context. Previously I was relying on a single program-wide shader resource heap that was internally managed by the <code class="language-plaintext highlighter-rouge">gfx::Device</code>. Some data still resides in that heap (the imgui font texture, for example), so I needed a way to pass this information around. Luckily, thanks to the earlier changes for dropping GPU resources, a <code class="language-plaintext highlighter-rouge">gfx::Texture</code> already carried extra information I could use. Imgui images work by passing around a <code class="language-plaintext highlighter-rouge">void*</code> as an <code class="language-plaintext highlighter-rouge">ImTextureID</code>. With Rust lifetimes in mind I did not want to make this a full-blown reference, because all we really need is the shader resource view handle, which is stored as a <code class="language-plaintext highlighter-rouge">usize</code>. The handles are allocated linearly inside a <code class="language-plaintext highlighter-rouge">gfx::Heap</code> and managed with a free list, so a full range of 64 bits for shader resource handles is more than enough; even 32 bits is probably excessive. Each <code class="language-plaintext highlighter-rouge">gfx::Heap</code> is assigned a sequential ID upon creation, so I took the approach of packing the shader resource view handle and heap ID together into 64 bits: the upper 16 bits hold the heap ID and the lower 48 hold the shader resource view handle.
When passing a texture to <code class="language-plaintext highlighter-rouge">imgui.image</code> this process is handled for you, and then when we come to render imgui we just need to provide a vector of any additional user heaps.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// pass an array of heap references to imgui render. empty vector will use the device heap only</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">render</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
    <span class="n">app</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">A</span><span class="p">,</span>
    <span class="n">main_window</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">A</span><span class="p">::</span><span class="n">Window</span><span class="p">,</span>
    <span class="n">device</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">D</span><span class="p">,</span>
    <span class="n">cmd</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">D</span><span class="p">::</span><span class="n">CmdBuf</span><span class="p">,</span>
    <span class="n">image_heaps</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">Vec</span><span class="o">&lt;&amp;</span><span class="nn">D</span><span class="p">::</span><span class="n">Heap</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">)</span>

<span class="c1">// code to unpack the 16bit heap id and 48bit srv id</span>
<span class="k">fn</span> <span class="nf">to_srv_heap_id</span><span class="p">(</span><span class="n">tex_id</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">cty</span><span class="p">::</span><span class="nb">c_void</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="p">(</span><span class="nb">usize</span><span class="p">,</span> <span class="nb">u16</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">mask</span> <span class="o">=</span> <span class="mi">0x0000ffffffffffff</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">srv_id</span> <span class="o">=</span> <span class="p">(</span><span class="n">tex_id</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">heap_id</span> <span class="o">=</span> <span class="p">((</span><span class="n">tex_id</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span> <span class="o">&amp;</span> <span class="o">!</span><span class="n">mask</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">;</span>
    <span class="p">(</span><span class="n">srv_id</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">heap_id</span> <span class="k">as</span> <span class="nb">u16</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">render_draw_data</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// ..</span>

    <span class="c1">// extract srv and heap id from the packed texture id</span>
    <span class="k">let</span> <span class="p">(</span><span class="n">srv</span><span class="p">,</span> <span class="n">heap_id</span><span class="p">)</span> <span class="o">=</span> <span class="nf">to_srv_heap_id</span><span class="p">(</span><span class="n">imgui_cmd</span><span class="py">.TextureId</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">heap_id</span> <span class="o">==</span> <span class="n">device</span><span class="nf">.get_shader_heap</span><span class="p">()</span><span class="nf">.get_heap_id</span><span class="p">()</span> <span class="p">{</span>
        <span class="c1">// bind the device heap</span>
        <span class="n">cmd</span><span class="nf">.set_binding</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="n">device</span><span class="nf">.get_shader_heap</span><span class="p">(),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">srv</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="c1">// bind srv in another heap</span>
        <span class="k">for</span> <span class="n">heap</span> <span class="k">in</span> <span class="n">image_heaps</span> <span class="p">{</span>
            <span class="k">if</span> <span class="n">heap</span><span class="nf">.get_heap_id</span><span class="p">()</span> <span class="o">==</span> <span class="n">heap_id</span> <span class="p">{</span>
                <span class="n">cmd</span><span class="nf">.set_binding</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="n">heap</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">srv</span><span class="p">);</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
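<p>For illustration, the packing side of this scheme can be sketched as a pure function alongside the unpack shown above (a minimal sketch; <code class="language-plaintext highlighter-rouge">to_packed_texture_id</code> is a hypothetical helper name, not part of hotline):</p>

```rust
// upper 16 bits = heap id, lower 48 bits = shader resource view handle
const SRV_MASK: u64 = 0x0000_ffff_ffff_ffff;

// hypothetical inverse of to_srv_heap_id
fn to_packed_texture_id(srv_id: usize, heap_id: u16) -> u64 {
    // a real implementation may want to assert the srv handle fits in 48 bits
    debug_assert!((srv_id as u64) <= SRV_MASK);
    ((heap_id as u64) << 48) | (srv_id as u64 & SRV_MASK)
}

fn to_srv_heap_id(tex_id: u64) -> (usize, u16) {
    ((tex_id & SRV_MASK) as usize, (tex_id >> 48) as u16)
}

fn main() {
    // round-trips cleanly: pack then unpack returns the original pair
    let packed = to_packed_texture_id(1234, 7);
    assert_eq!(to_srv_heap_id(packed), (1234, 7));
}
```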

<p>There is more than enough headroom in the 48 and 16 bits for any sane use case, so I began to think I could pack some extra info in there as well; the texture type and render flags could easily fit into 8 bits to allow changing shaders and rendering textures differently through imgui. This would open up the opportunity to provide a more fully featured texture viewer, with alpha masking or different kinds of controls. I had implemented something similar before using custom callbacks, but I didn’t really like the overall architecture, and just packing data into the <code class="language-plaintext highlighter-rouge">ImTextureID</code> is much nicer. That’s something on the backburner for another day.</p>

<h2 id="gpu-hangs">GPU-Hangs</h2>

<p>I started to encounter intermittent, random GPU hangs and device removals as I began to add more complicated examples. When these occurred I would get misleading call stacks from crashes during the stack unwind, and no D3D12 validation errors or messages. I spent a while trying to pin down the problem in old school fashion, commenting out code and simplifying, but I could never quite put my finger on what exactly was going on. One particular example with bindless draw, material, and light data lookups, and a fair amount of indirection, would sometimes crash on startup, but not all the time. Once the sample had booted it was stable. Other hangs would occur in some basic examples, but only when switching between them. It was as if the first frame was intermittently unstable, and if you got past that then everything would be OK. I verified the indices coming into the shaders and I did resolve a few things that looked like they might be the cause, namely some out-of-order updates where render calls were being made before updates, but these ended up being red herrings.</p>

<p>This sort of problem is <em>really</em> annoying to search for on the internet. Just searching for <code class="language-plaintext highlighter-rouge">DXGI_DEVICE_HUNG</code> throws up results from various commercial games, with users on reddit and steam complaining about their games not working. So much clutter with useless info (update your drivers, get a new GPU), when I wanted some developer focused information! I managed to find some forum posts on gamedev that mentioned GPU-based validation, which can be enabled via the <code class="language-plaintext highlighter-rouge">ID3D12Debug1</code> interface:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// enable debug layer</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">dxgi_factory_flags</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="nd">cfg!</span><span class="p">(</span><span class="n">debug_assertions</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">debug</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">D3D12DebugVersion</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
    <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">debug</span><span class="p">)</span> <span class="o">=</span> <span class="nf">D3D12GetDebugInterface</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">debug</span><span class="p">)</span><span class="nf">.ok</span><span class="p">()</span><span class="nf">.and</span><span class="p">(</span><span class="n">debug</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">debug</span><span class="nf">.EnableDebugLayer</span><span class="p">();</span>

        <span class="c1">// slower but more detailed GPU validation</span>
        <span class="k">if</span> <span class="n">GPU_VALIDATION</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">debug1</span> <span class="p">:</span> <span class="n">ID3D12Debug1</span> <span class="o">=</span> <span class="n">debug</span><span class="nf">.cast</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
            <span class="n">debug1</span><span class="nf">.SetEnableGPUBasedValidation</span><span class="p">(</span><span class="k">true</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="nd">println!</span><span class="p">(</span><span class="s">"hotline_rs::gfx::d3d12: enabling debug layer"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">dxgi_factory_flags</span> <span class="o">=</span> <span class="n">DXGI_CREATE_FACTORY_DEBUG</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I was able to reproduce the hangs after a few attempts and got some debug output, progress!</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">D3D12</span> <span class="n">ERROR</span><span class="p">:</span> <span class="n">GPU</span><span class="o">-</span><span class="n">BASED</span> <span class="n">VALIDATION</span><span class="p">:</span> <span class="n">Draw</span><span class="p">,</span> <span class="n">Uninitialized</span> <span class="n">root</span> <span class="n">argument</span> <span class="n">accessed</span><span class="py">. Shader</span> <span class="n">Stage</span><span class="p">:</span> <span class="n">VERTEX</span><span class="p">,</span> 
<span class="n">Root</span> <span class="n">Parameter</span> <span class="nb">Index</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">Draw</span> <span class="nb">Index</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">Shader</span> <span class="n">Code</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">couldn</span><span class="nv">'t</span> <span class="n">find</span> <span class="n">file</span> <span class="n">location</span> <span class="k">in</span> <span class="n">debug</span> <span class="n">info</span><span class="o">&gt;</span><span class="p">,</span> 
<span class="n">Asm</span> <span class="n">Instruction</span> <span class="n">Range</span><span class="p">:</span> <span class="p">[</span><span class="mi">0xd</span><span class="o">-</span><span class="mi">0xffffffff</span><span class="p">],</span> <span class="n">Asm</span> <span class="n">Operand</span> <span class="nb">Index</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> 
<span class="n">Command</span> <span class="n">List</span><span class="p">:</span> <span class="mi">0x00000232EE9CC450</span><span class="p">:</span><span class="nv">'Unnamed</span> <span class="n">ID3D12GraphicsCommandList</span> <span class="n">Object</span><span class="err">'</span><span class="p">,</span> 
<span class="n">SRV</span><span class="o">/</span><span class="n">UAV</span><span class="o">/</span><span class="n">CBV</span> <span class="n">Descriptor</span> <span class="n">Heap</span><span class="p">:</span> <span class="mi">0x00000232EDB1C3C0</span><span class="p">:</span><span class="nv">'Unnamed</span> <span class="n">ID3D12DescriptorHeap</span> <span class="n">Object</span><span class="err">'</span><span class="p">,</span> 
<span class="n">Sampler</span> <span class="n">Descriptor</span> <span class="n">Heap</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">not</span> <span class="n">set</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">Pipeline</span> <span class="n">State</span><span class="p">:</span> <span class="mi">0x0000023299DFCB30</span><span class="p">:</span><span class="nv">'Unnamed</span> <span class="n">ID3D12PipelineState</span> <span class="n">Object</span><span class="err">'</span><span class="p">,</span>  
<span class="p">[</span> <span class="n">EXECUTION</span> <span class="n">ERROR</span> <span class="err">#</span><span class="mi">935</span><span class="p">:</span> <span class="n">GPU_BASED_VALIDATION_ROOT_ARGUMENT_UNINITIALIZED</span><span class="p">]</span>
</code></pre></div></div>

<p>At this point I was happy; I don’t mind something being wrong, especially if there is some validation telling me it’s wrong. It gives me the opportunity to work with it, fix the validation, and then fix the symptoms. I do feel stressed if I am encountering a problem with no errors, warnings or validation messages, because it makes me feel like I’m straying into the territory of broken hardware or drivers, which can be harder to work around. Having said that, in my experience such issues account for an extremely small percentage of the problems I have encountered in my life as a programmer; almost all issues I have ever faced have been self-inflicted.</p>

<p>This validation error brought me back to shader registers, spaces, and root parameter indices. My original code grouped all descriptor ranges by visibility and then created a root parameter per shader visibility, which I go into in more detail in the next section. But while I am here on the topic of GPU hangs and device removals, I encountered another problem that did not produce any validation output.</p>

<p>The cause of the issue came from populating <code class="language-plaintext highlighter-rouge">IndirectArgument</code> unordered access buffers on the GPU as part of a GPU driven rendering setup. It took a while to track down because of the lack of information, but thanks to the useful size and alignment hints supplied by vscode I was able to notice that the size of the indirect argument structure was larger than expected: some padding was being added at the end. For all structures in use on the GPU I was using <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> and I thought this would be enough to prevent this kind of problem, but in the end I needed to change to <code class="language-plaintext highlighter-rouge">#[repr(packed)]</code> to prevent any padding being added.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// size = 72, align = 8</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">DrawIndirectArgs</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">vertex_buffer</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">VertexBufferView</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">index_buffer</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">IndexBufferView</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">ids</span><span class="p">:</span> <span class="n">Vec4u</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">args</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">DrawIndexedArguments</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// size = 68, align = 1</span>
<span class="nd">#[repr(packed)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">DrawIndirectArgs</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">vertex_buffer</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">VertexBufferView</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">index_buffer</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">IndexBufferView</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">ids</span><span class="p">:</span> <span class="n">Vec4u</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">args</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">DrawIndexedArguments</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
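<p>The difference is easy to reproduce in isolation. A minimal sketch (the field types below are stand-ins, not the real hotline view types; the key detail is a 64-bit GPU virtual address in the view, which is what forces 8-byte alignment and the trailing padding):</p>

```rust
use std::mem::size_of;

// stand-in for a buffer view containing a 64-bit GPU virtual address
#[repr(C)]
struct ReprC {
    location: u64,   // forces 8-byte alignment for the whole struct
    size_bytes: u32,
    stride: u32,     // 16 bytes of fields so far
    args: [u32; 5],  // 20 bytes of draw arguments, 36 bytes total
}

#[repr(packed)]
struct ReprPacked {
    location: u64,
    size_bytes: u32,
    stride: u32,
    args: [u32; 5],
}

fn main() {
    // repr(C) rounds the struct size up to a multiple of its alignment (8),
    // so 36 bytes of fields become 40, and the GPU would read garbage padding
    assert_eq!(size_of::<ReprC>(), 40);
    // repr(packed) keeps the exact sum of the field sizes
    assert_eq!(size_of::<ReprPacked>(), 36);
}
```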

<h2 id="bindless-rendering">Bindless Rendering</h2>

<p>I had done some initial exploratory work into bindless rendering in the very early stages of this project, but recently I started needing more data accessible on the GPU, which highlighted changes that needed to be made from both a functionality and a usability perspective. With the aforementioned GPU hangs being caused by the bindless setup, I started to look into it in more detail. The naming around this aspect of modern graphics APIs is quite confusing; it’s different in Vulkan and Metal, and I don’t find <code class="language-plaintext highlighter-rouge">Descriptors</code>, <code class="language-plaintext highlighter-rouge">RootSignatures</code> or <code class="language-plaintext highlighter-rouge">RootConstants</code> very intuitive names to begin with. Since I had worked with Vulkan first I stuck with <code class="language-plaintext highlighter-rouge">PipelineLayout</code>, <code class="language-plaintext highlighter-rouge">Descriptors</code> and <code class="language-plaintext highlighter-rouge">PushConstants</code>. I did have to do a little backtracking here just to make sure everything was consistently named, and in the context of this post I hope the concepts are clear enough to follow. A <code class="language-plaintext highlighter-rouge">PipelineLayout</code> describes the <code class="language-plaintext highlighter-rouge">Descriptors</code>, <code class="language-plaintext highlighter-rouge">PushConstants</code> and <code class="language-plaintext highlighter-rouge">Samplers</code> used by a pipeline, where <code class="language-plaintext highlighter-rouge">Descriptors</code> are arrays of resources such as textures, structured buffers, or constant buffers.
<code class="language-plaintext highlighter-rouge">PushConstants</code> are a small amount of data that we can push into the command buffer from the CPU and access in a shader, and <code class="language-plaintext highlighter-rouge">Samplers</code> are used to sample textures; they are the only part of all this that feels familiar from older graphics APIs.</p>

<p><code class="language-plaintext highlighter-rouge">PipelineLayouts</code> are automatically generated by my shader system <a href="https://github.com/polymonster/pmfx-shader">pmfx-shader</a>. Based on resource usage in shaders and a small amount of metadata, the pmfx-shader system is able to parse the code and automatically generate the layout. Binding heaps and push constants can be a little confusing because in the shader we specify which <code class="language-plaintext highlighter-rouge">register</code> and <code class="language-plaintext highlighter-rouge">space</code> to bind to, and in the old days of bindful rendering we would bind a texture onto the designated <code class="language-plaintext highlighter-rouge">register</code>, or <code class="language-plaintext highlighter-rouge">slot</code> as I often call them. I discovered that even with different <code class="language-plaintext highlighter-rouge">registers</code> or <code class="language-plaintext highlighter-rouge">spaces</code>, the binding <code class="language-plaintext highlighter-rouge">slot</code> or ‘root parameter index’ (as Direct3D12 calls it) might not be what you expect, due to the auto-generated layout from <code class="language-plaintext highlighter-rouge">pmfx-shader</code>. Direct3D12 allows multiple descriptor ranges to be bound to the same slot; I am not sure of the benefit of grouping more descriptors onto the same slot versus keeping them separate, but they do at the very least need to be grouped by shader visibility, which is one of <code class="language-plaintext highlighter-rouge">Vertex</code>, <code class="language-plaintext highlighter-rouge">Fragment</code>, <code class="language-plaintext highlighter-rouge">Compute</code> etc, or <code class="language-plaintext highlighter-rouge">All</code> if they are to be bound and accessible on multiple stages.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// `PipelineLayout` is required to create a pipeline it describes the layout of resources for access on the GPU.</span>
<span class="nd">#[derive(Default,</span> <span class="nd">Clone,</span> <span class="nd">Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">PipelineLayout</span> <span class="p">{</span>
    <span class="cd">/// Vector of `DescriptorBinding` which are arrays of textures, samplers or structured buffers, etc</span>
    <span class="k">pub</span> <span class="n">bindings</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">DescriptorBinding</span><span class="o">&gt;&gt;</span><span class="p">,</span>
    <span class="cd">/// Small amounts of data that can be pushed into a command buffer and available as data in shaders</span>
    <span class="k">pub</span> <span class="n">push_constants</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">PushConstantInfo</span><span class="o">&gt;&gt;</span><span class="p">,</span>
    <span class="cd">/// Static samplers that come along with the pipeline</span>
    <span class="k">pub</span> <span class="n">static_samplers</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">SamplerBinding</span><span class="o">&gt;&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I toyed with a few implementations first, all of which felt cumbersome, and I read through the <a href="https://learn.microsoft.com/en-us/windows/win32/direct3d12/resource-binding-in-hlsl">msdn docs</a> about bindings multiple times; it took more than a while to sink in. In an attempt to make this as simple as possible for the user, you can supply a vector of descriptors when creating a <code class="language-plaintext highlighter-rouge">PipelineLayout</code>. Under the hood the descriptors are grouped by type (srv, uav, cbv), register and shader visibility. This gives unique slots for different types of resources and opens the door to a bindful rendering model where that may be useful.</p>
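<p>The grouping described above can be sketched roughly as follows (the type names are illustrative stand-ins, not the actual hotline implementation):</p>

```rust
use std::collections::HashMap;

// illustrative stand-ins for the descriptor metadata pmfx-shader emits
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum DescriptorType { Srv, Uav, Cbv }

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Visibility { Vertex, Fragment, Compute, All }

struct DescriptorBinding {
    ty: DescriptorType,
    register: u32,
    visibility: Visibility,
}

// bucket descriptors by (type, register, visibility); each bucket then
// becomes one root parameter, so every bucket gets a unique slot
fn group_into_slots(
    bindings: &[DescriptorBinding],
) -> HashMap<(DescriptorType, u32, Visibility), Vec<usize>> {
    let mut slots: HashMap<_, Vec<usize>> = HashMap::new();
    for (i, b) in bindings.iter().enumerate() {
        slots.entry((b.ty, b.register, b.visibility)).or_default().push(i);
    }
    slots
}

fn main() {
    let bindings = [
        DescriptorBinding { ty: DescriptorType::Srv, register: 0, visibility: Visibility::Fragment },
        DescriptorBinding { ty: DescriptorType::Srv, register: 0, visibility: Visibility::Fragment },
        DescriptorBinding { ty: DescriptorType::Cbv, register: 0, visibility: Visibility::All },
    ];
    // the two srv descriptors share one slot, the cbv gets its own
    let slots = group_into_slots(&bindings);
    assert_eq!(slots.len(), 2);
}
```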

<p>The key takeaway with regard to bindless rendering was that we bind a heap separately and can then apply offsets within that heap to the different slots in the pipeline; critically, though, only a single descriptor heap can be bound at any one time. So in a bindless rendering architecture we use a single heap that contains all of our resources. Equipped with this knowledge, it was easy to add a utility function that binds the heap to all of the slots which need access to <code class="language-plaintext highlighter-rouge">Descriptors</code>… I still feel uncomfortable calling them descriptors and tend to refer to them as shader resources myself, but anyway. A quick example of how the code evolved over time:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">render</span><span class="p">(</span><span class="n">cmd_buf</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">CmdBuf</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">Heap</span><span class="p">)</span> <span class="p">{</span>

    <span class="c1">// 1. initially you would bind the heap onto the pipeline slot manually...</span>
    <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>

    <span class="c1">// you would also have a separate function for compute</span>
    <span class="n">cmd_buf</span><span class="nf">.set_compute_heap</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>

    <span class="c1">// and would need to bind to multiple slots</span>
    <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>
    <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>
    <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>

    <span class="c1">// 2. I then added a utility so we know which slot is associated with a particular register</span>
    <span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">ShaderResource</span><span class="p">);</span>
    <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
        <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// this still required multiple binds</span>
    <span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">ShaderResource</span><span class="p">);</span>
    <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
        <span class="n">cmd_buf</span><span class="nf">.set_render_heap</span><span class="p">(</span><span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="n">shader_heap</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// 3. moved to a single set_heap call with generics for compute and render pipelines</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.set_heap</span><span class="p">(</span><span class="n">render_pipeline</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">);</span>

    <span class="c1">// and the same call works for a compute pipeline</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.set_heap</span><span class="p">(</span><span class="n">compute_pipeline</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="bindful-rendering">Bindful Rendering</h3>

<p>It is still useful sometimes to revert to a ‘bindful’ render model; the imgui backend does this, since the shader only uses a single texture and the <code class="language-plaintext highlighter-rouge">ImTextureID</code> is passed through code as previously discussed. I also used a similar approach in a <code class="language-plaintext highlighter-rouge">blit</code> function that just needed access to a single texture. Here I added a utility function that obtains a <code class="language-plaintext highlighter-rouge">PipelineSlot</code> based on the register, space and type of resource. An offset into the heap can then be applied to the <code class="language-plaintext highlighter-rouge">PipelineSlot</code>. The offset is supplied by a texture’s or buffer’s <code class="language-plaintext highlighter-rouge">srv_index</code> or <code class="language-plaintext highlighter-rouge">uav_index</code>; binding the heap at this offset essentially gives you the same behaviour as an old-school Direct3D11 style renderer.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// set the heap</span>
<span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.set_heap</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">);</span>

<span class="c1">// find the slot and bind the offset</span>
<span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">ShaderResource</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_binding</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">,</span> <span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="n">srv</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">PipelineSlot</code> API can also be used to get the correct location of push constants, again looking them up by register and space from the shader. The return value is optional, which makes it possible to re-use shared code and only bind certain constants if a shader requires them. For example, some shaders may need push constants for the view matrix as well as an object world matrix, while others may only need one or the other.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// bind view push constants</span>
<span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span><span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">camera</span><span class="py">.view_projection_matrix</span><span class="p">));</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span><span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">camera</span><span class="py">.view_position</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// bind the world buffer info</span>
<span class="k">let</span> <span class="n">world_buffer_info</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_world_buffer_info</span><span class="p">();</span>
<span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span>
        <span class="n">slot</span><span class="py">.index</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">num_32bit_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">world_buffer_info</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">world_buffer_info</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="gpu-driven-rendering">GPU-Driven Rendering</h2>

<p>In addition to the bindless architecture, I intend the final core <code class="language-plaintext highlighter-rouge">ecs</code> architecture to be GPU driven. GPU-driven rendering allows command buffers to be populated on the GPU, which offloads CPU-intensive work. This is where graphics APIs diverge quite significantly, so for this stage I am focusing on what is possible in Direct3D12, though I do have one eye on compatibility with other platforms. There is a nice, detailed <a href="https://github.com/gpuweb/gpuweb/issues/31">post</a> on the webgpu issues page that outlines the differences between graphics APIs. I also had some prior experience with Metal, so the concepts are relatively familiar. In short: Metal allows you to build entire command buffers on the GPU; Direct3D12 allows you to change bindings (vertex, index or descriptor), push constants and draw arguments, but not pipelines; Vulkan allows you only to change draw arguments… at least without extensions.</p>

<p>I will cross the bridge of cross-platform support when I come to it, but in its current form hotline supports <code class="language-plaintext highlighter-rouge">execute_indirect</code> just as Direct3D12 does. I made a few samples to trial this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// populate a buffer of draw arguments</span>
<span class="k">let</span> <span class="n">args</span> <span class="o">=</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">DrawArguments</span> <span class="p">{</span>
    <span class="n">vertex_count_per_instance</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
    <span class="n">instance_count</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="n">start_vertex_location</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
    <span class="n">start_instance_location</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">};</span>

<span class="c1">// create an INDIRECT_ARGUMENT_BUFFER</span>
<span class="k">let</span> <span class="n">draw_args</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">BufferInfo</span><span class="p">{</span>
    <span class="n">usage</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">INDIRECT_ARGUMENT_BUFFER</span><span class="p">,</span>
    <span class="n">cpu_access</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">CpuAccessFlags</span><span class="p">::</span><span class="n">NONE</span><span class="p">,</span>
    <span class="n">format</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">Format</span><span class="p">::</span><span class="n">Unknown</span><span class="p">,</span>
    <span class="n">stride</span><span class="p">:</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">DrawArguments</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="n">initial_state</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">,</span>
    <span class="n">num_elements</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">},</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="nd">data!</span><span class="p">(</span><span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">args</span><span class="p">)))</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// create a command signature</span>
<span class="k">let</span> <span class="n">command_signature</span> <span class="o">=</span> <span class="n">device</span><span class="py">.create_indirect_render_command</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">DrawArguments</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="nd">vec!</span><span class="p">[</span><span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">{</span>
        <span class="n">argument_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">IndirectArgumentType</span><span class="p">::</span><span class="n">Draw</span><span class="p">,</span>
        <span class="n">arguments</span><span class="p">:</span> <span class="nb">None</span>
    <span class="p">}],</span> 
    <span class="nb">None</span>
<span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// bind buffers and make the execute indirect call </span>
<span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">world_matrix</span><span class="na">.0</span><span class="p">);</span>
<span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_index_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mesh</span><span class="na">.0</span><span class="py">.ib</span><span class="p">);</span>
<span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_vertex_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mesh</span><span class="na">.0</span><span class="py">.vb</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

<span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.execute_indirect</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="n">command</span><span class="na">.0</span><span class="p">,</span> 
    <span class="mi">1</span><span class="p">,</span> 
    <span class="o">&amp;</span><span class="n">args</span><span class="na">.0</span><span class="p">,</span> 
    <span class="mi">0</span><span class="p">,</span> 
    <span class="nb">None</span><span class="p">,</span> 
    <span class="mi">0</span>
<span class="p">);</span>
</code></pre></div></div>

<h3 id="gpu-entity-frustum-culling">GPU Entity Frustum Culling</h3>

<p>I set up a basic example with a large number of draw calls being submitted from the CPU to see how switching to <code class="language-plaintext highlighter-rouge">execute_indirect</code> would fare. The initial implementation loaded 32k entities with 30 unique meshes randomly selected, so the vertex and index buffers needed changing for each draw call, with a single pipeline used for all meshes. This clocked in heavily CPU-bound at about 80ms per frame, with no culling being performed whatsoever.</p>

<p>Setting up for <code class="language-plaintext highlighter-rouge">execute_indirect</code> is fairly straightforward. It starts with a buffer created on the CPU containing the draw arguments and buffer indices for every entity we want to draw. We then create an unordered access buffer into which the draw arguments for only the visible entities are copied dynamically, after culling them on the GPU. Here it uses an <code class="language-plaintext highlighter-rouge">AppendStructuredBuffer</code> in the shader; this type is not present in the Metal shading language or GLSL, so in future I will have to implement some system to get the same behaviour. Essentially it consists of a buffer for data with space reserved to store an atomic counter, which is incremented as we append items into the buffer. We can pass this counter to the <code class="language-plaintext highlighter-rouge">execute_indirect</code> call so it knows how many entities to draw.</p>
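<p>The append-and-count mechanics can be modelled on the CPU in plain Rust. This is only a sketch of the counter behaviour, not the HLSL or hotline code, and the <code class="language-plaintext highlighter-rouge">AppendBuffer</code> type and its methods are hypothetical names:</p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// CPU-side model of an AppendStructuredBuffer: a fixed-size data region
// plus an atomic counter that hands out the next free element index
struct AppendBuffer {
    data: Vec<AtomicU32>, // stand-in for per-entity draw arguments
    counter: AtomicU32,   // the packed atomic counter, as on the GPU
}

impl AppendBuffer {
    fn new(capacity: usize) -> Self {
        Self {
            data: (0..capacity).map(|_| AtomicU32::new(0)).collect(),
            counter: AtomicU32::new(0),
        }
    }

    // equivalent of AppendStructuredBuffer::Append: atomically bump the
    // counter to reserve an index, then write the element there
    fn append(&self, value: u32) {
        let i = self.counter.fetch_add(1, Ordering::Relaxed) as usize;
        self.data[i].store(value, Ordering::Relaxed);
    }

    // the value handed to execute_indirect as the number of draws
    fn count(&self) -> u32 {
        self.counter.load(Ordering::Relaxed)
    }
}
```

<p>Only entities that pass the frustum test would call <code class="language-plaintext highlighter-rouge">append</code>, so the counter naturally becomes the number of indirect draws to execute, and resetting it each frame is just a matter of writing zero back into that slot.</p>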

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// command signature specifies we change vertex and index buffers and update 2 push constants</span>
<span class="k">let</span> <span class="n">command_signature</span> <span class="o">=</span> <span class="n">device</span><span class="py">.create_indirect_render_command</span><span class="p">::</span><span class="o">&lt;</span><span class="n">DrawIndirectArgs</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="nd">vec!</span><span class="p">[</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">{</span>
            <span class="n">argument_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">IndirectArgumentType</span><span class="p">::</span><span class="n">VertexBuffer</span><span class="p">,</span>
            <span class="n">arguments</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectTypeArguments</span> <span class="p">{</span>
                <span class="n">buffer</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectBufferArguments</span> <span class="p">{</span>
                    <span class="n">slot</span><span class="p">:</span> <span class="mi">0</span>
                <span class="p">}</span>
            <span class="p">})</span>
        <span class="p">},</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">{</span>
            <span class="n">argument_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">IndirectArgumentType</span><span class="p">::</span><span class="n">IndexBuffer</span><span class="p">,</span>
            <span class="n">arguments</span><span class="p">:</span> <span class="nb">None</span>
        <span class="p">},</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">{</span>
            <span class="n">argument_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">IndirectArgumentType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">,</span>
            <span class="n">arguments</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectTypeArguments</span> <span class="p">{</span>
                <span class="n">push_constants</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectPushConstantsArguments</span> <span class="p">{</span>
                    <span class="n">slot</span><span class="p">:</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="py">.slot</span><span class="p">,</span>
                    <span class="n">offset</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
                    <span class="n">num_values</span><span class="p">:</span> <span class="mi">4</span>
                <span class="p">}</span>
            <span class="p">})</span>
        <span class="p">},</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">{</span>
            <span class="n">argument_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">IndirectArgumentType</span><span class="p">::</span><span class="n">DrawIndexed</span><span class="p">,</span>
            <span class="n">arguments</span><span class="p">:</span> <span class="nb">None</span>
        <span class="p">}</span>
    <span class="p">],</span> 
    <span class="nf">Some</span><span class="p">(</span><span class="n">pipeline</span><span class="p">)</span>
<span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// buffer is populated with draw call information for all entities</span>
<span class="c1">// read data from the arg_buffer in compute shader to generate the `dynamic_buffer`</span>
<span class="k">let</span> <span class="n">arg_buffer</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_buffer_with_heap</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">BufferInfo</span><span class="p">{</span>
    <span class="n">usage</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">SHADER_RESOURCE</span><span class="p">,</span>
    <span class="n">cpu_access</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">CpuAccessFlags</span><span class="p">::</span><span class="n">NONE</span><span class="p">,</span>
    <span class="n">format</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">Format</span><span class="p">::</span><span class="n">Unknown</span><span class="p">,</span>
    <span class="n">stride</span><span class="p">:</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">DrawIndirectArgs</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="n">initial_state</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">,</span>
    <span class="n">num_elements</span><span class="p">:</span> <span class="n">indirect_args</span><span class="nf">.len</span><span class="p">()</span>
<span class="p">},</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="nd">data!</span><span class="p">(</span><span class="o">&amp;</span><span class="n">indirect_args</span><span class="p">),</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// append buffer created to copy visible entities into</span>
<span class="c1">// dynamic buffer has a counter packed at the end</span>
<span class="k">let</span> <span class="n">dynamic_buffer</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_buffer_with_heap</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">BufferInfo</span><span class="p">{</span>
    <span class="n">usage</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">INDIRECT_ARGUMENT_BUFFER</span> <span class="p">|</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">UNORDERED_ACCESS</span> <span class="p">|</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">APPEND_COUNTER</span><span class="p">,</span>
    <span class="n">cpu_access</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">CpuAccessFlags</span><span class="p">::</span><span class="n">NONE</span><span class="p">,</span>
    <span class="n">format</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">Format</span><span class="p">::</span><span class="n">Unknown</span><span class="p">,</span>
    <span class="n">stride</span><span class="p">:</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">DrawIndirectArgs</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="n">initial_state</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">,</span>
    <span class="n">num_elements</span><span class="p">:</span> <span class="n">indirect_args</span><span class="nf">.len</span><span class="p">(),</span>
<span class="p">},</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="nd">data!</span><span class="p">[],</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// create a buffer with 0, so we can clear the counter each frame by copy buffer region</span>
<span class="k">let</span> <span class="n">counter_reset</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_buffer_with_heap</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">BufferInfo</span><span class="p">{</span>
    <span class="n">usage</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">BufferUsage</span><span class="p">::</span><span class="n">NONE</span><span class="p">,</span>
    <span class="n">cpu_access</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">CpuAccessFlags</span><span class="p">::</span><span class="n">NONE</span><span class="p">,</span>
    <span class="n">format</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">Format</span><span class="p">::</span><span class="n">Unknown</span><span class="p">,</span>
    <span class="n">stride</span><span class="p">:</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">u32</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="n">initial_state</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">CopySrc</span><span class="p">,</span>
    <span class="n">num_elements</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="p">},</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="nd">data!</span><span class="p">[</span><span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="mi">0</span><span class="p">)],</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="k">fn</span> <span class="nf">render</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// reset the counter</span>
    <span class="k">let</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="nf">.get_counter_offset</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.copy_buffer_region</span><span class="p">(</span><span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="p">,</span> <span class="n">offset</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.counter_reset</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">u32</span><span class="o">&gt;</span><span class="p">());</span>

    <span class="c1">// transition to `UnorderedAccess`</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.transition_barrier</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">TransitionBarrier</span> <span class="p">{</span>
        <span class="n">texture</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
        <span class="n">buffer</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="p">),</span>
        <span class="n">state_before</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">CopyDst</span><span class="p">,</span>
        <span class="n">state_after</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">UnorderedAccess</span><span class="p">,</span>
    <span class="p">});</span>

    <span class="c1">// dispatch compute job to perform culling</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.dispatch</span><span class="p">(</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">Size3</span> <span class="p">{</span>
            <span class="n">x</span><span class="p">:</span> <span class="n">indirect_draw</span><span class="py">.max_count</span> <span class="o">/</span> <span class="n">pass</span><span class="py">.numthreads.x</span><span class="p">,</span>
            <span class="n">y</span><span class="p">:</span> <span class="n">pass</span><span class="py">.numthreads.y</span><span class="p">,</span>
            <span class="n">z</span><span class="p">:</span> <span class="n">pass</span><span class="py">.numthreads.z</span>
        <span class="p">},</span>
        <span class="n">pass</span><span class="py">.numthreads</span>
    <span class="p">);</span>

    <span class="c1">// transition to `IndirectArgument`</span>
    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.transition_barrier</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">TransitionBarrier</span> <span class="p">{</span>
        <span class="n">texture</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
        <span class="n">buffer</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="p">),</span>
        <span class="n">state_before</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">UnorderedAccess</span><span class="p">,</span>
        <span class="n">state_after</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ResourceState</span><span class="p">::</span><span class="n">IndirectArgument</span><span class="p">,</span>
    <span class="p">});</span>

    <span class="c1">// draw indirect</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.execute_indirect</span><span class="p">(</span>
        <span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.signature</span><span class="p">,</span>
        <span class="n">indirect_draw</span><span class="py">.max_count</span><span class="p">,</span>
        <span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="p">),</span>
        <span class="n">indirect_draw</span><span class="py">.dynamic_buffer</span><span class="nf">.get_counter_offset</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span>
    <span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In the shader we use bindless lookups to obtain the entity&#8217;s extents data and the camera planes, then test against each plane of the frustum to detect whether an entity is inside it or not. If the entity is visible, its draw data is copied into the indirect argument buffer and the counter is incremented.</p>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">indirect_draw</span> <span class="p">{</span>
    <span class="n">buffer_view</span>         <span class="n">vb</span><span class="p">;</span>
    <span class="n">buffer_view</span>         <span class="n">ib</span><span class="p">;</span>
    <span class="n">uint4</span>               <span class="n">ids</span><span class="p">;</span>
    <span class="n">draw_indexed_args</span>   <span class="n">args</span><span class="p">;</span>
<span class="p">};</span>

<span class="c1">// potential draw calls we want to make</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">indirect_draw</span><span class="o">&gt;</span> <span class="n">input_draws</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space11</span><span class="p">);</span>

<span class="c1">// draw calls to populate during the `cs_frustum_cull` dispatch</span>
<span class="nb">AppendStructuredBuffer</span><span class="o">&lt;</span><span class="n">indirect_draw</span><span class="o">&gt;</span> <span class="n">output_draws</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">u0</span><span class="p">,</span> <span class="n">space0</span><span class="p">);</span>

<span class="n">bool</span> <span class="nf">aabb_vs_frustum</span><span class="p">(</span><span class="kt">float3</span> <span class="n">aabb_pos</span><span class="p">,</span> <span class="kt">float3</span> <span class="n">aabb_extent</span><span class="p">,</span> <span class="kt">float4</span> <span class="n">planes</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span> <span class="p">{</span>
    <span class="n">bool</span> <span class="n">inside</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">int</span> <span class="n">p</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">p</span> <span class="o">&lt;</span> <span class="mi">6</span><span class="p">;</span> <span class="o">++</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">float3</span> <span class="n">sign_flip</span> <span class="o">=</span> <span class="nb">sign</span><span class="p">(</span><span class="n">planes</span><span class="p">[</span><span class="n">p</span><span class="p">].</span><span class="n">xyz</span><span class="p">)</span> <span class="o">*</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">;</span>
        <span class="n">float</span> <span class="n">pd</span> <span class="o">=</span> <span class="n">planes</span><span class="p">[</span><span class="n">p</span><span class="p">].</span><span class="n">w</span><span class="p">;</span>
        <span class="n">float</span> <span class="n">d2</span> <span class="o">=</span> <span class="nb">dot</span><span class="p">(</span><span class="n">aabb_pos</span> <span class="o">+</span> <span class="n">aabb_extent</span> <span class="o">*</span> <span class="n">sign_flip</span><span class="p">,</span> <span class="n">planes</span><span class="p">[</span><span class="n">p</span><span class="p">].</span><span class="n">xyz</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">d2</span> <span class="o">&gt;</span> <span class="o">-</span><span class="n">pd</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">inside</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">inside</span><span class="p">;</span>
<span class="p">}</span>

<span class="p">[</span><span class="nb">numthreads</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span>
<span class="kt">void</span> <span class="nf">cs_frustum_cull</span><span class="p">(</span><span class="n">uint</span> <span class="n">did</span> <span class="o">:</span> <span class="n">SV_DispatchThreadID</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// grab entity draw data</span>
    <span class="n">extent_data</span> <span class="n">extents</span> <span class="o">=</span> <span class="n">get_extent_data</span><span class="p">(</span><span class="n">did</span><span class="p">);</span>
    <span class="n">camera_data</span> <span class="n">main_camera</span> <span class="o">=</span> <span class="n">get_camera_data</span><span class="p">();</span>

    <span class="c1">// grab potential draw call</span>
    <span class="n">indirect_draw</span> <span class="n">input</span> <span class="o">=</span> <span class="n">input_draws</span><span class="p">[</span><span class="n">resources</span><span class="p">.</span><span class="n">input1</span><span class="p">.</span><span class="n">index</span><span class="p">][</span><span class="n">did</span><span class="p">];</span>

    <span class="k">if</span><span class="p">(</span><span class="n">aabb_vs_frustum</span><span class="p">(</span><span class="n">extents</span><span class="p">.</span><span class="n">pos</span><span class="p">,</span> <span class="n">extents</span><span class="p">.</span><span class="n">extent</span><span class="p">,</span> <span class="n">main_camera</span><span class="p">.</span><span class="n">planes</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">output_draws</span><span class="p">[</span><span class="n">resources</span><span class="p">.</span><span class="n">input0</span><span class="p">.</span><span class="n">index</span><span class="p">].</span><span class="n">Append</span><span class="p">(</span><span class="n">input</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The plane culling code is some old code from my C++ code base; it was implemented following this excellent <a href="https://fgiesen.wordpress.com/2010/10/17/view-frustum-culling">article</a> by <a href="@rygorous@mastodon.gamedev.place">ryg</a>.</p>
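<p>To sanity-check that logic on the CPU, the same plane test can be sketched in Rust. This is a hypothetical port, not engine code: it assumes the convention used in the shader above, where plane normals face outward and a box is culled when it lies entirely on the positive side of any plane.</p>

```rust
// CPU-side sketch of the shader's aabb_vs_frustum test. Assumes
// outward-facing plane normals stored as (nx, ny, nz, w).
fn aabb_vs_frustum(pos: [f32; 3], extent: [f32; 3], planes: &[[f32; 4]; 6]) -> bool {
    for p in planes {
        // signed distance term for the box vertex farthest in the -normal direction
        let mut d = 0.0;
        for i in 0..3 {
            let v = pos[i] - extent[i] * p[i].signum();
            d += v * p[i];
        }
        // if even this vertex is on the positive side, the whole box is outside
        if d > -p[3] {
            return false;
        }
    }
    true
}

fn main() {
    // six planes enclosing the unit cube [-1, 1]^3
    let planes = [
        [1.0, 0.0, 0.0, -1.0], [-1.0, 0.0, 0.0, -1.0],
        [0.0, 1.0, 0.0, -1.0], [0.0, -1.0, 0.0, -1.0],
        [0.0, 0.0, 1.0, -1.0], [0.0, 0.0, -1.0, -1.0],
    ];
    assert!(aabb_vs_frustum([0.0; 3], [0.5; 3], &planes)); // inside
    assert!(!aabb_vs_frustum([3.0, 0.0, 0.0], [0.5; 3], &planes)); // culled
}
```

Unlike the shader, which sets a flag and keeps looping, this version early-outs on the first separating plane; the returned boolean is the same.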

<p>Switching from regular <code class="language-plaintext highlighter-rouge">draw_indexed</code> calls to <code class="language-plaintext highlighter-rouge">draw_indirect</code> improves the CPU time significantly (80ms down to 16ms with v-sync), and the GPU time also drops due to the decreased vertex workload. I was then able to increase the number of entities and get the same performance. I did notice that the program still becomes CPU bound at higher draw or entity counts - this is because the entities&#8217; positions, world matrices, and bounds are updated on the CPU. More of this work could be offloaded to the GPU, and static vs dynamic objects could be managed differently. There also seems to be some increased CPU overhead in <code class="language-plaintext highlighter-rouge">execute_indirect</code> with larger draw counts; I want to investigate the performance difference between indirect indexed draws and instanced indexed draws as well.</p>

<p>There is still more research to do in this area. I don&#8217;t have a GPU that supports mesh shaders yet, so I am still investigating what is possible without such luxuries. I think instanced <code class="language-plaintext highlighter-rouge">execute_indirect</code> calls will be helpful, and some triangle / cluster level culling could also be added for large meshes - I can see a few possible in-roads there without the need for mesh shaders. While that stuff sits in the back of my mind, putting all of this together leads to the next section.</p>

<h2 id="bindless--gpu-driven-entity-component-system">Bindless / GPU Driven Entity Component System</h2>

<p>With the bindless setup and GPU driven examples in mind, some structure started to form for how these systems would be driven by entities in the entity component system. Across the various samples, the following data needs to be accessible on the GPU:</p>

<ul>
  <li>per-entity draw data (world matrix).</li>
  <li>per-entity bounds / extents data for GPU culling.</li>
  <li>material data (texture ids, colours, material parameters).</li>
  <li>light data (positions, colours, attenuation factors).</li>
  <li>camera data (projection matrices, view positions).</li>
</ul>

<p>This data can be stored in <code class="language-plaintext highlighter-rouge">StructuredBuffers</code> and updated each frame. The <code class="language-plaintext highlighter-rouge">pmfx</code> system creates a set of <code class="language-plaintext highlighter-rouge">WorldBuffers</code>, which is essentially a structure of arrays where each member is a <code class="language-plaintext highlighter-rouge">DynamicBuffer</code> - a structured buffer that can grow and stretch like a vector. I am using persistently mapped buffers to make updates to the GPU and multi-buffering the internals, so each frame we write to a back buffer while the front buffer can be read on the GPU. The world buffers contain the following:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">DynamicWorldBuffers</span><span class="o">&lt;</span><span class="n">D</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">Device</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// Structured buffer containing bindless draw call information `DrawData`</span>
    <span class="k">pub</span> <span class="n">draw</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">DrawData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Structured buffer containing entity extents `ExtentData`</span>
    <span class="k">pub</span> <span class="n">extent</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">ExtentData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Structured buffer containing `MaterialData`</span>
    <span class="k">pub</span> <span class="n">material</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">MaterialData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Structured buffer containing `PointLightData`</span>
    <span class="k">pub</span> <span class="n">point_light</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">PointLightData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Structured buffer containing `SpotLightData`</span>
    <span class="k">pub</span> <span class="n">spot_light</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">SpotLightData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Structured buffer containing `DirectionalLightData`</span>
    <span class="k">pub</span> <span class="n">directional_light</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">DirectionalLightData</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Constant buffer containing camera info</span>
    <span class="k">pub</span> <span class="n">camera</span><span class="p">:</span> <span class="n">DynamicBuffer</span><span class="o">&lt;</span><span class="n">D</span><span class="p">,</span> <span class="n">CameraData</span><span class="o">&gt;</span>
<span class="p">}</span>
</code></pre></div></div>
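<p>The multi-buffering mentioned above can be sketched CPU-side as a small ring of buffers. The names here are illustrative, not the actual hotline_rs types: each frame the CPU writes into one copy while the GPU reads the copy written the previous frame, so neither side stalls on the other.</p>

```rust
// Hypothetical sketch of N-way buffering for dynamic world buffers.
// In the real engine the copies are persistently mapped GPU buffers;
// plain Vecs stand in for them here.
struct MultiBuffered<T, const N: usize> {
    buffers: [Vec<T>; N],
    frame: usize,
}

impl<T, const N: usize> MultiBuffered<T, N> {
    fn new() -> Self {
        Self { buffers: std::array::from_fn(|_| Vec::new()), frame: 0 }
    }
    // the copy the CPU writes this frame
    fn write(&mut self) -> &mut Vec<T> {
        &mut self.buffers[self.frame % N]
    }
    // the copy the GPU reads (written the previous frame)
    fn read(&self) -> &Vec<T> {
        &self.buffers[(self.frame + N - 1) % N]
    }
    // advance at the end of the frame
    fn swap(&mut self) {
        self.frame += 1;
    }
}

fn main() {
    let mut mb: MultiBuffered<u32, 2> = MultiBuffered::new();
    mb.write().push(1);
    mb.swap();
    mb.write().push(2);
    // the GPU-visible copy still holds last frame's data
    assert_eq!(mb.read(), &vec![1]);
}
```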

<p>In the shader code we have these unbounded bindless arrays of different resource types, which all alias the same register but sit in different spaces.</p>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// structures of arrays for indirect / bindless lookups</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">draw_data</span><span class="o">&gt;</span> <span class="n">draws</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space0</span><span class="p">);</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">extent_data</span><span class="o">&gt;</span> <span class="n">extents</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space1</span><span class="p">);</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">material_data</span><span class="o">&gt;</span> <span class="n">materials</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space2</span><span class="p">);</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">point_light_data</span><span class="o">&gt;</span> <span class="n">point_lights</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space3</span><span class="p">);</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">spot_light_data</span><span class="o">&gt;</span> <span class="n">spot_lights</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space4</span><span class="p">);</span>
<span class="kt">StructuredBuffer</span><span class="o">&lt;</span><span class="n">directional_light_data</span><span class="o">&gt;</span> <span class="n">directional_lights</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space5</span><span class="p">);</span>

<span class="c1">// textures </span>
<span class="kt">Texture2D</span> <span class="n">textures</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space6</span><span class="p">);</span>
<span class="kt">Texture2DMS</span><span class="o">&lt;</span><span class="kt">float4</span><span class="p">,</span> <span class="mi">8</span><span class="o">&gt;</span> <span class="n">msaa8x_textures</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space7</span><span class="p">);</span>
<span class="kt">TextureCube</span> <span class="n">cubemaps</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space8</span><span class="p">);</span>
<span class="kt">Texture2DArray</span> <span class="n">texture_arrays</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space9</span><span class="p">);</span>
<span class="kt">Texture3D</span> <span class="n">volume_textures</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">space10</span><span class="p">);</span>
</code></pre></div></div>

<p>All resources go into the same <code class="language-plaintext highlighter-rouge">gfx::Heap</code>. I call this the <code class="language-plaintext highlighter-rouge">shader_heap</code> and it contains textures of all kinds as well as structured buffers. We can then use indices to look up the information we need on the GPU. Some things like materials have two levels of indirection (first look up the material by ID, then look up textures by the IDs the material provides). Depending on how draw calls are made this information may come from different sources, and I have explored a few different strategies which I will cover later in this post. For the simplest approach, let&#8217;s say that per draw call we use <code class="language-plaintext highlighter-rouge">PushConstants</code> to push constants that tell us the ids of each of the world buffers. The <code class="language-plaintext highlighter-rouge">WorldBuffersInfo</code> struct contains a pair of <code class="language-plaintext highlighter-rouge">uint</code>&#8217;s per buffer - one to identify the location of the buffer and one to give its length, so we can loop over <code class="language-plaintext highlighter-rouge">n</code> lights and also perform range checks. In the context of <code class="language-plaintext highlighter-rouge">execute_indirect</code>, the ids of the draw and material buffers are pushed through as part of the indirect draw arguments.</p>
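<p>The per-buffer uint pair can be sketched like this. The struct and helper names are illustrative, not the real <code class="language-plaintext highlighter-rouge">WorldBuffersInfo</code> definition; the point is the range-checked lookup that the length field enables.</p>

```rust
// Illustrative sketch of a per-buffer (id, length) pair; in the engine
// these values are pushed to the shader as 32-bit constants.
#[repr(C)]
#[derive(Clone, Copy)]
struct BufferInfo {
    id: u32,    // index of the structured buffer in the shader heap
    count: u32, // element count, enabling loops and range checks
}

// range-checked lookup, mirroring what the shader-side helpers can do;
// the heap is modelled here as a slice of Vecs
fn lookup<T>(heap: &[Vec<T>], info: BufferInfo, i: u32) -> Option<&T> {
    if i >= info.count {
        return None; // out of the advertised range
    }
    heap.get(info.id as usize)?.get(i as usize)
}

fn main() {
    // pretend shader heap with one structured buffer in it
    let heap = vec![vec![10u32, 20]];
    let info = BufferInfo { id: 0, count: 2 };
    assert_eq!(lookup(&heap, info, 1), Some(&20));
    assert_eq!(lookup(&heap, info, 2), None);
}
```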

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// bind view push constants</span>
<span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span><span class="n">slot</span><span class="py">.slot</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">camera</span><span class="py">.view_projection_matrix</span><span class="p">));</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span><span class="n">slot</span><span class="py">.slot</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">camera</span><span class="py">.view_position</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// bind the world buffer info</span>
<span class="k">let</span> <span class="n">world_buffer_info</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_world_buffer_info</span><span class="p">();</span>
<span class="k">let</span> <span class="n">slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">slot</span> <span class="p">{</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_render_constants</span><span class="p">(</span>
        <span class="n">slot</span><span class="py">.slot</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">num_32bit_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">world_buffer_info</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">world_buffer_info</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// bind the shader resource heap</span>
<span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_heap</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="py">.shader_heap</span><span class="p">);</span>
</code></pre></div></div>

<p>Now in any shader we can look up the <code class="language-plaintext highlighter-rouge">WorldBuffers</code> and get a particular draw, material, or light data. I made some utility functions to assist this process, which also make the lookups more readable.</p>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// get entity world matrix based on entity id</span>
<span class="n">draw_data</span> <span class="n">draw</span> <span class="o">=</span> <span class="n">get_draw_data</span><span class="p">(</span><span class="n">entity_input</span><span class="p">.</span><span class="n">ids</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>

<span class="c1">// get entity material based on id</span>
<span class="n">material_data</span> <span class="n">mat</span> <span class="o">=</span> <span class="n">get_material_data</span><span class="p">(</span><span class="n">entity_input</span><span class="p">.</span><span class="n">ids</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>

<span class="c1">// lookup lights and loop over</span>
<span class="n">uint</span> <span class="n">point_lights_id</span> <span class="o">=</span> <span class="n">world_buffer_info</span><span class="p">.</span><span class="n">point_light</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">uint</span> <span class="n">point_lights_count</span> <span class="o">=</span> <span class="n">world_buffer_info</span><span class="p">.</span><span class="n">point_light</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>

<span class="k">if</span><span class="p">(</span><span class="n">point_lights_id</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">point_lights_count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">point_light_data</span> <span class="n">light</span> <span class="o">=</span> <span class="n">point_lights</span><span class="p">[</span><span class="n">point_lights_id</span><span class="p">][</span><span class="n">i</span><span class="p">];</span>

        <span class="c1">// ..</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="compute-passes">Compute Passes</h2>

<p>The main scene can be rendered through render systems driven by the <code class="language-plaintext highlighter-rouge">bevy_ecs</code> scheduler, and I have now added support for compute passes inside <code class="language-plaintext highlighter-rouge">.pmfx</code> configs, which can be dispatched automatically or hooked into their own function if some custom code is required. I intend to do all post-processing through compute shaders and completely abandon rasterization post-processing. If all data required by a compute shader is supplied in a <code class="language-plaintext highlighter-rouge">.pmfx</code> config, it is very quick and easy to integrate new compute passes into the frame’s render graph:</p>

<div class="language-jsonnet highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">textures</span><span class="p">:</span> <span class="p">{</span>
    <span class="nx">compute_texture3d</span><span class="p">:</span> <span class="p">{</span>
        <span class="nx">width</span><span class="p">:</span> <span class="mi">64</span><span class="p">,</span>
        <span class="nx">height</span><span class="p">:</span> <span class="mi">64</span><span class="p">,</span>
        <span class="nx">depth</span><span class="p">:</span> <span class="mi">64</span><span class="p">,</span>
        <span class="nx">usage</span><span class="p">:</span> <span class="p">[</span><span class="nx">UnorderedAccess</span><span class="p">,</span> <span class="nx">ShaderResource</span><span class="p">]</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nx">pipelines</span><span class="p">:</span> <span class="p">{</span>
    <span class="nx">compute_write_texture3d</span><span class="p">:</span> <span class="p">{</span>
        <span class="nx">cs</span><span class="p">:</span> <span class="nx">cs_write_texture3d</span>
        <span class="nx">push_constants</span><span class="p">:</span> <span class="p">[</span>
            <span class="nx">resources</span>
        <span class="p">]</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nx">render_graphs</span><span class="p">:</span> <span class="p">{</span>
    <span class="nx">compute_test</span><span class="p">(</span><span class="nx">base</span><span class="p">):</span> <span class="p">{</span>
        <span class="nx">write_texture</span><span class="p">:</span> <span class="p">{</span>
            <span class="kd">function</span><span class="p">:</span> <span class="s">"dispatch_compute"</span>
            <span class="nx">pipelines</span><span class="p">:</span> <span class="p">[</span><span class="s">"compute_write_texture3d"</span><span class="p">]</span>
            <span class="nx">uses</span><span class="p">:</span> <span class="p">[</span>
                <span class="p">[</span><span class="s">"compute_texture3d"</span><span class="p">,</span> <span class="s">"Write"</span><span class="p">]</span>
            <span class="p">]</span>
            <span class="nx">target_dimension</span><span class="p">:</span> <span class="s">"compute_texture3d"</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Bindless rendering again makes light work of the configuration because all textures and buffers we might want to use will already be allocated inside a heap, and each resource will have appropriate views set up based on usage flags supplied during creation. The only thing we need to know is the index of the resource view we wish to access a resource through. Resource usages can be specified in the <code class="language-plaintext highlighter-rouge">.pmfx</code> config, and you can also specify the target resource dimensions (the resource you are writing to) so that on the code side the thread group count can be worked out automatically.</p>

<div class="language-jsonnet highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">textures</span><span class="p">:</span> <span class="p">{</span>
    <span class="nx">gbuffer_albedo</span><span class="p">:</span> <span class="p">{</span>
        <span class="nx">ratio</span><span class="p">:</span> <span class="p">{</span>
            <span class="nx">window</span><span class="p">:</span> <span class="nx">main_dock</span>
            <span class="nx">scale</span><span class="p">:</span> <span class="mf">1.0</span>
        <span class="p">}</span>
        <span class="nb">format</span><span class="p">:</span> <span class="nx">RGBA16f</span>
        <span class="nx">usage</span><span class="p">:</span> <span class="p">[</span><span class="s">"ShaderResource"</span><span class="p">,</span> <span class="s">"RenderTarget"</span><span class="p">]</span>
        <span class="nx">samples</span><span class="p">:</span> <span class="mi">8</span>
    <span class="p">}</span>
    <span class="nx">gbuffer_normal</span><span class="p">(</span><span class="nx">gbuffer_albedo</span><span class="p">):</span> <span class="p">{}</span>
    <span class="nx">gbuffer_position</span><span class="p">(</span><span class="nx">gbuffer_albedo</span><span class="p">):</span> <span class="p">{}</span>
    <span class="nx">gbuffer_depth</span><span class="p">(</span><span class="nx">gbuffer_albedo</span><span class="p">):</span> <span class="p">{</span>
        <span class="nb">format</span><span class="p">:</span> <span class="nx">D24nS8u</span>
        <span class="nx">usage</span><span class="p">:</span> <span class="p">[</span><span class="s">"ShaderResource"</span><span class="p">,</span> <span class="s">"DepthStencil"</span><span class="p">]</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nx">render_graphs</span><span class="p">:</span> <span class="p">{</span>
    <span class="nx">multiple_render_targets_test</span><span class="p">:</span> <span class="p">{</span>
        <span class="nx">meshes</span><span class="p">:</span> <span class="p">{</span>
            <span class="nx">view</span><span class="p">:</span> <span class="s">"heightmap_mrt_view"</span>
            <span class="nx">pipelines</span><span class="p">:</span> <span class="p">[</span>
                <span class="s">"heightmap_mrt"</span>
            <span class="p">]</span>
            <span class="kd">function</span><span class="p">:</span> <span class="s">"render_meshes_pipeline"</span>
        <span class="p">}</span>
        <span class="nx">resolve_mrt</span><span class="p">:</span> <span class="p">{</span>
            <span class="kd">function</span><span class="p">:</span> <span class="s">"dispatch_compute"</span>
            <span class="nx">pipelines</span><span class="p">:</span> <span class="p">[</span><span class="s">"heightmap_mrt_resolve"</span><span class="p">]</span>
            <span class="nx">uses</span><span class="p">:</span> <span class="p">[</span>
                <span class="p">[</span><span class="s">"staging_output"</span><span class="p">,</span> <span class="s">"Write"</span><span class="p">]</span>
                <span class="p">[</span><span class="s">"gbuffer_albedo"</span><span class="p">,</span> <span class="s">"ReadMsaa"</span><span class="p">]</span>
                <span class="p">[</span><span class="s">"gbuffer_normal"</span><span class="p">,</span> <span class="s">"ReadMsaa"</span><span class="p">]</span>
                <span class="p">[</span><span class="s">"gbuffer_position"</span><span class="p">,</span> <span class="s">"ReadMsaa"</span><span class="p">]</span>
                <span class="p">[</span><span class="s">"gbuffer_depth"</span><span class="p">,</span> <span class="s">"ReadMsaa"</span><span class="p">]</span>
            <span class="p">]</span>
            <span class="nx">target_dimension</span><span class="p">:</span> <span class="s">"staging_output"</span>
            <span class="nx">depends_on</span><span class="p">:</span> <span class="p">[</span><span class="s">"meshes"</span><span class="p">]</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This data is packed together with the resource dimensions, which are sometimes also useful to look up in the shader for sampling coordinates and so forth.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// To lookup resources in a shader, these are passed to compute shaders:</span>
<span class="cd">/// index = srv (read), uav (write)</span>
<span class="cd">/// dimension is the resource dimension where 2d textures will be (w, h, 1) and 3d will be (w, h, d)</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">ResourceUse</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">index</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">dimension</span><span class="p">:</span> <span class="n">Vec3u</span>
<span class="p">}</span>

<span class="cd">/// Resource usage for a graph pass</span>
<span class="nd">#[derive(Serialize,</span> <span class="nd">Deserialize,</span> <span class="nd">Clone)]</span>
<span class="k">enum</span> <span class="n">ResourceUsage</span> <span class="p">{</span>
    <span class="cd">/// Write to an unordered access resource or render target resource</span>
    <span class="n">Write</span><span class="p">,</span>
    <span class="cd">/// Read from the primary (resolved) resource</span>
    <span class="n">Read</span><span class="p">,</span>
    <span class="cd">/// Read from an MSAA resource</span>
    <span class="n">ReadMsaa</span>
<span class="p">}</span>

<span class="c1">// pass the resource usage indices as push constants</span>
<span class="k">let</span> <span class="n">using_slot</span> <span class="o">=</span> <span class="n">pipeline</span><span class="nf">.get_pipeline_slot</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">DescriptorType</span><span class="p">::</span><span class="n">PushConstants</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span> <span class="o">=</span> <span class="n">using_slot</span> <span class="p">{</span>
    <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">pass</span><span class="py">.use_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">num_constants</span> <span class="o">=</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">num_32bit_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pass</span><span class="py">.use_indices</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.push_compute_constants</span><span class="p">(</span>
            <span class="mi">0</span><span class="p">,</span> 
            <span class="n">num_constants</span><span class="p">,</span> 
            <span class="n">i</span> <span class="k">as</span> <span class="nb">u32</span> <span class="o">*</span> <span class="n">num_constants</span><span class="p">,</span> 
            <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pass</span><span class="py">.use_indices</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">resource_use</span> <span class="p">{</span>
    <span class="n">uint</span>  <span class="n">index</span><span class="p">;</span>
    <span class="n">uint3</span> <span class="n">dimension</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">resource_uses</span> <span class="p">{</span>
    <span class="n">resource_use</span> <span class="n">input0</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input1</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input2</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input3</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input4</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input5</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input6</span><span class="p">;</span>
    <span class="n">resource_use</span> <span class="n">input7</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">ConstantBuffer</span><span class="o">&lt;</span><span class="n">resource_uses</span><span class="o">&gt;</span> <span class="n">resources</span><span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">b0</span><span class="p">);</span>

<span class="p">[</span><span class="nb">numthreads</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">)]</span>
<span class="kt">void</span> <span class="nf">cs_write_texture3d</span><span class="p">(</span><span class="n">uint3</span> <span class="n">did</span> <span class="o">:</span> <span class="n">SV_DispatchThreadID</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ..</span>

    <span class="n">rw_volume_textures</span><span class="p">[</span><span class="n">resources</span><span class="p">.</span><span class="n">input0</span><span class="p">.</span><span class="n">index</span><span class="p">][</span><span class="n">did</span><span class="p">.</span><span class="n">xyz</span><span class="p">]</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="n">nn</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="n">nn</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9</span> <span class="o">?</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span> <span class="o">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>For simple compute passes there may be no need for a user to specify any additional data, so the resource usage information is automatically bound and the compute shader is dispatched automatically. This, in addition to being able to write to multiple resources at the same time from a shader, and the opportunity to use single-pass jobs instead of the multi-pass ping-pong approaches seen in raster-based systems, is very appealing. I have yet to write any proper post-processes but the infrastructure is now in place.</p>
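<p>To illustrate the automatic dispatch described above, the thread group count can be derived from the target resource dimension and the shader’s thread group size. The following is a minimal sketch with assumed names, not hotline’s actual code:</p>

```rust
// Sketch only: derive a dispatch size from a target resource dimension and
// the shader's numthreads group size; the function name is an assumption.
fn dispatch_size_for_target(
    target_dim: (u32, u32, u32),
    numthreads: (u32, u32, u32)) -> (u32, u32, u32) {
    // round up so partial groups still cover the edges of the resource
    let groups = |dim: u32, threads: u32| (dim + threads - 1) / threads;
    (
        groups(target_dim.0, numthreads.0),
        groups(target_dim.1, numthreads.1),
        groups(target_dim.2, numthreads.2)
    )
}
```

<p>For example, a 64x64x64 target with <code class="language-plaintext highlighter-rouge">numthreads(8, 8, 8)</code> would yield an 8x8x8 dispatch.</p>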

<p>It might be necessary to drive more complex compute workflows with scene data, as is evident in the GPU-driven frustum culling example. Here, instead of having the compute shader dispatched automatically, you can supply a custom function that gets passed the aforementioned useful data (resource usage info, dimensions and so on).</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">dispatch_compute_frustum_cull</span><span class="p">(</span>
    <span class="n">pmfx</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Res</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">pass</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">pmfx</span><span class="p">::</span><span class="n">ComputePass</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">indirect_draw_query</span><span class="p">:</span> <span class="n">Query</span><span class="o">&lt;&amp;</span><span class="n">DrawIndirectComponent</span><span class="o">&gt;</span><span class="p">)</span> 
    <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>

    <span class="c1">// custom code to setup compute pipelines</span>

    <span class="c1">// ..</span>

    <span class="n">pass</span><span class="py">.cmd_buf</span><span class="nf">.dispatch</span><span class="p">(</span>
        <span class="nn">gfx</span><span class="p">::</span><span class="n">Size3</span> <span class="p">{</span>
            <span class="n">x</span><span class="p">:</span> <span class="n">indirect_draw</span><span class="py">.max_count</span> <span class="o">/</span> <span class="n">pass</span><span class="py">.numthreads.x</span><span class="p">,</span>
            <span class="n">y</span><span class="p">:</span> <span class="n">pass</span><span class="py">.numthreads.y</span><span class="p">,</span>
            <span class="n">z</span><span class="p">:</span> <span class="n">pass</span><span class="py">.numthreads.z</span>
        <span class="p">},</span>
        <span class="n">pass</span><span class="py">.numthreads</span>
    <span class="p">);</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>After doing a few simple compute examples I was surprised not to see any Direct3D12 validation warnings or errors regarding resource states and prompts to insert transition barriers. At first I thought “great”, it might not be something I need to worry about, but after using the GPU-based validation I mentioned earlier to diagnose some GPU hangs, I noticed that there were some validation warnings spewing out in the console. Everything works fine so I haven’t tackled it yet, but having the validation messages to flag these issues is very useful, and with the resource usage information in compute passes and also in render passes, the <code class="language-plaintext highlighter-rouge">pmfx</code> system will be able to automatically insert barriers as I am already doing for render target to shader resource transitions, MSAA resolves and mip-map generation…</p>
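<p>To sketch the idea, a barrier can be derived by comparing a resource’s last known state against the state its declared usage requires. This is an assumption about one possible implementation, not pmfx’s actual code, and all names here are made up:</p>

```rust
// Sketch only: derive a transition barrier from a declared resource usage;
// the states and function names are assumptions, not pmfx's real API.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ResourceState {
    UnorderedAccess,
    ShaderResource
}

// map a pass usage ("Write", "Read" or "ReadMsaa") to the state it requires
fn required_state(usage: &str) -> ResourceState {
    match usage {
        "Write" => ResourceState::UnorderedAccess,
        _ => ResourceState::ShaderResource
    }
}

// return Some((before, after)) when a transition barrier is needed
fn barrier_for(prev: ResourceState, usage: &str)
    -> Option<(ResourceState, ResourceState)> {
    let next = required_state(usage);
    if prev != next { Some((prev, next)) } else { None }
}
```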

<h2 id="generate-mip-maps">Generate Mip Maps</h2>

<p>Direct3D11 and Metal provide mechanisms to generate mip-maps for textures at run time, but Direct3D12 has no such inbuilt functionality. Generating mips for textures such as render targets can be quite useful, so I added a quick utility to do this. It consists of a built-in compute shader which performs the downsample iteratively. I initially tried to make a single-pass downsample and had some reasonable results, but I put it on hold for the time being as it was taking longer than I anticipated. The internal implementation can be changed at a later time; here I just wanted to make sure the API was nice and easy to use. A <code class="language-plaintext highlighter-rouge">gfx::Texture</code> can be created with various flags; internally it may create multiple resources and resource views based on those flags. You can create a texture that allows run-time mip-map generation, and a similar process is also followed for MSAA resolves:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create texture with usage GENERATE_MIP_MAPS</span>
<span class="k">let</span> <span class="n">info</span> <span class="o">=</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">TextureInfo</span> <span class="p">{</span>
    <span class="n">width</span><span class="p">,</span>
    <span class="n">height</span><span class="p">,</span>
    <span class="n">tex_type</span><span class="p">,</span>
    <span class="n">initial_state</span><span class="p">,</span>
    <span class="n">usage</span><span class="p">:</span> <span class="n">usage</span> <span class="p">|</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">TextureUsage</span><span class="p">::</span><span class="n">GENERATE_MIP_MAPS</span><span class="p">,</span>
    <span class="n">mip_levels</span><span class="p">,</span>
    <span class="n">depth</span><span class="p">:</span> <span class="n">pmfx_texture</span><span class="py">.depth</span><span class="p">,</span>
    <span class="n">array_layers</span><span class="p">:</span> <span class="n">pmfx_texture</span><span class="py">.array_layers</span><span class="p">,</span>
    <span class="n">samples</span><span class="p">:</span> <span class="n">pmfx_texture</span><span class="py">.samples</span><span class="p">,</span>
    <span class="n">format</span><span class="p">:</span> <span class="n">pmfx_texture</span><span class="py">.format</span><span class="p">,</span>
<span class="p">};</span>

<span class="c1">// create texture with heap</span>
<span class="k">let</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">device</span><span class="py">.create_texture_with_heaps</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="o">&amp;</span><span class="n">info</span><span class="p">,</span>
    <span class="nn">gfx</span><span class="p">::</span><span class="n">TextureHeapInfo</span> <span class="p">{</span>
        <span class="n">shader</span><span class="p">:</span> <span class="n">heap</span><span class="p">,</span>
        <span class="o">..</span><span class="nn">Default</span><span class="p">::</span><span class="nf">default</span><span class="p">()</span>
    <span class="p">},</span>
    <span class="nb">None</span>
<span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="c1">// resolve texture</span>
<span class="n">cmd_buf</span><span class="nf">.resolve_texture_subresource</span><span class="p">(</span><span class="n">tex</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="c1">// generate mips for texture</span>
<span class="n">cmd_buf</span><span class="nf">.generate_mip_maps</span><span class="p">(</span><span class="n">tex</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">device</span><span class="nf">.get_shader_heap</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>
</code></pre></div></div>
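<p>For reference, the number of levels in a full mip chain follows directly from the largest dimension of the texture; this small helper is illustrative and not part of the <code class="language-plaintext highlighter-rouge">gfx</code> API:</p>

```rust
// Sketch only: number of levels in a full mip chain, halving down to 1x1;
// this helper is illustrative and not part of the gfx API.
fn full_mip_levels(width: u32, height: u32) -> u32 {
    // position of the highest set bit in the largest dimension gives the
    // chain length; e.g. 64x64 -> 7 levels (64, 32, 16, 8, 4, 2, 1)
    (32 - width.max(height).leading_zeros()).max(1)
}
```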

<h2 id="graphics-examples">Graphics Examples</h2>

<p>I have completed a relatively comprehensive set of graphics examples which demonstrate and test the implemented features of the hotline APIs, all integrated and using the entity component system kindly provided by <code class="language-plaintext highlighter-rouge">bevy_ecs</code>. Some of these examples are pretty basic and I am leaving them there for test purposes and to aid future work porting the engine to different platforms. Along the way I have been using these to explore different rendering techniques and get a rough idea of performance, and will ultimately decide on a final architecture that will be used under the hood by the <code class="language-plaintext highlighter-rouge">ecs</code>. This final architecture is starting to take shape, but I will do a quick run-down of the examples I have implemented so far. I went into more detail in a previous post about how the <code class="language-plaintext highlighter-rouge">ecs</code> works, but the gist of it is that you supply <code class="language-plaintext highlighter-rouge">setup</code>, <code class="language-plaintext highlighter-rouge">update</code>, and <code class="language-plaintext highlighter-rouge">render</code> functions that are <code class="language-plaintext highlighter-rouge">bevy_ecs</code> systems, along with a <code class="language-plaintext highlighter-rouge">render_graph</code> supplied in a <code class="language-plaintext highlighter-rouge">pmfx</code> config file. More information about each of the graphics <a href="https://github.com/polymonster/hotline#examples">examples</a> can be found in the hotline GitHub <a href="https://github.com/polymonster/hotline">repository</a>.</p>

<h2 id="next-up">Next Up</h2>

<p>I am going to continue researching GPU-driven rendering techniques and hopefully start creating some more advanced-looking demos. I hope you enjoyed this post; if you did, you can follow the social links on my website for more content on whichever platforms you use.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Implementing graphics demos and rendering techniques in the hotline Rust engine, tackling automated testing challenges and fleshing out the D3D12 gfx backend API.]]></summary></entry><entry><title type="html">Building a new graphics engine in Rust - Part 3</title><link href="https://polymonster.co.uk/blog/building-new-engine-3" rel="alternate" type="text/html" title="Building a new graphics engine in Rust - Part 3" /><published>2023-03-03T00:00:00+00:00</published><updated>2023-03-03T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/building-new-engine-3</id><content type="html" xml:base="https://polymonster.co.uk/blog/building-new-engine-3"><![CDATA[<p>Following on from the <a href="https://www.polymonster.co.uk/blog/bulding-new-engine-in-rust-2">part 2</a> post a little while ago, I have been continuing work on my graphics engine <a href="https://github.com/polymonster/hotline">hotline</a> in Rust. My recent focus has been on plugins, multi-threaded command buffer generation and hot reloading for Rust code, <code class="language-plaintext highlighter-rouge">hlsl</code> shader code and <code class="language-plaintext highlighter-rouge">pmfx</code> render configs. I have made decent progress to the point where there is something quite usable and structured in a way I am relatively happy with. This leg of the journey has been by far the most challenging though, so I wanted to write about my current progress and detail some of the issues I have faced.</p>

<p>Here are the results of my first sessions actually using the engine. I created these primitives and used the hot reloading system to iterate on them to get perfect vertices, normals, and uv-coordinates. My intention is to use this tool for graphics demos and procedural generation, so the focus is on making a live coding environment and not an interactive GUI editor. The visual <code class="language-plaintext highlighter-rouge">client</code> provides some feedback and information to the user, but it is not really an editor as such; the editing happens in source code and data files, which is reflected in the <code class="language-plaintext highlighter-rouge">client</code>. In time I may decide to add more interactive editor features but for now it’s all about coding. Here’s a demo video of some of the features:</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/jkD78gXfIe0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p>I am using a single screen with vscode on the left and hotline on the right; by launching the hotline client executable from the vscode terminal, hot reloading errors are printed in the terminal and the line number links are clickable to jump straight to the error lines.</p>

<h2 id="recap">Recap</h2>

<p>I had previously created <code class="language-plaintext highlighter-rouge">gfx</code>, <code class="language-plaintext highlighter-rouge">os</code> and <code class="language-plaintext highlighter-rouge">av</code> abstraction APIs that currently have Windows-specific backend implementations, but the APIs are designed to easily add more platforms in the future. At this time I am trying to push as far ahead as possible on a single platform because I have already spent a lot of time working on cross-platform support in my C++ <a href="https://github.com/polymonster/pmtech">game engine</a> and on the engines I have worked on for my day job. Cross-platform maintenance can become time-consuming, so for a little while I have decided just to focus on feature development.</p>

<p>I have also been on a few side quests that have fallen under the umbrella of this graphics engine project, but I wrote about those separately. They were implementing <a href="https://www.polymonster.co.uk/blog/imgui-backend">imgui</a> with viewports and docking, and <a href="https://github.com/polymonster/maths-rs">maths-rs</a>, a linear algebra library I have been working on while away from my Windows desktop machine. A few people asked about maths-rs and why I don’t just use an existing library; I simply wanted something to work on using my laptop in the spare time I had, and a maths library was the first thing I thought of. This is a downside of having a Windows-only engine: my opportunity to work on it is limited to being chained to a machine in my house. I already had a C++ <a href="https://github.com/polymonster/maths">maths library</a> that I ported a lot of the code from, but while I was there I improved the API consistency, added more overlap, intersection and distance functions to both libraries, and added more tests to assist porting to <code class="language-plaintext highlighter-rouge">maths-rs</code>. Now I’m at the point where I can use all the libraries to start building graphics demos.</p>
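<p>As a flavour of the kind of helper this covers, a point-to-sphere distance might look like the following; the signature is illustrative and not maths-rs’s actual API:</p>

```rust
// Sketch only: distance from a point to the surface of a sphere, clamped to
// zero when the point is inside; illustrative, not the maths-rs signature.
fn distance_point_sphere(p: (f32, f32, f32), centre: (f32, f32, f32), radius: f32) -> f32 {
    // vector from the sphere centre to the point
    let d = (p.0 - centre.0, p.1 - centre.1, p.2 - centre.2);
    let len = (d.0 * d.0 + d.1 * d.1 + d.2 * d.2).sqrt();
    (len - radius).max(0.0)
}
```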

<h2 id="cratesio">Crates.io</h2>

<p>You can use <a href="https://github.com/polymonster/hotline">hotline</a> as a library and use the <code class="language-plaintext highlighter-rouge">gfx</code>, <code class="language-plaintext highlighter-rouge">os</code>, <code class="language-plaintext highlighter-rouge">av</code>, <code class="language-plaintext highlighter-rouge">imgui</code>, <code class="language-plaintext highlighter-rouge">pmfx</code>, and any other modules it provides. It is now available on <a href="https://crates.io/crates/hotline-rs">crates.io</a>. To my dismay, the crates.io registry entry for the name “hotline” was already taken as a placeholder by someone <a href="https://crates.io/search?q=hotline">else</a>. The same happened to <a href="https://crates.io/search?q=maths">maths</a>, so for both my projects on crates.io I had to call them <code class="language-plaintext highlighter-rouge">hotline-rs</code> and <code class="language-plaintext highlighter-rouge">maths-rs</code>. It’s a bit disappointing that people claim names and then haven’t produced any code yet. I’d be fine with someone claiming the name first if they actually had a decent, usable package.</p>

<p>I had some trouble with crates.io because the package size exceeded the lofty limit of 10MB! Most of my repository size was a result of some executables I am using to build data and shaders. I have these tools from prior work, so I am still using the Python-based build system (built into executables with help from <a href="https://pyinstaller.org/en/stable/">PyInstaller</a>). To reduce the repository size I moved the executables and data files for the hotline examples into their own GitHub repository, <a href="https://github.com/polymonster/hotline-data">hotline-data</a>, which is cloned inside <code class="language-plaintext highlighter-rouge">hotline</code> as part of a <code class="language-plaintext highlighter-rouge">cargo build</code>.</p>

<p>This data feature is optional and enabled by default. It means you can either bring your own build system or use the one provided, and more importantly the feature can be disabled for package builds when publishing to crates.io.</p>
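<p>For illustration, such an optional default feature could be declared in <code class="language-plaintext highlighter-rouge">Cargo.toml</code> something like this; the feature name here is hypothetical, not necessarily what hotline uses:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[features]
# enabled by default; disable with --no-default-features for package builds
default = ["build-data"]
# when enabled, the build step fetches the hotline-data repository
build-data = []
</code></pre></div></div>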

<p>I also had some compilation issues when publishing the package to crates.io, because currently Windows is the only supported platform. The core APIs are generic using compile-time traits, but samples and plugins need to instantiate a concrete type of GPU <code class="language-plaintext highlighter-rouge">Device</code> or operating system <code class="language-plaintext highlighter-rouge">Window</code>, and there are no supported macOS or Linux backends yet. In time I would like to add a stub implementation for each module. For now I worked around this by marking the entire files that require concrete types as Windows-only.</p>
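<p>As a rough sketch of this kind of gating, a module containing concrete backend types can be compiled only on Windows, with a stub taking its place elsewhere. The module and function names below are illustrative, not hotline’s actual layout:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// the concrete Direct3D12-style backend only compiles on Windows
#[cfg(target_os = "windows")]
mod d3d12_backend {
    pub fn backend_name() -&gt; &amp;'static str { "d3d12" }
}

// a stub module keeps the crate compiling on other platforms
#[cfg(not(target_os = "windows"))]
mod stub_backend {
    pub fn backend_name() -&gt; &amp;'static str { "stub" }
}

// alias whichever backend is active so the rest of the code is unchanged
#[cfg(target_os = "windows")]
use d3d12_backend as active_backend;
#[cfg(not(target_os = "windows"))]
use stub_backend as active_backend;

fn main() {
    println!("active gfx backend: {}", active_backend::backend_name());
}
</code></pre></div></div>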

<h2 id="plugin-architecture--hot-reloading">Plugin-Architecture / Hot Reloading</h2>

<p>The main work I have been focusing on for the last few weeks is making live-reloadable code work through a plugin system, where plugins can be loaded dynamically at run time with no modifications required to the <code class="language-plaintext highlighter-rouge">client</code> executable. The <code class="language-plaintext highlighter-rouge">client</code> provides a very thin wrapper around a main loop: it creates some core resources such as <code class="language-plaintext highlighter-rouge">os::App</code> and <code class="language-plaintext highlighter-rouge">gfx::Device</code>, provides a core loop that submits command lists and swaps buffers, and makes it easy to hook in your own update or render logic. With the <code class="language-plaintext highlighter-rouge">client</code> running, <code class="language-plaintext highlighter-rouge">plugins</code> can be dynamically loaded from <code class="language-plaintext highlighter-rouge">dylibs</code>, and code changes are detected, causing the library to be rebuilt and reloaded while the client is still running. I am using <a href="https://crates.io/crates/hot-lib-reloader">hot-lib-reloader</a> to assist with the lib reloading, although I need to bypass some of its cool features, like the <code class="language-plaintext highlighter-rouge">hot_functions_from_file</code> macro, because I wanted to remove the dependency on the <code class="language-plaintext highlighter-rouge">client</code> knowing about the plugins.</p>

<p>Creating a new plugin is quite easy: first you need a dynamic library crate type. The <a href="https://github.com/polymonster/hotline/tree/master/plugins">plugins</a> directory in hotline has a few different plugins that can be used as examples, but you basically just need a <code class="language-plaintext highlighter-rouge">Cargo.toml</code> like this:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[package]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"ecs"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">edition</span> <span class="p">=</span> <span class="s">"2021"</span>

<span class="nn">[lib]</span>
<span class="py">crate-type</span> <span class="p">=</span> <span class="p">[</span><span class="s">"rlib"</span><span class="p">,</span> <span class="s">"dylib"</span><span class="p">]</span>

<span class="nn">[dependencies]</span>
<span class="nn">hotline-rs</span> <span class="o">=</span> <span class="p">{</span> <span class="py">path</span> <span class="p">=</span> <span class="s">"../.."</span> <span class="p">}</span>
</code></pre></div></div>

<p>Inside a dynamic library plugin you can choose to get hooked into a few core function calls from the client each frame by implementing the <code class="language-plaintext highlighter-rouge">Plugin</code> trait:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">EmptyPlugin</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Plugin</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">EmptyPlugin</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">create</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="k">Self</span> <span class="p">{</span>
        <span class="n">EmptyPlugin</span> <span class="p">{</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">setup</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span><span class="p">)</span> 
        <span class="k">-&gt;</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"plugin setup"</span><span class="p">);</span>
        <span class="n">client</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">update</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="nn">client</span><span class="p">::</span><span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span><span class="p">)</span>
        <span class="k">-&gt;</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"plugin update"</span><span class="p">);</span>
        <span class="n">client</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">unload</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span><span class="p">)</span>
        <span class="k">-&gt;</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"plugin unload"</span><span class="p">);</span>
        <span class="n">client</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">ui</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span><span class="p">)</span>
    <span class="k">-&gt;</span> <span class="n">Client</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="p">,</span> <span class="nn">os_platform</span><span class="p">::</span><span class="n">App</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"plugin ui"</span><span class="p">);</span>
        <span class="n">client</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nd">hotline_plugin!</span><span class="p">[</span><span class="n">EmptyPlugin</span><span class="p">];</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">hotline_plugin!</code> macro creates a c-abi wrapper around the <code class="language-plaintext highlighter-rouge">Plugin</code> trait. I initially tried to use a <code class="language-plaintext highlighter-rouge">Box&lt;dyn Plugin&gt;</code> which was returned from the plugin library to the main client executable so the trait functions could be called, but when trying to look up the functions in the <code class="language-plaintext highlighter-rouge">vtable</code>, the memory seemed to be garbage. After some investigation this seems to be because Rust does not have a stable ABI, so I created the macro to work around it: the plugin is allocated on the heap, an FFI pointer is passed back to the client, and that pointer is then passed into the c-abi functions the macro generates. I expected that, with both sides built by the same compiler, I wouldn’t need the wrapper API, but I was unable to get it working.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[macro_export]</span>
<span class="nd">macro_rules!</span> <span class="n">hotline_plugin</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$input:ident</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
        
        <span class="c1">// c-abi wrapper for `Plugin::create`</span>
        <span class="nd">#[no_mangle]</span>
        <span class="k">pub</span> <span class="k">fn</span> <span class="nf">create</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_void</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">ptr</span> <span class="o">=</span> <span class="nn">new_plugin</span><span class="p">::</span><span class="o">&lt;</span><span class="nv">$input</span><span class="o">&gt;</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_void</span><span class="p">;</span>
            <span class="k">unsafe</span> <span class="p">{</span>
                <span class="k">let</span> <span class="n">plugin</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">transmute</span><span class="p">::</span><span class="o">&lt;*</span><span class="k">mut</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span> <span class="o">*</span><span class="k">mut</span> <span class="nv">$input</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span>
                <span class="k">let</span> <span class="n">plugin</span> <span class="o">=</span> <span class="n">plugin</span><span class="nf">.as_mut</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
                <span class="o">*</span><span class="n">plugin</span> <span class="o">=</span> <span class="nv">$input</span><span class="p">::</span><span class="nf">create</span><span class="p">();</span>
            <span class="p">}</span>
            <span class="n">ptr</span>
        <span class="p">}</span>

        <span class="c1">// ..</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
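<p>The heap-allocate / FFI-pointer round trip the macro relies on can be sketched with plain <code class="language-plaintext highlighter-rouge">std</code> as follows; <code class="language-plaintext highlighter-rouge">EmptyPlugin</code> here is a simplified stand-in type rather than hotline’s generated code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct EmptyPlugin { frames: u32 }

// c-abi entry point: box the plugin and hand an opaque pointer across the boundary
#[no_mangle]
pub extern "C" fn create() -&gt; *mut core::ffi::c_void {
    Box::into_raw(Box::new(EmptyPlugin { frames: 0 })) as *mut core::ffi::c_void
}

// c-abi wrapper: cast the opaque pointer back to the concrete type and call into it
#[no_mangle]
pub extern "C" fn update(ptr: *mut core::ffi::c_void) {
    let plugin = unsafe { &amp;mut *(ptr as *mut EmptyPlugin) };
    plugin.frames += 1;
}

// c-abi destructor: reconstruct the Box so the allocation is freed
#[no_mangle]
pub extern "C" fn destroy(ptr: *mut core::ffi::c_void) {
    unsafe { drop(Box::from_raw(ptr as *mut EmptyPlugin)) };
}

fn main() {
    let plugin = create();
    update(plugin);
    destroy(plugin);
}
</code></pre></div></div>

<p>Because only an opaque <code class="language-plaintext highlighter-rouge">c_void</code> pointer and c-abi functions cross the library boundary, no Rust layout or vtable assumptions are made between the client and the plugin.</p>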

<h2 id="ecs-plugin">ECS Plugin</h2>

<p>Not all plugins have to implement the <code class="language-plaintext highlighter-rouge">Plugin</code> trait; plugins can extend others in custom ways. I started on a basic <code class="language-plaintext highlighter-rouge">ecs</code> that uses <a href="https://docs.rs/bevy_ecs/latest/bevy_ecs/">bevy_ecs</a> and the bevy <code class="language-plaintext highlighter-rouge">Scheduler</code> to distribute work onto different threads. The reason for this <code class="language-plaintext highlighter-rouge">plugin-ception</code> kind of approach is to be able to edit the core <code class="language-plaintext highlighter-rouge">ecs</code>, as well as extension plugins, while the client is running, and also to keep the whole thing as flexible as possible, allowing totally different types of plugins to be implemented and worked on with hot reloading.</p>

<p>To load functions from other libraries you can access the <code class="language-plaintext highlighter-rouge">libs</code> currently loaded in the hotline <code class="language-plaintext highlighter-rouge">client</code>. These are just wrappers around <a href="https://docs.rs/libloading/latest/libloading/">libloading</a> that allow you to retrieve a <code class="language-plaintext highlighter-rouge">Symbol&lt;T&gt;</code> by name.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Finds available demo names from inside ecs compatible plugins by calling the function `get_demos_&lt;lib_name&gt;` in each loaded lib</span>
<span class="k">fn</span> <span class="nf">get_demo_list</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">PlatformClient</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">demos</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">lib_name</span><span class="p">,</span> <span class="n">lib</span><span class="p">)</span> <span class="k">in</span> <span class="o">&amp;</span><span class="n">client</span><span class="py">.libs</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">function_name</span> <span class="o">=</span> <span class="nd">format!</span><span class="p">(</span><span class="s">"get_demos_{}"</span><span class="p">,</span> <span class="n">lib_name</span><span class="p">)</span><span class="nf">.to_string</span><span class="p">();</span>
            <span class="k">let</span> <span class="n">list</span> <span class="o">=</span> <span class="n">lib</span><span class="py">.get_symbol</span><span class="p">::</span><span class="o">&lt;</span><span class="k">unsafe</span> <span class="k">extern</span> <span class="k">fn</span><span class="p">()</span> <span class="k">-&gt;</span>  <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">function_name</span><span class="nf">.as_bytes</span><span class="p">());</span>
            <span class="k">if</span> <span class="k">let</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">list_fn</span><span class="p">)</span> <span class="o">=</span> <span class="n">list</span> <span class="p">{</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">lib_demos</span> <span class="o">=</span> <span class="nf">list_fn</span><span class="p">();</span>
                <span class="n">demos</span><span class="nf">.append</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">lib_demos</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">demos</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The core <code class="language-plaintext highlighter-rouge">ecs</code> provides some functionality to create <code class="language-plaintext highlighter-rouge">setup</code>, <code class="language-plaintext highlighter-rouge">update</code> or <code class="language-plaintext highlighter-rouge">render</code> systems. You can add your own system functions inside different <code class="language-plaintext highlighter-rouge">plugins</code> and have the <code class="language-plaintext highlighter-rouge">ecs</code> plugin locate these systems to build schedules for different <code class="language-plaintext highlighter-rouge">demos</code>. All system stages get dispatched concurrently on different threads, so in time it’s likely more stages will be added alongside <code class="language-plaintext highlighter-rouge">setup</code>, <code class="language-plaintext highlighter-rouge">update</code> and <code class="language-plaintext highlighter-rouge">render</code>. Defining custom systems is quite straightforward; these are just <code class="language-plaintext highlighter-rouge">bevy_ecs</code> systems:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// update system which takes hotline resources `main_window`, `pmfx` and `app`</span>
<span class="nd">#[no_mangle]</span>
<span class="k">fn</span> <span class="nf">update_cameras</span><span class="p">(</span>
    <span class="n">app</span><span class="p">:</span> <span class="n">Res</span><span class="o">&lt;</span><span class="n">AppRes</span><span class="o">&gt;</span><span class="p">,</span> 
    <span class="n">main_window</span><span class="p">:</span> <span class="n">Res</span><span class="o">&lt;</span><span class="n">MainWindowRes</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="k">mut</span> <span class="n">pmfx</span><span class="p">:</span> <span class="n">ResMut</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="k">mut</span> <span class="n">query</span><span class="p">:</span> <span class="n">Query</span><span class="o">&lt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">Name</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Position</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Rotation</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">ViewProjectionMatrix</span><span class="p">),</span> <span class="n">With</span><span class="o">&lt;</span><span class="n">Camera</span><span class="o">&gt;&gt;</span><span class="p">)</span> <span class="p">{</span>    
    <span class="k">let</span> <span class="n">app</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">app</span><span class="na">.0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="k">mut</span> <span class="n">position</span><span class="p">,</span> <span class="k">mut</span> <span class="n">rotation</span><span class="p">,</span> <span class="k">mut</span> <span class="n">view_proj</span><span class="p">)</span> <span class="k">in</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">query</span> <span class="p">{</span>
        <span class="c1">// ..</span>
    <span class="p">}</span>

    <span class="c1">// ..</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Rendering systems get generated from render graphs specified through the <code class="language-plaintext highlighter-rouge">pmfx</code> system, and hook themselves into a <code class="language-plaintext highlighter-rouge">bevy_ecs</code> system function call.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// render system which takes hotline resource `pmfx` and a `pmfx::View`</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">render_meshes</span><span class="p">(</span>
    <span class="n">pmfx</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Res</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">view</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">pmfx</span><span class="p">::</span><span class="n">View</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">mesh_draw_query</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Query</span><span class="o">&lt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">WorldMatrix</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">MeshComponent</span><span class="p">)</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>
    
    <span class="c1">// ..</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In order to dynamically locate and call these functions we need to supply a bit of boilerplate to look up the functions by name.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Register demo names for this plugin which is called `ecs_demos`</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">get_demos_ecs_demos</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nd">demos!</span><span class="p">[</span>
        <span class="s">"primitives"</span><span class="p">,</span>
        <span class="s">"draw_indexed"</span><span class="p">,</span>
        <span class="s">"draw_indexed_push_constants"</span><span class="p">,</span>

        <span class="c1">// ..</span>
    <span class="p">]</span>
<span class="p">}</span>

<span class="cd">/// Register plugin system functions</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">get_system_ecs_demos</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span> <span class="n">view_name</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">SystemDescriptor</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">name</span><span class="nf">.as_str</span><span class="p">()</span> <span class="p">{</span>
        <span class="c1">// setup functions</span>
        <span class="s">"setup_draw_indexed"</span> <span class="k">=&gt;</span> <span class="nd">system_func!</span><span class="p">[</span><span class="n">setup_draw_indexed</span><span class="p">],</span>
        <span class="s">"setup_primitives"</span> <span class="k">=&gt;</span> <span class="nd">system_func!</span><span class="p">[</span><span class="n">setup_primitives</span><span class="p">],</span>
        <span class="s">"setup_draw_indexed_push_constants"</span> <span class="k">=&gt;</span> <span class="nd">system_func!</span><span class="p">[</span><span class="n">setup_draw_indexed_push_constants</span><span class="p">],</span>

        <span class="c1">// render functions</span>
        <span class="s">"render_meshes"</span> <span class="k">=&gt;</span> <span class="nd">render_func!</span><span class="p">[</span><span class="n">render_meshes</span><span class="p">,</span> <span class="n">view_name</span><span class="p">],</span>

        <span class="c1">// I had to add this `std::hint::black_box`!</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">hint</span><span class="p">::</span><span class="nf">black_box</span><span class="p">(</span><span class="nb">None</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I hope to use <code class="language-plaintext highlighter-rouge">#[derive()]</code> macros to reduce the need for this boilerplate code, but I haven’t looked into it in much detail yet. I also had to add <code class="language-plaintext highlighter-rouge">std::hint::black_box</code> around the <code class="language-plaintext highlighter-rouge">None</code> case in the <code class="language-plaintext highlighter-rouge">get_system_</code> functions. I am getting away with calling these functions without a c-abi wrapper here, so that might be the reason. Everything is working for the time being, but I am prepared to address this if need be.</p>
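<p>A stripped-down sketch of the lookup pattern, using only <code class="language-plaintext highlighter-rouge">std</code>; the function name is illustrative, and the real functions return <code class="language-plaintext highlighter-rouge">Option&lt;SystemDescriptor&gt;</code> rather than a string:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::hint::black_box;

// stand-in for a `get_system_` style lookup exported from a plugin dylib
#[no_mangle]
pub fn get_system_sketch(name: &amp;str) -&gt; Option&lt;&amp;'static str&gt; {
    match name {
        "setup_primitives" =&gt; Some("setup_primitives"),
        // black_box prevents the compiler from optimising this arm in a way
        // that appeared to break the dynamically loaded symbol
        _ =&gt; black_box(None)
    }
}

fn main() {
    println!("{:?}", get_system_sketch("setup_primitives"));
}
</code></pre></div></div>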

<h2 id="pmfx">Pmfx</h2>

<p>Another core engine feature I have been working on is <code class="language-plaintext highlighter-rouge">pmfx</code>, a high-level, platform-agnostic graphics API that builds on top of the lower-level <code class="language-plaintext highlighter-rouge">gfx</code> API. The idea here is that the <code class="language-plaintext highlighter-rouge">gfx</code> backends are fairly dumb wrapper APIs, and <code class="language-plaintext highlighter-rouge">pmfx</code> brings that low-level functionality together in a way which is shared amongst different platforms. <code class="language-plaintext highlighter-rouge">pmfx</code> is also a data-driven rendering system where render pipelines, passes, views, and graphs can be specified in <a href="https://github.com/polymonster/jsn">jsn</a> config files to make light work of configuring rendering. This is not new code, and it’s something I have worked on and used in other code bases, but it is currently undergoing an overhaul to bring it more in line with modern graphics API architectures. The main <a href="https://github.com/polymonster/pmfx-shader">pmfx-shader</a> repository contains the data side of all of this.</p>

<p>So how does it work? You can write regular <code class="language-plaintext highlighter-rouge">hlsl</code> shaders and then supply <code class="language-plaintext highlighter-rouge">pmfx</code> files, which are used to create <code class="language-plaintext highlighter-rouge">pipelines</code>, <code class="language-plaintext highlighter-rouge">views</code>, <code class="language-plaintext highlighter-rouge">textures</code> (and render targets) and more. <code class="language-plaintext highlighter-rouge">views</code> are like render passes but with a bit more detail, such as a function that can be dispatched into a render pass with a camera for example.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>textures: {
    main_colour: {
        ratio: {
            window: "main_window",
            scale: 1.0
        }
        format: "RGBA8n"
        usage: ["ShaderResource", "RenderTarget"]
        samples: 8
    }
    main_depth(main_colour): {
        format: "D24nS8u"
        usage: ["ShaderResource", "DepthStencil"]
        samples: 8
    }
}
views: {
    main_view: {
        render_target: [
            "main_colour"
        ]
        clear_colour: [0.45, 0.55, 0.60, 1.0]
        depth_stencil: [
            "main_depth"
        ]
        clear_depth: 1.0
        viewport: [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
        camera: "main_camera"
    }
    main_view_no_clear(main_view): {
        clear_colour: null
        clear_depth: null
    }
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">pmfx</code> config files supply useful defaults to minimise the number of members that need initialising to set up render state, and <code class="language-plaintext highlighter-rouge">pmfx</code> can parse <code class="language-plaintext highlighter-rouge">hlsl</code> files, with extra context provided through <code class="language-plaintext highlighter-rouge">pipelines</code>, to generate shader reflection info, descriptor layouts, and more, with further output yet to come.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pipelines: {
    mesh_debug: {
        vs: vs_mesh
        ps: ps_checkerboard
        push_constants: [
            "view_push_constants"
            "draw_push_constants"
        ]
        depth_stencil_state: depth_test_less
        raster_state: cull_back
        topology: "TriangleList"
    }
}
</code></pre></div></div>

<p>You can supply render graphs which are built at run time, with automatic resource transitions and barriers inserted based on dependencies. This is still in its early stages because my use cases are currently quite simple, but in time I expect it to grow a lot more:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>render_graphs: {
    mesh_debug: {
        grid: {
            view: "main_view"
            pipelines: ["imdraw_3d"]
            function: "render_grid"
        }
        meshes: {
            view: "main_view_no_clear"
            pipelines: ["mesh_debug"]
            function: "render_meshes"
            depends_on: ["grid"]
        }
        wireframe: {
            view: "main_view_no_clear"
            pipelines: ["wireframe_overlay"]
            function: "render_meshes"
            depends_on: ["meshes", "grid"]
        }
    }
}
</code></pre></div></div>

<p>You can take a look at a simple example of <a href="https://github.com/polymonster/hotline-data/blob/master/src/shaders">pmfx</a> supplied with the hotline repository. From this file a reflection <a href="https://github.com/polymonster/pmfx-shader/blob/master/examples/outputs/v2_info.json">info file</a> is generated, and the <code class="language-plaintext highlighter-rouge">hlsl</code> source is compiled into byte code with <code class="language-plaintext highlighter-rouge">DXC</code>. You can supply compile-time flags that are evaluated to generate shader permutations. Shaders which share the same source code, even though their permutation flags may differ, are hashed and re-used so that as few shaders as possible are generated and compiled. <code class="language-plaintext highlighter-rouge">pmfx</code> also carefully tracks all shaders and render states so only minimal changes get reloaded.</p>

<h3 id="pmfx-rust">Pmfx Rust</h3>

<p>The Rust side of <code class="language-plaintext highlighter-rouge">pmfx</code> uses <a href="https://docs.rs/serde/latest/serde/">serde</a> to serialise and deserialise json into <code class="language-plaintext highlighter-rouge">hotline_rs::gfx</code> structures so they can be passed straight to the <code class="language-plaintext highlighter-rouge">gfx</code> API. <code class="language-plaintext highlighter-rouge">pmfx</code> tracks the source files that shaders and <code class="language-plaintext highlighter-rouge">pmfx</code> render configs depend on, and triggers re-builds when changes are detected.</p>
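
<p>The file-tracking side can be illustrated with a small std-only sketch; <code class="language-plaintext highlighter-rouge">needs_rebuild</code> is a hypothetical helper for illustration, not hotline’s actual API:</p>

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Returns true when `path` has been modified since `last_seen`.
/// Missing files report no change rather than panicking.
fn needs_rebuild(path: &Path, last_seen: SystemTime) -> bool {
    fs::metadata(path)
        .and_then(|m| m.modified())
        .map(|mtime| mtime > last_seen)
        .unwrap_or(false)
}
```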

<p>All of the render config states and objects are exported along with hashes, and these hashes can be checked against the live resources in use inside the <code class="language-plaintext highlighter-rouge">client</code>; only changed resources get re-compiled and reloaded. These checks to minimise shader rebuilds and reloads should help mitigate compilation costs where many shader permutations can cause a combinatorial explosion.</p>
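
<p>The shape of that hash comparison can be sketched like this, using <code class="language-plaintext highlighter-rouge">DefaultHasher</code> purely for illustration (hotline exports its own hashes at build time):</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a serialised render-state description.
fn state_hash(serialised: &str) -> u64 {
    let mut h = DefaultHasher::new();
    serialised.hash(&mut h);
    h.finish()
}

/// A resource only needs reloading when its new description hashes
/// differently from the hash recorded for the live resource.
fn needs_reload(live_hash: u64, new_desc: &str) -> bool {
    state_hash(new_desc) != live_hash
}
```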

<p>The <code class="language-plaintext highlighter-rouge">pmfx</code> API can be used to load <code class="language-plaintext highlighter-rouge">pipelines</code> and <code class="language-plaintext highlighter-rouge">render_graphs</code> and then those resources can be found by name. Ownership of the resources remains with <code class="language-plaintext highlighter-rouge">pmfx</code> itself and render systems can borrow the resources for a short time on the stack to pass them into command buffers.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// load and create resources</span>
<span class="k">let</span> <span class="n">pmfx_bindless</span> <span class="o">=</span> <span class="n">asset_path</span><span class="nf">.join</span><span class="p">(</span><span class="s">"data/shaders/bindless"</span><span class="p">);</span>
<span class="n">pmfx</span><span class="nf">.load</span><span class="p">(</span><span class="n">pmfx_bindless</span><span class="nf">.to_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>
<span class="n">pmfx</span><span class="nf">.create_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="p">,</span> <span class="s">"compute_rw"</span><span class="p">,</span> <span class="n">swap_chain</span><span class="nf">.get_backbuffer_pass</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>
<span class="n">pmfx</span><span class="nf">.create_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="p">,</span> <span class="s">"bindless"</span><span class="p">,</span> <span class="n">swap_chain</span><span class="nf">.get_backbuffer_pass</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>

<span class="c1">// borrow resources (we need to get a pipeline built for a compatible render pass)</span>
<span class="k">let</span> <span class="n">fmt</span> <span class="o">=</span> <span class="n">swap_chain</span><span class="nf">.get_backbuffer_pass</span><span class="p">()</span><span class="nf">.get_format_hash</span><span class="p">();</span>
<span class="k">let</span> <span class="n">pso_pmfx</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_render_pipeline_for_format</span><span class="p">(</span><span class="s">"bindless"</span><span class="p">,</span> <span class="n">fmt</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="k">let</span> <span class="n">pso_compute</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_compute_pipeline</span><span class="p">(</span><span class="s">"compute_rw"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

<span class="c1">// use resource in command buffers</span>
<span class="n">cmdbuffer</span><span class="nf">.set_compute_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pso_compute</span><span class="p">);</span>

<span class="c1">// ..</span>

<span class="n">cmdbuffer</span><span class="nf">.set_render_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pso_pmfx</span><span class="p">);</span>
</code></pre></div></div>

<p>Views are a <code class="language-plaintext highlighter-rouge">pmfx</code> feature that starts to lean into <code class="language-plaintext highlighter-rouge">bevy_ecs</code>, which contains entities such as <code class="language-plaintext highlighter-rouge">cameras</code> and <code class="language-plaintext highlighter-rouge">meshes</code> that can be used to render world views. A view contains a command buffer that can be generated each frame, a camera (view constants) that can be bound for the pass, and a render pass to render into. This is all passed to a <code class="language-plaintext highlighter-rouge">bevy_ecs</code> system function, which is dispatched on the CPU concurrently with any other render systems. Each view has its own command buffer and the jobs are otherwise read-only, so they can be safely dispatched on different threads at the same time. You can build command buffers and make draw calls like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">render_meshes</span><span class="p">(</span>
    <span class="n">pmfx</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Res</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">view</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">pmfx</span><span class="p">::</span><span class="n">View</span><span class="o">&lt;</span><span class="nn">gfx_platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">mesh_draw_query</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Query</span><span class="o">&lt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">WorldMatrix</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">MeshComponent</span><span class="p">)</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nn">hotline_rs</span><span class="p">::</span><span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>
        
    <span class="k">let</span> <span class="n">pmfx</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">pmfx</span><span class="na">.0</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">fmt</span> <span class="o">=</span> <span class="n">view</span><span class="py">.pass</span><span class="nf">.get_format_hash</span><span class="p">();</span>
    <span class="k">let</span> <span class="n">mesh_debug</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_render_pipeline_for_format</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.view_pipeline</span><span class="p">,</span> <span class="n">fmt</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">camera</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_camera_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.camera</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

    <span class="c1">// setup pass</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.begin_render_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.pass</span><span class="p">);</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_viewport</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.viewport</span><span class="p">);</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_scissor_rect</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.scissor_rect</span><span class="p">);</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_render_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mesh_debug</span><span class="p">);</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_constants</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">16</span> <span class="o">*</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nn">gfx</span><span class="p">::</span><span class="nf">as_u8_slice</span><span class="p">(</span><span class="n">camera</span><span class="p">));</span>

    <span class="k">for</span> <span class="p">(</span><span class="n">world_matrix</span><span class="p">,</span> <span class="n">mesh</span><span class="p">)</span> <span class="k">in</span> <span class="o">&amp;</span><span class="n">mesh_draw_query</span> <span class="p">{</span>
        <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.push_constants</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">world_matrix</span><span class="na">.0</span><span class="p">);</span>
        <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_index_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mesh</span><span class="na">.0</span><span class="py">.ib</span><span class="p">);</span>
        <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.set_vertex_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mesh</span><span class="na">.0</span><span class="py">.vb</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
        <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.draw_indexed_instanced</span><span class="p">(</span><span class="n">mesh</span><span class="na">.0</span><span class="py">.num_indices</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// end / transition / execute</span>
    <span class="n">view</span><span class="py">.cmd_buf</span><span class="nf">.end_render_pass</span><span class="p">();</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>
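
<p>The reason this concurrent dispatch is safe can be shown with plain <code class="language-plaintext highlighter-rouge">std::thread</code>: each view owns its own command list and only reads shared data. <code class="language-plaintext highlighter-rouge">Cmd</code>, <code class="language-plaintext highlighter-rouge">record_view</code> and <code class="language-plaintext highlighter-rouge">dispatch_views</code> are hypothetical stand-ins, not hotline’s API:</p>

```rust
use std::thread;

/// Stand-in for recorded GPU commands.
#[derive(Debug, PartialEq)]
pub enum Cmd {
    SetPipeline(&'static str),
    Draw(u32),
}

/// Record one view's command buffer from read-only shared data.
pub fn record_view(pipeline: &'static str, meshes: &'static [u32]) -> Vec<Cmd> {
    let mut cmds = vec![Cmd::SetPipeline(pipeline)];
    for &num_indices in meshes {
        cmds.push(Cmd::Draw(num_indices));
    }
    cmds
}

/// Dispatch each view on its own thread; safe because every thread owns
/// its own buffer and only reads the shared mesh list.
pub fn dispatch_views(views: &[&'static str], meshes: &'static [u32]) -> Vec<Vec<Cmd>> {
    let handles: Vec<_> = views
        .iter()
        .map(|&p| thread::spawn(move || record_view(p, meshes)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```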

<p>Resource transitions are an important part of modern graphics APIs and I am aiming to make them as smooth as possible. In the <code class="language-plaintext highlighter-rouge">pmfx</code> file you can provide <code class="language-plaintext highlighter-rouge">render_graphs</code>, which automatically insert transitions based on state tracking. As mentioned above, all of the view render functions are dispatched concurrently on the CPU, but on the GPU they are executed in a specific order based on the render graph’s dependencies, with appropriate transitions inserted in between. This is still quite bare bones because I am not doing anything overly complicated yet, but I expect this aspect of <code class="language-plaintext highlighter-rouge">pmfx</code> to require a lot more attention as the project progresses. <code class="language-plaintext highlighter-rouge">pmfx</code> also provides the ability to insert resolves for MSAA resources, which are like a special kind of transition.</p>
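
<p>Ordering passes by their <code class="language-plaintext highlighter-rouge">depends_on</code> entries boils down to a topological sort; a minimal, simplified sketch of the idea (not hotline’s implementation):</p>

```rust
use std::collections::HashMap;

/// Order passes so that every pass runs after all of its `depends_on`
/// entries (Kahn-style topological sort over pass names).
pub fn schedule(passes: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    // indegree = number of unscheduled dependencies per pass
    let mut indegree: HashMap<&str, usize> =
        passes.keys().map(|&k| (k, passes[k].len())).collect();
    let mut ordered: Vec<String> = Vec::new();
    while ordered.len() < passes.len() {
        // pick any unscheduled pass whose dependencies are all scheduled
        let next = *indegree
            .iter()
            .filter(|(k, _)| !ordered.contains(&k.to_string()))
            .find(|(_, &d)| d == 0)
            .expect("cycle in render graph")
            .0;
        ordered.push(next.to_string());
        // scheduling `next` satisfies one dependency of each dependant
        for (&k, deps) in passes {
            if deps.contains(&next) {
                *indegree.get_mut(k).unwrap() -= 1;
            }
        }
    }
    ordered
}
```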

<h2 id="challenges">Challenges</h2>

<p>This leg of the project has been by far the most challenging. I started to hit more difficulties with memory ownership than I had up to this point, and the addition of <code class="language-plaintext highlighter-rouge">bevy_ecs</code> and multi-threading introduced new scenarios to handle. Mostly the difficulties are to do with ownership, borrowing and mutability. Sometimes the borrow checker can be brutal, and a small refactoring task can send you down a wormhole you didn’t expect.</p>

<h3 id="refactoring">Refactoring</h3>

<p>Refactoring in general I have found more difficult at times in Rust than in any other language. I tend to start things quite quickly and get something working; this typically means creating separate objects on the stack inside <code class="language-plaintext highlighter-rouge">main</code>, which keeps the data laid out in a way that avoids overlapping mutability and borrowing issues.</p>

<p>When performing what initially seems like a simple refactor, to bring the code more in line with your mental model, you can hit a load of borrow checker errors and the task turns out to be more challenging than you thought. The mutual exclusion property of mutable references, or simply trying to move something to a thread, can trip you up; some types can’t be <code class="language-plaintext highlighter-rouge">Send</code>, which means you have to re-think how your data is grouped together or how it is synchronised across threads.</p>
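
<p>A tiny example of the <code class="language-plaintext highlighter-rouge">Send</code> case: a grouping built around <code class="language-plaintext highlighter-rouge">Rc</code> compiles fine single-threaded but refuses to cross a <code class="language-plaintext highlighter-rouge">thread::spawn</code> boundary, and switching to <code class="language-plaintext highlighter-rouge">Arc</code> is the usual fix. The function below is purely illustrative:</p>

```rust
use std::sync::Arc;
use std::thread;

// An Rc-based version of `shared` would be rejected here, because
// `Rc<T>` is !Send:
//   let shared = std::rc::Rc::new(data);
//   thread::spawn(move || shared.len()); // error: cannot be sent between threads

/// Swapping to Arc makes the same grouping Send, at the cost of
/// atomic reference counting.
pub fn shared_len_on_thread(data: Vec<i32>) -> usize {
    let shared = Arc::new(data);
    let for_thread = Arc::clone(&shared);
    thread::spawn(move || for_thread.len()).join().unwrap()
}
```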

<h3 id="ownership">Ownership</h3>

<p>Until this point I had mostly been dealing with objects on the stack that had been quite easy to either move or pass as reference through the call stack. Here are some examples of more complicated memory ownership:</p>

<h4 id="basic-lifetimes">Basic Lifetimes</h4>

<p>I have a couple of places where I am using lifetimes, but I have tried to steer away from them as much as possible. The main place I use them is when passing <code class="language-plaintext highlighter-rouge">info</code> structures to create resources from a backend module.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Information to create a pipeline through `Device::create_render_pipeline`. where the shaders will be visible in the current stack</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">RenderPipelineInfo</span><span class="o">&lt;</span><span class="nv">'stack</span><span class="p">,</span> <span class="n">D</span><span class="p">:</span> <span class="n">Device</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// Vertex Shader</span>
    <span class="k">pub</span> <span class="n">vs</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="nv">'stack</span> <span class="nn">D</span><span class="p">::</span><span class="n">Shader</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="cd">/// Fragment Shader</span>
    <span class="k">pub</span> <span class="n">fs</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="nv">'stack</span> <span class="nn">D</span><span class="p">::</span><span class="n">Shader</span><span class="o">&gt;</span><span class="p">,</span>

    <span class="c1">// ..</span>
<span class="p">}</span>

<span class="cd">/// The shader lifetime lasts long enough to pass to `Device::create_render_pipeline`</span>
<span class="k">let</span> <span class="n">vsc_info</span> <span class="o">=</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">ShaderInfo</span> <span class="p">{</span>
    <span class="n">shader_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ShaderType</span><span class="p">::</span><span class="n">Vertex</span><span class="p">,</span>
    <span class="n">compile_info</span><span class="p">:</span> <span class="nb">None</span>
<span class="p">};</span>
<span class="k">let</span> <span class="n">vs</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_shader</span><span class="p">(</span><span class="o">&amp;</span><span class="n">vsc_info</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">vsc_data</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="k">let</span> <span class="n">psc_info</span> <span class="o">=</span> <span class="nn">gfx</span><span class="p">::</span><span class="n">ShaderInfo</span> <span class="p">{</span>
    <span class="n">shader_type</span><span class="p">:</span> <span class="nn">gfx</span><span class="p">::</span><span class="nn">ShaderType</span><span class="p">::</span><span class="n">Fragment</span><span class="p">,</span>
    <span class="n">compile_info</span><span class="p">:</span> <span class="nb">None</span>
<span class="p">};</span>
<span class="k">let</span> <span class="n">fs</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_shader</span><span class="p">(</span><span class="o">&amp;</span><span class="n">psc_info</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">psc_data</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="k">let</span> <span class="n">pso</span> <span class="o">=</span> <span class="n">device</span><span class="nf">.create_render_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">gfx</span><span class="p">::</span><span class="n">RenderPipelineInfo</span> <span class="p">{</span>
    <span class="n">vs</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">vs</span><span class="p">),</span>
    <span class="n">fs</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">fs</span><span class="p">),</span>

    <span class="c1">// ..</span>
<span class="p">})</span><span class="o">?</span><span class="p">;</span>
</code></pre></div></div>

<p>I have considered moving the objects that need lifetimes out of the structure and passing them to the function instead, so that lifetimes are not needed, but that means losing the ability to provide <code class="language-plaintext highlighter-rouge">defaults</code>, so I’m not sure. There is also a situation to handle when passing a resource into a command buffer for use by the GPU, because the resource can be dropped before it is used, but I will cover that later.</p>
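
<p>One common way to handle that situation (an assumption here, not necessarily what hotline ends up doing) is for the command buffer to retain a reference-counted clone of anything it binds, releasing it once the GPU has finished. <code class="language-plaintext highlighter-rouge">Texture</code> and <code class="language-plaintext highlighter-rouge">CmdBuf</code> below are hypothetical stand-ins:</p>

```rust
use std::sync::Arc;

/// Hypothetical GPU resource; not hotline's actual gfx types.
pub struct Texture {
    pub id: u32,
}

/// A command buffer that clones the Arc of anything it binds, so the
/// resource stays alive even if the caller drops its handle before
/// the GPU has executed the commands.
#[derive(Default)]
pub struct CmdBuf {
    pub in_flight: Vec<Arc<Texture>>,
}

impl CmdBuf {
    pub fn bind_texture(&mut self, tex: &Arc<Texture>) {
        self.in_flight.push(Arc::clone(tex));
    }

    /// Call once the GPU fence for this buffer has signalled.
    pub fn reset(&mut self) {
        self.in_flight.clear();
    }
}
```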

<h4 id="overlapping-mutability">Overlapping Mutability</h4>

<p>Overlapping mutability has been tricky to get around at times. That is, taking 2 mutable references to data that overlaps. It’s interesting to have to tackle this because it’s not something you need to think about in C or C++, yet it is happening all the time: <a href="https://en.wikipedia.org/wiki/Load-Hit-Store">load-hit-stores</a> occur when memory is aliased by two pointers passed as function arguments, because they cannot be guaranteed to be different at compile time. Using <code class="language-plaintext highlighter-rouge">restrict</code> was something I did in the past when thinking about performance, but it’s not really something I think all that much about these days because the memory aliasing concept is quite abstracted. Rust forbids aliased mutable references outright, and as a result you can end up in difficult situations when grouping data in different ways.</p>
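
<p>Rust does allow simultaneous mutable borrows when it can prove they are disjoint, which is often the escape hatch: borrow individual fields rather than the whole struct, or use <code class="language-plaintext highlighter-rouge">split_at_mut</code> for slices. A hypothetical <code class="language-plaintext highlighter-rouge">Ctx</code> type for illustration:</p>

```rust
/// Two mutable borrows of the same struct are rejected, but borrows of
/// disjoint fields are fine: the compiler can prove they don't alias.
pub struct Ctx {
    pub frame: u64,
    pub log: Vec<String>,
}

pub fn tick(c: &mut Ctx) {
    // field-level split: one &mut each to `frame` and `log` at once
    let frame = &mut c.frame;
    let log = &mut c.log;
    *frame += 1;
    log.push(format!("frame {}", frame));
}

/// For slices, `split_at_mut` hands out two non-overlapping &mut halves.
pub fn mark_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (a, b) = data.split_at_mut(mid);
    a.iter_mut().for_each(|x| *x = 0);
    b.iter_mut().for_each(|x| *x = -1);
}
```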

<p>It’s a natural instinct to want to bundle things together; perhaps a C background with context passing has got me leaning this way. But in <code class="language-plaintext highlighter-rouge">hotline</code> one of the more difficult scenarios I hit was when creating the <code class="language-plaintext highlighter-rouge">Client</code>. It felt natural to me that the <code class="language-plaintext highlighter-rouge">Client</code> could bundle together some common core functionality and be passed around between plugins. But trouble arises when you need to borrow 2 members at the same time.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// call plugin ui functions</span>
<span class="k">for</span> <span class="n">plugin</span> <span class="k">in</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.plugins</span> <span class="p">{</span>
    <span class="c1">// ..</span>

    <span class="k">self</span> <span class="o">=</span> <span class="nf">ui_fn</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">plugin</span><span class="py">.instance</span><span class="p">,</span> <span class="n">imgui_ctx</span><span class="p">);</span> <span class="c1">// cannot move self because it is borrowed as mutable (`for plugin in &amp;mut self.plugins`)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I was able to work around this particular instance by moving the plugins into another vector, but I then had to separate out which members were part of the <code class="language-plaintext highlighter-rouge">Plugin</code> so that the <code class="language-plaintext highlighter-rouge">libs</code> could still be accessed inside plugin functions, allowing them to find functions to call.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// take the plugin mem so we can decouple the shared mutability between client and plugins</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">plugins</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">take</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.plugins</span><span class="p">);</span>

<span class="c1">// call plugin ui functions</span>
<span class="k">for</span> <span class="n">plugin</span> <span class="k">in</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">plugins</span> <span class="p">{</span>
    <span class="c1">// ..</span>

    <span class="k">self</span> <span class="o">=</span> <span class="nf">ui_fn</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">plugin</span><span class="py">.instance</span><span class="p">,</span> <span class="n">imgui_ctx</span><span class="p">);</span> <span class="c1">//now we can move self </span>
<span class="p">}</span>
</code></pre></div></div>

<p>This illustrates to me how data ownership and grouping is quite a different beast in Rust to what I am used to.</p>

<h4 id="iterator-consumers">Iterator Consumers</h4>

<p>To separate out mutability I found myself breaking algorithms apart: finding the data that needs to be mutated in one pass, gathering the results, then iterating over those results to mutate in a second pass.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// iterate over `pmfx_tracking` to check for changes, and reload data</span>
<span class="k">for</span> <span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">tracking</span><span class="p">)</span> <span class="k">in</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.pmfx_tracking</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">mtime</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">metadata</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tracking</span><span class="py">.filepath</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.modified</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="k">if</span> <span class="n">mtime</span> <span class="o">&gt;</span> <span class="n">tracking</span><span class="py">.modified_time</span> <span class="p">{</span>
        
        <span class="c1">// perform a reload</span>
        <span class="k">self</span><span class="py">.shaders</span><span class="nf">.remove</span><span class="p">(</span><span class="n">shader</span><span class="p">);</span> <span class="cd">//!! this is not possible as `self` is already borrowed (`for (_, tracking) in &amp;mut self.pmfx_tracking`)</span>
        <span class="c1">// ..</span>

        <span class="c1">// update modified time</span>
        <span class="n">tracking</span><span class="py">.modified_time</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">metadata</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tracking</span><span class="py">.filepath</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.modified</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>My instinct initially went for imperative style loops, but since then I have started to adopt the iterator patterns using <code class="language-plaintext highlighter-rouge">filter</code>, <code class="language-plaintext highlighter-rouge">map</code>, <code class="language-plaintext highlighter-rouge">fold</code>, and <code class="language-plaintext highlighter-rouge">collect</code>. This means a mutable collection built up by insertion can be replaced with an immutable collection produced directly.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// first collect paths that need reloading</span>
<span class="k">let</span> <span class="n">reload_paths</span> <span class="o">=</span> <span class="k">self</span><span class="py">.pmfx_tracking</span><span class="nf">.iter_mut</span><span class="p">()</span><span class="nf">.filter</span><span class="p">(|(</span><span class="n">_</span><span class="p">,</span> <span class="n">tracking</span><span class="p">)|</span> <span class="p">{</span>
    <span class="nn">fs</span><span class="p">::</span><span class="nf">metadata</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tracking</span><span class="py">.filepath</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.modified</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span> <span class="o">&gt;</span> <span class="n">tracking</span><span class="py">.modified_time</span>
<span class="p">})</span><span class="nf">.map</span><span class="p">(|</span><span class="n">tracking</span><span class="p">|</span> <span class="p">{</span>
    <span class="n">tracking</span><span class="na">.1</span><span class="py">.filepath</span><span class="nf">.to_string_lossy</span><span class="p">()</span><span class="nf">.to_string</span><span class="p">()</span>
<span class="p">})</span><span class="py">.collect</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;&gt;</span><span class="p">();</span>

<span class="c1">// iterate over the paths we want to reload</span>
<span class="k">for</span> <span class="n">reload_filepath</span> <span class="k">in</span> <span class="n">reload_paths</span> <span class="p">{</span>
    <span class="k">if</span> <span class="o">!</span><span class="n">reload_filepath</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>

        <span class="c1">// repeat similarly inside, collecting resources that need updating first</span>

        <span class="c1">// find textures that need reloading</span>
        <span class="k">let</span> <span class="n">reload_textures</span> <span class="o">=</span> <span class="k">self</span><span class="py">.textures</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.filter</span><span class="p">(|(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)|</span> <span class="p">{</span>
            <span class="k">self</span><span class="py">.pmfx.textures</span><span class="nf">.get</span><span class="p">(</span><span class="o">*</span><span class="n">k</span><span class="p">)</span><span class="nf">.map_or_else</span><span class="p">(||</span> <span class="k">false</span><span class="p">,</span> <span class="p">|</span><span class="n">src</span><span class="p">|</span> <span class="p">{</span>
                <span class="n">src</span><span class="py">.hash</span> <span class="o">!=</span> <span class="n">v</span><span class="na">.0</span>
            <span class="p">})</span>
        <span class="p">})</span><span class="nf">.map</span><span class="p">(|(</span><span class="n">k</span><span class="p">,</span> <span class="n">_</span><span class="p">)|</span> <span class="p">{</span>
            <span class="n">k</span><span class="nf">.to_string</span><span class="p">()</span>
        <span class="p">})</span><span class="py">.collect</span><span class="p">::</span><span class="o">&lt;</span><span class="n">HashSet</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;&gt;</span><span class="p">();</span>

        <span class="c1">// ..</span>

        <span class="c1">// reloading outside of any iterator tied to self (self here is mutable)</span>
        <span class="k">self</span><span class="nf">.recreate_textures</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">reload_textures</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I have started to think in this more functional style a little, but it still feels slightly alien to me when I could achieve the same thing with a simple loop.</p>
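<p>As a sketch of what I mean, the same reload set could be gathered with a plain loop. The types here are simplified stand-ins for the texture bookkeeping (a map of texture name to hash on each side), not hotline's actual structures:</p>

```rust
use std::collections::{HashMap, HashSet};

// `tracked` maps a texture name to its last-known hash, `sources` maps the
// same name to the current hash from the config (illustrative stand-ins)
fn find_reloads(
    tracked: &HashMap<String, u64>,
    sources: &HashMap<String, u64>,
) -> HashSet<String> {
    let mut reload_textures = HashSet::new();
    for (name, hash) in tracked {
        // reload when a source entry exists and its hash has changed
        if let Some(src_hash) = sources.get(name) {
            if src_hash != hash {
                reload_textures.insert(name.to_string());
            }
        }
    }
    reload_textures
}
```

Both forms do the same work; the iterator version just expresses the filter and the collection in one expression.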

<h4 id="moves">Moves</h4>

<p>For the plugin libs, and for <code class="language-plaintext highlighter-rouge">bevy_ecs</code> in particular, I need to pass the hotline modules as resources to the <code class="language-plaintext highlighter-rouge">ecs</code> systems. In the end I settled on moving the entire hotline <code class="language-plaintext highlighter-rouge">Client</code> into the plugin functions, then into the ecs <code class="language-plaintext highlighter-rouge">World</code> and back out again. The core hotline modules also need to be wrapped up as a <code class="language-plaintext highlighter-rouge">Resource</code> for <code class="language-plaintext highlighter-rouge">bevy_ecs</code>.</p>

<p>This feels quite nice in a way: each plugin has full ownership of <code class="language-plaintext highlighter-rouge">hotline</code> and can do what it likes, which makes it possible to do things like asynchronous system updates through the bevy <code class="language-plaintext highlighter-rouge">Scheduler</code> and let it have full control.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// move hotline resource into world</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="n">session_info</span><span class="p">);</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">DeviceRes</span><span class="p">(</span><span class="n">client</span><span class="py">.device</span><span class="p">));</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">AppRes</span><span class="p">(</span><span class="n">client</span><span class="py">.app</span><span class="p">));</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">MainWindowRes</span><span class="p">(</span><span class="n">client</span><span class="py">.main_window</span><span class="p">));</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">PmfxRes</span><span class="p">(</span><span class="n">client</span><span class="py">.pmfx</span><span class="p">));</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">ImDrawRes</span><span class="p">(</span><span class="n">client</span><span class="py">.imdraw</span><span class="p">));</span>
<span class="k">self</span><span class="py">.world</span><span class="nf">.insert_resource</span><span class="p">(</span><span class="nf">UserConfigRes</span><span class="p">(</span><span class="n">client</span><span class="py">.user_config</span><span class="p">));</span>

<span class="c1">// update systems</span>
<span class="k">self</span><span class="py">.schedule</span><span class="nf">.run</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.world</span><span class="p">);</span>

<span class="c1">// move resources back out</span>
<span class="n">client</span><span class="py">.device</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">DeviceRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="n">client</span><span class="py">.app</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">AppRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="n">client</span><span class="py">.main_window</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">MainWindowRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="n">client</span><span class="py">.pmfx</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="n">client</span><span class="py">.imdraw</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">ImDrawRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="n">client</span><span class="py">.user_config</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">UserConfigRes</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="na">.0</span><span class="p">;</span>
<span class="k">self</span><span class="py">.session_info</span> <span class="o">=</span> <span class="k">self</span><span class="py">.world.remove_resource</span><span class="p">::</span><span class="o">&lt;</span><span class="n">SessionInfo</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
</code></pre></div></div>

<p>It requires a small amount of hokey-cokey to do so, which is also a little strange, but I kind of like it. I had to break out the different modules inside the client to avoid overlapping mutability when using the <code class="language-plaintext highlighter-rouge">SwapChain</code>, <code class="language-plaintext highlighter-rouge">Device</code> and <code class="language-plaintext highlighter-rouge">CmdBuf</code>.</p>
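<p>Breaking the modules out works because the borrow checker allows disjoint mutable borrows of individual fields. A minimal sketch with hypothetical stand-in types (not hotline's real fields):</p>

```rust
// simplified stand-ins for the hotline modules
struct Device { frame: u64 }
struct SwapChain { backbuffer: u32 }
struct CmdBuf { commands: Vec<String> }

struct Client {
    device: Device,
    swap_chain: SwapChain,
    cmd_buf: CmdBuf,
}

fn render(client: &mut Client) {
    // borrowing separate fields mutably at the same time is fine;
    // routing everything through methods that each take &mut self
    // on a monolithic type would conflict
    let device = &mut client.device;
    let swap = &mut client.swap_chain;
    let cmds = &mut client.cmd_buf;

    device.frame += 1;
    swap.backbuffer = (device.frame % 2) as u32;
    cmds.commands.push(format!("present backbuffer {}", swap.backbuffer));
}
```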

<h4 id="arcs">Arcs</h4>

<p>I am aware I could wrap everything in an <code class="language-plaintext highlighter-rouge">Arc</code> to get interior mutability, which might remove the need for the moves, but I decided to use them only where necessary: I use <code class="language-plaintext highlighter-rouge">Arc</code> and <code class="language-plaintext highlighter-rouge">Mutex</code> anywhere inter-thread synchronisation is needed. I quite like lockless data structures in C, and I will take a look at <a href="https://tokio.rs">tokio</a> when I get a chance, but for now I went with a heavy-handed approach in a few places just to get the program structured how I would like. I have a <code class="language-plaintext highlighter-rouge">Reloader</code> and <code class="language-plaintext highlighter-rouge">ReloadResponder</code> setup that watches files, flags when changes have occurred, triggers a rebuild and reloads. The <code class="language-plaintext highlighter-rouge">ReloadResponder</code> is also the only place using <code class="language-plaintext highlighter-rouge">dyn</code> dispatch; there is still more work to do in that area, as I struggled trying to achieve the kind of polymorphic behaviour I would implement in C++.</p>
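<p>The general shape looks something like the following; this is a hedged sketch, and the real trait and members in hotline differ:</p>

```rust
use std::sync::{Arc, Mutex};

// illustrative responder trait; the real hotline trait has more to it
trait ReloadResponder: Send {
    fn build(&mut self) -> bool;
}

struct ShaderResponder {
    builds: u32,
}

impl ReloadResponder for ShaderResponder {
    fn build(&mut self) -> bool {
        self.builds += 1;
        true
    }
}

struct Reloader {
    // dyn dispatch lets one Reloader drive different responder types, and
    // Arc<Mutex<..>> shares the responder with a file-watcher thread
    responder: Arc<Mutex<dyn ReloadResponder>>,
}

impl Reloader {
    fn file_changed(&self) -> bool {
        self.responder.lock().unwrap().build()
    }
}
```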

<p>Another place using an <code class="language-plaintext highlighter-rouge">Arc</code> is in <code class="language-plaintext highlighter-rouge">pmfx</code>, because <code class="language-plaintext highlighter-rouge">Views</code> need to be mutable in render functions and they live inside a <code class="language-plaintext highlighter-rouge">HashMap</code>, so it's not possible to borrow mutably from <code class="language-plaintext highlighter-rouge">pmfx</code> itself without interior mutability. This led me to an <code class="language-plaintext highlighter-rouge">Arc</code>. A move should also be viable, because only a single render system will ever write to a single view, so requiring an <code class="language-plaintext highlighter-rouge">Arc</code> does feel unnecessary here, but working with <code class="language-plaintext highlighter-rouge">bevy_ecs</code> systems forced my hand slightly. A <code class="language-plaintext highlighter-rouge">RefCell</code> might also be a better option in this scenario.</p>
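<p>The shape of the problem is roughly this (a simplified sketch; pmfx's real <code class="language-plaintext highlighter-rouge">View</code> type holds much more): the map is accessed through a shared borrow, and cloning the <code class="language-plaintext highlighter-rouge">Arc</code> out of it gives one render system mutable access to one view through the <code class="language-plaintext highlighter-rouge">Mutex</code>:</p>

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// simplified stand-in for a pmfx view
struct View {
    draw_calls: u32,
}

struct Pmfx {
    views: HashMap<String, Arc<Mutex<View>>>,
}

impl Pmfx {
    // note &self, not &mut self: handing out a cloned Arc avoids needing
    // a mutable borrow of the whole Pmfx just to mutate one view
    fn get_view(&self, name: &str) -> Option<Arc<Mutex<View>>> {
        self.views.get(name).cloned()
    }
}
```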

<h4 id="in-flight-gpu-resources">In-Flight GPU Resources</h4>

<p>Within a multi-buffered GPU rendering system, while the CPU is building command buffers for the current frame, a previous frame is being executed concurrently on the GPU. This introduces an issue: if we <code class="language-plaintext highlighter-rouge">drop</code> a resource on the CPU side while it is still in use on the GPU, we get a D3D12 validation error, which can lead to a device removal. I first encountered this issue with textures used for videos in the <code class="language-plaintext highlighter-rouge">av</code> API, so I added a <code class="language-plaintext highlighter-rouge">destroy_texture</code> function that passes ownership of the texture to the GPU <code class="language-plaintext highlighter-rouge">Device</code>, and a <code class="language-plaintext highlighter-rouge">cleanup_resources</code> function checks that resources are no longer in use before <code class="language-plaintext highlighter-rouge">dropping</code> them. This goes a little against Rust's memory model, with the need to explicitly <code class="language-plaintext highlighter-rouge">drop</code> at the right time.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// swap the texture to None, and pass ownership of texture to the device. Where it will be cleaned up safely</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">none_tex</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">swap</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">none_tex</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.texture</span><span class="p">);</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">tex</span><span class="p">)</span> <span class="o">=</span> <span class="n">none_tex</span> <span class="p">{</span>
    <span class="n">device</span><span class="nf">.destroy_texture</span><span class="p">(</span><span class="n">tex</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
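<p>The idea behind <code class="language-plaintext highlighter-rouge">destroy_texture</code> / <code class="language-plaintext highlighter-rouge">cleanup_resources</code> can be sketched like this. It is a simplified model, assuming a fixed number of buffered frames and a frame counter standing in for the real fence tracking:</p>

```rust
// number of frames the GPU may still be reading a resource (assumed)
const NUM_BUFFERED_FRAMES: u64 = 2;

struct Texture {
    id: u32,
}

struct Device {
    frame: u64,
    // textures waiting to be dropped, tagged with the frame they were retired
    zombie_textures: Vec<(u64, Texture)>,
}

impl Device {
    fn destroy_texture(&mut self, tex: Texture) {
        // takes ownership; the actual drop is deferred
        self.zombie_textures.push((self.frame, tex));
    }

    fn cleanup_resources(&mut self) {
        let frame = self.frame;
        // drop only textures whose last possible GPU use has completed;
        // retain keeps the ones that might still be in flight
        self.zombie_textures
            .retain(|(retired, _)| frame < *retired + NUM_BUFFERED_FRAMES);
    }
}
```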

<p>In some places where full reloads take place there is a useful function on the <code class="language-plaintext highlighter-rouge">SwapChain</code> which waits for the last submitted frame to complete on the GPU; after that, any <code class="language-plaintext highlighter-rouge">drops</code> are guaranteed to be safe before any new frames are submitted.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// check if we have any reloads available</span>
<span class="k">if</span> <span class="k">self</span><span class="py">.reloader</span><span class="nf">.check_for_reload</span><span class="p">()</span> <span class="o">==</span> <span class="nn">ReloadState</span><span class="p">::</span><span class="n">Available</span> <span class="p">{</span>
    <span class="c1">// wait for last GPU frame so we can drop the resources</span>
    <span class="n">swap_chain</span><span class="nf">.wait_for_last_frame</span><span class="p">();</span>
    <span class="k">self</span><span class="nf">.reload</span><span class="p">(</span><span class="n">device</span><span class="p">);</span>
    <span class="k">self</span><span class="py">.reloader</span><span class="nf">.complete_reload</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This solution is much nicer because the <code class="language-plaintext highlighter-rouge">drop</code> can just happen naturally. It might not be possible or desired to perform this hard sync with the GPU, so in future I expect to have to use the <code class="language-plaintext highlighter-rouge">destroy</code> functions more (and add them for different resource types).</p>

<h3 id="build-times--linker-issues--debugging">Build Times / Linker Issues / Debugging</h3>

<p>Build times are currently the biggest problem; a <code class="language-plaintext highlighter-rouge">plugin</code> takes around 6 seconds to build, with a little extra to complete the reload, so live code editing does not feel hugely responsive. Reloading shaders or render configs is very fast though, which balances it out a bit if you work across code and shaders, and is all the more reason to use more GPU-driven techniques / compute. Build times in full debug builds are much slower, but because of an issue with more than 65535 symbols being exported from a plugin, which is not supported by the MSVC toolchain, I am forced to switch to <code class="language-plaintext highlighter-rouge">O1</code> optimization for debug, and that has similar performance to release.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>= note: LINK : fatal error LNK1189: library limit of 65535 objects exceeded
</code></pre></div></div>
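<p>The workaround is a profile override in <code class="language-plaintext highlighter-rouge">Cargo.toml</code> along these lines (the exact settings may vary; the key part is raising the optimization level of the dev profile so enough code is inlined or eliminated to stay under the symbol limit):</p>

```toml
# raise opt-level in debug builds to keep the exported symbol count of the
# plugin dlls under the MSVC 65535 object limit
[profile.dev]
opt-level = 1
```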

<h4 id="profiling-build-times">Profiling Build Times</h4>

<p>This <a href="https://fasterthanli.me/articles/why-is-my-rust-build-so-slow">post</a> has lots of detailed info about build times. I tried to profile the build with the <code class="language-plaintext highlighter-rouge">cargo build -Z timings</code> option, but it is only available on the <code class="language-plaintext highlighter-rouge">nightly</code> channel. Switching to nightly made it possible to run with the flag, but I couldn't see any output <code class="language-plaintext highlighter-rouge">cargo-timings</code> files; perhaps <code class="language-plaintext highlighter-rouge">-Z timings</code> is not available on Windows? As a result I am shooting in the dark a little here, but I have done some exploration to figure out what might work best.</p>

<h4 id="experimenting-with-build-times">Experimenting With Build Times</h4>

<p>I tried to separate the plugins out over more libs so that the core hotline lib did not have to depend on <code class="language-plaintext highlighter-rouge">bevy_ecs</code>. This didn't make much of a positive difference because it required an additional plugin for shared code. This attempt ended up with 5 build artefacts in total and 13 second build times.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">hotline_rs.dll</code></li>
  <li><code class="language-plaintext highlighter-rouge">client.exe</code></li>
  <li><code class="language-plaintext highlighter-rouge">ecs_base.dll</code></li>
  <li><code class="language-plaintext highlighter-rouge">ecs.dll</code></li>
  <li><code class="language-plaintext highlighter-rouge">ecs_demos.dll</code></li>
</ul>

<p>Each library or executable that requires building adds a noticeable constant cost, which seems to be link time. So reducing the number of libs actually improved build times, and adding more only increased them. I moved the <code class="language-plaintext highlighter-rouge">ecs_base</code> plugin into <code class="language-plaintext highlighter-rouge">hotline</code>, which reduces the number of build artefacts and brings me to 6 second build times. If I build a single lib and executable the build time is around 10 seconds, so the live building is an improvement, if still not where I would like it to be; maybe this can improve in time.</p>

<p>For an end user the desired result is that they would not need to modify the <code class="language-plaintext highlighter-rouge">client</code>, the <code class="language-plaintext highlighter-rouge">hotline</code> lib or the core <code class="language-plaintext highlighter-rouge">ecs</code>, and would only work inside a plugin such as <code class="language-plaintext highlighter-rouge">ecs_demos</code>. This has about a 6 second build time, which is not too bad; however, when working on the core engine itself, care needs to be taken to keep the plugins and the core libs in sync, which means building more artefacts. Building from clean and switching between release and debug also added a cost, so keeping the number of libs down was the way to go.</p>

<h4 id="avoiding-unnecessary-builds">Avoiding Unnecessary Builds</h4>

<p>Because the build times are fairly long I had to ensure that builds were not being triggered when they did not need to be. With shaders inside the main repository, <code class="language-plaintext highlighter-rouge">cargo build</code> thinks that <code class="language-plaintext highlighter-rouge">hotline</code> needs rebuilding when shaders have changed, even though this should not affect any of the libs or executables. This is particularly painful because when modifying a shader the client is able to rebuild and reload the shaders and associated pipelines very quickly, but the next time code is modified in a plugin it causes the plugin to rebuild the main <code class="language-plaintext highlighter-rouge">lib</code> as well, pushing the total build time to about 10 seconds. If the shaders are excluded from the package in <code class="language-plaintext highlighter-rouge">Cargo.toml</code> then the issue does not occur, but the data is necessary to ship to users.</p>

<p>Due to other constraints I ended up moving the data into a separate repository, which mitigates the issue of rebuilding the main library. For now the problem is kept at bay, but it is more work to maintain changes in the data repository, so I need to find a longer-term solution. Even editing the <code class="language-plaintext highlighter-rouge">todo.txt</code> file I have inside the repository causes a cargo build to take a few seconds. I know you can exclude directories from the package, which resolves the issue, but I would also like these things to publish to crates.io.</p>
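<p>For reference, the <code class="language-plaintext highlighter-rouge">Cargo.toml</code> exclusion mentioned above is just a package field along these lines (the paths here are illustrative):</p>

```toml
[package]
# keep data out of cargo's change tracking so editing shaders does not
# dirty the lib; note this also removes the files from the published crate
exclude = ["data/", "todo.txt"]
```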

<h4 id="working-in-plugin-environment">Working in Plugin Environment</h4>

<p>Debugging in general is more difficult in the <code class="language-plaintext highlighter-rouge">plugin</code> environment; if you are attached to the debugger, <code class="language-plaintext highlighter-rouge">plugin</code> rebuilds will fail because the <code class="language-plaintext highlighter-rouge">.pdb</code> is locked. So sometimes it means resorting to <code class="language-plaintext highlighter-rouge">println!</code> debugging when you need to debug the hot reload process itself.</p>

<p>I added support for serialisation of the basic program state, which is synchronised between release and debug builds. It keeps the camera position and the currently selected demos, so even from a full restart you are right back where you left off. This makes those times where something goes terribly wrong, or the times you need to edit the core engine, just a little bit easier.</p>
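<p>As an illustration of the round trip (the real hotline state holds more fields and is serialised differently), a hand-rolled version might look like:</p>

```rust
use std::fmt::Write;

// minimal stand-in for the saved session state
#[derive(Default, PartialEq, Debug, Clone)]
struct SessionInfo {
    active_demo: String,
    camera_pos: [f32; 3],
}

// tiny hand-rolled round trip; real code would likely use serde
fn save(info: &SessionInfo) -> String {
    let mut s = String::new();
    writeln!(s, "{}", info.active_demo).unwrap();
    writeln!(
        s,
        "{} {} {}",
        info.camera_pos[0], info.camera_pos[1], info.camera_pos[2]
    )
    .unwrap();
    s
}

fn load(s: &str) -> SessionInfo {
    let mut lines = s.lines();
    let active_demo = lines.next().unwrap_or("").to_string();
    let mut camera_pos = [0.0f32; 3];
    if let Some(line) = lines.next() {
        for (i, v) in line.split_whitespace().take(3).enumerate() {
            camera_pos[i] = v.parse().unwrap_or(0.0);
        }
    }
    SessionInfo { active_demo, camera_pos }
}
```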

<p>For the convenience that having hot reloaded plugins aims to provide, developing in that environment is quite tricky, so currently it feels like having plugins is an extra burden to carry around. It would be nice to be able to switch between statically linked and dynamically linked plugins and that would also be a good option for a final packaged build of an application, where the hot reloading would not be required. This is something I will look into when I get a chance.</p>

<h3 id="error-handling">Error Handling</h3>

<p>I have spent quite a lot of time handling errors and propagating them in a way that allows the client to continue running gracefully should something go wrong. It's quite easy to get things working quickly by using <code class="language-plaintext highlighter-rouge">unwrap</code> to panic if something is missing, fix the issue, and leave the unwrap there. That's how I tend to like working in other code bases: if something is missing, assert and then fix it before moving on. In certain situations, like a game for instance, missing data should not occur in a final build, so having all of the code to gracefully handle it always felt like extra baggage. But in situations like this code base, and in tools, you need to allow things to go wrong and interactively resolve them.</p>

<p>Luckily Rust is really good at error handling and actively encourages you to do it, even in cases where I would previously hit a panic for a missing shader or some other data with a typo or an incorrect path. When I hit the panic I would know exactly where, and returning <code class="language-plaintext highlighter-rouge">Result</code> from a function allows the use of <code class="language-plaintext highlighter-rouge">?</code>, which significantly improves the readability of the code by reducing the need to unwrap. Here's a bloated, messy initial setup, partly down to not being sure how to handle errors in <code class="language-plaintext highlighter-rouge">bevy_ecs</code> systems.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
 <span class="k">pub</span> <span class="k">fn</span> <span class="nf">render_meshes</span><span class="p">(</span>
     <span class="n">pmfx</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Res</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
     <span class="n">view_name</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>
     <span class="n">mesh_draw_query</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Query</span><span class="o">&lt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">WorldMatrix</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">MeshComponent</span><span class="p">)</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>

    <span class="c1">// this is just code needed to get gfx resources and unwrap them to use in command buffer generation</span>
    <span class="k">let</span> <span class="n">arc_view</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_view</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view_name</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">arc_view</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">let</span> <span class="n">arc_view</span> <span class="o">=</span> <span class="n">arc_view</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="k">let</span> <span class="n">view</span> <span class="o">=</span> <span class="n">arc_view</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">fmt</span> <span class="o">=</span> <span class="n">view</span><span class="py">.pass</span><span class="nf">.get_format_hash</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">mesh_debug</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_render_pipeline_for_format</span><span class="p">(</span><span class="s">"mesh_debug"</span><span class="p">,</span> <span class="n">fmt</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">mesh_debug</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">let</span> <span class="n">mesh_debug</span> <span class="o">=</span> <span class="n">mesh_debug</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">camera</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_camera_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.camera</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">camera</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">let</span> <span class="n">camera</span> <span class="o">=</span> <span class="n">camera</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// ..</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With proper result propagation it looks much better. I was also able to unwrap and pass the <code class="language-plaintext highlighter-rouge">View</code> into the function instead of fetching it by name, because the function is now called from a closure, which gives a bit more control.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
 <span class="k">pub</span> <span class="k">fn</span> <span class="nf">render_meshes</span><span class="p">(</span>
     <span class="n">pmfx</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Res</span><span class="o">&lt;</span><span class="n">PmfxRes</span><span class="o">&gt;</span><span class="p">,</span>
     <span class="n">view</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">View</span><span class="p">,</span>
     <span class="n">mesh_draw_query</span><span class="p">:</span> <span class="nn">bevy_ecs</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="n">Query</span><span class="o">&lt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">WorldMatrix</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">MeshComponent</span><span class="p">)</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>

    <span class="k">let</span> <span class="n">fmt</span> <span class="o">=</span> <span class="n">view</span><span class="py">.pass</span><span class="nf">.get_format_hash</span><span class="p">();</span>
    <span class="k">let</span> <span class="n">mesh_debug</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_render_pipeline_for_format</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.view_pipeline</span><span class="p">,</span> <span class="n">fmt</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">camera</span> <span class="o">=</span> <span class="n">pmfx</span><span class="nf">.get_camera_constants</span><span class="p">(</span><span class="o">&amp;</span><span class="n">view</span><span class="py">.camera</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

    <span class="c1">// ..</span>
<span class="p">}</span>
</code></pre></div></div>
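<p>The closure idea can be sketched roughly like this (the names and error types here are illustrative, not hotline's actual API): the dispatch layer resolves the view by name once and handles the error, so the render function itself just takes the view and can use <code class="language-plaintext highlighter-rouge">?</code> internally:</p>

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// simplified stand-in for a pmfx view
struct View {
    camera: String,
}

// the render system only ever sees a resolved view
fn render_meshes(view: &View) -> Result<(), String> {
    if view.camera.is_empty() {
        return Err("missing camera".to_string());
    }
    // .. build command buffers for the view
    Ok(())
}

// closure-style dispatch: fetch the view by name once, lock it, then
// hand it to the system
fn dispatch(
    views: &HashMap<String, Arc<Mutex<View>>>,
    name: &str,
) -> Result<(), String> {
    let view = views.get(name).ok_or(format!("no view: {}", name))?;
    let view = view.lock().map_err(|_| "poisoned view lock".to_string())?;
    render_meshes(&view)
}
```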

<p>There are still lots of combinations and things to test when it comes to error handling, so I have added some initial tests to catch things that might go wrong. I foresee this as ongoing work, and I need to get in the habit of thinking about it earlier on instead of quickly getting something working and refactoring later.</p>
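<p>For example, a test of this kind just asserts that a missing lookup comes back as an <code class="language-plaintext highlighter-rouge">Err</code> rather than panicking. The lookup function here is a simplified stand-in for the pmfx queries, not the real API:</p>

```rust
use std::collections::HashMap;

// simplified stand-in: pipelines keyed by (name, render target format hash)
fn get_render_pipeline_for_format(
    pipelines: &HashMap<(String, u64), u32>,
    name: &str,
    fmt: u64,
) -> Result<u32, String> {
    pipelines
        .get(&(name.to_string(), fmt))
        .copied()
        .ok_or(format!("missing pipeline: {} for format {:#x}", name, fmt))
}
```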

<h3 id="whats-next">What’s Next?</h3>

<p>I’m pretty happy with the overall program structure, and the stability has been great. I think that’s a good affirmation that all of the hard work playing ball with the borrow checker pays off in the long run. There is still a lot to think about in terms of memory ownership; it is the area I am least certain about in anything I have worked on for a long time. Next up I will start adding lighting and shadows: first I need to add these concepts into the <code class="language-plaintext highlighter-rouge">ecs</code>, and then I plan to work on clustered lighting and virtual shadow maps. There are a few bits of <code class="language-plaintext highlighter-rouge">pmfx</code> I need to add and hook up to make that possible, but in general the graphics side of the engine is really coming along.</p>

<p>I posted about this both on <a href="https://twitter.com/polymonster">twitter</a> and <a href="https://mastodon.gamedev.place/@polymonster">mastodon</a>, I was keen to move to mastodon but still finding much more engagement on twitter. Give me a follow if you’re interested and check out the <a href="https://github.com/polymonster/hotline">GitHub</a> or <a href="https://crates.io/crates/hotline-rs">crates.io</a> page.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The most challenging leg of the hotline Rust graphics engine: plugins, multi-threaded command buffer generation, and hot reloading for Rust code, HLSL shaders, and render configs.]]></summary></entry><entry><title type="html">Building a gamedev maths library in Rust from scratch</title><link href="https://polymonster.co.uk/blog/maths-rs" rel="alternate" type="text/html" title="Building a gamedev maths library in Rust from scratch" /><published>2022-08-27T00:00:00+00:00</published><updated>2022-08-27T00:00:00+00:00</updated><id>https://polymonster.co.uk/blog/maths-rs</id><content type="html" xml:base="https://polymonster.co.uk/blog/maths-rs"><![CDATA[<p>I have just finished up a linear algebra maths library in Rust and it’s available on <a href="https://crates.io/crates/maths-rs">crates.io</a>. It contains the usual implementation of vectors, matrices and quaternions but also tons of useful intersection, distance functions, point tests, graphs, utility functions and ergonomic decisions to hopefully make this fun and nimble to use for gamedev and graphics coding. I have been spending small chunks of time writing functions and tests over the summer, it has been quite enjoyable and I have learned a lot more about the Rust programming language, especially going into more detail with traits and trait bounds than I have previously, and also my first real work with macros.</p>

<p>There are already many other Rust maths libraries available on GitHub or Crates.io. This website <a href="https://arewegameyet.rs/ecosystem/math/">are we game yet</a> has a list of gamedev libraries for Rust. I tried a few of them, such as <a href="https://crates.io/crates/cgmath">cgmath</a> and <a href="https://crates.io/crates/nalgebra">nalgebra</a> in my graphics engine for a short while but I ended up wanting more and being interested in how I would implement one myself. A lot of maths libraries out there; for all the languages you can think of, usually implement vectors, matrices and quaternions. What is less common is a comprehensive collection of intersection tests, distance and utility functions… in fact SIMD support is probably more common in a maths library than a ray triangle intersection function. I had already been through this process and, over a number of years, had accumulated a decent set of functionality in my C++ <a href="https://github.com/polymonster/maths">library</a>, so I decided to essentially port that functionality to Rust. My initial plan was to use an existing Rust library for the vector, matrix and quaternion implementations and then just implement the intersection and utility functions, but after a while I wanted to make changes to try and get my Rust library stylistically closer to the C++ one. At first I wasn’t sure if what I wanted would be possible, but in the end I am happy with the results. You can take a look at the full <a href="https://crates.io/crates/maths-rs">documentation</a> or <a href="https://github.com/polymonster/maths-rs">readme</a>, which give a detailed overview of the feature set.</p>

<h2 id="c-maths-library">C++ Maths Library</h2>

<p>As a gamedev with a strong focus on graphics, a maths library is an essential tool to have in your toolbox. Since I started coding and working on games I have slowly built up a maths library in C++; at this point it has had many changes and I have accumulated a lot of functions over the years from different sources: books, websites, and blogs. One of the biggest influences was this blog <a href="https://www.reedbeta.com/blog/on-vector-math-libraries/">post</a> by Nathan Reed. He outlined how to make a vector library that is templated by both type and size to allow n-dimensional vectors while only needing to implement one set of functions for the entire thing. Prior to this I had never been a huge fan of templates, due to the escalation of complexity once you start nesting and combining them, and the unreadable, hard-to-follow C++ standard library in general. But here was a really concrete example of something a little more complex than a container class which added value. I adopted this approach and started to enjoy template meta-programming. I went to great lengths to implement these <a href="https://github.com/polymonster/maths/blob/master/swizzle.h">swizzles</a>, which look and feel just like writing maths code in a shader.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vec4f</span> <span class="n">swizz</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">wzyx</span><span class="p">;</span>       <span class="c1">// construct from swizzle</span>
<span class="n">swizz</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">xxxx</span><span class="p">;</span>             <span class="c1">// assign from swizzle</span>
<span class="n">swizz</span><span class="p">.</span><span class="n">wyxz</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">xxyy</span><span class="p">;</span>        <span class="c1">// assign swizzle to swizzle</span>
<span class="n">vec2f</span> <span class="n">v2</span> <span class="o">=</span> <span class="n">swizz</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>        <span class="c1">// construct truncated</span>
<span class="n">swizz</span><span class="p">.</span><span class="n">wx</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>            <span class="c1">// assign truncated</span>
<span class="n">swizz</span><span class="p">.</span><span class="n">xyz</span> <span class="o">*=</span> <span class="n">swizz</span><span class="p">.</span><span class="n">www</span><span class="p">;</span>     <span class="c1">// arithmetic on swizzles</span>
<span class="n">v2</span> <span class="o">=</span> <span class="n">swizz</span><span class="p">.</span><span class="n">xy</span> <span class="o">*</span> <span class="mf">2.0</span><span class="n">f</span><span class="p">;</span>        <span class="c1">// swizzle / scalar arithmetic</span>
</code></pre></div></div>

<h2 id="shader-code">Shader Code</h2>

<p>For writing maths code, shaders are somewhat of a gold standard to me; functions such as <code class="language-plaintext highlighter-rouge">dot</code> and <code class="language-plaintext highlighter-rouge">cross</code> are built in and the maths code is just part of the language. With my C++ library I wanted the same feeling, so you can do stuff like this, with function overloads and the ability to call the same functions on different sized vectors and scalars:</p>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">float</span> <span class="n">m</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span> <span class="c1">// min of float</span>
<span class="kt">float3</span> <span class="n">m3</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span> <span class="c1">// min of float3</span>
<span class="n">float</span> <span class="n">dp3</span> <span class="o">=</span> <span class="nb">dot</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span> <span class="c1">// dot on float3</span>
<span class="n">float</span> <span class="n">dp4</span> <span class="o">=</span> <span class="nb">dot</span><span class="p">(</span><span class="n">v4</span><span class="p">,</span> <span class="n">v4</span><span class="p">);</span> <span class="c1">// dot on float4</span>
<span class="c1">// and so on..</span>
</code></pre></div></div>

<p>Writing graphics algorithms, gameplay code, or procedural generation code can build up quickly and become quite verbose. This is why I like commonly used functions such as <code class="language-plaintext highlighter-rouge">dot</code> or <code class="language-plaintext highlighter-rouge">cross</code> to just be in scope. I have seen maths libraries which end up with <code class="language-plaintext highlighter-rouge">vector.dotProduct(other)</code> all over the place and find this sort of thing hard to read and follow. A hill I will die on (and I know it’s an unpopular opinion in some circles, but here we go anyway) is that single letter / short variable names are OK. They can help with readability by making the code more compact, and with comments you can make the overall algorithm more readable, keeping the focus on the operations instead of the variable names. Having a million <code class="language-plaintext highlighter-rouge">myVector.dotProduct(someOtherVector)</code> calls generates so much noise when the equivalent could be <code class="language-plaintext highlighter-rouge">dot(v, o)</code>… A lot of mathematical notation is just single Greek symbols, so it comes with the territory; a lot of the stuff on <a href="https://www.shadertoy.com/view/4dfXDn">shadertoy</a> is full of single letter variables too… if you hate that kind of thing maybe don’t check shadertoy, you might have a heart attack.</p>

<h2 id="what-is-possible-in-rust">What is possible in Rust?</h2>

<p>I set out to see if it was possible to get what I wanted out of Rust. The aim was to get something as close as possible to my C++ library that looked and felt like a shader language. This would make it easier to port code between my C++ code base, shaders and Rust, while also making something that is ergonomic and fun to use. At this point I want to stress that most of this work was to get something that looks and feels how I wanted it to. I am not worrying about SIMD from the start; I will look into it in the future, but I was happy with a scalar implementation to begin with, and my C++ library is also a simple scalar implementation. I do have an interest in performance so I wanted to keep an eye on it, but the primary focus was ergonomics. For really heavy computations I would be inclined to use compute shaders or write specialised SIMD routines; this library is aimed more toward game mechanics and procedural generation.</p>

<p>The other maths libraries around take different approaches to the internals of a vector struct. Some of them implement concrete <code class="language-plaintext highlighter-rouge">Vec2</code>, <code class="language-plaintext highlighter-rouge">Vec3</code> and <code class="language-plaintext highlighter-rouge">Vec4</code> types, while others go for an entirely n-dimensional approach for wider linear algebra. For the kind of thing I am working on I want to be able to access <code class="language-plaintext highlighter-rouge">.x</code> or <code class="language-plaintext highlighter-rouge">.y</code> members; I tend to do a lot of this for gameplay code, flattening movement onto an xz-plane by setting <code class="language-plaintext highlighter-rouge">vec.y = 0.0</code>, so this made the n-dimensional approach less appealing. I only need 2, 3 and 4 dimensional vectors, so having 3 implementations isn’t too bad, but it is a bit of repetition and I wanted to try and consolidate it like I did with C++ templates. It is also possible to provide the n-dimensional-style <code class="language-plaintext highlighter-rouge">Index</code> operator <code class="language-plaintext highlighter-rouge">[i]</code> to get the best of both worlds.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// this allows for generic sized vectors of type `T`, but no access to members such as v.x, v.y, v.z.</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">VecN</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">v</span><span class="p">:</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In C++ it’s possible to use an un-named union to make a vector which is both an array and a struct of named members. This way, for 1-4 dimensional vectors, you can access the <code class="language-plaintext highlighter-rouge">.xyzw</code> data members.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Vec</span><span class="o">&lt;</span><span class="mi">3</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="p">{</span>
    <span class="k">union</span> <span class="p">{</span>
        <span class="n">T</span> <span class="n">v</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
        <span class="k">struct</span>
        <span class="p">{</span>
            <span class="n">T</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">;</span>
        <span class="p">};</span>
        <span class="k">struct</span>
        <span class="p">{</span>
            <span class="n">T</span> <span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">;</span>
        <span class="p">};</span>
        <span class="n">swizzle_v3</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Rust has a feature <a href="https://internals.rust-lang.org/t/pre-rfc-anonymous-struct-and-union-types/3894">proposal</a> for anonymous unions, so in the future this could become possible and allow direct access to the data array or to members via a union. But for the time being it is not possible, so we can use the <code class="language-plaintext highlighter-rouge">Index</code> operator instead.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// this allows fixed sized vectors to have access to members such as v.x, v.y, v.z</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Vec3</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">x</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span>
  <span class="n">y</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span>
  <span class="n">z</span><span class="p">:</span> <span class="n">T</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Index</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Vec3</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
    <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">i</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="n">T</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">i</span> <span class="p">{</span>
            <span class="mi">0</span> <span class="k">=&gt;</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.x</span><span class="p">,</span>
            <span class="mi">1</span> <span class="k">=&gt;</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.y</span><span class="p">,</span>
            <span class="mi">2</span> <span class="k">=&gt;</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.z</span><span class="p">,</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.z</span> <span class="c1">// clamp out of bounds access? </span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">test</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="n">Vec3</span><span class="o">&lt;</span><span class="nb">f32</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// access like array</span>
  <span class="k">let</span> <span class="n">fx</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
  <span class="c1">// access as member</span>
  <span class="k">let</span> <span class="n">fx</span> <span class="o">=</span> <span class="n">v</span><span class="py">.x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Debug performance is important and I wanted to try and keep the codegen as simple and as lean as possible. Coming from a C++ background I have a natural intuition for compiler code generation in different scenarios. I set up a small <a href="https://godbolt.org/z/YsP8393xa">example</a> on godbolt to illustrate this with 2 dot product functions: one which operates directly on floats and another which uses a template function. Before writing this I expected both to come out the same, because all of the template work is done at compile time. The un-optimised version is not too far from optimisation level 1.</p>

<p>I was not prepared for what I found when I did the same in Rust; again, here is an <a href="https://rust.godbolt.org/z/qMdTv6v1E">example</a> on godbolt. Even switching to very basic implementations, I found un-optimised Rust code generation when using generics to be significantly more bloated than a more direct implementation, even though all of the generics are handled at compile time. In the Rust version you can switch the compiler optimisation level to 1 and see that the resulting code generation for the two dot product functions is identical… In the C++ example both result in the same code generation regardless of optimisation level.</p>
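<p>A minimal pair along these lines can be pasted into godbolt to reproduce the comparison (this is a sketch of the setup, not the exact code from the linked example; the trait bounds here stand in for the library’s <code class="language-plaintext highlighter-rouge">Number</code> trait):</p>

```rust
use core::ops::{Add, Mul};

// concrete implementation: operates directly on f32 components
fn dot_concrete(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

// generic implementation: the same maths, monomorphised from a generic;
// this is where the extra un-optimised codegen shows up
fn dot_generic<T: Copy + Add<Output = T> + Mul<Output = T>>(a: [T; 3], b: [T; 3]) -> T {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn main() {
    let v = [1.0_f32, 2.0, 3.0];
    println!("{}", dot_concrete(v, v)); // 14
    println!("{}", dot_generic(v, v));  // same result, different debug codegen
}
```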

<p>I was a little disappointed by the complexity of the code generation in debug builds; I am yet to build anything large scale, so time will tell how much it matters. In the end I decided to continue on the generic path and accept the need for optimisation; I will profile the code in some real-world scenarios when I get the chance.</p>

<h2 id="macros-and-generics-vs-c-templates">Macros and Generics vs C++ Templates</h2>

<p>I discovered that macros could be used to generate the concrete implementations for fixed sized vectors. This acted a bit like C++ templates, which was somewhat of a surprise. Until this point I had assumed Rust generics were the equivalent of C++ templates, but in reality I came to understand that C++ templates are quite different. A C++ template won’t compile until it is instantiated, which allows you to write any old code inside a templated function:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">TestStruct</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="n">member</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="kt">float</span> <span class="n">function</span><span class="p">(</span><span class="n">T</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">.</span><span class="n">member</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here the template accesses a struct member <code class="language-plaintext highlighter-rouge">s.member</code>. As long as every <code class="language-plaintext highlighter-rouge">T</code> you instantiate has a member called <code class="language-plaintext highlighter-rouge">member</code> whose type is a <code class="language-plaintext highlighter-rouge">float</code>, everything will be OK. If you have a struct that does not have a <code class="language-plaintext highlighter-rouge">member</code> then you will get a compile error when you try to instantiate the template with it, but until that time comes you can implement the template function however you like.</p>

<p>Rust generics don’t allow this: you cannot access raw data members, and you need to create traits with associated methods or functions and supply trait bounds to access them in generic functions. This prevents you from writing generic functions that do not satisfy their trait bounds in the first place. Rust declarative macros work a bit more like C++ templates; you can write any old code inside the macro and you only find out whether it compiles when you invoke the macro.</p>
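<p>To make this concrete, the C++ example above can only be expressed in Rust generics by routing the member access through a trait; a minimal sketch (the <code class="language-plaintext highlighter-rouge">HasMember</code> trait name is made up for illustration):</p>

```rust
// the trait encodes the C++ template's implicit assumption that
// "T has a float member"; without it the generic function cannot compile
trait HasMember {
    fn member(&self) -> f32;
}

struct TestStruct {
    member: f32,
}

impl HasMember for TestStruct {
    fn member(&self) -> f32 {
        self.member
    }
}

// the trait bound is checked when the function is defined,
// not when it is instantiated as in C++
fn function<T: HasMember>(s: T) -> f32 {
    s.member()
}

fn main() {
    println!("{}", function(TestStruct { member: 1.0 })); // 1
}
```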

<p>This was my first time using macros; at first they were quite tricky to get my head around, but I eventually got used to the syntax. I tried to use them as much as possible, but found that things got quite complicated when I needed nested repetitions, so I opted for some manual work instead of packing everything inside macros:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// macro implementation of vec struct for Vec2, Vec3 and Vec4</span>
<span class="nd">macro_rules!</span> <span class="n">vec_impl</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$VecN:ident</span> <span class="p">{</span> <span class="nv">$</span><span class="p">(</span><span class="nv">$field:ident</span><span class="p">,</span> <span class="nv">$field_index:expr</span><span class="p">),</span><span class="o">*</span> <span class="p">},</span> <span class="nv">$len:expr</span><span class="p">,</span> <span class="nv">$module:ident</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
        <span class="nd">#[derive(Debug,</span> <span class="nd">Copy,</span> <span class="nd">Clone)]</span>
        <span class="nd">#[repr(C)]</span>
        <span class="k">pub</span> <span class="k">struct</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
            <span class="nv">$</span><span class="p">(</span><span class="k">pub</span> <span class="nv">$field</span><span class="p">:</span> <span class="n">T</span><span class="p">,)</span><span class="o">+</span>
        <span class="p">}</span>

        <span class="c1">//... more macro code in / implementations in here</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nd">vec_impl!</span><span class="p">(</span><span class="n">Vec2</span> <span class="p">{</span> <span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">1</span> <span class="p">},</span> <span class="mi">2</span><span class="p">,</span> <span class="n">v2</span><span class="p">);</span>
<span class="nd">vec_impl!</span><span class="p">(</span><span class="n">Vec3</span> <span class="p">{</span> <span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="mi">2</span> <span class="p">},</span> <span class="mi">3</span><span class="p">,</span> <span class="n">v3</span><span class="p">);</span>
<span class="nd">vec_impl!</span><span class="p">(</span><span class="n">Vec4</span> <span class="p">{</span> <span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="mi">3</span> <span class="p">},</span> <span class="mi">4</span><span class="p">,</span> <span class="n">v4</span><span class="p">);</span>

<span class="c1">// manual implementation of dot products for Vec2, Vec3 and Vec4</span>
<span class="cd">/// trait for dot product</span>
<span class="k">pub</span> <span class="k">trait</span> <span class="n">Dot</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// vector dot-product</span>
    <span class="k">fn</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="k">Self</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="k">Self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">T</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Dot</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Vec2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">where</span> <span class="n">T</span><span class="p">:</span> <span class="n">Number</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="k">Self</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="k">Self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">T</span> <span class="p">{</span>
        <span class="n">a</span><span class="py">.x</span> <span class="o">*</span> <span class="n">b</span><span class="py">.x</span> <span class="o">+</span> <span class="n">a</span><span class="py">.y</span> <span class="o">*</span> <span class="n">b</span><span class="py">.y</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Dot</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Vec3</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">where</span> <span class="n">T</span><span class="p">:</span> <span class="n">Number</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="k">Self</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="k">Self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">T</span> <span class="p">{</span>
        <span class="n">a</span><span class="py">.x</span> <span class="o">*</span> <span class="n">b</span><span class="py">.x</span> <span class="o">+</span> <span class="n">a</span><span class="py">.y</span> <span class="o">*</span> <span class="n">b</span><span class="py">.y</span> <span class="o">+</span> <span class="n">a</span><span class="py">.z</span> <span class="o">*</span> <span class="n">b</span><span class="py">.z</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Dot</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Vec4</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">where</span> <span class="n">T</span><span class="p">:</span> <span class="n">Number</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="k">Self</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="k">Self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">T</span> <span class="p">{</span>
        <span class="n">a</span><span class="py">.x</span> <span class="o">*</span> <span class="n">b</span><span class="py">.x</span> <span class="o">+</span> <span class="n">a</span><span class="py">.y</span> <span class="o">*</span> <span class="n">b</span><span class="py">.y</span> <span class="o">+</span> <span class="n">a</span><span class="py">.z</span> <span class="o">*</span> <span class="n">b</span><span class="py">.z</span> <span class="o">+</span> <span class="n">a</span><span class="py">.w</span> <span class="o">*</span> <span class="n">b</span><span class="py">.w</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I ran into trouble with repetitions and horizontal operations on a vector. Having to add <code class="language-plaintext highlighter-rouge">+</code> to chain together repetitions means that on something like a dot product you get <code class="language-plaintext highlighter-rouge">v.x + v.y + v.z +</code>; the trailing <code class="language-plaintext highlighter-rouge">+</code> kills compilation. For some other things, such as struct initialization, this is OK because Rust allows a trailing <code class="language-plaintext highlighter-rouge">,</code>. I had a look into the <code class="language-plaintext highlighter-rouge">?</code> operator to run a repetition only once but couldn’t get it to work; maybe I am doing something wrong. I will revisit this at some point to try and get a better result. For the <code class="language-plaintext highlighter-rouge">Eq</code> op I had the same issue, wanting to chain horizontal checks with <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>; here, inside the macro, I just whack a <code class="language-plaintext highlighter-rouge">true</code> on the end after the repetition closes:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">macro_rules!</span> <span class="n">vec_impl</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$VecN:ident</span> <span class="p">{</span> <span class="nv">$</span><span class="p">(</span><span class="nv">$field:ident</span><span class="p">,</span> <span class="nv">$field_index:expr</span><span class="p">),</span><span class="o">*</span> <span class="p">},</span> <span class="nv">$len:expr</span><span class="p">,</span> <span class="nv">$module:ident</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
      <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="nb">Eq</span> <span class="k">for</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">where</span> <span class="n">T</span><span class="p">:</span> <span class="nb">Eq</span>  <span class="p">{}</span>
      <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="nb">PartialEq</span> <span class="k">for</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="k">where</span> <span class="n">T</span><span class="p">:</span> <span class="nb">PartialEq</span>  <span class="p">{</span>
          <span class="k">fn</span> <span class="nf">eq</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">other</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">bool</span> <span class="p">{</span>
              <span class="nv">$</span><span class="p">(</span><span class="k">self</span>.<span class="nv">$field</span> <span class="o">==</span> <span class="n">other</span>.<span class="nv">$field</span> <span class="o">&amp;&amp;</span><span class="p">)</span><span class="o">+</span>
              <span class="kc">true</span> <span class="c1">// redundant check just to get it to compile</span>
          <span class="p">}</span>
      <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>For the dot product the same could be achieved by adding a <code class="language-plaintext highlighter-rouge">T::zero()</code> at the start or the end to make the repetition work. Because it’s a constant zero it should be optimised away, but maybe not in debug? I did some tests to see: indeed, it ended up generating 42 more assembly instructions in an un-optimised build, and even a few extra instructions at optimisation level 1. In the end I decided to create and implement a <code class="language-plaintext highlighter-rouge">Dot</code> trait to avoid this, so I did not have to rely on compiler optimisation; at this point, with the amount of code that gets generated from the trait implementation, it did feel a little like pissing in the wind, but it’s better than nothing?</p>
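<p>For reference, the zero-seeded version of the repetition I tested looked something like this (a sketch; the <code class="language-plaintext highlighter-rouge">Number</code> trait here is cut down to just what the example needs):</p>

```rust
// cut-down stand-in for the library's numeric trait
trait Number: Copy + std::ops::Add<Output = Self> + std::ops::Mul<Output = Self> {
    fn zero() -> Self;
}

impl Number for f32 {
    fn zero() -> Self { 0.0 }
}

macro_rules! dot_impl {
    ($VecN:ident { $($field:ident),* }) => {
        pub struct $VecN<T> {
            $(pub $field: T,)*
        }

        impl<T: Number> $VecN<T> {
            // the trailing `+` left by the repetition is soaked up by T::zero()
            pub fn dot(a: Self, b: Self) -> T {
                $(a.$field * b.$field +)* T::zero()
            }
        }
    }
}

dot_impl!(Vec3 { x, y, z });

fn main() {
    let a = Vec3 { x: 1.0_f32, y: 2.0, z: 3.0 };
    let b = Vec3 { x: 4.0_f32, y: 5.0, z: 6.0 };
    println!("{}", Vec3::dot(a, b)); // 32
}
```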

<h2 id="associated-methods-associated-functions-and-generic-functions">Associated Methods, Associated Functions and Generic Functions</h2>

<p>Another hurdle was to get around Rust’s lack of function overloading. The other libraries I trialled implement dot like so: <code class="language-plaintext highlighter-rouge">v.dot(x)</code>, i.e. as an associated method <code class="language-plaintext highlighter-rouge">dot(self, other: Self)</code>. While it’s a small difference, it is not what I wanted, which is <code class="language-plaintext highlighter-rouge">dot(v, x)</code>, implemented as an associated function <code class="language-plaintext highlighter-rouge">dot(a: Self, b: Self)</code>. The main problem then becomes how to call <code class="language-plaintext highlighter-rouge">dot(x, v)</code> without having to qualify it with the vector type as <code class="language-plaintext highlighter-rouge">Vec3::dot(x, v)</code>. The <code class="language-plaintext highlighter-rouge">Vec3::</code> adds to code verbosity and I was keen to eliminate it.</p>

<p>Generics come to the rescue here by allowing a generic <code class="language-plaintext highlighter-rouge">dot</code> to be implemented that can take any width of vector. For any types implementing the <code class="language-plaintext highlighter-rouge">Dot</code> trait (which is all of the vector sizes I care about) we can make a single function which uses the <code class="language-plaintext highlighter-rouge">Dot</code> as a trait bound.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// returns the vector dot product between a . b</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="n">dot</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">Number</span><span class="p">,</span> <span class="n">V</span><span class="p">:</span> <span class="n">VecN</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">V</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">V</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">T</span> <span class="p">{</span>
    <span class="nn">V</span><span class="p">::</span><span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It then dawned on me that I could share traits between vectors and scalar numerical types, so that I could implement common functions such as <code class="language-plaintext highlighter-rouge">min, max, clamp</code> etc. In order to do this the number of traits exploded a little, but the payoff was tons of useful generic functions that can be called on any type (trait bounds permitting), giving the same look and feel as a shader language.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// returns the maximum of a and b</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="n">max</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">Number</span><span class="p">,</span> <span class="n">V</span><span class="p">:</span> <span class="n">NumberOps</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">V</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">V</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">V</span> <span class="p">{</span>
    <span class="nn">V</span><span class="p">::</span><span class="nf">max</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>At first I was slightly confused as to why Rust doesn’t provide its own numerical traits. I noticed a few crates which implement traits for numerical types and operations, but I wasn’t sure which ones to use or why. There was also historical mention of numerical traits in Rust; I later discovered they were removed from the standard library. To navigate this minefield I decided to implement my own traits covering exactly what I needed: a <code class="language-plaintext highlighter-rouge">Base</code> trait (implemented by both vectors and scalars), <code class="language-plaintext highlighter-rouge">Number</code> for floats and ints, <code class="language-plaintext highlighter-rouge">Float</code> for floats, and <code class="language-plaintext highlighter-rouge">SignedNumber</code> for signed types (signed integers and floats), along with traits for the operations that can be performed on those types: <code class="language-plaintext highlighter-rouge">NumberOps</code>, <code class="language-plaintext highlighter-rouge">SignedNumberOps</code> and <code class="language-plaintext highlighter-rouge">FloatOps</code>. The base type traits are really just aggregations of arithmetic ops so they can be used as trait bounds inside other traits. The <code class="language-plaintext highlighter-rouge">Ops</code> traits supply operations such as <code class="language-plaintext highlighter-rouge">floor</code>, <code class="language-plaintext highlighter-rouge">ceil</code> and <code class="language-plaintext highlighter-rouge">round</code> on floats, and <code class="language-plaintext highlighter-rouge">min</code>, <code class="language-plaintext highlighter-rouge">max</code> and <code class="language-plaintext highlighter-rouge">clamp</code> on numbers.
The vectors also implement <code class="language-plaintext highlighter-rouge">NumberOps</code>, <code class="language-plaintext highlighter-rouge">FloatOps</code> and <code class="language-plaintext highlighter-rouge">SignedNumberOps</code> where the base <code class="language-plaintext highlighter-rouge">T</code> used in <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt;</code> supports the operations.</p>

<p>This is where I became very familiar with <code class="language-plaintext highlighter-rouge">where</code> clauses. It’s really quite cool to implement <code class="language-plaintext highlighter-rouge">NumberOps</code> for <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt;</code> where <code class="language-plaintext highlighter-rouge">T: NumberOps</code> or <code class="language-plaintext highlighter-rouge">FloatOps</code> for <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt; where T: FloatOps</code>. So we are saying for any vector of <code class="language-plaintext highlighter-rouge">i32</code> we get number ops and for any vector of <code class="language-plaintext highlighter-rouge">float</code> we get both number ops and float ops. After implementing the various combinations of traits this gives the flexibility to supply trait bounds to generic functions, which allows me to use scalar or vector types!</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">f</span> <span class="p">:</span> <span class="nb">f32</span> <span class="o">=</span> <span class="nf">min</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">);</span>
<span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="nf">min</span><span class="p">(</span><span class="nf">vec2f</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span> <span class="nf">vec2f</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">));</span>
</code></pre></div></div>
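<p>A fuller self-contained sketch of this pattern might look like the following. The trait and type names here are hypothetical, cut-down stand-ins (a single-parameter <code class="language-plaintext highlighter-rouge">NumberOps</code> rather than the library’s real generic signatures), just to show how one free function can serve scalars and vectors alike:</p>

```rust
// Hypothetical minimal Vec2, for illustration only
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec2<T> {
    pub x: T,
    pub y: T,
}

// One trait shared by scalars and vectors
pub trait NumberOps {
    fn min(a: Self, b: Self) -> Self;
}

// scalars implement it directly
impl NumberOps for f32 {
    fn min(a: Self, b: Self) -> Self {
        if a < b { a } else { b }
    }
}

// vectors get it whenever their element type has it
impl<T: Copy + NumberOps> NumberOps for Vec2<T> {
    fn min(a: Self, b: Self) -> Self {
        Vec2 {
            x: T::min(a.x, b.x),
            y: T::min(a.y, b.y),
        }
    }
}

/// one generic free function covers both scalar and vector calls
pub fn min<V: NumberOps>(a: V, b: V) -> V {
    V::min(a, b)
}
```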

<h2 id="from-with-tuples">From With Tuples</h2>

<p>I found the <code class="language-plaintext highlighter-rouge">From</code> trait to be quite useful. We can implement <code class="language-plaintext highlighter-rouge">From</code> multiple times with different generic arguments <code class="language-plaintext highlighter-rouge">From&lt;T&gt;</code>, so I used <code class="language-plaintext highlighter-rouge">From</code> to construct different sized vectors from one another, truncating when assigning to smaller sizes or extending with zeros for larger sizes. One thing that isn’t possible, though, is for the conversion to take multiple values: the trait expects a single parameter passed into the <code class="language-plaintext highlighter-rouge">fn from(other: T)</code> function.</p>

<p>Tuples provide almost the same functionality; when this idea occurred to me it felt like a cool hack! All it adds is the need for an extra pair of parentheses to pass multiple values through <code class="language-plaintext highlighter-rouge">From</code>. By using tuples in <code class="language-plaintext highlighter-rouge">From</code> implementations I was able to create various combinations of constructors for vectors of different sizes, combined together or with scalar values.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">v4</span> <span class="o">=</span> <span class="nn">Vec4f</span><span class="p">::</span><span class="nf">from</span><span class="p">((</span><span class="n">v2</span><span class="p">,</span> <span class="n">v2</span><span class="p">));</span> <span class="c1">// vec4 from 2x v2's</span>
<span class="k">let</span> <span class="n">v3</span> <span class="o">=</span> <span class="nn">Vec3f</span><span class="p">::</span><span class="nf">from</span><span class="p">((</span><span class="n">v2</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">));</span> <span class="c1">// vec3 from 1x v2 and 1x scalar</span>
<span class="k">let</span> <span class="n">v2</span> <span class="o">=</span> <span class="nn">Vec2f</span><span class="p">::</span><span class="nf">from</span><span class="p">((</span><span class="mf">5.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">));</span> <span class="c1">// vec2 from 2x scalars</span>
<span class="k">let</span> <span class="n">v4</span> <span class="o">=</span> <span class="nn">Vec4f</span><span class="p">::</span><span class="nf">from</span><span class="p">((</span><span class="n">v2</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">));</span> <span class="c1">// vec4 from 1x v2 and 2x scalars</span>
<span class="k">let</span> <span class="n">v4</span> <span class="o">=</span> <span class="nn">Vec4f</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="n">v2</span><span class="p">);</span> <span class="c1">// vec4 from vec2 (splat 0's)</span>
<span class="k">let</span> <span class="n">v2</span> <span class="o">=</span> <span class="nn">Vec2f</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="n">v4</span><span class="p">);</span> <span class="c1">// vec2 from vec4 (truncate)</span>

<span class="c1">// construct rows from tuples</span>
<span class="k">let</span> <span class="n">m3v</span> <span class="o">=</span> <span class="nn">Mat3f</span><span class="p">::</span><span class="nf">from</span><span class="p">((</span>
    <span class="nf">vec3f</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">),</span>
    <span class="nf">vec3f</span><span class="p">(</span><span class="mf">4.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">),</span>
    <span class="nf">vec3f</span><span class="p">(</span><span class="mf">7.0</span><span class="p">,</span> <span class="mf">8.0</span><span class="p">,</span> <span class="mf">9.0</span><span class="p">)</span>
<span class="p">));</span>
</code></pre></div></div>

<p>Tuples also worked well to construct matrices from rows of vectors, or scalar values.</p>
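<p>A sketch of how one such tuple conversion might be implemented, using hypothetical cut-down types rather than the library’s actual code:</p>

```rust
// Hypothetical minimal vector types, for illustration only
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec2<T> {
    pub x: T,
    pub y: T,
}

#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec3<T> {
    pub x: T,
    pub y: T,
    pub z: T,
}

// the single `from` parameter is a tuple, so the call site only needs
// one extra pair of parentheses: Vec3::from((v2, 1.0))
impl<T: Copy> From<(Vec2<T>, T)> for Vec3<T> {
    fn from(other: (Vec2<T>, T)) -> Self {
        Vec3 {
            x: other.0.x,
            y: other.0.y,
            z: other.1,
        }
    }
}
```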

<h2 id="test-driven-development">Test Driven Development</h2>

<p>After ironing out the main structure of the API, with all of the numerical traits, operations, and vector and scalar combinations, I started on the grunt work of implementing functions. A maths library needs quite a lot of them, but one nice thing is that they are very small, very unit-testable pieces of work. I went pure TDD in a lot of cases, writing the test first and then implementing the functionality. Not strictly in all cases (I did write some functions first and then the tests later, soz… sue me!), but due to the nature of the code, iterating with tests made this process really enjoyable. In total there are currently 110 tests covering most areas; there are still a few missing pieces I will be adding over time, and I hope to get some code coverage tools working to aid that process. You can take a look at the current tests <a href="https://github.com/polymonster/maths-rs/blob/master/tests/tests.rs">here</a>.</p>

<p>I am lucky enough to have a few different work laptops, one of which is an M1 MacBook Air that I had previously used only for building and testing compatibility of the M1 and x86 builds. As I was going away a few times over the summer I decided to bring this laptop with me, since it is small and lightweight compared to my MacBook Pro 16”. It was a revelation: I was able to code in all sorts of places, on planes, on trains and even by the pool! Combined with the easily unit-testable nature of maths code, this made light work of the whole thing. On holiday I spent just a few minutes here and there implementing another couple of tests and another couple of functions, and over a 9-week period the whole thing came together, chipped away at a small piece at a time.</p>

<p>For a lot of the tests I wrote some simple examples by hand; this included all of the vector and matrix arithmetic, constructors and so forth, which are fairly easy to write down on paper. For the intersection tests things get a bit more interesting. It’s easy to come up with a few trivial examples (such as the intersection of 2 axis-aligned lines), which I added, but for the rest I already had a pretty comprehensive set generated from this visual <a href="https://www.polymonster.co.uk/pmtech/examples/maths_functions.html">demo</a> of my C++ library. I made interactive 3D samples of all of the available intersection tests, verified their correctness visually, and then used code generation to produce the tests: a Python script converts the C++ test code into Rust test code.</p>

<p>A great benefit of having tests is the ability to refactor. As things progressed I saw new opportunities to make things more generic, which required refactoring some traits. The tests also helped me keep the API consistent after spotting small issues, such as <code class="language-plaintext highlighter-rouge">point_inside_aabb</code> having its arguments ordered as <code class="language-plaintext highlighter-rouge">(aabb_min: V, aabb_max: V, p: V)</code>, which doesn’t read well because the arguments are in the opposite order to the function name. Some of these inconsistencies came from my C++ maths library, which is in use in a few projects and therefore harder to refactor; it was nice to unify all of these details here.</p>
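<p>For illustration, a reordered signature that reads in the same direction as its name might look like this (a hypothetical 2D, <code class="language-plaintext highlighter-rouge">f32</code>-only sketch, not the library’s generic version):</p>

```rust
// Hypothetical minimal 2D vector, for illustration only
#[derive(Clone, Copy)]
pub struct Vec2f {
    pub x: f32,
    pub y: f32,
}

/// "point inside aabb" reads left to right: the point first,
/// then the box it is tested against
pub fn point_inside_aabb(p: Vec2f, aabb_min: Vec2f, aabb_max: Vec2f) -> bool {
    p.x >= aabb_min.x && p.x <= aabb_max.x &&
    p.y >= aabb_min.y && p.y <= aabb_max.y
}
```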

<h2 id="features">Features</h2>

<h3 id="swizzles">Swizzles</h3>

<p>Vector swizzling is a handy feature in shader languages, and getting support for it into this Rust library feels like coming full circle. I wrote my first Rust program <a href="https://github.com/polymonster/permute">permute</a> whilst on holiday in 2019; the program can output all permutation combinations of some given inputs in a given output format. I used it at the time to generate C++ template code for vector swizzles in my C++ maths <a href="https://github.com/polymonster/maths/blob/master/swizzle.h">library</a>.</p>

<p>I adapted the source slightly to output swizzles for Rust. I couldn’t quite get the swizzles to the same shader-style degree as the C++ implementation; it might be possible in the future with support for unnamed unions, but for the time being I generated traits and functions to return swizzled vectors of various sizes, along with a collection of <code class="language-plaintext highlighter-rouge">set</code> methods.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// swizzling</span>
<span class="k">let</span> <span class="n">wxyz</span> <span class="o">=</span> <span class="n">v4</span><span class="nf">.wxyz</span><span class="p">();</span> <span class="c1">// swizzle</span>
<span class="k">let</span> <span class="n">xyz</span> <span class="o">=</span> <span class="n">v4</span><span class="nf">.xyz</span><span class="p">();</span> <span class="c1">// truncate</span>
<span class="k">let</span> <span class="n">xxx</span> <span class="o">=</span> <span class="n">v4</span><span class="nf">.xxx</span><span class="p">();</span> <span class="c1">// and so on..</span>
<span class="k">let</span> <span class="n">xy</span> <span class="o">=</span> <span class="n">v3</span><span class="nf">.yx</span><span class="p">();</span> <span class="c1">// ..</span>

<span class="c1">// mutable swizzles</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">x</span> <span class="o">=</span> <span class="nn">Vec4f</span><span class="p">::</span><span class="nf">zero</span><span class="p">();</span>
<span class="n">x</span><span class="nf">.set_xwyz</span><span class="p">(</span><span class="n">v</span><span class="p">);</span> <span class="c1">// set swizzle</span>
<span class="n">x</span><span class="nf">.set_xy</span><span class="p">(</span><span class="n">v</span><span class="nf">.yx</span><span class="p">());</span> <span class="c1">// assign truncated</span>
<span class="n">x</span><span class="nf">.set_yzx</span><span class="p">(</span><span class="n">v</span><span class="nf">.zzz</span><span class="p">());</span> <span class="c1">// etc.. </span>
</code></pre></div></div>
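<p>A sketch of the kind of code the generator emits, as a hypothetical hand-written subset (the real library generates every permutation for every vector size):</p>

```rust
// Hypothetical minimal vector types, for illustration only
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec2<T> {
    pub x: T,
    pub y: T,
}

#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec3<T> {
    pub x: T,
    pub y: T,
    pub z: T,
}

impl<T: Copy> Vec3<T> {
    // read-only swizzles return a new (possibly smaller) vector
    pub fn yx(self) -> Vec2<T> {
        Vec2 { x: self.y, y: self.x }
    }

    // set_* methods write a swizzled subset of components in place
    pub fn set_xy(&mut self, v: Vec2<T>) {
        self.x = v.x;
        self.y = v.y;
    }
}
```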

<h3 id="left-hand-sided-scalar---vector-arithmetic">Left-Hand Sided Scalar - Vector Arithmetic</h3>

<p>In order to support left-hand-side scalar multiplication with vectors (as supported in shader languages) I had to implement the arithmetic ops on foreign types to make this sort of thing possible:</p>

<div class="language-hlsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// multiplying a scalar by vector results in a vector</span>
<span class="kt">float3</span> <span class="n">v</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="o">*</span> <span class="nf">float3</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>

<p>Initially I tried to do this once for all vectors of type <code class="language-plaintext highlighter-rouge">&lt;T&gt;</code>, but this is not permitted and results in the following error:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="nb">Add</span><span class="o">&lt;</span><span class="n">Vec2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="n">T</span> <span class="p">{</span>
  <span class="c1">// ^ type parameter `T` must be covered by another type when it appears before the first local type (`Vec2&lt;T&gt;`)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I had to implement the ops for each primitive type I wanted, which meant committing to concrete vector types; a macro let me stamp out the implementations without repeating myself.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// macro to stamp out all arithmetic ops for lhs scalars</span>
<span class="nd">macro_rules!</span> <span class="n">vec_scalar_lhs</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$VecN:ident</span> <span class="p">{</span> <span class="nv">$</span><span class="p">(</span><span class="nv">$field:ident</span><span class="p">),</span><span class="o">+</span> <span class="p">},</span> <span class="nv">$t:ident</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
        <span class="k">impl</span> <span class="nb">Add</span><span class="o">&lt;</span><span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="nv">$t</span> <span class="p">{</span>
            <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span><span class="p">;</span>
            <span class="k">fn</span> <span class="nf">add</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">other</span><span class="p">:</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span> <span class="p">{</span>
                <span class="nv">$VecN</span> <span class="p">{</span>
                    <span class="nv">$</span><span class="p">(</span><span class="nv">$field</span><span class="p">:</span> <span class="k">self</span> <span class="o">+</span> <span class="n">other</span>.<span class="nv">$field</span><span class="p">,)</span><span class="o">+</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="c1">// other ops go here...</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
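<p>As a minimal concrete sketch (hypothetical <code class="language-plaintext highlighter-rouge">Vec2</code> type, no macro), the compiler accepts the impl once the scalar is a concrete primitive rather than an uncovered type parameter:</p>

```rust
use std::ops::Add;

// Hypothetical minimal vector type, for illustration only
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec2<T> {
    pub x: T,
    pub y: T,
}

// allowed by the orphan rules: f32 is concrete and Vec2<f32> is a
// fully concrete local type, so no type parameter appears uncovered
impl Add<Vec2<f32>> for f32 {
    type Output = Vec2<f32>;
    fn add(self, other: Vec2<f32>) -> Vec2<f32> {
        Vec2 {
            x: self + other.x,
            y: self + other.y,
        }
    }
}
```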

<h3 id="shorthand-constructors">Shorthand Constructors</h3>

<p>I also added shorthand constructors which look like <code class="language-plaintext highlighter-rouge">glsl</code>. Again I needed to stamp out concrete implementations, so I committed to the following types for both the constructors and for left-hand-side scalar arithmetic:</p>

<p><code class="language-plaintext highlighter-rouge">vecf</code> = 32-bit float
<code class="language-plaintext highlighter-rouge">vecd</code> = 64-bit float
<code class="language-plaintext highlighter-rouge">veci</code> = 32-bit signed integer
<code class="language-plaintext highlighter-rouge">vecu</code> = 32-bit unsigned integer</p>
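<p>A sketch of one such shorthand, with hypothetical cut-down definitions (the real library stamps these out for every size and primitive type with a macro):</p>

```rust
// Hypothetical minimal vector type, for illustration only
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Vec3<T> {
    pub x: T,
    pub y: T,
    pub z: T,
}

/// 32-bit float alias, matching the `vecf` family above
pub type Vec3f = Vec3<f32>;

/// glsl-style shorthand constructor
pub fn vec3f(x: f32, y: f32, z: f32) -> Vec3f {
    Vec3 { x, y, z }
}
```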

<h3 id="from-primitive-casts">From (primitive casts)</h3>

<p>I also added a macro which creates <code class="language-plaintext highlighter-rouge">From</code> implementations between these primitive types so you can cast, for example, between a vector of ints and a vector of floats.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">macro_rules!</span> <span class="n">vec_cast</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$VecN:ident</span> <span class="p">{</span> <span class="nv">$</span><span class="p">(</span><span class="nv">$field:ident</span><span class="p">),</span><span class="o">+</span> <span class="p">},</span> <span class="nv">$t:ident</span><span class="p">,</span> <span class="nv">$u:ident</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
        <span class="k">impl</span> <span class="nb">From</span><span class="o">&lt;</span><span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$u</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span> <span class="p">{</span>
            <span class="k">fn</span> <span class="nf">from</span><span class="p">(</span><span class="n">other</span><span class="p">:</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$u</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span> <span class="p">{</span>
                <span class="nv">$VecN</span> <span class="p">{</span>
                    <span class="nv">$</span><span class="p">(</span><span class="nv">$field</span><span class="p">:</span> <span class="n">other</span>.<span class="nv">$field</span> <span class="k">as</span> <span class="nv">$t</span><span class="p">,)</span><span class="o">+</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="k">impl</span> <span class="nb">From</span><span class="o">&lt;</span><span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$u</span><span class="o">&gt;</span> <span class="p">{</span>
            <span class="k">fn</span> <span class="nf">from</span><span class="p">(</span><span class="n">other</span><span class="p">:</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$t</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nv">$VecN</span><span class="o">&lt;</span><span class="nv">$u</span><span class="o">&gt;</span> <span class="p">{</span>
                <span class="nv">$VecN</span> <span class="p">{</span>
                    <span class="nv">$</span><span class="p">(</span><span class="nv">$field</span><span class="p">:</span> <span class="n">other</span>.<span class="nv">$field</span> <span class="k">as</span> <span class="nv">$u</span><span class="p">,)</span><span class="o">+</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Some people may not require all of these types, so I have exposed the macros that create the constructors and arithmetic operation implementations for primitive types, and added a feature in the <code class="language-plaintext highlighter-rouge">Cargo.toml</code> so these can be disabled if not wanted.</p>

<h2 id="wrapping-it-all-up">Wrapping it all up</h2>

<p>During this process I discovered many small inconsistencies and gaps in my C++ library, so I took the opportunity to note them down and will revisit them when I get a chance.</p>

<p>There is still some more work to do to complete the project. I intend to use the library to create a graphical demo in Rust with my in-progress graphics library, showcasing the maths library’s features visually, much like my C++ library’s live <a href="https://www.polymonster.co.uk/pmtech/examples/maths_functions.html">demo</a>. That was going to take a while, so I decided to publish the project now and add the graphical demo later, so that I could write up this blog post while thoughts were fresh in my mind.</p>

<p>The final step was to publish on crates.io, and the process is very simple… I added metadata to the <code class="language-plaintext highlighter-rouge">Cargo.toml</code> file, added a readme with some examples, added document comments, fixed all <code class="language-plaintext highlighter-rouge">cargo clippy</code> warnings and finally hit publish. The culmination of roughly 10 weeks of consistent work. It felt satisfying.</p>