Wednesday, 20 July 2016

Thoughts on scripting a game engine

Scripting is hard

Letting users script your system is a hard problem. How do you do it. Do you just expose eval of your favourite scripting environment with the selected apis to your system's machinery? Do you hack up an eval with an all-access pass to your machinery?

I'm using the terms I think I learned from Steve Yegge. The machinery, the engine refer both to the immutable parts of the software system set in stone by the original compiler. Think of the parts in Emacs or a web browser written in C. Scripting layer refers to the parts written in, using those examples, Lisp or Javascript.

I haven't researched Emacs' codebase at all, so I'm basing my understanding on reading blogs and inferring on its behaviour. AFAIK Emacs is implemented with the absolute minimal base system written in C, and as much as possible scripted on top of that in Lisp. Thus almost every parameter is trivially open to mutation by the user. There seems to be a way to extend Emacs with shared objects, but you don't want to do that unless you have to, because unless doing native, system-dependent things Lisp and C are equivalent in power and the tooling is a lot better in the lisp side. In fact, with Lisp you can mold Emacs to a email app or browser or anything which doesn't resemble the original text editor at all. I know Amazon used to have an email app written inside Emacs. I have also seen a few POS-systems running inside unix shells which might've been more useable to both the users and the developers if done inside Emacs.

But, but, but. Emacs has complete access to the shell and the filesystem. That's... that's... a security hole isn't it?

Might be. It doesn't really matter though, unless you're using Emacs as a http server and routing input from the sockets to eval, which might be a bit stupid. If you're paranoid of the third party Emacs extensions you're installing, it might be useful to grep for '(shell-command' and other IPC functions in the extension's sources. But that's why we have MELPA these days. Scripts user runs inside Emacs are assumed to be trusted.

Let's have a look on the browsers. There's a shitload of C, C++ and other compiled things in their machinery. There are the websites browser is a runtime for. A website can be a static, declarative document which makes interpreting it easy(ish). A website can also include procedural scripts. Those scripts, written in JS, are by definition not trusted. There's a sandbox in the browser the scripts are run in so that a website can't just ‘rm -rf --no-preserve-root /‘. Scripts are not allowed (or supposed? I'm not sure if the apis exist to do this) to mess with the browser's window chrome, only with the viewport browser assigns them.

The difference is the level of trust. Emacs scripts are assumed to be run by a user who knows what they're doing, and thus if a third-party script succesfully ‘rm -rf‘s everything, it's user's fault for running Emacs as root. Javascript is assumed to be run either by a babbling bumbling band of baboons who haven't dedicated their lives on computers or silently, completely unbeknownst to the user. However both leave the eval function open. Thus, if you have a way to ask textual input from the user, implementing a runtime REPL is easy.

  while(true) {
    alert(eval(prompt("Input: ", "")));
  }
;; don't do infinite loops in single-threaded, completely synchronous Emacs
(while true
  (message (prin1-to-string (eval (read)))))

Maybe it isn't that hard after all

How do Emacs and browsers map to the ways-to-script I started this text with? Browsers follow the first principle. Everything is forbidden until someone with enough authority (be it the user or someone in the w3c) authorizes it. Emacs does the second. It wishes to be able to script everything runtime, but because Unix won and our machines aren't lisp all the way down, that's not plausible.

Secret in these two environments is that they are easy and fun to script. I think Microsoft's tools like Office and Visual Studio are also runtime scriptable with a Visual Basic variant, but opening their scripting environment has never caused me the feeling to poke the system with a stick and see how it behaves. Last I checked (which I admit might have been in the XP era, me being somewhere around 10-16 years old) the VBA environment was a lousily documented, heavy gui app, with APIs that make, IDK, Java's ZIP-file API beautiful.

What are browsers then? Well, when I got into the web programming, the introspection in browsers was lousy and the DOM API has never won any beauty contests either. However, I thin these days they have introspection (in a form of the console and the dom viewer) and discoverability/self documentation (in a form of console auto-complete at least), they have always (in the context of one who learned to spell 'internet' well past the first browser wars) been freely available, and the feedback loop is much faster than with the compiled languages. The first argument was mostly a filler to achieve the magical count of arguments, but the latter two are important. They invite poking, and even though the user didn't fully grok how DOM works, they can hack functional stuff up fast and iteratively make it better afterwards.

Even though DOM isn't beautiful, it's necessary to have. In the browsers you don't have access to the machinery without it. If the scripting layer implemented only the base language, stdlib (dates etc.) and a runtime eval, you'd have a glorified calculator. You need a way to read stuff from the user and to print to them. If scripting had completely access to the environment's windowing system, that'd be cool, but then they'd be able to shell to rm too. Unnecessarily called rm isn't cool. Instead the browser decides to expose the selected parts of the windowing toolkit inside its own window to the scripts. The environment's windowing machinery is hidden under a common, system independent abstraction known as the dom.

I don't know much about how Emacs fills the needs like the one filled by the DOM in browsers. I haven't had the need to do much more with Emacs Lisp than re-assigning a few keystrokes, setting up modes and generating repetitive Java- and C#-shit. Yet. What I know is that Emacs does both the introspection and self-documentation a lot better than the browsers. In every mode you can write a small lisp snippet and press Ctrl+x Ctrl+e (or wherever you've bound eval-last-sexp). This evals the snippet in the context of the current buffer (similarly to how browser console evals javascript in the context of the currently open website). The lisp symbols carry their own documentation with them. Unlike in Java where documentation has to be separatedly compiled from the metadata in the source, you can ask an Emacs Lisp function for its documentation.

Let's assume you're writing an Emacs extension where you need to run a query-replace in the current buffer. As user you'd call it by pressing Alt+Shift+5 (M-%), but how do you do it as a programmer? The help system is under Ctrl+h. You need to know the group in which you are trying to get help. Now you need to know what happens when pressing Alt+Shift+5 (M-%). Let's ask help on keybindings: Ctrl+h k (for keybindings) Alt+Shift+5 (or C-h k M-%). This tells us all we need, API looks like this: (query-replace FROM-STRING TO-STRING &optional DELIMITED START END BACKWARD). The info-page has a lot more interesting stuff too, M-x count-words tells me there's about 328 words documented.

The example is a bit foolish, but that's because I've not done anything deep enough on Emacs to have any real-world examples. It still demoes the help system that's always near your fingertips, and unlike Visual Studio's unIntellisense, stays hidden when necessary and doesn't disappear halfway through the reading. To get more information on the help categories press C-h ?

(Somewhat stupid or ironic that just after I tell how the help system doesn't hide without user's consent, I find the one cursor in which I can put neither mark nor point in)

Scripting MERPG

So, how can these observations be used when designing my engine's scripting system? You might've not heard (ha :D) but the engine is as deeply written in Lisp as possible. This provides an excellent base to build a scripting api on top of. To be clear, this particular Lisp is Clojure and the machinery is written in Java. The only part I've had to do in Java that's not in either JVM's base class library or Clojure's runtime is the map renderer. There are a few parts in the render process written in Clojure which are the low hanging fruits if I ever have to optimize things. So, aside from the finer rendering details, everything else is done in the Clojure layer. The important stuff is even documented, so the users can connect to the engine's nrepl-server (nrepl-server basically exposes in-process eval to sockets) and either run (doc 'merpg.important.symbol) in the repl or use Cider's C-c C-d C-d - keybinding that opens the dedicated *help* buffer.

But exposing eval and rudimentary introspection tools provided by nrepl to the end-user isn't enough. Well, if your app's importance is similar to browsers', then you might get away with it, but those having to work with your half-assed system will curse you with their dying breaths. It's better to write a lot of documentation, and provide a lot of hooks for events you expect user's to have a need to script on (like, for example, DOM's onfoobar - event api). We also need a few really well thought out abstractions. Not like DOM's, which has useful abstractions but a horrible API, but like in Emacs, which's buffer abstraction is beautiful and the language permits making beautiful API's around it.

I like to think highly on my choices on how the game assets and -state is stored in The Registry, how registry is optimized on insertion, deletion and easy lookups when you know what you're looking for, and how it's transformed in the background to structures more optimized for rendering and other stuff. Time will tell though whether this design works or if it leads to worse performance than Minecraft's. Similarly, I like to think that the registry is an easy abstraction for the end-user to comprehend (I mean, what could be simpler than key-val - tables?), but only time will tell. When this editor has a "Build Executable" - button, I plan on implementing a couple of simpler games on the engine and improving the it based on those experiences.

Anyway, the relevant abstractions are the registry and every object type you see on the editor's domtree. Maps, tilesets, layers, animated and static sprites, tiles (which aren't visible on the domtree), and after I've designed this through, scripts.

The simplest way to add reactions to events would be to add watches on registry based either on the concrete ID (think of the following scenario

;; we're inside some other event
(let [sprite (animated-sprite! (re/peek-registry :selected-map) "./my beautiful spritesheet.png" 10)]
  (re/add-watch-on-key sprite
                       ;; This is called before committing the new-sprite-obj to the registry so you can check what's changed.
                       ;; new-sprite is an atom so that events can manipulate the object before it ends up in the registry without
                       ;; causing endless loops
                       (fn [new-sprite]
                         (let [{:keys [x y] :as new-sprite-obj} @new-sprite
                               {old-x :x
                                old-y :y} (re/peek-registry sprite)]
                           (if (or (not= x old-x)
                                   (not= y old-y))
                             (println "The sprite has moved!"))))))
this reacts to the movement of the dynamically loaded animation. animated-sprite! (like all the others that load assets from disk) puts the loaded asset automagically to the registry and returns the key with which you could fetch it from the registry. With re/add-watch-on-key you add the event that's called every time someone does a re/update-registry with the key you're interested in. The event gets called with the value that's en route to the registry. Surprisingly the value is an atom though. This way events can modify it without firing any events.

The user should also be able to set watches on the :types of objects. Call an event every time a sprite has moved.

There should be a simple way to poll the status of both the keyboard and the mouse. Every new environment I've moved to after I stopped doing coolbasic has disapointed me with the complexity of reading those. Events are cool and follow the best practices (or whatever is the buzzword for following whatever they were teaching as the gospel decade and a half ago in the java certification universities) for reading the IO state when you're doing a CRUD app, but with polling you can do a lot more complex actions with really simple code.

I'll provide filterable and map-able reagi streams for keydown, keyup, mousedown, mouseup and mouse coordinates. I'm not 100% certain these are as simple as I think. In case they prove to be too complex in the demo phase, I'll fart up a Windows VM and check if my recollection of the Coolbasic's IO API is just pure nostalgy or if it really was simpler.

Of course I have to expose :onload and :onclose - events, which are run on the startup and closing of the final game.

Script - assets and the editor

So, how would an user actually write and eval these scripts? The obvious way would be to let them write the scripts in their favourite text editor, keep the files with the .memap - project file and somehow import those files into the editor. I think last I checked Unity did something like this, but Unity's project directories are a nightmare. Let's not do that. Users having to carry multiple files in a project directory provides a lot more possibilities for the system to break than having a single file that contains the whole project image.

Thus I have to embed the scripts as assets inside the image. Which means having users using their favourite text editor becomes a bit more difficult. In theory you can open files-in-a-zip with Emacs' dired, but that's not exactly simple, Emacs is hardly everyone's favourite text editor (for reasons I've never understood :P) and the more complex editors are complex enough to miss the extremely simple use case of being able to save to a file handle pointing to inside a zip file.

I could implement an Emacs-like editor inside the map editor. But I sure as hell am not going to do that. I'd have the lisp machine model ready, but to make it useful even to those fond of Emacs I'd have the fart up an elisp->clojure - compiler (not impossible), rewrite Emacs' abstractions and do a fuckload of testing to find the corner cases. I'm not ready to clone Emacs, better people have tried and not-yet-succeeded.

The third option: make a server. If we already have an nrepl-server running for poking the live instance with a stick, it's no problem to make another server you could query the text-file assets from. I'm not sure if I could just extend nrepl-server to do this or if I have to invent a completely new protocol and dedicate another socket for this. If we assume we can trust everyone connecting to this file-server-socket, making it support find-file and save-buffer isn't hard. If the user has been able to connect with cider (or any other nrepl client) before reading the script asset from the server, we get the eval- and introspection capabilities for free.

Making a dedicated server or extending nrepl means I have to hold my nose and dive into the wonderful world of elisp. I plan on making a minor-mode which overloads find-file and save-buffer to work with urls pointing to the running server. I have almost no clue on how this would work technically, I just know that Emacs does async socket IPC and I have the Emacs' online help and the whole internet near my fingertips. The UX would be such that the user first connects to the running nrepl. Then in a buffer with cider running they'd use C-x C-f and input an url like "localhost:33500/your.games.ns.core". Format is "server:port/ns-name.here". If I can make this server by extending nrepl, I probably could make the format such that host:port/ isn't compulsory and if left out, Emacs would just use the default nrepl connection. When pressing C-x C-s in a buffer that's been loaded from a server, it could remember the path in a buffer-local variable, send it there and the server would either save the new source or ask the client to merge if there's been new material from another client after the last read on this client.

Format of the asset

The assets in the game server's registry is simple. Relevant properties are :id with which you refer to the asset in-engine, :name (which is completely irrelevant for development, and is used only in the editor's domtree as a prettier string than the id) and :parent-id. :parent-id belongs to the set of the loaded map-ids. It matters also in a way that when :selected-map changes in the registry, those scripts with the new :map-id as their :parent-id will be run. :order specifies the order in which the files are loaded. The most import property is :src. It's a textual representation of the source, what will be sent to the editor requesting it with C-x C-f and what will be overridden when (C-x C-s)ing. There might be more properties if I implement the support for concurrent editing.

Watches are installed when loading map's scripts. Scripts are autoloaded based on their :parents. There's no automatic cleanup of the old map's scripts, because generating a complement of an indefinite impure function is somewhat difficult. If you require cleanup, keep a hold of your watches' ids and install a watch on :selected-map that drops them.

There are also the game's :onload and :onclose which are run on process startup and process shutdown. To those you bind dedicated script assets in the game editor.

Zonetiles

This became a bit longer post than I anticipated. Bear with me, this should be the last title.

Zonetiles are an old concept based on the idea that a certain code will be run when there's a sprite entering a certain tile. My favourite use case for zonetiles are the doors to houses or dungeons or whatever. In the past they've been implemented as a hashmap of [tile-x tile-y] => lambda. That doesn't cut it currently, though, because entering lambdas sucks without all the base work I've specified on this text. Besides, simple coordinate-lambda - mapping is... a bit too simple in way.

Instead I plan on making a zonetile api in which the user filters the set of current map's tiles they wish to set the zone in with an indefinite predicate. Then they filter the sprites the zone matters to with another predicate. Then every whateverth millisecond, the engine shall search all the colliding tile-sprite pairs and call a zonetile lambda with their ids.

Conclusion

There's a lot to do. First I implement the script assets in the merpg editor. Then I'll research how to implement the file server. Afterwards is time to hack Emacs. When that's done, the editor side of this project is done. I think running the game shall require an optimized mode where it doesn't, for example, rerender the whole map every frame. Only when it has changed. After that mode is done, I need a way to dump the project image to disk as an executable jar file. Then... it's... playtime?

No comments:

Post a Comment