Optimizing Many Spines
I'm currently trying to optimize my game's rendering which slows to a crawl when many spine sprites are loaded at once.
I have a few constraints that (may?) be unique to my use case:
- I share a single spine rig for every one of the characters.
Today, I'm cloning the skeleton rig for each instance.
- The rig has 20 different skins which may or may not be enabled for a given character.
- Each skin has hot-swappable textures (for example the 'shirt' slot might be replaced with 'shirt_red.png' or 'shirt_blue.png').
This is needed as we have over a thousand textures and trying to model each of them as skins is probably not doable.
I've uploaded the spine rig and a testing tool here: https://genfanad.com/admin/character.html
The problems I have are:
- Even when the rig is off screen, the skeleton mesh updates take a large amount of time.
- A large number of WebGL calls (primarily due to material swapping and lack of instancing, I assume?).
8000 calls with just 400 characters on screen.
Does anyone have any suggestions on how to make this approach more efficient? I'd love to move all of the animation to the GPU if possible. Has anyone run all spine logic on the GPU before, including texture swapping (if that's even possible)?
(I am also willing to pay a bounty for someone to optimize this for me. Not sure if that's against forum rules or not.)
From the performance log, this doesn't seem to be an issue with Spine, but with three.js. For some reason, it seems to compile a shader when Spine is merely updating the geometry. It's hard for me to tell why that is, as I'm not familiar with three.js internals. But that's what I'd investigate first. It may have to do with the way you set materials/textures on your SkeletonMesh instances.
There are a few different issues in the overall performance trace.
I wouldn't focus on that specific profiler run necessarily. You do call out correctly that compiling the shader is expensive, but those models are static so it definitely shouldn't be doing that. I can look into that, but are you aware of any existing bugs that would cause that?
Note that even if the spine rigs are off screen (or when I remove them from the scene entirely), there is still a significant performance slowdown just from spine.threejs.SkeletonMesh.update(). That's the part that I'm looking to see if there's a way to run on the GPU. Ideally, I'd have the entire skeleton data in the GPU and have each instance render with a specific set of texture IDs.
Are there any other comments you have about my constraints and setups (as I described)?
I'm unsure why we should ignore that specific trace, as it clearly indicates the problem. The issue is 1. uploading the meshes to the GPU and 2. recompilation of the shader, which should never ever happen that often. That eats 20% of your frame budget, and it's the first thing I'd fix.
Updating skeleton data, i.e. bones and free-form mesh deforms, on the GPU is going to be hard and is unlikely to gain you much. The data layout doesn't lend itself to parallel processing, as it is in the form of a tree.
I've created a pure WebGL benchmark here, to eliminate all ThreeJS related issues, like the shader recompilation:
https://github.com/EsotericSoftware/spine-runtimes/blob/4.0/spine-ts/spine-webgl/tests/test-drawcalls.html
It draws 400 skeletons with a custom skin, which are probably a bit more complex than the skeletons you show in your screenshot above. That ends up with 33 draw calls in total, as multiple skeletons can be batch rendered. The limit here is the maximum vertex buffer size with 32k vertices/indices.
The heaviest code path is VertexAttachment.computeWorldVertices(), which takes the bone transforms and transforms mesh vertices accordingly (a.k.a. skinning). Here are the stats for this skeleton as shown in Spine's metrics view:
That's 1742 vertex transforms, possibly influenced by more than one bone, times 400 skeletons, for a total of at least 696,800 vertex transforms performed each frame (assuming single bone weights, which is not true, so it's likely quite a bit more, probably at least twice as much).
One way to cut this down is to reduce the number of vertices in the meshes of the skeleton as well as bone influences. Another way to speed this up is to move the transform code to the GPU. However, as you'd have to set the bone matrix palette for each skeleton, that'd result in broken batching. We've found that GPU side skinning usually does not outcompete CPU side skinning with proper batching. What you see in this micro benchmark is pretty much as good as it gets I'm afraid.
As for skeletons being updated while out of view, that is ThreeJS specific. We can't decide for your use case whether you want the animations of a skeleton to be updated or not if it's not in view. You are explicitly calling SkeletonMesh.update(delta), which will trigger the vertex transform calculations. I was unable to find a way in ThreeJS to check if a model is in the frustum or not, which would allow you to skip the call to SkeletonMesh.update(delta). There's Object3D.frustumCulled, but that's a setting for the ThreeJS internal renderer, which doesn't seem to surface the culling result to your application code.
Weirdly enough, ThreeJS doesn't seem to provide an out-of-the-box method to check if an Object3D is within the current camera frustum. It seems that you have to construct your own Frustum, then manually check the bounding box vertices of the object against it.
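A minimal sketch of that idea, assuming you maintain a rough world-space bounding sphere per character and only call SkeletonMesh.update() when it intersects the frustum (the character/boundingSphere/skeletonMesh names are placeholders; older three.js versions use Frustum.setFromMatrix instead of setFromProjectionMatrix):

```javascript
// Skip SkeletonMesh.update() for skeletons outside the camera frustum.
const frustum = new THREE.Frustum();
const projScreenMatrix = new THREE.Matrix4();

function updateVisibleSkeletons(camera, characters, delta) {
  camera.updateMatrixWorld(); // also refreshes matrixWorldInverse on cameras
  projScreenMatrix.multiplyMatrices(camera.projectionMatrix, camera.matrixWorldInverse);
  frustum.setFromProjectionMatrix(projScreenMatrix);

  for (const character of characters) {
    // character.boundingSphere is an application-maintained THREE.Sphere
    if (frustum.intersectsSphere(character.boundingSphere)) {
      character.skeletonMesh.update(delta); // only animate what is (roughly) visible
    }
  }
}
```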
Thanks for the detailed reply and the sample application.
Is there a way to check how many textures/materials the sample skeleton is using? My spine rig uses 20 distinct textures. I forgot to add this link when I was making the original post: https://genfanad.com/admin/character.html
I suspect this might be breaking any batching that exists?
Relevant code snippets: https://imgur.com/a/4I34CX4, full code file: https://pastebin.com/whdTtuSD
As mentioned in my original post, I'm swapping textures in the atlas ("shirt/shirt.png" -> "shirt/red_peasant.png") and recreating the skeleton for each instance of the mesh. Would this break optimizations? If so, is there a better way of reusing the same rig (without making each potential texture a separate skin, as there are thousands?)
The sample I posted uses a single texture atlas page. You can check that in the .atlas file (like the one in your own example, which has 20 pages).
Yes, those 20 individual textures break batching. Also, the atlas pages are huge. Given the in-game screenshot above, I'd assume you'll never zoom in so far that you need a 2048x2048 texture for buckle variations. I've inspected the texture atlas pages mentioned in your default.atlas. Did you generate those yourself? They are extremely unoptimized, with tons of whitespace.
Spine wouldn't generate such an unoptimized atlas even with the worst packing settings. I strongly suggest rethinking your atlas strategy. I'm pretty sure you could fit all your images into a handful of texture atlas pages and use our skin functionality. You are essentially recreating all of that manually without gaining anything.
You are also re-parsing the skeleton .json file for each of your objects, see line 287 here https://pastebin.com/whdTtuSD. You can share the SkeletonData across any number of skeletons, which will bring down load times significantly.
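For the sharing part, something like the following (a rough sketch; API names roughly follow spine-ts/spine-threejs 4.x, and the atlas/texture loading details vary a bit between runtime versions, so adjust to your loader setup):

```javascript
// Parse the atlas and skeleton JSON once, then share the SkeletonData across
// every character instead of re-parsing it per instance.
const atlas = new spine.TextureAtlas(atlasText); // texture loading differs per runtime version
const attachmentLoader = new spine.AtlasAttachmentLoader(atlas);
const skeletonJson = new spine.SkeletonJson(attachmentLoader);
const skeletonData = skeletonJson.readSkeletonData(jsonText); // do this once

function createCharacter() {
  // Each SkeletonMesh gets its own Skeleton and AnimationState internally,
  // but they all reference the same shared skeletonData.
  return new spine.threejs.SkeletonMesh(skeletonData);
}
```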
Also, you set the color on each attachment of a skin in line 225. That's not necessary at all. A better way to do this is to set the Skeleton.color, which will then get combined with all slots/attachment colors of the skeleton.
I strongly suggest you don't recreate our skins functionality but instead use what we provide, and also use our texture atlas packer with whitespace stripping and rotation to get the tightest fits. Also, think about how big your skeletons will appear on screen in the game and adjust your image sizes accordingly. It's very unlikely a player will zoom in so close on a buckle that you need a single 512x512 image for just the buckle.
You are also re-parsing the skeleton .json file for each of your objects, see line 287 here https://pastebin.com/whdTtuSD. You can share the SkeletonData across any number of skeletons, which will bring down load times significantly.
I thought that would be ideal, but the skeletonData comes from the skeletonJson, which is created using the custom atlas. If I reuse the skeletonData, wouldn't that result in reusing the atlas information as well? (Losing all of the texture changes from the specific instance of the skeleton)
Spine wouldn't generate such an unoptimized atlas even with the worst packing settings.
You're right for that specific belt, but there are belts that do use the extra space (sashes, fancy shirts). Since the actual rig needs to contain pieces for all possible belts, we chose to make the actual attachment larger (so that we do not have to keep modifying the rig whenever we want to go slightly outside the current bounds).
I strongly suggest rethinking your atlas strategy. I'm pretty sure you could fit all your images into a handful of texture atlas pages and use our skin functionality. You are essentially recreating all of that manually without gaining anything.
The purpose of the individual textures is so that artists can create new skins without having to use the Spine editor (which has proven to be too complex for most of the 2D artists we have contracted). The atlas pages are generated by Spine but have been set up such that an artist only has to edit the image files. (The images used in-game are compressed compared to the sizes you see here.)
We currently have over 1000 unique images and we intend to have 10-100x more than that in the final game. Would you recommend a spine rig with 100,000 skins? I thought that seemed to be too much, especially since a given character would only ever have one skin enabled for each of the possible slots.
Also, you set the color on each attachment of a skin in line 225. That's not necessary at all. A better way to do this is to set the Skeleton.color, which will then get combined with all slots/attachment colors of the skeleton.
We did it this way to allow the shirt to be tinted separately from the pants or the cape. Is there a separate Skeleton.color on each skin?
I strongly suggest you don't recreate our skins functionality but instead use what we provide, and also use our texture atlas packer with whitespace stripping and rotation to get the tightest fits. Also, think about how big your skeletons will appear on screen in the game and adjust your image sizes accordingly. It's very unlikely a player will zoom in so close on a buckle that you need a single 512x512 image for just the buckle.
I'd love to be able to use the skins functionality as-is but it did not seem feasible. Is there a way to set up the model in a way that:
a) Does not load skin textures that are not currently 'enabled'. (Lazy loading of textures)
b) Allows skins to be created at runtime, without having to export a new Spine rig whenever a new skin is added.
Heleor wrote: the skeletonData comes from the skeletonJson, which is created using the custom atlas. If I reuse the skeletonData, wouldn't that result in reusing the atlas information as well? (Losing all of the texture changes from the specific instance of the skeleton)
We don't generally recommend swapping textures to change how attachments look. You generally want as much on a single texture as possible because changing textures breaks batching.
Instead you could have an attachment in Spine for each variation (hat red, hat purple, etc). Then when you pack your atlas, all the images can be packed together tightly.
If a variation has multiple images (eg shirt arm left, arm right, and torso) you can use a skin to group the attachments in Spine. At runtime you can combine many skins representing various "items" (shirt, pants, sword, etc) into the skin the skeleton uses.
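Combining item skins at runtime looks roughly like this (a sketch assuming skins named per item, e.g. "shirt/red"; Skin.addSkin is available from Spine 3.8 onward):

```javascript
// Build one combined skin per character from the "item" skins rigged in Spine.
const customSkin = new spine.Skin("character-1");
customSkin.addSkin(skeletonData.findSkin("shirt/red"));   // hypothetical skin names
customSkin.addSkin(skeletonData.findSkin("pants/blue"));
customSkin.addSkin(skeletonData.findSkin("sword/steel"));

const skeleton = skeletonMesh.skeleton;
skeleton.setSkin(customSkin);
skeleton.setSlotsToSetupPose(); // apply the newly combined attachments
```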
Heleor wrote: You're right for that specific belt, but there are belts that do use the extra space (sashes, fancy shirts). Since the actual rig needs to contain pieces for all possible belts, we chose to make the actual attachment larger (so that we do not have to keep modifying the rig whenever we want to go slightly outside the current bounds).
This is fine for organizing your art, but it is worth finding ways to not have all that whitespace eat up space in your atlas textures. With the setup described above, each image can be packed with whitespace stripping or using the polygon packing mode. When an image is drawn at runtime, it is offset so it is drawn as if it still had the whitespace.
Heleor wrote: The purpose of the individual textures is so that artists can create new skins without having to use the Spine editor (which has proven to be too complex for most of the 2D artists we have contracted).
...
We currently have over 1000 unique images and we intend to have 10-100x more than that in the final game. Would you recommend a spine rig with 100,000 skins?
The Spine editor can likely handle that many attachments, but you're right that it is not recommended, because it requires too much tedious work that can be avoided in other ways. Your approach of using whitespace in the attachment images is good; you just don't want that whitespace in your atlas.
This is going to get a bit long from here because you want everything: 1) to render any of many variations, and 2) to render many of those all at once.
What you can do is rig one set of attachments in Spine that use an image with enough whitespace to fit any of your images for that attachment. These attachments are your "template". Artists can now go off and create any number of attachment images using the template images, as long as their images are the same size. You do not rig any of these images in Spine; your skeleton contains only the template images.
You can create an image that shows all the bones using Spine's PSD export, so your artists know where the bone origins are located. This lets them position their images based on where the bones rotate.
Now what to do at runtime? You could pack all the images into an atlas (you can exclude the template images) using whitespace stripping (you can use polygon packing too, but let's not complicate things any further for now). When you load your skeleton at runtime, you can provide an AttachmentLoader to control how the texture region for an attachment is found.
There are a few ways to go from here, depending on your needs. When the AttachmentLoader is asked to create the attachment for template-shirt-left-arm, you could create the attachment but provide the texture region you actually want, say red-shirt-left-arm. The resulting SkeletonData has attachments configured with texture regions for the variations you want. However, the skeleton data and attachments are shared across Skeleton instances, which may not be what you want.
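That first approach could look something like this (a sketch only: the AttachmentLoader method signatures differ slightly between runtime versions, and VariantAttachmentLoader / variantTable are hypothetical names):

```javascript
// Remap template attachment paths to variation regions while the skeleton JSON is parsed.
class VariantAttachmentLoader extends spine.AtlasAttachmentLoader {
  constructor(atlas, variantFor) {
    super(atlas);
    this.variantFor = variantFor; // e.g. "template-shirt-left-arm" -> "red-shirt-left-arm"
  }
  newRegionAttachment(skin, name, path) {
    return super.newRegionAttachment(skin, name, this.variantFor(path) || path);
  }
  newMeshAttachment(skin, name, path) {
    return super.newMeshAttachment(skin, name, this.variantFor(path) || path);
  }
}

const loader = new VariantAttachmentLoader(atlas, path => variantTable[path]);
const skeletonData = new spine.SkeletonJson(loader).readSkeletonData(jsonText);
```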
Instead, you could have your AttachmentLoader create the attachment but not configure it with any texture region. Then you can create multiple skeleton instances from the resulting skeleton data. Before you render a skeleton instance, you need to set the texture region for each attachment. Attachments are shared across skeleton instances, so you want to first copy the attachment (Attachment.copy()), set the texture region on the copy, and replace the attachment on the skeleton with the copy.
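Per skeleton instance, that second approach could look roughly like this (a sketch; Attachment.copy() exists from 3.8 onward, but the exact region-setting calls vary between runtime versions, so check the API of the version you use):

```javascript
// Swap in a variation region on one skeleton without mutating the shared attachment.
function setVariant(skeleton, slotName, region) {
  const slot = skeleton.findSlot(slotName);
  const copy = slot.getAttachment().copy();   // copy so other skeletons keep the template
  copy.region = region;                       // or copy.setRegion(region), depending on version
  if (copy.updateRegion) copy.updateRegion(); // RegionAttachment in newer runtimes
  if (copy.updateUVs) copy.updateUVs();       // MeshAttachment in older runtimes
  slot.setAttachment(copy);
}
```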
Alternatively, you could avoid making attachment copies by customizing the skeleton rendering. When you go to render an attachment, instead of getting the texture region from the attachment, you'd decide which texture region to use (based on your application state) and use that instead.
This is all great, no? Your artists can create thousands of new images without any rigging in Spine. You can load the skeleton data once and render multiple skeletons with different variations. At this point you may have the issue that your 100k images make for an atlas that is huge, even when packed with whitespace stripping. When rendering a skeleton it's likely each image will be on a different texture, resulting in a texture switch and batch flush for each attachment -> poor performance.
To solve this you'll need the attachment images that will be drawn subsequently to be on the same atlas page. One solution to this is to package your game with individual images (or download them as needed) rather than an atlas with ALL the images. Once you know exactly how to outfit a skeleton, you would pack only those images to a texture at runtime. You could even look at the animations you plan to play to find any attachment keys, so those images can be packed too.
You could try to fit multiple skeletons in one texture, or maybe one texture per skeleton is good enough. This runtime packing is a bit of work to implement. Some game toolkits provide ways of doing it, such as Unity or libgdx.
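As a very rough illustration of runtime packing in the browser (not Spine API; a naive shelf packer drawn onto a canvas, with the step of turning the recorded rectangles into Spine texture regions left out):

```javascript
// Once you know which item images a character needs, draw just those images into
// one canvas so the whole character renders from a single texture.
function packImages(images, maxWidth = 2048) {
  let x = 0, y = 0, rowHeight = 0, usedWidth = 0;
  const rects = [];
  for (const img of images) {
    if (x + img.width > maxWidth) { x = 0; y += rowHeight; rowHeight = 0; } // wrap to next row
    rects.push({ img, x, y, width: img.width, height: img.height });
    rowHeight = Math.max(rowHeight, img.height);
    usedWidth = Math.max(usedWidth, x + img.width);
    x += img.width;
  }
  const canvas = document.createElement("canvas");
  canvas.width = usedWidth;
  canvas.height = y + rowHeight;
  const ctx = canvas.getContext("2d");
  for (const r of rects) ctx.drawImage(r.img, r.x, r.y);
  return { texture: new THREE.CanvasTexture(canvas), rects };
}
```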
Note that if you happen to know the variations at build time, you can put their images in subfolders so they are grouped together on a single atlas page, pack them with Spine's texture packer, and then at runtime only load the atlas pages you need.
Now you can finally render any skeleton variation with good performance! There are other ways you could make this even more complicated, such as storing your individual attachments as SVG, then rendering the SVG at runtime (usually quite difficult, but probably not too bad using a browser) to create your skeleton's packed texture.
As an alternative solution to runtime packing, or to augment runtime packing when using one texture per skeleton, you could customize rendering to use a texture array and pass the desired texture as a vertex attribute. This avoids flushing the batch when the texture changes. More information and an implementation is available here:
https://www.youtube.com/watch?v=bw6JsLnx5Jg
https://github.com/libgdx/libgdx/issues/5907
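The core of that technique is small (a sketch assuming WebGL2, where a per-vertex index selects the layer of a texture array, so changing images doesn't force a texture rebind and batch flush):

```javascript
// WebGL2 fragment shader: the interpolated v_texIndex (fed from a vertex attribute)
// picks the array layer, so many images can be drawn in one batch.
const fragmentShader = `#version 300 es
precision mediump float;
uniform mediump sampler2DArray u_textures;
in vec2 v_uv;
in float v_texIndex;
out vec4 fragColor;
void main() {
  fragColor = texture(u_textures, vec3(v_uv, v_texIndex));
}`;
```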
We did it this way to allow the shirt to be tinted separately from the pants or the cape. Is there a separate Skeleton.color on each skin?
When you need to tint only some slots, setting the color for just those slots is the way to go.
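For example (a sketch; the slot names are hypothetical):

```javascript
// Tint individual slots; the slot color is multiplied with the attachment's color at render time.
skeleton.findSlot("shirt").color.set(0.8, 0.1, 0.1, 1); // reddish shirt
skeleton.findSlot("cape").color.set(0.1, 0.3, 0.9, 1);  // bluish cape
```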
Thanks for the detailed reply. Amusingly enough, I was packing all of the art for a single skeleton into a single texture before I started using Spine. I moved to Spine to avoid having to do that.
The way I read all the solutions is that it basically requires either rewriting or heavily modifying the Spine renderer?
The link you provided on rendering sprite batches looks very promising and would work even with my current texture code. Has anyone implemented / used it in the three.js (or any of the WebGL) Spine implementations?
Would moving to texture batching be something that potentially fits in the Spine runtime library roadmap, or is our use case weird enough that we'd be on our own?
Heleor wrote: Thanks for the detailed reply. Amusingly enough, I was packing all of the art for a single skeleton into a single texture before I started using Spine. I moved to Spine to avoid having to do that.
It's not surprising that Spine doesn't help with that because it has to do with the way OpenGL is typically used to render: the pipeline needs to be flushed to render from a different texture. Packing the atlas efficiently is also important to reduce GPU memory usage.
Heleor wrote: The way I read all the solutions is that it basically requires either rewriting or heavily modifying the Spine renderer?
No, not at all. I suggested customizing the skeleton renderer as a possible solution for 1) avoiding copying attachments, or 2) using a texture array so texture binds don't break batching. None of the other solutions require modifying the skeleton renderer or any other Spine Runtimes code.
Heleor wrote: The link you provided on rendering sprite batches looks very promising and would work even with my current texture code. Has anyone implemented / used it in the three.js (or any of the WebGL) Spine implementations?
Not that I know of, sorry. Note as I described above, you will eventually be limited by the size of your atlas.
Heleor wrote: Would moving to texture batching be something that potentially fits in the Spine runtime library roadmap, or is our use case weird enough that we'd be on our own?
It would be a great addition to the Spine Runtimes, though we don't currently have it on our short-term radar. It would need to be optional, as there is a small performance cost to rendering that way. You'd only want to use it for multi-page atlases that have images for the same skeleton on different pages.
For the texture batching, what would it take to get that feature prioritized? I completely agree that it should be optional (and it would likely depend on the hardware available on each device whether it should be enabled or not).
I can throw a (relatively small) amount of money towards a bounty if you have that set up.
It's more about how much time there is in the day. We have a massive backlog of features that customers would absolutely love to see. We prioritize features that benefit the most people, while also considering how long it takes to get them done.
The best way to fast track getting a new feature into the Spine Runtimes is to submit a PR on GitHub that adds the feature. Barring that, unfortunately it'll have to wait until we can get to it.
FWIW, avoiding texture switches is pretty standard across most/all game toolkits: libgdx, Unity, cocos2d-x, etc. People are likely used to working around the problem.
From the libgdx issue comments, it seems it's likely compatible with most/all devices. I'm not sure why it's not in more widespread use.
Is there a github issue or similar that I can follow for the texture batching?
It's very unlikely that we'd add this to spine-threejs. The reason: even if we got it to work on top of ThreeJS' rendering infrastructure (and that's a very big if), it'd only work for things that we render in our runtime, i.e. skeletons. It would not work for anything else you render through ThreeJS, unless this was added to ThreeJS itself. The matter is further complicated by the fact that most skeletons are made up of translucent images. That requires sorting the objects to be rendered, i.e. terrain (tiles), trees, props, skeletons, world-space UI, etc., by material. Only skeletons that end up next to each other in the sorted list of things to render could be rendered via this approach. And any time such a subset of skeletons is being rendered, we'd have to set up the entire texture unit state, render, then tear it down again so we don't interfere with ThreeJS's own rendering.
Agreed, it's a better fit for game toolkits that give more control over rendering, like the spine-libgdx and spine-ts runtimes.