Comments

A quick benchmark of the Relic Hunter source is below. Performance is still quite impressive considering the trees and alpha textures in the scene! 👍👍👍

Oddly, the screen resolution and tree count do not affect the framerate that much.

Default setup (fps ray off/on)
1000 trees flip 0 = 25/50
1000 trees flip 1 = 20/30

Fewer trees (fps ray off/on)
~10 trees flip 0 = 60/90
~10 trees flip 1 = 30/60

Still, this is awesome! 😀

Thank you! The performance of the effect is one thing, but the performance of the whole scene / engine is yet another to-be-optimized issue. I have just realized that I didn't follow one of the most important rules: keep the surface count low. For instance, 7000 grass meshes (2 quads x-ed) that are made with copyentity and placed and oriented around the camera via a contingent are still 7000 surfaces, a huge impact.

I just optimized it here from 20 fps to 30 fps on my little card, simply by kicking the contingent system out and creating 70'000 grass meshes plus 10x10 dummy meshes that are distributed evenly over the area. Then I addmesh the grass to the dummies, depending on their location. This way the grass is split up into 100 segments, each one containing only 3 surfaces, because there are 3 different brushes / grass types. That's what happens when you addmesh things together: the brush count gets optimized by reusing already existing identical brushes, and the mesh is added to the corresponding surface, keeping the surface count low. The camera range can then exclude a lot of the grass sectors easily. It really speeds things up.

I'll do the same with the trees and bushes and get rid of the LOD system. Funny, it started with the LOD. Trying to get better grass, testing some alternative ground, I'll add a screenie.
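To make the surface arithmetic concrete, here is a rough Python sketch of the sector idea (not the actual Blitz3D code). The area size, the random placement and the helper name `sector_of` are made up for illustration; only the 10x10 grid, the 3 grass types and the 70'000 meshes come from the comment above.

```python
import random

AREA_SIZE = 1000.0    # hypothetical side length of the grass area, in world units
GRID = 10             # 10x10 dummy meshes, as described above
GRASS_TYPES = 3       # 3 different brushes / grass types
GRASS_COUNT = 70_000  # total number of grass meshes


def sector_of(x, z):
    """Return the (column, row) of the dummy mesh that owns position (x, z)."""
    col = min(int(x / AREA_SIZE * GRID), GRID - 1)
    row = min(int(z / AREA_SIZE * GRID), GRID - 1)
    return col, row


random.seed(1)
# sectors[(col, row)][grass_type] -> number of grass meshes merged into that surface
sectors = {}
for _ in range(GRASS_COUNT):
    x = random.uniform(0, AREA_SIZE)
    z = random.uniform(0, AREA_SIZE)
    grass_type = random.randrange(GRASS_TYPES)
    surfaces = sectors.setdefault(sector_of(x, z), {})
    surfaces[grass_type] = surfaces.get(grass_type, 0) + 1

# One surface per (sector, grass type) pair, no matter how many meshes were merged.
surface_count = sum(len(s) for s in sectors.values())
print(len(sectors), "sectors,", surface_count, "surfaces for", GRASS_COUNT, "grass meshes")
```

The point is that the surface count depends only on sectors times brushes, not on the number of grass meshes merged into them.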

Whether render to texture is faster than copyrect I don't know. In theory, when you render to the texture, you can skip the copyrect part, but you have to render anyway. So I guess yes, it might be faster, even though copyrect 256x256 to a 256-flagged texture took only 1 or 2 ms here. I guess optimizing the scene as mentioned has a bigger impact, especially since the scene is rendered twice.

That's awesome! 😊 Looking forward to the update 👍

I also did a side-by-side comparison with the FastExt version and noticed one major thing: the FastExt rays either can be set not to reach the whole scene, or simply do not reach it.

Click here for screenshot 

Perhaps this is one way to improve the framerate and performance. Also, the rays are more defined, and the scene does not get washed out.

...and how about just basing the rays on the light position?

You can do it the same way FastExt does, using CameraProject and TFormVector.
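For anyone wondering what projecting the light position boils down to, here is a rough Python sketch of a world-to-screen projection. It is only an illustration of the math, not how Blitz3D or FastExt implement it. The function name, the yaw-only camera and the assumption that zoom 1.0 means a 90 degree horizontal field of view are all mine.

```python
import math


def project_to_screen(point, cam_pos, cam_yaw_deg, width, height, zoom=1.0):
    """Project a world-space point to pixel coordinates.

    Assumptions of this sketch: the camera only rotates around the Y axis,
    yaw 0 looks down +Z, and zoom 1.0 gives a 90 degree horizontal FOV.
    Returns None when the point is behind the camera.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    yaw = math.radians(cam_yaw_deg)
    # Offset expressed in camera space: cx along "right", cz along "forward".
    cx = dx * math.cos(yaw) - dz * math.sin(yaw)
    cz = dx * math.sin(yaw) + dz * math.cos(yaw)
    if cz <= 0:
        return None
    scale = zoom * width / 2
    sx = width / 2 + cx / cz * scale
    sy = height / 2 - dy / cz * scale
    return sx, sy


sun = (0.0, 500.0, 1000.0)
print(project_to_screen(sun, (0.0, 0.0, 0.0), 0.0, 640, 480))
```

The resulting screen position could then be used as the origin of the rays, and a `None` result (sun behind the camera) as the cue to switch the effect off.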

It's been a decade since I purchased the FastExt lib; I didn't even remember they had this. But yes, the alignment of the effect should still be fixed; as it stands, it's not logical. If the effect is fixed to an object like the sun, you can lower the polycount and texture size of the ray mesh. But somehow I like the screen filter approach, which fixes the filter to the camera. It just really needs a fix that adjusts the angle and position, so certain camera motions don't make it look illogical. Right now I am more concerned about optimizing the render time; fixing the effect angle etc. is the easy part, which I'm saving for dessert.

Also, a small texture size causes flickering rays because of marching pixels (and the KIND of pre-mipmapping in the DDS texture has an impact: masked textures flicker more with the sharp "nearest neighbor" method). A blur of that render might fix it, but it's costly, or yet another challenge.

Great stuff! 👌

Wow I just used the word "fix" 6 times ^^

btw, do you happen to have the source for what is shown on your itch.io page cover?

That's from the last Relic Hunter prototype. However, adding it to the procedural forest source is relatively easy.

(20 secönds later)

Just uploaded the source, but no time to clean up, sorry.

Thanks! is this the v2? will check it out asap..

It is the second release of Relic Hunter v 0.11. The include file for b3dfile.bb is missing, but that's in the pine tree creation zip.

got it, thanks! 👍

Looks great! My last attempt at this was too slow for practical use.. 😅 Curious how this would hold up and look on larger scenes.

Thanks. While it is still too centered for an effect that comes from all sides (which is illogical anyway, when you look at the shading of the trees), it could be faded in and out dynamically when one is looking at the actual sun, or where the sun would be. The effect adds only a few hundred tris to the scene, and they are blended in add mode and can be put in front of everything with EntityOrder, so this ray mesh isn't the bottleneck.

One thing that is slow is copying this render to the texture buffer, even with the 256 flag. Render to texture would be faster. I think I remember I was able to do that using the FastLib for Blitz3D. The other burden is the render itself. This can be optimized by hiding the grass and maybe the bushes temporarily. And of course, as mentioned in the code, by using EntityColor to turn the trees and the ground black during this render (which I do in the experimental version of Relic Hunter), rather than using WritePixelFast as in this demo.

However, at the end of the day it really is an additional render, just like a cubemap would be, or some other fancy stuff, so we have to take the polycount into account. That said, when I first tested the trees without LOD, I threw 600k tris at my cheapo onboard card (AMD Radeon R3) and it rendered it like a 1-cube scene, I guess 60 fps. So I'm getting a bit wasteful in terms of polycount. Maybe 1000 tris for a tree is still too much (it could be made less dense in the pine tree code).

I guess DX7/Blitz3D still forces us to go more lowpoly as it lacks certain FAST features, like shaders, automatic LOD, render to texture etc. But I can't help myself: I had Unity installed, and to put it mildly, I didn't like it. Also, these days there are so many engines out there that I could spend my whole life just testing them, and from past engine tests I know it usually gets you nowhere. Still, a Blitz3D to WebGL converter would be very nice.

Oh and BTW (also sorry for the text avalanche), as I'm thinking about it, I may as well try an entirely different approach that uses only 1 render: do a point-sample mini copy of the main render from the backbuffer, e.g. 256x256 pixels, then scale it down to 128x128 while smoothing the point samples (blur-shrink), probably using inline assembler (like you can in GFA BASIC, and probably in FreeBASIC, which can build a DLL that can then be used in Blitz3D via userlib decls). And when reading the point samples from the render, ignore anything that isn't bright enough (or lower it to RGB 0).
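The blur-shrink step could look roughly like this, sketched in Python rather than assembler. It is a simplification with assumptions of my own: a square grayscale image stored as a list of rows, a plain 2x2 box average for the smoothing, and a made-up brightness threshold of 200.

```python
def blur_shrink(pixels, threshold=200):
    """Halve a square grayscale image by averaging each 2x2 block.

    Samples darker than `threshold` count as 0 (black), so only the
    bright parts of the render contribute to the ray texture.
    """
    half = len(pixels) // 2
    out = []
    for y in range(half):
        row = []
        for x in range(half):
            total = 0
            for oy in (0, 1):
                for ox in (0, 1):
                    value = pixels[2 * y + oy][2 * x + ox]
                    if value >= threshold:
                        total += value
            row.append(total // 4)
        out.append(row)
    return out


# 256x256 point samples: a dim scene with one bright 2x2 patch and one lone bright pixel.
src = [[50] * 256 for _ in range(256)]
for py in (0, 1):
    for px in (0, 1):
        src[py][px] = 255
src[2][2] = 255

small = blur_shrink(src)
print(len(small), "x", len(small[0]), "->", small[0][0], small[1][1], small[5][5])
```

The dim background drops to 0, the full bright patch stays at 255, and the lone bright pixel is smoothed down to a quarter of its brightness.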

Ok, in case anyone is interested: I tried the above idea (not the ASM part) and two things became clear. First of all, doing only one render inevitably causes recursive feedback, because the sampled points are brightened and then sampled again, over and over. This forced me to use a very low alpha, but even then it stabilizes only due to rounding errors, causing it to flicker wildly. So I concluded there is no way around a 2nd render.

However, I found a much faster way: render the scene without the ray mesh, at full display size. Then do the point sample from the backbuffer and move it to the ray mesh texture. Then move the camera 10000 units away, where the entire scene is out of rendering range (the ray mesh is parented to the camera), set CameraClsMode to maintain the backbuffer, and now render the ray mesh alone on top of it. Then set CameraClsMode to 1,1 again and move the camera back to the scene. I was able to lower the rendering time of the effect from 23 to 16 ms, which is still very slow.

That's when I figured out the second thing: of the 16 ms, about 13 ms were used by the commands lockbuffer backbuffer() and unlockbuffer backbuffer() alone! I tried it with no fast pixel reading and it still took 13 ms; then also without lockbuffer, and it went down to like 1 ms.

So the main bottleneck seems to be lockbuffer. It seems to wait for some green light from DirectX, which is in sync with the system framerate. I tried VWait right before lockbuffer and was able to lower the effect time from 16 to 5 ms. But VWait should always be followed by flip 0, if used at all. Maybe I'll upload the source.

Very interesting and cool insights you got there, as always! I'm curious about the FreeBASIC or inline assembly way to make it faster, as I would presume this is how FastExt does this effect.

There is also an idea from Fredborg, which RemiD described before, whose outcome I'm very interested in; it's quoted below and you might look into it. I guess you might use some form of light trails effect for the rays, and perhaps you can have a go at it! 😊

"the idea was to have a subdivided quad parented to the camera, have its vertices colored with the sun color, and use linepicks from the sun to each vertex, and set the vertices alphas accordingly (if a light ray can reach a vertex, alpha 0.5, if a light ray can't reach a vertex, alpha 0)
with blendmode add or multiply2..."



Interesting, but a ray resolution of 256x256 would be 64k linepicks, which might be slow too. Below 128x128 it becomes really blurry.
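A quick back-of-the-envelope count, as a Python sketch. It assumes one linepick per ray texel per frame and a target of 60 fps, neither of which is a measured figure.

```python
def linepicks_per_frame(resolution):
    """One linepick per texel of a resolution x resolution ray grid."""
    return resolution * resolution


def linepicks_per_second(resolution, fps=60):
    """Total linepicks per second at the given (assumed) framerate."""
    return linepicks_per_frame(resolution) * fps


for res in (64, 128, 256):
    print(res, linepicks_per_frame(res), linepicks_per_second(res))
```

Halving the resolution cuts the number of picks to a quarter, which is why the trade-off against blurriness matters so much here.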

I was thinking of another simpler way of rendering the rays and then doing a 2D image mask on the objects in front so the scene will be like a 1 or 2 pass render to texture effect.