
CASE STUDY: Network rendering vs Mental Ray’s Distributed Bucket Rendering.

Posted by chrismmurray, 10 September 2012 2:55 pm

I recently had the chance to help out a friend with some rendering issues, and I thought what happened would make an interesting case study here. This is NOT a how-to on actually setting up and using Distributed Bucket Rendering (henceforth referred to as “DBR”). This is an exploration of one instance where it was a critical consideration in a very large render job. If you want the “how-to” of DBR, you can find it in the 3dsMax Help file here: http://docs.autodesk.com/3DSMAX/15/ENU/3ds-Max-Help/files/GUID-81270C05-C3D7-42A9-A129-459389FED064.htm

The Scenario (drastically summarized)

It was a huge rendering task for a “ride film”. If you’ve ever done anything like this, you know the image rendering and post processing is as big a task as pre-production and production.  Rendering immersive environments frequently means rendering images to the specification of the projection system. In the case of my friend (and other ride films I’ve worked on) it’s usually a square image cropped from a larger one (1080x1080 cropped from 1920x1080, or some permutation thereof). OK… that’s different. Simple, right? Just an extra step? Well, ride films are part of 360-degree workflows (a whole other blog post), which almost always use multi-camera rigs -- front, left, right, top, and back. You don’t usually need to render the bottom view.  

Again, in the case of my friend, not only did he have to render 5 camera views, the sequence length was 2,800 frames (note I did NOT say the film length). So quick math puts us at 14,000 frames—about 9 minutes if this were a linear piece…  the segment is really only about 90 seconds. Because the design requirements demand exorbitant detail and realism, the frames render at between 30 and 60 minutes per frame. At this point the raging hordes jump in with better ideas and scene optimizations. But hold your horses! When you consider 5 camera setups, just rendering the elements amps up the complexity almost exponentially. Sure, I concede that studios that do ride films full time have an optimized system in place. So consider this a learning curve for those who don’t.
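For the number-shy, the quick math above looks like this. Note the 30 fps playback rate is my assumption (the post doesn’t state the projection frame rate); the 30-60 min/frame range is from the case study:

```python
# Back-of-the-envelope totals for the job described above.
# The 30 fps playback rate is an assumption; the post doesn't state it.

cameras = 5
frames_per_camera = 2_800
total_frames = cameras * frames_per_camera      # 14,000 frames

fps = 30                                        # assumed playback rate
segment_seconds = frames_per_camera / fps       # the "90 second" segment

# Total machine time at the quoted 30-60 min/frame range:
low_hours = total_frames * 30 / 60              # 7,000 machine-hours
high_hours = total_frames * 60 / 60             # 14,000 machine-hours

print(total_frames, round(segment_seconds), low_hours, high_hours)
```

Seven to fourteen thousand machine-hours of raw compute is the real scale of the job, which is why the planning below matters.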

So while many of us are willing to tackle a 90 second piece without much concern, these 5 camera setups require lots of planning especially when it comes to rendering resources. Rendering time can frequently equal or surpass the entire production time due to the sheer volume of rendering. Heaven forbid there are changes—which there always are.

Finally, my buddy’s machines had limited RAM. The scene file took 16GB of RAM just to load. It was swapping like crazy. As you will see below, DBR uses significantly LESS RAM while still giving you access to the same amount of processing power. This is a critical consideration on a deadline.

Assumptions

Before we can dig into tackling the rendering task, let’s get some assumptions out of the way that are specific to this case study.

1.)    Resolutions are fixed. We can’t cheat by upscaling lower-res images, because of special post processing required by the projection system.

2.)    The quantity of machines is more or less fixed—at 5. Meaning unless you’re a huge shop with unlimited (relatively speaking) rendering resources, you will most likely be rendering with what you have. Assume you can upgrade RAM and drive space. We’ll talk about that later.

3.)    Mental Ray is a requirement. Some of this may apply with VRAY, but that is not part of the scope of this discussion. Since we’re talking specifically about DBR, you can assume MR is a requirement. DBR is not available in the other renderers that ship with 3dsMax.

4.)    Render times are more or less consistently high—about 30-60 min/frame.

5.)    You know Backburner and have used it successfully in some way. Without this context, this post may seem somewhat academic. Having firsthand knowledge of the issues surrounding network rendering will greatly assist you in trying some of these new things out.

Rendering strategy

OK here we go. This is why you are here. Why even consider anything but network rendering?

The answer to this question has direct correlation to assumption #2—machines are fixed.

Typically we’d just throw the frames at the farm and go home for the weekend. But rendering 14,000 frames requires a little more attention to planning. Saving as little as 2 minutes per frame adds up to more than 460 machine-hours across 14,000 frames—every minute counts.
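Here is that savings figure worked through, with the machine count taken from assumption #2:

```python
# What shaving 2 minutes off every frame is worth at this scale.
total_frames = 14_000
saved_min_per_frame = 2

saved_machine_hours = total_frames * saved_min_per_frame / 60
print(round(saved_machine_hours, 1))    # 466.7 machine-hours

# Spread across the 5 machines from assumption #2, roughly this many
# days come off the wall-clock schedule:
machines = 5
days_saved = saved_machine_hours / machines / 24
print(round(days_saved, 1))             # 3.9 days
```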

But if render times are high (assumption #4) what options do I have? This is where DBR comes in.

Typically, DBR is an MR feature associated with rendering large single images. If you’re not familiar with DBR, here’s a high-level explanation. Like net rendering, it leverages machines on your network to render images. But instead of sending the max file and scene data over and having that machine render one whole single image, DBR leverages just the processor cores to render portions of a single frame (called buckets). You’ve already seen buckets in MR—they are the little white corner squares that appear on an image when you render it. Each bucket represents a processor core, so if you have dual quad-cores, you will see 8 buckets. It’s pretty straightforward. With DBR, you can render a single frame on one machine faster, because you’re using other machines’ processors in addition to that single machine’s own. Those other machines are called “satellites.”  

Render farm throughput: All frames are not equal

So how will we apply that here? When considering this method, you have to understand the math associated with the performance of your specific render farm. Every farm will be different, so everyone’s calculations will be different. It is entirely possible that your math may lead you to conclude that DBR is not beneficial. In other words, you need to know the capabilities of your farm per job. The only way to attain this is to experiment with both methods.

In the case of my friend’s ride film, DBR made sense in some cases. What follows is specific to the render farm in question. These calculations will not apply to your farm. But the method of calculation will.

I use frames/hour as my final metric of rendering power. How many frames per hour can I write to disk? You can get caught up in swapping, memory latency, and all this other stuff. Those all mean something for sure, but ultimately, for me, it’s about fr/hr. And for this discussion see assumption #2 (again).

Many people, rightly so, are strict measurers of minutes or hours per frame. In the context of a single machine this is important. It’s also important to know this as part of our calculations. But it isn’t the best or final measure of the efficiency of the entire farm. 

Understanding farm throughput

So for this rendering example we did some simple tests. We found “worst case frames”—lots of lights, geometry filling the screen, etc.: the frames we think would take the longest to render. Here’s what we did.

·         A series of at least 20 frames was rendered using the straight net render method. This should give you a good estimate of the number of frames per hour you can push through the farm.

·         That same series of frames was rendered “locally” using DBR, leveraging the processors of the other machines.

·         Calculate the best option. Net render vs. DBR. The higher frames/hour count wins. Simple!
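The three steps above boil down to one number per method. Here is a minimal sketch of the comparison; the 90 and 15 min/frame timings are hypothetical stand-ins, not measurements from the case study:

```python
# Frames/hour for the whole farm -- the deciding metric.
# Timings below are hypothetical; measure them on your own farm.

def frames_per_hour(minutes_per_frame, concurrent_frames=1):
    """Frames written to disk per hour across the farm."""
    return 60.0 / minutes_per_frame * concurrent_frames

# Net render: 5 machines each render a whole frame; swapping pushes
# each frame to, say, 90 minutes.
net = frames_per_hour(90, concurrent_frames=5)    # ~3.3 fr/hr

# DBR: host + 4 satellites gang up on one frame at a time and finish
# it in, say, 15 minutes.
dbr = frames_per_hour(15)                         # 4.0 fr/hr

print("DBR wins" if dbr > net else "net render wins")
```

With different timings (say, simple frames that render quickly on a single machine), the same comparison flips in favor of straight net rendering, which is exactly why you run it per job.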

It’s important to recognize that as render jobs progress, the render times will change as well. This will impact the efficiency of the two methods differently. In some cases (like early in the shot, where the frame was partially black) straight network rendering was the better option. But in the middle, with all the intricate detail, lights, and shadows, DBR was by far the better option. 

The ILLUSTRATION below is intended to demonstrate when we used old-school net rendering and when we chose DBR. The RED LINE indicates scene complexity. The brackets across the top indicate when we used each method. The blue boxes across the bottom indicate the progression through the animation in frames. It is not EXACT, as no specific measurement is provided. It’s merely an aid to visualize the decision-making process for each rendering method.

 

 

What about adding more resources?

RAM

RAM is a logical place to add resources to an existing farm. But in this situation there is an interesting thing about RAM usage. The scene file needed 16GB of RAM just to load (not even mentioning rendering and swapping yet), but the DBR method only uses 3-4GB to RENDER on the host machines. So while adding more RAM may improve overall single-frame rendering per machine, the benefit of adding RAM for the DBR method is negligible because of its low resource overhead. 

Machines

Adding machines will definitely add to your frame throughput in single-frame situations. One thing about DBR is that it has a 4-satellite limit (a total of 5 machines: your workstation plus 4 others). But now you can begin to make some smart choices given what you now know about RAM usage and the DBR method. It may make sense to add a few bare-bones machines with high-end CPUs and minimal RAM to push through the DBR frames, which still leaves you room to upgrade them later. Remember, you can pick and choose which satellites to use. As you add faster cores, you can drop slower ones off the satellite list.

Lastly, there is a 4-satellite limit to using DBR per licensed copy of 3dsMax. If you want to use more than 5 machines, you need additional licenses of 3dsMax.

Drive Space (for swapping)

There is another area where people tend to spend some dough: adding drive space for swapping. While this is a decent idea since drive space is SO cheap, I find it unlikely that, unless your render farm also houses your ripped movie collection, you are lacking drive space. I don’t think this is really a great investment unless it’s needed for other reasons.

Why not use DBR and Backburner all the time?

As I mentioned, there is a limitation to DBR: you can only use 4 “satellite” machines (regardless of the number of cores). So even though you may have 10 machines, you can’t leverage all ten on a single frame with DBR. But there is a slightly more complicated way to do that, which is best visualized in the chart below.

Basically, you pick and submit only the licensed host machines to Backburner. NOTE: each host machine MUST run a licensed copy of 3dsMax. Each host machine that you submit then refers to its own satellites (make sure they aren’t already a satellite of another machine!)

 

Conclusion   

This solution is not for everyone. The bigger picture here is to help you understand where the bottlenecks may appear in your network rendering project and what you can do about them. Sometimes merely adding more resources isn’t the best solution, depending on your situation. Since this was a case study of a very specific project, I concede that there may be other limitations or factors we didn’t cover here. Again, this wasn’t intended to be an all-encompassing exploration of the ins and outs of DBR. I’m sure you’ll let me know about them in the comments section. But please do! I always have something to learn. 

If you're not doing so already, I invite you to follow me on twitter at @chrismmurray.

 

7 Comments

raymarcher

Posted 11 September 2012 3:04 pm

Interesting, then how many licences of 3ds Max were needed to handle the number of frames considering the 4 satellite limit?

Jonathan de Blok

Posted 11 September 2012 8:05 pm

Interesting read! Too bad DBR requires extra licences, or else I could have tested it on my EC2 farm.

For comparison, a 'normal' backburner job on Amazon's EC2:
- assuming 30 min a frame on an 8-core EC2 unit
- so 14,000 frames = 7,000 hours total render time
- running 100 spot instances will do the job in 70 hours
- cost: 7,000h * $0.20 (spot price) = $1,400

chrismmurray

Posted 14 September 2012 3:45 pm

raymarcher: Not sure what you mean, but the limit is 1 license per 4 satellites. So the number of frames really doesn't matter. What's relevant is how many satellites you have vs. how many you want to use.

chrismmurray

Posted 14 September 2012 3:47 pm

Jonathan, the cloud is going to make things interesting for sure. Soon, I'm sure, cloud rendering expenses will make their way into production budgets...

3DMadness

Posted 20 September 2012 7:51 pm

Really nice article!

Did your friend consider rendering with a single fisheye lens instead of the multiple camera setup? I've worked with immersive rendering before and got on fine with the fisheye; you just need a raytracer like mental ray or vray.

Rodrigo Assaf

Posted 16 October 2012 7:26 pm

Fantastic Article! Thanks for Sharing!

dbowker3d

Posted 3 December 2012 2:03 am

Interesting ideas here. Can I ask exactly HOW big those scenes were that he was submitting? Or how many polys, etc.? What would cause the scene to need 16GB of RAM to load? Also, what about using DXF files to bring memory down?

I frequently work on and render scenes exceeding 14 million polys always in HD, and though 60 mins a frame is no big deal on my render farm, no machine on the farm (except one) has more than 4GB of RAM. Is it having many, many lights (like more than 12?) that adds so much, or is it multiple and huge huge textures? I guess over time I've learned to adapt by using almost all Mental Ray A&D materials with very few non-procedural textures.

Maybe one or two times in the last three years have I had them run out of memory. Usually they just use up a massive amount of swap space and go with it. Anyway, I'm just trying to get a picture of how these files gobble up so much memory. Thanks.
