Widescreen Gaming Forum

[-noun] Web community dedicated to ensuring PC games run properly on your tablet, netbook, personal computer, HDTV and multi-monitor gaming rig.

All times are UTC [ DST ]




Posted: 28 Feb 2010, 00:32
Insiders

Joined: 22 Aug 2007, 19:00
Posts: 647
And that matters why? If the load is split 50/50 among both GPUs (in theory) memory usage is also split 50/50. Unless you're telling me it always duplicates memory across both GPUs, every byte-for-byte is exactly the same? Which would barely make any sense since as I understand it the CF bridge has poor bandwidth.


 


Posted: 28 Feb 2010, 00:53
Insiders

Joined: 29 Jul 2007, 05:24
Posts: 1512
Location: NZ
And that matters why? If the load is split 50/50 among both GPUs (in theory) memory usage is also split 50/50. Unless you're telling me it always duplicates memory across both GPUs, every byte-for-byte is exactly the same? Which would barely make any sense since as I understand it the CF bridge has poor bandwidth.

This

" As a result, a Radeon HD 3870 X2 paired with a Radeon HD 3850 256MB would perform like a trio of Radeon HD 3850 256MB cards. And, of course, that means the effective memory size for the entire GPU phalanx would effectively be 256MB, not 768MB, because memory isn't shared between GPUs in CrossFire (or in SLI, for that matter)."

From here

_________________
Dipping bags at Mach1.9


 
Posted: 28 Feb 2010, 01:42
Editors

Joined: 31 Jul 2006, 14:58
Posts: 1497
And that matters why? If the load is split 50/50 among both GPUs (in theory) memory usage is also split 50/50. Unless you're telling me it always duplicates memory across both GPUs, every byte-for-byte is exactly the same? Which would barely make any sense since as I understand it the CF bridge has poor bandwidth.


Yeah, it's basically like that: both cards have to have the same data in their memory. They do not split it, even if they do split part of the load.

_________________
ViciousXUSMC on the Web - YouTube :: FaceBook :: Website


 
Posted: 28 Feb 2010, 05:26
Founder

Joined: 13 Oct 2003, 05:00
Posts: 7358
That graph doesn't have enough information, what resolution and settings was it at (and yes I went to the wiki url you linked first). It also says it only has a .7 fps benefit.

The article states that I was testing 5040x1050 at Ultra Settings with 4xAA.

Anyway I was more thinking of the 5970 4GB that we are talking about in this thread (not a 5870 1GB vs 2GB), I somehow doubt 2GB vs 4GB will be a significant difference.

The 5970 has two 5870 GPUs, so it will basically be (HD 5870 2GB)x2.

And that matters why? If the load is split 50/50 among both GPUs (in theory) memory usage is also split 50/50. Unless you're telling me it always duplicates memory across both GPUs, every byte-for-byte is exactly the same? Which would barely make any sense since as I understand it the CF bridge has poor bandwidth.

Yes, that is what we are saying. Every SLI and CrossFireX setup has always been like this. The NVIDIA GTX295 1792MB card was really 896MBx2, which is why GTX280/285 in SLI (each with a full 1GB) would generally outperform it, even if clocks were set to match.

You don't get to add the memory from each GPU in series (stack them). Each GPU runs in parallel with its own memory and holds the whole image in its framebuffer. The 4GB card just gets you to 2GB for each GPU.
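
A rough sketch of that bookkeeping in Python. The function name and the card lists are made up for illustration; it only encodes the rule of thumb from this thread (each GPU keeps its own full copy, so the smallest per-GPU pool sets the limit), not anything taken from a driver.

```python
def effective_vram_mb(per_gpu_mb):
    """Usable VRAM for a multi-GPU setup under the thread's rule of thumb.

    Assumes every GPU holds its own copy of the working set, so memory
    does not add up across GPUs; the smallest pool sets the limit.
    """
    if not per_gpu_mb:
        raise ValueError("need at least one GPU")
    return min(per_gpu_mb)


# GTX 295: sold as 1792MB, but that is 896MB on each of its two GPUs.
gtx295 = effective_vram_mb([896, 896])
# Two GTX 280/285 cards in SLI, 1GB each.
gtx280_sli = effective_vram_mb([1024, 1024])
# A 4GB HD 5970 would carry 2GB on each of its two GPUs.
hd5970_4gb = effective_vram_mb([2048, 2048])

print(gtx295, gtx280_sli, hd5970_4gb)
```

Under that assumption the "4GB" card behaves like a 2GB card as far as any single frame is concerned.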

When are you getting the 6 card? That will be an interesting test.

Should be soon. Embargo lifts on March 11th. Should have it in time to test and review, so I would expect very soon. Should also be getting a 5970 2GB in March as well. Will let me directly compare the 1GPUx2GB vs 2GPUx1GB scenario. Whenever the 4GB cards start releasing, I'd love to stack one of those on top.

ATI was nice enough to catch me up on all the cards I had missed out on, so I've still got a number to test (5850, 5830, 5750), and a number to write up that I've tested (5770, 5670, 5550, 5450). I got the 5830, 56, 55 and 54 all before they announced but couldn't get them tested and reviewed individually in time for their release dates.

Plan to do one round-up for Mainstream (56, 55, 54), one for Performance (57), one for Enthusiast (59, 58, E6). I may group the 57 with either the Mainstream or the Enthusiast, as it's kind of a bridge between the two.


 
Posted: 28 Feb 2010, 05:42
Insiders

Joined: 29 Jul 2007, 05:24
Posts: 1512
Location: NZ
Sweet, send me the 5970 4GB and I'll do the review vs 2GB for you ;)

_________________
Dipping bags at Mach1.9


 
Posted: 28 Feb 2010, 09:19
Insiders

Joined: 22 Aug 2007, 19:00
Posts: 647
That article doesn't really do a good job of explaining usage of memory in parallel vs in series. Just that the CF is as good as the weakest video card (which is reasonable). I suppose it makes sense that both cards clone the framebuffer, but that would only be the resolution * 24/8 number of bytes. So about 20MB for a 3x1920x1200 at 24bpp. Double/triple that for double/triple buffering = 40/60MB. Hardly anything worth mentioning for any modern video card.
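
That arithmetic can be checked with a few lines of Python. This is only a sketch of the raw, uncompressed colour buffer size; the function name and the decimal-megabyte constant are choices made here for illustration.

```python
def framebuffer_bytes(width, height, bits_per_pixel=24, buffers=1):
    """Raw, uncompressed size of `buffers` colour buffers, in bytes."""
    return width * height * bits_per_pixel // 8 * buffers


MB = 1_000_000  # decimal megabytes, matching the rounding used in the post

# Three 1920x1200 panels side by side give 5760x1200.
width, height = 3 * 1920, 1200
for n in (1, 2, 3):
    size = framebuffer_bytes(width, height, buffers=n)
    print(f"{n} buffer(s): {size / MB:.1f} MB")
```

Single, double and triple buffering come out at a little over 20, 40 and 60 MB, in line with the estimate above.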

So my question is how is it that the rest of the memory is "duplicated?" I would love a better explanation since right now it looks like you guys are saying all memory operations are transmitted across to the other GPU, similar to a write-through L1/2 cache. Which as I stated earlier wouldn't make much sense since the GPU CF bridge bandwidth is poor (GPUs do a great job of parallelizing the work, but only once the work tasks get to it).

That graph doesn't have enough information, what resolution and settings was it at (and yes I went to the wiki url you linked first). It also says it only has a .7 fps benefit.

The article states that I was testing 5040x1050 at Ultra Settings with 4xAA.


Well to be honest it wasn't perfectly clear, yes the top most FC2 benchmark was obviously Ultra/4xAA and the resolutions were clearly marked, but the rest of the benchmarks I didn't want to guess about :mrgreen:.

Still with that information in mind it really doesn't seem like a huge win!


 
Posted: 28 Feb 2010, 09:49
Insiders

Joined: 29 Jul 2007, 05:24
Posts: 1512
Location: NZ
I don't know exactly how it works, I unsuccessfully looked for a web page to explain it.

However the fact is that is how it works, you only get the (lowest) VRAM of one of your SLI/CF card/s.

_________________
Dipping bags at Mach1.9


 
Posted: 28 Feb 2010, 17:17
Founder

Joined: 13 Oct 2003, 05:00
Posts: 7358
That article doesn't really do a good job of explaining usage of memory in parallel vs in series. Just that the CF is as good as the weakest video card (which is reasonable). I suppose it makes sense that both cards clone the framebuffer, but that would only be the resolution * 24/8 number of bytes. So about 20MB for a 3x1920x1200 at 24bpp. Double/triple that for double/triple buffering = 40/60MB. Hardly anything worth mentioning for any modern video card.

You're making a few mistakes here. One, you're looking at the problem as if it were a 2D image. It is simply not a static 2D rendering. The GPU is rendering objects far into the distance, often at a greater fidelity than you can imagine. Also, a 3D rendered scene requires the GPU to store tons of additional data: geometry, lighting and textures (each of which can easily be tens or hundreds of megabytes). Add anti-aliasing to that, and you can easily overfill the video card's memory. There are plenty of games I couldn't get to even load at 5040x1050 with 4xAA on a 1GB card. Quoting from the Wikipedia article on anti-aliasing:

In general, supersampling is a technique of collecting data points at a greater resolution (usually by a power of two) than the final data resolution. These data points are then combined (down-sampled) to the desired resolution, often just by a simple average. The combined data points have less visible aliasing artifacts (or moiré patterns).

Full-scene anti-aliasing by supersampling usually means that each full frame is rendered at double (2x) or quadruple (4x) the display resolution, and then down-sampled to match the display resolution. So a 4x FSAA would render 16 supersampled pixels for each single pixel of each frame.


So, crank up to 4xAA (much less 8xAA) and you're rendering 16x the pixels of 5760x1200 (6.9M pixels). This would put you at 110.6M pixels. Then add back in the textures, the lighting and the geometry and do you see where the memory is going?
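
As a quick check of those figures, a small Python sketch. It follows the 16-samples-per-pixel supersampling reading quoted above; the 4 bytes per sample is an assumption added here (32-bit colour only, no depth buffer, no compression), so the megabyte figure is a ballpark and not a measurement.

```python
def supersampled_pixels(width, height, samples_per_pixel):
    """Number of samples rendered before down-sampling to the display."""
    return width * height * samples_per_pixel


BYTES_PER_SAMPLE = 4  # assumption: 32-bit colour, nothing else stored

base = 5760 * 1200
samples = supersampled_pixels(5760, 1200, 16)
raw_mb = samples * BYTES_PER_SAMPLE / 1e6

print(f"{base / 1e6:.1f}M pixels -> {samples / 1e6:.1f}M samples, ~{raw_mb:.0f} MB raw")
```

Even before textures, lighting and geometry, that colour data alone would take a large share of a 1GB card under these assumptions.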

So my question is how is it that the rest of the memory is "duplicated?" I would love a better explanation since right now it looks like you guys are saying all memory operations are transmitted across to the other GPU, similar to a write-through L1/2 cache. Which as I stated earlier wouldn't make much sense since the GPU CF bridge bandwidth is poor (GPUs do a great job of parallelizing the work, but only once the work tasks get to it).


Quoting from the Wikipedia article on SLI:
SLI offers two rendering and one anti-aliasing method for splitting the work between the video cards:
* Split Frame Rendering (SFR), the first rendering method. ... This method does not scale geometry or work as well as AFR, however.
* Alternate Frame Rendering (AFR), the second rendering method. Here, each GPU renders entire frames in sequence – one GPU processes even frames, and the second processes odd frames, one after the other. When the slave card finishes work on a frame (or part of a frame) the results are sent via the SLI bridge to the master card, which then outputs the completed frames. Ideally, this would result in the rendering time being cut in half, and thus performance from the video cards would double. In their advertising, Nvidia claims up to 1.9x the performance of one card with the dual-card setup.
* SLI Antialiasing. This is a standalone rendering mode that offers up to double the antialiasing performance...


AFR is what is most commonly used. This is why the whole frame is rendered. Honestly, if you'd like a better explanation, look for one. I had no idea exactly why it worked this way, but five minutes on Wikipedia and I had it. Also, do I need to know how the anti-lock brakes work on my car to know they are better than "regular" brakes?
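
A toy illustration of AFR in Python, going only by the Wikipedia description quoted above. The function name is made up, and real drivers do far more than this; the point is just that whole frames alternate between GPUs.

```python
def afr_schedule(num_frames, num_gpus=2):
    """Assign whole frames to GPUs round-robin (alternate frame rendering).

    Returns a list of (frame_index, gpu_index) pairs.
    """
    return [(frame, frame % num_gpus) for frame in range(num_frames)]


schedule = afr_schedule(6)
for frame, gpu in schedule:
    print(f"frame {frame} -> GPU {gpu}")
```

Since each GPU renders complete frames on its own, each one needs the scene's textures and geometry in its own local memory, which fits with the memory not adding up.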

Here are the articles I read on SLI, CrossFire and AA:
http://en.wikipedia.org/wiki/Scalable_Link_Interface
http://en.wikipedia.org/wiki/ATI_CrossFire
http://en.wikipedia.org/wiki/Anti-aliasing

A Google for "CrossFire White Paper" gave me this:
http://ati.amd.com/technology/crossfire/downloads.html

Browsing the NVIDIA website gave me this:
http://www.slizone.com/page/slizone_learn.html

Well to be honest it wasn't perfectly clear, yes the top most FC2 benchmark was obviously Ultra/4xAA and the resolutions were clearly marked, but the rest of the benchmarks I didn't want to guess about :mrgreen:.

All tests were run at the same settings. Otherwise, I couldn't draw any comparisons.

Still with that information in mind it really doesn't seem like a huge win!


I guess it depends on your definition of "huge" and what games/settings/etc you play at. However, if I can add about $25 (an extra 1GB of GDDR5) or $50 (an extra 2GB of GDDR5) to the price of a video card and eliminate the glitches and the stalls that bring the game to a halt every 30-60 seconds on average, then it's a win for me.

Those dips only last for a second, and thus barely affect the overall weighted average of the 280 second run. However, they make a huge impact (IMHO) on the enjoyment, performance and playability.
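
To put rough numbers on that, here is a small Python sketch. Only the 280 second run length and the one-second stalls every 30-60 seconds come from the post; the 40 fps baseline, the 5 fps stall rate and the 45 second spacing are made-up values for illustration.

```python
def average_fps(run_seconds, base_fps, dip_fps, dip_every_s, dip_length_s=1):
    """Time-weighted average frame rate for a run with periodic dips."""
    dips = run_seconds // dip_every_s
    dip_time = dips * dip_length_s
    frames = (run_seconds - dip_time) * base_fps + dip_time * dip_fps
    return frames / run_seconds


# Hypothetical run: a steady 40 fps, versus the same run stalling to 5 fps
# for one second every 45 seconds.
smooth = average_fps(280, base_fps=40, dip_fps=40, dip_every_s=45)
stalling = average_fps(280, base_fps=40, dip_fps=5, dip_every_s=45)

print(f"smooth: {smooth:.2f} fps, with stalls: {stalling:.2f} fps")
```

With these invented inputs the average moves by less than 1 fps, even though the run includes six visible stalls.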

When I compare the 1GB HD 5870 to the 2GB HD 5870 Eyefinity, I'm going to set up my HDV cam so we can see what the "real-world" impact is to the user experience.


 
Posted: 28 Feb 2010, 17:18
Founder

Joined: 13 Oct 2003, 05:00
Posts: 7358
I don't know exactly how it works, I unsuccessfully looked for a web page to explain it.

However the fact is that is how it works, you only get the (lowest) VRAM of one of your SLI/CF card/s.

My guess is that to have consistent performance and quality with Alternate Frame Rendering, you would limit the render pipeline to what all GPUs can access/handle.


 
Posted: 28 Feb 2010, 19:51

Joined: 17 Jan 2010, 06:04
Posts: 60

In general, supersampling is a technique of collecting data points at a greater resolution (usually by a power of two) than the final data resolution. These data points are then combined (down-sampled) to the desired resolution, often just by a simple average. The combined data points have less visible aliasing artifacts (or moiré patterns).

Full-scene anti-aliasing by supersampling usually means that each full frame is rendered at double (2x) or quadruple (4x) the display resolution, and then down-sampled to match the display resolution. So a 4x FSAA would render 16 supersampled pixels for each single pixel of each frame.


So, crank up to 4xAA (much less 8xAA) and you're rendering 16x the pixels of 5760x1200 (6.9M pixels). This would put you at 110.6M pixels. Then add back in the textures, the lighting and the geometry and do you see where the memory is going?



Hey Ibrin, while I agree with your reasoning, I must point out that super sampling isn't the default AA technique for that very reason; its performance isn't all that scalable because memory size and bandwidth are too much of a constraint. That's why both brands of cards default to multi-sampling, which provides lower memory requirements with a similar output by determining pixel shade by comparing to the pixels around it (to gratuitously oversimplify).

That said, I totally agree with your overall point. I'm very excited to get to 2GB per GPU. Taking your example (5700x1200), that makes for one HUGE frame buffer. Since one has to have at least a front and a back buffer, that makes for 5700x1200x32 (WxHxBpp) = 218,880,000, x 2 (front/back buffer) = 437,760,000. WOW! Now mind you I think they use lossless compression for the front and back buffer, but still; in terms of raw memory size requirements that's about 438MB of 1GB for just the frame buffers. If you triple buffer, that brings the figure up to 657MB!

EDIT: This computation is wrong. Read on to the next post. It should have been /8, as Downtown pointed out, making the values 27MB, 54MB, and 82MB respectively.

I love this site by the way, thanks for setting it up. Been lurking for years and I finally signed up when I got my eyefinity config. :cheers

