Counting FLOPSs in a FLOPS – part 2

in #gridcoin, 6 years ago

Relative performance of a few chosen GPUs and CPUs.

As a reference I used a computer built around a rather old Xeon E5645 processor and a GTX 1060 6GB graphics card.

In each table, the first and second lines are for the same workunit, the third and fourth lines are for another workunit, and so on. This way you can compare the performance of some graphics cards relative to the reference GTX 1060.

Amicable Numbers

ami.png

Recently, a GTX 1060 6GB can achieve ~17k credits per hour in Amicable Numbers, or ~400k per day. Each completed workunit (WU) is rewarded with the same number of credits. One WU usually takes around 1600 seconds to compute, but some WUs take over 2000 s or less than 500 s. Relative performance is usually within 10 to 15% of what the official FP32 FLOPS figures would predict. The table contains almost no AMD cards, as it is difficult or impossible to find out the exact model from the data available on the project’s site. The rather low performance of the AMD RX 470 might be due to misconfiguration (e.g. CPU starvation), overheating or other reasons.
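The comparison against the FP32 FLOPS expectation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the exact procedure used for the table: the function names are made up for this sketch, and the second card's runtime and both TFLOPS figures are placeholders, not measurements or official specs. Only the 1600 s reference time comes from the text above.

```python
def relative_speed(ref_seconds, card_seconds):
    """Speed of a card relative to the reference on the same WU (>1 means faster)."""
    return ref_seconds / card_seconds


def expected_speed(ref_tflops, card_tflops):
    """Speed ratio predicted from the official FP32 figures alone."""
    return card_tflops / ref_tflops


def deviation(measured, expected):
    """Fractional difference between measured and expected relative speed."""
    return (measured - expected) / expected


REF_SECONDS = 1600.0   # usual WU time on the GTX 1060 6GB, from the text
CARD_SECONDS = 800.0   # placeholder runtime of another card on the same WU
REF_TFLOPS = 4.0       # placeholder FP32 figure for the reference card
CARD_TFLOPS = 8.8      # placeholder FP32 figure for the other card

measured = relative_speed(REF_SECONDS, CARD_SECONDS)
expected = expected_speed(REF_TFLOPS, CARD_TFLOPS)
print(f"measured {measured:.2f}x, expected {expected:.2f}x, "
      f"deviation {deviation(measured, expected):+.1%}")
```

With these placeholder values the card would land a little under the FLOPS-based expectation, which is the kind of 10 to 15% gap described above.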

einstein@home

ein.png

In the einstein@home project a GTX 1060 6GB almost always takes very close to 1000 seconds to complete a WU. It can achieve ~12k credits per hour, or up to ~290k per day (max RAC ~290k). In this small sample, Radeon cards completed WUs either slower or faster than expected. The very low performance of the GTX 760 might be due to misconfiguration (e.g. CPU starvation), overheating or other reasons. For GTX cards the actual performance is usually within 10% of the expected value.

yoyo@home

Measured MegaFLOPS are shown as they appear on the computer's spec page on the project site; I assume they are obtained with the Whetstone benchmark. Formula BOINC MFLOPS is calculated from the BOINC equation GigaFLOPS = RAC/200. RATIO is the ratio of the two.
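As a rough sketch of how the two columns relate, the equation GigaFLOPS = RAC/200 can be turned into MFLOPS and compared with the measured value. The function names are hypothetical, the direction of the ratio (measured divided by formula) is an assumption since the text does not specify it, and the sample inputs are placeholders rather than values from the table.

```python
def formula_mflops(rac):
    """MFLOPS implied by the BOINC equation GigaFLOPS = RAC / 200."""
    gigaflops = rac / 200.0
    return gigaflops * 1000.0


def mflops_ratio(measured_mflops, rac):
    """Measured MFLOPS divided by formula MFLOPS (direction is an assumption)."""
    return measured_mflops / formula_mflops(rac)


# Placeholder inputs for illustration only, not taken from the table.
sample_rac = 400.0
sample_measured_mflops = 3000.0

print(formula_mflops(sample_rac))
print(mflops_ratio(sample_measured_mflops, sample_rac))
```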

yoyo.png

Note that the computed data are values per core per hour, as each yoyo WU runs on one CPU core. The Xeon E5645 has 6 cores, or 12 virtual cores (threads), so it can run up to 12 WUs simultaneously. Times are actual CPU times. Rewards are quite scattered: one hour of Xeon E5645 core time can be rewarded with as little as 24 or as much as 50 credits, 36 credits on average. At that average, twelve threads working at 100% could earn about 10k credits per day.
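The daily figure follows directly from the per-core hourly rewards. A minimal sketch of that arithmetic, using only the numbers quoted above (the function and constant names are made up for this example):

```python
THREADS = 12          # threads on the Xeon E5645, one yoyo WU per thread
HOURS_PER_DAY = 24

# Credits per core-hour quoted in the text.
CREDITS_PER_CORE_HOUR = {"low": 24, "average": 36, "high": 50}


def credits_per_day(per_core_hour, threads=THREADS, hours=HOURS_PER_DAY):
    """Daily credits if every thread is busy for the whole day."""
    return per_core_hour * threads * hours


for label, rate in CREDITS_PER_CORE_HOUR.items():
    print(label, credits_per_day(rate))
```

The average rate gives a little over 10k credits per day, while the scatter in rewards means a given day could land noticeably below or above that.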

Also see Counting FLOPSs in a FLOPS - part 1


Good work. A couple of questions:

How did you get the runtimes for GPUs that aren't the 1060 3GB?

How many times did you run each type of WU?

How did you get the runtimes for GPUs that aren't the 1060 3GB?

1060 6GB, from the project website, for example see here

How many times did you run each type of WU?

Once, no choice here.

1060 6GB, from the project website, for example see here

Did you use this example in your table? It seems like neither of the GPUs used to accomplish the tasks completed in that WU have the same specs as those in the table.

Also, is that how you compare the WU? i.e. Go to a WU, look at the tasks, and then extract information from there?

Once, no choice here.

Sorry, I should have asked how many times did you run WU from this particular application. It looks like all of them are from the same one. I'm wondering if some project administrators would be willing to give us some WU from these projects that we can use for tests.

Regarding the amicable numbers example, how can we compare those WU? If a WU takes a different amount of time for each task even on the same GPU, how can we compare across GPUs? This brings up the bigger point that different WU, even from the same application, can require very different computational power, if for example a conditional hits. I was thinking of addressing this by taking the average over many WU.

I think there's a typo, but I'm not sure. In the einstein@home table, the 1060 is listed 5 times in a row, with very different runtimes for some of them.

Did you use this example in your table?

No, just an example. I haven't saved links to WUs used in tables, so it would be difficult to find them again.

Also, is that how you compare the WU? i.e. Go to a WU, look at the tasks, and then extract information from there?

Yes

This brings up the bigger point that different WU, even from the same application, can require very different computational power

Yes, this is the case and the problem. And some explanation.

Regarding the amicable numbers example, how can we compare those WU? If a WU takes a different amount of time for each task even on the same GPU, how can we compare across GPUs? ....

I don't think the same WU X would take different times on a particular card and I have no possibility to test it. Also WU = task.

In the tables, every two consecutive lines are for the same WU. For example, in lines 1 and 2 of the Amicable table, the GTX 1060 and the GTX 760 computed the same WU X. In lines 3 and 4, the GTX 1060 and the RX 470 computed the same WU Y. I should have added another column with WU numbers to make it clear. As you see, the same GTX 1060 was used and works as a benchmark for relative comparisons. A single WU is probably enough for a relative comparison, but with confidence only for particular cards (computers), not necessarily for card classes, as someone might have overheating problems or some system misconfiguration.
Using a moving average over tasks would be better.
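A moving average over tasks could look roughly like the sketch below. It is only an illustration of the idea: the helper name and the window size are arbitrary choices, and the runtimes are placeholders loosely based on the spread mentioned in the post (around 1600 s, sometimes over 2000 s or under 500 s), not real task data.

```python
from collections import deque


def moving_average(values, window):
    """Yield the mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


# Placeholder runtimes in seconds for consecutive tasks on one card.
runtimes = [1600, 2000, 500, 1600]

smoothed = list(moving_average(runtimes, window=2))
print(smoothed)
```

The same helper could be applied to per-WU speed ratios against the reference card instead of raw runtimes, which would smooth out the effect of unusually long or short WUs.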

In some computer profiles you might see something like [3] NVIDIA GTX 1080, which means the rig has 3 cards installed; the others might be a 1070 or a 750. That is why I only used computers with a single card for the comparisons.
Another problem: on a 1080 you may run more than one task at once, depending on how the user configures the BOINC client. As I don't own a 1080 and couldn't research it in depth, I've tried not to include such strong cards in the tables. Still, you can draw some conclusions if you compare RAC for a couple of 1080s with their times to complete tasks (WUs).

I don't think the same WU X would take different times and I have no possibility to test it. Also WU = task.

My mistake, I should have said application, not WU.

Yes

From that link you provided, it seems like every WU is comprised of two tasks, and both of those tasks are completed by different machines. Are those two tasks running the exact same computations? If not, I'm not sure it's accurate to compare them from that single data point.

I assumed it is the same task = WU run twice for verification purposes...
If there are two tasks being part of the same WU... I hope not!

Also, I've noticed other 1060s have almost exact times as mine running those WUs (when I had a chance to find a WU that was crunched by another 1060); so I hope WU = task, and not WU = task1 + task2.

P.S. some minor edits in my previous comment.

I assumed it is the same task = WU run twice for verification purposes...
If there are two tasks being part of the same WU... I hope not!

That seems likely, but verification is definitely needed. If that is the case though, it provides a very nice way of comparing hardware directly.
