All times are UTC - 7 hours [ DST ]

Post new topic Reply to topic  [ 1534 posts ]  Go to page Previous  1 ... 27, 28, 29, 30, 31, 32, 33 ... 103  Next
Author Message
PostPosted: Thu Sep 18, 2008 8:47 am 
Offline
Top Contributor
Top Contributor

Joined: Sat May 17, 2008 10:27 pm
Posts: 704
Location: Round Rock, TX
Thanks: 26
Thanked: 22 times in 22 posts
I've seen the high temp warning before, though never so many in such a short time span. I don't remember ever seeing warnings about the rear fans turning so slowly. If the reporting is accurate, maybe that explains why the temps go so high: the fans slow down and the temps go up. I didn't include the messages about the fans returning to normal. I'll double-check tonight to see how long the fans were slow and whether the temps returned to normal after the fans did.

_________________
HP MSS EX490, Intel Core 2 Quad Q9300, 1TB System drive, 6TB Storage pool managed by StableBit DrivePool



PostPosted: Thu Sep 18, 2008 8:55 am 
Offline
Max Contributor
Max Contributor

Joined: Fri Jan 18, 2008 11:32 am
Posts: 1027
Thanks: 2
Thanked: 84 times in 76 posts
TxDot, based on what I've seen, the warnings and errors last 2-3 seconds and are in no way reflective of reality. It can't be a sensor throwing false readings, because it happens on all sensors. I vote for a timing bug in WNAS which shows up with faster processors. I saw them using the 4050e as well as the 3800+ EE SFF. They have no effect on reliability and do not affect the health of the system; in other words, if you never look at the event log you would never know it happened.

_________________
HP EX470, 4 GIG, 4200+ EE SFF, WHS V1, Production
HP EX485, 4 GIG, E5800, WHS 2011, Stablebit Drive Pool, Transfer Server, Backup Server
HP EX490, 4 GIG, Q9400S, WHS 2011, Stablebit Drive Pool, Mediaserver
HP EX485, 4 GIG, Q8200S, WHS 2011, ITUNES Server


PostPosted: Thu Sep 18, 2008 9:15 am 
Offline
Top Contributor
Top Contributor

Joined: Sat May 17, 2008 10:27 pm
Posts: 704
Location: Round Rock, TX
Thanks: 26
Thanked: 22 times in 22 posts
erail wrote:
TxDot, based on what I've seen, the warnings and errors last 2-3 seconds and are in no way reflective of reality. It can't be a sensor throwing false readings, because it happens on all sensors. I vote for a timing bug in WNAS which shows up with faster processors. I saw them using the 4050e as well as the 3800+ EE SFF. They have no effect on reliability and do not affect the health of the system; in other words, if you never look at the event log you would never know it happened.

Are you seeing the divide by zero errors as well? I don't remember all the details but I think it's a 7f error.

_________________
HP MSS EX490, Intel Core 2 Quad Q9300, 1TB System drive, 6TB Storage pool managed by StableBit DrivePool


PostPosted: Thu Sep 18, 2008 9:40 am 
Offline
Max Contributor
Max Contributor

Joined: Fri Jan 18, 2008 11:32 am
Posts: 1027
Thanks: 2
Thanked: 84 times in 76 posts
I have seen no zero divide errors. What component is throwing them?
edit: TxDot, I feel that 14 minutes is too long to try to tie an event error to a shutdown.

_________________
HP EX470, 4 GIG, 4200+ EE SFF, WHS V1, Production
HP EX485, 4 GIG, E5800, WHS 2011, Stablebit Drive Pool, Transfer Server, Backup Server
HP EX490, 4 GIG, Q9400S, WHS 2011, Stablebit Drive Pool, Mediaserver
HP EX485, 4 GIG, Q8200S, WHS 2011, ITUNES Server


PostPosted: Thu Sep 18, 2008 12:39 pm 
Offline
1TB storage
1TB storage
User avatar

Joined: Mon Sep 01, 2008 7:35 pm
Posts: 36
Location: Seattleish, WA
Thanks: 0
Thanked: 0 time in 0 post
erail wrote:
I have seen no zero divide errors. What component is throwing them?
edit: TxDot, I feel that 14 minutes is too long to try to tie an event error to a shutdown.


I hypothesized about one of the dumps that showed a DivByZero error starting here:

viewtopic.php?p=17904#p17904

_________________
:: Mark


PostPosted: Fri Sep 19, 2008 11:09 pm 
Offline
2.5TB storage
2.5TB storage

Joined: Thu Apr 03, 2008 3:52 pm
Posts: 217
Thanks: 12
Thanked: 61 times in 32 posts
Looking at your stack trace:

Code:
b9cb2b24 8091ccdd 0000007f b912d4d9 00000000 nt!KeBugCheck+0x14
b9cb2b7c 8088a400 b9cb2b88 b9cb2c10 b912d4d9 nt!Ki386CheckDivideByZeroTrap+0x41
b9cb2b7c b912d4d9 b9cb2b88 b9cb2c10 b912d4d9 nt!KiTrap00+0x88
WARNING: Stack unwind information not available. Following frames may be wrong.
b9cb2c10 b91276ca 00001001 00008001 b9cb2c30 WNAS+0x64d9


It looks as though your WNAS driver performed a divide-by-zero operation, thus crashing the driver and probably the operating system.

I've pasted the assembly at WNAS+0x64d9 in a screenshot below. I don't know what this driver was trying to do.


Attachments:
WinDbg-Screenshot.PNG
WinDbg-Screenshot.PNG [ 26.4 KiB | Viewed 11105 times ]
PostPosted: Sat Sep 20, 2008 5:08 am 
Offline
1TB storage
1TB storage
User avatar

Joined: Mon Sep 01, 2008 7:35 pm
Posts: 36
Location: Seattleish, WA
Thanks: 0
Thanked: 0 time in 0 post
cakalapati wrote:
Looking at your stack trace:

Code:
b9cb2b24 8091ccdd 0000007f b912d4d9 00000000 nt!KeBugCheck+0x14
b9cb2b7c 8088a400 b9cb2b88 b9cb2c10 b912d4d9 nt!Ki386CheckDivideByZeroTrap+0x41
b9cb2b7c b912d4d9 b9cb2b88 b9cb2c10 b912d4d9 nt!KiTrap00+0x88
WARNING: Stack unwind information not available. Following frames may be wrong.
b9cb2c10 b91276ca 00001001 00008001 b9cb2c30 WNAS+0x64d9

It looks as though your WNAS driver performed a divide-by-zero operation, thus crashing the driver and probably the operating system.

I've pasted the assembly at WNAS+0x64d9 in a screenshot below. I don't know what this driver was trying to do.

Actually, I dunno that it was the WNAS driver... One of the things that I mentioned many pages ago when I looked at the trace is that we really wouldn't be able to tell what happens since we don't have a debug build of Server 2003.

The trace shows "WNAS+0x64d9" because the load point of the WNAS driver was the closest symbol that the debugger had. Pushing 0x64d9 past that is more than 25 KB, and it could be in some other driver's address space, or the kernel, or who knows where...
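As a quick check on that arithmetic, here is a minimal Python sketch that uses only the addresses visible in the stack trace. The inferred base address is an assumption: it presumes the debugger's "WNAS+0x64d9" label is relative to the driver's load address. The trace gives no module size, so this cannot confirm whether the offset is still inside WNAS.

```python
# Addresses copied from the stack trace quoted above.
FAULT_ADDR = 0xB912D4D9   # address the debugger labelled WNAS+0x64d9
RETURN_ADDR = 0xB91276CA  # second column of the WNAS frame (return address)
OFFSET = 0x64D9           # offset the debugger printed


def inferred_base(address: int, offset: int) -> int:
    """Base implied by a 'module+offset' label (assumes a load-relative label)."""
    return address - offset


base = inferred_base(FAULT_ADDR, OFFSET)
print(f"offset        : {OFFSET} bytes ({OFFSET / 1024:.1f} KiB)")
print(f"inferred base : {base:#010x}")
print(f"return address: base+{RETURN_ADDR - base:#x}")
```

Both the faulting address and the return address land within about 25 KiB of the same inferred base, which is consistent with either reading: one module, or a neighbour that simply has no symbols.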

What's happening in that disassembly is that the CPU is looking to divide some value into 1350000. "1350000" is a pretty specific number, and as such, it could give us a clue as to what was going on. The divisor of zero comes as a result of a multiply instruction which is taking its operands from two memory locations. If either one of them contained zero, then (as we all remember) the result would be zero, and that would leave us with the fodder to create the DivByZero error.

And... Given that this looks like it's happening in the CPU and not the FPU, and the divisor is (basically) being pulled from memory, that leads me to think that we're not chasing CPU errors under heat/load, but bus errors (or the memory controller). We know it's not a memory error 'cause this issue shows up running the stock RAM stick.
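To make the failure mode concrete, here is a minimal Python sketch of the multiply-then-divide described above. The function names are hypothetical and the operands are invented for illustration; the real values came from memory locations we cannot see. One hedged observation: 1350000 is the numerator in the fan-tachometer formula used by LM78-style hardware monitor chips (RPM = 1350000 / (count x divisor)), so this may be a fan-speed calculation, which would fit the fan warnings reported earlier in the thread. That is an assumption, not something the dump confirms.

```python
NUMERATOR = 1350000  # constant seen in the disassembly


def divide_unguarded(a: int, b: int) -> int:
    """Multiply two values read from memory, then divide the constant by the product."""
    divisor = a * b              # zero if either operand is zero
    return NUMERATOR // divisor  # raises ZeroDivisionError when divisor == 0


def divide_guarded(a: int, b: int):
    """Same calculation, but report 'no reading' instead of faulting."""
    divisor = a * b
    if divisor == 0:
        return None
    return NUMERATOR // divisor
```

In Python the unguarded version raises an exception; in a kernel driver the equivalent integer divide raises the CPU's divide error trap, which is what the KiTrap00 frame in the trace shows.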

_________________
:: Mark


PostPosted: Sat Sep 20, 2008 4:58 pm 
Offline
2.5TB storage
2.5TB storage

Joined: Tue Sep 09, 2008 10:41 am
Posts: 366
Thanks: 0
Thanked: 4 times in 4 posts
midiwall wrote:
cakalapati wrote:
...
And... Given that this looks like it's happening in the CPU and not the FPU, and the divisor is (basically) being pulled from memory, that leads me to think that we're not chasing CPU errors under heat/load, but bus errors (or the memory controller). We know it's not a memory error 'cause this issue shows up running the stock RAM stick.



I think you are onto something here.

On my machine:

1) I can upgrade the RAM with the G.Skill 2GB stick and all is well.
2) If I use the stock RAM and upgrade the CPU to a 1640, it boots and works fine until I tax the system some; then it just shuts down HARD.
3) The interesting bit is that if I try the 1640 WITH the memory upgrade, the machine won't even finish booting (it won't even get to a point where I can ping it from another machine) before it shuts down HARD.

I repeated all of the above over and over again and always the same.

So clearly adding in the memory stick upgrade adds stress to something that is already taxed by the CPU upgrade. I'm completely clueless about such things, but I'm starting to think it has something to do with the bus. I really don't know exactly what I'm reading, but when I try to look at the AMD data sheets it looks like all these alternative chips have different bus speeds compared to the original?

That's my shot in the dark for the day.

What's interesting is that once I get that hard shutdown from scenario 3, if I go right back at it and try to reboot, the hard shutdown occurs sooner and sooner each time. So somehow heat of something plays into all this. It might not be the cause at all, but it certainly seems to have an effect on the symptoms.


PostPosted: Tue Sep 23, 2008 7:09 pm 
Offline
2.5TB storage
2.5TB storage

Joined: Mon Jul 28, 2008 12:07 pm
Posts: 255
Thanks: 4
Thanked: 4 times in 4 posts
I just wanted to report in on the 3800+SFF. I haven't had a chance to try OCCT because at this point my server fulfills certain roles in my network and I don't want it to shut down when I'm not at home. However, the server has been running just fine for at least the past 2 weeks. I reduced the fan speeds to optimized defaults in the MSS Fan Speed Control program and even moved the server to a less ventilated location, and it has been doing just fine. The temps get pretty darn high on the NB and the ACPI, but it runs just fine. In fact, the temps are even higher than I've seen with the BE2350 at the same fan speeds. I no longer use the airflow mod since I don't really need it. That mod lowered my ACPI and NB temps quite a bit but my HDD temps went up. Since the machine is stable, I'm not going to use it. I want to do a server restore this weekend and restore my permissions, and I think the hardware will be fine.


PostPosted: Tue Sep 23, 2008 9:00 pm 
Offline
Top Contributor
Top Contributor

Joined: Sat May 17, 2008 10:27 pm
Posts: 704
Location: Round Rock, TX
Thanks: 26
Thanked: 22 times in 22 posts
sxr71 wrote:
...I no longer use the airflow mod since I don't really need it. That mod lowered my ACPI and NB temps quite a bit but my HDD temps went up...

Are you referring to the fan added to the NB heat sink?

_________________
HP MSS EX490, Intel Core 2 Quad Q9300, 1TB System drive, 6TB Storage pool managed by StableBit DrivePool


PostPosted: Wed Sep 24, 2008 10:56 pm 
Offline
2.5TB storage
2.5TB storage

Joined: Mon Jul 28, 2008 12:07 pm
Posts: 255
Thanks: 4
Thanked: 4 times in 4 posts
TxDot wrote:
sxr71 wrote:
...I no longer use the airflow mod since I don't really need it. That mod lowered my ACPI and NB temps quite a bit but my HDD temps went up...

Are you referring to the fan added to the NB heat sink?



No, my jury-rigged paper mod, which consists of placing a folded piece of paper just large enough to either fully or partly block the air intake in front of the drives and then turning the fan speed to maximum. This way the air has to flow through the bottom part of the case, through the heatsinks on the MB and possibly around the PSU. After doing this it seemed to help my old configuration of the BE2350 stay more stable during stressful loads. I left it that way and cranked out a pretty big transcoding job that pegged the CPU at 100% for several hours, a task that surely wouldn't have completed without the paper, based on what happened before when I tried the same transcoding job without it. It lowered the ACPI temp the most. I would keep an eye on drive temps to be safe and alter the amount and location of the obstruction based on drive temps.


PostPosted: Thu Sep 25, 2008 10:18 am 
Offline
Top Contributor
Top Contributor

Joined: Sat May 17, 2008 10:27 pm
Posts: 704
Location: Round Rock, TX
Thanks: 26
Thanked: 22 times in 22 posts
sxr71 wrote:
TxDot wrote:
sxr71 wrote:
...I no longer use the airflow mod since I don't really need it. That mod lowered my ACPI and NB temps quite a bit but my HDD temps went up...

Are you referring to the fan added to the NB heat sink?



No, my jury-rigged paper mod, which consists of placing a folded piece of paper just large enough to either fully or partly block the air intake in front of the drives and then turning the fan speed to maximum. This way the air has to flow through the bottom part of the case, through the heatsinks on the MB and possibly around the PSU. After doing this it seemed to help my old configuration of the BE2350 stay more stable during stressful loads. I left it that way and cranked out a pretty big transcoding job that pegged the CPU at 100% for several hours, a task that surely wouldn't have completed without the paper, based on what happened before when I tried the same transcoding job without it. It lowered the ACPI temp the most. I would keep an eye on drive temps to be safe and alter the amount and location of the obstruction based on drive temps.

Any chance you have pictures of it in place?

_________________
HP MSS EX490, Intel Core 2 Quad Q9300, 1TB System drive, 6TB Storage pool managed by StableBit DrivePool


PostPosted: Thu Sep 25, 2008 11:58 am 
Offline
2.5TB storage
2.5TB storage

Joined: Mon Jul 28, 2008 12:07 pm
Posts: 255
Thanks: 4
Thanked: 4 times in 4 posts
Sure, if I can find the piece of paper I used, or I'll make a new one. I had to make cutouts for the lights. The server actually looks nice with it; it has a sort of clean look with the black mesh of the door in front of the white paper. I'll bet other colors would look nice too.


PostPosted: Sun Sep 28, 2008 3:28 pm 
Offline
2.5TB storage
2.5TB storage

Joined: Thu Jan 31, 2008 12:59 pm
Posts: 298
Location: Quebec
Thanks: 9
Thanked: 34 times in 27 posts
Time to provide a report. :)

I've searched for motherboard BIOSes with SiS761GX north and 966 south bridges + AM2 sockets and didn't find a lot. My search was probably too narrow, but I could only find Shuttles and Foxconns, while ECS and MSI boards were AMI or other.

Finally, with trial and error (and lots of A: drive rebooting :wink: ) I came upon a new combination of ACPI table and AGESACPU components. I think they are more recent than the ones we had from Shuttle and Abit.

For the ACPI table I stuck with the Shuttle SS21T BIOS but used the SS212S10M release dated 2008/07/08. ymboc had used the preceding release, SS21S10J, which was dated 2007/06/27. I'm not sure it is such a big upgrade, but it works on my machine!

For the AGESACPU component I went with the Foxconn 761GXM2MA-KRS2H board. The 694W1P28 BIOS release is dated 2008/03/31. The Abit AN9 BIOS, the m601b18, was dated 2007/11/19. There is a new m601b19 that I have not tested.

The combination of the two components (Shuttle ACPI + Foxconn AGESACPU) is working OK for my BE-2350. The Northbridge temp is at 54C as reported by Speedfan. I will try a short Orthos test later on, to monitor voltages when the system is under full load.

Please don't misunderstand me: I'm not saying this combination is better than the one ymboc had suggested. I'm just saying that other combinations seem possible and maybe could alleviate some of the problems that have been experienced with processors other than the stock one. Don't forget that ACPI deals with power management.

For those who have not yet performed a BIOS mod and would be tempted to try this, read ymboc's guide on page 4 of this thread before trying anything.

_________________
HP-ProLiant N40L 8GB WHS2011
HP-EX470 4GB Athlon-X2-BE-2350-(Stepping G1) Windows 8 (server build)


The following user would like to thank VieuxJules for this post
ymboc
PostPosted: Mon Sep 29, 2008 5:45 am 
Offline
Top Contributor
Top Contributor

Joined: Sat Feb 09, 2008 7:50 pm
Posts: 765
Thanks: 18
Thanked: 119 times in 44 posts
Cool. I didn't realize that Shuttle released an updated BIOS for the SS21T (after I prepared the original instructions). I'll do a compare between the two ACPI tables to see if they changed anything.

Did you try the AGESACPU from the Shuttle BIOS? Comparing the release dates you posted for the Foxconn vs. the Shuttle, it looks like the Shuttle may have the newer AGESACPU component as well.
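The release dates quoted in this thread can be sorted to check that comparison. This is a minimal Python sketch using only the dates posted above. Note that a BIOS release date is only a proxy for the age of the AGESACPU module inside it, which is an assumption of this comparison.

```python
from datetime import date

# Release dates as posted in this thread.
BIOS_RELEASES = {
    "Shuttle SS21T (M release)": date(2008, 7, 8),
    "Foxconn 761GXM2MA-KRS2H 694W1P28": date(2008, 3, 31),
    "Abit AN9 m601b18": date(2007, 11, 19),
    "Shuttle SS21T (J release)": date(2007, 6, 27),
}


def newest_first(releases):
    """Return (name, date) pairs sorted from newest to oldest."""
    return sorted(releases.items(), key=lambda item: item[1], reverse=True)


for name, released in newest_first(BIOS_RELEASES):
    print(f"{released.isoformat()}  {name}")
```

By release date alone, the Shuttle M release is the newest of the four, a little over three months newer than the Foxconn one.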

So... you didn't come right out and say it but are you noticing an improvement with the changes you made?

Followup: I compared the acpitbl from the 'M' & the 'J' shuttle bios and they are the same...
Code:
C:\bios>comp M_ACPITBL.BIN J_ACPITBL.BIN
Comparing M_ACPITBL.BIN and J_ACPITBL.BIN...
Files compare OK
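
For anyone without the comp command handy, the same byte-for-byte check can be sketched in Python. This is an illustrative equivalent, not what was run above; the helper name `same_bytes` is hypothetical.

```python
import filecmp


def same_bytes(path_a: str, path_b: str) -> bool:
    """Return True if the two files have identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With the two extracted tables in the current directory, `same_bytes("M_ACPITBL.BIN", "J_ACPITBL.BIN")` should return True if they match as comp reported.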

