From: Rich Alderson
Newsgroups: alt.sys.pdp10
Subject: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: 07 Aug 2001 14:50:32 -0400
Organization: Systems Administration, XKL LLC, Redmond WA 98052
Lines: 50
Sender: alderson+news@panix6.panix.com
Message-ID:
NNTP-Posting-Host: panix6.panix.com
X-Trace: news.panix.com 997210232 21164 166.84.0.231 (7 Aug 2001 18:50:32 GMT)
X-Complaints-To: abuse@panix.com
NNTP-Posting-Date: 7 Aug 2001 18:50:32 GMT
X-Newsreader: Gnus v5.7/Emacs 20.7
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsxfer.eecs.umich.edu!panix!news.panix.com!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6381

The following is a portion of the console log of Mathom.XKL.COM, wherein the
very first ever recorded UP2LNG BUGHLT occurred early today:

01-08-07-02:22:26 ********************
01-08-07-02:22:27 *BUG BUGINF: "NODDMP" at 7-Aug-2001 02:26:18
01-08-07-02:22:27 *DDMP fork blocked
01-08-07-02:22:27 * Data: DDPDUE=2
01-08-07-02:22:27 ********************
01-08-07-02:22:27 $11B>>HLTAD0/ XCT HLTADR BUGHLT/ UPDTCK+4
01-08-07-10:27:14 UPDTCK+4/ XCT UP2LNG todclk[ 377777,,777763
01-08-07-10:34:21 up2lng?
01-08-07-10:34:23 APRSRV G
01-08-07-10:34:24
01-08-07-10:36:03 tadidt[ 145022,,554057
01-08-07-10:38:23
01-08-07-10:38:24 1[ 17,,500000
01-08-07-10:39:10 updtck/ RDTIME T1
01-08-07-10:39:16 UPDTCK+1/ DSUB T1,TIMBAS
01-08-07-10:39:16 UPDTCK+2/ SKIPL T1
01-08-07-10:39:17 UPDTCK+3/ CAML T1,BASDIV
01-08-07-10:39:17 UPDTCK+4/ XCT UP2LNG
01-08-07-10:39:18 UPDTCK+5/ DIV T1,BASDIV
01-08-07-10:39:27
01-08-07-10:39:27 timbas[ 3760,,404262
01-08-07-10:39:34 TIMBAS+1[ 20140,,464000
01-08-07-10:39:41 1[ 17,,500000
01-08-07-10:39:42 2[ 243,,730000
01-08-07-10:39:49 basdiv[ 17,,500000
01-08-07-10:40:18
01-08-07-10:40:29
01-08-07-10:40:29 XKL-1>halt
01-08-07-10:40:32 XKL-1%b

The translation of the re-boot time above (TADIDT) is

!dd
DDT
1[ 0 .priou
2[ 0 145022,,554057
odtim$x
5-Jul-2000 10:04:00<>
^Z
!

--
Rich Alderson Mgr., SysAdmin XKL LLC
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Wed, 08 Aug 01 09:54:35 GMT
Organization: UltraNet Communications, Inc.
Lines: 25
Message-ID: <9krbr5$137$2@bob.news.rcn.net>
References: <3b711bed_2@news.iglou.com>
X-Trace: UmFuZG9tSVZhVdXsv/EU1lEN9/dOWNpmBQyp2MpJuN/trozHvzjPB9HLG3diI0pA
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 8 Aug 2001 12:37:57 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.imp.ch!news.imp.ch!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-255-96
Xref: chonsp.franklin.ch alt.sys.pdp10:6371

In article <3b711bed_2@news.iglou.com>,
 "Douglas H. Quebbeman" wrote:
>
>"Rich Alderson" wrote in message
>news:mddr8unn1qf.fsf@panix6.panix.com...
>> The following is a portion of the console log of Mathom.XKL.COM, wherein the
>> very first ever recorded UP2LNG BUGHLT occurred early today:
>
>[..snip..]
>
>> odtim$x
>> 5-Jul-2000 10:04:00<>
>
>Can we assume this TD-1 was left UP TOO LONG on purpose?
>
>And congrats...
;-) -dq

That says a lot about the hardware.

/BAH

Subtract a hundred and four for e-mail.
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Wed, 08 Aug 01 11:58:12 GMT
Organization: UltraNet Communications, Inc.
Lines: 29
Message-ID: <9krj3o$pu3$2@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net>
X-Trace: UmFuZG9tSVZFyJez3zfR3IVsVPwXOW2GT4a2l70W97AY2XBmM6Xf4Lu6aHSxHWqw
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 8 Aug 2001 14:42:00 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!216-164-248-147
Xref: chonsp.franklin.ch alt.sys.pdp10:6375

In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
 Arthur Krewat wrote:
>Rich Alderson wrote:
>>
>> The following is a portion of the console log
>>of Mathom.XKL.COM, wherein the
>> very first ever recorded UP2LNG BUGHLT occurred early today:
>
>Is this something that happens to TOPS-10 also,
>or is it immune to this?

I have no idea. There should be a symbol UPTIME or something
in the monitor whose location should contain the count of
ticks since last reload. I have no idea what the code does
with an overflow.

>This might be a problem :)

A nice kind of problem.

What do you do? Define another word that contains a count of
overflow conditions? Rats. I bet I have a design bug in
my USAGE files. Keeping a system up for a year and a month
was under TW's category of fairies, virgins, and inside straights.

/BAH

Subtract a hundred and four for e-mail.
######
Reply-To: "Douglas H. Quebbeman"
From: "Douglas H. Quebbeman"
Newsgroups: alt.sys.pdp10
References:
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Wed, 8 Aug 2001 07:01:00 -0400
Lines: 23
Organization: The Estopinal Group
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.3018.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300
NNTP-Posting-Host: 204.250.0.238
X-Original-NNTP-Posting-Host: 204.250.0.238
Message-ID: <3b711bed_2@news.iglou.com>
X-Trace: news.iglou.com 997268461 204.250.0.238 (8 Aug 2001 07:01:01 -0400)
X-Authenticated-User: dougq
X-Original-NNTP-Posting-Host: 204.250.0.238
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!nntp.abs.net!uunet!dca.uu.net!ash.uu.net!news.iglou.com
Xref: chonsp.franklin.ch alt.sys.pdp10:6365

"Rich Alderson" wrote in message
news:mddr8unn1qf.fsf@panix6.panix.com...
> The following is a portion of the console log of Mathom.XKL.COM, wherein the
> very first ever recorded UP2LNG BUGHLT occurred early today:

[..snip..]

> odtim$x
> 5-Jul-2000 10:04:00<>

Can we assume this TD-1 was left UP TOO LONG on purpose?

And congrats... ;-) -dq

--
Surgically Excise the Pig-Latin from my e-mail address in order to reply...
No Tourbots
######
From: Arthur Krewat
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Organization: Kilonet.net
Lines: 9
Message-ID: <3B7140A7.1E914AEA@bartek.dontspamme.net>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.7 [en] (X11; I; SunOS 5.8 i86pc)
X-Accept-Language: en
Date: Wed, 08 Aug 2001 13:42:31 GMT
NNTP-Posting-Host: 24.186.100.134
X-Trace: news02.optonline.net 997278151 24.186.100.134 (Wed, 08 Aug 2001 09:42:31 EDT)
NNTP-Posting-Date: Wed, 08 Aug 2001 09:42:31 EDT
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!howland.erols.net!cyclone2.usenetserver.com!usenetserver.com!news01.optonline.net!news02.optonline.net.POSTED!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6382

Rich Alderson wrote:
>
> The following is a portion of the console log of Mathom.XKL.COM, wherein the
> very first ever recorded UP2LNG BUGHLT occurred early today:

Is this something that happens to TOPS-10 also, or is it immune to this?

This might be a problem :)

aak
######
Path: chonsp.franklin.ch!not-for-mail
From: Neil Franklin
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: 08 Aug 2001 23:26:38 +0200
Organization: My own Private Self
Lines: 99
Message-ID: <6ud766dz01.fsf@chonsp.franklin.ch>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
NNTP-Posting-Host: chonsp.franklin.ch
X-Trace: chonsp.franklin.ch 997305998 1575 10.0.3.2 (8 Aug 2001 21:26:38 GMT)
X-Complaints-To: news@chonsp.franklin.ch
NNTP-Posting-Date: 8 Aug 2001 21:26:38 GMT
X-Newsreader: Gnus v5.7/Emacs 20.4
Xref: chonsp.franklin.ch alt.sys.pdp10:6397

jmfbahciv@aol.com writes:

> In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
> Arthur Krewat wrote:
> >Rich Alderson wrote:
> >>
> >> The following is a portion of the console log
> >>of Mathom.XKL.COM, wherein the
> >> very first ever recorded UP2LNG BUGHLT occurred early today:

A built in max time suicide? [user shaken]

Linux in its old days (1.0.x kernel) had a 2^32*0.01s = 497 days
suicide in it. Once it was experienced by someone there was a clamour
on the kernel developers mailing list and this bug was fixed 3 days
later. I have since heard of 5 year uptimes.

Windoze had a 49 day suicide in it, that took over 10 years to find :-).

> >This might be a problem :)
>
> A nice kind of problem.
>
> Keeping a system up for a year and a month
> was under TW's category of fairies, virgins, and inside straights.

User shaken a second time, from such an assumption, what experiences
must have led to that.
A short look at my servers:

root.galapagos:/root 1>uname -a
Linux galapagos 2.2.18pre21 #1 Sat Nov 18 18:47:15 EST 2000 i686 unknown
root.galapagos:/root 2>uptime
11:12pm up 93 days, 11:11, 1 user, load average: 0.00, 0.00, 0.00
is so low because machine was moved to new room then

root.gray:/root 1> uname -a
SunOS gray 5.6 Generic_105181-16 sun4d sparc SUNW,SPARCserver-1000
root.gray:/root 2> uptime
11:04pm up 349 day(s), 7:12, 1 user, load average: 0.05, 0.02, 0.02
was reboot because wedged database, IIRC

root.island:/ 1>uname -a
IRIX64 island 6.5 01200532 IP27
root.island:/ 2>uptime
11:03pm up 14 days, 12:50, 9 users, load average: 0.98, 0.96, 0.83
was victim of a full swap partition, triggered kernel bug,
took 2xx days uptime down with it

root.keaton:/root 1> uname -a
SunOS keaton 5.6 Generic_105181-16 sun4u sparc SUNW,Ultra-2
root.keaton:/root 2> uptime
11:04pm up 391 day(s), 12:34, 1 user, load average: 0.02, 0.01, 0.02
last reboot was a power failure, damn building crew

root.lasseter:/ 1>uname -a
IRIX64 lasseter 6.5 07201607 IP27
root.lasseter:/ 2>uptime
11:05pm up 58 days, 11:29, 1 user, load average: 0.00, 0.00, 0.00
was shutdown for replacing broken hardware

root.murnau:/root 1> uname -a
SunOS murnau 5.6 Generic_105181-16 sun4u sparc SUNW,Ultra-2
root.murnau:/root 2> uptime
11:04pm up 391 day(s), 12:34, 1 user, load average: 0.02, 0.01, 0.01
was victim of same power failure as keaton

root.xar:/ 1> uname -a
IRIX64 xar 6.5 10181058 IP27
root.xar:/ 2> uptime
11:04pm up 238 days, 9:02, 1 user, load average: 0.36, 0.30, 0.21
can not remember last reboot reason, as was while I was away

So that is 2 of 7 over 1 year and 1 month, and a further one getting 1
year next month. Are modern computers really that much better in this
respect?
--
Neil Franklin, neil@franklin.ch.remove http://neil.franklin.ch/
Hacker, Unix Guru, El Eng HTL/BSc, Sysadmin, Archer, Roleplayer
- Intellectual Property is Intellectual Robbery
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 12:17:20 GMT
Organization: UltraNet Communications, Inc.
Lines: 42
Message-ID: <9ku8j2$eu9$5@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <0935ntc1gfeul6hn8uu6ar81guc57id2qo@4ax.com>
X-Trace: UmFuZG9tSVaHegXljJBj80u7OoGEdNo7jyBDbglYvMaYjQItJxRxKE4s37g/ed7i
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 15:00:50 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!pinatubo.switch.ch!newsfeeds.belnet.be!news.belnet.be!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!newsfeed.online.be!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-255-192
Xref: chonsp.franklin.ch alt.sys.pdp10:6405

In article <0935ntc1gfeul6hn8uu6ar81guc57id2qo@4ax.com>,
 R.J.S. wrote:
>On Wed, 08 Aug 01 11:58:12 GMT, jmfbahciv@aol.com wrote:
>
>>In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
>> Arthur Krewat wrote:
>>>Is this something that happens to TOPS-10 also,
>>>or is it immune to this?
>>
>>I have no idea. There should be a symbol UPTIME or something
>>in the monitor whose location should contain the count of
>>ticks since last reload. I have no idea what the code does
>>with an overflow.
>
>Back in the days of 6.01, 6.02, 6.03, HOSS got an SPR from a KA owner
>complaining about the system being up for something like 48 days, and
>then crashing because one of the time of day fields overflowed.

I remember that.

>
>22 bits worth of seconds is about 48 days, BTW.
>
>I remember folks expressing amazement that they were able to keep
>their KA running for more than a month.

It was probably more like jealousy.

>
>We had a hard time keeping systems up for over 1 shift (8 hours), let
>alone a day or a week. In any case, we did PM's every week.

We didn't have much trouble with crashes when we ran the SOUP edit.
We had a hell of a lot of trouble keeping TW from biting. This was
in the days when we'd been reduced down to one PDP-10 system that
had heavy usage.

/BAH

Subtract a hundred and four for e-mail.
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 07:22:32 GMT
Organization: UltraNet Communications, Inc.
Lines: 16
Message-ID: <9ktnac$c11$1@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
X-Trace: UmFuZG9tSVZflQL2LT52GlTv6txFvqA66QKGb9CweUP1kuP8T6mvSICBLGAL0Dim
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 10:06:04 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!1963841!news.imp.ch!fr.clara.net!heighliner.fr.clara.net!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6406

In article , dittman@dittman.net wrote:
>jmfbahciv@aol.com wrote:
>: A nice kind of problem.
>: What do you do? Define another word that contains a count of
>: overflow conditions? Rats. I bet I have a design bug in
>: my USAGE files. Keeping a system up for a year and a month
>: was under TW's category of fairies, virgins, and inside straights.
>
>An uptime of only a year and a month isn't very good.

It is if the system needs to come down regularly for PMs.

/BAH

Subtract a hundred and four for e-mail.
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 07:29:01 GMT
Organization: UltraNet Communications, Inc.
Lines: 52
Message-ID: <9ktnmh$c11$4@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch>
X-Trace: UmFuZG9tSVbd8LDJZcKRoGoDJEjzAQXbEbD57UD4rqOdg7tpkLi0LAxj39YtGAqh
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 10:12:33 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6407

In article <6ud766dz01.fsf@chonsp.franklin.ch>,
 Neil Franklin wrote:
>jmfbahciv@aol.com writes:
>
>> In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
>> Arthur Krewat wrote:
>> >Rich Alderson wrote:
>> >>
>> >> The following is a portion of the console log
>> >>of Mathom.XKL.COM, wherein the
>> >> very first ever recorded UP2LNG BUGHLT occurred early today:
>
>A built in max time suicide? [user shaken]
>
>Linux in its old days (1.0.x kernel) had a 2^32*0.01s = 497 days
>suicide in it. Once it was experienced by someone there was a clamour
>on the kernel developers mailing list and this bug was fixed 3 days
>later. I have since heard of 5 year uptimes.

So what did they do? The -10s kept track of uptimes; that data
was reported in a number of places. Just zeroing the word
is not a solution. Hmm...I'm pretty sure that the USAGE file
has to change now.

>
>Windoze had a 49 day suicide in it, that took over 10 years to find :-).
>
>
>> >This might be a problem :)
>>
>> A nice kind of problem.
>>
>> Keeping a system up for a year and a month
>> was under TW's category of fairies, virgins, and inside straights.
>
>User shaken a second time, from such an assumption, what experiences
>must have led to that.

Learn the history and you'll understand the assumption.
>So that is 2 of 7 over 1 year and 1 month, and a further one getting 1
>year next month. Are modern computers really that much better in this
>respect?

The hardware sure is.

/BAH

Subtract a hundred and four for e-mail.
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 07:30:34 GMT
Organization: UltraNet Communications, Inc.
Lines: 38
Message-ID: <9ktnpe$c11$5@bob.news.rcn.net>
References: <3b711bed_2@news.iglou.com> <9krbr5$137$2@bob.news.rcn.net> <3b71bc8c$1_1@news.iglou.com>
X-Trace: UmFuZG9tSVbvOhhRI5vAbTm2xBiACa3le3PcjT0Gqz3ERYHJSJqj0phuEpEpGv09
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 10:14:06 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6408

In article <3b71bc8c$1_1@news.iglou.com>,
 "Douglas H. Quebbeman" wrote:
> wrote in message news:9krbr5$137$2@bob.news.rcn.net...
>> In article <3b711bed_2@news.iglou.com>,
>> "Douglas H. Quebbeman"
>> wrote:
>> >
>> >"Rich Alderson" wrote in message
>> >news:mddr8unn1qf.fsf@panix6.panix.com...
>> >> The following is a portion of the console log of Mathom.XKL.COM, wherein the
>> >> very first ever recorded UP2LNG BUGHLT occurred early today:
>> >
>> >[..snip..]
>> >
>> >> odtim$x
>> >> 5-Jul-2000 10:04:00<>
>> >
>> >Can we assume this TD-1 was left UP TOO LONG on purpose?
>> >
>> >And congrats... ;-) -dq
>>
>> That says a lot about the hardware.
>
>Yup, now all we need to do is convince XKL to put the TD-1
>back into production, and to sell them for $250.00.
>
>Ok, ok, I'd pay $2500 for one (what else is that equity for?)...
>
>;-) -dq
>

If I had the money, I would have paid the $100K. If the price
had been $10K, I'd be running one now.
/BAH

Subtract a hundred and four for e-mail.
######
Message-ID: <3B7099A0.C2B08955@jetnet.ab.ca>
From: Ben Franchuk
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.19 i586)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 13
Date: Tue, 07 Aug 2001 19:45:04 -0600
NNTP-Posting-Host: 207.153.6.54
X-Trace: newsfeed.slurp.net 997313534 207.153.6.54 (Wed, 08 Aug 2001 18:32:14 CDT)
NNTP-Posting-Date: Wed, 08 Aug 2001 18:32:14 CDT
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.slurp.net!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6398

Neil Franklin wrote:
> So that is 2 of 7 over 1 year and 1 month, and a further one getting 1
> year next month. Are modern computers really that much better in this
> respect?

Considering the half-life of a computer is about two years,
one year running is not bad.
Ben
--
Standard Disclaimer : 97% speculation 2% bad grammar 1% facts.
"Pre-historic Cpu's" http://www.jetnet.ab.ca/users/bfranchuk
Now with schematics.
######
Sender: Eric Dittman
From: dittman@dittman.net
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Newsgroups: alt.sys.pdp10
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686))
Lines: 11
Message-ID:
X-Complaints-To: abuse@usenetserver.com
X-Abuse-Info: Please be sure to forward a copy of ALL headers
X-Abuse-Info: Otherwise we will be unable to process your complaint properly.
NNTP-Posting-Date: Wed, 08 Aug 2001 14:57:45 EDT
Organization: WebUseNet Corp.
- "ReInventing The UseNet"
Date: Wed, 08 Aug 2001 18:57:45 GMT
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!isdnet!news-xfer.siscom.net!easynews!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl1.usenetserver.com.POSTED!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6427

jmfbahciv@aol.com wrote:
: A nice kind of problem.
: What do you do? Define another word that contains a count of
: overflow conditions? Rats. I bet I have a design bug in
: my USAGE files. Keeping a system up for a year and a month
: was under TW's category of fairies, virgins, and inside straights.

An uptime of only a year and a month isn't very good.
--
Eric Dittman
dittman@dittman.net
######
Reply-To: "Douglas H. Quebbeman"
From: "Douglas H. Quebbeman"
Newsgroups: alt.sys.pdp10
References: <3b711bed_2@news.iglou.com> <9krbr5$137$2@bob.news.rcn.net>
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Wed, 8 Aug 2001 18:26:19 -0400
Lines: 38
Organization: Full Circle Systems
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.3018.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300
NNTP-Posting-Host: 204.250.0.238
X-Original-NNTP-Posting-Host: 204.250.0.238
Message-ID: <3b71bc8c$1_1@news.iglou.com>
X-Trace: news.iglou.com 997309580 204.250.0.238 (8 Aug 2001 18:26:20 -0400)
X-Authenticated-User: dougq
X-Original-NNTP-Posting-Host: 204.250.0.238
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!isdnet!sunqbc.risq.qc.ca!newsfeed.mathworks.com!wn3feed!worldnet.att.net!198.6.0.7!uunet!ash.uu.net!news.iglou.com
Xref: chonsp.franklin.ch alt.sys.pdp10:6401

wrote in message news:9krbr5$137$2@bob.news.rcn.net...
> In article <3b711bed_2@news.iglou.com>,
> "Douglas H. Quebbeman"
> wrote:
> >
> >"Rich Alderson" wrote in message
> >news:mddr8unn1qf.fsf@panix6.panix.com...
> >> The following is a portion of the console log of Mathom.XKL.COM, wherein the
> >> very first ever recorded UP2LNG BUGHLT occurred early today:
> >
> >[..snip..]
> >
> >> odtim$x
> >> 5-Jul-2000 10:04:00<>
> >
> >Can we assume this TD-1 was left UP TOO LONG on purpose?
> >
> >And congrats... ;-) -dq
>
> That says a lot about the hardware.

Yup, now all we need to do is convince XKL to put the TD-1
back into production, and to sell them for $250.00.

Ok, ok, I'd pay $2500 for one (what else is that equity for?)...

;-) -dq

--
Surgically Excise the Pig-Latin from my e-mail address in order to reply...
No Tourbots
######
From: never+mail@panics.com.invalid (Michael Roach)
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: 9 Aug 2001 02:14:09 GMT
Organization: A small notepad underneath my in box
Lines: 14
Message-ID: <9ksrlh$sd8$2@news.panix.com>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch>
NNTP-Posting-Host: panix3.panix.com
X-Trace: news.panix.com 997323249 29096 166.84.1.3 (9 Aug 2001 02:14:09 GMT)
X-Complaints-To: abuse@panix.com
NNTP-Posting-Date: 9 Aug 2001 02:14:09 GMT
X-Newsreader: trn 4.0-test74 (May 26, 2000)
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!howland.erols.net!panix!news.panix.com!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6403

In article <6ud766dz01.fsf@chonsp.franklin.ch>,
Neil Franklin wrote:
>Linux in its old days (1.0.x kernel) had a 2^32*0.01s = 497 days
>suicide in it. Once it was experienced by someone there was a clamour
>on the kernel developers mailing list and this bug was fixed 3 days
>later. I have since heard of 5 year uptimes.
>
>Windoze had a 49 day suicide in it, that took over 10 years to find :-).

Well of course, it took that long to find a Windows machine that wouldn't
crash first!
--
BLISS is ignorance
######
From: R.J.S.
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 2001 09:23:19 -0400
Organization: Very Organized
Message-ID: <0935ntc1gfeul6hn8uu6ar81guc57id2qo@4ax.com>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
X-Newsreader: Forte Agent 1.8/32.548
X-No-Archive: yes
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Original-NNTP-Posting-Host: 192.168.1.2
X-Original-Trace: 9 Aug 2001 09:23:20 -0400, 192.168.1.2
X-Complaints-To: newsabuse@supernews.com
Lines: 23
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!newsfeed.online.be!news-hog.berkeley.edu!ucberkeley!tethys.csu.net!canoe.uoregon.edu!logbridge.uoregon.edu!newsfeed.dacom.co.kr!news-xfer.nuri.net!sn-xit-03!sn-post-01!supernews.com!corp.supernews.com!barn.net.com
Xref: chonsp.franklin.ch alt.sys.pdp10:6416

On Wed, 08 Aug 01 11:58:12 GMT, jmfbahciv@aol.com wrote:

>In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
> Arthur Krewat wrote:
>>Is this something that happens to TOPS-10 also,
>>or is it immune to this?
>
>I have no idea. There should be a symbol UPTIME or something
>in the monitor whose location should contain the count of
>ticks since last reload. I have no idea what the code does
>with an overflow.

Back in the days of 6.01, 6.02, 6.03, HOSS got an SPR from a KA owner
complaining about the system being up for something like 48 days, and
then crashing because one of the time of day fields overflowed.

22 bits worth of seconds is about 48 days, BTW.

I remember folks expressing amazement that they were able to keep
their KA running for more than a month.

We had a hard time keeping systems up for over 1 shift (8 hours), let
alone a day or a week. In any case, we did PM's every week.
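
The wrap intervals quoted in this thread (about 48 days for 22 bits of seconds, 497 days for 2^32 ticks of 0.01 s, 49 days for the Windows case) can be checked with a little arithmetic. What follows is a Python sketch, not anything posted in the thread; days_until_wrap and the CASES table are names made up for illustration. Two rows rest on assumptions the thread does not state: that the 49-day figure comes from a 32-bit millisecond counter, and that TODCLK is a millisecond count held in the 35 positive bits of a 36-bit word. The second is only an inference from the logged value 377777,,777763 and the 5-Jul-2000 10:04:00 boot time.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400


def days_until_wrap(bits: int, tick_seconds: float) -> float:
    """Days before a counter with `bits` usable bits, bumped once per tick, runs out."""
    return (2 ** bits) * tick_seconds / SECONDS_PER_DAY


# (label, usable bits, tick length in seconds); the last two rows are assumptions
CASES = [
    ("22 bits of seconds (KA report)", 22, 1.0),
    ("2^32 ticks of 0.01 s (Linux)", 32, 0.01),
    ("32 bits of ms (assumed, Windows)", 32, 0.001),
    ("35 bits of ms (assumed, TODCLK)", 35, 0.001),
]

for label, bits, tick in CASES:
    print(f"{label:34s}{days_until_wrap(bits, tick):7.1f} days")

# Boot time from the ODTIM translation, plus 2^35 ms under the TODCLK assumption.
boot = datetime(2000, 7, 5, 10, 4, 0)
print("projected halt:", boot + timedelta(milliseconds=2 ** 35))
```

Under these assumptions the rows come to roughly 48.5, 497.1, 49.7 and 397.7 days, and the projected halt falls at 02:26:18 on 7-Aug-2001, which lines up with the time stamped in the NODDMP BUGINF line of the console log.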
######
Message-ID: <3B72BBB2.2C7514E9@srv.net>
From: Kevin Handy
X-Mailer: Mozilla 4.74 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <3b72bdab_2@news.iglou.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 8
X-Complaints-To: abuse@usenetserver.com
X-Abuse-Info: Please be sure to forward a copy of ALL headers
X-Abuse-Info: Otherwise we will be unable to process your complaint properly.
NNTP-Posting-Date: Thu, 09 Aug 2001 13:42:59 EDT
Organization: WebUseNet Corp. http://corp.webusenet.com - ReInventing the UseNet
Date: Thu, 09 Aug 2001 10:34:58 -0600
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!surfnet.nl!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!cyclone.bc.net!cyclone-sjo1.usenetserver.com!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-sjo2.usenetserver.com.POSTED!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6420

"Douglas H. Quebbeman" wrote:
[snip]
> MTBF for the SCOPE 3.2 Operating System (1972) on the CDC 6600 was 9
> hours...

[Insert the obvious Windows jokes here...]
######
Reply-To: "Douglas H. Quebbeman"
From: "Douglas H. Quebbeman"
Newsgroups: alt.sys.pdp10
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 9 Aug 2001 12:43:22 -0400
Lines: 24
Organization: Full Circle Systems
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.3018.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300
NNTP-Posting-Host: 204.250.0.238
X-Original-NNTP-Posting-Host: 204.250.0.238
Message-ID: <3b72bdab_2@news.iglou.com>
X-Trace: news.iglou.com 997375403 204.250.0.238 (9 Aug 2001 12:43:23 -0400)
X-Authenticated-User: dougq
X-Original-NNTP-Posting-Host: 204.250.0.238
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!newshub2.home.com!news.home.com!howland.erols.net!portc.blue.aol.com.MISMATCH!portc03.blue.aol.com!uunet!dca.uu.net!ash.uu.net!news.iglou.com
Xref: chonsp.franklin.ch alt.sys.pdp10:6400

wrote in message news:JYfc7.22132$5j7.1966434@e420r-atl1.usenetserver.com...
> jmfbahciv@aol.com wrote:
> : A nice kind of problem.
> : What do you do? Define another word that contains a count of
> : overflow conditions? Rats. I bet I have a design bug in
> : my USAGE files. Keeping a system up for a year and a month
> : was under TW's category of fairies, virgins, and inside straights.
>
> An uptime of only a year and a month isn't very good.

MTBF for the SCOPE 3.2 Operating System (1972) on the CDC 6600 was 9 hours...

-dq

--
Surgically Excise the Pig-Latin from my e-mail address in order to reply...
No Tourbots
######
From: peter@taronga.com (Peter da Silva)
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: 9 Aug 2001 17:00:21 GMT
Organization: TSS Inc.
Lines: 14
Message-ID: <9kufj5$1kf9$1@citadel.in.taronga.com>
References: <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch> <9ktnmh$c11$4@bob.news.rcn.net>
NNTP-Posting-Host: citadel.in.taronga.com
X-Trace: citadel.in.taronga.com 997376421 53737 10.0.0.43 (9 Aug 2001 17:00:21 GMT)
X-Complaints-To: usenet@taronga.com
NNTP-Posting-Date: 9 Aug 2001 17:00:21 GMT
X-Newsreader: trn 4.0-test72 (19 April 1999)
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!out.nntp.be!propagator-dallas!news-in-dallas.newsfeeds.com!in.nntp.be!news.kjsl.com!news.usenet2.org!citadel.in.taronga.com!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6414

In article <9ktnmh$c11$4@bob.news.rcn.net>, wrote:
>So what did they do? The -10s kept track of uptimes; that data
>was reported in a number of places. Just zeroing the word
>is not a solution.

Add a new call to the API to return two words, and trap the old call so
that programs that called the API were aborted when they made it, so they
could be found, or at least terminated before rollover.
--
Rev. Peter da Silva, ULC. WWFD?

"Be conservative in what you generate, and liberal in what you accept"
-- Matthew 10:16 (l.trans)
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 07:23:30 GMT
Organization: UltraNet Communications, Inc.
Lines: 29
Message-ID: <9ktnc6$c11$2@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
X-Trace: UmFuZG9tSVbi3CUef6jN+7uQ+3grn56hMVc8XctxI8AzjGcgVLSxprqzSJ21wUR+
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 10:07:02 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6404

In article , dittman@dittman.net wrote:
>Ric Werme wrote:
>: dittman@dittman.net writes:
>
>:>jmfbahciv@aol.com wrote:
>:>: A nice kind of problem.
>:>: What do you do? Define another word that contains a count of
>:>: overflow conditions? Rats. I bet I have a design bug in
>:>: my USAGE files. Keeping a system up for a year and a month
>:>: was under TW's category of fairies, virgins, and inside straights.
>
>:>An uptime of only a year and a month isn't very good.
>
>: It was in 1975. It was also impossible if you had a Field Circus contract
>: and they came to run PM (Provocative Maintenance) every so often. Or
>: if you had RCA's prototype PDP-10 memory. That generally wouldn't survive
>: the morning....
>
>But we're talking about a system that's still running today.

And we're talking about software that was designed on hardware
of the past.

/BAH

Subtract a hundred and four for e-mail.
######
From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 07:25:31 GMT
Organization: UltraNet Communications, Inc.
Lines: 43
Message-ID: <9ktnfv$c11$3@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net>
X-Trace: UmFuZG9tSVbVOhCUwT1lOP+s16nm0vNDmmfBtRFIg6ibl8r49T24+GfcnSKUdCxO
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 10:09:03 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!feed.news.nacamar.de!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6409

In article , dittman@dittman.net wrote:
>Eric Smith wrote:
>: dittman@dittman.net writes:
>:> An uptime of only a year and a month isn't very good.
>
>: jmfbahciv@aol.com wrote:
>:> It was in 1975. It was also impossible if you had a Field Circus contract
>:> and they came to run PM (Provocative Maintenance) every so often. Or
>
>: dittman@dittman.net writes:
>:> But we're talking about a system that's still running today.
>
>: No, we're not. It's true that it's running today, but it's not true
>: that it's "still running today" (since 1975). We're talking about
>: "modern" hardware (TOAD-1) vs. the traditional PDP-10. It's also not
>: running the same operating system as a traditional PDP-10, but I'll
>: ignore that.
>
>: The TOAD-1 is built using a lot fewer components (though the components
>: are more highly integrated), so the hardware is inherently more reliable
>: and requires less maintenance. Thus it is expected that it should have
>: longer uptimes than a KL10 or even KS10.
>
>Exactly, which is why I would have expected this to be fixed
>by the vendor (of the TOAD-1, not the original DEC systems)
>as a system-specific patch.
>
>If the FPGA-based PDP-10 clone is completed the OS should be
>patched to allow longer uptimes.

It requires a design change. And that's why I asked the question
which you ignored.

> ... I'm hoping the clone does
>get completed as a full implementation.

Sure. And the software design change should be ready before that.

/BAH

Subtract a hundred and four for e-mail.

######

From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Thu, 09 Aug 01 10:46:54 GMT
Organization: UltraNet Communications, Inc.
Lines: 46
Message-ID: <9ku39h$et7$15@bob.news.rcn.net>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <9ktnfv$c11$3@bob.news.rcn.net>
X-Trace: UmFuZG9tSVa8w7wGj4NGj1PVh0zaUNu8PqFKgDNNnrGdKheF80ddUy8cJFSsqcci
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 9 Aug 2001 13:30:25 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-216-249
Xref: chonsp.franklin.ch alt.sys.pdp10:6410

In article , dittman@dittman.net wrote:
>jmfbahciv@aol.com wrote:
>:>If the FPGA-based PDP-10 clone is completed the OS should be
>:>patched to allow longer uptimes.
>
>: It requires a design change. And that's why I asked the question
>: which you ignored.
>
>Which question was that?

It appears to have been chopped out and I'd have to go to my
backups to find it. It's the one about how to design the data
to accommodate hardware uptimes that are bigger than we allowed.

>
>:> ... I'm hoping the clone does
>:>get completed as a full implementation.
>
>: Sure. And the software design change should be ready before that.
>
>Why was the OS designed to die if it was up too long, anyway?

Programs that depended on the data being correct would not
have any way of knowing that they were getting bad data. A
philosophy of both the -10 and -20 was to not lie about
numbers that are used to make decisions.
It sounds like the -20 people spent some time thinking about it and couldn't decide what to do about it...it sure as hell isn't an extensible piece of data without a lot of screwing around. Crashing is one way to reset the numbers. Remember, it is not a deep dark sin to crash; it is a deep dark sin to mess up users. >Was this a requirement to insure the customers had their systems >PM'ed on a regular schedule, or to avoid counter roll-over? I'd say avoid handing programs bad data. PMs were guaranteed via the service contracts. After we (TOPS-10) shipped SMP, PMs didn't require a shutdown either. /BAH Subtract a hundred and four for e-mail. ###### Newsgroups: alt.sys.pdp10 Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> From: Ric Werme X-Newsreader: NN version 6.5.0 CURRENT #119 Lines: 21 Message-ID: Date: Wed, 08 Aug 2001 19:48:29 GMT NNTP-Posting-Host: 24.168.185.9 X-Complaints-To: abuse@mediaone.net X-Trace: typhoon.ne.mediaone.net 997300109 24.168.185.9 (Wed, 08 Aug 2001 15:48:29 EDT) NNTP-Posting-Date: Wed, 08 Aug 2001 15:48:29 EDT Organization: Road Runner Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!kanja.arnes.si!news-hub.siol.net!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!rcn!chnws02.mediaone.net!chnws06.ne.mediaone.net!24.128.8.70!typhoon.ne.mediaone.net.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6423 dittman@dittman.net writes: >jmfbahciv@aol.com wrote: >: A nice kind of problem. >: What do you do? Define another word that contains a count of >: overflow conditions? Rats. I bet I have a design bug in >: my USAGE files. Keeping a system up for a year and a month >: was under TW's category of fairies, virgins, and inside straights. >An uptime of only a year and a month isn't very good. It was in 1975. 
It was also impossible if you had a Field Circus contract and they came to run PM (Provocative Maintenance) every so often. Or if you had RCA's prototype PDP-10 memory. That generally wouldn't survive the morning.... I remember Ed Taft keeping PARC-10's TENEX up for a month or so. I accused him of showing off. :-) -- Ric Werme | werme@nospam.mediaone.net http://people.ne.mediaone.net/werme | ^^^^^^^ delete ###### Sender: Eric Dittman From: dittman@dittman.net Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686)) Lines: 21 Message-ID: X-Complaints-To: abuse@usenetserver.com X-Abuse-Info: Please be sure to forward a copy of ALL headers X-Abuse-Info: Otherwise we will be unable to process your complaint properly. NNTP-Posting-Date: Wed, 08 Aug 2001 16:27:18 EDT Organization: WebUseNet Corp. - "ReInventing The UseNet" Date: Wed, 08 Aug 2001 20:27:18 GMT Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!kanja.arnes.si!news-hub.siol.net!feed.cgocable.net!out.nntp.be!propagator-dallas!news-in-dallas.newsfeeds.com!in.nntp.be!easynews!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl3.usenetserver.com.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6417 Ric Werme wrote: : dittman@dittman.net writes: :>jmfbahciv@aol.com wrote: :>: A nice kind of problem. :>: What do you do? Define another word that contains a count of :>: overflow conditions? Rats. I bet I have a design bug in :>: my USAGE files. Keeping a system up for a year and a month :>: was under TW's category of fairies, virgins, and inside straights. :>An uptime of only a year and a month isn't very good. : It was in 1975. It was also impossible if you had a Field Circus contract : and they came to run PM (Provocative Maintenance) every so often. 
Or : if you had RCA's prototype PDP-10 memory. That generally wouldn't survive : the morning.... But we're talking about a system that's still running today. -- Eric Dittman dittman@dittman.net ###### Sender: eric@ruckus.brouhaha.com From: Eric Smith Newsgroups: alt.sys.pdp10 Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> Organization: Eric Conspiracy Secret Labs X-Eric-Conspiracy: There is no conspiracy. Date: 08 Aug 2001 14:24:04 -0700 Message-ID: Lines: 33 User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii NNTP-Posting-Host: ruckus.brouhaha.com X-Trace: 8 Aug 2001 14:31:48 -0700, ruckus.brouhaha.com Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!enews.sgi.com!news.spies.com!ruckus.brouhaha.com Xref: chonsp.franklin.ch alt.sys.pdp10:6415 dittman@dittman.net writes: > An uptime of only a year and a month isn't very good. jmfbahciv@aol.com wrote: > It was in 1975. It was also impossible if you had a Field Circus contract > and they came to run PM (Provocative Maintenance) every so often. Or dittman@dittman.net writes: > But we're talking about a system that's still running today. No, we're not. It's true that it's running today, but it's not true that it's "still running today" (since 1975). We're talking about "modern" hardware (TOAD-1) vs. the traditional PDP-10. It's also not running the same operating system as a traditional PDP-10, but I'll ignore that. The TOAD-1 is built using a lot fewer components (though the components are more highly integrated), so the hardware is inherently more reliable and requires less maintenance. Thus it is expected that it should have longer uptimes than a KL10 or even KS10. On the other hand, the TOAD-1 was a very low-volume product, so the hardware can't be expected to be as reliable as some modern high-volume systems. 
Not all high-volume systems are reliable, but if the vendor attempts to engineer a reliable product, it's more likely. It's also quite possible to build absolute crap in high volumes. But low-volume systems even with very good engineering tend not to reach the same reliability as the best high-volume systems. High-volume manufacturing can provide more feedback as to the nature of defects, and result in ECOs to improve quality. ###### Sender: Eric Dittman From: dittman@dittman.net Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686)) Lines: 32 Message-ID: X-Complaints-To: abuse@usenetserver.com X-Abuse-Info: Please be sure to forward a copy of ALL headers X-Abuse-Info: Otherwise we will be unable to process your complaint properly. NNTP-Posting-Date: Wed, 08 Aug 2001 21:31:54 EDT Organization: WebUseNet Corp. - "ReInventing The UseNet" Date: Thu, 09 Aug 2001 01:31:54 GMT Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!newsfeed-zh.ip-plus.net!news.ip-plus.net!news.tesion.net!news.belwue.de!news.uni-stuttgart.de!uni-erlangen.de!newsfeeds.belnet.be!news.belnet.be!news.tele.dk!small.news.tele.dk!64.152.100.70!cyclone-sjo1.usenetserver.com!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl1.usenetserver.com.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6428 Eric Smith wrote: : dittman@dittman.net writes: :> An uptime of only a year and a month isn't very good. : jmfbahciv@aol.com wrote: :> It was in 1975. It was also impossible if you had a Field Circus contract :> and they came to run PM (Provocative Maintenance) every so often. Or : dittman@dittman.net writes: :> But we're talking about a system that's still running today. : No, we're not. It's true that it's running today, but it's not true : that it's "still running today" (since 1975). 
We're talking about : "modern" hardware (TOAD-1) vs. the traditional PDP-10. It's also not : running the same operating system as a traditional PDP-10, but I'll : ignore that. : The TOAD-1 is built using a lot fewer components (though the components : are more highly integrated), so the hardware is inherently more reliable : and requires less maintenance. Thus it is expected that it should have : longer uptimes than a KL10 or even KS10. Exactly, which is why I would have expected this to be fixed by the vendor (of the TOAD-1, not the original DEC systems) as a system-specific patch. If the FPGA-based PDP-10 clone is completed the OS should be patched to allow longer uptimes. I'm hoping the clone does get completed as a full implementation. -- Eric Dittman dittman@dittman.net ###### Sender: Eric Dittman From: dittman@dittman.net Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <9ktnfv$c11$3@bob.news.rcn.net> User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686)) Lines: 49 Message-ID: X-Complaints-To: abuse@usenetserver.com X-Abuse-Info: Please be sure to forward a copy of ALL headers X-Abuse-Info: Otherwise we will be unable to process your complaint properly. NNTP-Posting-Date: Thu, 09 Aug 2001 07:13:27 EDT Organization: WebUseNet Corp. - "ReInventing The UseNet" Date: Thu, 09 Aug 2001 11:13:27 GMT Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!nntp.flash.net!easynews!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl3.usenetserver.com.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6419 jmfbahciv@aol.com wrote: : In article , : dittman@dittman.net wrote: :>Eric Smith wrote: :>: dittman@dittman.net writes: :>:> An uptime of only a year and a month isn't very good. 
:> :>: jmfbahciv@aol.com wrote: :>:> It was in 1975. It was also impossible if you had a Field Circus : contract :>:> and they came to run PM (Provocative Maintenance) every so often. Or :> :>: dittman@dittman.net writes: :>:> But we're talking about a system that's still running today. :> :>: No, we're not. It's true that it's running today, but it's not true :>: that it's "still running today" (since 1975). We're talking about :>: "modern" hardware (TOAD-1) vs. the traditional PDP-10. It's also not :>: running the same operating system as a traditional PDP-10, but I'll :>: ignore that. :> :>: The TOAD-1 is built using a lot fewer components (though the components :>: are more highly integrated), so the hardware is inherently more reliable :>: and requires less maintenance. Thus it is expected that it should have :>: longer uptimes than a KL10 or even KS10. :> :>Exactly, which is why I would have expected this to be fixed :>by the vendor (of the TOAD-1, not the original DEC systems) :>as a system-specific patch. :> :>If the FPGA-based PDP-10 clone is completed the OS should be :>patched to allow longer uptimes. : It requires a design change. And that why I asked the question : which you ignored. Which question was that? :> ... I'm hoping the clone does :>get completed as a full implementation. : Sure. And the software design change should be ready before that. Why was the OS designed to die if it was up too long, anyway? Was this a requirement to insure the customers had their systems PM'ed on a regular schedule, or to avoid counter roll-over? -- Eric Dittman dittman@dittman.net ###### From: Arthur Krewat Newsgroups: alt.sys.pdp10 Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! 
Organization: Kilonet.net Lines: 25 Message-ID: <3B7290E8.235BFB1C@bartek.dontspamme.net> References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <9ktnfv$c11$3@bob.news.rcn.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailer: Mozilla 4.7 [en] (X11; I; SunOS 5.8 i86pc) X-Accept-Language: en Date: Thu, 09 Aug 2001 13:37:29 GMT NNTP-Posting-Host: 24.186.100.134 X-Trace: news02.optonline.net 997364249 24.186.100.134 (Thu, 09 Aug 2001 09:37:29 EDT) NNTP-Posting-Date: Thu, 09 Aug 2001 09:37:29 EDT Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed.cgocable.net!cyclone2.usenetserver.com!usenetserver.com!news01.optonline.net!news02.optonline.net.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6413 dittman@dittman.net wrote: > > :> ... I'm hoping the clone does > :>get completed as a full implementation. > > : Sure. And the software design change should be ready before that. > > Why was the OS designed to die if it was up too long, anyway? > Was this a requirement to insure the customers had their systems > PM'ed on a regular schedule, or to avoid counter roll-over? I've been looking at TOPS-10 7.03 and it doesn't care, plus it should stay up for at least 18 years before the uptime counter rolls over ... it runs on "ticks", 60 per second. Using a 36-bit signed word, the max would be 18.12 years. But, then, it doesn't look like TOPS-10 would care about it wrapping around, little things like SYSTAT's uptime would be screwy though, but it looks like the system would keep running forever. The UP2LNG bughalt comes from a TOPS-20 system. Not TOPS-10. I have a feeling TOPS-20 stores uptime in milliseconds, assuming a 60/1000 ratio, that means the max uptime in a 36-bit signed word is 1.09 years. Pretty close to when the TD-1 bughalt'ed. aak ###### Reply-To: "Douglas H. Quebbeman" From: "Douglas H. 
Quebbeman" Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! Date: Thu, 9 Aug 2001 12:45:33 -0400 Lines: 26 Organization: Full Circle Systems X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 5.00.3018.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300 NNTP-Posting-Host: 204.250.0.238 X-Original-NNTP-Posting-Host: 204.250.0.238 Message-ID: <3b72be2f_1@news.iglou.com> X-Trace: news.iglou.com 997375535 204.250.0.238 (9 Aug 2001 12:45:35 -0400) X-Authenticated-User: dougq X-Original-NNTP-Posting-Host: 204.250.0.238 Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!fr.clara.net!heighliner.fr.clara.net!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!howland.erols.net!portc.blue.aol.com.MISMATCH!portc03.blue.aol.com!uunet!dca.uu.net!ash.uu.net!news.iglou.com Xref: chonsp.franklin.ch alt.sys.pdp10:6399 "Eric Smith" wrote in message news:qhzo9ai6tn.fsf@ruckus.brouhaha.com... > > The TOAD-1 is built using a lot fewer components (though the components > are more highly integrated), so the hardware is inherently more reliable > and requires less maintenance. Thus it is expected that it should have > longer uptimes than a KL10 or even KS10. > > On the other hand, the TOAD-1 was a very low-volume product, so the > hardware can't be expected to be as reliable as some modern high-volume > systems. Clearly YMMV, but I associate low-volume products with a *much* higher Q than high-volume products... from computers to toasters. Regards, -doug q -- Surgically Excise the Pig-Latin from my e-mail address in order to reply... No Tourbots ###### Sender: Eric Dittman From: dittman@dittman.net Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! 
Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <9ktnfv$c11$3@bob.news.rcn.net> <9ku39h$et7$15@bob.news.rcn.net> User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686)) Lines: 63 Message-ID: X-Complaints-To: abuse@usenetserver.com X-Abuse-Info: Please be sure to forward a copy of ALL headers X-Abuse-Info: Otherwise we will be unable to process your complaint properly. NNTP-Posting-Date: Thu, 09 Aug 2001 13:36:13 EDT Organization: WebUseNet Corp. - "ReInventing The UseNet" Date: Thu, 09 Aug 2001 17:36:13 GMT Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!newsfeed.frii.net!easynews!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl3.usenetserver.com.POSTED!not-for-mail Xref: chonsp.franklin.ch alt.sys.pdp10:6424 jmfbahciv@aol.com wrote: : In article , : dittman@dittman.net wrote: :>jmfbahciv@aol.com wrote: : :>:>If the FPGA-based PDP-10 clone is completed the OS should be :>:>patched to allow longer uptimes. :> :>: It requires a design change. And that why I asked the question :>: which you ignored. :> :>Which question was that? : It appears to have been chopped out and I'd have to go to my : backups to find it. It's the one about how to design the data : to accomodate hardware uptimes that are bigger than we allowed. I didn't see that question. If a hardware counter is the issue I guess the only solution would be to implement a software counter that handled the overflow if possible. Not a very good solution, if even possible. An extension in hardware would be better. The FPGA clone of the PDP-8 (the PDP-8/X) does this, and presents itself as a different model (and requires a couple of drivers to be updated). :>:> ... I'm hoping the clone does :>:>get completed as a full implementation. :> :>: Sure. And the software design change should be ready before that. :> :>Why was the OS designed to die if it was up too long, anyway? 
: Programs that depended on the data being correct would not : have any way of knowing that they were getting bad data. A : philosophy of both the -10 and -20 was to not lie about : numbers that are used to make decisions. It sounds like the : -20 people spent some time thinking about it and couldn't : decide what to do about it...it sure as hell isn't an : extensible piece of data without a lot of screwing around. : Crashing is one way to reset the numbers. Remember, it is : not a deep dark sin to crash; it is a deep dark sin to mess : up users. I agree that crashing is better than presenting corrupted data to users (which is why non-ECC memory so common in PCs is such a bad idea), but even better would be to give a warning that an operating system reboot was going to be automatically performed in xx number of hours (or days) and then reboot at that time. :>Was this a requirement to insure the customers had their systems :>PM'ed on a regular schedule, or to avoid counter roll-over? : I'd say avoid handing programs bad data. PMs were guaranteed via : the service contracts. After we (TOPS-10) shipped SMP, PMs didn't : require a shutdown either. Did the SMP systems allow each CPU to be rebooted to avoid counter rollover? -- Eric Dittman dittman@dittman.net ###### Sender: Eric Dittman From: dittman@dittman.net Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM! Newsgroups: alt.sys.pdp10 References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <9ktnfv$c11$3@bob.news.rcn.net> <3B7290E8.235BFB1C@bartek.dontspamme.net> User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.3-12 (i686)) Lines: 25 Message-ID: <8Uzc7.2514$Nz2.32172@e420r-atl3.usenetserver.com> X-Complaints-To: abuse@usenetserver.com X-Abuse-Info: Please be sure to forward a copy of ALL headers X-Abuse-Info: Otherwise we will be unable to process your complaint properly. NNTP-Posting-Date: Thu, 09 Aug 2001 13:38:12 EDT Organization: WebUseNet Corp. 
- "ReInventing The UseNet"
Date: Thu, 09 Aug 2001 17:38:12 GMT
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!colt.net!news.maxwell.syr.edu!newsfeed.frii.net!cyclone-sjo1.usenetserver.com!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl3.usenetserver.com.POSTED!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6421

Arthur Krewat wrote:
: dittman@dittman.net wrote:
:>
:> :> ... I'm hoping the clone does
:> :>get completed as a full implementation.
:>
:> : Sure. And the software design change should be ready before that.
:>
:> Why was the OS designed to die if it was up too long, anyway?
:> Was this a requirement to insure the customers had their systems
:> PM'ed on a regular schedule, or to avoid counter roll-over?

: I've been looking at TOPS-10 7.03 and it doesn't care, plus it
: should stay up for at least 18 years before the uptime counter
: rolls over ... it runs on "ticks", 60 per second. Using a 36-bit
: signed word, the max would be 18.12 years. But, then, it doesn't
: look like TOPS-10 would care about it wrapping around, little
: things like SYSTAT's uptime would be screwy though, but it looks
: like the system would keep running forever.

As BAH pointed out, presenting screwy numbers to the users is
not a good idea.

--
Eric Dittman
dittman@dittman.net

######

Path: chonsp.franklin.ch!not-for-mail
From: Neil Franklin
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: 09 Aug 2001 21:44:16 +0200
Organization: My own Private Self
Lines: 48
Message-ID: <6un159uigf.fsf@chonsp.franklin.ch>
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch> <9ktnmh$c11$4@bob.news.rcn.net>
NNTP-Posting-Host: chonsp.franklin.ch
X-Trace: chonsp.franklin.ch 997386256 370 10.0.3.2 (9 Aug 2001 19:44:16 GMT)
X-Complaints-To: news@chonsp.franklin.ch
NNTP-Posting-Date: 9 Aug 2001 19:44:16 GMT
X-Newsreader: Gnus v5.7/Emacs 20.4
Xref: chonsp.franklin.ch alt.sys.pdp10:6429

jmfbahciv@aol.com writes:

> In article <6ud766dz01.fsf@chonsp.franklin.ch>,
> Neil Franklin wrote:
> >jmfbahciv@aol.com writes:
> >
> >> In article <3B7140A7.1E914AEA@bartek.dontspamme.net>,
> >> Arthur Krewat wrote:
> >> >Rich Alderson wrote:
> >> >>
> >> >> The following is a portion of the console log
> >> >>of Mathom.XKL.COM, wherein the
> >> >> very first ever recorded UP2LNG BUGHLT occurred early today:
> >
> >A built in max time suicide? [user shaken]
> >
> >Linux in its old days (1.0.x kernel) had an 2^32*0.01s = 497 days
> >suicide in it. Once it was experienced by someone there was a clamour
> >on the kernel developers mailing list and this bug was fixed 3 days
> >later. I have since heard of 5 year uptimes.
>
> So what did they do?

Increased the counter width to 64 bits. And modified every piece of
kernel code that referenced it.

As for user land code, I do not know what they did. Most likely also
recompile, as Linux is all source available.

> The -10s kept track of uptimes; that data
> was reported in a number of places. Just zeroing the word
> is not a solution.

I suppose that would require some periodic test for when it crossed
50% (set a flag) and then when zero reappears and the flag is set
(reset the flag and increment a software-only second word).

Of course if the overflow created a trap, then simply have the trap
handler increment the second word.

Linux runs (ran then) on Intel x86 which has no hardware counter.
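A minimal sketch of the periodic-test scheme described above, written in Python purely for illustration (nothing like this ran on a PDP-10 or in a 1.0.x kernel). The names `days_until_wrap` and `ExtendedCounter` are invented here, and the sketch assumes the raw counter is polled at least once in each half of its range between wraps:

```python
SECONDS_PER_DAY = 86_400


def days_until_wrap(bits: int, ticks_per_second: float) -> float:
    """Days before a counter with `bits` usable bits wraps around."""
    return (1 << bits) / ticks_per_second / SECONDS_PER_DAY


class ExtendedCounter:
    """Software-only second word layered on a narrow, wrapping counter."""

    def __init__(self, bits: int) -> None:
        self.modulus = 1 << bits
        self.half = self.modulus >> 1
        self.high = 0                 # the software-only second word
        self.seen_upper_half = False  # set once the counter crosses 50%

    def poll(self, raw: int) -> int:
        """Take the current raw counter value, return the extended count."""
        if raw >= self.half:
            self.seen_upper_half = True
        elif self.seen_upper_half:
            # Back in the lower half after having been in the upper half:
            # the counter wrapped, so reset the flag and bump the high word.
            self.seen_upper_half = False
            self.high += 1
        return self.high * self.modulus + raw
```

Under these assumptions `days_until_wrap(32, 100)` comes out at roughly 497 days, and `days_until_wrap(35, 1000)` (a signed 36-bit word of milliseconds, so 35 value bits) at roughly 397.7 days, or about 1.09 years, which lines up with the figures quoted in this thread. If polling is ever slower than half the wrap period, a wrap can be missed, which is why a trap handler would be the more robust variant.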
The entire counter was a software affair driven by a 100-times-per-second
timer interrupt.

--
Neil Franklin, neil@franklin.ch.remove http://neil.franklin.ch/
Hacker, Unix Guru, El Eng HTL/BSc, Sysadmin, Archer, Roleplayer
- Intellectual Property is Intellectual Robbery

######

Sender: eric@ruckus.brouhaha.com
From: Eric Smith
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
References: <3B7140A7.1E914AEA@bartek.dontspamme.net> <9krj3o$pu3$2@bob.news.rcn.net> <3b72be2f_1@news.iglou.com>
Organization: Eric Conspiracy Secret Labs
X-Eric-Conspiracy: There is no conspiracy.
Date: 09 Aug 2001 12:55:05 -0700
Message-ID:
Lines: 29
User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
NNTP-Posting-Host: ruckus.brouhaha.com
X-Trace: 9 Aug 2001 13:03:00 -0700, ruckus.brouhaha.com
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!fr.clara.net!heighliner.fr.clara.net!news.tele.dk!small.news.tele.dk!216.218.192.242!news.he.net!news.kjsl.com!news.spies.com!ruckus.brouhaha.com
Xref: chonsp.franklin.ch alt.sys.pdp10:6435

"Douglas H. Quebbeman" writes:
> Clearly YMMV, but I associate low-volume products with a *much*
> higher Q than high-volume products... from computers to toasters.

You excised the part of my posting where I explained why high-volume
items *can* be more reliable, but that often they are not. It depends
on whether the manufacturer *cares* about quality.

When you manufacture in high volume, you can get a lot more feedback
(both internally and from the field) as to what reliability problems
there are, and you can ECO them out. In low-volume manufacturing, it
is much harder to do that. If a single unit has a particular problem,
it is often just fixed in that unit, rather than causing a design change.
However, when high-volume items are commodities, margins are low, and
consumers have low expectations, the quality tends to suffer. The
manufacturer only has incentive to make it barely good enough that they
don't get too many warranty returns. For products like that, it often
becomes impossible to buy a better quality product at any price. This
is known as a "triumph of crapitalism".

For example, there is almost no new manufacture of gold CD-Rs any more.
Kodak dropped out, and there are signs that the few other remaining
vendors will do so as well. I don't know how much more they cost to
manufacture, but gold CD-Rs from reputable vendors sold for 2x to 3x the
cost of silver CD-Rs. Apparently, though, not enough people care about
the longevity of their data, so it is no longer cost-effective for the
vendors to manufacture them.

######

From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Sun, 12 Aug 01 08:20:10 GMT
Organization: UltraNet Communications, Inc.
Lines: 20
Message-ID: <9l5nrc$sh5$1@bob.news.rcn.net>
References: <9krj3o$pu3$2@bob.news.rcn.net> <6ud766dz01.fsf@chonsp.franklin.ch> <9ktnmh$c11$4@bob.news.rcn.net> <9kufj5$1kf9$1@citadel.in.taronga.com>
X-Trace: UmFuZG9tSVZuUCq9G/HKvZtGNgFpuDNZAiy530vIkEY3U7gzgoioG/OTYBUZ5ppj
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 12 Aug 2001 11:04:12 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!newscore.gigabell.net!news0.de.colt.net!colt.net!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!209-122-255-43
Xref: chonsp.franklin.ch alt.sys.pdp10:6443

In article <9kufj5$1kf9$1@citadel.in.taronga.com>,
peter@taronga.com (Peter da Silva) wrote:
>In article <9ktnmh$c11$4@bob.news.rcn.net>, wrote:
>>So what did they do? The -10s kept track of uptimes; that data
>>was reported in a number of places. Just zeroing the word
>>is not a solution.
>
>Add a new call to the API to return two words, and trap the old call so
>that programs that called the API were aborted when they made it, so
>they could be found, or at least terminated before rollover.
>

I think I'd also schedule a crash for the first release. That would
give people enough time to find, change, and test programs that
depended on that data. I think I'd give the sysadmin a choice of
keeping the old stuff with a crash option, too (if there were
data bases too messy to change).

/BAH

Subtract a hundred and four for e-mail.

######

From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Sun, 12 Aug 01 08:22:59 GMT
Organization: UltraNet Communications, Inc.
Lines: 46
Message-ID: <9l5o0k$sh5$2@bob.news.rcn.net>
References: <9ktnfv$c11$3@bob.news.rcn.net> <9ku39h$et7$15@bob.news.rcn.net> <9l34pn$noi$1@bob.news.rcn.net>
X-Trace: UmFuZG9tSVaPVknvRyDWpHWxmy79/fAgcqRZZHU/3SMVfbCrL7HrZTDZYjrKsdvu
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 12 Aug 2001 11:07:00 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!209-122-255-43
Xref: chonsp.franklin.ch alt.sys.pdp10:6441

In article , dittman@dittman.net wrote:
>
>:>I agree that crashing is better than presenting corrupted data
>:>to users (which is why non-ECC memory so common in PCs is such
>:>a bad idea), but even better would be to give a warning that an
>:>operating system reboot was going to be automatically performed
>:>in xx number of hours (or days) and then reboot at that time.
>
>: KSYS. That wouldn't be difficult to do if the system was running
>: GALAXY. I'd vote for days because there are programs out there
>: that may run for 30 days or more. That way the people interested
>: in those runs have time to do their hissy fits and think about
>: solutions. A year+ of uptime limitation is real easy to forget.
>: Especially if the runs are done by grad students who are still
>: wet behind their ears.
>
>You're right, hours or a couple of days wouldn't be enough, along
>with warning them not to expect extended uptimes. That way they
>could write the programs to checkpoint their data for a restart
>later.

Or plan their run accordingly. I think there are calculations
out there that last that long. I'm not sure why it can't be
checkpointed but I'm told there are some that can't.

>
>:>Did the SMP systems allow each CPU to be rebooted to avoid counter
>:>rollover?
>
>: We never thought about counter roll over. Our emphasis tended
>: to be MTBF...the hardware was pretty much guaranteed to crap out.
>: However, note that a counter of CPU uptime is not the same as
>: a counter of system uptime when you're running an SMP timesharing
>: system.
>
>It sounds like system uptime was software, so that shouldn't be
>too difficult to extend.

Sure, it's easy to extend. The problem is old data bases and
software whose sources have managed to go to that bit map
in the sky.

/BAH

Subtract a hundred and four for e-mail.

######

From: jmfbahciv@aol.com
Newsgroups: alt.sys.pdp10
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Date: Sat, 11 Aug 01 08:42:57 GMT
Organization: UltraNet Communications, Inc.
Lines: 50
Message-ID: <9l34pn$noi$1@bob.news.rcn.net>
References: <9ktnfv$c11$3@bob.news.rcn.net> <9ku39h$et7$15@bob.news.rcn.net>
X-Trace: UmFuZG9tSVbWdQ76X29Nfft9WGrimzMHKhhCH0ea90/TqN6xKnRqqMJ4a3przt1F
X-Complaints-To: abuse@rcn.com
NNTP-Posting-Date: 11 Aug 2001 11:26:47 GMT
X-Newsreader: News Xpress Version 1.0 Beta #4
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!newsfeed-zh.ip-plus.net!news.ip-plus.net!news.tesion.net!news.belwue.de!news.uni-stuttgart.de!uni-erlangen.de!newsfeeds.belnet.be!news.belnet.be!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!feed1.news.rcn.net!rcn!207-172-255-45
Xref: chonsp.franklin.ch alt.sys.pdp10:6444

In article , dittman@dittman.net wrote:
>jmfbahciv@aol.com wrote:
>: Programs that depended on the data being correct would not
>: have any way of knowing that they were getting bad data. A
>: philosophy of both the -10 and -20 was to not lie about
>: numbers that are used to make decisions. It sounds like the
>: -20 people spent some time thinking about it and couldn't
>: decide what to do about it...it sure as hell isn't an
>: extensible piece of data without a lot of screwing around.
>: Crashing is one way to reset the numbers. Remember, it is
>: not a deep dark sin to crash; it is a deep dark sin to mess
>: up users.
>
>I agree that crashing is better than presenting corrupted data
>to users (which is why non-ECC memory so common in PCs is such
>a bad idea), but even better would be to give a warning that an
>operating system reboot was going to be automatically performed
>in xx number of hours (or days) and then reboot at that time.

KSYS. That wouldn't be difficult to do if the system was running
GALAXY. I'd vote for days because there are programs out there
that may run for 30 days or more. That way the people interested
in those runs have time to do their hissy fits and think about
solutions. A year+ of uptime limitation is real easy to forget.
Especially if the runs are done by grad students who are still
wet behind their ears.

>
>:>Was this a requirement to insure the customers had their systems
>:>PM'ed on a regular schedule, or to avoid counter roll-over?
>
>: I'd say avoid handing programs bad data. PMs were guaranteed via
>: the service contracts. After we (TOPS-10) shipped SMP, PMs didn't
>: require a shutdown either.
>
>Did the SMP systems allow each CPU to be rebooted to avoid counter
>rollover?

We never thought about counter roll over. Our emphasis tended
to be MTBF...the hardware was pretty much guaranteed to crap out.
However, note that a counter of CPU uptime is not the same as
a counter of system uptime when you're running an SMP timesharing
system.

/BAH

Subtract a hundred and four for e-mail.

######

Sender: Eric Dittman
From: dittman@dittman.net
Subject: Re: Historic Event: UP2LNG BUGHLT on Mathom.XKL.COM!
Newsgroups: alt.sys.pdp10
References: <9ktnfv$c11$3@bob.news.rcn.net> <9ku39h$et7$15@bob.news.rcn.net> <9l34pn$noi$1@bob.news.rcn.net>
User-Agent: tin/1.5.1-20000103 ("Sumerland") (UNIX) (Linux/2.4.6 (i686))
Lines: 35
Message-ID:
X-Complaints-To: abuse@usenetserver.com
X-Abuse-Info: Please be sure to forward a copy of ALL headers
X-Abuse-Info: Otherwise we will be unable to process your complaint properly.
NNTP-Posting-Date: Sat, 11 Aug 2001 11:54:46 EDT
Organization: WebUseNet Corp. - "ReInventing The UseNet"
Date: Sat, 11 Aug 2001 15:54:46 GMT
Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!kanja.arnes.si!news-hub.siol.net!feed.cgocable.net!out.nntp.be!propagator-dallas!news-in-dallas.newsfeeds.com!in.nntp.be!newsfeed.frii.net!easynews!e420r-sjo4.usenetserver.com!usenetserver.com!e420r-atl2.usenetserver.com.POSTED!not-for-mail
Xref: chonsp.franklin.ch alt.sys.pdp10:6473

:>I agree that crashing is better than presenting corrupted data
:>to users (which is why non-ECC memory so common in PCs is such
:>a bad idea), but even better would be to give a warning that an
:>operating system reboot was going to be automatically performed
:>in xx number of hours (or days) and then reboot at that time.

: KSYS. That wouldn't be difficult to do if the system was running
: GALAXY. I'd vote for days because there are programs out there
: that may run for 30 days or more. That way the people interested
: in those runs have time to do their hissy fits and think about
: solutions. A year+ of uptime limitation is real easy to forget.
: Especially if the runs are done by grad students who are still
: wet behind their ears.

You're right, hours or a couple of days wouldn't be enough, along
with warning them not to expect extended uptimes. That way they
could write the programs to checkpoint their data for a restart
later.

:>Did the SMP systems allow each CPU to be rebooted to avoid counter
:>rollover?

: We never thought about counter roll over. Our emphasis tended
: to be MTBF...the hardware was pretty much guaranteed to crap out.
: However, note that a counter of CPU uptime is not the same as
: a counter of system uptime when you're running an SMP timesharing
: system.

It sounds like system uptime was software, so that shouldn't be
too difficult to extend.

--
Eric Dittman
dittman@dittman.net
School Zones: Man's attempt to thwart natural selection.
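
[Editor's sketch] The options raised in the posts above (a new call
that returns two words, trapping the old one-word call so its callers
can be found, a crash option instead of handing out a wrapped number,
and advance warning before rollover) can be illustrated roughly as
follows. This is Python written only for illustration, not TOPS-10 or
TOPS-20 code. All function names are hypothetical, and the millisecond
unit and the 35-bit limit of a signed 36-bit word are assumptions made
for the example.

```python
# Hypothetical sketch: names, units, and limits are assumptions.
WORD_BITS = 35                  # magnitude bits in a signed 36-bit word
WORD_LIMIT = 1 << WORD_BITS     # first value that no longer fits in one word
MS_PER_DAY = 24 * 60 * 60 * 1000


class OldCallTrapped(Exception):
    """Raised when a program uses the retired one-word call."""


def uptime_one_word(uptime_ms, trap_old_call=True):
    """Old-style call: return one word, or trap so the caller can be found."""
    if trap_old_call:
        raise OldCallTrapped("one-word uptime call is retired")
    if uptime_ms >= WORD_LIMIT:
        # Crash option: refuse to hand out a wrapped (wrong) number.
        raise OverflowError("uptime no longer fits in one word")
    return uptime_ms


def uptime_two_words(uptime_ms):
    """New-style call: return (high, low), each word holding 35 bits."""
    return divmod(uptime_ms, WORD_LIMIT)


def days_until_rollover(uptime_ms):
    """Whole days left before the one-word value would overflow."""
    return max(0, (WORD_LIMIT - uptime_ms) // MS_PER_DAY)
```

Under these assumptions a one-word counter lasts about 397 days, which
is in line with the "year+" limit mentioned in the thread, and
days_until_rollover() is the kind of figure a warning several days
ahead of a scheduled reboot could be based on.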