Events:

1997-12-23 at 12:00 [xxx (strindberg)]
PDC: The Strindberg shop will close down from January 15 to 21. During this period, availability of all PDC systems will vary from limited to none due to hardware maintenance and electrical work.
1997-12-10 at 12:00 [xxx (strindberg)]
Strindberg: batch line enabled.
1997-12-08 at 21:00 [xxx (strindberg)]
Strindberg: We will enable batch lines again but reserve tomorrow afternoon, 1997-12-09, and will probably use the Wednesday service window for more fixes and testing. Note: This will probably involve several reboots of the log in node.
1997-12-08 at 18:00 [xxx (strindberg)]
Strindberg: Batch processing stopped because of software maintenance and bug fixes in the parallel environment. Strindberg will be accessible for program development but no batch runs will be possible.
1997-12-08 at 14:20 [xxx (strindberg)]
Strindberg: Hanging NFS file systems. Node allocation temporarily stopped. Plan to be back in service: 16:00.
1997-12-04 at 10:20
On December 18th, the KALLSUP system will be brought down for file system reorganizations in conjunction with the installation of a new disk system from MAXSTRAT. The estimated down time for this operation is two days.
1997-12-02 at 14:30 [xxx (strindberg)]
Strindberg: The switch is restarted and new jobs are now allowed to start.
1997-12-02 at 13:40
Selma: Disk errors. Checking file system integrity. Up again at 1600 MET.
1997-12-02 at 12:30 [xxx (strindberg)]
Strindberg: A switch fault caused all running jobs to crash.
1997-11-27 at 14:20 [xxx (strindberg)]
Strindberg: Problems connecting to Strindberg. Resolved.
1997-11-27 at 14:05 [xxx (strindberg)]
Strindberg: Problems connecting to Strindberg. Investigation in progress.
1997-11-26 at 16:50
Networks: We have had unstable local networks during the past hours.
1997-11-26 at 09:45 [xxx (strindberg)]
Strindberg: Piofs up and running again. Node allocation is resumed.
1997-11-26 at 08:41 [xxx (strindberg)]
Strindberg: Piofs had problems starting at around 02:00 this morning. Because of this, node allocation has stopped. Investigation is in progress.
1997-11-25 at 23:40 [xxx (strindberg)]
Strindberg: Node allocation is resumed.
1997-11-25 at 22:20 [xxx (strindberg)]
Strindberg: Node allocation is paused. Simply put, there have been problems with one server answering `no' to authentication questions though things were OK.
1997-11-24 at 10:37 [xxx (strindberg)]
Strindberg: You are welcome to log in to the upgraded Strindberg!
1997-11-19 at 12:00
Selma: From 1200 to 1600 Selma will be unavailable because of regular hardware maintenance.
1997-11-13 at 12:08 [xxx (strindberg)]
Strindberg: Around 1600 this afternoon we will turn off several major fileservers for disk reconfiguration. This means that many users' files will not be accessible until tomorrow morning.
1997-11-12 at 14:43 [xxx (strindberg)]
Strindberg: Since the upgrade has gone faster than expected we will now start the most dramatic stage of the whole procedure: switch replacement. More news to come.
1997-11-06 at 10:41 [xxx (strindberg)]
Strindberg: At 08.00 1997-11-13 a gradual upgrade of Strindberg will start. We expect to be back in full production 17.00 1997-11-24. The upgrade includes a change of hardware and software.
1997-11-03 at 09:00
Kallsup: The machine will be brought down for hardware maintenance. The system is expected to be back online again at 12.
1997-10-29 at 18:15 [xxx (strindberg)]
Strindberg: Today a large amount of old stalled mail from strindberg/easy was found and released. For those of you who fancy archeology and received some pieces, please take part in the excavation.
1997-10-21 at 04:00
AFS: one fileserver down, node allocation is paused until repair is finished.
1997-10-20 at 09:00 - 12:00
Kallsup: Kallsup will be rebooted to reconfigure file systems. Kallsup will probably be back in service earlier.
1997-10-13 at 08:50 [xxx (strindberg)]
Strindberg: Node allocation file recreated. Node allocation restarted.
1997-10-13 at 03:42 [xxx (strindberg)]
Strindberg: Node allocation file lost. Node allocation stopped.
1997-10-02 at 16:45
Selma: Operating system crash. Dump and reboot in progress.
1997-09-26 at 09:00
Check out our new System Usage page where all running and queued jobs are listed.
1997-09-22 at 09:00 [xxx (strindberg)]
Strindberg: Unstable node-status determination. Allocation paused during fault search.
1997-09-07 at 18:00 [xxx (strindberg)]
Strindberg: A network adapter problem (HIPPI) caused a hang of the login node (strindberg.pdc.kth.se). Recovery in progress...
1997-09-02 at 19:00 [xxx (strindberg)]
Strindberg/AFS/networks: service window tomorrow, Wednesday 1997-09-03 between 13:00 and 15:00.
1997-08-31 at 11:00
Kerberos: master (admin) server down - you cannot change passwd until it's back.
1997-08-22 at 11:00
Selma: /scratch was cleaned of all data older than 1 week. Older data might be retrieved from /test until 1997-08-29.
1997-08-21 at 18:10
Selma: More disk will be made available 97-08-22 sometime between 09:30 and 11:00. This will include a reboot. The NQS system will be stopped during the upgrade.
1997-08-21 at 08:30
Kallsup: One IOP seems to be broken. Recovery and reconfiguration to manage without it in progress.
1997-08-15 at 14:30
Ongoing upgrade of Kerberos authentication programs. Affected programs: rxtelnet, kx and, depending on your version, ftp. Other programs should work just fine. To upgrade your binaries, fetch a new travelkit.tar. See http://www.pdc.kth.se/support/kerberos-tour.html for guidelines concerning your operating system.
1997-08-15 at 11:30
General: UPS exercise complete.
1997-08-15 at 07:28
General: UPS (pdc power supply) service scheduled to start today at 0900 hours. This is considered a low risk operation, but to play it safe we have now drained the whole machine of jobs. Hope to be back again in the early afternoon.
1997-08-12 at 01:45
KALLSUP: back online
1997-08-12 at 01:15
KALLSUP: disk problem. Reboot and recovery in progress. Running jobs without checkpoint files were lost.
1997-08-08 at 12:00 [xxx (strindberg)]
General: UPS (pdc power supply) service scheduled to start 1997-08-15 at 0900 hours. This is considered a low risk operation. To play it safe we will hold Strindberg batch-lines and keep afs-servers off-line during service anyhow.
1997-08-08 at 08:00
One AFS fileserver began to have problems early in the morning (around 02:00).
1997-08-06 at 19:30
The CrayDoc webserver has found a permanent home at http://craydoc.pdc.kth.se:8080.
1997-08-06 at 18:00
The CrayDoc webserver is moving around a little due to network activities. Look at http://www.pdc.kth.se/kallsup to find its present whereabouts.
1997-08-03 at 09:00 [xxx (strindberg)]
Strindberg/Info - changed job priority on three jobs. They will run out of ordinary queue-order.
1997-08-01 at 14:00
Mail back to normal.
1997-08-01 at 12:30
Temporary mail problems may cause email to lists like "pdc-staff@pdc.kth.se" and "sp2-staff@pdc.kth.se" to bounce. Please use "pdc-staff@nada.kth.se" instead until we've solved the problem.
1997-07-29 at 18:20 [xxx (strindberg)]
Strindberg/Piofs, formatted and clean. Please note that the path of your pfs-catalogue is /pfs/home/f/foo as it has been since May 1997.
1997-07-29 at 17:05 [xxx (strindberg)]
Strindberg, log in node (syk-0606) reboot.
1997-07-29 at 16:45 [xxx (strindberg)]
Strindberg/Piofs, replacement installed and included in the config of one of the fileserver nodes. To do: Eunfencing and formatting of the /pfs.
1997-07-29 at 13:30
Strindberg/Piofs, a replacement disk is on its way.
1997-07-29 at 12:00 [xxx (strindberg)]
Strindberg: Problems with the parallel filesystem. Fault search is in progress. Allocation is stopped.
1997-07-29 at 11:25 [xxx (strindberg)]
Strindberg: Restart of the parallel filesystem servers.
1997-07-24 at 14:00
Networks: there was a network dropout between the SP and some file-servers.
1997-07-23 at 08:00 [xxx (strindberg)]
Strindberg: hardware failure on one node. Switch did restart by itself.
1997-07-21 at 09:50
The mail should be working again. The work to move things from dolphin (which has a bad disk) is in progress. This should not affect most users.
1997-07-21 at 08:40
The AFS server dolphin is currently down, this has unfortunately affected the pdc-staff mail alias. Work is in progress.
1997-07-15 at 22:45
The AFS servers at nada.kth.se are back in business.
1997-07-15 at 18:30
In case you still rely on nada.kth.se you will have serious problems right now - the main computer room is being rearranged.
1997-07-07 at 09:35
HSM system back online.
1997-07-07 at 08:00
Upgrade of HSM system software and database conversions in progress. The HSM system will be back again during the day.
1997-07-03 at 17:59
Kallsup up again. Even checkpointed NQS jobs may have been lost.
1997-07-03 at 17:17
Kallsup: We have problems with hung disk controllers. The system will be rebooted.
1997-07-01 at 08:00
About to restart one fileserver.
1997-06-24 at 18:10
Selma back up. Please report any strange behaviour which might be due to new software. ("This worked last week...")
1997-06-24 at 09:00
Selma will be brought down for hardware upgrade. Selma will be up again in the evening.
1997-06-20
Midsummer holiday, the helpdesk will not be open during Friday.
1997-06-16 at 10:30 [xxx (strindberg)]
Reboot of log in node (syk-0606/strindberg).
1997-06-10 at 08:00
Mail: one of the main mail-servers beneath kth.se has had a disk crash. Expect delayed mail.
1997-06-05 at 11:00 [xxx (strindberg)]
Strindberg: JobManager (JM) restarted as a consequence of control-work-station crash.
1997-06-05 at 09:00 [xxx (strindberg)]
Strindberg: Peculiar crash of the control-work-station. Scheduler allocation stopped during fault recovery.
1997-06-02 at 09:20
KALLSUP: On Monday 970616, KALLSUP will be brought down for a minor hardware upgrade (more SCSI adapters). The upgrade is estimated to start at 11:00 AM and require six hours of downtime for upgrade, reconfiguration and testing.
1997-05-26 at 11:00
KALLSUP: The programming environment (compilers and libraries) has been upgraded. See "Current News" below for details.
1997-05-21 at 21:00 [xxx (strindberg)]
Strindberg: Gaussian94 revision update 970522. Gaussian users should read "Current news" below.
1997-05-21 at 09:00
Selma: System will be unavailable due to disk repairs. Back again around noon.
1997-05-17 at 14:30
Scheduler stopped due to local networking maintenance.
1997-05-09 at 20:20
AFS: all users moved into afs-cell pdc.kth.se.
1997-05-09 at 17:00
AFS: move into pdc-cell continues, expect a flaky filesystem for a couple of hours.
1997-05-07 at 10:00 [xxx (strindberg)]
General: UPS (pdc power supply) service scheduled to start 1997-05-13 at 0900 hours. This is considered a low risk operation. To play it safe we will hold Strindberg batch-lines during service anyhow.
1997-05-07 at 03:05 [xxx (strindberg)]
Strindberg + AFS: Pike, a major fileserver, halted. Investigations are going on, node allocation on SP2 is paused until more facts are known.
1997-05-07 at 01:44
Selma: Halted and rebooted due to memory allocation error of the OS.
1997-05-05 at 14:30
AFS: we are about to reboot one fileserver, some home directories will be inaccessible during reboot.
1997-05-03 at 10:00 [xxx (strindberg)]
Strindberg: PIOFS went out of order for a couple of minutes. Please let us know if your job has been hit.
1997-04-29 at 09:00
Selma: Scheduled maintenance, benchmarking and tuning of I/O system. No batch or interactive login during Tuesday. Up again Wednesday.
1997-04-15 at 16:20 [xxx (strindberg)]
Strindberg: New rules for PIOFS usage. PIOFS users should read /pfs/README.PFS for details.
1997-04-14 at 18:00 [xxx (strindberg)]
Strindberg: Hardware maintenance in PIOFS server nodes. New jobs will start at 1000.
1997-04-09 at 18:00 [xxx (strindberg)]
Strindberg: The system will be rebooted to activate installed software updates. Please note the HPS switch still has stability problems.
1997-04-09 at 08:00 [xxx (strindberg)]
Strindberg: Switch fault. Recovery in progress
1997-04-08 at 13:00 [xxx (strindberg)]
Strindberg: Hardware maintenance completed.
1997-04-07 at 11:00
AFS: restart of file servers in the afs-cell nada.kth.se. switch diagnostics. The operation is estimated to be completed before 18.00 the same day.
1997-04-07 at 08:30 [xxx (strindberg)]
Strindberg: Switch fault and restart in progress.
1997-04-04 at 13:00 [xxx (strindberg)]
Strindberg: Switch fault and restart in progress.
1997-04-02 at 17:00
AFS: running salvage on one fileserver in the afs-cell nada.kth.se.
1997-04-02 at 12:00 [xxx (strindberg)]
Strindberg: Replacement of a SSA adapter used by piofs scheduled at 1430. No new jobs started until then.
1997-04-02 at 09:00
Kallsup will be brought down for hardware maintenance (CPU replacement) 1997-04-03. The system will be unavailable from 0900 until 1800. This maintenance operation will also affect HSM users.
1997-04-01 at 10:00
Today, PDC signed a contract for a major upgrade of PDC's CM200, 'Bellman'. The new system, which will contain 64k processors, is expected to be functional May 18th. At the same time, PDC has also acquired two new DataVault mass storage units from another supercomputing center.
1997-03-31 at 00:20
Jobs started that cross tomorrow morning's boundary, 1997-03-31 0900, due to switch fault.
1997-03-30 at 22:30
Restart of switch due to switch fault.
1997-03-27 at 07:00 [xxx (strindberg)]
Strindberg HPS switch fault, restart in progress.
1997-03-26 at 08:15 [xxx (strindberg)]
Strindberg HPS switch fault. Some night jobs may not have completed. The system is now running normally again.
1997-03-19 at 16:10
We will reboot one fileserver and the control-work-station at approx 16:45. Users residing on the fileserver will be affected during reboot.
1997-03-19 at 14:05
Switch fault - repair in progress.
1997-03-17 at 14:00
Running SP jobs were lost due to a major switch fault. The SP is now up and running again.
1997-03-10 at 19:00 [xxx (strindberg)]
Strindberg: Please note that batch is lagging in time because of the reboot.
1997-03-10 at 16:45 [xxx (strindberg)]
Reboot of Strindberg to activate new software.
1997-03-10 at 08:30
Kallsup (the Cray system) will be dumped and restarted due to a broken CPU. Note that this interrupt also affects HSM users.
1997-03-04 at 16:30
Kallsup will be brought down for hardware maintenance Wednesday 1997-03-05, starting 16.30. The system is expected to be back online again the same night.
1997-03-03 at 16:00 [xxx (strindberg)]
Restart of strindberg parallel file system.
1997-03-02 at 18:00 [xxx (strindberg)]
Hung parallel file system (PIOFS) on Strindberg. PIOFS is now up and running again after server software restart.
1997-02-28 at 16:00
Slightly reduced number of batch-nodes to prevent switch instability - nodes with [switch problem] indications powered off during weekend batch.
1997-02-26 at 10:00
No more tests to do - all back.
1997-02-26 at 09:55
A few more tests to do.
1997-02-26 at 08:00 [xxx (strindberg)]
Strindberg switch fault.
1997-02-24 at 11:30
Kallsup dumped and restarted due to hung CPUs.
1997-02-21 at 19:30
Switch fault, about to restart the weekend jobs.
1997-02-20 at 10:20 [xxx (strindberg)]
Strindberg back in shape since 1000.
1997-02-20 at 10:00
Kallsup hardware replacements. Two failing CPU modules will be replaced on Thursday, February 20th. The system will be shut down at 11 am.
1997-02-20 at 08:35 [xxx (strindberg)]
Problems with the switch of strindberg. At present no new jobs are started.
1997-02-19 at 12:00
Restart of one fileserver.
1997-02-19 at 08:00
Selma upgrade started. Details here
1997-02-17 at 22:22
All file servers OK - Node allocation restarted
1997-02-17 at 18:00
Some user file systems might be unavailable - Node allocation stopped
1997-02-17 at 17:37
A file server needs to be restarted.
1997-02-13 at 12:00 [xxx (strindberg)]
Strindberg, switch restarted.
1997-02-12 at 15:50
Selma will be down for OS upgrade 97-02-18 12.00 to 97-02-21 18.00. Read "Current news" below for detailed information.
1997-02-09 at 00:01
Switch, PIOFS and Easy restarted. Up and running again.
1997-02-08 at 22:37
Something fishy with node allocation - allocation paused.
1997-02-07 at 18:35
Kallsup is now running jobs again.
1997-02-07 at 11:40
Kallsup will be dumped, examined and restarted due to CPU problems.
1997-02-04 at 19:00
Restart of one file-server in the afs cell pdc.kth.se.
1997-02-04 at 09:45
Kallsup has problems with a few CPUs. The system has been rebooted with 3 CPUs disabled
1997-02-03 at 15:00 [xxx (strindberg)]
Strindberg: switch fault restart in progress.
1997-02-03 at 13:55
Kallsup is now running again. The system may still be unstable due to I/O related problems. The impact of previous errors has been reduced enough to consider the system stable enough to run jobs. However, the system will be brought down for further maintenance a number of times in the near future.
1997-02-01 at 22:25
Problem with scheduler - restarted. Probably no jobs lost - analysis will follow.
1997-01-31 at 14:45
Kallsup still has stability problems. The system will be unavailable the entire weekend. Cray personnel from UK are currently diagnosing our problems on site.
1997-01-29 at 10:25
Kallsup users should still read "Current news" below for more information about the upgrade process.
1997-01-28 at 13:20
Batch enabled again. More info on power outage in current news below.
1997-01-28 at 12:50
Ordinary power is back. We start the process of powering on what was shut down.
1997-01-28 at 11:50
Power failure, turning on backup power. Estimated to last for at least 25 minutes.
1997-01-20 at 15:00
Several nodes going down, fault search in progress.
1997-01-20 at 14:17 [xxx (strindberg)]
The log in node syk-0606/strindberg seems unstable. Fault search in progress. Batch runs as usual.
1997-01-20 at 11:00 [xxx (strindberg)]
Strindberg: log in node syk-0606/strindberg rebooted.
1997-01-20 at 09:20
Kallsup users should read "Current news" below for more information about the upgrade process.
1997-01-17 at 16:00 [xxx (strindberg)]
Strindberg: Switch fault repair finished.
1997-01-17 at 15:30 [xxx (strindberg)]
Strindberg: Switch fault, repair in progress.
1997-01-14 at 12:40
Problems with Kallsup remain, possibly due to memory problems. System is now available, but not reliably. Don't submit any jobs. HSM is up and running though.
1997-01-13 at 21:10
Login procedure on Kallsup hung. Will probably be down until sometime tomorrow (Tuesday).
1997-01-10 at 20:31
Kallsup up and running. Performing tests. NQS will be started sometime during the weekend. We expect to be back in full production Monday morning.
1997-01-09 at 23:12
Kallsup is down because of a memory module error. Spare parts arrive tomorrow, Friday. Do not expect Kallsup to be working this week. This includes the HSM system.
1997-01-09 at 22:00
Some fileservers severely damaged (wrinkles in their filesystems). Complete repair might take quite some time. Batch lines enabled. Please let us know if you were hit!
1997-01-09 at 19:00
Accidental emergency power shutdown in the server/router room. We expect to be back at 20:00. The person hitting the button is an engineer from one of our vendors. At least for a couple of more hours. Sigh.
1997-01-07 at 15:40
Kallsup upgrade stage two is in action. The system will not be reliably accessible until Friday (97-01-10). This extra down time has been caused by a defective cpu-slot which was not in use previously and thus not discovered.
1997-01-07 at 15:00
One of our file servers got out of sync for about 30 minutes. During that time interval users may have had problems accessing their files in AFS.