Events:

2014-12-26 at 14:40 [xxx (povel)]
Povel is now back in service with updated software. We hope the software update will not affect user applications, but it could: the interconnect drivers had to be updated because they depend on the kernel update (which was forced by CVE-2014-9322).
2014-12-25 at 08:04 [xxx (Ferlin)]
The login node ferlin.pdc.kth.se is unreachable.
2014-12-24 at 22:32 [xxx (lindgren)]
the security update has been installed, the system restarted, and lindgren is running jobs again.
2014-12-24 at 01:17 [xxx (Ferlin)]
About 190 nodes on Ferlin are now back in service with updated software. We hope the software update will not affect user applications, but it could: the interconnect drivers had to be updated because they depend on the kernel update (which was forced by CVE-2014-9322).
2014-12-24 at 00:47 [xxx (lindgren)]
due to an impending security update, new logins are prevented for the time being.
2014-12-24 at 00:38 [milner]
milner just got a kernel update, and was restarted, as a response to an impending security issue.
2014-12-22 at 18:45 [klemming]
The Klemming file system has some problems again, this time triggered by the hardware problems on Lindgren. The underlying reason is still unknown and workarounds have not helped so far, so unfortunately we cannot yet say when this will be solved.
2014-12-22 at 08:13 [xxx (Zorn)]
The cluster Zorn is available again. The number of nodes is reduced at the moment; the remaining nodes will be online again after lunchtime.
2014-12-21 at 21:49 [xxx (lindgren)]
the system is on-line again, lacking one cabinet. With a cabinet missing, it is running with reduced interconnect capacity, and it has lost routers to the file system (/cfs/klemming) as well as batch front-end nodes. So it is fragile.
2014-12-21 at 14:24 [xxx (lindgren)]
A less hasty look shows that a complete cabinet (rack) is off-line. It might not be possible to fix this today.
2014-12-21 at 14:07 [xxx (lindgren)]
the system is locked up, probably due to file-system / file-system-network problems.
2014-12-18 at 09:34 [xxx (Zorn)]
Zorn will not be available due to security-related maintenance work starting at lunchtime. An end time for this break cannot be specified at the moment.
2014-12-17 at 17:16
Due to a security problem, Ferlin, Povel and Ellen login nodes will be taken down at short notice to be patched. All compute nodes will be patched after they finish their current job and before they return to the queue.
2014-12-15 at 21:41 [klemming]
The move of Klemming to a new fabric is finished. Jobs are allowed to start and run on lindgren again.
2014-12-11 at 14:55 [klemming]
On Monday the 15th, starting at 10 AM, Klemming will be moved to a new InfiniBand fabric in preparation for connecting it to Beskow. During this operation we expect access to Klemming to be a bit shaky. To minimize possible problems for batch jobs, no jobs will run on Lindgren during this operation.
2014-12-10 at 14:30 [milner]
Broken raid controller has been replaced. Firmware/microcode upgraded on raid-controllers. System on-line again. Please report unexpected behaviour.
2014-12-06 at 19:52 [milner]
A failed raid controller on the boot-raid is rendering the system frozen. It is too early to say whether this can be worked around remotely over the weekend or will require hardware replacement.
2014-12-05 at 09:18 [xxx (lindgren)]
overnight the batch-frontend node got exhausted and locked up. It has been restarted. Running jobs have most likely been severely affected.
2014-11-08 at 23:33 [xxx (Zorn)]
The cluster Zorn will not be available on Tuesday, 2014-11-11 from 07:00 to 11:00 CET.
2014-11-06 at 15:43
After some additional delay the transfer node, cfs-aux-4.pdc.kth.se, is now up again.
2014-11-06 at 11:20
Unfortunately the service window for the transfer node, cfs-aux-4.pdc.kth.se, had to be moved to 13:00 today due to ongoing installations. Sorry for the inconvenience.
2014-11-05 at 10:28
The transfer node, cfs-aux-4.pdc.kth.se, will be unavailable starting tomorrow, Nov 6th, from 10AM. The downtime is expected to be less than an hour.
2014-10-27 at 16:32 [xxx (povel)]
The Povel login node is being rebooted due to not being able to access Klemming anymore.
2014-10-27 at 16:30 [klemming]
The Klemming file system is now back in a more normal condition. We have identified what action triggers the problems but not why, so debugging will continue, hopefully without being noticeable. Please let us know if you still notice any problems.
2014-10-23 at 19:05 [milner]
milner is now on-line again. Please report unexpected behaviour. We have not mounted /cfs/klemming/ on the login node for the time being.
2014-10-22 at 10:13 [klemming]
Neither we nor the vendor's support have been able to figure out the root cause of the problems with Klemming yet. We will now do some more invasive debugging, which might cause somewhat longer outages than what we are experiencing now, but that will hopefully fix the problems.
2014-10-21 at 15:15 [klemming]
We are having some problems causing Klemming to be very slow. We are currently trying to figure out why.
2014-10-17 at 17:05 [milner]
As an update to the previous flash news: unfortunately we will be forced to shut down Milner from the morning of Tuesday 2014-10-21 for installation of electricity and cooling for a new system at PDC. In the worst case the work will take until Friday 2014-10-24, but hopefully service can be restored before that.
2014-10-16 at 10:59 [xxx (lindgren)]
the lindgren login will soon be rebooted to free up locked up system resources.
2014-10-16 at 08:17 [milner]
At the end of next week we expect delivery of a new system. During unpacking and physical installation to electricity and cooling, milner will need to be shut down. The delivery is scheduled to arrive Wednesday 2014-10-22. We expect that unpacking and physical install will take one to two days.
2014-10-13 at 13:37 [milner]
milner is restarted after the plumbing work.
2014-10-09 at 20:37 [milner]
Coming Monday, 2014-10-13 at 10:00, milner will be shut down during plumbing work in preparation for a new system. We expect it to be back the same day.
2014-10-03 at 12:22 [xxx (povel)]
povel.pdc.kth.se, the login node of the povel cluster, was restarted today due to a power loss. No running jobs on povel should have been affected.
2014-09-29 at 21:49 [xxx (lindgren)]
Power distribution work is complete and lindgren is running jobs again.
2014-09-29 at 10:00 [xxx (povel)]
Allocations on Povel will be paused due to temporary unavailability of the Klemming file system (which is due to activities related to preparations for new systems). We apologise for the short notice.
2014-09-26 at 10:19 [klemming]
On Monday, 2014-09-29 starting at 08:00, the file system Klemming will be unavailable due to reorganisations in the computer hall. The file system is expected to be back on-line sometime during the afternoon. The transfer node, cfs-aux-4, will also be unavailable during this period.
2014-09-25 at 14:48 [xxx (lindgren)]
Coming Monday, 2014-09-29 starting at 08:00, lindgren will be powered off while remodeling the power distribution for a new system. The aim is to be finished during Monday.
2014-09-24 at 11:09 [xxx (lindgren)]
Re-issued with the correct date: during construction work in the machine room tomorrow, Thursday 2014-09-25/08:00, lindgren will execute jobs on a reduced number of compute nodes. This is to reduce stress on cooling.
2014-09-24 at 10:52 [xxx (lindgren)]
during construction work in the machine room tomorrow, Thursday 2014-09-26/08:00, lindgren will execute jobs on a reduced number of compute nodes. This is to reduce stress on cooling.
2014-09-16 at 14:36 [xxx (Zorn)]
The cluster Zorn will not be available on 2014-09-25 from 08:00 to 19:00 due to a reservation from a PDC training event.
2014-09-16 at 12:09 [milner]
milner has been restarted, and should work as usual.
2014-09-16 at 11:21 [milner]
during machine room preparation for a new system a vital part of milner was accidentally powered off. Milner will need a complete restart.
2014-09-11 at 15:42 [xxx (lindgren)]
the PGI compiler license server has been replaced and compiling should work again.
2014-09-11 at 13:10 [xxx (lindgren)]
the license server for PGI compilers for lindgren is unavailable. Investigation in progress.
2014-08-25 at 09:09 [xxx (Zorn)]
Today, 2014-08-25, 12:00 - 16:00 CEST, the cluster Zorn has been reserved for an additional Summer School seminar on GPU programming. All experienced users are kindly asked for patience and should start their batch jobs after that.
2014-08-19 at 13:09 [xxx (Zorn)]
The cluster Zorn will not be accessible or usable for normal batch operation on Monday, 2014-08-19 from 13:13 to 17:00 CEST, or from 2014-08-21 16:00 until 2014-08-22 18:00, due to a complete reservation for the PDC Summer School during this time.
2014-08-14 at 14:32 [xxx (Zorn)]
The maintenance of Zorn has been finished. The operating system was updated during this maintenance.
2014-08-13 at 13:23 [milner]
Milner is on-line again.
2014-08-13 at 12:16 [milner]
Milner will soon be restarted to activate new system software.
2014-08-13 at 11:56 [xxx (lindgren)]
Maintenance completed, lindgren is running jobs again.
2014-08-12 at 15:04 [xxx (Zorn)]
Cluster Zorn: System maintenance on Thursday, 2014-08-14, 08:00-16:00 CEST. The system will not be accessible during this time.
2014-08-12 at 12:11 [xxx (lindgren)]
Preventive maintenance on lindgren starting tomorrow, Wednesday August 13, at 10:00. The maintenance is expected to last a couple of hours. The system will be off-line during parts of the maintenance.
2014-07-23 at 09:53 [milner]
The queue system on milner should be working normally now. There may still be some issues with the milner login node. If you notice anything unusual, please report it.
2014-07-22 at 18:31 [milner]
The queue system on milner is experiencing problems at the moment which means that no new jobs can be submitted. We are investigating the issue.
2014-06-23 at 10:51
The national server running RT (support@pdc.kth.se) is currently down due to a hardware failure. SNICdocs and the MATLAB license server are also affected. The server will hopefully be restored today. For urgent issues, PDC support can be reached on +46 87 907 800.
2014-06-23 at 09:04 [xxx (Zorn)]
Yesterday the cluster Zorn received an urgent security update of the Linux operating system. Please contact us if you experience any problems.
2014-06-05 at 15:54 [xxx (Zorn)]
The update of the cluster Zorn has been finished. It now runs CentOS 6.5 (previously 6.4). In case of problems, please re-compile software first and feel free to contact us for help.
2014-06-03 at 22:22 [xxx (Zorn)]
System maintenance for a software update of Zorn will happen on Thursday 2014-06-05 from 12:00 to 17:00 CEST.
2014-05-28 at 10:06 [milner]
Maintenance finished. System on-line again.
2014-05-26 at 12:22 [xxx (lindgren)]
Maintenance is finished. The system is on-line again.
2014-05-23 at 16:47 [xxx (Zorn)]
The maintenance of the cluster Zorn is finished.
2014-05-23 at 14:00 [milner]
Electrical maintenance work will be carried out starting Wednesday May 28th at 07:30. The system will be powered off during the work, which is expected to take a couple of hours.
2014-05-22 at 17:31 [xxx (Zorn)]
The maintenance of Zorn will be extended to Friday, 2014-05-23. The reason is a hardware problem with one of the file system servers.
2014-05-22 at 16:12 [milner]
Maintenance is finished.
2014-05-21 at 13:42 [milner]
Preventive maintenance on milner starting tomorrow, Thursday May 22, at 13:00. The maintenance is expected to last a couple of hours. The system will be off-line during parts of the maintenance.
2014-05-20 at 15:50 [klemming]
At risk: On Monday the 26th starting at 9 AM we will do some maintenance operations on the Klemming file system. The file system is expected to stay up with just minor interruptions. While these operations in general shouldn't affect running batch jobs they do mean a higher risk of a file system outage.
2014-05-20 at 14:06 [xxx (lindgren)]
lindgren will go off-line on Monday May 26/09:00:00 for maintenance work. It is expected to take a handful of hours.
2014-05-17 at 12:52 [xxx (Zorn)]
The cluster Zorn will not be available on 2014-05-22 from 07:00 to 16:00 CEST. During this time, mechanical work will be carried out to extend the cluster with new servers providing Xeon Phi processors.
2014-05-12 at 14:10 [milner]
Milner login node (milner-login1.pdc.kth.se) crashed during internal testing. Will be back after reboot.
2014-04-30 at 12:06 [xxx (lindgren)]
the internal lustre filesystem (/cfs/emil) has been repaired and job starts have resumed.
2014-04-30 at 09:22 [xxx (lindgren)]
the internal lustre filesystem (/cfs/emil) is behaving badly. All job starts are stopped. Investigation in progress.
2014-04-17 at 16:00 [xxx (Ferlin)]
the ferlin login node is being rebooted to free locked resources.
2014-04-03 at 12:38 [xxx (Ferlin)]
the Ferlin login node will be rebooted around 13:30 to clear system resources in poor shape.
2014-03-26 at 15:05 [xxx (lindgren)]
the lindgren login node will be rebooted due to a shortage of available memory.
2014-02-27 at 19:28
The AFS issues have now been resolved and there should be no more problems with accessing files under /afs.
2014-02-27 at 16:57
There are problems with the OpenAFS file system at the moment. The investigation has begun.
2014-02-26 at 10:56
The Klemming file system is now fully back again. The underlying reason why the file system couldn't recover by itself this time is still unknown. Please report anything out of the usual.
2014-02-26 at 08:42
There are problems accessing the file system /cfs/klemming at the moment. Investigations are ongoing.
2014-02-25 at 07:54
SSH login to povel.pdc.kth.se is not possible at the moment.
2014-02-14 at 10:31 [xxx (lindgren)]
the configuration change of /cfs/klemming is through, and jobs have resumed execution.
2014-02-12 at 14:10 [xxx (lindgren)]
As a precaution, a blocking reservation is set, starting Friday February 14 at 08:00 in the morning. Configuration work is to be done on /cfs/klemming/ at that time. The block ensures that no jobs are running at that moment.
2014-02-11 at 09:28
The culprits behind the network outages have been identified and restarted. There should be no network dropouts from now on. If you still experience poor/no connectivity, please let us know.
2014-02-11 at 00:14
There are several reports of intermittent outages in connectivity to several systems at PDC. As it is not yet clear who is affected and who is not, please report any unusual behaviour to support@pdc.kth.se.
2014-02-09 at 10:54 [xxx (lindgren)]
The lindgren login is being restarted. Batch jobs run as usual, and you should be able to login again shortly.
2014-01-30 at 06:45 [xxx (Zorn)]
A short break for maintenance of Zorn will take place on Friday, 2014-01-31, from 10:00 to 12:00 CET.
2014-01-09 at 16:03
Forwarding information from CSC/KTH, which affects users with CSC home directories: on Saturday January 18, starting at 10 am, maintenance work will be performed on some CSC servers. Most computers at CSC will be affected during this time. Services like email and www will also be affected.
2014-01-09 at 16:00 [xxx (lindgren)]
the plumbing work is completed and lindgren is running jobs again. Side note: there are plenty of jobs charging retired SNAC allocations. These jobs will not be allowed to start.
2014-01-03 at 14:34 [xxx (lindgren)]
To accommodate new hardware, plumbing work on the cooling system is scheduled to start on Thursday, 2014-01-09/07:00. As lindgren will be affected, we will shut it down. The work will take at least one full day.