Enterprise 10000
Enterprise 10000 or Starfire
History of the Enterprise 10000 or Starfire
This system came out of Cray's Business Systems Division (BSD)
The E10000 extends the Enterprise line that was announced in April 1996
as a result of the acquisition, in July 1996, of the Cray Research SPARC(TM)/Solaris
business. With the acquisition came a 250-person organization, the CS6400
64-way system based on SuperSPARC(TM) technology, and the almost-completed
design for what is now being announced as the Enterprise 10000. The Cray
organization has been integrated into SMCC as a product group called Business
Systems.
Q. What is the future of the CS6400 system?
A. The CS6400 will be formally withdrawn by calendar Q2 1997. Existing systems
will be supported for five years and can be maintained by SunService(SM).
An upgrade program exists to allow a trade up from the CS6400 to the E10000.
In addition, Sun has sufficient material available to allow expansions to
existing CS6400s for at least one year.
Processors
16 to 64 UltraSPARC-II processors (codename Blackbird): 250MHz with 4MB E$,
336MHz with 4MB E$, or 400MHz with 4MB E$
Blackbird is a 0.35-micron processor.
The processor has 44-bit virtual to 41-bit physical addressing capability
The on-board cache is 16KB each for data and instructions
The FPU has two execution units capable of two floating-point operations
per clock cycle
Upgrade will be to either 300MHz or 333MHz
System Board
Up to 16 system boards
Each system board has a mezzanine card for SBus
There are 2 SBuses with a total of 4 SBus slots
Each SBus runs at 25MHz and is 64 bits wide, which comes to 200MB/sec peak
bandwidth and about 100MB/sec sustained
Each system board provides 400MB/sec of I/O bandwidth
Each Starfire system can provide 400MB/sec * 16, or 6.4GB/sec, of I/O bandwidth
There are NO onboard SCSI or Ethernet ports.
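The bandwidth figures above follow from simple multiplication. The sketch below reproduces that arithmetic using only the numbers quoted in the list; the constant and function names are illustrative, and 1GB is taken as 1000MB to match the 6.4GB/sec figure.

```python
# Figures quoted in the text; the names are illustrative, not Sun terminology.
SBUS_CLOCK_MHZ = 25       # each SBus is clocked at 25MHz
SBUS_WIDTH_BITS = 64      # 64-bit data path
SBUSES_PER_BOARD = 2
MAX_SYSTEM_BOARDS = 16


def sbus_peak_mb_per_sec():
    """Peak rate of one SBus: clock rate (MHz) times bytes per transfer."""
    return SBUS_CLOCK_MHZ * (SBUS_WIDTH_BITS // 8)


def board_peak_mb_per_sec():
    """Peak I/O rate of one system board (both SBuses together)."""
    return SBUSES_PER_BOARD * sbus_peak_mb_per_sec()


def system_peak_gb_per_sec(boards=MAX_SYSTEM_BOARDS):
    """Peak I/O rate of a system with the given number of system boards."""
    if not 1 <= boards <= MAX_SYSTEM_BOARDS:
        raise ValueError("a Starfire holds 1 to 16 system boards")
    # 1GB is taken as 1000MB here, matching the figure in the text.
    return boards * board_peak_mb_per_sec() / 1000


if __name__ == "__main__":
    print(sbus_peak_mb_per_sec(), "MB/sec per SBus")
    print(board_peak_mb_per_sec(), "MB/sec per system board")
    print(system_peak_gb_per_sec(), "GB/sec for 16 boards")
```

These are peak numbers; the text puts sustained SBus throughput at roughly half the peak.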
UPA
The E10000 uses the Gigaplane-XB interconnect between System Boards. This
is a router-based interconnect scheme that, in this version of the E10000,
scales up to 10.5 GBps using an 83.3-MHz internal clock frequency. Latency
is constant at about 600 ns regardless of the load on the interconnect.
The Gigaplane-XB has been designed and tested to run at an internal clock
frequency of 100 MHz, giving a data transfer rate of 12.8 GBps. This is
a future growth path for the E10000. There are two Centerplane support
boards in every Starfire - one for each half of the centerplane. These
boards provide power and clocks to the centerplane.
Domains
The E10000 may be dynamically reconfigured as independent computers using
the Dynamic System Domain feature. The system has been architected for
16 domains, and five are available in the initial version of the E10000.
Each domain has its own copy of Solaris and its own boot disk and hostid.
There is software isolation between domains and some degree of hardware
isolation too. An example of the use of System Domains would be to divide
the E10000 into a Transaction Domain and a Batch Domain. In the daytime,
the Transaction Domain can be expanded at the expense of the Batch Domain,
with the reverse occurring at night. The System Domain feature is borrowed
from the mainframe world (LPARs). It gives the customer the flexibility
to configure the E10000 to meet the needs of the business at any specific
time, and to change it later as business conditions change. It also
allows the E10000 to be divided into a number of independent and secure
systems with isolation between them.
Gigaplane
The Gigaplane-XB is built with two data sections and four address paths.
As mentioned above, there is error correction on the data and the addresses
and this should take care of most transient errors. The Gigaplane-XB is
implemented with active logic on the centerplane. Should one data section
have a hard failure, the E10000 will come back up following an auto reboot
with the remaining section carrying all the traffic. The net data bandwidth
will be halved, but the E10000 will continue to provide service to users.
The address paths degrade from four to three to two to one should there
be a hard failure. Replacement of an E10000 centerplane can be scheduled
for the next maintenance period.
Memory
Each memory bank delivers one complete cache line of 72 bytes per access:
64 bytes of data plus ECC. There are 4 memory banks on a system board and
8 SIMMs per bank. Two types of SIMMs are available, using 16Mb or 64Mb
technology; the SIMM sizes are 32MB or 128MB per SIMM.
Maximum of 4GB of memory per system board.
SIMM types cannot be mixed on a system board but can be mixed within a system
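The 4GB per-board maximum follows from the SIMM counts above. A minimal sketch of that arithmetic, assuming every bank on a board is fully populated with one SIMM type; the function names and the per-board list input are illustrative.

```python
# Figures quoted in the text; names are illustrative.
SIMMS_PER_BANK = 8
BANKS_PER_BOARD = 4
SIMM_SIZES_MB = (32, 128)   # the two SIMM sizes offered
MAX_SYSTEM_BOARDS = 16


def board_memory_mb(simm_mb):
    """Memory on one fully populated board using a single SIMM size."""
    if simm_mb not in SIMM_SIZES_MB:
        raise ValueError("SIMMs are either 32MB or 128MB")
    return simm_mb * SIMMS_PER_BANK * BANKS_PER_BOARD


def system_memory_gb(simm_mb_per_board):
    """Total memory for a system, given the SIMM size used on each board.

    SIMM types may differ between boards but not within a board, so one
    size per board is enough to describe the configuration.
    """
    if len(simm_mb_per_board) > MAX_SYSTEM_BOARDS:
        raise ValueError("a Starfire holds at most 16 system boards")
    return sum(board_memory_mb(s) for s in simm_mb_per_board) / 1024


if __name__ == "__main__":
    print(board_memory_mb(128) / 1024, "GB per board with 128MB SIMMs")
    print(system_memory_gb([128] * 16), "GB with 16 such boards")
```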
Control Board
All "housekeeping functions" are on a Control Board, which can be optionally
redundant.
Each E10000 requires a Control Board and two may be configured for redundancy.
Should the Control Board fail, the spare will be automatically configured
into the system following an auto reboot. The failed board can be on-line
hot swapped later.
It provides the Ethernet twisted-pair 10BASE-T connection to the SSP
It provides the JTAG interface to the system boards
It provides the central clock distribution for Starfire
It monitors all cooling fans
It controls the remote switching of any I/O expansion cabinets
It provides the fail-safe logic that monitors the Starfire's temperature and
removes power from the system and I/O cabinets if the upper threshold is
exceeded.
The control board plugs into the centerplane. The redundant control board
plugs into the centerplane from the opposite side. A reboot is
required to switch between control boards.
Hostview
Hostview is the GUI that is used to configure and administer a Starfire.
Power and Physical information
The E10000 has fault-tolerant power and cooling and redundant AC line feeds.
These components are also on-line replaceable. These are RAS features that
preserve the high availability of the E10000 should any of these components
fail. The SSP logs the failure for future replacement.
The system can have up to 5 AC inputs.
These are 220V 30-amp connectors, with a maximum draw of 24 amps per line
52,000 BTU
70" x 30", 34" deep, with a 5" styling panel
1400 pounds fully loaded
The fans are both below and above the system boards.
The base system comes with 3 AC inputs
This can handle systems of up to 32 processors
The 4th AC input is needed for systems above that.
The 5th AC input is used for the I/O space
Do NOT put the OS drives inside the Starfire: the 5th AC input is not
part of the load sharing and is not redundant.
The AC inputs go into 48V bulk supplies.
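The AC input rules above can be written down as a small lookup. This is only a sketch of the rules as listed here (3 inputs up to 32 processors, a 4th above that, a 5th when the I/O space is used); the function name and the io_space flag are hypothetical, and a real configuration should be checked against Sun's site planning documentation.

```python
MAX_PROCESSORS = 64   # upper limit quoted in the Processors section


def ac_inputs_required(processors, io_space=False):
    """Number of AC inputs implied by the rules listed in the text.

    processors -- number of installed CPUs
    io_space   -- True if the internal I/O space is populated (hypothetical flag)
    """
    if not 1 <= processors <= MAX_PROCESSORS:
        raise ValueError("processor count must be between 1 and 64")
    inputs = 3                 # base system
    if processors > 32:
        inputs += 1            # 4th input for systems above 32 processors
    if io_space:
        inputs += 1            # 5th input feeds the I/O space only
    return inputs


if __name__ == "__main__":
    for cpus, io in [(16, False), (32, False), (48, False), (64, True)]:
        print(cpus, "processors, I/O space:", io, "->", ac_inputs_required(cpus, io))
```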
Power Control Unit
A PCU may be installed in the I/O space on a Starfire. This is used to
control I/O expansion units. Up to 5 PCUs may be installed in a Starfire.
Operating System
Starfire comes with Solaris, SSP and AP software.
Licensing is for unlimited Solaris users
SSP
The E10000 is configured with an integral System Service Processor (SSP),
based on a SPARCstation(TM) 5 workstation with CD-ROM and management software.
The SSP is used for normal Solaris administration, but also controls system
booting, monitors the hardware for problems and is fundamental to the E10000's
advanced RAS capability. It also uses SNMP-based messaging, which provides
the framework that allows remote monitoring packages to include the E10000
in their management. The premise behind the SSP is that it makes sense
to use a separate independent vehicle for monitoring and controlling the
system. In this way, the SSP can diagnose the E10000 with no compromises.
RAS Features
A major capability for the E10000 is its RAS features. Dynamic Reconfiguration
(DR) allows System Boards to be swapped while the system remains on line.
Dynamic System Domains (DSD) allow the E10000 to be dynamically reconfigured
as multiple smaller computers. There is fault tolerant power and cooling.
In fact, an E10000 can be configured so there are no single points of failure
that would cause the system to be down longer than the auto-reboot time.
Feature: Auto-reboot for all software hangs or panics is controlled by the SSP.
Benefit: The system will only be down for the time taken to auto reboot
(configuration-dependent, but not more than 30 minutes). "Lights out"
operation is possible.
Feature: The E10000 has fault-tolerant power and cooling and redundant AC
line feeds. These components are also on-line replaceable.
Benefit: These are RAS features that preserve the high availability of the
E10000 should any of these components fail. The SSP logs the failure for
future replacement.
Feature: The E10000 has been designed with a Dynamic Reconfiguration feature
that allows on-line swapping of System Boards without a required auto reboot.
DR execution is controlled by the SSP.
Benefit: A RAS feature that enables concurrent servicing of the E10000; it
can be used to repair a failure or to upgrade the system while processing
continues.
SUNTRUST
By configuring a Starfire correctly you can improve the availability from
99.5% to 99.95%. At 8,760 hours per year, this is the difference between
about 44 hours and about 4.4 hours of downtime per year (9 hours of downtime
per year would correspond to 99.9%).
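An availability percentage converts to yearly downtime with one multiplication. The sketch below shows the conversion, assuming a 365-day year of 8,760 hours; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_percent):
    """Expected hours of downtime per year at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.5, 99.9, 99.95):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available -> {hours:.1f} hours down per year")
```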
Important uptime configuration issues:
Extra system boards
control board
Power and Cooling
System Service Processor
on the system
system domains
I/O redundancy
your data with RAID 5 or 1
your sysadmins
monitors
to platinum service
TPC-D
Sun's Ultra Enterprise 10000 server set record results in a critical TPC-D
measure: 300-gigabyte database performance. At 300 gigabytes, the Ultra
Enterprise 10000 server in a 64-processor configuration has a TPC-D power
of 1787.9 QppD@300GB and a TPC-D throughput of 1122.3 QthD@300GB, for a
TPC-D price/performance of $3,562/QphD@300GB.
Previously numbers near this level were achieved only in more expensive
clustered configurations.
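The three TPC-D figures are related: the benchmark's composite metric, QphD, is the geometric mean of the power (QppD) and throughput (QthD) results, and price/performance is the total system price divided by QphD. The sketch below works through that relationship with the published numbers. The implied system price it prints is a back-calculation for illustration only, not a figure from Sun's disclosure.

```python
import math

# Published results quoted in the text (300GB scale factor).
QPPD = 1787.9            # TPC-D power
QTHD = 1122.3            # TPC-D throughput
PRICE_PER_QPHD = 3562.0  # dollars per QphD


def composite_qphd(power, throughput):
    """Composite queries-per-hour metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def implied_system_price(price_per_qphd, qphd):
    """Total price implied by a price/performance figure (back-calculated)."""
    return price_per_qphd * qphd


if __name__ == "__main__":
    qphd = composite_qphd(QPPD, QTHD)
    print(f"QphD@300GB is about {qphd:.1f}")
    print(f"Implied system price is about ${implied_system_price(PRICE_PER_QPHD, qphd):,.0f}")
```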
Availability
First Order Date: January 22, 1997
Volume Shipment Date: March 1997