Avere Systems, Inc. | : | FXT 3500 (44 Node Cluster) |
SPECsfs2008_nfs.v3 | = | 1564404 Ops/Sec (Overall Response Time = 0.99 msec) |
Tested By | Avere Systems, Inc. |
---|---|
Product Name | FXT 3500 (44 Node Cluster) |
Hardware Available | November 2011 |
Software Available | November 2011 |
Date Tested | October 2011 |
SFS License Number | 9020 |
Licensee Locations | Pittsburgh, PA USA |
The Avere Systems FXT 3500 appliance provides tiered NAS storage that allows performance to scale independently of capacity. The FXT 3500 is built on a 64-bit architecture managed by Avere OS software. FXT clusters scale to as many as 50 nodes, supporting millions of IO operations per second and delivering tens of GB/s of bandwidth. The Avere OS software dynamically organizes data into tiers: active data is placed on the FXT appliances, while inactive data is placed on slower mass storage servers. The FXT 3500 accelerates read, write, and metadata operations and supports both the NFSv3 and CIFS protocols. The tested configuration consisted of (44) FXT 3500 cluster nodes. The system also included (4) OpenSolaris ZFS servers that acted as the mass storage systems. Avere's integrated global namespace functionality was used to present a single namespace to all clients.
Item No | Qty | Type | Vendor | Model/Name | Description |
---|---|---|---|---|---|
1 | 44 | Storage Appliance | Avere Systems, Inc. | FXT 3500 | Avere Systems Tiered NAS Appliance running Avere OS V2.1 software. Includes (15) 600 GB SAS Disks. |
2 | 4 | Server Chassis | Genstor | CSRM-4USM24ASRA | 4U 24-Drive Chassis for OpenSolaris mass storage NFS Server. |
3 | 4 | Server Motherboard | Genstor | MOBO-SX8DTEF | Supermicro X8DTE Motherboard for OpenSolaris mass storage NFS Server. |
4 | 8 | CPU | Genstor | CPUI-X5620E | Intel Westmere Quad Core 2.4 GHz 12MB L3 CPU for OpenSolaris mass storage NFS Server. |
5 | 8 | CPU Heat Sink | Genstor | CPUI-2USMP0038P | SM 2U 3U P0038P Heat Sink for OpenSolaris mass storage NFS Server. |
6 | 48 | Memory | Arrow | MT36JSZF1G72PZ-1GD1 | Micron DRAM Module DDR3 SDRAM 8 GByte 240-pin RDIMM for OpenSolaris mass storage NFS Server. |
7 | 92 | Disk Drive | Hitachi | HDAS-3000H6G64U | 3TB SATA II 64MB 3.5 ULTRASTAR 7200 RPM Disks for OpenSolaris mass storage NFS Server. |
8 | 4 | SSD | Intel | SSDSA2BZ200G301 | Intel SSD 710 Series (200GB, 2.5in SATA 3 Gb/s, 25nm, MLC) for OpenSolaris mass storage NFS Server. |
9 | 12 | RAID Controller | Genstor | RAAD-LSAS9211-8i | LSI SAS 9211-8i 8-port 6 Gbps SAS HBA for OpenSolaris mass storage NFS Server. |
10 | 24 | Cabling | Genstor | CABL-SASICBL0281L | SAS 4-Lane for Backplane to 4-Lane SATA for OpenSolaris mass storage NFS Server. |
11 | 4 | Network Card | Intel | E10G42BTDA | Intel 10 Gigabit AF DA Dual Port Server Adapter for OpenSolaris mass storage NFS Server. |
OS Name and Version | Avere OS V2.1 |
---|---|
Other Software | Mass storage server runs OpenSolaris 5.11 snv_134; this package is available for download at http://genunix.org/dist/indiana/osol-dev-134-x86.iso |
Filesystem Software | Avere OS V2.1 |
Name | Value | Description |
---|---|---|
zfs set atime | off | Disable atime updates on the OpenSolaris mass storage system. |
ncsize | 2097152 | Size the directory name lookup cache (DNLC) to 2 million entries on the OpenSolaris mass storage system. |
zfs:zfs_arc_max | 0x1400000000 | Set the maximum size of the ZFS ARC cache to 80GB on the OpenSolaris mass storage system. |
zfs:zfs_arc_meta_limit | 0x13c0000000 | Limit the metadata portion of the ZFS ARC cache to 79GB on the OpenSolaris mass storage system. |
zfs:zfs_vdev_cache_bshift | 13 | Limit the device-level prefetch to 8KB on the OpenSolaris mass storage system. |
zfs:zfs_mdcomp_disable | 1 | Disable metadata compression on the OpenSolaris mass storage system. |
zfs:zfs_txg_timeout | 2 | Aggressively push filesystem transactions from the ZFS intent log to storage on the OpenSolaris mass storage system. |
zfs:zfs_no_write_throttle | 1 | Disable write throttle on the OpenSolaris mass storage system. |
rpcmod:cotsmaxdupreqs | 6144 | Increase the size of the duplicate request cache that detects RPC-level retransmissions on connection-oriented transports on the OpenSolaris mass storage system. |
ddi_msix_alloc_limit | 4 | Limit the number of MSI-X vectors per driver instance on the OpenSolaris mass storage system. |
pcplusmp:apic_intr_policy | 1 | Distribute interrupts over CPUs in a round-robin fashion on the OpenSolaris mass storage system. |
ixgbe.conf:mr_enable | 0 | Disable multiple send and receive queues for Intel 10GbE adapter on the OpenSolaris mass storage system. |
ixgbe.conf:tx_ring_size | 4096 | Increase the packet transmit ring to 4096 entries on the Intel 10GbE adapter on the OpenSolaris mass storage system. |
ixgbe.conf:rx_ring_size | 4096 | Increase the packet receive ring to 4096 entries on the Intel 10GbE adapter on the OpenSolaris mass storage system. |
ixgbe.conf:intr_throttling | 0 | Disable interrupt throttling on the Intel 10GbE adapter on the OpenSolaris mass storage system. |
ixgbe.conf:tx_copy_threshold | 1024 | Increase the transmit copy threshold from the default of 512 bytes to 1024 on the OpenSolaris mass storage system. |
NFSD_SERVERS | 1024 | Run 1024 NFS server threads on the OpenSolaris mass storage system. |
Writeback Time | 12 hours | Files may be modified up to 12 hours before being written back to the mass storage system. |
cfs.pagerFillRandomWindowSize | 32 | Increase the size of random IOs from disk. |
cfs.quotaCacheMoveMax | 10 | Limit space balancing between cache policies. |
cfs.resetOnIdle | true | Equally share cache capacity when system is not recycling any blocks. |
vcm.readdir_readahead_mask | 0x7ff0 | Optimize readdir performance. |
buf.autoTune | 0 | Statically size FXT memory caches. |
buf.neededCleaners | 16 | Allow 16 buffer cleaner tasks to be active. |
tokenmgrs.geoXYZ.fcrTokenSupported | no | Disable the use of full control read tokens. |
tokenmgrs.geoXYZ.tkPageCleaningMaxCntThreshold | 256 | Optimize cluster data manager to write more pages in parallel. |
tokenmgrs.geoXYZ.tkPageCleaningStrictOrder | no | Optimize cluster data manager to write pages in parallel. |
cluster.VDiskCommonMaxRtime | 600000000 | Retry internal RPC calls for up to 600 seconds. Value is in microseconds. |
cluster.VDiskCommonMaxRcnt | 65535 | Retry internal RPC calls for up to 65535 attempts. |
cluster.VDiskInternodeMaxRtime | 600000000 | Retry inter-FXT node RPC calls for up to 600 seconds. Value is in microseconds. |
cluster.VDiskInternodeMaxRcnt | 65535 | Retry inter-FXT node RPC calls for up to 65535 attempts. |
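For reference, the OpenSolaris settings listed above are normally applied through standard Solaris configuration files and commands. The following is a minimal, hypothetical sketch assuming the conventional locations (/etc/system, /kernel/drv/ixgbe.conf, /etc/default/nfs); the report itself discloses only the parameter names and values.

    # /etc/system -- kernel, ZFS, RPC, and interrupt tunables (applied at boot)
    set ncsize=2097152
    set zfs:zfs_arc_max=0x1400000000
    set zfs:zfs_arc_meta_limit=0x13c0000000
    set zfs:zfs_vdev_cache_bshift=13
    set zfs:zfs_mdcomp_disable=1
    set zfs:zfs_txg_timeout=2
    set zfs:zfs_no_write_throttle=1
    set rpcmod:cotsmaxdupreqs=6144
    set ddi_msix_alloc_limit=4
    set pcplusmp:apic_intr_policy=1

    # /kernel/drv/ixgbe.conf -- Intel 10GbE driver properties
    mr_enable=0;
    tx_ring_size=4096;
    rx_ring_size=4096;
    intr_throttling=0;
    tx_copy_threshold=1024;

    # ZFS dataset property and NFS daemon thread count
    zfs set atime=off vol0
    # NFSD_SERVERS=1024 set in /etc/default/nfs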
A 21+1 RAID5 (RAIDZ) array was created on each OpenSolaris mass storage server using the following Solaris command: zpool create vol0 raidz ...list of 21 disk WWN's... log ...WWN of SSD device...
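Spelled out with placeholder device names (the actual disk WWNs are elided in the report), the pool creation takes the following form; 'raidz' builds the single-parity data vdev and 'log' designates the SSD as a separate ZFS intent log device:

    # placeholder device names only -- the report lists real disk WWNs
    zpool create vol0 \
        raidz disk1 disk2 disk3 ... disk21 \
        log ssd0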
Description | Number of Disks | Usable Size |
---|---|---|
Each FXT 3500 node contains (15) 600 GB 10K RPM SAS disks. All FXT data resides on these disks. | 660 | 342.2 TB |
Each FXT 3500 node contains (1) 250 GB SATA disk. System disk. | 44 | 10.0 TB |
Each mass storage system contains (22) 3 TB SATA disks. The ZFS file system is used to manage these disks, and the FXT nodes access them via NFSv3. | 88 | 223.8 TB |
Each mass storage system contains (1) 3 TB system disk. | 4 | 10.9 TB |
Each mass storage system contains (1) 200 GB SSD used as the ZFS Intent Log (ZIL) device. | 4 | 186.0 GB |
Total | 800 | 587.0 TB |
Number of Filesystems | Single Namespace |
---|---|
Total Exported Capacity | 229162 GB (OpenSolaris mass storage system capacity) |
Filesystem Type | TFS (Tiered File System) |
Filesystem Creation Options | Default on FXT nodes. OpenSolaris mass storage server ZFS filesystem created with 'zpool create vol0 raidz ...list of 21 disk WWN's... log ...WWN of SSD device...' |
Filesystem Config | 21+1 RAID5 configuration on OpenSolaris mass storage server. |
Fileset Size | 185647.1 GB |
Item No | Network Type | Number of Ports Used | Notes |
---|---|---|---|
1 | 10 Gigabit Ethernet | 44 | One 10 Gigabit Ethernet port used for each FXT 3500 appliance. |
2 | 10 Gigabit Ethernet | 4 | The mass storage systems are connected via 10 Gigabit Ethernet. |
Each FXT 3500 was attached via a single 10 GbE port to one of two 64-port Arista Networks 7050-64 10 GbE switches. The FXT nodes were evenly split across the two switches, and the load-generating clients were likewise spread across both. The mass storage servers were attached to the same two switches, two servers per switch. Finally, the two switches were interconnected using a QSFP+ port along with 4x10GbE ports, for a total of 80 Gbps of bidirectional inter-switch bandwidth. A 1500 byte MTU was used throughout the network.
An MTU size of 1500 was set for all connections to the switches. Each load generator was connected to the network via a single 10 GbE port. The SUT was configured with 88 separate IP addresses on one subnet. Each cluster node was connected via a single 10 GbE NIC and hosted 2 IP addresses.
Item No | Qty | Type | Description | Processing Function |
---|---|---|---|---|
1 | 88 | CPU | Intel Xeon CPU E5620 2.40 GHz Quad-Core Processor | FXT 3500 Avere OS, Network, NFS/CIFS, Filesystem, Device Drivers |
2 | 8 | CPU | Intel Xeon E5620 2.40 GHz Quad-Core Processor | OpenSolaris mass storage systems |
Each file server has two physical processors.
Description | Size in GB | Number of Instances | Total GB | Nonvolatile |
---|---|---|---|---|
FXT System Memory | 144 | 44 | 6336 | V |
Mass storage system memory | 96 | 4 | 384 | V |
FXT NVRAM | 2 | 44 | 88 | NV |
Grand Total Memory Gigabytes | 6808 |
Each FXT node has main memory that is used for the operating system and for caching filesystem data. A separate, battery-backed NVRAM module is used to provide stable storage for writes that have not yet been written to disk.
The Avere filesystem logs writes and metadata updates to the NVRAM module. Filesystem-modifying NFS operations are not acknowledged until the data has been safely stored in NVRAM. The battery backing the NVRAM ensures that any uncommitted transactions persist for at least 72 hours. Each OpenSolaris mass storage system contains a ZFS intent log (ZIL) device. The ZIL device is an Intel 710 SSD, which preserves all write-cached data in the event of a power loss; it is designed with sufficient capacitance to flush any committed write data to its non-volatile flash media in the event of a power failure or unclean shutdown.
The system under test consisted of (44) Avere FXT 3500 nodes, each attached to the network via 10 Gigabit Ethernet. Each FXT 3500 node contains (15) 600 GB SAS disks. The OpenSolaris mass storage systems were each attached to the network via a single 10 Gigabit Ethernet link. The mass storage servers were 4U Supermicro servers, each configured with a software 21+1 RAIDZ (RAID5) array consisting of (22) 3 TB SATA disks. Additionally, a 200 GB SSD was used as the ZFS intent log (ZIL) device in each mass storage server.
N/A
Item No | Qty | Vendor | Model/Name | Description |
---|---|---|---|---|
1 | 20 | Supermicro | SYS-1026T-6RFT+ | Supermicro Server with 48GB of RAM running CentOS 5.6 (Linux 2.6.18-238.19.1.el5) |
2 | 2 | Arista Networks | 7050-64 | Arista Networks 64 Port 10 GbE Switch. 48 SFP/SFP+ ports, 4 QSFP+ ports |
LG Type Name | LG1 |
---|---|
BOM Item # | 1 |
Processor Name | Intel Xeon E5645 2.40 GHz Six-Core Processor |
Processor Speed | 2.40 GHz |
Number of Processors (chips) | 2 |
Number of Cores/Chip | 6 |
Memory Size | 48 GB |
Operating System | CentOS 5.6 (Linux 2.6.18-238.19.1.el5) |
Network Type | Intel Corporation 82599EB 10-Gigabit SFI/SFP+ |
Network Attached Storage Type | NFS V3 |
---|---|
Number of Load Generators | 20 |
Number of Processes per LG | 176 |
Biod Max Read Setting | 2 |
Biod Max Write Setting | 2 |
Block Size | 0 |
LG No | LG Type | Network | Target Filesystems | Notes |
---|---|---|---|---|
1..20 | LG1 | 1 | /sfs0,/sfs1,/sfs2,/sfs3 | LG1 nodes are evenly split across the two switches |
All clients mounted all filesystems on all FXT nodes.
Each load-generating client hosted 176 processes. Processes were assigned to network interfaces such that they were evenly divided across all network paths to the FXT appliances. The filesystem data was evenly distributed across all disks, FXT appliances, and mass storage servers.
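The SPECsfs2008 load generator uses its own user-space NFSv3 client, so each of the 176 processes on a client is simply assigned one of the cluster's 88 IP addresses and one of the four target filesystems. A hypothetical shell sketch of such an even round-robin assignment (the client names and 10.0.0.x addresses are placeholders, not the addresses used in the test):

    #!/bin/sh
    # Illustrative only: print a "client  target" pair for each of the
    # 176 processes on each of the 20 load generators, cycling through
    # 88 FXT IP addresses and the 4 exported filesystems.
    for client in $(seq 1 20); do
      for proc in $(seq 0 175); do
        ip=$(( (proc % 88) + 1 ))         # placeholder subnet 10.0.0.0/24
        fs=$(( proc % 4 ))                # /sfs0 .. /sfs3
        echo "lg${client}  10.0.0.${ip}:/sfs${fs}"
      done
    done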
N/A
Generated on Tue Nov 15 16:05:03 2011 by SPECsfs2008 HTML Formatter
Copyright © 1997-2008 Standard Performance Evaluation Corporation
First published at SPEC.org on 15-Nov-2011