Enhanced performance and reliability
ILS has been growing at a nice, steady pace for
the last four years. As some of you already know, we monitor the
performance of each of our servers and processing workstations.
If there is anything wrong, an ILS administrator is paged within
4 minutes and starts to take corrective measures.
In the spring of 2005, we saw that this was not enough. We wanted
to improve our system so that we could prevent some of the
problems from happening in the first place. We identified the three
most troublesome areas:
- availability of Marketplace data at our web hosting company
- availability of Marketplace Processing Center network connectivity
- availability of Marketplace data at our game processing workstations
To increase the reliability of our service, we targeted
these three areas and introduced several infrastructure updates
during the summer of 2005.
Database data replication
We saw the loss of availability of Marketplace data
at our web hosting company as a threat to providing reliable service
to our customers. We knew that if our database server went down
and we were not able to recover it within an hour, our
service would suffer. To avoid this situation, we decided to introduce
data replication at the database server level.
To achieve this, we purchased another database server
this past summer. We now have two database servers: a master database
server and a slave database server. In this setup, the slave
replicates data from the master instantly. So, in the unlikely
event that the master database server goes down, we are able to
swap the servers in less than 15 minutes and quickly restore the
Web Marketplace service.
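
For the technically inclined, the idea behind the master and slave
setup can be illustrated with a short sketch. This is only a simplified
model, not our actual database configuration: the class names, the
server names, and the in-memory dictionaries standing in for database
tables are all assumptions made for illustration.

```python
class DatabaseServer:
    """Hypothetical in-memory stand-in for a real database server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}      # key -> value, stands in for database tables
        self.online = True


class ReplicatedPair:
    """A master server whose writes are copied to a slave server."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def write(self, key, value):
        if not self.master.online:
            raise RuntimeError("master is down; swap the servers first")
        self.master.rows[key] = value
        # Replication step: copy the write to the slave right away.
        if self.slave.online:
            self.slave.rows[key] = value

    def read(self, key):
        return self.master.rows[key]

    def swap_servers(self):
        """Promote the slave to master (a manual step in this sketch)."""
        self.master, self.slave = self.slave, self.master


pair = ReplicatedPair(DatabaseServer("db-a"), DatabaseServer("db-b"))
pair.write("team-7", "quarter 3 decisions")

pair.master.online = False   # simulate a failure of the master
pair.swap_servers()          # the former slave now serves the data
restored = pair.read("team-7")
```

Because every write reaches both servers, the data is still available
after the swap, which is why the service can be restored quickly.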
Having all Marketplace data stored on two database
servers increases our reliability and survivability during a serious
problem. It also enables us to continue providing service to our
customers with only a short interruption of our service.
Internet connection in the Marketplace Processing Center
The next area that required our attention was the
Marketplace Processing Center's Internet connectivity. We have had
Internet connectivity via two independent providers. However, there
were some changes at our backup Internet service provider that could
cause the deterioration of their service. We found another company
(microcerv.net) to replace our backup Internet provider.
At the same time, we upgraded the routers and switches
that we use to maintain Internet connectivity to our two independent
Internet providers. Our current setup enables us to switch from
our primary Internet provider to our backup Internet provider in
less than 60 seconds.
Independently, our primary Internet provider (ISDN.net)
upgraded their equipment and service. They have access to three
Internet backbones here in the Digital Crossing building, where
the Marketplace Processing Center resides.
After these upgrades, we have access to two Internet
providers and four Internet backbones. Via our primary Internet
provider ISDN.net, we have access to the WV Fiber, BellSouth, and AT&T
networks. Through our backup Internet provider, we have access to
the MCI network.
These upgrades should significantly increase the reliability
of our Internet connection in the Marketplace Processing Center.
Processing workstations' upgrade to RAID1
We have several unmanned game processing workstations
whose sole purpose is to process games 24x7, 365 days a year.
The loss of any of these workstations interrupts our
service. In such a case, we need to restore the data from backup
on another workstation, which can take up to 12 hours. We identified
a hard disk failure as the most likely event that
would make a game processing workstation inoperable.
To minimize risks associated with the game processing
workstation being inoperable due to hard disk failure, we upgraded
all of our current game processing workstations to RAID1 level and
bought three new game processing workstations with RAID1 setup.
RAID1 enables us to process games on the game processing workstation
even if one of the hard drives fails, so a single hard drive
failure no longer interrupts processing. The failed
hard drive can be replaced later, when the game processing
workstation is not engaged in data processing. The whole operation
will take less than an hour.
For the technically inclined, RAID is a redundant
array of independent disks. In our RAID configuration we use the
RAID1 setup, where data is mirrored on both drives. So, the failure
of one drive does not lead to the computer's inoperability.
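
A short sketch may make the mirroring idea more concrete. It is a toy
model only, not how a real RAID controller or driver is implemented:
the class name, the method names, and the dictionaries standing in for
hard drives are hypothetical.

```python
class Raid1Mirror:
    """Toy model of RAID1: every block is written to both drives."""

    def __init__(self):
        self.drives = [{}, {}]        # block number -> data, one dict per drive
        self.failed = [False, False]

    def write(self, block, data):
        # Mirror the write to every drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Read from the first working drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail_drive(self, i):
        self.failed[i] = True
        self.drives[i] = {}           # the contents of the failed drive are lost

    def replace_drive(self, i):
        # Rebuild the new drive by copying from the surviving drive.
        if self.failed[1 - i]:
            raise IOError("no working drive to rebuild from")
        self.drives[i] = dict(self.drives[1 - i])
        self.failed[i] = False


mirror = Raid1Mirror()
mirror.write(0, "game results")
mirror.fail_drive(0)                  # one hard drive fails
still_readable = mirror.read(0)       # processing continues from the other drive
mirror.write(1, "next game")          # new data lands on the surviving drive
mirror.replace_drive(0)               # later, swap in a new drive and rebuild it
```

After the rebuild, both drives hold the same data again, including
anything written while one drive was out of service.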
Configuring all of our game processing workstations
with a RAID1 setup enables us to process games even if one of a
workstation's hard drives fails. This matters
because many of you run condensed games spread over
only a few days, where fast game processing is essential. It also
matters to us, because it shortens the time we need to spend
on fixing a hardware problem.
Conclusion
With these three infrastructure upgrades (Database
data replication, Internet connection in the Marketplace Processing
Center, Processing workstations' upgrade to RAID1), we have significantly
improved the reliability of our service. This will enable us to
spend more time developing Web Marketplace and less time fixing
hardware problems. And that is exactly our goal.
Marketplace Community Newsletter,
issued quarterly.
www.marketplace-simulation.com Copyright © 2005 Innovative Learning
Solutions. All rights reserved.
Innovative Learning Solutions, Inc., 500 West Summit Hill, Knoxville,
Tennessee 37902, USA
Phone: 865.740.1776