¡Lo Hicimos! We Did It! We broke the bonds of Amazon (AWS, EC2)

Moving out of Amazon is no small feat.

After a couple of months' preparation, we finally moved out of Amazon Web Services (AWS, a.k.a. the Roach Motel) on December 12th, 2011, and not a moment too soon. In the end, after a trial run, the final move took less than three hours, which is remarkable considering we had to transfer, and keep synchronized, terabytes of data from a source on the other side of the country.

Our new installation is fantastic. It is an excellent blend of carrier-class technology and commodity hardware. This balance gives us fabulous data handling capabilities while keeping operating costs very low.

Why is the platform so special?

  • It is built on OpenIndiana, of course.
  • It leverages ZFS to the max.
  • DDRdrive X1s for logzilla, the dedicated ZFS intent log device: blazing fast synchronous write performance. (See the pool sketch after this list.)
  • Aggregated gigE links using jumbo frames everywhere. (See the networking sketch after this list.)
  • Each LACP member connects to a separate physical switch within a virtual chassis. If one physical switch chassis fails, the servers connected to it continue to operate through the other chassis at gigE speeds.
  • Full tilt on DRAM: every slot is populated with the largest supported module.
  • Gobs of 15K RPM SAS disks per server, spread across multiple Sanima-SC/Newisys NDS-2241 storage chassis.
  • Since we are on ZFS, we can hot swap the disks for SSDs when suitable enterprise-grade devices become available.
  • There is more SSD-based cache (L2ARC) per server than the size of the existing data set, so there is plenty of room to grow in read ops.
  • Obviously, remote out-of-band management: KVM and a SMASH interface, with all the bells and whistles.
  • Fully redundant power, everywhere.
  • Should a server fail, the disks owned by that server can be imported on a partner system (thank you, SAS!), and a zone booted to continue operating the services provided by the downed partner. Genius!
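
To make the storage layout concrete, here is a minimal sketch of how such a pool could be assembled. The device names are hypothetical, and the real pool uses many more mirror pairs:

    # Mirrored pairs; each half of a mirror lives in a different disk shelf
    # (c3* = shelf 1, c4* = shelf 2; all device names are hypothetical)
    zpool create data \
        mirror c3t0d0 c4t0d0 \
        mirror c3t1d0 c4t1d0

    # DDRdrive X1 as the dedicated intent log device (logzilla)
    zpool add data log c5t0d0

    # SSDs as L2ARC read cache
    zpool add data cache c6t0d0 c6t1d0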
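
Similarly, a hedged sketch of the link aggregation using the stock illumos dladm and ipadm tools; the link names and address are hypothetical:

    # LACP aggregation over two gigE ports, each cabled to a different
    # physical switch in the virtual chassis
    dladm create-aggr -L active -l igb0 -l igb1 aggr0

    # Jumbo frames on the aggregation
    dladm set-linkprop -p mtu=9000 aggr0

    # Plumb an address on the aggregation
    ipadm create-if aggr0
    ipadm create-addr -T static -a 10.0.0.10/24 aggr0/v4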

6 Responses to ¡Lo Hicimos! We Did It! We broke the bonds of Amazon (AWS, EC2)

  1. Hello,

    You have quite a nice setup there. Only one question:

    Why deploy two servers and do a failover through a zone boot on the other host? Why not just take over the data pool with zpool import -f and eventually restore the COMSTAR backup (if any)?

    Btw,
    great posts under the “solaris” tag. Keep up the good work.

    • I would do both of those things. The order of operations would be to import the data zpool from host1 onto host2 as data_partner or some other name, then import the zones from data_partner and boot them.

      The configuration uses two servers and two disk shelves. Each server gets one half of the disks in each shelf, as a number of mirrors in the “data” pool. Each server can see all of the disks, so in the event of a failed server chassis, the data pool from the failed chassis can be imported on the partner system. Does that make it clearer?
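
      Roughly, the failover looks like the sketch below; the pool rename is standard zpool syntax, while the zone name is hypothetical and the zone must already be configured (zonecfg) on the partner host:

          # On the surviving host: force-import the failed partner's pool
          # under a new name
          zpool import -f data data_partner

          # Attach and boot the partner's zone from the imported pool
          zoneadm -z web01 attach
          zoneadm -z web01 boot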

      Thanks for the feedback.

      • Greetings,

        I’ve reviewed your failover scenario, and I’m afraid a “zone boot” on the failover host is not usable when using COMSTAR.

        It looks like COMSTAR doesn’t like running in a non-global zone. With the default configuration in a fresh zone it doesn’t seem to work; the SMF services end up in maintenance mode with an arbitrary error.
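
        For reference, this is roughly how I am inspecting it (using the stock COMSTAR service FMRI):

            # Show why the COMSTAR target-mode framework is in maintenance
            svcs -xv svc:/system/stmf:default

            # Once the underlying error is fixed, clear the state and retry
            svcadm clear svc:/system/stmf:default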

        What’s your experience?

  2. alfie says:

    Is OpenIndiana stable enough for production?

    • We are coming up on one year with no significant issues. I give OI an A+, as its data handling capabilities with ZFS are phenomenal. As always, your experience is going to depend largely on the hardware you have. I spent a fair amount of time reviewing hardware and choosing devices that would fit the OS. It is not Linux; there is not a tremendous number of drivers, so you have to choose carefully.

      • Nitin says:

        Interesting. In the past I haven’t tuned MySQL since I’m never really CPU bound. These new SSDs could in theory bottleneck the northbridge and/or CPU, so maybe it’s going to be an issue again. Being constantly bottlenecked on disk seeks means you don’t really think about CPU utilization much.
        Kevin
