Showing posts with label Computer Issues. Show all posts

Friday, November 4, 2011

Email Up and Running...

In case you didn't notice, Email is back up. The following is from the service provider.

I figured I'd share this as I could definitely feel their pain in this situation - having been in it myself.



Another Outage, Really? Yes, really. I regret having to write to you again so soon, but the latest outage of our IMAP/POP3 email was a very severe one that lasted for over 30 hours. After the last outage, we thought we had fixed the issues, but there were obviously some lingering, unforeseen bugs.


What Really Happened? Let's start from the beginning... A few months ago, we migrated all of our hardware out of a data center in San Diego. San Diego was where email was originally housed. Upon moving, we changed the structure and hardware that was handling IMAP/POP3 in order to make it more reliable and redundant.

We added new redundant switches, RAID controllers, and storage servers. Even the Ethernet links that connected these were redundant. After much testing, we decided on Oracle's OCFS2 filesystem, but that is where the trouble came from: the filesystem was the one part of the system that could not be redundant. Bugs cropped up after the structure was put into production. We worked with Oracle to have them fixed, but their patches were no help. After the November 2nd failure, they updated our case to "Severe", but we never heard from them again.

After some trial and error, we were successful in getting the filesystem mounted in a read-only state. Instead of waiting for Oracle to figure out how to allow writes again, we began transferring the email to alternate hardware with a known reliable filesystem. Time estimates were made based on how long it would take to move the data across the network. But email is a funny thing that way: the hundreds of gigabytes of data were not the problem, it was the sheer number of tiny files being created on the filesystem. Some accounts had over 200,000 files that were only a few hundred bytes long. This is why our time estimates were so grossly incorrect.
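As a rough illustration of why the file count mattered more than the data size, here is a minimal Python sketch. The function name and every number in the example (300 GB of mail, 50 million files, 100 MB/s of throughput, 5 ms of overhead per file) are made up for illustration and are not figures from the outage.

```python
def estimate_transfer_seconds(total_bytes, file_count,
                              bytes_per_second, per_file_overhead_seconds):
    """Rough transfer time: time to stream the bytes plus a fixed cost per file.

    The per-file cost stands in for things like creating the file and
    updating filesystem metadata. It is an assumed, simplified model.
    """
    streaming_seconds = total_bytes / bytes_per_second
    overhead_seconds = file_count * per_file_overhead_seconds
    return streaming_seconds + overhead_seconds


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the problem.
    total_bytes = 300 * 10**9        # 300 GB of mail
    file_count = 50 * 10**6          # 50 million small files
    bytes_per_second = 100 * 10**6   # 100 MB/s sustained throughput
    per_file_overhead = 0.005        # 5 ms of overhead per file

    size_only = estimate_transfer_seconds(total_bytes, 0, bytes_per_second, 0)
    with_files = estimate_transfer_seconds(
        total_bytes, file_count, bytes_per_second, per_file_overhead
    )
    print(f"Estimate from data size alone: {size_only / 3600:.1f} hours")
    print(f"Estimate including per-file cost: {with_files / 3600:.1f} hours")
```

With numbers like these, an estimate based only on the amount of data leaves out the per-file term, which can be many times larger than the streaming time.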

We were in the dark just as much as you were as far as a time frame for when the data transfer would be completed. It was a long and painfully slow process.


Is IMAP/POP3 stable now? Yes, it is now structured the same way that it has been for the past 8 years. During those 8 years, we have never had any major outages, so we are confident that this solution is the right one.


So, what happens now? We have come to realize that managing email in house has become far too costly and time consuming for our small team. We could hire more staff and purchase bigger, more expensive hardware, but this would in turn drive up our low prices and our customers would suffer. So, we realize that we should ultimately focus on what we do best, Managed DNS.

In light of this, we are joining forces with an email provider that has the time and resources to manage millions of email accounts effectively. They also have an uptime history of 99.999%, so outages are out of the question.


Does this mean No-IP will no longer manage my email? Not exactly. We will still manage your email, but another company will be maintaining the infrastructure of your mail. All support calls for IMAP/POP3 will still be directed to us. Mail forwarding will also be handled by the new company.


But what about my other Email Services: Alternate-Port SMTP, Backup MX and Mail Reflector? No changes will be happening to these services. We will still be managing all of them in-house and will continue to provide support for any issues that you have concerning these products.


What will the transition be like? Most importantly, will I still have all of my data? All of your emails, contacts, and calendar items will be transitioned as part of the migration process. The only difference you will probably notice is in the webmail interface, which will be different.


Will there be any downtime associated with this transition? Although we are still in the planning phase, we are hoping to have little to no downtime during this transition. If there is downtime, it will be kept to a minimum and scheduled during off-peak hours.

We hope that you stick around with us. Email is going to be the same low price that it has always been, and it will now be backed by a solid 99.999% uptime guarantee. Honestly, you won't even notice a difference in your email, well, except for the uptime ;).

Email is important and that is ultimately why we have reached this decision.

The transition to the new service will be as painless as possible. There will be little effort on your end and we will be doing most of the legwork. At most, you will probably have to update a record or two.

You will be receiving information about account migration in the coming weeks.

In light of the recent outage, we are offering you a free month of email service, and have already added it to your account.

We thank you for your continued patience and for being a valued customer of No-IP. If you have any questions about anything, please do not hesitate to reach out.

Thursday, November 3, 2011

Email Down

It seems email is down at the service provider... No-IP has had several issues with email over the last day or so. For updates, see @NoIPStatus on Twitter.com or the No-IP.com Support page.

Saturday, November 27, 2010

Servers Decommissioned - Now Using Email Service

What Happened?

Well, the primary server (host to all the virtual servers within) had a glitch in its drive array.


Some Background First...

To explain, there's a drive array module which controls the computer's access to the hard drives. With this card, I can logically pair a number of drives together so they appear to the operating system (i.e. Microsoft Windows, Apple Mac OS, VMware ESX/vSphere, Novell Netware, etc.) as one big drive or a couple of big drives.

When a drive array spans 3 or more drives, it can use parity technology for redundancy. This is, in short, a mathematical calculation: each block of data is stored on one of the drives, and a checksum-like parity value calculated across the blocks is stored on another drive. What this does is provide a high level of redundancy in the event a hard drive within the array goes bad, because the missing data can be recalculated from what is left. Very cool so long as no more drives go bad at one time than the parity can cover (one drive for RAID 5, two for RAID 6).
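For the curious, here is a minimal Python sketch of how parity can rebuild a lost drive. It assumes simple single-parity XOR in the style of RAID 5 (I haven't said which RAID level my card uses, and a real controller also stripes data and rotates parity across the drives). The function names and sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Three "drives", each holding one block of data (same length).
    drives = [b"mail", b"data", b"1234"]
    parity = xor_blocks(drives)  # this would live on a fourth drive

    # Pretend the second drive died and rebuild it from the others.
    recovered = rebuild_missing([drives[0], drives[2]], parity)
    print(recovered == drives[1])
```

This only works when a single block is missing. Lose two at once and one parity value is not enough to work out both.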


Back to the Issue...

None of my drives are reporting any issues.

In my case the array glitched (similar to losing power or a power spike) and lost the drive configuration. In higher end array modules (and even in this one) I'm supposed to be able to pull the array configuration from the drive back to the card. That failed due to something with the glitch.


What About Backups?

I very likely could have restored from backup if I had backups of each virtual machine - remember though, each virtual machine is essentially a complete computer in and of itself (and I have - correction, had - quite a few). So, unfortunately, I do not have the resources (technically or financially) to do backups of these systems on that level. And I learned long ago with email and database systems, if you don't go all the way with backups (and back up everything) it's best to just do what you can to minimize your losses and downtime.


You did maintenance on the servers the other day. Did that play a part in, or possibly cause, this issue?

Short answer, No. I was done with maintenance on Saturday and the server crashed while I was out buying presents on Tuesday around noon. Plus, what I did had no relation to the drive array system.


My Analogy of the Issue...

Probably the best way to explain this is, if you drive a full stretch Cadillac and go to the grocery store and fill up that huge trunk with groceries, and then 30 miles down the road all four tires blow and the transmission breaks at the same time, do you think it's because the trunk is full of groceries, or is it possible something else happened (like running over glass in the roadway)? :-)

As for backing it up, we could probably have four spare tires in the trunk as well as a spare transmission - but it's unlikely anyone would go to that length of protection on a personal basis.


What Now?

For anyone using my email server, I've transferred it out to a service (provided by No-IP.com) so it will no longer go down (at least - not because of something I've done, or my Internet connection, or my server glitching). I've gone ahead and paid the first year service fees to get the transfer going and email is already flowing in. This service was already providing basic anti-spam services (I didn't have them set too high) so it will continue to do that - and we can also increase the spam protection services as well as the settings for individuals. I will also be re-setting up the reflection addresses which existed in my own server (things like group addresses which forward/reflect email to all members of the group).



What are the differences?

Here are the differences which you'll notice:

  • No server based calendar. The personal calendar within your email client (i.e. Outlook, Apple Calendar, etc.) will still be usable.
  • No server based address book. I think everyone had their own address book on their computer anyways. This will still work the same.
  • Web Email will have a different look and feel.

If you wish to use another email service (such as Comcast or Verizon) and want your "@flaming.ws" or "@feely.ws" email forwarded to it, please let me know.

Also, the reverse is true - if you wish to use the "@flaming.ws" or "@feely.ws" email service and currently have another service, please let me know. Either way can be accommodated and this email service is guaranteed to be up 99.99% of the time.
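To put that uptime number in perspective, here is a small Python sketch that turns an uptime percentage into the downtime it still allows. The function name is made up, and it assumes the percentage is measured over a plain 365-day year. I don't know how the provider actually measures it.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Minutes of downtime that still meet the given uptime percentage."""
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # 99.99% uptime over a 365-day year leaves a little under an hour.
    print(f"{allowed_downtime_minutes(99.99):.1f} minutes per year")
```

So "99.99%" does not mean never down. It means under an hour of downtime over a whole year.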


What Else?

For those of you who are using my Corporate Antivirus license, that server is also gone. Your Antivirus client will be fine for a couple of weeks if you wish to buy another (I would highly recommend it). I would recommend Trend Micro or AVG. I've not used AVG but have heard it is good from others. For those who have multiple computers (I think everyone does) and want to purchase Trend Micro, I've provided a link to the online store which allows you to purchase Trend Micro for multiple computers.

A couple little tricks on purchasing Antivirus software - specifically Trend Micro

  1. Purchase it ONLINE from Trend Micro. Don't go to the store! You'll pay more at the store!!

  2. Buy the best one. It's worth it!

  3. Trend Micro

  4. I really like Trend Micro, but when purchasing Trend Micro or any other software, be aware of the "Download Protection Service". It's a waste of money! You'll always be able to download a copy of Trend Micro's software - no matter the situation!

  5. Download Protection Service: hit the "Trash Can" icon to remove the "Download Protection Service" and save yourselves about $8.

  6. Don't skimp on virus protection! Remember everything you have on your computer - and what a malicious person could do with it. Trend Micro is having a serious sale right now, so now's the time to purchase for your Windows PC.

Is that everything?

Surprisingly, that's about it. Amazing how a complete system wipe can bring you back to a clean slate.

The only other thing is an apology for any inconvenience or issues this may have caused to anyone. Know that your email is in better hands now and will not go down.

Tuesday, November 23, 2010

Server Down....Unknown Reason

Well this really irritates me. After bringing it up the other day from the configuration changes, everything had been running fine until today (and I've left it alone since). Apparently sometime after noon today (Tuesday) the virtual machine host server just stopped working. Technically, the Logical Disk Array (I have 4 large drives acting as one in case of a drive failure) dropped all of its configurations, and neither the drives nor the disk controller are reporting a valid drive configuration. So 5 sets of drive configuration information are all gone in an instant (that was the backup too).

Talk about a disaster.... That scream of frustration you heard was me.

Still troubleshooting and attempting to figure out how to proceed for now.


ARGH!!!


- Posted using BlogPress from my iPad