Wednesday, December 28, 2011

Scandisk, Defrag, Call Me in the Morning

For this segment, I want to point out how to "take care" of your home computer.

You have to remember that a computer is another electronic appliance.  It needs to be maintained to work properly.  There is the physical care, of course.  My friend keeps his PC on the floor under his desk, like many do.  He is smart about it and opens up the case periodically and cleans off all the electronics - removing dust and dog hair.  He also makes sure the vents are clear, which is good, as it keeps things from overheating.  (Hey, gamers out there, that's something you should do as well with your video game systems: keep the vents clear of blockage.)

But this isn't the maintenance I am talking about.  My tech support friends have a line, "Scandisk, Defrag, call me in the morning".  (Your "tech buddy" is probably nodding their head as they read this.)  What does this mean?  They are referring to basic maintenance of the files on your computer.

Let me explain further.  You are probably aware by now that your computer contains a "hard drive", where your programs and files are stored.  The hard drive is a type of storage media, like a tape cassette, CD, or DVD.  Information is stored on it when you install software, or save a document.

What you may not be aware of is "how" the information is stored.  You may have heard of the word "bytes" (megabytes, gigabytes).  A "byte" is a small unit of data storage.  A keyboard character, like the letter "A", is one byte in size.

When we are talking files and storage, we generally talk in multiples of bytes.  If you look it up on the Net, you'll find these approximations*:

  • One Kilobyte (KB) = 1,000 bytes
  • One Megabyte (MB) = 1000 KB = 1,000,000 bytes
  • One Gigabyte (GB) = 1000 MB = 1,000,000,000 bytes
  • One Terabyte (TB) = 1000 GB = 1,000,000,000,000 bytes
* The actual multiplier value is not 1000, but 1024 (2 to the 10th power for you math geeks).  1000 is the accepted approximation.  Keep in mind that some computer manufacturers advertise this way rather than the "real" storage space.  A true terabyte drive would store 1,099,511,627,776 bytes - about 10% higher than what is advertised.
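
For the math geeks, here is a small Python sketch of the footnote above.  The names (UNITS, unit_sizes) are my own, made up just for illustration.  It compares the rounded-off 1000 multiplier with the 1024 one for each unit:

```python
# Compare the "approximate" (x1000) and "actual" (x1024) size of each unit.
UNITS = ["KB", "MB", "GB", "TB"]

def unit_sizes(multiplier):
    """Return a dict mapping each unit name to its size in bytes."""
    return {name: multiplier ** (i + 1) for i, name in enumerate(UNITS)}

approx = unit_sizes(1000)   # the approximation from the list above
actual = unit_sizes(1024)   # 1024 = 2 to the 10th power

for name in UNITS:
    extra = (actual[name] / approx[name] - 1) * 100
    print(f"1 {name}: {approx[name]:,} vs {actual[name]:,} bytes ({extra:.1f}% more)")
```

The gap starts at 2.4% for a kilobyte and grows with each step up, reaching roughly 10% by the time you get to a terabyte.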

So that means the 1TB drives that are out in the stores now have the storage capacity of just over 1 trillion characters.  That's a lot of thank you letters for holiday gifts.  So how does this cause a problem?  The problem is in the way files are stored - sequentially on the disk.

Now wait a minute, that sounds like a good thing, files being stored in order, they can be accessed in order, right?  This is true.  In fact, your operating system (Windows, MacOS, Linux, etc.) will access a "file table" related to the disk when it goes to read or write.  This file table contains the name of each file, with a "pointer".  The pointer says where on the disk the file begins.  This is handy for access.  Now, instead of having to go through the entire disk in order until we find your file, we just look it up in the file table and jump to it.

Again, this sounds all very efficient, and it is, when you are dealing with very small files.  The problem is with larger files.  The thank you letters are going to be very small in size.  Even if one were 100 words long, that's only about 500 characters (or 500 bytes).  Even if we double that, we are only looking at about 1 KB in size.  We do, however, need to add any additional information in the file, such as any formatting we've added.  Even so, the files are small enough.

So what about other documents?  Well, a formatted resume would take about 40-50 KB.  A spreadsheet for figuring out your budget could take anywhere from 100 KB up to megabytes in size.  Still, we have a couple hundred gigabytes of disk space on our computer, so why should we care?

Well, you shouldn't care about running out of space.  That won't happen unless you consistently install new software.  The issue is, again, the way files are stored.  When files are saved, they are not saved as a whole.  The hard disk is broken down into small pieces called "clusters".  A cluster is defined as the minimum amount of space used to store a file segment.  The size of a cluster depends on the file system and the size of the disk; for this discussion, let's use 32 KB.  Yes, this minimum size means that our small thank you notes, while smaller than 32 KB, will each actually take up 32 KB.  There are ways to break down a hard disk into multiple drives (called partitions), which can result in smaller cluster sizes, but it isn't considered worth it nowadays for the very small gain you would achieve with hard disks of such large capacity.

A cluster is also the maximum amount of space used to store a file segment.  So when a file is over 32 KB, it will take multiple clusters to store it.  The resume example of 50 KB?  2 clusters.  Your budget spreadsheet?  That can run up into the range of a few megabytes, or a hundred or more clusters.  That software you just installed?  Well, that can run into the gigabyte range - tens of thousands of clusters or more.
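
If you want to check the cluster arithmetic yourself, here is a rough Python sketch.  It assumes the 32 KB cluster size used in the examples above (your own disk may well use a different size), and the function names and sample file sizes are mine, picked purely for illustration:

```python
CLUSTER_SIZE = 32 * 1024  # 32 KB, the cluster size assumed in this article

def clusters_needed(file_size_bytes, cluster_size=CLUSTER_SIZE):
    """Whole clusters a file occupies; a partly used cluster still counts."""
    return (file_size_bytes + cluster_size - 1) // cluster_size

def unused_bytes(file_size_bytes, cluster_size=CLUSTER_SIZE):
    """Space set aside for the file that the file doesn't actually fill."""
    return clusters_needed(file_size_bytes, cluster_size) * cluster_size - file_size_bytes

# Illustrative sizes only; real files will vary.
examples = {
    "thank you note (1 KB)": 1 * 1024,
    "resume (50 KB)": 50 * 1024,
    "budget spreadsheet (4 MB)": 4 * 1024 * 1024,
    "installed software (1 GB)": 1024 ** 3,
}

for label, size in examples.items():
    print(f"{label}: {clusters_needed(size):,} cluster(s), "
          f"{unused_bytes(size):,} bytes unused")
```

Notice that the 1 KB thank you note still claims a full cluster, leaving almost all of it unused, while the 50 KB resume spills over into a second one.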

But how can this be a problem if all files are stored sequentially?  The key here is the word "sequentially".  While the definition of the word means "in order", when dealing with file storage it means "in order, dropping in where you find room".  

Confused yet?  Picture your hard disk as a brick wall you are building.  You start laying bricks in order, row by row.  Each brick represents a cluster.  If all you did was install programs and save files, all files would be in precise order.  But that's not how it works in the real world.  Let's say you've been using the computer for a while.  Up comes the holiday again.  Are you going to keep those thank you notes from last year?  How about an old version of your resume?  You finished a game; are you going to keep it installed?  No, of course not.  You'll delete them, leaving cluster-sized "holes" in your wall.

Now, you go to create and save a new file (or install new software).  Just like a brick wall, you aren't going to leave holes in it.  First, you'll fill the existing holes.  Saving a file goes the same way.  The empty cluster "holes" are used first.  Only when they are filled, will the file continue writing to "the top of the wall".  This is known as a file being fragmented, or non-contiguous.  That's a fancy word, isn't it?  It essentially means it is not continuous - the clusters for the file are not one after the other.
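
Here is the brick wall as a toy Python sketch.  The disk is just a list where each slot is one cluster, holding either a file name or None for a hole.  This is a deliberate simplification of my own (real file systems are smarter about where they put things), and the function names are invented for the example:

```python
def save_file(disk, name, clusters):
    """Lay `clusters` bricks for `name`, filling the lowest holes first."""
    for i, slot in enumerate(disk):
        if clusters == 0:
            break
        if slot is None:          # found a hole: fill it
            disk[i] = name
            clusters -= 1
    disk.extend([name] * clusters)  # the rest goes on "the top of the wall"

def delete_file(disk, name):
    """Remove a file, leaving cluster-sized holes behind."""
    for i, slot in enumerate(disk):
        if slot == name:
            disk[i] = None

def fragments(disk, name):
    """Count the separate runs of side-by-side clusters belonging to `name`."""
    count = 0
    previous = None
    for slot in disk:
        if slot == name and previous != name:
            count += 1
        previous = slot
    return count

disk = []
save_file(disk, "notes", 1)
save_file(disk, "resume", 2)
save_file(disk, "game", 3)
delete_file(disk, "notes")       # last year's thank you notes go away
save_file(disk, "budget", 3)     # the new file fills the hole, then continues on top

print(disk)
print("budget is in", fragments(disk, "budget"), "pieces")
```

The budget file ends up with one cluster in the old hole at the bottom of the wall and two more at the top, so it is stored in two separate pieces.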

So what does this mean?  Remember those file pointers?  If a file were contiguous (non-fragmented) the operating system just needs the pointer in the file table to tell it where to find the start of the file in order to access it.  However, when a file is fragmented, it has multiple starting clusters.  Can you guess what that means?  Yes.  We now need a file pointer for each time the file is "fragmented" pointing to where it starts up again.  And these additional file pointers aren't stored in the file table, but as part of the files themselves.

Okay, okay, (you are probably thinking).  This is all very interesting, but why should I care?  Ah, this is where taking care of your equipment is important.  Remember that computers are mostly memory and stored information.  As more and more files become fragmented over time, file storage gets more and more complex, with more pointers.  And as we know, the more complex something is, the greater the chance of it breaking!  In this case, the more fragmented your storage media becomes, the greater the chance that one or more of those file pointers becomes broken (no longer pointing to the right cluster).  The result?  You can't access the file!

Surely there is a way to fix this?  Absolutely.  Scandisk, Defrag, call me in the morning. 

Let's start with Scandisk first.  Scandisk is an application in Windows used to correct problems in the file system.  It will attempt to reconnect files by fixing their pointers.  In some cases, it cannot.  The orphaned pieces are known as "lost clusters".  The only thing this application can do is save this data as text files, for you to open and save as another file.  If the file wasn't a document, you end up with garbage!

Well that's not very helpful!  Ah, what you need is to perform some preventative maintenance.  That's where the Disk Defragmenter utility comes in.  This utility, known as Defrag for short, rearranges the files on your hard disk so they become truly sequential (contiguous).  That means each file only needs 1 file pointer, located in the file table.  To be fair, this doesn't clean up every file, but it gets most of them.  It essentially reassembles the brick wall, in order!
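
To picture what "reassembling the wall" means, here is another toy Python sketch.  It is only a model of the idea, not how the actual Defrag utility works internally (the real thing shuffles data around in place and cannot move every file).  The list layout and the function name are assumptions made for the example:

```python
def defragment(disk):
    """Return a new layout with each file's clusters together and no holes."""
    order = []    # file names, in the order they first appear on the disk
    counts = {}   # how many clusters each file uses
    for slot in disk:
        if slot is None:          # skip the holes
            continue
        if slot not in counts:
            order.append(slot)
            counts[slot] = 0
        counts[slot] += 1

    tidy = []
    for name in order:
        tidy.extend([name] * counts[name])  # lay each file's bricks side by side
    return tidy

# A fragmented wall: "budget" is split in two, and there is one hole (None).
fragmented = ["budget", "resume", "resume", None, "game", "game", "budget", "budget"]
tidy = defragment(fragmented)
print(tidy)
```

After the pass, every file sits in one unbroken run and the hole is gone, which is the state where a single starting pointer per file is enough.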

Great.  How do I run these utilities?  The answer depends on which operating system you are using - I'll concentrate on Windows.  Windows represents its drives as drive letters, and that's where you access the utilities.  In Windows Vista or 7, click "Computer" in the Start Menu.  (For Windows XP and earlier, double-click the "My Computer" icon on the desktop.)  In the window that appears, your hard drives will be listed as C:, D:, etc., under "Hard Drives".  Right-click on the drive letter you want to maintain, and select "Properties" from the menu.  In the dialog that comes up, select the "Tools" tab.  Scandisk is under "Error Checking" and Defrag is under "Defragmentation".  Click the appropriate button to run the utility.

 Note:  You may receive a message from one of the utilities that it is currently scheduled to run.  In that case, your life couldn't be easier - your system has been configured to run it periodically on its own!

So how often should I run these utilities?  The answer is, it depends.  It really depends on how much you use the computer.  A general rule of thumb is to run Scandisk every few months.  If you are someone who downloads a lot of files and installs a lot of programs, you should run Defrag every few weeks.  If not, every couple of months is fine.

Perform these maintenance tasks, keep your area clean, and run a virus scanner once in a while.  This maintenance will go a long way in keeping your computer in shape!
