Digital Data Storage is Undergoing Mind-Boggling Growth

By Lauro Rizzatti   09.14.2016

Until the 19th century — let’s say until the Napoleonic Wars — life on earth proceeded at a slow pace with no significant differences over long periods of time. If you were a farmer in ancient Egypt, your daily life would not have been much different 2,000 years later under Louis XIV, the Sun King of France, save for possibly somewhat less harsh conditions and slightly more food.

The setting abruptly changed in the 19th century, even for humble farmers. Driven by scientific discoveries and a flurry of inventions, the technological revolution introduced a radical inflexion point and gave rise to massive growth that continues today at an ever-increasing pace. Myths were shattered and questions that had remained unanswered for millennia suddenly found answers, which triggered new questions and opened doors into new fields of human knowledge.

Discoveries in the early 1800s led to new findings in the ensuing decades that, in turn, set the path to breakthroughs and inventions on an accelerated scale unseen by humankind since Homo sapiens first walked the earth.

Where better to look for proof of the exponential progress of the sciences than in the mind-boggling escalation of numerical prefixes associated with physical metrics?

The metric system was one of many new ideas conceived during the French Revolution at the close of the 18th century. It was intended to bring order to the many confusing and conflicting systems of weights and measures then in use across Europe. Back then, units of length, land area, and weight varied not just from one country to another, but from one region to another within the same country.

The metric system replaced the traditional units with one fundamental unit for each physical quantity, now defined precisely by the International System of Units. Multiples and fractions of these fundamental units are created by adding prefixes to the names of the defined units. These prefixes denote powers of 10, so that metric units are always divided into 10s, 100s, 1,000s, etc.

As originally conceived, the range of prefixes covered six orders of magnitude (10^6), from one milli (1/1,000) at the low end to one kilo (1,000) at the high end. Over time, these multipliers have been extended in both directions.

A quarter-century ago, in 1991 to be precise, the 19th General Conference on Weights and Measures extended the list of metric prefixes to the powers of +24 and -24, as illustrated in Table 1.
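Table 1 appears only as a figure in the original article. As a stand-in, here is a minimal Python sketch listing the full set of SI prefixes in force after the 1991 conference, from yocto (10^-24) to yotta (10^+24); the prefix values are the standard SI definitions.

# SI prefixes after the 19th CGPM (1991) extended the range to 10^+/-24.
SI_PREFIXES = {
    "yotta": 24, "zetta": 21, "exa": 18, "peta": 15, "tera": 12,
    "giga": 9, "mega": 6, "kilo": 3, "hecto": 2, "deca": 1,
    "deci": -1, "centi": -2, "milli": -3, "micro": -6, "nano": -9,
    "pico": -12, "femto": -15, "atto": -18, "zepto": -21, "yocto": -24,
}

# Print the prefixes from largest to smallest multiplier.
for name, exp in sorted(SI_PREFIXES.items(), key=lambda kv: -kv[1]):
    print(f"{name:>6}: 10^{exp:+d} = {10.0 ** exp:.0e}")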

Is the latest range, now spanning 48 orders of magnitude (10^48), large enough to ensure that any physical measurement falls within it?

The evolution of digital data
Let’s take a look at digital data — an area that has seen exponential growth in the past decade or so — which may be classified as either structured or unstructured.

Structured data is highly organized and made up mostly of tables with rows and columns that define their meaning. Examples are Excel spreadsheets and relational databases.

Unstructured data is everything else. Examples include the following:

  • Email messages, instant messages, text messages…
  • Text files, including Word documents, PDFs, and other files such as books, letters, written documents, audio and video transcripts…
  • PowerPoints and SlideShare presentations
  • Audio files of music, voicemails, customer service recordings…
  • Video files that include movies, personal videos, YouTube uploads…
  • Images of pictures, illustrations, memes…

The volume of unstructured data has exploded in the past decade and a half. Just compare the size of a text file such as The Divine Comedy — which was translated into English by Henry F. Cary in 1888 — at 553kB with the file size of an HD video that stores a movie like The Bourne Identity at 30GB. The difference is nearly five orders of magnitude (10^5), or more than 50,000 times.

Statistics published by research firms that track the digital data market are staggering. According to IDC Research, digital data will grow at a compound annual growth rate (CAGR) of 42% through 2020. Over the 2010-2020 decade, the world’s data will grow by 50X; i.e., from about 1ZB in 2010 to about 50ZB in 2020.
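As a quick check on the compound-growth arithmetic, the sketch below takes the 1ZB (2010) and 50ZB (2020) endpoints quoted above and derives the implied compound annual growth rate; a 50X rise over ten years corresponds to a CAGR of roughly 48%, slightly above the 42% headline figure.

# Compound annual growth rate implied by growing from ~1 ZB (2010) to ~50 ZB (2020).
start_zb, end_zb, years = 1.0, 50.0, 10

cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")          # about 47.9% per year

# Year-by-year projection at that rate.
volume = start_zb
for year in range(2010, 2021):
    print(f"{year}: {volume:6.1f} ZB")
    volume *= 1 + cagr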

“Between the dawn of civilization and 2003, we only created five exabytes; now we’re creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes) — an increase of 50 times.” — Hal Varian, Chief Economist at Google.

And IBM found that humans now create 2.5 quintillion bytes of data daily; that’s the equivalent of about half a billion HD movie downloads.

Digital data measurement
Let’s consider an oft-overlooked anomaly in how digital data is measured: digital data is measured using a binary system, not a decimal (metric) system. The basic unit of digital data is the bit (“b”), and eight bits make up a byte (“B”). Alphanumeric characters are coded in bytes, one per character. The storage industry quotes capacities in bytes, while the networking industry quotes transmission speeds in bits per second.
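Because storage is quoted in bytes while network links are quoted in bits per second, mixing the two is an easy factor-of-eight mistake. A tiny sketch, using a hypothetical link speed purely for illustration:

# Convert a link speed in megabits per second to megabytes per second.
BITS_PER_BYTE = 8

link_megabits_per_s = 300                                 # hypothetical 300 Mb/s link
link_megabytes_per_s = link_megabits_per_s / BITS_PER_BYTE
print(f"{link_megabits_per_s} Mb/s = {link_megabytes_per_s:.1f} MB/s")   # 37.5 MB/s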

In the metric system, 1,000 equals 10 to the power of 3 (10^3), but 1kb (kilobit) or 1kB (kilobyte) corresponds to 2 to the power of 10 (2^10), which equates to 1,024 bits or 1,024 bytes, respectively. In other words, 1kB is a little larger than 1,000 bytes. This is a small difference that, oftentimes, no one cares about. However, when the amount of information reaches a trillion bytes (1TB), the difference amounts to about 10%, and that’s no longer trivial. Table 2 illustrates the multiplying factor associated with using a binary system.

Several standards organizations have attempted to resolve this conundrum by proposing a separate set of prefixes for binary multiples, such as kibi for 1,024, mebi for 1,048,576, gibi for 1,073,741,824, and so forth. To date, none of these has come into general use.
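A minimal sketch makes the drift between the decimal and binary readings of the same prefix explicit, and shows the proposed binary (IEC) names alongside; the roughly 10% gap at the terabyte level matches the figure cited above.

# How far each binary multiple (2^(10k)) drifts from its decimal namesake (10^(3k)).
prefixes = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"),
            ("tera", "tebi"), ("peta", "pebi")]

for k, (si, iec) in enumerate(prefixes, start=1):
    decimal = 10 ** (3 * k)
    binary = 2 ** (10 * k)
    drift = (binary / decimal - 1) * 100
    print(f"1 {si}byte = {decimal:>22,d} B | 1 {iec}byte = {binary:>22,d} B | "
          f"difference: {drift:4.1f}%")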

Consumers continue to ignore the difference, while disk drive and computer manufacturers targeting consumers only mention it in passing in the “small print.” Enterprise storage companies, on the other hand, now live in the terabyte/petabyte era and do distinguish between the two — at least when calculating and comparing costs.

Digital data storage supply and demand
The advent of the computer accelerated our ability to create data, but it also brought a new challenge. Now that we can generate data blazingly fast, how do we store it?

My Compaq 386 desktop from around 1989 had a hard disk drive (HDD) with a capacity of about 100MB. In 2001, roughly a dozen years later, the data storage capacity of my laptop HDD amounted to about 2GB, a 20X increase. My 2016 laptop boasts a solid-state drive (SSD) with 1TB of capacity, another 500X increase in 15 years.
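Using the drive capacities from the anecdote above, a short sketch computes the growth factor and the implied annual growth rate for each step (the capacities are as stated; the dates are approximate):

# Growth factors and implied annual growth for the drive capacities cited above.
capacities = [(1989, 100e6), (2001, 2e9), (2016, 1e12)]   # (year, bytes)

for (y0, c0), (y1, c1) in zip(capacities, capacities[1:]):
    factor = c1 / c0
    annual = factor ** (1 / (y1 - y0)) - 1
    print(f"{y0} -> {y1}: {factor:6.0f}X  (~{annual:.0%} per year)")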

It’s far easier to generate zettabytes of data than to manufacture zettabytes of data storage capacity. A wide gap is emerging between data generation and hard drive and flash production. In Figure 3, the blue bar chart maps data growth — actual and estimated — over a 20-year period. The orange bar chart tracks storage factory capacity.

By 2020, demand for capacity will outstrip production by six zettabytes, or nearly double the demand of 2013 alone.

Electronic design data in EDA
An interesting application area that produces large quantities of data is the Electronic Design Automation (EDA) industry. At the present rate, the data generated by EDA tools doubles every year, but not all EDA data is equally organized.

The process of designing an electronic chip is based on creating an accurate model of the chip’s architecture, behavior, and functionality. Broadly speaking, the process consists of two stages or phases: front-end and back-end.

During the front-end design phase, engineers create a chip design by compiling source files into a model. The chip design model is verified by scheduling and running simulation jobs in a large compute grid.

The front-end phase generates an input/output (I/O)-intensive workload when a large number of jobs run in parallel: EDA applications read and compile millions of small source files to build and simulate a chip design. The large number of jobs running at once demands high concurrency from the storage system and produces a random I/O pattern.
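A toy sketch of that front-end pattern follows: a large batch of small, independent simulation jobs dispatched concurrently. The run_sim function and the test list are placeholders, not a real EDA tool API.

# Toy illustration only: many short, independent jobs in flight at once,
# which is what drives the high-concurrency, random-I/O access pattern.
from concurrent.futures import ProcessPoolExecutor

def run_sim(test_name: str) -> int:
    # Stand-in for invoking a simulator on one test case; a real job would
    # read many small source files and write its own logs and results.
    return 0

if __name__ == "__main__":
    tests = [f"test_{i:04d}" for i in range(1000)]       # thousands of short jobs
    with ProcessPoolExecutor(max_workers=64) as pool:    # high concurrency
        results = list(pool.map(run_sim, tests))
    print(f"{results.count(0)} of {len(tests)} jobs completed")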

During the back-end design and verification phase, the data access pattern becomes more sequential. The back-end workload tends to consist of fewer jobs with a sequential I/O pattern that run for longer periods of time. The output of all the jobs involved in a chip’s design phases can amount to terabytes of data. Even though the output is often considered working space, the data still requires the highest tier of storage for performance.

Within the storage system, EDA workflows tend to store a large number of files in a single directory — typically one per design phase — within a deep directory structure on a large storage system. Performance-sensitive project directories, both scratch and non-scratch, dominate the file system.

Directories contain source code trees, front-end register transfer level (RTL) files that define logic in a Hardware Description Language (HDL), binary compiled files after synthesis against foundry libraries, and the output of functional verifications and other simulations (see also Performance at Scale for EDA). This poses interesting challenges to the vendors of the data storage devices that EDA vendors rely upon, as we will discuss in a future column.
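To make the file-count pressure concrete, here is a hedged sketch that walks a project tree and reports files and bytes per design phase. The directory layout and phase names (rtl, synth, verif) are hypothetical and not taken from any particular EDA flow.

# Walk a (hypothetical) EDA project tree and report file counts and sizes per phase.
import os
from collections import defaultdict

PROJECT_ROOT = "/projects/chip_a"            # hypothetical project root
files_per_phase = defaultdict(int)
bytes_per_phase = defaultdict(int)

for dirpath, _dirnames, filenames in os.walk(PROJECT_ROOT):
    # Treat the first directory level under the root as the design phase
    # (e.g., rtl, synth, verif); files directly in the root count as "(root)".
    rel = os.path.relpath(dirpath, PROJECT_ROOT)
    phase = rel.split(os.sep)[0] if rel != "." else "(root)"
    for name in filenames:
        try:
            size = os.path.getsize(os.path.join(dirpath, name))
        except OSError:
            continue                         # file vanished or unreadable; skip it
        files_per_phase[phase] += 1
        bytes_per_phase[phase] += size

for phase in sorted(files_per_phase, key=files_per_phase.get, reverse=True):
    print(f"{phase:>10}: {files_per_phase[phase]:>9,d} files, "
          f"{bytes_per_phase[phase] / 1e9:8.2f} GB")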

Conclusion
At the time of the 19th General Conference on Weights and Measures in 1991, a metric prefix denoting 10^24 was considered large enough to cover virtually all known physical measures for many years to come.

Approximately twenty years later, in 2010, digital data storage hit the “Zetta” prefix, with only one prefix, the “Yotta,” left available. Maybe the time is approaching for another conference to further expand the available prefixes.

Dr. Lauro Rizzatti is a verification consultant and industry expert on hardware emulation (www.rizzatti.com). Previously, Dr. Rizzatti held positions in management, product marketing, technical marketing, and engineering. He can be reached at lauro@rizzatti.com.

Comments

realjjj   2016-09-15 12:55:23

At least Figure 3 is problematic, as you harvest a lot of the data from Seagate's marketing materials. http://www.recode.net/2014/1/10/11622168/stuffed-why-data-storage-is-hot-again-really

The problem is that Seagate has been singing this tune for 5 years while reducing production capacity to adapt to declining demand, in terms of units. Just a few years ago, after buying Samsung's HDD division, they had the ability to ship almost 100 million units per quarter; now they are reducing production capacity to 35-40 million units per quarter.

Your point remains valid but some of the data presented is not.

jimfordbroadcom   2016-09-15 13:42:10

So where does all this data come from?  Consider this: I'm a hardware guy not generating any large files like those GDSII files you mention, although I do work for an IC company.  I'm a printed circuit board (PCB) design engineer, so the files I generate for building PCB's are mostly in the MB (megabyte) range, perhaps 10 MB tops.  Now, I don't make movies with as you say 30 GB of data each, I just go about my merry way reading and writing emails, and occasionally generating those MB files.  So tell me why my 237 GB hard drive recently got completely filled up!  I had to get IT help to delete a few 10's of GB of unnecessary files just to get my laptop computer to work again!  Useless data generated by lazy software engineers, that's what the problem is!  Get with the program, folks!  In terms of software productivity, it may be a good idea that we are educating software engineers to abstract away the details, but when they don't have any idea what the hardware is doing, there are severe consequences, and data bloat is one of them.  Another is the need for faster and faster hardware.  And why do we need faster and faster hardware, you ask?  Because of slower and slower software, of course!  And it gets worse: software is getting slower and slower faster than hardware is getting faster and faster.  Whew!

DMcCunney   2016-09-15 16:16:41

@jimforbroadcom: So tell me why my 237 GB hard drive recently got completely filled up!  I had to get IT help to delete a few 10's of GB of unnecessary files just to get my laptop computer to work again!

What OS is on your laptop, and what sort of useless files were those?

I'm fussy about digital housekeeping, what gets stored locally, and precisely where it's put, and I periodically throw out the trash to keep things tidy.

It sounds a lot like you don't have a process in place to manage such things, and need one.

>Dennis

jimfordbroadcom   2016-09-15 16:23:21

Running Windows 7.  I don't remember the exact files.  Our IT helpdesk tech remoted into my computer and deleted the unnecessary junk.  No, I don't spend a lot of time or effort on garbage collection, and I don't really believe I should have to.  This is a computer, after all.  It's like the old days of UNIX engineering workstations when you had to RM CORE every week or so or the computer would drown in its own $#!+  Sigh...

DMcCunney   2016-09-15 16:34:01

@jimforbroadcom:  No, I don't spend a lot of time or effort on garbage collection, and I don't really believe I should have to.

In an ideal world, I'd agree. In the imperfect world we live in, you do need to expend some effort.  That doesn't mean doing it manually.  That means installing tools to do it, teaching them what junk is, and running them on occasion.

I recommend a freeware application called CCleaner from an outfit called Piriform.  CCleaner is used to delete junk files created by the OS and applications, and has a steadily increasing number of apps whose junk it knows how to clean.  You need to configure precisely what it will count as junk and delete, but that's a one time exercise.

The late husband of a friend was a sloppy housekeeper and didn't throw stuff out, and extended the bad habit to his PC.  His widow asked me to look at it, and CCleaner removed gigabytes of junk and made it useful.

>Dennis

jimfordbroadcom   2016-09-15 16:43:34

Thanks, Dennis.  I will check out CCleaner.  I've tried CleanSweep before and wasn't too impressed.

Messy housekeeping?  I can relate.  If we could attach photos I'd show you my desk and lab benches.  They'd almost make Bob Pease or Jim Williams blush!

DMcCunney   2016-09-15 17:04:00

@jimforbroadcom: Thanks, Dennis.  I will check out CCleaner.  I've tried CleanSweep before and wasn't too impressed .

I haven't looked at CleanSweep.  I've been using CCleaner since the old days when it was called Crap Cleaner.  There's a payware Pro version with more features, but I haven't needed it.  It will also optionally check for newer versions and offer to send you to where you can get them.

CCleaner installs right-click context-menu shortcuts on the Recycle Bin to open it or run it.  Open it, tell it to Analyze, and it will scan the system and return a list of what it will remove.  You can fine tune that if desired.

By default, it empties the Recycle Bin, and deletes various junk files elsewhere.  It does warn you it's an actual delete operation, and be sure you want what it will clean to go away.

Install it, open it, and spend a bit of time in the Windows and Applications section to fine tune what it removes.  Once you've done that, you should just be able to run it periodically.

Messy housekeeping?  I can relate.  If we could attach photos I'd show you my desk and lab benches.  They'd almost make Bob Pease or Jim Williams blush!

I had a co-worker a while back whose office was like that.  He didn't have a problem because he knew what pile to look under, but $DEITY help anyone else who had to find something in his office if he was out...

>Dennis

TonyTib   2016-09-15 17:41:11

I like to use PortableApps (from http://www.portableapps.com ), which are all free and include some free disk cleaners.

Note that sometimes you can be over-aggressive.

Another area to look at is the Windows swap file, which defaults to twice your memory size.  On our machines with 32G RAM, the swap file defaults to 64G - and those PC's never swap.  So I set the initial swap size to 2G.

On desktop machines, look and see if the hibernate file is there - if you have a lot of memory it can also be quite large.  I never hibernate my desktop, and thus set Windows to remove it (Google for instructions on how to do this). You probably don't want to remove it on a laptop.

DMcCunney   2016-09-15 20:10:20

@TonyTib: I like to use PortableApps (from http://www.portableapps.com ) which are all free, and includes some free disk cleaners.

CCleaner has a portable version with no installer intended for sysadmins and advanced users.

Note that sometimes you can be over-aggressive.

Which is why I recommend looking at CCleaner's configuration and making sure you understand what it will delete.  You can shoot yourself in the foot.

Another area to look at is the Windows swap file, which defaults to twice your memory size. On our machines with 32G RAM, the swap file defaults to 64G - and those PC's never swap.  So I set the initial swap size to 2G.

It's technically possible to run with no swap file, but I use a minimal one as well.  I have 8GB RAM in the current desktop (the max it will take), and I normally never see memory usage exceed 50%.

And I boot from an SSD, but have an HD as well, so the swap file is placed there.

On desktop machines, look and see if the hibernate file is there - if you have a lot of memory it can also be quite large.  I never hibernate my desktop, and thus set Windows to remove it (Google for instructions on how to do this) You probably don't want to remove it on a laptop.

I use Hibernation on a laptop, but the desktop is on 24/7.  When I went to Win10, I discovered I had to disable the new "Hybrid Shutdown" feature.  That doesn't do an actual true shutdown, and on my machine, required a power cycle to shut it down.  (I dual boot Windows and Linux, and sometimes want to be in Linux instead of Windows.)  So no hiberfile.sys file here.  MS makes the blithe assumption the world is running on a laptop where battery power is the scarce resource, and everyone wants to suspend and resume as quickly as possible.  Er, no...

>Dennis

Rizzatti   2016-09-15 20:53:02

Realjjj,

Thank you for your comments. Using one chart, commenting on it, and giving credit to the source ought not to qualify as taking "a lot of the data from Seagate's marketing materials." Seagate is an authority in the field of data storage, and I did not question the authenticity of their chart. But I wonder if the drop in HDD production is affecting drives of less than 1 TB. Today, I would not purchase a drive with less than 1 TB, preferably 4 TB. And next year, probably 10 TB. It must be tough for storage companies to keep up with such a fast-moving target.

I am traveling in India, but once back to the West Coast, I will check with my contacts at other storage companies and possibly comment further.

Again, thank you for this opportunity.

realjjj   2016-09-15 23:20:47

Bit growth is solid, but to claim that supply is far below demand while reducing capacity instead of expanding it is absurd. It's the opposite of grabbing that claimed opportunity.

There must also be a disconnect between data creation and how much of it has sufficient value to be retained long term. Even if that chart is accurate, it likely means that enough older data is being deprecated for supply to be sufficient. One could theorize that cheaper storage would allow less data to be deprecated, but that's a matter of cost, not volume: how long one keeps surveillance footage depends on TCO, not supply. At the very least, they purposely ignore demand elasticity.

Seagate is stuck with HDDs and by claiming that demand is much higher than supply, they try to make the case that HDDs have a long life ahead because CAPEX for NAND is very high.


A lot of data is being generated but not all is stored long term as cost is always a factor.

realjjj   2016-09-15 23:33:02

What eats a lot of storage should be in Windows/Temp; you would need to log in as admin to see the pointless .tmp files.
