Today's companies face the challenge of managing distributed development efforts across multiple remote locations. Not only are there substantial communication and process issues involved, but simply enabling all team members to collaborate on the same data set at the same time can be difficult, and data storage requirements for design files are growing every year.
While gigabytes of project data can easily be shared on a local-area network (LAN) using standard file server technology, these benefits do not extend to remote offices connected over wide-area networks (WANs). When it comes to file sharing over WANs, standard file server protocols provide unacceptably slow response times while opening and writing files. This forces IT to make an unappealing choice: either live with reduced productivity due to poor network performance at remote offices or use replication schemes that waste storage and inhibit global collaboration.
However, a new class of product known as wide-area file services (WAFS) holds the promise of solving the shared data problem for distributed engineering organizations without sacrificing performance. Let's take a look at WAFS systems and the technology capabilities they need to deliver.
File Sharing Dilemmas
All major file sharing protocols, including NFS (Network File System, for Unix/Linux environments), CIFS (Common Internet File System, for Windows environments), and IPX/SPX (Internetwork Packet Exchange/Sequenced Packet Exchange, for Novell environments), were designed for LAN environments where clients and servers were located in the same building or campus.
The assumption that the client and the server would be in close proximity led to a number of design decisions that do not scale across WANs. For example, these file sharing protocols tend to be rather "chatty", which means that they send many remote procedure calls (RPCs) across the network to perform operations.
For certain operations on a filesystem using the NFS protocol (such as an rsync of a source code tree), almost 80% of the RPCs sent across the network can be access() RPCs, while the actual read() and write() RPCs typically comprise only 8-10% of the RPCs. Thus 80% of the work done by the protocol is simply spent trying to determine if the NFS client has the proper permissions to access a particular file on the NFS server, rather than actually moving data.
In a LAN environment, these RPCs do not impact performance significantly, but when combined with the high latency typical of WANs, these RPCs can be deadly to performance. Worse, remote clients often end up timing out and retransmitting the RPCs, compounding the inefficiency. Furthermore, because data movement RPCs make up such a small percentage of the communication, increasing network bandwidth will make no difference to the aggravated end user.
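The effect of chattiness can be seen with a back-of-envelope model: when most RPCs are synchronous, each one costs a full round trip, so total time is dominated by latency rather than bandwidth. The sketch below is illustrative only; the RPC count and timings are assumptions, not measurements from any product.

```python
# Back-of-envelope model of protocol "chattiness" over a high-latency link.
# All numbers here are illustrative assumptions, not measured values.

def transfer_time(num_rpcs, rtt_s, data_bytes, bandwidth_bps):
    """Each synchronous RPC costs one round trip; data also costs bandwidth."""
    return num_rpcs * rtt_s + (data_bytes * 8) / bandwidth_bps

rtt_lan, rtt_wan = 0.0005, 0.060   # 0.5 ms LAN vs. 60 ms WAN round trip
rpcs = 1000                        # assumed RPCs issued while opening a file tree
data = 5 * 1024 * 1024             # 5 MB of actual file data
t1 = 1.544e6                       # T1 bandwidth in bits/s

print(f"LAN: {transfer_time(rpcs, rtt_lan, data, t1):.1f} s")
print(f"WAN: {transfer_time(rpcs, rtt_wan, data, t1):.1f} s")
```

Note that the bandwidth term is identical in both cases; the roughly one-minute difference comes entirely from the latency term, which is why adding bandwidth alone cannot help.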
Various solutions have been proposed to the WAN file sharing problem, including replicating file copies and implementing distributed filesystems, but neither approach has provided a complete solution. Enterprise content delivery networks (eCDNs) tried to mitigate this problem by caching copies of files at each remote office. But eCDNs, like web caching infrastructure, only provide a read-only copy of data at the remote office. If remote office users wanted to modify the file, they either had to go across the WAN to access the original copy and incur a major performance penalty, or update the local copy and create multiple, out-of-sync versions of the same file.
Filesystems developed over the last 15 to 20 years, such as AFS, attempted to solve the WAN file sharing problem using a distributed filesystem architecture, which unites disparate file servers at remote offices into a single logical filesystem. However, these technologies required substantial changes in IT architecture to work properly.
In particular, filesystem-based technologies require remote-office applications to use entirely new protocols because they do not export data using industry-standard protocols such as NFS or CIFS. With over 1 billion computers deployed in the world that access data using either CIFS or NFS, such a solution is untenable. Further, these technologies depend on "owning" the data store at the data center, which is equally impractical given the billions of dollars invested in current file server and NAS infrastructure.
The bottom line is that for any solution to the wide area file sharing problem to gain traction, it must be able to integrate itself with existing infrastructure rather than requiring new infrastructure to be built.
In spite of the failures of both caching technologies like eCDNs and distributed filesystems to address the central issues in WAN file sharing, these technologies do provide important components for solving the WAN file sharing problem. New wide area file services (WAFS) products combine distributed file systems with caching technology to allow real-time, read-write access to shared file storage from any location, while also providing interoperability with standard file sharing protocols such as NFS and CIFS.
WAFS products enable transparent worldwide design collaboration on the same data set, without complicated replication schemes or slow network performance. WAFS products will cache files in a read-write mode at remote locations, thus speeding up data access for remote users tremendously. WAFS enables LAN semantics for file access to be extended to the entire enterprise.
WAFS systems (Figure 1) usually consist of edge file gateway (EFG) appliances, which are placed at remote offices, and one or more central server (CS) appliances that allow storage resources to be accessed by the EFGs.
Figure 1: Diagram of a typical WAFS system.
Each EFG appears as a local fileserver to remote office users. Together, the EFGs and CS implement a distributed file system and communicate using a WAN-optimized protocol. This protocol is translated back and forth to NFS and CIFS at either end, to communicate with centralized storage and remote user applications.
Key Design Issues
When building a WAFS system, three key design questions must be addressed:
- What are the features of the optimized protocol run between the EFGs and CSes across the WAN?
- What specific optimizations have to occur in the system design for reading files?
- What is the specific architecture for writing files and moving updates back to central storage resources?
The protocol used between the remote offices and the datacenter should incorporate file-aware differencing technology, data compression, streaming, and other technologies to improve performance and efficiency in moving data across the WAN. File-aware differencing is especially important because it can detect which parts of a file have changed, and only move those parts across the WAN. Furthermore, if pieces of a file have been rearranged, only offset information will be sent, rather than the data itself. These techniques result in tremendous, order-of-magnitude bandwidth reduction across the WAN and time savings in accessing files by remote users.
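The differencing idea described above can be sketched as a simple block-level comparison, in the spirit of rsync-style delta encoding: hash fixed-size blocks of the old file, then send only offsets for blocks the other side already has and literal data for blocks it does not. This is a minimal illustration of the principle, not the protocol any particular WAFS product uses; the block size and operation format are assumptions.

```python
import hashlib

BLOCK = 4096  # assumed block size for this sketch

def block_hashes(data: bytes) -> dict:
    """Map the hash of each fixed-size block to its offset in the old file."""
    return {hashlib.sha1(data[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(data), BLOCK)}

def delta(old: bytes, new: bytes) -> list:
    """Emit ('copy', old_offset) for blocks the receiver already holds,
    and ('data', bytes) for blocks that must actually cross the WAN."""
    known = block_hashes(old)
    ops = []
    for i in range(0, len(new), BLOCK):
        chunk = new[i:i + BLOCK]
        if hashlib.sha1(chunk).hexdigest() in known:
            ops.append(("copy", known[hashlib.sha1(chunk).hexdigest()]))  # offset only
        else:
            ops.append(("data", chunk))                                   # literal data
    return ops
```

If a file's blocks have merely been rearranged, every operation is a "copy" carrying a few bytes of offset information, which is the source of the order-of-magnitude bandwidth savings described above.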
Read performance is governed by the ability of the EFG to cache files at the remote office and to serve cached data to users while minimizing the overhead of expensive kernel-user communication and context switches, in effect enabling the cache to act just like a high-performance file server. If the WAFS system is architected correctly, the remote cache mirrors the data center exactly: only a few WAN round trips are needed to check credentials and the availability of file updates, while read requests themselves are satisfied from the local cache. Thus, regardless of how many NFS/CIFS read RPCs arrive at the EFG, they should translate into hardly any WAN traffic.
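The read path just described can be sketched as follows. This is a toy model, not any vendor's implementation: the class and method names are illustrative, and the "validation" call stands in for whatever lightweight freshness check a real system performs.

```python
class EdgeFileGateway:
    """Toy read path: serve reads from the local cache; only a lightweight
    version check crosses the WAN. All names here are illustrative."""

    def __init__(self, central_server):
        self.cs = central_server
        self.cache = {}          # path -> (version, data)
        self.wan_round_trips = 0

    def read(self, path):
        self.wan_round_trips += 1              # one RPC: validate cached version
        version = self.cs.current_version(path)
        entry = self.cache.get(path)
        if entry and entry[0] == version:      # warm hit: no data crosses the WAN
            return entry[1]
        self.wan_round_trips += 1              # cold miss: fetch the file once
        data = self.cs.fetch(path, version)
        self.cache[path] = (version, data)
        return data
```

After the first (cold) access, any number of subsequent read RPCs from local clients cost at most one validation round trip each, which is the behavior the warm-access numbers in Figure 2 reflect.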
Unlike read performance, write performance is governed by the write caching mechanism that is used in the WAFS system. The two main types of mechanisms are known as write-back and write-through.
In a write-through approach, data written to a file is sent immediately over the WAN to the datacenter, while in write-back the data is written to the EFG and then sent over the WAN. Either approach, in isolation, has certain associated tradeoffs. Write-through is very safe, because all file updates are stored in the datacenter, but it suffers from poor performance and does not survive WAN disruptions. Write-back is very fast, but is riskier if the EFG fails before updates are sent to the datacenter.
The optimal combination involves using a write-back approach for maximum performance, coupled with synchronous logging of file updates to persistent storage, ensuring no data loss in case of file system crashes or WAN outages. Write-back caching is typically very difficult to implement correctly with logging, but has superior performance and reliability characteristics.
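A minimal sketch of this combination, write-back with a synchronous local log, might look like the following. Assumptions: the log format, class names, and the callback that models the WAN transfer are all invented for illustration; a real implementation would log at a much lower level than JSON lines.

```python
import json
import os

class WriteBackCache:
    """Sketch of write-back with synchronous logging: each update is appended
    to a local log and fsync'd before write() returns, then drained to the
    datacenter asynchronously. Names and log format are illustrative."""

    def __init__(self, log_path, send_to_datacenter):
        self.log_path = log_path
        self.send = send_to_datacenter   # models the WAN transfer
        self.pending = []

    def write(self, path, data: bytes):
        record = json.dumps({"path": path, "data": data.hex()})
        with open(self.log_path, "a") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())         # update survives a crash before it ships
        self.pending.append((path, data))

    def drain(self):
        """Background step: push logged updates over the WAN."""
        while self.pending:
            self.send(*self.pending.pop(0))
```

The fsync before returning is what converts write-back from "fast but risky" to "fast and safe": a crashed EFG can replay its log on restart, so no acknowledged update is ever lost.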
Figure 2 depicts representative performance for both read and write operations over a WAFS system and over a standard wide-area network. Opening a 5-Mbyte file over the WAN takes about 122 seconds, while a high-performing WAFS system will fetch the file in 11 seconds the first time it is accessed, and at essentially LAN speed on subsequent (warm) accesses because the file is cached locally. Writing a 2-Mbyte file over the WAN takes 81 seconds; a write-back WAFS system achieves the same result in about 4 seconds.
Figure 2: Diagram showing performance of a WAFS system.
In all the tests shown in Figure 2, the WAN latency was 60 ms (representative of actual conditions between San Francisco and Houston) and the bandwidth allotted was 1.544 Mbit/s (a T1 line). Clearly, WAFS products can enable near-LAN-speed read-write access to data in a WAN environment.
Data Coherency and Consistency
Data coherency and data consistency are important properties of WAFS implementations, because they ensure that file updates are safe (cannot be overwritten) and available throughout the network of edge devices, features that are crucial for supporting engineering collaboration.
Data coherency means that file updates (writes) from any one remote office are guaranteed never to conflict with updates from another remote office. Properly designed WAFS implementations guarantee this by maintaining a system of file leases, where a lease is a particular access privilege to a file granted to a remote office.
If a user at a remote office wants to write to a cached file, the EFG at that office must first obtain a "write lease", i.e., the right to modify the document. WAFS solutions guarantee that at any time only one remote office holds the write lease on a particular file, thus guaranteeing coherence. When a user at another office tries to open the file, the EFG that holds the write lease flushes its data first and can optionally give up the lease if there are no active writers to the file. This mechanism ensures that writers at different offices do not collide with each other and that file updates are safe.
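The lease transfer described above can be sketched as a small state machine. This is a toy model of the coherence rule, "at most one write lease per file, flush before handoff", with illustrative names; real implementations must also handle timeouts, WAN partitions, and lease renewal.

```python
class LeaseManager:
    """Toy coherence sketch: at most one office holds the write lease on a
    given file at any time. flush_fn models the current holder pushing its
    cached dirty data before the lease changes hands. Names are illustrative."""

    def __init__(self):
        self.write_leases = {}   # path -> office currently holding the lease

    def acquire_write(self, path, office, flush_fn):
        holder = self.write_leases.get(path)
        if holder is not None and holder != office:
            flush_fn(holder, path)       # previous holder flushes its updates
        self.write_leases[path] = office  # then the lease is transferred
        return office
```

Because the flush always happens before the transfer, a new writer never sees, or overwrites, stale data, which is exactly the coherence guarantee the text describes.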
Data consistency implies that file updates made at one office are always available enterprise-wide, and well-architected WAFS systems make them available immediately after the update is made. Again, this is supremely important for collaboration, because remote designers want to be sure they are working on the most current version of any file, no matter where it was last modified.
Any WAFS implementation should be capable of handling large files and large numbers of files (particularly important for CAD), as well as large numbers of concurrent users. A WAFS product that cannot scale beyond a few hundred MB of files or 10 users is not of much use.
The choice between write-through and write-back architectures mentioned earlier figures into how well a WAFS implementation scales. A write-through WAFS implementation cannot scale to enterprise levels, as synchronous data transfers across the WAN quickly become a bottleneck as the number of files and users increases. Systems based on write-back architectures that incorporate differencing and compression technologies scale much better.
Additionally, scalable systems recognize temporary files that applications (e.g., Microsoft Word) may create during the normal course of operation and do not send these files over the WAN, instead sending only the final revision, which accelerates performance.
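Temp-file recognition usually amounts to pattern matching on filenames. The sketch below shows the idea with a few common patterns (Word's `~$` lock files, generic `.tmp` files, editor swap files); the pattern list is an illustrative assumption, not any product's actual filter.

```python
import fnmatch

# Patterns that commonly indicate application temp files (illustrative list).
TEMP_PATTERNS = ["~$*", "*.tmp", ".~lock.*", "*.swp"]

def should_replicate(filename: str) -> bool:
    """Only final revisions cross the WAN; temp files stay on the edge."""
    return not any(fnmatch.fnmatch(filename, p) for p in TEMP_PATTERNS)
```

Suppressing these files matters more than it might seem: office applications can rewrite their temp files on every keystroke or autosave, so filtering them removes a large volume of churn from the WAN.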
Revision Control Systems
One big question that gets asked about WAFS systems is whether they compete with revision control systems. In fact, revision control systems are one of the key areas to benefit from WAFS, and the two solutions are complementary.
Any revision control system that stores its data in flat files, even if it stores metadata in a separate database, will see a tremendous improvement in performance using WAFS. To properly support revision control systems, a WAFS implementation must provide both data coherency and data consistency.
WAFS products are capable of exporting entire revision control repositories to remote sites, allowing the revision control system manager to run locally at the remote site, dramatically speeding up check-in, check-out, and build processes. As locks are released, the changed files are propagated to the server immediately, ensuring that changes are reflected back to the datacenter in a consistent and coherent manner.
For companies with distributed teams, sharing data between a number of remote offices in real time has been a challenge due to the poor performance and reliability of file sharing protocols such as NFS and CIFS when used over wide-area networks. Workarounds to the problem have had undesirable consequences. Emerging WAFS technologies, on the other hand, offer a solution that enables enterprises around the world to share data with the same performance, reliability, interface, and semantics they would have if they were sharing data on a local-area network.
About the Author
Vinodh Dorairajan is a senior software engineer at Tacit Networks. Vinodh holds a master's degree in computer applications from Bharathidasan University, India, and can be reached at email@example.com.