United Business Media EE Times





 

TARGETING XML PROCESSING FOR THE WEB








EE Times



Chris Brandin has spent 25 or so of his 49 years in the computer industry. And for most of that time the chief technology officer and chairman of startup Neocore LLC (Colorado Springs, Colo.) has been acutely aware of the need for a new computing paradigm to reflect the changing Net-centric and network-processing application base. That base, Brandin said, is reaching a critical point, especially with the emergence of standardized methodologies for organizing and parsing textual and graphical information such as the Extensible Markup Language (XML).

"I happen to think that the emergence of XML is one of the most profound and important events, not just in the current computing environment, but during the entire short history of modern computing," said Brandin. For the first time, he said, computers can now communicate the way humans do: combining information with self-describing metadata that communicates the nature of the information being transmitted. "When I meet someone for the first time I do not just say my name, 'Chris,' the way a computer would," Brandin said. "Rather, I say, 'My name [the metadata] is Chris [the data].' "
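Brandin's "My name is Chris" example can be made concrete with a few lines of Python. This is only an illustrative sketch: the `person`, `name` and `age` element names are invented here, and the standard-library parser stands in for whatever tooling a real application would use. Each tag plays the role of the metadata and the text it encloses is the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are invented for this example.
doc = "<person><name>Chris</name><age>49</age></person>"

def describe(xml_text):
    """Return (metadata, data) pairs: each tag name with the text it labels."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]

for label, value in describe(doc):
    print(f"my {label} [the metadata] is {value} [the data]")
```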

But the benefits we derive from XML do not come without costs. The language is certainly easy to use and to adapt for a wide variety of applications, and it's independent of the platform, data stream or communications protocol within which it is packaged, he said. But XML is also costly in terms of memory space, code size and packet length, and requires much more computing power to process than more efficient languages such as C or C++.
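The packet-length cost is easy to see in a small comparison. The sketch below assumes a made-up two-field order record (the field names and the fixed binary layout are not from Neocore or any XML standard) and encodes it once as XML text and once as two packed 32-bit integers.

```python
import struct

def xml_record(order_id, quantity):
    """Encode the record as self-describing XML text (hypothetical tag names)."""
    text = f"<order><id>{order_id}</id><quantity>{quantity}</quantity></order>"
    return text.encode("utf-8")

def binary_record(order_id, quantity):
    """Encode the same record as two unsigned 32-bit integers, network byte order."""
    return struct.pack("!II", order_id, quantity)

xml_bytes = xml_record(1001, 25)
bin_bytes = binary_record(1001, 25)
print(len(xml_bytes), "bytes as XML versus", len(bin_bytes), "bytes packed")
```

The packed form is far smaller, but it can only be read by a receiver that already knows the layout, which is exactly the trade-off the self-describing tags are buying.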

Neocore, the company that Brandin founded with chief executive officer Tim Dix, has come up with a solution, or rather a set of solutions, that derive from concepts collectively known as parallel associative computing. But they go further, incorporating new techniques for pattern matching, searching and recognition, and symbolic representation.

Many of the ideas that Brandin, a graduate in physics and math from the Rochester Institute of Technology and the University of Maryland, has come up with derive from his broad experience in building large-scale networked parallel computer systems for phone companies, the Pentagon's Norad, utilities and the New York Stock Exchange. That, and his acknowledged fascination with computing architectures at the extreme edges of the application spectrum.

"In the mainstream, we have been using the same basic Von Neumann-like architectures since the 1940s," Brandin said. "And while we have come up with more sophisticated methods of making that architecture go faster and work more efficiently, it is basically unchanged." In the mainstream, in businesses, on the desktop and in many embedded applications, he said, what has changed is not the processor, but the way we define the problems and the application software needed to accomplish tasks. In the process much has been sacrificed in order to get results that, while not exactly what we wanted or needed, were close enough.

It is clear, Brandin said, that existing computer architectures, which are good at number crunching and at applications that can be bent to take advantage of that, are running out of steam in the new Net-centric environment, where they do nothing to break a number of bottlenecks on the way to nearly ubiquitous, Web-enabled and network-based computing.

"I see two sets of problems, one on its way to being solved and another still to be addressed adequately," Brandin said. "One is the issue of data and information movement, which is being addressed by a new generation of definitely non-Von Neumann network processors that are being deployed into the network and communications fabric in switches and routers." The other problem, which still has no adequate solution, is closer to home and to human experience. It involves the way the text and visual information contained in Web pages and in business-to-business transactions is processed.

HTML and XML, for that matter, are hypertext techniques for linking information by associations, rather than in the strictly hierarchical file structures used in most computer systems. And the addition of companion specifications to XML, such as XLink and XPath, ensures that the World Wide Web will become more associative in structure rather than less. "Most of the traditional computing solutions to accelerating XML are at the limit of the capabilities of the underlying architectures and will not scale up any further," Brandin said. "What we need is a compute engine that is more like the human brain, that is associative in nature."

Although there has been considerable research into associative processing and related technologies, such as content-addressable memory, such techniques have not made it out into the mainstream, Brandin said, mainly because of the silicon and processor overhead.

So he and his small coterie of engineers and computer scientists at Neocore have extended and adapted such techniques and combined them with new ways of representing data, and of searching and recognizing it. At the heart of the technology they are incorporating into an XML processor, the team is developing the idea of "pattern-based associative processing."

In pure associative processing, retrieval is based on returning content that is an exact match of a search query. In Neocore's pattern-based scheme, by contrast, what is searched for is a representation of the search object containing only the essential elements of the object that make it unique.

"A problem with traditional associative schemes is that the search parameters could vary in length, so performance and retrieval time depended on the size and complexity of the starting search object," said Brandin. What he and his co-researchers have done is develop the concept of "iconization."

In the Neocore approach an icon is a standardized 64-bit representation of the information requested, no matter what the length or complexity: a gestalt of the original object or text string that contains unique identifiers that an associative processor can use to retrieve the actual data directly.
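The article does not say how Neocore actually derives an icon, so the following is only a stand-in: it uses an ordinary 64-bit hash (BLAKE2b truncated to 8 bytes) to show the one property described here, namely that inputs of any length map to a fixed-size 64-bit value. The function name `make_icon` is hypothetical.

```python
import hashlib

def make_icon(search_object: str) -> int:
    """Map a string of any length to a fixed 64-bit integer.

    Assumption: a truncated BLAKE2b hash is used purely as a stand-in;
    it is not Neocore's iconization method, which the article does not detail.
    """
    digest = hashlib.blake2b(search_object.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

short_icon = make_icon("Chris")
long_icon = make_icon("/catalog/item/description " * 1000)
print(f"{short_icon:016x} {long_icon:016x}")  # both print as 16 hex digits
```

Because every icon has the same width, the cost of comparing or looking one up no longer depends on the size of the original search object, which is the problem Brandin attributes to traditional associative schemes.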

Built into an XML Commerce Server the startup is developing for deployment by Bowstreet Software Inc. (Portsmouth, N.H.), Neocore's pattern-based associative processor has three key elements: the associative processor, which analyzes the query, extracts the essential defining elements and performs the searches; a generator of the 64-bit icons that stand in for the actual search objects; and an associative memory controller, which reconfigures standard RAM to operate as an associative memory array.

In the traditional approach to associative memory, data, rather than being identified by a particular location, is pinpointed by properties it contains. In the standard structure, to retrieve a word, a search key must be presented that represents particular values of all or some of the bits in the word. But in the Neocore scheme, the memory is searched using the essential defining elements incorporated in the 64-bit icon.
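The traditional scheme, in which a key specifies "all or some of the bits in the word," can be simulated in software with a key and a mask. This is a rough sketch with invented names and sample values; real content-addressable hardware compares every stored word in parallel, whereas the loop below checks them one at a time.

```python
def cam_search(words, key, mask):
    """Return the positions of stored words that agree with `key`
    on every bit that is set in `mask`; other bits are ignored."""
    return [i for i, word in enumerate(words) if ((word ^ key) & mask) == 0]

# Hypothetical 8-bit memory contents.
words = [0b10100001, 0b10101111, 0b01100001]

# Find every word whose upper four bits are 1010, whatever the lower bits hold.
hits = cam_search(words, key=0b10100000, mask=0b11110000)
print(hits)
```

The caller never supplies an address; the answer is the set of locations whose contents match, which is the sense in which the memory is addressed by content rather than by location.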

"I could not have invented a better application for our technology than XML," said Brandin. "When we looked at XML initially, we realized there [was] a wide range of things that could be done with XML, if it were supported in hardware, that are unachievable with any current technology."

Today only a few alternatives are available to directly process XML in anything approaching realistic performance. "These techniques are OK on small documents, but on real-world-size documents, performance drops off precipitously, almost geometrically," Brandin said. "Using a document object model [DOM] schema, it took several minutes to absorb a 10-Mbyte document and several hundred seconds to run a 1,000-query search." On a 440-Mbyte document, about 20 hours would have been required for a similar 1,000-query operation. With Neocore's pattern-based associative processor methodology, Brandin said, an XML server can do the same initial document-acquisition step in a few seconds; running a 1,000-query job takes a hundredth of a second, no matter what the size of the document.
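One way to picture why query time could be independent of document size is to separate a one-time indexing pass from the lookups that follow. The sketch below is an assumption-laden illustration, not Neocore's design: it walks a small invented document once, keys each element path by a 64-bit hash standing in for an icon, and then answers queries with a single dictionary lookup. Hash collisions are ignored for brevity.

```python
import hashlib
import xml.etree.ElementTree as ET

def icon(text):
    """Stand-in 64-bit icon: a truncated BLAKE2b hash (an assumption)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def build_index(xml_text):
    """Single pass over the document: map icon(path) to the text found there."""
    index = {}

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        if elem.text and elem.text.strip():
            index.setdefault(icon(path), []).append(elem.text.strip())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index

def query(index, path):
    """One dictionary lookup per query, however large the document was."""
    return index.get(icon(path), [])

# Hypothetical catalog document.
doc = "<catalog><item><sku>A1</sku></item><item><sku>B2</sku></item></catalog>"
index = build_index(doc)
print(query(index, "/catalog/item/sku"))
```

The indexing pass still grows with the document, which matches the article's "initial document-acquisition step"; only the per-query cost stays flat.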

Brandin's goals for the pattern-based associative processing techniques extend far beyond the XML processor to a wide range of concerns facing builders of network-based systems: complete content scanning of network packets at full line speed, 3-D content searches, network security, packet analysis and URL filtering, among others.

"There are so many ways that this technology can be used," he said, "that the problem for us is deciding what to focus on first: just go after the few applications we can handle, and worry and make decisions about other applications later. The problems we see will still be there and other than quick-fix temporary fixes, we don't see anything coming along that will solve them better than the approach we have developed."












All materials on this site Copyright © 2008 TechInsights, a Division of United Business Media LLC All rights reserved.