IBM: Storage Tank–fleetingly code-named Golden Retriever


IBM plans open-source storage strategy
By Stephen Shankland
December 20, 2002, 3:58 PM PT

SAN JOSE, Calif.–To encourage the broadest possible support for its forthcoming “Storage Tank” technology, IBM will release an open-source version of the software needed to let servers tap into the next-generation storage system.

Big Blue is working with an undisclosed open-source group on the software and will release the code when the product is generally available in 2003, said David Pease, manager of storage software at IBM’s Almaden Research Center and leader of the 5-year-old Storage Tank project. In addition, IBM plans to publish the communication method fundamental to the next-generation storage project.

The collaborative approach is the most recent example of IBM trying to capitalize on the momentum of the open-source movement. The company also backs the Linux operating system, the Globus Toolkit for supercomputing networks, and several other projects of the collaborative programming movement.

IBM has tapped into the open-source community as a way to speed the development and adoption of technologies it favors and to give itself more cachet with in-the-know programmers. The company devotes many of its own resources to open-source projects, most notably its Linux Technology Center.

Storage Tank–fleetingly code-named Golden Retriever–is a technology designed to get more use out of existing storage systems and make them easier to manage. With Storage Tank, existing systems can be linked so that larger amounts of data can be stored.

The technology works by using a different way of keeping track of descriptive information–“metadata” such as physical locations, file sizes or access permissions–that accompanies the actual content within the files. Where most storage systems include this metadata in the storage system itself, Storage Tank spreads the information across a group of metadata servers, lower-end dual-processor Intel servers running Linux.

The approach offers several advantages. For one thing, it can keep track of a lot of files. IBM’s goal is for the system to control as many as a billion files, said Jai Menon, an IBM fellow and storage research manager at Big Blue’s Almaden Research Center.

In addition, files of a certain type can be automatically moved to a particular storage “pool.” For example, video and audio streaming files can be physically stored automatically on a particular storage device suited to that task, while infrequently used text files can be stored on a device with lower performance.

The use of the pools in conjunction with preset policies will let administrators automate tasks such as data backup, Menon said.
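The pool-placement idea described above can be sketched as a small rule table. This is only an illustration: the article does not describe IBM's actual policy format, so the rule structure, the pool names, and the `choose_pool` function are all hypothetical, and the sketch matches on file type alone (it ignores how often a file is used).

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PlacementRule:
    """Hypothetical rule: files with one of these suffixes go to this pool."""
    suffixes: frozenset
    pool: str


# Hypothetical policy table; pool names are invented for illustration.
RULES = [
    PlacementRule(frozenset({".mpg", ".avi", ".mp3"}), "streaming_pool"),
    PlacementRule(frozenset({".txt", ".log"}), "low_cost_pool"),
]
DEFAULT_POOL = "general_pool"


def choose_pool(filename, rules=RULES, default=DEFAULT_POOL):
    """Return the name of the pool whose rule first matches the file's suffix."""
    suffix = PurePosixPath(filename).suffix.lower()
    for rule in rules:
        if suffix in rule.suffixes:
            return rule.pool
    return default


if __name__ == "__main__":
    for name in ("clips/demo.MPG", "notes.txt", "orders.db"):
        print(name, "->", choose_pool(name))
```

An administrator-facing system would presumably also key rules on attributes such as access frequency or owner; the first-match-wins ordering here is just one simple design choice.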

The system also means the same files can be accessed directly by different operating systems. Currently, because most operating systems have their own ways of storing files, that’s difficult to do without using file system software from a company such as Veritas Software.

But for servers with multiple operating systems to tap into Storage Tank, a piece of software called an agent must be running to communicate with the metadata server. IBM plans to release a sample agent program as open-source software, Pease said.

Releasing the example software will permit others to write agents to tap into Storage Tank, Pease said. In addition, IBM will describe the protocols the agents use to communicate with the metadata servers, allowing others to build their own metadata servers if they wish, he said.

IBM hopes its strategy will make Storage Tank widely used. “Our goal for Storage Tank is nothing short of world domination,” Pease said, only partly joking.

IBM’s Almaden labs are working on several technologies besides Storage Tank and showed off several of them during a recent media tour of the facility.

IBM expects only about 60 percent to 65 percent of its research lab projects to reach the product stage, Menon said. “We don’t want to be more successful, because if we’re not being crazy enough, we’re not being challenged enough,” he said.

IBM has several other projects stewing at the lab:

• A project code-named SledRunner is designed to give priority access to the programs that need a fast response time from hard drive arrays. Often low-priority jobs take up a storage system’s time when those jobs could be postponed a few fractions of a second. The SledRunner name stems from the acronym SLE (service level enforcement).

• IBM is taking a crack at a technology that it acknowledges has flopped in the past–arranging blocks of data on a hard drive in the order it will be needed, which minimizes the time the hard drive has to spend moving mechanical components to grab the next tidbit of information. This project, called Automatic Locality Improving Storage (ALIS), relies on arranging data after monitoring the order in which it’s actually used as a computing process runs.

• For a future version of Storage Tank, IBM plans a front-end “gateway” so that remote computers can tap into a tank over a network. It’s similar in concept to EMC’s Celerra product or Network Appliance’s products in a deal with Hitachi Data Systems.

• IBM is working on a “semantic file system,” software to make it easier to find a specific file. Current file systems store files in an ever-more-complex cascade of directories, but a better method that indexed the contents of files, for example, could help people find what they need faster–“a sort of Google for enterprise file systems,” Menon said.

• “Differential remote copy” is a technology to speed up the process of copying data from one storage system to a distant site, where an identical copy protects against disasters such as earthquakes. The technique sends only the data that has changed since the last update, Menon said.
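The last item in the list above, sending only the data that has changed since the last update, can be illustrated with a short sketch. The article does not say how IBM detects changes, so everything here is an assumption: the fixed block size, the use of SHA-256 digests to compare blocks, and the function names are all hypothetical. A real storage system might instead track modified blocks as they are written.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_digests(data, block_size=BLOCK_SIZE):
    """Digest of every block, as the remote site might report them."""
    return [hashlib.sha256(b).digest() for b in split_blocks(data, block_size)]


def compute_delta(local, remote_digests, block_size=BLOCK_SIZE):
    """Return {block_index: block} for blocks that are new or differ remotely."""
    delta = {}
    for i, block in enumerate(split_blocks(local, block_size)):
        is_new = i >= len(remote_digests)
        if is_new or hashlib.sha256(block).digest() != remote_digests[i]:
            delta[i] = block
    return delta


def apply_delta(remote, delta, new_length, block_size=BLOCK_SIZE):
    """Patch the remote copy with the changed blocks, then trim to new_length."""
    blocks = split_blocks(remote, block_size)
    for i, block in sorted(delta.items()):
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    old = b"AAAABBBBCCCC"    # copy already held at the distant site
    new = b"AAAAXXXXCCCCDD"  # current local data
    delta = compute_delta(new, block_digests(old))
    print(sorted(delta))     # indices of the blocks that must be sent
    assert apply_delta(old, delta, len(new)) == new
```

In this example only two of the four blocks cross the network, which is the saving the technique is after.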

Distributed metadata searching system and method
Patent Number: 6,434,548

A system and method of distributed metadata searching is disclosed. The present invention permits an extension of the searching and retrieval functions of existing Internet web search engines by utilizing computational resources embodied in user computer systems and search browsers. By distributing the searching and scanning functions to the user level, the present invention reduces the computational and communications burden on Internet web search engines and crawlers, resulting in lower computational resource utilization by Internet search engine providers. Given the exponential growth rate currently being experienced in the Internet community, the present invention provides one of the few methods by which complete searches of this vast distributed database may be performed. The present invention permits embodiments incorporating a Search Manager (1001) further comprising a Service Results Manager (1013), User Profile Database (1012), Service Manager (1013), and Service Database (1014); a Light Weight Application SCANNER (1002); and a Search Engine (1008). These components may be augmented in some preferred embodiments via the use of a Search Browser (1003), Internet Communications (1004), Web Site(s) (1005), Web Crawler(s) (1006), and a Repository Database (1007).

More Info

IBM Storage Tank…/storage_tank/

NAS over Storage Tank…/index.shtml

Distributed Storage Tank…/index.shtml

Posted in Technology Review.
