An indexing mechanism (indexer). The purpose of the indexer is to walk through HTTP

Posted on 2003/5/13 by m1bxd

Our search software consists of two parts. The first is an indexing mechanism (indexer). The purpose of the indexer is to walk through HTTP, FTP, NEWS servers or local files, recursively grabbing all the documents and storing words and meta-data about those documents in an SQL database in a smart and efficient manner.
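
To make the "words plus meta-data in an SQL database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not mnoGoSearch's schema or code: the table names, the columns and the `build_index` / `search` helpers are all assumptions made up for illustration, and the "crawl" is just a dictionary of URL to text.

```python
import re
import sqlite3


def build_index(docs):
    """Store documents and per-document word counts in an in-memory database.

    `docs` maps a URL to its text. The two-table schema is hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, url TEXT UNIQUE, length INTEGER)")
    db.execute("CREATE TABLE word (word TEXT, doc_id INTEGER, hits INTEGER)")
    for url, text in docs.items():
        # Meta-data about the document goes into the doc table.
        cur = db.execute("INSERT INTO doc (url, length) VALUES (?, ?)", (url, len(text)))
        doc_id = cur.lastrowid
        # Count each lower-cased word once per document.
        counts = {}
        for w in re.findall(r"[a-z0-9]+", text.lower()):
            counts[w] = counts.get(w, 0) + 1
        db.executemany(
            "INSERT INTO word VALUES (?, ?, ?)",
            [(w, doc_id, n) for w, n in counts.items()],
        )
    db.commit()
    return db


def search(db, term):
    """Return the URLs containing `term`, most hits first."""
    rows = db.execute(
        "SELECT doc.url FROM word JOIN doc ON doc.id = word.doc_id "
        "WHERE word.word = ? ORDER BY word.hits DESC, doc.url",
        (term.lower(),),
    )
    return [row[0] for row in rows]
```

A real indexer would also fetch the documents, follow links and handle stop words; this only shows the storage and lookup side.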

Crawler / spider / meta

http://www.mnogosearch.org/

The Second Superpower Rears its Beautiful Head

Posted on 2003/5/8 by m1bxd

http://cyber.law.harvard.edu/people/jmoore/secondsuperpower.html

http://www.theregister.co.uk/content/6/30087.html

Blog news and link – Blogchalking project

Posted on 2003/5/8 by m1bxd

http://www.blogchalking.tk/

http://blogmatcher.com
http://blog.iloha.net/lab/faq.php

http://www.dot.tk/

disk space left – command line

Posted on 2003/5/8 by m1bxd

du -h | grep -v -e ".*/.*/"

Biggest files
find /usr/local -xdev -printf "%s %p\n" | sort -n | tail -100
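
For machines where find has no -printf, roughly the same "biggest files" list can be sketched in Python. This is an approximation and not a drop-in replacement: the `largest_files` name is made up here, it lists regular directory entries only (find also prints directories), and it does not stop at filesystem boundaries the way -xdev does.

```python
import heapq
import os


def largest_files(root, count=100):
    """Return up to `count` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are measured, not followed.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    for size, path in largest_files("/usr/local"):
        print(size, path)
```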

infoanarchy.org

Posted on 2003/5/8 by m1bxd

This wiki contains information related to file sharing, copyright, the gift economy, cyber liberties, peer to peer research, information tools, and similar topics which are discussed on infoAnarchy. All content contributed is in the public domain except where otherwise noted.

http://www.infoanarchy.org/wiki/wiki.pl?Homepage

xodp similar projects

Posted on 2003/4/7 by m1bxd

http://xodp.sourceforge.net/similarprojects.html

Different Google Searches

Posted on 2003/4/7 by m1bxd

http://www.google.com/help/operators.html

link:

The query [link:] will list webpages that have links to the specified webpage. For instance, [link:www.google.com] will list webpages that have links pointing to the Google homepage. Note there can be no space between the “link:” and the web page url.

Quick List
• cache:
• link:
• related:
• info:
• stocks:
• site:
• allintitle:
• intitle:
• allinurl:
• inurl:
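
The "no space after the colon" rule is easy to get wrong when building queries by hand, so here is a small sketch that assembles an operator query and a search URL. The `operator_query` and `search_url` helpers are hypothetical names, and the URL simply follows the www.google.com/search?q= pattern; it is an illustration, not an official API.

```python
from urllib.parse import urlencode


def operator_query(operator, value, extra_terms=""):
    """Join an operator and its value with no space, e.g. link:www.google.com."""
    value = value.strip()
    if not value or any(ch.isspace() for ch in value):
        raise ValueError("operator value must be non-empty and contain no spaces")
    query = f"{operator}:{value}"
    if extra_terms:
        # Ordinary search terms follow after a single space.
        query = f"{query} {extra_terms.strip()}"
    return query


def search_url(query):
    """Build a search URL with the query percent-encoded."""
    return "http://www.google.com/search?" + urlencode({"q": query})
```

For example, `operator_query("site", "www.w3.org", "semantic web")` gives the query string `site:www.w3.org semantic web`.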

How to backup Outlook categories

Posted on 2003/4/7 by m1bxd

http://www.slipstick.com/outlook/olcat.htm

Master Category List The master category list is not a separate file, but instead is part of the Windows Registry. Each user has a different category list

To back up the Master Category List in Outlook 97/98:

Run Regedit and go to HKEY_CURRENT_USER\Software\Microsoft\Office\8.0\Outlook and select the Categories key.

Choose Registry | Export Registry File to make a copy of the Categories branch of the registry.
To back up the Master Category List in Outlook 2000:

Run Regedit and go to HKEY_CURRENT_USER\Software\Microsoft\Office\9.0\Outlook and select the Categories key.

Choose Registry | Export Registry File to make a copy of the Categories branch of the registry.
To back up the Master Category List in Outlook 2002:

Run Regedit and go to HKEY_CURRENT_USER\Software\Microsoft\Office\10.0\Outlook and select the Categories key.

Choose Registry | Export Registry File to make a copy of the Categories branch of the registry.
You can use this exported branch to distribute a category list to other Outlook users. See the MSKB article How to Migrate Custom Categories to Other Users. CAUTION: Using a .reg file to propagate a category list does not update a user’s own list; instead it completely replaces it. I personally do not recommend this method, because it eliminates much of the utility of the Category feature for users. See the next section for what I think is a better method.

If you remove a category from the master list, any items marked with that category are not affected. In the Categories dialog box, that category is listed as “(not in master list).”

Also see:

OL2000 Custom Categories Are Not Transferred from Outlook 2000 to Outlook 2002
http://support.microsoft.com/?kbid=303937

Proxies

Posted on 2003/4/7 by m1bxd

http://www.proxyinfo.co.uk/nproxies.htm
http://www.all-nettools.com/pr.htm

The Semantic WWW by Tim Berners-Lee 1998

Posted on 2003/3/27 by m1bxd

Tim Berners-Lee
Date: September 1998. Last modified: $Date: 1998/10/14 20:17:13 $

Status: An attempt to give a high-level plan of the architecture of the Semantic WWW. Editing status: Draft. Comments welcome

http://www.w3.org/DesignIssues/Semantic

/Weblogs/Tools/

Posted on 2003/3/25 by m1bxd

http://directory.google.com/Top/Computers/Internet/On_the_Web/Weblogs/Tools/

nntp//rss

Posted on 2003/3/25 by m1bxd

Bridging the worlds of NNTP clients and RSS feeds, nntp//rss is an application that will enable you to use your existing favorite NNTP newsreader to read your information channels.
http://www.methodize.org/nntprss

Applications that bring together RSS feeds on your own computer:
http://www.larkfarm.com/rss_resources.htm
http://www.cincomsmalltalk.com/BottomFeeder/

Feedster is a search engine for what is called an “RSS Feed”

Posted on 2003/3/25 by m1bxd

Feedster is a search engine for what is called an “RSS Feed”. An RSS Feed is an XML tagged file which allows a website, news site or blog (actually any site) to provide to the world a list of its current contents. RSS feeds can contain all kinds of information from news to blog / weblog posts to stock quotes and more.
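
To show what "an XML tagged file listing current contents" looks like in practice, here is a small sketch that pulls titles and links out of a feed with Python's standard xml.etree.ElementTree. The sample feed and the `feed_items` helper are invented for illustration, and the sketch assumes the common RSS 0.9x/2.0 layout (item elements with title and link children, no XML namespaces); RSS 1.0 (RDF) feeds use namespaces and would need different lookups.

```python
import xml.etree.ElementTree as ET

# A made-up feed in the common <rss><channel><item> layout.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in feed_items(SAMPLE_FEED):
        print(title, link)
```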

http://feeder.ripcord.co.nz/
http://www.feedster.com/

The PA is dead? http://dear_raed.blogspot.com

Posted on 2003/3/21 by m1bxd

http://dear_raed.blogspot.com

News agencies lose battle on the internet
http://news.independent.co.uk/digital/news/story.jsp?story=389135

joelonsoftware.com

Posted on 2003/3/12 by m1bxd

http://www.joelonsoftware.com/
Interesting guy

eLiberation – ePilot

Posted on 2003/3/6 by m1bxd

eLiberation is a two-year-old Internet software company that uses its “relational micro-transaction technology” to offer streaming and downloadable file tracking, reporting and revenue allocation. Its Integrated Peer Commerce System (IPCS) provides digital content owners with a method of commercializing the distribution of their digital media within a peer-to-peer environment (“superdistribution”), that includes: high volume micro-payments, royalty tracking and revenue distribution, content distribution control, protection of intellectual property and copyrights, marketing control and back-end reporting. IPCS is made up of three smaller services/mechanisms: a Financial Transaction Management (FTM) tracking and reporting service, a Digital Rights Management (DRM) access control mechanism that allows publishers and artists to manage their own intellectual property rights and Rich Information Files (RIF) that enable new marketing opportunities to be provided at the point of sale by allowing value-added sales and marketing information to be included along with the digital content distribution.

eLiberation’s first commercial implementation of its Financial Transaction Management model is ePilot.com, a cost-per-click search engine that pays its members to search for information using the company’s patent-pending ePilot application. The ePilot site performs over 3 million micro-transactions every day. eLiberation has also partnered with Groove Networks to provide a branded version of its FTM software to Groove’s File Sharing users.

http://www.epilot.com/

PinPost.com – The Peer-to-Peer Instant Classifieds Network

Posted on 2003/3/6 by m1bxd

Avalist Unveils PinPost™ – The Peer-to-Peer Instant Classifieds Network

Vancouver, BC – August 15, 2001 – Avalist Networks Inc., a peer-to-peer software development company, is proud to announce the direct market release of its end-user product, PinPost™ – the P2P Instant Classifieds Network (http://www.PinPost.com). In Beta release since May 2001, PinPost™ is a full-featured and free downloadable application for localized buying and selling through classified ad-style listings.

PinPost™, a Microsoft Windows™ compatible application, is the first product of its kind, combining peer-to-peer technology with a Linux-based database infrastructure to create a powerful buy and sell network. It is a community-oriented tool serving 60,000 individually selectable localities in North America, enabling buyers and sellers to target their own cities and neighborhoods.

PinPost™ allows users to instantaneously share listings information through a network of peer computers. Based on locality, classification and keywords they can seamlessly search and browse through a massive collection of listings, conveniently organized within 4,000 product and service categories and posted by 90,000 buyers and sellers.

backflip.com or bookmarksync.com to get bloggered by google?

Posted on 2003/3/6 by m1bxd

Internal profiling of bookmarks – more valuable company
http://www.backflip.com/
http://www.backflip.com/help/ch_export_pages.html – at last

The others to seriously consider:

No profiling of your bookmarks – better product by specification
http://www.bookmarksync.com/

http://www.realmarker.com
http://www.bookmark-pals.co.uk/
http://www.syncit.com/
http://www.mybookmarks.com/
http://www.hotlinks.com/
http://www.blinkpro.com/
http://www.ranks.com/home/organize/top_online_organizers/#362
http://www.linkagogo.com/ – Annotate

“oo bacon” – XML/RSS news feeds converted to newsgroup :-)

Posted on 2003/3/6 by m1bxd

XML to Usenet News conversion
Stop downloading those news aggregators! You no longer need to stay connected and chew on the web all day. Chances are you’ve had the perfect news reader installed all along. From now on, you can read any XML newsfeed from the comfort of a Usenet news reader such as MS Outlook Express. News will pop up in your inbox automatically, instantly.

http://www.genecast.com/

http://dmoz.org/../Metadata/RDF/Applications/RSS/

http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=XML+RSS+applications+

Linux servers to host Microsoft Office applications :-)

Posted on 2003/3/5 by m1bxd

silicon.com-Tue 25 February 2003 04:07PM GMT

New software from CodeWeavers will allow Linux servers to host Microsoft Office applications, which can then be accessed through a web browser, the company said on Monday.

CodeWeavers, which sells Windows-emulation software for the Linux platform, has launched a server version of its CrossOver Office that runs with Tarantella’s Enterprise 3 server software.

The combination allows administrators to set up a Linux server running applications such as Word and Outlook, which can be made accessible to large numbers of Linux or Unix users without needing any special client software.

Microsoft applications are not directly available on Linux because the operating system is a direct competitor to Windows. However, developers such as CodeWeavers have found ways of getting around this blockade in order to make desktop Linux installations more attractive to businesses.

http://www.codeweavers.com/

IBM: Storage Tank–fleetingly code-named Golden Retriever

Posted on 2003/3/5 by m1bxd

IBM plans open-source storage strategy
By Stephen Shankland
CNET News.com
December 20, 2002, 3:58 PM PT

SAN JOSE, Calif.–To encourage the broadest possible support for its forthcoming “Storage Tank” technology, IBM will release an open-source version of the software needed to let servers tap into the next-generation storage system.

Big Blue is working with an undisclosed open-source group on the software and will release the code when the product is generally available in 2003, said David Pease, manager of storage software at IBM’s Almaden Research Center and leader of the 5-year-old Storage Tank project. In addition, IBM plans to publish the communication method fundamental to the next-generation storage project.

The collaborative approach is the most recent example of IBM trying to capitalize on the momentum of the open-source movement. The company also backs the Linux operating system, the Globus Toolkit for supercomputing networks, and several other projects of the collaborative programming movement.

IBM has tapped into the open-source community as a way to speed the development and adoption of technologies it favors and to give itself more cachet with in-the-know programmers. The company devotes many of its own resources to open-source projects, most notably its Linux Technology Center.

Storage Tank–fleetingly code-named Golden Retriever–is a technology designed to get more use out of existing storage systems and make them easier to manage. With Storage Tank, existing systems can be linked, so vaster amounts of data can be stored.

The technology works by using a different way of keeping track of descriptive information–“metadata” such as physical locations, file sizes or access permissions–that accompanies the actual content within the files. Where most storage systems include this metadata in the storage system itself, Storage Tank spreads the information across a group of metadata servers, lower-end dual-processor Intel servers running Linux.

The approach permits several advantages. For one thing, it can keep track of a lot of files. IBM’s goal is for the system to control as many as a billion files, said Jai Menon, an IBM fellow and storage research manager at Big Blue’s Almaden Research Center.

In addition, files of a certain type can be automatically moved to a particular storage “pool.” For example, video and audio streaming files can be physically stored automatically on a particular storage device suited to that task, while infrequently used text files can be stored on a device with lower performance.

The use of the pools in conjunction with preset policies will let administrators automate tasks such as data backup, Menon said.

The system also means the same files can be accessed directly by different operating systems. Currently, because most operating systems have their own ways of storing files, that’s difficult to do without using file system software from a company such as Veritas Software.

But for servers with multiple operating systems to tap into Storage Tank, a piece of software called an agent must be running to communicate with the metadata server. IBM plans to release a sample agent program as open-source software, Pease said.

Releasing the example software will permit others to write agents to tap into Storage Tank, Pease said. In addition, IBM will describe the protocols the agents use to communicate with the metadata servers, allowing others to build their own metadata servers if they wish, he said.

IBM hopes its strategy will make Storage Tank widely used. “Our goal for Storage Tank is nothing short of world domination,” Pease said, only partly joking.

IBM’s Almaden labs are working on several technologies besides Storage Tank, showing off several products in a recent media tour of the facility.

IBM expects only about 60 percent to 65 percent of its research lab projects to reach the product stage, Menon said. “We don’t want to be more successful, because if we’re not being crazy enough, we’re not being challenged enough,” he said.

IBM has several other projects stewing at the lab:

• A project code-named SledRunner is designed to give priority access to the programs that need a fast response time from hard drive arrays. Often low-priority jobs take up a storage system’s time when those jobs could be postponed a few fractions of a second. The SledRunner name stems from the acronym SLE (service level enforcement).

• IBM is taking a crack at a technology that it acknowledges has flopped in the past–arranging blocks of data on a hard drive in the order it will be needed, which minimizes the time the hard drive has to spend moving mechanical components to grab the next tidbit of information. This project, called Automatic Locality Improving Storage (ALIS), relies on arranging data after monitoring the order in which it’s actually used as a computing process runs.

• For a future version of Storage Tank, IBM plans a front-end “gateway” so that remote computers can tap into a tank over a network. It’s similar in concept to EMC’s Celerra product or Network Appliance’s products in a deal with Hitachi Data Systems.

• IBM is working on a “semantic file system,” software to make it easier to find a specific file. Current file systems store files in an ever-more-complex cascade of directories, but a better method than indexed contents of files, for example, could help people find what they need faster–“a sort of Google for enterprise file systems,” Menon said.

• “Differential remote copy” is a technology to speed up the process of copying data on one storage system to a distant site with the identical data that protects against disasters such as earthquakes. The technique sends only the data that has changed since the last update, Menon said.
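
The "send only what changed" idea behind differential remote copy can be sketched in a few lines. This is not IBM's technique, whose details the article does not give: the block-digest comparison, the tiny block size and the `block_digests` / `changed_blocks` / `apply_delta` names are all assumptions chosen to illustrate the general approach.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(local, remote_digests, block_size=BLOCK_SIZE):
    """Return {block_index: bytes} for local blocks that differ from the remote copy."""
    delta = {}
    for index, digest in enumerate(block_digests(local, block_size)):
        if index >= len(remote_digests) or remote_digests[index] != digest:
            start = index * block_size
            delta[index] = local[start:start + block_size]
    return delta


def apply_delta(remote, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the local data on the remote side from its old copy plus the delta."""
    # Truncate or zero-pad the old copy to the new length, then patch blocks in.
    buf = bytearray(remote[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)
```

Only the digests travel from the remote site and only the differing blocks travel back, which is why the update gets cheaper as less of the data changes.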

http://zdnet.com.com/2100-1104-978641.html

Distributed metadata searching system and method
Patent Number: 6,434,548

Abstract:
A system and method of distributed metadata searching is disclosed. The present invention permits an extension of the searching and retrieval functions of existing Internet web search engines by utilizing computational resources embodied in user computer systems and search browsers. By distributing the searching and scanning functions to the user level, the present invention reduces the computational and communications burden on Internet web search engines and crawlers, resulting in lower computational resource utilization by Internet search engine providers. Given the exponential growth rate currently being experienced in the Internet community, the present invention provides one of the few methods by which complete searches of this vast distributed database may be performed. The present invention permits embodiments incorporating a Search Manager (1001) further comprising a Service Results Manager (1013), User Profile Database (1012), Service Manager (1013), and Service Database (1014); a Light Weight Application SCANNER (1002); and a Search Engine (1008). These components may be augmented in some preferred embodiments via the use of a Search Browser (1003), Internet Communications (1004); Web Site(s) (1005), Web Crawler(s) (1006), and a Repository Database (1007).

More Info

http://www.webmasterworld.com/forum34/367.htm

IBM Storage Tank
http://www.almaden.ibm.com/…/storage_tank/

NAS over Storage Tank
http://www.almaden.ibm.com/…/index.shtml

Distributed Storage Tank
http://www.almaden.ibm.com/…/index.shtml

IBM – high availability web hosting … using existing web and internet protocols

Posted on 2003/3/5 by m1bxd

http://www.almaden.ibm.com/cs/people/bayardo/userv/
http://www.almaden.ibm.com/cs/people/bayardo/userv/userv.html
uServ — P2P Webserver from IBM

more on metadata servers – jxta.org

Posted on 2003/3/5 by m1bxd

http://www.openp2p.com/pub/a/p2p/2001/06/06/jxtasearch.html
http://www.jxta.org/project/www/docs/DomainFAQ.html
http://www.searchtools.com/tools/jxta-search.html

http://www.searchtools.com/
http://www.opencola.com/


metadata – data on data

Posted on 2003/3/5 by m1bxd

http://dublincore.org/
http://www.google.com/search?q=distributed+metadata+servers&hl=en&lr=&ie=ISO-8859-1

http://dopey.hil.unb.ca/unb_metadata/metadata.html
http://adam.ac.uk/adam/index.html

http://www.objectrad.com/javaMetadataServer.html

remember wiki and crit?

Posted on 2003/3/5 by m1bxd

http://directory.google.com/Top/Computers/Software/Groupware/Wiki/?tc=1
http://www.wiki.org/

and crit?…

http://crit.org

Wikipedia is a multilingual project to create a complete and accurate open content encyclopedia.
http://www.wikipedia.org/

p2p surfing

Posted on 2003/3/4 by m1bxd

http://www.openp2p.com/pub/q/p2p_category
http://www.business2.com/webfile/0,1638,8782,00.html
http://www.bsoftware.com/directory.html
http://www.human-links.com/
http://open-content.net/

Grub provides a free for download, free to run, distributed crawling client, which is used to create an infrastructure (database + volunteers) that will eventually provide URL update status information for nearly every web page on the Internet. Grub’s distributed crawler network will enable websites, content providers, and individuals to notify others that changes have occurred in their content, all in real time.
http://grub.org/

What’s black and white and no longer red all over?

more micropayments again

Posted on 2003/3/4 by m1bxd

http://www.peppercoin.com/
Hey – but Mr R – this sounds just like :
http://www.ginx.com/nx/

http://www.rightsmarket.com/

The Man Who Predicted Sept. 11

Posted on 2003/2/16 by m1bxd

An interview with the head of security for investment firm Morgan Stanley Dean Witter was filmed in 1998 on the 44th floor of the World Trade Center. In the film, Rick Rescorla, who was killed in the Sept. 11 attack, describes events that will lead to an attack and the subsequent war on terrorism. The segment was to be incorporated into a documentary on the nature of warfare, but the documentary was never completed and the footage sat hidden away until sometime after the Sept. 11 attack. A colonel active in three wars, Rescorla is brutally honest and eerily prophetic. It appears from online user reviews that the only people reviewing the film who’ve not been impressed are those who have been unable to play it. One notes that a “daft” JavaScript effectively prevents Linux users from viewing the content. For most of us, the usual players will work just fine.

http://atomfilms.shockwave.com/af/content/voice_prophet

Source: NSB – http://www.netsurf.com/nsd/