Introduction to Coord

Coord is an open source implementation of an SBA (Space-based Architecture) built on a DHT (Distributed Hash Table). From a technical point of view, Coord is similar to a tuple space, which is an implementation of the associative memory paradigm for parallel/distributed computing. Coord transparently manages such a space, which can be mapped to memory, files, or even a database, and converges these into one large-scale virtual space. In the virtual space, data is located by one or more hash functions, as if a point were placed on "coordinates", and a process communicates with another only through the space, which acts like a "coordinator". As a result, Coord provides large-scale shareable object storage for parallel/distributed computing.
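
The idea of locating data by a hash function and communicating only through a shared space can be illustrated with a small sketch. This is not Coord's API: the class `ToySpace`, its `write`/`read`/`take` methods, and the node names are hypothetical, and the "nodes" are plain in-process dictionaries rather than machines in a DHT.

```python
import hashlib


class ToySpace:
    """Toy in-process stand-in for a hashed shared space (hypothetical, not Coord's API)."""

    def __init__(self, node_names):
        # Each "node" is just a dictionary here; a real system would use remote stores.
        self._names = sorted(node_names)
        self.nodes = {name: {} for name in self._names}

    def _locate(self, key):
        # The hash of the key alone decides where the entry lives ("coordinates").
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def write(self, key, value):
        node = self._locate(key)
        self.nodes[node][key] = value
        return node

    def read(self, key):
        # Non-destructive lookup; returns None if the key is absent.
        return self.nodes[self._locate(key)].get(key)

    def take(self, key):
        # Destructive lookup, in the spirit of a tuple-space "take".
        return self.nodes[self._locate(key)].pop(key, None)


# A producer and a consumer never address each other, only the space.
space = ToySpace(["node-a", "node-b", "node-c"])
space.write("job:1", {"input": "part-0001"})
job = space.take("job:1")
```

Because every participant computes the same hash, no lookup table of key locations is needed in this sketch.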

Nevertheless, Coord is NOT just another of the emerging distributed key-value storage systems such as Bigtable, HBase, Dynamo, MemcacheDB, CouchDB, and Cassandra. That is because Coord also provides a distributed computing framework, Coord MapReduce, for large-scale data analysis, following the MapReduce model whose name and semantics were introduced by Google. Coord MapReduce runs on a simple distributed file system, dust. dust splits a file into chunks and scatters them, but it does not have a centralized metadata server to locate the chunks. In this respect, dust differs from GFS/HDFS. To take advantage of Coord MapReduce, there is no need to install a separate distributed file system. By using dust, Coord MapReduce helps users parallelize map/reduce tasks over massive data.
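
A minimal sketch of how chunks might be located without a centralized metadata server, followed by a tiny map/reduce over them. How dust actually places chunks is not described here, so everything below is an assumption: the key format `filename#index`, the function names, and the node list are hypothetical, and storage is simulated with dictionaries.

```python
import hashlib


def split_chunks(data, chunk_size):
    """Split a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def locate_chunk(filename, index, nodes):
    """Derive a chunk's node from its name and index alone (assumed scheme)."""
    key = f"{filename}#{index}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def scatter(filename, data, chunk_size, nodes):
    """Place every chunk on the node chosen by the hash; return storage and chunk count."""
    storage = {node: {} for node in nodes}
    chunks = split_chunks(data, chunk_size)
    for index, chunk in enumerate(chunks):
        storage[locate_chunk(filename, index, nodes)][(filename, index)] = chunk
    return storage, len(chunks)


def count_newlines(filename, chunk_count, nodes, storage):
    """Map: count newlines per chunk. Reduce: sum the partial counts."""
    partials = []
    for index in range(chunk_count):
        node = locate_chunk(filename, index, nodes)  # recomputed, never looked up
        partials.append(storage[node][(filename, index)].count(b"\n"))
    return sum(partials)


nodes = ["node-a", "node-b", "node-c"]
data = b"a\nbb\nccc\n" * 10
storage, chunk_count = scatter("sample.txt", data, 7, nodes)
total_lines = count_newlines("sample.txt", chunk_count, nodes, storage)
```

In this sketch a reader only needs the file name, the chunk count, and the node list to find any chunk, which is the property that removes the need for a metadata server.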

Moreover, Coord provides warp, a facility for remote execution and parallel processing. It is NOT just a remote or parallel execution tool such as ssh/gexec, since it supports load balancing. At run time, warp assigns the best node to run legacy code written in C/C++, Java, Python, or scripts. With warp, users no longer need to worry about where to run their tasks. It enables users to easily parallelize code written for a single machine across the cluster.
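
One simple way to picture load-balanced assignment is a greedy "least-loaded node first" policy. The text does not say how warp chooses the best node, so this is only an illustrative assumption: `assign_tasks`, the task names, and the cost estimates are hypothetical and are not part of warp.

```python
import heapq


def assign_tasks(task_costs, node_names):
    """Greedily place each task on the node with the smallest accumulated load.

    task_costs maps a task name to an estimated cost (assumed to be known).
    Returns a dict mapping each task name to the chosen node name.
    """
    # Heap of (current_load, node_name); ties are broken by node name.
    heap = [(0.0, name) for name in sorted(node_names)]
    heapq.heapify(heap)
    placement = {}
    for task, cost in task_costs.items():
        load, name = heapq.heappop(heap)
        placement[task] = name
        heapq.heappush(heap, (load + cost, name))
    return placement


placement = assign_tasks(
    {"crawl-1": 3.0, "crawl-2": 1.0, "crawl-3": 1.0},
    ["node-a", "node-b"],
)
```

A real scheduler would measure node load at run time rather than rely on fixed cost estimates; the heap only keeps the selection step cheap.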

Coord is now evolving into a cloud computing platform for large-scale data analysis such as information retrieval, data/text mining, and machine learning.

How to Get Started


  • What is Coord?
  • Quick Start
  • How to Install
  • How to Program
  • Language Support - Java, Python, PHP, etc.
  • FAQ
  • How to Analyze Large-scale Dataset


  • Coord Tutorial - public tutorial (presented at 2009 NHN DeView)
  • HOWTO Coord - introduces how to work with Coord
  • HOWTO Coord in Java - introduces how to work with Coord in Java
  • HOWTO Coord in Python - introduces how to work with Coord in Python
  • HOWTO Coord in PHP - introduces how to work with Coord in PHP
  • HOWTO Coord MapReduce - introduces how to perform MapReduce using Coord
  • Parallelize NCDC dataset crawlers with Coord - To crawl the NCDC dataset, legacy Python code was run on a single machine. However, the crawl took too much time, so the legacy code needed to be adapted for multiple machines. With Coord, it is not difficult to parallelize the legacy code in a cluster computing environment. In this tutorial, I will show how to use Coord to run crawlers in parallel without many modifications
  • Analyze MovieLens dataset with Coord MapReduce - Coord provides a C++ MapReduce framework to analyze large-scale datasets. This tutorial shows how to use Coord to analyze the MovieLens dataset
  • Large-scale Graph Search with Coord - Coord enables large-scale graph search. In this tutorial, some simple triples are loaded into Coord to search for query patterns and find hidden relations between subject-subject, subject-object, or object-object pairs
  • How to Get Involved

    Join the mailing lists and participate in the discussions around the development of Coord. If you encounter a problem and have an idea how to fix it, please start by making a patch and filing it with our issue tracking system.

    Contact Me

    Author: Woohyun Kim
    Blog: http://blog.naver.com/wisereign
    Email: woorung@gmail.com

      • News

        Coord won the grand prize at the 2009 Open SW Contest
        The contest was held under the MKE (Ministry of Knowledge Economy).

        Coord Tutorial is presented at 2009 NHN DeView
        Coord MapReduce, GraphSearch, and Recommendation will be introduced

        Coord 0.4.0 is released
        Coord 0.4.0 provides coord tutorial & dataset for demos.

        Coord 0.3.5 is released
        Coord 0.3.5 adds a graph search for enabling semantic search.

        2009 Coord Summer of Code
        Join the event, and get the prize!!!

        Coord 0.3 is finally released
        Coord 0.3 provides a C++ MapReduce framework for large-scale data analysis.

        2008 WoC Award
        The Coord+Lucene project received the 2008 WoC (Winter of Code) Award at an open source contest for university students held by NCSoft and KIPA.

        2008 NHN Conference Presentation
        Coord was introduced at the 2008 NHN Conference and demonstrated at 2008 NHN DeView.

        Download

        Download Coord 0.4.0 Sep. 18 2009

        Related Works

        Greenplum - Greenplum takes MapReduce to support petabyte-scale data analytics

        Aster - Aster is a high-performance database system for data warehousing and analytics, which tightly integrates SQL with MapReduce

        Voldemort - Voldemort is a distributed key-value storage system

        Ringo - Ringo is a distributed key-value storage system

        Scalaris - Scalaris is a scalable, transactional, distributed key-value store. It can be used for building scalable Web 2.0 services

        Kai - Kai is an open source implementation of Amazon Dynamo

        Dynomite - Dynomite is an eventually consistent distributed key value store based off of Amazon Dynamo

        MemcacheDB - MemcacheDB is a distributed key-value storage system designed for persistence. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval

        ThruDB - Thrudb is a set of simple services built on top of the Apache Thrift framework that provides indexing and document storage services for building and scaling websites

        CouchDB - Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API

        Cassandra - Cassandra is a distributed storage system for managing structured data while providing reliability at a massive scale

        Neptune - Neptune is a distributed, large-scale structured data store and an open source project implementing Google Bigtable

        HBase - HBase is the Hadoop database. It is an open-source, distributed, column-oriented store modeled after Google Bigtable

        Hypertable - Hypertable is an open source project based on published best practices and our own experience in solving large-scale data-intensive tasks