dd: April 2012

Wednesday, April 25, 2012

Monday, April 23, 2012

MongoDB cheatsheet

Import Data
// Import SourceFile (a CSV file) into collection CollectionName of database DataBaseName, using the first line as the header
mongoimport -d DataBaseName -c CollectionName --type csv --file SourceFile --headerline

// Import xxx.txt into collection bar of database foo; the file has no header line, so the field names have to be specified with -f
mongoimport -d foo -c bar --type csv --file xxx.txt -f id,timestamp,latitude,longitude,place

// Same import for a tab-separated file
mongoimport -d foo -c bar --type tsv --file xxx.txt -f id,timestamp,latitude,longitude,place
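To make the difference between --headerline and -f concrete, here is a minimal Python sketch of how the rows end up as documents. It only uses the standard csv module and does not talk to MongoDB; the function name parse_delimited and the sample rows are made up for illustration. Values stay strings here, whereas mongoimport may convert numeric-looking fields.

```python
import csv
import io


def parse_delimited(text, fieldnames=None, delimiter=","):
    """Turn CSV/TSV text into a list of dicts, one per row.

    fieldnames=None  -> the first line is the header (like --headerline)
    fieldnames=[...] -> every line is data (like -f name1,name2,...)
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames,
                            delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical sample rows, not taken from the original data set.
with_header = "id,place\n1,Cafe\n2,Library\n"
without_header = "1\tCafe\n2\tLibrary\n"

docs_a = parse_delimited(with_header)
docs_b = parse_delimited(without_header, fieldnames=["id", "place"],
                         delimiter="\t")
```

Both calls produce the same two documents, which is the point: the header line and the -f list are just two ways of naming the columns.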

Create Index

db.[collection-name].ensureIndex({[tag-name]: 1})

Create Spatial Index

db.[collection-name].ensureIndex({[tag-name]:"2d"})
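The same two indexes can be created from Python. This is a sketch that assumes the pymongo driver, whose Collection.create_index takes a list of (field, type) pairs; the helper name build_indexes and the field name "loc" are hypothetical. Newer MongoDB releases also use createIndex in the shell in place of ensureIndex.

```python
def build_indexes(collection, field, geo_field):
    """Create an ascending index on `field` and a 2d index on `geo_field`.

    `collection` is assumed to be a pymongo Collection (or any object with a
    compatible create_index method). 1 means ascending and "2d" is the planar
    geospatial index type; pymongo also exposes these constants as
    pymongo.ASCENDING and pymongo.GEO2D.
    """
    names = []
    names.append(collection.create_index([(field, 1)]))
    names.append(collection.create_index([(geo_field, "2d")]))
    return names


# Usage sketch (needs a running mongod and the pymongo package):
#   from pymongo import MongoClient
#   bar = MongoClient()["foo"]["bar"]
#   build_indexes(bar, "timestamp", "loc")   # "loc" is a hypothetical field
```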

Python SimpleHTTPServer

Type the following in the directory that you want to share.

python -m SimpleHTTPServer 9999

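or, with Python 3:

python -m http.server 8001

// The first command starts a simple HTTP server on local port 9999 using the Python 2 SimpleHTTPServer module; the second does the same on port 8001 using the Python 3 http.server module.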

Then other users can get the files in the directory using a browser.
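If the server has to be started from a script instead of the command line, something like the following should work on Python 3.7 or later. It is only a sketch: the helper names make_server and serve_in_background are made up, and the default port 9999 simply mirrors the command above.

```python
import functools
import http.server
import threading


def make_server(directory, port=9999):
    """Return an HTTP server that serves the files under `directory`.

    Passing port=0 asks the OS for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("", port), handler)


def serve_in_background(server):
    """Run the server on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


# Usage sketch: share the current directory until the script exits.
#   server = make_server(".")
#   serve_in_background(server)
#   ...
#   server.shutdown()
```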

Monday, April 16, 2012

Spatial data mining procedures

Procedure
1. import json files
2. build index
3. build foreign keys
4. build tiles
5. build neighborhood index (original oid, target oid, distance, direction, topology)
6. spatial data mining tools
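As a rough illustration of step 5, the sketch below builds neighborhood rows for point data held in memory. It is a brute-force version under several assumptions: places are (latitude, longitude) points keyed by oid, distance is the haversine great-circle distance, direction is the initial bearing reduced to eight compass sectors, and topology is left out because it depends on the geometry type. All names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def direction(lat1, lon1, lat2, lon2):
    """Compass sector of the initial bearing from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dl))
    bearing = math.degrees(math.atan2(y, x)) % 360
    return COMPASS[int((bearing + 22.5) // 45) % 8]


def neighborhood_index(places, max_km):
    """Return (origin oid, target oid, distance km, direction) rows
    for every ordered pair of places no more than max_km apart."""
    rows = []
    for oid_a, (lat_a, lon_a) in places.items():
        for oid_b, (lat_b, lon_b) in places.items():
            if oid_a == oid_b:
                continue
            d = distance_km(lat_a, lon_a, lat_b, lon_b)
            if d <= max_km:
                rows.append((oid_a, oid_b, d,
                             direction(lat_a, lon_a, lat_b, lon_b)))
    return rows
```

The double loop compares every pair, so it only suits small inputs; the tiles from step 4 are the natural way to restrict each place to candidates in nearby tiles.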

Platforms
1. PostgreSQL + postgis (open source solution)
2. MongoDB
3. MS-SQL
4. Hadoop

Input
1. Place_dump_US
2. Checkin

Output
1. OpenLayers + Geoserver (visualization)
2. Patterns (representation)

Sunday, April 15, 2012

Hadoop Clusters Setup

Pre-setup
1. install JDK/JRE 1.6 or later
2. install ssh
   a. create the master ssh key
      ssh-keygen -t dsa -P ""
      cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
   b. copy the master public key to each slave
      scp ~/.ssh/id_dsa.pub slaveN:~/.ssh/master.pub
   c. on each slave, add the master public key to authorized_keys
      cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
   d. from the master, ssh to slaveN and check that no passphrase or password is requested
3. edit /etc/hosts & /etc/hostname
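The check in step 2d can be scripted. The sketch below relies on the OpenSSH options BatchMode=yes (fail instead of prompting) and ConnectTimeout; the function names and the host name slave1 are placeholders, not part of the original setup.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for input."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, "true"]


def passwordless_ssh_ok(host, run=subprocess.run):
    """Return True if `host` accepts key-based login without any prompt.

    `run` can be replaced for testing; by default the real ssh client
    is invoked.
    """
    result = run(ssh_check_command(host), capture_output=True)
    return result.returncode == 0


# Usage sketch, run on the master ("slave1" is a placeholder host name):
#   for host in ["slave1"]:
#       print(host, passwordless_ssh_ok(host))
```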

Setup
1. set up conf/hadoop-env.sh (export JAVA_HOME)

2. core-site.xml (specify name node) --for master & slaves

fs.default.name
hdfs://master

3. hdfs-site.xml (name and data directories) --for master & slaves

dfs.name.dir
/home/hduser/hddata/name

dfs.data.dir
/home/hduser/hddata/data

4. mapred-site.xml (jobtracker) --for master & slaves

mapred.job.tracker
master:54311

5. list all slaves in conf/slaves --for master/jobtracker only

6. run chmod g-w on all data and name directories

** start-dfs.sh reads conf/slaves on the name node and starts the data nodes on all slaves.
** start-mapred.sh reads conf/slaves on the job tracker node and starts the task trackers on all slaves.
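The property name/value pairs from steps 2 to 4 go into each file as property elements inside a configuration element. Since the same files have to be present on the master and the slaves, generating them from one place can help. This is a sketch only: the values are copied from the notes above, and the names SITE_FILES and render_site_xml are made up.

```python
import xml.etree.ElementTree as ET

# Values copied from the notes above; the file names follow the usual
# Hadoop conf/ layout of that release line.
SITE_FILES = {
    "core-site.xml": {"fs.default.name": "hdfs://master"},
    "hdfs-site.xml": {
        "dfs.name.dir": "/home/hduser/hddata/name",
        "dfs.data.dir": "/home/hduser/hddata/data",
    },
    "mapred-site.xml": {"mapred.job.tracker": "master:54311"},
}


def render_site_xml(properties):
    """Render a dict of properties as Hadoop <configuration> XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# One XML string per file; writing them to disk is left to the caller.
rendered = {fname: render_site_xml(props)
            for fname, props in SITE_FILES.items()}
```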

Startup
1. execute "hadoop namenode -format" on the name node (first-time setup only; formatting erases existing HDFS metadata)

2. execute "start-dfs.sh" on the name node

3. execute "start-mapred.sh" on the job tracker

Shutdown
1. execute "stop-mapred.sh" on the job tracker

2. execute "stop-dfs.sh" on the name node
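The ordering matters: HDFS comes up before MapReduce and goes down after it. A small wrapper can keep that order in one place. This sketch assumes the name node and the job tracker are the same machine and that the Hadoop scripts are on the PATH; the format step is deliberately left out because it is destructive. The names STARTUP, SHUTDOWN and run_sequence are hypothetical.

```python
import subprocess

# (command, node it is meant to run on), in the order given in the notes.
STARTUP = [
    (["start-dfs.sh"], "name node"),
    (["start-mapred.sh"], "job tracker"),
]
SHUTDOWN = [
    (["stop-mapred.sh"], "job tracker"),
    (["stop-dfs.sh"], "name node"),
]


def run_sequence(steps, run=subprocess.check_call):
    """Run each command in order and return what was run.

    With the default runner, a failing command raises
    CalledProcessError, so later steps are not attempted.
    """
    done = []
    for command, node in steps:
        run(command)
        done.append((node, command[0]))
    return done


# Usage sketch:
#   run_sequence(STARTUP)
#   ...
#   run_sequence(SHUTDOWN)
```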


Wednesday, April 25, 2012

Python split file based on json tag