Wednesday, April 25, 2012
Python split file based on json tag
'''A script to split places_dump_US.geojson into smaller files based on the JSON tag "province".'''
'''script start'''
import json

def writeState(m, state='none'):
    # Append one record to the output file for its state.
    out_name = 'places_dump_US.' + state + '.geojson'
    with open(out_name, 'a') as tg:
        print(json.dumps(m), file=tg)

count = 0
with open('places_dump_US.geojson', 'r') as sf:
    for line in sf:
        try:
            p = json.loads(line)
        except ValueError:
            # Skip lines that are not valid JSON.
            continue
        # Records without a province go to the 'none' file.
        cstate = (p.get('properties') or {}).get('province') or 'none'
        writeState(p, str(cstate))
        print(count)
        count += 1
'''script end'''
Monday, April 23, 2012
MongoDB cheatsheet
Import Data
//import SourceFile (a CSV file) into collection CollectionName of database DataBaseName, using the first line as the header
mongoimport -d DataBaseName -c CollectionName --type csv --file SourceFile --headerline
//import the file xxx.txt into collection bar of database foo without a header line (the field names have to be given with -f); use --type tsv for a tab-separated file
mongoimport -d foo -c bar --type csv --file xxx.txt -f id,timestamp,latitude,longitude,place
mongoimport -d foo -c bar --type tsv --file xxx.txt -f id,timestamp,latitude,longitude,place
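The same imports can be scripted. The following is a minimal sketch that only assembles the mongoimport arguments shown above; the function names mongoimport_args and run_mongoimport are made up for illustration, and actually running the command assumes that mongoimport is on the PATH and a MongoDB server is reachable.

```python
import subprocess

def mongoimport_args(db, collection, path, file_type="csv", fields=None):
    """Build the mongoimport argument list for a CSV or TSV file.

    With fields=None the first line of the file is used as the header
    (--headerline); otherwise the field names are passed with -f.
    """
    args = ["mongoimport", "-d", db, "-c", collection,
            "--type", file_type, "--file", path]
    if fields is None:
        args.append("--headerline")
    else:
        args += ["-f", ",".join(fields)]
    return args

def run_mongoimport(db, collection, path, file_type="csv", fields=None):
    # Assumption: mongoimport is installed and a mongod instance is running.
    return subprocess.run(
        mongoimport_args(db, collection, path, file_type, fields), check=True)
```

For example, mongoimport_args("foo", "bar", "xxx.txt", "tsv", ["id", "timestamp", "latitude", "longitude", "place"]) reproduces the tsv command above as a list of arguments.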
Create Index
db.[collection-name].ensureIndex({[tag-name]: 1})
Create Spatial Index
db.[collection-name].ensureIndex({[tag-name]:"2d"})
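From Python, the two index types could be created roughly as follows. This is a sketch that assumes the pymongo driver, whose Collection objects provide create_index; the helper name build_indexes and the field names are placeholders. A "2d" index expects the indexed field to hold a coordinate pair, so the separate latitude and longitude columns from the import would first have to be combined into one field (called loc here purely for illustration).

```python
def build_indexes(collection, field, location_field):
    """Create an ascending index on `field` and a 2d index on `location_field`.

    `collection` is assumed to be a pymongo Collection (or any object
    with a compatible create_index method).
    """
    plain = collection.create_index([(field, 1)])
    spatial = collection.create_index([(location_field, "2d")])
    return plain, spatial

# Hypothetical usage, assuming pymongo is installed and mongod is running:
#   from pymongo import MongoClient
#   places = MongoClient()["foo"]["bar"]
#   build_indexes(places, "place", "loc")
```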
Python SimpleHTTPServer
Type one of the following in the directory that you want to share.
python -m SimpleHTTPServer 9999
or
python -m http.server 8001
//Either command starts a simple HTTP server on the given local port (9999 and 8001 above). SimpleHTTPServer is the Python 2 module; http.server is its Python 3 equivalent.
Then other users can get the files in the directory using a browser.
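The same server can also be started from a script, which makes it possible to share a directory other than the current one. The following is a minimal sketch assuming Python 3.7 or later (needed for ThreadingHTTPServer and the handler's directory argument); the function names are made up for illustration.

```python
import functools
import http.server

def make_file_server(directory=".", port=8001, host=""):
    """Return an HTTP server that serves the files under `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)

def serve_directory(directory=".", port=8001):
    """Serve `directory` until interrupted with Ctrl-C."""
    server = make_file_server(directory, port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve_directory(".", 8001) behaves much like python -m http.server 8001.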
Monday, April 16, 2012
Spatial data mining procedures
Procedure
1. import json files
2. build index
3. build foreign keys
4. build tiles
5. build neighborhood index (original oid, target oid, distance, direction, topology)
6. spatial data mining tools
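Step 5 can be illustrated for point data. The following is a rough sketch under several assumptions: places are (oid, latitude, longitude) tuples, distance is the great-circle distance on a spherical Earth, direction is reduced to eight compass points, and every pair is compared by brute force. The topology column is left out because it is not meaningful for plain points, and all names are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def direction(lat1, lon1, lat2, lon2):
    """Eight-point compass direction of the initial bearing from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    bearing = math.degrees(math.atan2(y, x)) % 360
    return COMPASS[int((bearing + 22.5) // 45) % 8]

def neighborhood_index(places, radius_km):
    """Return (original oid, target oid, distance_km, direction) rows for
    every ordered pair of distinct places within radius_km of each other."""
    rows = []
    for oid_a, lat_a, lon_a in places:
        for oid_b, lat_b, lon_b in places:
            if oid_a == oid_b:
                continue
            d = distance_km(lat_a, lon_a, lat_b, lon_b)
            if d <= radius_km:
                rows.append((oid_a, oid_b, d,
                             direction(lat_a, lon_a, lat_b, lon_b)))
    return rows
```

The pairwise loop is quadratic in the number of places, so for a full dump the tiles from step 4 or a spatial index would be needed to limit the candidate pairs.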
Platforms
1. PostgreSQL + PostGIS (open source solution)
2. MongoDB
3. MS-SQL
4. Hadoop
Input
1. Place_dump_US
2. Checkin
Output
1. OpenLayers + GeoServer (visualization)
2. Patterns (representation)
Sunday, April 15, 2012
Hadoop Clusters Setup
Pre-setup
1. install JDK/JRE 1.6 or later
2. install ssh
a. create the master SSH key (run the commands in ~/.ssh on the master)
ssh-keygen -t dsa -P ""
cat id_dsa.pub>>authorized_keys
b. copy the master public key to each slave
scp id_dsa.pub slaveN:~/.ssh/master.pub
c. on each slave, add the master public key to authorized_keys (in ~/.ssh)
cat master.pub>>authorized_keys
d. from the master, ssh to slaveN and confirm that no password or passphrase is requested.
3. edit /etc/hosts & /etc/hostname
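The check in step 2d can be automated. The sketch below assumes the OpenSSH client, whose BatchMode=yes option makes ssh fail instead of prompting; the function names and the host name slave1 are placeholders. A host that has never been connected to may also fail the check, because the host key confirmation prompt is suppressed as well.

```python
import subprocess

def ssh_check_command(host):
    """Command that succeeds only if logging in to `host` needs no prompt."""
    # BatchMode=yes disables password and passphrase prompts.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]

def passwordless_ssh_ok(host, timeout=10):
    """Return True if `ssh host true` works without any interactive prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

# Hypothetical usage from the master:
#   for host in ["slave1", "slave2"]:
#       print(host, passwordless_ssh_ok(host))
```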
Setup
1. set up conf/hadoop-env.sh (export JAVA_HOME)
2. core-site.xml (specify name node) --for master & slaves
fs.default.name = hdfs://master
3. hdfs-site.xml (name and data directories) --for master & slaves
dfs.name.dir = /home/hduser/hddata/name
dfs.data.dir = /home/hduser/hddata/data
4. mapred-site.xml (jobtracker) --for master & slaves
mapred.job.tracker = master:54311
5. list all slaves in conf/slaves --for master/jobtracker only
6. chmod g-w on all data and name directories
** start-dfs.sh consults conf/slaves on the name node and starts the data nodes on all slaves.
** start-mapred.sh consults conf/slaves on the job tracker node and starts the task trackers on all slaves.
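Each of the *-site.xml files wraps its settings in a configuration element containing property entries, each with a name and a value. Since the same files are needed on the master and on every slave, they can be generated instead of edited by hand. The following is a small sketch using only the property names and values listed above; the helper names and the SITE_FILES layout are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Property names and values as listed in the setup steps above.
SITE_FILES = {
    "core-site.xml": {"fs.default.name": "hdfs://master"},
    "hdfs-site.xml": {
        "dfs.name.dir": "/home/hduser/hddata/name",
        "dfs.data.dir": "/home/hduser/hddata/data",
    },
    "mapred-site.xml": {"mapred.job.tracker": "master:54311"},
}

def site_xml(properties):
    """Render a dict of properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

def write_site_files(conf_dir, site_files=SITE_FILES):
    """Write every site file into conf_dir (for example Hadoop's conf directory)."""
    for filename, properties in site_files.items():
        with open(os.path.join(conf_dir, filename), "w") as fh:
            fh.write('<?xml version="1.0"?>\n' + site_xml(properties) + "\n")
```

The generated files can then be copied to the conf directory of each node.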
Startup
1. execute "hadoop namenode -format" on the name node (first-time setup only; formatting erases existing HDFS metadata)
2. execute "start-dfs.sh" on the name node
3. execute "start-mapred.sh" on the job tracker node
Shutdown
1. execute "stop-mapred.sh" on the job tracker node
2. execute "stop-dfs.sh" on the name node
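The ordering matters: HDFS comes up before MapReduce, and shutdown runs in reverse. As a small sketch, the two sequences can be written down as (host, command) pairs; the function names are hypothetical, and the first_time flag reflects the assumption that the name node is formatted only on the very first start.

```python
def startup_plan(name_node, job_tracker, first_time=False):
    """Return the (host, command) pairs to run, in order, to start the cluster."""
    plan = []
    if first_time:
        # Assumption: format only once, because it reinitializes the name node.
        plan.append((name_node, "hadoop namenode -format"))
    plan.append((name_node, "start-dfs.sh"))
    plan.append((job_tracker, "start-mapred.sh"))
    return plan

def shutdown_plan(name_node, job_tracker):
    """Return the (host, command) pairs to run, in order, to stop the cluster."""
    return [(job_tracker, "stop-mapred.sh"), (name_node, "stop-dfs.sh")]

# Print the plan for a cluster whose master hosts both roles.
for host, command in startup_plan("master", "master", first_time=True):
    print(host + ": " + command)
```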