R.A. Epigonos et al.

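I wanted to find out whether the worn-out machines I have at home could be made useful by clustering them. In other words, rather than solving the problem with money, let's make good use of what is already here.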

[Memo] Trying to cluster computers

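There is a lot you cannot know until you try it: which kinds of programs benefit from clustering and which do not. In the end everyone runs different programs, so each person has to test their own. That is the best approach. I tend to look into this kind of topic while waiting for a long-running job to give the prompt back.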

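For programs I write myself I am free to use MPI, multithreading, or multiple processes, but off-the-shelf programs are hard to change. I still want those to run faster. If SSI turns out to be a practical answer for that case, I would be very happy.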

For now, just a list of keywords: openMosix, ClusterKnoppix, OpenSSI, Kerrighed, GridEngine, Rocks Clusters, SCore, MIKO GNYO/Linux, switching hub, SMP machine (a machine with multiple processors), Beowulf

  1. みっし~の研究生活: Linux HPCクラスターの構築(その2)
  2. Amazon.com: High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI (Nutshell Handbooks): Joseph Sloan: Books
  3. 負荷分散ソフトウエアGrid Engine
  4. rubyneko - 第10回 関西 Debian 勉強会 行ってきました
  5. PC Cluster Consortium
  6. OpenMosixによる計算クラスタの構築
  7. MIKO GNYO/Linux
  8. MIKO GNYO/Linux: 検索結果
  9. kuroyagi さんのノートブック

SSI (Single System Image) environments

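From what I found, the main options seem to be the ones listed below, and several distributions are derived from them. ClusterKnoppix, for example, is Knoppix with a kernel that includes openMosix. What matters to me is whether nodes can be added and removed while the cluster is running.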

  1. openMosix (LinuxPMI)
  2. Kerrighed
  3. OpenSSI

  1. openmosix|Kerrighed|OpenSSI|LinuxPMI - Google 検索
  2. オープンソースのクラスター管理システム - SourceForge.JP Magazine
  3. Linux.com :: A survey of open source cluster management systems
  4. coLinuxとopenMosixで異機種混合のクラスターを構成する
  5. スラッシュドット・ジャパン | openMosixでHPCクラスタはいかが?

Kerrighed

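To start, I try Kerrighed, the most recently updated of the three listed above. The environment is built on Debian etch. First I compile the kernel while following "Installing Kerrighed 2.3.0 - Kerrighed". I have the feeling I installed a lot of unnecessary packages, though.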

$ su -
# apt-get install xmlto
# apt-get install lsb
# apt-get install rsync
# apt-get install pkg-config
# apt-get install libtool
# apt-get install gcc
# apt-get install bzip2
# cd /usr/src/
# wget http://kerrighed.gforge.inria.fr/kerrighed-latest.tar.gz
# wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2
# ls
kerrighed-2.3.0.tar.gz  linux-2.6.20.tar.bz2
# tar zxf kerrighed-2.3.0.tar.gz
# tar jxf linux-2.6.20.tar.bz2
# ls
kerrighed-2.3.0  kerrighed-2.3.0.tar.gz  linux-2.6.20  linux-2.6.20.tar.bz2
# cd kerrighed-2.3.0
# ./configure --with-kernel=/usr/src/linux-2.6.20
# make patch
# make defconfig
# make kernel
# make
# make kernel-install
# make install
# ls -l /boot/vmlinuz-2.6.20-krg
-rw-r--r-- 1 root root 2488432 Jan  4 12:38 /boot/vmlinuz-2.6.20-krg
# ls -l /boot/System.map
lrwxrwxrwx 1 root root 21 Jan  4 12:38 /boot/System.map -> System.map-2.6.20-krg
# ls -l /lib/modules/2.6.20-krg
total 52
lrwxrwxrwx 1 root root   21 Jan  4 12:38 build -> /usr/src/linux-2.6.20
drwxr-xr-x 2 root root 4096 Jan  4 12:49 extra
drwxr-xr-x 2 root root 4096 Jan  4 12:38 kernel
-rw-r--r-- 1 root root   45 Jan  4 12:49 modules.alias
-rw-r--r-- 1 root root   69 Jan  4 12:49 modules.ccwmap
-rw-r--r-- 1 root root   44 Jan  4 12:49 modules.dep
-rw-r--r-- 1 root root   73 Jan  4 12:49 modules.ieee1394map
-rw-r--r-- 1 root root  141 Jan  4 12:49 modules.inputmap
-rw-r--r-- 1 root root   81 Jan  4 12:49 modules.isapnpmap
-rw-r--r-- 1 root root   74 Jan  4 12:49 modules.ofmap
-rw-r--r-- 1 root root   99 Jan  4 12:49 modules.pcimap
-rw-r--r-- 1 root root   43 Jan  4 12:49 modules.seriomap
-rw-r--r-- 1 root root 3217 Jan  4 12:49 modules.symbols
-rw-r--r-- 1 root root  189 Jan  4 12:49 modules.usbmap
lrwxrwxrwx 1 root root   21 Jan  4 12:38 source -> /usr/src/linux-2.6.20
# ls -l /etc/default/kerrighed
-rwxr-xr-x 1 root root 327 Jan  4 12:49 /etc/default/kerrighed
# ls -lR /usr/local/share/man*
/usr/local/share/man:
total 36
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man1
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man2
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man3
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man4
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man5
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man6
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man7
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man8
drwxr-sr-x 2 root staff 4096 Jan  4 12:49 man9
/usr/local/share/man/man1:
total 20
-rw-r--r-- 1 root staff  886 Jan  4 12:49 checkpoint.1
-rw-r--r-- 1 root staff 1314 Jan  4 12:49 krgadm.1
-rw-r--r-- 1 root staff 2334 Jan  4 12:49 krgcapset.1
-rw-r--r-- 1 root staff  813 Jan  4 12:49 migrate.1
-rw-r--r-- 1 root staff  894 Jan  4 12:49 restart.1
/usr/local/share/man/man2:
total 12
-rw-r--r-- 1 root staff 1322 Jan  4 12:49 krgcapset.2
-rw-r--r-- 1 root staff 1349 Jan  4 12:49 migrate.2
-rw-r--r-- 1 root staff 1248 Jan  4 12:49 migrate_self.2
/usr/local/share/man/man3:
total 0
/usr/local/share/man/man4:
total 0
/usr/local/share/man/man5:
total 4
-rw-r--r-- 1 root staff 1838 Jan  4 12:49 kerrighed_nodes.5
/usr/local/share/man/man6:
total 0
/usr/local/share/man/man7:
total 8
-rw-r--r-- 1 root staff 2055 Jan  4 12:49 kerrighed.7
-rw-r--r-- 1 root staff 2900 Jan  4 12:49 kerrighed_capabilities.7
/usr/local/share/man/man8:
total 0
/usr/local/share/man/man9:
total 0
node01:/usr/src/kerrighed-2.3.0# ls -l /usr/local/bin/krgadm
-rwxr-xr-x 1 root staff 21315 Jan  4 12:49 /usr/local/bin/krgadm
node01:/usr/src/kerrighed-2.3.0# ls -l /usr/local/bin/krgcapset
-rwxr-xr-x 1 root staff 21058 Jan  4 12:49 /usr/local/bin/krgcapset
node01:/usr/src/kerrighed-2.3.0# ls -l /usr/local/bin/migrate
-rwxr-xr-x 1 root staff 11358 Jan  4 12:49 /usr/local/bin/migrate
node01:/usr/src/kerrighed-2.3.0# ls -l /usr/local/lib/libkerrighed.*
-rw-r--r-- 1 root staff 36258 Jan  4 12:49 /usr/local/lib/libkerrighed.a
-rwxr-xr-x 1 root staff   843 Jan  4 12:49 /usr/local/lib/libkerrighed.la
lrwxrwxrwx 1 root staff    21 Jan  4 12:49 /usr/local/lib/libkerrighed.so -> libkerrighed.so.1.0.0
lrwxrwxrwx 1 root staff    21 Jan  4 12:49 /usr/local/lib/libkerrighed.so.1 -> libkerrighed.so.1.0.0
-rwxr-xr-x 1 root staff 28805 Jan  4 12:49 /usr/local/lib/libkerrighed.so.1.0.0
node01:/usr/src/kerrighed-2.3.0# ls -l /usr/local/include/kerrighed
total 56
-rw-r--r-- 1 root staff   810 Jan  4 12:49 capabilities.h
-rw-r--r-- 1 root staff   840 Jan  4 12:49 capability.h
-rw-r--r-- 1 root staff   601 Jan  4 12:49 checkpoint.h
-rw-r--r-- 1 root staff   197 Jan  4 12:49 comm.h
-rw-r--r-- 1 root staff  1054 Jan  4 12:49 hotplug.h
-rw-r--r-- 1 root staff   233 Jan  4 12:49 kerrighed.h
-rw-r--r-- 1 root staff 13742 Jan  4 12:49 kerrighed_tools.h
-rw-r--r-- 1 root staff  1163 Jan  4 12:49 krgnodemask.h
-rw-r--r-- 1 root staff  1459 Jan  4 12:49 proc.h
-rw-r--r-- 1 root staff   405 Jan  4 12:49 process_group_types.h
-rw-r--r-- 1 root staff  1494 Jan  4 12:49 types.h
# mkinitramfs -o /boot/initrd.img-2.6.20-krg 2.6.20-krg
# vi /boot/grub/menu.lst
default 3
title           Debian GNU/Linux, kernel 2.6.20-krg
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.20-krg root=/dev/hda1 ro session_id=1
initrd          /boot/initrd.img-2.6.20-krg
savedefault
# ifconfig
# echo "session=1">> /etc/kerrighed_nodes
# echo "nbmin=1">> /etc/kerrighed_nodes
# echo "127.0.0.1:0:lo">> /etc/kerrighed_nodes
# cat /etc/kerrighed_nodes
session=1
nbmin=1
127.0.0.1:0:lo
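
The three echo commands above can be folded into a small helper that prints the file contents to standard output, so the result can be inspected before it replaces /etc/kerrighed_nodes. This is only a sketch: the function name render_kerrighed_nodes is made up for illustration and is not part of Kerrighed, and the session=/nbmin=/ADDRESS:NODE_ID:INTERFACE layout is simply the one used in the file above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kerrighed): print a kerrighed_nodes file
# in the same layout as the one shown above.
# usage: render_kerrighed_nodes SESSION NBMIN ADDR:NODE_ID:IFACE...
render_kerrighed_nodes() {
    session=$1
    nbmin=$2
    shift 2
    printf 'session=%s\n' "$session"
    printf 'nbmin=%s\n' "$nbmin"
    # One line per node, passed through unchanged.
    for node in "$@"; do
        printf '%s\n' "$node"
    done
}

# Reproduces the single-node file from the text on standard output.
render_kerrighed_nodes 1 1 "127.0.0.1:0:lo"
```

Redirecting the output to a scratch file first and comparing it with the existing file keeps the real configuration untouched until the contents look right.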

That is as far as I got. It does not work yet. Something is wrong, but I cannot tell what.

  1. ocs/Howto/Kerrighed - Mandriva Community Wiki
  2. kerrighed installation how to | In da Wok ......
  3. Installing Kerrighed 2.2.0 - Kerrighed
  4. Main Page - Kerrighed
  5. grub menu.lst default - Google 検索
  6. GNU_GRUB
  7. Grubでデュアルブート時のデフォルト(標準)起動OS設定
  8. session_id kerrighed menu.lst - Google 検索
  9. Tutorial: Kerrighed | Bioinformatics
  10. krg_DRBL - Grid Architecture - Trac
  11. Linux安裝入門與基本管理

Sun Grid Engine

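As the number of machines grows, I want to find some way to use all of them. A calculation that takes a reasonable amount of time, say a month, might be cut to about 20 days by adding idle machines. Having no money is wonderful, because it forces you to be inventive.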

Since I am at it, I get the latest version, Sun Grid Engine 6.2. Downloading it requires a Sun account; older versions do not. I download the Linux version.

  1. Sun Grid Engine の機能詳細
  2. Sun Grid Engine(SGE)利用法 | スーパーコンピュータ | ヒトゲノム解析センター
  3. gridengine: ホーム
  4. gridengine: Grid Engine HOWTOs

Setting up the master host

First, create the directory where SGE will be installed.

# mkdir -p /opt/sge62

Set the $SGE_ROOT environment variable to the directory just created.

# export SGE_ROOT=/opt/sge62

Create the SGE administrator.

# useradd sgeadmin

Extract the downloaded file (here, the x86 architecture build).

# tar zxf ge62_lx24-x86.tar.gz
  1. Ubuntu でグリッドコンピューティング - May the Source be with you

Installing Sun Grid Engine (second attempt)

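Now that I roughly know the procedure, I install on the production environment. First, the download. It seems 6.2u2 is out, so I download its Linux version. The download consists of the following 14 files.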

$ ls
bytecount_cksum.list
LICENSE.txt
sdm10u2_core_rpm.zip
sdm10u2_core_targz.zip
sge62u2_1_linux24-i586_rpm.zip
sge62u2_1_linux24-i586_targz.zip
sge62u2_1_linux24-ia64_rpm.zip
sge62u2_1_linux24-ia64_targz.zip
sge62u2_1_linux24-x64_rpm.zip
sge62u2_1_linux24-x64_targz.zip
sge62u2_arco_rpm.zip
sge62u2_arco_targz.zip
THIRDPARTYLICENSEREADME.txt
webconsole3.0.2-linux.targz.zip

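There is a very helpful installation manual, so I follow it. It is in English but easy to understand. It is written mainly for software shipped on CD-ROM, so I adapt those parts. The machine is x86 and I want to install with the tar method, so I unzip sge62u2_1_linux24-i586_targz.zip. This creates the sge6_2u2_1/ directory, which contains the common archive and the architecture-dependent bin archive mentioned in the manual.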

$ unzip sge62u2_1_linux24-i586_targz.zip
$ ls sge6_2u2_1/
sge-6_2u2_1-bin-linux24-i586.tar.gz
sge-6_2u2-common.tar.gz
$ pwd
/usr/src/

Now the installation can proceed according to the manual. First create the sge-root directory (/opt/sge6-2), change into it, and extract the two *.tar.gz files obtained above into it.

$ su -
Password:
# mkdir -p /opt/sge6-2
# cd /opt/sge6-2
# tar zxf /usr/src/sge6_2u2_1/sge-6_2u2-common.tar.gz
# tar zxf /usr/src/sge6_2u2_1/sge-6_2u2_1-bin-linux24-i586.tar.gz
# ls 
3rd_party  doc       include        install_qmaster  mpi   start_gui_installer
catman     dtrace    inst_sge       lib              pvm   util
ckpt       examples  install_execd  man              qmon

Next, set the SGE_ROOT environment variable and check it.

# export SGE_ROOT='/opt/sge6-2'
# printenv SGE_ROOT
/opt/sge6-2

Finally, run util/setfileperm.sh.

# util/setfileperm.sh $SGE_ROOT

From here on, the scripts ask questions one at a time. The answer to this one is yes.

                    WARNING WARNING WARNING
                    -----------------------
We will set the the file ownership and permission to

   UserID:         0
   GroupID:        0
   In directory:   /opt/sge6-2

We will also install the following binaries as SUID-root:

   $SGE_ROOT/utilbin/<arch>/rlogin
   $SGE_ROOT/utilbin/<arch>/rsh
   $SGE_ROOT/utilbin/<arch>/testsuidroot
   $SGE_ROOT/bin/<arch>/sgepasswd
   $SGE_ROOT/bin/<arch>/authuser

Do you want to set the file permissions (yes/no) [NO] >> yes

After pressing Enter, the output scrolls by. It appears to be setting permissions.

Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<

Your file permissions were set

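Next is the choice between the GUI installation and the command-line installation. I use the command-line installation here. I read the notes in the manual; for a fresh installation on Linux there seem to be no issues. The manual's task list has two items.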

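  1. Run the installation script on the master host and on every execution host.
  2. Register information about the administration hosts and the hosts that submit jobs (submit hosts).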

I do not fully understand this, but I carry on. Before installing, there is a document I am told to read if I want increased security, so I skim it. What I learned:

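  1. Hosts exchange messages encrypted with the CSP protocol.
  2. The secret key is exchanged using a public-key protocol.
  3. Encryption is performed transparently.
  4. An encrypted session is valid only for a certain time after it starts.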

Encryption is useful when messages exchanged between hosts might be tampered with, but otherwise it wastes computing resources. So I decide not to use encryption.

I also read Installing SMF Services, but it appears to be a Solaris 10 feature, so I skip it.

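Now on to installing the master host. Apparently, if I make a mistake I can start over from the beginning. The "before you begin" section has one caveat, though: user names must be the same on the execution hosts and on the hosts that submit jobs. With that in mind, I start the installation.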

# ./install_qmaster

After the license is displayed, I am asked to agree, so I answer y.

Do you agree with that license? (y/n) [n] >> y

It says something about an 80x24 terminal being preferable. Press Enter to continue.

Welcome to the Grid Engine installation
---------------------------------------

Grid Engine qmaster host installation
-------------------------------------

Before you continue with the installation please read these hints:

   - Your terminal window should have a size of at least
     80x24 characters

   - The INTR character is often bound to the key Ctrl-C.
     The term >Ctrl-C< is used during the installation if you
     have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >>

If the hostname is localhost or the IP address is 127.0.x.x, the installer complains.

Unsupported local hostname
--------------------------

The current hostname is resolved as follows:

Hostname: localhost
Aliases:  hoge
Host Address(es): 127.0.0.1

It is not supported for a Grid Engine installation that the local hostname
contains the hostname "localhost" and/or the IP address "127.0.x.x" of the
loopback interface.
The "localhost" hostname should be reserved for the loopback interface
("127.0.0.1") and the real hostname should be assigned to one of the
physical or logical network interfaces of this machine.

Installation failed.

Press <RETURN> to exit the installation procedure >>

Press Enter to exit. I will work around this by editing /etc/hosts. Before editing, it looks like this.

# cat /etc/hosts
127.0.0.1       localhost       hoge

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

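Now the edit. It is enough to give a name to the address assigned to the Ethernet adapter shown by ifconfig. Until now the host name hoge was assigned (as an alias) to 127.0.0.1; I make it the host name of 192.168.14.6 instead. This machine has four eth adapters, so I add names for the others as well.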

127.0.0.1      localhost
192.168.14.6   hoge
192.168.1.1    hoge1
192.168.2.1    hoge2
192.168.3.1    hoge3

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
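
Before rerunning the installer, it may be worth checking that the host name no longer shares a line with a loopback address. The sketch below does that for any hosts(5)-style file. The function name name_on_loopback is made up for this memo, and it only inspects the file it is given, so names resolved through DNS or NIS are not covered.

```shell
#!/bin/sh
# Hypothetical check (not part of Grid Engine): succeed if NAME appears on a
# loopback (127.x.x.x) line of the given hosts file.
# usage: name_on_loopback NAME [HOSTS_FILE]
name_on_loopback() {
    awk -v name="$1" '
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # rest of the line is a comment
                if ($i == name) found = 1     # name shares a line with 127.x.x.x
            }
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}
```

With the edited file above, `name_on_loopback hoge` should fail (exit status 1), while against the original file it would succeed, which is the situation the installer rejected.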

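With this in place, run ./install_qmaster once more. It starts with the license again, as before; I skip over the parts already covered. I want an SGE administrator other than root, so I answer y.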

# ./install_qmaster
Choosing Grid Engine admin user account
---------------------------------------

You may install Grid Engine that all files are created with the user id of an
unprivileged user.

This will make it possible to install and run Grid Engine in directories
where user >root< has no permissions to create and write files and directories.

   - Grid Engine still has to be started by user >root<

   - this directory should be owned by the Grid Engine administrator

Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> y

I am asked for the SGE admin user name, but I forgot to create the admin user, so I quit with Ctrl+C.

Choosing a Grid Engine admin user name
--------------------------------------

Please enter a valid user name >>

Create the SGE admin user with the name sgeadmin.

# adduser sgeadmin
Adding user `sgeadmin' ...
Adding new group `sgeadmin' (1001) ...
Adding new user `sgeadmin' (1001) with group `sgeadmin' ...
Creating home directory `/home/sgeadmin' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for sgeadmin
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:
        Other []:
Is the information correct? [Y/n] Y

Once the user exists, run ./install_qmaster again. Enter the SGE admin user name and press Enter.

# ./install_qmaster
Choosing a Grid Engine admin user name
--------------------------------------

Please enter a valid user name >> sgeadmin

Installing Grid Engine as admin user >sgeadmin<

Hit <RETURN> to continue >>

If the SGE_ROOT environment variable is wrong, correct it here. It is correct, so press Enter.

Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/sge6-2

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/sge6-2] >>

Your $SGE_ROOT directory: /opt/sge6-2

Hit <RETURN> to continue >>

This decides how the port that qmaster listens on is determined. I choose the default, 2, so that it is configured in /etc/services rather than through shell variables.

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_qmaster is currently set as service.

   sge_qmaster service set to port 6444

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

    sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]

Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >>

This says the sge_qmaster service will be used for communication with Grid Engine. Press Enter to continue.

Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------

Using the service

   sge_qmaster

for communication with Grid Engine.

Hit <RETURN> to continue >>

Next is the configuration of sge_execd, the execution daemon. As with the master daemon it is configured in /etc/services, so I choose the default, 2.

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set as service.

   sge_execd service set to port 6445

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

    sge_execd <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]

Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >>

As with the master daemon, just press Enter.

Grid Engine TCP/IP communication service
-----------------------------------------

Using the service

   sge_execd

for communication with Grid Engine.

Hit <RETURN> to continue >>
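
Since both daemons take their ports from the services database, a quick lookup can confirm that the entries are really there. The following is a rough sketch that reads a services(5)-style file; the function name service_port is an assumption of this memo, while the service names and the expected ports 6444 and 6445 are the ones printed by the installer above.

```shell
#!/bin/sh
# Hypothetical helper: print the port registered for SERVICE over tcp in a
# services(5)-style file, or fail if there is no such entry.
# usage: service_port SERVICE [SERVICES_FILE]
service_port() {
    awk -v svc="$1" '
        $1 == svc && $2 ~ /\/tcp$/ {
            split($2, parts, "/")   # "6444/tcp" -> parts[1] = "6444"
            print parts[1]
            found = 1
            exit                    # jump to END with found set
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/services}"
}
```

On this setup `service_port sge_qmaster` and `service_port sge_execd` would be expected to print 6444 and 6445. The helper only reads the local file; where NIS is in use, `getent services sge_qmaster` is probably the better check.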

I do not really understand cells, so I do as suggested and press Enter to keep the default.

Grid Engine cells
-----------------

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name

   default

If you want to install multiple cells you can enter a cell name now.

The environment variable

   $SGE_CELL=<your_cell_name>

will be set for all further Grid Engine commands.

Enter cell name [default] >>

Using cell >default<.
Hit <RETURN> to continue >>

Enter the name of the compute cluster. I keep the default for now.

Unique cluster name
-------------------

The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.

The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).

Enter new cluster name or hit <RETURN>
to use default [p6444] >>
creating directory: /opt/sge6-2/default/common

Your $SGE_CLUSTER_NAME: p6444

Hit <RETURN> to continue >>

Choose the spool directory. The default is fine, so just press Enter.

Grid Engine qmaster spool directory
-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.

The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.

If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.

The following directory

[/opt/sge6-2/default/spool/qmaster]

will be used as qmaster spool directory by default!

Do you want to select another qmaster spool directory (y/n) [n] >>

I am asked whether I will install Windows execution hosts; the answer is no.

Windows Execution Host Support
------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n] >>

A question about file permissions. The default is y, but I try answering n here.

Verifying and setting file permissions
--------------------------------------

Did you install this version with >pkgadd< or did you already
verify and set the file permissions of your distribution (y/n) [y] >>

I am asked whether to verify and set the permissions; I keep the default, y.

Verifying and setting file permissions
--------------------------------------

We may now verify and set the file permissions of your Grid Engine
distribution.

This may be useful since due to unpacking and copying of your distribution
your files may be unaccessible to other users.

We will set the permissions of directories and binaries to

   755 - that means executable are accessible for the world

and for ordinary files to

   644 - that means readable for the world

Do you want to verify and set your file permissions (y/n) [y] >>

This step appears to do the same thing as the util/setfileperm.sh $SGE_ROOT I ran at the beginning.

Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<

Your file permissions were set

Hit <RETURN> to continue >>

I answer y. I wonder whether it consults /etc/hosts at all.

Select default Grid Engine hostname resolving method
----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y

Ignoring domain name when comparing hostnames.

Hit <RETURN> to continue >>
Making directories
------------------

creating directory: /opt/sge6-2/default/spool/qmaster
creating directory: /opt/sge6-2/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>

I go with Berkeley DB. Does this mean a DB server is required? That could be rather painful.

Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>

I will not use a shadow master, and I want it to be as fast as possible, so I take the default, n. The plan is not to set up a spooling server.

The Berkeley DB spooling method provides two configurations!

Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host

Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.

Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>


Hit <RETURN> to continue >>

Specify the location of the spool directory. I keep the default.

Berkeley Database spooling parameters
-------------------------------------

Please enter the database directory now, even if you want to spool locally,
it is necessary to enter this database directory.

Default: [/opt/sge6-2/default/spool/spooldb] >>

creating directory: /opt/sge6-2/default/spool/spooldb
Dumping bootstrapping information
Initializing spooling database

Hit <RETURN> to continue >>

This one also stays at the default.

Grid Engine group id range
--------------------------

When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.

This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.

The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range [20000-20100] >>

Using >20000-20100< as gid range. Hit <RETURN> to continue >>
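
The installer requires that the chosen range consist of unused group ids. A rough way to check this against a group(5)-style file is sketched below; the function name gids_in_range is made up for this memo, and the sketch only looks at the file it is given, so groups defined in NIS or LDAP would need `getent group` instead.

```shell
#!/bin/sh
# Hypothetical check: list groups from a group(5)-style file whose numeric id
# falls inside LOW..HIGH (inclusive). No output means the range is unused there.
# usage: gids_in_range LOW HIGH [GROUP_FILE]
gids_in_range() {
    awk -F: -v lo="$1" -v hi="$2" '
        # Field 3 of a group entry is the numeric gid; force numeric comparison.
        ($3 + 0) >= (lo + 0) && ($3 + 0) <= (hi + 0) { print $1 ":" $3 }
    ' "${3:-/etc/group}"
}
```

Running `gids_in_range 20000 20100` before accepting the default should print nothing; any line it prints names a group that would collide with the ids Grid Engine hands out to jobs.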

Keep the default.

Grid Engine cluster configuration
---------------------------------

Please give the basic configuration parameters of your Grid Engine
installation:

   <execd_spool_dir>

The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.

Default: [/opt/sge6-2/default/spool] >>

Where should mail about problems be delivered? sgeadmin should do for now.

Grid Engine cluster configuration (continued)
---------------------------------------------

<administrator_mail>

The email address of the administrator to whom problem reports are sent.

It's is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >> sgeadmin@localhost

Final confirmation. Answer n and press Enter.

The following parameters for the cluster configuration were configured:

   execd_spool_dir        /opt/sge6-2/default/spool
   administrator_mail     sgeadmin@localhost

Do you want to change the configuration parameters (y/n) [n] >>
Creating local configuration
----------------------------
Creating >act_qmaster< file
Adding default complex attributes
Adding default parallel environments (PE)
Adding SGE default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >>

Whether to start the master host daemon at boot. I want that, so I just press Enter.

qmaster startup script
----------------------

We can install the startup script that will
start qmaster at machine boot (y/n) [y] >>

cp /opt/sge6-2/default/common/sgemaster /etc/init.d/sgemaster.p6444
/usr/sbin/update-rc.d sgemaster.p6444
 Adding system startup for /etc/init.d/sgemaster.p6444 ...
   /etc/rc0.d/K03sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc1.d/K03sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc6.d/K03sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc2.d/S95sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc3.d/S95sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc4.d/S95sgemaster.p6444 -> ../init.d/sgemaster.p6444
   /etc/rc5.d/S95sgemaster.p6444 -> ../init.d/sgemaster.p6444

Hit <RETURN> to continue >>

The installer then starts the master daemon.

Grid Engine qmaster startup
---------------------------

Starting qmaster daemon. Please wait ...
   starting sge_qmaster
Hit <RETURN> to continue >>

I am asked whether to read the list of execution hosts from a file; the answer is no.

Adding Grid Engine hosts
------------------------

Please now add the list of hosts, where you will later install your execution
daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >>

For now this machine will be both the master host and an execution host, so I register its own host name as an execution host.

Adding admin and submit hosts
-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.

Host(s): master01
Finished adding hosts. Hit <RETURN> to continue >>

I do not really understand the difference between a shadow host and a shadow master host, but since it is recommended I answer y here.

If you want to use a shadow host, it is recommended to add this host
to the list of administrative hosts.

If you are not sure, it is also possible to add or remove hosts after the
installation with <qconf -ah hostname> for adding and <qconf -dh hostname>
for removing this host

Attention: This is not the shadow host installation
procedure.
You still have to install the shadow host separately

Do you want to add your shadow host(s) now? (y/n) [y] >>
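
The installer output above notes that hosts can also be added after the installation with qconf -ah. For several hosts at once, a small loop is enough. The wrapper below is a sketch under my own assumptions: the name add_admin_hosts and the DRY_RUN switch are made up for this memo, and by default it only prints the commands so they can be reviewed before running them for real.

```shell
#!/bin/sh
# Hypothetical wrapper: add each named host as an administrative host with
# "qconf -ah", the command quoted in the installer output above.
# With DRY_RUN=1 (the default here) the commands are only printed.
add_admin_hosts() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qconf -ah $host"
        else
            qconf -ah "$host"
        fi
    done
}
```

Setting DRY_RUN=0 makes the function call qconf itself, which needs the Grid Engine environment to be loaded and has to be run as a Grid Engine manager, such as root or the admin user.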

This asks whether to read the list from a file, so no.

Adding Grid Engine shadow hosts
-------------------------------

Please now add the list of hosts, where you will later install your shadow
daemon.

Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >>

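Given the situation, a shadow host is not needed for now. A single machine runs both the execution host and the master host, so if it goes down, everything goes down, shadow or not. So I continue without adding any.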

Adding admin hosts
------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.

Host(s):
Finished adding hosts. Hit <RETURN> to continue >>

This is only a confirmation, so just press Enter.

Creating the default <all.q> queue and <allhosts> hostgroup
-----------------------------------------------------------

root@master01 added "@allhosts" to host group list
root@master01 added "all.q" to cluster queue list

Hit <RETURN> to continue >>

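Choose 1 here. Apparently Normal schedules jobs according to load (the installer describes it as fixed-interval scheduling based on actual plus assumed load).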

Scheduler Tuning
----------------

The details on the different options are described in the manual.

Configurations
--------------
1) Normal
          Fixed interval scheduling, report limited scheduling information,
          actual + assumed load

2) High
          Fixed interval scheduling, report limited scheduling information,
          actual load

3) Max
          Immediate Scheduling, report no scheduling information,
          actual load

Enter the number of your preferred configuration and hit <RETURN>!
Default configuration is [1] >> 1

We're configuring the scheduler with >Normal< settings!
Do you agree? (y/n) [y] >>

The rest is an explanation of how to use it. Press Enter three times and the prompt comes back.

Using Grid Engine
-----------------

You should now enter the command:

   source /opt/sge6-2/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/sge6-2/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)
   - $SGE_CELL         (if you are using a cell other than >default<)
   - $SGE_CLUSTER_NAME (always necessary)
   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)
   - $PATH/$path       (to find the Grid Engine binaries)
   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>
Grid Engine messages
--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)
   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/sge6-2/default/spool/qmaster/messages
   Exec daemon: <execd_spool_dir>/<hostname>/messages


Grid Engine startup scripts
---------------------------

Grid Engine startup scripts can be found at:

   /opt/sge6-2/default/common/sgemaster (qmaster)
   /opt/sge6-2/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
Your Grid Engine qmaster installation is now completed
------------------------------------------------------

Please now login to all hosts where you want to run an execution daemon
and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget
to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation.
All hosts which you added to the list of administrative hosts during this
installation procedure can now be installed.

You may verify your administrative hosts with the command

   # qconf -sh

and you may add new administrative hosts with the command

   # qconf -ah <hostname>

Please hit <RETURN> >>

To check that the environment variables get set properly, source the shell script as instructed above. It looks like they are set correctly.

# printenv
SHELL=/bin/bash
TERM=screen
OLDPWD=/root
USER=root
MAIL=/var/mail/root
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/sge6-2
LANG=C
SGE_ROOT=/opt/sge6-2
PS1=\h:\w\$
SHLVL=1
HOME=/root
LOGNAME=root
_=/usr/bin/printenv
# . /opt/sge6-2/default/common/settings.sh
# printenv
MANPATH=/opt/sge6-2/man:/usr/share/man:/usr/local/share/man
SHELL=/bin/bash
TERM=screen
SGE_CELL=default
OLDPWD=/root
USER=root
MAIL=/var/mail/root
PATH=/opt/sge6-2/bin/lx24-x86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/sge6-2
LANG=C
SGE_ROOT=/opt/sge6-2
PS1=\h:\w\$
SHLVL=1
HOME=/root
LOGNAME=root
SGE_CLUSTER_NAME=p6444
_=/usr/bin/printenv
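
Comparing the two printenv listings by eye, the second one adds MANPATH, SGE_CELL and SGE_CLUSTER_NAME, and prepends /opt/sge6-2/bin/lx24-x86 to PATH. The same comparison can be scripted. The following is only a rough sketch: the function name new_env_vars is made up for this memo, and it reports only names that are new, not variables such as PATH whose value changed.

```shell
#!/bin/sh
# new_env_vars FILE: print the names of environment variables that exist
# only after sourcing FILE. The body runs in a subshell, so the caller's
# own environment is left untouched. (Hypothetical helper, not part of SGE.)
new_env_vars() (
    names='s/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p'
    before=$(env | sed -n "$names" | sort -u)
    . "$1" >/dev/null 2>&1
    after=$(env | sed -n "$names" | sort -u)
    for name in $after; do
        # Print the name only if it was absent before sourcing.
        printf '%s\n' "$before" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
)

# Intended use on the machine above (not run here):
#   new_env_vars /opt/sge6-2/default/common/settings.sh
```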

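That completes the master daemon setup. That was tiring. Next is the execution daemon setup. First, check that this host is registered as an administrative host. hoge is the machine on which the master host was installed, and the execution daemon will run on that same machine, so this is fine.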

# qconf -sh
hoge
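
When there are more hosts, this check could be automated rather than read by eye. A minimal sketch, assuming that `qconf -sh` prints one hostname per line as it does above; is_admin_host is a name invented here, and the match is an exact string comparison, so a short name will not match a fully qualified one.

```shell
#!/bin/sh
# is_admin_host HOST: read an admin-host list on stdin (one name per line,
# in the format printed by `qconf -sh`) and succeed only if HOST is listed.
# (Hypothetical helper, not part of Grid Engine.)
is_admin_host() {
    grep -qxF -- "$1"
}

# Intended use on a machine with Grid Engine installed (not run here):
#   qconf -sh | is_admin_host "$(uname -n)" || qconf -ah "$(uname -n)"
```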

Now launch the execution daemon installer. It claims that installing the execution daemon takes five minutes. Really? Just press Return.

# ./install_execd
Welcome to the Grid Engine execution host installation
------------------------------------------------------

If you haven't installed the Grid Engine qmaster host yet, you must execute
this step (with >install_qmaster<) prior the execution host installation.

For a sucessfull installation you need a running Grid Engine qmaster. It is
also neccesary that this host is an administrative host.

You can verify your current list of administrative hosts with
the command:

   # qconf -sh

You can add an administrative host with the command:

   # qconf -ah <hostname>

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >>

Confirm the location of SGE_ROOT. It is correct, so just press Enter.

Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/sge6-2

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/sge6-2] >>

Your $SGE_ROOT directory: /opt/sge6-2

Hit <RETURN> to continue >>

Cell selection. The cell created during the master daemon installation was named default, so just press Enter.

Grid Engine cells
-----------------

Please enter cell name which you used for the qmaster
installation or press <RETURN> to use [default] >>

Using cell: >default<

Hit <RETURN> to continue >>

Next is a confirmation of the port the execution daemon listens on. Just press Enter.

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set as service.

   sge_execd service set to port 6445

Hit <RETURN> to continue >>
Checking hostname resolving
---------------------------

This hostname is known at qmaster as an administrative host.

Hit <RETURN> to continue >>

Choose the spool directory. Leave it as is and press Enter.

Execd spool directory configuration
-----------------------------------

You defined a global spool directory when you installed the master host.
You can use that directory for spooling jobs from this execution host
or you can define a different spool directory for this execution host.

ATTENTION: For most operating systems, the spool directory does not have to
be located on a local disk. The spool directory can be located on a
network-accessible drive. However, using a local spool directory provides
better performance.

FOR WINDOWS USERS: On Windows systems, the spool directory MUST be located
on a local disk. If you install an execution daemon on a Windows system
without a local spool directory, the execution host is unusable.

The spool directory is currently set to:
<</opt/sge6-2/default/spool/master01>>

Do you want to configure a different spool directory
for this host (y/n) [n] >>
Creating local configuration
----------------------------
sgeadmin@master01 modified "master01" in configuration list
Local configuration for host >master01< created.

Hit <RETURN> to continue >>

Register the execution daemon to start at boot. Answer y and press Enter.

execd startup script
--------------------

We can install the startup script that will
start execd at machine boot (y/n) [y] >> y

cp /opt/sge6-2/default/common/sgeexecd /etc/init.d/sgeexecd.p6444
/usr/sbin/update-rc.d sgeexecd.p6444
 Adding system startup for /etc/init.d/sgeexecd.p6444 ...
   /etc/rc0.d/K03sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc1.d/K03sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc6.d/K03sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc2.d/S95sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc3.d/S95sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc4.d/S95sgeexecd.p6444 -> ../init.d/sgeexecd.p6444
   /etc/rc5.d/S95sgeexecd.p6444 -> ../init.d/sgeexecd.p6444

Hit <RETURN> to continue >>
Grid Engine execution daemon startup
------------------------------------

Starting execution daemon. Please wait ...
   starting sge_execd

Hit <RETURN> to continue >>
Adding a queue for this host
----------------------------

We can now add a queue instance for this host:

   - it is added to the >allhosts< hostgroup
   - the queue provides 1 slot(s) for jobs in all queues
     referencing the >allhosts< hostgroup

You do not need to add this host now, but before running jobs on this host
it must be added to at least one queue.

Do you want to add a default queue instance for this host (y/n) [y] >>

root@master01 modified "@allhosts" in host group list
root@master01 modified "all.q" in cluster queue list

Hit <RETURN> to continue >>
Using Grid Engine
-----------------

You should now enter the command:

   source /opt/sge6-2/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/sge6-2/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)
   - $SGE_CELL         (if you are using a cell other than >default<)
   - $SGE_CLUSTER_NAME (always necessary)
   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)
   - $PATH/$path       (to find the Grid Engine binaries)
   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>
Grid Engine messages
--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)
   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/sge6-2/default/spool/qmaster/messages
   Exec daemon: <execd_spool_dir>/<hostname>/messages


Grid Engine startup scripts
---------------------------

Grid Engine startup scripts can be found at:

   /opt/sge6-2/default/common/sgemaster (qmaster)
   /opt/sge6-2/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

That completes the execution daemon installation. Time to test it. First, log out of root.

# exit

Log in again as sgeadmin and source the shell script that sets the environment variables.

$ . /opt/sge6-2/default/common/settings.sh

Check the cluster configuration with qconf.

$ qconf -sconf
#global:
execd_spool_dir              /opt/sge6-2/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           sgeadmin@localhost
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
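
The listing is long, so when only one setting matters it may be handy to pull out a single value. The sketch below assumes the two-column "name value" layout shown above. conf_value is a name invented for this memo, and it does not join continuation lines ending in a backslash (as in reporting_params), so only the first line of such a value is returned.

```shell
#!/bin/sh
# conf_value NAME: read `qconf -sconf`-style text on stdin and print the
# value column of the first line whose first field is exactly NAME.
# (Hypothetical helper; continuation lines are not joined.)
conf_value() {
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }'
}

# Intended use on a machine with Grid Engine installed (not run here):
#   qconf -sconf | conf_value load_report_time
```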

Submit a test job. According to the manual, it goes like this.

$ rsh hoge date
Permission denied.
$ qsub $SGE_ROOT/examples/jobs/simple.sh
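
For reference, a job script of the kind handed to qsub might look roughly like the following. This is a hypothetical example written for this memo, not the contents of the bundled simple.sh. The job name hello_sge is made up, and the embedded options (-S, -N, -cwd, -j) are just one plausible choice.

```shell
#!/bin/sh
# Hypothetical job script. Lines starting with "#$" are read by qsub as
# embedded options; to the shell they are ordinary comments.
#$ -S /bin/sh
#$ -N hello_sge
#$ -cwd
#$ -j y

# The actual work: report where and when the job ran.
job_main() {
    echo "host:  $(uname -n)"
    echo "start: $(date)"
    sleep 1
    echo "end:   $(date)"
}

job_main
```

Saved as, say, hello.sh and submitted with `qsub hello.sh`, the output should end up in a file named after the job (hello_sge.o followed by the job ID) in the submission directory, assuming the defaults have not been changed.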

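The rsh command fails, but qsub succeeds. rsh fails because .rhosts is not set up correctly. qsub presumably succeeded anyway because it does not go through rsh. After adding an entry to .rhosts, rsh works too.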

$ echo 'hoge sgeadmin' >> ~/.rhosts
$ rsh hoge date
2009年  4月 10日 金曜日 15:12:40 JST
$ date
2009年  4月 10日 金曜日 15:12:49 JST
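
One caveat with the `echo ... >> ~/.rhosts` line above is that running it twice appends the same entry twice. A slightly more careful sketch follows. add_rhosts_entry is a name invented here, and the chmod 600 reflects my understanding that many rsh servers ignore a .rhosts file that is writable by anyone other than its owner.

```shell
#!/bin/sh
# add_rhosts_entry FILE HOST USER: append the line "HOST USER" to FILE
# unless that exact line is already present, then restrict FILE to its
# owner. (Hypothetical helper for this memo.)
add_rhosts_entry() {
    file=$1
    entry="$2 $3"
    touch "$file"
    grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
    chmod 600 "$file"
}

# Intended use, matching the session above (not run here):
#   add_rhosts_entry "$HOME/.rhosts" hoge sgeadmin
```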

That concludes submitting the test job.

[Memo] hudson, TheSchwartz, and so on

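If you compile and test before every commit, things get inefficient once compiling and testing take a long time. This is where the idea called CI (continuous integration) comes in, and apparently there is software called hudson that implements it. I suppose the flow is: commit freely to branches in a VCS repository such as Subversion without waiting for the compile and test, have hudson launch the compile and test scripts, and merge into trunk automatically once both succeed.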
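
The "merge only if compile and test succeed" part of that flow can be sketched in a few lines of shell. This is just an illustration of the idea as I understand it, not how hudson is actually configured. gate_merge and the three commands passed to it are placeholders.

```shell
#!/bin/sh
# gate_merge BUILD TEST MERGE: run the three command strings in order and
# stop at the first failure, so MERGE runs only when BUILD and TEST both
# succeed. (Hypothetical helper; not a hudson feature.)
gate_merge() {
    sh -c "$1" || { echo "build failed; not merging" >&2; return 1; }
    sh -c "$2" || { echo "tests failed; not merging" >&2; return 1; }
    sh -c "$3"
}

# Intended use, with placeholder commands (not run here):
#   gate_merge 'make' 'make test' './merge-to-trunk.sh'
```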

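There are also job queue servers. When you tell something to run a task, you don't want the shell to stay blocked until it finishes, so the request is queued first and run automatically later when the system is idle. I think TheSchwartz is what provides this.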
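
The queue-now, run-later idea can be mimicked with a directory of job files. The sketch below only illustrates the concept and has nothing to do with TheSchwartz's actual API. enqueue and run_queue are names made up here, and there is no locking, so it assumes a single user and a single worker.

```shell
#!/bin/sh
# enqueue DIR COMMAND: store COMMAND as the next numbered job file in DIR
# and return immediately; nothing is executed yet.
enqueue() {
    dir=$1
    mkdir -p "$dir"
    n=$(ls "$dir" | wc -l)     # counts pending (.job) and finished (.done) files
    printf '%s\n' "$2" > "$dir/$(printf '%06d' "$((n + 1))").job"
}

# run_queue DIR: execute pending jobs in submission order and mark each
# one as finished by renaming it from .job to .done.
run_queue() {
    dir=$1
    for job in "$dir"/*.job; do
        [ -e "$job" ] || continue   # glob did not match: queue is empty
        sh "$job"
        mv "$job" "${job%.job}.done"
    done
}
```

The prompt comes back as soon as enqueue has written the file. run_queue would then be started whenever the machine is considered idle, for example from cron.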

On a different note, apparently Torque is another job scheduler.

ChangeLog

  1. Posted: 2003-12-14T20:05:25+09:00
  2. Modified: 2003-12-14T14:13:14+09:00
  3. Generated: 2023-08-27T23:09:13+09:00