rct-complete-symbol still doesn't work

Sorry for doing nothing but complain about rcodetools lately, but rct-complete-symbol really does seem to be broken.

When I run rct-complete-symbol on a line that contains multibyte characters,

# -*- coding: utf-8; -*-
"こんにちは".enc  # this one completes

completion works if the encoding is utf-8, but

# -*- coding: euc-jp; -*-
"こんにちは".enc  # this one does not complete

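with euc-jp or cp932 I get "No candidates."

rct-complete has to be told something like "complete at byte N of line M in file F", but it looks like Emacs cannot give me the "byte N" part.*1

Experiment

Prepare three files containing nothing but 「こんにちは」, saved as euc-jp, cp932, and utf-8.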

$ hexdump -C hello.euc
00000000  a4 b3 a4 f3 a4 cb a4 c1  a4 cf                    |こんにちは|
0000000a
$ hexdump -C hello.cp932
00000000  82 b1 82 f1 82 c9 82 bf  82 cd                    |こんにちは|
0000000a
$ hexdump -C hello.utf8
00000000  e3 81 93 e3 82 93 e3 81  ab e3 81 a1 e3 81 af     |こんにちは|
0000000f
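As a cross-check on the dumps above, here is a minimal Ruby sketch that encodes the same string three ways and prints the bytes. It assumes Ruby 2.0 or later (so the source file itself is read as UTF-8) and uses Windows-31J as Ruby's name for cp932; the variable names are only for illustration.

```ruby
# Encode the same five characters in each encoding and dump the bytes.
# Assumption: this script is saved as UTF-8 (the default for Ruby 2.0+).
text = "こんにちは"

dumps = {}
%w[EUC-JP Windows-31J UTF-8].each do |name|
  bytes = text.encode(name).bytes
  dumps[name] = bytes.map { |b| format("%02x", b) }.join(" ")
  puts "#{name}: #{dumps[name]} (#{bytes.size} bytes)"
end
```

The two legacy encodings use 2 bytes per character here and UTF-8 uses 3, which is why a byte offset into the line depends on how the file was saved.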

Define a function for the experiment.

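(defun foo ()
  "Show length, display width, and byte size of the text from line start to point."
  (interactive)
  (let ((s (buffer-substring (point-at-bol) (point))))
    ;; `message' takes a format string itself, so no separate `format' call is needed.
    (message "length: %d  string-width: %d  string-bytes: %d"
             (length s)
             (string-width s)
             (string-bytes s))))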

Running M-x foo at the end of each file gave

length: 5  string-width: 10  string-bytes: 15

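The result was the same for every file.
The encodings differ, yet the byte count is the same.*2

Quick hack

Give up on byte counts and work with character counts instead.

Below are patches against rcodetools 0.8.2.0.*3

First, rcodetools.el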

--- rcodetools.el.org	2009-04-14 21:35:27.905532000 +0900
+++ rcodetools.el	2009-04-14 21:36:06.395079800 +0900
@@ -155,7 +155,8 @@
     (rct-shell-command
      (format "%s %s %s --line=%d --column=%d %s"
              command opt (or rct-option-local "")
-             (rct-current-line) (current-column)
+             (rct-current-line)
+             (length (buffer-substring (point-at-bol) (point)))
              (if rct-use-test-script (rct-test-script-option-string) ""))
      eval-buffer)
     (message "")

Changed it to count characters with length.
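To illustrate why a character count is the easier thing to pass around, here is a small Ruby sketch comparing the character column and the byte column at the end of the test line. It is only an illustration under the assumption that the cursor sits at the end of the line; it does not reproduce what rct-complete does internally.

```ruby
# Compare character offset and byte offset of the cursor position
# (assumed to be the end of the line) in each encoding.
line = "\"こんにちは\".enc"

offsets = {}
%w[EUC-JP Windows-31J UTF-8].each do |name|
  encoded = line.encode(name)
  offsets[name] = { chars: encoded.length, bytes: encoded.bytesize }
  puts "#{name}: chars=#{encoded.length} bytes=#{encoded.bytesize}"
end
```

The character column comes out the same in all three encodings, while the byte column does not, so the editor side no longer needs to know the file's on-disk encoding.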

xmpfilter

--- xmpfilter.org	2009-04-14 21:37:52.245246200 +0900
+++ xmpfilter	2009-04-14 21:52:12.637563600 +0900
@@ -76,6 +76,13 @@
 targetcode = ARGF.read
 Dir.chdir options[:wd] if options[:wd]
 
+if RUBY_VERSION >= "1.9"
+  targetcode.force_encoding('ASCII-8BIT')
+  if /\A.*\n?.*#.*coding[:=]\s*(?<encoding>[-a-z0-9_]+)/i =~ targetcode
+    targetcode.force_encoding(encoding)
+  end
+end
+
 if XMPFilter.detect_rbtest(targetcode, options)
   require 'rcodetools/xmptestunitfilter'
   klass = XMPTestUnitFilter

This sets the encoding of the target script from its magic comment.
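The same idea can be tried outside xmpfilter. The sketch below is a standalone illustration using the regular expression from the patch; `tag_with_magic_encoding` and `MAGIC_COMMENT` are hypothetical names, not part of rcodetools, and the fallback for unknown encoding names is my own addition. It assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper (not part of rcodetools): tag raw source bytes with the
# encoding named in a magic comment on the first or second line.
MAGIC_COMMENT = /\A.*\n?.*#.*coding[:=]\s*(?<encoding>[-a-z0-9_]+)/i

def tag_with_magic_encoding(raw)
  # Start from plain bytes so the ASCII-only regexp can match safely.
  src = raw.dup.force_encoding("ASCII-8BIT")
  m = MAGIC_COMMENT.match(src)
  return src unless m
  begin
    src.force_encoding(m[:encoding])
  rescue ArgumentError
    src # unknown encoding name: leave the string as binary
  end
end

# Build EUC-JP bytes with no encoding information attached, as if read from a file.
raw = "# -*- coding: euc-jp; -*-\n\"こんにちは\".enc\n".encode("EUC-JP").b
tagged = tag_with_magic_encoding(raw)

puts tagged.encoding                # encoding picked up from the magic comment
puts raw.lines[1].chomp.length      # counted in bytes while the string is binary
puts tagged.lines[1].chomp.length   # counted in characters once tagged
```

Once the string carries the right encoding, String#length and indexing work in characters, which is what lets a character column from Emacs line up with the Ruby side.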

rct-complete

--- rct-complete.org	2009-04-14 21:44:37.206587600 +0900
+++ rct-complete	2009-04-14 21:52:14.811495800 +0900
@@ -37,6 +37,14 @@
 
 targetcode = ARGF.read
 Dir.chdir options[:wd] if options[:wd]
+
+if RUBY_VERSION >= "1.9"
+  targetcode.force_encoding('ASCII-8BIT')
+  if /\A.*\n?.*#.*coding[:=]\s*(?<encoding>[-a-z0-9_]+)/i =~ targetcode
+    targetcode.force_encoding(encoding)
+  end
+end
+
 XMPFilter.detect_rbtest(targetcode, options)
 # Do the job. dispatched by klass.
 puts klass.run(targetcode, options)

Same as for xmpfilter.

Result

Try completion.

# -*- coding: euc-jp; -*-
RUBY_VERSION            # =>

"こんにちは".enc

The candidates showed up properly.

Try xmpfilter too.

# -*- coding: euc-jp; -*-
RUBY_VERSION            # => "1.9.1"

"こんにちは".encoding   # => #<Encoding:EUC-JP>

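It worked.

*1: Maybe it can be obtained, but I don't know how.

*2: Is this the byte count in Emacs's internal encoding?

*3: Not the latest version, but this one probably needs fewer changes.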