Erlang Plugin for NetBeans in Scala #3: Minimal Lexer Integration
Caoyuan Blog - February 04, 2009
The minimal level of support is to integrate a language lexer into NetBeans’s language support framework. As of NetBeans 7.0, there is a new effort at common language support, called CSL (Common Scripting Language). Don’t be confused by this module’s name: it is not only for scripting languages; in fact, Java support has been migrated to CSL in 7.0. There are discussions about a better name. CSL was forked from GSF (Generic Scripting Framework) as a new variant of GSF based on the new Parsing & Indexing API.
Since I’m going to write the Erlang lexer with the Rats! parser generator, I first need to add a dependency on the Rats! run-time libraries. The Rats! run-time module is under contrib/xtc, which I patched to support the end position of each production.
Then you have to tell the project where to find the Rats! libraries. This is done by adding the following properties to nbproject/project.properties, which now looks like:
nbproject/project.properties
javac.compilerargs=-Xlint:unchecked
javac.source=1.5
nbm.homepage=http://wiki.netbeans.org/Erlang

scala.library=${cluster}/modules/ext/scala-library-2.7.3.jar
scala.compiler=${cluster}/modules/ext/scala-compiler-2.7.3.jar
scala.libs=\
    ${scala.library}:\
    ${scala.compiler}

rats.jar=${cluster}/modules/xtc.jar
rats.package.dir=org/netbeans/modules/erlang/editor/rats
rats.lexer.file=LexerErlang.rats
All Rats! definitions of Erlang tokens can be found at http://hg.netbeans.org/main/contrib/file/tip/erlang.editor/src/org/netbeans/modules/erlang/editor/rats/ Don’t ask me how to write Rats! rules for languages; you should get that information from the Rats! web site.
The Erlang lexer is generated from the file named by “rats.lexer.file=LexerErlang.rats”, which is the entry point of all the rules defined for Erlang tokens. Running the “rats” target generates a LexerErlang.java file, which is the lexer class that will be used to create Erlang tokens from Erlang source files.
Now we should integrate this lexer class into NetBeans’ lexer engine. This is done with two Scala files:
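As a rough illustration of how these two rats.* properties might combine into the grammar location, here is a minimal sketch using only java.util.Properties. The build script itself is not shown in this post, so the "src/" prefix and the way the path is joined are assumptions (they merely match the repository URL above); RatsPropsSketch is a hypothetical name.

```scala
import java.io.StringReader
import java.util.Properties

object RatsPropsSketch {
  // A fragment of nbproject/project.properties, inlined for the example.
  val text =
    """rats.package.dir=org/netbeans/modules/erlang/editor/rats
      |rats.lexer.file=LexerErlang.rats
      |scala.libs=\
      |    ${scala.library}:\
      |    ${scala.compiler}
      |""".stripMargin

  val props = new Properties()
  props.load(new StringReader(text))

  // Assumption: the grammar lives under src/<rats.package.dir>/<rats.lexer.file>.
  val grammarPath: String =
    "src/" + props.getProperty("rats.package.dir") + "/" + props.getProperty("rats.lexer.file")
}
```

Note that java.util.Properties joins the backslash-continued lines of scala.libs into one value but does not expand ${...} references; that expansion is done by Ant when the build runs.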
ErlangLexer.scala
/*
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 *
 * Copyright 1997-2008 Sun Microsystems, Inc. All rights reserved.
 *
 * The contents of this file are subject to the terms of either the GNU
 * General Public License Version 2 only ("GPL") or the Common
 * Development and Distribution License("CDDL") (collectively, the
 * "License"). You may not use this file except in compliance with the
 * License. You can obtain a copy of the License at
 * http://www.netbeans.org/cddl-gplv2.html
 * or nbbuild/licenses/CDDL-GPL-2-CP. See the License for the
 * specific language governing permissions and limitations under the
 * License. When distributing the software, include this License Header
 * Notice in each file and include the License file at
 * nbbuild/licenses/CDDL-GPL-2-CP. Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the GPL Version 2 section of the License file that
 * accompanied this code. If applicable, add the following below the
 * License Header, with the fields enclosed by brackets [] replaced by
 * your own identifying information:
 * "Portions Copyrighted [year] [name of copyright owner]"
 *
 * Contributor(s):
 *
 * The Original Software is NetBeans. The Initial Developer of the Original
 * Software is Sun Microsystems, Inc. Portions Copyright 1997-2006 Sun
 * Microsystems, Inc. All Rights Reserved.
 *
 * If you wish your version of this file to be governed by only the CDDL
 * or only the GPL Version 2, indicate your decision by adding
 * "[Contributor] elects to include this software in this distribution
 * under the [CDDL or GPL Version 2] license." If you do not indicate a
 * single choice of license, a recipient has the option to distribute
 * your version of this file under either the CDDL, the GPL Version 2 or
 * to extend the choice of license to its licensees as provided above.
 * However, if you add GPL Version 2 code and therefore, elected the GPL
 * Version 2 license, then the option applies only if the new code is
 * made subject to such option by the copyright holder.
 */
package org.netbeans.modules.erlang.editor.lexer

import _root_.java.io.IOException
import _root_.java.io.Reader
import _root_.java.util.ArrayList
import _root_.java.util.Iterator
import _root_.java.util.List
import org.netbeans.api.lexer.Token
import org.netbeans.modules.erlang.editor.rats.LexerErlang
import org.netbeans.spi.lexer.Lexer
import org.netbeans.spi.lexer.LexerInput
import org.netbeans.spi.lexer.LexerRestartInfo
import org.netbeans.spi.lexer.TokenFactory
import xtc.parser.Result
import xtc.tree.GNode
import xtc.util.Pair

import org.netbeans.modules.erlang.editor.lexer.ErlangTokenId._

/**
 *
 * @author Caoyuan Deng
 */
object ErlangLexer {
  private var cached:Option[ErlangLexer] = None

  def create(info:LexerRestartInfo[ErlangTokenId]) = synchronized {
    cached match {
      case None => cached = Some(new ErlangLexer)
      case _ =>
    }
    cached.get.restart(info)
    cached
  }

  def release = cached = None
}

class ErlangLexer extends Lexer[ErlangTokenId] {

  var info : LexerRestartInfo[ErlangTokenId] = _
  var input : LexerInput = _
  var tokenFactory : TokenFactory[ErlangTokenId] = _
  var lexerInputReader : LexerInputReader = _

  val tokenStream = new ArrayList[TokenInfo]
  /**
   * tokenStream.iterator() always returns a new iterator, which points to the
   * first item, so we should have a global one.
   */
  var tokenStreamItr : Iterator[TokenInfo] = tokenStream.iterator
  var lookahead : Int = 0

  def restart(info:LexerRestartInfo[ErlangTokenId]) {
    this.info = info
    input = info.input
    tokenFactory = info.tokenFactory
    lexerInputReader = new LexerInputReader(input)
    /**
     * @Note: it seems the input is empty at this time, so we cannot do scanning here
     */
    tokenStream.clear
    tokenStreamItr = tokenStream.iterator
    lookahead = 0
  }

  def state : Object = null

  def nextToken :Token[ErlangTokenId] = {
    // In case of an empty tokenStream, scan a new batch of tokens
    if (!tokenStreamItr.hasNext) {
      tokenStream.clear
      scanTokens
      tokenStreamItr = tokenStream.iterator

      /**
       * @Bug of LexerInput.backup(int) ?
       * backup(0) will cause input.readLength() to increase by 1
       */
      lookahead = input.readLength
      if (lookahead > 0) {
        input.backup(lookahead)
      }
    }

    if (tokenStreamItr.hasNext) {
      val tokenInfo = tokenStreamItr.next

      if (tokenInfo.length == 0) {
        // EOF
        return null
      }

      // read the token's chars according to tokenInfo.length
      var i = 0
      while (i < tokenInfo.length) {
        input.read
        i += 1
      }

      // see if we need to look ahead; if so, perform it
      lookahead -= tokenInfo.length;
      // to cheat the incremental lexer, we need to look ahead one more char when
      // tokenStream.size() > 1 (batched tokens that are not context free),
      // so that a modification exactly behind the latest token will
      // force the lexer to relex from the 1st token of tokenStream
      val lookahead1 = if (tokenStream.size > 1) lookahead + 1 else lookahead
      if (lookahead1 > 0) {
        var i = 0
        while (i < lookahead1) {
          input.read
          i += 1
        }
        input.backup(lookahead1)
      }

      val tokenLength = input.readLength
      createToken(tokenInfo.id, tokenLength)
    } else {
      assert(false, "unrecognized input" + input.read)
      null
    }
  }

  def createToken(id:ErlangTokenId, length:Int) : Token[ErlangTokenId] = id.fixedText match {
    case null => tokenFactory.createToken(id, length)
    case fixedText => tokenFactory.getFlyweightToken(id, fixedText)
  }

  def scanTokens : Result = {
    /**
     * We cannot keep an instance scope lexer, since the lexer (sub-class of ParserBase)
     * has internal state which keeps the read-in chars, index and others; it is really
     * difficult to handle.
     */
    val scanner = new LexerErlang(lexerInputReader, " ")
    try {
      // just scan from position 0, the incremental lexer engine will handle the start char in lexerInputReader
      val r = scanner.pToken(0)
      if (r.hasValue) {
        val node = r.semanticValue.asInstanceOf[GNode]
        flattenToTokenStream(node)
        r
      } else {
        System.err.println(r.parseError.msg)
        null
      }
    } catch {
      case e:Exception =>
        e.printStackTrace
        null
    }
  }

  def flattenToTokenStream(node:GNode) : Unit = {
    if (node.size == 0) {
      /** @Note:
       * When node.size() == 0, it's a void node. This should be limited to
       * EOF when you define lexical rats.
       *
       * And in Rats!, EOF is !_, the input.readLength() will return 0
       */
      // assert(input.readLength == 0,
      //        "This generic node: " + node.getName +
      //        " is a void node, this should happen only on EOF. Check you rats file.")

      val tokenInfo = new TokenInfo(0, null)
      tokenStream.add(tokenInfo)
    }

    var i = 0
    while (i < node.size) {
      node.get(i) match {
        case null =>
          /** child may be null */
        case child:GNode =>
          flattenToTokenStream(child)
        case child:Pair[_] =>
          assert(false, "Pair:" + child + " to be process, do you add 'flatten' option on grammar file?")
        case child:String =>
          val length = child.length
          val id = ErlangTokenId.valueOf(node.getName) match {
            case None => ErlangTokenId.IGNORED
            case Some(v) => v.asInstanceOf[ErlangTokenId]
          }
          val tokenInfo = new TokenInfo(length, id)
          tokenStream.add(tokenInfo)
        case child =>
          println("To be process: " + child)
      }
      i += 1
    }
  }

  def release = ErlangLexer.release

  /**
   * Hacking for xtc.parser.ParserBase of Rats!, which uses java.io.Reader
   * as the chars input, but uses only {@link java.io.Reader#read()} of all methods in
   * {@link xtc.parser.ParserBase#character(int)}
   */
  class LexerInputReader(input:LexerInput) extends Reader {
    override def read : Int = input.read match {
      case LexerInput.EOF => -1
      case c => c
    }

    override def read(cbuf:Array[Char], off:Int, len:Int) : Int = {
      throw new UnsupportedOperationException("Not supported yet.")
      -1
    }

    override def close = {}
  }

  class TokenInfo(val length:Int, val id:ErlangTokenId) {
    override def toString = "(id=" + id + ", length=" + length + ")"
  }
}
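The tree walk done by flattenToTokenStream can be tried outside NetBeans. The following is only a sketch under stated assumptions: Node and Text are hypothetical stand-ins for xtc.tree.GNode and its String children, and TokenInfo carries the rule name instead of an ErlangTokenId. As in the lexer above, a node without children is taken to be the EOF marker and yields a zero-length token.

```scala
object FlattenSketch {
  // Hypothetical stand-ins for the generic nodes produced by a Rats! lexer.
  sealed trait Child
  final case class Text(s: String) extends Child
  final case class Node(name: String, children: List[Child]) extends Child

  final case class TokenInfo(name: String, length: Int)

  // Depth-first walk: every piece of matched text becomes one token named
  // after the node (production) that directly contains it.
  def flatten(node: Node): List[TokenInfo] =
    if (node.children.isEmpty) List(TokenInfo("EOF", 0)) // void node, assumed to be EOF
    else node.children.flatMap {
      case Text(s) => List(TokenInfo(node.name, s.length))
      case n: Node => flatten(n)
    }
}
```

For example, a tree for the input foo(). flattens to four tokens whose lengths add up to the six chars that were read, which is what lets nextToken hand them out one by one afterwards.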
ErlangTokenId.scala
package org.netbeans.modules.erlang.editor.lexer

import _root_.java.util.Collection
import _root_.java.util.Collections
import _root_.java.util.HashMap
import _root_.java.util.HashSet
import _root_.java.util.Map
import _root_.java.util.Arrays

import org.netbeans.api.lexer.InputAttributes
import org.netbeans.api.lexer.Language
import org.netbeans.api.lexer.LanguagePath
import org.netbeans.api.lexer.Token
import org.netbeans.api.lexer.TokenId
import org.netbeans.spi.lexer.LanguageEmbedding
import org.netbeans.spi.lexer.LanguageHierarchy
import org.netbeans.spi.lexer.Lexer
import org.netbeans.spi.lexer.LexerRestartInfo

/**
 *
 * @author Caoyuan Deng
 */
object ErlangTokenId extends Enumeration {
  // Let type of enum's value the same as enum itself
  type ErlangTokenId = V

  // Extends Enumeration.Val to get custom enumeration value
  class V(val name:String, val fixedText:String, val primaryCategory:String) extends Val(name) with TokenId {
    override def ordinal = id
  }
  object V {
    def apply(name:String, fixedText:String, primaryCategory:String) = new V(name, fixedText, primaryCategory)
  }

  val IGNORED = V("IGNORED", null, "ingore")
  val Error = V("Error", null, "error")

  // --- Spaces and comments
  val Ws = V("Ws", null, "whitespace")
  val Nl = V("Nl", null, "whitespace")
  val LineComment = V("LineComment", null, "comment")
  val CommentTag = V("CommentTag", null, "comment")
  val CommentData = V("CommentData", null, "comment")

  // --- Literals
  val IntegerLiteral = V("IntegerLiteral", null, "number")
  val FloatingPointLiteral = V("FloatingPointLiteral", null, "number")
  val CharacterLiteral = V("CharacterLiteral", null, "char")
  val StringLiteral = V("StringLiteral", null, "string")

  // --- Keywords
  val Andalso = V("Andalso", "andalso", "keyword")
  val After = V("After", "after", "keyword")
  val And = V("And", "and", "keyword")
  val Band = V("Band", "band", "keyword")
  val Begin = V("Begin", "begin", "keyword")
  val Bnot = V("Bnot", "bnot", "keyword")
  val Bor = V("Bor", "bor", "keyword")
  val Bsr = V("Bsr", "bsr", "keyword")
  val Bxor = V("Bxor", "bxor", "keyword")
  val Case = V("Case", "case", "keyword")
  val Catch = V("Catch", "catch", "keyword")
  val Cond = V("Cond", "cond", "keyword")
  val Div = V("Div", "div", "keyword")
  val End = V("End", "end", "keyword")
  val Fun = V("Fun", "fun", "keyword")
  val If = V("If", "if", "keyword")
  val Not = V("Not", "not", "keyword")
  val Of = V("Of", "of", "keyword")
  val Orelse = V("Orelse", "orelse", "keyword")
  val Or = V("Or", "or", "keyword")
  val Query = V("Query", "query", "keyword")
  val Receive = V("Receive", "receive", "keyword")
  val Rem = V("Rem", "rem", "keyword")
  val Try = V("Try", "try", "keyword")
  val When = V("When", "when", "keyword")
  val Xor = V("Xor", "xor", "keyword")

  // --- Identifiers
  val Atom = V("Atom", null, "identifier")
  val Var = V("Var", null, "identifier")

  // --- Symbols
  val LParen = V("LParen", "(", "separator")
  val RParen = V("RParan", ")", "separator")
  val LBrace = V("LBrace", "{", "separator")
  val RBrace = V("RBrace", "}", "separator")
  val LBracket = V("LBracket", "[", "separator")
  val RBracket = V("RBracket", "]", "separator")
  val Comma = V("Comma", ",", "separator")
  val Dot = V("Dot", ".", "separator")
  val Semicolon = V("Semicolon", ";", "separator")
  val DBar = V("DBar", "||", "separator")
  val Bar = V("Bar", "|", "separator")
  val Question = V("Question", "?","separator")
  val DLt = V("DLt", "<<", "separator")
  val LArrow = V("LArrow", "<-", "separator")
  val Lt = V("Lt", "<", "separator")
  val DGt = V("DGt", ">>", "separator")
  val Ge = V("Ge", ">=", "separator")
  val Gt = V("Gt", ">", "separator")
  val ColonMinus = V("ColonMinus", ":-", "separator")
  val DColon = V("DColon", "::", "separator")
  val Colon = V("Colon", ":", "separator")
  val Hash = V("Hash", "#", "separator")
  val DPlus = V("DPlus", "++", "separator")
  val Plus = V("Plus", "+", "separator")
  val DMinus = V("DMinus", "--", "separator")
  val RArrow = V("RArrow", "->", "separator")
  val Minus = V("Minus", "-", "separator")
  val Star = V("Star", "*", "separator")
  val Ne = V("Ne", "/=", "separator")
  val Slash = V("Slash", "/", "separator")
  val EEq = V("EEq", "=:=", "separator")
  val ENe = V("ENe", "=/=", "separator")
  val DEq = V("DEq", "==", "separator")
  val Le = V("le", "=<", "separator")
  val Eq = V("Eq", "=", "separator")
  val Exclamation = V("Exclamation", "!", "separator")

  /**
   * MIME type for Erlang. Don't change this without also consulting the various XML files
   * that cannot reference this value directly.
   */
  val ERLANG_MIME_TYPE = "text/x-erlang"; // NOI18N

  /** should use def instead of val here, which will be called from instanceCreate of NetBeans' system */
  def language = new LanguageHierarchy[ErlangTokenId] {
    protected def mimeType = ERLANG_MIME_TYPE

    protected def createTokenIds : Collection[ErlangTokenId] = {
      val ids = new HashSet[ErlangTokenId]
      elements.foreach{ids add _.asInstanceOf[ErlangTokenId]}
      ids
    }

    protected def createLexer(info:LexerRestartInfo[ErlangTokenId]) : Lexer[ErlangTokenId] = ErlangLexer.create(info) match {
      case None => null
      case Some(l) => l
    }

    override protected def createTokenCategories : Map[String, Collection[ErlangTokenId]] = {
      val cats = new HashMap[String, Collection[ErlangTokenId]]
      cats
    }

    override protected def embedding(token:Token[ErlangTokenId], languagePath:LanguagePath, inputAttributes:InputAttributes) = {
      null // No embedding
    }
  }.language
}
In ErlangTokenId, we implemented org.netbeans.api.lexer.TokenId and defined the ids of all Erlang tokens, each with its name, id, fixedText and primaryCategory.
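The trick of extending Enumeration.Val to attach extra attributes to each value can be seen in isolation in the sketch below. It is written against a recent Scala 2 standard library, where values takes the place of the elements member used in the listing; TokenIdSketch, lookup and byCategory are hypothetical names, Option replaces the null fixedText, and the NetBeans TokenId trait is left out so the example stands alone.

```scala
object TokenIdSketch extends Enumeration {
  // Custom enumeration value carrying two extra attributes per token id.
  class V(name: String, val fixedText: Option[String], val category: String) extends Val(name)

  val Ws   = new V("Ws", None, "whitespace")
  val Case = new V("Case", Some("case"), "keyword")
  val Atom = new V("Atom", None, "identifier")
  val Dot  = new V("Dot", Some("."), "separator")

  // Name lookup that yields None for unknown names instead of throwing,
  // similar in spirit to the valueOf call used by the lexer.
  def lookup(name: String): Option[V] =
    values.find(_.toString == name).map(_.asInstanceOf[V])

  // Group the ids by category, e.g. to fill a token-category map.
  def byCategory: Map[String, List[V]] =
    values.toList.map(_.asInstanceOf[V]).groupBy(_.category)
}
```

A lexer can then map a grammar rule name to an id with TokenIdSketch.lookup(ruleName) and fall back to a default id when the result is None.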
ErlangLexer.scala is the bridge between LexerErlang.java and NetBeans’ lexer engine. NetBeans’ lexer engine is an incremental engine: it automatically tracks the position where you insert or modify a char, then wraps the affected text in a LexerInput and passes it to the lexer. I have carefully designed ErlangLexer.scala to get these benefits; you can use this file for other languages too, once you understand how I write Rats! rules for tokens.
Now you should implement an org.netbeans.modules.csl.spi.DefaultLanguageConfig, which registers all the services for your language, such as CodeCompletionHandler, DeclarationFinder, Formatter, IndexSearcher, InstantRenamer, KeystrokeHandler, OccurrencesFinder, SemanticAnalyzer, StructureScanner, etc. For the first step, we implement only minimal support for Erlang, which is actually just a lexer that produces Erlang tokens and highlights them. So our implementation is fairly simple:
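The core of the bridge is the adapter that lets a Reader-based parser pull chars one at a time from the editor's input. Here is a minimal sketch of that adapter idea, runnable without NetBeans: CharSource and StringSource are hypothetical stand-ins for LexerInput, and unlike LexerInputReader in the listing, the bulk read is implemented by delegation rather than throwing.

```scala
import java.io.Reader

object ReaderAdapterSketch {
  // Hypothetical stand-in for the editor-side input: one char per call.
  trait CharSource {
    def read(): Int // next char, or CharSource.EOF at end of input
  }
  object CharSource { val EOF: Int = -1 }

  final class StringSource(text: String) extends CharSource {
    private var pos = 0
    def read(): Int =
      if (pos < text.length) { val c = text.charAt(pos); pos += 1; c.toInt }
      else CharSource.EOF
  }

  // Adapter: exposes a CharSource through the java.io.Reader interface.
  final class SourceReader(source: CharSource) extends Reader {
    override def read(): Int = source.read()

    override def read(cbuf: Array[Char], off: Int, len: Int): Int = {
      var n = 0
      var eof = false
      while (n < len && !eof) {
        val c = source.read()
        if (c == CharSource.EOF) eof = true
        else { cbuf(off + n) = c.toChar; n += 1 }
      }
      if (n == 0 && len > 0) -1 else n // Reader contract: -1 at end of stream
    }

    override def close(): Unit = ()
  }
}
```

Because every char goes through the single read() call, the side that owns the input can count what was consumed and back up over it, which is what the lexer relies on when it hands out batched tokens.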
ErlangLanguage.scala
package org.netbeans.modules.erlang.editor

import _root_.java.io.File
import _root_.java.util.Collection
import _root_.java.util.Collections
import _root_.java.util.HashMap
import _root_.java.util.Map
import _root_.java.util.Set
import org.netbeans.api.lexer.Language;
import org.netbeans.modules.csl.api.CodeCompletionHandler
import org.netbeans.modules.csl.api.DeclarationFinder
import org.netbeans.modules.csl.api.Formatter
import org.netbeans.modules.csl.api.IndexSearcher
import org.netbeans.modules.csl.api.InstantRenamer
import org.netbeans.modules.csl.api.KeystrokeHandler
import org.netbeans.modules.csl.api.OccurrencesFinder
import org.netbeans.modules.csl.api.SemanticAnalyzer
import org.netbeans.modules.csl.api.StructureScanner
import org.netbeans.modules.csl.spi.DefaultLanguageConfig
import org.netbeans.modules.parsing.spi.Parser
import org.netbeans.modules.parsing.spi.indexing.EmbeddingIndexerFactory
import org.openide.filesystems.FileObject
import org.openide.filesystems.FileUtil
import org.netbeans.modules.erlang.editor.lexer.ErlangTokenId

/*
 * Language/lexing configuration for Erlang
 *
 * @author Caoyuan Deng
 */
class ErlangLanguage extends DefaultLanguageConfig {

  override def getLexerLanguage = ErlangTokenId.language

  override def getDisplayName : String = "Erlang"

  override def getPreferredExtension : String = {
    "erl" // NOI18N
  }
}
Here, def getLexerLanguage = ErlangTokenId.language returns the language built from the LanguageHierarchy implementation for ErlangTokenId in ErlangTokenId.scala, which tells the framework about token ids, categories and embedding information.
The final step is to register ErlangLanguage, fontColor.xml, erlangResolver.xml, etc. in layer.xml for color highlighting, the MIME resolver and your language icon.
layer.xml
Now build your new module, launch it, and open a .erl file: you get an editor with Erlang syntax highlighting.
