CS 252 Project: Parallel Parsing

1 CS 252 Project: Parallel Parsing Seth Fowler, Joshua Paul University of California, Berkeley May 14, 2000

2 Outline 1 Introduction: Parsing and Grammars Motivation Parsing & Grammars 2 Earley Algorithm & Parallelization 3 Packrat Algorithm & Parallelization 4 Goals

4 Motivation Parsing is Ubiquitous Much of our modern-day computing infrastructure is built on computer languages that are interpreted or compiled in real time. Examples include: Scripting: JavaScript, Python Layout: HTML, CSS, PostScript/PDF Data: XML, JSON Virtual machines: Java, C# The use of any such language relies centrally on parsing, the process of turning a one-dimensional stream of characters into a more appropriate internal representation (often a parse tree).

5 Motivation High-performance parsing Parallel Parsing Inspired by the Parallel Web Browser Project (lmeyerov/projects/pbrowser/) at Berkeley, and motivated by the current and future availability of medium- and large-scale chip multiprocessors (CMPs), we propose to parallelize the parsing process. Hardware Modifications In the process of attempting to parallelize parsing, we additionally seek incremental hardware changes that may provide substantial speedup for parsing and other similar tasks.

6 Parsing & Grammars Grammars Formal grammar A formal grammar is a set of rules that implicitly describe syntactically valid strings. Context-free grammar We are primarily interested in context-free grammars (and the closely related parsing expression grammars), which may be described by: V, the set of non-terminal (internal) characters. Σ, the set of terminal (output) characters. R, the set of production rules, each taking a single non-terminal to a sequence of terminal and non-terminal characters. S, the starting non-terminal.

8 Parsing & Grammars Example An example grammar V = {S, E}, Σ = {n, +, *, (, )} R = {S → E, E → n | E + E | E * E | (E)} We may generate syntactically valid strings by starting with S and using the production rules: S ⇒ E ⇒ E * E ⇒ (E) * E ⇒ (E + E) * E ⇒ (3 + 4) * 5 The Parsing Problem Given an output string, determine whether it can be generated by the grammar and, if so, produce one or more associated derivations.
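The derivation above can be replayed mechanically. The following Python sketch (ours, not part of the slides) encodes the example grammar as plain data, with n standing in for any number as in the slides; the names grammar and expand are illustrative assumptions.

```python
# Encode the example CFG: each nonterminal maps to its list of alternatives.
grammar = {
    "S": [["E"]],
    "E": [["n"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}

def expand(sentential, rule_choices):
    """Apply leftmost derivation steps, each given as (nonterminal, alternative index)."""
    form = list(sentential)
    for nt, alt in rule_choices:
        i = form.index(nt)                # leftmost occurrence of the nonterminal
        form[i:i + 1] = grammar[nt][alt]  # replace it by the chosen alternative
    return form

# S => E => E * E => (E) * E => (E + E) * E, as on the slide.
steps = [("S", 0), ("E", 2), ("E", 3), ("E", 1)]
print(expand(["S"], steps))  # ['(', 'E', '+', 'E', ')', '*', 'E']
```

Substituting numbers for the remaining occurrences of E (via E → n) yields strings such as (3 + 4) * 5.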

9 Outline 1 Introduction: Parsing and Grammars 2 Earley Algorithm & Parallelization Earley Algorithm Parallel Earley algorithm 3 Packrat Algorithm & Parallelization 4 Goals

10 Earley Algorithm Earley Item Central to the Earley algorithm are Earley items, which describe the complete state of an in-progress grammar production: the grammar production of interest, the terminal position where the production began, and the present status of the production (marked by a dot •). Example [E → E + • E, 1] indicates that production E → E + E started at position 1, and that an E and a + have already been parsed.

11 Earley Algorithm Earley Sets One or more Earley items are associated with each position in the list of terminals we are parsing. The set of Earley items associated with a position is called an Earley set. Example (figure): the input ( 3 + 4 ) * 5, with Earley items such as [E → E + • E, 1], [E → E + E •, 1], [E → ( E • ), 0], and [S → E •, 0] attached to its positions. Objective: An algorithm that associates with each position an Earley set containing every valid Earley item for that position.

12 Earley Algorithm Earley algorithm This objective can be attained using a dynamic programming algorithm. Suppose that Earley sets have been computed for the first n − 1 positions; the Earley set for the n-th position may then be computed using three rules: 1 Scan: Suppose that the n-th terminal is y, and that there is an Earley item [X → α • y β, j] at position n − 1. Then add Earley item [X → α y • β, j] to position n. 2 Predict: Suppose there is an Earley item [X → α • Y β, j] at position n. Then, for each production Y → γ, add Earley item [Y → • γ, n] to position n. 3 Complete: Suppose there is an Earley item [Y → α •, j] at position n, and an Earley item [X → β • Y γ, k] at position j. Then add Earley item [X → β Y • γ, k] to position n.
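To make the three rules concrete, here is a compact recognizer over the slides' expression grammar. This is our own sketch, not the project's implementation: items are (head, body, dot, origin) tuples, and parse-tree construction and the parallel machinery are omitted.

```python
# A minimal Earley recognizer implementing Scan, Predict, and Complete.
GRAMMAR = {
    "S": [("E",)],
    "E": [("n",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
}

def earley_recognize(tokens, start="S"):
    sets = [set() for _ in range(len(tokens) + 1)]
    for body in GRAMMAR[start]:
        sets[0].add((start, body, 0, 0))
    for n in range(len(tokens) + 1):
        worklist = list(sets[n])
        while worklist:
            head, body, dot, origin = worklist.pop()
            if dot < len(body) and body[dot] in GRAMMAR:        # Predict
                for alt in GRAMMAR[body[dot]]:
                    new = (body[dot], alt, 0, n)
                    if new not in sets[n]:
                        sets[n].add(new); worklist.append(new)
            elif dot < len(body) and n < len(tokens) and body[dot] == tokens[n]:
                sets[n + 1].add((head, body, dot + 1, origin))  # Scan
            elif dot == len(body):                              # Complete
                for h2, b2, d2, o2 in list(sets[origin]):
                    if d2 < len(b2) and b2[d2] == head:
                        new = (h2, b2, d2 + 1, o2)
                        if new not in sets[n]:
                            sets[n].add(new); worklist.append(new)
    return any(h == start and d == len(b) and o == 0
               for h, b, d, o in sets[len(tokens)])

print(earley_recognize(["(", "n", "+", "n", ")", "*", "n"]))  # True
```

The final Earley set accepts when it contains a completed item for the start symbol with origin 0, mirroring the [S → E •, 0] item in the example slides.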

13 Earley Algorithm Earley algorithm example Input ( 3 + 4 ) * 5; at position 3: [E → ( • E ), 0], [E → E + • E, 1]

14 Earley Algorithm Earley algorithm example Input ( 3 + 4 ) * 5; Predict: [E → E + • E, 1] at position 3 yields [E → • n, 3] at position 3

15 Earley Algorithm Earley algorithm example Input ( 3 + 4 ) * 5; Scan: the terminal 4 turns [E → • n, 3] at position 3 into [E → n •, 3] at position 4

16 Earley Algorithm Earley algorithm example Input ( 3 + 4 ) * 5; Complete: [E → n •, 3] at position 4 and [E → E + • E, 1] at position 3 yield [E → E + E •, 1] at position 4

17 Earley Algorithm Earley algorithm example Input ( 3 + 4 ) * 5; Complete: [E → E + E •, 1] at position 4 and [E → ( • E ), 0] at position 1 yield [E → ( E • ), 0] at position 4

20 Parallel Earley algorithm Towards a parallel algorithm (I) We proceed towards a parallel algorithm in two steps: 1 Divide the list of terminals into contiguous blocks. 2 Divide the Earley sets associated with each block into disjoint Earley subsets, each corresponding to the block-origin of the Earley items within.

22 Parallel Earley algorithm Towards a parallel algorithm (II) Key Observation Computing an Earley subset in a block requires looking at only a single Earley subset in every other block! Example Consider computing an Earley subset in block 3. Valid Earley items consist only of completions from specific pairs of other Earley subsets.

29 Parallel Earley algorithm Parallel Earley Dependencies We obtain the following dependency graph amongst the Earley subsets. The dependencies prohibiting parallelization are precisely those between the top-level Earley subsets. Removing the critical dependencies When computing the top-level Earley subsets, assume that every possible Earley item originating in a previous block exists. (Figure: Thread 1, Thread 2, and Thread 3 each working on one block.)

30 Outline 1 Introduction: Parsing and Grammars 2 Earley Algorithm & Parallelization 3 Packrat Algorithm & Parallelization Background Packrat Parsing Parallelizing Packrat Parsing 4 Goals

31 Background Parsing Expression Grammars Packrat parsers use parsing expression grammars (PEGs). Compared to Earley's CFGs, PEGs: Directly describe how to read a language instead of how to write it. Can easily handle tokenization and syntax in the same grammar. Can always be used to parse strings in linear time. Are not as expressive: ambiguity and left recursion are impossible! Example of a PEG Rule [T ← F + T / F]

32 Packrat Parsing The Packrat Parsing Algorithm Conceptually, packrat parsers perform three steps: 1 Construct a memoization table. Rows correspond to nonterminals, columns correspond to character positions in the input string. 2 Proceeding from right to left, use the rule for each nonterminal to attempt a match at each character position, recording the result in the table. Recursion is used to resolve other nonterminals referenced in the rule. 3 Read the value in the cell corresponding to the first character position and the start symbol of the grammar; if it matched, the parse was successful! Real packrat parsers avoid unnecessary computation by evaluating the cells in the table lazily, starting at the start symbol.
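As an illustration of the lazy variant, memoizing a recursive-descent parse of the deck's toy grammar reproduces exactly the (rule, position) table described above. The helper make_parser and its convention (returning the end position of a match, or None on failure) are our assumptions, not the authors' code.

```python
import functools

# A minimal packrat parser for the toy PEG:
#   Term   <- Factor "+" Term / Factor
#   Factor <- Digit "*" Factor / Digit
#   Digit  <- "0" / ... / "9"
def make_parser(text):
    @functools.lru_cache(maxsize=None)  # the memo table: (rule, pos) -> end pos or None
    def parse(rule, pos):
        if rule == "Digit":
            if pos < len(text) and text[pos].isdigit():
                return pos + 1
            return None
        first, op, second = ("Factor", "+", "Term") if rule == "Term" else ("Digit", "*", "Factor")
        mid = parse(first, pos)
        if mid is not None and mid < len(text) and text[mid] == op:
            end = parse(second, mid + 1)
            if end is not None:
                return end        # first alternative matched
        return parse(first, pos)  # ordered choice: fall back to the second alternative
    return parse

parse = make_parser("2*3+4")
print(parse("Term", 0))    # 5: the whole string matched
print(parse("Factor", 0))  # 3: "2*3" matched
```

With 1-indexed columns, these 0-indexed end positions correspond to the cell values in the sample table (end position 3 is cell C4).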

33 Packrat Parsing A Sample Packrat Table

           C1 (2)  C2 (*)  C3 (3)  C4 (+)  C5 (4)
  Term       C6              C6              C6
  Factor     C4              C4              C6
  Digit      C2              C4              C6

The table above is completely filled in for the input 2 * 3 + 4 using the simple grammar below; each cell records the position just past the text matched by that nonterminal starting at that column (empty cells are failed matches). [Term ← Factor + Term / Factor] [Factor ← Digit * Factor / Digit] [Digit ← 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9]

34 Parallelizing Packrat Parsing Constraints on Parallelism Each cell in the packrat table may depend upon the value of any cell in the same column or in a column to its right. How can we achieve parallelism within these limitations?

35 Parallelizing Packrat Parsing Extra Threads for Speculative Computation One approach: A main thread (red) works from left to right using the standard lazy packrat algorithm. Speculative threads (assorted colors) evaluate cells ahead of the main thread. The cells are chosen by a suitable heuristic. With a good heuristic, many or most of the values the main thread needs will have been computed ahead of time. Main and Speculative Threads

36 Parallelizing Packrat Parsing Extra Threads for Speculative Computation (continued) This approach already shows some speedup. However, it isn't ideal: High synchronization overhead. Not cache-friendly or NUMA-friendly. As the input string grows longer, work inefficiency grows. Can we do better?

37 Parallelizing Packrat Parsing Synthesized Start Symbols Synchronization costs and inter-processor cache contention can be reduced using synthesized start symbols, which play a similar role to Earley items. The input string is divided into blocks, and a different thread is assigned to each block. Each thread speculates within its block, as before, until all of the blocks to its left are complete.

38 Parallelizing Packrat Parsing Synthesized Start Symbols (continued) As a block completes, it creates a new start symbol for the next block which contains the remaining portion of all of its in-progress matches. The next thread stops speculating and begins working sequentially from the beginning of its block, attempting to match the new start symbol. Each new start symbol is defined to match whenever the start symbol of the previous block would have matched if its evaluation had continued. If the last block is able to match the synthesized start symbol it has received, the parse was successful.

39 Parallelizing Packrat Parsing The Processing Window Large input strings can make poor use of the cache if we simply divide the input string up evenly between hardware threads, and the amount of potentially wasteful speculative work will be uneven. An alternative is to use a processing window. 1 A block size is chosen to suit the hardware in use. 2 The first n blocks are assigned to the n hardware threads. 3 As each block completes, its thread is reassigned to the next block that is not being worked on. (Figure: View over Time, 2 Threads, 3 Blocks.)
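Steps 1–3 above amount to a shared work queue: each of n threads repeatedly claims the next unprocessed block. A minimal Python sketch (run_window and parse_block are illustrative stand-ins; the real algorithm would also pass synthesized start symbols between neighboring blocks):

```python
import threading, queue

def run_window(num_blocks, num_threads, parse_block):
    todo = queue.Queue()
    for b in range(num_blocks):
        todo.put(b)                  # blocks waiting to be claimed
    done, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                b = todo.get_nowait()  # claim the next unclaimed block
            except queue.Empty:
                return                 # no blocks left: thread exits
            parse_block(b)             # stand-in for per-block packrat work
            with lock:
                done.append(b)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sorted(done)

# 3 blocks, 2 threads, as in the slide's figure.
print(run_window(3, 2, lambda b: None))  # [0, 1, 2]
```

A thread that finishes early immediately pulls the next block, so the window slides rightward without any central scheduler.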

40 Outline 1 Introduction: Parsing and Grammars 2 Earley Algorithm & Parallelization 3 Packrat Algorithm & Parallelization 4 Goals Future Work

41 Future Work Future Work Short-term Benchmark and provide performance analysis for these algorithms. Long-term Provide theoretical bounds for the speedup achievable in parallel parsing algorithms and use these bounds to evaluate our work. Exploit features of existing hardware, such as vector instructions and SIMD, to accelerate these algorithms. Investigate new vector instructions that may prove beneficial for problems of this type.