algorithm - What's the optimal way to merge k lists? -
suppose you're given merge function merge (find union of) 2 lists l1 , l2 of size s1 , s2 in o(s1+s2) time. optimal way merge k lists of size s1, s2, ..., sk?
i'm thinking should sort s1, ..., sk , sort first 2 lists correspond lowest 2 sizes. when these merged, find position of size in sorted size list , go on process until end 1 list.
i'm having problem 2 things: 1. whether indeed optimal (is there method homecoming in faster time)? 2. how analyze running time when list size changes when merge?
this precisely same problem finding optimal variable-length bit-encoding string formed alphabet of k
symbols known frequencies s1, s2, … sk
. algorithm exactly huffman algorithm, , you'll find proof of optimality in textbook on algorithms (and many online resources) because classic case of greedy algorithm simple correctness proof.
the repeated application of two-way merge induces binary tree in each node merge. given tree, contribution of leaf total cost of overall merge weight of leaf multiplied depth in tree. (each node merge, , values in leaf participate in merges in path leaf root; number of such merges depth of leaf in tree.) -- or identically --, total length of huffman-encoded bitstring sum of products of weight (frequency) of symbol depth of leaf corresponding symbol in construction tree.
one little improvement algorithm (which missed people writing huffman tree builders): necessary sort weights s1, s2, … sk
, that's sort needed. there, algorithm selects 2 lowest nodes , adds them. resulting sums must monotonically non-decreasing in size (if sum smaller previous sum, previous sum not have been sum of 2 smallest elements). can set sums in queue; @ each step take 2 smallest elements either sorted array of leaves or (implicitly) sorted queue of nodes.
this can optimized farther overwriting array of leaves queue of nodes. (the queue grows bottom of array towards top; it's simple prove top of queue never overtake bottom of array.)
algorithm sorting merge
No comments:
Post a Comment