Archive for April, 2009

RQuery: now with | and & operators 0

This weekend I wanted to tackle some of rquery’s shortcomings so I added the ability to |/& your operations and also the ability to use an object in the block instead of the symbol class for the column names. I agree with the assertion that it’s bad to pollute other classes, but I still think the symbol.declaration syntax is pretty clean and there’s isn’t much likely hood of those methods ever being implemented elsewhere ( says the guy who decided to implement them :P )

Examples

The key bit is the ability to add multiple operations to the same line connected with | or & operators. In this way you can form extremely complex queries, where as before you were limited to whatever you could get out a series of anded logic. The only stipulation is that you you have to wrap the operations with parenthesis if they are on the same line, or ruby will complain about a syntax error.

#All operations need to be on the same line and in parens
#either the | operator or the & operator can be used on a singel line
User.where do |user|
  (user.age.is > 20) | (user.age.in 16,18)
end

The above example should be easy to follow but there is one important gotcha with including &’s and |’s on the same line. Because the & operator has precedence in a string of operations if you want both | and & on the same line you need to force precedence for the | where it needs to be evaluated before the & in the resulting query.

#the & takes precedence here and will be grouped with the contains "Alice" which will be
#or'd with the contains "George"
#=> name contains "George" or (name contains "Alice and age from 20 to 30)
User.where do |user|
  (user.name.contains "George") | (user.name.contains "Alice") & (user.age.from 20..30)
end

In this case the query writer wanted to get the users who’s names contained either George or Alice and then filter with an age range. What it’ll actually do is find the users who’s name contains Alice and have an age between 20 and 30, and add anyone with a name like George. To fix this you can add more parens :(

#to correct the above to the more intuitive version add parens to force precedence of the
#contains operations
#=> (name contains "George" or name contains "Alice) and age from 20 to 30
User.where do |user|
 ((user.name.contains "George") | (user.name.contains "Alice")) & (user.age.from 20..30)
end

Or you can simply move the and’d operation to the next line as the operations are evaluated top down as the ruby block is executed and they are grouped using and.

#in this sutation it would be cleaner and easier to just move the and'd statement down
#a line as all seperate lines are and'd and lines have precedence from top to bottom
#additionaly operations on seperate lines don't need parens
User.where do |user|
 (user.name.contains "George") | (user.name.contains "Alice")
 user.age.from 20..30
end

The biggest change here is simply being able to use the | instead of the default for each line (and). And finally, if you checked out RQuery before, you’ll have noticed I’m passing in an object that represents the model being queried. The advantage is there’s no class polution, and the not so obvious advantage is that it will throw and nice verbose and descriptive exception if you try and use an attribute that doesn’t exist for a given model.


#should you attempt to use and attribute that doesn't exist for a given model
#rquery will tell you before it's sent to the db
User.where do |user|
  user.ssn.is == "123-45-6789"
end
# RQuery::AttributeNotFoundError: The field 'ssn' doesn't exist for this object
#       from /Users/johnbender/Projects/rquery/lib/rquery/attrib...
#       from (irb):24
#       from /Users/johnbender/Projects/rquery/lib/rquery/active...
#       from /Users/johnbender/Projects/rquery/lib/rquery/active...
#       from /Users/johnbender/Projects/rquery/lib/rquery/active...
#       from (irb):23


My specs cover most of the code and functionality for the activerecord extension (still the only piece I’ve worked on) and you can still use the symbol syntax if you want.

#environment config
RQuery.use_symbols

#example of using symbols, you can see more at the RQuery page on my site.
User.where do
 (:name.contains "George") | (:name.contains "Alice")
 :age.from 20..30
end

You can get the gem from github as johnbender-rquery, and I won’t be updating the rubyforge one unless github fails to build my gem again ( /cross_fingers ).

Searching the abstract form in erlang 6

[update: I've added the head matching version of the check funs as per the recommendations in the comments. Thanks Hynek!]

I imagine there’s a much better way to do this, but in the process of writing another parse_transform hack I wanted to search for specific things within the abstract form of a module which looks something like this:

[{attribute,1,file,{"./test2.erl",1}},
 {attribute,1,module,test2},
 {attribute,2,export,[{append,2},{prepend,2}]},
 {attribute,3,compile,[]},
 {attribute,6,after_exec,{[prepend,append],to_a_atom}},
 {function,8,before_prepend_to_a_atom,2,
     [{clause,8,
          [{var,8,'S1'},{var,8,'S2'}],
          [],
          [{op,9,'++',{var,9,'S1'},{var,9,'S2'}}]}]},
 {function,11,before_append_to_a_atom,2,
     [{clause,11,
          [{var,11,'S1'},{var,11,'S2'}],
          [],
          [{match,12,
               {var,12,'Fun'},
               {'fun',12,
                   {clauses,
                       [{clause,12,...

It's a bunch of recursive lists, tuples, atoms, etc. Which means you should be able to write a simple function to dive through it and find something like an atom or number, but what if you want to match a tuple where the third item is the value "Fred"?

Here's the hack I came up with this evening:

-module(util).

-export([find/2]). 

-define(call_check, fun({call,_,{atom,_,Name},_}) -> true;
                       (_) -> false
                    end).

-define(function_check, fun({_, _, Name, _, _}) -> true;
                           (_) -> false
                        end).
find(Form, CompareFun) ->
    find(Form, CompareFun, []). 

find(Form, CompareFun, Acc) when is_tuple(Form) ->
    case CompareFun(Form) of
        true -> Acc ++ [Form];
        _ -> find(tuple_to_list(Form), CompareFun, Acc)
    end;

find(Form, CompareFun, Acc) when is_list(Form) ->
    case CompareFun(Form) of
        true -> Acc ++ Form;
        _ -> Acc ++ [Y || X <- Form, Y <- find(X, CompareFun, Acc), Y =/= []]
    end;

find(Form, CompareFun, Acc) ->
    case CompareFun(Form) of
        true -> Acc ++ [Form];
        _ -> Acc
    end.

Its fairly easy to follow how the 3 clauses of find/3 work

  • First Clause: If Form is a tuple check it with the comparison function. If the comparison function returns true add it to the accumulator and return the accumulator. If not convert it to a list and call find.
  • Second Clause: If Form is a list check it with the comparison function. If the comparison function returns true add it to the accumulator and return the accumulator. If not call find on each of the elements of the list.
  • Third Clause: If Form is anything else it can’t contain something so we simply check it with the comparison function, and add it to the accumulator if it matches otherwise return the current value of the accumulator.

The funs defined at the top as macros are the matching functions. They have to be custom built for whatever you’re looking for, and the variables you include within them must be bound in the scope where you use them.

Aside from that, there are 3 things I think this highlights: anonymous functions are a gateway drug, you only have to deal with a small set of possible data structures in Erlang, and I would love the ability to check for equality with the same syntax used for assignment/matching:

{foo, _dontcare, _alsodontcare} == Bar

In this case I ended up using the assignment, catching the badmatch error, and returning false unless it worked. This is not ideal (the try should probably match against the badmatch error at the very least), but it does work, its short and easy to read and its useful for searching complex data structures. Using find would look something like:

%% finds anything in Form that resembles {call,_,{atom,_,Name},_}
foo(Form, Name) ->
    find(Form, ?call_check).

Name fills its roll in the macro and the fun is used to check all the elements in Form. What you get back is a list of the matches. Not too shabby for 30 something lines of code!

non-deterministic description 2

File this under that “meta-tao of hacking” or WARNING: thar be rosy language and fluff hereabouts.

Walking my dog down the street at dusk today the air had a nice warmth to it and I enjoyed the pink and gold hues of the sunset (please stick with this post it does go somewhere). It’s warming up here, and instead of racing home with my head down I was free to enjoy the smell of the air and the sites around me. So, I was letting this scene (dusk, pink, leaves, trees) roll around in my head trying to think how I would describe it accurately to my wife so she could somehow feel and see the light as it passed through the new leaves and buds of flowers, the warmth hitting my face, etc, etc.

Mooshy

Its very hard, if not impossible to achieve that type of expressiveness. Thats what makes great writers. Transferring the totally ephemeral/sensory/emotional/mooshy stuff to these semi-deterministic buckets of words and phrases that comprise our language is a real and difficult challenge that people seek to tackle every day. Even as I’m writing this I’m struggling to lay my ideas out in a way that can be understood by someone else reading them. Without even considering the gulf of perception and personal experience that separates any two people this task is almost insurmountable.

In a lot of ways language designers are trying to do the same thing in reverse. They’re trying to give us the tools to describe a very deterministic world using words and phrases with more meaning than their underlying implementation. We’ve continually abstracted farther and farther away from the on die voltage so that a more detailed and expressive version of computational reality can be recorded.

As a result we get a meeting somewhere in the middle:

God/Universe => Words/Phrases => ---?--- < = Haskell <= Assembly <= 0011101001000100

As a programmer your brain bridges the gap there, further distilling your understanding of language and how to represent the world in written form into code so we can turn that last bit around.

God/Universe => Words/Phrases => :D => Haskell => Assembly => 0011101001000100

Sweet diagram bro

That gap from Words/Phrases to your language of choice is 6 dashes and a question mark wide. Plenty wide enough for anyone who hacks, or takes an interest therein, to feel proud. Thats the world we live in, striving to make our lives easier and our code more expressive.

As time marches on it seems like we've worked our way towards that reality of using languages to tell the computer exactly what to do. Slowly but surely working up from hardware, to assembler, to cobol, C, Java, Ruby/Haskell/Scala/. It's gotten to the point now where its even efficient when it gets boiled all the way down! So what are some of the spoken language "idioms" we use every day when programming, and what are some other things that have yet to make their way into our programming?

merge! valid? Damn you GHCI compile my code!

I personally really like the Ruby way of using the language equivalent of tone to convey meaning in code. When you ask a question of an object and expect a boolean you can use a question mark (how novel!) and when I use the bang operator it reminds me of getting frustrated with clamshell packaging and just smashing my way into it with a screwdriver which is most definitely a destructive update.

Even better is Haskell's type inference. Admittedly my knowledge of Haskell is limited to a small set of hacks that I've been toying with, and in almost every case I define a type signature for my functions, but inference is something we do every time we speak or write. My use of "their" or "there" in spoken form is immediately apparent because of the context its been wrapped in. Haskell's compiler(s) is(are) smart enough to force a type based a what can be gathered about the operations over that type within a function. Thats pretty awesome.

Stuff you should ignore

There are other places that inference could be used. Namespaces/modules for instance. For better or worse, allowing the compiler/vm to handle namespace collisions could be achieved my the inferred functionality and object contained within. This of course gives people the free reign to name their modules whatever they like, thereby confusing the poor end user who needs to use two modules with completely different functionality and the same name (I think there's already another project named RQuery).

Haskell learnings 0

This week I caved and decided to learn haskell. I’m about 5 chapters into RWH and its great except for a few topics they gloss over early on (Type classes, $). Luckily my attention was directed here, and I was quickly able to fill in some gaps. Big thanks to f4hy from proggit for the link.

Aside from the link there are a couple things that have gotten me really excited. One of which is partial function application.

Partial Function Application

-- file: ch04/Fold.hs
myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl f z xs = foldr step id xs z
    where step x g a = g (f a x)

This is a problem from chapter four of RWH with a lot going on but I’m going to focus specifically on the function application for foldr. If you’re new to haskell it looks pretty ambiguous but function application associates to the left and is equivelant to:

(foldr step id xs) z

That’s still not much help because a common call to foldr has a value of some sort where the function id is in this case (note: id simply returns whatever argument it is passed). For example.

--zero value of 1
foldr (+) 1 [1,2,3]
--7

So to figure out how this works we can start with the trace of how foldr builds it’s thunks up to establish basic understanding:

--included for reference
foldr step zero (x:xs) = step x (foldr step zero xs)
foldr _    zero []     = zero
 
--foldr (+) 1 [1,2,3] 
--step = (+)
--zero = 1
+ 1 (+ 2 (+ 3 (1)))

That’s simple enough, now lets apply this to the chapter 4 problem and see where it gets us.

--a custom foldl using foldr
myFoldl f z xs = foldr step id xs z
    where step x g a = g (f a x)
 
--foldr step id [1,2,3] 
--step = step
--zero = id
step 1 (step 2 (step 3 (id)))

But how does that damn z fit in here? If evaluating foldr with those three arguments gets me that expression how does that last argument fit in? This is where I got stopped and figured that either this assumption was incorrect or I was missing some fundamental piece of the problem. As it turns out, this is correct but the way in which this statement is evaluated to come up with a result is not obvious, and really amazing.

The explanation I got from #haskell (Gracenotes/Eridius) was simple and important for working with haskell in general. All functions with more than a single argument can be decomposed one argument at a time into anonymous functions with the remaining arguments.

--given the following definition of step
step x g a = g (f a x)
 
--we can count on the following assertions
step = \x g a -> g (f a x)
 
step 1 = \g a -> g(f a 1)
 
step 1 id = \a -> id (f a 1)

In the case of our step function and the resulting expression of our foldr, you can see we only give it 2 arguments but step calls for 3. That leaves us with the last argument unassigned, and evaluating that resulting expression looks something like:

step 1 (step 2 (step 3 (id)))
 
\a3 -> (\a2 -> (\a -> id (f a 3)) (f a2 2)) (f a3 1)

Confusing? Yes its hard to read so walking though it step by step might be useful.

--right most fold operation 
step 3 (id)
 
--only two arguments to the step function
--step x g a = g (f a x) so...
\x g a = g (f a x)
 
--we know x and g (3 and id respectively)
\a -> id (f a 3)
 
--having evaluated the rightmost fold we have
step 1 (step 2 (\a -> id (f a 3)))
 
--having evaluated the 2 rightmost folds
step 1 (\a2 -> (\a -> id (f a 3)) (f a2 x))
 
--and finally evaluating the last fold we get 
\a3 -> (\a2 -> (\a -> id (f a 3)) (f a2 2)) (f a3 1)

At this point we’ve at least figured out how the function is able to take only two arguments (partial application!) and we can actually plug that back in to our original myFoldl (adding in the example argument values (+), 1, and [1,2,3]) to see what it will look like

--for myFoldl (+) 1 [1,2,3]
myFoldl (+) 1 [1,2,3] = (\a3 -> (\a2 -> (\a -> id (+ a 3)) (+ a2 2)) (+ a3 1))  1  
    where step x g a = g (f a x)

And if you look waaay over there on the right we have our third argument! So you can follow the course of the evaluation down from there replacing a3 with 1 and then evaluating (+ a3 1) then passing the result to a2 and (+ a2 2) to a.

The laziness of the language allows you to deal with completing your function call at a later date through partial application. This also gives some insight into the type signature syntax of functions

myFunc :: Int -> Int -> Bool
myFunc x y = x == y

What its really saying is that after evaluation of myFunc with its first argument of type Int another function accepting an argument of type Int will be returned. Should you supply another argument to the resulting function a final result of Bool will be returned. I’m not speaking specifically about the execution of the code but rather the nature of the application of functions to the arguments passed.

--different ways to write myFunc
\x -> (\y -> x == y)
\x y -> x == y
myFunc 1 = \y -> 1 == y 
myFunc 1 2 = False

If you look at that first one there it looks an awful lot like the type signature. This kind of symmetry to the code and how it works is deeply satisfying to me. I am extremely excited to get my mind wrapped around how to use things like monoids to make your code more stable.

Hopefully I’ll be back with more as haskell continues to flip my lid.