coder . cl » algorithms

practice makes perfect

Daniel Molina Wegener — Sun, 28 Aug 2011 13:37:02 +0000

A developer who is passionate about the work tends to learn new things without being told to. A passionate developer will probably also build software on their own initiative, publishing a Free/Open Source project or a proprietary product available for download from a personal site or a store such as Android Market. “Practice Makes Perfect”: it can take a developer from junior to senior in a short period. The more interest developers have, the more they are driven to build better code and better solutions, increasing their knowledge and leaving common programming mistakes behind. An interested developer usually visits programming forums and mailing lists, keeps a blog on programming topics (which suggests that they write and communicate better than the average developer) and shows more initiative, using creativity to solve problems.

Programming has no fixed patterns, so you cannot learn it from how-to guides alone. The highest level of abstraction in programming is the algorithm, which can take very different forms in real source code. You therefore cannot memorize algorithms as code in one particular language; you must understand the concepts instead of learning a language-specific approach. A polyglot programmer will usually try to implement the same algorithm in different or similar languages, to see how they differ and how the same problem can be handled with different approaches in the real world. A non-polyglot programmer will tend to use variations of an algorithm to reach the same solution from different perspectives.

Practice is the act of rehearsing a behaviour over and over, or engaging in an activity again and again, for the purpose of improving or mastering it, as in the phrase “practice makes perfect”. Sports teams practice to prepare for actual games. Playing a musical instrument well takes a lot of practice. It is a method of learning and of acquiring experience. [Practice (learning method), Wikipedia]

The most common way to practice is to learn a new language. As I wrote in past posts, you can learn a language from books, but you should distinguish three types: tutorials, user manuals and reference manuals. Start each new language with a tutorial, which guides you through its most basic features; then move on to the user manual; and finally use the reference manual to see which components its standard library provides. Tutorials that are full of exercises are especially useful, since they make you practise while you learn. You can also find programming problems that demand a higher level of thinking among those published for programming contests. Try to get some books on algorithms, learn from them, and then exercise your programming skills by solving contest problems.

I know that programmers who care about practising their skills are fewer than average developers, so you, as an employer, should take care of the programmers who really care about programming as a professional activity. You will not find them easily. These programmers, usually called passionate programmers, see programming as an interesting activity, unlike employees who only care about the position they occupy in the organization. Keep practising, and you will be a better programmer.

© Daniel Molina Wegener for coder . cl, 2011.

Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

understanding oauth

Daniel Molina Wegener — Thu, 21 Apr 2011 01:47:53 +0000

“An open protocol to allow secure API authorization in a simple and standard method from desktop and web applications”. The basis of the OAuth protocol is to let applications share their users' resources or data with third-party applications without sharing the user's credentials. This is done with authentication tokens, which allow an application to expose its API to desktop and web-based applications. The protocol was created to solve the problem of enabling delegated access to protected resources.

The protocol works through HTTPS requests. Initially the consumer (for example, gravatar.com accessing your facebook.com account data) requests a temporary token, called the request token. The request carries the oauth_callback parameter, which holds the callback URL that will be triggered once the user, or resource owner, authorizes the consumer to access their resources.

OAuth User

The Consumer never holds the real user credentials; it only holds temporary authentication tokens, which it uses to access the resources of the User or Resource Owner.

OAuth Consumer

A classic example is the set of third-party applications allowed on Twitter. To see which applications are accessing your Twitter account, go to the Settings section and click on the Connections panel. You will see applications such as Twitpic, Gravatar and Twitcam.com.

OAuth Behavior

The first step is to build the Request Token request and send it from the Consumer to the Service Provider. This is a simple HTTPS request with the OAuth authentication mechanism enabled in the application.

 POST /initiate HTTP/1.1
 Host: some.service.com
 Authorization: OAuth realm="SomeService",
    oauth_consumer_key="dpf43f3p2l4k3l03",
    oauth_signature_method="HMAC-SHA1",
    oauth_timestamp="137131200",
    oauth_nonce="wIjqoS",
    oauth_callback="http%3A%2F%2Fsome.service.com%2Fready",
    oauth_signature="74KNZJeDHnMBp0EMJ9ZHt%2FXKycU%3D"

The parameters of the Request Token request start with the consumer key, a unique identifier of the consumer that the Service Provider stores to identify its authorized consumers; for example, Twitpic.com has its own consumer key on Twitter. The signature method should be HMAC-SHA1 [ref. RFC2104] or RSA-SHA1 [ref. RFC3447, Sec 8.2]; the first one uses a keyed cryptographic hash with a shared secret, and the second one requires an RSA key pair. You can use PLAINTEXT, but it is not recommended and not commonly supported. The timestamp holds the time of the request (in those timestamp-friendly days when we were playing with timezones here in Chile, wrong timestamps caused problems in certain applications). The oauth_nonce is defined as follows:

A nonce is a random string, uniquely generated by the client to allow the server to verify that a request has never been made before and helps prevent replay attacks when requests are made over a non-secure channel.

The oauth_callback argument is the callback URL that will be triggered once the user authorizes the consumer to access their resources. The oauth_signature is the result of applying the algorithm named by the oauth_signature_method argument to the signature base string, which is built by concatenating the request method (the HTTP verb), the encoded URL and the encoded OAuth arguments, excluding the oauth_signature argument itself.
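The construction of the signature base string can be sketched in C as follows. This is a minimal sketch and not a complete OAuth implementation: the names pct_encode and oauth_base_string are hypothetical, the parameter string is assumed to be already sorted and joined as name=value pairs, and the HMAC-SHA1 step is left out because it needs a cryptographic library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode src into dst; RFC 3986 unreserved characters pass
   through unchanged. Output is silently truncated if dst is too small.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t
pct_encode(const char *src, char *dst, size_t dstlen)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	assert(dst != NULL && dstlen > 0);
	for (; *src != '\0'; src++) {
		unsigned char ch = (unsigned char)*src;
		int unreserved = (ch >= 'A' && ch <= 'Z')
		    || (ch >= 'a' && ch <= 'z')
		    || (ch >= '0' && ch <= '9')
		    || ch == '-' || ch == '.' || ch == '_' || ch == '~';

		if (unreserved) {
			if (n + 1 >= dstlen)
				break;
			dst[n++] = (char)ch;
		} else {
			if (n + 3 >= dstlen)
				break;
			dst[n++] = '%';
			dst[n++] = hex[ch >> 4];
			dst[n++] = hex[ch & 0x0F];
		}
	}
	dst[n] = '\0';
	return n;
}

/* base string = METHOD "&" encoded(url) "&" encoded(sorted parameters) */
int
oauth_base_string(const char *method, const char *url,
    const char *sorted_params, char *out, size_t outlen)
{
	char enc_url[512];
	char enc_params[1024];

	assert(method != NULL && strlen(method) > 0);
	pct_encode(url, enc_url, sizeof(enc_url));
	pct_encode(sorted_params, enc_params, sizeof(enc_params));
	return snprintf(out, outlen, "%s&%s&%s", method, enc_url, enc_params);
}
```

The resulting string is what the consumer would feed, together with its secrets, to the algorithm named in oauth_signature_method.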

Once the Request Token request is made, the server returns the oauth_token, oauth_token_secret and oauth_callback_confirmed parameters to the consumer as application/x-www-form-urlencoded content. The consumer must parse those arguments and build its local OAuth Request Token object. With that token, the consumer builds the Authorization URL on the Service Provider endpoint; the User opens it, from a standard browser or a desktop application, and then the User or Resource Owner authorizes the consumer application to use their resources.

Once the User or Resource Owner has authorized the Consumer, the Service Provider triggers the callback URL that the Consumer supplied in the oauth_callback argument, passing the oauth_token and the oauth_verifier. The Consumer exchanges them for an access token, which it uses to sign the subsequent Resource Requests.

This protocol is simple and useful, and it is being widely adopted by a lot of web sites. It is not an official standard yet, but it is becoming a new authentication standard for many web-based applications.

references

The OAuth 1.0 Protocol


using dependency injection on gui

Daniel Molina Wegener — Sun, 11 Apr 2010 17:04:32 +0000

Many people know the Model View Controller architectural pattern. Another interesting one is the Presentation Abstraction Control (PAC) architectural pattern, which we can implement using Dependency Injection (DI) or similar Inversion of Control (IoC) patterns. When modeling solutions, we often leave the control or business logic in the Controller, which leads to a tightly coupled platform. Remember that any well-designed architecture calls for cohesion rather than coupling among its components. In this article I will try to analyze the use of IoC in the PAC pattern, so that we can have more maintainable software components, mainly in n-tier architectures.

control classes and interfaces

We can hold our business logic in separate classes. As an example we will use a bank loan application, which is supposed to offer the proper loan to certain kinds of bank customers. So, we need to define an interface for the loan component.

interface Loan {
    void setUserData(User bankUser);
    void setCustomerData(Customer cus, Account acc);
    LoanResults getLoanResults();
}

This simple interface should be implemented by a variety of Loan types, from consumer loans to mortgages, since each type has different evaluation and estimation logic.

Now we need a way to instantiate the proper Loan implementation. Requesting it is the job of the control layer, since the Loan interface is the abstraction layer itself. Because we need the same methods for every Loan we request, we just use the Factory pattern, returning the Loan interface instead of the concrete implementation.

the abstraction layer

The abstraction layer consists of the Loan interface and the implementation returned by the factory. We need to implement the factory outside the package that holds that set of interfaces and implementations. The factory also holds the logic that decides which Loan implementation will be returned, based on the user data, customer data and account data. So we need a call similar to this one:

Loan loanRequest = LoanFactory.getInstance(data);
LoanResults requestResults = loanRequest.getLoanResults();

So, the factory and the Loan interface make up the abstraction layer, decoupling the business logic from its concrete implementation and producing more cohesion. What does the loan request form in the application know about those components? The answer is quite simple: nothing. We will use another pattern to hold the input data and pass it from the presentation layer to the abstraction layer. This is where we do dependency injection.

the presentation layer

The presentation layer, which in this case holds a simple LoanRequestForm, will never know which concrete class is used to deliver the LoanResults, so we have to introduce another pattern, an Adapter. The adapter should transform the input data from the LoanRequestForm into some kind of Data Transfer Object, to be passed to the factory as CustomerData.

class LoanRequestForm {
    protected Customer cus;
    protected User usr;
    protected Account acc;
    protected LoanRequestFormAdapter adapter = new LoanRequestFormAdapter();
    protected Loan loanRequest;

    LoanResults getLoanResults() {
        try {
            loanRequest = LoanFactory.getInstance(adapter.transform(this));
            LoanResults requestResults = loanRequest.getLoanResults();
            return requestResults;
        } catch (InvalidDataException ex) {
            // log is assumed to be a logger field declared elsewhere in the class
            log.error(ex);
            return null; // no results for invalid input
        }
    }
}

Here we are injecting a dependency into the LoanRequestForm and acquiring abstraction through the LoanFactory, using the LoanRequestFormAdapter to transform the form data into a DTO (Data Transfer Object). The business logic is completely separated from its User Interface and from the abstraction layer. So we are fitting the model to the PAC pattern and the IoC DI pattern at once.

The transformation between the form and the factory is a must. The factory cannot depend on the form, and the form cannot depend on the factory. The adapter plays an important role here, since it separates the layers and turns the application into an n-tier architecture, giving us more control over changes and possible enhancements to the model. It also makes automated processes possible, since the factory does not need the form to operate on the loan. You could build another application that just handles the user data and automatically generates loan bids for customers based on the same logic, and then we are properly reusing the code.

conclusion

We can glue the PAC pattern to the IoC DI pattern to get more reliable and flexible GUI implementations. If we follow well-designed and well-applied patterns, we can work on our platforms without worrying about how to produce reusable code. We can also use frameworks, but that depends on the size of the application and how we need to work on it. For an application with just two active forms, we do not need a large-scale architecture.


source code optimization in c

Daniel Molina Wegener — Thu, 31 Dec 2009 21:45:02 +0000

"…premature optimization is the root of all evil".
— Donald Knuth

I agree that source-code-level optimizations should be done once the construction stage is finished or almost complete. I was searching for articles and papers about optimizing C source code to apply to my programs and libraries, and I have collected some of those optimizations here. You must not confuse algorithm optimization, source code optimization and compiler optimization: the first refers to the design of the algorithm, the second only to its implementation, and the third to what the compiler does with that implementation. The first two share just a few common approaches to formal reductions.

Source code optimization usually just applies well-known formal reductions. We will not treat those formalities in this article. Instead we will look at some examples that I have collected, which allow some common reasoning about how to optimize source code. Most ideas for source code optimization come from η-conversions using λ equivalences, allowing, for example, reductions of the form $O(\log n) \leftarrow O(n)$. There are also optimizations without an apparent η-conversion, since most of the work is done at a low level: those that the compiler takes and translates into fewer assembler instructions, such as function inlining and parameter reduction. Most of these η-conversions imply η-reductions, and are called strength reductions. There is also the opposite side, where instead of doing reductions we populate our programs with more lines of code, including repeated code, as in the loop unrolling optimization.

inline code

C supports both macros and inline functions. Both features aim at the same effect: when the code is compiled, the inlined instructions are placed wherever we call the macro or the inline function. A typical example is a max(a, b) function or a MAX(a, b) macro.

/* the slowest expression, compiling and running */
int
max(int a, int b)
{
	if (a > b)
		return a;
	else
		return b;
}

/* normal expression with inlining */
static inline int
max(int a, int b)
{
	return ((a > b) ? a : b);
}

/* the faster expression using macros; every argument is
   parenthesized so that MAX(x & 1, y) expands as intended */
#define MAX(a, b)			(((a) > (b)) ? (a) : (b))

The inline and the macro definitions are similar. The preprocessor replaces every occurrence of MAX(var1, var2) with the expression ((var1 > var2) ? var1 : var2). The inline function does much the same: the compiler may replace every call to max(var1, var2) with the body of the function. So macros are oriented to replacing entire blocks of code, while inline functions are oriented to replacing calls, skipping the assembler call instruction and its derivatives. We can see examples of the call opcode in typical system calls on operating systems, such as FreeBSD system calls.
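One practical difference between the two forms is worth a small experiment. The sketch below, with hypothetical helper names (traced, max_int and the two count_ functions), counts how many times an argument expression is evaluated: a macro pastes the expression wherever the parameter appears, while a function evaluates each argument once.

```c
#include <assert.h>

#define MAX(a, b)	(((a) > (b)) ? (a) : (b))

static inline int
max_int(int a, int b)
{
	return (a > b) ? a : b;
}

static int evaluations = 0;

/* returns v unchanged, counting every evaluation */
static int
traced(int v)
{
	evaluations++;
	return v;
}

/* number of argument evaluations performed by the macro */
int
count_macro_evaluations(void)
{
	int m;

	evaluations = 0;
	m = MAX(traced(5), traced(3)); /* the winning argument is expanded twice */
	assert(m == 5);
	return evaluations;
}

/* number of argument evaluations performed by the inline function */
int
count_inline_evaluations(void)
{
	int m;

	evaluations = 0;
	m = max_int(traced(5), traced(3)); /* each argument is evaluated once */
	assert(m == 5);
	return evaluations;
}
```

For arguments with side effects, such as i++ or a function call, the inline function is therefore the safer choice.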

parameter reduction

Reducing parameters implies fewer assembler instructions. The C calling convention typically pushes parameters onto the stack, which implies one push instruction per parameter. You can, for example, group function arguments under a struct if they are used together by a group of functions. Think a little: each instruction of the push family takes from 2 to 18 clocks on the x86 architecture, depending on vendor and model. Eight arguments will take from 16 to 144 clocks, against 2 to 18 clocks for one argument. This argument reduction may be applied to functions which are not inlined.

/* slower version */
int
my_function(int arg1, int arg2, int arg3, int arg4)
{
	/* do something... */
}

/* faster version */
struct myargs
{
	int arg1; int arg2; int arg3; int arg4;
};

int
my_function(struct myargs *args)
{
	/* do something... */
}

practical reductions

We do a reduction when we remove unnecessary steps from our functions. Most reductions come just from thinking a little about the code, and there are also some well-known reductions which can be used as optimizations.

removing else

/* with else: smaller code, but slower */
inline int
test(int a)
{
	return a > 0 ? 1 : 0;
}

/* without else: larger code, but faster */
inline int
test(int a)
{
	if (a > 0)
		return 1;
	/* implied else */
	return 0;
}

With this optimization we remove instructions of the jmp and jxx families, most of which take about 7 clocks on the x86 architecture, along with the instructions required to set up the proper context for the next instruction in the program sequence. This is like the spartan programming style, where code is minimized through reductions and similar programming tasks.

bitwise operations

Bitwise operations are cheaper than other operations. Curiously, for example, shift and addition operations take fewer clocks than multiplication and division. The same happens with logical evaluations; in another post I have written about logical minimizations. As a complement to what I said there, the mul instruction takes from 10 to 40 clocks and the div instruction from 15 to 40 clocks on the x86 architecture, against 2 to 7 clocks for add, sub, shr, shl and similar instructions. Remember that the number of clocks used by an instruction is vendor and model dependent. For example, if we do:

/* we take a call to the math.h function pow() */
n = y * pow(2.0, x);

/* we can replace it by */
n = y << x;

In other words, the y << x operation is equivalent to $y \cdot 2^{x}$, as long as y is a non-negative integer and the result does not overflow. You can find other math equivalences too, and from this basic approach we can reach other kinds of optimizations through the proper maths. Take a look at the bit wizardry page by Jörg Arndt if you are looking for more bitwise operations and reductions.
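The shift equivalence can be wrapped in a small helper. This is a minimal sketch with a hypothetical function name (mul_pow2); it uses unsigned operands, because shifting a negative value to the left, or shifting by the full width of the type, is undefined in C.

```c
#include <assert.h>
#include <limits.h>

/* Multiply y by 2 raised to x using a left shift. */
unsigned int
mul_pow2(unsigned int y, unsigned int x)
{
	/* the shift count must be smaller than the width of the type */
	assert(x < sizeof(y) * CHAR_BIT);
	return y << x;
}
```

A call such as mul_pow2(3, 4) shifts 3 four positions to the left, which is the same as multiplying it by 16.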

reversing counter loops

Usually we write loops that step a counter up through a fixed range. The following example shows a reduction which can be used every time the loop can run in reverse.

/* for loop coded usually */
for (c = 0; c < MAX_C_VALUE; c++) {
	/* do something... */
}

/* for reversed loop */
for (c = MAX_C_VALUE; c--; /* nothing to do here */ ) {
	/* do something... */
}
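The usual argument for the reversed form is that the loop test becomes a comparison against zero, fused with the decrement, which some processors handle more cheaply; whether it helps depends on the compiler and the target. As a minimal sketch with hypothetical function names, both forms below visit the same elements, only in opposite order.

```c
#include <assert.h>
#include <stddef.h>

/* forward loop: counts up and compares against n on every step */
long
sum_forward(const int *v, size_t n)
{
	long total = 0;
	size_t i;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += v[i];
	return total;
}

/* reversed loop: the test and the decrement are one expression,
   and the loop ends when the counter reaches zero */
long
sum_reversed(const int *v, size_t n)
{
	long total = 0;
	size_t i;

	assert(v != NULL || n == 0);
	for (i = n; i--; )
		total += v[i];
	return total;
}
```

The reversed form is only valid when the order of the iterations does not matter, as with this sum.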

table lookups

A table lookup is the technique where we create an array, or a related kind of structure, to look data up: usually previously calculated data, or simply the concrete data that we want to use.

/* we can have a switch statement */
ourtype_t *varn = NULL;
switch (var1) {
case 0:
	varn = value1; break;
case 1:
	varn = value2; break;
case 2:
	varn = value3; break;
default:
	varn = value1;
}

/* or have an if/else statement */
if (var1 == 0)
	varn = value1;
else if (var1 == 1)
	varn = value2;
else if (var1 == 2)
	varn = value3;
else
	varn = value1;

/* so we can simply replace those values using arrays;
   the range check keeps the default case */
ourtype_t *mapsvalues[] = { value1, value2, value3 };
varn = (var1 >= 0 && var1 < 3) ? mapsvalues[var1] : value1;

We can also set up reductions by creating table lookups and state machines, building the proper map between certain variable data and certain functions. State machines are a powerful abstraction which allows us to encode different states inside data structures.

We can have predefined values in our lookup tables. This is precomputation: every time we request a constant value from the table instead of calculating it each time we work with it, we save the cost of the calculation.
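A table of previously calculated values can be sketched with a classic case, counting the bits set in a byte. The names nibble_bits and count_bits8 are hypothetical; the table stores the bit count of every 4-bit value, so a byte needs two lookups and one addition instead of a loop over its eight bits.

```c
#include <assert.h>

/* precalculated number of set bits for each value from 0 to 15 */
static const unsigned char nibble_bits[16] = {
	0, 1, 1, 2, 1, 2, 2, 3,
	1, 2, 2, 3, 2, 3, 3, 4
};

/* Count the bits set in a value between 0 and 255. */
unsigned int
count_bits8(unsigned int byte)
{
	assert(byte <= 0xFFu);
	return nibble_bits[byte & 0x0Fu] + nibble_bits[byte >> 4];
}
```

The trade-off is the usual one for lookup tables: a few bytes of constant data in exchange for fewer instructions per call.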

register keyword

/* the register keyword is a hint that may help the compiler create faster code */
register int counter;

reduce data access computation

If we have deeply nested data structures (struct in C), every time we access the deepest nodes we use pointer arithmetic, which implies basic math operations to reach certain parts of our structures. Here we can do two things: create and use aliases, and adjust the padding. Using an alias means keeping an internal pointer that accesses a structure member directly, so that we skip the pointer calculation each time we access it. Adjusting the padding is just a matter of choosing the proper order for the structure members.

/* structure without the proper padding */
struct my_struct {
	char *a;
	void *b;
	int c;
	double n;
	char *x;
};

/* the same structure with the proper padding */
struct my_struct {
	double n;
	int c;
	char *a;
	char *x;
	void *b;
};

The data access computation is reduced in the assembler-level code, not in the C code; here we have an implied and invisible η-reduction. Deeply nested data structures have the same issue. As a rule, the ideal structure size is $s = 2^{n}$, with $s$ as the structure size and $n$ as the exponent that adjusts the padding to a power of two.
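The effect of member order on padding can be checked with sizeof. This is a minimal sketch with hypothetical structure names; the exact sizes depend on the compiler and the ABI, so the code only relies on the reordered structure never being larger than the original one.

```c
#include <assert.h>
#include <stddef.h>

/* small members around a double: padding is needed before
   the double and again at the end of the structure */
struct loose {
	char tag;
	double value;
	char flag;
};

/* largest member first: the two chars share the tail */
struct tight {
	double value;
	char tag;
	char flag;
};

/* Bytes saved by reordering the members (may be zero on some ABIs). */
size_t
padding_saved(void)
{
	assert(sizeof(struct tight) <= sizeof(struct loose));
	return sizeof(struct loose) - sizeof(struct tight);
}
```

On a typical 64-bit ABI where double is aligned to 8 bytes, struct loose takes 24 bytes and struct tight takes 16, but those numbers are an assumption about the platform and not a guarantee of the language.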

/* some structs with nested members */
struct a {
	int m1;
};

struct b {
	int m2;
	struct a *m3;
};

struct c {
	int m4;
	struct b *m5;
};

/* the non cheaper version to access its members */
struct c *x = some_function_returning_c();
x->m5->m3->m1 = some_other_function();
if (x->m5->m3->m1 != 0) {
	another_function(x->m5->m3->m1);
}

/* the cheaper version to access its members
   using aliases */
struct c *x = some_function_returning_c();
struct a *p;
p = x->m5->m3;	/* alias to the innermost structure */
p->m1 = some_other_function();
if (p->m1 != 0) {
	another_function(p->m1);
}

loop unrolling

Each step of a loop repeats some instructions. For example, if we have a fixed-size array where each element must be processed by a certain function, we can use unrolled loops.

/* normal iteration over fixed size array */
for (i = 0; i < 100; i++) {
	call_some_function(my_array[i]);
}

/* applied loop unrolling and reverse loop */
for (i = 100; i--;) {
	call_some_function(my_array[i]);
	call_some_function(my_array[--i]);
	call_some_function(my_array[--i]);
	call_some_function(my_array[--i]);
	call_some_function(my_array[--i]);
}

Note that here we have a fixed-size array; otherwise it is hard to know the array size in advance. We can also let the compiler unroll the loops, if it has the proper option. For example, GCC supports automatic loop unrolling with the -funroll-loops flag.
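A self-contained variant of the unrolled loop can be sketched as follows. The names ITEMS and sum_unrolled are hypothetical, and the sketch assumes that the array length is a multiple of the unrolling factor, which is why the technique fits fixed-size arrays best.

```c
#include <assert.h>

#define ITEMS	100	/* fixed array size, a multiple of 4 */

/* Sum a fixed-size array, handling four elements per iteration,
   so the loop test and the increment run 25 times instead of 100. */
long
sum_unrolled(const int v[ITEMS])
{
	long total = 0;
	int i;

	assert(ITEMS % 4 == 0);
	for (i = 0; i < ITEMS; i += 4) {
		total += v[i];
		total += v[i + 1];
		total += v[i + 2];
		total += v[i + 3];
	}
	return total;
}
```

When the size is not a multiple of the factor, a short ordinary loop has to process the remaining elements.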

loop jamming

Doing the work of several loops in a single one matters.

/* here we have two loops for something that can
   be done in one loop */
for (i = 0; i < 100; i++) {
	some_function_a(my_array[i]);
}
for (i = 0; i < 100; i += 10) {
	some_function_b(my_array[i]);
}

/* here both tasks are done in a single loop */
for (i = 0; i < 100; i++) {
	some_function_a(my_array[i]);
	if ((i % 10) == 0) {
		some_function_b(my_array[i]);
	}
}

In the example above, a single loop does the same tasks with fewer operations, so we have reduced the steps from $O(2n)$ to $O(n)$ (note that $O(2n)$ is just symbolic and not strict).
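The same idea applies when two passes touch every element. As a minimal sketch with a hypothetical function name, the code below scales the values and totals them in one pass, where a naive version would use one loop to scale and a second loop to sum.

```c
#include <assert.h>
#include <stddef.h>

/* Scale every element in place and return the total of the
   scaled values, using a single loop for both tasks. */
long
scale_and_sum(int *values, size_t count, int factor)
{
	long total = 0;
	size_t i;

	assert(values != NULL || count == 0);
	for (i = 0; i < count; i++) {
		values[i] *= factor;	/* work of the first loop */
		total += values[i];	/* work of the second loop */
	}
	return total;
}
```

The loop counter is compared and incremented once per element instead of twice, and each element is loaded while it is still at hand.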

bit padding matters

It depends on the architecture. Processing data types shorter or longer than the register size is not cheaper than processing register-sized ones. For example, if we use char or short we are not using a complete register, which is harder to handle than register-length variables such as int and long.

references

Michael E. Lee, "Optimization of Computer Programs in C", Ontek Corporation.
Paul Hsieh, "Programming Optimization".
Koushik Ghosh, "Writing Efficient C and C Code Optimization", The Code Project.
"Optimizing C and C++ Code", EventHelix.
Cris H. Pappas and William H. Murray, "386 Microprocessor Handbook", McGraw Hill.


java mutexes

Daniel Molina Wegener — Mon, 14 Dec 2009 11:25:27 +0000

Java synchronization is usually done with the synchronized keyword. It lets us create a mutex around a variable, class or method, where the mutex prevents concurrent access to it. By concurrent access we mean access from multiple threads. If an operation is atomic, one and only one process or thread is executing it at a time. So mutex-based operations are atomic.

In Java, thread synchronization with the synchronized keyword can create a heavy overhead when the number of threads is high, in two main cases: when it is used on classes and when it is used on methods. So we must think of a way to create small locks, especially if we are using design patterns such as singletons.

/* BAD CASE 1, throws NullPointerException */
class BadMutexBaseClass {
	private static BadMutexBaseClass singletonInstance;
	public static BadMutexBaseClass getInstance() {
		synchronized (singletonInstance) {
			if (singletonInstance == null) {
				singletonInstance = new BadMutexBaseClass();
			}
			return singletonInstance;
		}
	}
}

/* BAD CASE 2, heavy overhead */
class BadMutexBaseClass {
	private static BadMutexBaseClass singletonInstance;
	public synchronized static BadMutexBaseClass getInstance() {
		if (singletonInstance == null) {
			singletonInstance = new BadMutexBaseClass();
		}
		return singletonInstance;
	}
}

Neither case is good. We should use a separate mutex object, without touching the singleton member, so that the program can access the singleton atomically. A basic Object instance is enough to create the proper mutex.

/* MUTEX BASED CASE */
class MutexBaseClass {
	private static final Object mutex = new Object();
	private static MutexBaseClass singletonInstance;
	public static MutexBaseClass getInstance() {
		synchronized (mutex) { /* critical section */
			if (singletonInstance == null) {
				singletonInstance = new MutexBaseClass();
			}
			return singletonInstance;
		}
	}
}

By creating a dummy object, we can use it as a mutex in our program. From a lower-level perspective, this technique is similar to using pthread_mutex_lock() when programming with threads in C. An interesting point is that non-blocking algorithms can be faster than locking ones. The mutex technique is a blocking technique. Non-blocking techniques, such as lock-free and obstruction-free ones, allow us to implement faster, but more complex, code.
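For comparison, the same lazy singleton can be sketched in C with a POSIX mutex. The names struct config, instance_lock and config_get_instance are hypothetical, and the sketch treats an allocation failure as fatal to stay short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct config {
	int requests;	/* how many times the instance was requested */
};

/* the dedicated mutex, playing the role of the dummy Object */
static pthread_mutex_t instance_lock = PTHREAD_MUTEX_INITIALIZER;
static struct config *instance = NULL;

/* Return the single shared instance, creating it on first use. */
struct config *
config_get_instance(void)
{
	struct config *result;

	pthread_mutex_lock(&instance_lock);	/* enter the critical section */
	if (instance == NULL) {
		instance = calloc(1, sizeof(*instance));
		assert(instance != NULL);	/* sketch: failure is fatal */
	}
	instance->requests++;
	result = instance;
	pthread_mutex_unlock(&instance_lock);	/* leave the critical section */
	return result;
}
```

As in the Java version, the lock protects only the check and the creation, and it is separate from the object being created.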

GCC has built-in atomic operations, such as CAS, that allow the use of lock-free techniques. For example, in the code below, which uses the __sync_bool_compare_and_swap atomic operation, the CAS (Compare and Swap) operation is applied to the transaction counter, with the old value and the old value plus one.

do {
	old = vproc_shmem->vp_shmem_transaction_cnt;

	if (unlikely(old < 0)) {
		if (vproc_shmem->vp_shmem_flags & VPROC_SHMEM_EXITING) {
			_exit(0);
		} else {
			__crashreporter_info__ = "Unbalanced: vproc_transaction_begin()";
		}
		abort();
	}
} while( !__sync_bool_compare_and_swap(&vproc_shmem->vp_shmem_transaction_cnt, old, old + 1) );
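Stripped of the error handling, the retry loop above reduces to a small pattern. The sketch below uses the same GCC built-in on a hypothetical counter (transactions and transactions_add are illustrative names): it reads the current value, tries to replace it, and starts over if another thread changed the counter in between.

```c
#include <assert.h>

static int transactions = 0;	/* shared counter */

/* Add delta to the counter without taking a lock;
   returns the value stored by this call. */
int
transactions_add(int delta)
{
	int old;

	assert(delta > 0);
	do {
		old = transactions;	/* snapshot of the current value */
		/* the swap only succeeds if the counter still holds old */
	} while (!__sync_bool_compare_and_swap(&transactions, old, old + delta));
	return old + delta;
}
```

No thread ever blocks here; a thread that loses the race simply repeats the loop with a fresh snapshot.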

Java offers comparable atomic operations in the java.util.concurrent.atomic package (for example, AtomicInteger.compareAndSet), which allow the use of non-blocking algorithms and faster code, and that matters since I agree that we are entering the multi-core era…
