Sunday, August 29, 2010

Visitor pattern


When to Use the Visitor Pattern

You should consider using the Visitor pattern when you want to perform an operation on the data contained in a number of objects that have different interfaces. Visitors are also valuable if you must perform a number of unrelated operations on these classes. Visitors are a useful way to add functionality to class libraries or frameworks for which you either do not have the source or cannot change the source for other technical (or political) reasons. In these latter cases, you simply subclass the classes of the framework and add the accept method to each subclass.

Let's consider a simple subset of the Employee problem discussed in the Composite pattern. We have a simple Employee object that maintains a record of the employee's name, salary, number of vacation days taken, and number of sick days taken. A simple version of this class is the following:
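The original post does not show the Employee class itself, so here is a minimal sketch of what it might look like. The field names are assumptions, and a single-method Visitor declaration is included only so the snippet compiles on its own (the article's full Visitor below also handles Boss):

```java
// Minimal Visitor declaration so this sketch is self-contained.
abstract class Visitor {
    public abstract void visit(Employee emp);
}

class Employee {
    private final String name;
    private final double salary;
    private final int vacDays;
    private final int sickDays;

    public Employee(String name, double salary, int vacDays, int sickDays) {
        this.name = name;
        this.salary = salary;
        this.vacDays = vacDays;
        this.sickDays = sickDays;
    }

    public String getName()   { return name; }
    public double getSalary() { return salary; }
    public int getVacDays()   { return vacDays; }
    public int getSickDays()  { return sickDays; }

    // The key addition for the Visitor pattern: double dispatch back to the visitor.
    public void accept(Visitor v) {
        v.visit(this);
    }
}
```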

Since Java is a strongly typed language, our base Visitor class needs a suitable abstract visit method for each kind of class in the program. In this first example of our basic abstract Visitor class, we handle Employees and Bosses.

public abstract class Visitor {
    public abstract void visit(Employee emp);
    public abstract void visit(Boss emp);
}

Notice that there is no indication what the Visitor does with each class in either the client classes or the abstract Visitor class. We can in fact write a whole lot of Visitors that do different things to the classes in our program. The first Visitor that we write sums the vacation data for all employees.

public class VacationVisitor extends Visitor {
    protected int total_days;

    public VacationVisitor() {
        total_days = 0;
    }

    public void visit(Employee emp) {
        total_days += emp.getVacDays();
    }

    public void visit(Boss boss) {
        total_days += boss.getVacDays();
    }

    public int getTotalDays() {
        return total_days;
    }
}

Adapter vs Bridge pattern

Rules of thumb

  • Adapter makes things work after they’re designed; Bridge makes them work before they are.
  • Bridge is designed up-front to let the abstraction and the implementation vary independently. Adapter is retrofitted to make unrelated classes work together.
  • Adapter provides a different interface to its subject. Proxy provides the same interface. Decorator provides an enhanced interface.
  • Adapter is meant to change the interface of an existing object. Decorator enhances another object without changing its interface. Decorator is thus more transparent to the application than an adapter is. As a consequence, Decorator supports recursive composition, which isn’t possible with pure Adapters.
  • Facade defines a new interface, whereas Adapter reuses an old interface. Remember that Adapter makes two existing interfaces work together as opposed to defining an entirely new one.
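As a concrete illustration of the Adapter rules above, here is a minimal sketch (all class and method names are hypothetical) that retrofits an existing class to a target interface without modifying either one:

```java
// Target interface that client code expects (hypothetical).
interface Json {
    String toJson();
}

// Existing class with an incompatible interface that we cannot change.
class LegacyReport {
    String asXml() {
        return "<report/>";
    }
}

// Adapter: wraps the legacy object and translates calls to the target interface.
class LegacyReportAdapter implements Json {
    private final LegacyReport report;

    LegacyReportAdapter(LegacyReport report) {
        this.report = report;
    }

    @Override
    public String toJson() {
        // A trivial translation, just to show the delegation.
        return "{\"xml\": \"" + report.asXml() + "\"}";
    }
}
```

Note how the adapter provides a different interface to its subject, exactly as the rules of thumb describe, while the Bridge pattern would instead have been designed into both hierarchies up front.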

Thursday, August 26, 2010

Ordered Message processing

Put incoming messages on a queue.

Check each message's version. If we receive a higher-version message out of sequence, ahead of the original message, put it in a delay queue.

When the original message arrives, pull all of its higher versions from the delay queue and process them in order.

1. Resequencer:

The Resequencer can receive a stream of messages that may not arrive in order. The Resequencer contains an internal buffer to store out-of-sequence messages until a complete sequence is obtained. The in-sequence messages are then published to the output channel. It is important that the output channel is order-preserving, so messages are guaranteed to arrive in order at the next component. Like most other routers, a Resequencer usually does not modify the message contents.
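The buffering logic described above can be sketched in Java. This is a minimal in-memory illustration with a hypothetical API; production resequencers (for example, the one in Apache Camel) also handle timeouts and buffer capacity limits:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Messages carry a sequence number; out-of-sequence messages are buffered
// until the gap closes, then released in order.
class Resequencer {
    private final Map<Long, String> buffer = new HashMap<>();
    private final List<String> output = new ArrayList<>();
    private long nextExpected = 0;

    void receive(long seq, String payload) {
        buffer.put(seq, payload);
        // Drain every consecutive message starting at the next expected sequence.
        while (buffer.containsKey(nextExpected)) {
            output.add(buffer.remove(nextExpected));
            nextExpected++;
        }
    }

    List<String> delivered() {
        return output;
    }
}
```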

From internet:

The only way to ensure that messages are processed in the same order as they are created is to ensure you only have one Message Producer and one Message Consumer.

Basically, the messages will only maintain the order in which they are put on and taken off the queue; after that, it's a free-for-all as to which Message Consumer runs. If you're worried about performance, then the only solution is to have multiple message streams. It's probably easier if I give an example.

I once developed an Order processor that required each order for the same customer to be processed sequentially, so I created 10 queues (you can have any number) with 10 Message Consumers, one bound to each queue. Unfortunately you can still only have one Message Producer, which does a hash on the Customer Id to ensure that all the messages for the same customer go onto the same queue.
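The customer-id hashing described above can be sketched as follows (the class and method names are assumptions; `floorMod` keeps the queue index non-negative even when `hashCode()` is negative):

```java
// Route all messages for a given customer to the same queue by hashing the
// customer id over a fixed number of queues (10 in the example above).
class QueueRouter {
    static int queueFor(String customerId, int queueCount) {
        return Math.floorMod(customerId.hashCode(), queueCount);
    }
}
```

Because the mapping is deterministic, every message for one customer lands on one queue and is consumed sequentially, while different customers spread across all the queues in parallel.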

Customised solution to include a non-XA resource in the two-phase commit protocol

I came up with the following solution. I think this custom solution is better than relying on app servers and the different options provided by different app servers.

Let us say you have an update method that updates multiple XA resources and one non-XA resource, and I want to commit all or none. I don't want my non-XA resource to participate in the global transaction, so I create a method that updates the non-XA resource using the transaction attribute "NotSupported", and I call this method from a method that participates in the transaction. See the following snippet to understand this.

/* Updates multiple XA resources and one non-XA resource.
   Transaction attribute can be Required, RequiresNew, or Mandatory. */
public boolean update() throws Exception {
    boolean commit = true;
    try {
        commit = updateXA();
        if (!commit) {
            return false;
        }
        /* Getting here means the XA updates succeeded. */

        /* Now update the non-XA resource. */
        commit = updateNonXA();
    } catch (Exception e) {
        commit = false;
        throw e;
    } finally {
        if (!commit) {
            /* Roll back the entire transaction, which rolls back every resource
               that participated in it; in effect this undoes the XA updates, if any.

               There is no need to roll back the non-XA update: if it succeeded we
               never get here with commit == false, and if it was never attempted
               there is nothing to undo. */
        }
    }
    return commit;
}

/* Do the updates to the multiple XA resources here.
   Transaction attribute must be Mandatory or Required
   (we know the caller is running in a transaction ;) ) */
public boolean updateXA() throws Exception {
    // ... XA updates ...
    return true;
}

/* Do the update to the single non-XA resource here.
   Transaction attribute must be NotSupported; it should ignore the global transaction. */
public boolean updateNonXA() throws Exception {
    // ... non-XA update ...
    return true;
}

Another good post from Mike

XA does generally exact a performance penalty. In non-XA operations, typically there's only one commit that does any significant amount of work - the RDBMS commit. In XA, you end up with each "Resource Manager" (JMS, RDBMS, etc.) committing, and the Transaction Manager (the app server) also having to keep track of transactions.

In XA mode, the TM and each RM needs to have a transaction log, and typically the following happens:

- App or container asks for commit
- For each resource:
  - Call resource prepare - results in "hardening" of data and a disk force
- If all resources voted "yes", the TM writes a prepared record to its tran log (also a disk force)
- For each resource:
  - Call resource commit - results in a disk force; data is committed
- "Soft write" by the TM that the commit is complete (e.g. a write with no disk force)

If you have two resources, such as your case (JMS and RDBMS), there are five disk forces involved (2 for JMS, 2 for RDBMS, 1 for TM).

The overhead will vary tremendously depending on the environment you're in and the products you're using, but it's often on the order of 50 milliseconds-100 milliseconds on average.

As an earlier thread on TSS indicated, on the JMS side not all JMS providers disk sync by default - they soft write without guaranteeing that data has been written to disk. This may be why you're not seeing as great of a performance hit - check your JMS provider's docs for details on "disk syncing" or "disk forcing" or "synchronized writes". Note also that some JMS providers don't use a transaction log at all, such as JBoss' JMS provider. And some JMS providers get it only partially right - for example, one commercial JMS provider only durably records transactions for persistent messages, and (incorrectly) leaves transactions involving non-persistent messages hanging in the breeze.

For providers which have a log but don't disk force, a hard failure of the server (process or machine) can result in lost transactions. For providers which don't have transaction logs at all, the server can't preserve transactions across startup/shutdown of the JMS server at all.

On the TM side, Websphere versions 4.x and 5.x definitely involve a disk force when all resources are prepared. I don't remember what JBoss does.

Phase 2 protocol

XA and non-XA datasources

An XA transaction, in the most general terms, is a "global transaction" that may span multiple resources. A non-XA transaction always involves just one resource.

An XA transaction involves a coordinating transaction manager, with one or more databases (or other resources, like JMS) all involved in a single global transaction. Non-XA transactions have no transaction coordinator, and a single resource is doing all its transaction work itself (this is sometimes called local transactions).

XA transactions come from the X/Open group specification on distributed, global transactions. JTA includes the X/Open XA spec, in modified form.

Most stuff in the world is non-XA - a Servlet or EJB or plain old JDBC in a Java application talking to a single database. XA gets involved when you want to work with multiple resources - 2 or more databases, a database and a JMS connection, all of those plus maybe a JCA resource - all in a single transaction. In this scenario, you'll have an app server like Websphere or Weblogic or JBoss acting as the Transaction Manager, and your various resources (Oracle, Sybase, IBM MQ JMS, SAP, whatever) acting as transaction resources. Your code can then update/delete/publish/whatever across the many resources. When you say "commit", the results are committed across all of the resources. When you say "rollback", _everything_ is rolled back across all resources.

The Transaction Manager coordinates all of this through a protocol called Two Phase Commit (2PC). This protocol also has to be supported by the individual resources.
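The two-phase commit flow described above can be sketched as a toy coordinator. The interfaces here are hypothetical, not a real JTA API, and a real TM also force-writes a durable log record between the phases so it can recover after a crash:

```java
import java.util.List;

// Phase 1: every resource votes; phase 2: commit or roll back everywhere.
interface Resource {
    boolean prepare();  // phase 1: harden data, return the vote
    void commit();      // phase 2
    void rollback();
}

class Coordinator {
    boolean commit(List<Resource> resources) {
        // Phase 1: ask every resource to prepare; any "no" vote aborts.
        for (Resource r : resources) {
            if (!r.prepare()) {
                resources.forEach(Resource::rollback);
                return false;
            }
        }
        // (A real TM force-writes a "prepared" record to its log here.)
        // Phase 2: all voted yes, so commit everywhere.
        resources.forEach(Resource::commit);
        return true;
    }
}
```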

In terms of datasources, an XA datasource is a data source that can participate in an XA global transaction. A non-XA datasource generally can't participate in a global transaction (sort of - some people implement what's called a "last participant" optimization that can let you do this for exactly one non-XA item).

For more details, see the JTA pages. Look at the XAResource and Xid interfaces in JTA, see the X/Open XA Distributed Transaction specification, or do a Google search on "Java JTA XA transaction".


Tuesday, August 17, 2010

Defining Serializable Fields for a Class

From Other sources

How Serialization Detects When a Class Has Changed

In order for serialization to gracefully detect when a versioning problem has occurred, it needs to be able to detect when a class has changed. As with all the other aspects of serialization, there is a default way that serialization does this. And there is a way for you to override the default.

The default involves a hashcode. Serialization creates a single hashcode, of type long, from the following information:

  • The class name and modifiers
  • The names of any interfaces the class implements
  • Descriptions of all methods and constructors, except private methods and constructors
  • Descriptions of all fields, except private static and private transient fields
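The computed hash becomes the class's default serialVersionUID. Declaring the field explicitly overrides that computation, so compatible class changes do not break deserialization of old streams. A minimal sketch (the class and field names are assumptions):

```java
import java.io.Serializable;

class Account implements Serializable {
    // Overrides the hash that serialization would otherwise compute from the
    // class structure; keep this value stable across compatible versions.
    private static final long serialVersionUID = 1L;

    String owner;
    transient int cachedScore; // transient fields are skipped by serialization
}
```

With the explicit field in place, adding a new method or field later does not change the stream's version check, whereas the default computed hash would.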

Serialization Depends on Reflection

The dependence on reflection is the hardest of these to eliminate. Both serializing and deserializing require the serialization mechanism to discover information about the instance it is serializing. At a minimum, the serialization algorithm needs to find out things such as the value of serialVersionUID, whether writeObject() is implemented, and what the superclass structure is. What's more, using the default serialization mechanism (or calling defaultWriteObject() from within writeObject()) will use reflection to discover all the field values. This can be quite slow.

Comparing Externalizable to Serializable

Of course, this efficiency comes at a price. Serializable can frequently be implemented by doing two things: declaring that a class implements the Serializable interface and adding a zero-argument constructor to the class. Furthermore, as an application evolves, the serialization mechanism automatically adapts. Because the metadata is automatically extracted from the class definitions, application programmers often don't have to do anything except recompile the program.

On the other hand, Externalizable isn't particularly easy to implement, isn't very flexible, and requires you to rewrite your marshalling and demarshalling code whenever you change your class definitions. However, because it eliminates almost all the reflective calls used by the serialization mechanism and gives you complete control over the marshalling and demarshalling algorithms, it can result in dramatic performance improvements.

Serialization Fields

Default serializable fields of a class are defined to be the non-transient and non-static fields. This default computation can be overridden by declaring a special field in the
Serializable class, serialPersistentFields. This field must be initialized with an array of ObjectStreamField objects that list the names and types of the serializable fields. The modifiers for the field are required to be private, static, and final.

For example, the following declaration duplicates the default behavior.

class List implements Serializable {
    List next;

    private static final ObjectStreamField[] serialPersistentFields
            = { new ObjectStreamField("next", List.class) };
}

In this section, I describe a slightly simplified version of the serialization algorithm. I then proceed to a more complete description of the serialization process in the next section.


Because the class descriptions actually contain the metadata, the basic idea behind the serialization algorithm is pretty easy to describe. The only tricky part is handling circular references.

The problem is this: suppose instance A refers to instance B, and instance B refers back to instance A. Completely writing out A requires you to write out B, but writing out B requires you to write out A. Because you don't want to get into an infinite loop, or even write out an instance or a class description more than once, you need to keep track of what's already been written to the stream. (Serialization is a slow process that uses the reflection API quite heavily, in addition to consuming bandwidth.)

ObjectOutputStream does this by maintaining a mapping from instances and classes to handles. When writeObject() is called with an argument that has already been written to the stream, the handle is written to the stream, and no further operations are necessary.

If, however, writeObject() is passed an instance that has not yet been written to the stream, two things happen. First, the instance is assigned a reference handle, and the mapping from instance to reference handle is stored by ObjectOutputStream. The handle that is assigned is the next integer in a sequence.
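The handle-assignment scheme can be sketched with an identity-based map (a simplification; the real stream also tracks class descriptors and replaced objects). The base value 0x7E0000 matches the reset value mentioned in the tip below:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of the instance-to-handle mapping ObjectOutputStream maintains.
// Identity semantics matter: two equal-but-distinct instances get two handles.
class HandleTable {
    private final Map<Object, Integer> handles = new IdentityHashMap<>();
    private int nextHandle = 0x7E0000; // serialization's base wire handle

    // Returns the existing handle, or assigns the next one in sequence.
    int lookupOrAssign(Object obj) {
        return handles.computeIfAbsent(obj, o -> nextHandle++);
    }

    boolean alreadyWritten(Object obj) {
        return handles.containsKey(obj);
    }
}
```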

TIP: Remember the reset() method on ObjectOutputStream? It clears the mapping and resets the handle counter to 0x7E0000. RMI also automatically resets its serialization mechanism after every remote method call.

Second, the instance data is written out as per the data format described earlier. This can involve some complications if the instance has a field whose value is also a serializable instance. In this case, the serialization of the first instance is suspended, and the second instance is serialized in its place (or, if the second instance has already been serialized, the reference handle for the second instance is written out). After the second instance is fully serialized, serialization of the first instance resumes. The contents of the stream look a little bit like Figure 10-5.

Figure 10-5. Contents of Serialization's data stream.


From the description of writing, it's pretty easy to guess most of what happens when readObject() is called. Unfortunately, because of versioning issues, the implementation of readObject() is actually a little bit more complex than you might guess.

When it reads in an instance description, ObjectInputStream gets the following information:

  • Descriptions of all the classes involved
  • The serialization data from the instance

The problem is that the class descriptions that the instance of ObjectInputStream reads from the stream may not be equivalent to the class descriptions of the same classes in the local JVM. For example, if an instance is serialized to a file and then read back in three years later, there's a pretty good chance that the class definitions used to serialize the instance have changed.

This means that ObjectInputStream uses the class descriptions in two ways:

  • It uses them to actually pull data from the stream, since the class descriptions completely describe the contents of the stream.
  • It compares the class descriptions to the classes it has locally and tries to determine if the classes have changed, in which case it throws an exception. If the class descriptions match the local classes, it creates the instance and sets the instance's state appropriately.

readResolve and writeReplace

One way of eliminating the extra instances and some of the unnecessary heap allocation would be to do something like this:

public class Gender implements Serializable {
  public final static Gender MALE   = new Gender("Male");
  public final static Gender FEMALE = new Gender("Female");

  private String name;

  private Gender(String name) {
    this.name = name;
  }

  Object writeReplace() throws ObjectStreamException {
    if (this.equals(MALE)) {
      return SerializedForm.MALE_FORM;
    } else {
      return SerializedForm.FEMALE_FORM;
    }
  }

  private static class SerializedForm implements Serializable {
    final static SerializedForm MALE_FORM   = new SerializedForm(0);
    final static SerializedForm FEMALE_FORM = new SerializedForm(1);

    private int value;

    SerializedForm(int value) {
      this.value = value;
    }

    Object readResolve() throws ObjectStreamException {
      if (value == MALE_FORM.value) {
        return Gender.MALE;
      } else {
        return Gender.FEMALE;
      }
    }
  }
}

This also guarantees that in all cases where genderInstance.equals(MALE) is true, genderInstance == Gender.MALE is also true.

Monday, August 16, 2010

Nice blog

jax-rpc vs jax-ws


JAX-WS 2.0 is the successor of JAX-RPC 1.1 - the Java API for XML-based Web services. If possible, JAX-WS should be used instead as it is based on the most recent industry standards.

What remains the same?

Before we itemize the differences between JAX-RPC 1.1 and JAX-WS 2.0, we should first discuss what is the same.

* JAX-WS still supports SOAP 1.1 over HTTP 1.1, so interoperability will not be affected. The same messages can still flow across the wire.

* JAX-WS still supports WSDL 1.1, so what you've learned about that specification is still useful. A WSDL 2.0 specification is nearing completion, but it was still in the works at the time that JAX-WS 2.0 was finalized.

What is different?

JAX-RPC client artifacts are generated statically, whereas JAX-WS also supports dynamic proxies.

* SOAP 1.2
JAX-RPC and JAX-WS support SOAP 1.1. JAX-WS also supports SOAP 1.2.

* The HTTP binding
The WSDL 1.1 specification defined an HTTP binding, a means by which you can send XML messages over HTTP without SOAP. JAX-RPC ignored the HTTP binding; JAX-WS adds support for it.

* WS-I's Basic Profiles
JAX-RPC supports WS-I's Basic Profile (BP) version 1.0. JAX-WS supports BP 1.1. (WS-I is the Web services interoperability organization.)

* New Java features
o JAX-RPC maps to Java 1.4. JAX-WS maps to Java 5.0. JAX-WS relies on many of the features new in Java 5.0.
o Java EE 5, the successor to J2EE 1.4, adds support for JAX-WS, but it also retains support for JAX-RPC, which could be confusing to today's Web services novices.

* The data mapping model
o JAX-RPC has its own data mapping model, which covers about 90 percent of all schema types. Those that it does not cover are mapped to javax.xml.soap.SOAPElement.
o JAX-WS's data mapping model is JAXB. JAXB promises mappings for all XML schemas.

* The interface mapping model
JAX-WS's basic interface mapping model is not extensively different from JAX-RPC's; however:
o JAX-WS's model makes use of new Java 5.0 features.
o JAX-WS's model introduces asynchronous functionality.

* The dynamic programming model
o JAX-WS's dynamic client model is quite different from JAX-RPC's. Many of the changes acknowledge industry needs:
+ It introduces message-oriented functionality.
+ It introduces dynamic asynchronous functionality.
o JAX-WS also adds a dynamic server model, which JAX-RPC does not have.

* MTOM (Message Transmission Optimization Mechanism)
JAX-WS, via JAXB, adds support for MTOM, the new attachment specification. Microsoft never bought into the SOAP with Attachments specification; but it appears that everyone supports MTOM, so attachment interoperability should become a reality.

* The handler model
o The handler model has changed quite a bit from JAX-RPC to JAX-WS.
o JAX-RPC handlers rely on SAAJ 1.2. JAX-WS handlers rely on the new SAAJ 1.3 specification.

Thursday, August 12, 2010

Grid computing vs cloud computing

Grids and current Clouds, so far:
- Grid systems are designed for collaborative sharing of resources belonging to different admin domains, while Clouds at the moment expose the resources of one domain to the outside world.
- Grid systems support the execution of end-user applications as computational activities; a typical computational activity, once accepted by a Grid endpoint, is locally handled by a batch system as a batch job. Clouds are mainly used for the remote deployment of services - this is an important difference. Grids provide more domain-specific services; Clouds can sit below them (RightGrid is a typical example of this).
- Grids are moving towards the adoption of virtual machine technologies, but the usage pattern will stay the same (the submitted job is bundled with its execution environment as a VM image).
- Grid systems support large sets of users organized into virtual organizations (credentials are typically enriched with VO-related information); Cloud systems support individual users (to my knowledge).

I would not see the size of allocation as a factor for differentiating them.

You know, a lot of people really get confused about cloud vs grid. The two are closely related. I always think about it in terms of virtualization vs grid (since I work for VMware). Grid is great if you have one app that needs a lot of combined compute cycles. Virtualization is great if you have a lot of apps that each need a few compute cycles.

Now enter cloud. Cloud really encompasses both of these. The point of cloud is that you don't have to care whether you have a grid infrastructure underneath or a virtualization infrastructure underneath. All you do is deploy your app to the cloud and let the cloud figure out how to get the app the resources it needs.

That's why cloud is the overarching architecture for virtualization or grid or SaaS or PaaS or anything else. All of these can play in the cloud together at the same time. You build your cloud with these blocks as you see fit, based on what you want your cloud to do. Simple as that.

Wednesday, August 11, 2010

Phantom references

Nice article about phantom references. They had been a mystery to me for a long time.

WeakReferences are enqueued before the object is actually deleted, so the reference can be cleared while the object is still in memory. Here PhantomReferences can help: phantom references are enqueued only after the object has been deleted from memory, and their get() method always returns null to prevent resurrecting the object. So phantom references are good for determining exactly when an object has been deleted from memory.
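A minimal sketch of the mechanics described above (the timeout-based wait is an assumption, since GC timing is never guaranteed):

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);

        // get() always returns null, even while the referent is strongly
        // reachable, so the object can never be resurrected through it.
        System.out.println(ref.get()); // null

        referent = null; // drop the last strong reference
        System.gc();     // request collection (a hint, not a guarantee)

        // The reference appears on the queue only after the object has been
        // reclaimed; wait briefly rather than block forever.
        Reference<?> enqueued = queue.remove(1000);
        System.out.println(enqueued == ref
                ? "referent reclaimed"
                : "not reclaimed yet (GC timing is non-deterministic)");
    }
}
```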

Architectural Decisions

JAD Checklist

Here is a quick checklist for how to conduct the JAD sessions to ensure you do not skip any of the most important steps. As you feel more comfortable with this process, feel free to modify this and create your own JAD checklist for your organization to follow:

  1. Assign participants.
     • Mandatory: software engineer, architect, operations engineer (database administrator, systems administrator, and/or network engineer).
     • Optional: product manager, project manager, quality assurance engineer.

  2. Schedule one or more sessions. Divide sessions by component if possible: database, server, cache, storage, etc.

  3. Start the session by covering the specifications.

  4. Review the architectural principles related to this session's component.

  5. Brainstorm approaches. No need for complete detail.

  6. List pros and cons of each approach.

  7. If multiple sessions are needed, have someone write down all the ideas and send them around to the group.

  8. Arrive at consensus for the design. Use voting, rating, ranking, or any other decision technique that everyone can support.

  9. Create the final documentation of the design in preparation for the ARB.

Don’t be afraid to modify this checklist as necessary for your organization.

Architectural Principles

Twelve Architectural Principles

In this section, we introduce twelve architectural principles. Many times after engagements, we will “seed” the architectural principle gardens of our clients with our twelve principles and then ask them to run their own process, taking as many of ours as they would like, discarding any that do not work for them, and adding as many as they would like. We only ask that they let us know what they are considering so that we can modify our principles over time if they come up with an especially ingenious or useful principle. The Venn diagram shown in Figure 12.3 depicts our principles as they relate to scalability, availability, and cost. We will discuss each of the principles at a high level and then dig more deeply into those that are identified as having an impact on scalability.

Figure 12.3. AKF Architecture Principles

N+1 Design

Simply stated, this principle is the need to ensure that anything you develop has at least one additional instance of that system in the event of failure. Apply the rule of three that we will discuss in Chapter 32, Planning Data Centers, or what we sometimes call ensuring that you build one for you, one for the customer, and one to fail. This principle holds true for everything from large data center design to Web Services implementations.

Design for Rollback

This is a critical principle for Web services, Web 2.0, or Software as a Service (SaaS) companies. Whatever you build, ensure that it is backward compatible. Make sure that you can roll it back if you find yourself in a position of spending too much time "fixing forward." Some companies will indicate that they can roll back within a specific window of time, say the first couple of hours. Unfortunately, some of the worst and most disastrous failures don't show up for a few days, especially when those failures have to do with customer data corruption. In the ideal case, you will also design to allow something to be rolled, pushed, or deployed while your product or platform is still "live." The rollback process will be covered in more detail in Chapter 18, Barrier Conditions and Rollback.

Design to Be Disabled

When designing systems, especially very risky systems that communicate to other systems or services, design them to be capable of being “marked down” or disabled. This may give you additional time to “fix forward” or ensure that you don’t go down as a result of a bug that introduces strange out of bounds demand characteristics on your system.

Design to Be Monitored

As we’ve discussed earlier in this book, systems should be designed from the ground up to be monitored. This goes beyond just applying agents to a system to monitor the utilization of CPU, memory, or disk I/O. It also goes beyond simply logging errors. You want your system to identify when it is performing differently than it normally operates in addition to telling you when it is not functioning properly.

Design for Multiple Live Sites

Many companies have disaster recovery centers with systems sitting mostly idle or used for QA until such time as they are needed. The primary issue with such solutions is that it takes a significant amount of time to fail over and validate the disaster recovery center in the event of a disaster. A better solution is to be serving traffic out of both sites live, such that the team is comfortable with the operation of both sites. Our rule of three applies here as well and in most cases you can operate three sites live at equal to or lower cost than the operation of a hot site and a cold disaster recovery site. We’ll discuss this topic in greater detail later in the chapter.

Use Mature Technologies

When you are buying technology, use technology that is proven and that has already had the bugs worked out of it. There are many cases where you might be willing or interested in the vendor promised competitive edge that some new technology offers. Be careful here, because if you become an early adopter of software or systems, you will also be on the leading edge of finding all the bugs with that software or system. If availability and reliability are important to you and your customers, try to be an early majority or late majority adopter of those systems that are critical to the operations of your service, product, or platform.

Asynchronous Design

Whenever possible, systems should communicate in an asynchronous fashion. Asynchronous systems tend to be more fault tolerant to extreme load and do not easily fall prey to the multiplicative effects of failure that characterize synchronous systems. We will discuss the reasons for this in greater detail in the next section of this chapter.

Stateless Systems

Although some systems need state, state has a cost in terms of availability, scalability, and overall cost of your system. When you store state, you do so at a cost of memory or disk space and maybe the cost of databases. This results in additional calls that are often made in synchronous fashion, which in turn reduces availability. As state is often costly compared to stateless systems, it increases the per unit cost of scaling your site. Try to avoid state whenever possible.

Scale Out Not Up

This is the principle that addresses the need to scale horizontally rather than vertically. Whenever you base the viability of your business on faster, bigger, and more expensive hardware, you define a limit on the growth of your business. That limit may change with time as larger scalable multiprocessor systems or vendor supported distributed systems become available, but you are still implicitly stating that you will grow governed by third-party technologies. When it comes to ensuring that you can meet your shareholder needs, design your systems to be able to be horizontally split in terms of data, transactions, and customers.

Design for at Least Two Axes of Scale

Whenever you design a major system, you should ensure that it is capable of being split on at least two axes of the cube that we introduce in Chapter 22, Introduction to the AKF Scale Cube, to ensure that you have plenty of room for "surprise" demand. This does not mean that you need to implement those splits on day one, but rather that they are thought through and at least architected so that the long lead time of rearchitecting a system is avoided.

Buy When Non Core

We will discuss this a bit more in Chapter 15, Focus on Core Competencies: Build Versus Buy. Although we have this identified as a cost initiative, we can make arguments that it affects scalability and availability as well as productivity even though productivity isn’t a theme within our principles. The basic premise is that regardless of how smart you and your team are, you simply aren’t the best at everything. Furthermore, your shareholders really expect you to focus on the things that really create competitive differentiation and therefore shareholder value. So only build things when you are really good at it and it makes a significant difference in your product, platform, or system.

Use Commodity Hardware

We often get a lot of pushback on this one, but it fits in well with the rest of the principles we’ve outlined. It is similar to our principle of using mature technologies. Hardware, especially servers, moves at a rapid pace toward commoditization characterized by the market buying predominately based on cost. If you can develop your architecture such that you can scale horizontally easily, you should be buying the cheapest hardware you can get your hands on, assuming that the cost of ownership of that hardware (including the cost of handling higher failure rates) is lower than higher end hardware.

Tuesday, August 10, 2010

Scalability patterns from Facebook

  1. Scaling takes iteration.
  2. Don't over-design.
  3. Choose the right tool for the job, but realize that your choice comes with overhead.
  4. Get the culture right. Move fast - break things. Huge impact - small teams. Be bold - innovate.
  5. Keep track of all information as plain hash-map IDs, and keep those ID mappings current. At runtime, simply build the information and data by pulling the respective data for those IDs.
  6. No joins in production. If you need any data, simply do a multi-get on the cache and then look up the actual info in parallel.
  7. NFS was used to serve the files (Haystack).