It is important to understand the distinction between Functions and Sandboxes in the Buildfunctions ecosystem:
Functions (CPUFunction, GPUFunction): Orchestrate top-level application or agent logic.
Sandboxes (CPUSandbox, GPUSandbox): Execute untrusted and dynamic agent actions with full GPU access, automatic model mounting, built-in AI frameworks, runtime dependency installs, and more. They spin up instantly for isolated tasks and can run for up to 24 hours.
This guide focuses on Functions—deploying and managing your top-level infrastructure.
Below is a high-level overview of how to structure your Buildfunctions handler functions and their responses. The examples all follow a consistent pattern: a main entry-point function named handler that returns (or echoes) a response containing at least a body. Beyond that, you can optionally include a status code and headers using the language's natural syntax.
The handler function:
Naming: Your main function must be named handler. This name is how Buildfunctions identifies which function to invoke when your code runs.
Purpose: The handler function is your function’s entry point. It contains the logic you want to run whenever your function is invoked. You can write other helper functions, but only the handler will be called automatically.
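For instance, here is a minimal sketch of that pattern; the structure of the returned response is covered in the next section:

```python
def build_greeting(name):
    # A helper function: Buildfunctions never invokes this directly.
    return f"Hello, {name}!"

def handler(event, context):
    # Only handler is called automatically; it can call any helpers it needs.
    return {"body": build_greeting("world")}
```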
The response format:
Your handler function must provide a response that can be returned to the caller. While the exact syntax differs by language, the structure is essentially the same:
```python
import os

def handler(event, context):
    # Retrieve the environment variable 'RANDOM_VAR'
    random_variable_value = os.getenv("RANDOM_VAR")
    print(f"The value of 'RANDOM_VAR' is: {random_variable_value}")

    # Construct the response body
    body = "Hello, world! To see your log, please refer to the logs page."

    # Return the response
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": body,
    }
```
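Since only the body is required, the status code and headers can be omitted entirely. The sketch below is a minimal variant, assuming Buildfunctions fills in sensible defaults for any fields you leave out:

```python
def handler(event, context):
    # Only "body" is required; the status code and headers are optional.
    # (Assumption: Buildfunctions supplies defaults for omitted fields.)
    return {"body": "Hello from a minimal handler!"}
```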
This example demonstrates a streaming text-generation GPU Function built with PyTorch and Transformers. Its docstring includes a requirements block listing the dependencies to add within the Buildfunctions dashboard or via the SDK.
"""Name: streaming-text-generationPython GPU Function for text generation with a streaming response using PyTorch and Transformers. Built and deployed on Buildfunctions.requirements (add these within the Buildfunctions dashboard or via SDK)transformersaccelerate"""import torchfrom transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig# Global variables for cachingmodel = Nonetokenizer = Nonedevice = Nonedef initialize_model(): global model, tokenizer, device try: if model is not None and tokenizer is not None: return device = torch.device("cuda" if torch.cuda.is_available() else "cpu") torch.backends.cudnn.benchmark = True model_path = "/mnt/storage/Llama-3.2-3B-Instruct-bnb-4bit" config = AutoConfig.from_pretrained(model_path) tokenizer = AutoTokenizer.from_pretrained(model_path) bnb_config = BitsAndBytesConfig(load_in_4bit=True) model = AutoModelForCausalLM.from_pretrained( model_path, config=config, torch_dtype=torch.float16, device_map="auto", quantization_config=bnb_config if not hasattr(config, "quantization_config") else None, ) model.to(device) except Exception as e: print(f"Error initializing the model: {e}") raise RuntimeError("Failed to initialize the model.")async def stream_tokens(prompt): try: initialize_model() input_ids = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).input_ids.to(device) yield b"<<START_STREAM>>\n" with torch.no_grad(): past_key_values = None generated_ids = input_ids for _ in range(200): try: outputs = model( input_ids=generated_ids, past_key_values=past_key_values, use_cache=True, ) logits = outputs.logits[:, -1, :] # Simple sampling next_token_id = torch.argmax(logits, dim=-1, keepdim=True) past_key_values = outputs.past_key_values generated_ids = next_token_id token = tokenizer.decode(next_token_id.squeeze(), skip_special_tokens=True) yield f"<<STREAM_CHUNK>>{token}<<END_STREAM_CHUNK>>\n".encode() if next_token_id.squeeze().item() == tokenizer.eos_token_id: break except Exception as gen_error: print(f"Error during token generation: {gen_error}") break yield b"<<END_STREAM>>\n" except Exception as e: print(f"Error in streaming tokens: {e}") yield b"<<STREAM_ERROR>>\n"async def async_stream_wrapper(prompt): async for chunk in stream_tokens(prompt): yield chunkdef handler(): try: prompt = "Tell me about the most mysterious phenomena in the universe." return { "statusCode": 200, "headers": { "Content-Type": "text/event-stream", "Cache-Control": "no-cache", "Access-Control-Allow-Origin": "*", }, "body": async_stream_wrapper(prompt), } except Exception as e: return {"statusCode": 500, "body": {"error": "Internal Server Error"}}
A powerful pattern in Buildfunctions is using a top-level Function to orchestrate nested Sandboxes. This allows you to combine persistent endpoints with ephemeral, high-performance compute.
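As a rough illustration of the shape of that pattern, the sketch below uses an entirely hypothetical SDK surface (the buildfunctions module, its Sandbox class, and the run method are placeholders, not the real API):

```python
# Illustrative sketch only: "buildfunctions", "Sandbox", and every method
# shown here are hypothetical stand-ins, not the real SDK surface.
import buildfunctions  # hypothetical SDK import

def handler(event, context):
    # The persistent Function receives the request...
    task_code = event.get("code", "print('hello from the sandbox')")

    # ...then delegates the untrusted, dynamic work to an ephemeral Sandbox.
    with buildfunctions.Sandbox(gpu=True) as sandbox:  # hypothetical API
        result = sandbox.run(task_code)                # hypothetical API

    # The Sandbox is torn down automatically; the Function returns the result.
    return {"statusCode": 200, "body": result}
```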